My efforts to build a supercomputer at home using Linux and Beowulf technology have evolved much over the past year. My early efforts centered on building a cluster I called the Pondermatic. This page outlines my current efforts involving budget multiprocessor systems, Redhat Linux 6.0 and the parallel version of the freeware raytracer, Povray. The results have been impressive.
For the original version of this file detailing the use of Redhat 5.2 and Povray 3.01 see http://www.cris.com/~rjbono/html/oldpondermatic.html.
My initial experiments indicated that raytracing on a cluster whose processors run at nearly equal speeds is more efficient than on a cluster with a variety of different CPU speeds. In many cases, the slower machines would bog down the rest of the cluster. The result was poor performance, as the faster machines ran out of things to process.
At this point I decided that I wanted to try using the SMP features in the new Linux kernel. At the same time, I read a review of the new Abit BP6 motherboard. This motherboard allows two relatively inexpensive Intel Celeron processors to be used in SMP dual-processor mode. The board itself is also inexpensive (~$130) and is chock full of very impressive features, including precise control of the CPU & bus speeds and hardware monitoring functions.
The Intel Celeron CPU is an inexpensive Pentium II-based processor that was initially designed to compete against the low-cost AMD K6 and Cyrix CPU's. The Celeron is known to overclock quite well…
Before I go on I must state the obligatory disclaimer regarding overclocking. Overclocking is bad, can ruin your system, cause unexplained hair loss, will void any warranty you may have with Intel and is generally not recommended. I do not advocate its use (although it works for me!) and any attempts you make to overclock your machine are at your own risk!
…with a 95% success rate in overclocking a 366 MHz CPU to 458 MHz by simply increasing the bus speed to 83 MHz.
With this in mind I decided to see what two dual Celeron systems overclocked to 458 MHz would provide in terms of price and performance in a simple Beowulf cluster using 10Mbit/sec Ethernet.
All the components for the two machines were purchased via the Internet and sourced through some very helpful price search engines: Pricewatch & Killerapp. For about $550 you can get an ATX case, Abit BP6 motherboard, two Celeron 366 MHz CPU's w/fans, 64 MB of PC100 SDRAM, an 8 MB AGP video card, floppy drive, sound card and 4.3 GB hard drive. I reused monitors, keyboards, CD-ROMs, Zip drives, etc. to round out the systems: one is a multiple-boot (Win98, WinNT4 & Redhat 6.0) machine and the other is dedicated to running Linux.
I sourced Redhat Linux 6.0 from Linux System Labs, which in my humble opinion is the best source for Linux distributions on the Internet.
Linux supports SMP nicely and PVMPOV allows you to take advantage of the multiple CPU's.
The base system configuration for each of the two machines in Pondermatic IV is as follows:
The QED machine has an external modem, IDE zip drive, and Mitsumi CD-RW drive as well.
Redhat has a slight advantage over other distributions due to the RPM method of installing applications. RPM's for key Beowulf software such as MPICH, LAM & PVM are available at the Redhat ftp site.
1. Insert CD-ROM in Drive and get to a DOS prompt.
2. Change to the CD-ROM drive (for this example, say it's d:).
3. Change to the images subdirectory. The CD-ROM install needs only one boot disk.
4. Create the boot disk using rawrite to transfer the boot.img file to a floppy. The command is d:\dosutils\rawrite.
5. Enter the name of the boot image: boot.img
6. Enter the floppy drive destination: a
7. Rawrite then creates the boot image.
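If a Linux machine is already at hand, the same boot floppy can be written without DOS by copying the image straight to the floppy device with dd. The following is a minimal sketch, assuming the image file is readable from the current directory and the floppy drive is /dev/fd0; the function name make_bootdisk is only for illustration.

```shell
# make_bootdisk IMAGE [DEVICE]
# Copy a boot image byte for byte onto a floppy.
# DEVICE defaults to /dev/fd0 (an assumption; adjust for your system).
make_bootdisk() {
  image=$1
  device=${2:-/dev/fd0}
  # bs=1440k lets dd move the whole 1.44 MB image in a single block
  dd if="$image" of="$device" bs=1440k
}
```

For a CD-ROM install this would be called as `make_bootdisk boot.img`, and for a network install as `make_bootdisk bootnet.img`.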
NOTE: If you choose not to install Lilo on a multiple CPU system, the boot disk will default to a uniprocessor kernel. When using the loadlin method outlined above you will need to copy the SMP-enabled kernel from the /boot directory to your Windows c:\ drive.
The second machine I call Pondermatic, and its installation is considerably different. This is a Linux-only box that does not have a CD drive, so here I used the NFS install. A different boot disk is required for an NFS install. Follow the boot disk creation instructions outlined above but use bootnet.img instead of boot.img. Next I had to configure QED to be my NFS server for Pondermatic. Once logged in to QED as root, do the following to set up NFS.
QED is now ready to be a server for an NFS install to Pondermatic. To proceed:
A Beowulf cluster basically works with one of two message passing libraries. One is MPI (Message Passing Interface) and the other is PVM (Parallel Virtual Machine). When compiled into the application, these libraries pass intermediate data between machines. Both MPI & PVM use the TCP/IP protocol to communicate with other machines. Further, they use the rsh command to initiate sessions with the other machines. This Unix command allows you to issue command lines to remote machines. Handy, but not very secure. This isn't a huge problem for the home supercomputer, but it is something to keep in mind if you are building something larger that is permanently connected to the Internet.
The following items must be performed on each machine (I'll use Pondermatic as an example):
192.168.0.1 | qed.synergetics.org | qed
192.168.0.2 | pondermatic.synergetics.org | pondermatic
qed.synergetics.org
pondermatic.synergetics.org
If this works correctly, your networking is now configured to work with PVM.
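A quick way to confirm that each node can run commands on the others without a password is to send a trivial command over rsh. The sketch below assumes the host aliases qed and pondermatic from the table above and an installed rsh client; the function name check_rsh and the RSH override variable are illustrative and not part of PVM.

```shell
# check_rsh HOST...
# Try a trivial remote command on each host and report the result.
# RSH may name another remote-shell client; it defaults to rsh.
check_rsh() {
  rc=0
  for host in "$@"; do
    # stdin is redirected so the remote shell does not consume the caller's input
    if "${RSH:-rsh}" "$host" hostname >/dev/null 2>&1 </dev/null; then
      echo "ok: $host"
    else
      echo "FAILED: $host"
      rc=1
    fi
  done
  return $rc
}
```

Running `check_rsh qed pondermatic` on each machine should print two ok lines. A FAILED line usually points to a name resolution or rsh permission problem on that host.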
After experimenting with different versions of PVM I found that the best thing to do was download the latest version (pvm3.4.0) and compile it. The RPM of pvm 3.3 on the beowulf CD works but does not have any of the files needed to compile PVM enabled programs. The 3.3 release version does not compile under Redhat 5.X for some reason. The installation is pretty straightforward:
PVM_ROOT=$HOME/pvm3
PVM_DPATH=$PVM_ROOT/lib/pvmd
PVM_ARCH=LINUX
export PVM_ROOT PVM_DPATH PVM_ARCH
If you made it this far you have a working cluster! Now on to the first parallel program. Type halt to exit pvm.
Pov-ray is a multiplatform, freeware raytracer. Many people have modified its source code to produce special "unofficial" versions. One of these unofficial versions is PVMPOV, which enables POVray to run on a beowulf cluster. PVMPOV has evolved quite a bit since it was first written. Many thanks to Andreas Dilger, Harald Deischinger, & Jakob Flierl for writing and maintaining the patches that make PVMPOV work. The beowulf CD has the RPMS for this program; however, I found that they were much slower than the normal program for some reason. I therefore decided to compile POVray from source after applying the PVM patches. The instructions for the 3.1e version of POVray follow:
I suggest going to the POVray benchmarking site and downloading the skyvase.pov file for your first rendering. By using this file you can compare the rendering time of your cluster against other computers and clusters. Copy the skyvase.pov file into the home directory of each of the computers running pvm.
./pvmpov +iskyvase.pov +h480 +w640 +FT +v1 -x -d +a0.300 -q9 -mv2.0 -b1000 -nw32 -nh32 -nt4 -L/home/rjbono/pvmpov3_1e_1/povray31/include
This is the benchmark option command line with the exception of the -nw and -nh switches, which are specific to pvmpov and define the width and height of the image block each slave works on. The -nt4 switch is specific to the Pondermatic IV configuration. It starts four tasks, one for each CPU.
The messages on the screen should show that slaves were successfully started. The cluster is now rendering the image. When complete, PVMPOV will display the slave statistics as well as the total render time.
My first cluster, the Pondermatic, consisting of five machines (mostly 486 machines), rendered the Povbench test image in 1 minute, 45 seconds. To put this in perspective, a single 486-66 running the same job takes ~20 minutes to complete. A 266 MHz MMX processor came in at 3 minutes, 5 seconds.
The overclocked, dual processor machines scored considerably better. Using single processor mode the render time was 1 minute, 4 seconds. Using both CPU's on a single machine dropped the render time to 39 seconds. Adding the second machine's dual CPU's dropped the time to 22 seconds.
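To put these times in terms of parallel speedup, the short sketch below divides the single-CPU time (64 seconds) by each parallel time and then divides by the CPU count to get efficiency. Only the three render times come from the results above; the function name speedup is illustrative.

```shell
# speedup NCPUS T1 TN
# NCPUS = number of CPUs, T1 = single-CPU time (s), TN = time on NCPUS CPUs (s)
speedup() {
  awk -v n="$1" -v t1="$2" -v tn="$3" 'BEGIN {
    s = t1 / tn                      # speedup relative to one CPU
    printf "%d CPUs: speedup %.2fx, efficiency %.0f%%\n", n, s, 100 * s / n
  }'
}

speedup 2 64 39   # both CPUs on one machine
speedup 4 64 22   # both machines, four CPUs
```

This prints a speedup of 1.64x (82% efficiency) for two CPUs and 2.91x (73% efficiency) for four, so the second machine still helps considerably even over 10Mbit/sec Ethernet.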
The original Pondermatic cluster compares well with 300 & 400 MHz Pentium II's as well as a DEC Alpha 500 MHz machine. The Pondermatic IV cluster performed quite well in PVMPOV as compared to other parallel machines. The SMILE cluster was 1 second faster and consists of $27,000 of Pentium II 350MHz machines!
True parallel supercomputing is now easily within the reach of the home user. Applications in raytracing are readily available as are applications in molecular modeling, electromagnetics and weather forecasting.
Two modest (200-266 MHz) machines can perform nearly as well as a 400 MHz Pentium II machine. Older, slower 486 class machines can help further reduce processing times, but the real benefits seem to be in having machines that are nearly equal in speed and power. A modest cluster can allow a raytracer like PVMPOV to produce quality animations quickly.
Dual processor, SMP performance can be reliably obtained for under $600 by using Intel Celeron CPU's with the Abit BP6 motherboard. Those daring enough to void their Intel warranties can tweak the Celeron 366 to 458 MHz with little extra effort.
PVMPOV performance is image related. The main parameter to tweak is the -nh & -nw switch values. There is an optimum based on the image being rendered and the cluster configuration.
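To get a feel for what these switches control, the sketch below counts how many blocks a 640x480 image splits into for a few block sizes and how many that leaves for each of the four tasks started by -nt4. It assumes the image is cut into a simple grid with partial blocks at the edges counted as whole blocks; how PVMPOV actually hands blocks to slaves is not modeled, and the function name blocks is illustrative.

```shell
# blocks WIDTH HEIGHT BLOCK_W BLOCK_H
# Number of blocks needed to tile the image, rounding up at the edges.
blocks() {
  cols=$(( ($1 + $3 - 1) / $3 ))
  rows=$(( ($2 + $4 - 1) / $4 ))
  echo $(( cols * rows ))
}

# Compare a few square block sizes for the 640x480 benchmark image.
for size in 16 32 64 128; do
  n=$(blocks 640 480 "$size" "$size")
  echo "${size}x${size}: $n blocks, about $(( n / 4 )) per task"
done
```

With 32x32 blocks the image splits into 300 blocks, or about 75 per task. Smaller blocks spread the work more evenly but mean more messages over the network, while larger blocks cut the message count but can leave some CPUs idle near the end of a render.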
I'd like to follow this initial work up in the following areas:
The bottom line is that if you feel the need for speed and are on a budget, a beowulf cluster may well be the answer.
Contact Rick Bono at: rjbono@hiline.net