There are three main methods of installing the client nodes. The first is cloning the nodes using the dd command. The second method, which I used in the first stage of our topcat system, is to install the operating system on each client separately and then run a configuration script on the server which performs the rest of the setup. The third method is to use disk-less clients, in which case all installation and configuration is done on the server. I shall describe the last two methods in detail because this is how I configured our topcat system.
The basic concept of cloning is making an exact, sector-by-sector copy of a partition from one drive onto a partition on another drive. You can install one client, configure it, and make an exact copy of its disk. You can then use this disk image on the other clients, and you should only have to change a few settings such as the IP address and hostname. If each of your clients has its own disk with the operating system on it, this method is the easiest way to install them. Cloning is described in more detail by Jan Lindheim in Building a Beowulf System http://www.cacr.caltech.edu/beowulf/tutorial/beosoft/.
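The two cloning steps, copying the partition and then changing the per-node settings, can be sketched as two shell functions. This is an illustration under stated assumptions, not the procedure from the referenced tutorial: the function names are hypothetical, and set_identity assumes Red Hat-style files (HOSTNAME= in /etc/sysconfig/network, IPADDR= in /etc/sysconfig/network-scripts/ifcfg-eth0) on a cloned root file system mounted at the directory you pass in.

```shell
#!/bin/bash
# Sketch only. Assumptions (not from the original text):
#   - Red Hat-style configuration files:
#       /etc/sysconfig/network                     (HOSTNAME=...)
#       /etc/sysconfig/network-scripts/ifcfg-eth0  (IPADDR=...)
#   - the cloned root file system is mounted at the ROOT directory
#     passed to set_identity.

# clone_partition SRC DST: copy SRC onto DST byte for byte with dd.
# SRC and DST are normally partitions such as /dev/hda1 and /dev/hdb1.
# Double-check DST: everything on it is overwritten.
# The block size only affects speed, not what gets copied.
clone_partition() {
    local src=$1 dst=$2
    dd if="$src" of="$dst" bs=64k
}

# set_identity ROOT HOST IP: give a cloned system its own hostname and
# IP address by rewriting the HOSTNAME= and IPADDR= lines in place.
set_identity() {
    local root=$1 host=$2 ip=$3
    sed -i "s/^HOSTNAME=.*/HOSTNAME=$host/" "$root/etc/sysconfig/network" &&
    sed -i "s/^IPADDR=.*/IPADDR=$ip/" \
        "$root/etc/sysconfig/network-scripts/ifcfg-eth0"
}
```

For example, after clone_partition /dev/hda1 /dev/hdb1 you would mount /dev/hdb1 on /mnt, run set_identity /mnt node3 10.0.0.3, and unmount it before moving the disk to its client.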
This method is different from the previous two because all client configuration is done on the server. The clients have no physical disk of their own, and all their files are stored on the server node. If you want more information about booting a disk-less client, you should read both the NFS-Root mini-HOWTO http://metalab.unc.edu/LDP/HOWTO/mini/NFS-Root.html and the NFS Root Client HOWTO.
I followed the NFS-Root mini-HOWTO, with minor modifications, when configuring our system.
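First compile the kernel which you will use on the clients. Start by configuring it:

make menuconfig

Make sure you compile in support for NFS-root: CONFIG_ROOT_NFS, CONFIG_RNFS_BOOTP, and CONFIG_RNFS_RARP. The network card driver and NFS client support must also be compiled into the kernel rather than built as modules, because the client has no file system from which to load modules until its NFS root is mounted.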
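Before compiling, it can be worth checking that these options really ended up in the kernel's .config file. The function below is a small sketch of such a check; its name is hypothetical, and it only looks for the three option names given above, which belong to the older kernels this guide was written for and may be named differently in other kernel versions.

```shell
#!/bin/bash
# Sketch: confirm that a kernel .config has NFS-root support built in (=y).
# The option names are the ones listed in the text; newer kernels may use
# different names.

# check_nfsroot_config CONFIG_FILE: report every required option that is
# not set to "y" and return non-zero if at least one is missing.
check_nfsroot_config() {
    local config=$1 opt missing=0
    for opt in CONFIG_ROOT_NFS CONFIG_RNFS_BOOTP CONFIG_RNFS_RARP; do
        # A built-in option appears in .config as a line "NAME=y".
        if ! grep -q "^${opt}=y\$" "$config"; then
            echo "missing: $opt" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Run it from the kernel source directory after make menuconfig, for example cd /usr/src/linux && check_nfsroot_config .config, and go back into menuconfig if it reports anything.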
After you have configured all the options in the kernel, you can start compiling it. Issue the following commands:

make dep && make clean && make zImage

Now you will have to change the root device of the kernel to NFS-root. I adopted this trick of making a dummy device from the NFS-Root mini-HOWTO:

mknod /dev/nfsroot b 0 255
cd /usr/src/linux/arch/i386/boot
rdev zImage /dev/nfsroot

All that remains is to copy the kernel image onto a floppy disk:

dd if=zImage of=/dev/fd0
If all your clients are the same you will be able to use the same image to boot all systems. In my case I had to create two different floppies, one for single CPU systems and one for SMP machines.
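When you need more than one boot floppy, for example one for single-CPU and one for SMP clients, it may help to wrap the mknod, rdev and dd steps above in one function and run it once per kernel image. This is a hypothetical helper, not a script from the NFS-Root mini-HOWTO; it must be run as root, rdev (part of util-linux on systems of that era) changes the image in place, and setting DRY_RUN=1 only prints the commands.

```shell
#!/bin/bash
# Sketch: turn a compiled zImage into an NFS-root boot floppy using the
# steps from the text. Run as root. DRY_RUN=1 prints the commands instead
# of running them, which is a safe way to check them first.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# make_boot_floppy IMAGE [FLOPPY]: point IMAGE's root device at the dummy
# /dev/nfsroot device (block, major 0, minor 255) and copy it to FLOPPY
# (default /dev/fd0).
make_boot_floppy() {
    local image=$1 floppy=${2:-/dev/fd0}
    # The dummy device only needs to be created once.
    [ -e /dev/nfsroot ] || run mknod /dev/nfsroot b 0 255
    run rdev "$image" /dev/nfsroot
    run dd if="$image" of="$floppy"
}
```

For two kernels you would call make_boot_floppy once with the single-CPU zImage, swap floppies, and call it again with the SMP zImage.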
/var and /etc. Simply cut and paste the sdct script into a file and run it. The script will create all necessary directories and copy all needed files. Note that this script does NOT create a root directory for any of the clients, but simply a template which will be used by another script to create these root directories. You will have to run the adcn script to create the NFS-root file system for each of the clients.
/tftpboot. The most common way of running this script is:

adcn -n node2 -i 10.0.0.2 -d beowulf.my.domain -l -D eth1

Let us look at the command line options:
-n node2
specifies the short host name of the client. This must not be a fully qualified domain name.

-i 10.0.0.2
specifies the IP address of the client.

-d beowulf.my.domain
is the DNS domain of the cluster. If this option is not specified, the server's DNS domain (as reported by /bin/dnsdomainname) will be used. You should only have to use this option if the server's domain is different from the cluster's domain. In our case, the client's full name would be node2.beowulf.my.domain.

-l
means listen for RARP requests. When this option is used, adcn will listen for RARP requests on the interface specified with the -D option (see next paragraph) and use the MAC address from the first "sniffed" RARP request as the client's hardware address. This option uses tcpdump to sniff the MAC address, so please make sure you have it installed.

-D
specifies the device connected to the cluster. If you have more than one device connected to your cluster (that is, the cluster is divided into more than one subnet), use the interface directly connected to the network the disk-less client is on. This option reads the device information from /etc/sysconfig/network-scripts/ifcfg-* to find out the network, broadcast address, netmask, and gateway for the cluster (the server's IP address will be used as the gateway). The device information is also used by the -l option, telling tcpdump which device to "sniff" on.
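To make the -D description more concrete, the sketch below reads the same kind of values from an ifcfg file. It is an illustration only, not adcn's actual code: the function name is hypothetical, and the key names (IPADDR, NETMASK, NETWORK, BROADCAST) are the ones Red Hat ifcfg files of that era usually contain.

```shell
#!/bin/bash
# Sketch: print the network settings of one interface from its Red Hat
# ifcfg file. Not adcn's implementation; it only shows where that kind of
# information lives. The server's own IPADDR is what the text says is
# used as the clients' gateway.

# read_ifcfg DEVICE [DIR]: print IPADDR, NETMASK, NETWORK and BROADCAST
# for DEVICE as KEY=value lines. DIR defaults to the usual location.
read_ifcfg() {
    local dev=$1 dir=${2:-/etc/sysconfig/network-scripts}
    local file="$dir/ifcfg-$dev"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    # ifcfg files use shell syntax, so source the file in a subshell
    # to keep its variables from leaking into the caller.
    (
        . "$file"
        echo "IPADDR=$IPADDR"
        echo "NETMASK=$NETMASK"
        echo "NETWORK=$NETWORK"
        echo "BROADCAST=$BROADCAST"
    )
}
```

Running read_ifcfg eth1 on the server is a quick way to see the values adcn -D eth1 would have to work with before you create any client root directories.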
Run adcn -h for more information. In most cases the example usage shown above will be what you need. You can put multiple commands in a script and set up the whole disk-less client cluster using one command. For example, to set up a 16-node cluster (the server plus 15 disk-less clients, node2 to node16) with eth1 being the server's interface connected to the cluster, you could run this script:
#!/bin/bash
# Create the NFS-root file system for node2 (10.0.0.2) through node16 (10.0.0.16).
for i in {2..16}; do
    adcn -n node$i -i 10.0.0.$i -d beowulf.my.domain -l -D eth1
done
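If your node names and addresses do not follow a simple numeric pattern, the same idea can be driven from a list file instead. This is a hypothetical variant: the list format (one "name address" pair per line, with # starting a comment) and the DOMAIN and IFACE variables are assumptions made for this sketch; the adcn options are exactly the ones described above. With DRY_RUN=1 the commands are printed instead of run.

```shell
#!/bin/bash
# Sketch: run adcn once per entry of a node list file.
# Assumed list format: "name address" per line; blank lines and lines
# starting with # are ignored.
DOMAIN=${DOMAIN:-beowulf.my.domain}
IFACE=${IFACE:-eth1}

add_nodes_from_list() {
    local list=$1 name ip
    # The "|| [ -n ... ]" part also handles a last line with no newline.
    while read -r name ip || [ -n "$name" ]; do
        case $name in ''|'#'*) continue ;; esac
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo adcn -n "$name" -i "$ip" -d "$DOMAIN" -l -D "$IFACE"
        else
            # Keep adcn from reading the rest of the list on its stdin.
            adcn -n "$name" -i "$ip" -d "$DOMAIN" -l -D "$IFACE" < /dev/null || return 1
        fi
    done < "$list"
}
```

Checking the list with DRY_RUN=1 add_nodes_from_list nodes.txt first shows exactly which adcn commands would run.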
Because your clients do not have a video card or a keyboard attached to them, you cannot access them directly as you can the server. There may be times (especially during configuration changes) when there is a problem with the network and you cannot telnet or rlogin to the clients, so you must access them some other way. There are basically two methods of accessing the clients' consoles. The first is to use monitor and keyboard switches, as described by Jan Lindheim in Building a Beowulf System http://www.cacr.caltech.edu/beowulf/tutorial/building.html, and the other is to use a serial terminal.
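For the serial-terminal approach, each client needs a login prompt on a serial port. The function below is a minimal sketch, assuming a SysV-init system with /etc/inittab and /sbin/agetty installed; the function name, the inittab id derived from the port name, the runlevels 2345, 9600 baud and the vt100 terminal type are all example values, not settings taken from this guide. For a disk-less client, ROOT would be that client's NFS-root directory on the server.

```shell
#!/bin/bash
# Sketch: enable a serial login on a client. Assumptions: SysV init with
# ROOT/etc/inittab, /sbin/agetty on the client, example speed 9600 baud.

# add_serial_console ROOT [TTY] [BAUD]: append an agetty line for TTY to
# ROOT/etc/inittab and allow root logins on TTY via ROOT/etc/securetty.
add_serial_console() {
    local root=$1 tty=${2:-ttyS0} baud=${3:-9600}
    local inittab="$root/etc/inittab" securetty="$root/etc/securetty"
    local id=${tty#tty}   # ttyS0 -> S0, used as the inittab entry id
    # Only add the getty line once, so the function can be rerun safely.
    if ! grep -q "agetty $tty " "$inittab" 2>/dev/null; then
        echo "$id:2345:respawn:/sbin/agetty $tty $baud vt100" >> "$inittab"
    fi
    # Without this entry, root can only log in on the listed terminals.
    if ! grep -qx "$tty" "$securetty" 2>/dev/null; then
        echo "$tty" >> "$securetty"
    fi
}
```

The new getty starts the next time the client boots, or after running telinit q on the client; the terminal must be set to the same speed.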
If you are installing from a CD-ROM and only have one drive for the whole system, you will have to move the CD-ROM drive from client to client after each install, or do an NFS install. If you have only one floppy disk drive, you will have to move it as well. In my case I installed all the nodes from our local FTP server, so I only had to move the floppy drive. To cut down on installation time, I recommend installing the full distribution: selecting packages to install is a real pain, and it is even worse when you have 16 nodes to install. These days the smallest hard disks you can buy are well over 2 GB, so you should not have to worry about running short of disk space.