ScratchDocumentation

PREPARING AN IMAGE

1) Install any software (BLAST, R, et cetera) on any computer (e.g., wolf###) that has been imaged with base_install.
2a) On this machine, logged in as admin, run the command: sudo /usr/sbin/si_prepareclient --server castaway
2b) You will be prompted for a password. It is the same password you logged in with (i.e., the admin account's password).
3) The image is now ready to be retrieved.

RETRIEVING AN IMAGE

1) Log into castaway.public.stolaf.edu.
2a) While logged in as admin, run the command: sudo /usr/sbin/si_getimage --golden-client wolf### --image IMAGENAME --post-install reboot --exclude "/data/*"
2b) Replace the ### in wolf### with the number of the machine you're capturing.
2c) Replace IMAGENAME with the name of the image you want to create or update. By convention, use username-identifier, where username is your username and identifier describes the image, such as the date or what you're doing (e.g., if you were playing around with R, use username-R).
3) The image will now be captured. Be patient! This can take a while.

If you are merely updating an image (i.e., you've run 2a with the same IMAGENAME before), it will take much less time than creating a whole new image.

DO NOT CREATE MORE THAN FIVE OF YOUR OWN IMAGENAMES! The hard drive is only so big.
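The naming convention and the five-image limit can be checked with a couple of small helpers. This is only a sketch: the function names are made up for illustration, and the default images path (/var/lib/systemimager/images) is an assumption, so pass the real directory if yours differs.

```shell
#!/bin/sh
# Hypothetical helpers, not part of systemimager.

# Build an image name following the username-identifier convention.
# If no identifier is given, fall back to today's date (YYYYMMDD).
image_name() {
    user=$1
    ident=${2:-$(date +%Y%m%d)}
    printf '%s-%s\n' "$user" "$ident"
}

# Count the entries named username-* in an images directory.
# The default path is an assumption about where images are stored.
count_user_images() {
    user=$1
    dir=${2:-/var/lib/systemimager/images}
    count=0
    for path in "$dir/$user"-*; do
        if [ -e "$path" ]; then
            count=$((count + 1))
        fi
    done
    printf '%s\n' "$count"
}
```

For example, `image_name admin R` prints admin-R, and `count_user_images admin` tells you how close you are to the limit of five before you capture another image.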

IMAGING A NEW MACHINE

This process assumes that the hardware has been fully prepared (optical, Zip, and floppy drives removed, RAM standardized, et cetera).

1) Plug an ethernet cord, power cord, monitor (VGA) cord, keyboard, and mouse into the target machine. Turn it on and enter BIOS (F1 on the gateway machines).
2) Set the time, and change the boot order to be PXE boot (Intel UNDI or something similar) first, and then hard drive (IDE or some such). Remove everything else. Get a pencil and paper. Save and exit BIOS.
3) The PXE boot screen will come up. For about 40 seconds, you will have the opportunity to copy down the MAC address of the ethernet card (which is used for imaging the machine over the network). Copy this down.
4) Open the /etc/dhcpd.conf file on castaway.public.stolaf.edu (as the admin account, sudo emacs /etc/dhcpd.conf) and create a new entry. If the last entry says wolf003, create a wolf004 entry. Thus the entry for wolf004 with a MAC address of 00:03:47:9B:93:82 will look like this:

host wolf004 { hardware ethernet 00:03:47:9B:93:82; fixed-address 192.168.0.4; }

Note that the hardware ethernet (MAC) entry changed, as did the fixed-address field (the IP). The last part of the IP corresponds to the number of the machine (i.e., wolf004 has an IP that ends in .4, whereas wolf003 has an IP ending in .3).
5) Save, and then type: sudo service dhcpd restart
6) Restart the machine we are imaging (in our running example, wolf004). Imaging should now successfully complete.
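Since every entry follows the same pattern, the dhcpd.conf line can be generated from the machine number and the MAC address you copied down. A minimal sketch, assuming the wolfNNN naming and the 192.168.0.N addressing described above; the function name is hypothetical.

```shell
#!/bin/sh
# Print a dhcpd.conf host entry for node number $1 with MAC address $2.
# Pass the number without leading zeros (4, not 004), because printf
# would read a leading zero as octal.
dhcp_host_entry() {
    num=$1
    mac=$2
    printf 'host wolf%03d { hardware ethernet %s; fixed-address 192.168.0.%d; }\n' \
        "$num" "$mac" "$num"
}
```

Running `dhcp_host_entry 4 00:03:47:9B:93:82` prints the wolf004 entry from the running example. Review the output before pasting it into /etc/dhcpd.conf.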

If it does not image, then take a peek in the /tftpboot/pxelinux.cfg/ directory on castaway. When imaging does not occur, either the symbolic link for default points to hdboot (which of course boots to the hard drive), or a file exists whose name begins with C and is the hex value of the IP address of the machine we're targeting (the hex form of 192.168.0.4 is C0A80004; cryptic, I know). Remove the C0A80004 file if it is there, and change the symbolic link on default to image the machine ('sudo ln -sf imagemachine default' will do the trick).

When the machine is done imaging, it will restart and be ready to go. You may wonder, why doesn't it image again? Take a peek at the /tftpboot/pxelinux.cfg/ directory. A C0A80004 file will exist! A daemon on castaway monitors the progress of an imaging machine and creates this file when imaging completes. On its next boot, the imaged machine finds this file, which tells it to boot from its hard drive.
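The cryptic file name is just each of the four numbers in the IP address written as two uppercase hex digits. A small sketch of the conversion, so you know which file to look for; the function name is hypothetical.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address into the 8-digit uppercase hex
# name that pxelinux looks for in /tftpboot/pxelinux.cfg/.
ip_to_hex() {
    # Split the address on dots into four octets.
    IFS=. read -r a b c d <<EOF
$1
EOF
    printf '%02X%02X%02X%02X\n' "$a" "$b" "$c" "$d"
}
```

For the running example, `ip_to_hex 192.168.0.4` prints C0A80004, so the file to check for is /tftpboot/pxelinux.cfg/C0A80004.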

=Devel Cluster Layout=

The cluster will consist of a gateway, an administrative machine, and nodes. The gateway will be connected to St. Olaf. The gateway will also be connected to the administrative machine. The administrative machine will be connected to each node via switches. Each machine is running Fedora Core Linux with the gateway machine being PowerPC and the rest being x86. Everything is connected via 100 megabit networking.

=Imaging from a machine=

Creating a loopback mount from a Fedora disc image: mount -o loop -t iso9660 FC-6-i386-DVD.iso /mnt/iso

=Ganglia=

Install rrdtool (latest). Use ./configure --prefix=/usr/local/rrdtool-version

Install ganglia. Be sure to also get the startup scripts installed.

=Forwarding SSH from gateway to admin with iptables=

We want traffic on the public interface destined for gateway:22 to be forwarded to admin:22. We also want to change the port number of sshd on gateway from 22 to 24.

1. Change the Port line in /etc/ssh/sshd_config from Port 22 to Port 24.
2. In Firestarter, set up the port forwarding and open the new ports.
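The sshd_config change can also be scripted. A rough sketch, assuming the file already contains a Port line (possibly commented out as #Port 22); the function name is hypothetical, and if no Port line exists the file is left unchanged.

```shell
#!/bin/sh
# Rewrite the Port line of an sshd_config-style file ($1) to port $2.
# Matches "Port NN" and "#Port NN" at the start of a line; every other
# line is copied through unchanged.
set_sshd_port() {
    conf=$1
    port=$2
    tmp=$(mktemp)
    if sed "s/^#\{0,1\}Port[[:space:]].*/Port $port/" "$conf" > "$tmp"; then
        # Write back with cat so the file keeps its owner and permissions.
        cat "$tmp" > "$conf"
    fi
    rm -f "$tmp"
}
```

On the gateway this would be run as root, for example `set_sshd_port /etc/ssh/sshd_config 24`, followed by a restart of sshd. Try it on a copy of the file first.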

Forwarding the Web Server with port forwarding
1. Turn off Apache on the gateway.
2. Use Firestarter to forward port 80 to admin.

epkg
1. Install the latest epkg.
2. When installing new software, configure it with --prefix=/usr/local/encap/[packagename-version]
3. To install the latest version of every package in the encap directory, run as root at the command line: epkg -b (b for batch).
4. To remove a package: epkg -r [package-name]
5. To revert to an older version of software installed in encap (say you have both openmpi-1.1.4 and openmpi-1.2), type: epkg -i openmpi-1.1.4
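To keep the prefix consistent across packages, the encap path can be built from the package name and version. A tiny sketch; the function name is hypothetical, and only the /usr/local/encap/[packagename-version] layout comes from the steps above.

```shell
#!/bin/sh
# Print the encap install prefix for package $1 at version $2.
encap_prefix() {
    printf '/usr/local/encap/%s-%s\n' "$1" "$2"
}
```

A build might then start with `./configure --prefix="$(encap_prefix openmpi 1.2)"`, followed by the usual make and make install, and finally epkg -b as root.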

modifying new version of openmpi
1. On the admin machine, append the following to the end of openmpi-mca-params.conf:

btl_tcp_if_include = eth0
btl_tcp_if_exclude = lo,eth1

You need to do this for each new installation of mpi, since each version has its own custom mca-params file.
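Because this has to be repeated for every new installation, it may help to script it so that running it twice does no harm. A minimal sketch; the function name is hypothetical, and the path to the params file is whatever your installation uses (it is passed in as an argument).

```shell
#!/bin/sh
# Append the two TCP interface settings to the MCA params file ($1),
# skipping any line that is already present verbatim.
add_mca_params() {
    conf=$1
    touch "$conf"
    for line in 'btl_tcp_if_include = eth0' 'btl_tcp_if_exclude = lo,eth1'; do
        if ! grep -qxF -- "$line" "$conf"; then
            printf '%s\n' "$line" >> "$conf"
        fi
    done
}
```

Assuming the new version was installed under encap and keeps its params file in etc/ under its prefix, the call would look something like `add_mca_params /usr/local/encap/openmpi-1.2/etc/openmpi-mca-params.conf`.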

systemimager
1. Install systemimager server files and systemconfigurator on the machine to be the server (admin).
2. Install systemimager client files on a golden node.
3. Since images will reside on the secondary hard drive, create a directory on the second hard drive to store them, and in the /var/lib/systemimager directory make symbolic links for images and tarballs that point into that directory.
4. There is an option in the systemimager config file to change the directory from /var/lib/systemimager to something else, but don't use it: there is a bug and it will mess up everything.
5. Do not use systemimager with the epkg system. It's better to have these sorts of things go directly in /usr/sbin (the default install location) anyway, because they're more root-ish.
6. In /etc/systemimager/systemimager.conf, change the NET_BOOT_DEFAULT value to local (the default is net). What this does: after a node has reimaged itself, it will tell the pxeboot system to boot from the local hard drive. If this wasn't done, a node would re-image itself indefinitely.
7. In /etc/systemimager/bittorrent.conf: set BT_INTERFACE=eth0. Set BT_COMPRESS=n (no) (future experimentation required). Set BT_UPDATE=yes (there may be a bug that forces an update on every startup; look into this. Every time you restart the daemon, it would have to make as many torrent files as there are images, which is no good). Change BT_IMAGES to be a comma-delimited list, with no spaces, of the images that use bittorrent.
8. Modify /etc/systemimager/pxelinux.cfg/syslinux.cfg. Change the append variable to look like:

APPEND vga=extended initrd=initrd.img root=/dev/ram ramdisk_blocksize=1024 ramdisk_size=80000 BITTORRENT=y

9. Install a bittorrent client on both the server and the golden node. The one we chose was the standard bittorrent distribution. Use yum.
10. On the server machine, turn on the following startup scripts: systemimager-server-bittorrent, systemimager-server-netbootmond.
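The bittorrent.conf edits are all of the form KEY=value, so they can be applied with one small helper, plus another to build the comma-separated BT_IMAGES list. This is a sketch under assumptions: both function names are hypothetical, the file is assumed to use plain KEY=value lines, and values must not contain the | or & characters (sed would misread them).

```shell
#!/bin/sh
# Set KEY=value in a config file: replace the line if the key exists,
# otherwise append it. Arguments: file, key, value.
set_conf_value() {
    conf=$1
    key=$2
    value=$3
    tmp=$(mktemp)
    if grep -q "^$key=" "$conf"; then
        sed "s|^$key=.*|$key=$value|" "$conf" > "$tmp"
    else
        cat "$conf" > "$tmp"
        printf '%s=%s\n' "$key" "$value" >> "$tmp"
    fi
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Join image names with commas and no spaces, as BT_IMAGES expects.
# The body runs in a subshell so the IFS change does not leak out.
join_images() (
    IFS=,
    printf '%s\n' "$*"
)
```

As root, the bittorrent settings could then be applied with calls such as `set_conf_value /etc/systemimager/bittorrent.conf BT_INTERFACE eth0` and `set_conf_value /etc/systemimager/bittorrent.conf BT_IMAGES "$(join_images image1 image2)"`, where image1 and image2 stand in for your own image names.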

Using systemimager
On the golden node, run the following command:

si_prepareclient --server admin

On the server (admin), type the following:

si_getimage --golden-client [goldenclienthostname] --image [nameofimage] --exclude '/media/*' --exclude '/var/log/*' --exclude '/data/*' --exclude '/var/lib/boinc/projects' --exclude '/var/lib/boinc/sched_reply_setiathome.berkeley.edu.xml' --exclude '/var/lib/boinc/sched_request_setiathome.berkeley.edu.xml' --exclude '/var/lib/boinc/slots' --exclude '/var/lib/boinc/lockfile' --post-install reboot
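That command is long mostly because of the exclude list. One way to keep it manageable is to build the command line from a list of patterns and print it for review before running it. A sketch only: the function name is hypothetical, and it uses just the si_getimage options that appear in the command above.

```shell
#!/bin/sh
# Print (do not run) an si_getimage command line.
# Arguments: golden client hostname, image name, then any number of
# exclude patterns.
getimage_cmd() {
    client=$1
    image=$2
    shift 2
    cmd="si_getimage --golden-client $client --image $image"
    for pattern in "$@"; do
        # Single-quote each pattern so the shell will not expand it later.
        cmd="$cmd --exclude '$pattern'"
    done
    printf '%s --post-install reboot\n' "$cmd"
}
```

For example, `getimage_cmd wolf004 admin-R '/media/*' '/var/log/*' '/data/*'` prints a command you can read over and then paste into the shell on the server.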

Use si_addclients to assign clients to an image. Just run it and it will walk you through what to do.

Use si_mkbootserver to configure the /tftpboot directory (and possibly the initrd image and kernel files). RUNNING THIS COMMAND WILL ERASE EVERYTHING IN /tftpboot/pxelinux.cfg !

Use si_mkclientnetboot to set up which nodes will reimage at reboot.

Updating the data directory
1. Log on to admin.
2. Become root.
3. Change to root's home.
4. Run:

mpirun -np 26 -hostfile hostfile /data/ClusterManagement/update-data-dir
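The 26 is presumably the number of machines listed in the hostfile, so it will go stale as nodes are added. A possible way to derive it instead, assuming the hostfile has one host per line (blank lines and # comments allowed) and that you want one process per host; the function name is hypothetical.

```shell
#!/bin/sh
# Count the hosts in a hostfile ($1): every line that is neither blank
# nor a comment counts as one host.
count_hosts() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | wc -l | tr -d '[:space:]'
}
```

The update command then becomes `mpirun -np "$(count_hosts hostfile)" -hostfile hostfile /data/ClusterManagement/update-data-dir`, run from root's home as above.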