Beowulf Installation Guide

This guide is for Administrators who are installing cluster software from scratch.

Admin OS Installation

 * 1) Set up a PXE boot server or use a CD-ROM to install Linux (Fedora 7) on the admin and gateway nodes. Each of these nodes should have two network interfaces.
 * 2) Don't use LVM partitions. Create a custom partition layout with:
   * 2r MB of swap, where r is the amount of RAM in MB
   * 100 MB ext3 /boot
   * ext / that fills the available space
 * 3) Consider these general guidelines when selecting which packages to install. You will probably be installing more software later anyway.
   * Install X Windows/GNOME. It comes in handy sometimes, and it can always be turned off later.
   * Install all available development packages except Java. Don't install Java.
   * Don't install obviously unneeded packages such as videoconferencing, Bluetooth, or smartcard support.
   * Completely disable SELinux.
 * 4) Remember to run yum update on every machine that you install from the Fedora disc image.
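
The swap rule above is simple arithmetic. As a minimal sketch, the helper below doubles a RAM figure given in MB; the name swap_mb is ours for illustration and is not part of any installer. On a running Linux system the RAM figure can be read from /proc/meminfo.

```shell
#!/bin/sh
# swap_mb: print the swap size in MB for a machine with $1 MB of RAM,
# following the guide's "2r" rule. Hypothetical helper for illustration.
swap_mb() {
    echo $(( $1 * 2 ))
}

# Example: read MemTotal (reported in kB) when /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    ram_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
    if [ -n "$ram_mb" ]; then
        echo "RAM: ${ram_mb} MB, suggested swap: $(swap_mb "$ram_mb") MB"
    fi
fi
```

The result can be typed into the installer's custom partitioning screen.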

Admin Configuration
The admin machine will serve as the DHCP server for the compute network and as an NFS and NIS server for all the machines in the cluster.

GoTemp
GoTemp uses a "custom" Linux driver. We build it from the example code of a Linux driver tutorial, which happens to use the GoTemp as its example, so we aren't really writing our own driver. This driver must be recompiled each time you update the kernel.


 * 1) Obtain the ols_2005_driver_tutorial_example_code.tar.gz file. We are interested in the step-6 code.
 * 2) In gotemp.c, comment out lines 113-115 and 197 and change the author info at the end of the file.
 * 3) Type 'make' to make the driver.
 * 4) Copy gotemp.ko to /lib/modules/`uname -r`/kernel/drivers/usb/misc
 * 5) Type  . This may or may not return an error.
 * 6) Move the ldusb.ko driver somewhere else or rename it to a non-ko file. Make it non-executable.
 * 7) Make gotemp.ko executable.
 * 8) Run depmod.
 * 9) Plug in GoTemp and restart the system.
 * 10) Test GoTemp with the appropriate scripts.

DHCP Server

 * 1) Configure /etc/dhcpd.conf
 * 2) Add the internal interface to DHCPDARGS in /etc/sysconfig/dhcpd
 * 3) Use ntsysv --level 35 to make the service start on boot.
 * 4) Use /sbin/service dhcpd start to start the service.
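
The guide does not show the contents of /etc/dhcpd.conf. As a rough sketch of what a minimal file for the compute network might look like, the function below prints one subnet declaration. Every address here is an assumption chosen for illustration (only 10.1.2.254 appears elsewhere in this guide, as the Beowulf admin address); substitute the values for your own network.

```shell
#!/bin/sh
# dhcpd_conf: print a minimal dhcpd.conf for one subnet.
#   $1 = subnet, $2 = netmask, $3 = router (admin node),
#   $4 = first address in the pool, $5 = last address in the pool
# All values passed below are illustrative assumptions.
dhcpd_conf() {
    cat <<EOF
ddns-update-style none;
subnet $1 netmask $2 {
    option routers $3;
    range $4 $5;
}
EOF
}

dhcpd_conf 10.1.2.0 255.255.255.0 10.1.2.254 10.1.2.1 10.1.2.100
```

Redirect the output to a scratch file and review it before copying anything into /etc/dhcpd.conf.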

NIS Client
This should be done after the NIS server is ready on admin.


 * 1) Add a line NISDOMAIN=helios.public.stolaf.edu (or castaway) to /etc/sysconfig/network
 * 2) Add a line   (or castaway) to /etc/yp.conf
 * 3) Start the ypbind service and use ntsysv or chkconfig to make it start on boot
 * 4) Modify nsswitch.conf so that nis is used after files for passwd, group, hosts, and services.
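
The nsswitch.conf change in the last step can be done by hand. As a sketch of the same edit in script form, the filter below reads nsswitch.conf text on standard input and inserts nis after files on the four affected lines, leaving lines that already mention nis untouched. The function name is hypothetical.

```shell
#!/bin/sh
# add_nis_after_files: filter nsswitch.conf text from stdin to stdout,
# inserting "nis" after "files" for passwd, group, hosts, and services.
# Lines that already list nis are left unchanged.
add_nis_after_files() {
    awk '
        /^(passwd|group|hosts|services):/ && $0 !~ /(^|[ \t])nis([ \t]|$)/ {
            sub(/files/, "files nis")
        }
        { print }
    '
}

# Demonstration on sample input (does not touch /etc/nsswitch.conf).
printf 'passwd:     files\nhosts:      files dns\n' | add_nis_after_files
```

To apply it for real, write the output to a temporary file, compare it against /etc/nsswitch.conf, and only then replace the original.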

NIS Server

 * 1) Enable NIS service
 * 2) Start NIS

hosts
Sun Grid Engine (SGE) is very picky about hostnames. We want every machine to have exactly one hostname defined in the hosts file, which will be NIS shared to all machines.

services
Add services for sge_qmaster and sge_execd.

passwd
Create all cluster user accounts here.

client configuration

 * 1) There is a bug in NFS/NIS/RPC. To work around it, create the file /etc/sysconfig/rpcbind containing the line RPCBIND_ARGS=-i
 * 2) Set the NISDOMAIN variable in /etc/sysconfig/network. This should be the same for all machines in the given cluster.

NFS Server
Export /home and /fedora from admin. Make home (rw).
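
As a sketch of the corresponding /etc/exports entries, the function below prints one line per export. The client network and the read-only option on /fedora are assumptions; the guide only states that /home is exported read-write.

```shell
#!/bin/sh
# exports_lines: print /etc/exports entries for the two shares.
#   $1 = client network allowed to mount (an assumption; adjust as needed)
exports_lines() {
    net=$1
    echo "/home   ${net}(rw,sync)"
    echo "/fedora ${net}(ro,sync)"
}

exports_lines 10.1.2.0/255.255.255.0
```

After editing /etc/exports, running exportfs -ra as root makes the NFS server reread the file.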

Firewall
Admin will not have a firewall, per se, but you need to add a rule to iptables for NAT forwarding. This is best done manually.
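
The guide does not give the rule itself. One common way to do NAT forwarding with iptables is a MASQUERADE rule on the external interface plus kernel IP forwarding; the sketch below assumes that approach and assumes the external interface is eth0. It only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# nat_rules: print (not run) the commands for NAT forwarding out of the
# external interface named in $1. The interface name is an assumption.
nat_rules() {
    ext_if=$1
    echo "echo 1 > /proc/sys/net/ipv4/ip_forward"
    echo "iptables -t nat -A POSTROUTING -o ${ext_if} -j MASQUERADE"
}

nat_rules eth0
```

Printing rather than executing keeps the sketch safe to try on any machine; paste the reviewed lines into a root shell when ready.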

Local Update Repository
To save on bandwidth, package updates will be downloaded from an off-site mirror only once for the cluster. The nodes will access the updates from a local repository on the admin node.

Admin Repository
Apache (httpd) must be running. Use ntsysv or chkconfig to enable it at startup.

Create directories for both the base and update sets of packages:

$ mkdir -p /var/www/html/yum/{base,updates}
$ mkdir /var/www/html/yum/updates/9

(note: using fc9; replace "9" with installed Fedora release number)

Make the base directory a repository with createrepo:

$ yum install createrepo
...
$ createrepo /var/www/html/yum/base

This creates a directory. Check that  contains three gzipped files and one more called.

Fill the base package repository. You may either download all available packages from a mirror, or copy from an installation disk:

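$ mount /dev/cdrom /mnt
$ cd /mnt/Packages
$ cp -v * /var/www/html/yum/base

After copying, rerun createrepo on the base directory so that the repository metadata lists the new packages.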

Choose a mirror with I2 and rsync and sync it with the updates directory:

$ rsync -avrt rsync:// /updates/9/i386 --exclude=debug/ /var/www/html/yum/updates/9

Nodes
Once the repository has been set up, the nodes need to be configured to update from it. On each node:

In the  file:


 * change  to
 * In the  section comment out the line starting with
 * In the  section add the line:   (where "10.2.2.254" is the IP of the computer home to the repository).

In the  file:


 * change  to
 * In the  section comment out the line starting with
 * In the  section add the line:   (where "10.2.2.254" is the IP of the computer home to the repository).

Now a  or other package update script, cron job, etc., will use the local repository.
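
For reference, a yum repository section that points at a local server generally looks like the output of the sketch below. The section names and URL paths are assumptions for illustration: the paths simply mirror the directories created under /var/www/html/yum on the admin node, and the exact sections in your .repo files may be named differently.

```shell
#!/bin/sh
# repo_stanza: print a yum repository section.
#   $1 = section id (hypothetical name), $2 = base URL of the local repository
repo_stanza() {
    cat <<EOF
[$1]
name=Local $1 repository
baseurl=$2
enabled=1
EOF
}

repo_stanza local-base    http://10.2.2.254/yum/base
repo_stanza local-updates http://10.2.2.254/yum/updates/9
```

The baseurl must point at the directory that contains the repository metadata generated by createrepo.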

SGE qmaster
The SGE files will be installed on admin and NFS mounted on the other machines. This setup is fairly simple, but the downside is that NFS traffic to the SGE share might interfere with computational traffic. Another option would be to mount only some parts of the SGE tree by NFS and store others locally.

The admin machine will be the master node for SGE. So aside from copying the files into the SGE root directory, we must also run install_qmaster. This qmaster daemon must be installed before the execution daemon is installed on the execution node.

1) Download the SGE common and linux binaries from Sun and

NIS Client
This should be done after the NIS server is ready on admin.


 * 1) Add a line NISDOMAIN=helios.public.stolaf.edu (or castaway) to /etc/sysconfig/network
 * 2) Add a line   (or castaway) to /etc/yp.conf
 * 3) Start the ypbind service and use ntsysv or chkconfig to make it start on boot
 * 4) Modify nsswitch.conf so that nis is used after files for passwd, group, hosts, and services.

NFS Client

 * 1) Add a line   to /etc/fstab and type
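
An NFS entry in /etc/fstab has the general shape printed by the sketch below. The server name admin, the /home export, and the defaults option are assumptions for illustration, not values taken from this guide.

```shell
#!/bin/sh
# fstab_line: print an /etc/fstab entry for an NFS mount.
#   $1 = NFS server, $2 = exported path, $3 = local mount point
fstab_line() {
    printf '%s:%s\t%s\tnfs\tdefaults\t0 0\n' "$1" "$2" "$3"
}

fstab_line admin /home /home
```

The six fields are device, mount point, filesystem type, options, dump flag, and fsck order.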

Firewall
Gateway will use Fedora's built-in firewall, in addition to the NAT forwarding rule that admin uses.


 * 1) Configure Fedora's firewall to allow HTTP, HTTPS, and SSH.
 * 2) Make a custom rules file for the NAT forwarding rule, and configure it in Advanced Options.

LDAP Authentication
Only the gateway will use LDAP authentication. This will effectively prevent ordinary users from logging in to the admin machine and nodes.

ssl start_tls pam_password md5
 * 1) In the file /etc/ldap.conf
 * 2) Comment out the line beginning with
 * 3) Change the line beginning with   to   and change it to "no".
 * 4) At the end of the file, enter uri ldap://ldap.stolaf.edu/
 * 1) Enter
 * 2) Restart ypbind and re-enable it in ntsysv or chkconfig.

Kickstart
Kickstart automates the process of installing Fedora by answering the questions that normally require user input during installation (e.g. partition settings, packages, etc.). Using Kickstart along with pxelinux allows Fedora to install on all nodes simultaneously. Because the same distribution and configuration are used, this method also maintains consistency of software architecture among all the nodes.

pxelinux
First make sure the DHCP server is configured on the admin node. In the  file, edit the following lines to read:

filename "pxelinux.0";
next-server 10.1.2.254;

(note: settings for Beowulf; next-server is 10.2.2.254 for Castaway)

Restart the DHCP server.

$ service dhcpd restart

Then configure a TFTP server. Install it if it is not already present:

$ yum install tftp-server

To store pxelinux files, make a directory.

Download syslinux and copy the file  to.

Find the installation tree of the desired distribution. If it's on an installation disk, find it under. Change to the  directory and copy ,  , and   to.

Make sure the installation tree (or .iso installation disk image) is somewhere on the hard disk such as in.

Kickstart Configuration
The pre-defined options that automate the install process are stored in a single Kickstart configuration file. Such a file was probably automatically created at  when Fedora was installed on admin. A graphical interface is available to edit this file or create a new one from scratch:

$ yum install system-config-kickstart
$ system-config-kickstart

Carefully go through and set all the installation options. Here is a useful resource. In particular, be sure to set NFS as the installation method, specify the NFS server (e.g. 10.1.2.254 for Beowulf), and specify the directory containing the installation tree or .iso disk image (e.g. ). Save the file in a directory such as   as , for example.
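
For orientation, the sketch below prints a partial Kickstart file for an NFS install. It is not a complete configuration: entries such as rootpw, bootloader, timezone, and the package list are omitted. The installation tree path and the swap size are assumptions, and the partition lines follow the layout recommended in the Admin OS Installation section (with ext3 assumed for /).

```shell
#!/bin/sh
# ks_fragment: print a partial Kickstart file for an NFS install.
#   $1 = NFS server, $2 = exported installation tree, $3 = swap size in MB
# WARNING: "clearpart --all" erases every partition on the target machine.
ks_fragment() {
    cat <<EOF
install
nfs --server=$1 --dir=$2
lang en_US
keyboard us
network --device=eth0 --bootproto=dhcp
selinux --disabled
clearpart --all --initlabel
part /boot --fstype=ext3 --size=100
part swap --size=$3
part / --fstype=ext3 --size=1 --grow
EOF
}

# Path and swap size below are illustrative assumptions.
ks_fragment 10.1.2.254 /fedora/f9-x86_64 2048
```

A file produced by system-config-kickstart will contain more entries than this; compare the two rather than replacing one with the other.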

Installing Fedora
Look at the Anaconda reference and Kickstart reference.

Make sure you specify lang, keymap, noipv6, and ksdevice for Anaconda.

APPEND initrd=initrd-f9-64 noipv6 ksdevice=eth0 lang=en_US keymap=us ip=dhcp method=nfs:10.3.1.254:/fedora/f9-x86_64/

Now go back to the  directory and make a new directory. Inside, create a file  to specify boot parameters. It should look something like:

default linux
serial 0,38400n8
label linux
  kernel vmlinuz
  append ksdevice=eth0 load_ramdisk=1 initrd=initrd.img network ks=nfs:10.2.2.254:/data/Kickstart/ks.cfg

(Change the  line to reflect the correct NFS server IP address, as well as the location and name of the Kickstart configuration file.)

Share the locations of the pxelinux, Kickstart, and installation tree files on the NFS. Edit  to include the lines:

/tftpboot
/data
/images

(or wherever these pieces are)

Be sure the DHCP, NFS, and TFTP services are running. Turn on the nodes and wait for Fedora installation to complete. Do not restart the nodes until you've reconfigured  to boot locally!
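
A pxelinux configuration that hands control back to the local disk can be as short as the three lines printed by the sketch below. LOCALBOOT is a standard pxelinux directive; the label name local is an arbitrary choice made for this example.

```shell
#!/bin/sh
# pxe_localboot: print a pxelinux configuration that boots from the
# local disk instead of starting the network installer again.
pxe_localboot() {
    cat <<EOF
default local
label local
  localboot 0
EOF
}

pxe_localboot
```

With this in place, a node that still PXE-boots first will fall through to its own hard disk rather than reinstalling.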

System Imager
In order to ensure consistency in software architecture between nodes, we used to use SystemImager to install Fedora on our nodes. We used version 3.8.1 of SI as of 6.20.07. Please note: use of SystemImager depends upon a properly configured DHCP server set up on the admin node, which will pass a proper IP address to a golden node.

Installation
There is currently a naming conflict between a class in SystemImager and a Fedora library class. Thus, the source code will not compile. You will have to use the pre-compiled RPM files to install SystemImager. Download the RPMs with the following commands:

$ mkdir systemimager
$ cd systemimager
$ wget http://download.systemimager.org/pub/sis-install/install
$ chmod u+x install
$ ./install -v --download-only --tag stable --directory . systemconfigurator \
    systemimager-client systemimager-common \
    systemimager-i386boot-standard systemimager-i386initrd_template \
    systemimager-server

Before installing, you will need to install some dependencies. As root, type:

$ yum install perl-XML-Simple perl-AppConfig
$ yum update

Now you're ready to install. On the admin machine, type:

$ yum install system-config-netboot tftp-server dhcp httpd syslinux
$ rpm -ivh systemconfigurator-*
$ rpm -ivh systemimager-common-* systemimager-server-* systemimager-i386boot-standard-*

On the 'Golden Node' (the node that is completely configured), type:

$ rpm -ivh systemconfigurator-*
$ rpm -ivh systemimager-common-* systemimager-client-* systemimager-i386initrd_template-*

Preparing Clients
Log on to the golden node. Replace "192.168.xxx.xxx" with the IP address of admin, and type:

$ si_prepareclient --server 192.168.xxx.xxx

When this is complete, log on to the admin node as root. Replace "192.168.xxx.xxx" with the IP address of the golden node and type:

$ si_getimage --golden-client 192.168.xxx.xxx --image image_name \
    --exclude '/media/*' --exclude '/lib/klibc/events/*'

This collects the image of the golden node and places it in /var/lib/systemimager/images/[image_name].

The initrd.img and kernel files associated with this build are located at /usr/share/systemimager/boot/i386/[image_name]. Copy these two files to your /tftpboot/ directory. When they are copied, note their file names. Add them to the /tftpboot/pxelinux.cfg/default file.

Imaging
Start the SystemImager daemon as root:

$ /etc/init.d/systemimager-server-rsyncd start

DO NOT RUN si_mkbootserver. It messes up tftp and xinetd.d.

Now we get to add clients. Once again, just follow the directions and define which clients you would like to image. Note: these clients must be defined in your dhcpd.conf file.

$ si_addclients

Now, simply boot the clients you would like to image. Everything should automate itself after that. Don't worry about warnings when imaging; the process should take about half an hour. Only reboot a machine if it runs into an error.

To monitor the installation, add the following parameters to the kernel boot options of the clients:

 * MONITOR_SERVER=IP|HOSTNAME: IP address or hostname of the monitor server
 * MONITOR_CONSOLE=yes|no: enable or disable the full console view. If enabled, you can follow the entire installation session of each client (stdout and stderr) in the monitoring interface (default is no)

epkg
Before installing Open MPI, you must first install epkg. Download the version 2.3.8 binaries from http://www.encap.org/epkg/, and follow the installation directions.

$ mkdir /usr/local/encap
$ cd /usr/local/encap
$ wget ftp://ftp.encap.org/pub/encap/pkgs/cites/ix86-linux2.4/epkg-2.3.8.tar.gz
$ tar -xvvzf epkg*
$ rm epkg-2.3.8.tar.gz
$ epkg-2.3.8/bin/epkg epkg-2.3.8/

OpenMPI Installation
$ cd /usr/local/encap
$ wget http://www.open-mpi.org/software/ompi/v1.2/downloads/openmpi-1.2.3.tar.gz
$ tar -xvvzf openmpi-1.2.3.tar.gz
$ rm openmpi-1.2.3.tar.gz
$ cd openmpi-1.2.3
$ ./configure --prefix=/usr/local/encap/openmpi-1.2.3
$ make all install

Now we need to epkg the openmpi program (so that everyone can use it):

$ cd /usr/local/encap
$ epkg openmpi-1.2.3

MPI is ready to go!!!

Passwordless SSH
In this tutorial, 'local' refers to the administration node, and 'remote' refers to a single Beowulf node. You must be root to run these commands:

local$ ssh-keygen -t rsa                                      ; hit [Enter] through the options
local$ scp ~/.ssh/id_rsa.pub remote:~/.ssh/authorized_keys    ; enter your password
local$ ssh root@remote                                        ; you should have passwordless ssh now
remote$ chmod 600 ~/.ssh/authorized_keys
remote$ exit
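
Copying the public key straight onto authorized_keys replaces any keys already stored there. As an alternative sketch (not the procedure this guide used), the helper below appends a key only if it is not already present, so it is safe to run repeatedly. The function name and the example paths are hypothetical.

```shell
#!/bin/sh
# append_key: add the public key in file $1 to the authorized_keys file $2
# unless an identical line is already there. Creates $2 if it is missing
# and restricts its permissions to the owner.
append_key() {
    pubkey_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"
    if ! grep -qxF -e "$(cat "$pubkey_file")" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}
```

On the remote node this could be called as append_key /tmp/id_rsa.pub ~/.ssh/authorized_keys after copying the public key to /tmp.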

Hadoop
TODO

Apache HTTPD
TODO