Beowulf Software Overview

The Beowulf cluster runs not on a single piece of software but on a collection of carefully selected packages that interact to create our high-performance computing environment. The purpose of this page is to catalog that software and how the pieces interact.

Operating System

 * Ubuntu Server — Linux distribution installed on the majority of our clusters. As of June 2014, the installed version is 12.04 LTS.
 * Gentoo Linux — Linux distribution installed on

System Administration Software

 * Apt, Aptitude, and Dpkg — Ubuntu's native package management system
 * Encap and epkg — lightweight package management for manually compiled software
 * SystemImager — distributing complete system installations across
 * TFTP server — serves boot files over the Trivial File Transfer Protocol for PXE network booting
 * rsync — file synchronization used during imaging process
 * Ganglia Monitoring System — web interface & notifications to monitor cluster status
 * etckeeper — system configuration version control

Virtualization Software
This software is used to run and manage the virtual machines that host the clusters' master nodes:


 * KVM — a hardware-accelerated virtualization system
 * QEMU — another virtualization system whose utilities are used by KVM
 * libvirt — a library (and suite of tools) abstracting many of the details of KVM and other virtualization systems
 * virsh — a management console built on top of libvirt
 * virt-manager — a management GUI built on top of libvirt
 * Ubuntu JeOS — a minimal flavor of the Ubuntu Server OS appropriate for virtual machines
 * LVM — the Logical Volume Manager, a disk virtualization system

Network Administration Software

 * OpenSSH — remote logins and file copying
 * NIS — centralized sharing of host and user account data across machines
 * DHCP — IP address assignment and network configuration
 * NTP — time synchronization
 * iptables and ufw — firewall

Historical System Administration Software
These software packages have been used on the clusters in the past. They are not currently installed, but might be considered again in the future:


 * IPMItool — remote hardware-level system monitoring & administration ( only)
 * APCUPSD — power supply monitoring ( only)
 * Firestarter — firewall

Cluster System Software
These software packages are used to support cluster applications:


 * Sun Grid Engine — job scheduling
 * Apache HTTPD — web server, used by Ganglia, WebMapReduce, HiPerCiC, and many others
 * NFS — network file sharing
 * MySQL — database server
 * PostgreSQL — database server

Libraries, Languages, and Frameworks
These tools are used to create distributed cluster applications:


 * OpenMPI — implementation of the MPI standard for distributed parallel computing
 * Hadoop — map-reduce framework
 * PHP — web scripting language, used by Ganglia, HiPerCiC, and others
 * Python — general-purpose scripting language, used by WebMapReduce, HiPerCiC, and many system packages
 * Django — web application framework for Python, used by WebMapReduce and HiPerCiC
 * Ruby — flexible, object-oriented scripting language
 * Ruby on Rails — web application framework for Ruby, used by older versions of HiPerCiC
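Hadoop and WebMapReduce are both built around the map-reduce model. As an illustration only, the sketch below runs a word count through the map, shuffle, and reduce phases in a single Python process. It is not Hadoop's or WebMapReduce's API: `mapper`, `reducer`, and `map_reduce` are hypothetical names, and on the cluster the framework itself would distribute the map and reduce calls across nodes and perform the shuffle.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    # Hypothetical mapper: emit (word, 1) for every whitespace-separated word.
    for word in line.lower().split():
        yield word, 1


def reducer(key: str, values: Iterable[int]) -> tuple[str, int]:
    # Hypothetical reducer: combine all counts that share the same key.
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> dict[str, int]:
    # Map + shuffle: run the mapper on each input line and group its output
    # by key, standing in for what the framework does across the cluster.
    groups: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # Reduce: one reducer call per distinct key, in sorted key order.
    return dict(reducer(key, groups[key]) for key in sorted(groups))


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "The end"]
    print(map_reduce(text))
    # {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because the mapper and reducer only see one line or one key at a time, a framework is free to run many copies of each in parallel; only the shuffle step needs to see all of the intermediate data.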

Cluster Applications
These programs are specific applications of parallelism, or tools used directly to support those applications:


 * HiPerCiC — framework for parallel computing applications in other disciplines
 * Riparian — a HiPerCiC project
 * WebMapReduce — simple web interface for map-reduce in education

Historical Cluster Applications
These applications have been installed on the clusters in the past, but have not been installed on any recent cluster builds. They may be installed again in the future:


 * BLAST — biological sequence alignment
 * R — statistics programming environment
 * SNOW — R package providing parallel programming interface
 * Rmpi — R interface to MPI
 * CCT 2.0 — web interface for querying biological data
 * ATLAS — linear algebra package with parallelism support; often used in benchmarking