CAP F07

Overview
St. Olaf's Senior Capstone Seminar (CS 390, CAP) focuses on applications of Beowulf clusters in Fall 2007. Six senior CS majors will collaborate with three juniors and a sophomore to develop new and ongoing projects in Biology and other disciplines, and extend the base cluster software for our two clusters. This effort will be led by Prof. Brown and our two cluster managers, Todd Frederick '09 and Taylor Reece '09.

Biology, Ecology, Environmental Science

 * HiPerCiC (High-Performance Computing in the Classroom): Create a pedagogical system for exercises and projects that require analysis of data produced by computational models running on a Beowulf cluster. This involves building a user interface and back-end implementation so that students can request that result data sets be computed (using a particular model), then query those results as they form hypotheses and ideas for subsequent data sets. The initial application is to Tony Waldschmidt's implementation of Prof. John Schade's model of nitrogen flow in a riparian organism, ideally with use in Prof. Schade's lab course this term. (Jeremy Gustafson, Todd Frederick)
 * BLAST: Standard tool for finding bioinformatics data, such as a string of DNA, in large data sets. How can we make it accessible and useful for biologists on campus? NCBI has a web interface to this program, but those searches run on NCBI's servers. One possible project is a web interface that runs queries on the Helios Cluster. The code for NCBI's interface is freely available, but past researchers have been unsuccessful at installing it. The researcher on this project should analyze the options for a web interface, which might include building one.
 * CCT: Locally developed tool for tracking changes in external bioinformatics databases. Can we find a base of users and serve their needs? This tool needs to be ported to our new systems architecture, which includes a 64-bit OS and Sun Grid Engine. Depending on users' needs, other possible extensions to CCT include multi-user integration with the Cluster's web interface and shared databases between users.
 * Tools for Bioinformatics (CS 315, Spring)?
 * Sizer: Statistical exploratory data analysis of ES data. Port this MATLAB application to Octave or C++.
 * LANDIS II: Forest Landscape Simulation model developed at UW. This program is mostly closed-source C#, but it might run under Linux in Mono. The goal is to parallelize the program. First, parallelize a part of the program called the Succession Module, for which the source is available. If you can show the LANDIS creators that you parallelized Succession, they might let you have the Core source code to work on. There is "broad interest" in a parallelized version of LANDIS. (John Giannini)
 * Charcoal dispersion. Extend existing locally written simulation to use high-precision computations. (Matt Baudino)
 * Riparian vegetation (Tony Waldschmidt)
 * DNA analysis of Tetrahymena thermophila: Make many runs of a Perl script to assess the likelihood of an interesting observed phenomenon. (John Giannini)
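
The "many runs" idea in the Tetrahymena item can be sketched as a small Monte Carlo estimate. This is an illustration in Python, not the Perl script itself: `simulate_once`, its coin-flip null model, and the observed value of 65 are all hypothetical stand-ins for whatever the real script computes.

```python
import random

def simulate_once(rng, n_sites=100):
    """Hypothetical null model: count 'hits' among n_sites fair coin flips."""
    return sum(rng.random() < 0.5 for _ in range(n_sites))

def empirical_p_value(observed, n_runs=10_000, seed=0):
    """Estimate how often chance alone gives a result at least as large as observed."""
    rng = random.Random(seed)
    extreme = sum(simulate_once(rng) >= observed for _ in range(n_runs))
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_runs + 1)

if __name__ == "__main__":
    # ASSUMPTION: 65 is an invented observed value, used only for illustration.
    print(empirical_p_value(observed=65))
```

Because the runs are independent, they could in principle be divided among cluster nodes, each with its own seed, with the counts pooled at the end.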

Other application areas

 * Seismic UNIX: Improve performance of an analysis of sonar data that indicates the layered structure of polar ice. The CEGSIC project collects sonar data in Antarctica and performs a multi-step process to produce layer charts of the ice sheets. On a certain set of data, one particular step took four hours on a single computer. This summer, we parallelized that step so that the same data set runs in four minutes. Next year, CEGSIC will have seven times as much data to process, so they are interested in ways to do their analysis faster. First, develop an interface that allows this Physics project to transfer large (2 GB) data files to and from Helios and run the processing program. Then, investigate ways in which the processing could be accelerated even further.
 * Stata: Make it convenient for Statistics researchers to perform large exploratory statistical runs on a high-performance cluster node.
 * R: Seek out cluster applications for this standard statistical package.
 * Palantir: Seek out parallelization of computations, such as the Correspondence Problem in 3D vision. (Will Voorhees)
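
The timings quoted in the Seismic UNIX item imply a rough speedup and a rough projection for next year's data. The sketch below is only that arithmetic. It assumes run time grows linearly with data volume, which the project description does not state, and the function names are made up for illustration.

```python
def speedup(serial_minutes, parallel_minutes):
    """Ratio of single-computer time to parallel time."""
    return serial_minutes / parallel_minutes

def projected_minutes(current_minutes, data_factor):
    """ASSUMPTION: run time scales linearly with the amount of data."""
    return current_minutes * data_factor

serial = 4 * 60   # four hours on a single computer, in minutes
parallel = 4      # four minutes after parallelization

print(speedup(serial, parallel))               # 60.0
print(projected_minutes(parallel, 7))          # 28 (minutes, parallel)
print(projected_minutes(serial, 7) / 60)       # 28.0 (hours, single computer)
```

Under that assumption, seven times the data would take about 28 minutes in parallel, compared with about 28 hours on one machine.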

Cluster system development

 * UPS shutdown (Will Voorhees)
 * User interface projects, e.g., in PHP
 * Security/reliability review (Chad Norberg)
 * Parallelization projects?
 * Develop libraries for high-precision calculation and communication
 * Beowiki management, e.g., add references on wikitext
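
As a point of reference for the high-precision library item, Python's standard `decimal` module shows the kind of behavior such a library provides. This is an illustration only, not a proposed design. The 50-digit precision and the `sum_tenths` helper are arbitrary choices made for the example.

```python
from decimal import Decimal, getcontext

# ASSUMPTION: 50 significant digits is an arbitrary precision for this example.
getcontext().prec = 50

def sum_tenths(n, number_type):
    """Add 0.1 to itself n times using the given numeric type."""
    total = number_type(0)
    step = number_type("0.1")
    for _ in range(n):
        total += step
    return total

if __name__ == "__main__":
    print(sum_tenths(10, float))    # binary floating point accumulates rounding error
    print(sum_tenths(10, Decimal))  # decimal arithmetic represents 0.1 exactly
```

With `float`, ten additions of 0.1 do not land exactly on 1.0. With `Decimal`, they do.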

Ethical Analysis
Two ethical analysis teams have formed in the class.
 * Chad, Jeremy, and Matt
 * John, Tony, and Will

Documentation Plan
The following documentation plan was developed on Friday 10/5.
 * All or most documentation will appear on this wiki
 * The following documentation will be created:
   * Security report on the cluster (Chad)
   * Process of taking a scientific model (e.g., Schade's Riparian model) and implementing it in code (Tony)
   * Technical documentation for HiPerCiC user interface (Jeremy)
     * How to modify the GUI
     * Architecture
   * Freedberg scripts parameter file (John)
   * LANDIS II documentation (John)
     * Efforts to date
     * External documentation for programmers
   * Palantir (Will)
   * Charcoal distribution code (Matt)
 * Charcoal
 * For developers; for users
 * Including architecture diagram, UML

Materials

 * Trapezoid Tutorial