Docs/parallelize helios


The following is intended to be a simple guide to parallelizing scientific problems on the St. Olaf cluster. It is aimed specifically at student researchers who have been handed a professor's scientific model, typically in the form of a paper, and asked to parallelize the problem starting from that base. I've also done a more general analysis of the process of parallelization that can be found here.

Steps

 * Take in the Science -- The first step is to gain at least a basic understanding of the science involved. Although it is tempting to jump straight to the methods or equations, you'll generally find that a conceptual understanding of the topic will allow you to optimize your algorithm or approach. Take the time to read through the paper and pose questions to the professor involved. While important, the amount of time you invest in this step is flexible and will depend on the nature of your project and the time constraints involved.
 * Extract the Algorithm -- Now the time has come to jump to the equations. Most of the material you need will likely lie in the methods section of the paper. Get a sense of any iterative processes involved and, if it's helpful, sketch out pseudocode outlining loops or other programmatic structures. Professor feedback during this step is key: if you fail to get the algorithm right, the time you spend on every later step will be wasted.
 * Translation -- Once your structure is outlined in pseudocode and your equations are at hand, the process of translation to actual code is straightforward and should be a familiar process. At this point it will be a good idea to consider your plans for parallelization. What information needs to be shared? What code will be specific to certain nodes or processes and what will be general? Although you will likely implement the algorithm first without parallel elements, planning ahead by segmenting and commenting the sections and values that will be parallel will help.
 * Sample Runs -- Once your code is up and running, the next step is to do sample runs. Consult your professor to find reasonable test values, but also be sure to try special cases to validate your code's correctness.
 * Parallelize -- Now that your code has been validated, you can parallelize it for the cluster, most likely using MPI (Message Passing Interface). This is where your earlier outline pays off: you should already know what data needs to be sent between processes and what code is shared between different process types (if you have different roles). Debugging this step can be a little tricky, but it is useful to have all processes print diagnostic messages as they normally would. These messages will be redirected back to the head node by MPI, but do be sure to label which node or process they're coming from. For more information, take a look at the analysis I've linked in the introduction, which discusses the process of parallelization from a general perspective.
 * Cluster testing -- Testing your code on the cluster is relatively straightforward. Jobs are submitted using the qsub command, and you may monitor the status of your job using the qstat command. The man pages for these two commands provide detailed instructions for their use. Once your job has finished running, your output (both STDOUT and STDERR) will be returned in the form of several files, which you can then examine and debug.
 * Migration -- My experience migrating was quite straightforward. Simply sftp from the devel cluster to the helios cluster, copy over your program files, and run the program as you did before.
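
As a rough illustration of the Translation, Sample Runs, and Parallelize steps, here is a minimal Python sketch. It assumes the mpi4py package is installed (that is an assumption, not something the cluster is known to provide) and falls back to a single process when it is not. The function model_term is a hypothetical stand-in for one iteration of the professor's model, and block_range, serial_sum, and partial_sum are names invented for this example. The idea is to keep a serial version around as a correctness check, split the loop iterations among processes, and label every diagnostic message with the rank it came from.

```python
import math

try:
    # Assumption: mpi4py is available on the cluster.
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    # Fall back to a single process so the sketch still runs without MPI.
    comm, rank, size = None, 0, 1


def model_term(i):
    """Hypothetical stand-in for one iteration of the scientific model."""
    return 1.0 / (1.0 + i * i)


def block_range(n, rank, size):
    """Return the half-open range [start, stop) of iterations owned by `rank`.

    The first (n % size) ranks each receive one extra iteration.
    """
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def serial_sum(n):
    """Serial reference version, kept for validating the parallel result."""
    return math.fsum(model_term(i) for i in range(n))


def partial_sum(n, rank, size):
    """The share of the serial sum computed by a single process."""
    start, stop = block_range(n, rank, size)
    return math.fsum(model_term(i) for i in range(start, stop))


def main(n=1000):
    local = partial_sum(n, rank, size)
    # Label each diagnostic message with the process it came from.
    print(f"[rank {rank} of {size}] partial sum = {local:.6f}")
    if comm is not None:
        # Combine the partial sums on rank 0; other ranks receive None.
        total = comm.reduce(local, op=MPI.SUM, root=0)
    else:
        total = local
    if rank == 0:
        print(f"[rank 0] total = {total:.6f}, serial check = {serial_sum(n):.6f}")


if __name__ == "__main__":
    main()
```

Run directly, the script behaves as a one-process job. Under MPI it would typically be started through a launcher such as mpirun with a process count, but the exact launcher and the job script passed to qsub depend on how the cluster is configured, so check with whoever administers it.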