Riparian Paper

Abstract
Riparian zones, or strips of vegetation located adjacent to flowing water, are vital ecosystem resources that benefit surrounding areas primarily through nutrient biofiltration. Although the mechanism is unclear, researchers have shown an empirical relationship between riparian root biomass and levels of denitrification.

To investigate the role of riparian physiological response in these nitrogen dynamics, Schade ran two simulations: a control, and an experimental or 'response' model in which physiological adaptation was not constrained. Two feedback mechanisms, contingent on available nitrogen levels, were found during the course of this analysis. The second mechanism is concerning: high nitrogen led to a retreat of root mass, a corresponding drop in denitrification, and consequently a self-perpetuating increase in nitrogen.

This model was recreated and adapted to run on the St. Olaf Beowulf cluster and initial analysis indicates that this finding holds true for a wide array of plant species and physiological characteristics. Future research is needed to verify these findings and also to potentially expand this research to an ecosystem level (riparian linking) and into the classroom (HiPerCiC).

Introduction: Riparian Zones
In order to provide a meaningful understanding of the aims of this research project, it is necessary to explain the basic ecological unit under investigation: the riparian zone. Riparian zones are defined as strips of vegetation adjacent to a body of continuously flowing water. Riparian vegetation may occur naturally, but in many cases farmers and wildlife managers plan and preserve these zones to act as biofilter buffering regions for nearby agricultural fields. Many vegetation types, ranging from desert seepwillow to temperate forest flora, fall into the riparian category.

Riparian zones play an important role in the dynamics of an ecosystem. The most salient riparian feature is the mitigation of excess nitrogen runoff, but these zones serve many other ecological functions as well. Depending on the vegetation present in a riparian zone, the ecosystem may also benefit from reduced erosion and an accompanying rise in water quality, increased biodiversity compared to agricultural monocultures, and a reduction in noise pollution [1].

Nature is not the only beneficiary of riparian zones. In addition to these boons to the natural ecosystem, residents in surrounding locales typically enjoy several additional advantages as well. Two of the most frequently cited are improved aesthetics and an increased potential for recreational activities such as biking. Riparian regions may also increase the privacy of nearby residents. For these reasons, riparian zones frequently generate a gain in surrounding property values [2].

Riparian Nitrogen Cycling
The principal purpose for planned riparian zones is biofiltration of excess nutrients, primarily nitrogen. To elucidate the dynamics of riparian filtration, we can start by examining the processes of nitrogen flow and plant mediation of denitrification.

As Figure 1 illustrates, excess nitrogen flows towards a riparian zone and riparian vegetation stimulates the activity of underground microbial communities. These bacterial communities initiate denitrification. Unfortunately, present research does not offer a precise mechanism for microbial stimulation although some researchers speculate that carbon or enzyme addition by riparian plants could be responsible [4]. Two key things are known about the functioning of this process, however. First, empirical evidence indicates that denitrification directly correlates with root mass and root detritus [3]. Second, this process induces underground bacterial communities to chemically convert available nitrogen, usually in a nitrate form, into atmospheric nitrogen (N2). This second aspect of the process is critical because it transforms nitrogen into a gaseous or atmospheric form that escapes the local ecosystem.

''Figure 1. A conceptual depiction of the dynamic mediation of microbial denitrification by plant root mass production (Schade et al.)''

Unfortunately, this is not the only plant response to consider. When nitrogen is readily available and flowing across these regions, riparian plants diminish the free supply of the nutrient in two ways: the roots uptake available nitrogen and incorporate the excess into plant tissue, and root biomass stimulates the aforementioned underground microbial communities to engage denitrification. The balance between these alternate pathways has significant ramifications for the ecosystem and is consequently of vital interest to this investigation. When a plant absorbs nitrogen through root uptake and incorporates it into leaf or stem tissue, this nitrogen is only temporarily removed from the nitrogen pool of the ecosystem. This nitrogen will re-enter the ecosystem as dead root matter, or detritus. This occurs because the plant continually reforms its root network in a process referred to as root turnover. The re-entry of nitrogen into the ecosystem via the process of root turnover is referred to as nitrogen recycling. Ideally, riparian zones would maximize denitrification rather than nitrogen recycling, producing an overall export of nitrogen from the local ecosystem.

The following diagram represents the composite model of nitrogen flow in riparian zones as conceived in Professor Schade's model. This diagram represents the linkage of two submodels. The first represents the interactions between root biomass, root to shoot ratio (proportion of root nitrogen to plant shoot nitrogen), and root turnover. The second submodel represents the inorganic nitrogen pool and its interactions with microbial denitrification, plant uptake, and plant export. As you can see below in the diagram, these two submodels have been integrated and connected. This was achieved via the effects of root biomass and root detritus on denitrification and recycled nitrogen, which are mediated at some level by the inorganic nitrogen pool. This mediation occurs via plant productivity and the selective allocation of this productivity to roots versus shoots.

''Figure 2. Nitrogen Flow in Riparian Zones (Schade et al.)''

This diagram is somewhat complex and has many connections, but there are several important ones to focus on. As discussed earlier, riparian plant communities stimulate denitrification. The mechanism for this stimulation is reflected by the root submodel in the diagram. As the arrows reflect, denitrification is modulated principally by levels of root biomass and detritus (which is itself dependent on root biomass). The dynamics of the nitrogen pool are also evident in the diagram. Hydrologic flow and nitrogen recycling from root detritus are the two sources of available nitrogen, while plant uptake and denitrification account for nitrogen consumption, with the remainder leaving as export. These processes affect the level of available nitrogen, and this factor in turn modulates the plant's physiological response, leading to changes in plant productivity, tissue nitrogen levels, and root to shoot ratio. Finally, it is important to note the arrow leading from plant productivity to root production, as this is one of the links between the submodels.

Schade's Analytical Approach to Uptake and Denitrification
To reiterate, the primary thrust of this research effort is to examine the plasticity of plant physiological response, specifically in the forms of tissue nitrogen and root to shoot ratios. This response is being utilized as a framework to examine nitrogen flow in riparian zones, especially in the context of high nitrogen loads. To investigate the relationship between the physiological response of riparian plants and nitrogen flow, Professor Schade used two types of simulations.

The first type of simulation involves holding the plant's tissue percent N and root to shoot ratio invariant throughout the model run. This 'constant' simulation is important because it is the control, or basis of comparison, for the response model. The response model allows the plant to respond physiologically to the dynamics of nitrogen flow via modifications to tissue nitrogen and root to shoot ratio. By comparing these two simulation types we can isolate the impact of changes in plant physiology on the function of riparian zones, an area not traditionally emphasized by riparian researchers [3].

''Figure 3. Equations used to simulate Schade's nitrogen flow model (Schade et al.)''

Figure 3 provides all the equations utilized in simulating Schade's nitrogen model. The most salient features of this system are equations four, six, and seven. Equation four is an exponential decay function used to calculate root to shoot ratio and is dependent on available nitrogen. This equation is used only in the response model; in the constant model the ratio is held invariant. Equations six and seven contain the %N term, which is likewise plastic in the response model and invariant in the constant model.

Model Results
Professor Schade performed single-processor simulations using this set of equations in Matlab. Although computational power limited the scope of this parameter analysis, two interesting results were uncovered. When nitrogen availability is low, the response model exhibits lower levels of uptake and productivity and higher levels of denitrification than the constant model. Conversely, when nitrogen load is high, these relationships are reversed.

These phenomena can be accounted for by the dynamics of nitrogen flow examined earlier. In the response model, when nitrogen levels are low the plant responds physiologically by increasing the balance of roots over shoots in an attempt to absorb more nitrogen and balance out the nitrogen scarcity. This increase in root mass stimulates a corresponding increase in root detritus. As discussed earlier, denitrification is proportional to these two factors and therefore is also elevated. Because denitrification further decreases nitrogen levels, this process creates a positive feedback loop.

High nitrogen loads create an inverse phenomenon that again turns out to be a positive feedback loop. The plant physiologically responds to high nitrogen availability by increasing shoot mass and tissue nitrogen levels. Because resources are shifted to the shoots, root biomass is reduced. The drop in root mass reduces root detritus and in turn diminishes denitrification levels. The drop in denitrification increases available nitrogen, creating the positive feedback cycle.

Investigative Findings & Complications
After implementing Professor Schade's model in the C language and parallelizing the algorithm using the MPI interface, simulations were conducted on the helios cluster. Each parameter's interval was sampled using a geometric progression ranging from 25% to 400% of each parameter's base value (provided by Professor Schade). The initial results of this investigation are very promising. 7,086,244 parameter sets were simulated, creating approximately 43GB of output data. If each parameter set's simulation was allowed to run for 5 years, 100% of these sets exhibited the switch between feedback types at high nitrogen levels. If these sets were instead run until steady state was reached (defined as a state where plant productivity changes by less than 0.001 per day), 97.98% exhibited the expected feedback switch observed by Schade.

Several unexpected complications arose during the exploration and testing of this model. Parameter space sampling was the first dilemma we encountered. The 6 parameters we were varying in our model runs unfortunately had no known, hard-set limits in nature. Our initial approach to sampling has been to perform a geometric progression ranging from 25% to 400% of our baseline value for each parameter. Nothing in the nitrogen flow system dictates a geometric pattern, and this progression choice is therefore somewhat arbitrary. Our plan to address this issue in the future is to use a series of geometric progressions to find a viable parameter space before performing a uniform linear progression across this space for comparison.

The second unforeseen complication we ran into involved early termination at steady state. I discovered this issue after performing several test runs in which a large proportion of the model sets returned NaN (not a number) values for several of the outputs. Investigating this phenomenon, I found these outputs were occurring in cases where steady state was reached before denitrification ever started. It makes sense that this would occur in cases where the parameter for maximum plant productivity (P_max) was too low to allow the plant to physiologically respond to low nitrogen. However, we would like to investigate this phenomenon further to discover which other types of parameter sets may cause this early termination.

Parallelizing Scientific Problems
To set the stage for discussing the specifics of implementing this riparian flow model on the St. Olaf helios cluster, let us examine the general process of parallelization in the context of scientific problems. The following discussion on the general subject of parallelization is an abridged version of another wiki page located here.

The first major obstacle to tackle in the process of parallelization is determining an appropriate paradigm to use for a problem. With most scientific problems or models, several possible parallel strategies may be able to correctly solve a problem and it is important to recognize the range of possible solutions.

An example of this would be Conway's Game of Life. Essentially this game consists of a grid of 'cells' that live, die, or come back to life according to how many live neighbors the cell has.

The most intuitive parallel approach to this problem would be to have each node in your computing cluster represent a single cell. For each round, each cell would report its status to each of its neighbors and in turn receive status reports from all of its neighbors. The cell would finally decide whether to die, live, or regenerate based on the information transmitted by its neighbors. This cycle repeats ad infinitum.

While this approach is intuitive and perhaps almost obvious, it is by no means the only possible parallel paradigm for this problem. Another feasible approach would be to appoint one node in the cluster, perhaps the head node, as a gatekeeper. This node would be analogous to an air traffic controller in an airport, overseeing communication and shaping and directing traffic. All nodes would report their status to the gatekeeper node at the end of each round and the gatekeeper would in turn broadcast out the needed status information to all nodes.

Regardless of the problem, it is likely that there will exist a plethora of possible parallel algorithms, and generating a diverse array of solutions increases the likelihood of finding a solution that fits both your problem and your hardware/software setup.

As previously alluded to, the next major step in parallelization is choosing a parallel paradigm that best fits both your scientific problem and your existing cluster software and hardware. Hopefully at this point in the process, you have several possible parallel algorithms capable of modeling your problem and are now faced with the mixed blessing of selection.

The first step when beginning to select a solution is evaluation of your software and hardware. These are the two major factors that will influence your decision, and if optimization is crucial to your project then identifying the bottlenecks in these two components is essential. Parallel computing is a balancing act between processor speed and network bandwidth and latency. Diagnostic software for your cluster, such as Ganglia, will help you identify which of these two is likely to be more of a concern. Additionally, if you can rapidly prototype code for your scientific problem, you can use such diagnostics to pinpoint whether processing speed or network speed is your limiting factor.

The precision of your algorithm selection will once again depend on your needs. Ideally, you would analyze each algorithm on your list and express its running time in Big-O notation. You would also want to do something similar in terms of network usage. Although I'm not aware of an established analytical framework for network usage comparable to Big-O notation, you could, for example, analyze usage in terms of the number of communications per round.

Once you've finally selected what appears to be the best-fit algorithm from your list of possibilities, the time has come to actually put your plan into action and code your solution. There exists a wide variety of languages, packages, and APIs available for parallel programming, such as parallel Haskell, PVM, or Open MPI. Your choice of which software package to utilize for your particular project will most likely come down to either what's pre-installed on your cluster or what you're most comfortable with.

Regardless of the software package used, there will exist several commonalities in implementing your algorithm:

 * role identification -- Depending on your chosen algorithm, different nodes may play different roles in your computation. The most common example of this would be the use of a director or gateway node (traffic director) in our Conway's Game of Life example. Upon implementing these different roles, a method for assigning roles to given nodes should be developed. This is especially important if you have specialized processing or network hardware for specific node types.
 * stage analysis -- Typically your algorithm will naturally separate into several stages, the number of which will depend upon the number of node roles you have and the nature of the algorithm itself. At the very least you will have one stage for communication between nodes and one stage for processing information received. Oftentimes you will have multiple instances of each one of these stage types and it's important to delineate these stages both conceptually and practically in your code structures.

 * node communication stages -- The communication stages of your algorithm are the real essence of parallelism in your program, and this should manifest itself in your code. The communication stage, as addressed from the point of view of a single node, can be broken down into a couple of component parts. The first is the conditional decision of whether communication is necessary this round; if no state changes occur, a node may decide not to communicate. Secondly, a node must pick out recipients to send data to, unless a form of broadcast communication is being used.
 * data processing stages -- The data processing stages of your algorithm will constitute the core of the scientific problem itself. Ideally they should have no parallel code elements within them; however, these stages will handle data that is received and prepare data to be sent off in the next communication. The majority of this stage will be quite similar to a non-parallel version of the algorithm.

Parallelization on Helios
While the previous section dealt with the general topic of parallelization, this section will simply be a short summary or guide to performing this process on the helios or castaway clusters. This discussion is once again also available in the documentation section of the wiki located here.


 * Take in the Science -- The first step is to gain at least a basic understanding of the science involved. Although it is tempting to jump straight to the methods or equations, you'll generally find that a conceptual understanding of the topic will allow you to optimize your algorithm or approach. Take the time to read through the paper and pose questions to the professor involved. While important, the amount of time you invest in this step is flexible and will depend on the nature of your project and the time constraints involved.
 * Extract the Algorithm -- Now the time has come to jump to the equations. Most of your needed material will likely lie in the methods section of the paper. Get a sense of any iterative processes involved and, if it's helpful, sketch out pseudocode outlining loops or other programmatic structures. Professor feedback during this step is key; if you fail to get the algorithm right, your time will be wasted.
 * Translation -- Once your structure is outlined in pseudocode and your equations are at hand, the process of translation to actual code is straightforward and should be a familiar process. At this point it will be a good idea to consider your plans for parallelization. What information needs to be shared? What code will be specific to certain nodes or processes and what will be general? Although you will likely implement the algorithm first without parallel elements, planning ahead by segmenting and commenting the sections and values that will be parallel will help.
 * Sample Runs -- If your code is up and running, the next step is to do sample runs. Consult your professor to find reasonable test values but also be sure to try special cases to validate your code's correctness.
 * Parallelize -- Now that your code has hopefully been validated, you can parallelize it for the cluster, probably by utilizing MPI (message passing interface). Hopefully you outlined earlier what data needs to be sent between processes and what code is shared between different process types (if you have different roles). Debugging this process can be a little tricky, but it is useful to have all processes output diagnostic messages normally. These messages will be redirected back to the head node by MPI, but do be sure to label which node or process they're coming from. For more information, take a look at the material I've provided in the preamble, which discusses the process of parallelization from a general perspective.
 * Cluster testing -- Testing your code on the cluster is relatively straightforward. Jobs are submitted using the qsub command and you may monitor the status of your job using the qstat command. The man pages on these two commands provide detailed instructions for their use. Once your job has finished running, your output (both STDOUT and STDERR) will be returned in the form of several files which you can then examine and debug.

Conclusion: Future Directions
Professor Schade's research into the dynamics of nitrogen nutrient cycling represents an area of pressing concern in ecology. This research, which I have continued on the cluster, presents the opportunity for continued academic exploration in both research and classroom settings.

The 'Model Complications' section of this paper has already outlined two areas for potential future exploration: new parameter space sampling methods and investigation of early steady state termination. These are two concerns that will need to be addressed before this research can be published and applied to current landscape planning practices.

Research may also extend this nitrogen model in new directions. Spencer Debenport's research during the '07 summer session illustrates this potential. Spencer's work attempted to extend my initial inquiry by simulating linkages between sequential riparian plants. This project may be useful in discovering dynamic interactions between plants and understanding the wider ecosystem as a whole.

The riparian nitrogen cycling project also represents the spearhead of a new thrust, by the computer science department, to place high-performance computing within the classroom. The HiPerCiC project maintained by Todd Frederick and Jeremy Gustafson has created a convenient web interface that will eventually allow students to launch runs of this model on the cluster in a classroom setting. The HiPerCiC project has the potential to transform ecology labs from static investigations with pre-calculated data into a dynamic interaction between the student and the cluster.

Appendix -- Code
riparian.c library.h