OpenMP Sieve of Eratosthenes

The sieve of Eratosthenes is an algorithm that finds prime numbers by iteratively eliminating composites. A summary of the algorithm (citation needed) is:
 * Create an unmarked list of numbers 2,3,4,...,N where N is the upper bound of the range in which to calculate primes.
 * Create a variable k = 2 (the first unmarked number in the list)
 * Do
   * Mark all multiples m of k such that k^2 <= m <= N
   * Find the smallest unmarked number greater than k. Set k to this value.
 * While k^2 <= N
 * The remaining unmarked numbers are prime numbers.
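
The steps above can be sketched in serial C++ roughly as follows. This is a minimal illustration, not the lab's sieve.cpp; the function name sieve_primes and the use of std::vector<bool> for the marks are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns all primes in [2, n], following the listed steps.
std::vector<std::size_t> sieve_primes(std::size_t n) {
    std::vector<std::size_t> primes;
    if (n < 2) return primes;

    // marked[i] == true means i has been eliminated as a composite.
    std::vector<bool> marked(n + 1, false);

    std::size_t k = 2;  // first unmarked number in the list
    do {
        // Mark all multiples m of k with k^2 <= m <= n.
        for (std::size_t m = k * k; m <= n; m += k) marked[m] = true;

        // Advance k to the smallest unmarked number greater than k.
        do { ++k; } while (k <= n && marked[k]);
    } while (k * k <= n);

    // The remaining unmarked numbers are prime.
    for (std::size_t i = 2; i <= n; ++i)
        if (!marked[i]) primes.push_back(i);
    return primes;
}
```

Calling sieve_primes(30) would return the ten primes from 2 through 29.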

Lab Description
The sieve algorithm has been turned into a lab to demonstrate parallel programming using OpenMP (C++). The lab consists of an initial implementation of the sieve, followed by two iterations that add optimizations to the code. The first implementation is described in (citation needed), where it is also implemented using MPI.


 * sieve.cpp (Iteration 1)
 * sieve.cpp (Iteration 2)
 * sieve.cpp (Iteration 3)

Optimization #1
The first optimization takes advantage of the fact that 2 is the only even prime. Eliminating all even numbers from the list and from the computation roughly halves the storage and reduces the marking work.
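
One way to drop the even numbers is to store a mark only for each odd number, as in the serial sketch below. The slot mapping (slot i stands for 2*i + 3) and the name odd_sieve_primes are assumptions chosen for illustration; the lab's code may index its list differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: primes in [2, n], storing marks for odd numbers only.
std::vector<std::size_t> odd_sieve_primes(std::size_t n) {
    std::vector<std::size_t> primes;
    if (n < 2) return primes;
    primes.push_back(2);  // the single even prime is handled up front
    if (n < 3) return primes;

    // Slot i stands for the odd number 2*i + 3, so about n/2 marks are stored.
    const std::size_t slots = (n - 3) / 2 + 1;
    std::vector<bool> marked(slots, false);

    for (std::size_t i = 0; i < slots; ++i) {
        const std::size_t k = 2 * i + 3;
        if (k * k > n) break;     // same stopping rule: k^2 <= n
        if (marked[i]) continue;  // k is composite, skip it
        // Step by 2k so that only odd multiples of k are visited.
        for (std::size_t m = k * k; m <= n; m += 2 * k)
            marked[(m - 3) / 2] = true;
    }

    for (std::size_t i = 0; i < slots; ++i)
        if (!marked[i]) primes.push_back(2 * i + 3);
    return primes;
}
```

Stepping by 2k works because k is odd, so k^2 is odd and every other multiple of k after it is even and therefore not in the list.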

Optimization #2
The second optimization removes the two OpenMP barrier directives from the code. These are synchronization directives that force all threads to reach a common point before any thread can continue. Since they were used to synchronize the value of k, the program is changed so that every thread keeps track of k on its own. This is achieved by pre-calculating every prime from 3 to sqrt(N) (remember that even numbers are disregarded). Each thread is then given a copy of this list of primes, and k is advanced by iterating through the list. This optimization allows each thread to move at its own pace, further reducing execution time.
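
A barrier-free version along these lines might look like the sketch below. It is an illustration under several assumptions, not the lab's Iteration 3: the name count_primes_parallel, the contiguous per-thread blocks, and the use of std::vector<char> for the marks are all choices made for this example. std::vector<char> is used instead of std::vector<bool> because the latter packs bits, so neighbouring threads could write to the same byte.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: counts the primes in [2, n] without any barriers.
std::size_t count_primes_parallel(std::size_t n) {
    if (n < 2) return 0;
    if (n < 3) return 1;

    const std::size_t slots = (n - 3) / 2 + 1;  // slot i <-> odd number 2*i + 3
    std::vector<char> marked(slots, 0);

    // Serial pre-pass: every odd prime p with p*p <= n.
    std::size_t root = 1;
    while ((root + 1) * (root + 1) <= n) ++root;  // floor(sqrt(n))
    std::vector<char> small(root + 1, 0);
    std::vector<std::size_t> seeds;
    for (std::size_t p = 3; p <= root; p += 2) {
        if (small[p]) continue;
        seeds.push_back(p);
        for (std::size_t m = p * p; m <= root; m += 2 * p) small[m] = 1;
    }

    #pragma omp parallel
    {
        std::size_t tid = 0, nthreads = 1;
#ifdef _OPENMP
        tid = static_cast<std::size_t>(omp_get_thread_num());
        nthreads = static_cast<std::size_t>(omp_get_num_threads());
#endif
        // Each thread owns one contiguous block of slots: [lo, hi).
        const std::size_t lo = slots * tid / nthreads;
        const std::size_t hi = slots * (tid + 1) / nthreads;
        const std::vector<std::size_t> my_seeds = seeds;  // private copy

        if (lo < hi) {
            const std::size_t first = 2 * lo + 3;        // smallest number owned
            const std::size_t last = 2 * (hi - 1) + 3;   // largest number owned
            for (std::size_t p : my_seeds) {  // k advances through the list
                if (p * p > last) break;
                std::size_t m = p * p;
                if (m < first) {
                    m = ((first + p - 1) / p) * p;  // first multiple >= first
                    if (m % 2 == 0) m += p;         // make it an odd multiple
                }
                for (; m <= last; m += 2 * p) marked[(m - 3) / 2] = 1;
            }
        }
    }

    // Unmarked odd numbers are prime; add 1 for the prime 2.
    return 1 + static_cast<std::size_t>(
                   std::count(marked.begin(), marked.end(), char{0}));
}
```

Because each thread writes only to its own block and reads only its own copy of the seed primes, no thread has to wait for another to publish the next k. Built without OpenMP support, the pragma is ignored and the same code runs on a single thread.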

Makefile
The following Makefile was used to build the sieve code. A typical directory contains sieve.cpp, Makefile, build/, and bin/. Note: TAB characters were converted to spaces by the wiki editor, so be sure to change the indenting to use tabs.

Benchmarking and Issues
Benchmarking results on a 32-core machine provide evidence that both the algorithmic optimizations and the parallelism improve the performance of the sieve. A memory allocation problem (at thread 21) was encountered when running the sieve with N=100,000,000 and P=32; it is still being investigated. (Benchmark information pending)