The Project

I took up the task of parallelizing the code for the open source project IntaRNA. It is a new general and fast approach to the prediction of RNA–RNA interactions incorporating accessibility of target sites as well as the existence of a user-definable seed. It can successfully predict bacterial sRNA targets and determined the exact locations of the interactions with a higher accuracy than competing programs. Taking up this projects gives me an opportunity to work on an interdisciplinary topic and also to apply the skills learned in a class room to a real life application.

The first step was to install the software and understand how it works and what it does. I had a few troubles however with the help of the author and managing organization I was able to get it running and try some test cases. The executable accepts a target and a query string which are RNA sequence and give out details of possible interaction between the two sequence.


With a crude idea of the input and output I went through the compilation process. I understood the make file. The author martin-mann had marked parts of code which can be parallelized. I went ahead with profiling just to get more information of the working. I changed the -O3 (optimization flag) to -pg to enable the profiler. I had to switch O3 off as the optimizer and profiler doesn’t work well together.

Profiling was not an easy task. For small inputs the output of gprof showed lots of functions with 0% execution time for all. On trying bigger inputs I got some numbers, but it was not very helpful. Maximum amount of time taken were by vector access functions and even they were as small as 4-5 %.

In a process to find parallelizable regions I looked at the function fillHybrid(). This function computes hybridization energy and fills it in the matrix. According to the results of profiling this function consumes 1.52% of the time and is called 45 times which is the number of targets I provided.

According to the results of profiling this function looks like a possible candidate for parallelization. On taking a closer look we find that it contains nested loops but only the inner loop can be parallelized as variables are updated in the outer loop, which might cause race conditions.

I went ahead and parallelized the inner loop to compare the results. The parallelized code was a little faster compared to the serial one. However the speedup is almost negligible and as well be caused by noise. Therefore my conclusion will be that parallelizing this piece won’t give us much of an advantage.

Running the code n 45 targets I got the following result:

Time taken by serial code: 11m 25s

Time taken by making inner loop parallel: 11m 18s

Experiment: Initializing array in parallel

The first experiment I tried involves writing to elements of an array. For a 2d array this operation is of complexity n^2 and often needs to be parallelized. If we are writing to separate array locations it is also free from race conditions. The first experiment is to compare performance of filling up an array in serial and parallel.

The code used for the experiment is very simple. I take the size of the array as input from the user. Create an array of size nxn and then initialize array index i, j with the product ixj.

void compute(int **arr,int n){

    for(int i = 0;i<n;i++){

        for(int j = 0;j<n;j++){





The results obtained clearly show that the parallel operations are faster. However as the array size reaches almost 30,000 the performance for both the serial and parallel execution becomes the same. The reason for this is at array size 30,000 the memory occupied is 3.6GB and with 4GB ram on the system we start getting a lot of cache misses which dominate the performance.

Array size

Serial(in sec)

Parallel(in sec)