I took up the task of parallelizing the code for the open source project IntaRNA. It is a new general and fast approach to the prediction of RNA–RNA interactions incorporating accessibility of target sites as well as the existence of a user-definable seed. It can successfully predict bacterial sRNA targets and determined the exact locations of the interactions with a higher accuracy than competing programs. Taking up this projects gives me an opportunity to work on an interdisciplinary topic and also to apply the skills learned in a class room to a real life application.
The first step was to install the software and understand how it works and what it does. I had a few troubles however with the help of the author and managing organization I was able to get it running and try some test cases. The executable accepts a target and a query string which are RNA sequence and give out details of possible interaction between the two sequence.
With a crude idea of the input and output I went through the compilation process. I understood the make file. The author martin-mann had marked parts of code which can be parallelized. I went ahead with profiling just to get more information of the working. I changed the -O3 (optimization flag) to -pg to enable the profiler. I had to switch O3 off as the optimizer and profiler doesn’t work well together.
Profiling was not an easy task. For small inputs the output of gprof showed lots of functions with 0% execution time for all. On trying bigger inputs I got some numbers, but it was not very helpful. Maximum amount of time taken were by vector access functions and even they were as small as 4-5 %.
In a process to find parallelizable regions I looked at the function fillHybrid(). This function computes hybridization energy and fills it in the matrix. According to the results of profiling this function consumes 1.52% of the time and is called 45 times which is the number of targets I provided.
According to the results of profiling this function looks like a possible candidate for parallelization. On taking a closer look we find that it contains nested loops but only the inner loop can be parallelized as variables are updated in the outer loop, which might cause race conditions.
I went ahead and parallelized the inner loop to compare the results. The parallelized code was a little faster compared to the serial one. However the speedup is almost negligible and as well be caused by noise. Therefore my conclusion will be that parallelizing this piece won’t give us much of an advantage.
Running the code n 45 targets I got the following result:
Time taken by serial code: 11m 25s
Time taken by making inner loop parallel: 11m 18s