I attempted to parallelize a part of the project IntaRNA. It is a new general and fast approach to the prediction of RNA–RNA interactions incorporating accessibility of target sites as well as the existence of a user-definable seed. Taking up this project in an interdisciplinary topic posed a lot of challenges. I had to make the code faster with no knowledge of the underlying algorithm and no understanding of the input and the output.
The author Martin-mann had already made the serial code optimized to the maximum possible and marked the regions that needed to be parallelized. This is how I knew where to begin from, however I wanted to see the figures for myself and so I decided to do an analysis of the runtime of the project. I used gprof to do code profiling. I had some troubles interpreting the results however I figured out that the fillHybridE was approximately taking 162.27s out of the total runtime of 249s which is around 65% of the total time and was getting called once for each target.
% cumulative self self total
time seconds seconds calls s/call s/call name
0.06 162.27 0.09 20 0.00 0.02 PredictorMfe2dHeuristic::fillHybridE()
The code structure for fillHybridE consists of four nested for loops. Martin had suggested me to parallelize the inner two loops, however I wanted to have as much as parallel region as possible and I started with the outermost for loop. I have described in detail the approach taken by me and the mistakes I made in my previous blogs The Project and Fixing Race Conditions.
I used an Intel core i5 processor with four cores to perform the analysis. Sample input data was provided by Martin https://github.com/BackofenLab/IntaRNA/issues/18#issuecomment-274523517. I used variations of this data to vary the word load. The given graphs show the variations in time with respect to the work load.
As the inner for loop deals with the query size, we can see the parallel code keeps getting faster as the query length increases.
The speedup obtained by increasing the number of targets is not very prominent as it adds the overhead of creation and removal of threads on each iteration of outer for loop.