LaB #2: Writing My First Perl Script
Learning to program with Perl and writing my first Perl script
One thing I noticed while creating VLinux was the sheer amount of software available in Bioinformatics. This made me realize that developing software could be one of the areas I focus on.
That meant learning a programming language.
Why I Chose Perl
Although I had some familiarity with C and C++, I found the syntax of those languages hard to grasp, so I never ended up writing any programs of my own.
In contrast, Perl's syntax was relatively simple. Getting started was also easy because the Perl interpreter came preinstalled on Linux systems, since many system programs depended on Perl and its modules.
Unlike C and C++, Perl did not require a separate compilation step. I would write statements in a text file and then execute them by running perl script.pl. If there were errors, I would try to fix them and rerun the script until it worked as expected!
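As an illustration of that edit-and-run cycle, here is a minimal sketch of what such a script.pl could look like; the sequence is made up and the script is not one of the originals. The use strict and use warnings lines are a common Perl habit that makes the interpreter report many mistakes, such as misspelled variable names, instead of silently continuing.

```perl
#!/usr/bin/perl
# script.pl: run it with "perl script.pl"; no separate compile step is needed.
use strict;     # refuse undeclared variables and other risky constructs
use warnings;   # print warnings for suspicious code at run time

# Count the residues in a short, made-up peptide sequence.
my $sequence = "MKTAYIAKQR";
my $length   = length $sequence;
print "Sequence $sequence has $length residues\n";   # prints "Sequence MKTAYIAKQR has 10 residues"
```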
The Learning Process
One resource I found very useful was the book Beginning Perl for Bioinformatics by James Tisdall. The code samples in each chapter helped me understand concepts such as subroutines, file input and output, and regular expressions.
One exercise I remember was parsing sequence information from FASTA and PDB format files. This is also when I learnt about the FASTA format, the plain-text file format, and the importance of newline characters (\n, or \r\n on Windows).
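To show why line endings matter, the following is a small FASTA parser sketch. It is not the book's code; the subroutine name parse_fasta and the example sequences are hypothetical. The text is split on \n, and any \r left over from a Windows \r\n ending is removed, so the same code reads files saved on either system.

```perl
use strict;
use warnings;

# Parse FASTA-formatted text into a hash of ID => sequence.
# Works with Unix (\n) and Windows (\r\n) line endings: the text is split
# on \n and any leftover \r at the end of a line is removed.
sub parse_fasta {
    my ($text) = @_;
    my %sequences;
    my $id;
    for my $raw (split /\n/, $text) {
        my $line = $raw;
        $line =~ s/\r\z//;             # drop the carriage return from \r\n
        next if $line =~ /^\s*$/;      # skip blank lines
        if ($line =~ /^>(\S+)/) {      # header line: first word is the ID
            $id = $1;
            $sequences{$id} = '';
        }
        elsif (defined $id) {
            $sequences{$id} .= $line;  # sequence lines are concatenated
        }
    }
    return \%sequences;
}

# Made-up input saved with Windows line endings.
my $fasta = ">seq1 first example\r\nACDEFG\r\nHIKL\r\n>seq2\r\nMNPQ\r\n";
my $seqs  = parse_fasta($fasta);
print "$_: $seqs->{$_}\n" for sort keys %$seqs;
```

Without the s/\r\z// line, the first sequence read from this Windows-style input would come out as "ACDEFG\rHIKL\r", with invisible carriage returns embedded in it.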
My First Perl Script
Writing and testing code samples helped me learn the syntax, but so far I had not written a program that solved a real problem. Then an opportunity came along.
The practical Bioinformatics course I taught at the time included an exercise on homology modelling using a program called Modeller. One way to assess the quality of a protein 3D structure generated using the program was to calculate the structure's DOPE (Discrete Optimized Protein Energy) score and compare it to that of the template.
However, the files containing the energy profiles of the model and the template did not include the gaps present in the original sequence alignment, so the two profiles could not be compared position by position. I wrote a script that used the sequence alignment as a reference to insert the gaps into the profiles and then generated a plot using Gnuplot. I shared the script on a page in the Modeller project wiki.
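The core of the gap-insertion step can be sketched as follows. This is not the original script: it assumes the per-residue scores have already been read from the profile files into arrays (the Modeller profile format is not parsed here), and the subroutine name gap_profile and all values are hypothetical. Walking along the aligned sequence, each residue consumes the next score, while each '-' records a gap.

```perl
use strict;
use warnings;

# Re-insert alignment gaps into a per-residue energy profile.
#   $aligned - one sequence from the alignment, with '-' marking gaps
#   $scores  - array ref with one score per residue of the ungapped sequence
# Returns an array ref with one entry per alignment column (undef at gaps).
sub gap_profile {
    my ($aligned, $scores) = @_;
    my @gapped;
    my $next = 0;    # index of the next unused score
    for my $char (split //, $aligned) {
        push @gapped, $char eq '-' ? undef : $scores->[$next++];
    }
    return \@gapped;
}

# Made-up example: the template has two residues the model lacks.
my $model    = gap_profile("AC--DE", [-0.5, -0.3, -0.2, -0.4]);
my $template = gap_profile("ACGHDE", [-0.6, -0.2, -0.1, -0.3, -0.2, -0.5]);

# Print columns "position model template"; gaps become '?', which gnuplot
# can be told to skip with: set datafile missing "?"
for my $pos (0 .. $#$model) {
    my @values = map { defined $_ ? $_ : '?' } $model->[$pos], $template->[$pos];
    print join("\t", $pos + 1, @values), "\n";
}
```

Redirecting this output to a file gives a data file in which the model and template energies sit in columns 2 and 3, aligned column by column, ready for plotting.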

What This Experience Taught Me
- Learning a new programming language was interesting, but using it to solve a problem was a lot more exciting.
- The script worked, but it had some limitations: it only handled two sequences, it contained duplicated code, and it had no error checks. If I were writing it today, I would add some basic error checks and reduce code duplication as much as possible. I would also consider using existing modules, such as BioPerl, for reading and parsing sequence and alignment files.
- The experience also proved useful for system administration tasks, such as setting up local installations of Perl-based software like EMBOSS-Explorer and Pise for students to use.
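As an illustration of those improvements (not a reconstruction of the original script), the hypothetical sketch below accepts any number of aligned sequences in one hash, handles them all in a single loop instead of duplicated code, and stops with an error message when alignment lengths or score counts do not match. The input structure and the name build_profile_table are assumptions made for this example; it uses only core Perl rather than BioPerl.

```perl
use strict;
use warnings;

# Insert alignment gaps into the profiles of any number of sequences.
#   $entries: { name => { aligned => 'AC--DE', scores => [ ... ] }, ... }
# Returns { name => [ one value per alignment column, undef at gaps ] }.
# Dies with a message if the input is inconsistent.
sub build_profile_table {
    my ($entries) = @_;
    my @names = sort keys %$entries;
    die "No sequences given\n" unless @names;

    my $columns = length $entries->{ $names[0] }{aligned};
    my %table;
    for my $name (@names) {    # one loop for every sequence: no duplicated code
        my ($aligned, $scores) = @{ $entries->{$name} }{qw(aligned scores)};
        die "$name: alignment length differs\n" unless length($aligned) == $columns;

        my $residues = ($aligned =~ tr/-//c);    # count non-gap characters
        die "$name: $residues residues but " . @$scores . " scores\n"
            unless $residues == @$scores;

        my $next = 0;          # index of the next unused score
        $table{$name} = [ map { $_ eq '-' ? undef : $scores->[$next++] } split //, $aligned ];
    }
    return \%table;
}

# Made-up example with two sequences; more entries could be added to the hash.
my $table = build_profile_table({
    model    => { aligned => "AC--DE", scores => [-0.5, -0.3, -0.2, -0.4] },
    template => { aligned => "ACGHDE", scores => [-0.6, -0.2, -0.1, -0.3, -0.2, -0.5] },
});

# Print a tab-separated table; '?' marks a gap.
my @names = sort keys %$table;
print join("\t", 'pos', @names), "\n";
for my $pos (0 .. $#{ $table->{ $names[0] } }) {
    my @values = map { defined $table->{$_}[$pos] ? $table->{$_}[$pos] : '?' } @names;
    print join("\t", $pos + 1, @values), "\n";
}

# An inconsistent entry is reported instead of producing a shifted plot.
eval { build_profile_table({ broken => { aligned => "AC-D", scores => [1, 2] } }) };
print "Error: $@" if $@;
```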
Homology modelling with software like Modeller was a CPU-intensive process.
In the next issue, I will share how I set up a small Linux cluster using OpenMosix (now defunct) to pool CPU capacity from available machines.