Introduction

Blam is software for comparing the genomes of two organisms.

Given two NCBI-format .seq files, Blam extracts the annotated genes and performs a global pairwise sequence alignment (GPSA) for each pair of genes, one from each genome. The output of the program is the set of gene pairs whose normalized score exceeds a given threshold. Because GPSA allows for gaps and weighted scoring matrices, it is more accurate than some other commonly used techniques, such as sequence identity, but it is also much slower.
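To make the idea of a global pairwise alignment score concrete, here is a minimal sketch in Python using the textbook Needleman-Wunsch recurrence. It is not Blam's actual routine: the toy match/mismatch scoring function stands in for a real substitution matrix, and the linear gap penalty, the function names, and the normalization (dividing by the length of the longer sequence) are all assumptions made for illustration.

```python
def toy_score(x, y):
    # Hypothetical stand-in for a substitution matrix lookup.
    return 2 if x == y else -1


def global_alignment_score(a, b, score=toy_score, gap=-8):
    """Score of the best global alignment of a and b (Needleman-Wunsch).

    Uses a linear gap penalty (an assumption) and keeps only two rows of
    the dynamic programming table, so memory is O(len(b)).
    """
    # prev[j] is the best score for aligning the empty prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # a[:i] aligned against the empty prefix of b
        for j in range(1, len(b) + 1):
            cur.append(max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # align two residues
                prev[j] + gap,                            # a[i-1] against a gap
                cur[j - 1] + gap,                         # b[j-1] against a gap
            ))
        prev = cur
    return prev[-1]


def normalized_score(a, b, score=toy_score, gap=-8):
    # Assumed normalization: raw score divided by the longer length.
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return global_alignment_score(a, b, score, gap) / longest
```

Under these assumptions, a pair of genes would be reported when normalized_score(a, b) exceeds the chosen threshold. The two nested loops are what make each alignment quadratic in the sequence length.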

You may wish to view a short presentation about Blam and its use, if you have Flash installed.

Technical Information

The sequence alignment is performed using the PAM250 matrix, but the program can easily be modified to use other matrices.

The alignment routine itself is a short piece of carefully tuned C and Pentium-III machine code. The rest of the program is written in Standard ML (built with MLton). Comparing two bacterial genomes of moderate size naively requires 25 million sequence alignments (for example, 5,000 × 5,000 gene pairs), each a Θ(n²) operation. Therefore, the program is optimized to skip comparisons whose results cannot possibly meet the threshold. On a modern (as of 2004) computer, a genome comparison of two 5,000-gene bacteria takes about nine hours.
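The source does not say how Blam decides that a comparison cannot meet the threshold. One possible test, sketched below in Python, is a length-based upper bound: with a linear gap penalty, two sequences of different lengths must contain at least as many gaps as their length difference, and at most as many aligned residue pairs as the shorter length. The function names, the default scoring parameters, and the normalization by the longer length are hypothetical and chosen only for illustration.

```python
def score_upper_bound(len_a, len_b, max_pair_score=2, gap=-8):
    """Upper bound on the global alignment score of two sequences.

    Assumes max_pair_score >= 0 is the largest entry of the scoring
    matrix and gap <= 0 is a linear per-position gap penalty.
    """
    shorter = min(len_a, len_b)
    forced_gaps = abs(len_a - len_b)  # gaps no alignment can avoid
    return shorter * max_pair_score + forced_gaps * gap


def can_meet_threshold(len_a, len_b, threshold, max_pair_score=2, gap=-8):
    # Assumed normalization: raw score divided by the longer length.
    longest = max(len_a, len_b)
    if longest == 0:
        return False
    bound = score_upper_bound(len_a, len_b, max_pair_score, gap)
    return bound / longest > threshold


def candidate_pairs(genes_a, genes_b, threshold, max_pair_score=2, gap=-8):
    """Keep only the gene pairs whose best possible score could pass."""
    return [
        (a, b)
        for a in genes_a
        for b in genes_b
        if can_meet_threshold(len(a), len(b), threshold, max_pair_score, gap)
    ]
```

Only the pairs returned by candidate_pairs would then need the full quadratic alignment. The bound is cheap because it looks at lengths alone, so it removes pairs of very different sizes before any dynamic programming is done.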

Blam is free, open-source software. You can get the source code and compile it yourself, or download an executable for the Windows command line (711 KB).

Status

Blam was written by Tom Murphy and Heather Hendrickson as a class project for CMU CS 15-856 Computational Molecular Biology and Genomics. It is unfortunately underdocumented and cannot be customized without rebuilding. However, it has been carefully tested and is used in real research. We encourage you to use it, but advise you to contact us first.
