A Python script to update NCBI BLAST databases

BLAST databases can be updated with the update_blastdb command. The script described here wraps that command and only updates a database if it is not being used.

The following command will download and/or update the swissprot protein database in the current directory:

update_blastdb --decompress --passive swissprot

Output:

Connected to NCBI
Downloading swissprot.tar.gz... [OK]

Here is the list of downloaded files:

ls
swissprot.tar.gz  swissprot.tar.gz.md5
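The .md5 file listed next to the archive can be used to check the download by hand. The following is a minimal sketch, not part of blastdb_updater.py; it assumes the .md5 file begins with the hex digest (md5sum style), and the function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large archives do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_matches_checksum(archive, checksum_file):
    """Compare an archive against the digest stored in its .md5 file."""
    # Assumption: the first whitespace-separated field is the hex digest.
    expected = Path(checksum_file).read_text().split()[0].lower()
    return md5_of_file(archive) == expected
```

For the files above, the call would be archive_matches_checksum("swissprot.tar.gz", "swissprot.tar.gz.md5").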

One issue with this approach is that any long-running BLAST jobs currently accessing the database will be aborted. To overcome this problem, I wrote a wrapper around the update_blastdb command: blastdb_updater.py.

It uses a symbolic link to the latest version of the database and only updates the link if the database is not being used. If the database is being used, the script adds a message to the log after the database download is complete. The link can then be updated manually later.
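The link update described above can be sketched roughly as follows. This is an illustration of the general technique, not the code in blastdb_updater.py; the function name and the ".tmp" suffix are assumptions.

```python
import os


def point_link_at(target_dir, link_path):
    """Make link_path a symbolic link to target_dir, replacing any old link."""
    temp_link = link_path + ".tmp"  # hypothetical temporary name
    # Remove a stale temporary link left behind by an interrupted run.
    if os.path.lexists(temp_link):
        os.remove(temp_link)
    os.symlink(target_dir, temp_link)
    # Renaming over the old link swaps it in a single step on POSIX systems,
    # so readers never see a moment where the link is missing.
    os.replace(temp_link, link_path)
```

Creating the new link under a temporary name and renaming it avoids a window in which the link does not exist. The call fails if link_path is a real directory rather than a link.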

Note
This script will only work on Linux/Unix-like systems due to its dependence on the lsof command to check if a directory is being accessed.
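As a rough illustration of the lsof check mentioned in the note, the sketch below asks lsof whether anything is open under a directory. It is an assumption about how such a check could be written, not the script's actual code; the function name is hypothetical.

```python
import shutil
import subprocess


def directory_in_use(path):
    """Return True if lsof lists any open files under path."""
    if shutil.which("lsof") is None:
        raise RuntimeError("lsof not found on PATH")
    # 'lsof +D <dir>' searches the directory tree for open files.
    result = subprocess.run(
        ["lsof", "+D", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    # Any output means at least one process has something open there.
    return bool(result.stdout.strip())
```

The check looks at the output rather than the exit status, since lsof also exits with a non-zero status when it simply finds nothing.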

Download Python script

From vimalkvn/sysadminbio repository on GitLab:
Link: blastdb_updater.py

Save the script as /home/user/programs/blastdb_updater.py. This location is only used for the examples below; the script can be saved anywhere.

Usage

Assuming you would like to download the swissprot database to /home/user/blast, use:

python /home/user/programs/blastdb_updater.py \
  -d swissprot -p /home/user/blast

A log file will be available under
/home/user/blast/log/blastdb_updater.log.

To use the database in your BLAST search, you can use:

blastp -db /home/user/blast/swissprot/swissprot \
  -query sample.fasta

Other databases (supported by update_blastdb) can be downloaded in the same manner.
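To fetch several databases in one go, the same command line can be repeated from a small Python helper. This is only a sketch: it reuses the -d and -p options and the example path shown above, and the helper names are made up.

```python
import subprocess
import sys

# Example location used in this post; adjust to where the script is saved.
UPDATER = "/home/user/programs/blastdb_updater.py"


def updater_command(database, blast_dir, script=UPDATER):
    """Build the argument list for one blastdb_updater.py run."""
    return [sys.executable, script, "-d", database, "-p", blast_dir]


def update_all(databases, blast_dir):
    """Run the updater once per database, stopping at the first failure."""
    for name in databases:
        subprocess.run(updater_command(name, blast_dir), check=True)


# Example call (not run here): update_all(["swissprot"], "/home/user/blast")
```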

Automated update

An automated update can be set up using cron:

MAILTO=email@domain
0 0 1 * * python /home/user/programs/blastdb_updater.py -d swissprot -p /home/user/blast

The above cron job will update the database at midnight on the 1st of every month.
