Updating BLAST databases can be done using the update_blastdb command, but the script described here only updates databases if they are not being used.
The following command will download and/or update the swissprot protein database in the current directory:
update_blastdb --decompress --passive swissprot
Output:
Connected to NCBI
Downloading swissprot.tar.gz... [OK]
Here is the list of downloaded files:
ls
swissprot.tar.gz swissprot.tar.gz.md5
One issue with this approach is that any long-running BLAST jobs currently accessing the database will be aborted. To overcome this problem, I wrote a wrapper around the update_blastdb command: blastdb_updater.py.
It uses a symbolic link to the latest version of the database and only updates the link if the database is not being used. If the database is being used, the script adds a message to the log after the database download is complete. The link can then be updated manually later.
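The link switch described above could look roughly like the following. This is a minimal sketch and not the code from blastdb_updater.py; the function name switch_link and the ".tmp" suffix are assumptions made for illustration.

```python
import os


def switch_link(link_path, new_target):
    """Point link_path at new_target, replacing an existing link in one step."""
    tmp_link = link_path + ".tmp"  # hypothetical temporary name
    # Clear a stale temporary link left behind by an interrupted run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_target, tmp_link)
    # Renaming over the old link means a reader never sees a missing link.
    os.replace(tmp_link, link_path)
```

Creating the new link under a temporary name and renaming it over the old one avoids a window in which the link does not exist, which matters if a BLAST job starts during the switch.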
Note
This script will only work on Linux/Unix-like systems because it depends on the lsof command to check if a directory is being accessed.
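As an illustration of this kind of check, here is a minimal sketch of calling lsof from Python. It is not taken from blastdb_updater.py; the function name directory_in_use and the lsof_cmd parameter are assumptions, and treating any output as "in use" is a simplification.

```python
import subprocess


def directory_in_use(path, lsof_cmd=("lsof", "+D")):
    """Return True if the command reports any open file under path."""
    # "lsof +D <dir>" searches the directory tree for open files and prints
    # the matches; when nothing is open it prints nothing on stdout.
    result = subprocess.run(
        [*lsof_cmd, path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    return bool(result.stdout.strip())
```

The lsof_cmd parameter exists only so the command can be swapped out, for example in tests on a machine without lsof. Note that a recursive search with +D can be slow on large directory trees.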
Download Python script
From the vimalkvn/sysadminbio repository on GitLab:
Link: blastdb_updater.py
Save this script as /home/user/programs/blastdb_updater.py. This path is only used for the purpose of the examples below; the script can be saved somewhere else.
Usage
Assuming you would like to download the swissprot database to /home/user/blast, use:
python /home/user/programs/blastdb_updater.py \
-d swissprot -p /home/user/blast
A log file will be available at /home/user/blast/log/blastdb_updater.log.
To use the database in your BLAST search, you can use:
blastp -db /home/user/blast/swissprot/swissprot \
-query sample.fasta
Other databases (supported by update_blastdb) can be downloaded in the same manner.
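If several databases are needed, the same call can be repeated for each one. The sketch below assumes the script location and the -d and -p options used in the examples above; the helper names build_command and update_all are made up for illustration, and the database names passed in must be ones that update_blastdb supports.

```python
import subprocess

# Assumed location, matching the examples in this post.
SCRIPT = "/home/user/programs/blastdb_updater.py"


def build_command(database, blast_dir, script=SCRIPT):
    """Build the blastdb_updater.py command line for one database."""
    return ["python", script, "-d", database, "-p", blast_dir]


def update_all(databases, blast_dir):
    """Run the updater once per database, stopping at the first failure."""
    for database in databases:
        subprocess.run(build_command(database, blast_dir), check=True)
```

For example, update_all(["swissprot", "nr"], "/home/user/blast") would run the updater twice, once for each database.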
Automated update
An automated update can be set up using cron:
MAILTO=email@domain
0 0 1 * * /home/user/programs/blastdb_updater.py -d swissprot -p /home/user/blast
The above cron job will update the database at midnight on the 1st of every month.