Tango
Download source code by clicking Tango title above or here.
Please check that the dependencies listed below are installed locally before running.
Usage
Collect, store, and retrieve GenBank records from NCBI using just the GI number. The program uses NCBI's E-Utilities interface to fetch records and MongoDB as a local database for storage, so it curates a local database that contains only the records you need, reduced to their most significant fields. This makes it easy to maintain a very specific dataset that can be accessed in downstream analysis, without looking up the same NCBI files again.
When provided with GI ID(s), the program will connect and download the corresponding file(s) from NCBI, extract the most important data, and store the following in a MongoDB database:
GI, accession, sequence, version, locus, organism, sequence length, gene, protein ID, translation
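As a rough illustration of the fetch step, the request to NCBI can be sketched as an EFetch URL. This is a Python sketch, not the tool's own Perl code: the function name `build_efetch_url` and its defaults are hypothetical, and the mapping of `-db` and `-type` onto EFetch's `db` and `rettype` parameters is an assumption. Which NCBI database the example IDs belong to is not stated in this document, so `nucleotide` is only a placeholder.

```python
from urllib.parse import urlencode

# Base address of NCBI's EFetch utility.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(ids, db="nucleotide", rettype="gb"):
    """Return an EFetch URL requesting the given records as plain text.

    Hypothetical helper: db and rettype are assumed to correspond to
    the -db and -type options described in this README.
    """
    params = {
        "db": db,
        "id": ",".join(str(i) for i in ids),  # EFetch accepts a comma-separated ID list
        "rettype": rettype,
        "retmode": "text",
    }
    # safe="," keeps the commas in the ID list readable instead of %2C.
    return f"{EFETCH_URL}?{urlencode(params, safe=',')}"


url = build_efetch_url([74960989, 4165050])
print(url)
```

The returned text would then be parsed (the tool lists BioPerl as a dependency for this) and reduced to the fields listed above before being stored.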
With the appropriate flags, documents can be created, read, updated, and removed in the MongoDB database. There are also options to name the database and the collection. For more information on how MongoDB stores its data, visit MongoDB's documentation.
Options
-id ID(s)
-file File with ID(s) [csv or txt]
-db NCBI database (nucleotide, protein, etc.)
-type Record format (gb, fasta, etc.)
-force Force download?
-mongo MongoDB database name
-collection Collection name in MongoDB database
-insert Insert into database [optional/default]
-update Update database
-read Read from database
-remove Remove from database
-help Shows help message
For example, you may choose to create different databases by supplying the -mongo flag followed by the desired database name: -mongo Axolotl.
Or choose a different collection by passing the -collection flag followed by the desired collection name: -collection Protein.
Both flags are optional, as defaults have already been assigned to them.
Database Operations
Insert
To insert new data (documents) into the database, provide the GI number(s), with or without the optional -insert flag.
The following commands all perform an insert, since -insert is the default:
./tango.pl -file Examples/gis.csv
./tango.pl -file Examples/gis.txt -insert
./tango.pl -id 74960989 4165050 -insert
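The layout of the files passed to -file is not described here, so the following Python sketch makes an assumption: GI numbers are separated by commas, spaces, or newlines, and anything that is not purely numeric (such as a header) is ignored. The names `parse_gi_ids` and `read_gi_ids` are hypothetical and are not part of tango.pl.

```python
import re
from pathlib import Path


def parse_gi_ids(text):
    """Return the GI numbers found in text, in order, without duplicates.

    Assumes IDs are separated by commas and/or whitespace, which covers
    both a one-per-line txt file and a simple csv file.
    """
    ids = []
    for token in re.split(r"[,\s]+", text.strip()):
        # Keep purely numeric tokens only; skip headers and empty strings.
        if token.isdigit() and token not in ids:
            ids.append(token)
    return ids


def read_gi_ids(path):
    """Read a csv or txt file and return its GI numbers as strings."""
    return parse_gi_ids(Path(path).read_text())


print(parse_gi_ids("74960989,4165050\n34577062\n"))
```

Dropping duplicates up front is a design choice in this sketch: it avoids fetching the same record twice within a single run.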
Update
To update data (documents) stored in the database, provide the -update flag followed by a query, in the format field:value, that identifies the document you want to update. You will then be asked which field of that document to change.
The following looks for the document whose _id field matches 34577062:
./tango.pl -update _id:34577062
It will then tell you which document you are about to update and ask which field you wish to change:
UPDATING _id record [34577062] in database...
Available fields are: _id accession sequence version locus organism seqLength gene proteinID translation
What field do you want? sequence
What is the NEW value for sequence field? NEWSEQUENCE
Document 34577062 updated, sequence field changed to NEWSEQUENCE.
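The session above maps naturally onto a MongoDB filter document plus a `$set` update. The Python sketch below only builds those two documents. The helper names are hypothetical, and keeping the value as a string is an assumption, since this README does not say how tango.pl stores _id.

```python
def parse_query(arg):
    """Split a 'field:value' argument into a (field, value) pair."""
    field, sep, value = arg.partition(":")
    if not sep or not field or not value:
        raise ValueError(f"expected field:value, got {arg!r}")
    return field, value


def build_update(query_arg, field, new_value):
    """Return the (filter, update) documents for a single-field change."""
    key, value = parse_query(query_arg)
    # $set replaces only the named field and leaves the rest of the document intact.
    return {key: value}, {"$set": {field: new_value}}


flt, upd = build_update("_id:34577062", "sequence", "NEWSEQUENCE")
print(flt)  # {'_id': '34577062'}
print(upd)  # {'$set': {'sequence': 'NEWSEQUENCE'}}
```

With a MongoDB driver, the two documents would be passed together to a single-document update call, for example `update_one(filter, update)` in PyMongo.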
Read
To read data (documents) stored in the database, provide the -read flag followed by your query in the format field:value. You will be asked which field of the document to report back.
The following reads the documents whose _id fields match 34577062 and 74960989:
./tango.pl -read _id:34577062 _id:74960989
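When several field:value queries are given, one plausible reading is that a document is selected if it matches any of them, since a single document cannot have two different _id values. The Python sketch below builds such a filter with MongoDB's `$or` operator. The name `build_filter` is hypothetical, and the match-any behavior is an assumption about tango.pl, not something this README states.

```python
def build_filter(query_args):
    """Combine 'field:value' arguments into one MongoDB filter document."""
    clauses = []
    for arg in query_args:
        field, sep, value = arg.partition(":")
        if not sep or not field or not value:
            raise ValueError(f"expected field:value, got {arg!r}")
        clauses.append({field: value})
    if not clauses:
        raise ValueError("at least one field:value query is required")
    # A single query needs no $or wrapper; several are matched as alternatives.
    return clauses[0] if len(clauses) == 1 else {"$or": clauses}


print(build_filter(["_id:34577062", "_id:74960989"]))
```

A filter of the same shape could serve a remove operation as well, because reading and removing differ only in what is done with the matched documents.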
Remove
To remove data (documents) stored in the database, provide the -remove flag followed by a query, in the format field:value, matching the documents you want removed.
The following removes the documents whose _id fields match 34577062 and 74960989:
./tango.pl -remove _id:34577062 _id:74960989
Dependencies
You need to have the following installed:
- BioPerl Modules (CPAN)
- MongoDB Perl Driver (http://search.cpan.org/dist/MongoDB/)