Skip to content

ndi.database.fun.createGenBankControlledVocabulary

NDI_CREATEGENBANKCONTROLLEDVOCABULARY - create the controlled vocabulary dictionary for animals

ndi.database.fun.createGenBankControlledVocabulary(DIRNAME, ...)

This function examines the name file 'names.dmp' and node file 'nodes.dmp' from the GenBank taxonomy database in the directory DIRNAME. It generates a new text file called 'GenBankControlledVocabulary.tsv' with the following structure:

Header row: 'Scientific_NameGenBank_Common_NameSynonyms<Other_Common_Name' and then 1 entry per organism.

This function also takes name/value pairs that modify the behavior. Parameter (default) | Description


root_node ('Bilateria') | Root scientific name to start with; usually 'Bilateria' to | include most research organisms but not cell lines, | bacteria, viruses, etc (everything not 'Bilateria'). | Use 'Root' for everything. nodefile ('nodes.dmp') | File name of the node file within DIRNAME namefile ('names.dmp') | File name of the name file with DIRNAME outname (... | Output filen name of the file written to disk ['GenBankControlled'... | 'Vocabulary.tsv'])

The taxonomy data is available at https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz.

This function usually takes a couple of hours to run and shows 3 progress bars (the first one is faster than the second).