Skip to content

NDI_CREATEGENBANKCONTROLLEDVOCABULARY - create the controlled vocabulary dictionary for animals, ...)

This function examines the name file 'names.dmp' and node file 'nodes.dmp' from the GenBank taxonomy database in the directory DIRNAME. It generates a new text file called 'GenBankControlledVocabulary.tsv' with the following structure:

Header row: 'Scientific_NameGenBank_Common_NameSynonyms<Other_Common_Name' and then 1 entry per organism.

This function also takes name/value pairs that modify the behavior. Parameter (default) | Description

root_node ('Bilateria') | Root scientific name to start with; usually 'Bilateria' to | include most research organisms but not cell lines, | bacteria, viruses, etc (everything not 'Bilateria'). | Use 'Root' for everything. nodefile ('nodes.dmp') | File name of the node file within DIRNAME namefile ('names.dmp') | File name of the name file with DIRNAME outname (... | Output filen name of the file written to disk ['GenBankControlled'... | 'Vocabulary.tsv'])

The taxonomy data is available at

This function usually takes a couple of hours to run and shows 3 progress bars (the first one is faster than the second).