
  NDI_CREATEGENBANKCONTROLLEDVOCABULARY - create the controlled vocabulary dictionary for animals

  This function examines the name file 'names.dmp' and node file 'nodes.dmp' from 
  the GenBank taxonomy database in the directory DIRNAME. It generates a new text file
  called 'GenBankControlledVocabulary.tsv' with the following structure:

  Header row:
    and then 1 entry per organism.
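
  To make the input side concrete, here is a minimal sketch of how the two dump files can be read and restricted to one root name. It is an illustration under stated assumptions, not this function's actual implementation: the sample rows are a hypothetical, flattened toy tree (real files have more columns and many intermediate ranks), and all variable names are invented for the example. It relies only on the NCBI taxdump layout, in which fields are separated by tab, '|', tab and each row ends with tab, '|'.

```matlab
% Toy rows in taxdump layout (hypothetical, abbreviated data).
tab = sprintf('\t');
delim = [tab '|' tab];
nodeLines = { ...                                   % tax_id | parent tax_id | rank
    ['1'     delim '1'     delim 'no rank' tab '|'], ...
    ['33213' delim '1'     delim 'clade'   tab '|'], ...
    ['10090' delim '33213' delim 'species' tab '|'], ...
    ['562'   delim '1'     delim 'species' tab '|']};
nameLines = { ...                                   % tax_id | name | unique name | name class
    ['1'     delim 'root'             delim delim 'scientific name'     tab '|'], ...
    ['33213' delim 'Bilateria'        delim delim 'scientific name'     tab '|'], ...
    ['10090' delim 'Mus musculus'     delim delim 'scientific name'     tab '|'], ...
    ['10090' delim 'house mouse'      delim delim 'genbank common name' tab '|'], ...
    ['562'   delim 'Escherichia coli' delim delim 'scientific name'     tab '|']};

% Drop the trailing tab-pipe, then split on tab-pipe-tab (empty fields are kept).
splitRow = @(line) regexp(regexprep(line, '\t\|$', ''), '\t\|\t', 'split');

% Map each parent id to the ids of its children.
children = containers.Map('KeyType', 'double', 'ValueType', 'any');
for i = 1:numel(nodeLines)
    f = splitRow(nodeLines{i});
    id = str2double(f{1});
    p = str2double(f{2});
    if id == p, continue; end          % the top node lists itself as its parent
    if isKey(children, p)
        children(p) = [children(p) id];
    else
        children(p) = id;
    end
end

% Keep only the scientific name for each id.
sciName = containers.Map('KeyType', 'double', 'ValueType', 'char');
for i = 1:numel(nameLines)
    f = splitRow(nameLines{i});
    if strcmp(f{4}, 'scientific name')
        sciName(str2double(f{1})) = f{2};
    end
end

% Find the root by scientific name and walk its subtree breadth-first.
rootName = 'Bilateria';
ids = cell2mat(keys(sciName));
rootId = ids(strcmp(values(sciName), rootName));
queue = rootId;
found = [];
while ~isempty(queue)
    id = queue(1);
    queue(1) = [];
    found(end+1) = id; %#ok<AGROW>
    if isKey(children, id)
        queue = [queue children(id)]; %#ok<AGROW>
    end
end
for id = found
    fprintf('%d\t%s\n', id, sciName(id));
end
```

  With this toy data only the 'Bilateria' node and its single descendant are listed; the bacterial entry sits outside the subtree and is skipped.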

  This function also takes name/value pairs that modify the behavior.
  Parameter (default)     | Description
  root_node ('Bilateria') | Root scientific name to start with; usually 'Bilateria' to
                          |  include most research organisms but not cell lines, 
                          |  bacteria, viruses, etc. (everything not under 'Bilateria').
                          |  Use 'Root' for everything.
  nodefile ('nodes.dmp')  | File name of the node file within DIRNAME
  namefile ('names.dmp')  | File name of the name file within DIRNAME
  outname (...            | Name of the output file written to disk
  ['GenBankControlled'... | 
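
  A call with name/value pairs might look like the sketch below. The package-qualified function name is an assumption (the help text above gives only the upper-case name), and the directory path is a placeholder; adjust both to your installation. 'outname' is left at its default.

```matlab
% ASSUMPTION: the function is reachable under this package path in your NDI install.
makeVocab = @ndi.database.fun.createGenBankControlledVocabulary;

% Placeholder: folder that holds names.dmp and nodes.dmp from the GenBank taxonomy dump.
dirname = '/path/to/taxdump';

% Build the vocabulary for every taxon instead of the default 'Bilateria' subtree.
% 'nodefile' and 'namefile' are shown with their documented defaults.
makeVocab(dirname, ...
    'root_node', 'Root', ...
    'nodefile', 'nodes.dmp', ...
    'namefile', 'names.dmp');
```

  Starting from 'Root' processes more taxa than the default, so expect the run to take at least as long as the default case.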

  The taxonomy data is available at

  This function usually takes a couple of hours to run and shows 3 progress bars
  (the first one is faster than the second).