  • Alejandro De Santiago
    University of Georgia
  • Tiago Jose Pereira
    University of Georgia
  • Timothy John Ferrero
    Department of Life Sciences, The Natural History Museum, Cromwell Road, London SW7 5BD, UK; Email: g.boxshall@nhm.ac.uk.
  • Natalie Barnes
    Department of Life Sciences, The Natural History Museum, Cromwell Road, London SW7 5BD, UK; Email: g.boxshall@nhm.ac.uk.
  • Delphine Lallias
  • Simon Creer
  • Holly Bik
    University of Georgia
In metabarcoding studies, Linnaean taxonomy assignments of Operational Taxonomic Units (OTUs) or Amplicon Sequence Variants (ASVs) underpin many downstream bioinformatics analyses and ecological interpretations of environmental DNA (eDNA) datasets. However, public molecular databases (e.g., SILVA, EUKARYOME, BOLD) for most microbial metazoan phyla (nematodes, tardigrades, kinorhynchs, etc.) are sparsely populated, negatively impacting our ability to assign ecologically meaningful taxonomy to these understudied groups. Additionally, the choice of bioinformatics parameters and computational algorithms can further affect the accuracy of eDNA taxonomy assignments. Here, we use two in silico datasets to show that taxonomy assignments using the 18S rRNA gene can be dramatically improved by curating Linnaean taxonomy strings associated with each reference sequence and closing phylogenetic gaps by improving taxon sampling. Using free-living nematodes as a case study, we applied two commonly used taxonomy assignment algorithms (BLAST+ and the QIIME2 Naïve Bayes classifier) across six iterations of the SILVA 138 reference database to evaluate the precision and accuracy of taxonomy assignments. The BLAST+ top hit with a 90% sequence similarity cutoff often returned the highest percentage of correctly assigned taxonomy at the genus level, and the QIIME2 Naïve Bayes classifier performed similarly well when paired with a reference database containing corrected taxonomy strings. Our results highlight the urgent need for phylogenetically informed expansions of public reference databases (encompassing both genomes and common gene markers), focused on poorly sampled lineages that are now robustly recovered via eDNA metabarcoding approaches. Additional taxonomy curation efforts should be applied to popular reference databases such as SILVA, and taxon sampling could be rapidly improved by more frequent incorporation of newly published GenBank sequences linked to genus- and/or species-level identifications.
Original language: English
Article number: e70080
Journal: Environmental DNA
Volume: 7
Issue number: 2
Early online date: 25 Mar 2025
Status: Published - 1 Apr 2025