Towards complete and error-free genome assemblies of all vertebrate species
Research output: Contribution to journal › Article › Peer-review
In: Nature, Vol. 592, No. 7856, 28 April 2021, pp. 737–746.
RIS
TY - JOUR
T1 - Towards complete and error-free genome assemblies of all vertebrate species
AU - Rhie, Arang
AU - McCarthy, Shane A.
AU - Fedrigo, Olivier
AU - Damas, Joana
AU - Formenti, Giulio
AU - Koren, Sergey
AU - Uliano-Silva, Marcela
AU - Chow, William
AU - Fungtammasan, Arkarachai
AU - Kim, Juwan
AU - Lee, Chul
AU - Ko, Byung June
AU - Chaisson, Mark
AU - Gedman, Gregory L.
AU - Cantin, Lindsey J.
AU - Thibaud-Nissen, Francoise
AU - Haggerty, Leanne
AU - Bista, Iliana
AU - Smith, Michelle
AU - Haase, Bettina
AU - Mountcastle, Jacquelyn
AU - Winkler, Sylke
AU - Paez, Sadye
AU - Howard, Jason
AU - Vernes, Sonja C.
AU - Lama, Tanya M.
AU - Grutzner, Frank
AU - Warren, Wesley C.
AU - Balakrishnan, Christopher N.
AU - Burt, Dave
AU - George, Julia M.
AU - Biegler, Matthew T.
AU - Iorns, David
AU - Digby, Andrew
AU - Eason, Daryl
AU - Robertson, Bruce
AU - Edwards, Taylor
AU - Wilkinson, Mark
AU - Turner, George
AU - Meyer, Axel
AU - Kautt, Andreas F.
AU - Franchini, Paolo
AU - Detrich, H. William
AU - Svardal, Hannes
AU - Wagner, Maximilian
AU - Naylor, Gavin J. P.
AU - Pippel, Martin
AU - Malinsky, Milan
AU - Mooney, Mark
AU - Simbirsky, Maria
AU - Hannigan, Brett T.
AU - Pesout, Trevor
AU - Houck, Marlys
AU - Misuraca, Ann
AU - Kingan, Sarah B.
AU - Hall, Richard
AU - Kronenberg, Zev
AU - Sović, Ivan
AU - Dunn, Christopher
AU - Ning, Zemin
AU - Hastie, Alex
AU - Lee, Joyce
AU - Selvaraj, Siddarth
AU - Green, Richard E.
AU - Putnam, Nicholas H.
AU - Gut, Ivo
AU - Ghurye, Jay
AU - Garrison, Erik
AU - Sims, Ying
AU - Collins, Joanna
AU - Pelan, Sarah
AU - Torrance, James
AU - Tracey, Alan
AU - Wood, Jonathan
AU - Dagnew, Robel E.
AU - Guan, Dengfeng
AU - London, Sarah E.
AU - Clayton, David F.
AU - Mello, Claudio V.
AU - Friedrich, Samantha R.
AU - Lovell, Peter V.
AU - Osipova, Ekaterina
AU - Al-Ajli, Farooq O.
AU - Secomandi, Simona
AU - Kim, Heebal
AU - Theofanopoulou, Constantina
AU - Hiller, Michael
AU - Zhou, Yang
AU - Harris, Robert S.
AU - Makova, Kateryna D.
AU - Medvedev, Paul
AU - Hoffman, Jinna
AU - Masterson, Patrick
AU - Clark, Karen
AU - Martin, Fergal
AU - Howe, Kevin
AU - Flicek, Paul
AU - Walenz, Brian P.
AU - Kwak, Woori
AU - Clawson, Hiram
AU - Diekhans, Mark
AU - Nassar, Luis
AU - Paten, Benedict
AU - Kraus, Robert H. S.
AU - Crawford, Andrew J.
AU - Gilbert, M. Thomas P.
AU - Zhang, Guojie
AU - Venkatesh, Byrappa
AU - Murphy, Robert W.
AU - Koepfli, Klaus-Peter
AU - Shapiro, Beth
AU - Johnson, Warren E.
AU - Di Palma, Federica
AU - Marques-Bonet, Tomas
AU - Teeling, Emma C.
AU - Warnow, Tandy
AU - Graves, Jennifer Marshall
AU - Ryder, Oliver A.
AU - Haussler, David
AU - O’Brien, Stephen J.
AU - Korlach, Jonas
AU - Lewin, Harris A.
AU - Howe, Kerstin
AU - Myers, Eugene W.
AU - Durbin, Richard
AU - Phillippy, Adam M.
AU - Jarvis, Erich D.
PY - 2021/4/28
Y1 - 2021/4/28
AB - High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
U2 - 10.1038/s41586-021-03451-0
DO - 10.1038/s41586-021-03451-0
M3 - Article
VL - 592
SP - 737
EP - 746
JO - Nature
JF - Nature
SN - 1476-4687
IS - 7856
ER -
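The RIS block above uses a line-oriented tagged format. Each line pairs a two-character tag (TY, AU, DO, VL and so on) with a value. Repeatable tags such as AU accumulate in order, and ER closes the record. The sketch below shows how a reference manager might read such a record, collecting each one into a dictionary that maps a tag to its list of values. The function name `parse_ris` is hypothetical, and the sketch makes two assumptions. First, it expects a well-formed record. Second, it treats untagged lines as continuations of the previous value. The regular expression accepts the single-space separator used in this export, although the usual RIS convention is two spaces before the hyphen.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumption: a tagged line is two characters (uppercase letter, then letter or digit),
# one or more spaces, a hyphen, an optional space, and the value. This accepts both
# "TY  - JOUR" (conventional two-space form) and "TY - JOUR" (as in the export above).
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")


def parse_ris(text: str) -> List[Dict[str, List[str]]]:
    """Hypothetical sketch: parse RIS text into records mapping tag -> list of values."""
    records: List[Dict[str, List[str]]] = []
    current = None   # record being built, or None outside a TY...ER block
    last_tag = None  # tag whose value an untagged continuation line extends
    for raw in text.splitlines():
        line = raw.rstrip()
        match = TAG_LINE.match(line)
        if not match:
            # Assumption: a non-empty untagged line inside a record continues the last value.
            if current is not None and last_tag is not None and line.strip():
                current[last_tag][-1] += " " + line.strip()
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "TY":
            if current is not None:
                records.append(dict(current))  # previous record lacked ER; keep it anyway
            current = defaultdict(list)  # TY opens a new record
        if current is None:
            continue  # ignore stray tags outside any record
        if tag == "ER":
            records.append(dict(current))  # ER closes the record
            current, last_tag = None, None
            continue
        current[tag].append(value)  # repeated tags (e.g. AU) keep their order
        last_tag = tag
    return records


# Usage with an excerpt of the record above.
sample = """TY - JOUR
T1 - Towards complete and error-free genome assemblies of all vertebrate species
AU - Rhie, Arang
AU - McCarthy, Shane A.
PY - 2021/4/28
DO - 10.1038/s41586-021-03451-0
VL - 592
SP - 737
EP - 746
ER -"""

record = parse_ris(sample)[0]
print(record["AU"])  # ['Rhie, Arang', 'McCarthy, Shane A.']
print(record["DO"][0])  # 10.1038/s41586-021-03451-0
```

Each tag's values are stored in a list rather than overwritten. This keeps the full author list intact and in its original order, which matters here because the record carries 127 AU entries.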