TY - JOUR
T1 - Lake Malawi cichlid pangenome graph reveals extensive structural variation driven by transposable elements
AU - Quah, Fu Xiang
AU - Almeida, Miguel Vasconcelos
AU - Blumer, Moritz
AU - Yuan, Chengwei Ulrika
AU - Fischer, Bettina
AU - See, Kirsten
AU - Jackson, Ben
AU - Zatha, Richard
AU - Rusuwa, Bosco
AU - Turner, George F
AU - Santos, M Emília
AU - Svardal, Hannes
AU - Hemberg, Martin
AU - Durbin, Richard
AU - Miska, Eric
PY - 2025/4/10
Y1 - 2025/4/10
N2 - Pangenome methods have the potential to uncover hitherto undiscovered sequences missing from established reference genomes, making them useful to study evolutionary and speciation processes in diverse organisms. The cichlid fishes of the East African Rift Lakes represent one of nature's most phenotypically diverse vertebrate radiations, but single-nucleotide polymorphism (SNP)-based studies have revealed little sequence difference, with 0.1%-0.25% pairwise divergence between Lake Malawi species. These were based on aligning short reads to a single linear reference genome and ignored the contribution of larger-scale structural variants (SVs). We constructed a pangenome graph that integrates six new and two existing long-read genome assemblies of Lake Malawi haplochromine cichlids. This graph intuitively represents complex and nested variation between the genomes and reveals that the SV landscape is dominated by large insertions, many exclusive to individual assemblies. The graph incorporates a substantial amount of extra sequence across seven species, the total size of which is 33.1% longer than that of a single cichlid genome. Approximately 4.73% to 9.86% of the assembly lengths are estimated as interspecies structural variation between cichlids, suggesting substantial genomic diversity underappreciated in SNP studies. Although coding regions remain highly conserved, our analysis uncovers a significant proportion of SV sequences as transposable element (TE) insertions, especially DNA, LINE, and LTR TEs. These findings underscore that the cichlid genome is shaped both by small-nucleotide mutations and large, TE-derived sequence alterations, both of which merit study to understand their interplay in cichlid evolution. [Abstract copyright: © 2025 Quah et al.; Published by Cold Spring Harbor Laboratory Press.]
AB - Pangenome methods have the potential to uncover hitherto undiscovered sequences missing from established reference genomes, making them useful to study evolutionary and speciation processes in diverse organisms. The cichlid fishes of the East African Rift Lakes represent one of nature's most phenotypically diverse vertebrate radiations, but single-nucleotide polymorphism (SNP)-based studies have revealed little sequence difference, with 0.1%-0.25% pairwise divergence between Lake Malawi species. These were based on aligning short reads to a single linear reference genome and ignored the contribution of larger-scale structural variants (SVs). We constructed a pangenome graph that integrates six new and two existing long-read genome assemblies of Lake Malawi haplochromine cichlids. This graph intuitively represents complex and nested variation between the genomes and reveals that the SV landscape is dominated by large insertions, many exclusive to individual assemblies. The graph incorporates a substantial amount of extra sequence across seven species, the total size of which is 33.1% longer than that of a single cichlid genome. Approximately 4.73% to 9.86% of the assembly lengths are estimated as interspecies structural variation between cichlids, suggesting substantial genomic diversity underappreciated in SNP studies. Although coding regions remain highly conserved, our analysis uncovers a significant proportion of SV sequences as transposable element (TE) insertions, especially DNA, LINE, and LTR TEs. These findings underscore that the cichlid genome is shaped both by small-nucleotide mutations and large, TE-derived sequence alterations, both of which merit study to understand their interplay in cichlid evolution. [Abstract copyright: © 2025 Quah et al.; Published by Cold Spring Harbor Laboratory Press.]
U2 - 10.1101/gr.279674.124
DO - 10.1101/gr.279674.124
M3 - Article
C2 - 40210437
SN - 1549-5469
VL - 35
SP - 1094
EP - 1107
JO - Genome Research
JF - Genome Research
ER -