The methylomes of Lake Malawi cichlids reveal epigenetic variation associated with phenotypic diversification

Epigenetic variation modulates gene expression and can be heritable. However, knowledge of the contribution of epigenetic variation to diversification and speciation in nature remains limited. Here, we present the first genome-wide methylome study in a large vertebrate evolutionary radiation, focussing on liver and muscle tissues in six genetically similar but eco-morphologically divergent cichlid fishes from Lake Malawi. In both tissues we find substantial methylome divergence in DNA sequences conserved between species and differentially methylated regions (DMR) are significantly enriched in recently active transposable elements. DMRs in the liver are associated with transcription changes of genes with hepatic functions, pointing to a link between dietary ecology and methylome divergence. Unexpectedly, DMRs shared across adult tissues are enriched in genes involved in embryonic and developmental processes, suggesting roles in early embryogenesis. Our study provides initial evidence for DNA methylation contributing to phenotypic diversification of cichlids, and represents an important resource for further work.


MAIN
Trait inheritance and phenotypic diversification are primarily explained by the transmission of genetic information encoded in the DNA sequence. In addition, a variety of epigenetic processes have recently been reported to mediate heritable transmission of phenotypes in animals and plants 1-6 . However, current understanding of the evolutionary significance of epigenetic processes, and of their roles in organismal diversification, is in its infancy.
DNA methylation, or the covalent addition of a methyl group onto the 5 th carbon of cytosine (mC) in DNA, is a reversible epigenetic mark present across multiple kingdoms [7][8][9] , can be heritable, and has been linked to transmission of acquired phenotypes in plants and animals 2,5,6,10-12 . The importance of this mechanism is underlined by the fact that proteins involved in the deposition of mC ('writers', DNMTs), in mC maintenance during cell division, and in the removal of mC ('erasers', TETs), are mostly essential and show high degrees of conservation across vertebrates species [13][14][15][16] . In addition, some ancestral functions of methylated cytosines are highly conserved, such as in the transcriptional silencing of exogenous genomic elements (transposons) 17,18 . In vertebrates, DNA methylation functions have evolved to play an important role in the orchestration of cell differentiation during normal embryogenesis/development through complex interactions with histone post-translational modifications (DNA accessibility) and mC-sensitive readers (such as transcription factors) [18][19][20][21][22][23][24] in particular at cis-regulatory regions (i.e. promoters, enhancers). Early-life establishment of stable DNA methylation patterns can thus affect transcriptional activity in the embryo and persist into fully differentiated cells 25 . DNA methylation variation has also been postulated to have evolved in the context of natural selection by promoting phenotypic plasticity and thus possibly facilitating adaptation, speciation and adaptive radiation 2,4,11,26 .
Studies in plants have revealed how covarying environmental factors and DNA methylation variation underlying stable and heritable transcriptional changes in adaptive traits 2,6,[10][11][12]27 . Some initial evidence is also present in vertebrates 2,5,[28][29][30] . In the cavefish for example, an early developmental process -eye degeneration -has been shown to be mediated by DNA methylation, suggesting mC variation as an evolutionary factor generating adaptive phenotypic plasticity during development and evolution 28,31 . However, whether correlations between environmental variation and DNA methylation patterns promote phenotypic diversification more widely among natural vertebrate populations remains unknown.
In this study we sought to quantify and characterise natural variability in DNA methylation in the context of the Lake Malawi haplochromine cichlid adaptive radiation, one of the most spectacular examples of rapid vertebrate phenotypic diversification 32 . In total, the radiation comprises over 800 endemic species 33 , that are estimated to have evolved from common ancestry approximately 800,000 years ago 34 . Species within the radiation can be grouped into seven distinct ecomorphological groups based on their ecology, morphology and genetic differences: (1) shallow benthic, (2) deep benthic, (3) deep pelagic zooplanktivorous/piscivorous Diplotaxodon, (4) the rock-dwelling 'mbuna', (5) zooplanktivorous 'utaka', (6) Astatotilapia calliptera specialised for shallow weedy habitats (also found in surrounding rivers and lakes), and (7) the midwater pelagic piscivores Rhamphochromis 35,36 .
Recent large-scale genetic studies have revealed that the Lake Malawi cichlid flock is characterised by an overall very low genetic divergence among species (0.1-0.25%), combined with a low mutation rate, a high rate of hybridisation and extensive incomplete lineage sorting (shared retention of ancestral genetic variation across species) 33,35,37,38 . Multiple molecular mechanisms may be at work to enable such an explosive phenotypic diversification. Therefore, investigating the epigenetic mechanisms in Lake Malawi cichlids represents a remarkable opportunity to expand our comprehension of the processes underlying adaptation, phenotypic diversification and speciation.
Here we describe, quantify and assess the divergence in methylomes in six cichlid species spanning five of the seven ecomorphological groups of the Lake Malawi haplochromine radiation. To this end, high-coverage whole-genome bisulfite sequencing (WGBS) and total RNA sequencing (RNAseq) from livers and muscle tissues were undertaken. We find that Lake Malawi haplochromine cichlids exhibit substantial methylome divergence, despite conserved underlying DNA sequences, and is enriched in evolutionary young transposable elements. Differential transcriptional activity is significantly associated with between-species methylome divergence, most prominently in genes involved in key hepatic metabolic functions. Furthermore, we show that a large fraction of methylome divergence between species pertains to embryonic and developmental processes, possibly contributing to early establishment of phenotypic diversity. This represents the first comparative analysis of natural methylome variation in Lake Malawi cichlids and provides initial evidence for a role of DNA methylation in adaptive phenotypic diversification in cichlids. Our study represents an important resource for future research in the context of adaptive diversification.

The methylomes of Lake Malawi cichlids feature conserved vertebrate characteristics
To characterise the methylome variation and assess possible functional relationships in natural populations of Lake Malawi cichlids, we performed high-coverage whole-genome sequencing of methylomes (WGBS) and total transcriptomes (RNAseq) simultaneously from liver and muscle tissues for wild-caught male specimens from each of six cichlid species (Fig. 1a-c, SUPP Fig. S1, SUPP Tables S1-2). The species selected were: Rhamphochromis longiceps (RL), a pelagic piscivore (Rhamphochromis group); Diplotaxodon limnothrissa (DL), a deep-water pelagic carnivore (Diplotaxodon group); Maylandia zebra (MZ) and Petrotilapia genalutea (PG), two rock-dwelling algae eaters (Mbuna group); Aulonocara stuartgranti (AS), a benthic invertebrate-eating sand/rock-dweller that is genetically part of the deep-benthic group; Astatotilapia calliptera (AC), a species of rivers and lake margins 39 (Fig. 1b).  On average, 285.51±55.6 million paired-end reads (See SUPP Table S1) for liver and muscle methylomes were generated with WGBS, yielding ~10-15x per-sample coverage at CG dinucleotide sites (SUPP Fig. S2a; see Methods and Supplementary notes). All sequenced reads were mapped to two different Lake Malawi cichlid genome assemblies in parallel (Maylandia zebra UMD2a for all analyses and Astatotilapia calliptera fAstCal1.2 for mapping comparison; see Methods) without significant mapping rate differences (SUPP Fig. S2e), reflecting the high level of conservation at the DNA sequence level across the Malawi radiation (SUPP Fig. S3a, b). In parallel, liver and muscle transcriptomes were generated for the same specimens as used for WGBS, yielding on average 11.9±0.7 million paired-end reads (SUPP Table S1 and Methods).
We first characterised global features of the methylome of Lake Malawi cichlids. The genome of Lake Malawi cichlid was found to have copies of DNA methyltransferases (DNMTs) and ten-eleven translocation methylcytosine dioxygenases (TETs), the 'readers' and 'erasers' of DNA methylation respectively (SUPP Fig. S4a Table S3). This is consistent with previous studies highlighting high methylation levels in bodies of active genes in plants and animals, and high levels of methylation at promoters of weakly expressed genes in vertebrates 7,23 . We conclude that the methylomes of Lake Malawi cichlids share many regulatory features, and possibly associated functions, with those of other vertebrates, which renders Lake Malawi cichlids a promising model system in this context.

Methylome divergence in Lake Malawi cichlids
To assess the possible role of DNA methylation in phenotypic diversification, we then sought to quantify and characterise the variation in liver and muscle methylomes across the genomes of Lake Malawi haplochromine cichlids. Despite overall very low sequence divergence 35 (SUPP Fig. S3a-b), Lake Malawi cichlids were found to show substantial methylome divergence across species within each tissue type, while within-species biological replicates always clustered together (Fig. 2a, SUPP   Fig. S8, Methods). The species relationships inferred by clustering of the liver methylomes at conserved individual CG dinucleotides closely recapitulate the genetic relationship inferred from DNA sequence 35 , with one exception -the methylome clusters A. calliptera samples as an outgroup, not a sister group to Mbuna. This is consistent with its unique position as a riverine species, while all species are obligate lake dweller ( Fig. 2a and SUPP Fig. S3a, b), and the same pattern was observed irrespective of whether the M. zebra or A. calliptera reference genome was used (SUPP Fig. S9).
As DNA methylation variation tends to correlate over genomic regions consisting of several neighbouring CG sites, we defined and sought to characterise differentially methylated regions (DMRs) among Lake Malawi cichlid species (≥50bp-long, ≥4 CG and ≥25% methylation difference across any pair of species, p<0.05; see Methods). In total, 13,190 between-species DMRs were found among the liver methylomes and 4,236 among the muscle methylomes. By contrast, 26,459 withinspecies DMRs were found in the between-tissue comparisons (SUPP Fig. S10a, b). Overall, DMRs in Lake Malawi cichlids were predicted to be as long as 5,000bp (95% CI of median size: 282-298bp).
While the methylation differences between liver and muscle were the most prominent at single GC dinucleotide resolution (Fig. 2a), and also resulted in the highest number of DMRs, we found DMRs to be slightly larger and methylation differences within them substantially stronger among species than between tissues (Dunn's test, p<2.2×10 -16 ; SUPP Fig. S10 c-d). Next, we characterised the genomic features enriched for between-species methylome divergence.
In the liver, promoter regions and orphan CGIs have 2.8-and 3.7-fold enrichment respectively for between-species liver DMRs over random expectation (χ 2 test, p<0.0001; Fig. 2b and SUPP Fig.   S10a, e). Methylome variation at promoter regions has been shown to affect transcription activity via a number of mechanisms (e.g. transcription factor binding affinity) 20,43 and, in this way, may participate in phenotypic adaptive diversification in Lake Malawi cichlids. In particular, genes with promoter DMRs show enrichment for enzymes involved in hepatic metabolic functions (Fig. 2c). Furthermore, the high enrichment of DMRs in intergenic orphan CGIs (less so in promoter CGIs; Fig. 2b), accounting for n=1,611 (12.2%) of total liver DMRs (SUPP Fig. S10a), suggests that intergenic CGIs may have DNA methylation-mediated regulatory functions.
The majority of between-species liver DMRs (66%, n=8,666) are within TE regions (TE-DMRs; SUPP  [44][45][46] . Indeed, movement of transposable elements has recently been shown to contribute to phenotypic diversification in Lake Malawi cichlids 47 . In contrast to the between-species liver DMRs, within-species DMRs based on comparison of liver against muscle methylomes show much less variation in enrichment across genomic features. Only gene bodies show weak enrichment for methylome variation. Moreover, both CGI classes, as well as repetitive and intergenic regions show considerable tissue-DMR depletion, suggesting a smaller DNA methylation-related contribution of these elements to tissue differentiation (Fig. 2b).

Methylation divergence is associated with changes in transcriptional activity of hepatic genes
We hypothesised that adaptation to different diets in Lake Malawi cichlids could be associated with distinct hepatic functions, manifesting as differences in transcriptional patterns which, in turn, could be influenced by divergent methylation patterns. To investigate this, we first performed differential gene expression analysis (Fig. 1c). In total, 3,437 genes were found to be differentially expressed between livers of the four Lake Malawi cichlid species investigated (RL, DL, MZ and PG; Walt test, false discovery rate adjusted p-value using Benjamini-Hochberg (FDR)<0.01; Fig. 3a   Next, we checked for association between liver DMRs and transcriptional change. Of the 5,798 among-species DMRs that could be assigned to a specific gene (i.e., DMRs within promoters, gene bodies or within 0.5-4kbp away from a gene; see Methods), 767 were associated with differentially expressed genes, which is greater than expected by chance (Fig. 3a, b; p<6.1×10 -5 ), suggesting that DMRs may affect liver gene expression. Of these 767 putative functional DMRs (pfDMRs), the majority (59%) are localised over gene bodies, equally distributed between exons and introns, hinting at possible intronic cis-regulatory elements 48 . The remaining pfDMRs are in promoters (27%) or intergenic (14%) (Fig. 3c). The majority of pfDMRs contain younger TE sequences, in particular in intronic regions, while only few contain CGIs (SUPP Fig. S12a, b). In promoters and intergenic regions, >75% of pfDMR sequences contain TEs (Fig. 3c).
Genomic regions containing pfDMRs are significantly enriched in genes involved in hepatic and metabolic oxidation-reduction processes ( Fig. 3d and SUPP Fig. S12c). These include genes encoding haem-containing cytochrome P450 enzymes (such as cyp3a4, cy7b1; SUPP Fig. S12c), which are important metabolic factors in steroid and fatty acid metabolism, as well as genes encoding other hepatic enzymes involved in energy balance processes. This enrichment is associated with significant methylome divergence among species, in particular in promoter regions and gene bodies (Fig.   3d). For example, the gene sulfurtransferase tstd1-like, an enzyme involved in energy balance and mitochondrial metabolism is expressed exclusively in the liver of the deep-water pelagic species D.
limnothrissa where it shows ~80% reduced methylation levels in a gene-body DMR compared to all the other species (Fig. 3e, h). Another example is the promoter of the enzyme carbonyl reductase [NADPH] 1 (cbr1) which shows significant hypomethylation (2.2kbp-long DMR) in the algae-eaters MZ and PG, associated with up to ~60-fold increased gene expression in their livers compared to the predatory Rhamphochromis and Diplotaxodon (Fig. 3f, i). Interestingly, cbr1 is involved in the metabolism of various fatty acids in the liver and has been associated with fatty acid-mediated cellular signalling in response to environmental perturbation 49 . As a final example, we highlight the cytotoxic effector perforin 1-like (prf1-like), an important player in liver-mediated energy balance and immune functions 50 . Its promoter is hypermethylated (>88% mC/C) exclusively in the liver of the deep-water species DL, while having low methylation levels (~25%) in the four other species (Fig.  3g). This gene is not expressed in DL livers but is highly expressed in the livers of the other species that all show low methylation in the promoter (Fig. 3j). Taken together, these results suggest that transcriptional remodelling associated with significant epigenetic divergence might participate in the adaptation to different diets.

Multi-tissue methylome divergence is enriched for early developmental processes
We further hypothesised that between-species DMRs that are found in both the liver and muscle methylomes could relate to functions associated with early development/embryogenesis. Given that liver is endoderm-derived and muscle is mesoderm-derived, such shared multi-tissue DMRs could be involved in processes that find their origins prior to or early in gastrulation. Such DMRs could also have been established early on during embryogenesis and may have core cellular functions. Therefore, we focussed on the three species for which we had methylomes from both tissues to explore the overlap between muscle and liver DMRs (Fig. 4a). Based on pairwise species comparisons (SUPP Fig. S13a-b), we identified 'species' DMRs showing a unique pattern of DNA methylation in one of the three species. We found that >42% of these were found in both tissues ('multi-tissue' DMRs), while >37% were liver-specific and only ~13% were muscle-specific (Fig. 4b). The relatively high proportion of multi-tissue DMRs suggests there may be extensive among-species divergence in core cellular or metabolic pathways. To investigate this further, we performed GO enrichment analysis. As expected, liver-specific DMRs are particularly enriched for hepatic metabolic functions, while muscle-specific DMRs are significantly associated with muscle-related functions, such as glycogen catabolic pathways (Fig. 4c). Multi-tissue DMRs, however, are significantly enriched for genes involved in development and embryonic processes, in particular related to cell differentiation and brain development (Fig. 4 c, e-g). In all the three species, multi-tissue DMRs are significantly longer on average (936±72bp vs 388±29 bp; Dunn's test, p<0.0001; SUPP Fig. S13c), are more often localised in promoter regions (SUPP Fig. S13d)  Several examples of multi-tissue DMRs are worth highlighting as generating hypotheses for potential future functional studies (Fig. 4e-g) 53 . The promoter of gap43 is largely devoid of methylation (overall <5% average mC/C levels across this 5.2kbp-long DMR) in both muscle and liver tissues of D. limnothrissa, while being highly methylated (>86%) in the other species (Fig. 4f). In A. calliptera, the transcription of gap43 is restricted to the brain and embryo (SUPP Fig. S13f), consistent with a role in neural development and in the adult brain. Finally, another multi-tissue DMR potentially involved in neural embryonic functions is located in the promoter region of the gene tenm2, coding for teneurin transmembrane protein (Fig. 4g). tenm2 is a gene expressed early on during zebrafish embryogenesis and is involved in neurodevelopment and neuron migration-related cell signalling 54 . This 2.7kbplong DMR is completely unmethylated in the algae-eating rock-dweller Petrotilapia genalutea (almost 80% reduction of methylation overall compared to the other species) and may mediate species specific adaptive phenotypic plasticity related to synapse formation and neuronal networks.

DISCUSSION
The molecular mechanisms underlying adaptive phenotypic diversification are subject of intense interest 33,35,37,55,56 and the extent of the role of epigenetic processes is hotly debated 2,4,57 . However, indepth molecular epigenetic studies remain rare in evolutionary genomics and its key model systems 2,4,28,57 . Here, we focussed on the genetically closely-related haplochromine cichlids of Lake Malawi, representing a unique system to investigate the epigenetic basis for phenotypic diversification 35,38,58 . Specifically, we describe genome-wide methylome variation at a single CG dinucleotide resolution as well as transcriptomes of two adult tissues of different embryonic origins in six ecomorphologically divergent species (Fig. 1b). This work represents the first study investigating epigenetic marks as a potential basis for diversification and adaptation in natural populations of cichlid fishes, and provides evidence of a role for DNA methylation in rewiring the transcriptional network and transposon element landscape in this context. Given the resemblances we found between cichlid methylomes and those of warm-blooded vertebrates ( Fig. 1. d, e), suggesting evolutionarily con- We found that that regions of methylome divergence between species (DMRs) were enriched in promoters and orphan CGIs (Fig. 2b). Methylation variation in promoter regions is known to have important cis-regulatory functions in vertebrates, in particular during development 19,20,23,28,30 . Such cisregulatory activity is also apparent in Lake Malawi cichlids, with methylation at promoters negatively correlated with transcriptional activity (Fig. 1e and SUPP Fig. S7a-d). This is likely mediated by the tight interaction of DNA methylation with 5mC-sensitive DNA-binding proteins, such as transcription factors 21 . On the other hand, the functional roles of orphan CGIs are less well understood 41 . However, orphan CGIs have by far the highest enrichment for species methylome divergence (4.4-fold over chance; Fig. 2b) -most of which are located in unannotated genomic regions. Orphan CGIs, as well as intergenic TEs (Fig. 2d), might include ectopic promoters, enhancers and other distal regulatory elements 40,41 that may participate in adaptive diversification. Such putative cis-regulatory regions could be validated against a full functional annotation of the genome of Lake Malawi cichlid, which is currently lacking.
We identified that in some species methylome divergence was significantly associated with differential liver transcriptome activity, especially pertaining to hepatic functions involved in steroid hormone and fatty acid metabolism (Fig. 3b, d, e-j). Consistent with a functional role of DNA methylation in cis-regulatory regions 20,43 , we revealed significant methylation divergence in the promoters of differentially transcribed genes involved in liver-mediated energy expenditure processes and metabolism, such as gene prf1-like (>60-fold increase in expression; Fig. 3g, j), associated with obesity in mouse 44 . Such a functional link may promote phenotypic diversification via adaptation to different diets. Our understanding of this would benefit from knowledge of the extent to which environmental or diet perturbation might result in adaptation-associated functional methylome changes. Additionally, the characterisation of the methylomes of Lake Malawi cichlid species from different ecomorphological groups but sharing the same habitat/diet, would inform on the specificity and possible functions of methylome divergence at metabolic genes.
TE and repetitive sequences present on average higher methylation levels than the genome-wide average (Fig. 1d), although some specific TE classes show more variable and lower levels (SUPP Fig.   S1e). DNA methylation-mediated transcriptional repression of mostly deleterious TE elements is crucial to the integrity of most eukaryote genomes, from plants to fish and mammals, and can be mediated in both animals and plants by small non-coding RNAs, such as piwi-interacting RNAs (piRNAs) in zebrafish and mammals 17,18,62 . Notably, the majority (~60%) of species differences in methylation patterns associated with transcriptional changes in liver was localised in evolutionary young transposon/repeat regions ( Fig. 3c and SUPP Fig. S12a). Even though most of TE activity is under tight cellular control to ensure genome stability, transposition events have also been associated with genome evolution and phenotypic diversification. Indeed, TE insertion may represent a source of functional genomic variation and novel cis-regulatory elements, underlying altered transcriptional network 44,46,47,63 . In haplochromine cichlids, variation in anal fin egg-spots patterns associated with courtship behaviour, has been linked to a novel cis-regulatory element, derived from TE sequences 45 .
Additionally, Brawand and colleagues have revealed that most TE insertions near genes in East African cichlids were associated with altered gene expression patterns 37 . Moreover, genes in piRNArelated pathways have been reported to be under positive selection in Lake Malawi cichlid flock, in line with a fast evolving TE sequence landscape observed in cichlids 35 , and these genes may also be associated with TE-related methylome variation, similar to Arabidopsis 10,64 .
Not only can novel TE insertions participate in genome evolution, DNA methylation at TE-derived cis-regulatory elements has been shown to affect transcriptional activity of nearby genes 11,44 . In rodents, the insertion of one IAP (intra-cisternal A particle) retrotransposon in the upstream cis-regulatory region of the agouti gene is associated with considerable phenotypic variation of coat colours and metabolic changes. Differential methylation levels at this TE-derived ectopic promoter directly impacts the activity of the agouti gene 5,27 , and such epigenetic patterns of methylation are transmitted to the offspring along with the altered phenotypes in a non-genetic manner 2 . Similarly, in toadflax, the flower symmetry is associated with the variable and heritable methylation patterns in the TEderived promoter of the Lcyc gene, resulting in symmetrical or asymmetrical flowers 6 . Also, in a population-scale study of more than a thousand natural Arabidopsis accessions, epigenetic variation was found to be associated with phenotypes, mostly arising from methylation-mediated TE silencing that was significantly associated with altered transcription of adaptive genes such as those determining flowering time 10,64 . Our work adds to this by providing further evidence that interactions between TE sequences and between-species methylome divergence lead to altered transcriptional networks. This lays groundwork for further investigation of this issue in cichlid fishes.
Finally, we revealed that between-species methylome differences in liver tissues were greater than differences between muscle tissues (Fig. 4b), possibly highlighting a higher dependence of hepatic functions on natural epigenetic divergence. This indicates that a significant portion of the betweenspecies methylome divergence in liver may contribute to adaptive phenotypic divergence, in particular by affecting the genes involved in tissue-specific functions, such as hepatic metabolic processes (Fig. 3c, e-j). However, almost half of the methylome divergence we observed that was driven by a single species was consistently found in both liver and muscle (Fig. 4b). This multi-tissue methylome divergence is consistent with epigenetic influences on core cellular functions, and may also be relevant to early-life biological processes such as development, cellular differentiation and embryogenesis (Fig. 4c, d-g). For example, we identified a large hypomethylated region in the visual homeobox gene vsx2 in both liver and muscle tissues in the deep-water Diplotaxodon (Fig. 4e). This gene is involved in eye differentiation and may participate in long-lasting visual adaptive phenotypic divergences required to populate dimly parts of the lake, similar to the DNA methylation-mediated adaptive eye degeneration in cavefish 28 . Notably, recent studies have highlighted signatures of positive selection and functional substitutions in genes related to visual traits in D. limnothrissa 35,52 . If multitissue methylome divergence has been established very early during differentiation, and has important regulatory functions pertaining to early developmental stages 25 and possibly core cellular functions, then it may promote long-lasting phenotypic divergence unique to each species' adaption.
Our observations suggest further characterisation of the methylomes and transcriptomes in the different cells of developing embryos may be valuable, to investigate when between-species methylome divergence is established, as well as any functional roles in early-life phenotypic diversification.
To conclude, recent large-scale genomic studies have highlighted that several mechanisms may participate in the phenotypic diversification of Lake Malawi haplochromine cichlids, such as hybridisation and incomplete lineage sorting 33,35,58,65 . Our study adds to these observations by providing initial evidence of a possible role for DNA methylation in adaptation and species divergence. Altogether, we have demonstrated substantial divergence in methylation patterns and transcriptomes among closely related Lake Malawi cichlid fish species representing distinct ecomorphological groups, despite low levels of genome sequence differentiation. This raises the possibility that variation in methylation patterns can play a major role in phenotypic divergence in these rapidly evolving species. Further work is required to elucidate the extent to which this might result from plastic responses to the environment in addition to any basis in sequence divergence. This study represents the first epigenomic study investigating natural methylome variation in the context of phenotypic diversification in genetically similar but ecomorphologically divergent cichlid species part of a large vertebrate radiation and provides an important resource for further work.

Overview sampling
All Malawi cichlid fish were collected with local fishermen by G. F. Turner SUPP Table S1.

Additional statistics
Kruskal-Wallis H and Dunn's multiple comparisons tests (Benjamini-Hochberg correction unless otherwise specified) were performed using FSA (v0.8.25). Box plots represent interquartile range (IQR) with minimal and maximal values (black lines) and outliers (black circles).

Genome-wide annotations
The reference genome of M. zebra (UMD2a; NCBI genome build: GCF_000238955.4 and NCBI annotation release 104) was used to generate all annotations. Custom annotation files were generated and defined as follows: promoter regions, TSS±500 bp unless otherwise indicated; gene bodies included both exons and introns and other intronic regions, and excluded the first 500bp regions downstream of TSS to avoid any overlap with promoter regions; transposable elements and repetitive elements were modelled and annotated, as well as their sequence divergence analysed using RepeatModeler (v1.0.11) and RepeatMasker (v4.0.9.p2), respectively. Intergenic regions were defined as genomic regions more than 5kbp from genes. CpG-rich regions, or CpG islands (CGI), were predicted and annotated using makeCGI 68 . The following genomes were used to compare genomic CG contents across different organisms (SUPP Fig. S5a musculus, GRCm38.p5), tammar wallaby (N. eugenii, Meug1.1). pfDMRs and TE/repeat elements were assigned to a gene when they were located within gene bodies (from 0.5kbp downstream TSS), within promoter regions (TSS±500bp) and in the vicinity of genes (within 0.5-4kbp away from genes).

Enrichment analysis
Enrichment analysis was calculated by shuffling each type of DMRs (liver, muscle, tissue) across the M.zebra UMD2a genome (accounting for the number of DMRs and length; 1000 iterations). The expected values were determined by intersecting shuffled DMRs with each genomic category (1000 iterations). Chi-square tests were then performed for each O/E distribution. The same process was performed for TE enrichment analysis.

Gene Ontology enrichment analysis
All GO enrichments analyses were performed using g:Profiler (https://biit.cs.ut.ee/gprofiler/gost; version Sept 2020). Only annotated genes for Maylandia zebra were used with a statistical cut-off of FDR<0.05 (unless otherwise specified).

RNAseq
Total RNA extraction and NGS library for long (>200nt) RNA sequencing In brief, for each species, three biological replicates of liver and muscle tissues were used to sequence total RNA (see SUPP Fig. S1 for a summary of the method and SUPP Table S2 for sampling size). The same specimens were used for both RNAseq and WGBS.
RNAseq libraries for both liver and muscle tissues were prepared using ~5-10mg of RNAlater-preserved homogenised liver and muscle tissues. Total RNA was isolated using a phenol/chloroform method following manufacturer's instructions (TRIzol, ThermoFisher). RNA samples were treated with DNase (TURBO DNase, Ther-moFisher) to remove any DNA contamination. Quality and quantity of total RNA extracts were determined using NanoDrop spectrophotometer (ThermoFisher), Qubit (ThermoFisher) and BioAnalyser (Agilent). Following ribosomal RNA depletion (RiboZero, Illumina), stranded rRNA-depleted RNA libraries (Illumina) were prepped according to the manufacturer's instructions and sequenced (paired-end 75bp-long reads) on HiSeq2500 V4 (Illumina) by the sequencing facility of the Wellcome Sanger Institute.
Published RNAseq data for all the tissues of A. calliptera sp. Itupi was taken from 35 .
A Spearman's rank correlation matrix was produced with R function cor, and the gene expression matrix to assess transcription variation across samples. Graphs, unsupervised clustering and heatmaps were produced with R packages ggplot2 (v3. 3

ACKNOWLEDGMENTS:
We would like to thank S.M. Grant's diving team for collecting some of the fish specimens, as well as the Fisheries Research Unit of the Government of Malawi, and the Tanzania Fisheries Research Institute, for their assistance and support. We would like to thank the staff at the Gurdon Institute and the sequencing facilities at CRUK Cambridge Institute, Gurdon and Sanger Institutes for their expertise and support. We would like to thank the members of the Miska Lab as well as the Cambridge Cichlid community for fruitful discussions. We thank Navin B. Ramakrishna for critical comments on the manuscripts, as well as David Jordan, Tomás di Domenico and Konrad LM Rudolph for their support with data analysis. We are grateful to Ole Seehausen and Marcel Häsler  Table S1 -S3