Publications

2016
Kleinstiver, B.P., et al. Genome-wide specificities of CRISPR-Cas Cpf1 nucleases in human cells. Nat Biotechnol 34, 8, 869-74 (2016).
The activities and genome-wide specificities of CRISPR-Cas Cpf1 nucleases are not well defined. We show that two Cpf1 nucleases from Acidaminococcus sp. BV3L6 and Lachnospiraceae bacterium ND2006 (AsCpf1 and LbCpf1, respectively) have on-target efficiencies in human cells comparable with those of the widely used Streptococcus pyogenes Cas9 (SpCas9). We also report that four to six bases at the 3' end of the short CRISPR RNA (crRNA) used to program Cpf1 nucleases are insensitive to single base mismatches, but that many of the other bases in this region of the crRNA are highly sensitive to single or double substitutions. Using GUIDE-seq and targeted deep sequencing analyses performed with both Cpf1 nucleases, we were unable to detect off-target cleavage for more than half of 20 different crRNAs. Our results suggest that AsCpf1 and LbCpf1 are highly specific in human cells.
Perin, J., Fischer Walker, C.L., Black, R.E. & Aryee, M.J. Meta-Analysis With a Continuous Covariate That Is Differentially Categorized Across Studies. Am J Epidemiol 183, 5, 507-14 (2016).
We propose taking advantage of methodology for missing data to estimate relationships and adjust outcomes in a meta-analysis where a continuous covariate is differentially categorized across studies. The proposed method incorporates all available data in an implementation of the expectation-maximization algorithm. We use simulations to demonstrate that the proposed method eliminates bias that would arise by ignoring a covariate and generalizes the meta-analytical approach for incorporating covariates that are not uniformly categorized. The proposed method is illustrated in an application for estimating diarrhea incidence in children aged ≤59 months.
Tsai, S.Q., Topkar, V.V., Joung, K.J. & Aryee, M.J. Open-source guideseq software for analysis of GUIDE-seq data. Nat Biotechnol 34, 5, 483 (2016).
2015
Kleinstiver, B.P., et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature 523, 7561, 481-5 (2015).
Although CRISPR-Cas9 nucleases are widely used for genome editing, the range of sequences that Cas9 can recognize is constrained by the need for a specific protospacer adjacent motif (PAM). As a result, it can often be difficult to target double-stranded breaks (DSBs) with the precision that is necessary for various genome-editing applications. The ability to engineer Cas9 derivatives with purposefully altered PAM specificities would address this limitation. Here we show that the commonly used Streptococcus pyogenes Cas9 (SpCas9) can be modified to recognize alternative PAM sequences using structural information, bacterial selection-based directed evolution, and combinatorial design. These altered PAM specificity variants enable robust editing of endogenous gene sites in zebrafish and human cells not currently targetable by wild-type SpCas9, and their genome-wide specificities are comparable to wild-type SpCas9 as judged by GUIDE-seq analysis. In addition, we identify and characterize another SpCas9 variant that exhibits improved specificity in human cells, possessing better discrimination against off-target sites with non-canonical NAG and NGA PAMs and/or mismatched spacers. We also find that two smaller-size Cas9 orthologues, Streptococcus thermophilus Cas9 (St1Cas9) and Staphylococcus aureus Cas9 (SaCas9), function efficiently in the bacterial selection systems and in human cells, suggesting that our engineering strategies could be extended to Cas9s from other species. Our findings provide broadly useful SpCas9 variants and, more importantly, establish the feasibility of engineering a wide range of Cas9s with altered and improved PAM specificities.
Ziller, M.J., Hansen, K.D., Meissner, A. & Aryee, M.J. Coverage recommendations for methylation analysis by whole-genome bisulfite sequencing. Nat Methods 12, 3, 230-232 (2015).

Whole-genome bisulfite sequencing (WGBS) allows genome-wide DNA methylation profiling, but the associated high sequencing costs continue to limit its widespread application. We used several high-coverage reference data sets to experimentally determine minimal sequencing requirements. We present data-derived recommendations for minimum sequencing depth for WGBS libraries, highlight what is gained with increasing coverage and discuss the trade-off between sequencing depth and number of assayed replicates.

Tsai, S.Q., et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat Biotechnol (2015).

CRISPR RNA-guided nucleases (RGNs) are widely used genome-editing reagents, but methods to delineate their genome-wide, off-target cleavage activities have been lacking. Here we describe an approach for global detection of DNA double-stranded breaks (DSBs) introduced by RGNs and potentially other nucleases. This method, called genome-wide, unbiased identification of DSBs enabled by sequencing (GUIDE-seq), relies on capture of double-stranded oligodeoxynucleotides into DSBs. Application of GUIDE-seq to 13 RGNs in two human cell lines revealed wide variability in RGN off-target activities and unappreciated characteristics of off-target sequences. The majority of identified sites were not detected by existing computational methods or chromatin immunoprecipitation sequencing (ChIP-seq). GUIDE-seq also identified RGN-independent genomic breakpoint 'hotspots'. Finally, GUIDE-seq revealed that truncated guide RNAs exhibit substantially reduced RGN-induced, off-target DSBs. Our experiments define the most rigorous framework for genome-wide identification of RGN off-target effects to date and provide a method for evaluating the safety of these nucleases before clinical use.

2014
Riggi, N., et al. EWS-FLI1 Utilizes Divergent Chromatin Remodeling Mechanisms to Directly Activate or Repress Enhancer Elements in Ewing Sarcoma. Cancer Cell 26, 5, 668-81 (2014).

The aberrant transcription factor EWS-FLI1 drives Ewing sarcoma, but its molecular function is not completely understood. We find that EWS-FLI1 reprograms gene regulatory circuits in Ewing sarcoma by directly inducing or repressing enhancers. At GGAA repeat elements, which lack evolutionary conservation and regulatory potential in other cell types, EWS-FLI1 multimers induce chromatin opening and create de novo enhancers that physically interact with target promoters. Conversely, EWS-FLI1 inactivates conserved enhancers containing canonical ETS motifs by displacing wild-type ETS transcription factors. These divergent chromatin-remodeling patterns repress tumor suppressors and mesenchymal lineage regulators while activating oncogenes and potential therapeutic targets, such as the kinase VRK1. Our findings demonstrate how EWS-FLI1 establishes an oncogenic regulatory program governing both tumor survival and differentiation.

Aryee, M.J., et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics (2014).

MOTIVATION: The recently released Infinium HumanMethylation450 array (the '450k' array) provides a high-throughput assay to quantify DNA methylation (DNAm) at ∼450 000 loci across a range of genomic features. Although less comprehensive than high-throughput sequencing-based techniques, this product is more cost-effective and promises to be the most widely used DNAm high-throughput measurement technology over the next several years. RESULTS: Here we describe a suite of computational tools that incorporate state-of-the-art statistical techniques for the analysis of DNAm data. The software is structured to easily adapt to future versions of the technology. We include methods for preprocessing, quality assessment and detection of differentially methylated regions from the kilobase to the megabase scale. We show how our software provides a powerful and flexible development platform for future methods. We also illustrate how our methods empower the technology to make discoveries previously thought to be possible only with sequencing-based methods. AVAILABILITY AND IMPLEMENTATION: http://bioconductor.org/packages/release/bioc/html/minfi.html. CONTACT: khansen@jhsph.edu; rafa@jimmy.harvard.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Tsai, S.Q., et al. Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing. Nat Biotechnol (2014).
Monomeric CRISPR-Cas9 nucleases are widely used for targeted genome editing but can induce unwanted off-target mutations with high frequencies. Here we describe dimeric RNA-guided FokI nucleases (RFNs) that can recognize extended sequences and edit endogenous genes with high efficiencies in human cells. RFN cleavage activity depends strictly on the binding of two guide RNAs (gRNAs) to DNA with a defined spacing and orientation, substantially reducing the likelihood that a suitable target site will occur more than once in the genome and therefore improving specificities relative to wild-type Cas9 monomers. RFNs guided by a single gRNA generally induce lower levels of unwanted mutations than matched monomeric Cas9 nickases. In addition, we describe a simple method for expressing multiple gRNAs bearing any 5' end nucleotide, which gives dimeric RFNs a broad targeting range. RFNs combine the ease of RNA-based targeting with the specificity enhancement inherent to dimerization and are likely to be useful in applications that require highly precise genome editing.
Zou, J., Lippert, C., Heckerman, D., Aryee, M. & Listgarten, J. Epigenome-wide association studies without the need for cell-type composition. Nat Methods 11, 3, 309-11 (2014).
In epigenome-wide association studies, cell-type composition often differs between cases and controls, yielding associations that simply tag cell type rather than reveal fundamental biology. Current solutions require actual or estimated cell-type composition, information not easily obtainable for many samples of interest. We propose a method, FaST-LMM-EWASher, that automatically corrects for cell-type composition without the need for explicit knowledge of it, and then validate our method by comparison with the state-of-the-art approach. Corresponding software is available from http://www.microsoft.com/science/.
Liu, Y., et al. GeMes, Clusters of DNA Methylation under Genetic Control, Can Inform Genetic and Epigenetic Analysis of Disease. Am J Hum Genet 94, 4, 485-95 (2014).
Epigenetic marks such as DNA methylation have generated great interest in the study of human disease. However, studies of DNA methylation have not established population-epigenetics principles to guide design, efficient statistics, or interpretation. Here, we show that the clustering of correlated DNA methylation at CpGs was similar to that of linkage-disequilibrium (LD) correlation in genetic SNP variation but for much shorter distances. Some clustering of methylated CpGs appeared to be genetically driven. Further, a set of correlated methylated CpGs related to a single SNP-based LD block was not always physically contiguous: segments of uncorrelated methylation as long as 300 kb could be interspersed in the cluster. Thus, we denoted these sets of correlated CpGs as GeMes, defined as potentially noncontiguous methylation clusters under the control of one or more methylation quantitative trait loci. This type of correlated methylation structure has implications for both biological functions of DNA methylation and for the design, analysis, and interpretation of epigenome-wide association studies.
2013
Aryee, M.J., et al. DNA methylation alterations exhibit intraindividual stability and interindividual heterogeneity in prostate cancer metastases. Sci Transl Med 5, 169, 169ra10 (2013).
Human cancers almost ubiquitously harbor epigenetic alterations. Although such alterations in epigenetic marks, including DNA methylation, are potentially heritable, they can also be dynamically altered. Given this potential for plasticity, the degree to which epigenetic changes can be subject to selection and act as drivers of neoplasia has been questioned. We carried out genome-scale analyses of DNA methylation alterations in lethal metastatic prostate cancer and created DNA methylation "cityscape" plots to visualize these complex data. We show that somatic DNA methylation alterations, despite showing marked interindividual heterogeneity among men with lethal metastatic prostate cancer, were maintained across all metastases within the same individual. The overall extent of maintenance in DNA methylation changes was comparable to that of genetic copy number alterations. Regions that were frequently hypermethylated across individuals were markedly enriched for cancer- and development/differentiation-related genes. Additionally, regions exhibiting high consistency of hypermethylation across metastases within individuals, even if variably hypermethylated across individuals, showed enrichment for cancer-related genes. Whereas some regions showed intraindividual metastatic tumor heterogeneity in promoter methylation, such methylation alterations were generally not correlated with gene expression. This was despite a general tendency for promoter methylation patterns to be strongly correlated with gene expression, particularly at regions that were variably methylated across individuals. These findings suggest that DNA methylation alterations have the potential for producing selectable driver events in carcinogenesis and disease progression and highlight the possibility of targeting such epigenome alterations for development of longitudinal markers and therapeutic strategies.
Liu, Y.*, et al. Epigenome-wide association data implicate DNA methylation as an intermediary of genetic risk in rheumatoid arthritis. Nat Biotechnol (2013).

Epigenetic mechanisms integrate genetic and environmental causes of disease, but comprehensive genome-wide analyses of epigenetic modifications have not yet demonstrated robust association with common diseases. Using Illumina HumanMethylation450 arrays on 354 anti-citrullinated protein antibody-associated rheumatoid arthritis cases and 337 controls, we identified two clusters within the major histocompatibility complex (MHC) region whose differential methylation potentially mediates genetic risk for rheumatoid arthritis. To reduce confounding factors that have hampered previous epigenome-wide studies, we corrected for cellular heterogeneity by estimating and adjusting for cell-type proportions in our blood-derived DNA samples and used mediation analysis to filter out associations likely to be a consequence of disease. Four CpGs also showed an association between genotype and variance of methylation. The associations for both clusters replicated at least one CpG (P < 0.01), with the rest showing suggestive association, in monocyte cell fractions in an independent cohort of 12 cases and 12 controls. Thus, DNA methylation is a potential mediator of genetic risk.

2012
Fischer Walker, C.L., Perin, J., Aryee, M.J., Boschi-Pinto, C. & Black, R.E. Diarrhea incidence in low- and middle-income countries in 1990 and 2010: a systematic review. BMC Public Health 12, 220 (2012).
BACKGROUND: Diarrhea is recognized as a leading cause of morbidity and mortality among children under 5 years of age in low- and middle-income countries yet updated estimates of diarrhea incidence by age for these countries are greatly needed. We conducted a systematic literature review to identify cohort studies that sought to quantify diarrhea incidence among any age group of children 0-59 mo of age. METHODS: We used the Expectation-Maximization algorithm as a part of a two-stage regression model to handle diverse age data and overall incidence rate variation by study to generate country specific incidence rates for low- and middle-income countries for 1990 and 2010. We then calculated regional incidence rates and uncertainty ranges using the bootstrap method, and estimated the total number of episodes for children 0-59 mo of age in 1990 and 2010. RESULTS: We estimate that incidence has declined from 3.4 episodes/child year in 1990 to 2.9 episodes/child year in 2010. As was the case previously, incidence rates are highest among infants 6-11 mo of age; 4.5 episodes/child year in 2010. Among these 139 countries there were nearly 1.9 billion episodes of childhood diarrhea in 1990 and nearly 1.7 billion episodes in 2010. CONCLUSIONS: Although our results indicate that diarrhea incidence rates may be declining slightly, the total burden on the health of each child due to multiple episodes per year is tremendous and additional funds are needed to improve both prevention and treatment practices in low- and middle-income countries.
Easwaran, H.*, et al. A DNA hypermethylation module for the stem/progenitor cell signature of cancer. Genome Res 22, 5, 837-49 (2012).

Many DNA-hypermethylated cancer genes are occupied by the Polycomb (PcG) repressor complex in embryonic stem cells (ESCs). Their prevalence in the full spectrum of cancers, the exact context of chromatin involved, and their status in adult cell renewal systems are unknown. Using a genome-wide analysis, we demonstrate that ~75% of hypermethylated genes are marked by PcG in the context of bivalent chromatin in both ESCs and adult stem/progenitor cells. A large number of these genes are key developmental regulators, and a subset, which we call the "DNA hypermethylation module," comprises a portion of the PcG target genes that are down-regulated in cancer. Genes with bivalent chromatin have a low, poised gene transcription state that has been shown to maintain stemness and self-renewal in normal stem cells. However, when DNA-hypermethylated in tumors, we find that these genes are further repressed. We also show that the methylation status of these genes can cluster important subtypes of colon and breast cancers. By evaluating the subsets of genes that are methylated in different cancers with consideration of their chromatin status in ESCs, we provide evidence that DNA hypermethylation preferentially targets the subset of PcG genes that are developmental regulators, and this may contribute to the stem-like state of cancer. Additionally, the capacity for global methylation profiling to cluster tumors by phenotype may have important implications for further refining tumor behavior patterns that may ultimately aid therapeutic interventions.

Lee, H., et al. DNA methylation shows genome-wide association of NFIX, RAPGEF2 and MSRB3 with gestational age at birth. Int J Epidemiol 41, 1, 188-99 (2012).
BACKGROUND: Gestational age at birth strongly predicts neonatal, adolescent and adult morbidity and mortality through mostly unknown mechanisms. Identification of specific genes that are undergoing regulatory change prior to birth, such as through changes in DNA methylation, would increase our understanding of developmental changes occurring during the third trimester and consequences of pre-term birth (PTB). METHODS: We performed a genome-wide analysis of DNA methylation (using microarrays, specifically CHARM 2.0) in 141 newborns collected in Baltimore, MD, using novel statistical methodology to identify genomic regions associated with gestational age at birth. Bisulphite pyrosequencing was used to validate significant differentially methylated regions (DMRs), and real-time PCR was performed to assess functional significance of differential methylation in a subset of newborns. RESULTS: We identified three DMRs at genome-wide significance levels adjacent to the NFIX, RAPGEF2 and MSRB3 genes. All three regions were validated by pyrosequencing, and RAPGEF2 also showed an inverse correlation between DNA methylation levels and gene expression levels. Although the three DMRs appear very dynamic with gestational age in our newborn sample, adult DNA methylation levels at these regions are stable and of equal or greater magnitude than the oldest neonate, directionally consistent with the gestational age results. CONCLUSIONS: We have identified three differentially methylated regions associated with gestational age at birth. All three nearby genes play important roles in the development of several organs, including skeletal muscle, brain and haematopoietic system. Therefore, they may provide initial insight into the basis of PTB's negative health outcomes. The genome-wide custom DNA methylation array technology and novel statistical methods employed in this study could constitute a model for epidemiologic studies of epigenetic variation.
Walker, C.F.L., Aryee, M.J., Boschi-Pinto, C. & Black, R.E. Estimating diarrhea mortality among young children in low and middle income countries. PLoS One 7, 1, e29151 (2012).
BACKGROUND: Diarrhea remains one of the leading causes of morbidity and mortality among children under 5 years of age, but in many low and middle-income countries where vital registration data are lacking, updated estimates with regard to the proportion of deaths attributable to diarrhea are needed. METHODS: We conducted a systematic literature review to identify studies reporting diarrhea proportionate mortality for children 1-59 mo of age published between 1980 and 2009. Using the published proportionate mortality estimates and country level covariates we constructed a logistic regression model to estimate country and regional level proportionate mortality and estimated uncertainty bounds using Monte-Carlo simulations. FINDINGS: We identified more than 90 verbal autopsy studies from around the world to contribute data to a single-cause model. We estimated diarrhea proportionate mortality for 84 countries in 6 regions and found that diarrhea accounts for between 10.0% of deaths in the Americas and 31.3% of deaths in the South-east Asian region. DISCUSSION: Diarrhea remains a leading cause of death for children 1-59 mo of age. Published literature can be used to create a single-cause mortality disease model to estimate mortality for countries lacking vital registration data.
Mathias, D.K., et al. Expression, immunogenicity, histopathology, and potency of a mosquito-based malaria transmission-blocking recombinant vaccine. Infect Immun 80, 4, 1606-14 (2012).
Vaccines have been at the forefront of global research efforts to combat malaria, yet despite several vaccine candidates, this goal has yet to be realized. A potentially effective approach to disrupting the spread of malaria is the use of transmission-blocking vaccines (TBV), which prevent the development of malarial parasites within their mosquito vector, thereby abrogating the cascade of secondary infections in humans. Since malaria is transmitted to human hosts by the bite of an obligate insect vector, mosquito species in the genus Anopheles, targeting mosquito midgut antigens that serve as ligands for Plasmodium parasites represents a promising approach to breaking the transmission cycle. The midgut-specific anopheline alanyl aminopeptidase N (AnAPN1) is highly conserved across Anopheles vectors and is a putative ligand for Plasmodium ookinete invasion. We have developed a scalable, high-yield Escherichia coli expression and purification platform for the recombinant AnAPN1 TBV antigen and report on its marked vaccine potency and immunogenicity, its capacity for eliciting transmission-blocking antibodies, and its apparent lack of immunization-associated histopathologies in a small-animal model.
Sabunciyan, S.*, et al. Genome-wide DNA methylation scan in major depressive disorder. PLoS One 7, 4, e34451 (2012).

While genome-wide association studies are ongoing to identify sequence variation influencing susceptibility to major depressive disorder (MDD), epigenetic marks, such as DNA methylation, which can be influenced by environment, might also play a role. Here we present the first genome-wide DNA methylation (DNAm) scan in MDD. We compared 39 postmortem frontal cortex MDD samples to 26 controls. DNA was hybridized to our Comprehensive High-throughput Arrays for Relative Methylation (CHARM) platform, covering 3.5 million CpGs. CHARM identified 224 candidate regions with DNAm differences >10%. These regions are highly enriched for neuronal growth and development genes. Ten of 17 regions for which validation was attempted showed true DNAm differences; the greatest were in PRIMA1, with 12-15% increased DNAm in MDD (p = 0.0002-0.0003), and a concomitant decrease in gene expression. These results must be considered pilot data, however, as we could only test replication in a small number of additional brain samples (n = 16), which showed no significant difference in PRIMA1. Because PRIMA1 anchors acetylcholinesterase in neuronal membranes, decreased expression could result in decreased enzyme function and increased cholinergic transmission, consistent with a role in MDD. We observed decreased immunoreactivity for acetylcholinesterase in MDD brain with increased PRIMA1 DNAm, non-significant at p = 0.08. While we cannot draw firm conclusions about PRIMA1 DNAm in MDD, the involvement of neuronal development genes across the set showing differential methylation suggests a role for epigenetics in the illness. Further studies using limbic system brain regions might shed additional light on this role.

Chittenden, T.W., et al. nEASE: a method for gene ontology subclassification of high-throughput gene expression data. Bioinformatics 28, 5, 726-8 (2012).
High-throughput technologies can identify genes whose expression profiles correlate with specific phenotypes; however, placing these genes into a biological context remains challenging. To help address this issue, we developed nested Expression Analysis Systematic Explorer (nEASE). nEASE complements traditional gene ontology enrichment approaches by determining statistically enriched gene ontology subterms within a list of genes based on co-annotation. Here, we overview an open-source software version of the nEASE algorithm. nEASE can be used either stand-alone or as part of a pathway discovery pipeline. AVAILABILITY: nEASE is implemented within the Multiple Experiment Viewer software package available at http://www.tm4.org/mev. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
