Contact: Caleb Lareau (Teaching Associate)
Please submit all completed homeworks as a
Exploring R and GWAS
Due: Wednesday, November 1
Code hint for working with Problems 1 and 2:
library(qqman)
library(tidyr)
library(magrittr)

# Import summary statistics
gwas <- data.frame(data.table::fread(paste0("zcat < ", "../data/jointGwasMc_LDL.txt.gz"), showProgress = FALSE))

# Get chromosome / basepair information
gwas <- gwas %>% separate(SNP_hg19, c("CHR", "BP"), ":")

# Clean up the raw data
gwas$CHR <- as.numeric(gsub("chr", "", gwas$CHR))
gwas$BP <- as.numeric(gwas$BP)
gwas <- gwas[gwas$P.value > 10^-100, ]

# Make plot and save it to disk rather than the R graphics output
png("mymanhattan.png")
manhattan(gwas, p = "P.value")
dev.off()
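To see what the split and cleanup steps in the hint do, here is a minimal base R sketch on a made-up three-row data frame. The toy values and the `significant` column are assumptions for illustration only; the 5e-8 cutoff is the conventional genome-wide significance threshold, not something specified in the assignment.

```r
# Toy stand-in for the summary statistics (values are made up for illustration)
toy <- data.frame(
  SNP_hg19 = c("chr1:1000", "chr2:2500", "chr10:42"),
  P.value  = c(0.5, 1e-120, 3e-9),
  stringsAsFactors = FALSE
)

# Split "chrN:pos" into chromosome and base-pair columns
parts <- do.call(rbind, strsplit(toy$SNP_hg19, ":", fixed = TRUE))
toy$CHR <- as.numeric(sub("chr", "", parts[, 1]))
toy$BP  <- as.numeric(parts[, 2])

# Drop extremely small p-values, as in the hint
toy <- toy[toy$P.value > 10^-100, ]

# Flag hits below the conventional genome-wide threshold (an assumption here)
toy$significant <- toy$P.value < 5e-8
print(toy)
```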
For Windows users wanting to use
Visit this site and consider downloading the toolkit:
Pedigrees and Hardy-Weinberg Equilibrium
Due: Monday, November 6
- Make the rare disease assumption. In other words, if an individual is present in a pedigree and not part of the core pedigree, assume that he/she is not a carrier for the trait.
- For the last problem, consider the setting where we are sampling individuals as adults and measuring their genotypes. Would HWE hold? Why or why not?
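As a reminder of the mechanics behind an HWE check, the sketch below computes expected genotype counts and the usual chi-square goodness-of-fit statistic for a single biallelic marker. The genotype counts are hypothetical and are not taken from the homework.

```r
# Hypothetical genotype counts at one biallelic marker (AA, Aa, aa)
observed <- c(AA = 50, Aa = 40, aa = 10)
n <- sum(observed)

# Frequency of allele A: each AA carries two copies, each Aa carries one
p <- (2 * observed[["AA"]] + observed[["Aa"]]) / (2 * n)
q <- 1 - p

# Expected counts under HWE: n * (p^2, 2pq, q^2)
expected <- n * c(AA = p^2, Aa = 2 * p * q, aa = q^2)

# Chi-square goodness-of-fit statistic, compared to 1 degree of freedom
chisq <- sum((observed - expected)^2 / expected)
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)
```

A large statistic (small `p_value`) would suggest a departure from HWE at that marker.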
Genome-wide association studies
FOR THE HOMEWORK:
# Principal components of the genotype matrix; rows of PC$rotation correspond to individuals
PC <- prcomp(t(geno))
allDF <- data.frame(ancestry = pheno$ancestry, gender = pheno$gender, Y = pheno$Y, PC$rotation[, 1:4])
# Alternative: truncated SVD with irlba; rows of PC$v correspond to individuals
library(irlba)
PC <- irlba(t(geno))
allDF <- data.frame(ancestry = pheno$ancestry, gender = pheno$gender, Y = pheno$Y, PC$v[, 1:4])
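One way a data frame like `allDF` might then be used is a single-marker logistic regression that adjusts for the first four PCs. The sketch below runs on simulated stand-ins for `geno` and `pheno` (not the homework data), and the `snp` column and model formula are assumptions for illustration. The toy matrix has no missing genotypes; real data with `NA` entries would need to be handled before calling `prcomp`.

```r
set.seed(1)  # reproducible toy data

n <- 60   # individuals (toy size)
m <- 200  # markers (toy size)

# Simulated stand-ins for geno and pheno; not the homework data
geno  <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n, ncol = m)
pheno <- data.frame(Y = rbinom(n, size = 1, prob = 0.5))

# Same PCA call as in the hint; rows of PC$rotation correspond to individuals
PC <- prcomp(t(geno))
allDF <- data.frame(Y = pheno$Y, PC$rotation[, 1:4])

# Logistic regression of case status on one marker, adjusting for 4 PCs
allDF$snp <- geno[, 1]
fit <- glm(Y ~ snp + PC1 + PC2 + PC3 + PC4, data = allDF, family = binomial)

# Estimate, standard error, z value, and p-value for the marker
snp_result <- coef(summary(fit))["snp", ]
print(snp_result)
```

Looping the same model over every column of `geno` would give one p-value per marker.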
Due: Monday, November 13
Data for Homework 3:
Description: An n x m matrix of genotypes. Genotypes are coded as 0, 1, 2, or NA for missing. Each row provides an individual’s genotype across all m genetic markers. Each column provides the genotypes of all n individuals at a single marker.
Description: Chromosome, position, and alleles for each SNP. Here, the ith SNP corresponds to the ith column in genotype.txt. The allele under “X1” is the effect allele, so each genotype value counts copies of X1. For example, if (X0, X1)=(C,T), then 0, 1, and 2 in genotype.txt correspond to CC, TC, and TT, respectively.
Description: Primary outcome (1=case,0=control), gender (F=female, M=male), and reported ancestry (European, African) for each individual. Here, the ith individual corresponds to the ith row in genotype.txt.
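Given the 0/1/2/NA coding described above, per-marker summaries such as call rate and X1 allele frequency follow directly from column means. Here is a minimal sketch on a made-up 4 x 3 matrix; the values and variable names are assumptions, not the homework data.

```r
# Toy genotype matrix: 4 individuals (rows) x 3 markers (columns), filled by column
geno <- matrix(c(0, 1, 2, NA,
                 0, 0, 1, 0,
                 2, NA, 2, 1),
               nrow = 4, ncol = 3)

# Call rate: fraction of non-missing genotypes at each marker
call_rate <- colMeans(!is.na(geno))

# X1 allele frequency: mean copy number among called genotypes, divided by 2
freq_x1 <- colMeans(geno, na.rm = TRUE) / 2

# Minor allele frequency: the smaller of the two allele frequencies
maf <- pmin(freq_x1, 1 - freq_x1)
```

The same three lines apply unchanged to a full n x m genotype matrix once it has been read in.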
Rare variant analysis
Due: Monday, November 27
For this assignment, we have a slight change of pace: we will review some recent literature that applies principles of statistical genetics to understand the etiology of multiple sclerosis.
LD score regression / GWAS + epigenetics
Due: Monday, December 4