BST227

Assignments

Contact: Caleb Lareau (Teaching Associate)

Please submit each completed homework as a .pdf file to the course webpage on Canvas.

Homework 1

Exploring R and GWAS

Due: Wednesday, November 1

Code hint for working with Problems 1 and 2:

library(qqman)
library(tidyr)
library(magrittr)

# Import summary statistics
gwas <- data.frame(data.table::fread(paste0("zcat < ","../data/jointGwasMc_LDL.txt.gz"),
                                     showProgress = FALSE))

# Get chromosome / basepair information
gwas <- gwas %>% separate(SNP_hg19, c("CHR", "BP"), ":")

# Cleanup the raw data
gwas$CHR <- as.numeric(as.character(gsub("chr", "", gwas$CHR)))
gwas$BP <- as.numeric(gwas$BP)
gwas <- gwas[gwas$P.value > 10^-100,]

# Make plot and save it to disk rather than the R graphics output
png("mymanhattan.png")
manhattan(gwas, p = "P.value")
dev.off()

For Windows users wanting to use zcat

Visit this site and consider downloading the toolkit:

https://git-for-windows.github.io/
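
If installing extra tools is not an option, base R can also read a gzipped file through a gzfile() connection, with no zcat required. The sketch below shows this on a tiny made-up file written to a temporary directory. The column names mirror the ones used in the hint above; that the real file is tab-delimited with a header row is an assumption to check against the data.

```r
# Toy stand-in for the real summary statistics (values are made up)
toy <- data.frame(SNP_hg19 = c("chr1:1000", "chr2:2000"),
                  P.value  = c(0.5, 1e-8))
path <- file.path(tempdir(), "toy_gwas.txt.gz")

# Write a gzipped, tab-delimited file
con <- gzfile(path, "w")
write.table(toy, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

# Read it back without zcat: gzfile() decompresses on the fly
gwas_toy <- read.delim(gzfile(path), stringsAsFactors = FALSE)
print(gwas_toy)
```

Pointing gzfile() at ../data/jointGwasMc_LDL.txt.gz should work the same way, although it will likely be slower than data.table::fread on a file of that size.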

Homework 2

Pedigrees and Hardy-Weinberg Equilibrium

Due: Monday, November 6

Hints:

- Make the rare disease assumption. In other words, if an individual appears in a
pedigree but is not part of the core pedigree, assume that he/she is not a
carrier for the trait.

- For the last problem, consider the setting where we sample individuals as adults
and measure their genotypes. Would HWE hold? Why or why not?
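
As background for the HWE hint, here is a minimal sketch of how expected genotype counts under HWE can be compared with observed counts. The genotype counts are made up for illustration and are not taken from the homework; the chi-square test with 1 degree of freedom is one common choice, not necessarily the one the problem set expects.

```r
# Hypothetical genotype counts for a biallelic marker (made-up numbers)
obs <- c(AA = 380, Aa = 440, aa = 180)
n <- sum(obs)

# Frequency of allele A estimated from the genotype counts
p <- (2 * obs[["AA"]] + obs[["Aa"]]) / (2 * n)
q <- 1 - p

# Expected counts under HWE: n * (p^2, 2pq, q^2)
expected <- n * c(AA = p^2, Aa = 2 * p * q, aa = q^2)

# Chi-square goodness-of-fit statistic, 1 degree of freedom
chisq <- sum((obs - expected)^2 / expected)
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)
print(c(chisq = chisq, p_value = p_value))
```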

Homework 3

Genome-wide association studies

FOR THE HOMEWORK:

Replace

PC <- prcomp(t(geno))
allDF <- data.frame(ancestry = pheno$ancestry, gender = pheno$gender,
                    Y = pheno$Y, PC$rotation[,1:4])


WITH THIS:

library(irlba)

PC <- irlba(t(geno))
allDF <- data.frame(ancestry = pheno$ancestry, gender = pheno$gender,
                    Y = pheno$Y, PC$v[,1:4])
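
Some background on why the swap works: prcomp() centers its input and takes a full SVD, and its rotation matrix holds the right singular vectors. irlba() computes only the leading singular vectors, which is much faster on a large genotype matrix; by default it does not center its input, so its vectors need not match prcomp() exactly. The sketch below uses only base R and a simulated toy matrix (not the homework data) to show the prcomp/SVD relationship.

```r
set.seed(1)
# Toy genotype matrix: 20 individuals (rows) x 50 markers (columns), coded 0/1/2
geno_toy <- matrix(rbinom(20 * 50, size = 2, prob = 0.3), nrow = 20)

x  <- t(geno_toy)   # markers x individuals, same orientation as prcomp(t(geno))
pc <- prcomp(x)     # centers each column by default
sv <- svd(scale(x, center = TRUE, scale = FALSE))

# Compare the first four directions, ignoring sign
max_diff <- max(abs(abs(pc$rotation[, 1:4]) - abs(sv$v[, 1:4])))
print(max_diff)
print(dim(sv$v[, 1:4]))   # one row per individual, four columns
```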

Due: Monday, November 13

Data for Homework 3:

Filename: genotype.txt.gz

Description: An n x m matrix of genotypes. Genotypes are coded as 0, 1, 2, or NA for missing.
Each row provides an individual’s genotype across all m genetic markers.
Each column provides the genotypes of all n individuals at a single marker.

Filename: legend.txt

Description: Chromosome, position, and alleles for each SNP.
Here, the ith SNP corresponds to the ith column in genotype.txt. The allele under “X1” is the effect allele.
For example, if (X0, X1)=(C,T), then 0, 1, and 2 in genotype.txt correspond to CC, TC, and TT, respectively.

Filename: phenotype.txt

Description: Primary outcome (1=case,0=control), gender (F=female, M=male),
and reported ancestry (European, African) for each individual.
Here, the ith individual corresponds to the ith row in genotype.txt.
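
To make the 0/1/2 coding concrete, the sketch below translates coded genotypes back into allele pairs using the (X0, X1) convention from the legend. The helper name decode_genotype is hypothetical and written only for this illustration.

```r
# Translate a 0/1/2 genotype code into an allele pair.
# x0 and x1 are the two alleles from the legend; the code counts copies of x1.
decode_genotype <- function(code, x0, x1) {
  out <- rep(NA_character_, length(code))   # missing genotypes stay NA
  out[which(code == 0)] <- paste0(x0, x0)
  out[which(code == 1)] <- paste0(x1, x0)
  out[which(code == 2)] <- paste0(x1, x1)
  out
}

decoded <- decode_genotype(c(0, 1, 2, NA), x0 = "C", x1 = "T")
print(decoded)
```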

Homework 4

Rare variant analysis

Due: Monday, November 27

For this assignment, we have a slight change of pace: we will review some recent literature that applies principles of statistical genetics to understand the etiology of multiple sclerosis.

Homework 5

LD score regression / GWAS + epigenetics

Due: Monday, December 4