# Assignments

Contact: Caleb Lareau (Teaching Associate)

Please submit all completed homeworks as a `.pdf` file to the course webpage on Canvas.

## Homework 1

Exploring R and GWAS

Due: Wednesday, November 1

Code hint for working with Problems 1 and 2:

``````r
library(qqman)     # manhattan()
library(tidyr)     # separate()
library(magrittr)  # %>% pipe

# Import summary statistics; fread's `cmd` argument runs zcat and reads its output
gwas <- data.frame(data.table::fread(cmd = paste0("zcat < ", "../data/jointGwasMc_LDL.txt.gz"),
                                     showProgress = FALSE))

# Get chromosome / basepair information from the "chrN:position" field
gwas <- gwas %>% separate(SNP_hg19, c("CHR", "BP"), ":")

# Clean up the raw data: numeric chromosome and base-pair position
gwas$CHR <- as.numeric(as.character(gsub("chr", "", gwas$CHR)))
gwas$BP <- as.numeric(gwas$BP)
# Keep only SNPs with P > 1e-100
gwas <- gwas[gwas$P.value > 10^-100,]

# Make plot and save it to disk rather than the R graphics output
png("mymanhattan.png")
manhattan(gwas, p = "P.value")
dev.off()
``````

Windows users who want to use `zcat` can get it by installing Git for Windows: https://git-for-windows.github.io/
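The cleanup steps in the hint can be checked on a few rows before running them on the full file. The sketch below is a base-R stand-in, not part of the assignment: it uses a made-up three-row data frame with the same `SNP_hg19` and `P.value` columns, and `strsplit()` in place of `tidyr::separate()` so it runs without extra packages.

``````r
# Made-up rows with the same column layout as the hint's summary statistics
toy <- data.frame(
  SNP_hg19 = c("chr1:1000", "chr2:2500", "chr19:3000"),
  P.value  = c(0.5, 1e-8, 1e-150),
  stringsAsFactors = FALSE
)

# Split "chrN:position" into its two parts (base-R stand-in for
# tidyr::separate(SNP_hg19, c("CHR", "BP"), ":"), which also drops SNP_hg19)
parts <- do.call(rbind, strsplit(toy$SNP_hg19, ":", fixed = TRUE))
toy$CHR <- as.numeric(gsub("chr", "", parts[, 1]))
toy$BP  <- as.numeric(parts[, 2])
toy$SNP_hg19 <- NULL

# Same filter as the hint: keep only SNPs with P > 1e-100
toy <- toy[toy$P.value > 10^-100, ]
print(toy)
``````

The third row (P = 1e-150) is removed by the filter, and `CHR` and `BP` come out numeric, which is what `manhattan()` expects.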

## Homework 2

Pedigrees and Hardy-Weinberg Equilibrium

Due: Monday, November 6

### Hints

- Make the rare disease assumption. In other words, if an individual appears in a pedigree but is not part of the core pedigree, assume that he or she is not a carrier of the trait.

- For the last problem, consider the setting where we sample individuals as adults and measure their genotypes. Would HWE hold? Why or why not?
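Under Hardy-Weinberg equilibrium, a biallelic marker with allele frequencies p and q = 1 - p has expected genotype frequencies p^2, 2pq, and q^2. As a minimal sketch (the helper `hwe_test()` and the counts are hypothetical), the usual chi-square goodness-of-fit check compares observed genotype counts against those expectations, with 1 degree of freedom:

``````r
# Hypothetical helper: chi-square test of Hardy-Weinberg equilibrium for one
# biallelic marker from observed genotype counts (AA, Aa, aa)
hwe_test <- function(n_AA, n_Aa, n_aa) {
  n <- n_AA + n_Aa + n_aa
  p <- (2 * n_AA + n_Aa) / (2 * n)          # estimated frequency of allele A
  q <- 1 - p
  observed <- c(n_AA, n_Aa, n_aa)
  expected <- n * c(p^2, 2 * p * q, q^2)    # expected genotype counts under HWE
  chisq <- sum((observed - expected)^2 / expected)
  # 3 genotype classes - 1 - 1 estimated allele frequency = 1 degree of freedom
  list(p = p, expected = expected, chisq = chisq,
       p.value = pchisq(chisq, df = 1, lower.tail = FALSE))
}

in_hwe  <- hwe_test(n_AA = 360, n_Aa = 480, n_aa = 160)  # hypothetical counts
out_hwe <- hwe_test(n_AA = 500, n_Aa = 200, n_aa = 300)  # too few heterozygotes
``````

Both hypothetical samples have allele frequency p = 0.6, but only the first has the genotype proportions HWE predicts; the second gives a chi-square statistic of about 340.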

## Homework 3

Genome-wide association studies

For the homework, replace

``````r
PC <- prcomp(t(geno))
allDF <- data.frame(ancestry = pheno$ancestry, gender = pheno$gender,
                    Y = pheno$Y, PC$rotation[,1:4])
``````

with this:

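``````r
library(irlba)

# irlba computes a truncated SVD (5 singular vectors by default), which is
# typically much faster than prcomp's full decomposition on a large matrix
PC <- irlba(t(geno))
allDF <- data.frame(ancestry = pheno$ancestry, gender = pheno$gender,
                    Y = pheno$Y, PC$v[,1:4])
``````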
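Why `PC$v` can stand in for `PC$rotation`: `prcomp()` takes the SVD of the column-centered matrix and returns its right singular vectors as `$rotation`, while `irlba()` returns right singular vectors as `$v`. The toy check below uses random data and base R's `svd()` only (irlba is not called). One hedged caveat: as far as we know, `irlba()` does not center by default (it accepts an optional `center` argument), so its `$v` need not match `prcomp()`'s `$rotation` exactly.

``````r
# Toy check on random data (not the homework genotypes)
set.seed(1)
X <- matrix(rnorm(50 * 8), nrow = 50, ncol = 8)

pc <- prcomp(X)                               # centers columns by default
Xc <- scale(X, center = TRUE, scale = FALSE)  # the same centering, done explicitly
s  <- svd(Xc)                                 # right singular vectors in s$v

# Singular vectors are defined only up to sign, so compare absolute values
same_up_to_sign <- isTRUE(all.equal(abs(unname(pc$rotation[, 1:4])),
                                    abs(s$v[, 1:4])))
print(same_up_to_sign)
``````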

Due: Monday, November 13

Data for Homework 3:

Filename: `genotype.txt.gz`

Description: An n x m matrix of genotypes, coded as 0, 1, 2, or NA for missing. Each row holds one individual's genotypes across all m genetic markers; each column holds the genotypes of all n individuals at a single marker.
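Since rows are individuals and columns are markers, per-marker summaries are column operations and per-individual summaries are row operations. A small sketch on a made-up 4 x 3 matrix (not the homework data), computing missingness rates both ways:

``````r
# Made-up 4 x 3 genotype matrix: rows = individuals, columns = markers,
# coded 0/1/2 with NA for a missing call
geno_toy <- matrix(c( 0, 1,  2,
                      1, 1, NA,
                      2, 0,  2,
                     NA, 1,  1),
                   nrow = 4, byrow = TRUE)

marker_missing <- colMeans(is.na(geno_toy))  # fraction of individuals missing at each marker
indiv_missing  <- rowMeans(is.na(geno_toy))  # fraction of markers missing for each individual
``````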

Filename: `legend.txt`

Description: Chromosome, position, and alleles for each SNP. The ith SNP corresponds to the ith column in `genotype.txt.gz`. Alleles under “X1” are the effect alleles, and the genotype codes count copies of X1. For example, if (X0, X1) = (C, T), then 0, 1, and 2 in the genotype file correspond to CC, TC, and TT, respectively.
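Because the codes count copies of the X1 allele, they can be decoded into genotype strings, and half the mean code gives the X1 allele frequency. The helpers `decode_genotype()` and `x1_freq()` below are hypothetical, for illustration only:

``````r
# Hypothetical helper: decode 0/1/2 codes (copies of X1) into genotype strings
# for one SNP, given its X0 and X1 alleles from legend.txt
decode_genotype <- function(code, x0, x1) {
  labels <- c(paste0(x0, x0), paste0(x1, x0), paste0(x1, x1))
  labels[code + 1]   # NA codes stay NA
}

# Hypothetical helper: X1 (effect) allele frequency, ignoring missing calls
x1_freq <- function(code) mean(code, na.rm = TRUE) / 2

decode_genotype(c(0, 1, 2, NA), x0 = "C", x1 = "T")  # "CC" "TC" "TT" NA
x1_freq(c(0, 1, 2, NA))                              # 0.5
``````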

Filename: `phenotype.txt`

Description: Primary outcome (1 = case, 0 = control), gender (F = female, M = male), and reported ancestry (European, African) for each individual. The ith individual corresponds to the ith row in `genotype.txt.gz`.
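The assignment text does not fix a particular association model; one common choice is an additive logistic regression of case status on allele count, adjusting for covariates such as gender, ancestry, or the PC columns in `allDF`. A sketch on simulated data (not the homework files), with hypothetical variable names:

``````r
# Simulated stand-in data (NOT the homework files)
set.seed(42)
n <- 500
sim <- data.frame(
  snp      = rbinom(n, size = 2, prob = 0.3),   # copies of the X1 allele
  gender   = factor(sample(c("F", "M"), n, replace = TRUE)),
  ancestry = factor(sample(c("European", "African"), n, replace = TRUE))
)
# Outcome (1 = case, 0 = control) with a true log-odds ratio of 0.5 per X1 copy
sim$Y <- rbinom(n, size = 1, prob = plogis(-1 + 0.5 * sim$snp))

# Additive logistic regression of case status on allele count plus covariates
fit   <- glm(Y ~ snp + gender + ancestry, data = sim, family = binomial)
coefs <- summary(fit)$coefficients
snp_est <- coefs["snp", "Estimate"]    # log-odds ratio per X1 copy
snp_p   <- coefs["snp", "Pr(>|z|)"]    # Wald test p-value
``````

In a genome-wide scan, a fit like this would be repeated for each marker (column) and the p-values collected, for example for a Manhattan plot.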

## Homework 4

Rare variant analysis

Due: Monday, November 27

For this assignment, we have a slight change of pace: we will review some recent literature that applies principles of statistical genetics to understanding the etiology of multiple sclerosis.

## Homework 5

LD score regression / GWAS + epigenetics

Due: Monday, December 4