Introduction

You’ve recently discovered that there are in fact two types of cholesterol – both good (HDL) and bad (LDL). You are worried that you may have a genetic predisposition to having high levels of LDL cholesterol and decide to investigate what genes may be associated with LDL cholesterol. Here, we’ll do a basic exploration of the data before investigating the specific genetic effects in subsequent lectures. We encourage you to use R markdown to include both your code and plots in the same document and then submit a pdf to Canvas.

Problem 1

Access the LDL summary statistics from the Global Lipids Genetics Consortium (GLGC) genome-wide association study. Visualize the summary statistics both as a Manhattan Plot and as a Q-Q plot. What chromosome(s) appear to have genome-wide significant hits?

Summary statistic files were downloaded from a link on the UMich website that was directed from the LDHub collection of files.

library(qqman)
library(tidyr)
library(dplyr)
library(R.utils)
# Import summary statistics
setwd("C:/Users/Michele/Documents/GitHub/BST227/Solutions/2018/Homework2/")
pathToDataFile <- "C:/Users/Michele/Documents/GitHub/BST227/Solutions/2018/Homework2/jointGwasMc_LDL.txt.gz"
gwas$beta <- as.numeric(gwas$beta); gwas$se <- as.numeric(gwas$se);
gwas$N <- as.numeric(gwas$N); gwas$P.value <- as.numeric(gwas$P.value);
gwas$Freq.A1.1000G.EUR <- as.numeric(gwas$Freq.A1.1000G.EUR)

# Alternatively these lines make work for other users:
#gwas <- as.data.frame(data.table::fread(paste0("zcat < ", pathToDataFile)))

# Get chromosome / basepair information
gwas <- gwas %>% separate(SNP_hg19, c("CHR", "BP"), ":")

# Cleanup the raw data
gwas$CHR <- as.numeric(as.character(gsub("chr", "", gwas$CHR))) # gets rid of the "chr" in "chr10"
gwas$BP <- as.numeric(gwas$BP)
gwas <- gwas %>% filter(P.value > 10^-100)

# Make plot
manhattan(gwas, p = "P.value")
# Find chromosomes with p-values greater than threshold
gwas$transform.P <- -log10(gwas$P.value)
gwasSigHits <- gwas[gwas$transform.P > -log10(5*10^-8),] sort(unique(gwasSigHits$CHR))
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 16 17 19 20 22

From the above plot and output we see that all chromosomes except chromosomes 15, 18, and 21 have SNPs with a genome-wide significant association (red line at $$-\log_{10}(5\times 10^{-8})$$.

# make qq plot
qq(gwas$P.value) Problem 2 As part of the GLGC Consortium, the group analyzed data for a different SNP array, the Metabochip. Visualize the summary statistics both as a Manhattan Plot and as a Q-Q plot. What chromosome(s) appear to have genome-wide significant hits? Hint: Download the data using the “LDL Cholesterol”" link found under the “Analysis of Metabochip Data” heading of the GLGC page from Problem 1. # Import summary statistics pathToMetaboFile <- "C:/Users/Michele/Documents/GitHub/BST227/Solutions/2018/Homework2/Mc_LDL.txt.gz" metabo <- as.data.frame(read.table(pathToMetaboFile, header = T)) # Alternatively, this line for reading in the data may work for some users #metabo <- data.frame(data.table::fread(paste0("zcat < ","../data/Mc_LDL.txt.gz"), # showProgress = FALSE)) # Get chromosome / basepair information metabo <- metabo %>% separate(SNP_hg19, c("CHR", "BP"), ":") # Cleanup the raw data metabo$CHR <- as.numeric(as.character(gsub("chr", "", metabo$CHR))) metabo$BP <- as.numeric(metabo\$BP)
metabo <- metabo %>% filter(P.value > 10^-100)

# Make plot
manhattan(metabo, p = "P.value")