Introduction

You’ve recently discovered that there are in fact two types of cholesterol – both good (HDL) and bad (LDL). You are worried that you may have a genetic predisposition to having high levels of LDL cholesterol and decide to investigate what genes may be associated with LDL cholesterol. Here, we’ll do a basic exploration of the data before investigating the specific genetic effects in subsequent lectures. We encourage you to use R markdown to include both your code and plots in the same document and then submit a pdf to Canvas.

Problem 1

Access the LDL summary statistics from the Global Lipids Genetics Consortium (GLGC) genome-wide association study. Visualize the summary statistics both as a Manhattan Plot and as a Q-Q plot. What chromosome(s) appear to have genome-wide significant hits?

Hint: The data can be found through LDHub: http://ldsc.broadinstitute.org/ldhub/. Click on “Get Started with LD Hub” > “Sign in with your Google account” > “Go GWAShare Center”and then click on the link for “LDL.sumstats_deGC.gz” in the leftmost column. This takes you to the GLGC website where you can read about the data and then download the file by clicking on the “LDL Cholesterol” link under “Joint Analysis of Metabochip and GWAS Data.” Look back at Lab 2 for helpful R packages and functions for visualizing the data.

Summary statistic files were downloaded from a link on the UMich website that was directed from the LDHub collection of files.

library(qqman)
library(tidyr)
library(dplyr)
library(R.utils)
# Import summary statistics
setwd("C:/Users/Michele/Documents/GitHub/BST227/Solutions/2018/Homework2/")
pathToDataFile <- "C:/Users/Michele/Documents/GitHub/BST227/Solutions/2018/Homework2/jointGwasMc_LDL.txt.gz"
gwas <- as.data.frame(read.table(pathToDataFile, header = T))
gwas$beta <- as.numeric(gwas$beta); gwas$se <- as.numeric(gwas$se);
gwas$N <- as.numeric(gwas$N); gwas$P.value <- as.numeric(gwas$P.value);
gwas$Freq.A1.1000G.EUR <- as.numeric(gwas$Freq.A1.1000G.EUR)

# Alternatively these lines make work for other users:
#gwas <- as.data.frame(data.table::fread(paste0(pathToDataFile), showProgress=TRUE))
#gwas <- as.data.frame(data.table::fread(paste0("zcat < ", pathToDataFile)))

# Get chromosome / basepair information
gwas <- gwas %>% separate(SNP_hg19, c("CHR", "BP"), ":")

# Cleanup the raw data
gwas$CHR <- as.numeric(as.character(gsub("chr", "", gwas$CHR))) # gets rid of the "chr" in "chr10"
gwas$BP <- as.numeric(gwas$BP)
gwas <- gwas %>% filter(P.value > 10^-100)

# Make plot
manhattan(gwas, p = "P.value")
Manhattan plot of joint GWAS results

Manhattan plot of joint GWAS results

# Find chromosomes with p-values greater than threshold
gwas$transform.P  <- -log10(gwas$P.value)
gwasSigHits <- gwas[gwas$transform.P > -log10(5*10^-8),]
sort(unique(gwasSigHits$CHR))
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 16 17 19 20 22

From the above plot and output we see that all chromosomes except chromosomes 15, 18, and 21 have SNPs with a genome-wide significant association (red line at \(-\log_{10}(5\times 10^{-8})\).

# make qq plot
qq(gwas$P.value)
qqplot of GWAS results

qqplot of GWAS results

Problem 2

As part of the GLGC Consortium, the group analyzed data for a different SNP array, the Metabochip. Visualize the summary statistics both as a Manhattan Plot and as a Q-Q plot. What chromosome(s) appear to have genome-wide significant hits?

Hint: Download the data using the “LDL Cholesterol”" link found under the “Analysis of Metabochip Data” heading of the GLGC page from Problem 1.

# Import summary statistics
pathToMetaboFile <- "C:/Users/Michele/Documents/GitHub/BST227/Solutions/2018/Homework2/Mc_LDL.txt.gz"
metabo <- as.data.frame(read.table(pathToMetaboFile, header = T))

# Alternatively, this line for reading in the data may work for some users
#metabo <- data.frame(data.table::fread(paste0("zcat < ","../data/Mc_LDL.txt.gz"),
#                                       showProgress = FALSE))

# Get chromosome / basepair information
metabo <- metabo %>% separate(SNP_hg19, c("CHR", "BP"), ":")

# Cleanup the raw data
metabo$CHR <- as.numeric(as.character(gsub("chr", "", metabo$CHR)))
metabo$BP <- as.numeric(metabo$BP)
metabo <- metabo %>% filter(P.value > 10^-100)

# Make plot
manhattan(metabo, p = "P.value")