Previously, we looked at summary statistics for LDL using a large cohort. Here, we’ll see how those summary statistics are actually generated. We’ll look at a case-control outcome for IBD for chromosome 10 using simulated data.

**A) Measure the association of each SNP on the phenotype without adjusting for any other covariates. B) Show a qq-plot and manhattan plot as we did in homework 1 using the qqman package. C) Compute \(\lambda_{GC}\) as we did in homework 1. What does the computed value of \(\lambda_{GC}\) say about the degree of inflation for this chromosome?**

Here in one code chunk, we can compute all three regression models for 1A, 2A, and 2B. The final data frame “result” will have the SNP coordinates, alleles, and then 3 p-values for each of the associated models.

```
library(data.table)
library(irlba)
# Import Data
geno <- data.matrix(data.frame(fread(paste0("zcat <", "../../Assignments/Homework3/data/genotype.txt.gz"), header = FALSE)))
```

```
##
Read 0.0% of 6081 rows
Read 6081 rows and 34405 (of 34405) columns from 0.394 GB file in 00:00:15
```

```
pheno <- read.table("../../Assignments/Homework3/data/phenotype.txt", header = TRUE)
snpPos <- read.table("../../Assignments/Homework3/data/legend.txt", header = TRUE)
# QC
snpMissingRate <- colMeans(is.na(geno))
personMissingRate <- rowMeans(is.na(geno))
geno <- geno[personMissingRate <= 0.05, snpMissingRate <= 0.05]
snpPos <- snpPos[snpMissingRate <= 0.05,]
pheno <- pheno[personMissingRate <= 0.05,]
# Handle Missingness
for (j in 1:dim(geno)[2]) {
temp <- geno[,j]
geno[,j][is.na(temp)] <- median(temp,na.rm = TRUE) #replace missing with median
geno[,j] <- (geno[,j]-mean(geno[,j]))/sd(geno[,j]) #normalization / standarization
}
# Perform PCA
PC <- irlba(geno)
allDF <- data.frame(ancestry = pheno$ancestry, gender = pheno$gender,
Y = pheno$Y, PC$u)
# Initialize the data frame of results
result <- snpPos
result$noadjp <- 1; result$ancestryp <- 1; result$PCp <- 1
# Loop over each yariant
for (j in 1:dim(geno)[2]) {
# Fit models
noadj.mod <- summary(glm(allDF$Y~geno[,j],family=binomial()))$coeff
ancestry.mod<- summary(glm(allDF$Y~geno[,j]+allDF$gender+allDF$ancestry,
family=binomial()))$coeff
PC4.mod <- summary(glm(allDF$Y~geno[,j]+allDF$gender+allDF$X1+allDF$X2+allDF$X3+allDF$X4,
family=binomial()))$coeff
# Overwrite the results data frame
result[j,c("noadjp","ancestryp","PCp")] <- c(noadj.mod[2,4],ancestry.mod[2,4],PC4.mod[2,4])
}
```

The summary statistics can be visualized from the model here:

`qqman::manhattan(result, chr = "chr", bp = "position", p = "noadjp")`