1 Introduction

The SNPassoc package contains facilities for data manipulation, tools for exploratory data analysis, convenient graphical facilities, and tools for assessing genetic association for both quantitative and categorial (case-control) traits in whole genome approaches. Genome-based studies are normally analyzed using a multistage approach. In the first step researchers are interested in assessing association between the outcome and thousands of SNPs. In a second and possibly third step, medium/large scale studies are performed in which only a few hundred of SNPs, those with a putative association found in the first step, are genotyped. SNPassoc is specially designed for analyzing this kind of designs. In addition, a convenience-based approach (select variants on the basis of logistical considerations such as the ease and cost of genotyping) can also be analyzed using SNPassoc. Different genetic models are also implemented in the package. Analysis of multiple SNPs can be analyzed using either haplotype or gene-gene interaction approaches.

This document is an updated version of the initial vignette that was published with the SNPassoc paper González et al. (2007). It contains a more realistic example belonging to a real dataset. The original vignette is still available here.

2 Data loading

SNP data are typically available in text format or Excel spreadsheets which are easily uploaded in R as a data frame. Here, as an illustrative example, we are analyzing a dataset containing epidemiological information and 51 SNPs from a case-control study on asthma. The data is available within SNPassoc and can be loaded by

Then, the data is loaded into the R session by

data(asthma, package = "SNPassoc")
str(asthma, list.len=9)
'data.frame':   1578 obs. of  57 variables:
 $ country    : Factor w/ 10 levels "Australia","Belgium",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ gender     : Factor w/ 2 levels "Females","Males": 2 2 2 1 1 1 1 2 1 1 ...
 $ age        : num  42.8 50.2 46.7 47.9 48.4 ...
 $ bmi        : num  20.1 24.7 27.7 33.3 25.2 ...
 $ smoke      : int  1 0 0 0 0 1 0 0 0 0 ...
 $ casecontrol: int  0 0 0 0 1 0 0 0 0 0 ...
 $ rs4490198  : Factor w/ 3 levels "AA","AG","GG": 3 3 3 2 2 2 3 2 2 2 ...
 $ rs4849332  : Factor w/ 3 levels "GG","GT","TT": 3 2 3 2 1 2 3 3 2 1 ...
 $ rs1367179  : Factor w/ 3 levels "CC","GC","GG": 2 2 2 3 3 3 2 3 3 3 ...
  [list output truncated]
asthma[1:5, 1:8]
  country  gender      age      bmi smoke casecontrol rs4490198 rs4849332
1 Germany   Males 42.80630 20.14797     1           0        GG        TT
2 Germany   Males 50.22861 24.69136     0           0        GG        GT
3 Germany   Males 46.68857 27.73230     0           0        GG        TT
4 Germany Females 47.86311 33.33187     0           0        AG        GT
5 Germany Females 48.44079 25.23634     0           1        AG        GG

We observe that we have case-control status (0: control, 1: asthma) and another 4 variables encoding the country of origin, gender, age, body mass index (bmi) and smoking status (0: no smoker, 1: ex-smoker, 2: current smoker). There are 51 SNPs whose genotypes are given by the alleles names.

3 Descriptive analysis

To start the analysis, we must indicate which columns of the dataset asthma contain the SNP data, using the setupSNP function. In our example, SNPs start from column 7 onwards, which we specify in argument colSNPs

library(SNPassoc)
asthma.s <- setupSNP(data=asthma, colSNPs=7:ncol(asthma), sep="")

This is an alternative way of determining the columns containing the SNPs

idx <- grep("^rs", colnames(asthma))
asthma.s <- setupSNP(data=asthma, colSNPs=idx, sep="")

The argument sep indicates the character separating the alleles. The default value is ’‘/´´. In our case, there is no separating character, so that, we set sep="". The argument name.genotypes can be used when genotypes are available in other formats, such as 0, 1, 2 or’‘norm´´,’‘het´´,’’mut´´. The purpose of the setupSNP function is to assign the class snp to the SNPs variables, to which SNPassoc methods will be applied. The function labels the most common genotype across subjects as the reference one. When numerous SNPs are available, the function can be parallelized through the argument mc.cores that indicates the number of processors to be used. We can verify that the SNP variables are given the new class snp

head(asthma.s$rs1422993)
[1] G/G G/T G/G G/T G/T G/G
Genotypes: G/G G/T T/T
Alleles:  G T 
class(asthma.s$rs1422993)
[1] "snp"    "factor"

and summarize their content with summary

summary(asthma.s$rs1422993)
Genotypes: 
    frequency percentage
G/G       903  57.224335
G/T       570  36.121673
T/T       105   6.653992

Alleles: 
  frequency percentage
G      2376   75.28517
T       780   24.71483

HWE (p value): 0.250093 

which shows the genotype and allele frequencies for a given SNP, testing for Hardy-Weinberg equilibrium (HWE). We can also visualize the results in a plot by

plot(asthma.s$rs1422993)
SNP summary. Bar chart showing the basic information of a given SNP

Figure 1: SNP summary
Bar chart showing the basic information of a given SNP

The argument type helps to get a pie chart

plot(asthma.s$rs1422993, type=pie)
SNP summary. Pie chart showing the basic information of a given SNP

Figure 2: SNP summary
Pie chart showing the basic information of a given SNP

The summary function can also be applied to the whole dataset

summary(asthma.s, print=FALSE)
           alleles major.allele.freq HWE      missing (%)
rs4490198  A/G     59.2              0.174133  0.6       
rs4849332  G/T     61.8              0.522060  0.1       
rs1367179  G/C     81.4              0.738153  1.0       
rs11123242 C/T     81.7              0.932898  0.6       
rs13014858 G/A     58.3              0.351116  0.1       
rs1430094  G/A     66.9              0.305509  0.4       
rs1430093  C/A     66.6              0.817701  3.5       
rs746710   G/C     51.5              0.614368  0.0       
rs1430090  T/G     70.0              0.025180  1.6       
rs6737251  C/T     69.3              0.235996  0.3       
rs11685217 C/T     80.1              0.009462  4.5       
rs1430097  C/A     65.1              0.738166  1.0       
rs10496465 A/G     85.8              0.917997  0.6       
rs3756688  T/C     63.9              0.154632  0.6       
rs2303063  A/G     53.0              0.722069  1.1       
rs1422993  G/T     75.3              0.250093  0.0       
rs2400478  G/A     62.6              0.256786  0.9       
rs714588   A/G     54.9              0.838329  0.8       
rs1023555  T/A     76.8              0.943443  0.5       
rs898070   G/A     62.6              1.000000  0.6       
rs963218   C/T     53.2              0.389387  0.3       
rs1419835  C/T     78.2              0.505391  0.6       
rs765023   T/C     64.5              0.030513  6.9       
rs1345267  A/G     61.0              0.112183  0.1       
rs324381   G/A     64.6              0.242223 11.6       
rs184448   T/G     55.9              0.008446  2.2       
rs324396   C/T     71.2              0.197291  0.3       
rs324957   G/A     57.0              0.007417  0.4       
rs324960   C/T     66.6              0.077777  1.1       
rs10486657 C/T     81.3              0.672703  4.3       
rs324981   A/T     53.2              0.048438  0.2       
rs1419780  C/G     80.8              0.569652  0.2       
rs325462   T/A     51.0              0.337862  0.3       
rs727162   G/C     78.5              0.765708  0.0       
rs10250709 G/A     65.4              0.266434  0.0       
rs6958905  T/C     64.7              0.377472  0.4       
rs10238983 T/C     75.7              0.216435  0.4       
rs4941643  A/G     54.1              0.635887  7.2       
rs3794381  C/G     71.7              0.652089  7.1       
rs2031532  G/A     65.0              0.911918  0.0       
rs2247119  T/C     71.5              0.457710  0.5       
rs8000149  T/C     63.2              0.588077  0.4       
rs2274276  G/C     57.0              0.571386  0.6       
rs7332573  G/T     91.5              0.869947  1.5       
rs3829366  T/A     51.6              0.722626  1.3       
rs6084432  G/A     83.7              0.266716  0.6       
rs512625   G/A     69.5              0.905395  0.4       
rs3918395  G/T     86.8              0.508732  1.2       
rs2787095  G/C     60.2              0.102053  0.8       
rs2853215  G/A     73.0              0.249516  0.2       

showing the SNP labels with minor/major allele format, the major allele frequency the HWE test and the percentage of missing genotypes. Missing values can be further explored plotting with

plotMissing(asthma.s, print.labels.SNPs = FALSE)
Missing genotypes. Black squares shows missing genotuype information of asthma data example.

Figure 3: Missing genotypes
Black squares shows missing genotuype information of asthma data example.

This plot can be used to inspect if missing values appear randomly across individuals and SNPs. In our case, we can see that the missing pattern may be considered random, except for three clusters in consecutive SNPs (large black squares). These individuals should be further checked for possible problems with genotyping.

4 Hardy-Weinberg equilibrium

Genotyping of SNPs needs to pass quality control measures. Aside from technical details that need to be considered for filtering SNPs with low quality, genotype calling error can be detected by a HWE test. The test compares the observed genotype frequencies with those expected under random mating, which follows when the SNPs are in the absence of selection, mutation, genetic drift, or other forces. Therefore, HWE must be checked only in controls. There are several tests described in the literature to verify HWE. In SNPassoc HWE is tested for all the bi-allelic SNP markers using a fast exact test Wigginton, Cutler, and Abecasis (2005) implemented in the tableHWE function.

hwe <- tableHWE(asthma.s)
head(hwe)
           HWE (p value)
rs4490198      0.1741325
rs4849332      0.5220596
rs1367179      0.7381531
rs11123242     0.9328981
rs13014858     0.3511162
rs1430094      0.3055089

We observe that the first SNPs in the dataset are under HWE since their P-values rejecting the HWE hypothesis (null hypothesis) are larger than 0.05. However, when tested in control samples only, by stratifying by cases and controls

hwe2 <- tableHWE(asthma.s, casecontrol)

#SNPs is HWE in the whole sample but not controls
snpNHWE <- hwe2[,1]>0.05 & hwe2[,2]<0.05
rownames(hwe2)[snpNHWE]
[1] "rs1345267"
hwe2[snpNHWE,]
all groups          0          1 
0.11218285 0.04956349 0.81604706 

We see that rs1345267 is not in HWE within controls because its P-value is <0.05. Notice that one is interested in keeping those SNPsthat do not reject the null hypothesis. As several SNPs are tested, multiple comparisons must be considered. In this particular setting, a threshold of 0.001 is normally considered. As a quality control measure, it is not necessary to be as conservative as in those situations where false discovery rates need to be controlled.

SNPs that do not pass the HWE test must be removed form further analyses. We can recall setupSNP and indicate the columns of the SNPs to be kept

snps.ok <- rownames(hwe2)[hwe2[,2]>=0.001]
pos <- which(colnames(asthma)%in%snps.ok, useNames = FALSE)
asthma.s <- setupSNP(asthma, pos, sep="")

Note that in the variable pos we redefine the SNP variables to be considered as class snp.

5 SNP association analysis

We are interested in finding those SNPs associated with asthma status that is encoded in the variable casecontrol. We first illustrate the association between case-control status and the SNP rs1422993. The association analysis for all genetic models is performed by the function association that regresses casecontrol on the variable rs1422993 from the dataset asthma.s that already contains the SNP variables of class snp.

association(casecontrol ~ rs1422993, data = asthma.s)

SNP: rs1422993  adjusted by: 
                0    %   1    %   OR lower upper  p-value  AIC
Codominant                                                    
G/G           730 59.0 173 50.9 1.00             0.017768 1642
G/T           425 34.3 145 42.6 1.44  1.12  1.85              
T/T            83  6.7  22  6.5 1.12  0.68  1.84              
Dominant                                                      
G/G           730 59.0 173 50.9 1.00             0.007826 1642
G/T-T/T       508 41.0 167 49.1 1.39  1.09  1.77              
Recessive                                                     
G/G-G/T      1155 93.3 318 93.5 1.00             0.877863 1649
T/T            83  6.7  22  6.5 0.96  0.59  1.57              
Overdominant                                                  
G/G-T/T       813 65.7 195 57.4 1.00             0.005026 1641
G/T           425 34.3 145 42.6 1.42  1.11  1.82              
log-Additive                                                  
0,1,2        1238 78.5 340 21.5 1.22  1.01  1.47 0.040151 1644

The function association follows the usual syntax of R modelling functions with the difference that the variables in the model that are of class snp are tested using different genetic models. In our example, we observe that all genetic models but the recessive one are statistically significant. association also fits the overdominant model, which compares the two homozygous genotypes versus the heterozygous one. This genetic model of inheritance is biologically rare although it has been linked to sickle cell anemia in humans. The result table describes the number of individuals in each genotype across cases and controls. The ORs and CI-95% are also computed. The last column describes the AIC (Akaike information criteria) that can be used to decide which is the best model of inheritance; the lower the better the model is.

In the example, one may conclude that rs1422993 is associated with asthma and that, for instance, the risk of being asthmatic is 39% higher in people having at least one alternative allele (T) with respect to individuals having none (dominant model). This risk is statistically significant since the CI-95% does not contain 1, the P-value is 0.0078<0.022, or the P-value of the max-statistics is 0.01

maxstat(asthma.s$casecontrol, asthma.s$rs1422993)
     dominant recessive log-additive MAX-statistic Pr(>z)  
[1,]    7.073     0.024        4.291         7.073 0.0182 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

If an expected model of inheritance is hypothesized, the association analysis for the model can be specified in the argument model, which by default test all models,

association(casecontrol ~ rs1422993, asthma.s, model="dominant")

SNP: rs1422993  adjusted by: 
           0  %   1    %   OR lower upper  p-value  AIC
Dominant                                               
G/G      730 59 173 50.9 1.00             0.007826 1642
G/T-T/T  508 41 167 49.1 1.39  1.09  1.77              

Association tests are typically adjusted by covariates, which are incorporated in the model in the usual form

association(casecontrol ~ rs1422993 + country + smoke, asthma.s)

SNP: rs1422993  adjusted by: country smoke 
                0    %   1    %   OR lower upper p-value  AIC
Codominant                                                   
G/G           728 59.1 173 51.0 1.00             0.06940 1408
G/T           423 34.3 144 42.5 1.38  1.05  1.82             
T/T            81  6.6  22  6.5 1.06  0.61  1.85             
Dominant                                                     
G/G           728 59.1 173 51.0 1.00             0.03429 1407
G/T-T/T       504 40.9 166 49.0 1.33  1.02  1.73             
Recessive                                                    
G/G-G/T      1151 93.4 317 93.5 1.00             0.79338 1411
T/T            81  6.6  22  6.5 0.93  0.54  1.60             
Overdominant                                                 
G/G-T/T       809 65.7 195 57.5 1.00             0.02147 1406
G/T           423 34.3 144 42.5 1.37  1.05  1.80             
log-Additive                                                 
0,1,2        1232 78.4 339 21.6 1.19  0.96  1.46 0.11191 1409

ORs for stratified analysis on given categorical covariates are used to verify whether the risk is constant across groups

association(casecontrol ~ rs1422993 + survival::strata(gender), asthma.s)

SNP: rs1422993  adjusted by: survival::strata(gender) 
                0    %   1    %   OR lower upper  p-value  AIC
Codominant                                                    
G/G           730 59.0 173 50.9 1.00             0.022940 1634
G/T           425 34.3 145 42.6 1.42  1.11  1.83              
T/T            83  6.7  22  6.5 1.09  0.66  1.80              
Dominant                                                      
G/G           730 59.0 173 50.9 1.00             0.011144 1633
G/T-T/T       508 41.0 167 49.1 1.37  1.07  1.74              
Recessive                                                     
G/G-G/T      1155 93.3 318 93.5 1.00             0.805330 1640
T/T            83  6.7  22  6.5 0.94  0.58  1.53              
Overdominant                                                  
G/G-T/T       813 65.7 195 57.4 1.00             0.006378 1632
G/T           425 34.3 145 42.6 1.41  1.10  1.80              
log-Additive                                                  
0,1,2        1238 78.5 340 21.5 1.21  1.00  1.46 0.055231 1636

We can see, for instance, that the dominant model is significant only in males. The subset argument allows fitting the model in a subgroup of individuals

association(casecontrol ~ rs1422993, asthma.s, 
                subset=country=="Spain")

SNP: rs1422993  adjusted by: 
               0    %  1    %   OR lower upper p-value   AIC
Codominant                                                  
G/G          179 54.6 22 44.9 1.00              0.3550 295.2
G/T          125 38.1 24 49.0 1.56  0.84  2.91              
T/T           24  7.3  3  6.1 1.02  0.28  3.66              
Dominant                                                    
G/G          179 54.6 22 44.9 1.00              0.2059 293.7
G/T-T/T      149 45.4 27 55.1 1.47  0.81  2.70              
Recessive                                                   
G/G-G/T      304 92.7 46 93.9 1.00              0.7576 295.2
T/T           24  7.3  3  6.1 0.83  0.24  2.85              
Overdominant                                                
G/G-T/T      203 61.9 25 51.0 1.00              0.1502 293.2
G/T          125 38.1 24 49.0 1.56  0.85  2.85              
log-Additive                                                
0,1,2        328 87.0 49 13.0 1.23  0.77  1.96  0.3816 294.5

These analyses can be also be performed in quantitative traits, such as body mass index, since association function automatically selects the error distribution of the regression analysis (either Gaussian or binomial).

association(bmi ~ rs1422993, asthma.s) 

SNP: rs1422993  adjusted by: 
                n    me     se       dif   lower  upper p-value  AIC
Codominant                                                          
G/G           896 25.53 0.1446  0.000000                 0.9069 9069
G/T           565 25.50 0.1834 -0.027059 -0.4874 0.4332             
T/T           105 25.71 0.4676  0.178076 -0.7057 1.0619             
Dominant                                                            
G/G           896 25.53 0.1446  0.000000                 0.9818 9067
G/T-T/T       670 25.54 0.1710  0.005089 -0.4324 0.4426             
Recessive                                                           
G/G-G/T      1461 25.52 0.1135  0.000000                 0.6694 9067
T/T           105 25.71 0.4676  0.188540 -0.6769 1.0540             
Overdominant                                                        
G/G-T/T      1001 25.55 0.1383  0.000000                 0.8424 9067
G/T           565 25.50 0.1834 -0.045739 -0.4965 0.4050             
log-Additive                                                        
0,1,2                           0.033951 -0.3153 0.3832  0.8489 9067

For BMI, association tests whether the difference between means is statistically significant, rather than computing an OR.

For multiple SNP data, our objective is to identify the variants that are significantly associated with the trait. The most basic strategy is, therefore, to fit an association test like the one described above for each of the SNPs in the dataset and determine which of those associations are significant. The massive univariate testing is the most widely used analysis method for omic data because of its simplicity. In SNPassoc, this type of analysis is done with the function WGassociation

ans <- WGassociation(casecontrol, data=asthma.s)
head(ans)
           comments codominant dominant recessive overdominant log-additive
rs4490198         -    0.52765  0.29503   0.96400      0.29998      0.49506
rs4849332         -    0.96912  0.92986   0.84806      0.82327      0.97049
rs1367179         -    0.62775  0.59205   0.35786      0.86419      0.43994
rs11123242        -    0.68622  0.67596   0.39801      0.92878      0.52009
rs13014858        -    0.52578  0.26739   0.88011      0.34966      0.40897
rs1430094         -    0.13375  0.10569   0.54432      0.04490      0.36611

Here, only the outcome is required in the formula argument (first argument) since the function successively calls association on each of the variables of class snp within data. The function returns the P-values of association of each SNP under each genetic model. Covariates can also be introduced in the model

ans.adj <- WGassociation(casecontrol ~ country + smoke, asthma.s)
head(ans.adj)

SNPassoc is computationally limited on large genomic data. The computing time can be reduced by parallelization, specifying in the argument mc.cores the number of computing cores to be used. Alternatively, the function scanWGassociation, a C compiled function, can be used to compute a predetermined genetic model across all SNPs, passed in the argument model, which by default is the additive model

ans.fast <- scanWGassociation(casecontrol, asthma.s)

NOTE: This function is not available on the SNPassoc version available on CRAN. The user can install the development version available on GitHub to get access to this function just executing

devtools::install_github("isglobal-brge/SNPassoc")

The P-values obtained from massive univariate analyses are visualized with the generic plot function

plot(ans)
Manhattan-type plots for different genetic models. P-values in -log10 scale to assess the association between case-control status and SNPs in the asthma example.

Figure 4: Manhattan-type plots for different genetic models
P-values in -log10 scale to assess the association between case-control status and SNPs in the asthma example.

This produces a Manhattan plot of the -log10(P-values) for all the SNPs over all models. It shows the nominal level of significance and the Bonferroni level, which is the level corrected by the multiple testing across all SNPs. The overall hypothesis of massive univariate association tests is whether there is any SNP that is significantly associated with the phenotype. As multiple SNPs are tested, the probability of finding at least one significant finding increases if we do not lower the significance threshold. The Bonferroni correction lowers the threshold by the number of SNPs tested (0.0001=0.05/51). In the Manhattan plotof our analysis, we see that no SNP is significant at the Bonferroni level, and therefore there is no SNP that is significantly associated with asthma.

Maximum-statistic (see González et al. (2008)) can also be used to test association between asthma status and SNPs

ans.max <- maxstat(asthma.s, casecontrol)
ans.max
           dominant recessive log-additive MAX-statistic  Pr(>z)   
rs4490198     1.097     0.002        0.466         1.097 0.50207   
rs4849332     0.008     0.037        0.001         0.037 0.97650   
rs1367179     0.287     0.845        0.602         0.845 0.58872   
rs11123242    0.175     0.714        0.417         0.714 0.63910   
rs13014858    1.230     0.023        0.683         1.230 0.46364   
rs1430094     2.617     0.368        0.821         2.617 0.20687   
rs1430093     1.051     0.042        0.743         1.051 0.51949   
rs746710      0.728     0.679        1.051         1.051 0.51615   
rs1430090     0.172     0.463        0.000         0.463 0.74286   
rs6737251     0.143     0.156        0.217         0.217 0.86897   
rs11685217    0.894     0.030        0.705         0.894 0.56948   
rs1430097     0.003     0.183        0.029         0.183 0.88847   
rs10496465    0.003     0.020        0.008         0.020 0.98729   
rs3756688     0.016     0.738        0.266         0.738 0.62631   
rs2303063     0.060     1.271        0.658         1.271 0.45282   
rs1422993     7.073     0.024        4.291         7.073 0.01782 * 
rs2400478     1.662     0.056        1.055         1.662 0.35902   
rs714588      0.659     0.061        0.150         0.659 0.65848   
rs1023555     0.221     0.104        0.261         0.261 0.84688   
rs898070      0.020     1.794        0.346         1.794 0.33132   
rs963218      0.165     0.190        0.263         0.263 0.84374   
rs1419835     1.084     0.775        0.295         1.084 0.51082   
rs765023      1.959     0.526        0.483         1.959 0.30285   
rs1345267     2.582     0.003        1.383         2.582 0.21185   
rs324381      0.175     0.402        0.377         0.402 0.77455   
rs184448      9.710     2.026        8.253         9.710 0.00309 **
rs324396      2.312     0.188        1.051         2.312 0.24768   
rs324957      7.598     2.134        7.157         7.598 0.01322 * 
rs324960      2.901     7.928        6.443         7.928 0.01161 * 
rs10486657    0.054     0.500        0.001         0.500 0.73041   
rs324981      3.168     2.431        4.270         4.270 0.08166 . 
rs1419780     0.123     1.193        0.005         1.193 0.47801   
rs325462      1.545     1.645        2.412         2.412 0.23233   
rs727162      3.022     0.643        3.074         3.074 0.15950   
rs10250709    0.737     0.007        0.360         0.737 0.62865   
rs6958905     0.490     0.099        0.447         0.490 0.73263   
rs10238983    0.094     0.692        0.328         0.692 0.64736   
rs4941643     1.294     0.006        0.465         1.294 0.44616   
rs3794381     0.127     0.998        0.489         0.998 0.53537   
rs2031532     0.025     0.240        0.127         0.240 0.85726   
rs2247119     0.379     0.000        0.229         0.379 0.78437   
rs8000149     0.020     0.043        0.043         0.043 0.97292   
rs2274276     0.005     0.034        0.023         0.034 0.97787   
rs7332573     1.247     1.703        1.825         1.825 0.32896   
rs3829366     1.137     0.448        0.069         1.137 0.49192   
rs6084432     2.999     0.848        3.279         3.279 0.14166   
rs512625      2.180     0.030        1.119         2.180 0.26371   
rs3918395     0.903     0.414        0.455         0.903 0.57133   
rs2787095     0.178     0.069        0.184         0.184 0.88683   
rs2853215     0.686     0.931        1.130         1.130 0.49337   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We note that even under the max-statistics none of the SNPs tested is significant under the Bonferroni correction (<0.0001) for multiple SNP testing

#minimum P-value across SNPs
min(ans.max["Pr(>z)",])
[1] 0.003085532

Information for specific association models for given SNPs can also be retrieved with WGstats

infoTable <- WGstats(ans)

Therefore, we can have access to the results for a given SNP by

infoTable$rs1422993

SNP: rs1422993  adjusted by: 
                0    %   1    %   OR lower upper  p-value  AIC
Codominant                                                    
G/G           730 59.0 173 50.9 1.00             0.017768 1642
G/T           425 34.3 145 42.6 1.44  1.12  1.85              
T/T            83  6.7  22  6.5 1.12  0.68  1.84              
Dominant                                                      
G/G           730 59.0 173 50.9 1.00             0.007826 1642
G/T-T/T       508 41.0 167 49.1 1.39  1.09  1.77              
Recessive                                                     
G/G-G/T      1155 93.3 318 93.5 1.00             0.877863 1649
T/T            83  6.7  22  6.5 0.96  0.59  1.57              
Overdominant                                                  
G/G-T/T       813 65.7 195 57.4 1.00             0.005026 1641
G/T           425 34.3 145 42.6 1.42  1.11  1.82              
log-Additive                                                  
0,1,2        1238 78.5 340 21.5 1.22  1.01  1.47 0.040151 1644

recovering our previous results given by association function.

NOTE: The R output of specific association analyses can be exported into LaTeX by using getNiceTable function and xtable R package. The following code creates a table for the SNPs rs1422993 and rs184448

library(xtable)
out <- getNiceTable(ans[c("rs1422993", "rs184448")])

nlines <- attr(out, "nlines")
hlines <- c(-1, -1, 0, cumsum(nlines+1), nrow(out), nrow(out))

print(xtable(out, caption='Genetic association using
                different genetic models from asthma 
                data example of rs1422993 and rs184448 
                SNPs obtained with SNPassoc.',
             label = 'tab-2SNPs'),
      tabular.enviroment="longtable", file="tableSNPs",
      floating=FALSE,  include.rownames = FALSE, 
      hline.after= hlines, sanitize.text.function=identity)

6 Gene-environment and gene-gene interactions

Gene-enviroment (GxE) analyses can be performed within SNPassoc using association function. Assume that we are interested in testing whether the risk of rs1422993 for asthma under the dominant model is different among smokers (variable smoke; 0=never, 1=ever). This code fits a model with an interaction term where the environmental variable is required to be a factor factor variable.

association(casecontrol ~ dominant(rs1422993)*factor(smoke), 
            data=asthma.s)

      SNP: dominant(rs1422993  adjusted by: 
 Interaction 
---------------------
          0       OR lower upper   1      OR lower upper
G/G     486 130 1.00    NA    NA 242 43 0.66  0.46  0.97
G/T-T/T 354 128 1.35  1.02  1.79 150 38 0.95  0.63  1.42

p interaction: 0.8513 

 factor(smoke) within dominant(rs1422993 
---------------------
G/G 
    0   1   OR lower upper
0 486 130 1.00    NA    NA
1 242  43 0.66  0.46  0.97

G/T-T/T 
    0   1  OR lower upper
0 354 128 1.0    NA    NA
1 150  38 0.7  0.47  1.06

p trend: 0.8513 

 dominant(rs1422993 within factor(smoke) 
---------------------
0 
          0   1   OR lower upper
G/G     486 130 1.00    NA    NA
G/T-T/T 354 128 1.35  1.02  1.79

1 
          0  1   OR lower upper
G/G     242 43 1.00    NA    NA
G/T-T/T 150 38 1.43  0.88  2.31

p trend: 0.8513 

The result is an interaction table showing that the risk of individuals carrying the T allele increases the risk of asthma in never smokers (OR=1.35; CI: 1.02-1.79) while it is not significant in ever smokers (OR=0.95; CI: 0.63-1.42). However, the interaction is not statistically significant (\(P\)-interaction=0.8513). The output also shows the stratified ORs that can help in interpreting the results.

In a similar way, gene-gene interaction (GxG) of a given SNP epistasis model can also be fitted using the same function. In that case, the genetic model of the interacting SNP must be indicated in the model.inteaction argument.

association(casecontrol ~ rs1422993*factor(rs184448), 
            data=asthma.s, model.interaction = "dominant" )

      SNP: rs1422993  adjusted by: 
 Interaction 
---------------------
        T/T      OR lower upper T/G      OR lower upper   0  1  G/G lower upper
G/G     227 43 1.00    NA    NA 359 96 1.41  0.95  2.10 128 30 1.24  0.74  2.07
G/T-T/T 154 33 1.13  0.69  1.86 265 93 1.85  1.24  2.77  78 38 2.57  1.55  4.27

p interaction: 0.24499 

 factor(rs184448) within rs1422993 
---------------------
G/G 
      0  1   OR lower upper
T/T 227 43 1.00    NA    NA
T/G 359 96 1.41  0.95  2.10
G/G 128 30 1.24  0.74  2.07

G/T-T/T 
      0  1   OR lower upper
T/T 154 33 1.00    NA    NA
T/G 265 93 1.64  1.05  2.55
G/G  78 38 2.27  1.32  3.90

p trend: 0.24499 

 rs1422993 within factor(rs184448) 
---------------------
T/T 
          0  1   OR lower upper
G/G     227 43 1.00    NA    NA
G/T-T/T 154 33 1.13  0.69  1.86

T/G 
          0  1   OR lower upper
G/G     359 96 1.00    NA    NA
G/T-T/T 265 93 1.31  0.95  1.82

G/G 
          0  1   OR lower upper
G/G     128 30 1.00    NA    NA
G/T-T/T  78 38 2.08  1.19  3.62

p trend: 0.12743 

We observe that the interaction between these two SNPs is not statistically significant (P-value=0.24). However, the OR of GG genotype of rs184448 differs across individuals between the GG and GT-TT genotypes of rs1422993 (see ORs for GG in the second table of the output).

The user also can perform GxG for a set of SNPs using this code. Let us assume we are interested in assessing interaction between the SNPs that are significant at 10% level

ans <- WGassociation(casecontrol, data=asthma.s)
mask <- apply(ans, 1, function(x) min(x, na.rm=TRUE)<0.1)
sig.snps <- names(mask[mask])
sig.snps
 [1] "rs1430094" "rs1422993" "rs765023"  "rs184448"  "rs324396"  "rs324957" 
 [7] "rs324960"  "rs324981"  "rs727162"  "rs6084432"
idx <- which(colnames(asthma)%in%sig.snps)
asthma.s2 <- setupSNP(asthma, colSNPs = idx, sep="")
ans.int <- interactionPval(casecontrol ~ 1, data=asthma.s2)
ans.int
            rs1430094   rs1422993    rs765023    rs184448    rs324396
rs1430094 0.132526514 0.653457029 0.816154586 0.787386835 0.694311497
rs1422993 0.140433948 0.016719949 0.133131712 0.375246182 0.683959376
rs765023  0.171993821 0.144029395 0.182805988 0.961405419 0.194520134
rs184448  0.100074594 0.019963426 0.034666219 0.007969948 0.036825613
rs324396  0.204742661 0.190583402 0.131945703 0.344952544 0.209589962
rs324957  0.103680541 0.019820456 0.034446832 0.696623215 0.318903026
rs324960  0.107216765 0.024180405 0.174801032 0.291457716 0.584132142
rs324981  0.128489057 0.144077239 0.153521356 0.646139555 0.301782199
rs727162  0.258828613 0.240900610 0.188971595 0.173341993 0.120240909
rs6084432 0.214413938 0.187372912 0.187830183 0.264160389 0.252965105
             rs324957    rs324960    rs324981    rs727162   rs6084432
rs1430094 0.644661690 0.096913557 0.774083175 0.043758067 0.946641185
rs1422993 0.257661509 0.175946789 0.095251421 0.577504655 0.027437424
rs765023  0.437535350 0.614095008 0.117705086 0.217580773 0.896707843
rs184448  0.018290318 0.356124918 0.311031078 0.297592380 0.349332875
rs324396  0.232076552 0.747574191 0.064231657 0.361336095 0.388626207
rs324957  0.019578585 0.595796247 0.965290164 0.402781904 0.272919867
rs324960  0.404830873 0.017401694 0.168077057 0.406220478 0.725774374
rs324981  0.657378128 0.533095852 0.117043376 0.463436271 0.862291944
rs727162  0.174960014 0.500811687 0.167364803 0.206131411 0.885294485
rs6084432 0.174006927 0.146808311 0.210647392 0.203957720 0.193711884

we can visualize the results by

plot(ans.int)