Import GWAS summary statistics from Open GWAS

library(MungeSumstats)

MungeSumstats now offers high-throughput querying and importing of data from the MRC IEU Open GWAS Project.

Find GWAS datasets

#### Search for datasets ####
metagwas <- MungeSumstats::find_sumstats(traits = c("parkinson","alzheimer"), 
                                         min_sample_size = 1000)
head(metagwas,3)
#### Collect dataset IDs, ordered from fewest to most SNPs ####
ids <- (dplyr::arrange(metagwas, nsnp))$id

##          id               trait group_name year    author
## 1 ieu-a-298 Alzheimer's disease     public 2013   Lambert
## 2   ieu-b-2 Alzheimer's disease     public 2019 Kunkle BW
## 3 ieu-a-297 Alzheimer's disease     public 2013   Lambert
##                                                                                                                                                                                                                                                                                                                    consortium
## 1                                                                                                                                                                                                                                                                                                                        IGAP
## 2 Alzheimer Disease Genetics Consortium (ADGC), European Alzheimer's Disease Initiative (EADI), Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium (CHARGE), Genetic and Environmental Risk in AD/Defining Genetic, Polygenic and Environmental Risk for Alzheimer's Disease Consortium (GERAD/PERADES),
## 3                                                                                                                                                                                                                                                                                                                        IGAP
##                 sex population     unit     nsnp sample_size       build
## 1 Males and Females   European log odds    11633       74046 HG19/GRCh37
## 2 Males and Females   European       NA 10528610       63926 HG19/GRCh37
## 3 Males and Females   European log odds  7055882       54162 HG19/GRCh37
##   category                subcategory ontology mr priority     pmid sd
## 1  Disease Psychiatric / neurological       NA  1        1 24162737 NA
## 2   Binary Psychiatric / neurological       NA  1        0 30820047 NA
## 3  Disease Psychiatric / neurological       NA  1        2 24162737 NA
##                                                                      note ncase
## 1 Exposure only; Effect allele frequencies are missing; forward(+) strand 25580
## 2                                                                      NA 21982
## 3                Effect allele frequencies are missing; forward(+) strand 17008
##   ncontrol     N
## 1    48466 74046
## 2    41944 63926
## 3    37154 54162
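
The `ids` vector above is built by sorting the metadata by SNP count, so the smallest datasets come first. As a minimal sketch of that step in base R only, the following rebuilds the three printed rows as a small stand-in table and orders the IDs the same way. The names `meta_example` and `ids_example` are hypothetical and used purely for illustration; they are not objects returned by MungeSumstats.

```r
# Stand-in for the first three rows of `metagwas` shown above
# (values copied from the printed output; only a few columns are kept).
meta_example <- data.frame(
  id          = c("ieu-a-298", "ieu-b-2", "ieu-a-297"),
  nsnp        = c(11633, 10528610, 7055882),
  sample_size = c(74046, 63926, 54162),
  stringsAsFactors = FALSE
)

# Order by number of SNPs (ascending), mirroring dplyr::arrange(metagwas, nsnp).
ids_example <- meta_example$id[order(meta_example$nsnp)]
print(ids_example)
## [1] "ieu-a-298" "ieu-a-297" "ieu-b-2"
```

Sorting this way means that, when several IDs are passed to an import call, the quickest downloads are likely to finish first.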

Import full results

You can supply import_sumstats() with as many OpenGWAS IDs as you like, but here we pass just one to save time.

datasets <- MungeSumstats::import_sumstats(ids = "ieu-a-298",
                                           ref_genome = "GRCh37")

Summarise results

By default, import_sumstats returns a named list where the names are the Open GWAS dataset IDs and the items are the respective paths to the formatted summary statistics.

print(datasets)

## $`ieu-a-298`
## [1] "/tmp/RtmpI3qdKJ/ieu-a-298.tsv.gz"

You can easily turn this into a data.frame as well.

results_df <- data.frame(id=names(datasets), 
                         path=unlist(datasets))
print(results_df)

##                  id                             path
## ieu-a-298 ieu-a-298 /tmp/RtmpI3qdKJ/ieu-a-298.tsv.gz
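
Because the result holds file paths rather than the data itself, a natural next step is to check that each file is present and preview its contents. The sketch below does this in base R, assuming the formatted files are ordinary gzip-compressed, tab-separated text. The objects `datasets_example`, `results_example` and `previews` are hypothetical names, and the path is the temporary one printed above, so it will not exist on your machine unless you have just run the import.

```r
# Stand-in for the named list returned by import_sumstats();
# the path is the temporary one printed above and will differ on your machine.
datasets_example <- list(`ieu-a-298` = "/tmp/RtmpI3qdKJ/ieu-a-298.tsv.gz")

results_example <- data.frame(
  id   = names(datasets_example),
  path = unlist(datasets_example),
  stringsAsFactors = FALSE
)

# Flag which files are actually present before trying to read them.
results_example$exists <- file.exists(results_example$path)

# Preview the first few rows of each available file. Base R file
# connections detect gzip compression, so read.delim() can read .gz directly.
previews <- lapply(
  results_example$path[results_example$exists],
  function(p) utils::read.delim(p, nrows = 5)
)
names(previews) <- results_example$id[results_example$exists]

print(results_example)
```

If no file is found, `previews` is simply an empty list, so the snippet runs safely either way.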

Import full results (parallel)

Optional: Speed up the import with multi-threaded downloads via axel. This requires the axel command-line tool to be installed on your system.

datasets <- MungeSumstats::import_sumstats(ids = ids, 
                                           vcf_download = TRUE, 
                                           download_method = "axel", 
                                           nThread = max(2,future::availableCores()-2))
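
The call above reserves two cores for the rest of the system and assumes axel is installed. The sketch below shows one way to make both choices defensively in base R. The helper `pick_threads()` and the variables `axel_available` and `method` are hypothetical, and the fallback value `"download.file"` is an assumption about the accepted `download_method` values, so check it against the `import_sumstats()` documentation before relying on it.

```r
# Choose a thread count that leaves `reserve` cores free but never drops
# below `minimum`. parallel::detectCores() can return NA on some platforms,
# in which case we fall back to `minimum`.
pick_threads <- function(reserve = 2, minimum = 2) {
  cores <- parallel::detectCores()
  if (is.na(cores)) return(minimum)
  max(minimum, cores - reserve)
}

# Check whether the axel command-line tool is on the PATH;
# Sys.which() returns "" when a program is not found.
axel_available <- nzchar(unname(Sys.which("axel")))

# Assumed fallback: single-threaded download when axel is unavailable.
method <- if (axel_available) "axel" else "download.file"

n_threads <- pick_threads()
message("Using ", n_threads, " threads with download method '", method, "'")
```

The resulting `n_threads` and `method` values could then be passed as `nThread` and `download_method` in the call above.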

Further functionality

See the Getting started vignette for more information on how to use MungeSumstats and its functionality.

Session Info

utils::sessionInfo()

## R Under development (unstable) (2022-12-07 r83413)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] MungeSumstats_1.7.10 BiocStyle_2.27.0    
## 
## loaded via a namespace (and not attached):
##   [1] tidyselect_1.2.0            dplyr_1.0.10               
##   [3] blob_1.2.3                  filelock_1.0.2             
##   [5] R.utils_2.12.2              Biostrings_2.67.0          
##   [7] bitops_1.0-7                fastmap_1.1.0              
##   [9] RCurl_1.98-1.9              BiocFileCache_2.7.1        
##  [11] VariantAnnotation_1.45.0    GenomicAlignments_1.35.0   
##  [13] XML_3.99-0.13               digest_0.6.31              
##  [15] lifecycle_1.0.3             ellipsis_0.3.2             
##  [17] KEGGREST_1.39.0             RSQLite_2.2.19             
##  [19] googleAuthR_2.0.0           magrittr_2.0.3             
##  [21] compiler_4.3.0              rlang_1.0.6                
##  [23] sass_0.4.4                  progress_1.2.2             
##  [25] tools_4.3.0                 utf8_1.2.2                 
##  [27] yaml_2.3.6                  data.table_1.14.6          
##  [29] rtracklayer_1.59.0          knitr_1.41                 
##  [31] prettyunits_1.1.1           curl_4.3.3                 
##  [33] bit_4.0.5                   DelayedArray_0.25.0        
##  [35] xml2_1.3.3                  BiocParallel_1.33.6        
##  [37] purrr_0.3.5                 BiocGenerics_0.45.0        
##  [39] desc_1.4.2                  R.oo_1.25.0                
##  [41] grid_4.3.0                  stats4_4.3.0               
##  [43] fansi_1.0.3                 biomaRt_2.55.0             
##  [45] SummarizedExperiment_1.29.1 cli_3.4.1                  
##  [47] rmarkdown_2.18              crayon_1.5.2               
##  [49] generics_0.1.3              ragg_1.2.4                 
##  [51] httr_1.4.4                  rjson_0.2.21               
##  [53] DBI_1.1.3                   cachem_1.0.6               
##  [55] stringr_1.5.0               zlibbioc_1.45.0            
##  [57] assertthat_0.2.1            parallel_4.3.0             
##  [59] AnnotationDbi_1.61.0        BiocManager_1.30.19        
##  [61] XVector_0.39.0              restfulr_0.0.15            
##  [63] matrixStats_0.63.0          vctrs_0.5.1                
##  [65] Matrix_1.5-3                jsonlite_1.8.4             
##  [67] bookdown_0.30               IRanges_2.33.0             
##  [69] hms_1.1.2                   S4Vectors_0.37.3           
##  [71] bit64_4.0.5                 systemfonts_1.0.4          
##  [73] GenomicFeatures_1.51.2      jquerylib_0.1.4            
##  [75] glue_1.6.2                  pkgdown_2.0.6.9000         
##  [77] codetools_0.2-18            stringi_1.7.8              
##  [79] GenomeInfoDb_1.35.5         BiocIO_1.9.1               
##  [81] GenomicRanges_1.51.3        tibble_3.1.8               
##  [83] pillar_1.8.1                rappdirs_0.3.3             
##  [85] htmltools_0.5.4             GenomeInfoDbData_1.2.9     
##  [87] BSgenome_1.67.1             dbplyr_2.2.1               
##  [89] R6_2.5.1                    textshaping_0.3.6          
##  [91] rprojroot_2.0.3             evaluate_0.18              
##  [93] Biobase_2.59.0              lattice_0.20-45            
##  [95] R.methodsS3_1.8.2           png_0.1-8                  
##  [97] Rsamtools_2.15.0            gargle_1.2.1               
##  [99] memoise_2.0.1               bslib_0.4.1                
## [101] Rcpp_1.0.9                  xfun_0.35                  
## [103] fs_1.5.2                    MatrixGenerics_1.11.0      
## [105] pkgconfig_2.0.3