Infers the genome build of the summary statistics file (GRCh37 or GRCh38) from the data, using the SNP (RSID), CHR and BP columns.

get_genome_build(
  sumstats,
  nThread = 1,
  sampled_snps = 10000,
  standardise_headers = TRUE,
  mapping_file = sumstatsColHeaders,
  dbSNP = 155,
  header_only = FALSE
)

Arguments

sumstats

Data table or data frame object containing the GWAS summary statistics, or the file path to the summary statistics file.

nThread

Number of threads to use for parallel processes.

sampled_snps

Downsample the number of SNPs used when inferring genome build to save time.

standardise_headers

If TRUE, run standardise_sumstats_column_headers_crossplatform to standardise the column headers.

mapping_file

MungeSumstats has a pre-defined column-name mapping file which should cover the most common column headers and their interpretations. However, if a column header in your file is missing from the mapping, or the mapping we give is incorrect, you can supply your own mapping file. It must be a two-column data frame with the column names "Uncorrected" and "Corrected". See data(sumstatsColHeaders) for the default mapping and the required format.
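As a minimal sketch of the two-column format described above, a custom mapping could be built as a plain data frame. The header names in the Uncorrected column (MY_RSID, MY_CHROM, MY_POS) are hypothetical and only stand in for whatever non-standard headers your file uses. Writing them in upper case is an assumption about how headers are matched, so check data(sumstatsColHeaders) for the convention actually used.

```r
# Hypothetical custom mapping: the "Uncorrected" names are invented for
# illustration; "Corrected" holds the standard names used on this page.
custom_mapping <- data.frame(
  Uncorrected = c("MY_RSID", "MY_CHROM", "MY_POS"),
  Corrected = c("SNP", "CHR", "BP"),
  stringsAsFactors = FALSE
)

# Check the documented shape: exactly two columns with these names.
stopifnot(
  ncol(custom_mapping) == 2,
  identical(colnames(custom_mapping), c("Uncorrected", "Corrected"))
)
```

A data frame like this would then be passed as mapping_file = custom_mapping in place of the default.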

dbSNP

Version of dbSNP to be used (144 or 155). Default is 155.

header_only

Instead of reading in the entire sumstats file, only read in the first N rows where N=sampled_snps. This should help speed up cases where you have to read in sumstats from disk each time.

Value

ref_genome, the genome build of the data.
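Examples

The call below is a hedged sketch of typical usage, not a definitive recipe. It assumes that MungeSumstats is installed together with the reference genome and SNP location packages it relies on, and that the package ships an example file at extdata/eduAttainOkbay.txt. If either assumption fails, ref_genome is left as NULL rather than raising an error.

```r
# Stays NULL if the package or its reference data are unavailable.
ref_genome <- NULL

if (requireNamespace("MungeSumstats", quietly = TRUE)) {
  # Assumed location of an example summary statistics file in the package.
  sumstats_path <- system.file("extdata", "eduAttainOkbay.txt",
    package = "MungeSumstats"
  )
  ref_genome <- tryCatch(
    {
      # Read the file into a data table, then infer the build from
      # a sample of its SNPs.
      sumstats <- data.table::fread(sumstats_path)
      MungeSumstats::get_genome_build(
        sumstats = sumstats,
        sampled_snps = 10000,
        dbSNP = 155
      )
    },
    error = function(e) {
      message("Could not infer genome build: ", conditionMessage(e))
      NULL
    }
  )
}

print(ref_genome)
```

Passing the file path as sumstats together with header_only = TRUE would, per the argument description, read only the first sampled_snps rows instead of the whole file.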