R/get_genome_build.R
get_genome_build.Rd
Infers the genome build of the summary statistics file (GRCh37 or GRCh38) from the data, using the SNP (RSID), CHR and BP columns.
get_genome_build(
sumstats,
nThread = 1,
sampled_snps = 10000,
standardise_headers = TRUE,
mapping_file = sumstatsColHeaders,
dbSNP = 155,
header_only = FALSE
)
sumstats: Data table/data frame object of the GWAS summary statistics, or a file path to the summary statistics file.
nThread: Number of threads to use for parallel processes.
sampled_snps: Downsample the number of SNPs used when inferring genome build to save time.
standardise_headers: Run standardise_sumstats_column_headers_crossplatform.
mapping_file: MungeSumstats has a pre-defined column-name mapping file which should cover the most common column headers and their interpretations. However, if a column header in your file is missing or the mapping we give is incorrect, you can supply your own mapping file. It must be a 2-column data frame with column names "Uncorrected" and "Corrected". See data(sumstatsColHeaders) for the default mapping and necessary format.
dbSNP: Version of dbSNP to be used (144 or 155). Default is 155.
header_only: Instead of reading in the entire sumstats file, only read in the first N rows, where N = sampled_snps. This should help speed up cases where you have to read in sumstats from disk each time.
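The mapping file format described in the arguments can be sketched in base R as follows. This is only an illustration: the object name custom_mapping and the three non-standard headers (MY_RSID, MY_CHROM, MY_POS) are hypothetical, while the target names SNP, CHR and BP are the columns this function is documented to use.

```r
# Hypothetical non-standard headers mapped to the standard column names.
# The column names "Uncorrected" and "Corrected" are required by the format.
custom_mapping <- data.frame(
    Uncorrected = c("MY_RSID", "MY_CHROM", "MY_POS"),
    Corrected = c("SNP", "CHR", "BP"),
    stringsAsFactors = FALSE
)

# Check the shape described in the documentation: exactly two columns,
# named "Uncorrected" and "Corrected", in that order.
stopifnot(
    ncol(custom_mapping) == 2,
    identical(colnames(custom_mapping), c("Uncorrected", "Corrected"))
)
```

In practice it may be preferable to append such rows to the default mapping loaded with data(sumstatsColHeaders) rather than replace it, and then pass the combined data frame as mapping_file.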
ref_genome: The genome build of the data.
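A minimal usage sketch, based on the call signature above, might look like the following. It assumes that MungeSumstats is installed together with the reference data it needs for the chosen dbSNP version, and that the example file eduAttainOkbay.txt is available in the package's extdata directory. The variable name sumstats_path is illustrative.

```r
# Sketch only: the call runs when MungeSumstats is installed.
if (requireNamespace("MungeSumstats", quietly = TRUE)) {
    # Example summary statistics file assumed to ship with the package.
    sumstats_path <- system.file(
        "extdata", "eduAttainOkbay.txt",
        package = "MungeSumstats"
    )
    # Infer the build from a small sample of SNPs to keep the run short.
    ref_genome <- MungeSumstats::get_genome_build(
        sumstats = sumstats_path,
        sampled_snps = 50,
        dbSNP = 144
    )
    print(ref_genome)
}
```

Setting sampled_snps to a small value trades some robustness of the inference for speed. When the same file is read from disk repeatedly, header_only = TRUE can be added so that only the first sampled_snps rows are read.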