Alan Murphy
Research Vision
My research programme aims to build seq2func models that reveal how regulatory sequence, genomic context, and cellular state combine to determine context-specific gene expression — and that remain reliable under the conditions where they are actually used. State-of-the-art models achieve strong predictive performance on held-out genomic regions, but that alone does not establish that they have learned regulatory logic which stays valid when sequences are perturbed or when regulatory organisation is rearranged.
I frame model failures through the lens of distribution shift:
- Covariate shift — sequences or perturbations that are poorly represented in the training data.
- Concept shift — the same sequence behaving differently because the cellular context has changed.
I diagnose these failures systematically and address them by learning from targeted perturbation experiments, discarding genome-wide knowledge where it misleads and keeping predictions reliable even when perturbations shift the cell state itself.
How reliably models handle these shifts ultimately determines their usefulness across the applications that depend on them:
- Interpreting non-coding variants — predicting the functional consequences of regulatory genetic variation.
- Designing perturbation experiments that reveal cis-regulatory biology.
- Engineering synthetic regulatory sequences with tunable properties.
Throughout, I develop open-access tools and resources to promote reproducibility and empower the genomics community.
News
- 2026-06-25
Published a post on the Genomics x AI blog benchmarking sequence-to-function models against CRISPRi enhancer-knockdown screens. It shows that leading models (including AlphaGenome and Borzoi) systematically underestimate distal enhancer effects, with the gap widening as enhancers get farther from their target genes.
- 2026-05-07
Presented "Improving genomic deep learning models with perturbation data", our work on continual learning for genomics, at Biology of Genomes 2026, Cold Spring Harbor Laboratory.
- 2026-04-08
Co-authored a research highlight in Cell Research on AlphaGenome and predicting non-coding variant effects, with Masayuki Nagai and Peter Koo.
- 2026-02-20
Published a post on the Genomics x AI blog on fine-tuning AlphaGenome for MPRA and STARR-seq data. It highlights how treating sequence-to-function models as modular regulatory encoders leads to state-of-the-art results across perturbation assays.
- 2025-10-10
Presented our work on causal refinement for genomic deep learning models through continual learning at MLCB 2025. Check out the talk on the MLCB YouTube channel.
- 2025-03-03
Joined Peter Koo’s group at Cold Spring Harbor Laboratory, New York, to develop methods for improving genomic sequence-to-function models and genomic language models.
- 2024-12-11
Our ChromExpress paper investigating the relationship of histone marks with expression using deep learning is out in Nucleic Acids Research (NAR)! See a quick overview on X/Twitter.
- 2024-11-16
Our Enformer Celltyping paper, a genomic DNN that accurately predicts epigenetic signals in previously unseen cell types, is out in Nature Communications! See more on Twitter or on BlueSky.
- 2023-12-04
Our re-analysis paper of the first single-cell RNA-seq Alzheimer’s disease dataset is out in eLife! Check out an overview here.
- 2023-09-25
Presented my PhD work on predicting the cell type-specific effects of genetic variants on the epigenome at the Kipoi Summit for computational regulatory genomics.
- 2023-07-29
Presented a session on single-cell genomics for Alzheimer’s disease as part of ADDI’s Summer Learning Series.
- 2022-12-22
Our paper benchmarking differential expression methods for single-cell RNA-seq is out in Nature Communications! Check out our overview here.
- 2021-10-02
MungeSumstats, our software for rapid standardisation and quality control of GWAS or QTL summary statistics, is now out in Bioinformatics.
- 2021-07-19
Thrilled to officially start my PhD in Dr. Nathan Skene’s group in the Department of Brain Sciences, Imperial College London, as part of the UK DRI.
Selected Publications
This is a selection of recent work. For a complete and always up-to-date list, see my Google Scholar profile.
MungeSumstats: a Bioconductor package for the standardization and quality control of many GWAS summary statistics
Murphy, A. E., Schilder, B. M. & Skene, N. G. Bioinformatics 37, 4593–4596 (2021)
A balanced measure shows superior performance of pseudobulk methods in single-cell RNA-sequencing analysis
Murphy, A. E. & Skene, N. G. Nat. Commun. 13, 7851 (2022)
Avoiding false discoveries in single-cell RNA-seq by revisiting the first Alzheimer’s disease dataset
Murphy, A. E., Fancy, N. & Skene, N. G. eLife 12:RP90214 (2023)
Predicting cell type-specific epigenomic profiles accounting for distal genetic effects
Murphy, A.E., Beardall, W., Rei, M. et al. Nat. Commun. 15, 9951 (2024)
Predicting gene expression from histone marks using chromatin deep learning models depends on histone mark function, regulatory distance and cellular states
Murphy, A.E., Askarova, A., Lenhard, B. et al. Nucleic Acids Res. gkae1212 (2024)
Predicting non-coding variant effects with AlphaGenome
Murphy, A.E.*, Nagai, M.* & Koo, P.K. Cell Res. (2026). https://doi.org/10.1038/s41422-026-01249-1
