Alan Murphy

Computational Biologist | Postdoctoral Researcher, Cold Spring Harbor Laboratory, New York

Hi, I'm Alan Murphy, a computational biologist conducting my postdoctoral research in Dr. Peter Koo's lab at Cold Spring Harbor Laboratory. I build sequence-to-function (seq2func) models of gene regulation, which map DNA sequence to molecular and cellular activity, and I focus on making them reliable when it matters: when sequences are perturbed, when regulatory organisation is rearranged, or when the cellular context changes. My goal is to turn seq2func models from accurate pattern matchers into trustworthy, mechanistically informative platforms for interpreting non-coding variants, designing perturbation experiments, and engineering synthetic regulatory sequences.

Beyond research, I co-created and serve as editor for the Genomics × AI blog, a growing community hub highlighting methods and applications at the intersection of genomics and machine learning. This initiative reflects my dedication to scholarly leadership and knowledge sharing in computational biology.

Research Vision

My research programme aims to build seq2func models that reveal how regulatory sequence, genomic context, and cellular state combine to determine context-specific gene expression — and that remain reliable under the conditions where they are actually used. State-of-the-art models achieve strong predictive performance on held-out genomic regions, but that alone does not establish that they have learned regulatory logic which stays valid when sequences are perturbed or when regulatory organisation is rearranged.

I frame model failures through the lens of distribution shift:

Covariate shift — sequences or perturbations that are poorly represented in the training data.
Concept shift — the same sequence behaving differently because the cellular context has changed.

I diagnose these failures systematically and close them by learning from targeted perturbation experiments — discarding genome-wide knowledge where it misleads, and keeping predictions reliable even when perturbations shift the cell state itself.

How reliably models handle these shifts ultimately determines their usefulness across the applications that depend on them:

Interpreting non-coding variants — predicting the functional consequences of regulatory genetic variation.
Designing perturbation experiments that reveal cis-regulatory biology.
Engineering synthetic regulatory sequences with tunable properties.

Throughout, I develop open-access tools and resources to promote reproducibility and empower the genomics community.

Download CV | Google Scholar | Genomics × AI Blog

News

2026-06-25
Published a post on the Genomics × AI blog benchmarking sequence-to-function models against CRISPRi enhancer-knockdown screens. It shows that leading models (including AlphaGenome and Borzoi) systematically underestimate distal enhancer effects, with the gap widening as enhancers get farther from their target genes.
2026-05-07
Presented "Improving genomic deep learning models with perturbation data", our work on continual learning for genomics, at Biology of Genomes 2026, Cold Spring Harbor Laboratory.
2026-04-08
Co-authored a research highlight in Cell Research on AlphaGenome and predicting non-coding variant effects, with Masayuki Nagai and Peter Koo.
2026-02-20
Published a post on the Genomics × AI blog on fine-tuning AlphaGenome for MPRA and STARR-seq data. It highlights treating sequence-to-function models as modular regulatory encoders, an approach that gives state-of-the-art results across perturbation assays.
2025-10-10
Presented our work on causal refinement for genomic deep learning models through continual learning at MLCB 2025. Check out the talk on the MLCB YouTube channel.
2025-03-03
Joined Peter Koo’s group at Cold Spring Harbor Laboratory, New York, to develop methods for improving genomic sequence-to-function and genomic language models.
2024-12-11
Our ChromExpress paper, which uses deep learning to investigate the relationship between histone marks and gene expression, is out in Nucleic Acids Research (NAR)! See a quick overview on X/Twitter.
2024-11-16
Our Enformer Celltyping paper, a genomic DNN that accurately predicts epigenetic signals in previously unseen cell types, is out in Nature Communications! See more on Twitter or on Bluesky.
2023-12-04
Our re-analysis paper of the first single-cell RNA-seq Alzheimer’s disease dataset is out in eLife! Check out an overview here.
2023-09-25
Presented my PhD work predicting the cell type-specific effects of genetic variants on the epigenome at the Kipoi Summit for computational regulatory genomics.
2023-07-29
Presented a session on single-cell genomics for Alzheimer’s disease as part of ADDI’s Summer Learning Series.
2022-12-22
Our paper benchmarking differential expression methods for single-cell RNA-seq is out in Nature Communications! Check out our overview here.
2021-10-02
MungeSumstats, our software for rapid standardisation and quality control of GWAS or QTL summary statistics, is now out in Bioinformatics.
2021-07-19
Thrilled to officially start my PhD in Dr. Nathan Skene’s group in the Department of Brain Sciences, Imperial College London, as part of the UK DRI.

Selected Publications

This is a selection of recent work. For a complete and always up-to-date list, see my Google Scholar profile.

Predicting gene expression from histone marks using chromatin deep learning models depends on histone mark function, regulatory distance and cellular states

Murphy, A.E., Askarova, A., Lenhard, B. et al. Nucleic Acids Research, gkae1212 (2024).

Predicting non-coding variant effects with AlphaGenome

Murphy, A.E.*, Nagai, M.* & Koo, P.K. Cell Res. (2026). https://doi.org/10.1038/s41422-026-01249-1

MungeSumstats: a Bioconductor package for the standardization and quality control of many GWAS summary statistics

A balanced measure shows superior performance of pseudobulk methods in single-cell RNA-sequencing analysis

Avoiding false discoveries in single-cell RNA-seq by revisiting the first Alzheimer’s disease dataset

Predicting cell type-specific epigenomic profiles accounting for distal genetic effects
