Our Research 
Research Interests
Computational Genomics
ML in medicine
Machine Learning

The Big Data and Genomics Lab at UCLA is led by Eran Halperin. Our lab is affiliated with the Computer Science Department in the School of Engineering and with the Departments of Human Genetics, Computational Medicine, and Anesthesiology in the School of Medicine. Students are welcome to join our lab through the following graduate programs: Bioinformatics, Computer Science, Biomath, and MD-PhD.

Our lab aims to improve the understanding and treatment of human disease by analyzing big data collected in relation to these diseases. We have been developing computational methods for many types of genomic data, including genetics, methylation, RNA expression, single-cell and single-nucleus RNA, and microbiome data, as well as for medical data, including medical images, electronic health records, and physiological waveforms.

The methodology we apply and develop involves a combination of machine learning, optimization algorithms, combinatorial optimization, and Bayesian statistics. In particular, our lab is currently focusing on the development of methods in the following domains:


Methods for cell-type-specific genomic data. We develop methods for analyzing methylation and RNA at the cell-type-specific level, both by deconvolving bulk data and by developing new methods for the analysis of single-cell RNA and methylation data. We then apply these methods to study human disease.


Machine Learning in medicine. We develop computational methods to assist different fields of medicine, including ophthalmology (e.g., Rakocz et al., Nature Digital Medicine, 2021), anesthesiology (e.g., Hill et al., Scientific Reports, 2021), and acute medicine (e.g., Rakocz et al., KDD, 2021). Our algorithms include new deep architectures, as well as more traditional approaches in machine learning and statistics.


Selected Publications 

"Methylation risk scores are associated with a collection of phenotypes within electronic health record systems"

Nature Genomics Medicine, 2022

"Imputation of the continuous arterial line blood pressure waveform from non-invasive measurements using deep learning"

Scientific Reports, 2021


"Automated identification of clinical features from sparsely annotated 3-dimensional medical imaging"

Nature Digital Medicine, 2021

"A Statistical Model for Quantifying the Needed Duration of Social Distancing for the COVID-19 Pandemic"

KDD, 2021

"Optimized design of single-cell RNA sequencing experiments for cell-type-specific eQTL analysis"

Nature Communications, 2020

"Accurate estimation of cell composition in bulk expression through robust integration of single-cell information."

Nature Communications, 2020

"Context-aware dimensionality reduction deconvolutes gut microbial community dynamics"

Nature Biotechnology, 2020

"FEAST: fast expectation-maximization for microbial source tracking"

Nature Methods, 2019

"Cell-type-specific resolution epigenetics without the need for cell sorting or single-cell biology"

Nature Communications, 2019

"CONFINED: distinguishing biological from technical sources of variation by leveraging multiple methylation datasets"

Genome Biology, 2019

"Detecting heritable phenotypes without a model using fast permutation testing for heritability and set-tests"

Nature Communications, 2018

"BayesCCE: a Bayesian framework for estimating cell-type composition from DNA methylation without the need for methylation reference."

Genome Biology, 2018

"Correcting for cell-type heterogeneity in DNA methylation: a comprehensive evaluation"

Nature Methods, 2017

"Sparse PCA corrects for cell type heterogeneity in epigenome-wide association studies"
Nature Methods, 2016

"Fast and accurate construction of confidence intervals for heritability"

The American Journal of Human Genetics, 2016


"A genetic and socioeconomic study of mate choice in Latinos reveals novel assortment patterns." 
PNAS, 2015.

"Identifying personal genomes by surname inference" 
Science, 2013.


"A model-based approach for analysis of spatial structure in genetic data" 
Nature Genetics, 2012.


"Genome-wide association study of follicular lymphoma identifies a risk locus at 6p21.32" 
Nature Genetics, 2010.


"Genomic privacy and limits of individual detection in a pool" 
Nature Genetics, 2009.


"Maximizing power in association studies"
Nature Biotechnology, 27(3), 255-6, 2009.

"Estimating local ancestry in admixed populations"

The American Journal of Human Genetics, 2008.

We thank the National Science Foundation and the National Institutes of Health for their current support. We also thank the Israeli Science Foundation, the German-Israeli Science Foundation, IBM, the Blavatnik Research Foundation, the Juludan Research Foundation, the National Institutes of Health, and The Edmond J. Safra Center for Bioinformatics for their support in the past, and hopefully in the future.
Software developed and maintained by our group.
  • FEAST: Microbial source tracking using Expectation Maximization

  • ReFACTor: Correcting cell-type heterogeneity in whole-genome methylation data.

  • TCA: A deconvolution method for cell-type specific analysis of methylation data.

  • Bisque: A method for the inference of cell-type composition in heterogeneous tissues using RNA data.  

  • MTV-LMM: A method for the analysis and prediction of temporal microbiome data using linear mixed models.

  • CONFINED: A method for the detection of biological confounders (vs. technical) in methylation data.

  • GLINT: A user-friendly command-line tool for fast analysis of genome-wide DNA methylation data (EWAS). The package includes implementations of ReFACTor, EPIStructure, linear mixed model association testing, and reference-based cell-type estimation.

  • LAMP: Estimating Locus Specific Ancestry

  • LAMP-LD: Leveraging LD in the estimation of Locus Specific Ancestry

Software developed by our group, maintained elsewhere.
  • GEVALT: Selecting the most predictive tag SNPs (maintained in Ron Shamir's group).

  • RECYCLER: Detecting plasmids from de-novo assembly (maintained in Ron Shamir's group)

Retired Software

  The following are software packages that we developed in the past but no longer maintain:

  • SecureGenome: Maintaining your genomic privacy

  • SEQEM: Assigning short reads to homologous genes in RNA-SEQ experiments. 

  • CAMP: Coalescent based Association Mapping

  • WHAP: Weighted Haplotype Association Server. 

  • LOCO-LD: LD corrected spatial analysis 

  • SPA: Spatial Ancestry Analysis in genetic data

  • BARCODE: Compression of sequence data using Bloom filters 



May, 2016 - Yedael Waldman, a joint postdoc between our lab and Alon Keinan's lab at Cornell, publishes a paper about the genetic history of Bene-Israel Indian Jews. Science Daily, The Times of India, Haaretz.

November, 2015 - Our paper on epigenetic differences between men and women was featured in Nature Reviews Genetics.

October 2015 - Our PNAS paper finds that people choose their mates based on their genomes.
NDTV, Daily Mail, Business Standard, Science Daily.


January 2013 - Our Science paper appeared on the front page of the print edition of the Israeli newspaper Ha'aretz, two days before the national parliament elections...
Ha'aretz paper version, Ha'aretz first page


July 2012 - Israeli business newspaper (The Marker):
40 under 40 
The Marker paper version, The Marker online version


June 2012 - Genetic GPS 
Wired Magazine, Haaretz Israeli News, Haaretz (Hebrew version), Times of Israel, London Jewish Chronicle, International Business Times (Italian)


May 2011 - Profiles in Computer Science 
Biomedical Computation Review


September 2009 - From DNA Data to Disease Diagnosis
Frontier Economy, Spanish version


August 2009 - Researchers Claim New Software Can Skirt Privacy Challenges of GWAS Data-Sharing 
GenomeWeb, Innovation report, Physorg, The Medical News (Sydney, Australia), Jerusalem Post, Ynet (Israel), Biopharma.


July, 2009 - Gene linked to increasingly common type of blood cancer
EurekAlert

296A Engineering VI

University of California, Los Angeles 90095-1600
