Our Research 
Research Interests
Computational Genomics
Statistical Genetics
Machine Learning

ML in Medicine

The Big Data and Genomics Lab at UCLA is led by Eran Halperin. Our lab is affiliated with the Computer Science Department in the School of Engineering and with the Departments of Human Genetics, Computational Medicine, and Anesthesiology in the School of Medicine. Students are welcome to join our lab through the following graduate programs: Bioinformatics, Computer Science, Biomath, and MD-PhD.

Our lab aims to improve our understanding and treatment of human disease by analyzing big data collected in relation to these diseases. In recent years, our main focus has been on the analysis of different types of genomic data, including genetic, methylation, RNA expression, single-cell and single-nucleus RNA, and microbiome data. Another major effort in the lab is the development of methods that predict clinical outcomes by combining genomic data with clinical data available from the UCLA hospital and from other collaborators.


The methodology we apply and develop combines machine learning, optimization (including combinatorial optimization), and Bayesian statistics. In particular, we currently focus on developing methods in the following domains:


  • Methods for high-throughput genomic data in heterogeneous tissues. We develop deconvolution and decomposition methods for bulk methylation and RNA data, using information from sorted cells and from single-cell and single-nucleus RNA expression data. This makes it possible to study cell-type-specific biology using bulk measurements.

  • Methods for microbial source tracking. We develop methods that infer which sources contribute to a microbiome sample, and in what proportions, across a wide range of contexts.

  • Methods for the analysis of time-series genomic data. We develop statistical models of time-series data, particularly microbiome data. Such models help us understand how genomic data change over time under different conditions.

  • Data mining of medical records. One of our current efforts is predicting health-related outcomes (e.g., adverse surgical outcomes) using the large collections of medical records, genetic data, methylation data, clinical images, and physiological waveforms at the UCLA hospital, as well as other data provided by our collaborators.
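To make the reference-based setting concrete: if a bulk methylation profile is modeled as a weighted average of cell-type reference profiles (for example, profiles measured in sorted cells), then bulk ≈ R·w with w ≥ 0, and the cell-type proportions w can be estimated by non-negative least squares. The following is a minimal, illustrative sketch in plain Python, assuming the reference matrix R is known. It uses Lee and Seung style multiplicative updates for non-negative least squares and then rescales the weights to sum to one. It is not the implementation behind the lab's tools (such as TCA or Bisque); the function name `estimate_proportions` and the toy data are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows: CpG sites, columns: cell types


def estimate_proportions(ref: Matrix, bulk: List[float],
                         n_iter: int = 2000, eps: float = 1e-12) -> List[float]:
    """Hypothetical helper: estimate proportions w such that bulk ~= ref @ w.

    Runs multiplicative updates for non-negative least squares, which keep
    w non-negative as long as ref and bulk are non-negative (true for
    methylation levels in [0, 1]), then rescales w to sum to one.
    """
    n_sites, n_types = len(ref), len(ref[0])
    # R^T b and R^T R do not change across iterations, so compute them once.
    rtb = [sum(ref[i][k] * bulk[i] for i in range(n_sites)) for k in range(n_types)]
    rtr = [[sum(ref[i][k] * ref[i][m] for i in range(n_sites)) for m in range(n_types)]
           for k in range(n_types)]

    w = [1.0 / n_types] * n_types  # start from uniform proportions
    for _ in range(n_iter):
        rtr_w = [sum(rtr[k][m] * w[m] for m in range(n_types)) for k in range(n_types)]
        # Update: w_k <- w_k * (R^T b)_k / (R^T R w)_k
        w = [w[k] * rtb[k] / (rtr_w[k] + eps) for k in range(n_types)]

    total = sum(w)
    return [wk / total for wk in w]


if __name__ == "__main__":
    # Toy reference: 4 CpG sites x 2 cell types (hypothetical values).
    reference = [[0.9, 0.1],
                 [0.1, 0.9],
                 [0.8, 0.2],
                 [0.2, 0.7]]
    true_w = [0.3, 0.7]
    # Noise-free bulk profile mixed from the reference with known proportions.
    bulk = [sum(row[k] * true_w[k] for k in range(2)) for row in reference]
    print([round(p, 3) for p in estimate_proportions(reference, bulk)])
```

The final rescaling is a simple way to impose the sum-to-one constraint; methods that enforce it during optimization, model noise, or account for cell types missing from the reference would behave differently on real data.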


In addition to methods development, we work closely with groups around the world to study specific diseases. In particular, we have worked on studies of non-Hodgkin lymphoma, leukemia, rheumatoid arthritis, asthma, coronary artery disease, myocardial infarction (heart attack), metabolic syndrome, and obesity. These studies shed light on the biological mechanisms of these diseases, and we hope they will pave the way for improved diagnosis and personalized treatment based on an individual's genomic data.

Selected Publications 

"FEAST: fast expectation-maximization for microbial source tracking"

Nature Methods, 2019

"Cell-type-specific resolution epigenetics without the need for cell sorting or single-cell biology"

Nature Communications, 2019

"CONFINED: distinguishing biological from technical sources of variation by leveraging multiple methylation datasets"

Genome Biology, 2019

"Detecting heritable phenotypes without a model using fast permutation testing for heritability and set-tests"

Nature Communications, 2018

"BayesCCE: a Bayesian framework for estimating cell-type composition from DNA methylation without the need for methylation reference"

Genome Biology, 2018

"Correcting for cell-type heterogeneity in DNA methylation: a comprehensive evaluation"

Nature Methods, 2017

"Sparse PCA corrects for cell type heterogeneity in epigenome-wide association studies"

Nature Methods, 2016

"Fast and accurate construction of confidence intervals for heritability"

The American Journal of Human Genetics, 2016

"A genetic and socioeconomic study of mate choice in Latinos reveals novel assortment patterns"

PNAS, 2015

"Characterization of whole-genome autosomal differences of DNA methylation between men and women"

Epigenetics & Chromatin, 2015

"Identifying personal genomes by surname inference"

Science, 2013

"A model-based approach for analysis of spatial structure in genetic data"

Nature Genetics, 2012

"Joint analysis of multiple metagenomic samples"

PLoS Computational Biology, 2012

"Genome-wide association study of follicular lymphoma identifies a risk locus at 6p21.32"

Nature Genetics, 2010

"Genomic privacy and limits of individual detection in a pool"

Nature Genetics, 2009

"Genetic variants at 6p21.33 are associated with susceptibility to follicular lymphoma"

Nature Genetics, 2009

"SNP imputation in association studies"

Nature Biotechnology, 27(4), 349-51, 2009

"Maximizing power in association studies"

Nature Biotechnology, 27(3), 255-6, 2009

"Estimating local ancestry in admixed populations"

The American Journal of Human Genetics, 2008

Our lab is currently looking for data scientists.
  • Eran Halperin, PhD

    Professor, UCLA

    Eran is interested in developing methods, based on machine learning and beyond, that impact medicine by improving treatment and care.

  • Brian Hill

    PhD student

    Brian's interests lie at the intersection of machine learning and medicine. His current research leverages electronic health records, physiological waveform signals, and genomic data to predict negative health outcomes. 

  • Mike Thompson

    PhD student

    Mike works on dimensionality reduction methods in genomics, with the goal of better accounting for confounders.

  • Brandon Jew

    PhD student

    Brandon is interested in understanding the role of genetics in neuropsychiatric disease from large-scale multiomics data. 

  • Nadav Rakocz

    PhD student

    Nadav is working on the analysis of medical images, physiological waveforms, and electronic health records (EHR) to predict adverse medical outcomes.

  • Leah Briscoe

    PhD student

    Leah develops statistical methods for the analysis of population-level microbiome data.

  • Zeyuan (Johnson) Chen

    PhD student

    Johnson focuses on finding low-rank structure in methylation data and on methylation deconvolution, in order to correct for confounders in epigenome-wide association studies (EWAS).

  • Jeffery Chiang

    Data Scientist 

    Jeff works on improving patient care by applying machine learning to medical records and images.

  • Ulzee An

    PhD candidate

    Ulzee works at the intersection of statistical and deep learning methods, discovering actionable structure in data to directly improve medical care.

  • Akos Rudas

    Visiting PhD student

    Akos is interested in contributing to medicine through analyses of large datasets from healthcare systems using statistical and machine learning methods.

We thank the National Science Foundation and the National Institutes of Health for their current support. We also thank the Israel Science Foundation, the German-Israeli Science Foundation, IBM, the Blavatnik Research Foundation, the Juludan Research Foundation, the National Institutes of Health, and the Edmond J. Safra Center for Bioinformatics for their support in the past, and hopefully in the future.
Software developed and maintained by our group.
  • FEAST: Microbial source tracking using Expectation Maximization

  • ReFACTor: Correcting cell-type heterogeneity in whole-genome methylation data.

  • TCA: A deconvolution method for cell-type specific analysis of methylation data.

  • Bisque: A method for the inference of cell-type composition in heterogeneous tissues using RNA data.  

  • MTV-LMM: A method for the analysis and prediction of temporal microbiome data using linear mixed models.

  • CONFINED: A method for distinguishing biological from technical confounders in methylation data.

  • GLINT: A user-friendly command-line tool for fast analysis of genome-wide DNA methylation data (EWAS). The package includes implementations of ReFACTor, EPIStructure, linear mixed model association testing, and reference-based cell-type estimation.

  • LAMP: Estimating Locus Specific Ancestry

  • LAMP-LD: Leveraging LD in the estimation of Locus Specific Ancestry
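FEAST, listed above, frames microbial source tracking as estimating the proportions in which candidate sources contribute to a sink sample, using expectation-maximization. The sketch below shows only the textbook EM update for mixture weights, under the simplifying assumption that each source's taxon distribution is known exactly; FEAST itself is more elaborate (for example, it also accounts for an unknown source), so this illustrates the general idea rather than FEAST's algorithm. The function name `em_source_proportions` and the toy counts are hypothetical.

```python
from typing import List


def em_source_proportions(sink_counts: List[float],
                          sources: List[List[float]],
                          n_iter: int = 2000) -> List[float]:
    """Hypothetical helper: EM estimate of source mixing proportions alpha.

    sink_counts[j]: observed count of taxon j in the sink sample.
    sources[s][j]: known relative abundance of taxon j in source s.
    Model: each sink read comes from source s with probability alpha[s],
    and then from taxon j with probability sources[s][j].
    """
    n_sources = len(sources)
    n_taxa = len(sink_counts)
    alpha = [1.0 / n_sources] * n_sources  # uniform starting proportions

    for _ in range(n_iter):
        expected = [0.0] * n_sources  # expected number of reads per source
        for j in range(n_taxa):
            weights = [alpha[s] * sources[s][j] for s in range(n_sources)]
            denom = sum(weights)
            if denom == 0.0:
                continue  # taxon has zero probability under every source
            # E-step: posterior probability that a read of taxon j came from s
            for s in range(n_sources):
                expected[s] += sink_counts[j] * weights[s] / denom
        # M-step: new proportions are the expected shares of assigned reads
        assigned = sum(expected)
        alpha = [e / assigned for e in expected]
    return alpha


if __name__ == "__main__":
    # Two hypothetical sources over four taxa; the sink counts are exactly
    # 25% source A plus 75% source B, scaled to 1000 reads.
    source_a = [0.7, 0.2, 0.1, 0.0]
    source_b = [0.0, 0.1, 0.3, 0.6]
    sink = [175, 125, 250, 450]
    print([round(a, 3) for a in em_source_proportions(sink, [source_a, source_b])])
```

Because the toy sink is an exact 25/75 mixture of the two sources, the estimate should approach [0.25, 0.75]. With fixed source profiles the log-likelihood is concave in the proportions, so EM converges to the maximum-likelihood mixture; real data add sampling noise in both sinks and sources, which is where more complete models come in.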

Software developed by our group, maintained elsewhere.
  • GEVALT: Selecting the most predictive tag SNPs (maintained in Ron Shamir's group).

  • RECYCLER: Detecting plasmids from de novo assembly (maintained in Ron Shamir's group).

Retired Software

  The following are software packages that we developed in the past but no longer maintain:

  • SecureGenome: Maintaining your genomic privacy

  • SEQEM: Assigning short reads to homologous genes in RNA-seq experiments.

  • CAMP: Coalescent based Association Mapping

  • WHAP: Weighted Haplotype Association Server. 

  • LOCO-LD: LD-corrected spatial analysis

  • SPA: Spatial Ancestry Analysis in genetic data

  • BARCODE: Compression of sequence data using Bloom filters 



May 2016 - Yedael Waldman, a joint postdoc between our lab and Alon Keinan's lab at Cornell, published a paper about the genetic history of the Bene Israel Jews of India. Science Daily, The Times of India, Haaretz.

November 2015 - Our paper on epigenetic differences between men and women was featured in Nature Reviews Genetics.

October 2015 - Our PNAS paper finds that people choose their mates based on their genomes. NDTV, Daily Mail, Business Standard, Science Daily.


January 2013 - Our Science paper appeared on the front page of the print edition of the Israeli newspaper Haaretz, two days before the national parliamentary elections.


July 2012 - Israeli business newspaper The Marker: 40 under 40.


June 2012 - Genetic GPS
Wired Magazine, Haaretz Israeli News, Haaretz (Hebrew version), Times of Israel, London Jewish Chronicle, International Business Times (Italian)


May 2011 - Profiles in Computer Science 
Biomedical Computation Review


September 2009 - From DNA Data to Disease Diagnosis
Frontier Economy, Spanish version


August 2009 - Researchers Claim New Software Can Skirt Privacy Challenges of GWAS Data-Sharing
GenomeWeb, Innovation Report, Physorg, The Medical News (Sydney, Australia), Jerusalem Post, Ynet (Israel), Biopharma.


July 2009 - Gene linked to increasingly common type of blood cancer. EurekAlert

296A Engineering VI

University of California, Los Angeles 90095-1600




© 2017 by Leticia Ortiz.