ML in medicine
The Big Data and Genomics Lab at UCLA is led by Eran Halperin. Our lab is affiliated with the Computer Science Department in the School of Engineering and with the Departments of Human Genetics, Computational Medicine, and Anesthesiology in the School of Medicine. Students are welcome to join our lab through the following graduate programs: Bioinformatics, Computer Science, Biomath, and MD-PhD.
Our lab aims to improve the understanding and treatment of human disease by analyzing large-scale data collected in relation to those diseases. In recent years, our main focus has been the analysis of different types of genomic data, including genetics, methylation, RNA expression, single-cell and single-nucleus RNA, and microbiome data. Another major effort in the lab is the development of predictive methods for clinical outcomes using a combination of genomic data and clinical data made available by the UCLA hospital and by other collaborators.
The methodology we apply and develop involves a combination of machine learning, optimization algorithms, combinatorial optimization, and Bayesian statistics. In particular, our lab is currently focusing on the development of methods in the following domains:
Methods for high-throughput genomic data in heterogeneous tissues. We develop deconvolution and decomposition methods for bulk methylation and RNA data using information from sorted cells and from single-cell and single-nucleus RNA expression data. This provides opportunities to study cell-type-specific biology using bulk measurements.
Methods for microbial source tracking. We are developing methods for the inference of the sources contributing to a microbiome sample across a large number of contexts.
Methods for the analysis of time-series genomic data. We are developing statistical models that capture time-series data, particularly microbiome data. Such models can help us understand the dynamics of genomic data under different scenarios.
Data Mining of Medical Records. One of the current efforts of our lab is the prediction of health-related outcomes (e.g., adverse surgery outcomes) using the large collection of medical records, genetic data, methylation data, clinical images, and physiological waveforms in the UCLA hospital, as well as other data provided through our collaborators.
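To make the microbial source tracking problem above more concrete, here is a minimal sketch of one way to estimate how much each known source contributes to a sample. It assumes the taxon distribution of every source is known exactly and fits only the mixing proportions with a plain expectation-maximization loop. This is a simplified illustration and not the FEAST implementation; the function name, parameters, and toy numbers are all hypothetical.

```python
def estimate_source_proportions(sink_counts, sources, n_iter=500, tol=1e-10):
    """Estimate mixing proportions of known sources in a sink sample via EM.

    sink_counts: list of taxon counts observed in the sink sample (length T).
    sources: list of K taxon probability distributions, each of length T.
    Returns a list of K proportions that sum to 1.
    (Hypothetical helper for illustration; assumes sources are known exactly.)
    """
    k = len(sources)
    alpha = [1.0 / k] * k  # start from uniform mixing proportions

    for _ in range(n_iter):
        new_alpha = [0.0] * k
        for t, count in enumerate(sink_counts):
            if count == 0:
                continue
            # E-step: posterior weight of each source for reads of taxon t
            weights = [alpha[j] * sources[j][t] for j in range(k)]
            denom = sum(weights)
            if denom == 0.0:
                continue  # taxon unexplained by any source; ignored in this sketch
            for j in range(k):
                new_alpha[j] += count * weights[j] / denom

        norm = sum(new_alpha)
        if norm == 0.0:
            return alpha  # nothing could be assigned; keep the current estimate

        # M-step: renormalize the expected read assignments into proportions
        new_alpha = [a / norm for a in new_alpha]
        change = max(abs(a - b) for a, b in zip(alpha, new_alpha))
        alpha = new_alpha
        if change < tol:
            break
    return alpha


# Toy example: two sources over four taxa, sink built as a 30% / 70% mixture.
toy_sources = [
    [0.7, 0.2, 0.1, 0.0],
    [0.1, 0.1, 0.3, 0.5],
]
toy_sink = [28, 13, 24, 35]
proportions = estimate_source_proportions(toy_sink, toy_sources)
print([round(p, 3) for p in proportions])
```

On this toy input the estimated proportions come out close to 0.3 and 0.7, matching the mixture used to build the counts. A realistic method would also need to handle an unknown source and the uncertainty in the source profiles themselves, both of which this sketch leaves out.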
In addition to methods development, we work closely with groups around the world to study specific diseases. In particular, we have been working on studies of non-Hodgkin lymphoma, leukemia, rheumatoid arthritis, asthma, coronary artery disease, myocardial infarction (heart attack), metabolic syndrome, and obesity. These studies shed light on the biological mechanisms of these diseases, and we hope they will pave the way for improved diagnosis and personalized treatment based on an individual's genomic data.
"Optimized design of single-cell RNA sequencing experiments for cell-type-specific eQTL analysis"
Nature Communications, 2020
"Accurate estimation of cell composition in bulk expression through robust integration of single-cell information."
Nature Communications, 2020
"Context-aware dimensionality reduction deconvolutes gut microbial community dynamics"
Nature Biotechnology, 2020
"FEAST: fast expectation-maximization for microbial source tracking"
Nature Methods, 2019
"Cell-type-specific resolution epigenetics without the need for cell sorting or single-cell biology"
Nature Communications, 2019
"CONFINED: distinguishing biological from technical sources of variation by leveraging multiple methylation datasets"
Genome Biology, 2019
"Detecting heritable phenotypes without a model using fast permutation testing for heritability and set-tests"
Nature Communications, 2018
"BayesCCE: a Bayesian framework for estimating cell-type composition from DNA methylation without the need for methylation reference."
Genome Biology, 2018
"Correcting for cell-type heterogeneity in DNA methylation: a comprehensive evaluation"
Nature Methods, 2017
"Sparse PCA corrects for cell type heterogeneity in epigenome-wide association studies"
Nature Methods, 2016
"A genetic and socioeconomic study of mate choice in Latinos reveals novel assortment patterns."
PNAS, 2015.
"Identifying personal genomes by surname inference"
Science, 2013.
"A model-based approach for analysis of spatial structure in genetic data"
Nature Genetics, 2012.
"Genome-wide association study of follicular lymphoma identifies a risk locus at 6p21.32"
Nature Genetics, 2010.
"Genomic privacy and limits of individual detection in a pool"
Nature Genetics, 2009.
"Genetic variants at 6p21.33 are associated with susceptibility to follicular lymphoma"
Nature Genetics, 2009.
"Maximizing power in association studies"
Nature Biotechnology, 27(3), 255-6, 2009.
The American Journal of Human Genetics, 2008.
Our lab is currently looking for data scientists. Please click here for more details.
We thank the National Science Foundation and the National Institutes of Health for their current support. We also thank the Israeli Science Foundation, the German-Israeli Science Foundation, IBM, the Blavatnik Research Foundation, the Juludan Research Foundation, the National Institutes of Health, and The Edmond J. Safra Center for Bioinformatics for their support in the past, and hopefully in the future.
Software developed and maintained by our group.
FEAST: Microbial source tracking using Expectation Maximization
ReFACTor: Correcting cell-type heterogeneity in whole-genome methylation data.
TCA: A deconvolution method for cell-type specific analysis of methylation data.
Bisque: A method for the inference of cell-type composition in heterogeneous tissues using RNA data.
MTV-LMM: A method for the analysis and prediction of temporal microbiome data using linear mixed models.
CONFINED: A method for the detection of biological confounders (vs. technical) in methylation data.
GLINT: A user-friendly command-line tool for fast analysis of genome-wide DNA methylation data (EWAS). The package includes implementations of ReFACTor, EPIStructure, linear mixed model association testing, and reference-based cell-type estimation.
LAMP: Estimating Locus Specific Ancestry
LAMP-LD: Leveraging LD in the estimation of Locus Specific Ancestry
Software developed by our group, maintained elsewhere.
The following are software packages that we developed in the past but no longer maintain:
SecureGenome: Maintaining your genomic privacy
CAMP: Coalescent based Association Mapping
WHAP: Weighted Haplotype Association Server.
LOCO-LD: LD corrected spatial analysis
SPA: Spatial Ancestry Analysis in genetic data
BARCODE: Compression of sequence data using Bloom filters
May, 2016 - Yedael Waldman, a joint postdoc between our lab and Alon Keinan's lab at Cornell, published a paper about the genetic history of Bene-Israel Indian Jews. Science Daily, The Times of India, Haaretz.
November, 2015 - Our paper on epigenetic differences between men and women was featured in Nature Reviews Genetics.
January 2013 - Our Science paper appeared on the front page of the print edition of the Israeli newspaper Ha'aretz, two days before the national parliament elections...
Ha'aretz paper version Ha'aretz first page
May 2011 - Profiles in Computer Science
Biomedical Computation Review
August 2009 - Researchers Claim New Software Can Skirt Privacy Challenges of GWAS Data-Sharing
GenomeWeb, Innovation report, Physorg, The Medical News (Sydney, Australia), Jerusalem Post, Ynet (Israel), Biopharma.
July, 2009 - Gene linked to increasingly common type of blood cancer EurekAlert
296A Engineering VI
University of California, Los Angeles 90095-1600