Statistical Modeling

We develop novel statistical models for the analysis and integration of various types of data, including clinical data and high-throughput molecular profiling data from single cells. We often employ the framework of probabilistic graphical models, a class of statistical models that represent conditional independencies among random variables by means of a graph; this class includes Bayesian networks and Markov random fields. For example, we have used graphical models for modeling dependencies among mutations and for reconstructing intracellular signaling pathways.

A graphical model encodes statistical dependencies [from doi:10.1186/s13059-015-0592-6]
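To illustrate how a graph encodes conditional independence, the following minimal Python sketch builds a three-node Bayesian network A → B → C over binary variables (think of them, hypothetically, as the presence or absence of three mutations). The joint distribution factorizes along the graph as P(a, b, c) = P(a) P(b | a) P(c | b), so C is conditionally independent of A given B. All probability tables and function names are made-up assumptions for illustration, not data or code from our studies.

```python
from fractions import Fraction
from itertools import product

# Hypothetical conditional probability tables for binary variables A -> B -> C.
# Exact fractions avoid floating-point noise when comparing probabilities.
p_a = {1: Fraction(3, 10), 0: Fraction(7, 10)}
# Keys are (b, a): P(B=b | A=a)
p_b_given_a = {(1, 1): Fraction(4, 5), (0, 1): Fraction(1, 5),
               (1, 0): Fraction(1, 10), (0, 0): Fraction(9, 10)}
# Keys are (c, b): P(C=c | B=b)
p_c_given_b = {(1, 1): Fraction(7, 10), (0, 1): Fraction(3, 10),
               (1, 0): Fraction(1, 5), (0, 0): Fraction(4, 5)}


def joint(a, b, c):
    """Joint probability from the factorization P(a) P(b|a) P(c|b)."""
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]


def p_c_given_ab(c, a, b):
    """P(C=c | A=a, B=b), obtained from the joint by normalizing over c."""
    return joint(a, b, c) / sum(joint(a, b, cc) for cc in (0, 1))


def p_c_given_a(c, a):
    """P(C=c | A=a), marginalizing out B."""
    num = sum(joint(a, b, c) for b in (0, 1))
    den = sum(joint(a, b, cc) for b in (0, 1) for cc in (0, 1))
    return num / den


if __name__ == "__main__":
    # The joint sums to 1 over all 8 configurations.
    print(sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3)))
    # Marginally, A and C are dependent ...
    print(p_c_given_a(1, 1), p_c_given_a(1, 0))
    # ... but once B is known, A carries no extra information about C.
    print(all(p_c_given_ab(c, a, b) == p_c_given_b[(c, b)]
              for a, b, c in product((0, 1), repeat=3)))
```

Running it prints 1, then 3/5 and 1/4 (so the distribution of C depends on A), then True: conditioning on B removes that dependence, which is exactly what the absence of an edge between A and C expresses in this graph.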

Molecular profiling data are high-dimensional and noisy, which complicates their analysis. A common strategy to address this challenge is regularization, which imposes constraints on the model parameters, for example by penalizing their magnitude so that many of them shrink toward zero. Using these techniques, we have solved high-dimensional regression problems for the analysis of genome-wide RNA interference data and for predicting cancer type from tumor-derived genomic data.
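The paragraph does not specify which regularizer was used. As one widely used example, the following minimal sketch fits an L1-penalized (lasso) linear regression by cyclic coordinate descent, minimizing (1/(2n)) ||y - Xβ||² + λ ||β||₁. It is written in plain Python to stay self-contained; the function names, the toy data, and the omission of an intercept are assumptions for illustration.

```python
def soft_threshold(z, t):
    """Proximal operator of t*|x|: shrinks z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent (hypothetical helper, no intercept).

    Minimizes (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1.
    X is a list of n rows with p entries each; y is a list of n responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # (1/n) * ||X_j||^2 for each column j
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = [float(v) for v in y]  # residual y - X beta, with beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # an all-zero column never enters the model
            # Correlation of feature j with the partial residual excluding feature j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep the residual in sync with beta
                beta[j] = new
    return beta


# Toy high-dimensional example: p = 6 features, n = 4 samples; y depends only on feature 0.
X = [[1, 0, 1, 0, 1, 1],
     [-1, 1, 0, 0, 1, 1],
     [1, 1, 0, 1, 0, 1],
     [-1, 0, 1, 1, 0, 1]]
y = [2.0, -2.0, 2.0, -2.0]
print(lasso_cd(X, y, lam=0.1))  # first coefficient close to 1.9, the rest 0.0
print(lasso_cd(X, y, lam=2.5))  # all coefficients 0.0
```

With more features than samples, ordinary least squares has no unique solution, but the lasso with λ = 0.1 keeps only the first feature (its true coefficient 2, shrunk by λ) and sets the others to zero. Any λ at or above max_j |X_jᵀy| / n, which is 2 here, yields the all-zero solution.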

A nested effects model consisting of hidden variables (S-nodes) and measurements of observed variables (E-nodes) [from doi:10.1093/bioinformatics/btz325]
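In the nested effects model literature, the deterministic core of the model is commonly described as follows: perturbing an S-node is predicted to affect every E-node attached to that S-node or to any S-node reachable from it in the signaling graph, so the sets of observed effects are nested along the pathway. The following minimal sketch computes this predicted effect pattern. It ignores the noise model and the inference procedure, and the function names and the three-node chain are hypothetical, not taken from the cited paper.

```python
def transitive_closure(adj):
    """Reachability matrix of a directed graph (Warshall's algorithm).

    adj[i][j] == 1 means an edge S_i -> S_j; every node is treated as reaching itself.
    """
    k = len(adj)
    reach = [[1 if i == j or adj[i][j] else 0 for j in range(k)] for i in range(k)]
    for m in range(k):
        for i in range(k):
            if reach[i][m]:
                for j in range(k):
                    if reach[m][j]:
                        reach[i][j] = 1
    return reach


def predicted_effects(adj, attach):
    """Noise-free predicted effect pattern of a nested effects model (hypothetical helper).

    attach[e] is the index of the S-node to which E-node e is attached.
    Returns F with F[s][e] == 1 if perturbing S-node s should affect E-node e.
    """
    reach = transitive_closure(adj)
    return [[reach[s][attach[e]] for e in range(len(attach))] for s in range(len(adj))]


# Hypothetical pathway S0 -> S1 -> S2 with four E-nodes attached to S0, S1, S1, S2.
adj = [[0, 1, 0],
       [0, 0, 1],
       [0, 0, 0]]
for s, row in enumerate(predicted_effects(adj, [0, 1, 1, 2])):
    print(f"perturb S{s}: {row}")
```

This prints [1, 1, 1, 1] for S0, [0, 1, 1, 1] for S1, and [0, 0, 0, 1] for S2: the effects of perturbing a downstream node form a subset of the effects of perturbing an upstream node, which is the nesting that gives these models their name.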

We are also interested in the statistical and mathematical properties of graphical models, such as model complexity and identifiability, and in methods for model comparison, because these aspects inform the feasibility and efficiency of a modeling approach.
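One concrete reason why complexity matters for feasibility: when the structure of a Bayesian network must itself be learned, the number of candidate graphs grows super-exponentially with the number of variables. The following sketch counts labeled directed acyclic graphs (DAGs) on n nodes with Robinson's recurrence, a(n) = Σ_{k=1}^{n} (-1)^{k+1} C(n, k) 2^{k(n-k)} a(n-k) with a(0) = 1. The function name is an illustrative assumption.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes, via Robinson's recurrence."""
    if n == 0:
        return 1
    # Inclusion-exclusion over a set of k source nodes (no incoming edges);
    # each of the k*(n-k) possible edges from them into the other nodes is present or absent.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
        for k in range(1, n + 1)
    )


print([num_dags(n) for n in range(7)])  # [1, 1, 3, 25, 543, 29281, 3781503]
```

Already for six variables there are more than 3.7 million candidate structures, which is why exhaustive search over graphs quickly becomes infeasible.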
