Tutorial at ISMB 2018


Machine learning methods in the analysis of genomic and clinical data [AM2]

This tutorial covers machine learning (ML) tools that have been developed for the analysis of genomic and clinical data. It is an intermediate-level tutorial aimed at an audience with previous experience in bioinformatics methods such as:

genome-wide association studies
comparison of structured data (graphs or strings)
traditional text mining

We will discuss state-of-the-art methods and their applications. In addition to covering the necessary theoretical background, we will conduct two hands-on sessions to put some of the topics from the presentations into practice. We will also present illustrative examples of how deep learning is currently being used in the analysis of biomedical data.

Location

ISMB 2018 is hosted at the Hyatt Regency Chicago. Our tutorial will take place in Grand Ballroom B on Friday, July 6, 2018, between 9 am and 1 pm.

Presenters

Damian Roqueiro (damian.roqueiro@bsse.ethz.ch)
Felipe Llinares-López (felipe.llinares@bsse.ethz.ch)

Schedule and materials

Introduction (Download: PDF, 7.9 MB)

Overview of the topics presented in the tutorial

Module I: Significant pattern mining (SPM) for biomarker discovery (Download: PDF, 5 MB)

The need to consider feature interactions in biomarker discovery
Goal of significant itemset mining. Challenges:
- Statistical
- Computational
Concept of minimum attainable P-value
The LAMP algorithm
Contributions to the field
- Exploiting dependencies between patterns
- Correction for categorical covariates
- Interval search: Genome-wide association studies at a region level
Outlook and conclusions
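
To make the concept of the minimum attainable P-value more concrete, here is a minimal sketch. It assumes a one-sided Fisher's exact test on a 2x2 table (the outline above does not name the test, so this choice is an assumption), and the function names min_attainable_pvalue and is_testable are hypothetical. It is not the LAMP algorithm or the CASMAP implementation. It only illustrates why a pattern that occurs in very few samples can never reach significance and can therefore be excluded from the multiple-testing correction.

```python
from math import comb


def min_attainable_pvalue(x, n1, n):
    """Smallest one-sided Fisher p-value for a pattern seen in x of n samples.

    n1 is the number of samples in the positive class (e.g. cases).
    The most extreme table places as many occurrences as possible in that
    class, so the minimum p-value is a single hypergeometric probability.
    """
    if not (0 <= x <= n and 0 <= n1 <= n):
        raise ValueError("require 0 <= x <= n and 0 <= n1 <= n")
    a_max = min(x, n1)  # occurrences that can fall in the positive class
    return comb(n1, a_max) * comb(n - n1, x - a_max) / comb(n, x)


def is_testable(x, n1, n, threshold):
    """A pattern is testable only if it could reach the given threshold."""
    return min_attainable_pvalue(x, n1, n) <= threshold


if __name__ == "__main__":
    # 20 samples, 10 of them cases; patterns with different frequencies.
    for x in (1, 3, 5):
        print(x, min_attainable_pvalue(x, n1=10, n=20))
```

With 20 samples and 10 cases, a pattern that occurs only once has a minimum attainable P-value of 0.5, so it cannot be significant at any usual threshold, whatever its class labels turn out to be.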

Hands-on session: Applying significant pattern mining on genomic data (Download: ZIP, 7.3 MB)

Step-by-step tutorial on how to run the package CASMAP

Module II: Methods to compare structured biomedical data (Download: PDF, 11.3 MB)

The need for data transformation in machine learning
Kernel methods and the kernel trick
Kernelizing the nearest neighbor algorithm
Kernels on structured data
- String kernels
- Graph kernels
  - Random walk kernel
  - Shortest path kernel
  - Graphlet kernel
  - Weisfeiler-Lehman kernel
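
The idea of kernelizing the nearest neighbor algorithm can be sketched as follows. The squared distance between two objects in feature space can be written purely in terms of kernel evaluations, d^2(x, y) = k(x, x) - 2 k(x, y) + k(y, y), so the feature map never has to be computed explicitly. The example below uses a k-mer spectrum kernel as one possible string kernel. The choice of kernel, the toy sequences, and all function names are illustrative assumptions, not the material used in the tutorial.

```python
from collections import Counter


def spectrum_kernel(s, t, k=2):
    """Spectrum kernel: inner product of the k-mer count vectors of s and t."""
    counts_s = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    counts_t = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(counts_s[kmer] * counts_t[kmer] for kmer in counts_s)


def kernel_distance_sq(x, y, kernel):
    """Squared feature-space distance expressed only through the kernel."""
    return kernel(x, x) - 2 * kernel(x, y) + kernel(y, y)


def kernel_nearest_neighbor(query, train, labels, kernel):
    """Return the label of the training object closest to the query."""
    best = min(
        range(len(train)),
        key=lambda i: kernel_distance_sq(query, train[i], kernel),
    )
    return labels[best]


if __name__ == "__main__":
    train = ["AAAA", "CGCG"]
    labels = ["poly-A", "CG-repeat"]
    print(kernel_nearest_neighbor("AAAT", train, labels, spectrum_kernel))
```

Because the classifier only touches the data through the kernel argument, swapping in a graph kernel would in principle let the same code compare graphs instead of strings.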

Hands-on session: Applying graph kernels (Download: ZIP, 13 MB)

Step-by-step tutorial on how to use the package graphkernels

Module III: Deep learning and applications to biomedical data (text mining) (Download: PDF, 10.3 MB)

Electronic health records (EHRs). Challenges in processing unstructured text.
Traditional approaches to text mining
Obtaining word embeddings
- Skip-gram model
Application: Mortality prediction using unstructured text in EHRs
- Deep networks combining word embeddings
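
As a small illustration of the skip-gram model listed above, the sketch below builds the (center word, context word) training pairs that such a model learns from. It covers only this data-preparation step. The window size, the toy sentence, and the function name skipgram_pairs are assumptions made for the example, and no embedding is trained here.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token and its neighbors.

    Each token is paired with the tokens at most `window` positions to its
    left and right; the skip-gram model is trained to predict the context
    word from the center word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    # Toy clinical-style sentence, invented for illustration.
    note = ["patient", "denies", "chest", "pain"]
    for pair in skipgram_pairs(note, window=1):
        print(pair)
```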

Installation of packages (for hands-on sessions)

Package CASMAP

The package is available for R and Python.

For R

You can download and install the package directly from CRAN.

From your R console, type:

> install.packages("CASMAP")

For Python

Follow the instructions in our GitHub repository to download the source code and install the package.

Package graphkernels

The package is available for R and Python.

For R

1. You can download and install the package directly from CRAN.

From your R console, type:

> install.packages("graphkernels")

2. You can download the source code from our GitHub repository.

For Python

1. You can simply install it with the pip command. From your command prompt type:

$ pip install graphkernels

2. You can download the source code and install the package following the instructions in our GitHub repository.
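
After either installation route, one quick way to confirm that Python can find the package is to look it up without importing it. This is a minimal sketch. It assumes the import name is the same as the pip name, graphkernels, which this page does not state, and is_installed is a hypothetical helper.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module of this name can be located."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "graphkernels" is assumed to be the import name (same as the pip name).
    name = "graphkernels"
    status = "found" if is_installed(name) else "NOT found"
    print(f"{name}: {status}")
```

If the package is reported as not found, the virtual machine described in the next section is the fallback option.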

Virtual machine with pre-installed packages

You can download an Ubuntu 14.04 virtual machine (VM) and boot it with VirtualBox. Both packages are pre-installed on the VM.

Use this option as a backup plan in case the installation of either package is not successful.

Installing the VM

The VM can be downloaded and instantiated on any platform by following these steps:

1. Download VirtualBox for your specific platform.
2. Download the virtual machine mlcb-vm.vdi (~8.0 GB, MD5: e26b876ee040a0861cb1d004a783fc2e).
3. Import the VM from within VirtualBox.
Note: Depending on your version of VirtualBox, the instructions below may require clicking "Continue" to move to the next selection.
- Click the "New" icon at the top to create a new VM
- Type "mlcb_vm" to set the name of the VM
- Select "Linux" as the Type of operating system
- Select "Ubuntu (64-bit)" as Version
- Select 4,096 MB as memory size
- Select "Use an existing virtual hard disk file" and then search for the file mlcb-vm.vdi downloaded in step 2
4. A new VM will now show up in the left panel of VirtualBox. The VM will be marked as "Powered Off".
5. Right-click on the VM and select "Settings". Go to "System" and then to "Acceleration". Make sure that:
- [For Linux] the options "Enable VT-x/AMD-V" and "Enable Nested Paging" are selected
- [For macOS and Windows] the option "Enable Nested Paging" is selected (this is normally the default behavior)
6. To start the VM, click the "Start" icon at the top.
7. Log in with the user account tutorial and the password ismb18.
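
Because the disk image is large, it can be worth checking the download against the MD5 checksum listed above before importing it. The following is a minimal sketch using Python's standard hashlib module. The helper names md5_of_file and matches_expected are hypothetical, and the file is assumed to be in the current working directory.

```python
import hashlib

# MD5 checksum published with the VM download on this page.
EXPECTED_MD5 = "e26b876ee040a0861cb1d004a783fc2e"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    import sys

    # Assumed default location: the image sits in the current directory.
    image = sys.argv[1] if len(sys.argv) > 1 else "mlcb-vm.vdi"
    print("checksum OK" if matches_expected(image) else "checksum MISMATCH")
```

Reading the file in chunks keeps memory use small even for an image of several gigabytes. A mismatch usually indicates an incomplete or corrupted download.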