Tutorial at ISMB 2018
Machine learning methods in the analysis of genomic and clinical data [AM2]
This tutorial covers machine learning (ML) tools developed for the analysis of genomic and clinical data. It is an intermediate-level tutorial aimed at an audience with previous experience in bioinformatics methods such as:
- genome-wide association studies
- comparison of structured data (graphs or strings)
- traditional text mining
We will discuss state-of-the-art methods and their applications. In addition to covering the necessary theoretical background, we will also conduct two hands-on sessions to put into practice some of the topics discussed in the presentations. Additionally, we will present illustrative examples of how deep learning is currently being used in the analysis of biomedical data.
Schedule and materials
Introduction (Download: PDF, 7.9 MB)
Overview of the topics presented in the tutorial
Module I: Significant pattern mining (SPM) for biomarker discovery (Download: PDF, 5 MB)
- The need to consider feature interactions in biomarker discovery
- Goal of significant itemset mining. Challenges:
- Statistical
- Computational
- Concept of minimum attainable P-value
- The LAMP algorithm
- Contributions to the field
- Exploiting dependencies between patterns
- Correction for categorical covariates
- Interval search: Genome-wide association studies at a region level
- Outlook and conclusions
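The concept of the minimum attainable P-value listed above can be illustrated with a small sketch. It assumes a one-sided Fisher's exact test for enrichment of a pattern in the positive class; the slides may use a different test or variant. The function names and the Bonferroni-style testability check are illustrative assumptions, not part of CASMAP.

```python
from math import comb


def min_attainable_pvalue(x, n_pos, n):
    """Smallest one-sided Fisher p-value for a pattern seen in x of n samples.

    n_pos is the number of samples in the positive class. The most extreme
    2x2 table puts as many pattern occurrences as possible in that class,
    and its hypergeometric probability is the smallest p-value the pattern
    could ever reach, whatever the actual class labels are.
    """
    if not (0 <= x <= n and 0 <= n_pos <= n):
        raise ValueError("need 0 <= x <= n and 0 <= n_pos <= n")
    a = min(x, n_pos)  # pattern occurrences in the positive class
    return comb(n_pos, a) * comb(n - n_pos, x - a) / comb(n, x)


def is_testable(x, n_pos, n, alpha, n_tests):
    """A pattern is testable if its minimum p-value can pass the corrected threshold."""
    return min_attainable_pvalue(x, n_pos, n) <= alpha / n_tests


if __name__ == "__main__":
    # With 20 samples, 10 of them positive, a pattern seen only once can
    # never become significant, while one seen five times can.
    for x in (1, 5):
        print(x, min_attainable_pvalue(x, n_pos=10, n=20))
```

Patterns that are not testable cannot cause false positives, so they can be left out of the multiple-testing correction factor. This is the idea that algorithms such as LAMP build on.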
Hands-on session: Applying significant pattern mining on genomic data (Download: ZIP, 7.3 MB)
- Step-by-step tutorial on how to run the package CASMAP
Module II: Methods to compare structured biomedical data (Download: PDF, 11.3 MB)
- The need for data transformation in machine learning
- Kernel methods and the kernel trick
- Kernelizing the nearest neighbor algorithm
- Kernels on structured data
- String kernels
- Graph kernels
- Random walk kernel
- Shortest path kernel
- Graphlet kernel
- Weisfeiler-Lehman kernel
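The kernel trick and the kernelized nearest neighbor algorithm from the outline above can be sketched in a few lines. The example assumes a simple k-mer spectrum kernel as the string kernel; the function names and toy sequences are illustrative and are not taken from the tutorial materials.

```python
from collections import Counter


def spectrum_features(s, k=2):
    """Count every substring of length k (k-mer) in the string s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(s, t, k=2):
    """Inner product of the k-mer count vectors of s and t."""
    fs, ft = spectrum_features(s, k), spectrum_features(t, k)
    return sum(count * ft[kmer] for kmer, count in fs.items())


def kernel_distance_sq(x, y, kernel):
    """Squared feature-space distance, computed from kernel values only."""
    return kernel(x, x) - 2 * kernel(x, y) + kernel(y, y)


def kernel_nearest_neighbor(query, train, labels, kernel):
    """Return the label of the training item closest to query in feature space."""
    best = min(range(len(train)),
               key=lambda i: kernel_distance_sq(query, train[i], kernel))
    return labels[best]


if __name__ == "__main__":
    train = ["ACGTACGT", "TTTTGGGG"]
    labels = ["a", "b"]
    print(kernel_nearest_neighbor("ACGTAC", train, labels, spectrum_kernel))
```

The nearest neighbor rule only needs distances, and the squared distance in feature space expands to k(x, x) - 2 k(x, y) + k(y, y). The classifier therefore never has to build the feature vectors itself, and any valid kernel on strings or graphs can be plugged in.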
Hands-on session: Applying graph kernels (Download: ZIP, 13 MB)
- Step-by-step tutorial on how to use the package graphkernels
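To give a feel for what a graph kernel computes, here is a minimal pure-Python sketch of a Weisfeiler-Lehman subtree kernel. It does not use the graphkernels package or its API. The graph representation (an adjacency dictionary plus a dictionary of node labels) and the function names are assumptions made for illustration.

```python
from collections import Counter


def wl_features(adjacency, labels, iterations=2):
    """Count node labels over several Weisfeiler-Lehman relabeling rounds.

    adjacency maps each node to a list of its neighbors; labels maps each
    node to an initial label (all of the same sortable type, e.g. strings).
    """
    current = dict(labels)
    counts = Counter(current.values())
    for _ in range(iterations):
        # New label = own label plus the sorted multiset of neighbor labels.
        current = {
            node: (current[node],
                   tuple(sorted(current[nb] for nb in adjacency[node])))
            for node in adjacency
        }
        counts.update(current.values())
    return counts


def wl_kernel(g1, g2, iterations=2):
    """Inner product of the label-count vectors of two graphs."""
    f1 = wl_features(*g1, iterations)
    f2 = wl_features(*g2, iterations)
    return sum(count * f2[label] for label, count in f1.items())


if __name__ == "__main__":
    # Toy graphs: a path C-O-C and a single edge C-O.
    path = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "O", 2: "C"})
    edge = ({0: [1], 1: [0]}, {0: "C", 1: "O"})
    print(wl_kernel(path, path, iterations=1), wl_kernel(path, edge, iterations=1))
```

Production implementations usually compress the nested labels to integers in each round. The sketch keeps the nested tuples as labels to stay short.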
Module III: Deep learning and applications to biomedical data (text mining) (Download: PDF, 10.3 MB)
- Electronic health records (EHRs). Challenges in processing unstructured text.
- Traditional approaches to text mining
- Obtaining word embeddings
- Skip-gram model
- Application: Mortality prediction using unstructured text in EHRs
- Deep networks combining word embeddings
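As a small illustration of the skip-gram model mentioned above, the sketch below generates the (center word, context word) training pairs that such a model learns from. It covers only the pair-generation step, not the neural network that turns the pairs into embeddings. The function name, window size and example sentence are illustrative assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token and its neighbors.

    Each token is paired with every other token at most `window`
    positions to its left or right.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    note = "patient admitted with acute chest pain".split()
    for center, context in skipgram_pairs(note, window=1):
        print(center, "->", context)
```

A skip-gram model is trained to predict the context word from the center word, so words that appear in similar contexts end up with similar embedding vectors.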
Package CASMAP
The package is available for R and Python.
For R
1. You can download and install the package directly from CRAN.
From your R console, type:
> install.packages("CASMAP")
For Python
Follow the instructions in our GitHub repository to download the source code and to install the package.
Package graphkernels
The package is available for R and Python.
For R
1. You can download and install the package directly from CRAN.
From your R console, type:
> install.packages("graphkernels")
2. You can download the source code from our GitHub repository.
For Python
1. You can simply install it with the pip command. From your command prompt type:
$ pip install graphkernels
2. You can download the source code and install the package following the instructions in our GitHub repository.
You can download an Ubuntu 14.04 virtual machine (VM) and boot it with VirtualBox. Both packages are pre-installed on the VM.
Use this option as a backup in case the installation of either package fails.
Installing the VM
The VM can be downloaded and instantiated on any platform by following these steps:
- Download VirtualBox for your specific platform
- Download the virtual machine mlcb-vm.vdi (~8.0 GB, MD5: e26b876ee040a0861cb1d004a783fc2e)
- Import the VM from within VirtualBox
Note: Depending on your version of VirtualBox, the instructions below may require clicking "Continue" to move to the next selection.
- Click the "New" icon at the top to create a new VM
- Type "mlcb_vm" to set the name of the VM
- Select "Linux" as the Type of operating system
- Select "Ubuntu (64-bit)" as Version
- Select 4,096 MB as memory size
- Select "Use an existing virtual hard disk file" and then search for the file mlcb-vm.vdi downloaded in step 2
- A new VM will now show up in the left panel of VirtualBox. The VM will be marked as "Powered Off"
- Right-click on the VM and select "Settings". Go to "System" and then to "Acceleration". Make sure that:
- [For Linux] the options "Enable VT-x/AMD-V" and "Enable Nested Paging" are selected
- [For macOS and Windows] the option "Enable Nested Paging" is selected (this is normally the default behavior)
- To start the VM, click the "Start" icon at the top
- Log in with the username "tutorial" and the password "ismb18"
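Because the disk image is large, it can be worth checking the download from step 2 against the MD5 checksum listed above before importing it. The sketch below uses only the Python standard library. The function names are illustrative, and the file path in the usage comment is an assumption about where the image was saved.

```python
import hashlib

# MD5 checksum published above for mlcb-vm.vdi.
EXPECTED_MD5 = "e26b876ee040a0861cb1d004a783fc2e"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at path matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()


# Example usage (path is hypothetical, adjust to your download location):
# print(verify_download("mlcb-vm.vdi"))
```

If the check returns False, the download is probably incomplete or corrupted and should be repeated.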