Worked example Prediction
Training
1. Log into HiPathia. For further information on this step, visit Logging in.
2. Collect the data. We will work with a breast cancer dataset from The Cancer Genome Atlas (TCGA) repository (Link to dataset).
More information on the proposed dataset is available here:
Before it can be used in HiPathia, the dataset must be normalized. We recommend TMM (trimmed mean of M values) normalization followed by a log2 transformation (log2TMM).
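As a rough illustration of what log2TMM normalization involves, the following Python sketch computes a simplified TMM scaling factor for one sample against a reference sample and then log2-transforms the normalized counts per million. It is not the implementation used by HiPathia. The function names, the choice of the first sample as reference, the trimming fractions (30% of the log ratios, 5% of the mean abundances) and the pseudo-count of 1 are all assumptions made for this example. For real analyses an established implementation, such as calcNormFactors in the edgeR package, is preferable.

```python
import math

def tmm_factor(obs, ref, trim_m=0.30, trim_a=0.05):
    """Simplified TMM scaling factor of sample `obs` relative to `ref`.

    Both arguments are lists of raw counts, one entry per gene.
    """
    n_obs, n_ref = sum(obs), sum(ref)
    genes = []
    for y, r in zip(obs, ref):
        if y == 0 or r == 0:
            continue  # the log ratio is undefined for zero counts
        m = math.log2((y / n_obs) / (r / n_ref))                 # log fold change (M)
        a = 0.5 * (math.log2(y / n_obs) + math.log2(r / n_ref))  # mean log abundance (A)
        var = (n_obs - y) / (n_obs * y) + (n_ref - r) / (n_ref * r)
        genes.append((m, a, var))
    if not genes or max(abs(m) for m, _, _ in genes) < 1e-6:
        return 1.0  # same composition as the reference: nothing to correct
    n = len(genes)
    cut_m, cut_a = int(n * trim_m), int(n * trim_a)
    by_m = sorted(range(n), key=lambda i: genes[i][0])
    by_a = sorted(range(n), key=lambda i: genes[i][1])
    # Keep genes that survive trimming on both M and A.
    keep = set(by_m[cut_m:n - cut_m]) & set(by_a[cut_a:n - cut_a])
    # Precision-weighted mean of the surviving log ratios.
    num = sum(genes[i][0] / genes[i][2] for i in keep)
    den = sum(1.0 / genes[i][2] for i in keep)
    return 2.0 ** (num / den)

def log2_tmm(counts):
    """Return log2(normalized CPM + 1) for a {sample: [counts per gene]} dict."""
    samples = list(counts)
    ref = counts[samples[0]]  # assumption: the first sample is the reference
    factors = {s: tmm_factor(counts[s], ref) for s in samples}
    # Rescale the factors so that they multiply to 1.
    geo_mean = math.exp(sum(math.log(f) for f in factors.values()) / len(factors))
    result = {}
    for s in samples:
        effective_size = sum(counts[s]) * factors[s] / geo_mean
        result[s] = [math.log2(c / effective_size * 1e6 + 1.0) for c in counts[s]]
    return result

# Toy counts (made up): S2 is S1 sequenced at twice the depth.
toy = {"S1": [10, 20, 30, 40], "S2": [20, 40, 60, 80]}
normalized = log2_tmm(toy)
```

In this toy case both samples end up with the same normalized values, because they differ only in sequencing depth and not in composition.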
We have selected samples of breast cancer tumors from the dataset, annotated as luminal A and luminal B (the molecular annotations come from this paper). You can learn more about breast cancer molecular subtypes here. The purpose of this study is to train a predictor that distinguishes molecular subtypes from gene expression data using the HiPathia mechanistic models, and to evaluate the model on a controlled set of samples.
The expression matrix and the experimental design can be downloaded from these links:
- Expression matrix: brca_sub_class_exp_train.txt
- Experimental design: brca_sub_class_des_train.txt
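Before uploading, it can be worth checking that the two files agree with each other. The sketch below assumes both files are tab-separated, that the first row of the expression matrix lists the sample IDs (optionally after an empty corner cell), and that the design file has one sample ID and one class label per row. These layout details, the helper names, and the toy sample IDs are assumptions for illustration, not a description of the actual files.

```python
import csv
import io

def read_sample_ids(matrix_handle):
    """Sample IDs from the header row of the expression matrix."""
    header = next(csv.reader(matrix_handle, delimiter="\t"))
    return [cell for cell in header if cell]  # drop an empty corner cell, if any

def read_design(design_handle):
    """Map sample ID -> class label from a two-column design file."""
    rows = csv.reader(design_handle, delimiter="\t")
    return {row[0]: row[1] for row in rows if len(row) >= 2}

def check_inputs(matrix_handle, design_handle):
    """Report unlabeled samples, labels without a sample, and class sizes."""
    samples = read_sample_ids(matrix_handle)
    design = read_design(design_handle)
    unlabeled = [s for s in samples if s not in design]
    unmatched = [s for s in design if s not in samples]
    class_sizes = {}
    for s in samples:
        if s in design:
            class_sizes[design[s]] = class_sizes.get(design[s], 0) + 1
    return unlabeled, unmatched, class_sizes

# Toy stand-ins for the two files (sample IDs and values are made up).
matrix_text = "\tS1\tS2\tS3\nGENE1\t5.1\t4.8\t6.0\n"
design_text = "S1\tLuminal A\nS2\tLuminal B\nS4\tLuminal A\n"
report = check_inputs(io.StringIO(matrix_text), io.StringIO(design_text))
print(report)
```

With real files, the same function can be called on handles obtained from `open(path, newline="")`. Two empty lists and a reasonably balanced class count suggest the files are consistent.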
4. Upload the normalized data to HiPathia by clicking My data in the data panel, or click the Run training example button. For further information on this step, visit Upload your data.
5. In the Type panel select Train new predictor.
6. In the Input data panel, select Expression matrix. Click the File browser of the Expression matrix section and select the desired file.
7. In the Design data panel, select Class predictor. Click the File browser of the Experimental design section and select the desired file. Condition 1 and Condition 2 are then populated automatically. Select “Tumor” for Condition 1 and “Normal” for Condition 2.
8. Select Human (Homo sapiens) as species (default).
9. In the Pathways panel, select all the pathways (default).
10. In the Study information panel, click the File browser button and select the desired output folder. In this case, we will use analysis_BRCA. Give a name to the study, for example “BRCA train model”.
11. Click the Run analysis button. A Study will be created and listed in the studies panel. You can access this panel by clicking on the My studies button.
Training report
Once the launched study has finished, the report and results will be available in “My studies”. The report page of the Prediction-training tool includes several output results. You can download any table or image shown on the results page by clicking on the name right before it. You can also download the pathway and function matrices by clicking on Circuit values. For more information about each result, please read the Prediction - Training report and Prediction - Workflow sections.
Breast Cancer Molecular Subtype Classification
The experiment consists of classifying a given sample as luminal A or luminal B (molecular subtype). We use TCGA data, and no pathways were filtered by hand.
Model Analysis
Hyperparameter search:
CV Performance: CV stats
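Cross-validation (CV) statistics of this kind are usually obtained by splitting the training samples into folds, training on all folds but one, and scoring on the held-out fold. The sketch below shows one common way to build class-balanced (stratified) folds. The section does not state how HiPathia builds its folds or how many it uses, so the function names, the number of folds, and the toy labels are assumptions for illustration.

```python
def stratified_folds(labels, k):
    """Assign sample indices to k folds, spreading each class evenly."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)  # deal samples out round-robin
    return folds

def cv_splits(labels, k):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)

# Toy labels (made up): six luminal A and three luminal B samples.
labels = ["Luminal A"] * 6 + ["Luminal B"] * 3
splits = list(cv_splits(labels, k=3))
for train, test in splits:
    print(len(train), "training samples,", [labels[i] for i in test], "held out")
```

In practice the samples would be shuffled before being dealt into folds. In a hyperparameter search, each candidate setting is scored on the same folds and the best-scoring setting is kept.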
The most relevant features along with their interaction sign:
Selected circuit name | Coef sign |
ErbB signaling pathway: ELK1* | - |
Progesterone-mediated oocyte maturation: CDK1 | + |
Pathways in cancer: BCL2 | - |
Cell cycle: CDC45 MCM7 MCM6 MCM5 MCM4 MCM3 MCM2 | + |
Neurotrophin signaling pathway: NFKB1 | + |
Pathways in cancer: E2F1 | + |
p53 signaling pathway: SERPINB5 | - |
p53 signaling pathway: CDK1 CCNB3 | + |
Vascular smooth muscle contraction: ACTA2 | - |
Neurotrophin signaling pathway: JUN | - |
Apoptosis: TP53 | + |
PPAR signaling pathway: ACAA1 | - |
Hippo signaling pathway: BIRC5 | + |
PPAR signaling pathway: CPT1C | + |
ErbB signaling pathway: GSK3B | + |
cAMP signaling pathway: GRIN3A | + |
Oocyte meiosis: CPEB2 | + |
Pathways in cancer: CSF3R | + |
Choline metabolism in cancer: WAS | + |
NOD-like receptor signaling pathway: CASP5 | + |
Hippo signaling pathway: SMAD1 SMAD4 | - |
Pathways in cancer: BIRC5 | + |
ErbB signaling pathway: STAT5A* | - |
ErbB signaling pathway: STAT5A | - |
p53 signaling pathway: CDK2 CCNE1 | + |
Proteoglycans in cancer: MAPK1 | - |
Pathways in cancer: CTBP1 HDAC1 | - |
cGMP-PKG signaling pathway: CNGB1 | - |
Ras signaling pathway: RAP1A | - |
Oxytocin signaling pathway: EEF2 | - |
Gap junction: C00681 | - |
Insulin signaling pathway: G6PC | - |
Fc gamma R-mediated phagocytosis: ARF6* | + |
Platelet activation: PIK3R5* | - |
Progesterone-mediated oocyte maturation: PIK3R5 | - |
PI3K-Akt signaling pathway: CDKN1B | - |
PPAR signaling pathway: FADS2 | + |
Thyroid hormone signaling pathway: WNT4 | - |
Thyroid hormone signaling pathway: CTNNB1 | - |
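To read the Coef sign column, it may help to think of the predictor as a linear model in which each selected circuit contributes its activity value multiplied by a coefficient. This is an assumption for illustration: this section does not state which model is fitted, nor which subtype is treated as the positive class. In the sketch below the two signs are taken from the table, while the magnitudes, the intercept, and the circuit activity values are made up.

```python
import math

# Signs follow the table above; the magnitudes are hypothetical.
coefficients = {
    "Progesterone-mediated oocyte maturation: CDK1": +0.8,
    "Pathways in cancer: BCL2": -0.5,
}

def decision_score(circuit_values, coefs, intercept=0.0):
    """Weighted sum of circuit activities (linear model assumption)."""
    return intercept + sum(w * circuit_values[name] for name, w in coefs.items())

def positive_class_probability(score):
    """Logistic link: maps a score to a probability for the positive class."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical circuit activities for three samples.
baseline = {name: 0.5 for name in coefficients}
more_cdk1 = {**baseline, "Progesterone-mediated oocyte maturation: CDK1": 0.9}
more_bcl2 = {**baseline, "Pathways in cancer: BCL2": 0.9}

p_baseline = positive_class_probability(decision_score(baseline, coefficients))
p_more_cdk1 = positive_class_probability(decision_score(more_cdk1, coefficients))
p_more_bcl2 = positive_class_probability(decision_score(more_bcl2, coefficients))
print(p_baseline, p_more_cdk1, p_more_bcl2)
```

Under this assumption, higher activity in a circuit with a "+" sign pushes the prediction toward the positive class, and higher activity in a circuit with a "-" sign pushes it away.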
Split Analysis
Split Performance: Test stats
PR curve over the test set:
ROC curve over the test set:
Probability for the test set:
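The quantities behind these plots can be recomputed from the predicted probabilities. The following sketch, with made-up labels and probabilities, computes the area under the ROC curve and the precision/recall pairs that make up a PR curve. The function names and the encoding of the positive class as 1 are assumptions for this example, and the sketch is not the code used to produce the report.

```python
def roc_auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_recall(labels, scores, threshold=0.5):
    """Precision and recall when scores >= threshold are called positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def pr_curve(labels, scores):
    """(threshold, precision, recall) for each distinct score, highest first."""
    thresholds = sorted(set(scores), reverse=True)
    return [(t, *precision_recall(labels, scores, t)) for t in thresholds]

# Toy test set (made up): 1 = positive class, 0 = negative class.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_prob)
curve = pr_curve(y_true, y_prob)
print(auc, curve)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two subtypes.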