prediction

Differences

This shows you the differences between two versions of the page.

prediction [2020/02/05 16:34]
krian [Workflow]
prediction [2020/04/03 20:18] (current)
Line 19: Line 19:
 The **expression data** has to be:  The **expression data** has to be: 
   * Expression matrix provided by ourselves (see how to upload files in [[upload_your_data|Upload your data]]).   * Expression matrix provided by ourselves (see how to upload files in [[upload_your_data|Upload your data]]).
 +When we select a gene expression file, the number of samples of this matrix will appear under the "file browser"​ button as shown below.
 +{{ ::​diffnumbersamples.png?​nolink |}}
 ==== Design data panel ==== ==== Design data panel ====
 The design data panel allows you to choose the kind of experiment you want to perform. You can choose between two kinds of experimental design: The design data panel allows you to choose the kind of experiment you want to perform. You can choose between two kinds of experimental design:
Line 35: Line 37:
 {{ ::​species.png?​nolink |}} {{ ::​species.png?​nolink |}}
 ==== Parameters ==== ==== Parameters ====
-This panel includes further parameters necessary to run an analysis.+This panel includes further parameters necessary to run an analysis.\\ 
 **Filter circuits**: Check to obtain the circuits that best differentiate your phenotype. This option is only available from //​Prediction//​ tool. **Filter circuits**: Check to obtain the circuits that best differentiate your phenotype. This option is only available from //​Prediction//​ tool.
 {{ ::​filtercircuits.png?​nolink |}} {{ ::​filtercircuits.png?​nolink |}}
Line 106: Line 108:
 You can download the model statistics. You can download the model statistics.
   * **Selected features**: You can download the filtered paths that best differentiate your phenotype. This section is only available when selecting //filter paths// option.   * **Selected features**: You can download the filtered paths that best differentiate your phenotype. This section is only available when selecting //filter paths// option.
 +===== Workflow =====
 +The prediction tool is based on the machine learning module of the Hipathia web tool, which can be summarized as follows:
 +  * Expected input and output:
 +    * Input features: hipathia circuit values.
 +    * Input response: 1-D binary array with the same number of samples as the input features.
 +    * Output 1: CV performance metrics
 +    * Output 2: The selected features with their respective interaction sign, sorted by their relevance.
 +      * A positive sign indicates that a given feature pushes the prediction towards the positive class.
 +    * Output 3: Statistics, ROC and PR curves for a typical train-test split scenario.
 +    * Output 4: Probability boxplots for the test set.
 +  * Feature selection:
 +    * We select the features that best discriminate between the response values by means of the LASSO [4] (using the ''glmnet'' R package, which implements a fast coordinate descent version of the LASSO [5]).
 +    * We filter the feature space using those circuits selected in the previous step.
 +  * Hyperparameter search (cost ''C'' and γ) for a non-linear SVM [6] with a radial basis function (RBF) kernel:
 +    * **γ**: determines the complexity of the SVM decision boundary; larger values allow a more flexible boundary.
 +    * **cost** (''C''): controls the //margin// around the decision boundary; a larger ''C'' penalizes margin violations more heavily, which narrows the margin.
 +    * **method**: both γ and ''C'' are chosen by k-fold cross-validation:
 +      * For each candidate pair (γ, ''C'') we train an SVM on the training folds.
 +      * We compute the mean misclassification error over the held-out folds.
 +      * We select the pair (γ, ''C'') with the lowest mean CV error.
 +    * From this point on, the features selected by the LASSO and the hyperparameters found above are kept fixed.
 +    * The SVM is trained with ''LIBSVM'' [2], a widely used library for SVM-based models, through the R interface provided by the ''e1071'' package [1].
 +  * Performance evaluation: ​
 +    * We perform a k-fold cross-validation with the features and hyperparameters selected above in order to report the generalization capabilities of the method. ​
 +      * The report contains a set of commonly used metrics for classification.
 +    * We perform a train-test split analysis:
 +      * We randomly select 30% of the samples as the test set.
 +      * We train an SVM on the train set using the hyperparameters and features previously found.
 +      * We provide summary statistics as in the case of the k-fold cross-validation.
 +      * We plot the ROC and Precision-Recall (PR) curves along with the area under the curve. ​
 +      * All curves are plotted with the specialized R package ''PRROC'' [3].
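
The hyperparameter search described above can be sketched in plain Python. This is only an illustrative sketch, not the Hipathia implementation (which uses R, ''e1071'' and ''LIBSVM''). The names ''k_fold_indices'', ''cv_error'', ''grid_search'' and ''rbf_vote_fit'' are hypothetical, the toy data is synthetic, and ''rbf_vote_fit'' is a simple stand-in classifier that is **not** an SVM: it uses γ but ignores the cost.

```python
import math
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rbf_vote_fit(train_x, train_y, gamma, cost):
    """Stand-in classifier (NOT an SVM): RBF-weighted vote of the training samples.

    `cost` is accepted only to match the (gamma, cost) interface; it is ignored here.
    """
    def predict(x):
        score = 0.0
        for xi, yi in zip(train_x, train_y):
            dist2 = sum((a - b) ** 2 for a, b in zip(x, xi))
            weight = math.exp(-gamma * dist2)
            score += weight if yi == 1 else -weight
        return 1 if score >= 0 else 0
    return predict


def cv_error(x, y, gamma, cost, fit, folds):
    """Mean misclassification error over the held-out folds."""
    errors = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(x)) if i not in held]
        predict = fit([x[i] for i in train_idx],
                      [y[i] for i in train_idx], gamma, cost)
        wrong = sum(predict(x[i]) != y[i] for i in held_out)
        errors.append(wrong / len(held_out))
    return sum(errors) / len(errors)


def grid_search(x, y, gammas, costs, fit, k=5, seed=0):
    """Evaluate every (gamma, cost) pair and return the one with the lowest CV error."""
    folds = k_fold_indices(len(x), k, seed)
    results = {(g, c): cv_error(x, y, g, c, fit, folds)
               for g in gammas for c in costs}
    best = min(results, key=results.get)
    return best, results


# Synthetic toy data: two Gaussian clusters standing in for circuit values.
rng = random.Random(1)
x = ([[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(20)]
     + [[rng.gauss(3, 0.3), rng.gauss(3, 0.3)] for _ in range(20)])
y = [0] * 20 + [1] * 20

best_pair, cv_results = grid_search(
    x, y, gammas=[0.01, 0.1, 1.0], costs=[0.1, 1.0, 10.0], fit=rbf_vote_fit)
print("best (gamma, cost):", best_pair, "mean CV error:", cv_results[best_pair])
```

Because ''fit'' is passed in as a function, a wrapper around a real SVM trainer could replace the stand-in without changing the cross-validation loop.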
 +
 +
 +=== Breast Cancer Molecular Subtype Classification ===
 +
 +The experiment consists of classifying a given sample as Luminal A or Luminal B (molecular subtype). We use TCGA data (processed by Inma); no pathways were filtered by hand.
 +
 +**Model Analysis**
 +
 +Hyperparameter search:
 +
 +{{ :​svm.performance.heatmap.png?​direct&​400 | Hyperparameter search heatmap}}
 +
 +CV Performance:​
 +{{ :​model_stats.tsv| CV stats }}
 +
 +The most relevant features along with their interaction sign:
 +
 +|features ​                                                         |coefs |
 +|:​-----------------------------------------------------------------|:​-----|
 +|p53 signaling pathway: CDK1 CCNB3                                 ​|+ ​    |
 +|p53 signaling pathway: CDK2 CCNE1                                 ​|+ ​    |
 +|p53 signaling pathway: SERPINB5 ​                                  ​|- ​    |
 +|Oocyte meiosis: REC8*                                             ​|+ ​    |
 +|Neurotrophin signaling pathway: NFKB1                             ​|+ ​    |
 +|Neurotrophin signaling pathway: RHOA                              |+     |
 +|Amphetamine addiction: ARC                                        |-     |
 +|RIG-I-like receptor signaling pathway: CHUK IKBKB IKBKG           ​|+ ​    |
 +|Ras signaling pathway: RAP1A                                      |-     |
 +|Progesterone-mediated oocyte maturation: CDK1                     ​|+ ​    |
 +|Vascular smooth muscle contraction:​ ACTA2                         ​|- ​    |
 +|Complement and coagulation cascades: BDKRB1 ​                      ​|+ ​    |
 +|Fanconi anemia pathway: RAD51C ​                                   |+     |
 +|TGF-beta signaling pathway: ROCK1                                 ​|- ​    |
 +|ErbB signaling pathway: ELK1*                                     ​|- ​    |
 +|HTLV-I infection: TP53 TBPL2                                      |+     |
 +|Platelet activation: ITPR1                                        |-     |
 +|Pathways in cancer: E2F1                                          |+     |
 +|PI3K-Akt signaling pathway: BCL2                                  |-     |
 +|Maturity onset diabetes of the young: NKX6-1 ​                     |+     |
 +|Signaling pathways regulating pluripotency of stem cells: MAPK1* ​ |-     |
 +|Rap1 signaling pathway: THBS1                                     ​|+ ​    |
 +|HTLV-I infection: E2F1                                            |+     |
 +|Jak-STAT signaling pathway: CDKN1A ​                               |+     |
 +|Epstein-Barr virus infection: RB1                                 ​|- ​    |
 +|Colorectal cancer: BIRC5                                          |+     |
 +|Signaling pathways regulating pluripotency of stem cells: MYC     ​|- ​    |
 +|Taste transduction:​ C00076* ​                                      ​|- ​    |
 +|Hepatitis B: JUN                                                  |-     |
 +|Axon guidance: GSK3B                                              |-     |
 +|MAPK signaling pathway: MAPT                                      |-     |
 +|cAMP signaling pathway: HHIP                                      |-     |
 +|Fanconi anemia pathway: BRCA1                                     ​|+ ​    |
 +|Pathways in cancer: CSF3R                                         ​|+ ​    |
 +|Cell cycle: CDC45 MCM7 MCM6 MCM5 MCM4 MCM3 MCM2                   ​|+ ​    |
 +|ErbB signaling pathway: CDKN1A ​                                   |+     |
 +|HTLV-I infection: PTTG2                                           ​|+ ​    |
 +|AMPK signaling pathway: CCNA2                                     ​|+ ​    |
 +|Oocyte meiosis: CDC25C* ​                                          ​|+ ​    |
 +|Non-alcoholic fatty liver disease (NAFLD): BAX                    |+     |
 +|Hepatitis B: PCNA                                                 ​|+ ​    |
 +|AMPK signaling pathway: G6PC                                      |-     |
 +|Adrenergic signaling in cardiomyocytes:​ BCL2                      |-     |
 +|HTLV-I infection: ANAPC10 CDC20                                   ​|- ​    |
 +|Progesterone-mediated oocyte maturation: CDK1*                    |+     |
 +|Complement and coagulation cascades: C4A                          |-     |
 +|Choline metabolism in cancer: WAS                                 ​|+ ​    |
 +|ErbB signaling pathway: STAT5A ​                                   |-     |
 +|Herpes simplex infection: FOS                                     ​|- ​    |
 +|Amyotrophic lateral sclerosis (ALS): DERL1                        |+     |
 +|AGE-RAGE signaling pathway in diabetic complications:​ F3          |-     |
 +|Non-alcoholic fatty liver disease (NAFLD): PKLR                   ​|+ ​    |
 +|Maturity onset diabetes of the young: FOXA3                       ​|- ​    |
 +|AMPK signaling pathway: CPT1C                                     ​|+ ​    |
 +|PPAR signaling pathway: FADS2                                     ​|+ ​    |
 +|Rap1 signaling pathway: C00076* ​                                  ​|+ ​    |
 +|cAMP signaling pathway: GRIN3A ​                                   |+     |
 +|Glutamatergic synapse: CACNA1A ​                                   |-     |
 +|Progesterone-mediated oocyte maturation: MAPK14 ​                  ​|- ​    |
 +|Salivary secretion: BEST2                                         ​|+ ​    |
 +|Vibrio cholerae infection: PDIA4                                  |+     |
 +|cAMP signaling pathway: PLN                                       ​|+ ​    |
 +|Neurotrophin signaling pathway: JUN                               ​|- ​    |
 +|Pathways in cancer: CCNA1                                         ​|- ​    |
 +|Epithelial cell signaling in Helicobacter pylori infection: GIT1  |-     |
 +|Renal cell carcinoma: TGFA                                        |-     |
 +|Influenza A: RNASEL ​                                              ​|+ ​    |
 +|Thyroid hormone signaling pathway: TP53*                          |+     |
 +|Epithelial cell signaling in Helicobacter pylori infection: CXCL1 |-     |
 +|Signaling pathways regulating pluripotency of stem cells: MAPK14 ​ |+     |
 +|Hepatitis C: EIF2S1 ​                                              ​|+ ​    |
 +|Proteoglycans in cancer: CTNNB1 ​                                  ​|+ ​    |
 +|Influenza A: STAT1 IRF9                                           ​|+ ​    |
 +|Thyroid hormone signaling pathway: CTNNB1 ​                        ​|- ​    |
 +|Taste transduction:​ PKD1L3 PKD2L1 ​                                ​|+ ​    |
 +|Prostate cancer: C16038 ​                                          ​|- ​    |
 +|Basal cell carcinoma: PTCH1* ​                                     |-     |
 +|Toxoplasmosis:​ C06314 ​                                            ​|- ​    |
 +|Prostate cancer: BCL2                                             ​|- ​    |
 +|Measles: EIF2S1 ​                                                  ​|+ ​    |
 +|Acute myeloid leukemia: CCNA1                                     ​|- ​    |
 +|Glucagon signaling pathway: CPT1C* ​                               |+     |
 +
 +
 +**Split Analysis**
 +
 +Split Performance:​
 +{{ :​test_model_stats.tsv| Test stats }}
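
As a rough illustration of how split statistics of this kind can be obtained, the sketch below draws a random 30% test set and computes a few common binary classification metrics. The helper names ''train_test_split'' and ''binary_metrics'' are hypothetical, and the choice of metrics (accuracy, precision, recall, F1) is an assumption: the page does not list which metrics the downloadable file contains.

```python
import random


def train_test_split(n_samples, test_fraction=0.3, seed=0):
    """Return (train_indices, test_indices) for a random split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


train_idx, test_idx = train_test_split(10)

# Made-up true labels and predictions, only to exercise the function.
metrics = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(metrics)
```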
 +
 +PR curve over the test:
 +
 +{{ :​split_test_pr.png?​400 | Precision-recall (PR) curve for the test split. }}
 +
 +ROC curve over the test set:
 +
 +{{ :​split_test_roc.png?​400 | ROC curve for the test split. }}
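
For readers who want to see what the area under the ROC curve measures, here is a minimal plain-Python sketch. It is not how the figures on this page were produced (those come from the R package ''PRROC''). The function names ''roc_points'' and ''trapezoid_area'' are hypothetical, and the labels and scores are made up.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only when the next score differs, so tied scores
        # are treated as a single threshold.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up test-set labels (1 = positive class) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
auc = trapezoid_area(points)
print(round(auc, 3))  # 0.889, i.e. 8 of the 9 positive/negative pairs are ranked correctly
```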
 +
 +Probability for the test set:
 +
 +{{ :test_probability_boxplot.png?400 | Probability boxplot for the test set. }}
 +
 +
  
  
prediction.1580920464.txt.gz · Last modified: 2020/04/03 20:17 (external edit)