--- worked_example_prediction [2021/01/13 17:11] – Move discussion entry from train to this page. cloucera
+++ worked_example_prediction [2021/01/30 16:49] (current) – removed krian
@@ Line 1: / Line 1: @@
-====== Worked example Prediction  ======
-===== Test inputs =====
-**1.** Log in to HiPathia. For further information on this step, visit [[logging_in|Logging in]].
-**2.** We will test the model with another breast cancer dataset (Luminal A vs. Luminal B) from The Cancer Genome Atlas repository.
-You can download the expression matrix of the test data from the following link:
-  * Test expression matrix: [[http://hipathia.babelomics.org/data/brca_sub_class_exp_test.txt|brca_sub_class_exp_test.txt]]
-**3.** Upload the data to HiPathia in the data panel by clicking on //My data//. For further information on this step, visit [[upload_your_data|Upload your data]].
-**4.** Click the //Prediction// button.
-{{ :hipathia_bar_pred.png?600 |}}
-**5.** In the //Type// panel, select //Test existing predictor//. A window with all the existing models will appear. Select the model you want to use. We will use the model we have created in [[worked_example_prediction_-_train|Worked example Prediction - Train]]. The model information will appear on the right panel.
-{{ :hipathia_work5.png?600 |}}
-**6.** In the //Input data// panel, select //Expression matrix//. Press the //File browser// button of the //Expression matrix file// field and select the desired file, //brca_genes_vals_bn_test.txt//.
-{{ :hipathia_work6.png?600 |}}
-**7.** In the //Job information// panel, press the //File browser// button and select the desired output folder. In this case, we will use //analysis_BRCA//. Give a name to the study, for example, "BRCA test model".
-{{ :hipathia_work7.png?600 |}}
-**8.** Press the //Run analysis// button. A study will be created and listed in the studies panel. You can access this panel by clicking on the //My studies// button.
-===== Test report =====
-As noted above, this experiment tests the trained model [[worked_example_prediction_-_train|(from this example)]] on another split of the data (Luminal A or Luminal B).
-{{ ::testreport.png?nolink |}}
-==== Study Information ====
-Here you can find the information about the selected study.
-  * **Name**: the study name.
-  * **Description**: the description of the current study.
-  * **Tool**: the name of the tool used (in this case, HiPathia).
-  * **Date**: the date the study was launched (MM/DD/YYYY, HH:MM:SS AM/PM format).
-==== Input Parameters ====
-Here you can find the parameters with which the current study was launched.
-{{ ::inputpredictreport.png?nolink |}}
-  * **Expression file**: The name of the expression file that has been used in the current study.
-  * **Species**: The species of this experiment: Human (Homo sapiens), Mouse (Mus musculus) or Rat (Rattus norvegicus).
-==== Circuit values ====
-You can download the matrix of circuit activity values by clicking on //Circuit values//.
-For each "effector circuit", this matrix file gives the activation level computed by the HiPathia method for each sample.
-==== Prediction model ====
-This is the most important result of our predictor: a matrix with three columns:
-  * Sample name: each of the 125 samples in the expression matrix file used.
-  * Prediction: the predicted group, LumB (Luminal B) or LumA (Luminal A).
-  * Probability LumB: the predicted probability that the sample is LumB. A value of 1 means the predictor is 100% sure that the sample is LumB.
-You can download the matrix of predicted experimental design by clicking on //Prediction results//.
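As a rough illustration of how the three-column prediction matrix described above might be inspected once downloaded, here is a minimal Python sketch. The tab separator, the column names (`sample`, `prediction`, `prob_LumB`), the toy rows, and the 0.1 margin are all assumptions made for this example; the actual file layout produced by HiPathia may differ.

```python
import csv
import io

# Toy stand-in for the downloaded prediction results.
# Column names, separator and values are hypothetical, not taken from HiPathia.
TOY_RESULTS = (
    "sample\tprediction\tprob_LumB\n"
    "S1\tLumA\t0.03\n"
    "S2\tLumB\t0.91\n"
    "S3\tLumA\t0.46\n"
)


def read_predictions(text):
    """Parse a tab-separated prediction table into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "sample": row["sample"],
            "prediction": row["prediction"],
            "prob_LumB": float(row["prob_LumB"]),
        }
        for row in reader
    ]


def count_by_class(rows):
    """Count how many samples were assigned to each predicted group."""
    counts = {}
    for row in rows:
        counts[row["prediction"]] = counts.get(row["prediction"], 0) + 1
    return counts


def uncertain_samples(rows, margin=0.1):
    """Return samples whose LumB probability lies within `margin` of 0.5."""
    return [r["sample"] for r in rows if abs(r["prob_LumB"] - 0.5) <= margin]


rows = read_predictions(TOY_RESULTS)
print(count_by_class(rows))
print(uncertain_samples(rows))
```

Samples with a probability close to 0.5 are the ones where the predictor is least certain, so they may deserve a closer look before the predicted label is trusted.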
-===== Prediction evaluation =====
-==== Confusion Matrix and Statistics ====
-^              ^^              Reference              ^^
-^              ^             ^    LumA    ^    LumB    ^
-^    Prediction    ^    LumA    |    95    |    5    |
-^    :::    ^    LumB    |    9    |    16    |
-^              Accuracy              |||    0.888    |
-^              95% CI              |||    (0.8192, 0.9374)    |
-^              No Information Rate              |||    0.832    |
-^              P-Value [Acc > NIR]              |||    0.0547    |
-^              Kappa              |||    0.6277    |
-^              McNemar's Test              ^^^^
-^              P-Value              |||    0.4227    |
-^              Sensitivity              |||    0.9135    |
-^              Specificity              |||    0.7619    |
-^              Pos Pred Value              |||    0.9500    |
-^              Neg Pred Value              |||    0.6400    |
-^              Prevalence              |||    0.8320    |
-^              Detection Rate              |||    0.7600    |
-^              Detection Prevalence              |||    0.8000    |
-^              Balanced Accuracy              |||    0.8377    |
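The statistics in the table above can be recomputed from the four confusion-matrix counts alone. The sketch below does so with the Python standard library. It assumes LumA is the positive class (this matches the reported sensitivity of 95/104), that the McNemar p-value uses the continuity-corrected chi-square statistic, and that the "P-Value [Acc > NIR]" row is a one-sided exact binomial test. These are assumptions about how the report was produced, not something the page states.

```python
import math

# Counts from the confusion matrix (rows: prediction, columns: reference).
# Assumption: LumA is treated as the positive class.
tp, fp = 95, 5   # predicted LumA: reference LumA, reference LumB
fn, tn = 9, 16   # predicted LumB: reference LumA, reference LumB
n = tp + fp + fn + tn

accuracy = (tp + tn) / n
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)                      # Pos Pred Value
npv = tn / (tn + fn)                      # Neg Pred Value
prevalence = (tp + fn) / n
detection_rate = tp / n
detection_prevalence = (tp + fp) / n
balanced_accuracy = (sensitivity + specificity) / 2

# Cohen's kappa: agreement beyond what the marginal totals alone would give.
expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
kappa = (accuracy - expected) / (1 - expected)

# McNemar's test with continuity correction (chi-square, 1 degree of freedom).
# For 1 degree of freedom the survival function is erfc(sqrt(x / 2)).
chi2 = (abs(fp - fn) - 1) ** 2 / (fp + fn)
mcnemar_p = math.erfc(math.sqrt(chi2 / 2))

# No Information Rate: accuracy of always predicting the majority class.
nir = max(prevalence, 1 - prevalence)

# One-sided exact binomial test of accuracy against the NIR.
correct = tp + tn
p_acc_gt_nir = sum(
    math.comb(n, k) * nir**k * (1 - nir) ** (n - k)
    for k in range(correct, n + 1)
)

for name, value in [
    ("Accuracy", accuracy),
    ("No Information Rate", nir),
    ("P-Value [Acc > NIR]", p_acc_gt_nir),
    ("Kappa", kappa),
    ("McNemar's Test P-Value", mcnemar_p),
    ("Sensitivity", sensitivity),
    ("Specificity", specificity),
    ("Pos Pred Value", ppv),
    ("Neg Pred Value", npv),
    ("Prevalence", prevalence),
    ("Detection Rate", detection_rate),
    ("Detection Prevalence", detection_prevalence),
    ("Balanced Accuracy", balanced_accuracy),
]:
    print(f"{name}: {value:.4f}")
```

The gap between sensitivity and specificity shows why balanced accuracy is reported alongside plain accuracy: with 104 of the 125 test samples being LumA, a predictor that always answered LumA would already reach the No Information Rate of 0.832.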
-===== Discussion =====
-In this section we provide a brief discussion of the machine learning example, which spans [[worked_example_prediction_-_train|Worked example Prediction - Train]] and [[worked_example_prediction|Worked example Prediction]].
-In this example we introduce a machine learning workflow: a binary classification estimator combined with feature selection and normalization, built on top of the signaling circuit activation values computed by the HiPathia mechanistic model. The workflow has two main advantages over using the full gene set. First, the dimensionality of the feature space is greatly reduced (thus filtering noise): there are about ten times fewer circuits than genes when using the full gene set, and about three times fewer when restricting the gene set to the signaling subset. Second, the machine learning model is more explainable, partly because it uses a smaller set of features, which in turn are easier to interpret thanks to the functional characterization of the circuits.
-Our proposed experiment, which differentiates between luminal breast cancer molecular subtypes, shows that our methodology is well suited to this particular task, as can be inferred from the performance metrics computed on a fully independent set of samples and on the CV splits. Furthermore, these results (an accuracy of 0.888 on the independent test set) are obtained using a small subset of the circuits, which reinforces the model's explainability.
-===== Related papers =====
-  * https://pubmed.ncbi.nlm.nih.gov/24521998/
-  * https://www.nature.com/articles/nature11412
-  * https://www.sciencedirect.com/science/article/pii/S096097761930596X
-  * https://www.annalsofoncology.org/article/S0923-7534(19)37122-4/fulltext