worked_example_prediction_-_train
Differences
This shows you the differences between two versions of the page.
worked_example_prediction_-_train [2021/01/30 15:31] – [Training] krian | worked_example_prediction_-_train [2021/01/30 16:16] (current) – removed krian | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Worked example Prediction ====== | ||
- | ===== Training ===== | ||
- | |||
- | **1.** Log in to HiPathia. For further information on this step, visit [[logging_in|logging in]]. | ||
- | |||
- | **2.** Collect the data. We will work with a breast cancer dataset from The Cancer Genome Atlas (TCGA) repository [[https:// | ||
- | More information on the proposed dataset is available here: | ||
- | * https:// | ||
- | * https:// | ||
- | |||
- | Before use in HiPathia, the dataset must be normalized. We recommend using the [[https:// | ||
- | |||
- | We have selected samples of breast cancer tumors from the dataset, annotated as luminal A and luminal B (the molecular annotations come from [[https:// | ||
- | |||
- | The expression matrix and the experimental design can be downloaded from these links: | ||
- | |||
- | * Expression matrix: [[http:// | ||
- | * Experimental design: | ||
- | |||
- | **3.** Click // | ||
- | {{ : | ||
- | |||
- | **4.** Upload the normalized data to HiPathia by clicking //My data// in the data panel, or click the //Run training example// button. For further information on this step, visit [[upload_your_data|Upload your data]]. | ||
- | {{ :: | ||
- | **5.** In the //Type// panel select //Train new predictor// | ||
- | |||
- | **6.** In the //Input data// panel, select // | ||
- | |||
- | {{ : | ||
- | |||
- | **7.** In the //Design data// panel, select //Class predictor// | ||
- | |||
- | {{ : | ||
- | |||
- | **8.** Select Human (Homo sapiens) as species (default). | ||
- | |||
- | |||
- | **9.** In the // | ||
- | |||
- | **10.** In the //Study information// | ||
- | |||
- | {{ : | ||
- | |||
- | **11.** Click the //Run analysis// button. A study will be created and listed in the studies panel. You can access this panel by clicking the //My studies// button. | ||
- | |||
- | ===== Training report ===== | ||
- | The report page of the Prediction tool includes several outputs. You can download any table or image shown on the results page by clicking the name shown just before it. You can also download the pathway and function matrices by clicking //Circuit values//. | ||
- | |||
- | The results are divided into the following panels: | ||
- | ==== Study Information ==== | ||
- | Here you can find the information about the selected study. | ||
- | {{ :: | ||
- | * **Name**: the study name. | ||
- | * **Description**: | ||
- | * **Tool**: the name of the tool used (in this case, Hipathia). | ||
- | * **Date**: study launching date (MM/ | ||
- | ==== Input Parameters ==== | ||
- | Here you can view the parameters with which the current study was launched. | ||
- | {{ :: | ||
- | * **Expression file**: The name of the expression file that has been used in the current study. | ||
- | * **Design file**: The name of the design file that has been used in the current study. | ||
- | * **Comparison**: | ||
- | * **Paired analysis**: Whether the input data are paired: **No** or **Yes**. | ||
- | * **Species**: | ||
- | ==== Circuit values ==== | ||
- | You can download the matrix of circuit activity values by clicking //Circuit values//. | ||
- | {{ :: | ||
- | This matrix file indicates for each " | ||
- | ==== Model training ==== | ||
- | Here you can visualize the results from the prediction analysis. | ||
- | * **K-fold cross-validation**: | ||
- | {{ :: | ||
- | * **SVM-RBF hyperparameter performance** | ||
- | {{ :: | ||
- | * **Test model statistics**: | ||
- | {{ :: | ||
- | * **Split Train pr**: | ||
- | {{ :: | ||
- | * **Split Train roc**: | ||
- | {{ :: | ||
- | * **Split Test pr**: | ||
- | {{ :: | ||
- | * **Split Test roc**: | ||
- | {{ :: | ||
- | ==== Prediction model ==== | ||
- | {{ :: | ||
- | |||
- | ==== Model statistics ==== | ||
- | You can download the model statistics. | ||
- | * **Selected features**: You can download the filtered paths that best differentiate your phenotype. This section is only available when the //filter paths// option is selected. | ||
- | ===== Workflow ===== | ||
- | The prediction tool is based on a machine learning module of the Hipathia web tool, which can be summarized as follows: | ||
- | * Expected input and output: | ||
- | * Input features: Hipathia circuit values. | ||
- | * Input response: 1-D binary array with the same number of samples as the input features. | ||
- | * Output 1: CV performance metrics | ||
- | * Output 2: The selected features with their respective interaction sign, sorted by their relevance. | ||
- | * A positive sign indicates that a given feature pushes the prediction towards the positive class. | ||
- | * Output 3: Statistics, plus ROC and PR curves, for a typical train-test split scenario. | ||
- | * Output 4: Probability boxplots for the test set. | ||
- | * Feature selection: | ||
- | * We select the features that best discriminate between the response values by means of the LASSO [4] (using the '' | ||
- | * We filter the feature space using those circuits selected in the previous step. | ||
- | * Hyperparameter search ('' | ||
- | * **γ**: determines the complexity of the SVM decision boundary (frontier). | ||
- | * **cost**: controls the trade-off between a wide //margin// around the frontier established by the SVM and the penalty for samples that violate it. | ||
- | * **method**: both γ and cost (the //margin// parameter) are selected using a k-fold cross-validation procedure: | ||
- | * for each selection of γ and '' | ||
- | * we compute the mean of the misclassification error over all the folds in the test split | ||
- | * we select the best pair of hyperparameters (γ, '' | ||
- | * From now onwards we fix the features selected by the LASSO and the hyperparameters previously found. | ||
- | * The '' | ||
- | * Performance evaluation: | ||
- | * We perform a k-fold cross-validation with the features and hyperparameters selected above in order to report the generalization capabilities of the method. | ||
- | * The report contains a set of commonly used metrics for classification. | ||
- | * We perform a train-test split analysis: | ||
- | * We randomly select 30% of the samples as the test set. | ||
- | * We train an SVM on the train set using the hyperparameters and features previously found. | ||
- | * We provide summary statistics as in the case of the k-fold cross-validation. | ||
- | * We plot the ROC and Precision-Recall (PR) curves along with the area under the curve. | ||
- | * Note that all curve visualizations have been done using the specialized R package '' | ||
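The hyperparameter search step of the workflow (try each candidate value, average the misclassification error over the held-out part of every fold, keep the candidate with the lowest mean error) can be sketched as follows. This is only an illustration in Python under stated assumptions, not the code behind the web tool: all function names are hypothetical, the toy data are invented, and the classifier is a simple RBF-weighted vote standing in for the SVM, so only γ is searched here. The same loop would apply to a grid of (γ, cost) pairs.

```python
import math
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rbf_vote_predict(train_X, train_y, test_X, gamma):
    """Stand-in classifier (NOT an SVM): RBF-weighted vote of training labels."""
    predictions = []
    for x in test_X:
        score = 0.0
        for xi, yi in zip(train_X, train_y):
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
            weight = math.exp(-gamma * sq_dist)
            score += weight if yi == 1 else -weight
        predictions.append(1 if score > 0 else 0)
    return predictions


def cv_error(X, y, k, fit_predict, **params):
    """Mean misclassification error over the held-out part of each fold."""
    errors = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in held_out], **params)
        wrong = sum(p != y[i] for p, i in zip(preds, held_out))
        errors.append(wrong / len(held_out))
    return sum(errors) / len(errors)


def select_hyperparameters(X, y, grid, k, fit_predict):
    """Return the grid entry with the lowest mean CV error, plus all scores."""
    scores = [(cv_error(X, y, k, fit_predict, **p), p) for p in grid]
    best_error, best_params = min(scores, key=lambda s: s[0])
    return best_params, best_error, scores


# Invented toy data: two well-separated groups of 2-D points.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (-0.1, 0.1), (0.2, -0.2),
     (3.0, 3.0), (3.2, 3.1), (3.1, 2.8), (2.9, 3.2), (3.3, 2.9), (2.8, 3.0)]
y = [0] * 6 + [1] * 6
grid = [{"gamma": 0.01}, {"gamma": 1.0}, {"gamma": 100.0}]

best_params, best_error, scores = select_hyperparameters(
    X, y, grid, k=3, fit_predict=rbf_vote_predict)
print("selected:", best_params, "mean CV error:", best_error)
```

Passing the classifier in as `fit_predict` keeps the fold logic independent of the model, so a real SVM wrapper could be substituted without touching the selection loop.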
- | |||
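The train-test split evaluation described in the workflow (hold out 30% of the samples, then summarize the test predictions with ROC and precision-recall statistics) can be illustrated with the short Python sketch below. It is a hedged example, not the tool's implementation: the function names, labels and scores are invented for illustration, and the AUC is computed with the plain pairwise-ranking definition, which may differ from the estimators used by the R package cited in the bibliography.

```python
import random


def train_test_split(n_samples, test_fraction=0.3, seed=0):
    """Randomly assign a fraction of the sample indices to the test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def roc_auc(labels, scores):
    """ROC AUC as the chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold):
    """Precision and recall when scores >= threshold are called positive."""
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented test-set labels and predicted probabilities of the positive class.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

train_idx, test_idx = train_test_split(10)
print("train size:", len(train_idx), "test size:", len(test_idx))
print("ROC AUC:", roc_auc(example_labels, example_scores))
print("precision, recall at 0.5:",
      precision_recall(example_labels, example_scores, 0.5))
```

Sweeping `threshold` over the observed scores and plotting the resulting (recall, precision) pairs gives the points of a PR curve; the ROC curve is obtained the same way from true and false positive rates.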
- | === Breast Cancer Molecular Subtype Classification === | ||
- | |||
- | The experiment consists of classifying a given sample as Luminal A or Luminal B (molecular subtype). We use TCGA data; no pathway filtering was done by hand. | ||
- | |||
- | **Model Analysis** | ||
- | |||
- | Hyperparameter search: | ||
- | |||
- | {{ : | ||
- | |||
- | CV Performance: | ||
- | {{ : | ||
- | |||
- | The most relevant features along with their interaction sign: | ||
- | |||
- | |**features** | ||
- | |p53 signaling pathway: CDK1 CCNB3 | ||
- | |p53 signaling pathway: CDK2 CCNE1 | ||
- | |p53 signaling pathway: SERPINB5 | ||
- | |Oocyte meiosis: REC8* | ||
- | |Neurotrophin signaling pathway: NFKB1 | ||
- | |Neurotrophin signaling pathway: RHOA |+ | | ||
- | |Amphetamine addiction: ARC |- | | ||
- | |RIG-I-like receptor signaling pathway: CHUK IKBKB IKBKG | ||
- | |Ras signaling pathway: RAP1A |- | | ||
- | |Progesterone-mediated oocyte maturation: CDK1 | ||
- | |Vascular smooth muscle contraction: | ||
- | |Complement and coagulation cascades: BDKRB1 | ||
- | |Fanconi anemia pathway: RAD51C | ||
- | |TGF-beta signaling pathway: ROCK1 | ||
- | |ErbB signaling pathway: ELK1* | ||
- | |HTLV-I infection: TP53 TBPL2 |+ | | ||
- | |Platelet activation: ITPR1 |- | | ||
- | |Pathways in cancer: E2F1 |+ | | ||
- | |PI3K-Akt signaling pathway: BCL2 |- | | ||
- | |Maturity onset diabetes of the young: NKX6-1 | ||
- | |Signaling pathways regulating pluripotency of stem cells: MAPK1* | ||
- | |Rap1 signaling pathway: THBS1 | ||
- | |HTLV-I infection: E2F1 |+ | | ||
- | |Jak-STAT signaling pathway: CDKN1A | ||
- | |Epstein-Barr virus infection: RB1 | ||
- | |Colorectal cancer: BIRC5 |+ | | ||
- | |Signaling pathways regulating pluripotency of stem cells: MYC | ||
- | |Taste transduction: | ||
- | |Hepatitis B: JUN |- | | ||
- | |Axon guidance: GSK3B |- | | ||
- | |MAPK signaling pathway: MAPT |- | | ||
- | |cAMP signaling pathway: HHIP |- | | ||
- | |Fanconi anemia pathway: BRCA1 | ||
- | |Pathways in cancer: CSF3R | ||
- | |Cell cycle: CDC45 MCM7 MCM6 MCM5 MCM4 MCM3 MCM2 | ||
- | |ErbB signaling pathway: CDKN1A | ||
- | |HTLV-I infection: PTTG2 | ||
- | |AMPK signaling pathway: CCNA2 | ||
- | |Oocyte meiosis: CDC25C* | ||
- | |Non-alcoholic fatty liver disease (NAFLD): BAX |+ | | ||
- | |Hepatitis B: PCNA | ||
- | |AMPK signaling pathway: G6PC |- | | ||
- | |Adrenergic signaling in cardiomyocytes: | ||
- | |HTLV-I infection: ANAPC10 CDC20 | ||
- | |Progesterone-mediated oocyte maturation: CDK1* |+ | | ||
- | |Complement and coagulation cascades: C4A |- | | ||
- | |Choline metabolism in cancer: WAS | ||
- | |ErbB signaling pathway: STAT5A | ||
- | |Herpes simplex infection: FOS | ||
- | |Amyotrophic lateral sclerosis (ALS): DERL1 |+ | | ||
- | |AGE-RAGE signaling pathway in diabetic complications: | ||
- | |Non-alcoholic fatty liver disease (NAFLD): PKLR | ||
- | |Maturity onset diabetes of the young: FOXA3 | ||
- | |AMPK signaling pathway: CPT1C | ||
- | |PPAR signaling pathway: FADS2 | ||
- | |Rap1 signaling pathway: C00076* | ||
- | |cAMP signaling pathway: GRIN3A | ||
- | |Glutamatergic synapse: CACNA1A | ||
- | |Progesterone-mediated oocyte maturation: MAPK14 | ||
- | |Salivary secretion: BEST2 | ||
- | |Vibrio cholerae infection: PDIA4 |+ | | ||
- | |cAMP signaling pathway: PLN | ||
- | |Neurotrophin signaling pathway: JUN | ||
- | |Pathways in cancer: CCNA1 | ||
- | |Epithelial cell signaling in Helicobacter pylori infection: GIT1 |- | | ||
- | |Renal cell carcinoma: TGFA |- | | ||
- | |Influenza A: RNASEL | ||
- | |Thyroid hormone signaling pathway: TP53* |+ | | ||
- | |Epithelial cell signaling in Helicobacter pylori infection: CXCL1 |- | | ||
- | |Signaling pathways regulating pluripotency of stem cells: MAPK14 | ||
- | |Hepatitis C: EIF2S1 | ||
- | |Proteoglycans in cancer: CTNNB1 | ||
- | |Influenza A: STAT1 IRF9 | ||
- | |Thyroid hormone signaling pathway: CTNNB1 | ||
- | |Taste transduction: | ||
- | |Prostate cancer: C16038 | ||
- | |Basal cell carcinoma: PTCH1* | ||
- | |Toxoplasmosis: | ||
- | |Prostate cancer: BCL2 | ||
- | |Measles: EIF2S1 | ||
- | |Acute myeloid leukemia: CCNA1 | ||
- | |Glucagon signaling pathway: CPT1C* | ||
- | |||
- | |||
- | **Split Analysis** | ||
- | |||
- | Split Performance: | ||
- | {{ : | ||
- | |||
- | PR curve over the test set: | ||
- | |||
- | {{ : | ||
- | |||
- | ROC curve over the test set: | ||
- | |||
- | {{ : | ||
- | |||
- | Probability for the test set: | ||
- | |||
- | {{ : | ||
- | |||
- | |||
- | ===== Bibliography ===== | ||
- | |||
- | [1] D. Meyer, E. Dimitriadou, | ||
- | |||
- | [2] C.-C. Chang and C.-J. Lin, “LIBSVM: A Library for Support Vector Machines, | ||
- | |||
- | [3] J. Grau, I. Grosse, and J. Keilwagen, “PRROC: computing and visualizing precision-recall and receiver operating characteristic curves in R,” Bioinformatics, | ||
- | |||
- | [4] R. Tibshirani, “Regression Shrinkage and Selection via the Lasso,” Journal of the Royal Statistical Society. Series B (Methodological), | ||
- | |||
- | [5] J. Friedman, T. Hastie, and R. Tibshirani, “Regularization Paths for Generalized Linear Models via Coordinate Descent,” J Stat Softw, vol. 33, no. 1, pp. 1–22, 2010, doi: 10.18637/ | ||
- | |||
- | [6] C. Cortes and V. Vapnik, “Support-vector networks, | ||
worked_example_prediction_-_train.1612020670.txt.gz · Last modified: 2021/01/30 15:31 by krian