worked_example_prediction_train_and_test
Differences
This shows you the differences between two versions of the page.
worked_example_prediction_train_and_test [2021/01/30 19:15] – [Confusion Matrix and Statistics] cloucera | worked_example_prediction_train_and_test [2021/01/31 17:33] (current) – [Test report] cloucera | ||
---|---|---|---|
Line 7: | Line 7: | ||
The training phase is used to build the predictor. To do so, we first need a training set (a dataset of individuals belonging to two classes and properly labeled, or individuals with the associated measurement). This training set must be large and diverse enough to represent the real population of individuals on which the trained predictor will be used. The predictor then “learns” from this training dataset how the classes are related to the attributes, which we will call features from now on. The performance is assessed by cross-validation, | The training phase is used to build the predictor. To do so, we first need a training set (a dataset of individuals belonging to two classes and properly labeled, or individuals with the associated measurement). This training set must be large and diverse enough to represent the real population of individuals on which the trained predictor will be used. The predictor then “learns” from this training dataset how the classes are related to the attributes, which we will call features from now on. The performance is assessed by cross-validation, | ||
+ | |||
+ | Note that when the features used to train the predictor have a functional meaning of their own, as signaling circuits do, the reasons behind each decision of the predictor are straightforward to interpret and relate to the biological processes that distinguish the compared cases. | ||
Below is a worked example of how to train and use a predictor/ | Below is a worked example of how to train and use a predictor/ | ||
Line 57: | Line 59: | ||
===== Training report ===== | ===== Training report ===== | ||
+ | |||
Once the launched study is finished, the report/ | Once the launched study is finished, the report/ | ||
The report page of the Prediction-training tool includes different output results. You can download any table or image shown on the results page by clicking on the name right before it. You can also download the pathway and function matrices by clicking on //Circuit values//. For more information about each result please read [[prediction# | The report page of the Prediction-training tool includes different output results. You can download any table or image shown on the results page by clicking on the name right before it. You can also download the pathway and function matrices by clicking on //Circuit values//. For more information about each result please read [[prediction# | ||
Line 65: | Line 68: | ||
**Model Analysis** | **Model Analysis** | ||
- | |||
- | Hyperparameter search: | ||
- | |||
- | {{ : | ||
CV Performance: | CV Performance: | ||
+ | {{ :: | ||
+ | |||
{{ : | {{ : | ||
- | The most relevant features along with their interaction sign: | + | The most relevant features are listed along with their interaction sign. Note that when the features used to train the predictor have a functional meaning of their own, as signaling circuits do, the reasons behind each decision of the predictor are straightforward to interpret and relate to the biological processes that distinguish the compared cases. The most relevant circuits, with their interaction sign, are: |
|**Selected circuits name** | |**Selected circuits name** | ||
Line 124: | Line 125: | ||
PR curve over the test: | PR curve over the test: | ||
- | {{ : | + | {{ : |
ROC curve over the test set: | ROC curve over the test set: | ||
Line 130: | Line 131: | ||
{{ : | {{ : | ||
- | Probability for the test set: | + | Probability |
{{ : | {{ : | ||
Line 176: | Line 177: | ||
This is the most important result of our predictor, which is a matrix with three columns: | This is the most important result of our predictor, which is a matrix with three columns: | ||
- | * Sample name: all the 125 samples in the used expression matrix file. | + | * Sample name: all the 125 samples |
* Prediction: the predicted group LumB (Luminal B) or LumA (Luminal A) | * Prediction: the predicted group LumB (Luminal B) or LumA (Luminal A) | ||
* Probability LumB: the probability that the sample is LumB; a value of 1 means the predictor is 100% sure that the sample is LumB. | * Probability LumB: the probability that the sample is LumB; a value of 1 means the predictor is 100% sure that the sample is LumB. | ||
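As a rough illustration of how such a three-column matrix could be inspected after download, the sketch below counts the predictions per class and converts the LumB probability into a confidence for the predicted class. The tab-separated layout, the column names (`sample`, `prediction`, `probability_LumB`) and the three example rows are assumptions made for this sketch, not the tool's documented output format.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a downloaded prediction matrix (tab-separated).
# Column names, sample names and probabilities are invented for illustration.
EXAMPLE = (
    "sample\tprediction\tprobability_LumB\n"
    "S1\tLumB\t0.91\n"
    "S2\tLumA\t0.12\n"
    "S3\tLumA\t0.35\n"
)


def summarize(text):
    """Return per-class prediction counts and per-sample confidence."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    counts = Counter(row["prediction"] for row in rows)
    confidence = {}
    for row in rows:
        p_lumb = float(row["probability_LumB"])
        # Confidence in the predicted class: p for LumB, 1 - p for LumA.
        confidence[row["sample"]] = p_lumb if row["prediction"] == "LumB" else 1.0 - p_lumb
    return counts, confidence


counts, confidence = summarize(EXAMPLE)
print(dict(counts))
print(confidence)
```

Reporting the confidence in the predicted class, rather than the raw LumB probability, makes LumA and LumB calls directly comparable.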
+ | |||
You can download the matrix of predicted experimental design by clicking on // | You can download the matrix of predicted experimental design by clicking on // | ||
===== Prediction evaluation ===== | ===== Prediction evaluation ===== | ||
- | Note that for this example we know the ground truth labels beforehand, so we can compute the classification metrics as in the simulated split during the training phase. The ROC and PR curves are quite similar to those of the simulated split, which indicates that the tool generalizes well for this problem. The same trend can be observed in the companion metrics table. | + | Note that for this example we know the ground truth labels beforehand, so we can compute the classification metrics as in the simulated split during the training phase. The ROC and PR curves are quite similar to those of the simulated split, which indicates that the tool generalizes well for this problem. The same trend can be observed in the companion metrics table and the confusion matrix. |
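When ground truth labels are available, the area under the ROC curve can be computed directly from the predicted probabilities. The following is a minimal sketch using the rank-based (Mann-Whitney) formulation; the function name `roc_auc` and the four toy samples are hypothetical and are not taken from the tool or from this dataset.

```python
def roc_auc(labels, scores, positive="LumB"):
    """Area under the ROC curve: the probability that a randomly chosen
    positive sample receives a higher score than a randomly chosen negative
    one, with ties counted as one half."""
    pos = [s for l, s in zip(labels, scores) if l == positive]
    neg = [s for l, s in zip(labels, scores) if l != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
labels = ["LumB", "LumA", "LumB", "LumA"]
scores = [0.9, 0.2, 0.4, 0.6]  # predicted probability of LumB
auc = roc_auc(labels, scores)
print(auc)
```

A value close to 1 means LumB samples are consistently ranked above LumA samples, and 0.5 corresponds to random ranking.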
{{ : | {{ : | ||
Line 191: | Line 193: | ||
{{ : | {{ : | ||
+ | ^ statistic ^ value ^ | ||
+ | | Sensitivity | 0.761904761904762 | | ||
+ | | Specificity | 0.913461538461538 | | ||
+ | | Positive Predictive Value | 0.64 | | ||
+ | | Negative Predictive Value | 0.95 | | ||
+ | | False Positive Rate | 0.0865384615384616 | | ||
+ | | False Negative Rate | 0.238095238095238 | | ||
+ | | Likelihood Ratio Positive | 8.8042328042328 | | ||
+ | | Likelihood Ratio Negative | 0.260651629072682 | | ||
+ | | Percentage of data points in the main diagonal | 0.888 | | ||
+ | | Percentage of data points in the main diagonal corrected for agreement by chance | 0.627659574468085 | | ||
+ | | Rand index | 0.799483870967742 | | ||
+ | | Rand index corrected for agreement by chance | 0.525491509396793 | | ||
+ | | Total Accuracy | 0.888 | | ||
- | statistic value | + | ^ ^^ Reference |
- | + | ^ ^ | |
- | Sensitivity 0.761904761904762 | + | ^ Prediction |
- | + | ^ ::: ^ LumB | 9 | 16 | | |
- | Specificity 0.913461538461538 | + | |
- | + | ||
- | Positive Predictive Value 0.64 | + | |
- | + | ||
- | Negative Predictive Value 0.95 | + | |
- | + | ||
- | False Positive Rate 0.0865384615384616 | + | |
- | + | ||
- | False Negative Rate 0.238095238095238 | + | |
- | + | ||
- | Likelihood Ratio Positive 8.8042328042328 | + | |
- | + | ||
- | Likelihood Ratio Negative 0.260651629072682 | + | |
- | + | ||
- | Percentage of data points in the main diagonal 0.888 | + | |
- | + | ||
- | Percentage of data points in the main diagonal corrected for agreement by chance 0.627659574468085 | + | |
- | + | ||
- | Rand index 0.799483870967742 | + | |
- | + | ||
- | Rand index corrected for agreement by chance 0.525491509396793 | + | |
- | + | ||
- | Total Accuracy 0.888 | + | |
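The reported statistics can be reproduced from the four confusion matrix counts, taking LumB as the positive class. In the sketch below, the predicted-LumB row (9 and 16) is read from the confusion matrix; the predicted-LumA row (95 and 5) is not visible in this revision view and is an inference from the reported sensitivity, negative predictive value and the total of 125 samples. The two chance-corrected quantities are computed as Cohen's kappa and the adjusted Rand index, which is an assumption that matches the reported values numerically.

```python
from math import comb

# LumB is treated as the positive class.
tp, fp = 16, 9   # predicted LumB: reference LumB / reference LumA
tn, fn = 95, 5   # predicted LumA (inferred, see note above)
n = tp + fp + tn + fn

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
lr_pos = sensitivity / (1 - specificity)
lr_neg = (1 - sensitivity) / specificity
accuracy = (tp + tn) / n

# Agreement corrected for chance (Cohen's kappa).
expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
kappa = (accuracy - expected) / (1 - expected)

# Rand index and adjusted Rand index from pair counts.
cells = comb(tp, 2) + comb(fp, 2) + comb(tn, 2) + comb(fn, 2)
rows = comb(tp + fp, 2) + comb(tn + fn, 2)
cols = comb(tp + fn, 2) + comb(tn + fp, 2)
pairs = comb(n, 2)
rand = 1 + (2 * cells - rows - cols) / pairs
chance = rows * cols / pairs
adjusted_rand = (cells - chance) / (0.5 * (rows + cols) - chance)

for name, value in [
    ("Sensitivity", sensitivity),
    ("Specificity", specificity),
    ("Positive Predictive Value", ppv),
    ("Negative Predictive Value", npv),
    ("Likelihood Ratio Positive", lr_pos),
    ("Likelihood Ratio Negative", lr_neg),
    ("Total Accuracy", accuracy),
    ("Kappa", kappa),
    ("Rand index", rand),
    ("Adjusted Rand index", adjusted_rand),
]:
    print(f"{name}: {value:.6f}")
```

Working from the raw counts makes it easy to recompute the metrics with the other class as positive by swapping the roles of the four cells.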
worked_example_prediction_train_and_test.1612034126.txt.gz · Last modified: 2021/01/30 19:15 by cloucera