3. Results

Output format

Results are currently exported as a set of plots, a .tex file and a .json file, so that you can either get a quick overview of the data or feed it to another program. You will find them in the reports_output folder you chose in your parameters file.
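As a rough illustration of feeding the .json output to another program, the sketch below collects every .json file found in an output folder. The function name and the folder path are hypothetical, and nothing is assumed about the keys stored inside the files, since the report schema is not described on this page.

```python
import json
from pathlib import Path


def load_json_reports(folder):
    """Return {file stem: parsed content} for every .json file in `folder`.

    `folder` stands for whatever location you set as reports_output;
    the exact layout of that folder is an assumption here.
    """
    reports = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            reports[path.stem] = json.load(handle)
    return reports


if __name__ == "__main__":
    # Hypothetical path: replace it with your own reports_output folder.
    for name, content in load_json_reports("output/my_sample").items():
        print(name, type(content).__name__)
```

Keeping the loader schema-agnostic means it should keep working if the content of the report changes between versions.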

Prediction mode

For each classification level, you get a set of information. You can get even more with 'full_test_set': the report is more verbose, but compute time will be higher. The first row gives the global classification result, based on the reads selection method you chose. At the end of the report, you will find the full set of parameters used on this particular sample, for archiving purposes. The tree shows which paths were investigated, i.e. those matching the threshold you defined. Nodes are classes, and edges are the percentages of reads attributed to those classes from the previous node.

Then you get the information by level: in base mode, you have one confusion matrix, displaying the results of a test set against the trained model; numbers are row percentages, showing for each actual class the percentage of reads predicted in each class (correctly or not). Note that this information is computed with your selection parameters, so you can investigate which parameters work better for your database, but you should watch out for overfitting!
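To make "row percentages" concrete, here is a minimal sketch of a row-normalised confusion matrix in plain Python. The function name and the toy labels are illustrative assumptions; this is not necessarily how the report itself computes the matrix.

```python
def row_percentages(actual, predicted):
    """Confusion matrix where each row (actual class) sums to 100 %."""
    classes = sorted(set(actual) | set(predicted))
    # counts[a][p] = number of reads of actual class a predicted as class p
    counts = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    matrix = {}
    for a in classes:
        total = sum(counts[a].values())
        matrix[a] = {
            p: (100.0 * counts[a][p] / total if total else 0.0)
            for p in classes
        }
    return matrix


# Toy example: 4 reads from class "A" and 2 from class "B".
actual = ["A", "A", "A", "A", "B", "B"]
predicted = ["A", "A", "A", "B", "B", "B"]
matrix = row_percentages(actual, predicted)
print(matrix["A"])  # {'A': 75.0, 'B': 25.0}
print(matrix["B"])  # {'A': 0.0, 'B': 100.0}
```

Reading along a row tells you where the reads of one actual class ended up, so the diagonal holds the percentage of correctly predicted reads per class.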

The next graph tells you about reads attribution: once calculations are done, each read is attributed to a class. The dashed line displays the investigation limit you set (the percentage of reads required to consider a class).

Next, if you chose the advanced plotting options, you get an additional set of graphs. Firstly, you have 'Mean and standard deviation', which tells you more about the fidelity of successive boostings.
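The reads attribution graph and its dashed limit can be pictured with the sketch below: it computes the share of reads attributed to each class and keeps the classes that reach the limit. The function name, the toy data and the use of "greater than or equal" for the comparison are assumptions made for illustration.

```python
from collections import Counter


def attributed_classes(read_predictions, threshold_percent):
    """Return (share of reads per class in %, classes reaching the limit)."""
    counts = Counter(read_predictions)
    total = sum(counts.values())
    if total == 0:
        return {}, []
    shares = {cls: 100.0 * n / total for cls, n in counts.items()}
    # Assumption: a class is investigated when its share reaches the limit.
    kept = sorted(cls for cls, share in shares.items() if share >= threshold_percent)
    return shares, kept


# Toy example: 10 reads, investigation limit set to 20 %.
predictions = ["A"] * 6 + ["B"] * 3 + ["C"]
shares, kept = attributed_classes(predictions, 20)
print(shares)  # {'A': 60.0, 'B': 30.0, 'C': 10.0}
print(kept)    # ['A', 'B']
```

In this toy case, class "C" falls under the dashed line and would not be explored at the next level.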

Secondly, you get a plot of the 15 most important features for the split, which gives a bit more insight into the signature concept this algorithm relies on.
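Selecting the 15 most important features boils down to ranking an importance score per feature. The sketch below works on a plain dictionary with made-up feature names and scores. With XGBoost, such a dictionary could for instance come from `Booster.get_score(importance_type="gain")`, but whether the report uses that call or that importance type is an assumption.

```python
def top_features(importances, n=15):
    """Return the n (feature, score) pairs with the highest scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Made-up scores for 20 hypothetical features named f0 .. f19.
importances = {f"f{i}": float(i) for i in range(20)}
best = top_features(importances)
print(len(best))  # 15
print(best[0])    # ('f19', 19.0)
```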

Lastly, you get additional plots that are not included in the .tex file but can be freely consulted in the output folder:

Reads probabilities repartitions
Reads selection across functions

Additional figures

In the previous section, some scripts were described as outputting figures. They do not analyse your results, but rather the models you are using and the data you use as input, in order to extract classification rules. You may refer to the report to learn what those figures show.

WISP : Bacterial families identification from long reads, machine learning with XGBoost
