Exercise 3

Due date: 25/4 before 21:00, via email. Send a short PDF with your code, results, and discussion.

The exercise can be done individually or in pairs.

Unpaired data

Files a.csv, b.csv, c.csv, d.csv and e.csv contain five sets of unpaired data.

Run the ANOVA and report the p-value.
If the ANOVA p-value is low enough, run a post hoc analysis (Tukey HSD, or pairwise t-tests with Holm correction) and report which groups are significantly different from each other.
Run the Kruskal-Wallis test and report the p-value.
If the Kruskal-Wallis p-value is low enough, run all pairwise Wilcoxon rank-sum tests. Report which groups are significantly different from each other, using both the Holm and the Bonferroni corrections on the p-values from the pairwise comparisons.
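The two corrections named above only rescale the raw pairwise p-values, and in R `p.adjust` does this directly. As a minimal sketch of the arithmetic, the Python below uses made-up p-values rather than results from the data files. The function names are illustrative. With five groups there would be 10 pairwise p-values rather than the three shown here.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Holm step-down adjustment; results are returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Adjusted values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values for three pairwise comparisons (illustration only)
raw = {"a-b": 0.01, "a-c": 0.04, "b-c": 0.03}
bonf = dict(zip(raw, bonferroni(list(raw.values()))))
holm_adj = dict(zip(raw, holm(list(raw.values()))))
```

Holm never gives a larger adjusted p-value than Bonferroni, so it declares at least as many pairs significant at the same level.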

Paired data

File multi.csv contains 17 runs of your algorithm against 4 others. The data is running time (lower is better). The data for your algorithm is in the first column. Which algorithms are significantly slower than yours? Assume that the data is not normal, so you should use the Friedman test followed by the necessary comparisons using the Wilcoxon signed-rank test. Use the Holm and Bonferroni corrections.
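To make clear what the Friedman test computes, here is a rough Python sketch of the statistic: rank within each run (row), sum the ranks per algorithm (column), and compare the sums with their expected value. It is a simplification under stated assumptions. No tie correction is applied, and the closed-form chi-square p-value only works when the number of columns is odd (true for 5 algorithms, giving 4 degrees of freedom). The toy data is invented, and in R `friedman.test` should be used instead.

```python
import math


def average_ranks(row):
    """Rank the values in one row (1 = smallest); ties share the average rank."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for t in range(pos, end + 1):
            ranks[order[t]] = mean_rank
        pos = end + 1
    return ranks


def friedman(rows):
    """Return (statistic, p-value, mean rank per column), without tie correction."""
    n, k = len(rows), len(rows[0])
    ranked = [average_ranks(r) for r in rows]
    col_sums = [sum(r[j] for r in ranked) for j in range(k)]
    stat = 12 * sum((s - n * (k + 1) / 2) ** 2 for s in col_sums) / (n * k * (k + 1))
    df = k - 1
    if df % 2:
        raise ValueError("this closed-form p-value needs an even df")
    # Chi-square upper tail for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
    half = stat / 2
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return stat, p, [s / n for s in col_sums]


# Invented running times: 3 runs (rows) of 3 algorithms (columns)
toy = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5], [0.9, 1.1, 4.0]]
stat, p, mean_ranks = friedman(toy)
```

The mean ranks returned here are also the quantities that rank-based post hoc procedures compare.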

Also use the Nemenyi procedure (implemented in the scmamp package).

As extra work, not to be sent to me with your answer, you should also try a repeated-measures ANOVA instead of the Friedman test, and do the comparisons using the paired t-test. You will see that repeated-measures ANOVA is surprisingly complicated to use in R.

Extra homework for the ML students: comparing 2 classifiers

Run 2 different classifiers on a sufficiently large data set (the classifiers can have the default hyperparameters).

Use 5x2cv and compare them using the paired Wilcoxon (signed-rank) test on the 10 accuracies.

Use the same 5x2cv and compare them using McNemar's test on the 10 sets of answers.

Which is the stronger test procedure?
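For the McNemar comparison, the test uses only the examples on which the two classifiers disagree. The Python sketch below is one possible exact (binomial) version for a single test fold. It assumes each classifier's answers have already been turned into a list of booleans (True = correct), and the example answers are invented. How to combine the 10 folds is left open here, as in the exercise. In R, `mcnemar.test` gives the chi-square version of the same test.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test from per-example correctness flags.

    Returns (b, c, p) where b counts examples only classifier A got right
    and c counts examples only classifier B got right.
    """
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the classifiers never disagree
    # Under the null hypothesis each disagreement favours A or B with prob. 1/2.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Invented answers on 10 test examples (True = classified correctly)
a_correct = [True] * 8 + [False] * 2
b_correct = [True] * 3 + [False] * 5 + [True] + [False]
b, c, p_value = mcnemar_exact(a_correct, b_correct)
```

Examples that both classifiers get right, or both get wrong, do not affect the result.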