Exercise 1

Due date: 26/3 before 21:00, via email. Send a short PDF with your code, results, and discussion.

Can be done individually or in pairs.

Small data set 1 - non-paired data

Files a1.csv and b1.csv contain two sets of data. Run the following 2 independent sample tests on them and report the p-values.

Plot the two histograms and discuss whether the data is "Gaussian-like". If so, which p-value should you trust/use? I am not sure there is a correct answer to this one, but I would like to know your arguments for your choice.

Small data set 2 - paired data

File one.csv contains two sets of paired data (each set is a column, and each line is a pair). Run the following 2 paired sample tests on them and report the p-values.

Run the non-paired versions of the two tests, and compare the p-values. Does the result correspond to what we discussed in class?

Run the sign test. The sign test counts how many times the value from one group is larger than the paired value from the other group, and computes the probability of a count at least that extreme under a binomial distribution with probability 0.5 (the null hypothesis that the two groups are the same). Use the binom.test function in R.
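For those not using R, the counting and the binomial tail can be sketched with the Python standard library alone. This is a minimal sketch, not a reference implementation: the function name sign_test, the sample values, and the choice to drop tied pairs are all assumptions made for illustration. For probability 0.5 the two-sided p-value computed this way should agree with what binom.test reports.

```python
from math import comb

def sign_test(x, y):
    """Two-sided exact sign test for paired samples x and y.

    Tied pairs (x[i] == y[i]) are dropped. This is a common convention
    and an assumption here, not something the exercise specifies.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)                  # number of non-tied pairs
    k = sum(d > 0 for d in diffs)   # pairs where x is larger than y
    # Under H0, k ~ Binomial(n, 0.5), which is symmetric, so the
    # two-sided p-value is twice the smaller tail, capped at 1.
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return k, n, min(1.0, 2 * tail)

# Made-up paired values, for illustration only (not the data in one.csv)
before = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2, 4.9, 5.3]
after = [5.6, 5.0, 6.4, 5.4, 6.3, 6.8, 5.2, 5.9]
k, n, p = sign_test(before, after)
print(k, n, p)
```

Note that the test uses only the sign of each difference, not its size, which is worth keeping in mind when comparing it with the other paired tests.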

How does the sign test compare with the other two?

Study on the factors that impact the p-value

Using R (or any other tool), generate 2 samples of 15 data points each from Gaussian (normal) distributions with means 10 and 13, both with standard deviation 5.

Calculate the average of the p-values obtained by running the t-test on 10 pairs of samples generated as above (the p-values are averaged only because a single random sample can be atypical, so the p-value of a single test can be way off).
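The averaging procedure can be sketched as follows, again with the Python standard library only. Several things here are assumptions rather than part of the exercise: the function names, the fixed seed, the use of the pooled-variance (Student) t-test, and computing the p-value by numerically integrating the t density. R's t.test uses the Welch variant by default, so its numbers may differ slightly.

```python
import math
import random
from statistics import mean, variance

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t, via Simpson's rule on the density."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    s = pdf(0) + pdf(b)
    s += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = s * h / 3  # P(0 < T < |t|)
    return max(0.0, 1 - 2 * area)

def pooled_t_test(x, y):
    """p-value of the two-sample t-test with pooled variance (an assumption)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return two_sided_p(t, df)

def mean_p_value(mu1=10, mu2=13, sd=5, n=15, reps=10, seed=0):
    """Average p-value over `reps` independently generated pairs of samples."""
    rng = random.Random(seed)  # fixed seed only to make the run repeatable
    ps = []
    for _ in range(reps):
        a = [rng.gauss(mu1, sd) for _ in range(n)]
        b = [rng.gauss(mu2, sd) for _ in range(n)]
        ps.append(pooled_t_test(a, b))
    return mean(ps)

print(mean_p_value())
```

Keeping the means, standard deviation, sample size, and number of repetitions as parameters makes it easy to rerun the calculation with one of them changed.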