Exercise 1

ATTENTION - there was a data point with the wrong format in the old version of dados1.csv. The new version of the file (as of 12/3, 11am) is OK.

Due date: 18/3 in class

If the exercise is submitted after the due date but by 20/3 (in class), there will be a 50% penalty (the grade will be divided by two). After that, the exercise will no longer be accepted.

In the file dados1.csv, each line is a data point and each column is an attribute, named A, B, C, and D. The file follows the generic "csv" convention, but "csv" is not a single standard!
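
Because "csv" files differ in their delimiter, it can help to detect the delimiter before parsing. The following is a minimal sketch using only the Python standard library. The sample text, the candidate delimiters, and the helper name read_rows are assumptions made for illustration; the actual layout of dados1.csv may differ.

```python
import csv
import io

# Hypothetical sample standing in for dados1.csv; the real file may differ.
SAMPLE = "A;B;C;D\n1.5;2.0;3.1;4.2\n0.7;;2.9;3.8\n1.1;2.2;3.0;4.0\n"


def read_rows(text, candidates=";,\t"):
    """Guess the delimiter from the text, then parse it into a list of dicts."""
    # csv.Sniffer inspects the text and picks the most consistent delimiter
    # among the candidates.
    dialect = csv.Sniffer().sniff(text, delimiters=candidates)
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    return dialect.delimiter, list(reader)


delimiter, rows = read_rows(SAMPLE)
print("delimiter:", repr(delimiter))
# Show the first 5 data points (fewer if the file is shorter).
for row in rows[:5]:
    print(row)
```

With the real file, the text would come from open("dados1.csv").read() instead of SAMPLE. Empty fields are read as empty strings, which is one way missing values may show up.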

  1. Read the data and show the first 5 data points.
  2. Which data points have missing values? Remove them.
  3. Which data points have outliers? How did you discover that? Remove them.
  4. Plot the histogram for attribute A, with 10 and 30 bins. Which one is more informative?
  5. Calculate and print the covariance matrix for the data.
  6. Compute the PCA for the data. How many dimensions should you keep?
  7. Plot an XY graph of the two largest PCA dimensions.
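
The numerical steps in the list above (2, 3, 5, 6, and the projection needed for 7) can be sketched with NumPy as shown below. This is only an illustration under stated assumptions: the data is synthetic, not dados1.csv; the 1.5 * IQR rule is one common rule of thumb for outliers and not necessarily the expected method; and PCA is computed here as an eigendecomposition of the covariance matrix. Plotting is left out so the sketch depends on NumPy alone.

```python
import numpy as np

# Synthetic stand-in for dados1.csv (4 attributes A-D); not the course data.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[2.0, 0.0, 1.0, 0.5],
                   [0.0, 1.0, 0.5, 0.2]])
data = latent @ mixing + 0.05 * rng.normal(size=(200, 4))
data[3, 1] = np.nan   # inject one missing value
data[10, 0] = 50.0    # inject one obvious outlier

# Step 2: drop rows that contain a missing value.
complete = data[~np.isnan(data).any(axis=1)]

# Step 3: drop rows with any attribute outside the 1.5 * IQR fences
# (an assumed rule of thumb, chosen for illustration).
q1, q3 = np.percentile(complete, [25, 75], axis=0)
iqr = q3 - q1
inside = ((complete >= q1 - 1.5 * iqr) & (complete <= q3 + 1.5 * iqr)).all(axis=1)
clean = complete[inside]

# Step 5: covariance matrix, treating columns as variables.
cov = np.cov(clean, rowvar=False)
print(cov)

# Step 6: PCA via eigendecomposition of the symmetric covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # sort components by variance, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
explained = eigvals / eigvals.sum()        # fraction of variance per component
print(explained)

# Step 7: coordinates of each point along the two largest components,
# ready to be drawn as an XY graph.
projected = (clean - clean.mean(axis=0)) @ eigvecs[:, :2]
```

The explained array is one possible basis for deciding how many dimensions to keep, for example by keeping enough components to reach a chosen share of the total variance.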

Last modified: Wed Mar 5 18:57:33 BRT 2014