II School on Machine Learning and
Knowledge Discovery in Databases (MLKDD)
July 14th – July 17th, 2013
São Carlos – SP – Brazil
http://www.amda.icmc.usp.br/mlkdd2013/
mlkddschool@gmail.com


** Registration is open via the school website. We strongly encourage
early registration, as the number of participants is limited. **

** II MLKDD School will be held concurrently with the first Symposium
on Knowledge Discovery, Mining and Learning (KDMiLe) -
http://kdmile.linkedej.com.br **


With the big data wave, the areas of machine learning and knowledge
discovery in databases have grown in importance, attracting the
attention of research institutions and companies. Consequently, there
is a pressing need for skilled people able to work in these areas.
The Second School on Machine Learning and Knowledge Discovery in
Databases (II MLKDD) aims to train skilled professionals in these
areas by providing short courses from world-class specialists. The II
MLKDD will take place in São Carlos, SP, from July 14th to 17th, 2013.
The previous school, I MLKDD, was held in Rio de Janeiro in 1998. The
II MLKDD will provide short courses on current research issues in
machine learning and data mining. It will be held concurrently with
the first Symposium on Knowledge Discovery, Mining and Learning
(KDMiLe).

The II MLKDD School will offer short courses taught by well-known
international researchers from the areas of machine learning and data
mining:
·      Carlos Soares, Universidade do Porto, Portugal
·      Jesse Read, Universidad Carlos III de Madrid, Spain
·      João Gama, Universidade do Porto, Portugal
·      Marko Grobelnik, Jožef Stefan Institute, Slovenia
·      Peter A. Flach, University of Bristol, United Kingdom

The school will be sponsored by the Research Support Center for
Machine Learning in Data Analysis, NAP-AMDA, from the Universidade de
São Paulo.

Organizing Committee:
·      André de Carvalho, ICMC-USP
·      Cristina Ciferri, ICMC-USP
·      Eduardo Hruschka, ICMC-USP
·      Estevam Hruschka, UFSCar
·      Gustavo Batista, ICMC-USP (Coordinator)
·      Márcio Basgalupp, UNIFESP

Registration fees:

Before June 30th
·      Student NAP-AMDA Member: R$ 100,00
·      Student Non NAP-AMDA Member: R$ 150,00
·      Professional NAP-AMDA Member: R$ 200,00
·      Professional Non NAP-AMDA Member: R$ 300,00

After June 30th
·      Student NAP-AMDA Member: R$ 150,00
·      Student Non NAP-AMDA Member: R$ 200,00
·      Professional NAP-AMDA Member: R$ 250,00
·      Professional Non NAP-AMDA Member: R$ 350,00

Venue: The II MLKDD will take place at the Instituto de Ciências
Matemáticas e de Computação, Universidade de São Paulo, in São Carlos,
Brazil.

Short Course Titles and Abstracts:

Title:  Predicting the rankings of financial analysts, supporting big
data projects and other applications of label ranking methods
Researcher: Carlos Soares, Universidade do Porto, Portugal
Abstract: Label ranking (LR) is a machine learning task where the goal
is to predict the ranking of a set of labels for an example rather
than choosing the label to which the example belongs, as in
classification. LR can be used for trading in the stock market. The
goal is to predict which analysts will make accurate recommendations
concerning whether to buy or sell a stock. This approach assumes that
predicting the performance of these analysts is less complex than
predicting the behavior of the markets. Given that in a given period,
1) different analysts may make recommendations but 2) not all analysts
are expected to make them, we need to predict their relative
performance. Therefore, the problem can be naturally formulated as a
label ranking task, where the goal is to predict the ranking of
analysts. Another problem that can be addressed as an LR task is the
management of a large number of data mining models, as is becoming
increasingly common in big data applications. This talk will be a
general introduction to LR: definition of LR; description of a few
adaptations of algorithms for LR; evaluation of LR models; and LR
applications.
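
As a rough illustration of how a predicted ranking can be compared with
the true one, the sketch below computes Kendall's tau, a rank
correlation coefficient commonly used to evaluate LR models. It is not
taken from the course material; the analyst names and rank values are
made up, and the function assumes rankings without ties.

```python
from itertools import combinations
from typing import Dict


def kendall_tau(true_rank: Dict[str, int], pred_rank: Dict[str, int]) -> float:
    """Kendall's tau between two rankings of the same labels (no ties).

    Each dict maps a label to its rank position (1 = best).
    Returns 1.0 for identical rankings and -1.0 for reversed ones.
    """
    labels = list(true_rank)
    concordant = 0
    discordant = 0
    for a, b in combinations(labels, 2):
        # Positive product: both rankings order the pair the same way.
        agreement = (true_rank[a] - true_rank[b]) * (pred_rank[a] - pred_rank[b])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    n_pairs = len(labels) * (len(labels) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical example: three analysts ranked by accuracy.
true_rank = {"analyst_a": 1, "analyst_b": 2, "analyst_c": 3}
pred_rank = {"analyst_a": 1, "analyst_b": 3, "analyst_c": 2}
tau = kendall_tau(true_rank, pred_rank)
```

Here two of the three label pairs are ordered correctly and one is
swapped, so tau lies between the two extremes.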

Title: Multi-label Classification
Researcher: Jesse Read, Universidad Carlos III de Madrid, Spain
Abstract: In multi-label classification, each data instance may be
associated with multiple class labels, as opposed to a single class.
The multi-label context arises naturally in many domains, such as text
categorization and labeling images. The main challenge is modeling
dependencies between labels, which must be done efficiently to scale
up to datasets (or data streams) of large dimensions. This tutorial
reviews some of the most important methods for multi-label
classification, both of the 'data transformation' and 'algorithm
adaptation' variety. Many practical applications will be described, as
well as a thorough analysis of label dependency, and the connections
to closely related problems like multi-output and structured-output
prediction.
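
To make the 'data transformation' idea concrete, here is a minimal
sketch of binary relevance, one widely known transformation that turns
a multi-label dataset into one binary dataset per label. It is an
illustration only, not necessarily a method presented in the tutorial;
the function name and the toy data are assumptions.

```python
from typing import Dict, List, Sequence, Set, Tuple


def binary_relevance_transform(
    instances: Sequence[Sequence[float]],
    label_sets: Sequence[Set[str]],
) -> Dict[str, List[Tuple[Sequence[float], int]]]:
    """Split a multi-label dataset into one binary dataset per label.

    For each label, every instance is kept and tagged 1 if the label is
    in its label set and 0 otherwise.
    """
    all_labels = sorted(set().union(*label_sets))
    return {
        label: [
            (x, int(label in labels))
            for x, labels in zip(instances, label_sets)
        ]
        for label in all_labels
    }


# Toy data (made up): three documents with two features each.
X = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
Y = [{"sports"}, {"politics", "economy"}, {"sports", "economy"}]
datasets = binary_relevance_transform(X, Y)
```

Any ordinary binary classifier can then be trained on each entry of
`datasets`. Note that this transformation treats the labels
independently, so it does not model the label dependencies mentioned
above.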

Title: Knowledge Discovery from Data Streams
Researcher: João Gama, Universidade do Porto, Portugal
Abstract: The data streams computational model is one of the most
challenging topics for intelligent systems. The key characteristics of
data streams are their evolving nature, potentially infinite length,
and unknown dynamics. These aspects are relevant to most of the
applications we face today. They pose new problems for data
analysis: we need to model evolving data in real time using limited
computational resources, detect change points, discard outdated
information, etc. Data stream analysis has been used with all kinds of
decision models and learning tasks. The goal of this course is to
present the state-of-the-art in mining data streams and discuss open
research problems, issues, and challenges in this area. We will focus
on processing distributed data streams, a topic of paramount
importance in the research community, as new algorithms are needed to
process this streaming data in real time. We will present techniques
for change detection, clustering, classification, and frequent
pattern mining in distributed data streams. Finally, the course will
conclude with open issues and future directions.

Title: Big Data Analytics
Researcher: Marko Grobelnik, Jožef Stefan Institute, Slovenia
Abstract: Is Big Data Analytics different? Does it bring anything
new? The quick answer is: yes. The fundamental intuitions are mostly
the same as those we are used to from more traditional data analytics,
but we need to be aware of a somewhat extended set of tools and
algorithms that allow us to deal with large data sets. In this
tutorial we will cover classes of algorithms that mostly existed
already in the past (in the non-Big Data era) but gained importance
with the appearance of large and fast data sets. This includes an
emphasis on algorithms for smart sampling, stream processing, locality
sensitive hashing, distributed computing and more. The tutorial will
give an introduction to Big Data research from the side of algorithms,
tools, applications, market, and corresponding literature.

Title: Comparing apples and oranges: towards commensurate evaluation
metrics in classification
Researcher: Peter Flach, University of Bristol, UK
Abstract: A wide range of evaluation metrics exists in supervised
learning, including accuracy, area under the ROC curve (AUC) and Brier
score. At first sight these metrics assess different aspects of a
predictive model's performance: accuracy measures classification
performance (ability to assign the correct class), AUC measures
ranking performance (ability to score positives higher than negatives)
and Brier score assesses scoring performance (ability to assign
probabilities close to the 'ideal' 0/1 values). While it thus appears
that these measures are not directly comparable, in this talk I will
discuss a framework whereby each measure can be directly related to
expected classification loss under varying operating conditions in
terms of class and cost distributions, utilising the notion of a
threshold selection method. One of the results of this work is a new
interpretation of AUC in terms of expected loss under a novel
rate-driven threshold selection method. This result is particularly
relevant as it provides a rebuttal of recent criticisms of AUC as
being fundamentally incoherent.
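
For readers unfamiliar with the three metrics named above, the
following sketch computes each of them on a tiny made-up example. It
only illustrates the standard definitions, not the framework discussed
in the talk; the 0.5 decision threshold and the toy labels and scores
are assumptions.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of instances classified correctly at a fixed threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, y_true)) / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of positive-negative pairs in which the positive is
    scored higher (ties count as half)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean squared difference between scores and the 0/1 labels."""
    return sum((s - y) ** 2 for s, y in zip(scores, y_true)) / len(y_true)


# Toy data (made up): two positives and two negatives.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]

acc = accuracy(y_true, scores)
area = auc(y_true, scores)
brier = brier_score(y_true, scores)
```

On the same four predictions the three functions return different
numbers, since each one measures a different aspect of performance.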

Lecturers' Short Bios:

Carlos Soares is a well-known researcher in meta-learning. He received
his B.Sc. degree in Systems Engineering and Informatics from
Universidade do Minho, Portugal. He received his M.Sc. degree in
Artificial Intelligence and his Ph.D. in Computer Science from
Universidade do Porto, Portugal. He is an Associate Professor at
Faculdade de Engenharia, Universidade do Porto. His main interests are
Machine Learning, Data Mining, Meta-learning and Data Streams.

Jesse Read is a researcher at the Universidad Carlos III de Madrid,
Spain. In 2010 he completed his Ph.D. in the Department of Computer
Science at the University of Waikato in New Zealand, with a thesis on
scalable multi-label classification. He has continued to work in this
area, as well as in data streams, wireless sensor networks, and
multi-output prediction.

João Gama is one of the world experts in data streams. He is an
Associate Professor at the University of Porto and a researcher at
LIAAD-Inesc Tec. He has served as PI in several FCT projects in
learning adaptive systems. He has published more than 110 papers in
major international conferences and journals, and served as PC chair
at ECML05, DS09, ADMA09, and Conference Chair at IDA11. He co-organized
a series of workshops on learning from data streams in conjunction
with ECML-PKDD, KDD, SAC and ICML. He is a member of the editorial
boards of MLJ, DAMI, NGC, and PAI, and he is the author of a recent
book on Knowledge Discovery from Data Streams.

Marko Grobelnik is an expert in the analysis of large amounts of
complex data with the purpose of extracting useful knowledge. In
particular, his areas of expertise comprise Data Mining, Text Mining,
Information Extraction, Link Analysis, and Data Visualization, as well
as more integrative areas such as Semantic Web, Knowledge Management
and Artificial Intelligence. Apart from research on theoretical
aspects of data analysis techniques, he has considerable experience in
practical applications and the development of business solutions based
on innovative technologies. His main achievements are in the field of
Text Mining (analysis of large amounts of textual data): he has had a
leading role in scientific and applied projects funded by the European
Commission, has run projects with industry partners such as Microsoft
Research, British Telecom, New York Times, and Siemens, and has
organized several international events on related topics.

Peter Flach has been Professor of Artificial Intelligence at the
University of Bristol since 2003. An internationally leading
researcher in the areas of mining highly structured data and the
evaluation and improvement of machine learning models using ROC
analysis, he has also published on the logic and philosophy of machine
learning, and on the combination of logic and probability. He is
author of Simply Logical: Intelligent Reasoning by Example (John
Wiley, 1994) and Machine Learning: the Art and Science of Algorithms
that Make Sense of Data (Cambridge University Press, 2012). Prof. Flach
is the Editor-in-Chief of the Machine Learning journal, one of the two
top journals in the field, which has been published for over 25 years,
first by Kluwer and now by Springer. He was Programme Co-Chair of the 1999
International Conference on Inductive Logic Programming, the 2001
European Conference on Machine Learning, the 2009 ACM Conference on
Knowledge Discovery and Data Mining, and the 2012 European Conference
on Machine Learning and Knowledge Discovery in Databases in Bristol.