II School on Machine Learning and Knowledge Discovery in Databases (MLKDD)
July 14th – July 17th, 2013
São Carlos – SP – Brazil
http://www.amda.icmc.usp.br/mlkdd2013/
mlkddschool@gmail.com

** Registration is open via the school website. We highly encourage early registration, as the number of participants is limited. **

** The II MLKDD School will be held concurrently with the first Symposium on Knowledge Discovery, Mining and Learning (KDMiLe) - http://kdmile.linkedej.com.br **

With the big data wave, the areas of machine learning and knowledge discovery in databases have grown in importance, attracting the attention of research institutions and companies. Consequently, there is an urgent need for skilled people able to work in these areas.

The Second School on Machine Learning and Knowledge Discovery in Databases (II MLKDD) aims to train skilled professionals in these areas by providing short courses from world-class specialists. The II MLKDD will take place in São Carlos, SP, from July 14th to 17th. The previous school, I MLKDD, took place in Rio de Janeiro in 1998.

The II MLKDD will provide short courses on current research issues in machine learning and data mining. It will be held concurrently with the first Symposium on Knowledge Discovery, Mining and Learning (KDMiLe).

The II MLKDD School will offer short courses taught by well-known international researchers from the areas of machine learning and data mining:
· Carlos Soares, Universidade do Porto, Portugal
· Jesse Read, Universidad Carlos III de Madrid, Spain
· João Gama, Universidade do Porto, Portugal
· Marko Grobelnik, Jožef Stefan Institute, Slovenia
· Peter A. Flach, University of Bristol, United Kingdom

The school will be sponsored by the Research Support Center for Machine Learning in Data Analysis (NAP-AMDA) of the Universidade de São Paulo.
Organizing Committee:
· André de Carvalho, ICMC-USP
· Cristina Ciferri, ICMC-USP
· Eduardo Hruschka, ICMC-USP
· Estevam Hruschka, UFSCar
· Gustavo Batista, ICMC-USP (Coordinator)
· Márcio Basgalupp, UNIFESP

Registration fees:

Before June 30th
· Student NAP-AMDA Member: R$ 100,00
· Student Non NAP-AMDA Member: R$ 150,00
· Professional NAP-AMDA Member: R$ 200,00
· Professional Non NAP-AMDA Member: R$ 300,00

After June 30th
· Student NAP-AMDA Member: R$ 150,00
· Student Non NAP-AMDA Member: R$ 200,00
· Professional NAP-AMDA Member: R$ 250,00
· Professional Non NAP-AMDA Member: R$ 350,00

Venue:
The II MLKDD will take place at the Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, in São Carlos, Brazil.

Short Course Titles and Abstracts:

Title: Predicting the rankings of financial analysts, supporting big data projects and other applications of label ranking methods
Researcher: Carlos Soares, Universidade do Porto, Portugal
Abstract: Label ranking (LR) is a machine learning task where the goal is to predict the ranking of a set of labels for an example, rather than choosing the single label to which the example belongs, as in classification. LR can be used for trading in the stock market. The goal is to predict which analysts will make accurate recommendations concerning whether to buy or sell a stock. This approach assumes that predicting the performance of these analysts is less complex than predicting the behavior of the markets. Given that, in any period, 1) different analysts may make recommendations but 2) not all analysts are expected to make them, we need to predict their relative performance. Therefore, the problem can be naturally formulated as a label ranking task, where the goal is to predict the ranking of analysts. Another problem that can be addressed as an LR task is the management of a large number of data mining models, as is becoming increasingly common in big data applications.
This talk will be a general introduction to LR: definition of LR; description of a few adaptations of algorithms for LR; evaluation of LR models; and LR applications.

Title: Multi-label Classification
Researcher: Jesse Read, Universidad Carlos III de Madrid, Spain
Abstract: In multi-label classification, each data instance may be associated with multiple class labels, as opposed to a single class. The multi-label context arises naturally in many domains, such as text categorization and image labeling. The main challenge is modeling dependencies between labels, which must be done efficiently to scale up to large datasets (or data streams). This tutorial reviews some of the most important methods for multi-label classification, of both the 'data transformation' and the 'algorithm adaptation' variety. Many practical applications will be described, along with a thorough analysis of label dependency and the connections to closely related problems such as multi-output and structured-output prediction.

Title: Knowledge Discovery from Data Streams
Researcher: João Gama, Universidade do Porto, Portugal
Abstract: The data streams computational model is one of the most challenging topics for intelligent systems.
The key characteristics of a data stream are its evolving nature, potentially infinite length, and unknown dynamics. These aspects are relevant to most of the applications we face today. They pose new problems for data analysis: we need to model evolving data in real time using limited computational resources, detect change points, discard outdated information, etc.
Data stream analysis has been used with all kinds of decision models and learning tasks.
The goal of this course is to present the state of the art in mining data streams and discuss open research problems, issues, and challenges in this area. We will focus on processing distributed data streams, a topic of paramount importance in the research community, as new algorithms are needed to process this streaming data in real time. We will present techniques for change detection, clustering, classification, and frequent pattern mining over distributed data streams. Finally, the course will conclude with open issues and future directions.

Title: Big Data Analytics
Researcher: Marko Grobelnik, Jožef Stefan Institute, Slovenia
Abstract: Is Big Data Analytics different? Does it bring anything new? The quick answer is: YES. The fundamental intuitions are mostly the same as those we are used to from more traditional data analytics, but we need to be aware of a somewhat extended set of tools and algorithms that allow us to deal with large data sets. In this tutorial we will cover classes of algorithms that mostly existed already in the past (in the non-Big Data era) but gained importance with the appearance of large and fast data sets. This includes an emphasis on algorithms for smart sampling, stream processing, locality-sensitive hashing, distributed computing and more. The tutorial will provide an introduction to Big Data research from the perspective of algorithms, tools, applications, market, and corresponding literature.

Title: Comparing apples and oranges: towards commensurate evaluation metrics in classification
Researcher: Peter Flach, University of Bristol, UK
Abstract: A wide range of evaluation metrics exists in supervised learning, including accuracy, area under the ROC curve (AUC) and Brier score.
At first sight, these metrics assess different aspects of a predictive model's performance: accuracy measures classification performance (ability to assign the correct class), AUC measures ranking performance (ability to score positives higher than negatives) and Brier score assesses scoring performance (ability to assign probabilities close to the 'ideal' 0/1 values). While it thus appears that these measures are not directly comparable, in this talk I will discuss a framework whereby each measure can be directly related to expected classification loss under varying operating conditions in terms of class and cost distributions, utilising the notion of a threshold selection method. One of the results of this work is a new interpretation of AUC in terms of expected loss under a novel rate-driven threshold selection method. This result is particularly relevant as it provides a rebuttal of recent criticisms of AUC as being fundamentally incoherent.

Lecturers' Short Bios:

Carlos Soares is a well-known researcher in meta-learning. He received his B.Sc. degree in Systems Engineering and Informatics from Universidade do Minho, Portugal. He received his M.Sc. degree in Artificial Intelligence and his Ph.D. in Computer Science from Universidade do Porto, Portugal. He is an Associate Professor at Faculdade de Engenharia, Universidade do Porto. His main interests are Machine Learning, Data Mining, Meta-learning and Data Streams.

Jesse Read is a researcher at the Universidad Carlos III de Madrid, Spain. In 2010 he completed his Ph.D. in the Department of Computer Science at the University of Waikato in New Zealand, with a thesis on scalable multi-label classification. He has continued to work in this area, as well as in data streams, wireless sensor networks, and multi-output prediction.

João Gama is one of the world experts in data streams. He is an Associate Professor at the University of Porto and a researcher at LIAAD-Inesc Tec.
He has served as PI in several FCT projects on learning adaptive systems. He has published more than 110 papers in major international conferences and journals, and has served as PC chair at ECML05, DS09, and ADMA09, and as Conference Chair at IDA11. He has co-organized a series of workshops on learning from data streams in conjunction with ECML-PKDD, KDD, SAC and ICML. He is a member of the editorial boards of MLJ, DAMI, NGC, and PAI, and he is the author of a recent book on knowledge discovery from data streams.

Marko Grobelnik is an expert in the analysis of large amounts of complex data with the purpose of extracting useful knowledge. In particular, his areas of expertise comprise Data Mining, Text Mining, Information Extraction, Link Analysis, and Data Visualization, as well as more integrative areas such as Semantic Web, Knowledge Management and Artificial Intelligence. Apart from research on theoretical aspects of data analysis techniques, he has considerable experience in practical applications and in the development of business solutions based on innovative technologies. His main achievements are in the field of Text Mining (analysis of large amounts of textual data): he has had a leading role in scientific and applied projects funded by the European Commission, has carried out projects with industry partners such as Microsoft Research, British Telecom, New York Times, and Siemens, and has organized several international events on related topics.

Peter Flach has been Professor of Artificial Intelligence at the University of Bristol since 2003. An internationally leading researcher in the areas of mining highly structured data and the evaluation and improvement of machine learning models using ROC analysis, he has also published on the logic and philosophy of machine learning, and on the combination of logic and probability.
He is the author of Simply Logical: Intelligent Reasoning by Example (John Wiley, 1994) and Machine Learning: the Art and Science of Algorithms that Make Sense of Data (Cambridge University Press, 2012). Prof. Flach is the Editor-in-Chief of the Machine Learning journal, one of the two top journals in the field, which has been published for over 25 years, first by Kluwer and now by Springer. He was Programme Co-Chair of the 1999 International Conference on Inductive Logic Programming, the 2001 European Conference on Machine Learning, the 2009 ACM Conference on Knowledge Discovery and Data Mining, and the 2012 European Conference on Machine Learning and Knowledge Discovery in Databases in Bristol.