Talk: RSS WEB SYNDICATION - A PRECISE CHARACTERIZATION OF BOTH RSS ITEMS AND SUBSCRIPTIONS

Prof. Cédric du Mouza (CNAM, Paris, France), in the 2010 Graduate Program Seminar Series, on 13/08/2010 at 14:00, IC Auditorium, Room 85 - IC 2.

What: Talk
When: 13/08/2010, from 14:00 to 15:00
Where: IC Auditorium, Room 85 - IC 2

With the continuous growth of online information, content 
syndication has become a popular means for timely information 
delivery on the Web. It essentially enhances traditional 
pull-oriented searching and browsing of web pages with 
push-oriented publishing formats and subscription protocols of web 
content. Today, web syndication technologies such as RSS or Atom 
are widely used in a variety of applications ranging from 
large-scale news or social media (blog, wiki, etc.) broadcasting 
to small-scale information sharing in scientific and professional 
communities. 
In this context, information publishers deliver brief summaries, 
called feed items, of the information they publish on the Web, 
while information consumers using adequate RSS/Atom software 
subscribe to a number of feeds seen as information channels and 
get informed about the addition of new items. Feed publishing rate, 
structure of items, and vocabulary evolution over time are among the 
characteristics of feeds we are interested in. 
In this talk, I will present such characteristics for a testbed 
of 7,811 RSS/Atom feeds (with a total of 12,275,588 items) 
drawn from a large collection of 13,588 feeds we acquired from 2/03/2010 
to 30/05/2010 (3 months) in the context of the French ANR Roses 
project (www-bd.lip6.fr/roses/doku.php?id=start). 
A precise characterization of web feeds is crucial in order to 
build the next generation of RSS/Atom technologies supporting 
advanced functionalities for rapidly growing communities of 
publishers and subscribers. Tuning refresh policies, 
benchmarking the scalability and performance of RSS/Atom subscription 
indexes, and evaluating the effectiveness of textual stream mining, 
retrieval, recommendation and enrichment techniques are some of 
the tasks that require a precise characterization of feeds. 
Compared to previous empirical studies in this area, we focus on 
the structure and content characteristics of RSS/Atom items as 
well as on the quality of the vocabulary employed by RSS/Atom items. 
During my talk, I will first present our primary observations 
regarding the content and quality of RSS feeds and items. Then I 
will show similar analytical work on web queries. Finally, I will 
illustrate some current work on indexing RSS subscriptions at 
the scale of the Web, which relies strongly on the properties of 
RSS items/feeds and web queries identified previously.
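
As an illustration of the kind of feed characteristics named in the abstract (publishing rate and vocabulary), here is a minimal Python sketch. It is not the method used in the Roses project: the sample feed, the function name `characterize_feed`, the simple "items divided by time span" rate estimate, and the lowercase-word tokenization are all assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 feed, invented for illustration only.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Databases and feeds</title>
      <description>Indexing feeds</description>
      <pubDate>Mon, 01 Mar 2010 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Feeds again</title>
      <description>Querying feeds</description>
      <pubDate>Tue, 02 Mar 2010 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Web queries</title>
      <description>Indexing web queries</description>
      <pubDate>Wed, 03 Mar 2010 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def characterize_feed(rss_text):
    """Return item count, a naive publishing rate, and vocabulary size."""
    root = ET.fromstring(rss_text)
    items = root.findall("./channel/item")

    dates = []
    vocabulary = set()
    for item in items:
        # RSS 2.0 dates follow the RFC 822 style, which email.utils can parse.
        pub = item.findtext("pubDate")
        if pub:
            dates.append(parsedate_to_datetime(pub))
        # Assumed tokenization: lowercase runs of ASCII letters.
        parts = (item.findtext("title"), item.findtext("description"))
        text = " ".join(p for p in parts if p)
        vocabulary.update(re.findall(r"[a-z]+", text.lower()))

    # Naive rate: dated items divided by the time span between the
    # oldest and newest item, in days (None if the span is zero).
    span_days = 0.0
    if len(dates) > 1:
        span_days = (max(dates) - min(dates)).total_seconds() / 86400
    rate = len(dates) / span_days if span_days > 0 else None

    return {
        "items": len(items),
        "items_per_day": rate,
        "vocabulary_size": len(vocabulary),
    }


stats = characterize_feed(SAMPLE_RSS)
print(stats)
```

A real study over millions of items would need more care than this, for example handling Atom as well as RSS, missing or malformed dates, and language-aware tokenization.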

================================================

Cédric du Mouza is an Assistant Professor in the database and 
information systems group of the Conservatoire National des Arts et 
Métiers (CNAM) in Paris. He received a Ph.D. in computer science 
from CNAM in 2005 and holds two M.Sc. degrees (University Pierre et Marie 
Curie-Paris VI and University of Manchester). He also holds an 
engineering diploma from the Institut d'Informatique d'Entreprise (IIE). 
His research work in the CEDRIC lab mainly focuses on the distributed 
representation, indexing and querying of multi-dimensional and 
plain-text data. He also works on data quality (semantics, privacy 
protection). He is the author or co-author of research papers published in 
major journals and conferences (ICDE, CIKM, GeoInformatica, VLDBJ, 
ACM-GIS, etc.). 

=================================================

Organizer: Prof. Ariadne Maria Brito Rizzoni Carvalho 
(ariadne@ic.unicamp.br)
IC -- Unicamp   /   Phone: (019) 3521-5864

=================================================


Instituto de Computação :: Universidade Estadual de Campinas
Av. Albert Einstein, 1251 - Cidade Universitária • CEP 13083-852 • Campinas/SP - Brazil • Phone: [19] 3521-5838