Talk: RSS WEB SYNDICATION - A PRECISE CHARACTERIZATION OF BOTH RSS ITEMS AND SUBSCRIPTIONS.
Prof. Cédric du Mouza, CNAM, Paris, France, in the 2010 Graduate Seminar Series, on 13/08/2010 at 14:00, IC Auditorium, Room 85 - IC 2.
| What | Talk |
|---|---|
| When | 13/08/2010 from 14:00 to 15:00 |
| Where | IC Auditorium - Room 85 - IC 2 |
With the continuous growth of online information, content
syndication has become a popular means for timely information
delivery on the Web. It essentially enhances traditional
pull-oriented searching and browsing of web pages with
push-oriented publishing formats and subscription protocols of web
content. Today, web syndication technologies such as RSS or Atom
are widely used in a variety of applications ranging from
large-scale news or social media (blog, wiki, etc.) broadcasting
to small-scale information sharing in scientific and professional
communities.
In this context, information publishers deliver brief summaries,
called feed items, of the information they publish on the Web,
while information consumers using adequate RSS/Atom software
subscribe to a number of feeds seen as information channels and
get informed about the addition of new items. Feed publishing rate,
structure of items, and vocabulary evolution over time are among the
characteristics of feeds we are interested in.
In this talk, I will present such characteristics of a testbed
of 7,811 RSS/Atom feeds (with a total number of 12,275,588 items)
from a large collection of 13,588 feeds we acquired from 2/03/2010
to 30/05/2010 (3 months) in the context of the French ANR Roses
project (www-bd.lip6.fr/roses/doku.php?id=start).
A precise characterization of web feeds is crucial in order to
build the next generation of RSS/Atom technologies supporting
advanced functionalities for rapidly growing communities of
publishers and subscribers. Tuning refresh policies,
benchmarking the scalability and performance of RSS/Atom subscription
indexes, and evaluating the effectiveness of textual stream mining,
retrieval, recommendation and enrichment techniques are some of
the tasks that require a precise characterization of feeds.
Compared to previous empirical studies in this area, we focus on
the structure and content characteristics of RSS/Atom items as
well as on the quality of the vocabulary employed by RSS/Atom items.
During my talk, I will first present our primary observations
regarding the content and quality of RSS feeds and items. Then I
will show a similar analytical work on web queries. Finally, I will
illustrate some current work on indexing RSS subscriptions at
the scale of the Web, which relies strongly on the properties of
RSS items/feeds and web queries identified previously.
================================================
Cédric du Mouza is an Assistant Professor in the database and
information systems group of the Conservatoire National des Arts et
Métiers (CNAM) in Paris. He received a Ph.D. in computer science
from CNAM in 2005 and two M.Sc. degrees (University Pierre et Marie
Curie-Paris VI and University of Manchester). He also holds an
engineering diploma from the Institut d'Informatique d'Entreprise (IIE).
His research work in the CEDRIC lab mainly focuses on the distributed
representation, indexing and querying of multi-dimensional and
plain-text data. He also works on data quality (semantics, privacy
protection). He is the author or co-author of research papers published in
major journals and conferences (ICDE, CIKM, GeoInformatica, VLDBJ,
ACM-GIS, etc.).
=================================================
Organizer: Prof. Ariadne Maria Brito Rizzoni Carvalho
(ariadne@ic.unicamp.br)
IC -- Unicamp / Phone: (019) 3521-5864
=================================================
