Master's Defense of Patrícia Raia Nogueira Cavoto

Thesis Title
ReGraph: interligando bancos de dados relacionais e de grafos (ReGraph: interlinking relational and graph databases)
Patrícia Raia Nogueira Cavoto
2016-02-04
14:00 h
IC 2 Auditorium - Room 85
André Santanchè (IC/UNICAMP)
Examination Committee

André Santanchè (IC/UNICAMP)
Rodrigo Dias Arruda Senra (EMC Brazil Research & Development Center)
Ricardo da Silva Torres (IC/UNICAMP)
Júlio Cesar dos Reis (IC/UNICAMP)
Luciano Antonio Digiampietri (EACH/USP)


Networks are everywhere: in social interactions (family, friends, hobbies), in computer science (computers connected on the Internet), and in nature (food chains). Recent research shows the importance of links and network analysis for discovering knowledge in existing data. Moreover, the Linked Open Data and Semantic Web efforts have driven the fast growth of open knowledge repositories on the web, mainly in the RDF (Resource Description Framework) graph model. However, most data is stored in relational databases, whose model was not designed to address queries that focus on connections. In contrast, the flexible graph model is suitable for data analysis that focuses on links and network topology, e.g., connected component analysis. Our research is therefore inspired by the OLAP (OnLine Analytical Processing) approach of creating a special database designed for data analysis, in our case a network-driven data analysis using graph databases. In this dissertation, we present ReGraph, a framework that maps data from a relational database to a graph database and manages the dynamic coexistence and evolution of both, which is not supported by related work. ReGraph has minimal impact on the existing infrastructure and provides a flexible graph model tailored to each relational schema. It uses an initial ETL (Extract, Transform and Load) process to replicate the existing data in the graph database. A scheduled service is responsible for reflecting changes in the relational data in the graph, keeping both synchronized. ReGraph also provides an annotation functionality to materialize inferences and to support data enrichment, which enables linking the local database to global knowledge graphs on the Web. We used the ReGraph framework to generate FishGraph, a graph database created from the FishBase relational database. Using FishGraph, we analyzed the connections among thousands of identification keys and species and linked local data to DBpedia, creating annotations over the local graph that provide new knowledge about the existing data.
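
The general idea described in the abstract, replicating relational rows as graph nodes and edges and then running a link-oriented analysis such as connected components, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the three-table schema (`species`, `ident_key`, `key_species`), the sample rows, and the function names are hypothetical and are not taken from FishBase or from the ReGraph implementation, and a plain in-memory adjacency map stands in for a real graph database.

```python
import sqlite3
from collections import defaultdict, deque


def build_relational_db():
    """Create a tiny in-memory relational database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE ident_key (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE key_species (
            key_id INTEGER REFERENCES ident_key(id),
            species_id INTEGER REFERENCES species(id)
        );
        INSERT INTO species VALUES (1, 'Species A'), (2, 'Species B'), (3, 'Species C');
        INSERT INTO ident_key VALUES (10, 'Key X'), (11, 'Key Y');
        INSERT INTO key_species VALUES (10, 1), (10, 2), (11, 3);
    """)
    return conn


def relational_to_graph(conn):
    """One-off load step: rows become nodes, join-table rows become undirected edges."""
    graph = defaultdict(set)
    for (sid,) in conn.execute("SELECT id FROM species"):
        graph[("species", sid)]  # accessing the key registers a node with no edges yet
    for (kid,) in conn.execute("SELECT id FROM ident_key"):
        graph[("key", kid)]
    for kid, sid in conn.execute("SELECT key_id, species_id FROM key_species"):
        graph[("key", kid)].add(("species", sid))
        graph[("species", sid)].add(("key", kid))
    return graph


def connected_components(graph):
    """Group nodes into connected components with a breadth-first search."""
    seen = set()
    components = []
    for start in list(graph):
        if start in seen:
            continue
        seen.add(start)
        component = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


conn = build_relational_db()
graph = relational_to_graph(conn)
components = connected_components(graph)
for component in components:
    print(sorted(component))
```

In this toy data, the two identification keys share no species, so the graph splits into two components. The sketch covers only a one-off load; the scheduled synchronization and annotation features mentioned in the abstract are not modeled here.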