Master's Defense of Patrícia Raia Nogueira Cavoto

Thesis Title
ReGraph: interligando bancos de dados relacionais e de grafos (ReGraph: interconnecting relational and graph databases)
Candidate
Patrícia Raia Nogueira Cavoto
Level
Master's
Date
2016-02-04
Time
14:00
Location
Auditório do IC 2 - Room 85
Advisor
André Santanchè (IC/UNICAMP)
Examination Committee

Members:
André Santanchè (IC/UNICAMP)
Rodrigo Dias Arruda Senra (EMC Brazil Research & Development Center)
Ricardo da Silva Torres (IC/UNICAMP)
Alternates:
Júlio Cesar dos Reis (IC/UNICAMP)
Luciano Antonio Digiampietri (EACH/USP)
 

Abstract

Networks are everywhere: in social interactions (family, friends, hobbies), in computer science (computers connected through the Internet), and in nature (food chains). Recent research shows the importance of links and network analysis for discovering knowledge in existing data. Moreover, the Linked Open Data and Semantic Web efforts have driven the fast growth of open knowledge repositories on the Web, mainly using the RDF (Resource Description Framework) graph model. However, most data is stored in relational databases, whose model was not designed for queries that focus on connections. In contrast, the flexible graph model is well suited to data analysis that focuses on links and network topology, such as connected component analysis. Our research is therefore inspired by the OLAP (Online Analytical Processing) approach of creating a separate database designed for data analysis; in our case, a network-driven analysis using graph databases. In this dissertation, we present ReGraph, a framework that maps data from a relational database to a graph database and manages the dynamic coexistence and evolution of both, which related work does not support. ReGraph has minimal impact on the existing infrastructure and provides a flexible graph model tailored to each relational schema. It uses an initial ETL (Extract, Transform and Load) process to replicate the existing data in the graph database. A scheduled service then reflects changes in the relational data in the graph, keeping both synchronized. ReGraph also provides an annotation functionality to materialize inferences and to support data enrichment, which enables linking the local database to global knowledge graphs on the Web. We used the ReGraph framework to generate FishGraph, a graph database created from the FishBase relational database. Using FishGraph, we analyzed the connections among thousands of identification keys and species, and we linked local data to DBpedia, creating annotations over the local graph that add new knowledge to the existing data.