06 Dec 2022
15:00 Master's Defense, Fully Remote
Title
Abstractive summarization of podcasts using longformers
Student
Edgar Kenji Tanaka
Advisor / Teacher
Advisor: Jacques Wainer / Co-advisor: Ann Clifton
Brief summary
"Podcasts have established themselves as an important source of audio content. As the number of podcasts grows, the need for good descriptions that help users decide whether to listen to a particular episode becomes increasingly evident. However, descriptions provided by podcast creators often lack important information about the episode, and they are frequently used for product advertising or social media outreach instead. As an alternative to creator-provided descriptions, the task of automatic podcast summarization was proposed at the TREC 2020 conference. Many researchers have proposed deep learning models to solve this problem, but all of the proposed models were restricted to English podcasts. As podcast consumption increases globally, it is critical to explore models capable of ingesting and generating text in multiple languages. In this master's thesis, we investigated the application of transformer-based multilingual models to automatically generate abstractive summaries from podcast transcripts. We experimented with and compared models with a full self-attention mechanism and models with a Longformer self-attention mechanism. In addition, we studied the impact of fine-tuning these models in monolingual and bilingual settings. Finally, we explored the phenomenon of cross-lingual transfer learning in the context of summarizing multilingual podcasts. The scope of our research is limited to English and Portuguese, but the methodology proposed here can be generalized to any other set of languages."
Examination Board
Full members:
Jacques Wainer IC / UNICAMP
Julio Cesar dos Reis IC / UNICAMP
Thiago Alexandre Salgueiro Pardo ICMC / USP
Alternates:
Sandra Eliza Fontes de Avila IC / UNICAMP
Norton Trevisan Roman EACH / USP