Summary: This document describes the Final Graduation Project of João Guilherme Alves Santos, a student of Computer Engineering (AA modality, class of 2018, RA 199624). The student's Final Graduation Project was supervised by Prof. Dr. Maria Cecília Calani Baranauskas, full professor at the State University of Campinas (UNICAMP) and affiliated as a collaborator at the Institute of Computing, and co-supervised by Prof. Dr. Emanuel Felipe Duarte, PhD in Computer Science from the State University of Campinas, and Yusseli Lizeth Méndez Mendoza, MSc in Computer Science from the State University of Campinas.
The project was carried out in the second half of 2023, within the course MC030 - Final Graduation Project.
Summary: This is the project report carried out as a Course Completion Work at the Institute of Computing. The objective of the work is to study a new method for extracting plot units using large language models.
Automatic text generation is a challenging process, due to difficulties for models in maintaining cohesion and coherence throughout the generation process. To assist this generation process, many works use plot units, defined as controllable structured representations of text, to help models maintain coherence throughout the generated story. Due to their standardized structure, plots are easily manipulated and can be used to generate incoherent stories. These stories are intended to be used to train coherence classification models, something that also helps to guarantee the coherence of generation models. In this work, we present a new method for extracting plot units from coherent stories, using Large Language Models (LLMs). Vicuna-13b, an open-source chatbot trained from the LLaMA model that can generate texts with a quality similar to models such as GPT-4, and WizardLM, an LLM specifically trained to carry out complex instructions, were studied. The results were evaluated qualitatively, considering the formatting and quality of the information present in the plots, and the proposed method was compared with other conventional extraction methods. The results show the advantages of using LLMs for plot extraction, and current challenges for the method and how they should be addressed in the future are discussed.
Abstract: The present report details the development of a framework for the construction of self-contained pipelines for data processing, using Dask and Argo for the generation of these pipelines. This structure benefits from the Dask Kubernetes Operator for adaptive scaling, automatically adjusting according to the demand and complexity of tasks. Implemented on the Kubernetes platform, this framework ensures scalability, flexibility, and optimization of resources. The system was designed to cater to a wide range of applications, being especially relevant in areas such as seismic data analysis and ETL (Extraction, Transformation, and Loading) processes. Thanks to the efficient integration between Dasf, Dask, Argo, RapidsAI, and Kubernetes, it is possible to handle workloads of various sizes and intensities, enhancing the processing and analysis of large volumes of data. In a scenario where data production is growing exponentially, the need for robust and scalable solutions like this becomes essential. This project is an important step towards scalable solutions, paving the way for future innovations in the field of large-scale data processing.
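As an illustration of the kind of pipeline such a framework orchestrates, the sketch below uses plain Dask on a synthetic array; the stages and data are hypothetical, and the actual framework additionally integrates Dasf, Argo, and the Dask Kubernetes Operator for deployment and scaling.

```python
# Minimal sketch of a Dask-based processing pipeline (illustrative only).
import dask.array as da
from dask.distributed import Client

def run_pipeline() -> float:
    # Local cluster for the sketch; in the real framework the client would
    # connect to a scheduler created by the Dask Kubernetes Operator.
    client = Client(processes=False)

    # Stage 1: lazily build a large chunked array (a stand-in for seismic data).
    data = da.random.random((8_000, 8_000), chunks=(1_000, 1_000))

    # Stage 2: a simple transformation, analogous to the "T" in ETL.
    normalized = (data - data.mean()) / data.std()

    # Stage 3: reduce to a single value; compute() triggers distributed execution.
    result = float(normalized.sum().compute())
    client.close()
    return result

if __name__ == "__main__":
    print(run_pipeline())
```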
Abstract: Credit scoring is a crucial element in the economic sector, relying on an individual's trustworthiness in honoring financial commitments. Institutions providing credit increasingly leverage Artificial Intelligence (AI) for widespread credit assessments. However, the integration of Machine Learning (ML) models raises ethical concerns, particularly regarding biases inherent in AI models. This document examines explainability methods based on feature importance and emphasizes the need for transparency in data utilization. The research aims to enhance model explainability and proposes techniques for a clearer understanding of model operations.
The study employs feature importance techniques, such as Permutation Importance, in order to study which features impact the model the most. To address data-generation anomalies created by the previous method, novelty detection algorithms, including Isolation Forest and Density Forest, are introduced. Additionally, we also explore Counterfactual Explanations as a method to explain ML model outcomes and how to change a specific data point to obtain a desired prediction.
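To make these techniques concrete, the following minimal sketch (using scikit-learn on synthetic data, not the project's credit dataset) combines permutation importance with Isolation Forest screening of out-of-distribution samples:

```python
# Permutation importance + Isolation Forest novelty screening (illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Feature importance via permutation: shuffle one feature at a time and
# measure the drop in score on held-out data.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.4f}")

# Novelty detection: flag permuted/synthetic rows that fall outside the
# distribution of the training data (label -1 means anomalous).
iso = IsolationForest(random_state=0).fit(X_train)
print(iso.predict(X_test[:5]))
```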
Summary: Traditional traffic monitoring systems, based predominantly on visual models, may have high bandwidth, transmission, and processing costs, in addition to not operating as well at night due to reduced lighting. This work addresses this limitation by proposing a vehicle detection approach based on sensor integration, specifically microphones and accelerometers. It was possible to collect real data from a roadway and, after preprocessing and feature extraction, to train machine learning models capable of detecting and classifying vehicles based on their sound signatures. The analysis of the results obtained here seeks to provide useful information to improve traffic monitoring systems. Limitations, including dependence on the specific scenario, weather conditions, and the diversity of vehicles, indicate opportunities for future improvements aiming at a more robust application in various road environments.
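As a rough illustration of the audio side of such a pipeline (assuming librosa and scikit-learn; the file names, labels, and features are placeholders rather than the project's actual setup):

```python
# Sketch: MFCC feature extraction with librosa followed by a simple classifier.
import librosa
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def mfcc_features(path: str) -> np.ndarray:
    # Load the recording and summarize each MFCC coefficient over time.
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical labeled clips: 1 = vehicle pass-by, 0 = background noise.
paths = ["clip_car.wav", "clip_truck.wav", "clip_background.wav"]
labels = [1, 1, 0]

X = np.stack([mfcc_features(p) for p in paths])
clf = RandomForestClassifier(random_state=0).fit(X, labels)
print(clf.predict(X))
```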
Summary: With the consolidation of the vector extension of the RISC-V ISA in 2021, its application to highly parallelizable workloads, such as HPC and Machine Learning, becomes more desirable. Still, there is currently no open implementation of this extension for the Rocket processor. Therefore, we aimed to create a proof of concept of a vector unit for this processor, serving as a basis for future developments. The unit, implemented in the Chisel language within the Chipyard framework, supports part of the instructions in the specification, from which we characterize its current performance. The theoretical analysis indicates benefits to be extracted from this approach that differ from the experimental results, due to implementation issues that require optimization. Future efforts should lead to significant performance gains, in addition to execution in an FPGA environment for energy efficiency studies.
Abstract: With the consolidation of the vector extension of the RISC-V ISA in 2021, its application to highly parallelizable workloads such as HPC and Machine Learning becomes mandatory. Still, there is currently no open implementation of this extension for the Rocket processor. With this, we aimed to create a proof of concept of a vector unit for this processor, serving as a basis for future developments. The theoretical analysis indicates benefits to be extracted from this approach that differ from the experimental results, due to implementation issues that require optimization. Future efforts should lead to significant performance gains, in addition to execution in an FPGA environment for energy efficiency studies.
Summary: The Minimum Common String Partition (MCSP) problem is used to compare strings, with applications in computational biology. Good heuristics have great value for this problem, since it has been proven to be NP-hard. In addition to presenting implementations of known heuristics from the literature, we developed an efficient graph representation for MCSP instances, reducing it to a permutation problem and allowing the application of optimization algorithms to search for solutions. Particle Swarm Optimization (PSO) was adapted for this representation and was capable not only of significantly improving the results of the other heuristics used, but also of finding good solutions independently, proving to be a promising meta-heuristic, especially for instances with few character repetitions. This work suggests the use of the graph representation with other optimization methods for MCSP.
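For concreteness, the small helper below illustrates what a feasible MCSP solution looks like under the usual definition of the problem (two block sequences that reconstruct the original strings and share the same multiset of blocks); the example strings are illustrative and unrelated to the instances used in the work.

```python
# A common partition of strings a and b: blocks_a concatenates to a,
# blocks_b concatenates to b, and both use the same multiset of blocks.
# The MCSP objective is to minimize the number of blocks.
from collections import Counter

def is_common_partition(a: str, b: str, blocks_a: list[str], blocks_b: list[str]) -> bool:
    return (
        "".join(blocks_a) == a
        and "".join(blocks_b) == b
        and Counter(blocks_a) == Counter(blocks_b)
    )

# Example: "ababcab" and "abcabab" admit a common partition of size 3.
a, b = "ababcab", "abcabab"
print(is_common_partition(a, b, ["ab", "abc", "ab"], ["abc", "ab", "ab"]))  # True
```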
Summary: This work is the report of an undergraduate final project, aiming to study ways to analyze configurable systems, taking into consideration their complexity and the presence of operating conditions.
Based on this work, it was possible to make a contribution to the scenario of configurable-system analysis in global academia, a scenario that is still new and has few published studies. In view of this, it is considered that the API built in this project can be useful for programmers looking to analyze configurable systems.
Summary: Systems biology, a young interdisciplinary area of rising importance, is dedicated to studying complex biological systems. The discovery of new elements, processes, and phenomena in biology increases the complexity involved in the analyses carried out in this area, which highlights the need for the creation or adoption of new techniques and tools for this purpose. Recently, due to the intersection between fundamental concepts of systems biology and complex networks, the scientific community began to note the potential of approaches based on these networks in the studies developed by the area in question.
Having defined the context, we developed an experiment based on translating, from the perspective of complex networks, a study on thyroid cancer associated with systems biology. This research, starting from a tabular analysis approach and the little-explored panorama of gene regulation by microRNAs, resulted in a post-transcriptional regulatory network in the context of thyroid cancer. Our objective was to elucidate the importance of the complex network approach in research involving systems biology. To do this, we made use of the tools miRWalk and Neo4j, in addition to topological centrality metrics and community detection.
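A minimal sketch of the topological analysis step, using networkx in place of Neo4j and a handful of illustrative microRNA-gene edges (not the network actually built in the study):

```python
# Centrality and community detection on a toy regulatory network.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.Graph()
G.add_edges_from([
    ("miR-21", "PTEN"), ("miR-21", "PDCD4"),
    ("miR-146b", "SMAD4"), ("miR-221", "KIT"), ("miR-222", "KIT"),
])

# Centrality highlights regulators/targets that hold the network together.
centrality = nx.betweenness_centrality(G)
print(sorted(centrality.items(), key=lambda kv: kv[1], reverse=True)[:3])

# Community detection groups nodes into densely connected regulatory modules.
for community in greedy_modularity_communities(G):
    print(sorted(community))
```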
Summary: Emotions are physiological responses that can be measured by capturing physiological data through sensors. In this work, we collected heart rate data measured through the optical sensor of a Galaxy Watch smartwatch with WearOS during a pilot workshop with the Aquarela Virtual system. The data obtained were filtered and processed in order to enable the analysis of affective states by pairing the execution of different phases of the pilot workshop with cardiac measurements carried out at different moments throughout the participants' experience. The objective of this work is to evaluate the ability of common wearable devices to monitor data that may be useful for predicting affective states continuously. These results can allow the personalization of interaction and new forms of design in systems that can integrate these devices.
Abstract: Emotions are physiological responses that can be measured through the capture of physiological data via sensors. In this study, we collected heart rate data measured through the optical sensor of a Galaxy Watch with WearOS during the implementation of a pilot workshop with the Aquarela Virtual system. The obtained data were filtered and processed to enable the analysis of affective states by pairing the execution of different phases of the pilot workshop with the cardiac measurements taken at different moments throughout the participants' experience. The aim of this study is to assess the capability of common wearable devices for monitoring data that may be useful for continuously predicting affective states. These findings may enable the customization of interaction and new design approaches in systems that can integrate these devices.
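A minimal sketch of the filtering and pairing step described above, assuming pandas; the timestamps, column names, and phase boundaries are hypothetical:

```python
# Smooth raw optical heart-rate samples and aggregate them per workshop phase.
import pandas as pd

samples = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2023-10-01 10:00:05", "2023-10-01 10:00:10", "2023-10-01 10:00:15",
         "2023-10-01 10:10:05", "2023-10-01 10:10:10"]),
    "bpm": [78, 120, 80, 95, 97],  # 120 is a spurious spike from the optical sensor
})

# Rolling median removes short-lived spikes while preserving real trends.
samples["bpm_filtered"] = samples["bpm"].rolling(window=3, center=True, min_periods=1).median()

# Assign each sample to a phase by comparing timestamps against phase start times.
phase_starts = {"warm_up": pd.Timestamp("2023-10-01 10:00"),
                "main_activity": pd.Timestamp("2023-10-01 10:10")}
samples["phase"] = samples["timestamp"].apply(
    lambda t: "main_activity" if t >= phase_starts["main_activity"] else "warm_up"
)
print(samples.groupby("phase")["bpm_filtered"].mean())
```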
Abstract: Developing task-oriented conversational systems requires substantial annotated data, posing a challenge in Natural Language Processing (NLP). Manual annotation is time-consuming and error-prone, hindering progress for smaller AI teams. This work presents a novel Dialog Annotation Methodology and a ready-to-use, adaptable software tool offering automatic annotation. The automatic annotation model is based on a cascade of Machine Learning and Large Language Model annotation to annotate entities and intentions in natural-language dialogs.
Summary: The use of videos as an educational tool has become an increasingly common practice, which can enhance the assimilation of information and provide a more engaging and effective learning experience for the student. Solutions for creating quizzes to support the training and assessment of student knowledge, therefore, limit their potential by using only textual files to build their databases, given that knowledge is currently widely disseminated through various different communication channels. This work investigates techniques and tools that help in the automatic transcription of video files into text format. To do this, we studied OpenAI Whisper, an open-source tool responsible for voice recognition and for converting the audio obtained into textual format. The model allows for multilingual voice transcription. We conducted experimental evaluations in English, Portuguese and Spanish. Additionally, a Web platform designed as a space for the community to create, share and carry out quizzes collaboratively was structured.
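A minimal sketch of the transcription step with the openai-whisper package; the audio file name and model size here are illustrative:

```python
# Transcribe an audio track (extracted from a video) with OpenAI Whisper.
import whisper

model = whisper.load_model("base")  # multilingual model
result = model.transcribe("lecture_audio.mp3", language="pt")
print(result["text"])

# The transcript text could then feed the quiz-creation step of the Web platform.
```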
Summary: This work presents an approach for training multilingual generative models for question-and-answer (QA) systems. The goal is to improve the accuracy of responses in different languages, using open-source models. To this end, datasets in Portuguese and Spanish were used, and the models were adjusted with specific machine learning techniques. The research also addresses the importance of flexible and secure models, ensuring greater control over the data. The results indicate that the use of multilingual models can improve accessibility and effectiveness in QA systems.
Abstract: With the rising demand for high-quality 360° video driven by increasing adoption of virtual reality technology, providing high visual quality for the user is crucial for a better end-user experience. This paper investigates the use of server-side request scheduling in order to mitigate visual degradation, especially tile missing ratio within the user's field of view. We evaluate three scheduling policies - first-in first-out (FIFO), strict priority (SP), and weighted fair queuing (WFQ) - through simulations and measure their effect, comparing benefits and drawbacks. These findings aim to provide insights for the selection of scheduling policies in real-world QUIC applications, contributing to ongoing development of immersive environmental solutions.
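A toy comparison of two of the evaluated policies: FIFO serves requests in arrival order, while strict priority serves viewport tiles first (WFQ would instead interleave the two classes according to their weights). The request data below are made up for illustration:

```python
# FIFO vs. strict priority (SP) ordering of tile requests (illustrative).
from collections import deque

requests = [
    {"tile": "t1", "in_viewport": False},
    {"tile": "t2", "in_viewport": True},
    {"tile": "t3", "in_viewport": False},
    {"tile": "t4", "in_viewport": True},
]

def fifo(reqs):
    q = deque(reqs)
    return [q.popleft()["tile"] for _ in range(len(reqs))]

def strict_priority(reqs):
    high = deque(r for r in reqs if r["in_viewport"])
    low = deque(r for r in reqs if not r["in_viewport"])
    order = []
    while high or low:
        order.append((high.popleft() if high else low.popleft())["tile"])
    return order

print("FIFO:", fifo(requests))             # t1, t2, t3, t4
print("SP:  ", strict_priority(requests))  # t2, t4, t1, t3
```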
Summary: The recent Covid-19 pandemic exposed problems regarding the Brazilian population's access to the Internet and demonstrated that the process of digital inclusion in the country is not complete. During this challenging period, it became evident that the lack of Internet access is not just a matter of convenience, but rather a significant barrier to accessing essential information, remote education, online healthcare services, and work opportunities. The crisis revealed deep inequalities, with many Brazilians facing difficulties participating in daily activities due to the lack of connectivity. Students without Internet access faced obstacles in remote learning, and teachers had problems teaching their classes, harming the quality of education for many citizens.
There are significant disparities in Internet access between different regions of Brazil. While some urban areas have high connection rates, rural and peripheral areas face greater access difficulties. This economic reality imposes challenges in accessing technological equipment for a wide portion of the population, and it is in this context that the creation of affordable computers becomes essential.
Taking the aforementioned factors into account, the construction of low-cost personal computers can be a relevant solution and is precisely the objective of this project. In a recent operation, the Brazilian Federal Revenue Service confiscated a large number of TV Boxes that would be used for the illegal transmission of content. This work explored the use of this equipment as a basis for the construction of low-cost computers. With the installation of a new operating system and the necessary adjustments, it is possible to take advantage of the hardware of these TV Boxes in a positive way. Connecting these devices to a remote desktop also opens up the possibility of improving the system's processing capacity and building a low-cost device that can be used for a variety of purposes, including pedagogical ones.
Summary: This work explores the transformation of a TV Box based on the Rockchip RK3229 SoC into an Internet of Things (IoT) device, addressing two strategies to overcome the challenge of having no pins exposed by default. The first involves reusing the connections of components existing on the board, such as LEDs and some connectors. This approach includes desoldering components and modifying or deactivating drivers, in addition to using Device Tree Overlays to convert the available pins into generic GPIOs. The second strategy employs the board's debug UART to connect it to a microcontroller and use its pins. For this purpose, a custom driver was developed that exposes a fake GPIO chip on the TV Box, carrying out the communication between devices in a way that is transparent to the user. Both approaches are detailed and compared, and their results discussed.
Abstract: This work explores the transformation capability of a TV Box based on the Rockchip RK3229 SoC into an Internet of Things (IoT) device, addressing two strategies to overcome the challenge of the absence of exposed pins by default. The first involves the reuse of connections to existing components on the board, such as LEDs and some connectors. This approach includes the desoldering of components and the modification or disabling of drivers, in addition to the use of Device Tree Overlays to convert the available pins into generic GPIOs. The second strategy employs the board's debug UART to connect it to a microcontroller and use its pins. For this, a custom driver was developed that exposes a fake GPIO chip on the TV Box, facilitating transparent communication between devices to the user. Both approaches are detailed and compared, and their results discussed.
Summary: This work is the final report of an undergraduate final project whose objective is to implement new functionalities, carry out tests, and validate the consensus mechanism of the Committeeless Proof-of-Stake (CPoS) blockchain. More specifically, this work describes the development that allows the mechanism to be executed in a distributed manner, in addition to the insertion of transactions into blockchain blocks, and evaluates the impact of these additions.
By carrying out several experiments, we found a set of parameters that makes the distributed execution of CPoS possible, as well as the insertion of transactions into blocks. The results using these parameters proved to be satisfactory, but indicated points to be worked on to achieve better performance. In particular, we highlight the large volume of data transmitted over the network, which poses a great challenge to be faced for the use of CPoS in more realistic scenarios.
Summary: With recent advances in quantum computing, it is necessary to remodel our current cryptography systems, moving away from classical cryptography models and towards models known as post-quantum: cryptographic algorithms resistant to attacks from both traditional and quantum computers. However, these algorithms tend to be more computationally expensive to implement in software than current cryptographic algorithms, creating an even greater need for implementations optimized for specific architectures. In this project, in addition to an explanation of the PERK post-quantum signature algorithm, an implementation optimized for the ARMv8.5 architecture is presented and then compared with the reference implementation, demonstrating its performance gains in execution cycles.
Abstract: RISC-V is a promising ISA and will soon be the architecture of many chips, especially embedded systems. It is necessary to guarantee that applications running on systems designed with RISC-V will be both secure and cryptographically fast. The NIST Lightweight Cryptography competition selected as its finalist Ascon, a family of cryptography algorithms designed to run on devices with low computational power. This research explores the Ascon family of algorithms on the RISC-V 64-bit architecture, analyzing the Ascon permutation and the Ascon-128 algorithm and whether it is possible to optimize them for riscv64, proposing a new technique regarding the decryption implementation. The implementation developed in this research was benchmarked on the Allwinner D1 chip, a RISC-V 64-bit 1 GHz single-issue CPU supporting the RV64GC ISA, and compared with other implementations. Finally, it is discussed how new microarchitectures and the future of the RISC-V ISA, with newly ratified instruction extensions, could improve the performance of the Ascon family of algorithms and other cryptographic algorithms.
Abstract: Infrastructure as Code (IaC) is widely embraced for its ability to facilitate system infrastructure management, ensuring ease of modification and reproducibility. However, the inherent susceptibility of IaC configurations to security vulnerabilities requires specialized tools for code analysis. Building upon the work of Rahman et al., who identified 7 security smells present in IaC scripts and introduced SLIC, a static analysis tool for identifying security smells in Puppet scripts, this paper presents SLiTer — a tool designed to detect the same security smells in Terraform files. By doing so, we developed two Rule Engines to serve distinct purposes: the first faithfully translated SLIC rules to establish a baseline, while the second incorporated modifications to enhance accuracy when applied to Terraform configurations. Evaluating SLiTer on 105 Terraform files from 15 directories revealed the most prevalent security smell as "Hard-coded secret," aligning with findings in the original work. SLiTer may prove valuable for practitioners seeking to identify general security smells in Terraform configurations, complementing other tools like Sonar or tfparse for provider-specific issues.
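As a rough illustration of how a rule for the "Hard-coded secret" smell might look, the sketch below uses a deliberately simplified regular expression rather than the actual SLiTer rule engines:

```python
# Simplified "hard-coded secret" check over Terraform-like source text.
import re

SECRET_ATTR = re.compile(r'^\s*(password|secret|token|access_key)\s*=\s*"([^"]+)"', re.IGNORECASE)

def find_hardcoded_secrets(terraform_source: str):
    findings = []
    for lineno, line in enumerate(terraform_source.splitlines(), start=1):
        match = SECRET_ATTR.match(line)
        # A literal value (not an interpolated reference) suggests a hard-coded secret.
        if match and not match.group(2).startswith("${"):
            findings.append((lineno, match.group(1)))
    return findings

example = '''
resource "aws_db_instance" "db" {
  username = "admin"
  password = "SuperSecret123"
}
'''
print(find_hardcoded_secrets(example))  # [(4, 'password')]
```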
Summary: In this study, we explore the potential impacts resulting from the use of Feature Flags in code maintenance. The central objective is to understand whether the adoption of these flags is associated with problems in software maintenance. To achieve this purpose, we conducted an assessment of Code Smells in an open-source repository, followed by static analysis and the cross-referencing of information about files that incorporate Feature Flags. The identification of these files was carried out through a mapping using regular expressions in the analyzed repository.
The results indicate that the set of files containing Feature Flags exhibited a higher density of Code Smells per file compared to those without flags. However, a qualitative analysis revealed that no Code Smell was directly caused by the use of these flags. Thus, in this specific repository, we did not observe negative impacts on code maintenance resulting from the use of Feature Flags.
Abstract: In this study, we explore the potential impacts arising from the use of Feature Flags in code maintenance. The central goal is to understand whether the adoption of these flags is associated with issues in software maintenance. To achieve this purpose, we conducted an evaluation of Code Smells in an open-source repository, followed by a static analysis and cross-referencing of information about files incorporating Feature Flags. The identification of these files was carried out through mapping using regular expressions in the analyzed repository.
The results indicate that the group of files containing Feature Flags displayed a higher density of Code Smells per file compared to those without flags. However, a qualitative analysis revealed that no Code Smell was directly caused by the use of these flags. Thus, in this specific repository, we did not observe negative impacts on code maintenance resulting from the use of Feature Flags.
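A minimal sketch of the file-mapping step, assuming a Python repository and an illustrative flag-related pattern rather than the expressions actually used in the study:

```python
# Locate source files that reference feature flags via a regular expression.
import re
from pathlib import Path

FLAG_PATTERN = re.compile(r"(feature[_ ]?flag|isFeatureEnabled|FeatureToggle)", re.IGNORECASE)

def files_with_feature_flags(root: str) -> list[str]:
    matches = []
    for path in Path(root).rglob("*.py"):  # empty result if the path does not exist
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue
        if FLAG_PATTERN.search(text):
            matches.append(str(path))
    return matches

# The resulting list can then be cross-referenced with per-file Code Smell counts.
print(files_with_feature_flags("path/to/repository"))
```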
Summary: To meet the demands of modern applications, OpenMP Cluster aims to simplify programming on High Performance Computing clusters, leveraging multi-core, multi-node, and accelerator resources. As an extension to OpenMP, it improves task management and the transfer of data between nodes in a cluster using MPI, and was designed to be integrated into LLVM/OpenMP with a plugin for libomptarget. This project introduces a significant update to OpenMP Cluster, adapting it to the new plugin structure of libomptarget, modernizing its implementation and providing a scalable solution for running HPC applications in diverse computational environments.
Summary: This course completion work aims to report the progress in the development and analysis of a location-based augmented reality game, with the aim of enhancing the engagement and learning of members of the investigated institution. We describe recent technical and cultural advances in augmented reality and games and how we sought to combine them in a game with diverse interaction points distributed across a physical space, presenting the results and challenges encountered during the development process.
Summary: This undergraduate final project report aims to investigate the impact of well-known neural networks on an image classification problem. Four neural networks were chosen to be evaluated on an animal classification problem. From these two initial choices, which will be described in detail in the report, we present the methods developed and the databases used, and discuss the experimental results found, focusing on a comparative analysis of the neural networks in the context of animal image classification.
Summary: This study investigates the effectiveness of open-source multimodal machine learning models in solving questions from Brazilian university entrance and national exams, using a subset of the BlueX dataset that combines text and images. The focus is to comparatively evaluate the performance of models such as OpenFlamingo, LLaVA 1.5, and CogVLM, analyze how they align with results on recognized benchmarks, and compare them with purely textual models on the same question domain. A key aspect of this work is an ablation study with Vicuna, a text-only model, to understand the impact of multimodality on the results. This study highlights the relevance of integrating textual and visual information into artificial intelligence (AI) models, facilitating understanding of the evolution of multimodal machine learning models, and highlights the influence of multimodality on visual question answering (VQA) tasks that involve significant textual components.
Summary: This work proposes a technique for ordering items in a list, based on MABs (Multi-Armed Bandits), for a new application on a mobile device. The application consists of an aggregator of university events that aims to recommend those that are most relevant. The study developed a testing method based on hypotheses related to the operation of the application, in addition to applying these tests to two sorting algorithms. The algorithms discussed model the lists using click models: document-based and cascade-based. This report describes the algorithm used and the system architecture.
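A toy sketch of a bandit-based ordering with cascade-style feedback (Thompson sampling over Beta posteriors); the event names are made up, and this is not necessarily the exact algorithm used in the project:

```python
# Bandit-based list ordering with a cascade click model (illustrative).
import random

class EventRanker:
    def __init__(self, events):
        # Beta(1, 1) prior over the click probability of every event.
        self.params = {e: [1, 1] for e in events}

    def rank(self):
        samples = {e: random.betavariate(a, b) for e, (a, b) in self.params.items()}
        return sorted(samples, key=samples.get, reverse=True)

    def update(self, shown_order, clicked_event=None):
        for event in shown_order:
            if event == clicked_event:
                self.params[event][0] += 1  # success
                break                       # cascade: user stops after the click
            self.params[event][1] += 1      # examined but skipped

ranker = EventRanker(["talk_ai", "hackathon", "sports_fair"])
order = ranker.rank()
ranker.update(order, clicked_event="hackathon")
print(ranker.rank())
```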
Summary: This work seeks to reproduce and compare visual encryption techniques. The study sought to understand the peculiarities of each implemented methodology, what each technique proposed in the literature offers, in addition to its advantages and disadvantages. For this task, experiments were carried out using arithmetic methodologies, such as the use of the XOR operation and modular arithmetic, in addition to some of its subvariant techniques, which aim to further improve its results and/or performance. The effectiveness and efficiency of the algorithms were measured using metrics such as PSNR (Peak Signal-to-Noise Ratio), SSIM (Structural Similarity Index Measure) and code execution time.
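A minimal sketch of XOR-based encryption and the PSNR metric mentioned above, on a random grayscale image rather than the images used in the experiments:

```python
# XOR visual encryption and PSNR evaluation (illustrative).
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
key = rng.integers(0, 256, size=image.shape, dtype=np.uint8)

cipher = image ^ key          # encryption
recovered = cipher ^ key      # decryption is the same XOR operation

def psnr(original: np.ndarray, other: np.ndarray) -> float:
    mse = np.mean((original.astype(np.float64) - other.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

print("PSNR cipher vs original:   ", psnr(image, cipher))     # low: unrecognizable
print("PSNR recovered vs original:", psnr(image, recovered))  # inf: exact recovery
```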
Summary: This project analyzed the instantiation of cloud services, focusing on Google Cloud Platform (GCP). Factorial and Fibonacci calculations were used as benchmarks on various instance configurations, evaluating performance and cost. The research highlighted the importance of selecting optimized configurations to balance effectiveness and cost, especially with limited budgets. It was observed that distributed configurations improve performance but increase costs, with emphasis on the efficiency of Cloud Run.
Summary: This project analyzes and compares the challenges and impacts of adopting self-distributing systems and serverless computing in cloud computing environments, specifically evaluating the performance of systems running on Google Cloud Run compared to systems deployed on Kubernetes using the SDS (Self Distributing Systems) model. Based on previous research, the study seeks to answer questions about performance differences, identify specific scenarios in which each approach has an advantage, and evaluate their characteristics in terms of resource usage and response time. Detailed performance tests were carried out on the implemented systems, subjecting them to different workloads and monitoring critical metrics. The results provide a comprehensive view of the advantages and disadvantages of self-distributing systems and serverless computing in cloud computing environments.
Summary: In view of recent progress in the field of artificial intelligence, the fast and growing development of mobile computing, and the irrevocable right to privacy, federated learning models have become a key player in current technological development. The objective of this project is to evaluate the processing of noisy data in image classification models, using the reinforcement learning technique.
Summary: This is the report of a project carried out as a Course Completion Work at the Institute of Computing, in partnership with Prof. Roberto Greco from the Institute of Geosciences, whose objective is to develop a system for collecting data from hives that will be installed in schools.
The system allows hive data to be received via Wi-Fi on a mobile device while remaining at a safe distance from the bees. For this project, a board developed by Prof. Fabiano Fruett, with temperature, humidity, sound, and proximity sensors, was used. Programming the board's software for data collection and transmission, as well as implementing an Android application for visualizing the results, are the objects of study of this report.
Summary: As the size and complexity of current systems grow, there is an increasing need to optimize and speed up the technologies that support them. Currently, container technologies and cloud computing are widely used; however, despite offering the flexibility necessary for modern systems, they still require considerable manual intervention to operate. Self-distributing systems arise to better implement these architectures, since this method enables automatic management of system distribution and allows systems to adapt in less time. However, such systems are still hard to adopt because they are difficult to use in a simple and user-friendly way. Thus, the need arose for more traditional methods to provide the information necessary for building these systems in a simpler and faster manner. Through an analysis of key points, a scheme was designed for creating systems with this autonomous management so that it is accessible to future uses, from more complex to simpler cases.
Summary: This work aims to study the process of risk analysis in the credit market, whose main focus is to estimate an individual's probability of default. To this end, the project used a United States credit dataset to train models and evaluate their performance.
Based on this study, it is understood that, using both basic information and information related to financial behavior, it is possible to build a model capable of reasonably estimating the probability of an individual defaulting on the payment of a loan. Furthermore, by identifying which information is considered important by the model, it is possible to advance the understanding of defaulter behavior in general, in addition to optimizing the collection of the information used for this purpose.
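A minimal sketch of such a default-probability model, using scikit-learn on synthetic data in place of the United States credit dataset used in the project:

```python
# Logistic regression producing a probability of default (illustrative).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=5000, n_features=8, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

proba_default = model.predict_proba(X_test)[:, 1]  # estimated probability of default
print("AUC:", roc_auc_score(y_test, proba_default))

# Coefficient magnitudes give a first view of which features the model relies on.
print(model.named_steps["logisticregression"].coef_)
```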
Summary: This work seeks to implement and compare different types of neural models and training techniques. To this end, we evaluated the performance of two models, one based on the perceptron architecture and another based on the GPT2 LLM, both trained in a centralized and in a federated manner. For the evaluation, we used the precision metric on the “agnews” dataset, which contains news from different segments, such as technology and sports.
Based on this research, it was determined that the appropriate type of model and training technique varies with the requirements. Larger models such as GPT2 tend to perform better than simpler models, although they are very heavy, so the precision required by the application must be assessed. If data privacy and the absence of a centralized learning structure are relevant, federated learning demonstrated good results, although it demands network resources.
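A minimal sketch of the federated averaging idea behind the federated setup described above, with a toy least-squares model standing in for the perceptron or GPT2 (shapes and data are synthetic):

```python
# Federated averaging (FedAvg) sketch: clients train locally, only weights are shared.
import numpy as np

def local_update(weights: np.ndarray, X: np.ndarray, y: np.ndarray, lr: float = 0.1) -> np.ndarray:
    # One step of least-squares gradient descent as a stand-in for local training.
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

rng = np.random.default_rng(0)
global_w = np.zeros(3)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]

for _ in range(10):  # communication rounds
    client_ws = [local_update(global_w.copy(), X, y) for X, y in clients]
    global_w = np.mean(client_ws, axis=0)  # server averages the client weights

print(global_w)
```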
Summary: This end-of-course project aims to empirically explore possibilities of applying XR in glasses, based on the use of different architectures and technologies, evaluating their performance with regard to the end-to-end delay of frame capture on the client, fog processing, and display back on the client. We explored several available processing options on both CPU and GPU to check which one performed better when applied to a case study of assisted cycling.
Throughout the project, we were able to compare several technologies and their effectiveness in solving the problem. We were also able to obtain metrics for performance, video quality, and solution efficiency. From this information, it is possible to establish the technical needs and challenges faced when implementing an edge image processing system.
Summary: Drone delivery services have gained attention from academia, government, and industry, associating themselves with green and sustainable logistics. Existing studies mainly focus on specific issues of anti-collision strategies in scenarios with a limited number of drones, neglecting landing and takeoff management in large distribution centers, where there is a high density of drones.
This project evaluates and improves the Drone Edge Management System (DREMS), which handles the sequencing of landings and takeoffs in high-density areas. The results highlight the need for a sequencing strategy in distribution centers to optimize delivery in scenarios with a high density of drones. New strategies were successfully developed and applied that increase the actual rates of landings and takeoffs, without causing an increase in the total number of collisions.
Summary: The number of orders placed in Brazil through e-commerce has grown substantially in the last decade. Behind the scenes, to ensure that a product is delivered quickly and reliably to the customer, the logistics sector employs several strategies so that each stage of distribution is carried out as efficiently as possible. One of these strategies is the use of autonomous vehicles responsible for optimizing the movement of materials within warehouses. This creates space for different analyses and proposals for mathematical modeling and problem architecture in order to optimize the task. In this work, logistics warehouses were modeled as graphs with different characteristics and peculiarities, such as the number of transport vehicles, number of boxes, delivery points, factory size, distance between points, physical mass of an order, and vehicle battery autonomy. Furthermore, different optimization techniques were explored for delegating material-handling tasks inside the warehouse, taking into consideration parameters such as computational complexity and resource availability. A study was also carried out on the use of distributed systems to understand the strategies most suitable for each type of organization.
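One possible formulation of the task-delegation step, sketched with the Hungarian algorithm from SciPy; the cost matrix values are illustrative and not taken from the work:

```python
# Assign each pending box to a vehicle by minimizing total travel cost.
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i][j] = distance vehicle i must travel to pick up box j (meters, made up)
cost = np.array([
    [12.0, 40.0, 33.0],
    [25.0,  9.0, 27.0],
    [31.0, 22.0,  8.0],
])

vehicles, boxes = linear_sum_assignment(cost)
for v, b in zip(vehicles, boxes):
    print(f"vehicle {v} -> box {b} (cost {cost[v, b]})")
print("total cost:", cost[vehicles, boxes].sum())
```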
Summary: This end-of-course project aims to empirically explore possibilities of applying XR in glasses, based on the use of different architectures and technologies, evaluating their performance with regard to the end-to-end delay of frame capture on the client, fog processing, and display back on the client. We explored several available processing options on both CPU and GPU to check which one performed better when applied to a case study of assisted cycling. The document presents results and analyses of the different methods tested during the work.
Summary: Access to multi-dimensional tensors represents a significant portion of the execution time of a program, particularly in scientific programs in the areas of High-Performance Computing (HPC) and Machine Learning (ML), where tensor sizes are very large. Linearizing tensors to make them one-dimensional is recognized as a powerful technique for increasing the locality of their elements in the memory hierarchy, thus reducing access latency. However, linearization has historically seen very little use, its application being restricted to manual implementations of GEMMs. This happens because, given the access pattern of the tensors, not all linearizations produce a performance improvement when performed. Recently, the state of the art in this area has been focusing on ways to generalize the use of linearization, as is the case with GPAT, which analyzes the viability of some applications. With these new tools, it is possible to apply linearization in a general way at compile time, in order to improve the performance of Machine Learning (ML) training algorithms as well as to speed up the execution of High-Performance Computing (HPC) programs.
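A small NumPy illustration of what linearization means for memory layout (the project itself operates at compile time, e.g., via GPAT, rather than on NumPy arrays):

```python
# Linearization: a 3-D tensor in row-major order can be walked with a single
# linear index, letting loops touch memory contiguously instead of striding.
import numpy as np

D0, D1, D2 = 4, 3, 5
t = np.arange(D0 * D1 * D2).reshape(D0, D1, D2)
flat = t.reshape(-1)  # linearized view over the same memory

i, j, k = 2, 1, 3
linear_index = i * (D1 * D2) + j * D2 + k
assert t[i, j, k] == flat[linear_index]

# A strided access pattern (e.g., walking the last axis of a transposed view)
# jumps across memory; the linearized copy is contiguous.
transposed = t.transpose(2, 1, 0)
print("strides of transposed view:", transposed.strides)
print("strides after linearizing a copy:", np.ascontiguousarray(transposed).strides)
```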
Summary: This work is the report of a course completion project dealing with the problem of parallelizing loops with do-across dependencies. An improvement proposal was raised and analyzed, based on the positioning of the sequential execution section, and this proposal was compared with the current behavior of a compiler in manual tests. The comparison results indicated a low optimization potential, with average observed gains of
using this strategy, and a more than reasonable performance of the techniques currently used to handle these types of loops.
Abstract: Scientific denialism refers to the discrepancy between information and its effective communication. In this project, our objective is to delve into the field of knowledge visualization by integrating the latest advancements in AI generative models. We aim to create a visually compelling and deterministic representation based on satellite data of Rio de Janeiro's climate over the next 100 years, elucidating its interaction with climate change. While we have achieved promising results in image generation, we still face challenges in constructing a cohesive video that maintains temporal coherence, semantic consistency and realism.
Summary: This document aims to expose the concepts and methodologies related to the topics of continuous integration, delivery and deployment — CI/CD — and to the principles and updates of e-commerce, in order to argue about the particular and fundamental importance of this set of practices for this sector of the industry. Aspects of complexity, dynamism and demand, as well as the challenges faced by those who work in this industry, are also explained and discussed in this report.
In addition, this text also has the experimental purpose of presenting the aspects and particularities of eight different CI/CD tools on the current market, in order to elaborate a comparative analysis between them. This analysis is based on criteria that describe the requirements and desirable features for the development and maintenance of not only software, but also a successful e-commerce platform.
Summary: APIs (Application Programming Interface – API) offer a set of operations that allow the creation of applications and communication between services (systems). API tests are essential to ensure the proper functioning and performance of applications. In this project, API tests combine Model Based Testing (MBT) and Behavior Driven Testing (BDT) to automate not only the execution but also the generation of test cases. MBT is a method for developing tests that has as its premise the modeling of the system, while BDT has as premise the elaboration of test cases based on scenarios that represent the behavior of the system. To evaluate the effectiveness of combining these methods, a Rest API was used as a case study.
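A minimal sketch of how a generated scenario might be executed as a concrete API test with pytest and requests; the endpoint, payload, and expected response are placeholders rather than the case-study API:

```python
# BDT-style scenario (given/when/then) turned into an executable API test.
import requests

BASE_URL = "http://localhost:8000"  # assumed address of the API under test

def test_create_user_returns_created_resource():
    # Given a valid user payload
    payload = {"name": "Alice", "email": "alice@example.com"}

    # When the client posts it to the users endpoint
    response = requests.post(f"{BASE_URL}/users", json=payload, timeout=5)

    # Then the API confirms creation and echoes the data back
    assert response.status_code == 201
    assert response.json()["email"] == payload["email"]
```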
Summary: This work seeks to reproduce and compare some techniques for transferring style, texture and image characteristics between different classes. The study sought to explore some style transfer techniques and methods based on known unsupervised machine learning models to try to observe ways to generate images that reproduce artistic styles and textures, as well as mimic classes or that automate the generation of schematic images. For this task, a model based on CycleGAN was implemented using layers of ResNet18 and databases of some famous painters and artistic styles.
Abstract: This work seeks to reproduce and compare some techniques for transferring style, texture and image characteristics between different classes. The study sought to explore some style transfer techniques and methods based on known unsupervised machine learning models to try to observe ways to generate images that reproduce artistic styles and textures, as well as mimic classes or that automate the generation of schematic images. For this task, a model based on CycleGAN was implemented using layers of ResNet18 and databases of some famous painters and artistic styles.
Summary: In this work, we present a software implementation of Elephant, a lightweight encryption algorithm designed as an authenticated encryption scheme with associated data, based on the ``encrypt-then-authenticate'' construction and a counter mode of operation. Its main component is the Spongent-π[b] permutation, a generalization of the permutation used in the PRESENT cipher (b = 64) to values of b greater than 64. We present new computational techniques to improve the performance of this permutation in three versions: b = 160, 176 and 192. The results of our implementation in C, on the Apple M1 processor (ARMv8 architecture), show a significant improvement in the performance of the Dumbo and Jumbo variants of the Elephant algorithm, both achieving gains of up to 14 times the encryption rate per cycle compared to the reference implementation. Finally, we note that the results presented in this work also apply to improving the performance of the SPONGENT cryptographic hash function.
Summary: Deepfakes are synthetic media generated by neural networks that can present content harmful to society, such as fake news and fraud. Considering the constant evolution of such technologies, it is important to develop methods that can automatically detect them, and one of the most effective ways to do this is with machine learning models. However, the performance of such detectors depends on the context of the training data and can decrease considerably when applied to tests in contexts not seen during training. In the detection of deepfakes, this is especially challenging, as the generation techniques are constantly evolving, implying that such models present a performance drop in the real world. In this project, we explore the impact of data augmentation using ``image to image'' diffusion models and Generative Adversarial Networks (GANs) on the ability of deepfake detectors to generalize to domains not observed in training, performing intra- and cross-dataset tests for a more comprehensive evaluation of the results.
Summary: This work represents a project report carried out in partnership with LSC, whose objective is to develop a tool capable of sorting and filtering seismic data represented in SEG-Y format using high-performance computing.
Through the developed work, it was possible to create a query language that can be used to operate on the data, by parsing it and converting it to SQL so that it can be used in another tool developed in C++ and Rust.
Summary: Medical records, often in unstructured formats such as transcripts of doctor-patient dialogues, present a challenge for analysis, interpretation and effective use in the healthcare ecosystem. There is a major challenge in transforming unstructured clinical texts into structured and semantically rich data, such as RDF triples. This can generate better access, quality and organization for health data. This study aims to investigate, develop and experiment with a method for generating RDF triples from unstructured texts in Portuguese, composed of relevant clinical information identified in the transcription of dialogues between doctor and patient in clinical consultations. Our proposal explored and experimented with several large language models (such as GPT-3 and BLOOM) and techniques for using them to generate triples.
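A minimal sketch of the final materialization step with rdflib; the namespace, URIs, and the extracted fact below are illustrative, not taken from the project's data:

```python
# Once the language model has identified (subject, predicate, object) facts in a
# consultation transcript, they can be materialized as RDF triples with rdflib.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/clinical/")

g = Graph()
patient = EX["patient_001"]

# A fact hypothetically extracted by the LLM from the dialogue:
# "the patient reports a headache lasting three days".
g.add((patient, EX["hasSymptom"], EX["headache"]))
g.add((EX["headache"], EX["durationDays"], Literal(3)))

print(g.serialize(format="turtle"))
```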
Summary: This work constitutes a comprehensive report of the study, proposal and development of improvements in the QLattes extension. This study was based on the QLattes extension originally created by Nabor C. Mendonça, under the guidance of Professor Juliana F. Borin and with the joint collaboration of Mendonça. The improvements span several areas, including the addition of new functionality, such as viewing multiple CVs, which will allow for a more comprehensive analysis of researchers' academic achievements. In addition, a code refactoring was carried out, with emphasis on modularization, to improve the understanding and maintainability of the system. The adoption of newer technologies, such as React, allows for a more fluid and intuitive experience for QLattes users. Finally, a new design proposal was developed, seeking an attractive and intuitive interface that facilitates filtering, configuration, analysis and visualization of data from publications classified by Qualis. With these improvements, it is expected to provide different metrics and perspectives to users, enabling a more comprehensive and accurate analysis of scientific production. The report will describe in detail the development process, the features implemented and the results achieved with these improvements.