29 April 2025
09:00 Doctoral defense, Room 85, IC 2
Topic
Inter-node message passing in the optical disaggregated memory scenario
Student
Maurício Gagliardi Palma
Advisor
Rodolfo Jardim de Azevedo
Brief summary
Memory is a critical resource in any computing environment: it must offer low access times, high bandwidth, and enough capacity to run applications. Users often do not know precisely how much memory their applications require, so they tend to overestimate it, since insufficient memory substantially degrades performance or even causes the program to fail. Managing memory usage well in a computer cluster therefore becomes a challenge. One approach to improving this usage is to disaggregate (separate) the computers' memory from the other components, so that memory can be attached and detached more flexibly and matched to what each application actually needs. In practice, disaggregation in this context means allowing external memories, in addition to those already attached to the computer's motherboard, to be allocated to the computer. This is not trivial, given the performance requirements imposed on the memory channel.
This document presents two studies that seek to contribute to the adoption of disaggregated memory. The first evaluates a memory disaggregation solution called Optically Connected Memory (OCM), which creates an optical connection between the memory controller, located in the processor, and the main memory. OCM allows the memory channel to reach a length on the order of meters (up to 6 meters counting the round trip in our evaluation) while meeting the bandwidth requirements imposed by the DDR standard. Our results show that OCM can achieve up to 5.5x higher performance than disaggregating memory over a conventional 40 Gb/s network.
The second study concerns the Flexible Memory Unit (FMU) protocol, a new protocol for sending messages between computers. The FMU protocol uses memories disaggregated via OCM, which lets it use the bandwidth of the DDR standard for communication and thus accelerates message sending. In addition to this bandwidth gain, the disaggregated memory serves as a dedicated buffer for storing messages, which allows the sending computer to send messages immediately. In our results, the FMU protocol showed gains of up to 5.18x in the execution of applications that depend entirely on communication performance, and gains of up to 1.22x in the execution of applications in which communication has a smaller impact on performance.
Examination Board
Full members:
Rodolfo Jardim de Azevedo | IC / UNICAMP |
Alfredo Goldman vel Lejbman | IME / USP |
Alexandro José Baldassin | IGCE / UNESP |
Hervé Cédric Yviquel | IC / UNICAMP |
Lucas Francisco Wanner | IC / UNICAMP |
Alternates:
Sandro Rigo | IC / UNICAMP |
Paulo Sérgio Lopes de Souza | ICMC / USP |
Hermes Senger | CCET / UFSCar |