As an MSc student in VeRLab, advised by Prof. Erickson Nascimento, my main research focus is on multi-modal learning techniques that leverage the natural audio-visual correspondence present in many data formats, such as videos. More specifically, I am interested in tasks such as sound source localization, vision-based sound separation, and audio-based video summarization.
I am also part of the Semantic Hyperlapse project in our research group, whose objective is to fast-forward egocentric videos while preserving their semantic information. In our lab, I also happily contribute to projects on Medical Image Analysis and Sports Analytics.
Previously, I worked on extracting local features by learning local representations with CNNs [1] and on developing a platform-independent routing protocol that enables reliable and efficient any-to-any data traffic [2].
MSc in Computer Vision, Current
Universidade Federal de Minas Gerais
BSc in Computer Science, 2019
Universidade Federal de Minas Gerais
The rapid increase in the amount of published visual data and the limited time of users create a demand for processing untrimmed videos into shorter versions that convey the same information. Despite the remarkable progress made by summarization methods, most of them can only select a few frames or skims, which creates visual gaps and breaks the video context. In this paper, we present a novel methodology based on a reinforcement learning formulation to accelerate instructional videos. Our approach adaptively selects and removes frames that are not relevant to conveying the information, without creating gaps in the final video. Our agent is textually and visually oriented to select which frames to remove to shrink the input video. Additionally, we propose a novel network, the Visually-guided Document Attention Network (VDAN), able to generate a highly discriminative embedding space to represent both textual and visual data. Our experiments show that our method achieves the best performance in terms of F1 Score and coverage at the video segment level.
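To make the idea of a textually and visually oriented agent more concrete, the sketch below shows a minimal keep/drop policy that scores pre-extracted frame features against a document embedding in a shared space and samples binary actions, REINFORCE-style. This is not the paper's VDAN implementation; all names, dimensions, and the reward signal are illustrative assumptions.

```python
# Minimal sketch (not the paper's VDAN): a keep/drop policy over frames,
# conditioned on a document embedding, trained with a REINFORCE objective.
import torch
import torch.nn as nn

class FrameSelectionPolicy(nn.Module):
    def __init__(self, frame_dim=2048, text_dim=768, joint_dim=256):
        super().__init__()
        # Project both modalities into a shared (joint) embedding space.
        self.frame_proj = nn.Linear(frame_dim, joint_dim)
        self.text_proj = nn.Linear(text_dim, joint_dim)
        self.head = nn.Linear(2 * joint_dim, 1)  # keep/drop logit per frame

    def forward(self, frame_feats, doc_feat):
        # frame_feats: (T, frame_dim), doc_feat: (text_dim,)
        f = torch.tanh(self.frame_proj(frame_feats))            # (T, joint_dim)
        d = torch.tanh(self.text_proj(doc_feat)).expand_as(f)   # (T, joint_dim)
        logits = self.head(torch.cat([f, d], dim=-1)).squeeze(-1)
        return torch.distributions.Bernoulli(logits=logits)     # keep prob per frame

# Usage: sample a keep/drop mask and reinforce with a reward balancing semantic
# relevance against the target speed-up (the reward here is a placeholder).
policy = FrameSelectionPolicy()
frames, doc = torch.randn(120, 2048), torch.randn(768)
dist = policy(frames, doc)
keep_mask = dist.sample()                 # 1 = keep frame, 0 = drop frame
log_prob = dist.log_prob(keep_mask).sum()
reward = torch.tensor(0.5)                # placeholder reward signal
loss = -(reward * log_prob)               # REINFORCE objective
loss.backward()
```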
Despite the impressive progress being made in autonomous vehicles, human drivers will remain ubiquitous in the coming years. Therefore, intelligent hybrid vehicular systems must be aware of the interactions between humans and the environment (e.g., sound, vibration, speed, etc.). In this paper, we evaluate the effect of acoustic annoyance on drivers in a real-world driving study. We found significant differences in driving styles elicited by annoying acoustics and present an online classifier that uses measurements from the onboard inertial measurement unit to distinguish whether a driver is annoyed with 77% accuracy. Moreover, we directly measured the forces applied on the passenger with a pressure mat lining the car seat and empirically confirmed that our proposed passenger dynamics model is reasonable. However, because the acoustically induced driving styles were not polarizing enough, we were unable to show that passengers' self-reported ride comfort changed with acoustic annoyance.
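As a rough illustration of how an online classifier could be built from inertial measurement unit streams, the sketch below windows synthetic accelerometer and gyroscope signals, computes simple per-axis statistics, and fits a logistic-regression model. The feature set, window length, and labels are assumptions made for demonstration only, not the study's actual pipeline.

```python
# Illustrative sketch only (not the study's classifier): detecting "annoyed"
# driving windows from IMU statistics with a simple logistic-regression model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def window_features(accel, gyro, window=100):
    """Split raw IMU streams into windows and compute per-axis mean/std features."""
    feats = []
    for start in range(0, len(accel) - window + 1, window):
        a, g = accel[start:start + window], gyro[start:start + window]
        feats.append(np.concatenate([a.mean(0), a.std(0), g.mean(0), g.std(0)]))
    return np.array(feats)

# Synthetic stand-in data: 3-axis accelerometer and gyroscope samples.
rng = np.random.default_rng(0)
accel, gyro = rng.normal(size=(10_000, 3)), rng.normal(size=(10_000, 3))
X = window_features(accel, gyro)
y = rng.integers(0, 2, size=len(X))  # 1 = annoyed window, 0 = baseline (placeholder labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```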