Publications

Straight to the Point: Fast-forwarding Videos via Reinforcement Learning Using Textual Data

The rapid increase in the amount of published visual data and the limited time of users create a demand for processing untrimmed videos to produce shorter versions that convey the same information. Despite the remarkable progress that has been made by summarization methods, most of them can only select a few frames or skims, which creates visual gaps and breaks the video context. In this paper, we present a novel methodology based on a reinforcement learning formulation to accelerate instructional videos. Our approach is capable of adaptively selecting frames that are not relevant to convey the information without creating gaps in the final video. Our agent is textually and visually oriented to select which frames to remove to shrink the input video. Additionally, we propose a novel network, called Visually-guided Document Attention Network (VDAN), able to generate a highly discriminative embedding space to represent both textual and visual data. Our experiments show that our method achieves the best performance in terms of F1 Score and coverage at the video segment level.

On Modeling the Effects of Auditory Annoyance on Driving Style and Passenger Comfort

Despite the impressive progress being made in autonomous vehicles, human drivers will remain ubiquitous in the coming years. Therefore, intelligent hybrid vehicular systems must be aware of the interactions between humans and the environment (e.g., sound, vibration, and speed). In this paper, we evaluate the effect of acoustic annoyance on drivers in a real-world driving study. We found significant differences in driving styles elicited by annoying acoustics and present an online classifier that uses onboard inertial measurement unit measurements to distinguish whether a driver is annoyed with 77% accuracy. Moreover, we directly measured the forces applied on the passenger with a pressure mat lined on the car seat, and empirically confirmed that our proposed passenger dynamics model is reasonable. However, because our acoustically induced driving styles were not polarizing enough, we were unable to show that passengers’ self-reported ride comfort changed with acoustic annoyance.

Personalizing Fast-Forward Videos Based on Visual and Textual Features from Social Network

The growth of social networks has fueled people’s habit of logging their day-to-day activities, and long First-Person Videos (FPVs) are one of the main tools in this new habit. Semantic-aware fast-forward methods are able to decrease the watch time and select meaningful moments, which is key to increasing the chances of these videos being watched. However, these methods cannot handle semantics in terms of personalization. In this work, we present a new approach to automatically creating personalized fast-forward videos for FPVs. Our approach explores the availability of text-centric data from the user’s social networks, such as status updates, to infer her/his topics of interest, and assigns scores to the input frames according to her/his preferences. Extensive experiments are conducted on three different datasets with simulated and real-world users as input, achieving average F1 scores up to 12.8 percentage points higher than the best competitors. We also present a user study to demonstrate the effectiveness of our method.

Fast-Forward Methods for Egocentric Videos: A Review

The emergence of low-cost, high-quality personal wearable cameras, combined with the large and increasing storage capacity of video-sharing websites, has evoked a growing interest in first-person videos. A First-Person Video is usually composed of monotonous long-running unedited streams captured by a device attached to the user’s body, which makes it visually unpleasant and tedious to watch. Thus, there is a growing need to provide quick access to the information therein. In the last few years, a popular approach to retrieving the information from videos has been to produce a short version of the input video by creating a video summary; however, this approach disrupts the temporal context of the recording. Fast-Forward is another approach that creates a shorter version of the video, preserving the video context by increasing its playback speed. Although Fast-Forward methods preserve the story of the recording, they do not consider the semantic load of the input video. The Semantic Fast-Forward approach creates a shorter version of First-Person Videos, dealing with both video context and emphasis of the relevant portions to keep the semantic load of the input video. In this paper, we present a review of representative fast-forward and semantic fast-forward methods and discuss the future directions of the area.

Exploring the Limitations of the Convolutional Neural Networks on Binary Tests Selection for Local Features

Convolutional Neural Networks (CNNs) have been successfully used to recognize and extract visual patterns in different tasks such as object detection, object classification, scene recognition, and image retrieval. CNNs have also contributed to local feature extraction by learning local representations. A representative approach is LIFT, which generates keypoint descriptors more discriminative than those of handcrafted algorithms like SIFT, BRIEF, and SURF. In this paper, we investigate the binary tests selection problem, and we present an in-depth study of the limits of searching for solutions with CNNs when the gradient is computed from the local neighborhood of the selected pixels. We performed several experiments with a Siamese Network trained with corresponding and non-corresponding patch pairs. Our results show the presence of Local Minima and also a problem that we call Incorrect Gradient Components. We sought to understand the binary tests selection problem, as well as some limitations of Convolutional Neural Networks, to avoid searching for solutions in unviable directions.
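
For readers unfamiliar with the term, a binary test in a BRIEF-style descriptor compares the intensities of two pixels in a patch and emits one bit. The sketch below illustrates only that general idea: the pixel pairs are drawn at random, and the function names (make_tests, describe, hamming) are hypothetical. It is not the CNN-based selection procedure studied in the paper.

```python
import random

def make_tests(patch_size, n_bits, seed=0):
    """Draw n_bits random pixel pairs inside a square patch (assumed sampling scheme)."""
    rng = random.Random(seed)
    def point():
        return (rng.randrange(patch_size), rng.randrange(patch_size))
    return [(point(), point()) for _ in range(n_bits)]

def describe(patch, tests):
    """Build a binary descriptor: bit i is 1 when the first pixel of test i is darker."""
    bits = 0
    for i, ((r1, c1), (r2, c2)) in enumerate(tests):
        if patch[r1][c1] < patch[r2][c2]:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")

# Usage: a uniform brightness shift leaves every comparison unchanged.
patch = [[r * 8 + c for c in range(8)] for r in range(8)]
brighter = [[v + 10 for v in row] for row in patch]
tests = make_tests(patch_size=8, n_bits=32)
distance = hamming(describe(patch, tests), describe(brighter, tests))
```

Which pixel pairs to compare is exactly the selection problem the abstract refers to; here the choice is left to a random generator.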

Matrix: Multihop Address Allocation and Dynamic Any-to-Any Routing for 6LoWPAN

Standard routing protocols for IPv6 over Low-Power Wireless Personal Area Networks (6LoWPAN) are mainly designed for data collection applications and work by establishing a tree-based network topology, which enables packets to be sent upwards, from the leaves to the root, adapting to the dynamics of low-power communication links. The routing tables in such unidirectional networks are very simple and small, since each node just needs to maintain the address of its parent in the tree, providing the best-quality route at every moment. In this work, we propose Matrix, a platform-independent routing protocol that utilizes the existing tree structure of the network to enable reliable and efficient any-to-any data traffic. Matrix uses hierarchical IPv6 address assignment in order to optimize routing table size, while preserving bidirectional routing. Moreover, it uses a local broadcast mechanism to forward messages to the right subtree when persistent node or link failures occur. We implemented Matrix on TinyOS and evaluated its performance both analytically and through simulations on TOSSIM. Our results show that the proposed protocol is superior to available protocols for 6LoWPAN in terms of reliability, message efficiency, and memory footprint when it comes to any-to-any data communication.
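
The general idea of hierarchical address assignment over a collection tree can be sketched as follows. This is a minimal illustration under stated assumptions, not the Matrix protocol itself: addresses are plain integers rather than IPv6 addresses, each node splits its range evenly among its children, and the names Node, assign, next_hop, and route are hypothetical. Failure handling via local broadcast is not modeled.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.range = (0, 0)   # half-open address range [lo, hi) owned by this subtree
        self.addr = None      # this node's own address
        if parent is not None:
            parent.children.append(self)

def assign(node, lo, hi):
    """Give node the first address of [lo, hi) and split the rest among its children."""
    node.range, node.addr = (lo, hi), lo
    if not node.children:
        return
    share = (hi - lo - 1) // len(node.children)   # even split: an assumption
    if share < 1:
        raise ValueError("address range too small for subtree")
    start = lo + 1
    for child in node.children:
        assign(child, start, start + share)
        start += share

def next_hop(node, dest):
    """Forward down if dest lies in a child's range, otherwise up to the parent."""
    for child in node.children:
        lo, hi = child.range
        if lo <= dest < hi:
            return child
    lo, hi = node.range
    if lo <= dest < hi:
        return None           # address inside this subtree but unassigned
    return node.parent        # None at the root means dest is out of range

def route(src, dest):
    """Return the list of node names visited from src to the node owning dest."""
    path, node = [src.name], src
    while node.addr != dest:
        node = next_hop(node, dest)
        if node is None:
            raise LookupError("destination unreachable")
        path.append(node.name)
    return path

# Usage: a five-node tree; each node stores one range per child, plus its parent.
root = Node("root")
a, b = Node("a", root), Node("b", root)
c, d = Node("c", a), Node("d", a)
assign(root, 0, 16)
path = route(c, b.addr)   # climbs to the root, then descends to b
```

In this sketch the per-node routing state grows with the number of children rather than with the size of the network, which is the kind of table-size saving that hierarchical addressing aims for.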