The emergence of low-cost personal mobile devices and wearable cameras, together with the increasing storage capacity of video-sharing websites, has fostered a growing interest in first-person videos. Wearable cameras can operate for hours without the need for continuous handling. The resulting videos are generally long, unedited streams, which makes them tedious and visually unpalatable, since natural body movements render the footage jerky and even nauseating. Hyperlapse algorithms aim to create a shorter, watchable version with no abrupt transitions between frames. However, an important aspect of such videos is the relevance of their frames, which hyperlapse methods usually ignore. In this work, we propose a novel methodology capable of summarizing and stabilizing egocentric videos by extracting and analyzing the semantic information in the frames. We also describe a collected dataset of several labeled videos and introduce a new smoothness evaluation metric for egocentric videos. Several experiments demonstrate the superiority of our approach over state-of-the-art hyperlapse algorithms as far as semantic information is concerned. According to the results, our method retains on average 10.67 percentage points more semantics than the second-best method, relative to the maximum amount of semantics obtainable for the required speed-up. More information can be found in our supplementary video: https://youtu.be/_TU8KPaA8aU.