The popularity of first-person videos in social media has increased. Thanks to the low operational cost and large storage capacity of cameras and wearable devices, people are recording many hours of daily activities, sports actions, and home videos. These videos, known as egocentric videos, are long-running and unedited, which makes them tedious and difficult to watch. A central challenge is to make egocentric videos watchable. The natural motion of the recorder's body, when played in fast-forward, becomes nauseating. In this work, we propose a novel methodology for composing a fast-forward video by selecting frames based on semantic information extracted from the images. The experiments show that our approach outperforms the state-of-the-art as far as semantic information is concerned, and is also capable of producing videos that are pleasant to watch.