Digital video is a crucial component of multimedia, enriching presentations with accurate, engaging visual and aural content that impacts several industries. The transition of video storage from analog to digital is being driven by several factors, including improved compression methods, cheaper hardware, and growing network demands. This paper presents a novel video summarization approach based on physiological signals elicited by emotional stimuli. Through these stimuli, 15 emotions are analyzed using physiological signals. The dataset was gathered from 15 participants who watched 61 episodes of 14 television series while wearing a wristband. We built several deep-learning models whose main purpose is to recognize emotions for video summarization. Among the evaluated networks, the best performance was obtained with the 1D-CNN, at 92.87% accuracy. This work was carried out through a series of empirical experiments; because the physiological signals have different sampling frequencies, each experiment evaluated models on both the original and resampled signal configurations. A comprehensive comparison indicates that the oversampling approach yields the highest accuracy as well as the lowest computational complexity. The performance of the proposed video summarization approach was evaluated through a participant survey, whose results showed that the summaries captured the critical moments of the videos. The proposed approach may be useful and effective in physiological signal-based applications requiring emotion recognition, such as emotion-based video summarization or film genre detection. Additionally, such summaries help viewers quickly form judgments expressed through likes, ratings, comments, and similar feedback.
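The abstract notes that the physiological channels are sampled at different frequencies and that oversampling to a common rate performed best. A minimal sketch of that preprocessing step, assuming NumPy-based linear interpolation (the channel names, sampling rates, and signal values below are illustrative, not taken from the paper), might look like:

```python
import numpy as np

# Hypothetical wristband channels recorded at different sampling rates
# (rates and waveforms are illustrative only).
eda = np.sin(np.linspace(0.0, 2 * np.pi, 4 * 10))   # e.g. 4 Hz for 10 s
bvp = np.sin(np.linspace(0.0, 2 * np.pi, 64 * 10))  # e.g. 64 Hz for 10 s

def oversample(signal: np.ndarray, target_len: int) -> np.ndarray:
    """Linearly interpolate a signal onto a finer, common time grid."""
    old_t = np.linspace(0.0, 1.0, len(signal))
    new_t = np.linspace(0.0, 1.0, target_len)
    return np.interp(new_t, old_t, signal)

# Oversample the slower channel to match the fastest one, so both
# channels can be stacked into one fixed-shape input for a 1D-CNN.
target_len = max(len(eda), len(bvp))
eda_up = oversample(eda, target_len)
features = np.stack([eda_up, bvp])  # shape: (2 channels, 640 samples)
```

Aligning all channels to the highest sampling rate preserves the fast channel's detail, which is consistent with the abstract's finding that oversampling outperformed the original configurations.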