Events

Public defence in Networking Technology, M.Sc. Petr Byvshev

Untangling motion and appearance in videos and deep neural networks.
Robot making sushi, by Stable Diffusion 2

M.Sc. Petr Byvshev will defend the thesis "Motion and Appearance Representation Learning of Human Activities from Videos and Wearable Sensors" on 19 April 2023 at 12:00 (EET) at Aalto University School of Electrical Engineering, Department of Information and Communications Engineering, in lecture hall T1, Konemiehentie 2, Espoo.

Opponent: Prof. Shaogang Gong, Queen Mary University of London, UK
Custos: Prof. Yu Xiao, Aalto University School of Electrical Engineering, Department of Information and Communications Engineering

Thesis available for public display 10 days prior to the defence at: https://aaltodoc.aalto.fi/doc_public/eonly/riiputus/
Doctoral theses in the School of Electrical Engineering: https://aaltodoc.aalto.fi/handle/123456789/53

Public defence announcement:

Have you noticed how videos have become a significant part of our lives? They have now also become a crucial part of the training data for deep neural networks. That is why we are excited to announce a new thesis that explores how to improve the video representations learned by deep networks.

The aim of the study is to investigate how models learn temporal dependencies, particularly in 3D deep video features. When trained on prevalent video datasets, networks become biased towards appearance information rather than motion. We demonstrate how networks can adapt to varying levels of temporality and overcome this bias.

In addition, the study presents new methods for optimally extracting temporal information from videos and combining it with other sensory modalities, such as smart gloves. We present a simulation-driven platform for building smart-glove-based human activity recognition systems that uses large pools of video data to generate synthetic sensor data. The thesis is highly relevant to other research on video representation learning and deep neural networks: it brings new insights into the impact of open video data on learned video representations and advances the architecture design of deep networks.

The information presented in this thesis can be applied to improve video representations and to understand the varied nature of video content. For example, the methods developed in this study can be used to build more accurate and robust video activity recognition systems.

This thesis will interest anyone who wants to learn about state-of-the-art deep video models and the large-scale datasets they are trained on. Everyone is welcome to join.

Contact information of doctoral candidate:

Email [email protected]
Mobile +31621321586