Seminar by Mr. WOLF - Friday, September 21, 2018 at 2:00 p.m., Pôle API - Illkirch

July 4, 2018

Title: Structured Deep Learning and Visual Reasoning

Abstract:

Visual data consists of a massive number of variables, and making sense of their content requires modeling their complex dependencies and relationships. This talk presents an overview of our past work, which aims at enforcing coherence in this large ensemble of observed and latent variables and at inferring estimates from it. In particular, the presentation covers work on attention mechanisms for video analysis, where structure in the data is not imposed but predicted from the input by a fully trained model.

Application-wise, we address human action recognition from RGB data and study the role of articulated pose and of visual attention mechanisms for this application. Articulated pose is well established as an intermediate representation capable of providing precise cues relevant to human motion and behavior. Our method has been designed to explicitly remove the dependency on pose during training, making it more broadly applicable in situations where pose is not available. Instead, a sparse representation of focus points is calculated by a dynamic visual attention model and passed to a set of distributed recurrent neural workers. State-of-the-art results are achieved on several datasets, including the largest dataset for human activity recognition, NTU-RGB+D.