Tutorial Speakers

Prof. Vittorio Ferrari
University of Edinburgh
Google Research
Knowledge transfer and human-machine collaboration for training visual models

BIO
Vittorio Ferrari is a Professor at the School of Informatics of the University of Edinburgh and a Research Scientist at Google, leading a research group on visual learning at each institution. He received his PhD from ETH Zurich in 2004 and was a post-doctoral researcher at INRIA Grenoble in 2006-2007 and at the University of Oxford in 2007-2008. Between 2008 and 2012 he was Assistant Professor at ETH Zurich, funded by a Swiss National Science Foundation Professorship grant. He received the prestigious ERC Starting Grant and the best paper award from the European Conference on Computer Vision, both in 2012. He is the author of over 90 technical publications. He regularly serves as an Area Chair for the major computer vision conferences, and he will be a Program Chair for ECCV 2018 and a General Chair for ECCV 2020. He is an Associate Editor of IEEE Transactions on Pattern Analysis and Machine Intelligence. His current research interests are in learning visual models with minimal human supervision, object detection, and semantic segmentation.
Abstract
Object class detection and segmentation are challenging tasks that typically require tedious and time-consuming manual annotation for training. In this talk I will present three techniques we recently developed for reducing this effort. In the first part I will explore a knowledge transfer scenario: training object detectors for target classes with only image-level labels, helped by a set of source classes with bounding-box annotations. In the second and third parts I will consider human-machine collaboration scenarios (for annotating bounding boxes of one object class, and for annotating the class label and approximate segmentation of every object and background region in an image).

Ivan Laptev
Research Director, INRIA Paris
Towards action understanding with less supervision

BIO
Ivan Laptev is a senior researcher at INRIA Paris, France. He received a PhD degree in Computer Science from the Royal Institute of Technology in 2004 and a Habilitation degree from École Normale Supérieure in 2013. Ivan's main research interests include visual recognition of human actions, objects and interactions. He has published over 60 papers at international conferences and journals of computer vision and machine learning. He serves as an associate editor of the IJCV and TPAMI journals and will serve as a program chair for CVPR’18. He was an area chair for CVPR’10, ’13, ’15, ’16, ICCV’11, ECCV’12, ’14 and ACCV’14, ’16, and has co-organized several tutorials, workshops and challenges at major computer vision conferences. He has also co-organized a series of INRIA summer schools on computer vision and machine learning (2010-2013). He received an ERC Starting Grant in 2012 and was awarded a Helmholtz prize in 2017.
Abstract
Next to the impressive progress in static image recognition, action understanding remains a puzzle. The lack of large annotated datasets, the compositional nature of activities and the ambiguities of manual supervision are likely obstacles to a breakthrough. To address these issues, this talk will present alternatives to the fully-supervised approach to action recognition. First I will discuss methods that can efficiently deal with annotation noise. In particular, I will talk about learning from incomplete and noisy YouTube tags, weakly-supervised action classification from textual descriptions and weakly-supervised action localization using sparse manual annotation. The second half of the talk will discuss the problem of automatically defining appropriate human actions and will draw relations to robotics.

Abhinav Gupta
Associate Professor, Carnegie Mellon University
Supersizing and Empowering Visual Learning

BIO
Abhinav Gupta is an Associate Professor at the Robotics Institute, Carnegie Mellon University, and a Research Manager at Facebook AI Research (FAIR). Abhinav’s research focuses on scaling up learning by building self-supervised, lifelong and interactive learning systems. Specifically, he is interested in how self-supervised systems can effectively use data to learn visual representations, common sense and representations for actions in robots. Abhinav is a recipient of several awards including the ONR Young Investigator Award, PAMI Young Researcher Award, Sloan Research Fellowship, Okawa Foundation Grant, Bosch Young Faculty Fellowship, YPO Fellowship, IJCAI Early Career Spotlight, ICRA Best Student Paper Award, and the ECCV Best Paper Runner-up Award. His research has also been featured in Newsweek, BBC, Wall Street Journal, Wired and Slashdot.
Abstract
In the last decade, we have made significant advances in the field of computer vision thanks to supervised learning. But this passive supervision of our models has now become our biggest bottleneck. In this talk, I will discuss our efforts towards scaling up and empowering learning. First, I will show how the amount of labeled data is still a crucial factor in representation learning. I will then discuss one possible avenue for scaling up learning by using self-supervision. Next, I will discuss how we can scale up semantic learning to 10x and more categories by using visual knowledge and graph-based reasoning. But just scaling up the amount of data and the number of categories is not sufficient. We also need to empower our learning algorithms with the ability to control their own supervision. In the third part of the talk, I will discuss how we can move from passive to interactive learning in the context of VQA. Our agents live in the physical world and need the ability to interact with it. Towards this goal, I will finally present our efforts in large-scale learning of embodied agents in robotics.

Zeynep Akata
Assistant Professor, University of Amsterdam
Max Planck Institute
Explaining and Representing Novel Concepts With Minimal Supervision

BIO
Dr. Zeynep Akata is an Assistant Professor at the University of Amsterdam in the Netherlands, Scientific Manager of the Delta Lab and a Senior Researcher at the Max Planck Institute for Informatics in Germany. She holds a BSc degree from Trakya University (2008), an MSc degree from RWTH Aachen (2010) and a PhD degree from the University of Grenoble (2014). After completing her PhD at INRIA Rhone Alpes with Prof. Dr. Cordelia Schmid, she worked as a post-doctoral researcher at the Max Planck Institute for Informatics with Prof. Dr. Bernt Schiele and as a visiting researcher with Prof. Trevor Darrell at UC Berkeley. She received the Lise Meitner Award for Excellent Women in Computer Science in 2014. Her research interests include machine learning combined with vision and language for the task of explainable artificial intelligence (XAI).
Abstract
Clearly explaining a rationale for a classification decision to an end-user can be as important as the decision itself. Existing approaches for deep visual recognition are generally opaque and do not output any justification text; contemporary vision-language models can describe image content but fail to take into account class-discriminative image aspects which justify visual predictions. In this talk, I will present my past and current work on Zero-Shot Learning, Vision and Language for Generative Modeling and Explainable Artificial Intelligence, covering (1) how we can generalize image classification models to cases where no visual training data is available, (2) how to generate images and image features using detailed visual descriptions, and (3) how our models focus on discriminating properties of the visible object, jointly predict a class label, and explain why the predicted label is appropriate for the image whereas another label is not.
