DSPT#80 Webinar – Deep Multimodal and Cross-modal Embeddings

July 29 @ 18:30 - 20:00


Deep learning problems are often split into image processing and natural language processing, but what if there were a way for your neural network to extract information from images and text at the same time? David Semedo, who holds a PhD from NOVA University of Lisbon, will show us how we can combine different forms of information (image, text, etc.) in a single unifying representation.

The schedule for the webinar is the following:

• 18:30 – 18:45: Opening the meeting
• 18:45 – 19:30: Deep Multimodal and Cross-modal Embeddings – David Semedo
• 19:30 – 19:45: Q&A
• 19:45: Closing

This meetup is sponsored by Bosch (https://www.bosch.pt). Thank you for your support!

The way we experience the world is multimodal. Data comes in several modalities, each having distinct computational representations. Multimodal and cross-modal embeddings are data representations that unify multiple modalities (e.g. images, text, and audio).
These have been successfully used in many machine learning pipelines to solve complex AI problems that require jointly modelling information from multiple modalities.
In this talk, we will walk you through the building blocks (architectures, loss functions, etc.) of deep multimodal and cross-modal embedding models.
Then, we will showcase the application of these principles to challenging yet interesting multimedia understanding problems that involve bridging Vision and Language.
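
To give a flavour of the building blocks mentioned above, here is a minimal sketch of a cross-modal embedding in Python with NumPy. It is not taken from the talk: the linear projections, the feature sizes (512 for images, 300 for text, 64 for the shared space), the function names, and the symmetric contrastive loss are all illustrative assumptions, and the random arrays merely stand in for features that real image and text encoders would produce.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def embed(features, weights):
    """Project modality-specific features into the shared space (assumed: one linear map)."""
    return l2_normalize(features @ weights)

def contrastive_loss(img_emb, txt_emb, temperature=0.1):
    """Symmetric contrastive loss; row i of img_emb is assumed to be paired with row i of txt_emb."""
    logits = img_emb @ txt_emb.T / temperature

    def cross_entropy_on_diagonal(scores):
        scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits.T))

rng = np.random.default_rng(0)
image_features = rng.normal(size=(4, 512))  # placeholder for image encoder outputs
text_features = rng.normal(size=(4, 300))   # placeholder for text encoder outputs
W_img = rng.normal(scale=0.01, size=(512, 64))  # hypothetical projection weights
W_txt = rng.normal(scale=0.01, size=(300, 64))

img_emb = embed(image_features, W_img)
txt_emb = embed(text_features, W_txt)
loss = contrastive_loss(img_emb, txt_emb)
print(f"contrastive loss on random projections: {loss:.3f}")
```

In a real model the projection weights would be trained to minimise such a loss, pulling matching image and text pairs together in the shared space and pushing mismatched pairs apart.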


David Semedo has a PhD from the NOVA University of Lisbon and is a researcher at the Web and Multimedia Search and Data Mining group at NOVA LINCS. His PhD research focuses on neural-based representation learning for Multimedia Understanding, where he investigates models that computationally bridge vision and language by modeling cross-modal data interactions over time. He has been a researcher in several national and international projects and has published at and reviewed for top-tier conferences. His interests are multimodal machine learning at the intersection of CV and NLP, neural networks, and data mining.
Personal Webpage: https://davidsemedo.com/


