SFU MOCAD Seminar: Ricardo Baptista
Topic
Processing Language, Images and Other Data Modalities
Speakers
Ricardo Baptista
Details
A fundamental problem in artificial intelligence is how to jointly process data from different sources, such as audio, images, text, and video, collectively known as multimodal data. In this talk, I will present a mathematical framework for studying this question, focusing primarily on text and images. I will begin by describing how large language models (LLMs) operate, addressing the challenging issue of using real-number algorithms to process language. In particular, I will explain next-token prediction, the core of current LLM methodology. I will then focus on the canonical problem of measuring alignment between image and text data (contrastive learning). Finally, I will describe how images can be generated from text prompts (conditional generative modeling). From a mathematical perspective, a unifying theme underlying this work is the minimization of divergences defined on spaces of probability measures. A second key mathematical idea is the attention mechanism, a form of nonlinear correlation between vector-valued sequences. I aim to explain these concepts and their relevance to modern machine learning algorithms in an accessible fashion for a broad audience from the mathematical and computational sciences.
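To make the next-token prediction idea concrete, here is a minimal sketch (not taken from the talk): a language model maps a token sequence to a vector of real-valued scores (logits) over a vocabulary, and a softmax turns those scores into a probability measure from which the next token is predicted. The vocabulary and logit values below are hypothetical stand-ins for a trained model's output.

```python
import numpy as np

def softmax(z):
    """Map real-valued scores to a probability vector (sums to 1)."""
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical tiny vocabulary and model scores for the next token
vocab = ["the", "cat", "sat", "mat"]
logits = np.array([1.2, 0.3, 2.5, -0.7])  # stand-in for a model's output

probs = softmax(logits)                    # probability measure over vocab
next_token = vocab[int(np.argmax(probs))]  # greedy next-token prediction
print(next_token)                          # → "sat"
```

Sampling from `probs` instead of taking the argmax gives the stochastic generation used in practice; training adjusts the logits by minimizing a divergence (cross-entropy) between this predicted measure and the observed next token.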
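The attention mechanism mentioned above can likewise be sketched in a few lines. This is an illustrative implementation of standard scaled dot-product self-attention, assuming the common formulation where each output vector is a softmax-weighted average of the input sequence, with weights given by a nonlinear function of pairwise inner products; the dimensions and random input are arbitrary.

```python
import numpy as np

def softmax_rows(z):
    """Row-wise softmax: each row becomes a probability vector."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: a nonlinear correlation between
    vector-valued sequences. Each output row is a convex combination
    of the rows of V, weighted by query-key similarity."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # pairwise similarities
    weights = softmax_rows(scores)      # each row is a probability measure
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))   # a sequence of 5 vectors in R^8
out = attention(X, X, X)      # self-attention: sequence attends to itself
print(out.shape)              # → (5, 8)
```

Note the connection to the talk's unifying theme: the attention weights form probability measures (each row sums to one), so attention can be read as an expectation of the value vectors under a data-dependent measure.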