In this seminar, I will give a concise overview of sequence modeling with Structured State Space Models. I will start by introducing sequence modeling and motivating the use of state space models (SSMs) for deep learning on sequential data, in particular for long sequences where prevalent architectures do not scale efficiently. We will look at several works [1-3] in this domain that leverage structured state space models: the basic formulation that initiated this line of research (HiPPO [1]), a complete architecture built on it that operates under certain structural assumptions (S4 [2]), and a more recent model that relaxes some of those assumptions to become more flexible and expressive (Mamba [3]). We will also explore the core techniques that make these models efficient on long sequences, such as discretization of the continuous-time system and an associative scan that parallelizes the recurrence. I will conclude by discussing open challenges with SSMs and outlining prospective research directions.
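To make the two techniques mentioned above concrete, here is a minimal sketch in JAX, assuming a diagonal SSM as in the S4/DSS line of work: zero-order-hold discretization of the continuous-time parameters, followed by an associative scan (via `jax.lax.associative_scan`) that computes the resulting linear recurrence in parallel. The function names `discretize_zoh` and `ssm_scan` are illustrative and not taken from any of the referenced codebases.

```python
import jax
import jax.numpy as jnp

def discretize_zoh(A, B, dt):
    """Zero-order-hold discretization of a diagonal continuous-time SSM.

    A: (N,) diagonal state matrix, B: (N,) input matrix, dt: step size.
    Returns (Abar, Bbar) such that x_t = Abar * x_{t-1} + Bbar * u_t.
    """
    Abar = jnp.exp(A * dt)
    Bbar = (Abar - 1.0) / A * B
    return Abar, Bbar

def ssm_scan(Abar, Bbar, u):
    """Evaluate x_t = Abar * x_{t-1} + Bbar * u_t for all t in parallel.

    u: (L,) input sequence. Returns the (L, N) sequence of states.
    Each step is an affine map x -> a*x + b; composing two such maps
    (a1, b1) then (a2, b2) gives (a2*a1, a2*b1 + b2), which is associative,
    so the whole recurrence can be computed with a parallel scan.
    """
    L = u.shape[0]
    a = jnp.broadcast_to(Abar, (L,) + Abar.shape)  # (L, N) multiplicative terms
    b = u[:, None] * Bbar                          # (L, N) additive terms

    def combine(left, right):
        a_l, b_l = left
        a_r, b_r = right
        return a_r * a_l, a_r * b_l + b_r

    _, states = jax.lax.associative_scan(combine, (a, b))
    return states  # states[t] is x_t (output y_t = C x_t would follow)

# Toy usage: a 4-dimensional diagonal SSM on a random length-16 input.
key = jax.random.PRNGKey(0)
A = -jnp.arange(1.0, 5.0)  # stable (negative) diagonal entries
B = jnp.ones(4)
u = jax.random.normal(key, (16,))
Abar, Bbar = discretize_zoh(A, B, dt=0.1)
x = ssm_scan(Abar, Bbar, u)
print(x.shape)  # (16, 4)
```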
Additional resources:
- HiPPO: Recurrent Memory with Optimal Polynomial Projections - https://arxiv.org/abs/2008.07669
- S4: Efficiently Modeling Long Sequences with Structured State Spaces - https://arxiv.org/abs/2111.00396
- Mamba (S6): Linear-Time Sequence Modeling with Selective State Spaces - https://arxiv.org/abs/2312.00752
- Resurrecting Recurrent Neural Networks for Long Sequences - https://arxiv.org/abs/2303.06349
- Diagonal State Spaces are as Effective as Structured State Spaces - https://arxiv.org/abs/2203.14343
- Notes on Mamba and the S4 model - https://github.com/hkproj/mamba-notes | https://www.youtube.com/watch?v=8Q_tqwpTpVU