
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Nov 13, 2023 · We propose FlashFFTConv, a new algorithm for efficiently computing the FFT convolution on GPUs. FlashFFTConv speeds up convolutions by up to 7.93x over PyTorch.
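FlashFFTConv's contribution is in how this computation is mapped onto tensor cores; the underlying operation it accelerates is the standard FFT convolution, sketched below in plain PyTorch (function name and shapes are illustrative, not FlashFFTConv's API):

```python
import torch

def fft_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal FFT convolution: u is (batch, seq_len), k is (seq_len,).
    Zero-padding to 2 * seq_len makes the circular convolution linear."""
    seq_len = u.shape[-1]
    u_f = torch.fft.rfft(u, n=2 * seq_len)
    k_f = torch.fft.rfft(k, n=2 * seq_len)
    y = torch.fft.irfft(u_f * k_f, n=2 * seq_len)
    return y[..., :seq_len]  # keep only the causal part
```

This runs in O(L log L) time but maps poorly onto GPU matrix units, which is the gap the fused FlashFFTConv kernel targets.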
Simple Long Convolutions for Sequence Modeling · Hazy Research
Feb 15, 2023 · In our new paper, we show that directly parameterizing the convolution kernel works surprisingly well – with a twist! We need to add a simple regularization, and then long convolutions match state-space models like S4 in quality.
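A minimal sketch of a directly parameterized long-convolution layer, using soft-thresholding ("squashing") of the kernel weights as one example of a simple regularization; the module name, initialization scale, and threshold value are illustrative assumptions, not the paper's exact recipe:

```python
import torch
import torch.nn as nn

class LongConv(nn.Module):
    # Directly learned per-channel kernel, as long as the sequence itself.
    def __init__(self, channels: int, seq_len: int, lam: float = 0.003):
        super().__init__()
        self.kernel = nn.Parameter(torch.randn(channels, seq_len) * 0.002)
        self.lam = lam  # soft-threshold level (illustrative value)

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # u: (batch, channels, seq_len)
        # Squash: shrink small kernel weights to zero before convolving.
        k = torch.sign(self.kernel) * torch.relu(self.kernel.abs() - self.lam)
        n = 2 * u.shape[-1]  # pad so the circular convolution becomes linear
        y = torch.fft.irfft(torch.fft.rfft(u, n=n) * torch.fft.rfft(k, n=n), n=n)
        return y[..., : u.shape[-1]]
```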
Long Convolutions for GPT-like Models: Polynomials, Fast Fourier ...
Dec 11, 2023 · Three options for what to do when multiplying polynomials, and what each means for the resulting convolution. Thus, to make Fourier models GPT-like, we need to pick the right one of these options.
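Concretely, the options correspond to different ways of turning a polynomial product into a convolution; a small sketch of the standard correspondence (my labeling, not necessarily the post's exact terminology):

```python
import torch

N = 4
u, k = torch.randn(N), torch.randn(N)  # coefficients of two degree-(N-1) polynomials

# Circular convolution: product mod (x^N - 1). Length-N FFTs, but outputs wrap around.
circular = torch.fft.irfft(torch.fft.rfft(u, n=N) * torch.fft.rfft(k, n=N), n=N)

# Full linear convolution: the ordinary product, degree 2N - 2. Needs padded FFTs.
full = torch.fft.irfft(
    torch.fft.rfft(u, n=2 * N) * torch.fft.rfft(k, n=2 * N), n=2 * N
)[: 2 * N - 1]

# Causal convolution: product mod x^N, i.e., the first N coefficients of the full product.
causal = full[:N]
```

A GPT-style model needs the causal option: output t may depend only on inputs up to position t.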
Hyena Hierarchy: Towards Larger Convolutional Language Models
Mar 7, 2023 · The Hyena operator is defined as a recurrence (whose depth controls layer size) of two efficient subquadratic primitives: an implicit long convolution (i.e., Hyena filters parameterized by a feedforward network) and multiplicative elementwise gating.
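A rough sketch of that recurrence; shapes and names are my own, and in the real operator the gates and filters come from learned projections and an implicit filter network rather than raw tensors:

```python
import torch

def fft_conv(z: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
    # Causal FFT convolution along the last dimension.
    n = 2 * z.shape[-1]
    return torch.fft.irfft(torch.fft.rfft(z, n=n) * torch.fft.rfft(h, n=n), n=n)[..., : z.shape[-1]]

def hyena_operator(v, gates, filters):
    """Order-N Hyena recurrence: alternate implicit long convolutions with
    elementwise gating. v and each gate: (batch, d, seq_len); each filter: (d, seq_len)."""
    z = v
    for x, h in zip(gates, filters):
        z = x * fft_conv(z, h)  # gate the convolved signal, then recurse
    return z
```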
From Deep to Long Learning? · Hazy Research
Mar 27, 2023 · It turns out that two simple insights led us to the answer: every SSM can be viewed as a convolution filter as long as the input sequence – so we can replace the SSM with a long convolution.
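Why this holds: unrolling a discrete linear SSM x_{t+1} = A x_t + B u_t, y_t = C x_t gives y = k * u with kernel k_t = C A^t B, so the whole SSM collapses into a single length-L filter. A sketch for the single-input, single-output case:

```python
import torch

def ssm_kernel(A: torch.Tensor, B: torch.Tensor, C: torch.Tensor, seq_len: int) -> torch.Tensor:
    """Materialize the convolution kernel k[t] = C @ A^t @ B of a discrete linear SSM.
    A: (d, d), B: (d,), C: (d,). Returns a kernel of length seq_len."""
    k = torch.empty(seq_len)
    x = B.clone()          # A^0 @ B
    for t in range(seq_len):
        k[t] = C @ x       # C @ A^t @ B
        x = A @ x
    return k
```

Replacing the SSM with a long convolution then just means learning k directly instead of A, B, and C.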
Monarchs and Butterflies: Towards Sub-Quadratic Scaling in …
Dec 11, 2023 · Monarch matrices are also the basic idea behind FlashFFTConv: since they generalize the FFT and are hardware-efficient, they form a natural building block for fast convolution algorithms.
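A Monarch matrix factors into a block-diagonal multiply, a fixed permutation, and another block-diagonal multiply. A sketch of the square case n = m², where the permutation is a transpose of the (m, m) grid (index conventions here are illustrative):

```python
import torch

def monarch_multiply(x: torch.Tensor, R: torch.Tensor, L: torch.Tensor) -> torch.Tensor:
    """y = (block-diag L) @ P @ (block-diag R) @ x for n = m * m.
    x: (batch, n); R, L: (m, m, m), i.e., m blocks of shape (m, m) each."""
    batch, n = x.shape
    m = R.shape[0]
    x = x.reshape(batch, m, m)
    y = torch.einsum("kqp,bkp->bkq", R, x)  # block k of R acts on chunk k
    y = y.transpose(1, 2)                   # the fixed permutation: transpose the grid
    y = torch.einsum("kqp,bkp->bkq", L, y)  # block k of L acts on permuted chunk k
    return y.reshape(batch, n)
```

Two batched m-by-m matmuls over √n blocks cost O(n^1.5) FLOPs instead of O(n²), and with DFT twiddle factors in the blocks this recovers a Cooley–Tukey FFT step, which is the sense in which Monarch generalizes the FFT.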
Zoology (Blogpost 2): Simple, Input-Dependent, and Sub-Quadratic Sequence Mixers
Dec 11, 2023 · In our paper, we prove that our gated convolution layer simulates all gated convolution architectures (H3, Hyena, RWKV, RetNet, etc.).
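For reference, the generic pattern such results are about: project the input, convolve one branch with a long filter, gate elementwise, and project out. A minimal sketch of this common shape, not the paper's exact layer:

```python
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    # Generic gated-convolution layer: H3- and Hyena-style layers follow this pattern.
    def __init__(self, d: int, seq_len: int):
        super().__init__()
        self.in_proj = nn.Linear(d, 2 * d)
        self.kernel = nn.Parameter(torch.randn(d, seq_len) * 0.02)
        self.out_proj = nn.Linear(d, d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d)
        u, gate = self.in_proj(x).chunk(2, dim=-1)
        n = 2 * x.shape[1]
        u_f = torch.fft.rfft(u.transpose(1, 2), n=n)   # (batch, d, freq)
        k_f = torch.fft.rfft(self.kernel, n=n)         # (d, freq)
        conv = torch.fft.irfft(u_f * k_f, n=n)[..., : x.shape[1]].transpose(1, 2)
        return self.out_proj(gate * conv)              # elementwise gating
```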
The Safari of Deep Signal Processing: Hyena and Beyond
Jun 8, 2023 · Spectrum of long convolution filters of Safari models (H3 and Hyena), alongside visualizations at initialization and after pretraining.
Simplifying S4 · Hazy Research
Jun 11, 2022 · Here the convolution is between two long sequences, so we use the standard FFT method for computing it. Much more important, however, is a subtlety with batching.
Monarch Mixer: Revisiting BERT, Without Attention or MLPs
Jul 25, 2023 · Extra convolution connection: for BERT, we found that adding an extra convolution (a "residual", so to speak) improved both performance on synthetic tasks and pretraining loss.
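A sketch of that extra convolution connection: a short depthwise convolution added alongside the block's main sequence mixer. The kernel size and wiring here are illustrative guesses, not the M2-BERT configuration:

```python
import torch
import torch.nn as nn

class MixerWithConvResidual(nn.Module):
    # Wrap any sequence mixer with an extra depthwise-convolution branch.
    def __init__(self, mixer: nn.Module, d: int, kernel_size: int = 3):
        super().__init__()
        self.mixer = mixer
        self.conv = nn.Conv1d(d, d, kernel_size, padding=kernel_size // 2, groups=d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d); the conv branch acts like a residual path.
        return self.mixer(x) + self.conv(x.transpose(1, 2)).transpose(1, 2)
```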