FlashAttention-4 Achieves 1.3x Speedup over cuDNN on NVIDIA Blackwell
TL;DR
Together AI released FlashAttention-4. It delivers up to 1.3x faster performance than cuDNN on NVIDIA Blackwell GPUs.
What changed
Together AI released FlashAttention-4. It achieves up to 1.3× faster performance than cuDNN on NVIDIA Blackwell GPUs. This update targets attention kernel optimizations for transformer models.
Why it matters
Developers building on NVIDIA Blackwell gain from FlashAttention-4 outperforming cuDNN by up to 1.3× in attention speed. This accelerates training and inference for large language models on the new hardware. Vibe Builders can integrate it to reduce compute time in custom AI pipelines.
What to watch for
Track FlashAttention-4 against cuDNN as the baseline alternative on Blackwell setups. Test it by installing from Together AI's repository and benchmarking your transformer workload on an NVIDIA Blackwell GPU. Monitor Together AI updates for broader GPU support beyond Blackwell.
Who this matters for
- Vibe Builders: Integrate FlashAttention-4 into your pipelines to cut compute time on Blackwell hardware.
Harsh’s take
FlashAttention-4 represents a significant leap in kernel optimization for the Blackwell architecture. This is a practical win for anyone managing large-scale transformer workloads where latency and throughput dictate the bottom line. Operators should prioritize testing this implementation immediately if they run custom training or inference stacks on Blackwell GPUs.
The performance delta is too large to ignore for production environments. Focus on benchmarking your specific model architectures against the new kernels to verify the gains. This release shifts the baseline for what developers should expect from their infrastructure providers regarding raw compute efficiency.
by Harsh Desai
More AI news
- LaunchAsian AI startups launch Mythos-like models as Anthropic export ban continues
Asian AI startups launched models with Mythos-like capabilities. The releases follow Anthropic's ongoing export restrictions.
- Daily RoundupGemini jetlag aid, OpenAI Jalapeño chip, and Vercel agent tools (daily focus hooks)
Google, Vercel, and OpenAI shipped practical AI updates while new models and benchmarks highlighted shifting hardware and capability limits.
- Model ReleaseOpenAI limits GPT-5.6 rollout after government request, says restrictions shouldn’t be the norm
OpenAI limited GPT-5.6 rollout after a government request. The company stated that such restrictions should not become the long-term default.