FlashAttention-4 Achieves 1.3x Speedup over cuDNN on NVIDIA Blackwell

By Harsh Desai, 12 May 2026

TL;DR

Together AI released FlashAttention-4, which runs attention up to 1.3x faster than cuDNN on NVIDIA Blackwell GPUs.

What changed

Together AI released FlashAttention-4, an update focused on attention kernel optimizations for transformer models. On NVIDIA Blackwell GPUs it runs up to 1.3× faster than cuDNN.

Why it matters

Developers building on NVIDIA Blackwell get attention kernels that outperform cuDNN by up to 1.3×. Because attention is a core operation in transformer models, this can speed up training and inference for large language models on the new hardware. Vibe Builders can integrate it to reduce compute time in custom AI pipelines.
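To put the "up to 1.3×" figure in context, the end-to-end effect can be roughly estimated with Amdahl's law, since only the attention part of a workload gets faster. This is a minimal sketch, not a measurement: the function name `end_to_end_speedup` and the attention shares of 20%, 50% and 100% are hypothetical values chosen for illustration, and 1.3 is the reported upper bound rather than a guaranteed gain.

```python
def end_to_end_speedup(attention_fraction: float, kernel_speedup: float = 1.3) -> float:
    """Amdahl's law: overall speedup when only the attention share gets faster.

    attention_fraction is the share of total runtime spent in attention (0..1).
    kernel_speedup is how much faster the attention kernel runs (1.3 = 1.3x).
    """
    if not 0.0 <= attention_fraction <= 1.0:
        raise ValueError("attention_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # Time left = untouched share + attention share shrunk by the kernel speedup.
    remaining_time = (1.0 - attention_fraction) + attention_fraction / kernel_speedup
    return 1.0 / remaining_time


# Hypothetical attention shares; measure your own workload to get real values.
for fraction in (0.2, 0.5, 1.0):
    print(f"attention share {fraction:.0%}: {end_to_end_speedup(fraction):.2f}x overall")
```

Under these assumed shares, a workload that spends half its time in attention would see roughly a 1.13× overall gain, so the benefit depends heavily on how attention-bound your model is.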

What to watch for

Use cuDNN as the baseline when evaluating FlashAttention-4 on Blackwell setups. To test it, install it from Together AI's repository and benchmark your own transformer workload on an NVIDIA Blackwell GPU. Watch Together AI's updates for GPU support beyond Blackwell.
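A benchmark of this kind can be sketched as a small timing harness. This is a minimal, assumption-heavy sketch: `baseline_attention` and `candidate_attention` are hypothetical CPU placeholders, not real cuDNN or FlashAttention-4 calls, because the article does not describe either API. Replace them with your own workload before drawing any conclusions.

```python
import statistics
import time
from typing import Callable


def median_seconds(fn: Callable[[], object], warmup: int = 3, repeats: int = 10) -> float:
    """Return the median wall-clock time of fn() over `repeats` timed runs."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        # On a GPU, synchronize the device inside fn (for example with
        # torch.cuda.synchronize()) so queued kernels finish before timing stops.
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Speedup of the candidate over the baseline (values above 1.0 mean faster)."""
    if candidate_seconds <= 0.0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds


# Hypothetical placeholders: swap in your cuDNN-backed and
# FlashAttention-4-backed attention calls here.
def baseline_attention() -> int:
    return sum(i * i for i in range(20_000))


def candidate_attention() -> int:
    return sum(i * i for i in range(10_000))


baseline_time = median_seconds(baseline_attention)
candidate_time = median_seconds(candidate_attention)
print(f"measured speedup: {speedup(baseline_time, candidate_time):.2f}x")
```

Taking the median over several runs after a warm-up reduces the effect of one-off stalls. The number printed here only reflects the placeholder functions and says nothing about either real kernel.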

Who this matters for

Vibe Builders: Integrate FlashAttention-4 into your pipelines to cut compute time on Blackwell hardware.

Harsh’s take

FlashAttention-4 represents a significant leap in kernel optimization for the Blackwell architecture. This is a practical win for anyone managing large-scale transformer workloads where latency and throughput dictate the bottom line. Operators should prioritize testing this implementation immediately if they run custom training or inference stacks on Blackwell GPUs.

The performance delta is too large to ignore for production environments. Focus on benchmarking your specific model architectures against the new kernels to verify the gains. This release shifts the baseline for what developers should expect from their infrastructure providers regarding raw compute efficiency.

Source: together.ai
