Dev Tools · 2h ago
NVIDIA's cuTile Rust brings safe GPU kernels at 96% of cuBLAS speed
NVIDIA researchers introduced cuTile Rust, a DSL that extends Rust's ownership model to GPU kernels so they can be written without unsafe code. On a B200, it reaches 7 TB/s on memory-bound operations and 2 PFlop/s on GEMM, roughly 96% of cuBLAS's GEMM throughput. The approach partitions tensors into disjoint tiles, so no two parallel workers can write the same data; this rules out data races without adding runtime overhead.
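The summary doesn't show cuTile Rust's API, so the following is only a CPU-side analogy in plain standard Rust (no cuTile calls) of why disjoint tiles make parallel writes safe: chunks_mut hands each scoped thread its own non-overlapping mutable slice, and the borrow checker proves at compile time that no two threads touch the same element. The names scale_tiles and TILE are hypothetical, chosen for illustration.

```rust
use std::thread;

/// Hypothetical tile size for this sketch; not a cuTile Rust parameter.
const TILE: usize = 4;

/// Multiply every element of `data` by `alpha`, one thread per disjoint tile.
/// `chunks_mut` yields non-overlapping `&mut [f32]` slices (the last one may be
/// shorter), so each thread exclusively owns its tile: no `unsafe`, no locks,
/// and no runtime race detection are needed.
fn scale_tiles(data: &mut [f32], alpha: f32) {
    // Scoped threads may borrow `data`; all of them are joined before return.
    thread::scope(|s| {
        for tile in data.chunks_mut(TILE) {
            // `move` transfers the unique borrow of this tile into the thread.
            s.spawn(move || {
                for x in tile.iter_mut() {
                    *x *= alpha;
                }
            });
        }
    });
}

fn main() {
    let mut v: Vec<f32> = (0..10).map(|i| i as f32).collect();
    scale_tiles(&mut v, 2.0);
    println!("{:?}", v);
}
```

In a GPU tile DSL the tiles would presumably be processed by thread blocks rather than OS threads; the sketch only illustrates the ownership argument, not cuTile Rust's code generation or performance.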
Meridian48 take
Near-parity with cuBLAS is impressive, but narrow platform support (sm_80+ GPUs on Linux only) limits its immediate practical impact.
rust · gpu-programming