
Why AI Can't Count the R's in Strawberry: BPE Tokenizers Explained

By Meridian48 News Desk · Summarised from DEV Community · July 3, 2026

Byte-Pair Encoding (BPE) tokenizers split text into subword tokens rather than individual letters, so large language models (LLMs) lose direct access to character-level information. A new interactive simulator shows how tokenization works and why models fail at simple letter-counting tasks such as counting the r's in "strawberry". The tool also shows that text which splits into more tokens inflates the token budget, which can raise API costs.
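To see why a word like "strawberry" stops being a sequence of letters, here is a minimal sketch of BPE training in Python. It assumes a tiny hypothetical corpus and merge count (real tokenizers learn tens of thousands of merges from far larger corpora) and treats each word on its own, ignoring spaces and byte-level details. Training repeatedly merges the most frequent adjacent pair of symbols; encoding replays those merges in order, and the resulting tokens are mapped to integer IDs, which is all the model actually receives.

```python
from collections import Counter

Word = tuple[str, ...]

def pair_counts(words: dict[Word, int]) -> Counter:
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    counts: Counter = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts

def apply_merge(pair: tuple[str, str], words: dict[Word, int]) -> dict[Word, int]:
    """Replace every occurrence of `pair` inside each word with the merged symbol."""
    merged: dict[Word, int] = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

def train_bpe(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Start from single characters and repeatedly merge the most frequent pair."""
    words = dict(Counter(tuple(w) for w in corpus))
    merges = []
    for _ in range(num_merges):
        counts = pair_counts(words)
        if not counts:
            break  # every word is already a single symbol
        best = max(counts, key=counts.get)
        merges.append(best)
        words = apply_merge(best, words)
    return merges

def encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Tokenize one word by replaying the learned merges in training order."""
    symbols = {tuple(word): 1}
    for pair in merges:
        symbols = apply_merge(pair, symbols)
    return list(next(iter(symbols)))

# Hypothetical toy corpus; real tokenizers train on vastly more text.
corpus = ["straw", "straw", "berry", "berry", "berry", "strawberry", "raw"]
merges = train_bpe(corpus, num_merges=10)
tokens = encode("strawberry", merges)

# Toy ID table built from this word's tokens only: the model sees these integers, not letters.
vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[t] for t in tokens]

print(tokens, ids)
# Counting the r's means looking inside each token's spelling, which the IDs hide.
print(sum(t.count("r") for t in tokens))  # 3
```

With this corpus the very first learned merge joins "r" and "a", so "strawberry" already comes out as fewer than ten symbols; more merges shrink it further, and the letter count is only recoverable by decoding the tokens back into text.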

Meridian48 take

The strawberry blindness is a neat demo, but the real takeaway is that tokenization quirks affect everything from cost to reasoning, and most users have no idea.
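The cost side follows from simple arithmetic: LLM APIs bill per token, so anything that makes the same message split into more tokens raises the bill. The sketch below uses a greedy longest-match tokenizer as a stand-in for BPE (a deliberate simplification, not how BPE itself encodes), with a hypothetical five-entry vocabulary and a hypothetical price per million tokens. Misspelled words miss the vocabulary and fall back to single characters, which is one way sloppy text inflates the token count.

```python
# Hypothetical toy vocabulary; real BPE vocabularies hold tens of thousands of learned tokens.
VOCAB = {"straw", "berry", "the", "is", "red"}
MAX_LEN = max(len(t) for t in VOCAB)

# Hypothetical price in USD per one million input tokens.
PRICE_PER_MILLION = 3.00

def greedy_tokenize(text: str) -> list[str]:
    """Greedy longest match against VOCAB, falling back to single characters.
    Whitespace is simply dropped here, which real tokenizers do not do."""
    tokens: list[str] = []
    for word in text.split():
        i = 0
        while i < len(word):
            # Try the longest candidate first; a single character always matches.
            for size in range(min(MAX_LEN, len(word) - i), 0, -1):
                piece = word[i:i + size]
                if size == 1 or piece in VOCAB:
                    tokens.append(piece)
                    i += size
                    break
    return tokens

def estimated_cost(text: str) -> float:
    """Cost scales linearly with the number of tokens."""
    return len(greedy_tokenize(text)) * PRICE_PER_MILLION / 1_000_000

clean = "the strawberry is red"
typo = "teh strawbery is red"

print(greedy_tokenize(clean))  # ['the', 'straw', 'berry', 'is', 'red']
print(greedy_tokenize(typo))   # ['t', 'e', 'h', 'straw', 'b', 'e', 'r', 'y', 'is', 'red']
print(estimated_cost(typo) / estimated_cost(clean))  # 2.0
```

In this toy setup two typos double the token count, and the estimated cost doubles with it; real tokenizers degrade more gracefully, but the direction of the effect is the same.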

Read the full reporting: "Day 3: Watch your grammar with AI, it may cost you — Understanding BPE Tokenizers 🍓🔡" (DEV Community)
