Dev Tools · 1h ago
Tiny local models can write code if harnesses stop discarding right answers
An experiment that forced Gemma 4 2B to write real code with no cloud fallback found that 60% of failures came from broken indentation rather than logic errors. Changing the harness to re-indent otherwise correct code, instead of rejecting it, raised scores from 64 to 76 out of 100. The author concludes that small models excel at bounded tasks but fail at planning and self-review.
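To make the harness fix concrete, here is a minimal Python sketch of one way a harness could try to repair indentation before discarding a model's output. It is an illustration under stated assumptions, not the author's implementation: the function names `parses` and `repair_indentation` are hypothetical, the generated code is assumed to be Python, and the only repair attempted is converting tabs to spaces and removing indentation shared by every line.

```python
import ast
import textwrap
from typing import Optional


def parses(code: str) -> bool:
    """Return True if `code` is syntactically valid Python."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:  # IndentationError and TabError are subclasses
        return False


def repair_indentation(code: str) -> Optional[str]:
    """Return a parseable version of `code`, or None if the simple fix fails.

    Hypothetical helper: the repair strategy here is an assumption,
    not the method used in the original experiment.
    """
    if parses(code):
        return code  # nothing to repair

    # Candidate fix: tabs -> 4 spaces, then strip indentation common to all lines.
    candidate = textwrap.dedent(code.expandtabs(4))
    if parses(candidate):
        return candidate

    return None  # still broken; the harness can now treat it as a real failure


if __name__ == "__main__":
    # Model output that is logically fine but uniformly over-indented.
    raw = "    def add(a, b):\n        return a + b\n"
    print(repair_indentation(raw))
```

A harness built this way would score the repaired text rather than the raw output, so only code that stays unparseable after the repair counts as a failure. More aggressive repairs, such as re-indenting individual lines, would need more care to avoid changing the program's meaning.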
Meridian48 take
The piece offers practical lessons for local AI development, but its small sample size and single-model focus limit generalizability.
Read the full reporting
I spent ten days forcing tiny local models to write real code. Here's what actually breaks.
DEV Community
local-ai-models · code-generation