
Training an 860B Legal AI on 2TB of Ukrainian Court Data

By Meridian48 News Desk · Summarised from DEV Community · July 3, 2026

SecondLayer plans to train an 860B-parameter mixture-of-experts (MoE) model on 2TB of Ukrainian legal data, including legislation and 96.2 million court decisions. After cleanup, the dataset would yield 280-330 billion tokens, roughly 50x smaller than DeepSeek V3's training set but highly domain-specific. The model, built on a scaled-up DeepSeek V3 architecture, aims to provide cheaper inference for legal applications.

Meridian48 take

The project is ambitious, but it has very little training data relative to its parameter count; success hinges on whether focused legal data can compensate for the lack of general pretraining.
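
The data-to-parameter concern can be made concrete with a little arithmetic. The sketch below uses the figures quoted in the summary (860B parameters, 280-330 billion tokens). The DeepSeek V3 corpus size of 14.8 trillion tokens is an assumption taken from that model's published technical report, not from this summary, and the helper names are purely illustrative. It counts total parameters; an MoE model activates only a fraction of them per token, and the summary does not give that figure.

```python
# Figures quoted in the summary.
TOTAL_PARAMS = 860e9                      # 860B total parameters
TOKENS_LOW, TOKENS_HIGH = 280e9, 330e9    # estimated tokens after cleanup

# Assumption: DeepSeek V3's reported pretraining corpus (14.8T tokens).
# This number does not appear in the summary itself.
DEEPSEEK_V3_TOKENS = 14.8e12


def tokens_per_param(tokens: float, params: float) -> float:
    """Training tokens available per model parameter."""
    return tokens / params


def corpus_ratio(reference_tokens: float, tokens: float) -> float:
    """How many times larger the reference corpus is than ours."""
    return reference_tokens / tokens


if __name__ == "__main__":
    for tokens in (TOKENS_LOW, TOKENS_HIGH):
        tpp = tokens_per_param(tokens, TOTAL_PARAMS)
        ratio = corpus_ratio(DEEPSEEK_V3_TOKENS, tokens)
        print(
            f"{tokens / 1e9:.0f}B tokens: "
            f"{tpp:.2f} tokens per parameter, "
            f"reference corpus is {ratio:.0f}x larger"
        )
```

Under these assumptions the corpus supplies well under one token per parameter, and the "50x smaller" figure falls inside the range the two token estimates produce.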

Read the full reporting

2 TB of Ukrainian Law + DeepSeek V3 860B on GCP: What We'd Get →

DEV Community

ukrainian-law · mixture-of-experts
