
Claude Code Quality Collapse: 1,352 HN Points and AMD Engineer Logs 7,000 Sessions of Regression

Developers say Claude Code got dramatically worse after Anthropic's February 2026 redact-thinking rollout. AMD's AI director analyzed 7,000 sessions showing read:edit ratio collapsed from 6.6 to 2.0 and stop-hook violations went from 0 to 173/day. A single HN complaint thread hit 1,352 points and 748 comments; a Reddit post about the regression racked up 1,060 upvotes. Anthropic staff acknowledged the issue but defended adaptive-thinking optimization.
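For intuition, the read:edit ratio cited above is simply the count of file-read tool calls divided by edit tool calls per session. A minimal sketch, assuming hypothetical "read"/"edit" tool-call names rather than Claude Code's actual log schema:

```python
from collections import Counter

def read_edit_ratio(tool_calls: list[str]) -> float:
    """Reads-per-edit from an ordered list of tool-call names.
    'read' and 'edit' are placeholder names, not a real log format."""
    counts = Counter(tool_calls)
    edits = counts["edit"]
    return counts["read"] / edits if edits else float("inf")

# A careful session reads widely before editing...
careful = ["read"] * 33 + ["edit"] * 5      # ratio 6.6
# ...while a regressed session edits almost blindly.
regressed = ["read"] * 10 + ["edit"] * 5    # ratio 2.0
```

On numbers like the ones in the signal, a drop from 6.6 to 2.0 means the agent is reading roughly a third as much context per edit.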

Product Idea from this Signal

A background service that benchmarks every AI coding agent session against a frozen test suite and alerts when quality silently regresses

2.4k ▲

Anthropic's February 2026 redact-thinking rollout silently degraded Claude Code quality for weeks before users noticed. AMD's AI director had to manually analyze 7,000 sessions to prove the regression, finding that read-to-edit ratios collapsed from 6.6 to 2.0 and stop-hook violations went from 0 to 173 per day. Teams paying $2.5B annualized for these agents have zero visibility into when the model silently gets worse. This background service runs a frozen benchmark suite against every agent session locally, diffs results against a rolling baseline, and alerts the team the moment quality drops by more than a configurable threshold.
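As a sketch of the alerting core, here is one way the rolling-baseline comparison could work. All names (the detector class, the metric fields) are illustrative assumptions, not a real product API:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class SessionMetrics:
    # Hypothetical per-session metrics, mirroring the ones cited in the signal.
    read_edit_ratio: float      # file reads per edit (higher = more deliberate)
    stop_hook_violations: int   # times the agent ignored a stop hook

class RegressionDetector:
    """Keeps a rolling baseline of recent sessions and flags drops
    beyond a configurable relative threshold. Illustrative sketch only."""

    def __init__(self, window: int = 50, threshold: float = 0.25):
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, m: SessionMetrics) -> list[str]:
        alerts = []
        # Only compare once a full baseline window has been collected.
        if len(self.window) == self.window.maxlen:
            baseline = sum(s.read_edit_ratio for s in self.window) / len(self.window)
            if m.read_edit_ratio < baseline * (1 - self.threshold):
                alerts.append(
                    f"read:edit ratio {m.read_edit_ratio:.1f} fell more than "
                    f"{self.threshold:.0%} below baseline {baseline:.1f}"
                )
            if m.stop_hook_violations > 0 and all(
                s.stop_hook_violations == 0 for s in self.window
            ):
                alerts.append("stop-hook violations appeared after a clean baseline")
        self.window.append(m)
        return alerts
```

Fed the figures from the AMD analysis (baseline sessions around 6.6 reads per edit with zero stop-hook violations, then a session at 2.0 with violations), a detector like this would fire on both metrics at once. A real service would also have to handle noisy per-session variance, e.g. by comparing window medians rather than single sessions.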

DEVTOOL · OBSERVABILITY · CLI · AI-AGENTS · TESTING · OPEN-SOURCE
Competitive

Score Breakdown

HN: 2,100
Issues: 322

Gap Assessment

Wide Open: no dedicated solution exists

No third-party tool resolves silent model regressions inside Claude Code; users rely on internal env flags and /effort workarounds.