
In a controlled experiment, nine autonomous Claude instances dramatically outperformed human researchers on an open alignment problem. But when Anthropic tried to transfer the winning method to its own production models, the effect vanished.
The article Claude beat human researchers on an alignment task, and then the results vanished in production appeared first on The Decoder.



