Discovering Language Model Behaviors with Model-Written Evaluations — LessWrong February 5, 2026 by jlamprecht https://www.lesswrong.com/posts/yRAo2KEGWenKYZG9K/discovering-language-model-behaviors-with-model-written?commentId=dFnCAH727oXyNqjGD