How to get URL link on X (Twitter) App
https://twitter.com/IvanArcus/status/2021592600554168414
Quick recap: our pipeline detects "unverbalized biases" - factors that systematically shift LLM decisions without being mentioned in chain-of-thought.
We call these "unverbalized biases": decision factors that systematically influence outputs but are never cited as such.
When asked the same question with arguments reversed, LLMs often do not change their answers, which is logically impossible. The CoT justifies the contradictory conclusions, showing motivated reasoning.