JULI: Jailbreak Large Language Models by Self-Introspection
Proceedings of the International Conference on Learning Representations (ICLR), 2026
We propose Jailbreaking Using LLM Introspection (JULI), which jailbreaks LLMs by manipulating their token log probabilities with a tiny plug-in block, BiasNet.
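As a rough illustration of the core idea only (BiasNet's actual architecture, size, and training procedure are not described here, so everything below is an assumption), a plug-in that manipulates an LLM's next-token log probabilities can be sketched as a small module that maps a hidden state to an additive bias over the vocabulary logits before sampling:

```python
import numpy as np

class BiasNet:
    """Hypothetical tiny plug-in: maps a hidden state to an additive
    bias over the vocabulary logits (a sketch, not the paper's model)."""
    def __init__(self, hidden_dim, vocab_size, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = rng.normal(scale=0.01, size=(hidden_dim, vocab_size))
        self.b = np.zeros(vocab_size)

    def __call__(self, hidden_state):
        # Linear map from hidden state to a per-token logit bias.
        return hidden_state @ self.W + self.b

def biased_log_probs(logits, hidden_state, bias_net):
    """Add BiasNet's bias to the base logits, then renormalize
    into a valid log-probability distribution."""
    shifted = logits + bias_net(hidden_state)
    shifted = shifted - shifted.max()          # numerical stability
    return shifted - np.log(np.exp(shifted).sum())

# Toy usage: 4-dim hidden state, 10-token vocabulary.
rng = np.random.default_rng(1)
net = BiasNet(hidden_dim=4, vocab_size=10)
logits = rng.normal(size=10)
hidden = rng.normal(size=4)
log_p = biased_log_probs(logits, hidden, net)
```

In a real attack setting the bias module would be trained toward some objective; here it only demonstrates how a small add-on can reshape the token distribution without touching the base model's weights.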
