Abstract
The AI alignment problem concerns ensuring that AI systems, including artificial general intelligence and artificial superintelligence, behave in accordance with human values and interests, and it presents significant challenges. As AI advances, concerns about control and existential risk become increasingly pressing. Here, we introduce the concepts of agentic influenceability, behavioral neurodivergent diversity, opinion attacks, associated opinions, and influenceability scores, together with a mathematical proof, based on formal undecidability and irreducibility arguments, of the inevitability of misalignment and the impossibility of fully orchestrated controllability of agentic systems. We explore whether embracing this inevitable misalignment can foster a dynamic ecosystem of adversarial and collaborative AI agents without central orchestration, since any central orchestrator would itself constitute another agent, while still offering some degree of soft controllability. The investigation demonstrates that misalignment in foundation models can serve as a counterbalancing mechanism, enabling cooperation among the agents most aligned with human interests to prevent any single agent from achieving divergent dominance. Experiments with large language models show that open models exhibit greater behavioral diversity, whereas proprietary models, constrained by artificial guardrails, display more limited controllability. The findings advocate for neurodivergent influenceability as a contingent response to mathematically uncontrollable misalignment, leveraging agent divergence to improve AI safety.