Google DeepMind has launched an AI Control Roadmap to secure its AI agents against potential misalignment and other threats. The roadmap takes a 'defense-in-depth' approach, combining traditional security measures with model alignment to manage agent behavior. It treats AI agents as potential insider threats and draws on the MITRE ATT&CK framework to identify and mitigate risks. The aim is to ensure that AI systems remain secure and aligned with human goals as they become more advanced.