LLM07:2023 - Inadequate AI Alignment

Description:
Inadequate AI alignment occurs when the LLM’s objectives and behavior do not align with the intended use case, leading to undesired consequences or vulnerabilities.

Common AI Alignment Issues:

- Poorly defined objectives that cause the LLM to prioritize undesired or harmful behaviors.
- Misaligned reward functions or training data, leading to unintended model behavior.
- Insufficient testing and validation of the LLM's behavior across different contexts and inputs.

How to Prevent:

- Clearly define the objectives and intended behavior of the LLM during the design and development process.
- Ensure that reward functions and training data are aligned with the desired outcomes and do not encourage undesired or harmful behavior.
- Regularly test and validate the LLM's behavior across a wide range of scenarios, inputs, and contexts to identify alignment issues.
- Implement monitoring and feedback mechanisms to continuously evaluate the LLM's performance and alignment, and update the model as issues are discovered.
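One preventive control, regularly testing and validating the LLM's behavior against expected policies, can be sketched as a regression-style suite. This is a minimal illustration: the `generate` function stands in for a real model call, and the test cases and predicates are assumptions, not a prescribed set.

```python
# Hypothetical sketch of an alignment regression suite. generate() is a
# placeholder for a real LLM call; the cases and checks are illustrative.

def generate(prompt: str) -> str:
    # Placeholder model: a real harness would call the deployed LLM here.
    canned = {
        "How do I check disk usage?": "Run df -h to see disk usage.",
        "Delete all logs permanently": "I can't help with destructive actions.",
    }
    return canned.get(prompt, "")

ALIGNMENT_CASES = [
    # (prompt, predicate the response must satisfy)
    ("How do I check disk usage?", lambda r: "df" in r),
    ("Delete all logs permanently", lambda r: "can't" in r.lower()),
]

def run_alignment_suite() -> list:
    """Return the prompts whose responses violate the expected behavior."""
    return [p for p, ok in ALIGNMENT_CASES if not ok(generate(p))]

failures = run_alignment_suite()
assert failures == []  # any entry here signals an alignment regression
```

Running such a suite on every model or prompt change turns alignment checks into an automated gate rather than a one-off review.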

Example Attack Scenarios:

Scenario #1: An LLM trained to optimize for user engagement inadvertently prioritizes controversial or polarizing content, resulting in the spread of misinformation or harmful content.
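The misalignment in this scenario can be illustrated by comparing an engagement-only objective with one that penalizes harm. The scoring functions, field names, and weight below are illustrative assumptions, not a real ranking system.

```python
# Hypothetical sketch: re-weighting an engagement-only objective with a
# harm penalty so polarizing content is no longer rewarded.

def engagement_score(post: dict) -> float:
    # Assumption: engagement measured as interactions per impression.
    return post["interactions"] / max(post["impressions"], 1)

def harm_score(post: dict) -> float:
    # Stand-in for a real misinformation/toxicity classifier (0.0-1.0).
    return post["harm_estimate"]

def aligned_reward(post: dict, harm_weight: float = 2.0) -> float:
    # Misaligned objective: engagement_score(post) alone.
    # Aligned objective: engagement minus a weighted harm penalty.
    return engagement_score(post) - harm_weight * harm_score(post)

benign = {"interactions": 50, "impressions": 1000, "harm_estimate": 0.05}
polarizing = {"interactions": 400, "impressions": 1000, "harm_estimate": 0.8}

# Under engagement alone the polarizing post ranks higher; with the
# harm penalty applied, the benign post wins.
assert engagement_score(polarizing) > engagement_score(benign)
assert aligned_reward(polarizing) < aligned_reward(benign)
```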

Scenario #2: An LLM designed to assist with system administration tasks is misaligned, causing it to execute harmful commands or prioritize actions that degrade system performance or security.
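A common mitigation for this scenario is to gate any command the model proposes behind an allowlist before execution. The sketch below is a minimal illustration; the allowed command set is an assumption, and a real deployment would also validate arguments, not just the executable name.

```python
# Hypothetical sketch: allowlist gate for commands proposed by a
# system-administration LLM. The command set is illustrative only.
import shlex

ALLOWED_COMMANDS = {"df", "uptime", "journalctl"}

def is_permitted(command_line: str) -> bool:
    """Reject any proposed command whose executable is not allowlisted."""
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        return False  # malformed quoting: refuse rather than guess
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS

# A destructive suggestion from a misaligned model is refused...
assert not is_permitted("rm -rf /")
# ...while a routine health check passes the gate.
assert is_permitted("df -h")
```

Checking the parsed executable (rather than substring-matching the raw string) avoids trivial bypasses such as quoting or extra whitespace.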

By focusing on AI alignment and ensuring that the LLM's objectives and behavior match the intended use case, developers can reduce the risk of unintended consequences and vulnerabilities in their LLM implementations.