LLM10:2023 - Training Data Poisoning

Description:
Training data poisoning occurs when an attacker manipulates the training data or fine-tuning procedures of an LLM to introduce vulnerabilities, backdoors, or biases that could compromise the model’s security, effectiveness, or ethical behavior.

Common Training Data Poisoning Issues:

How to Prevent:

Example Attack Scenarios: Scenario #1: An attacker infiltrates the training data pipeline and injects malicious data, causing the LLM to produce harmful or inappropriate responses.

Scenario #2: A malicious insider compromises the fine-tuning process, introducing vulnerabilities or backdoors into the LLM that can be exploited at a later stage.

By ensuring the integrity of the training data, implementing robust data sanitization techniques, and regularly auditing the LLM’s training and fine-tuning processes, developers can minimize the risk of training data poisoning and protect their LLMs from potential vulnerabilities.