Description:
Training data poisoning occurs when an attacker manipulates the training data or fine-tuning procedures of an LLM to introduce vulnerabilities, backdoors, or biases that could compromise the model’s security, effectiveness, or ethical behavior.
Common Training Data Poisoning Issues:
How to Prevent:
Example Attack Scenarios: Scenario #1: An attacker infiltrates the training data pipeline and injects malicious data, causing the LLM to produce harmful or inappropriate responses.
Scenario #2: A malicious insider compromises the fine-tuning process, introducing vulnerabilities or backdoors into the LLM that can be exploited at a later stage.
By ensuring the integrity of the training data, implementing robust data sanitization techniques, and regularly auditing the LLM’s training and fine-tuning processes, developers can minimize the risk of training data poisoning and protect their LLMs from potential vulnerabilities.