Adversarial machine learning is an increasingly important research domain that explores attacks on learning algorithms, in which an adversary attempts to influence predictions or the learned model itself (e.g., via data poisoning), and that develops defense mechanisms against such tampering. Research in the field started in the early 2000s (e.g., Dalvi et al., 2004), but it attracted considerable attention only in the mid-2010s, after Szegedy et al. (2013) clearly demonstrated the vulnerability of deep neural networks to maliciously crafted inputs.
Attacks against supervised machine learning algorithms can be classified by
- Attributes of the Attacker
  - The Capability of the Attacker
    - Attack influence
      - Causative: the attacker can manipulate both training and test data (poisoning attacks; see the first sketch after this list)
      - Exploratory: the attacker can manipulate test data only (evasion attacks; see the second sketch after this list)
    - Presence or absence of data manipulation constraints
  - The Goal of the Attacker
    - Security violation
      - Integrity violation: attempts to get malicious samples misclassified as legitimate
      - Availability violation: increases the misclassification rate of legitimate samples to make the classifier unusable (DoS)
      - Privacy violation
    - Attack specificity
      - Targeted: the attack focuses on specific samples
      - Indiscriminate: the attack is not limited to specific samples
  - The Knowledge Available to the Attacker
    - Perfect knowledge (white box) attacks
    - Limited knowledge (gray box) attacks
    - Zero knowledge (black box) attacks
- Attack strategy
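
To make the causative/exploratory distinction concrete, below is a minimal sketch of a causative (poisoning) attack: an attacker with write access to the training set flips the labels of a random fraction of training samples before the model is fit. The function name, the flip rate, and the synthetic labels are illustrative assumptions, not a specific published attack.

```python
import numpy as np

def label_flip_poisoning(y_train, flip_fraction, rng):
    """Poison binary training labels by flipping a random subset.

    y_train       : array of 0/1 labels the attacker can manipulate
    flip_fraction : fraction of labels to flip (the attacker's budget)
    rng           : NumPy random generator
    """
    y_poisoned = y_train.copy()
    n_flip = int(flip_fraction * len(y_train))
    flip_idx = rng.choice(len(y_train), size=n_flip, replace=False)
    y_poisoned[flip_idx] = 1 - y_poisoned[flip_idx]  # flip 0 <-> 1
    return y_poisoned

# Example: poison 20% of 100 synthetic labels before training.
rng = np.random.default_rng(0)
y_clean = rng.integers(0, 2, size=100)
y_dirty = label_flip_poisoning(y_clean, flip_fraction=0.2, rng=rng)
```

Because the manipulated samples enter training, the learned model itself is corrupted; when the flips are indiscriminate, the effect is an availability violation.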
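The second sketch illustrates a white-box (perfect-knowledge) evasion attack using the fast gradient sign method (FGSM) of Goodfellow et al. (2015), here applied to a plain logistic-regression model for simplicity: the attacker, who knows the weights, perturbs a test sample within an L-infinity budget to push it across the decision boundary. The model choice and all names are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_evasion(x, y, w, b, eps):
    """White-box FGSM: perturb test input x to raise the loss on true label y.

    x    : input feature vector (test-time data the attacker controls)
    y    : true label (0 or 1)
    w, b : model weights and bias, known to the attacker (white box)
    eps  : maximum per-feature perturbation (L-infinity budget)
    """
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w  # gradient of the cross-entropy loss w.r.t. x
    # Indiscriminate variant: step to *increase* the loss on the true label.
    # A targeted variant would instead decrease the loss on a chosen target label.
    return x + eps * np.sign(grad_x)

# Example: craft an adversarial version of one synthetic sample.
rng = np.random.default_rng(1)
w, b = rng.normal(size=5), 0.0
x, y = rng.normal(size=5), 1
x_adv = fgsm_evasion(x, y, w, b, eps=0.25)
```

This gradient step is only available with perfect knowledge of the model; in gray- or black-box settings the attacker must approximate it, for example via a surrogate model or by querying the classifier.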
For more on adversarial machine learning in cybersecurity, see AI in Cybersecurity.