Adversarial Attacks: The Weak Spot of Black Box Algorithms
May 13, 2019

A recent study in the journal Science received coverage in the New York Times and other news outlets for its discussion of “adversarial attacks”: small manipulations to data that can cause an AI system to fail. In one of the study’s examples, Finlayson and other researchers made changes (called perturbations) to a tiny number of pixels in an image of a benign mole, which fooled the AI system into classifying the mole as malignant with 100% confidence. [i]

Finlayson and his colleagues argue that scenarios like these need to be considered when developing policies for the safe development and regulation of AI technologies in healthcare. Certainly, for an AI to be trusted for use in patient care, there must be some degree of confidence that the system will not experience catastrophic failure when small perturbations are made to the input image.

Not all AI systems are easily fooled by adversarials

Fortunately, not all AI systems are so easily fooled. Studies by Lynch et al.[ii] and Shah et al.[iii] found that single neural network (or so-called “black box”) algorithms are more susceptible to catastrophic failure than biomarker-based algorithms, which have a more robust framework.

The researchers set out to determine the susceptibility of various AI algorithms to catastrophic failure due to small perturbations in the image. They focused on AI algorithms that diagnose diabetic retinopathy, a complication of diabetes and a leading cause of blindness. To do this, the researchers used pictures of the retina (the back part of the eye) and made small adjustments to the pixels in the images that are invisible to the human eye.

A few pixels changed can lead to catastrophic failure

“We found that single neural network algorithm designs are prone to catastrophic failure from invisible perturbations,” said Shah. “It only takes changing a few pixels of an image to cause the algorithm to falsely report that there was no disease, even though there were clear signs of disease in the image.”

In the above image, these two retinas look identical to the naked eye. Any trained eye care specialist would look at both of these images and immediately recognize the presence of disease – exudates, hemorrhages and other lesions, which are biomarkers for retinal vasculopathy. Yet the image on the right, which had slight changes made to its pixels, causes single network designs to fail to recognize the obvious signs of disease 97% of the time.

Although the algorithm performed favorably under normal circumstances, it failed catastrophically when presented with adversarial input, and this held across different network designs and different training data sets.
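To make the idea of an invisible perturbation concrete, here is a minimal sketch in Python. It is not the method used in the studies cited here: the "classifier" is a toy logistic model, the weights, bias, image values and epsilon are all made-up illustrative numbers, and the attack nudges every pixel slightly (in the style of a fast gradient sign attack) rather than changing only a few pixels.

```python
import math


def disease_probability(weights, bias, pixels):
    """Toy logistic model: probability that the image shows disease."""
    score = sum(w * x for w, x in zip(weights, pixels)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def perturb(weights, pixels, epsilon):
    """Move each pixel by at most epsilon in the direction that lowers the score.

    For a linear score the gradient with respect to pixel i is weights[i],
    so stepping against sign(weights[i]) pushes the output toward "no disease".
    Pixel values are clipped to the valid range [0, 1].
    """
    return [
        min(1.0, max(0.0, x - epsilon * sign(w)))
        for w, x in zip(weights, pixels)
    ]


# Hypothetical model and image, chosen only so the arithmetic is easy to follow.
N_PIXELS = 10_000
weights = [0.02 if i % 2 == 0 else -0.02 for i in range(N_PIXELS)]
bias = -17.0
image = [0.6 if i % 2 == 0 else 0.4 for i in range(N_PIXELS)]
epsilon = 0.02  # each pixel moves by 2% of its full range

adversarial = perturb(weights, image, epsilon)
clean_p = disease_probability(weights, bias, image)
adv_p = disease_probability(weights, bias, adversarial)
largest_change = max(abs(a - b) for a, b in zip(adversarial, image))

print(f"clean image:       P(disease) = {clean_p:.3f}")
print(f"adversarial image: P(disease) = {adv_p:.3f}")
print(f"largest change to any pixel:  {largest_change:.3f}")
```

In this toy setup no pixel moves by more than 2% of its range, yet the model's output drops from well above 0.9 to below 0.5, so a "disease" call becomes "no disease". The many tiny changes add up because they all push the score in the same direction.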

Biomarker-based AI systems are more robust to adversarials

The same study also measured the susceptibility of biomarker-based AI designs. Such designs use multiple, partially dependent detectors, much as the visual cortex of primates, including that of human clinicians, has evolved to do.

“We found that biomarker-based AI designs are more robust to adversarial input than single neural network designs,” said Shah.

The risk involved with the “black box” approach is an important reason why the first FDA-cleared autonomous AI diagnostic system uses a biomarker-based architecture. The system, called IDx-DR, has detectors that respond to the same abnormalities and biomarkers that a clinician does, and these act as a system of checks and balances should the image-based algorithm be faced with perturbed inputs.
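The checks-and-balances idea can be illustrated with a small Python sketch. This is purely an assumption-laden toy and not a description of how IDx-DR works internally: the lesion names, feature values, weights and the "refer if any detector fires" rule are all hypothetical. It shows only that a perturbation tuned to silence one detector need not change the overall decision when another detector still responds. It does not show that such a design resists an attacker who targets every detector at once.

```python
def detector_fires(weights, bias, features):
    """A toy lesion detector: fires when its linear score is positive."""
    return sum(w * x for w, x in zip(weights, features)) + bias > 0


def refer_patient(detectors, features_by_lesion):
    """Hypothetical combination rule: flag disease if any detector fires."""
    return any(
        detector_fires(d["weights"], d["bias"], features_by_lesion[name])
        for name, d in detectors.items()
    )


# Made-up detectors, one per biomarker.
detectors = {
    "exudates": {"weights": [1.0, 1.0], "bias": -1.0},
    "hemorrhages": {"weights": [1.0, 1.0], "bias": -1.0},
}

# Made-up feature values for a diseased retina.
clean = {"exudates": [0.8, 0.7], "hemorrhages": [0.9, 0.6]}

# A perturbation that suppresses only the exudate features.
attacked = {"exudates": [0.4, 0.3], "hemorrhages": [0.9, 0.6]}

exudates = detectors["exudates"]
exudates_still_fire = detector_fires(
    exudates["weights"], exudates["bias"], attacked["exudates"]
)

print("exudate detector fires after attack:", exudates_still_fire)
print("patient referred (clean image):     ", refer_patient(detectors, clean))
print("patient referred (attacked image):  ", refer_patient(detectors, attacked))
```

Here the exudate detector is fooled, but the hemorrhage detector still fires, so the patient is still referred.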

An AI’s vulnerability to adversarials raises questions about patient safety

While “black box” algorithms can certainly be high-performing, their increased vulnerability to adversarial attacks raises questions about whether they are transparent enough to satisfy the principles of safety, efficacy and equity of AI in patient care.

“Adversarial attacks represent a challenging and fascinating component of AI safety, in part because our understanding of the risks, incentives, and potential defenses all continue to be in rapid flux,” said Finlayson. “My colleagues and I were happy to hear that IDx has been taking these issues seriously, which is a real testament to their thoughtful leadership as the first company to have a fully autonomous AI device in the clinic.”


[i] Finlayson SG, Bowers JD, Ito J, Zittrain JL, Beam AL, Kohane IS. Adversarial attacks on medical machine learning. Science. 2019;363(6433):1287–9.


[ii] Lynch SK, Shah A, Folk JC, Wu X, Abramoff MD. Catastrophic failure in image-based convolutional neural network algorithms for detecting diabetic retinopathy. Investigative Ophthalmology & Visual Science. 2017 Jun 23;58(8):3776–.


[iii] Shah A, Lynch S, Niemeijer M, Amelon R, Clarida W, Folk J, et al. Susceptibility to misdiagnosis of adversarial images by deep learning based retinal image analysis algorithms. 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). 2018;
