Description
Terrorism is a major problem worldwide, causing thousands of fatalities and billions of dollars in damage every year. To address this threat, we propose a novel feature representation method and evaluate machine learning models that learn from localized news data in order to predict whether a terrorist attack will occur on a given calendar date and in a given state. The best model (a Random Forest aided by a novel variable-length moving average method) achieved area under the receiver operating characteristic (AUROC) of ≥ 0.667 (statistically significant w.r.t. random guessing with p ≤ .0001) on four of the five states that were impacted most by terrorism between 2015 and 2018. These results demonstrate that treating terrorism as a set of independent events, rather than as a continuous process, is a fruitful approach—especially when historical events are sparse and dissimilar—and that large-scale news data contains information that is useful for terrorism prediction. Our analysis also suggests that predictive models should be localized (i.e., state models should be independently designed, trained, and evaluated) and that the characteristics of individual attacks (e.g., responsible group or weapon type) were not correlated with prediction success. These contributions provide a foundation for the use of machine learning in efforts against terrorism in the United States and beyond.