As software architecture practice relies more and more on runtime data to inform decisions in continuous experimentation and self-adaptation, it is increasingly important to consider the quality of the data used as input to the different decision-making and prediction algorithms. One issue in data-driven decisions is that real-life data coming from running systems can contain invalid or wrong values which can bias the result of data analysis.
Data-driven decision-making should therefore comprise detection and handling of data anomalies as an integral part of the process. However, currently, anomaly detection is either absent in runtime decision-making approaches for continuous experimentation and self-adaptation or difficult to tailor to domain-specific needs.
In this paper, we contribute by proposing a framework that simplifies the detection of data anomalies in timeseries-outputs of running systems. The framework is generic, since it can be employed in different domains, and tunable, since it uses expert user input in tailoring anomaly detection to the needs and assumptions of each domain.
We evaluate the feasibility of the framework by successfully applying it to detecting anomalies in a real-life timeseries dataset from the traffic domain.