Regular expressions are used to characterize sets of strings (i.e., languages) using a pattern-based syntax. They are applied in many contexts, for example data validation in web forms.
However, writing a regular expression that exactly captures the desired set of strings can be difficult, and techniques are sought to validate regular expressions or test their use in applications. A common means of validating and testing regular expressions is to generate a set of labeled strings (i.e., strings together with their evaluation).
Here we propose a fault-based approach for generating strings usable as tests for regular expressions. We define some fault classes representing mistakes that could be made when writing a regular expression, and we introduce the notion of a distinguishing string, i.e., a string that is able to expose a fault.
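To make the notion of a distinguishing string concrete, the following minimal sketch searches for a string that is accepted by exactly one of two regular expressions: the one under test and a faulty variant of it. It is a brute-force illustration only, not the generation technique of this work; the function names, the alphabet, the length bound, and the example fault (replacing `+` with `*`) are all assumptions introduced here.

```python
import re
from itertools import product


def accepts(pattern, s):
    """Return True if the whole string s matches the pattern."""
    return re.fullmatch(pattern, s) is not None


def find_distinguishing_string(original, mutant, alphabet="ab", max_len=4):
    """Return the first string (shortest first) on which the two patterns
    disagree, or None if none exists within the length bound.

    Hypothetical helper: exhaustive enumeration over a small alphabet,
    chosen for clarity rather than efficiency.
    """
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if accepts(original, candidate) != accepts(mutant, candidate):
                return candidate
    return None


# Assumed example fault: the quantifier '+' mistakenly written as '*'.
witness = find_distinguishing_string("a+b", "a*b")
# Two equivalent patterns have no distinguishing string at all.
no_witness = find_distinguishing_string("a|b", "[ab]")
```

In this example the string `b` distinguishes `a+b` from `a*b`, since only the second expression accepts it, whereas no string distinguishes `a|b` from `[ab]` because they describe the same language.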
Given a regular expression, our approach generates a test suite composed of distinguishing strings that are able to detect possible faults in the regular expression. We present different versions of the approach, which provide different results in terms of test suite size and generation time.
Experiments show that the proposed approach can generate compact test suites and that, using suitable optimizations, the generation time is reasonable. Exploiting the proposed fault classes, we use the notion of mutation score to assess the ability of a generic set of strings to expose possible faults contained in the regular expression under test.
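As an illustration of how a mutation score can be computed for a set of strings, the sketch below counts the faulty variants (mutants) on which at least one string disagrees with the regular expression under test, and divides by the number of mutants. This is the common textbook definition and is assumed here; the exact definition used in this work (for instance, how equivalent mutants are treated) may differ. The function names and the example mutants are hypothetical.

```python
import re


def accepts(pattern, s):
    """Return True if the whole string s matches the pattern."""
    return re.fullmatch(pattern, s) is not None


def mutation_score(original, mutants, strings):
    """Fraction of mutants killed by the given strings.

    A mutant is killed when some string is accepted by exactly one of
    the original pattern and the mutant. Assumes all mutants are
    non-equivalent to the original; returns 0.0 for an empty mutant list.
    """
    if not mutants:
        return 0.0
    killed = sum(
        1
        for mutant in mutants
        if any(accepts(original, s) != accepts(mutant, s) for s in strings)
    )
    return killed / len(mutants)


# Assumed mutants of "a+b": quantifier change, missing character, quantifier change.
mutants = ["a*b", "a+", "a?b"]
full_score = mutation_score("a+b", mutants, ["ab", "b"])
partial_score = mutation_score("a+b", mutants, ["ab"])
```

With the two strings `ab` and `b`, all three assumed mutants are killed; with `ab` alone, only the mutant `a+` is killed, so the score drops to one third.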
A comparison with other test generation tools in terms of mutation score, size, and generation time shows the advantages and limits of our approach.