One of the challenges in designing storytelling systems is the evaluation of the resulting narratives. Because the story space is usually extremely large even for very short stories, it is often infeasible to evaluate every story the system generates by hand.
To help system designers maintain control over the generated stories, a general method for the semi-automatic evaluation of narrative systems, based on clustering similar stories, has been proposed. In this paper we report on further progress in this endeavor.
We have added new distance metrics and evaluated them on the original domain with additional data, and we have also successfully applied the method to a very different domain.
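To illustrate the general idea, the following sketch (not the authors' implementation) clusters stories by a pairwise distance metric so that only one representative per cluster needs manual inspection; the normalised edit distance over event sequences, the threshold, and the helper names are assumptions for illustration only.

```python
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def story_distance(a, b):
    """Illustrative metric: normalised edit distance between event sequences."""
    # Dynamic-programming Levenshtein distance over event labels.
    dp = np.zeros((len(a) + 1, len(b) + 1))
    dp[:, 0] = np.arange(len(a) + 1)
    dp[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i, j] = min(dp[i - 1, j] + 1,
                           dp[i, j - 1] + 1,
                           dp[i - 1, j - 1] + cost)
    return dp[-1, -1] / max(len(a), len(b), 1)


def cluster_stories(stories, threshold=0.5):
    """Group stories whose average pairwise distance falls below `threshold`."""
    n = len(stories)
    dist = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        dist[i, j] = dist[j, i] = story_distance(stories[i], stories[j])
    labels = fcluster(linkage(squareform(dist), method="average"),
                      t=threshold, criterion="distance")
    # One representative per cluster is enough for manual review.
    representatives = {label: stories[i] for i, label in enumerate(labels)}
    return labels, list(representatives.values())
```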
Further, we have taken the first steps towards automatic story-space exploration with a random user.
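A random user can be sketched as a simulated player that picks an available action uniformly at random at every decision point, with each completed run recorded as one sampled story; the `StoryEngine` interface used below (`reset`, `available_actions`, `step`, `is_finished`) is a hypothetical stand-in, not the system's actual API.

```python
import random


def explore_with_random_user(engine, runs=1000, seed=0):
    """Sample stories by letting a random user play `runs` times."""
    rng = random.Random(seed)
    stories = []
    for _ in range(runs):
        engine.reset()
        trace = []
        while not engine.is_finished():
            # The random user chooses uniformly among the currently legal actions.
            action = rng.choice(engine.available_actions())
            engine.step(action)
            trace.append(action)
        stories.append(trace)
    # The sampled stories can then be clustered as sketched above.
    return stories
```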