This paper introduces BDgen, a generator of Big Data targeting various types of users, implemented as a general and easily extensible framework. It is divided into a scalable backend designed to generate Big Data on clusters and a frontend for user-friendly definition of the structure of the required data, or its automatic inference from a sample data set.
In the first release we have implemented generators of two commonly used formats (JSON and CSV) and the support for general grammars. We have also performed preliminary experimental comparisons confirming the advantages and competitiveness of the solution.