Michele Guerriero, Damian Andrew Tamburri, Elisabetta Di Nitto

ACM Transactions on Software Engineering and Methodology, Volume 30, Issue 1, January 2021, Article No.: 1, pp 1–30

Abstract: Distributed streaming applications, i.e., applications that process massive streams of data in a distributed fashion, are becoming increasingly popular to tame the velocity and the volume of Big Data. Nevertheless, the widespread adoption of data-intensive processing is still limited by the non-trivial design paradigms involved, which must deal with the unboundedness and volume of the data streams, and by the many distributed streaming platforms, each with its own characteristics and APIs. In this article, we present StreamGen, a Model-Driven Engineering tool that simplifies the design of such streaming applications and automatically generates the corresponding code. StreamGen generates fully working, processing-ready code for different target platforms (e.g., Apache Spark, Apache Flink). Our evaluation shows that (i) StreamGen is general enough to model an application and generate its code, which performs comparably to a preexisting, similar, and well-known application; (ii) the tool is fully compliant with the streaming concepts defined as part of the Google Dataflow Model; and (iii) users with little computer science background and limited experience with Big Data were able to work with StreamGen and create or refactor an application in a matter of minutes.