In this paper, we introduce a highly scalable sort-merge join algorithm for RDF databases. The algorithm is designed specifically for streaming systems; in addition to task and data parallelism, it aims to exploit pipeline parallelism to increase its scalability.
Additionally, we focus on handling skewed data correctly and efficiently, so that the algorithm scales well regardless of the data distribution.