Parallel data processing and parallel streaming systems become quite popular. They are employed in various domains such as real-time signal processing, OLAP database systems, or high performance data extraction.
One of the key components of these systems is the task scheduler which plans and executes tasks spawned by the application on available CPU cores. The multiprocessor systems and CPU architecture of the day become quite complex, which makes the task scheduling a challenging problem.
In this paper, we propose a novel task scheduling strategy for parallel data stream systems, that reflects many technical issues of the current hardware. In addition, we have implemented a NUMA aware memory allocator that improves data locality in NUMA systems.
The proposed task scheduler combined with the new memory allocator achieve up to 3x speed up on a NUMA system and up to 10% speed up on an older SMP system with respect to the unoptimized versions of the scheduler and allocator. Many of the ideas implemented in our parallel framework may be adopted for task scheduling in other domains that focus on different priorities or employ additional constraints.