PaPy (Parallel Pipelines for Python) enables the concurrent processing of data sets by splitting work into tasks and running them simultaneously. It increases performance and reduces processing time by making full use of the available computing resources.
With PaPy, you assign Piper instances to virtual resources called IMaps, which are technically pools of local or remote processes or threads. A single IMap can serve multiple tasks, and those tasks are interwoven rather than evaluated one after the other. PaPy is also flexible: you have full control over the number of inputs, IMaps, and processes/threads in use; Pipers can be assigned to IMaps arbitrarily; and a single IMap can be shared among Pipers for load balancing.
To take advantage of PaPy's capabilities, you define functions for the nodes (Pipers), connect them with edges (pipes), and create your IMap instances. An IMap might, for example, use 4 local processes or 4 threads, while a remote pool might use 8 processes spread across two hosts, letting you tune the memory/parallelism/laziness trade-off to your specific needs.
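As a rough illustration, creating those compute resources might look like the following sketch. The constructor parameters (worker_type, worker_num, worker_remote) follow the PaPy/IMap documentation, but the exact signature is an assumption here, and the host names are hypothetical:

    # Sketch of IMap resource pools; parameter names follow the PaPy/IMap
    # docs, but exact signatures and the host names are assumptions.
    from IMap import IMap

    # 4 local worker processes (processes are the default worker type).
    local_procs = IMap(worker_num=4)

    # 4 local threads instead of processes.
    local_threads = IMap(worker_type='thread', worker_num=4)

    # 8 remote processes split across two hypothetical hosts (4 each),
    # reached over the network as PaPy's remote workers are.
    remote_procs = IMap(worker_remote=[('node1.example.org', 4),
                                       ('node2.example.org', 4)])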
The input to a pipeline needs to be a collection, and PaPy processes the data in batches of adjustable size, which allows a trade-off between parallelism (higher memory consumption) and laziness (immediate results). Pipeline graph topologies are unrestricted, and pipelines can seamlessly span hosts on different platforms.
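Putting the pieces together, a minimal two-stage pipeline might look like the sketch below. The Worker, Piper, and Plumber names and the add_pipe/start/run/wait calls follow PaPy's published examples, but the exact signatures should be treated as assumptions; stride is the batch-size knob behind the memory-versus-laziness trade-off:

    # Minimal end-to-end sketch, assuming the Worker/Piper/Plumber API
    # from PaPy's published examples; exact signatures are assumptions.
    from papy import Plumber, Piper, Worker
    from IMap import IMap

    # Piper functions receive their inputs boxed in a tuple ("inbox").
    def double(inbox):
        return inbox[0] * 2

    def report(inbox):
        print(inbox[0])

    # One pool of 2 local processes shared by both Pipers (load
    # balancing); stride sets how many items are in flight per task.
    pool = IMap(worker_num=2, stride=2)

    p_double = Piper(Worker(double), parallel=pool)
    p_report = Piper(Worker(report), parallel=pool)

    pipeline = Plumber()
    pipeline.add_pipe((p_double, p_report))  # linear topology: double -> report

    pipeline.start([range(10)])  # the pipeline input must be a collection
    pipeline.run()
    pipeline.wait()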
In summary, PaPy provides a flexible and powerful framework for constructing and executing parallel pipelines. It is well suited to developers who need to process large amounts of data and want a high degree of parallelism with minimal effort. Its unrestricted graph topologies, support for arbitrary user-defined functions, and adjustable memory/parallelism/laziness trade-off make it a valuable tool for anyone who wants to do more with their Python code.
Version 1.0 Beta 1: N/A