PaPy (Parallel Pipelines for Python) enables the concurrent processing of data sets by splitting work into tasks and running them simultaneously. It increases performance and reduces processing time by making full use of the available computing resources.
With PaPy, you assign Piper instances to virtual resources called IMaps, which are technically pools of local or remote processes or threads. A single IMap can serve multiple tasks, and those tasks are interwoven rather than evaluated one after the other. PaPy is also flexible: you have full control over the number of inputs, IMaps, and processes/threads in use; Pipers can be assigned to IMaps arbitrarily; and a single IMap can be shared among Pipers for load balancing.
To take advantage of PaPy's capabilities, you define functions for the nodes (Pipers), connect them with edges (pipes), and create your IMap instances. An IMap might, for example, use 4 local processes or 4 threads, while a remote pool might use 8 processes spread across two hosts, letting you tune the memory/parallelism/laziness trade-off to your specific needs.
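As a rough illustration, creating those compute resources might look like the following sketch. The constructor parameters (worker_type, worker_num, worker_remote) follow the PaPy/IMap documentation, but the exact signature is an assumption here, and the host names are hypothetical:

    # Sketch of IMap resource pools; parameter names follow the PaPy/IMap
    # docs, but exact signatures and the host names are assumptions.
    from IMap import IMap

    # 4 local worker processes (processes are the default worker type).
    local_procs = IMap(worker_num=4)

    # 4 local threads instead of processes.
    local_threads = IMap(worker_type='thread', worker_num=4)

    # 8 remote processes split across two hypothetical hosts (4 each),
    # reached over the network as PaPy's remote workers are.
    remote_procs = IMap(worker_remote=[('node1.example.org', 4),
                                       ('node2.example.org', 4)])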
The input to a pipeline needs to be a collection, and PaPy processes the data in batches of adjustable size, which allows a trade-off between parallelism (higher memory consumption) and laziness (immediate results). Pipeline graph topologies are unrestricted, and pipelines can seamlessly span hosts on different platforms.
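Putting the pieces together, a minimal two-stage pipeline might look like the sketch below. The Worker, Piper, and Plumber names and the add_pipe/start/run/wait calls follow PaPy's published examples, but the exact signatures should be treated as assumptions; stride is the batch-size knob behind the memory-versus-laziness trade-off:

    # Minimal end-to-end sketch, assuming the Worker/Piper/Plumber API
    # from PaPy's published examples; exact signatures are assumptions.
    from papy import Plumber, Piper, Worker
    from IMap import IMap

    # Piper functions receive their inputs boxed in a tuple ("inbox").
    def double(inbox):
        return inbox[0] * 2

    def report(inbox):
        print(inbox[0])

    # One pool of 2 local processes shared by both Pipers (load
    # balancing); stride sets how many items are in flight per task.
    pool = IMap(worker_num=2, stride=2)

    p_double = Piper(Worker(double), parallel=pool)
    p_report = Piper(Worker(report), parallel=pool)

    pipeline = Plumber()
    pipeline.add_pipe((p_double, p_report))  # linear topology: double -> report

    pipeline.start([range(10)])  # the pipeline input must be a collection
    pipeline.run()
    pipeline.wait()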
In summary, PaPy provides a flexible and powerful framework for constructing and executing parallel pipelines. It is well suited to developers who need to process large amounts of data and want a high degree of parallelism with minimal effort. Its unrestricted graph topologies, support for arbitrary user-defined functions, and adjustable memory/parallelism/laziness trade-off make it a valuable tool for anyone who wants to do more with their Python code.
Version 1.0 Beta 1: N/A