June 10, 2009

A Python library for deduplication that identifies and removes duplicate values from datasets, and is efficient enough to handle large ones.

Version 2009-06-10

License GPL v3

Platform Linux

Supported Languages English

Homepage launchpad.net

Developed by Graham Poulter

Dedupe is a Python library that detects similar rows within a table of records, such as a CSV file or a database table. It can also link similar rows between two different tables. Processing records with Dedupe involves three main steps.

First, the records are indexed into blocks. Then, a comparison function compares every pair of records within each block. Finally, the pairs of records are clustered into two groups: matches and non-matches.
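The three steps can be illustrated with a minimal sketch in plain Python. This is not Dedupe's own API: the sample records, the field names, the function names (index_into_blocks, compare_blocks, classify), the choice of city as the blocking key, the difflib-based similarity score, and the 0.8 threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"id": 1, "name": "John Smith", "city": "London"},
    {"id": 2, "name": "Jon Smith", "city": "London"},
    {"id": 3, "name": "Jane Doe", "city": "Leeds"},
    {"id": 4, "name": "Lena Park", "city": "London"},
]


def index_into_blocks(rows):
    """Step 1: group records into blocks that share a blocking key (assumed: city)."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[row["city"].lower()].append(row)
    return blocks


def similarity(a, b):
    """Score two records by name similarity, from 0.0 to 1.0 (assumed measure)."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def compare_blocks(blocks):
    """Step 2: compare every pair of records inside each block."""
    scores = {}
    for members in blocks.values():
        for a, b in combinations(members, 2):
            scores[(a["id"], b["id"])] = similarity(a, b)
    return scores


def classify(scores, threshold=0.8):
    """Step 3: split the compared pairs into match and non-match groups."""
    matches = {pair for pair, score in scores.items() if score >= threshold}
    non_matches = set(scores) - matches
    return matches, non_matches


blocks = index_into_blocks(records)
scores = compare_blocks(blocks)
matches, non_matches = classify(scores)
print("matches:", sorted(matches))
print("non-matches:", sorted(non_matches))
```

Because pairs are only formed inside a block, the record from Leeds is never compared with the London records. That is the point of blocking: it keeps the number of comparisons manageable on large tables, at the cost of missing duplicates that land in different blocks.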

In summary, if you have a database or CSV file whose records need similarity detection or linking, Dedupe is worth considering for its straightforward three-step process.

What's New

Version 2009-06-10: N/A
