Clone Digger

Discovers duplicate code in Python and Java

Overview

Clone Digger aims to detect similar code in Python and Java programs. Synonyms for the term "similar code" are "clone" and "duplicate code".

What is a software clone?
Two contiguous fragments of code form a clone if they are similar enough.

Why is it important to detect clones?
The presence of clones can increase the maintenance cost of the code: for example, a bug fixed in one copy may remain in the others. Detected clones can be refactored or simply kept in mind.

Why should I use Clone Digger to detect clones?
There are several clone detection tools; they are listed here.
The benefits of Clone Digger are:

  • Variety of handled clone types. Strictly speaking, a pair of statement sequences is considered a clone if one sequence can be obtained from the other by replacing some small subexpressions. In particular, renaming variables and functions and changing constants are allowed.
  • It's free (provided under the GPL license).
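To make the renaming rule concrete, here is a minimal Python sketch, built on the standard `ast` module, that treats two snippets as equal once every variable name, parameter name, function name and constant is replaced by a placeholder. It is only an illustration of the idea, not Clone Digger's algorithm: it catches pure renamings, whereas Clone Digger also tolerates replacing small subexpressions. The helper names (`_Normalizer`, `normalized_dump`, `is_renamed_clone`) are hypothetical.

```python
import ast


class _Normalizer(ast.NodeTransformer):
    """Replace names and constants with fixed placeholders (illustration only)."""

    def visit_Name(self, node: ast.Name) -> ast.AST:
        # Every variable reference becomes the same placeholder.
        return ast.copy_location(ast.Name(id="_VAR", ctx=node.ctx), node)

    def visit_arg(self, node: ast.arg) -> ast.AST:
        # Parameter names are treated like variables.
        node.arg = "_ARG"
        self.generic_visit(node)
        return node

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        # Function names are ignored as well.
        node.name = "_FUNC"
        self.generic_visit(node)
        return node

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        # All literal values (numbers, strings, ...) collapse to one placeholder.
        return ast.copy_location(ast.Constant(value=None), node)


def normalized_dump(source: str) -> str:
    """Parse source, normalize it, and return a comparable string form."""
    return ast.dump(_Normalizer().visit(ast.parse(source)))


def is_renamed_clone(a: str, b: str) -> bool:
    """True if a and b differ only in names and constants.

    Attribute names and all other structure are still compared exactly.
    """
    return normalized_dump(a) == normalized_dump(b)


first = """
def total(values):
    result = 0
    for v in values:
        result += v * 2
    return result
"""

second = """
def count_all(xs):
    acc = 1
    for x in xs:
        acc += x * 10
    return acc
"""

print(is_renamed_clone(first, second))  # True
```

The two functions have different names, variables and constants but the same shape, so they count as a clone under this simplified rule.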

Clone Digger uses a complex, multi-phase algorithm to find duplicate code. Its principles are described in the paper "Duplicate code detection using anti-unification".
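Anti-unification computes the most specific pattern that two trees share: equal parts are kept, and each pair of differing subtrees is replaced by a variable (the same pair always maps to the same variable). The sketch below is a simplified illustration, not Clone Digger's actual code; it works on syntax trees written as nested tuples `(label, child, ...)` with strings as leaves, and both that representation and the name `anti_unify` are assumptions.

```python
from itertools import count
from typing import Dict, Tuple, Union

# A tree is either a leaf (str) or a tuple: (label, child1, child2, ...).
Tree = Union[str, Tuple]


def anti_unify(a: Tree, b: Tree):
    """Return (generalization, substitution_for_a, substitution_for_b)."""
    pairs: Dict[Tuple[Tree, Tree], str] = {}  # differing pair -> variable
    fresh = count(1)

    def go(x: Tree, y: Tree) -> Tree:
        if x == y:
            return x  # identical subtrees are kept unchanged
        if (isinstance(x, tuple) and isinstance(y, tuple)
                and len(x) == len(y) and x[0] == y[0]):
            # Same label and arity: generalize child by child.
            return (x[0],) + tuple(go(cx, cy) for cx, cy in zip(x[1:], y[1:]))
        # Otherwise replace the differing pair by a (reused) variable.
        if (x, y) not in pairs:
            pairs[(x, y)] = f"?{next(fresh)}"
        return pairs[(x, y)]

    general = go(a, b)
    sub_a = {var: x for (x, _), var in pairs.items()}
    sub_b = {var: y for (_, y), var in pairs.items()}
    return general, sub_a, sub_b


# x + 1 * y   versus   a + 2 * y
general, sub_a, sub_b = anti_unify(("+", "x", ("*", "1", "y")),
                                   ("+", "a", ("*", "2", "y")))
print(general)  # ('+', '?1', ('*', '?2', 'y'))
print(sub_a)    # {'?1': 'x', '?2': '1'}
print(sub_b)    # {'?1': 'a', '?2': '2'}
```

The two expressions generalize to `?1 + ?2 * y`; the substitutions record what each original puts in place of the variables, and how large they are gives one possible way to judge how different the two fragments are.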

How to use Clone Digger?
All allowed arguments are listed here.
In general, the process of finding clones can be compared to sieving: you should choose a medium mesh size in order to find something interesting. The mesh in Clone Digger has two dimensions: the minimal size of clones (set by the --length-threshold parameter) and the maximum distance between sequences that are reported as clones (set by --distance-threshold).
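The two mesh dimensions can be pictured as a filter over candidate pairs. The following Python sketch assumes each candidate already carries a size and a distance; the `CandidatePair` class, the `sieve` function and the units of both numbers are hypothetical and do not reflect Clone Digger's internals, only the effect of the two thresholds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidatePair:
    first: str       # hypothetical location label, e.g. "a.py:10-12"
    second: str
    size: int        # assumed size measure (e.g. number of statements)
    distance: float  # assumed measure of how much the two sequences differ


def sieve(candidates: List[CandidatePair],
          length_threshold: int,
          distance_threshold: float) -> List[CandidatePair]:
    """Keep pairs that are large enough and similar enough (illustration only)."""
    return [c for c in candidates
            if c.size >= length_threshold and c.distance <= distance_threshold]


candidates = [
    CandidatePair("a.py:10-12", "b.py:40-42", size=3, distance=0),   # too small
    CandidatePair("a.py:20-35", "c.py:5-20", size=16, distance=2),   # kept
    CandidatePair("b.py:1-30", "c.py:50-79", size=30, distance=25),  # too different
]

kept = sieve(candidates, length_threshold=5, distance_threshold=5)
print([c.first for c in kept])  # ['a.py:20-35']
```

Lowering `length_threshold` lets through many tiny, uninteresting matches, while raising `distance_threshold` admits pairs that barely resemble each other, which is why a medium mesh is recommended.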

Happy searching!