Clone Digger

 discovers duplicate code in Python and Java

General Information

You can take part in the development of Clone Digger by participating in the Google Summer of Code 2008 program.

If you succeed, you will be paid by Google and will gain experience in designing new algorithms and implementing them in Python.

Ideas List

  • To handle trivial clones faster.
    Clone Digger uses a complex algorithm that can find nontrivial clones (see the documentation section).
    However, some users are only interested in very simple clones and do not want to wait 30 minutes to find them in a large project (for instance, BioPython).
    The developer will be asked to add fast processing of trivial clones to Clone Digger. By trivial clones we mean clones of two types:
      • clones that differ only in insignificant whitespace and comments;
      • clones that differ only in insignificant whitespace, comments, function and variable names, and constants.
  • To correct the highlighting in the output HTML for Python code.
    As can be seen on the examples page, the output for Python has the following defect: differences are highlighted at the string level, which can confuse the user (see the first clone of this example). It would be better to highlight differences at the abstract syntax tree (AST) level. When I tried to do this, I faced the following problem: the AST does not record the position of the corresponding substring in the source code. The problem can be solved by hacking the parser library (see my email here), but that solution would not be cross-platform. I suggest two other ways of solving the problem: using the ANTLR Python grammar, or printing the AST in a pretty form. The developer will be free to choose between them.
  • Automatic refactoring.
    It would be better to also show how clones can be eliminated by offering a possible refactoring. An example can be seen here. Trivial semantic analysis should be applied. For example, global variables cannot act as function arguments, and neither can locally initialized variables.
  • To try to embed Clone Digger into Eclipse.
    This improvement would increase the usability of Clone Digger. An example can be seen here.
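
To make the first idea more concrete, the two types of trivial clones can be detected by hashing a normalized token stream instead of running the full algorithm. The following is only a minimal sketch under stated assumptions, not Clone Digger's implementation: the names `fingerprint` and `group_trivial_clones` are hypothetical, it relies on the `tokenize` module of a current Python 3 interpreter, and it assumes the code has already been split into candidate fragments (for example, one per function).

```python
import io
import keyword
import tokenize
from collections import defaultdict


def fingerprint(source, rename=False):
    """Return a hashable tuple of normalized tokens for a Python fragment.

    rename=False: ignore comments, blank lines and spacing (first type).
    rename=True:  additionally ignore identifiers and constants (second type).
    """
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER):
            continue  # comments and blank or continuation line breaks
        if tok.type in (tokenize.INDENT, tokenize.DEDENT, tokenize.NEWLINE):
            # Block structure is significant, its exact spelling is not.
            result.append(tokenize.tok_name[tok.type])
        elif rename and tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            result.append("ID")
        elif rename and tok.type in (tokenize.NUMBER, tokenize.STRING):
            result.append("CONST")
        else:
            result.append(tok.string)
    return tuple(result)


def group_trivial_clones(fragments, rename=False):
    """Group fragment names (dict name -> source) that share a fingerprint."""
    groups = defaultdict(list)
    for name, source in fragments.items():
        groups[fingerprint(source, rename)].append(name)
    return [names for names in groups.values() if len(names) > 1]
```

Each fragment is tokenized once and compared through a dictionary lookup, so the cost grows roughly linearly with the amount of code. Replacing every identifier with the same placeholder is deliberately coarse: it also matches fragments whose variables are used in different patterns, which may or may not be acceptable for the second clone type.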

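For the highlighting idea, note that the standard `ast` module of Python 3.8 and later attaches start and end positions to statement and expression nodes, so on a current interpreter AST-level differences can be prototyped without modifying the parser. The sketch below is a different route from the two suggested above (ANTLR or pretty-printing) and is only an illustration: `span` and `ast_differences` are hypothetical names, and the function simply reports the smallest positioned subtrees in which two fragments differ.

```python
import ast


def span(node):
    """(line, col, end_line, end_col) of a node, or None if it has no position."""
    if hasattr(node, "lineno"):
        return (node.lineno, node.col_offset, node.end_lineno, node.end_col_offset)
    return None  # e.g. operators such as ast.Add, or the Module node


def ast_differences(left_src, right_src):
    """Return a list of (left_span, right_span) pairs for differing subtrees."""
    diffs = []

    def as_list(value):
        return value if isinstance(value, list) else [value]

    def walk(a, b, anchor):
        # anchor is the nearest enclosing pair of nodes that have positions
        if span(a) is not None and span(b) is not None:
            anchor = (a, b)
        if type(a) is not type(b):
            diffs.append((span(anchor[0]), span(anchor[1])))
            return
        pairs = []
        for (_, x), (_, y) in zip(ast.iter_fields(a), ast.iter_fields(b)):
            xs, ys = as_list(x), as_list(y)
            if len(xs) != len(ys):
                diffs.append((span(anchor[0]), span(anchor[1])))
                return
            pairs.extend(zip(xs, ys))
        children = []
        for p, q in pairs:
            if isinstance(p, ast.AST) and isinstance(q, ast.AST):
                children.append((p, q))
            elif p != q:  # differing name, constant value, or missing child
                diffs.append((span(anchor[0]), span(anchor[1])))
                return
        for p, q in children:
            walk(p, q, anchor)

    walk(ast.parse(left_src), ast.parse(right_src), (None, None))
    return diffs
```

The returned spans use 1-based line numbers and 0-based column offsets, as the `ast` module does, and could be mapped directly onto the highlighted regions of the HTML report. A span of `None` means the difference is at module level, for instance when the fragments contain a different number of statements.
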
UPDATE: all ideas have been taken by students.