Debian vs. SourceForge – Round 2

Posted on February 10th, 2009 by Karl Beecher

And so, we revisit the posers put up in a previous post:

  1. Are Debian’s evolutionary characteristics significantly different to those of SourceForge?
  2. Does Debian act as a catalyst?

To answer these questions, we took a closer look at the software inside both repositories. I’ll briefly explain the method here, but details of the steps will be part of later posts in the “Research Methods” strand.

We chose a mutually exclusive sample of 50 packages from Debian and 50 projects from SourceForge. In both cases they were taken from the pool of “stable” projects only. They were all downloaded, and each project’s activity was extracted from its version control system (using log commands) and recorded in a file. Then we delved into our little toolbox and used some nifty tools to extract the information we needed, that information being the project’s:

  • Age (time between first and last commit to the version control system)
  • Size (in lines of code)
  • Number of developers (monthly average)
  • Number of commits (monthly average)
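As a rough illustration of how the activity attributes in the list above could be derived from a version control log, here is a minimal Python sketch. It is not the tooling used in the study. It assumes the log has already been parsed into (timestamp, author) pairs, and it assumes that a "monthly average" divides by every calendar month between the first and last commit, including months with no activity. The function name `project_metrics` is hypothetical. Size in lines of code needs the source tree rather than the log, so it is left out.

```python
from collections import defaultdict
from datetime import datetime


def project_metrics(commits):
    """Summarise one project's activity.

    commits: list of (datetime, author) pairs, assumed to have been
    parsed from the project's version control log beforehand.
    """
    if not commits:
        raise ValueError("at least one commit is required")

    dates = [when for when, _ in commits]
    first, last = min(dates), max(dates)

    # Age: time between the first and last commit, in whole days.
    age_days = (last - first).days

    # Group distinct authors and commit counts by calendar month.
    authors_by_month = defaultdict(set)
    commits_by_month = defaultdict(int)
    for when, author in commits:
        key = (when.year, when.month)
        authors_by_month[key].add(author)
        commits_by_month[key] += 1

    # Assumption: average over all calendar months in the project's
    # lifetime, so idle months pull the averages down.
    n_months = (last.year - first.year) * 12 + (last.month - first.month) + 1

    return {
        "age_days": age_days,
        "developers_per_month":
            sum(len(a) for a in authors_by_month.values()) / n_months,
        "commits_per_month": sum(commits_by_month.values()) / n_months,
    }
```

Averaging over active months only would be an equally defensible choice. It would give higher values for projects with long quiet spells.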

Each attribute can be aggregated from the 50 projects into a summary value for the repository. So, for example, we can take the ages of the 50 Debian projects and use them to get a mean or a median age. If we do the same thing for SourceForge we can compare them.
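The aggregation step described above can be sketched in a few lines of Python using the standard library. The helper name `summarise` and the numbers are placeholders made up for illustration. They are not values from the study.

```python
from statistics import mean, median


def summarise(values):
    """Collapse one attribute across a sample of projects."""
    return {"mean": mean(values), "median": median(values)}


# Placeholder ages in days for two tiny, invented samples.
sample_a_ages = [400, 900, 1500, 2200]
sample_b_ages = [100, 300, 350, 1200]

summary_a = summarise(sample_a_ages)
summary_b = summarise(sample_b_ages)
```

Reporting the median alongside the mean is useful here because attributes such as size and commit counts tend to be skewed by a few very large projects.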

And that’s just what we did.

And here’s just what we found:

Boxplots of measured attributes

Using statistical significance testing (again, I’ll cover this in a “Research Methods” post) we found that Debian projects had larger values for each attribute, i.e. they were older, larger, and attracted more developers who performed a greater amount of work, all to a significant degree.
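The post does not say which significance test was used, so the following is only a sketch of one common option, a one-sided permutation test on the difference in means, written with the Python standard library. The function name `permutation_test` and its parameters are assumptions for illustration.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Approximate one-sided p-value for 'mean(a) is greater than mean(b)'.

    This is a swapped-in technique for illustration, not necessarily
    the test used in the study.
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Shuffle the group labels and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_resamples + 1)
```

A small p-value means that a difference at least as large as the observed one rarely arises when the repository labels are assigned at random. Calling it once per attribute (age, size, developers, commits) with the two samples of 50 values would mirror the comparison described above.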

This leads us to our second question: is Debian responsible? Is it somehow a driver for these larger values? Our answer to this question comes in round 3.

4 Responses to “Debian vs. SourceForge – Round 2”

  1. Karl Beecher says:

    For the original paper this work appeared in:

  2. Olivier Berger says:

    What do you call “Debian software” ? … you mean free software that is packaged by Debian ?

    Then, some of it may be maintained on SourceForge also…

    Or you mean, the packaging work only ?

    Really not clear :-/ … will read the paper to try and know more.

  3. Olivier Berger says:

    OK, I’ve read the paper (not in a detailed way)… and I think it’s pretty obvious that software in Debian is meant to be used by users, so it should first of all be useful, and also mature enough… Compare that to SourceForge, where maybe only sources are available and one has to download and recompile to be able to use the software, which means that users are maybe less inclined to test it, hence diminishing its “success”.
    So… wasn’t the sample a little biased from the beginning ? …

    SourceForge is full of dead projects, whereas Debian will drop unmaintained software. Period.

    Did I get it wrong ?

  4. Karl Beecher says:

    Hi Olivier, thanks for the comments.

    As English & Schweik, and Howison & Crowston have shown, you have to be careful when studying SourceForge because of the number of stalled projects on there; hence we used projects marked “stable” only. And yes, we use Debian packages, making sure both samples contain projects not found in the other repository. I’ve updated the post to reflect all this.

    I don’t think the sample is biased, because it isn’t fully established whether, and to what extent, there is a difference in the rates of evolutionary activity. We could certainly *suspect* it, but then we’d only get as far as having a hypothesis, which is exactly what we started with in the study, and ended up testing. I could just as easily say “SourceForge projects will have more evolutionary activity because the barrier to entry is much lower and it’s easier to get to the source code”. It’s very easy to just say it, but I’m interested in what the *evidence* shows.
