Debian vs. SourceForge – Round 1

February 3, 2009 / April 9, 2009 | Karl Beecher | Research | Debian, evolution, metrics, SourceForge

We all know about SourceForge and Debian. Although they have different purposes, both act as repositories of free software, and most practitioners will know that Debian hosts what are considered to be the best projects, those judged most worthy by its army of package maintainers. Conversely, many (but by no means all) SourceForge projects languish in obscurity; these are, at best, of little interest to anyone other than the developers who run them or, at worst, completely stalled. Conventional wisdom therefore holds that Debian projects receive much more activity from developers than those on communities like SourceForge.

So today’s research question is: How true is this? How much more activity (if at all) do projects in Debian actually receive than their counterparts in SourceForge? To answer this query, two quantifiable and measurable questions are proposed:

Are the evolutionary characteristics of Debian projects significantly different from those in SourceForge? (In other words, do Debian projects receive so much more activity that we cannot conclude that random statistical noise is responsible for the difference?)
Does Debian act as a “catalyst”, so that when projects are entered into Debian’s repository, the activity around them increases?
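
As a rough illustration of what “significantly different” means in the first question, the sketch below runs a two-sample permutation test on commit counts. It is a minimal sketch and not the method used in the study; the function name `permutation_p_value` and the sample figures are invented purely for illustration.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, rounds=10_000, seed=0):
    """Estimate how often a random relabelling of the projects gives a
    difference in means at least as large as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (rounds + 1)

# Hypothetical commit counts, made up for illustration only.
debian_commits = [420, 515, 610, 380, 700, 455]
sourceforge_commits = [35, 120, 60, 15, 210, 90]

p = permutation_p_value(debian_commits, sourceforge_commits)
print(f"estimated p-value: {p:.4f}")
```

A small p-value says that shuffling the “Debian” and “SourceForge” labels at random rarely reproduces a gap as large as the one observed, which is the sense in which noise alone would be an unlikely explanation.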

To answer the questions, we need to measure proxies of evolutionary activity. We chose:

Project age
Project size
Number of developers
Number of commits

How these attributes were measured, and how they helped to answer the questions, will be addressed in the follow-up post.
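
To make the four attributes concrete, here is a minimal sketch of how they might be derived from a project’s commit history. It is an illustration under stated assumptions rather than the study’s procedure: the `Commit` record, the `summarize` function and the use of net lines added as a stand-in for project size are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Commit:
    # Hypothetical record; real CVS/SVN logs would need parsing first.
    author: str
    day: date
    lines_added: int
    lines_removed: int

def summarize(commits):
    """Return age (days), size (net lines), developers and commits."""
    if not commits:
        raise ValueError("need at least one commit")
    days = [c.day for c in commits]
    return {
        # Age: span between the first and last recorded commit.
        "age_days": (max(days) - min(days)).days,
        # Size: net lines added, an assumed proxy for project size.
        "size_lines": sum(c.lines_added - c.lines_removed for c in commits),
        # Developers: number of distinct commit authors.
        "developers": len({c.author for c in commits}),
        "commits": len(commits),
    }

# Invented history for illustration only.
history = [
    Commit("alice", date(2008, 1, 10), 500, 0),
    Commit("bob", date(2008, 3, 2), 120, 40),
    Commit("alice", date(2008, 6, 15), 60, 200),
]
stats = summarize(history)
print(stats)
```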

7 thoughts on “Debian vs. SourceForge – Round 1”

Joss Winn says:

February 3, 2009 at 14:54

Karl, what do you think of the more recent use of distributed revision control, where all participants have a full working local copy of the repository, e.g. GitHub and Gitorious? I woke up in the middle of the night the other day, thinking about this, and noted to myself: “Git could be Torvalds’s equivalent of the Linux kernel for collaboration. Provides a core tool set for many kinds of collaboration layers.” I then had a look around and saw I’m not the only one thinking this.
James Howison says:

February 3, 2009 at 16:48

I also encourage you to measure concentration; i.e. do people step forward and do more of the work, or is it that new people come online?

How do you intend to assess the statistical significance of any change? I looked into “interrupted time series” analysis for this; it’s largely an econometric technique which handles the endogeneity of the time series (which is a big problem for regression etc.).
kbeecher says:

February 4, 2009 at 16:58

Joss: Distributed version control repositories present a bit of a challenge to free software researchers. So far, I haven’t had much exposure to them (I’ve limited my research to CVS and SVN), but Git is rising in prominence, and I know there are researchers looking into it.

How researchers can deal with them when it comes to mining them depends on how they’re used. If they’re used in essentially the same “centralized repository” fashion as their predecessors, that shouldn’t be too much of a challenge. But when the repository is distributed over many locales, there are whole new challenges. Still, I suspect constructing a solution isn’t too difficult; after all, the developers themselves need to manage their repos in an organized way. As long as you know their modus operandi for this, you know what you’re capable of doing with regard to mining.
kbeecher says:

February 4, 2009 at 17:00

James: This article is a brief description of some completed and published research. Perhaps your questions will be answered in the next post; or I can point you to a copy of the paper that features it.
James Howison says:

February 4, 2009 at 19:01

Thanks, I’ll keep my eyes open. Feel free to send me the paper directly, too.

Cheers,
James
Free Software Miscellany » Debian vs. SourceForge - Round 2 says:

February 10, 2009 at 12:20

[…] And so, we revisit the posers put up in a previous post: […]
Karl Beecher says:

February 10, 2009 at 12:23

For the original paper this work appeared in: http://eceasst.cs.tu-berlin.de/index.php/eceasst/article/view/113/111

Computer Floss

Delightful digital distractions in the world of free, libre and open source software
