Introduction Data Sources References

Introduction to FLOSS Data Sources


Master on Free Software
Daniel Izquierdo
dizquierdo@libresoft.es
GSyC/Libresoft

November 17, 2011

(cc) 2011 Daniel Izquierdo. Some rights reserved. This document is distributed under the Creative Commons Attribution-ShareAlike 3.0 licence, available at http://creativecommons.org/licenses/by-sa/3.0/

Index

Introduction

Data Sources

References

Global software development implies a distributed work environment.
In the specific case of FLOSS projects, specific infrastructure is used for the communication channels.

That infrastructure is, generally speaking, publicly available.
Thanks to the open nature of these information repositories, it is possible to study them.

Main data sources

Source code management (SCM) system
Mailing lists
Bug tracking system (BTS)
Source code

Main data sources: SCM

This is used to manage file versions during the development process.
It is possible to obtain the state of the source code at any point in its history.

Main data sources: SCM

Its main pieces of information are the following:


Who: Author/Committer
When: Date of commit
What: Files and lines touched

Main data sources: SCM

There are two main types: centralized and distributed
Centralized: CVS or SVN
Distributed: Git or Mercurial

Main data sources: SCM

git log
hg log
svn log
...
git diff
hg diff
...
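
As a rough sketch of how the who/when/what information might be pulled out of `git log` output, the Python snippet below parses a small hard-coded sample. The output layout (a tab-separated `commit` header line followed by `--numstat` lines) is an assumption chosen for this example, and the names `Commit`, `parse_log` and `SAMPLE_LOG` are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Assumed layout for this sketch; it would be produced by something like:
#   git log --numstat --pretty=format:"commit%x09%H%x09%an%x09%aI"
# The revisions, names and paths below are made-up sample data.
SAMPLE_LOG = (
    "commit\tabc123\tAlice\t2011-11-17T10:00:00+01:00\n"
    "\n"
    "10\t2\tsrc/main.c\n"
    "-\t-\tlogo.png\n"
    "\n"
    "commit\tdef456\tBob\t2011-11-16T09:30:00+01:00\n"
    "\n"
    "1\t1\tREADME\n"
)


@dataclass
class Commit:
    revision: str
    author: str                                 # who
    date: datetime                              # when
    files: list = field(default_factory=list)   # what: (path, added, removed)


def parse_log(text):
    """Turn the assumed log layout into a list of Commit records."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank separator lines carry no information
        parts = line.split("\t")
        if parts[0] == "commit":
            _, revision, author, date = parts
            commits.append(Commit(revision, author, datetime.fromisoformat(date)))
        elif commits and len(parts) == 3:
            added, removed, path = parts
            # git prints "-" instead of line counts for binary files
            added = int(added) if added.isdigit() else 0
            removed = int(removed) if removed.isdigit() else 0
            commits[-1].files.append((path, added, removed))
    return commits


commits = parse_log(SAMPLE_LOG)
for commit in commits:
    print(commit.author, commit.date.date(), commit.files)
```

Keeping the parser separate from the command that produces the log makes it easy to try on saved output before pointing it at a real repository.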

Main data sources: Mailing lists

Used to organize libre software projects
Asynchronous means of communication
Basically, they forward received e-mail messages to the subscribed e-mail addresses
Archives are usually stored in specific formats such as mbox (RFC 4155)
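
A minimal sketch of reading such an archive, assuming it is stored as an mbox file: Python's standard `mailbox` module can iterate over the messages. The sample archive, the addresses and the helper name `read_archive` are made up for illustration.

```python
import mailbox
import os
import tempfile

# Made-up two-message archive in mbox layout: each message starts with
# a "From " separator line, followed by headers, a blank line and the body.
SAMPLE_MBOX = (
    "From alice@example.org Thu Nov 17 10:00:00 2011\n"
    "From: alice@example.org\n"
    "To: dev@example.org\n"
    "Subject: Release plan\n"
    "Message-ID: <1@example.org>\n"
    "\n"
    "First message body.\n"
    "\n"
    "From bob@example.org Thu Nov 17 11:00:00 2011\n"
    "From: bob@example.org\n"
    "To: dev@example.org\n"
    "Subject: Re: Release plan\n"
    "Message-ID: <2@example.org>\n"
    "In-Reply-To: <1@example.org>\n"
    "\n"
    "Reply body.\n"
)


def read_archive(path):
    """Return (sender, subject) for every message in an mbox file."""
    box = mailbox.mbox(path)
    try:
        return [(msg["From"], msg["Subject"]) for msg in box]
    finally:
        box.close()


# Write the sample to a temporary file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "list.mbox")
    with open(path, "w") as handle:
        handle.write(SAMPLE_MBOX)
    messages = read_archive(path)

print(messages)
```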

Main data sources: Mailing lists

Some interesting headers:


From: this field indicates the sender of the e-mail
To: this field indicates the receiver of the e-mail
Message-ID: unique identifier of the e-mail
In-Reply-To: indicates the parent of this e-mail (refers to the parent's Message-ID field)
List-Id: distribution list
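
These headers are enough to rebuild discussion threads. The sketch below, which assumes each message is available as raw text, uses Python's standard `email` package to link every message to its parent through In-Reply-To. The sample messages and the function name `build_threads` are hypothetical.

```python
from collections import defaultdict
from email import message_from_string

# Made-up messages: <2> replies to <1>, and <3> replies to <2>.
RAW_MESSAGES = [
    "From: alice@example.org\nTo: dev@example.org\n"
    "Message-ID: <1@example.org>\nList-Id: <dev.example.org>\n\nProposal\n",
    "From: bob@example.org\nTo: dev@example.org\n"
    "Message-ID: <2@example.org>\nIn-Reply-To: <1@example.org>\n"
    "List-Id: <dev.example.org>\n\nComment\n",
    "From: alice@example.org\nTo: dev@example.org\n"
    "Message-ID: <3@example.org>\nIn-Reply-To: <2@example.org>\n"
    "List-Id: <dev.example.org>\n\nAnswer\n",
]


def build_threads(raw_messages):
    """Return (thread roots, mapping parent Message-ID -> child Message-IDs)."""
    roots = []
    children = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = msg["Message-ID"].strip()
        parent = msg["In-Reply-To"]
        if parent is None:
            roots.append(msg_id)  # no parent: this message starts a thread
        else:
            children[parent.strip()].append(msg_id)
    return roots, dict(children)


roots, children = build_threads(RAW_MESSAGES)
print(roots)
print(children)
```

Real archives are messier (missing or malformed headers, replies to messages outside the archive), so a production tool would need more defensive handling than this sketch shows.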

Main data sources: Mailing lists

This type of tool is also used for peer review of the changes made in the SCM
Lists of commit watchers
Patches attached

Main data sources: Bug tracking system

This tool is used to track the evolution of the errors detected in the source code

Main data sources: BTS

Some examples:
Bugzilla
SourceForge tracker
GNATS: used by the FSF
Jira
Mantis

Main data sources: BTS

Life cycle: http://www.bugzilla.org/docs/2.18/html/lifecycle.html

Main data sources: BTS

Some interesting fields:

Bug id: unique identifier
Description
Opened: date when the bug report was sent
Status: current status of the report (new, assigned, reopened, needinfo, verified, etc.)
Resolution: invalid, wontfix, notabug, fixed, and others
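
One way to hold these fields once they have been extracted from a tracker is a small record type. The sketch below is illustrative only: the class name `BugReport`, the helper `resolution_counts` and the sample reports are assumptions, not the schema of any particular bug tracking system.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BugReport:
    bug_id: int                       # unique identifier
    description: str
    opened: date                      # date when the report was sent
    status: str                       # e.g. "new", "assigned", "verified"
    resolution: Optional[str] = None  # e.g. "fixed", "wontfix"; None while open


# Made-up sample data.
REPORTS = [
    BugReport(1, "Crash on start", date(2011, 10, 1), "verified", "fixed"),
    BugReport(2, "Typo in manual", date(2011, 10, 5), "verified", "fixed"),
    BugReport(3, "Feature request", date(2011, 11, 2), "verified", "wontfix"),
    BugReport(4, "Slow search", date(2011, 11, 10), "new"),
]


def resolution_counts(reports):
    """Count how many resolved reports ended with each resolution."""
    return Counter(r.resolution for r in reports if r.resolution is not None)


counts = resolution_counts(REPORTS)
print(counts)
```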

Main data sources: Source code

Generally obtained from source code releases, but also from the trunk or master branch of the SCM

Main data sources: Source code

There is a hierarchical structure
It is possible to discriminate by type of file (translation, source code, binary, makefile, etc.).
Definitely, there are several types of files, and not all of them are useful for evaluating software!
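
A simple, admittedly crude, way to discriminate by file type is to look at the file name and extension. The mapping below is an assumption made for this sketch (real projects need a much longer table), and the function name `classify` is hypothetical.

```python
import os

# Assumed extension-to-type table; deliberately tiny and incomplete.
EXTENSION_TYPES = {
    ".c": "source code",
    ".h": "source code",
    ".py": "source code",
    ".po": "translation",
    ".o": "binary",
    ".so": "binary",
}

MAKEFILE_NAMES = {"Makefile", "makefile", "GNUmakefile"}


def classify(path):
    """Guess the type of a file from its name alone."""
    name = os.path.basename(path)
    if name in MAKEFILE_NAMES or name.endswith((".mk", ".am")):
        return "makefile"
    extension = os.path.splitext(name)[1].lower()
    return EXTENSION_TYPES.get(extension, "other")


for path in ["src/main.c", "po/es.po", "Makefile", "lib/libfoo.so", "docs/README"]:
    print(path, "->", classify(path))
```

Files classified as "other" or "binary" can then be left out before computing any source code metric.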

Main data sources: Source code

It is also interesting to evaluate the copyright and authors of the files
This makes possible ownership studies on the source code similar to those done on the SCM
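
Copyright notices usually sit in the header comment of each file, so a regular expression can recover a first approximation of the holders. The pattern below is an assumption that only covers the common "Copyright (C) YEAR Name" layout; the sample header and the name `copyright_holders` are made up.

```python
import re

# Matches e.g. "Copyright (C) 2009-2011 Alice Example" or "Copyright 2010 Bob".
COPYRIGHT_RE = re.compile(
    r"Copyright\s+(?:\(C\)\s+)?(\d{4}(?:\s*[-,]\s*\d{4})*)\s+(.+)",
    re.IGNORECASE,
)

SAMPLE_HEADER = """\
/*
 * Copyright (C) 2009-2011 Alice Example
 * Copyright 2010 Bob Example <bob@example.org>
 */
int main(void) { return 0; }
"""


def copyright_holders(source_text):
    """Return (years, holder) pairs found in the given file contents."""
    holders = []
    for line in source_text.splitlines():
        match = COPYRIGHT_RE.search(line)
        if match:
            years = match.group(1)
            holder = match.group(2).strip(" .*/")  # drop trailing comment debris
            holders.append((years, holder))
    return holders


holders = copyright_holders(SAMPLE_HEADER)
print(holders)
```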

Main data sources: Source code

It is also interesting to study licenses (in the end, the license is found in the source code).
When the source code and what a developer claims differ, the source code provides the actual information.
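
As a hedged illustration of reading the license from the source code itself, the sketch below looks first for an `SPDX-License-Identifier` tag and then falls back to a few well-known phrases. The phrase table is an assumption and far from complete, the short labels it returns are informal, and the function name `detect_license` is hypothetical.

```python
import re

SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([^\s*/]+)")

# Assumed marker phrases mapped to informal labels; real headers vary a lot.
PHRASES = [
    ("GNU Lesser General Public License", "LGPL"),
    ("GNU General Public License", "GPL"),
    ("Apache License", "Apache"),
    ("Permission is hereby granted, free of charge", "MIT"),
]


def detect_license(header_text):
    """Return a license label for a file header, or None if nothing matches."""
    match = SPDX_RE.search(header_text)
    if match:
        return match.group(1)
    lowered = header_text.lower()
    for phrase, label in PHRASES:
        if phrase.lower() in lowered:
            return label
    return None


GPL_HEADER = (
    "/* This program is free software; you can redistribute it and/or modify\n"
    " * it under the terms of the GNU General Public License as published by\n"
    " * the Free Software Foundation. */\n"
)

print(detect_license("// SPDX-License-Identifier: MIT\n"))
print(detect_license(GPL_HEADER))
print(detect_license("int x = 0;\n"))
```

A result of `None` only means that this sketch found no marker; it does not mean the file is unlicensed.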

References

Producing Open Source Software, by Karl Fogel
Tools and datasets for mining libre software repositories, by Gregorio Robles, Jesús M. González-Barahona, Daniel Izquierdo-Cortázar and Israel Herraiz
The promises and perils of mining git, by Christian Bird, Peter Rigby, Earl Barr, David Hamilton, Daniel German, and Premkumar Devanbu
