
Proceedings of the 2009 13th International Conference on Computer Supported Cooperative Work in Design

Analyzing the use of VoIP Technology in Collaborative Modeling


Mauro C. Pichiliani1, Celso M. Hirata1
1 Instituto Tecnologico de Aeronautica, Sao Paulo, Brazil
{pichilia,hirata}@ita.br

Abstract
Collaborative Editing Systems often use communication channels to make collaboration more effective. In this work, we present a case study of pair communication using audio via VoIP (Voice over Internet Protocol) technology during collaborative modeling sessions. The study provides an analysis of audio and textual chat as communication media and presents data on usage patterns, user interaction and attitudes when using collaborative editing systems. The qualitative and quantitative analysis suggests that VoIP technology does have advantages over textual chat in collaborative modeling when used for communication.

Keywords: Audio conferencing, VoIP, Chat, Collaboration.

1. Introduction
The ability to communicate is a well-known factor that directly affects how people collaborate to perform tasks that demand group work. Many communication channels have been used in real-time synchronous collaborative editing systems (CES) to promote interaction among multiple users while they edit a shared document simultaneously. The most common channels are text and audio, which are provided by text-based chat and audio conferencing tools, respectively.

Despite the widespread research on communication in multimedia conferencing systems, as presented in [5, 6, 10, 11], few efforts have been made to evaluate how the audio conferencing features provided by VoIP (Voice over Internet Protocol) technology can be used in applications whose collaborative requirements are context dependent, such as collaborative modeling. The features that VoIP technology offers include high-fidelity stereo audio, low bandwidth cost and the ability to change audio settings such as volume, echo feedback and level of distortion.

The goal of this paper is to investigate how communication provided by VoIP technology affects collaboration in CES. The CES studied is a UML (Unified Modeling Language) collaborative editor. In order to investigate the benefits of VoIP in CES sessions, we compare text and audio usage. A case study was conducted in which users were divided into pairs that had either an audio or a text-based channel to communicate. Qualitative and quantitative analysis of data from the case study is presented to support the finding that VoIP technology, when used in this context, does have advantages over text as a means of communication.

The rest of the paper is organized as follows. Related work and background studies are presented in Section 2. Section 3 describes a case study conducted to collect empirical data about the use of text and audio as channels of communication; the section also presents the qualitative and quantitative analysis. Finally, Section 4 presents the conclusions, comments and future work.

2. Related Work
In order to make work more productive and reduce the costs of physical group meetings, real-time synchronous collaborative editing systems (CES) have been employed to allow multiple users to edit a document simultaneously. Using CES, users collaborate to accomplish tasks that would otherwise be difficult for individuals, such as those that require the synchronous interaction of users located in geographically distributed areas. However, the group work provided by CES can become unproductive and expensive without communication. In CES, communication is necessary, among other reasons, to coordinate users by defining when and how they work.

According to Kraut et al. [8], coordination can be defined as the activity of directing individuals' efforts towards achieving common and explicitly recognized goals. To study the coordination aspects of communication, researchers divide it into two types: formal and informal. In formal communication, a coordination mechanism is used with varying degrees of formality. In this type of communication the coordination is accomplished by adherence to common rules, regulations, and standard operating procedures, through pre-established plans, schedules, forecasts and other standardized communications. What formal coordination mechanisms have in common is communication that is specified in advance, is unidirectional, and is relatively impoverished [8].

Informal communication is defined by Whittaker et al. [14] as a long intermittent conversation containing multiple unplanned fragments that often lack openings and closings. In their work Whittaker et al. indicate that
978-1-4244-3535-7/09/$25.00 2009 IEEE

informal communication supports a number of different functions, such as the execution of work-related tasks, coordination of group activity, transmission of office culture, and social functions.

The formal and informal types of communication are related to the nature of the interaction in terms of scheduling, content or protocol, but do not necessarily characterize a particular modality of communication. This means that either audio or text-based communication can be formal or informal. However, informal communication has been reported to account for over 30% of total work time, with over 90% of this time being spent on unplanned conversations [8, 14]. In terms of functional characteristics, formal and informal communication are best suited to different types of activities. Formal communication tends to be used for coordinating relatively routine transactions within groups and organizations, while informal communication supports group coordination, especially under conditions of uncertainty and novel or unplanned events, which are likely to occur during the use of CES.

The communication channels commonly used in CES include textual chat and audio conferencing. Some CES use video conferencing over the Internet as well; however, video conferencing is traditionally supported by dedicated conferencing rooms, which rules out the possibility of supporting informal communication, as indicated by [11]. Our conjecture is that a combined use of CES and video conferencing may be compromised because both systems require high communication bandwidth in order to provide an acceptable quality of service. We also conjecture that the use of both systems requires some formal coordination, since the systems usually have their own coordination mechanisms whose usage may conflict with each other.
A textual chat is defined as a "live" text-based synchronous communication in which a participant types a message and it is immediately available to the group of participants; other participants eventually read the message and then write and send a response, which is also immediately available to the group. Textual chats can happen via instant messaging or via a virtual space called a "chat room". Audio conferencing is analogous: a participant speaks a message into a microphone, it is immediately reproduced on the speakers of the other participants, and vice versa. In general, textual chat is easier to install and set up than audio communication because of its weaker requirements, i.e. low bandwidth and no need for microphones or speakers. Conversely, audio conferencing systems are more complex to both implement and use because they require more resources [11].

In order to facilitate the use of audio conferencing in collaborative applications, many CSCW (Computer Supported Cooperative Work) designers are employing VoIP technology. VoIP is a technology that allows the transmission of audio over an IP network. The audio is encoded by a codec, the software that defines how the analog audio signal is converted to a digital stream and then back again, and whether it is compressed along the way. When using standard and mobile telephones it is only possible to transmit audio at 8 kilohertz mono (8k), while using VoIP technology it is technically possible to transmit voice at CD quality (44.1k) or higher in stereo.

The main reason to choose VoIP over telephone technology for audio conferencing is the quality of the audio. Yankelovich et al. [13] state that human brains are tuned to understand speech, even under the worst of audio conditions, but the clearer the audio signal, the easier it is to understand. When an audio signal is degraded due to low fidelity or background noise, as in telephone calls, some problems occur. First, listeners have to strain to hear the speech, thus expending considerable mental effort to understand the words. This effort makes it more difficult to focus on the content of what is being said. Also, a degraded audio signal makes it considerably more difficult to understand soft-spoken and accented speakers. Therefore, if the clarity of the audio signal is improved, the effort needed to understand the meaning of what a remote person says decreases.

VoIP technology became popular when VoIP providers, such as Skype [12], offered telephony services through the Internet with low delay, high-fidelity voice quality, no jitter and low costs. Other VoIP service providers developed applications that can be used in any network that supports the IP protocol; these are known as softphones, in contrast to real-world physical telephones. The Telecommunications and CSCW literature contains many studies comparing the effects of audio and text in different settings [3, 7, 10].
In most of those studies textual chat is reported as an inferior form of communication, based on the arguments that it is less effective for building trust than audio or video and that users prefer video and audio to text for communication. Whereas those studies evaluate many collaborative settings, none of them takes into consideration the context-dependent nature of collaborative modeling.

There are a few studies in the CSCW literature that evaluate the context-dependent nature of collaborative modeling without remote communication features. Damm et al. [2] present a qualitative evaluation of workspace awareness by analyzing the use of Distributed Knight, a gesture-based diagramming tool that supports distributed collaboration for modeling UML diagrams. Fidas et al. [4] examine the effect of heterogeneous resources during computer-supported problem solving promoted by the collaborative modeling of an educational activity in a secondary school. However, neither study evaluates the use of audio as a communication channel.
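
To make the sampling-rate comparison quoted earlier in this section more concrete, the following minimal sketch computes the raw, uncompressed PCM bit rate for the two cases mentioned: 8 kHz mono and 44.1 kHz stereo. The 16-bit sample depth and the function name `pcm_bitrate_bps` are assumptions made purely for illustration; the text does not state a bit depth, and a real codec would normally compress the stream well below these figures.

```python
def pcm_bitrate_bps(sample_rate_hz: int, channels: int, bits_per_sample: int = 16) -> int:
    """Raw PCM bit rate in bits per second (no compression, no packet overhead).

    bits_per_sample defaults to 16, which is an assumption of this sketch.
    """
    return sample_rate_hz * channels * bits_per_sample


# The two audio formats mentioned in the text.
telephone = pcm_bitrate_bps(8_000, channels=1)    # 8 kHz mono
cd_stereo = pcm_bitrate_bps(44_100, channels=2)   # 44.1 kHz stereo

print(telephone)              # 128000
print(cd_stereo)              # 1411200
print(cd_stereo / telephone)  # how many times more raw data the second format carries
```

Under these assumptions the higher-fidelity stream carries roughly eleven times more raw data, which illustrates why the codec's compression choices matter for the bandwidth a VoIP session needs.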

3. Case Study
The case study presented in this section includes an experiment that was conducted in a controlled environment in order to collect empirical data about how audio (provided by VoIP technology) and text are used to support informal communication.

3.1. Theoretical Formulation


The evaluation design and methodology used in the case study followed the guidelines suggested in the CSCW Lab proposal for groupware evaluation [1]. According to CSCW Lab, a groupware evaluation can focus on one or more of three dimensions: evaluating the usability of the tool; evaluating whether the tool helps participants reach an appropriate level of collaboration; and evaluating the cultural impact the tool brings to each individual, to the group or to the organization. The evaluation presented in this work focuses on the usability of the tool, more specifically on the communication channel and its effects on the collaboration.

Whatever the aim of the evaluation, the CSCW Lab suggests that the group context must be well understood in order to interpret each evaluation result. The group context of the case study is the collaborative modeling of UML diagrams, since this type of modeling is known to generate a high number of interactions and dialogs between the modelers. Also, the analysis of student peer interaction conducted by Fidas et al. [4] reveals that informal communication during computer-supported problem solving, promoted by collaborative modeling, generated a high number of exchanged messages, involved the students in deeper discussions, and contributed to building the constituent parts of the solution. However, that analysis evaluated only face-to-face communication.

We argue that a high-quality audio communication channel, powered by VoIP technology, provides more benefits for the collaboration than a textual channel, since VoIP technology allows conferences with audio quality similar to that of face-to-face dialogs. The hypothesis for this case study is stated as follows: the use of high-quality audio provided by VoIP technology can provide more benefits than text to support informal communication during the collaborative modeling of UML diagrams.

To verify this hypothesis we conducted an experiment in a controlled environment in order to collect empirical data about how audio and text are used to support informal communication. The prototype used in the experiment was built using the mapping proposed by Pichiliani and Hirata [9] and consists of an adaptation of an open-source single-user application named ArgoUML. A textual chat tool was embedded in the prototype and a VoIP softphone was used for the audio conference.

We ran an assessment with volunteers who knew each other only by name. The participants had similar communication, computer and modeling skills and were carefully selected so that they had similar knowledge about the tasks that they would perform. Another condition required to join the experiment was the completion of a Software Engineering course that taught how to model using the UML notation. The experiment was conducted with a group of twelve computer science graduate students.

During the design phase of the experiment we observed how small software development teams created UML diagrams and found that the modeling task is usually carried out by two project members, a software architect and a domain expert, in physical collaborative modeling sessions. Based on this observation we divided the students into six pairs. The first three pairs, known as the chat group, were able to communicate only through the textual chat. The other three pairs, known as the VoIP group, could communicate only through the softphone. We were not able to set up a telephony audio testing facility in our environment, which led us to run the experiment under real-life conditions in the context of a real audio conference and textual chat.

Each member of a pair was taken to a separate room with an observer who monitored the student's behavior, while the facial expressions and the communication produced were filmed and recorded on tape. Before the start of the experiment the students received a questionnaire asking about their previous knowledge of UML, collaboration technologies and other social aspects. Then the students received tutorials introducing them to the prototype and explaining how to elaborate simple UML Class and Use Case diagrams with the help of the prototype and how to change the settings of the softphone (volume, echo and level of distortion). After this training the students started the main part of the experiment.

Each pair completed three collaborative tasks in which they received fictitious scenarios. For the first task the students modeled a Class diagram based on a scenario describing a soccer match. For the second task the students modeled a Use Case diagram based on a fictitious scenario describing how a dentist's office works. For the third task they were asked to change an existing Class diagram, based on a DVD rental store, to support the classification of movies according to four provided movie genres and three rental prices. At the end of the last task, the students answered a final questionnaire with general questions about the experiment. Finally, each student participated in an interview with the observers. The prototype recorded logs to collect quantitative data, including the dialog between the students.

3.2. Data Analysis


Qualitative and quantitative data were obtained from questionnaires, interviews and observations conducted during the experiment. By analyzing the questionnaire answers from both groups we found that the comments made by the students in the VoIP group indicate the effect of the communication on the collaboration. They also imply that the communication was somehow beneficial. A student commented the following:

"We can see that there was collaboration between the pair mainly because of the VoIP. The doubts between us were resolved in real time and the work was done more quickly."

All the students from the VoIP group made brief and positive comments about the quality and the ease of use of the audio communication. In fact, the opinion that audio conferencing is beneficial was shared by all our respondents in the VoIP group, even those who consider themselves less active. Although the questionnaire did not contain any question about the direct effect of the communication channel, we could not find any positive mention of the effects of the communication channel in the chat group. This absence suggests that most students in that group either did not consider it worth mentioning or did not like the textual chat.

The analysis of the chat transcripts suggests that there were some coordination and awareness issues caused by the nature of the textual chat. One specific problem of the textual chat that affected coordination is the high number of mistyped words found in the transcripts. These mistyped words increased the effort needed to understand the messages exchanged by the students. The understanding problem did not happen in the VoIP group, since the few mispronounced words were understood correctly by the students. This is probably due to the high-quality audio provided by the softphone, which made the students' pronunciation of the words clear.

Some dialogs show that a student in the chat group sometimes stopped working while waiting for an answer to a question that he had asked. This behavior did not happen in the VoIP group, probably due to the synchronous nature of the communication and the short time spent answering questions. The students of the chat group also had difficulty coordinating their actions. One of the reasons is the lack of sufficient communication about the tasks they performed concurrently. An example of this behavior is the excerpt below, which was typed immediately after one student realized that he had duplicated the work of his partner:

"On the next time we could try to coordinate our actions to avoid the risk of one of us does the work that the other is doing."

Coordination problems did not happen in the VoIP group. This may be explained by the fact that this group not only exchanged more messages than the chat group but also communicated in a more elaborate way. Analyzing the audio of the VoIP group, we found that students discussed alternative scenarios, created hypothetical situations, taught each other topics related to UML modeling and even made assumptions about the fictitious scenarios presented in each task, among other discussions. The following suggestion, obtained from a student of the VoIP group, illustrates how elaborate the communication was:

"Let's suppose that in the class that you suggest we create an attribute to store this property (...)"

This is evidence that the conversation in the VoIP group was more elaborate than the conversation that occurred in the chat group, where the students did not suggest any alternative solutions or possible scenarios. Instead of an elaborate conversation, the transcripts of the chat group show that during the first and second tasks, for two distinct chat pairs, one student assumed the role of a leader, stating his ideas and telling the other student what to do. When this behavior happened the other student followed the directions of the leader and occasionally made some observations, thus assuming a passive role in the collaboration. The other chat pair showed a more democratic decision-making process, allowing the students to exchange their strategies for modeling the diagrams. When the leader behavior happened, the activities were not divided equally between the members of the pair. The measure used to make this claim is the number of elements created, modified and deleted by each student in each pair of each group. In the VoIP pairs, however, the activities were divided almost equally.
There is no data suggesting that the leader behavior happened in the VoIP pairs. In general, the qualitative analysis provides strong evidence that the communication in the VoIP group was more elaborate, contributed to a richer discussion of ideas and promoted a smoother and more useful conversation than the communication produced by the chat group.

At the end of each task the students indicated to the observers when they had finished the diagrams. The average time spent on the three tasks by the chat group was almost 30% less than the average time that the VoIP group spent on the same three tasks. One reason for this difference is that talking is an easier and faster way to communicate than typing [11], so students tend to express themselves more in the audio channel than in the text channel. This is supported by the fact that the chat group did not communicate as much as the VoIP group: the average word count per minute of both students of a pair, over the three tasks, was 10.82 in the chat group against 26.15 in the VoIP group, which is more than twice as high.

Although the VoIP pairs took longer and their communication was more elaborate than that of the chat pairs, we cannot conclude that the VoIP group was more efficient. To compare productivity it is necessary to evaluate the groups' performance considering both the effects of the communication channel and the quality of the work produced. With this in mind, we presented the diagrams modeled in the experiment to two UML specialists, who evaluated them by assigning grades to objective characteristics such as the errors in the diagrams, the clarity of the ideas and the representativeness of the model. We are still analyzing the grades produced by the specialists, but preliminary quantitative analysis indicates that the collaborative models produced by the students with a high level of interaction during the sessions (the VoIP group) received better grades than those made by students who did not interact much with their partners (the chat group).

Another metric calculated for this analysis is the number of elements created, modified and deleted by each student in each pair of each group. In order to analyze these data it is useful to take into account the nature of the task. For instance, in the third task the students started from a diagram that already had a few elements, so in this task the number of elements created and deleted by the pair was lower, in a proportion of 2:1, than the number for the other tasks in both groups. In tasks 1 and 2, which both require the creation of a diagram from scratch, the number of elements created, modified and deleted by the pair as a whole was almost equal in both groups, with less than 5% of variation.
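
As a rough illustration of the word-count-per-minute metric discussed above, the sketch below counts whitespace-separated words in a transcript and divides by the session duration. The function name `words_per_minute`, the whitespace tokenization and the sample messages are all assumptions made for this example; the text does not describe how words were actually counted, and the sample is not data from the study. Only the two averages (10.82 and 26.15) come from the text.

```python
import re


def words_per_minute(transcript: str, duration_minutes: float) -> float:
    """Average number of whitespace-separated words per minute of session time."""
    if duration_minutes <= 0:
        raise ValueError("duration_minutes must be positive")
    words = re.findall(r"\S+", transcript)
    return len(words) / duration_minutes


# Hypothetical two-message exchange lasting half a minute (NOT study data).
sample = "where should the Player class go\nnext to Team I think"
rate = words_per_minute(sample, 0.5)

# Averages reported in the text for the chat and VoIP groups.
CHAT_WPM = 10.82
VOIP_WPM = 26.15
ratio = VOIP_WPM / CHAT_WPM

print(rate)             # 22.0
print(round(ratio, 2))  # 2.42
```

The ratio of the two reported averages is about 2.4, which is consistent with the statement that the VoIP group's rate was more than twice that of the chat group.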
In order to further investigate the effort needed to use the VoIP and the textual chat, questions were added regarding the students' perception of how hard the communication was during the experiment. At the end of each task the students were asked to rate their effort on a 5-point Likert scale (1 is very hard, 5 is very easy). The data collected with the Likert scale suggest that discussion was harder in the chat group. The quantitative difference can be seen in the graph presented in Figure 1, which shows the average effort perception in the three tasks of the experiment for each group. In the first task, most of the students of the chat group assigned the middle value of the Likert scale, while the students of the VoIP group indicated that the communication was easier, with a difference of one point on the Likert scale. In the second task, students of both groups gave the same average value for the effort perception question. In the third task, the average effort perception of the VoIP group was a little higher than that of the chat group.
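
The per-task, per-group averages described above can be computed as in the following minimal sketch. The function name `mean_effort`, the tuple layout and the example ratings are hypothetical and chosen only for illustration; the individual ratings collected in the experiment are not given in the text, so the values below are made up and should not be read as results.

```python
from collections import defaultdict
from statistics import mean


def mean_effort(ratings):
    """Average Likert score per (group, task).

    ratings: iterable of (group, task, score) tuples, where score is an
    integer from 1 (very hard) to 5 (very easy).
    """
    buckets = defaultdict(list)
    for group, task, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
        buckets[(group, task)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


# Made-up ratings for illustration only; these are NOT the study's data.
example = [("chat", 1, 3), ("chat", 1, 3), ("voip", 1, 4), ("voip", 1, 5)]
averages = mean_effort(example)

print(averages[("chat", 1)])  # 3
print(averages[("voip", 1)])  # 4.5
```

Grouping by the (group, task) pair keeps the computation aligned with how the averages are described, one value per group for each of the three tasks.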

The analysis of the quantitative data suggests that the VoIP group took longer to complete the tasks, discussed more and produced more work than the chat group during the tasks of the case study. The analysis also shows that the students of the VoIP group, in two of the three tasks, spent less effort to communicate than the group that used the textual chat.

Figure 1. Effort perception average for each task separated by group.

Although only pairs of students were evaluated in the experiment, we were able to simulate and observe the interactions produced by small software development teams when they meet to elaborate UML diagrams. As our main focus in the experiment was to observe the communication, we note that when the number of users in the collaboration increases, it is likely that both the workload on the system and the mental effort required to communicate also increase. To mitigate these issues it is necessary to employ methods that coordinate the dialogs and avoid the strain of hearing and understanding the speech.

4. Conclusions, Comments and Future Work


In this paper we presented a case study conducted to analyze how communication provided by VoIP technology affects the collaboration during collaborative modeling sessions that require the manipulation of UML diagrams. The case study corresponds to a controlled experiment that collected data about usage patterns, user interaction and attitudes. The qualitative analysis of the data showed that the communication with high-quality audio was more elaborate, contributed to a richer discussion of the ideas and promoted a smoother and more useful conversation. The quantitative data produced evidence that the participants discussed and worked better when audio was used. The study also provides an analysis of the perception of effort, reporting that users spent less effort when the audio channel was used. The evidence and the analysis of the data presented in the case study led to the conclusion that high-quality audio does have advantages over text when promoting communication during modeling with CES, in terms of the quality of the discussions, the amount of work produced, and the effort made to communicate.

Whereas the evidence produced by the analysis of the data suggests that high-quality audio has more benefits than text during collaborative modeling, it is important to consider the limitations of the case study. The main limitations are the assignment of only two students to work together to produce UML diagrams, the lack of real-world scenarios to model, the restriction to two types of UML diagrams in the experiment, and the focus on the analysis of the tool's effect instead of on the effect of functionality affordance. Also, the number of pairs in each group in this experiment is too small and may contribute to some bias in the results.

Future work includes the observation of how users coordinate their activities on other semantic modeling tasks and a more detailed analysis of the data collected during the experiment, such as facial expressions and the quality of the diagrams that were produced. The evaluation of the data in other collaborative scenarios, such as (i) group sessions with more than two participants and (ii) other kinds of tasks, for instance collaborative drawing or text editing, is also possible future work, since the characteristics of other case studies can increase the knowledge of usage patterns, user interaction and attitudes in a collaborative context.

The research presented in this paper sets a precedent for the evaluation of high-fidelity voice quality in collaborative modeling. The quantitative and qualitative results presented can encourage the developers of collaborative applications to consider the use of VoIP technology in their prototypes and experiments.

References
[1] R.M. Araujo, F.M. Santoro and M.R.S. Borges, The CSCW Lab for groupware evaluation, Proc. of the 8th Collaboration Research International Workshop on Groupware, La Serena, Chile, Sept. 1-4, 2002, pp. 222-231.
[2] C. Damm and K. Hansen, An Evaluation of Workspace Awareness in Collaborative Gesture-based Diagramming Tools, Proc. of the 2004 HCI Conference, Leeds, UK, Sept. 6-10, 2004, pp. 25-50.
[3] X. Ding, T. Erickson, W.A. Kellogg, S. Levy, J.E. Christensen, J. Sussman, T.V. Wolf and W.E. Bennett, An Empirical Study of the Use of Visually Enhanced VoIP Audio Conferencing: The Case of IEAC, Proc. of the 2007 Conference on Human Factors in Computing Systems, California, USA, Apr. 28 - May 3, 2007, pp. 1019-1028.
[4] C. Fidas, V. Komis, S. Tzanavaris and N. Avouris, Heterogeneity of Learning Material in Synchronous Computer-supported Collaborative Modelling, Computers & Education, 2005, 44(2), pp. 135-154.
[5] R.E. Grinter and M.A. Eldridge, y do tngrs luv 2 txt msg?, Proc. of the 7th European Conference on Computer Supported Cooperative Work, Bonn, Germany, Sept. 16-20, 2001, pp. 219-238.
[6] M. Handel and J.D. Herbsleb, What Is Chat Doing in the Workplace?, Proc. of the 9th ACM Conference on Computer Supported Cooperative Work, Louisiana, USA, Nov. 16-20, 2002, pp. 1-10.
[7] J.D. Herbsleb, D. Atkins, D.G. Boyer, M. Handel and T.A. Finholt, Introducing Instant Messaging and Chat into the Workplace, Proc. of the 2002 HCI Conference, Minneapolis, USA, Apr. 20-25, 2002, pp. 171-178.
[8] R.E. Kraut, R. Fish, R. Root and B. Chalfonte, Informal Communication in Organizations: Form, Function, and Technology, Claremont Symposium on Applied Social Psychology, California, USA, Feb. 10, 1990, pp. 145-199.
[9] M.C. Pichiliani and C.H. Hirata, A Guide to Map Application Components to Support Multi-user Real-time Collaboration, Proc. of the 2nd International Conference on Collaborative Computing: Networking, Applications and Worksharing, Georgia, USA, Nov. 17-20, 2006.
[10] T. Schliemann, T. Asting, A. Folstad and J. Heim, Medium Preference and Medium Effects in Person-person Communication, Proc. of the 20th ACM Conference on Human Factors in Computing Systems, Minnesota, USA, Apr. 20-25, 2002, pp. 710-711.
[11] J. Scholl, J.D. McCarthy and R. Harr, A Comparison of Chat and Audio in Media Rich Environments, Proc. of the 11th ACM Conference on Computer Supported Cooperative Work, Alberta, Canada, Nov. 4-8, 2006, pp. 323-332.
[12] Skype, 2008, http://www.skype.com, Visited at 04/02/2009.
[13] N. Yankelovich, J. Kaplan, J. Provino, M. Wessler and J.M. DiMicco, Improving Audio Conferencing: Are Two Ears Better than One?, Proc. of the 11th ACM Conference on Computer Supported Cooperative Work, Alberta, Canada, Nov. 4-8, 2006, pp. 333-342.
[14] S. Whittaker, D. Frohlich and O. Daly-Jones, Informal Workplace Communication: What is it Like and How Might we Support it?, Proc. of the 12th ACM Conference on Human Factors in Computing Systems, Massachusetts, USA, Apr. 24-28, 1994, pp. 131-137.
