Published in the Proceedings of

A Case Study on the Process Productivity Model for Software Projects in Korea
In-kyoung Hur, In-soo Hwang, David Yu

Abstract
Derived measures such as Source Lines of Code per Person-Month or Function Points per Person-Month do not explain enough about the productivity of a project or an organisation. To complicate matters, many variables interact nonlinearly to influence productivity over the project lifecycle. In this paper, we search for patterns of software productivity using real project data from Korea. These patterns and trends are used to construct and validate a process productivity model. Process productivity is a notion introduced by Lawrence H. Putnam. It is characterized by two factors, the productivity parameter (PP) and the productivity index (PI). Interesting patterns can be observed in these factors when projects are classified by properties such as application type and application complexity. This paper looks for a relationship between project size, effort, and time in the Korean project data and derives the corresponding PP and PI values along the way. The variables affecting software productivity in Korea are also identified through a non-linear regression analysis. The following topics are covered: a method of constructing a process productivity model; the results and trends implied by the model; the validation results obtained by applying the model to test data; and the lessons learned along the way.

1. Introduction
Software companies in Korea traditionally expressed development productivity as a ratio between the volume of development output and effort, for example the number of programs per person-month. This tendency has been changing since 2004, when the Korean Government adopted function point analysis. Function points are fast becoming the preferred functional sizing measure because they are deemed more logical than physical sizing techniques. Additionally, total development effort can be estimated at an early phase by converting functional user requirements into function points and applying a function-point-based productivity metric such as FP per person-month. For this reason, numerous organisations in Korea apply function points to estimate their project costs. Notably, the Korean Government's guidance is that all public sector IT projects should use function points to negotiate project costs. This guidance has spread the use of function points in Korea as the preferred measure for quantifying development size and productivity. Nevertheless, a productivity metric based on function points per person-month has its own limitations: project durations cannot be estimated easily; activity-level efforts are hard to estimate; and productivity factors such as application complexity, quality level, team skills, and project environment are often not considered. Care must be taken in using the metric because it may fail to reflect the special and unique characteristics of a project. This research therefore aims to introduce a customized productivity model that alleviates these problems and adequately reflects the uniqueness of Korean project environments. To do this, a number of finished IT projects in Korea were analysed to derive development process patterns.


Specifically, this research has customized Putnam's process productivity model for the Korean environment and validated it against test data to check its performance. The crux of Putnam's productivity model is the productivity index (PI), each value of which is matched to an interval of the productivity parameter (PP).

2. Process Productivity Overview


In the mid-1970s, Lawrence Putnam proposed a definition of software development productivity that differed in many ways from the traditional concept. In Putnam's opinion, the right way to represent software development productivity is to construct a process-oriented, empirical model that reflects the actual behaviour of software projects. At the time, the following ideas were prevalent in the study of software metrics: software development productivity is the ratio between product size and effort; effort and time are linearly related; and effort and time effectively represent the same quantity. Putnam, however, considered that these ideas may not be entirely true. In his view, software development productivity must treat time as an independent variable, and size, effort, and time are related in a non-linear way. Putnam analysed project data from the Army Computer Systems Command and summarized the trend he discovered in the equation below.

$$\text{Size} = \text{Productivity Parameter} \times (\text{Effort}/B)^{1/3} \times \text{Time}^{4/3}, \qquad 0.16 < B < 0.39$$

The equation implies that, for a fixed size and productivity parameter, effort is inversely proportional to the fourth power of time. The equation was checked by fitting the curve to a large data sample, and it yielded a significant P-value, which strongly supported the empirical relationship [1]. Putnam also stated that process productivity is influenced by various productivity factors such as application type, application complexity, and development pattern. Some typical productivity factors are: the state of project management practices; the level of requirements engineering, design, coding, inspection, and testing; the level of the application language; the state of technology being applied, such as software tools, development equipment, and machine capabilities, often termed the 'software environment'; the skill and experience level of team members; and the application complexity.
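The trade-off between effort and time implied by the size equation can be made concrete by solving the equation for effort. The following is a minimal sketch, not part of the original study: the function name `effort_for` and the input values are hypothetical, and the units (effort in person-months, time in years) are an assumption chosen to match the units listed in Table 4.

```python
def effort_for(size, pp, time, b):
    """Solve Size = PP * (Effort / B)**(1/3) * Time**(4/3) for Effort.

    Cubing both sides gives Effort = B * (Size / (PP * Time**(4/3)))**3,
    so Effort scales with 1 / Time**4 when Size, PP, and B are fixed.
    """
    return b * (size / (pp * time ** (4.0 / 3.0))) ** 3


# Hypothetical inputs chosen for illustration only (not taken from the paper).
size, pp, b = 3000.0, 600.0, 0.16

nominal = effort_for(size, pp, time=1.0, b=b)     # schedule of 1.0 year
compressed = effort_for(size, pp, time=0.8, b=b)  # same project in 0.8 year

# Ratio of the two efforts; it depends only on the schedule ratio.
ratio = compressed / nominal
print(f"nominal effort:    {nominal:.1f} person-months")
print(f"compressed effort: {compressed:.1f} person-months")
print(f"effort ratio:      {ratio:.2f}")
```

Under these assumptions, shortening the schedule by 20% multiplies the effort by (1/0.8)^4, which is why the model treats time as an independent variable rather than as a proxy for effort.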
To represent a level of productivity, Putnam defined two factors called the productivity index (PI) and the productivity parameter (PP). Productivity parameters were grouped into discrete intervals, and each interval was assigned a PI. In his original research, PP extended from 754 to 3,524,578, and PI extended from 1 to 36. PP grows exponentially as PI increases. For example, an increase of PI from 5 to 6 results in an increase of PP by 600, whereas an increase of PI from 33 to 34 results in an increase of PP by 500,000. This indicates that in the lower ranges of process productivity, development productivity improves little as PI increases, and in the upper ranges it improves greatly. Hence, this relationship asserts that meaningful improvement in development productivity is ultimately traceable to and dependent upon process productivity. The bottom line is that process productivity must be upgraded first in order to increase PI.


Process productivity, which determines the PP and PI values, is a concept grounded in various productivity factors such as application complexity, developer and team skills, tools, techniques, and project environment, in addition to project effort and duration. Thus, an empirical heuristic model should be used to represent process productivity. The model should not rely on pure mathematical theory alone [2].

3. Constructing a Korean Process Productivity Model


On the assumption that software projects in Korea are fundamentally analogous to those used in developing Putnam's process productivity model, this research adopted Putnam's framework to derive a Korean process productivity model. This model is discussed throughout the paper.

3.1. Dataset Information


In order to develop the model, 58 finished software projects were collected and analysed. These projects belong to domains such as public, banking/finance, and manufacturing/services. The size, effort, and duration of these projects adhere to normal distributions, as shown in Figure 1. The normality of each measure was checked with the Anderson-Darling normality test; the resulting P-values were less than 0.005. For most projects, the size clustered around 1,000 FP to 4,000 FP; the effort clustered around 50 to 100 person-months; and the durations ranged from 3 to 18 months (Figure 1).
[Figure: three histograms; y-axis: Number of Projects; x-axes: Size (FP), Effort (Person-Month), Duration (Months)]

Figure 1: Distribution of Size, Effort, Duration: dataset of 58 observations

A process productivity model should represent the unique characteristics of the environment that it is trying to model. Accordingly, this paper examined some of the features that characterise the Korean IT industry. These are the findings.

First, when setting a project deadline, Korean clients tend to put more emphasis on fulfilling management goals or winning targeted bids than on carefully considering project conditions. Although it is understandable that staying competitive is crucial for their business, this attitude neglects project realities and ultimately burdens ongoing projects. In general, project deadlines in Korea are set at only 1/3 of the default durations estimated by typical estimation tools. Because of such short deadlines, non-critical development activities usually receive little or no care. Activities like requirements engineering, design, and testing are often neglected, and many project documents are poorly written. Defects are not removed to an acceptable level, and systems are delivered without proper quality.


Secondly, Korean clients tend to adopt changes in technology more rapidly. Clients want to show their market that they possess leading-edge IT infrastructure. Once the utility of a new technology is confirmed, Korean clients adopt it one after another. This is a strange but unique trend of the Korean IT industry. As a result, e-government and enterprise IT infrastructure in Korea are more advanced than in many other countries. To keep up with this trend, new technologies are adopted early and become obsolete sooner in Korea, so project turn-around times tend to be shorter. There are also many concerns that executed budgets are wasted because redevelopment and improvement projects are so common. This criticism is the rationale that clients use to request budget cuts.

Thirdly, more weight tends to be put on price than on quality when choosing a contractor. This does not mean that Korean clients do not expect quality. They often regard quality as a given and choose the contractor with the lowest price. Contractors' development and quality assurance competencies are not scrutinized carefully. This often leads to delivered products that are considered inadequate in quality and to budgets that are considered wasted, which has adversely affected the attitude of Korean clients. Planned project budgets are often inadequate. Over-competition and lowest-price bidding are creating an environment in which contracts are awarded at an absurdly negative profit margin.

Lastly, Korean clients frequently embark on a new project even though they do not fully understand the system requirements. Even worse, many contractors disregard this ambiguity and enter into the contracts anyway. As a result, enormous scope creep is common and actual costs often overrun the planned budgets.
All of these burdens are forced upon software developers. Since products must be delivered under such circumstances, overtime hours are inevitable.

This research collected a number of core measures that reflect the conditions outlined above. These include size (FP/SLOC), effort (person-months), and schedule (calendar months) measured from the Korean projects. Project team members collected the measures. The source of the data is an enterprise-level information system that stores and maintains project data. An independent auditor validated each data item. In addition, numerous productivity factors such as development technique, application complexity, and people skills were collected. These factors were measured by an independent metrics analyst and by surveying those who participated in the projects.

3.2. Application of Process Productivity


The equation below implies that, for a given size and productivity parameter, effort is inversely proportional to the fourth power of the duration. Also, the factor B increases with the application size, which indirectly hints that effort is also affected by the application size.

$$\text{Size} = \text{Productivity Parameter} \times (\text{Effort}/B)^{1/3} \times \text{Time}^{4/3}, \qquad 0.16 < B < 0.39$$

Putnam's model chose source lines of code as the size measure. This paper, however, uses function points instead. In our view, source lines of code have limited use because they are a by-product of engineering-oriented processes. In contrast, function points can be derived in all life-cycle processes, including analysis, design, control, and support. Hence, function points have a wider life-cycle scope in which they can be applied.


For this reason, this paper chose function points over source lines of code. Moreover, function points measure the quantity of functionality seen by the user, whereas source lines of code measure the volume of engineering and technical work. Function points provide a more logical view of work volume [4].

Figure 2 displays the correlation between the actual and estimated application size after applying the Korean version of Putnam's equation. The relationship between the size derived from the Putnam model and the actual size was analysed. Pearson's coefficient of correlation (R) was 0.77 (R^2 = 0.59), indicating a significant correlation between the two variables. The resulting P value was 0.00. Applying the same logic, the relationships between the estimated and actual values of effort and time were analysed. The correlation for project duration was highly significant: its Pearson coefficients (R and R^2) came out to be almost 1. The correlation for project effort was also meaningful: its Pearson coefficient (R) was 0.67 (R^2 = 0.45). The trend shown in the figure strongly hints that Putnam's equation may be valid in Korea.
[Figure: scatter plot of Actual Size (FP) against Estimated Size; R = 0.77, P value = 0.00]

Figure 2: The relation between Actual Size and Estimated Size

Rearranging the equation above to isolate the productivity parameter gives the following relationship.

$$\text{Productivity Parameter} = \frac{\text{Size}}{(\text{Effort}/B)^{1/3} \times \text{Time}^{4/3}}$$

Putnam's original model was based on project data from the US Department of Defence. Its PP values range from 754 to 3,524,578. The Korean PP values range from 192 to 8,207. The Korean data points show a smaller variation in PP. This, however, is not enough to conclude that productivity variation is smaller in Korea. Other adjusting productivity factors such as application complexity, level of tools applied, and level of people skills must be considered together when analysing PP values [3]. In Korea, the mean and variance of PP fluctuate depending on development technology and business domain (Figure 3). Development technology is an overall technology level that incorporates language, design tool, analysis tool, architecture, middleware/server platform, and work process.
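The rearranged equation can be checked against the validation projects reported in Table 4. The sketch below is illustrative rather than the authors' tooling: the function name `productivity_parameter` is hypothetical, and it assumes effort in person-months and time in years, which are the units listed in Table 4 and the ones under which the reported PP values are approximately reproduced.

```python
def productivity_parameter(size_fp, effort_pm, time_years, b):
    """PP = Size / ((Effort / B)**(1/3) * Time**(4/3))."""
    return size_fp / ((effort_pm / b) ** (1.0 / 3.0) * time_years ** (4.0 / 3.0))


# Rows copied from Table 4:
# (project id, size in FP, effort in person-months, time in years, B, reported PP)
TABLE_4 = [
    (1, 3712, 195, 0.69, 0.28, 687),
    (2, 877, 15, 0.48, 0.16, 516),
    (3, 1194, 52, 0.58, 0.16, 358),
    (4, 1668, 46, 0.65, 0.16, 448),
]

# Recompute PP for every project and keep it next to the reported value.
recomputed = {
    pid: productivity_parameter(size, effort, time, b)
    for pid, size, effort, time, b, _ in TABLE_4
}

for pid, _, _, _, _, reported in TABLE_4:
    print(f"project {pid}: recomputed PP = {recomputed[pid]:.0f}, reported PP = {reported}")
```

The recomputed values agree with the reported ones to within about one percent; the small differences are consistent with the durations in Table 4 being rounded to two decimal places.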


For example, the C/S 3 tier type usually utilizes middleware; the web standard type optionally uses a WAS server; and the web J2EE type must use a WAS server. According to the left graph of Figure 3, the mean PP of web J2EE projects is lower than that of web standard projects. This indicates that web J2EE projects are generally more difficult and complex than web standard projects. The PP values also vary by business domain. According to the right graph of Figure 3, the finance domain has a lower mean and a smaller variance. To summarize, the PP values change by application type in Putnam's model, whereas in the Korean model they change by development technology and business domain (Figure 3).
[Figure: two box plots of Productivity Parameter (PP); one panel by Project Technology Type (C/S 3 Tier, WEB Standard, WEB J2EE), the other by Application Domain (Finance, Manufacture, Public, Service)]

Figure 3: The box plot of Projects by Development Technology and Business Domain

After analysing the data, the derived PP values were mapped to matching PI values. Similar to Putnam's model, it was found that when the PI values are greater than or equal to 9, the corresponding PP values increase exponentially (Table 1).

Table 1: An index was assigned to each cluster of process productivity parameters

Productivity  Productivity  Project          Standard
Index         Parameter     Technology Type  Deviation
1             57
2             154
3             243
4             378
5             474           Web J2EE         +/- 2
6             608           Web Standard     +/- 5
7             654
8             706
9             784
10            1,129
11            1,279
12            2,675

After analysing the PI distribution by development technology, it was found that web J2EE projects are clustered around PI values of 3 to 7, with a mean of 5. For web standard projects, the PI values are spread across 1 to 11, with a median of 6. Projects in the C/S 3 Tier, C/S 2 Tier, and Host categories did not show any particular pattern (left graph of Figure 4). By business domain, the banking/finance domain had a lower mean and a smaller variance; its PI values lie between 4 and 6 (right graph of Figure 4).
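The mapping from a PP value to a PI in Table 1 can be sketched as an interval lookup. The source does not state how the intervals are bounded, so the code below assumes that each PP value in Table 1 is the lower bound of its interval; this assumption is labeled here because it is not confirmed by the paper, although it does reproduce the PI values reported for the four validation projects in Table 4. The names `PP_LOWER_BOUNDS` and `pp_to_pi` are hypothetical.

```python
from bisect import bisect_right

# PP values from Table 1, indexed so that position i holds the value for PI = i + 1.
# ASSUMPTION: each value is treated as the lower bound of its PI interval.
PP_LOWER_BOUNDS = [57, 154, 243, 378, 474, 608, 654, 706, 784, 1129, 1279, 2675]


def pp_to_pi(pp):
    """Return the PI whose interval contains the given PP value."""
    # bisect_right counts how many lower bounds are <= pp, which is the PI itself.
    pi = bisect_right(PP_LOWER_BOUNDS, pp)
    if pi == 0:
        raise ValueError(f"PP {pp} is below the lowest interval in Table 1")
    return pi


# PP values of the four validation projects in Table 4.
for pp in (687, 516, 358, 448):
    print(f"PP = {pp} -> PI = {pp_to_pi(pp)}")
```

A sorted list with binary search keeps the lookup short and makes the interval assumption explicit in one place, so a different bounding rule can be substituted without touching the rest of the code.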
[Figure: Productivity Parameter (PP) plotted against Productivity Index (PI); technology series: Web General, Web J2EE; Business Domain series: Finance, Manufacture, Public, Service]

Figure 4: The PP and PI by Development Technology and Business Domain

3.3. Application of Rayleigh Curve


Since project effort and duration are affected by environmental factors, Putnam decided that an empirical model is more suitable than a purely mathematical one for constructing an estimation model. Putnam based his empirical model on the Rayleigh curve. The two equations below estimate the cumulative and current manpower.

1. Cumulative Manpower Utilisation: $Y = K\,[1 - e^{-a t^2}]$, where $K = 1$, $a = 0.02$
2. Current Manpower Utilisation: $Y' = 2 K a t\, e^{-a t^2}$, where $K = 1$, $a = 0.02$

The pattern of the Rayleigh curve derived from the Korean data does not differ much from that of Putnam's model. As in an exponential function, the distributions of current and cumulative manpower become inversely proportional to the project progress. The only differentiating pattern was that, compared to Putnam's, the Korean version expends more manpower on coding activities than on analysis and design. Using the collected data, a non-linear regression analysis was performed, and a Korean version of the cumulative manpower curve was derived, as in the equation below. As shown in Table 2, the parameters of the equation, K and a, were calibrated to reflect the Korean environment. The derived curve was indeed very similar to Putnam's.

Cumulative Effort: $Y = K\,[1 - e^{-a t^2}]$, where $K = 94$, $a = 0.05$

Table 2: Non-linear Regression Summary Statistics

                                 Asymptotic 95 % Confidence Interval
Parameter  Estimate  Std. Error  Lower     Upper
K          94.35     2.19        90.01     98.70
a          .0447     .0031       .0385     .0511
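The two Rayleigh equations can be written down directly as functions. This is a minimal sketch under stated assumptions: the function names are hypothetical, the default parameters are the Table 2 estimates (K = 94.35, a = 0.0447), and the unit of t is left open because the source does not state the time scale used in the regression.

```python
import math


def cumulative_manpower(t, k=94.35, a=0.0447):
    """Cumulative manpower utilisation Y = K * (1 - exp(-a * t**2))."""
    return k * (1.0 - math.exp(-a * t * t))


def current_manpower(t, k=94.35, a=0.0447):
    """Current manpower utilisation Y' = 2 * K * a * t * exp(-a * t**2).

    This is the derivative of cumulative_manpower with respect to t.
    """
    return 2.0 * k * a * t * math.exp(-a * t * t)


# Tabulate both curves on an arbitrary grid of t values (unit unspecified).
for t in range(0, 11, 2):
    print(f"t = {t:2d}  cumulative = {cumulative_manpower(t):6.2f}  "
          f"current = {current_manpower(t):6.2f}")
```

Passing `k=1, a=0.02` evaluates the same functions with the parameters quoted for Putnam's curve; because the time scales of the two fits are not given, the sketch does not compare the curves point by point.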

To determine the significance of the regression equation, its F statistic must be examined. The calculated F value was 2,371 and the significance value was 0.000. Moreover, the analysis yielded an R squared value of .84374, which is close to 1. This indicates that the derived regression equation is very significant.
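The F value and R squared can be re-derived from the sums of squares and degrees of freedom listed in Table 3. The following sketch only repeats that arithmetic; the variable names are hypothetical, and the inputs are the rounded figures printed in the table, so the results match the reported statistics only approximately.

```python
# Figures copied from Table 3 (rounded as printed in the paper).
regression_ss, regression_df = 635_455, 2
residual_ss, residual_df = 18_552, 138
corrected_total_ss = 118_733

# Mean squares are sums of squares divided by their degrees of freedom.
mean_square_regression = regression_ss / regression_df
mean_square_residual = residual_ss / residual_df

# F is the ratio of the two mean squares.
f_value = mean_square_regression / mean_square_residual

# R squared as defined under Table 3: 1 - Residual SS / Corrected SS.
r_squared = 1.0 - residual_ss / corrected_total_ss

print(f"mean square (regression) = {mean_square_regression:.1f}")
print(f"mean square (residual)   = {mean_square_residual:.1f}")
print(f"F value                  = {f_value:.0f}")
print(f"R squared                = {r_squared:.5f}")
```

With unrounded mean squares the F value comes out near 2,363 rather than the reported 2,371; the reported figure corresponds to dividing by the residual mean square rounded to 134.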

To check the model, a normality test was performed on the residuals between the predicted and actual values. The residual analysis on the derived regression equation revealed a P-value of 0.002. This indicates that the equation is indeed significant. The Rayleigh curve based on the Korean projects and Putnam's model showed a similar pattern.

Table 3: Nonlinear Regression Summary Statistics-ANOVA

Source             DF   Sum of Squares  Mean Square  F value
Regression         2    635,455         317,727      2,371
Residual           138  18,552          134
Uncorrected Total  140  654,008
(Corrected Total)  139  118,733

R squared = 1 - Residual SS / Corrected SS = .84374

In Korea, less effort is spent on analysis/design tasks and more effort is spent on coding/implementation tasks. This was reflected in the Korean version of the Rayleigh curve. The curve had its overshoot point much further down compared to Putnam's curve (Figure 5). Software development projects in Korea put less emphasis on requirements engineering and embark on coding even before a detailed design is ready. This is the main reason why the curve showed such a pattern. Putnam's model was based on sample projects that spent about 30 to 40% of total effort on testing activities such as integration testing, preliminary verification, and final verification. In contrast, the Korean projects do not separate testing activities from coding. It is well known that Korean projects generally spend less effort on testing [1]. This is common because project deadlines are usually very tight and requirements are not clear.
[Figure: two panels comparing Korea Rayleigh curves with Putnam's Rayleigh curves across the phases Define Spec., Analysis, Design, Code, and Implementation/Close; one panel shows Cumulative Manpower Utilization (Effort Percentile, 0% to 100%; P value 0.000), the other Current Manpower Utilization (Effort Percentile, 0% to 60%)]

Figure 5: Cumulative, Current Manpower Utilisation by Project Phase

4. Validation of Process Productivity Model

PI values are grouped by application type in Putnam's process productivity model. In Korea, the PI values were grouped by development technology and business domain. The PP values were also customized to fit the Korean project environment. To check the utility of the Korean model, a validation was performed. The resulting model showed limited performance in general but performed well when applied to web or C/S type projects.


Table 4 outlines the results of applying the test data to the Korean process productivity model. The test data came from 4 finished projects that were not included in the 58 projects used as the dataset; they were set apart intentionally from the outset for cross-validation. Project 1 yielded 687 for the PP and 7 for the PI. Although this PI is near the mean value, it actually belongs to the higher end because the project is categorized as a web J2EE project. The PP and PI values of the remaining three projects did not differ much from the values in Table 1, where the PI values were grouped by development technology.

Table 4: The validated data for the process productivity model

Project ID       1      2      3      4
Technology Type  Web J2EE, Web Standard, Web Standard
Size (FP)        3,712  877    1,194  1,668
Effort (MM)      195    15     52     46
Time (Year)      0.69   0.48   0.58   0.65
B                0.28   0.16   0.16   0.16
PP               687    516    358    448
PI               7      5      3      4

As for the phase estimates, the analysis phase of Project 1, for example, had an estimated cumulative manpower of 18% of the total effort (Y = 0.18) against an actual value of about 15%, a difference of approximately 3 percentage points. The remaining three projects showed gaps of +14, -3, and +1 percentage points between the estimates and the actuals.

Table 5: Information of Projects for validation of Rayleigh curve

Project ID  Phase     Time   Estimates  Actuals  Gap
1           Analysis  20 %   18 %       15 %     +3 %
2           Design    32 %   48 %       34 %     +14 %
3           Coding    65 %   82 %       85 %     -3 %
4           Analysis  23 %   25 %       24 %     +1 %
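The Gap column in Table 5 is the estimated cumulative manpower minus the actual value, in percentage points. The short sketch below repeats that subtraction for the four validation projects. The names `TABLE_5`, `gaps`, and `within_tolerance` are hypothetical, and the 5-point tolerance is an illustrative threshold that does not come from the paper.

```python
# Rows copied from Table 5: project id -> (phase, estimate %, actual %).
TABLE_5 = {
    1: ("Analysis", 18, 15),
    2: ("Design", 48, 34),
    3: ("Coding", 82, 85),
    4: ("Analysis", 25, 24),
}

# Gap = estimate - actual, in percentage points.
gaps = {pid: estimate - actual for pid, (_, estimate, actual) in TABLE_5.items()}

# ASSUMPTION: a 5-point tolerance, chosen here only to illustrate a pass/fail check.
TOLERANCE_POINTS = 5
within_tolerance = {pid: abs(gap) <= TOLERANCE_POINTS for pid, gap in gaps.items()}

for pid, (phase, _, _) in TABLE_5.items():
    print(f"project {pid} ({phase}): gap = {gaps[pid]:+d} points, "
          f"within tolerance = {within_tolerance[pid]}")
```

Keeping the tolerance as a named constant makes it easy to see how the verdict for the design-phase estimate of Project 2 depends on the threshold chosen.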

5. Conclusions
The strengths of the process productivity model are that it is applicable across the whole project lifecycle and that it comprehensively covers factors such as duration, quality, and application complexity. Thus, the model can not only estimate total project effort and duration but can also be applied to estimate other measures such as process type, schedule, manpower, effort, and quality level. The process productivity model can also be used as a useful tool for establishing project plans. This research hopes that the derived model can act as a catalyst in enhancing the objectivity of software estimation in Korea. The Korean IT industry is still immature in its estimating practices because it depends predominantly on expert judgment and analogy techniques. This research also hopes that the habit of creating detailed project plans from objective estimation results will prevail in Korea in the near future. The process productivity model can easily be used as a guide to assess productivity levels objectively, and it could function as a useful benchmarking guide. In addition, numerous hidden productivity factors can be uncovered in the effort to refine the model. These factors could provide useful insights that lead to productivity improvements.

This research lacked defect data, so a Korean version of the Rayleigh curve that can estimate software quality could not be constructed. More detailed project plans could be developed if such a Rayleigh curve were available. Future research should be carried out to develop a Rayleigh curve that can estimate software quality.


6. References
[1] Lawrence H. Putnam, Ware Myers, Measures for Excellence: Reliable Software on Time, within Budget, Yourdon Press, 1992
[2] T. Capers Jones, Estimating Software Costs, McGraw-Hill, 1998
[3] Walt Scacchi, Understanding Software Productivity, Software Engineering and Knowledge Engineering, Volume 4, 1995
[4] T. Capers Jones, Applied Software Measurement, McGraw-Hill, 1996
