applied in an attempt to better predict the website's comprehensibility level. Finally, we validate our analysis based on a larger data set containing 800 website data points.

3. Comprehensibility implications

Our main interest in comprehensibility is the extraction of the instructional value of the information content in the web pages and the related linked web pages. The instructional value is a measure of the knowledge contained within the page, including the degree of application, analysis, synthesis, inquiry, or any other narrative that provokes learning activity.

Comprehensibility is the degree to which a group of web pages provides direct access to the substance of the information in the hypertext space without distractions. A hypertext space is the web page being viewed plus all related linked pages that are necessary for the reader to understand the information within the related web pages. The reader should easily be able to determine where they are in the hypertext space relative to where they started. We conceive that Web comprehensibility depends on the following aspects of a website: Information Value, Information Credibility, Media Instructional Value, Affective Attention, Organization and Usability.

• Comprehensibility encompasses the ease of finding and understanding the concepts presented, assuming that the reading level of the text is equal to or less than the reading level of the evaluator. Information Value (IV) checks readability and the richness and completeness of the information in general. We compute 38 features, such as the number of words in the title, in the meta contents, and in the body text, and a number of readability indexes such as Fog-Gunning, SMOG, and Flesch-Kincaid. For example, the Fog-Gunning index is computed according to the formula: (words_per_sentence + percent_complex_words) * 0.4.

• Comprehensibility also includes the sense of credibility and trust and access to the source, author, and date of the information being presented. Information Credibility (IC) examines the knowledge of the information sources, the authority of the information, and the correctness of the information. Sixteen features are computed, including the counts of HTML syntax errors and warnings reported by HTML Tidy [12] and the number of images indicating advertisements. Whether copyright information and the date of last update are present is also examined.

• Media Instructional Value (MV) is used to evaluate whether the use of graphics, icons, animation, or audio enhances the clarity of the information and is necessary to communicate the concepts. Twenty-five features are extracted from the HTML pages regarding the graphical contents, such as the counts of images, animated artworks, and audio or video clips in a web page, and the maximum height and width of the images.

While the three measures above focus on the information contents of web pages, another three measures examine the construction of the websites.

• Comprehensibility is also enhanced by the ease of focusing attention on the most relevant information of a website. The consistency and uniformity of the presentation of the pages on the website should add to comprehensibility, while the arbitrary use of colors, fonts, backgrounds, and images distracts from the ease of reading the text. The Affective Attention (AA) rating is determined by evaluating format, appearance, and aesthetics. Thirty-five features regarding text formatting and page formatting are examined, such as the number of words that are bolded, italicized, or capitalized, and the presence of style sheets.

• Comprehensibility is evidenced by web pages that allow the reader to link through easily and selectively discover the relevant meaning. The use of short paragraphs, bullet points, tables, or other summary presentations that allow quick scanning of the information to find the central ideas should also add to comprehensibility. Organization Structure (SO) indicates the effectiveness of the navigation (the use of lists, tables, headings, and links) and the consistency of the website contents and layout design. Related features are the maximum crawling depth for a site, the number of hyperlinks in a page, the counts of page files of different types (e.g., PHP pages, ASP pages, and TXT documents), and computed variances of the page-level features.

• Usability (UA) is established using 15 features. These features look at the average download time for a page (indicating whether the web page loads quickly) and ease of use (by examining the use of forms, framesets, etc.). The usability and accessibility of a website contribute to its comprehensibility: if a site is not easy to use or cannot be adjusted for accessibility, then its comprehensibility is diminished.

We examined a large pool of quantitatively computable features intended to provide sufficient conceptual equivalence to the above heuristics used by human evaluators when rating the websites. In total, 191 features are computed to quantify the heuristics; because of the limited space of this paper, full descriptions of the 191 features are not presented here.
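The Fog-Gunning formula quoted above can be sketched in code. The sentence/word tokenization and the vowel-group syllable heuristic below are simplifying assumptions for illustration (the paper does not specify its extractor); counting a word with three or more syllables as "complex" follows the standard Gunning Fog definition.

```python
import re

def gunning_fog(text):
    """Approximate the Fog-Gunning readability index:
    0.4 * (words_per_sentence + percent_complex_words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0

    def syllables(word):
        # Crude heuristic: each maximal vowel group counts as one syllable.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    complex_words = [w for w in words if syllables(w) >= 3]
    words_per_sentence = len(words) / len(sentences)
    percent_complex = 100.0 * len(complex_words) / len(words)
    return 0.4 * (words_per_sentence + percent_complex)
```

On a toy input such as "The cat sat. The cat ran." (six one-syllable words over two sentences), the index reduces to 0.4 * 3, illustrating how short sentences of short words keep the score low.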
4. Data collection

We downloaded the most recent version of the websites in a pre-compiled list of 800 URLs relevant to Science, Technology, Engineering and Mathematics (STEM) topics. The downloading was restricted to the HTTP and HTTPS protocols. Instead of creating a complete copy of each website, we capped the download quota per site at 50 megabytes and set the maximum depth of the breadth-first search to 5. We did not save copies of multimedia files such as images, audio, and video, as we do not perform multimedia processing in this study. Among the 800 target sites, 69 websites failed to download due to problems such as inactive hyperlinks, so we successfully downloaded 731 of the websites listed on the given entry-page sheet. In total, around 0.7 million web pages were downloaded, averaging about 1000 pages per site.

Four professional librarians applied their judgment in the review and evaluation of these websites. The outcome serves as a good approximation to a gold standard because, in their day-to-day work, the librarian evaluators interface with the public to select a broad range of appropriate websites and pages for people interested in learning about a topic. The review process consists of accessing a website page, finding and reading the central concept, linking to related pages as necessary to understand the central concept, and evaluating the website's adequacy for learning or instructional purposes. The librarian evaluators rated 25 to 50 websites in a trial period to become familiar with the evaluation process and the criteria.

The librarians evaluated each website on each of the six criteria: Information Value, Information Credibility, Media Instructional Value, Affective Attention, Organization and Usability. Finally, an overall rating was given to each site indicating the comprehensibility of the hypertext space presented by the website. The ratings are scored on a 1-5 Likert scale, with one indicating the lowest score for each criterion and five the highest. Each librarian reviewed approximately four hundred websites in four months, with an average of 10 minutes allocated per website; they therefore sampled a relatively small set of pages instead of reviewing every page within a site. This review duration is sufficient to meet our objective and is typical of many real settings, where judgments regarding a website are often made very quickly by Web users. Each site was evaluated by at least two evaluators, and the ratings were averaged to produce the final scores of the site. Excluding the websites skipped by the evaluators for various reasons (e.g., page not found), 540 websites are included in our analysis.

5. Analytical modeling

Our major objective in this study is to automatically determine website comprehensibility by computing a set of website features. We accomplish this by modeling the relationship between website comprehensibility and a set of page-level and site-level features. We use regression analysis to construct mathematical models that best predict website comprehensibility. Regression analysis is the most widely used method to both describe and predict one variable (the dependent variable) as a function of a number of independent variables (explanatory or predictor variables) from observed or experimental data [13]. The general form of our problem is y = f(x_1, x_2, ..., x_n), modeling the comprehensibility y as a function of n computed feature variables x_1, x_2, ..., x_n at the page level or site level.

A feature vector is first constructed for each web page. The features of the pages within a website are then aggregated to produce a site-level feature vector according to the topological structure of the site. The topological structure is inferred from the linkage structure of the documents: when there is a hyperlink pointing from page i to page j, a directed edge between node i and node j is said to exist. The pages and the linkage between them thus comprise a directed graph for the website. We mimic the browsing behavior of a learner by starting from an entry page (the first page from which a learner starts to navigate the site) and then picking hyperlinks to jump to further pages. The probability of a learner visiting a particular page is approximated by a geometric function of the minimum number of hops to that page from the entry page. The minimum number of hops is computed from the constructed topological graph with a shortest-path algorithm. Denoting the minimum number of hops between the entry page of site i and page j as s_ij, we assume that the probability of browsing a page is α^(s_ij), where α takes a fractional value so that the probability lies within [0, 1]. The program thus computes the site-level feature vectors by aggregating the page-level features, weighting the j-th page by α^(s_ij). The site-level features x_i (i = 1, 2, ..., n) for a website are therefore a weighted summation of its page-level features x_ij over the pages j of the site, according to Equation (1) as shown below.
x_i = ( Σ_j α^(s_ij) · x_ij ) / ( Σ_j α^(s_ij) )        (1)

where the sums run over all pages j of the site. When α takes the value of zero, only the entry page (the page j with s_ij = 0) is included in the model, so x_i = x_ij, as the weighting factor becomes zero for all the remaining pages.

We model the relationship between web comprehensibility and the feature vectors by regressing on the data set evaluated by the human evaluators. The models are inferred with both a linear and a non-linear regression technique. The effects of different α values on the predictive power of the linear model are also discussed.

5.1. Linear regression modeling

The general form of a multiple linear regression model with n independent variables is y = β_0 + β_1·x_1 + β_2·x_2 + ... + β_n·x_n + ε, where β_0, β_1, ..., β_n are the regression coefficients to be estimated and ε is a random error term. Here y is the comprehensibility score of a website to be predicted, and x_1, x_2, ..., x_n are the n independent variables representing the n features computed for the corresponding website.

A base regression model consists of all 191 feature variables. There are 540 labeled data points available from observation. The multiple linear regression model has the rating values as the dependent variable and the 191 features as independent variables. Four outliers were removed by eliminating the data points whose standardized residuals fell outside the outlier cutoff (plus or minus 2.5). The regression produced a model with 191 predictors with an Adjusted R Square of 0.356 (α = 0.9).

A backward selection procedure was then used to search for the optimal feature subset. It begins with all predictor variables in the regression equation and then sequentially removes them against a specified criterion (the entry criterion is a significance of the F value <= .050, and the removal criterion is a significance of the F value >= .100). In our case, backward selection works better than the forward and stepwise selection methods because of the dependency between a few features; for example, the sizes of image files are approximated by the product of the height and width of the images. The resulting linear model contains 81 features, with an Adjusted R Square of 0.437 (α = 1.0).

Table 1. Backward linear regression results with varying alpha value

Alpha (α)   Adjusted R Square   Std. Error of Estimation
0.0         0.259               0.88547
0.1         0.272               0.88492
0.2         0.271               0.88220
0.3         0.293               0.88537
0.4         0.303               0.89559
0.5         0.318               0.81282
0.6         0.325               0.79087
0.7         0.343               0.75768
0.8         0.378               0.75281
0.9         0.419*              0.69883
1.0         0.437               0.64872
*ANOVA analysis (model fitness when α = 0.9): Sum of Squares = 233.098; Degrees of Freedom = 81; Mean Square = 2.878; F value = 5.746; Sig. = .000

In the above aggregation model, we take every page of a website into our analysis: all pages contribute to characterizing the website according to the weighting factor. However, parsing nearly 1000 pages for each website is computationally expensive, so only a subset of the pages can be used to lower the cost. First, we analyze the entry pages only, i.e., the page from which browsing of a site starts. When alpha goes to zero, the weight of every page other than the entry page goes to zero, so only the entry pages are considered in this case; the result is shown in the first row of Table 1 (Adjusted R Square = 0.259). Second, instead of parsing a single page, pages whose names indicate a backup entry page are also considered, with the aggregated features taken as the average over this set of pages. For example, http://abc/index.htm, http://abc/index1.asp, and http://abc/default.html are all parsed along with http://abc/index.html. On average, 5 pages per website are computed. Table 2 shows the linear regression statistics for this second case: we saved about 80% of the computing power, while the predictive power indicated by the Adjusted R Square is 0.302.
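The site-level aggregation of Equation (1), combined with the shortest-hop weighting described in Section 5, can be sketched as follows. The link-graph representation, function names, and the toy BFS hop computation are illustrative assumptions, not the paper's implementation.

```python
from collections import deque

def min_hops(links, entry):
    """BFS shortest hop counts s_j from the entry page to every page
    reachable over the directed hyperlink graph `links`."""
    hops = {entry: 0}
    queue = deque([entry])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, []):
            if nxt not in hops:
                hops[nxt] = hops[page] + 1
                queue.append(nxt)
    return hops

def site_features(page_features, links, entry, alpha):
    """Aggregate page-level feature vectors into a site-level vector
    per Equation (1): x_i = sum_j alpha**s_j * x_ij / sum_j alpha**s_j.
    Pages unreachable from the entry page are ignored."""
    hops = min_hops(links, entry)
    total = sum(alpha ** s for p, s in hops.items() if p in page_features)
    n = len(next(iter(page_features.values())))
    site = [0.0] * n
    for page, s in hops.items():
        if page not in page_features:
            continue
        w = alpha ** s / total
        for k, v in enumerate(page_features[page]):
            site[k] += w * v
    return site
```

Note that with alpha = 0, Python evaluates 0**0 as 1 and 0**s as 0 for s > 0, so only the entry page carries weight, reproducing the entry-page-only case discussed above; with alpha = 1, the aggregation reduces to a plain mean over the reachable pages.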
Table 2. A linear regression prediction with a subset of pages

R       R Square   Adjusted R Square   Std. Error of Estimation
0.655   0.430      0.302*              0.76880
*ANOVA analysis (model fitness): Sum of Squares = 117.088; Degrees of Freedom = 63; Mean Square = 1.985; F value = 3.358; Sig. = .000

The linear model shows how each feature contributes to the prediction of the website's comprehensibility. The effects are discussed by feature category, as an individual feature alone does not explain the variations of the dependent variable. By running linear regressions on the feature sets of the categories discussed in Section 3, we notice that: 1) the features in the MV category, mainly graphic elements and formatting features, are most closely correlated with our dependent variable; and 2) text-element features, such as the readability indexes and word counts, also play an important role in the evaluation. However, the numbers of features in the different categories are not even, i.e., one category may contain more features than another, so comparing the predictive power of the categories requires caution. The regression results are shown in Table 3.

Table 3. Linear regression prediction with features by categories (α = 0.9)

Feature Category   R       R Square   Adjusted R Square   Std. Error of Estimation
MV                 0.499   0.249      0.218               0.82070
IV                 0.486   0.236      0.197               0.83163
SO                 0.487   0.237      0.142               0.85970
AA                 0.391   0.153      0.111               0.87498
UA                 0.332   0.110      0.091               0.88471
IC                 0.320   0.102      0.075               0.89275

5.2. Support Vector Regression

Experiments employing support vector regression (SVR) from the open-source package LIBSVM [14] were also conducted. SVR, based on statistical learning theory, is a useful tool for nonlinear regression problems; any nonlinear relation that may exist between the comprehensibility score and the feature vectors will thus be captured by the SVR method.

Our input space contains 536 vectors in 81 dimensions (the data set with α = 0.9); the features eliminated by the backward selection procedure in the linear regression were not included. Each feature is linearly scaled to the range [0, 1] to prevent attributes in greater numeric ranges from dominating those in smaller numeric ranges, and to avoid numerical difficulties during the calculation. We employed epsilon-SVR with a Radial Basis Function (RBF) nonlinear kernel; a detailed technical discussion of epsilon-SVR can be found in [15]. Parameter selection (model selection) is essential for obtaining good SVR models. We conducted a grid search for the optimal parameters through the parameter space of the cost parameter (c), the epsilon of the loss function (p), and the gamma of the kernel function (g). The resulting parameters are used in the regression process and are shown under Table 4.

V-fold cross-validation is used to evaluate model fitness: v random partitions of the data set are chosen such that v-1 of the v portions are used for SVR training and the last portion is held back as a test set. Table 4 summarizes the SVR regression results with 10-fold cross-validation.

Table 4. Cross-validation SVR experiments (n = 536, k = 102)

Number of folds   R Square   Adjusted R Square
10                0.658      0.5774
Parameters: -c 16.0 -g 0.0078125 -p 0.00390625

Comparing the Adjusted R Square of the SVR with that of the linear regression at 0.419 on the same input space (α = 0.9), the nonlinear model shows clearly higher predictive power than the linear model, suggesting that some features contribute to the comprehensibility nonlinearly. However, due to the difficulty of deciphering the black-box solutions generated by SVR models, how the different web characteristics contribute to the comprehensibility of a website cannot be determined from them; the linear models can be analyzed instead to shed light on the relations between comprehensibility and web characteristics, as shown earlier.

6. Conclusion and Future Work

Self-directed learners seeking Web content that they can easily read and understand (i.e., content with some instructional value, within a comprehensibility level that satisfies their learning objective) are challenged to quickly evaluate the websites they find through typical search-engine results, and would therefore benefit from an automated evaluation of website comprehensibility. Our research uses an analytical approach to improving the information-retrieval process for self-directed learners by automatically evaluating website comprehensibility using the web-page characteristics shown to be most indicative of websites rated high on comprehensibility by professional librarians. We developed an artifact that quantitatively measures a large group of page-level and site-level features, and deduced analytical models from our search for a set of optimal metrics for evaluating website comprehensibility. The analytical models developed were rigorously evaluated to see how well their assessments of website comprehensibility compare with the evaluations made by the librarians. The predictive performance of both a linear model and a nonlinear model based on SVR is reported. The linear model is easier to interpret, while the SVR model is superior, with 16% higher predictive power. With about 60% of the variation in the comprehensibility of a website explained by the 81 measured Web characteristics in the SVR model, we see the developed artifact as an effective and reliable solution to the comprehensibility prediction problem.

The comprehensibility scoring is not deterministic, but it remains useful when applied appropriately. The scoring would highlight when a web page may be a challenge to access and comprehend. Re-sorting search results based on the comprehensibility scoring would benefit the learner by presenting the websites with the highest probable comprehensibility first. The comprehensibility scoring is useful for making a quick initial determination for volumes of web pages, or for a specific web page that a learner wishes to assess.

So far, in our preliminary study, we have only conducted experiments on regressing the overall rating; experiments will be conducted for each of the sub-category ratings in the near future. Future research will also explore the possibility of filtering out websites that address specific audiences where the content is not instructional. For example, evaluating an e-commerce site on comprehensibility may be of limited value, so identifying such categories of websites for removal from comprehensibility scoring may be necessary. Lastly, an interesting application of the comprehensibility scoring would be to include it within a focused crawler that finds the link path with the highest comprehensibility for a given topic.

References

[3] Fogg, B., et al. "What Makes Web Sites Credible? A Report on a Large Quantitative Study". SIGCHI'01. 2001. Seattle, WA, USA.

[4] NIST, Web Static Analyzer Tool (WebSAT). 2002.

[5] Brajnik, G. "Automatic web usability evaluation: Where is the limit?" Proceedings of the 6th Conference on Human Factors and the Web. 2000. Austin, TX.

[6] Ivory, M.Y. and M.A. Hearst, "State of the art in automating usability evaluation of user interfaces". ACM Computing Surveys, 2001. 33(4): p. 470-516.

[7] Ivory, M.Y. and M.A. Hearst, "Improving Web Site Design". IEEE Internet Computing, Special Issue on Usability and the World Wide Web, 2002. 6(2).

[8] DuBay, W.H., The Principles of Readability. 2004.

[9] Salmerón, L., J.J. Cañas, and W. Kintsch, "Reading Strategies and Hypertext Comprehension". Discourse Processes, 2005. 40(3): p. 171-191.

[10] Díaz, P., M.-Á. Sicilia, and I. Aedo. "Evaluation of Hypermedia Educational Systems: Criteria and Imperfect Measures". International Conference on Computers in Education (ICCE'02). 2002.

[11] Ma, J., Z. Zhang, and R. Garcia. "Automatically Determining Web Site Comprehensibility". The 16th Workshop on Information Technologies and Systems (WITS 2006). 2006. Milwaukee, WI, USA.

[12] Raggett, D., HTML Tidy for Linux/x86, released 1 September 2005. HTML Tidy Project Page: http://tidy.sourceforge.net/.

[13] Kleinbaum, D.G., L.L. Kupper, and K.E. Muller, Applied Regression Analysis and Other Multivariate Methods. 2nd ed. 1988: PWS-KENT Publishing Company, Boston.

[14] Chang, C.C. and C.J. Lin, LIBSVM: a library for support vector machines. 2001.

[15] Smola, A.J. and B. Schölkopf, "A tutorial on support vector regression". Statistics and Computing, 2004. 14: p. 199-222.