On Fraud Detection Method For Narrative Annual Reports

Proceedings of The Fourth International Conference on Informatics & Applications, Takamatsu, Japan, 2015
ISBN: 978-1-941968-16-1 2015 SDIWC

Yuh-Jen Chen
Department of Accounting and Information Systems
National Kaohsiung First University of Science and Technology
Kaohsiung, Taiwan, ROC.
yjchen@nkfust.edu.tw
ABSTRACT
Annual reports present the activities of a listed company in terms of its operational performance, financial conditions, and social responsibilities. They also serve as a valuable reference for investors, creditors, and other end-users of accounting information. However, many annual reports exaggerate enterprise activities to attract investor capital and support from financial institutions, which diminishes their usefulness. Effectively detecting fraud in a company's annual report is thus a priority concern during an audit. This work therefore develops a novel fraud detection method for narrative annual reports, with the aim of reducing investment losses and investor- and creditor-related risks and supporting better investment decisions. A developmental procedure of fraud detection is designed for narrative annual reports, fraud detection-related techniques are then developed, and the proposed method is finally evaluated experimentally. The fraud detection-related techniques consist mainly of establishing a fraudulent feature term library and clustering fraudulent and non-fraudulent narrative annual reports. Establishing the fraudulent feature term library in turn involves data preprocessing, term-pair combination, and filtering of fraudulent feature terms.
KEYWORDS
Narrative annual report, fraud detection, decision support, support vector machine, queen genetic algorithm
1 INTRODUCTION
In addition to informing investors about the operational performance, risks, and growth potential of an enterprise, an annual report assures creditors and suppliers of the enterprise's debt repayment capability and facilitates governmental auditing of company revenues for tax purposes. An annual report also allows an enterprise to reduce information asymmetry with information end-users. However, many annual reports exaggerate enterprise activities to attract investor capital and support from financial institutions, thereby diminishing the usefulness of these reports. Effectively detecting fraud in a company's annual report is thus a priority concern during an audit.
Various fraud detection methods for financial reporting and financial statements have been developed recently. For example, Kirkos et al. (2007) [1] explored the effectiveness of data mining (DM) classification schemes in detecting firms that issue fraudulent financial statements (FFS) and identified FFS-related factors. Huang et al. (2008) [2] developed a fraud detection mechanism based on Zipf's Law to assist auditors in reviewing overwhelming volumes of data and identifying potentially fraudulent records. Debreceny and Gray (2010) [3] examined emerging data mining applications for identifying fraud in journal entries. Glancy and Yadav (2011) [4] proposed a computational fraud detection model (CFDM) that uses a textual data approach to detect fraud in financial reporting. Pai et al. (2011) [5] devised a support vector machine-based fraud warning (SVMFW) model to reduce the risks posed by inexperienced auditors who lack expertise yet are the last line of defense in detecting FFS. Ravisankar et al. (2011) [6] identified companies that resort to financial statement fraud by using data mining methods such as the multilayer feed forward neural network (MLFF), support vector machines (SVM), genetic programming (GP), group method of data handling (GMDH), logistic regression (LR), and probabilistic neural network (PNN). Humpherys et al. (2011) [7] identified fraudulent financial statements based on linguistic credibility analysis. Gupta and Gill (2012) [8] proposed a data mining framework to prevent and detect financial statement fraud.
By using a genetic algorithm (GA) and MARLEDA, a modern estimation of distribution algorithm (EDA), to evolve and train several fuzzy rule-based classifiers (FRBCs), Alden et al. (2012) [9] detected patterns of financial statement fraud. However, the above studies focus mainly on detecting fraud in numerical financial statements rather than in narrative annual reports (e.g., reports to shareholders). Consequently, the reliability and quality of narrative annual reports, which affect the decisions of their users, cannot be assured.
Therefore, this work develops a novel fraud detection method for narrative annual reports to effectively detect fraud in a company's narrative annual report, in order to reduce investment losses and investor- and creditor-related risks and to support better investment decisions. A developmental procedure of fraud detection is designed for narrative annual reports. Fraud detection-related techniques are then developed, followed by an experiment and evaluation of the proposed method. These techniques consist mainly of establishing a fraudulent feature term library and clustering fraudulent and non-fraudulent narrative annual reports, where establishing the fraudulent feature term library involves data preprocessing, term-pair combination, and filtering of fraudulent feature terms.
The rest of this paper is organized as follows. Section 2 describes the developmental procedure for detecting fraud in narrative annual reports. Section 3 then describes the techniques related to this procedure. Next, Section 4 demonstrates the effectiveness of the proposed method. Conclusions are finally drawn in Section 5.
2 DEVELOPMENTAL PROCEDURE for DETECTING FRAUD in NARRATIVE ANNUAL REPORTS
This section describes the developmental procedure for detecting fraud in narrative annual reports, which consists of establishing a fraudulent feature term library and clustering narrative annual reports (Fig. 1). Establishing the fraudulent feature term library involves data preprocessing, term-pair combination, and filtering of fraudulent feature terms, while clustering of narrative annual reports identifies the fraudulent ones.
(1) Establishment of a Fraudulent Feature Term Library
Data preprocessing: The term sets of non-fraudulent and fraudulent narrative annual reports are extracted by using the CKIP system of the Chinese Knowledge Information Processing Group [10], which performs word and sentence breaking, part-of-speech tagging, stop-term filtering, and punctuation removal (excluding commas and full stops) on these reports.
Term-pair combination: Professional terms in finance and accounting may be broken up during word and sentence breaking of the non-fraudulent and fraudulent narrative annual reports, in which case accurate financial and accounting terms cannot be extracted. These segmented terms must therefore be recombined through term-pair combination to ensure the accuracy of professional terms.
Filtering of fraudulent feature terms: Based on the established non-fraudulent term set, fraudulent feature terms are filtered by using term frequency-inverse document frequency (TF-IDF) [11][12] to establish a fraudulent feature term library for detecting fraud in narrative annual reports.
(2) Clustering of Narrative Annual Reports
Based on the established fraudulent feature term library, fraudulent and non-fraudulent narrative annual reports are identified through QGA-SVM clustering. The fraudulent and non-fraudulent narrative annual reports that serve as the training dataset are manually confirmed against records of securities crime sentences, tunneling of company assets and misappropriation, or bounced checks of the chairman of the board [13][14].
3 FRAUD DETECTION TECHNIQUES for NARRATIVE ANNUAL REPORTS
Based on the developmental procedure designed in Section 2 for detecting fraud in narrative annual reports, this section introduces the techniques involved in that procedure, namely data preprocessing, term-pair combination, filtering of fraudulent feature terms, and narrative annual report clustering, each of which is discussed in the following subsections.
3.1 Data Preprocessing
In preprocessing narrative annual reports, the CKIP system [10] of the Chinese Knowledge Information Processing Group is applied to segment sentences into meaningful terms, tag the part of speech of each term, filter stop-terms (e.g., articles and prepositions), and remove punctuation. Figure 2 depicts the algorithm.
Figure 1. Developmental Procedure for Detecting Fraud in Narrative Annual Reports (panels: Establishment of Fraudulent Feature Term Library; Clustering of Narrative Annual Reports)
Figure 2. Algorithm for Preprocessing of Narrative Annual Reports (flowchart: sentence breaking and POS tagging of terms Wj by the CKIP System; stop-word (STi) filtering against a stop-words list; punctuation (Pm) filtering against a punctuations list; output: terms from fraudulent and non-fraudulent annual reports)
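The filtering half of Fig. 2 can be sketched in Python as below. This is a minimal illustration, not the paper's implementation: it assumes word breaking and part-of-speech tagging have already been done (the paper uses the CKIP system for this, and no CKIP interface is shown here), takes (word, POS) pairs as input, and uses small hypothetical stop-word and punctuation lists. The function name `preprocess` and the tag strings are likewise illustrative.

```python
# Sketch of the stop-word and punctuation filtering steps of Fig. 2.
# Assumptions: word breaking and POS tagging are already done (the paper uses
# the CKIP system for that); the lists below are hypothetical examples only.
STOP_WORDS = {"的", "了", "及", "在"}
PUNCTUATION = {"，", "。", "、", "；", "：", "！", "？", ",", "."}
KEPT_PUNCTUATION = {"，", "。", ",", "."}  # commas and full stops are retained


def preprocess(tagged_terms):
    """Return the (word, pos) pairs that survive stop-word and punctuation filtering."""
    kept = []
    for word, pos in tagged_terms:
        if word in STOP_WORDS:
            continue  # stop-word (ST_i) filtering
        if word in PUNCTUATION and word not in KEPT_PUNCTUATION:
            continue  # punctuation (P_m) filtering, except commas and full stops
        kept.append((word, pos))
    return kept


if __name__ == "__main__":
    sample = [("公司", "N"), ("的", "DE"), ("營收", "N"), ("、", "PUNCT"),
              ("成長", "V"), ("。", "PUNCT")]
    print(preprocess(sample))
    # [('公司', 'N'), ('營收', 'N'), ('成長', 'V'), ('。', 'PUNCT')]
```

Keeping the POS tag alongside each surviving word leaves it available to later steps, such as term-pair combination, without re-tagging.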
3.2 Term-Pair Combination

During word segmentation in data preprocessing, professional terms in finance and accounting may be accidentally broken up, yielding incorrect professional terms. Therefore, this work designs a term-pair combination algorithm that restores the broken-up professional terms to facilitate the filtering of financial and accounting keywords (Fig. 3).
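Figure 3. Algorithm for Term-Pair Combination (flowchart: first-term (FTi) matching of Wj against the finance and accounting corpus; second-term (STn) matching of Wj+1; term combination CTp = Wj + Wj+1; repeated until all terms Wj are matched)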
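A minimal Python sketch of the term-pair combination in Fig. 3 follows. It assumes, hypothetically, that the finance and accounting corpus can be represented as a set of (first term, second term) pairs, for example 應收 + 帳款 (accounts receivable). The paper does not specify the corpus format, and the corpus entries and the function name are illustrative only.

```python
# Sketch of the term-pair combination in Fig. 3.
# Assumption: the finance and accounting corpus is modeled as a set of
# (first_term, second_term) pairs; the entries below are hypothetical examples.
FINANCE_CORPUS_PAIRS = {("應收", "帳款"), ("現金", "流量")}


def combine_term_pairs(terms, corpus_pairs=FINANCE_CORPUS_PAIRS):
    """Merge adjacent terms W_j, W_{j+1} when (W_j, W_{j+1}) matches a corpus pair."""
    combined = []
    j = 0
    while j < len(terms):
        # First-term match (W_j = FT_i) and second-term match (W_{j+1} = ST_n)
        if j + 1 < len(terms) and (terms[j], terms[j + 1]) in corpus_pairs:
            combined.append(terms[j] + terms[j + 1])  # CT_p = W_j + W_{j+1}
            j += 2  # both halves are consumed
        else:
            combined.append(terms[j])
            j += 1
    return combined


if __name__ == "__main__":
    print(combine_term_pairs(["應收", "帳款", "增加", "現金", "流量", "穩定"]))
    # ['應收帳款', '增加', '現金流量', '穩定']
```

Scanning left to right and consuming both halves of a match prevents a term from being merged twice.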
3.3 Filtering of Fraudulent Feature Terms

To filter fraudulent feature terms, term frequency-inverse document frequency (TF-IDF) [11][12] is computed for the fraudulent and non-fraudulent terms acquired from fraudulent and non-fraudulent narrative annual reports, to judge the importance of each term in each document. Each fraudulent term is then matched against the non-fraudulent term library so that non-fraudulent terms are removed from the fraudulent terms. Finally, based on information gain [15][16], fraudulent feature terms highly correlated with fraudulent narrative annual reports are selected to establish the fraudulent feature term library. Figure 4 illustrates the algorithm for filtering fraudulent feature terms; the equations for TF-IDF and information gain are given as Eqs. (1) and (2), respectively.
$$\mathrm{TFIDF}_{i,j} = \mathrm{TF}_{i,j} \times \mathrm{IDF}_i, \qquad \mathrm{TF}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}, \qquad \mathrm{IDF}_i = \log \frac{N}{df_i} \tag{1}$$
where $\mathrm{TF}_{i,j}$ is the frequency of term $i$ in fraudulent/non-fraudulent document $j$, $\mathrm{IDF}_i$ is the inverse document frequency of term $i$ across the fraudulent/non-fraudulent documents, $n_{i,j}$ is the number of occurrences of term $i$ in document $j$, $\sum_k n_{k,j}$ is the total number of term occurrences in document $j$, $N$ is the total number of fraudulent/non-fraudulent documents, and $df_i$ is the number of fraudulent/non-fraudulent documents containing term $i$.
Figure 4. Algorithm for Filtering Fraudulent Feature Terms (flowchart: TF-IDF term frequency calculation for terms from fraudulent and non-fraudulent annual reports; term matching against the non-fraudulent term library; term selection by information gain; output: fraudulent feature term library)
$$IG(C \mid E) = H(C) - H(C \mid E) \tag{2}$$
$$H(C) = -\sum_{i=1}^{|C|} p(c_i)\log_2 p(c_i), \qquad H(C \mid E) = \sum_{j=1}^{|E|} p(e_j)\left[-\sum_{i=1}^{|C|} p(c_i \mid e_j)\log_2 p(c_i \mid e_j)\right]$$
where $IG(C \mid E)$ denotes the information gain of fraudulent/non-fraudulent term $E$ with respect to the class $C$ (fraudulent or non-fraudulent), $H(C)$ denotes the entropy of class $C$, $H(C \mid E)$ denotes the conditional entropy of class $C$ given term $E$, $p(c_i)$ denotes the probability of class $c_i$, $p(e_j)$ denotes the probability of outcome $e_j$ of term $E$, and $p(c_i \mid e_j)$ denotes the probability of class $c_i$ conditional on outcome $e_j$ of term $E$.
3.4 Narrative Annual Report Clustering
This step aims to accurately detect fraud in narrative annual reports so that the results can serve as a valuable reference for investors, creditors, and other end-users of accounting information when making decisions. The fraudulent feature term set obtained in Section 3.3 is first scored with the weighted method of Eq. (3), and the weighted scores are used as the variable values for establishing the dataset. The dataset is then divided into a training dataset and a testing dataset for training and testing the fraud detection model for narrative annual reports.
$$\mathrm{Score}_m = \frac{\mathrm{TFIDF}_{i,m}}{n_m} \tag{3}$$
where $\mathrm{Score}_m$ represents the weighted score of fraudulent feature term $i$ in the $m$-th article, $n_m$ represents the total number of words in the $m$-th article, and $\mathrm{TFIDF}_{i,m}$ represents the product of the term frequency and inverse document frequency of fraudulent feature term $i$ appearing in the $m$-th article.
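Equations (1) to (3) chain together into the feature pipeline of Sections 3.3 and 3.4. The Python sketch below illustrates that chain under several assumptions that the paper does not state:

- documents arrive as lists of already preprocessed terms;
- label 1 marks a fraudulent report and 0 a non-fraudulent one;
- the term $E$ in Eq. (2) is treated as binary (present or absent in a document);
- IDF uses the natural logarithm;
- Eq. (3) is read per feature term.

The function names and the top-$k$ cut-off are hypothetical, since the paper does not say how many feature terms are kept.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Eq. (1) for every term of every document (natural log assumed for IDF)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # df_i
    weights = []
    for doc in docs:
        counts = Counter(doc)            # n_{i,j}
        total = sum(counts.values())     # sum_k n_{k,j}
        weights.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(term, docs, labels):
    """Eq. (2), taking E as the presence/absence of `term` in a document."""
    h_c_given_e = 0.0
    for present in (True, False):
        subset = [y for doc, y in zip(docs, labels) if (term in doc) == present]
        if subset:
            h_c_given_e += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - h_c_given_e


def select_feature_terms(docs, labels, non_fraud_library, k):
    """Drop terms found in the non-fraudulent library, then keep the top-k by IG."""
    fraud_terms = {t for doc, y in zip(docs, labels) if y == 1 for t in doc}
    candidates = fraud_terms - non_fraud_library  # term matching step
    ranked = sorted(candidates, key=lambda t: (-information_gain(t, docs, labels), t))
    return ranked[:k]


def weighted_scores(docs, feature_terms):
    """Eq. (3) read per term: TFIDF_{i,m} / n_m for each feature term i, article m."""
    weights = tf_idf(docs)
    return [[weights[m].get(t, 0.0) / len(doc) for t in feature_terms]
            for m, doc in enumerate(docs)]
```

For example, with `docs = [["a", "b", "x"], ["a", "x"], ["a", "b"], ["a", "c"]]`, labels `[1, 1, 0, 0]`, and a non-fraudulent library `{"a"}`, `select_feature_terms(docs, labels, {"a"}, 1)` returns `["x"]`. The term "x" separates the two classes perfectly (an information gain of 1 bit), whereas "b" carries no information. Each row of `weighted_scores` then becomes one sample's feature vector for the classifier.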
Based on the training and testing datasets established after the weighted calculation, the SVM parameters are tuned and optimized by integrating the support vector machine (SVM) [17][18] with the queen genetic algorithm (QGA) [19][20] to accurately detect fraud in narrative annual reports. Figure 5 presents the algorithm, and Eqs. (4), (5), (6), and (7) show the related calculations.
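The queen-cohort parameter search of Fig. 5 and Eqs. (4) and (5) can be sketched as a small genetic algorithm. This is a minimal illustration, not the paper's implementation. The defaults for population size (20), number of generations (200), and fitness threshold (0.9) follow Table 2. The queen cohort size, mutation rate, uniform crossover, and mutation by redrawing a gene are assumptions. The `objective` function is caller-supplied: in QGA-SVM it would be, for example, the validation accuracy of an SVM trained with the chromosome's (c, g).

```python
import random


def queen_ga(objective, bounds, pop_size=20, generations=200, threshold=0.9,
             queen_size=4, mutation_rate=0.2, seed=0):
    """Maximize `objective` over real-valued chromosomes with a queen-cohort GA.

    objective: maps a chromosome (list of floats, e.g. [c, g]) to a fitness score.
    bounds: (low, high) range for each gene.
    queen_size, mutation_rate, and the operators below are assumptions.
    """
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    best = max(population, key=objective)
    for _ in range(generations):
        if objective(best) > threshold:
            break  # stop once the fitness exceeds the threshold (Fig. 5)
        # Rank the population by fitness; the top chromosomes form the queen cohort.
        queens = sorted(population, key=objective, reverse=True)[:queen_size]
        offspring = []
        for _ in range(pop_size):
            q = rng.choice(queens)       # q_i: random chromosome from the queen cohort
            d = rng.choice(population)   # d_i: random chromosome from the whole population
            # Uniform crossover (q_i ⊗ d_i): each gene comes from either parent.
            child = [qg if rng.random() < 0.5 else dg for qg, dg in zip(q, d)]
            # Mutation M(.): occasionally redraw a gene within its bounds.
            child = [rng.uniform(lo, hi) if rng.random() < mutation_rate else g
                     for g, (lo, hi) in zip(child, bounds)]
            offspring.append(child)
        population = offspring  # D_{m+1}
        best = max(population + [best], key=objective)  # keep the best seen so far
    return best, objective(best)


if __name__ == "__main__":
    # Hypothetical toy objective peaking at c = 5, g = 12 (not an SVM accuracy).
    toy = lambda p: -((p[0] - 5) ** 2 + (p[1] - 12) ** 2)
    print(queen_ga(toy, [(0.0, 100.0), (0.0, 100.0)]))
```

Because the best chromosome seen so far is always carried forward, the returned fitness never decreases as more generations are allowed.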
$$D_{m+1} = M(q_i \otimes d_i) \tag{4}$$
$$F(d_i) = \mathrm{rank}(D_{m+1}) \tag{5}$$
where $F(d_i)$ denotes the fitness value; $D_{m+1}$ denotes the offspring generation produced by crossover ($\otimes$) and mutation ($M$), which is ranked according to the primal objective function; $q_i$ denotes a chromosome randomly selected from the queen cohort $Q_m = \{q_1, q_2, \ldots, q_n\}$, i.e., the optimal sequence; and $d_i$ denotes a chromosome randomly selected from the whole population $D_m = \{d_1, d_2, \ldots, d_n\}$.
$$f(x) = \mathrm{sign}\left(\sum_{i=1}^{\ell} \alpha_i y_i K(x, x_i) + b\right) \tag{6}$$
$$K(x, x_i) = \exp\left(-\frac{\lVert x - x_i \rVert^2}{2\sigma^2}\right) \tag{7}$$
where $f(x)$ represents the optimal decision function, $\ell$ is the number of training samples, $\alpha_i$ represents the Lagrange multiplier, $y_i$ represents the class index of the $i$-th training sample, $b$ represents the offset value, $K(x, x_i)$ represents the radial basis function (RBF) kernel, and $\sigma$ represents the parameter of the RBF.
After many iterations, a set of SVMs is acquired, and weighted voting over these SVMs according to their weights generates the QGA-SVM model. Equation (8) shows the formula for weighted voting.
$$H(x) = \arg\max_{y} \sum_{t} \left(\ln \frac{1}{\beta_t}\right) h_t(x, y) \tag{8}$$
where $H(x)$ denotes the class index output by QGA-SVM, $h_t(x, y)$ denotes the class index output by the $t$-th SVM, and $\beta_t$ denotes the weight of the $t$-th SVM.
Finally, the testing dataset is input into the QGA-SVM clustering model to determine the narrative annual report clustering results (i.e., fraud or non-fraud).
Figure 5. Algorithm for Clustering Narrative Annual Reports (flowchart. Queen Genetic Algorithm: problem initialization and chromosome generation Dm = {d1, ..., dn}; fitness calculation F(di); selection of queen cohort Qm = {q1, ..., qn}; random selection of qi from the queen cohort and di from the whole population; crossover and mutation; repeated until fitness > threshold, yielding the optimal parameters. Support Vector Machine Algorithm: sample normalization of the training and testing datasets; selection of parameters c and g; RBF kernel selection; training of the optimal decision function f(x); output: fraudulent and non-fraudulent narrative annual reports)
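Equations (6) to (8) can be evaluated with a short Python sketch once an SVM solver has produced the support vectors, Lagrange multipliers, and offset. The sketch covers only that evaluation step; it does not train an SVM or compute the weights $\beta_t$, which the paper does not specify. The following points are assumptions:

- class indices are $\pm 1$;
- $\mathrm{sign}(0)$ is mapped to $+1$;
- each $\beta_t$ lies in $(0, 1)$.

Eq. (8) has the same form as the final weighted vote in AdaBoost.M1. Under the LIBSVM convention, the parameter g corresponds to $1/(2\sigma^2)$ in Eq. (7).

```python
import math
from collections import defaultdict


def rbf_kernel(x, xi, sigma):
    """Eq. (7): K(x, x_i) = exp(-||x - x_i||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decision_function(x, support_vectors, alphas, labels, b, sigma):
    """Eq. (6): sign(sum_i alpha_i y_i K(x, x_i) + b); labels are -1/+1, sign(0) -> +1."""
    s = sum(a * y * rbf_kernel(x, sv, sigma)
            for sv, a, y in zip(support_vectors, alphas, labels)) + b
    return 1 if s >= 0 else -1


def weighted_vote(predictions, betas):
    """Eq. (8): argmax_y sum_t ln(1/beta_t) * h_t(x, y).

    predictions[t] is the class output by the t-th SVM, so h_t(x, y) is 1 when
    predictions[t] == y and 0 otherwise; a smaller beta_t gives a larger vote.
    """
    votes = defaultdict(float)
    for label, beta in zip(predictions, betas):
        votes[label] += math.log(1.0 / beta)
    return max(votes, key=votes.get)


if __name__ == "__main__":
    x = [0.1, 0.0]
    print(decision_function(x, [[0.0, 0.0], [3.0, 3.0]], [1.0, 1.0], [1, -1], 0.0, 1.0))
    print(weighted_vote([1, -1, -1], [0.1, 0.5, 0.5]))
```

In the example, one confident SVM ($\beta = 0.1$, vote weight $\ln 10 \approx 2.30$) outvotes two weaker ones ($2\ln 2 \approx 1.39$).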
4 EXPERIMENT and EVALUATION of the PROPOSED FRAUD DETECTION METHOD for NARRATIVE ANNUAL REPORTS
This section describes the implementation of the fraud detection method for narrative annual reports using Visual Studio C# 2010 and MATLAB R2010b. An experiment with the proposed method is then conducted on the reports to shareholders of listed companies in Taiwan. Finally, the detection accuracy is compared with that of other fraud detection methods to demonstrate the effectiveness of the proposed method.
4.1 Experiment of the Proposed Method
This section describes the experiment with the proposed fraud detection method for narrative annual reports, based on the reports to shareholders of listed companies in Taiwan. The detailed steps are as follows.
(1) Collect fraudulent/non-fraudulent narrative annual reports
Based on securities-related criminal cases in the Law and Regulations Retrieving System of the Judicial Yuan of the Republic of China, as well as major information published on the Market Observation Post System, 31 listed companies in Taiwan cited for financial report fraud in 1995-2012 are selected as fraudulent companies. In addition, 14 companies in Taiwan whose crisis events in the basic database of the Taiwan Economic Journal are classified as misappropriation or bounced checks are also selected as fraudulent companies. For each fraudulent company, up to 3 companies in the same industry with total assets similar to those of the fraudulent company one or two years before the fraud occurred are selected as non-fraudulent companies, yielding 135 non-fraudulent companies in Taiwan.
For the selected 45 fraudulent companies and 135 non-fraudulent companies in Taiwan, their reports to shareholders are retrieved from the Market Observation Post System as the fraudulent and non-fraudulent narrative annual reports.
(2) Cluster reports to shareholders
The remaining data samples are divided into a training dataset and a testing dataset for clustering reports to shareholders, as shown in Table 1.
Table 1. Sample Division for Clustering Reports to Shareholders
| Sample | Fraudulent Reports to Shareholders | Non-Fraudulent Reports to Shareholders |
|---|---|---|
| Training Dataset | 15 | 45 |
| Testing Dataset | 10 | 30 |
The training dataset of reports to shareholders is input into the proposed clustering model, QGA-SVM (Fig. 5), and ten-fold cross validation is used to train and test the clustering model. Table 2 lists the relevant parameters of the clustering model and their settings, and Table 3 summarizes the results of clustering the fraudulent and non-fraudulent reports to shareholders.
Table 2. Parameter Settings for the QGA-SVM Clustering Model
| Parameter Name | Value Set |
|---|---|
| QGA Population | 20 |
| QGA Evolution | 200 |
| QGA Threshold | 0.9 |
| c and g of SVM | Based on the results of QGA |
Table 3. QGA-SVM Testing Results and Detection at a Significance Level of 0.01
Non-Fraudulent
Fraudulent
Reports to
Reports to
Shareholders
Shareholders
10
30
Testing Sample
Total
Correctly
Identified
Incorrectly
Identified
P-Value
Detected Upper
at
0.01 level Lower
25
1.53x10-09
6
1.93x10-66
29
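The ten-fold cross validation used in step (2) can be sketched as follows. The paper does not say how the folds are formed. Here they are assumed to be stratified by class, so that each fold keeps roughly the 1:3 fraudulent to non-fraudulent ratio of the Table 1 training set. The function names and the fixed random seed are hypothetical.

```python
import random


def stratified_k_folds(labels, k=10, seed=0):
    """Split sample indices into k folds, keeping class proportions roughly equal."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    offset = 0
    for y in sorted(by_class):
        indices = by_class[y]
        rng.shuffle(indices)
        for n, idx in enumerate(indices):
            folds[(n + offset) % k].append(idx)  # deal indices round-robin
        offset += len(indices)  # continue dealing where the previous class stopped
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = stratified_k_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    labels = [1] * 15 + [0] * 45  # training set sizes from Table 1
    for train, test in cross_validation_splits(labels):
        print(len(train), len(test))
```

With the 15 fraudulent and 45 non-fraudulent training reports of Table 1, every fold holds 6 samples (1 or 2 of them fraudulent), leaving 54 for training in each round.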
4.2 Evaluation of Clustering Accuracy

Based on the above samples, seven clustering models (i.e., Decision Tree, Bayes, PNN, Grid-SVM, PSO-SVM, GA-SVM, and QGA-SVM) are compared in terms of clustering accuracy. As Table 4 shows, the adopted clustering model, QGA-SVM, achieves the highest accuracy (85.0000%) among the compared models, ahead of PSO-SVM and GA-SVM (82.5000% each).
Table 4. Clustering Accuracy Comparison
| Clustering Model | c | g | Elapsed Time | Accuracy |
|---|---|---|---|---|
| Decision Tree | --- | --- | -- | 73.3333% |
| Bayes | --- | --- | -- | 70.0000% |
| PNN | --- | --- | -- | 78.3333% |
| Grid-SVM | 0.1088 | 27.8576 | 2.355880 | 80.0000% |
| PSO-SVM | 0.1376 | 62.4124 | 7.950277 | 82.5000% |
| GA-SVM | 0.2503 | 11.8989 | 2.712585 | 82.5000% |
| QGA-SVM | 5.2058 | 12.6343 | 2.943193 | 85.0000% |
5 CONCLUSIONS
This work proposed a novel fraud detection scheme for narrative annual reports by designing a developmental procedure of fraud detection, developing fraud detection-related techniques, and evaluating the proposed fraud detection method. The fraud detection-related techniques comprise data preprocessing, term-pair combination, filtering of fraudulent feature terms, and clustering of fraudulent and non-fraudulent narrative annual reports. Finally, a mechanism for detecting fraud in narrative annual reports was implemented based on these techniques.
The results of this research facilitate fraud detection for narrative annual reports and help improve the accuracy of narrative annual reports, thereby reducing investment losses and investor- and creditor-related risks and supporting better investment decisions.
REFERENCES
[1] E. Kirkos, C. Spathis, and Y. Manolopoulos, Data
mining techniques for the detection of fraudulent
financial statements, Expert Systems with
Applications, Vol. 32, No. 4, 2007, pp. 995-1003.
[2] S. M. Huang, D. C. Yen, L. W. Yang, and J. S. Hua, An investigation of Zipf's law for fraud detection, Decision Support Systems, Vol. 46, No. 1, 2008, pp. 70-83.
[3] R. S. Debreceny and G. L. Gray, Data mining journal entries for fraud detection: an exploratory study, International Journal of Accounting Information Systems, Vol. 11, No. 3, 2010, pp. 157-181.
[4] F. H. Glancy and S. B. Yadav, A computational
model for financial reporting fraud detection,
Decision Support Systems, Vol. 50, No. 3, 2011, pp.
595-601.
[5] P. F. Pai, M. F. Hsu, and M. C. Wang, A support
vector machine-based model for detecting top
management fraud, Knowledge-Based Systems, Vol.
24, No. 2, 2011, pp. 314-321.
[6] P. Ravisankar, V. Ravi, G. R. Rao, and I. Bose,
Detection of financial statement fraud and feature
selection using data mining techniques, Decision
Support Systems, Vol. 50, No. 2, 2011, pp. 491-500.
[7] S. L. Humpherys, K. C. Moffitt, M. B. Burns, J. K. Burgoon, and W. F. Felix, Identification of fraudulent financial statements using linguistic credibility analysis, Decision Support Systems, Vol. 50, No. 3, 2011, pp. 585-594.
[8] R. Gupta and N. S. Gill, A data mining framework
for prevention and detection of financial statement
fraud, International Journal of Computer
Applications, Vol. 50, No. 8, 2012, pp. 7-14.
[9] M. E. Alden, D. M. Bryan, B. J. Lessley, and A.
Tripathy, Detection of Financial Statement Fraud
Using Evolutionary Algorithms, Journal of
Emerging Technologies in Accounting, Vol. 9, No. 1,
2012, pp. 71-94.
[10] http://ckipsvr.iis.sinica.edu.tw/, Chinese Knowledge
and Information Processing.
[11] K. Meijer, F. Frasincar, and F. Hogenboom, A
semantic approach for extracting domain
taxonomies from text, Decision Support Systems,
Vol. 62, 2014, pp. 78-93.
[12] G. Salton and C. Buckley, Term-weighting approaches in automatic text retrieval, Information Processing and Management, Vol. 24, 1988, pp. 513-523.
[13] http://jirs.judicial.gov.tw/FJUD/, Law and Regulations Retrieving System, The Judicial Yuan of The Republic of China.
[14] http://163.18.1.9/record=b1164640, Taiwan Economic Journal.
[15] J. R. Quinlan, Induction of decision trees, Machine
Learning, Vol. 1, No. 1, 1986, pp. 81-106.
[16] C. E. Shannon, A mathematical theory of
communication, The Bell System Technical Journal,
Vol. 27, No. 3, 1948, pp. 379-423.
[17] C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, Vol. 20, No. 3, 1995, pp. 273-297.
[18] X. Zhou, W. Jiang, Y. Tian, and Y. Shi, Kernel
subclass convex hull sample selection method for
SVM on face recognition, Neurocomputing, Vol. 73,
No. 10-12, 2010, pp. 2234-2246.
[19] H. Stern, Y. Chassidim and M. Zofi, Multiagent
visual area coverage using a new genetic algorithm
selection scheme, European Journal of Operational
Research, Vol. 175, No. 3, 2006, pp. 1890-1907.
[20] E. Tsang, P. Yung, and J. Li, EDDIE-Automation, a
decision support tool for financial forecasting,
Decision Support Systems, Vol. 37, No. 4, 2004, pp.
559-565.