1516 t3 IS415 Project Report Team Aprunber

Automated and Customized Dengue Case Clustering Method
Cao Xinge, Du Xue, Wang Tiantong
School of Information Systems, Singapore Management University
Xinge.cao.2013@sis.smu.edu.sg, Xue.du.2013@sis.smu.edu.sg, tt.wang.2013@sis.smu.edu.sg
ABSTRACT
In this paper, we describe a customized clustering method built with QGIS (footnote 1), an open source Geographic Information System. QGIS can process many types of spatial data files and provides a modeler function that automates geoprocessing workflows. Dengue fever is a fast-spreading disease that has caught the attention of many countries. Currently, most countries form dengue clusters using statistical methods that are accurate but time consuming. Our model provides an automated process that generates intuitive clusters based on the shapes of infected buildings, together with an interactive user interface for viewing dengue case distribution reports. The solution is highly extensible, and we propose future work to make it more efficient and accurate.

Keywords
Dengue Case Distribution, Geospatial Modeler, Customized Clustering Method, Hotspot Detection, Interactive View of Dengue Clusters

1. INTRODUCTION
Dengue is the most rapidly spreading mosquito-borne viral disease in the world. The disease has a significant impact on health, the economy and society as a whole. In Singapore, the National Environment Agency (NEA) and the Ministry of Health (MOH) devote a large amount of resources to fighting this disease. NEA public health officers collect dengue case data on a daily basis to identify potential risks for residents around infected areas. In addition, public health officers collect sample mosquitoes to determine the breeding habits of specific species in order to prevent the disease from spreading further. A large amount of raw data is collected during this process. However, there is no efficient system that takes in daily data input, presents it in a more interactive way, and joins data from different sources.
Fighting dengue is one of the most pressing issues in Singapore, and it needs solutions that prevent further aggravation. By analyzing the distribution pattern of dengue cases, NEA and MOH could identify hotspots in a timely manner and implement preventive actions. We aim to add value to the authorities' decision-making process in determining outbreaks, planning fogging exercises, implementing outbreak control, and informing the public so that the disease can be prevented. Our system is required to be highly interactive and frequently updated. This poses difficulties to the project in handling input data from various sources and analyzing the correlation between different sources of data. The automated and customized dengue clustering method proposed in this paper provides a solution to efficiently generate, visualize, and report dengue case distribution patterns.

2. EXISTING RELATED WORKS
2.1 Moran's I in Dengue Case Distribution
Moran's I is a measure of the spatial autocorrelation of polygons. Values of I range from -1 to +1. Negative values indicate negative spatial autocorrelation and positive values indicate positive spatial autocorrelation. A zero value indicates a random spatial pattern. Many countries perform clustering with this statistic (footnote 2):
In Chachoengsao Province, Thailand, the Global Moran's I statistic was used to identify characteristics of the global spatial pattern. The I statistic was calculated to evaluate autocorrelation in the spatial distribution of dengue and to test how villages were clustered or dispersed in space with respect to their dengue fever morbidity rate (DFMR). With an infectious disease like dengue, the spatial patterns usually present a strong clustered autocorrelation due to the spatial relationships between cases and the propagation mechanism of the disease, which involves distance and neighborhood. The Local Moran's I value was used to examine the local level of spatial autocorrelation in order to identify villages where values of the DFMR were both extreme and geographically homogeneous. This identifies dengue hotspots, where the value of the index was extremely pronounced across localities, as well as spatial outliers [1].
In Hanoi, Vietnam, the spatial autocorrelation of the expected incidence rates of dengue fever in three different periods was assessed using the Moran's I statistic in ArcGIS 9.2. Spatial autocorrelation was considered significant if p < 0.05 [2]. Other places such as Dhaka, Bangladesh also perform dengue clustering using a similar method.
This method provides a reference for statistical analysis of the autocorrelation among areas based on the number of cases reported. It provides information on the diffusion of the disease but does not give an intuitive view of the clustering pattern on the map. In addition, this method can be adopted for high-level control of dengue cases when designing preventive strategies.
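To make the range and sign of Moran's I concrete, the following is a minimal sketch in plain Python of the standard global formula I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all weights. The function name morans_i, the binary adjacency weights and the two sample value lists are illustrative assumptions and are not taken from the studies cited above.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` given a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between every pair of areas.
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Hypothetical example: four areas in a row, each adjacent to its neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = morans_i([1, 2, 3, 4], chain)        # similar values sit together
alternating = morans_i([1, 0, 1, 0], chain)  # neighbours always differ
```

With these inputs the first result is positive and the second is negative, matching the interpretation of positive and negative spatial autocorrelation given above.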
1 http://qgis.org/en/site/
2 https://en.wikipedia.org/wiki/Moran%27s_I
2.2 Kernel Density Estimation (KDE)
In Kuala Lumpur, Malaysia, the spatial density of dengue cases within six zones was examined using spatial statistical tools. The KDE interpolation technique was used to analyze the hotspot localities [3]. KDE is regarded as an advanced technique for generalizing incident locations to the whole study area. It identifies high-risk areas within point patterns of disease incidence by producing a continuous, smooth surface that describes the level of risk for a particular area [4].
In Singapore, dengue clusters are also formed using KDE on a raster layer. The output layer is then converted into a vector layer, and the shape of each cluster is extracted based on an approximate KDE value. The map below shows an example of the cluster view of dengue distribution (footnote 3).
Figure 2.1 Building Reclassification
KDE was observed to be a better hotspot identifier than cluster analysis such as Moran's I, as it helps in calculating the density of point features around each output raster cell [5]. The hotspots extracted using KDE are more accurate and visualize the high density of dengue case occurrences on the map.
In our method, we aim to provide a more intuitive view of clusters based on the shapes of buildings. It highlights the clusters of buildings with dengue cases, so that responsibility for corrective action can be better delegated to the building blocks.

3. OUR METHODS
3.1 Data Preprocessing
3.1.1 Geocoding
Geocoding is an important and necessary step when the raw data has no X-Y coordinates. This is the situation in our project: the raw data of single cases contains only postal codes, without accurate X-Y coordinates. Therefore we need to use geocoding to generate the corresponding coordinates before the next step.
In our project we used the geocoding API provided in class. Inevitably, some invalid postal codes result in null values. We elaborate on how to handle such null values and prevent errors in the following sections. A better and more up-to-date geocoding API is encouraged for this step.
3.1.2 Data Formatting
In the simulated data provided by our professor, the format of the NOTIF_DATE attribute is mm/dd/yyyy. However, we later discovered that this format cannot be used during the case selection process, for reasons that are elaborated later. Therefore, before importing the data into QGIS, we reformatted the date as yyyy-mm-dd.

3.2 Clustering
3.2.1 Customized Clustering Method
As discussed in the related works section, most countries perform dengue hotspot clustering based on statistical methods such as Moran's I or Kernel Density. The clusters formed are based on statistical measures and may cut through buildings. To better manage the roles and responsibilities of implementing corrective and preventive actions, we propose a more intuitive view of clusters in the shape of building buffers.
Cases are reported with the person's residential address, which is located at the centroid of the residential building. We define a cluster by joining buildings that have dengue cases reported within 14 days and that lie within 150 meters. The number of cases is counted by cluster. The day on which a cluster is formed is used to label the cluster.
3.2.2 Clustering Rules and Process
1. Cases in one cluster
Daily reported dengue cases are combined with all the past cases. Cases within 14 days are selected.
2. Buildings Containing Cases
The building data set is obtained from the OSM extract for Singapore website (footnote 4). The data set has 48,040 entries; however, it contains very little useful information apart from the geometry of the buildings in Singapore. The following difficulties were encountered while preprocessing the data:
type Column: The types of the buildings are filled in inconsistently, which makes it hard to classify the data properly. Wikipedia's classifications of building types are used as a reference to standardize the building classification. The Field Calculator in QGIS can be used to convert useful type information into a proper class:
Figure 3.2.1 Building Reclassification
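The reclassification of the type column can be pictured as a lookup from free-text values to a small set of standard classes. The sketch below is only an illustration in Python: the raw values, the class names and the function reclassify are hypothetical and do not reproduce the classes actually used in the Field Calculator.

```python
# Hypothetical lookup from free-text OSM "type" values to a standard class.
# Both the raw values and the class names are illustrative assumptions.
TYPE_TO_CLASS = {
    "apartments": "Residential",
    "residential": "Residential",
    "house": "Residential",
    "school": "Educational",
    "university": "Educational",
    "retail": "Commercial",
    "office": "Commercial",
}


def reclassify(raw_type):
    """Return a standard class for a raw type value, or 'Unclassified'."""
    if raw_type is None:
        return "Unclassified"
    # Normalise case and surrounding whitespace before the lookup.
    return TYPE_TO_CLASS.get(raw_type.strip().lower(), "Unclassified")
```

Values that do not appear in the lookup fall back to a single catch-all class, which keeps inconsistently filled entries visible instead of silently dropping them.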
3 http://www.dengue.gov.sg/subject.asp?id=74
4 http://download.bbbike.org/osm/bbbike/Singapore/
Buildings outside Singapore: To efficiently select only those buildings located within the Singapore coastal outline, the building layer is intersected with the Singapore Coastal Outline data layer in QGIS using Vector > Geoprocessing Tools > Intersect. After selection, 44,792 buildings are kept.
Figure 3.2.2 Building Reclassification
After reclassifying the buildings, there are still too many buildings on the map, which makes it hard to visualize the building information needed to analyze the dengue case distribution. We are only interested in buildings that have reported dengue cases, so the buildings are further filtered to those that contain the latest dengue cases using the QGIS function Vector > Research Tools > Select by Location.
Figure 3.2.3 Select Buildings Contain Latest Dengue Cases
3. Building Buffers
The selected building polygons are buffered by 150 meters. If any building buffers touch each other, these buffers are joined together and form one cluster. We can use the dissolve buffer option when generating the building buffers.
Figure 3.2.4 Generate Dissolved Buffers
However, once the buffers are dissolved, there is only one record for the whole buffer. We want each joined buffer area to be an individual record so that it can hold individual cluster information. We therefore split the dissolved buffer record using the tool Multipart to Singleparts. Each cluster then has its own data entry with polygon geometry information.
4. Cluster Identifier
After splitting, all clusters are labeled with the same ID. We need to ensure that each cluster is uniquely identified for easy retrieval. We used the Field Calculator to add one column holding the default record ID, which is $id in QGIS.
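The buffer, dissolve, split and label steps above amount to grouping buildings whose 150 m buffers touch, directly or through a chain of other buffers, and then numbering each group. The following Python sketch shows that grouping logic with a union-find structure. It is not the QGIS implementation: as a simplifying assumption each building is reduced to a point in a metric coordinate system, so two circular buffers touch when the points are at most 300 m apart, and the function name cluster_buildings and the sample coordinates are hypothetical.

```python
from math import hypot

BUFFER_M = 150  # buffer radius used in the paper


def cluster_buildings(points, buffer_m=BUFFER_M):
    """Group points whose buffers touch; returns one cluster id per point.

    Simplifying assumption: each building is a point in a metric CRS,
    so two circular buffers touch when centres are <= 2 * buffer_m apart.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (xi, yi) in enumerate(points):
        for j in range(i + 1, len(points)):
            xj, yj = points[j]
            if hypot(xi - xj, yi - yj) <= 2 * buffer_m:
                parent[find(j)] = find(i)  # merge the two groups

    # Relabel roots as consecutive ids, playing the role of a unique cluster id.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


# Three buildings chained 250 m apart form one cluster; the fourth is isolated.
labels = cluster_buildings([(0, 0), (250, 0), (500, 0), (2000, 0)])
```

The first and third buildings are 500 m apart, yet they share a cluster because the middle building links them, which mirrors how dissolved buffers merge.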
In addition, we need to add time series information to each cluster so that users can visualize cluster information by date. The new column needs to be of type string instead of date to ensure that the dates are correctly exported into GeoJSON format. We format the date with the expression: format_date( to_date( now() ), 'yyyy-MM-dd' ).
All the clusters formed on different dates are consolidated together. Each cluster is uniquely identified by its cluster record ID and cluster formation date. The assumption behind this composite identifier is that one day will only generate one cluster. Otherwise, this identifier cannot be used to uniquely identify the clusters formed.
5. Manage Cases in Clusters
One case can be assigned to multiple clusters before it peters out. Cases are linked to clusters by assigning the cluster ID and formation date to each selected case within 14 days. Technically, in the consolidated case data file, each case object will be recorded 14 times, and each record is uniquely identified by its own object ID and its assigned cluster identifier.
3.2.3 Techniques Used in Clustering
1. Geoprocessing Tools
This is an easier way to link cases to clusters by referencing the cluster identifier, i.e. cluster ID and cluster date. A geoprocessing tool such as join attributes by location can be used to append cluster information to each case.
The advantage of this method is that it is a built-in function of QGIS, so it can be automated in the QGIS modeler and does not require any database server license.
However, it requires more complicated front-end display logic to retrieve cases based on cluster ID and cluster date. In addition, this method requires duplicated case information to be stored, which may reduce storage efficiency and increase data processing time.
2. PostGIS
Leveraging database management technologies to manage all the spatial data of dengue cases and dengue clusters increases data processing efficiency and offers more powerful query functions that reduce the number of steps in the whole process.
The spatial database allows vector layers to be uploaded from QGIS into the database and processed with queries. With this method, we do not need to duplicate the information of cases that are assigned to different clusters on different dates. PostGIS is powerful enough to concatenate all the case IDs into one column, so there is no need to duplicate case data across clusters.
Figure 3.2.7 PostGIS Query
Figure 3.2.8 Query Result Preview
However, as open source software, PostGIS may pose security concerns for government authorities. It also places higher requirements on users and is not as lightweight as the built-in QGIS methods.
3. SpatiaLite
SpatiaLite is a spatial extension to SQLite that provides vector geodatabase functionality. It is lightweight: the whole SQL engine is directly embedded within the application itself, and a complete database is simply an ordinary file that can be freely copied (or even deleted) and transferred from one computer/OS to a different one without any special precaution. Its SQL syntax differs slightly. The following query is used to append the IDs of the contained cases to each cluster:
    SELECT cluster_id, group_concat(obj_ID) AS casesIDs,
           count(*) AS numCases
    FROM "caseWithClusterId"
    GROUP BY cluster_id
    ORDER BY cluster_id
However, SpatiaLite is not integrated with the QGIS modeler.
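Because SpatiaLite extends SQLite, the aggregate query above can be tried with Python's built-in sqlite3 module and no spatial extension. The sketch below is a small illustration under that assumption: the in-memory database and the three sample rows are made up, while the table and column names follow the query in the text.

```python
import sqlite3

# In-memory SQLite database with a few made-up case rows.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "caseWithClusterId" (obj_ID INTEGER, cluster_id INTEGER)')
conn.executemany(
    'INSERT INTO "caseWithClusterId" VALUES (?, ?)',
    [(101, 1), (102, 1), (103, 2)],
)

# One row per cluster: its id, the concatenated case ids, and the case count.
rows = conn.execute(
    'SELECT cluster_id, group_concat(obj_ID) AS casesIDs, count(*) AS numCases '
    'FROM "caseWithClusterId" GROUP BY cluster_id ORDER BY cluster_id'
).fetchall()
conn.close()
```

Each result row carries all case IDs of a cluster in a single comma-separated string, which is what removes the need to store one duplicated case record per cluster. SQLite does not guarantee the order of the concatenated IDs, so code that consumes the string should not rely on it.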
3.3 Modeler
3.3.1 Automation with Modeler
QGIS Modeler is a tool built into QGIS that provides an easy solution for complex operation flows. By creating complex models with the graphical modeler, we can replace manual execution with automated workflows, which saves much time and effort.
We first needed to work out which processes are required in the modeler. After discussion and consultation with our professor, we finalized the following steps:
Input raw data (csv) -> Geocoding (csv) -> Import csv data as vector layer -> Append new cases to existing cases -> Clustering (building buffer, 14 days, 150 meters) -> Generate dengue clusters -> Display cluster map in web application and generate report
3.3.2 Modular Process
1. Append daily new dengue cases to existing dengue cases.
Based on the sponsor's requirement, the dengue cases considered during the clustering process must be within 14 days. Therefore, we need to select the cases that fulfil this requirement from all the past data. The Merge vector layers algorithm appends the new data to the existing data for further selection and processing.
Figure 3.3.1 Model for appending dengue cases
For this process, the two inputs are vector layers: the ExistingDengueCases point vector layer containing all past dengue cases, and the DailyNewCases csv file containing the daily dengue cases in rows.
2. Select dengue cases within 14 days from $now
Based on the sponsor's requirement, after appending the daily new cases to the existing cases, we need to select those within 14 days from today in order to form valid clusters. Therefore, we use the Select by expression algorithm in QGIS Modeler to filter out the required data. The configuration deserves to be highlighted, because it embeds date-formatted data into the expression and integrates it with the current date. It is expressed as day( age( $now, to_date( "notif_date" ) ) ) < 14, where the age() function returns the interval between $now and the notif_date attribute in our data, and the day() function specifies the date unit we would like to use.
Figure 3.3.2 Expression for selecting cases within 14 days
3. Assign each case to building layer and dissolve buffer
For this step, we use the Select by location algorithm in the modeler to select the case points that fall within each building.
Figure 3.3.3 Assign cases to building and dissolve buffer for cluster
Since we use building buffers to identify clusters, Multipart to singleparts is used to separate the clusters that were combined in a single row in the previous step. At the end, we use the Field calculator to assign a cluster ID to each cluster.
However, while assigning the cluster identifier to each case, which is the join by attribute algorithm, we encountered a NoneType error from Python. After investigating, we realized that Python reads null X/Y coordinates as NoneType, which is why the model could not proceed to the next steps.
Therefore, we updated the expression for selecting cases within 14 days to exclude those without valid coordinates: "X-Coordina" IS NOT NULL AND day( age( $now, to_date( "notif_date" ) ) ) < 14.
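The updated selection rule combines two checks: the case must have coordinates, and its notification date must be fewer than 14 days before the current date. The Python sketch below restates that logic outside QGIS for illustration only. The dictionary keys x and y, the function name select_recent_cases and the sample records are hypothetical; notif_date is assumed to be a yyyy-mm-dd string as described in the data formatting section.

```python
from datetime import date

WINDOW_DAYS = 14


def select_recent_cases(cases, today):
    """Keep cases with valid coordinates notified fewer than 14 days before `today`."""
    selected = []
    for case in cases:
        if case.get("x") is None or case.get("y") is None:
            continue  # failed geocoding; skipping avoids NoneType errors later
        notified = date.fromisoformat(case["notif_date"])  # expects yyyy-mm-dd
        if (today - notified).days < WINDOW_DAYS:
            selected.append(case)
    return selected


# Made-up records: one valid, one without coordinates, one too old.
sample = [
    {"id": 1, "x": 28000.0, "y": 30000.0, "notif_date": "2016-04-10"},
    {"id": 2, "x": None, "y": None, "notif_date": "2016-04-10"},
    {"id": 3, "x": 28100.0, "y": 30100.0, "notif_date": "2016-03-01"},
]
recent = select_recent_cases(sample, today=date(2016, 4, 15))
```

Passing the reference date as a parameter instead of reading the system clock makes it possible to rebuild the selection for any past day.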
Figure 3.3.4 NoneType Error
Figure 3.3.5 Updated Expression for selecting cases within 14 days
4. Assign cluster id to each case
In order to dynamically retrieve case information based on cluster ID for the final visualization, we need to assign a cluster ID to each case within that cluster. Therefore, the processes Join attributes by location and Merge vector layers are used for cluster ID assignment.
Figure 3.3.6 Assign cluster id to each case

3.3.3 Difficulties and Solutions
Besides the NoneType error mentioned in the last section, we faced several other difficulties during the process. By exploring QGIS Modeler together, trying alternatives and referring to the documentation, we solved all of them.
1. Date Format Not Working in the Raw Data
In the original csv data file, the format of the notification date is Short Date mm/dd/yyyy. With this format, QGIS was not able to recognize the value as a date, even though we added csvt files to specify the data type. Without the notification date, we were not able to filter the cases within 14 days for clustering.
After brainstorming, we tried converting the date format in the csv files into an integer so that we could simply subtract 14 in the QGIS selection expression. Unfortunately, this approach also failed. If we want the generated web map to be flexible and interactive, we need to prompt users to input any date for which they want to retrieve data. The $now attribute in the Date and Time field can be used to form clusters on a daily basis (here we assume that NEA updates today's dengue cases on a daily basis). The problem is that the format of $now is yyyy-mm-dd, so it cannot be used together with a date in integer format in a selection query, which meant we had to find another way.
Finally, after considering usability, the effort required and the technical limitations of QGIS, we decided to change the date format to yyyy-mm-dd in the csv file, as introduced in the Data Preprocessing section above, and to keep using csvt to specify the Date type of this attribute. We believe that this does not take more time or effort than the current date format, because once the format is set, all newly updated data follows the same format. In this way, we can use the $now function in a QGIS selection query to retrieve the cases within 14 days as required.
Figure 3.3.7 Customize date format in csv files.
Figure 3.3.8 Newly updated date format in csv file
2. Shapefiles cannot be directly read to generate interactive web map
One of our final goals is to create an interactive web map for visualization. Interactive here does not simply mean user input of a date and dynamic generation of cluster/case information. We also aim to make the export process easier and handier. At the beginning, we were only able to manually export the generated dengue cluster and case layers from QGIS using leaflet, and then put the exported folder in our
prepared user interface for visualization. This step is troublesome, because every time there are newly updated dengue cases and newly generated clusters, users would have to finish the export process manually. Given that NEA updates new cases on a daily basis, this manual process is definitely not preferred, so we decided to find a more automated alternative to replace it, which is also consistent with our initial motivation: automation.
Fortunately, QGIS Modeler allows us to save the final generated clusters and IDs as a geojson file. Thus, our web application can directly read data from the specified folder and update the map and report based on the new data. Whenever the model is executed, the web application page can be updated simply by refreshing, without any manual export.
Figure 3.3.9 Generated Geojson File
3. Wrong coordinates in geojson file
Successfully generating the geojson file was a big step, because it means we can automate the file export. During the testing phase, the geojson file displayed the correct cluster map when loaded as a QGIS vector layer. However, when we tested the geojson file further, the coordinates inside were wrong: all the numbers were very large, making it impossible for our code to read and display them properly in our web app.
After debugging for quite a while, we realized that the target projection of geojson files is WGS84 [EPSG:4326] while our shapefiles are projected in SVY21/Singapore [EPSG:3414], which is why the coordinates did not match in the geojson files. After further exploration, we found that we can easily change the projection to WGS84 using QGIS Modeler. In this way, we finally achieved automated web map visualization and report generation in a correct and user-friendly interface.
Figure 3.3.10 Reproject layers using QGIS Modeler
After working out the whole process above and solving the issues we faced during implementation, we created a model customized for NEA's dengue hotspot identification that realizes full automation.
Figure 3.3.11 Overall model flow
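The projection mismatch described in item 3 can be caught before the file reaches the web application, because WGS84 longitudes and latitudes fall within [-180, 180] and [-90, 90], whereas projected SVY21 coordinates are far larger. The following Python sketch is a hypothetical sanity check, not part of the model: the function names iter_positions and looks_like_wgs84 and the two sample feature collections are illustrative assumptions.

```python
def iter_positions(coords):
    """Yield every [x, y] position from nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from iter_positions(part)


def looks_like_wgs84(feature_collection):
    """True if every position is a plausible longitude/latitude pair."""
    for feature in feature_collection["features"]:
        for x, y, *_ in iter_positions(feature["geometry"]["coordinates"]):
            if not (-180 <= x <= 180 and -90 <= y <= 90):
                return False
    return True


def polygon_collection(ring):
    """Wrap one polygon ring in a minimal GeoJSON FeatureCollection."""
    geometry = {"type": "Polygon", "coordinates": [ring]}
    feature = {"type": "Feature", "properties": {}, "geometry": geometry}
    return {"type": "FeatureCollection", "features": [feature]}


# Made-up rings: one in degrees, one in projected metres.
in_degrees = polygon_collection(
    [[103.80, 1.30], [103.81, 1.30], [103.81, 1.31], [103.80, 1.30]]
)
in_metres = polygon_collection(
    [[28000.0, 30000.0], [28150.0, 30000.0], [28150.0, 30150.0], [28000.0, 30000.0]]
)
```

A check like this only flags implausible values; it does not reproject anything, so the reprojection step in the modeler is still required.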

4. RESULTS
4.1 Read Modeler Output Data
In order to improve the flexibility of the system, we decided to make the web application read the data generated by the QGIS modeler directly. Thus, we specify the location of the output data to be the data folder of the web application. After comparing different types of output data (shapefile, KML and geojson), we chose geojson because it is convenient both for displaying layers and for reading all the case or cluster data.
As mentioned before, the modeler generates two outputs. One contains the appended cluster data, which is used to display the cluster layer in the web application; the other contains the appended case data, which is used to generate the report for each dengue cluster.
In order to read the two output files, we used XMLHttpRequest, an API that provides client functionality for transferring data between a client and a server. It offers an easy way to get data from a URL without reloading the page and can retrieve any type of data. After retrieving the data from the file, we convert the text into JSON format for further processing. The code below shows the steps to read the data.
Figure 4.1 Read Data File
4.2 Display Cluster Layer
As shown above, OpenStreetMap is used as the base map of the web application. The clusters shown on the base map are drawn from the generated geometry data, so the location of each cluster is easy to identify. We used Leaflet, an open-source JavaScript library for interactive maps, to display the map. As the input data file of the web application is geojson, we add the data to the web application as a layer with the function L.geoJson( <Object> geojson?, <GeoJSON options> options? ). The file used for displaying the clusters is the appended cluster data.
4.2.1 Time Series Data
The data generated by the QGIS modeler is time series data. To make the application more flexible, we enable the system to display clusters based on the date selected by the user. As shown in the figure below, we used the datepicker, an API from JQuery that opens an interactive calendar, to prompt users to select the date for which they would like to explore dengue clusters.
Figure 4.2 Date Picker for Input Date
To display the cluster layer based on the date, we used the filter function provided by Leaflet when adding the geojson layer to the web application.
Figure 4.3 Filter Cluster Data
As shown in the code, the layer is filtered based on the date on which the cluster exists.
4.2.2 Hotspot Legend
In order to make the clusters more meaningful and easier for the user to analyze, we classified the clusters based on the number of dengue cases in each cluster. As shown in the legend, clusters with more cases have a deeper color.
Figure 4.4 Legend for display Cluster
To classify the clusters, we used the styling features of Leaflet and changed the style of each cluster as shown in the code (the example styles a cluster with 1 or 2 cases inside it).
Figure 4.5 Code for style the clusters
4.2.3 Highly Interactive
Besides the color classification of the clusters, the web application is made even more interactive by showing more details for each cluster. When users click on the cluster they would like
to explore, a pop-up window shows basic information about the selected cluster, such as the cluster ID, the cluster type (the building type inside the cluster) and the number of cases in the cluster (as shown in the figure). We used the pop-up window feature of Leaflet and customized the information for display. The code shown below is how we implemented the feature.
Figure 4.6 Popup of each cluster
Figure 4.7 Code to implement the pop up feature
More detailed information about the cluster is shown on the other side of the application, which is discussed in a later part of the report.
4.3 Report for The Clusters
As mentioned above, the right side of the application is the report for each cluster. The report has two parts: one is the case summary report of the cluster, and the other is the details of each case in the cluster. The report is generated from the appended case data file. In order to display the report for each cluster, we joined the two data sets on the cluster ID and the date of the cluster. As shown below, to join the data we filtered the appended case data by cluster ID and cluster date.
Figure 4.8 Get case data based on selected cluster
4.3.1 View Cases by Demographic Chart
There are two parts to the case summary generated from the case data file. As shown in the figure, one chart is based on gender and the other is based on age.
Figure 4.9 Demographic Chart of the cluster
In order to generate these two figures, we used the same filter function to get the number of cases in each category.
4.3.2 Dengue Case Report
The dengue case report shows the details of each case in the cluster, as shown below.
Figure 4.10 Report by each case
In order to read the case data, the application loops over all the selected data filtered in the previous step.
Figure 4.11 Code for displaying case report
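The report logic described in this section has two steps: keep the case records that match the selected cluster ID and cluster date, then count them per category for the charts. The web application implements this in JavaScript; the Python sketch below only restates the logic for illustration. The record keys cluster_id, cluster_date and gender, the function names and the sample records are hypothetical.

```python
from collections import Counter


def cases_for_cluster(cases, cluster_id, cluster_date):
    """Return the case records linked to one cluster on one date."""
    return [
        c for c in cases
        if c["cluster_id"] == cluster_id and c["cluster_date"] == cluster_date
    ]


def gender_summary(cases):
    """Count cases per gender, the kind of tally a demographic chart needs."""
    return Counter(c["gender"] for c in cases)


# Made-up appended case data: the same case can appear under several dates.
appended_cases = [
    {"obj_id": 1, "cluster_id": 0, "cluster_date": "2016-04-14", "gender": "F"},
    {"obj_id": 2, "cluster_id": 0, "cluster_date": "2016-04-14", "gender": "M"},
    {"obj_id": 3, "cluster_id": 0, "cluster_date": "2016-04-14", "gender": "F"},
    {"obj_id": 1, "cluster_id": 0, "cluster_date": "2016-04-15", "gender": "F"},
    {"obj_id": 4, "cluster_id": 1, "cluster_date": "2016-04-14", "gender": "M"},
]
selected = cases_for_cluster(appended_cases, 0, "2016-04-14")
summary = gender_summary(selected)
```

Filtering on both the ID and the date matters because a cluster ID alone is only unique within one day's output.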
Figure 4.12 Overall Web Application
5. DISCUSSION
5.1 New Practice Introduced
The highlight of our project is the use of QGIS Modeler, a new open source tool introduced by our professor and further explored by our team. Unlike other Geospatial Analytics projects, we do not focus on the analysis part but on proving a concept. Since our aim is to customize and propose a solution to NEA, we focused on the automation and flexibility of our cluster formation process and on the use of our web map and report.
With QGIS Modeler, we implemented the whole cluster formation process in a much more convenient and easy way, which also provides flexibility for possible changes in criteria, adjustments to detailed steps, types of output files and so on.
In addition, with our extra code on the web map side, we provide an interactive approach to dynamically view and update maps as well as the dengue case report, which we believe is a better alternative to an unchangeable pdf report.
5.2 User Study
On poster day, we were honored to present our project outcome to representatives from NEA. Based on our observation, some of our proposed methods impressed them and have the potential to be adopted and implemented by NEA. A few points mentioned by the representatives from NEA are listed below.
1. Clustering can be auto-generated by QGIS Modeler without human effort, which can be an important and significant effort-saving method for NEA.
2.
3. NEA would still prefer to combine QGIS Modeler with ArcGIS, which is what they are currently using. However, due to the limitation of software adaptability, this is not feasible for now.
4. NEA cares about the speed of running the model with a large amount of data, which was not tested using the simulated data. Further improvement can be made in this area.
5. The dynamic web report and the interactive way of showing dengue case distribution and demographic distribution are impressive.
6. The flexibility to change algorithm parameters in QGIS Modeler for better customization also caters to their needs and interests.
7. The display of the cluster map with different color darkness indicating cluster density is also interesting and impressive.
5.3 Disadvantage of Our Method
Firstly, clusters are formed simply according to whether two building buffers touch each other. If more buildings are clustered, it implies that more cases are in the cluster, even if there is only one case in each building. The method therefore cannot show the density of cases within the same building. In other words, the values that show the severity of the clusters do not take the area of the cluster into account.
Secondly, even if there is only one case in a building, the 150 m building buffer is identified as a cluster. This allows the authorities to manage individual cases at an emerging stage, but such an area can hardly be called a cluster in the traditional sense.
6 FUTURE WORK
Our system is currently an implementation of our proposal to
NEA on the use of QGIS Modeler and building buffers. It may not
fit NEA's current execution/research process completely, so
further refinement and improvement are important.

6.1 Database for large amount of data
The system currently uses shapefiles directly for executing
models and generating the web map and report. It joins different
files and retrieves data based on primary keys, which works well
for a small amount of data (400 rows of data were used for
testing). However, as the representatives from NEA were
concerned, it may take longer for a large amount of data.
Fortunately, an integration of a database with QGIS already
exists as an alternative. During our exploration of cluster
formation, we tried integrating PostGIS with QGIS Modeler and
found that, with the correct database connection and
configuration, simple database queries can make data retrieval
more efficient and speed up the execution of QGIS Modeler. This
addresses the potential problem and can be implemented further.

6.2 Integration with other tools
QGIS Modeler offers much convenience by automating the
execution process and provides flexibility with various
parameter configurations. However, NEA mentioned that they
would prefer to integrate QGIS Modeler with ArcGIS to reduce
the learning curve and adoption time. Although such integration
is not implemented currently, QGIS Modeler can export models as
Python scripts, which other software such as ArcGIS could
potentially read. This potential integration can be attractive
to our potential users, but the configuration and connection
needed to enable it can also be challenging and tedious.

6.3 Auto-Geocoding
Across the whole process, geocoding is one of the very few parts
where manual work is involved, and the accuracy and efficiency
of geocoding are crucial to the following steps. For now, the
X-Y coordinates are converted using the geocoding API provided,
which is slightly outdated. Therefore, a more accurate
auto-geocoding process can be implemented to further improve
the workflow.

6.4 Show the Stage of Each Cluster
Clusters may be at different stages, and each stage requires a
different level of attention. The SIR Model can be used to
describe the development stages of the disease. It includes 3
stages: Susceptible, Infectious, Recovered. If a new cluster of
very few cases (1 or 2) is just emerging at a new location, it
is in the susceptible stage and requires preventive action. If a
cluster is enlarging at a fast speed, it is in the infectious
stage, which requires more corrective actions in the nearby
neighborhood. If a cluster is shrinking fast, it is considered
to be in the recovered stage [6]. Applying this model allows the
authority to better manage clusters and enables more efficient
resource allocation.

7 CONCLUSION
The customized dengue clustering process allows authorities to
automate their dengue clustering process. They can generate
clusters based on building shapes, view clusters by different
dates, and view case reports in both summary format and
individual case format. It provides a more intuitive reference
for the authority's decision making and strategy setting.

8 REFERENCES
[1] Phaisarn Jeefoo, Nitin Kumar Tripathi and Marc Souris.
2011. Spatio-Temporal Diffusion Pattern and Hotspot
Detection of Dengue in Chachoengsao Province, Thailand.
Int. J. Environ. Res. Public Health 2011, 8, 51-74.
doi:10.3390/ijerph8010051.
[2] Do Thi Thanh Toan, Wenbiao Hu, Pham Quang Thai, Luu
Ngoc Hoat, Pamela Wright and Pim Martens. 2013. Hot
spot detection and spatio-temporal dispersion of
dengue fever in Hanoi, Vietnam. PUBLIC HEALTH IN
VIETNAM, Coation.
[3] S. Aziz, R.M. Aidil, M.N. Nisfariza, R. Ngui, Y.A.L. Lim,
W.S. Wan Yusoff & R. Ruslan. 2014. Spatial density of
Aedes distribution in urban areas: A case study of Breteau
index in Kuala Lumpur, Malaysia. Department of
Geography, Faculty of Arts and Social Sciences, University
of Malaya.
[4] Bithell JF. An application of density estimation to
geographical epidemiology. Stat Med 1990; 9: 691-701.
[5] Levine N. CrimeStat: A spatial statistics program for the
analysis of crime incident locations (Version 3.0).
Washington, D.C.: Levine & Associates; and Houston, TX:
National Institute of Justice 2007; p. 12.
[6] Daley, D. J. & Gani, J. (2005). Epidemic Modeling: An
Introduction. NY: Cambridge University Press.
[7] N.A. (n.d.). The graphical modeler. Documentation QGIS 2.0:
http://docs.qgis.org/2.0/ca/docs/user_manual/processing/modeler.html
[8] N.A. (n.d.). XMLHttpRequest. Mozilla Developer Network:
https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest
[9] N.A. (n.d.). Using GeoJSON with Leaflet. Leaflet:
http://leafletjs.com/examples/geojson.html