
Weather prediction using CPT+ algorithm

Introduction
Weather forecasting is a vital application in meteorology and has been one of the most
scientifically and technologically challenging problems around the world in the last century.
Weather forecasting entails predicting how the present state of the atmosphere will change.
Present weather conditions are obtained by ground observations, observations from ships and
aircraft, radiosondes, Doppler radar, and satellites.
This information is sent to meteorological centers where the data are collected, analyzed, and
made into a variety of charts, maps, and graphs. Modern high-speed computers transfer the many
thousands of observations onto surface and upper-air maps. Computers draw the lines on the
maps with help from meteorologists, who correct for any errors. A final map is called an
analysis. Computers not only draw the maps but predict how the maps will look sometime in the
future. The forecasting of weather by computer is known as numerical weather prediction.
Climate is the long-term effect of the sun's radiation on the rotating earth's varied surface and
atmosphere. The day-to-day variations in a given area constitute the weather, whereas climate is
the long-term synthesis of such variations. Weather is measured by thermometers, rain gauges,
barometers, and other instruments, but the study of climate relies on statistics. Nowadays, such
statistics are handled efficiently by computers. A simple, long-term summary of weather changes,
however, is still not a true picture of climate. Obtaining one requires the analysis of daily,
monthly, and yearly patterns.
Climate change is a significant and lasting change in the statistical distribution of weather
patterns over periods ranging from decades to millions of years. It may be a change in average
weather conditions or the distribution of events around that average (e.g., more or fewer extreme
weather events). The term is sometimes used to refer specifically to climate change caused by
human activity, as
opposed to changes in climate that may have resulted as part of Earth's natural processes.
In popular usage today, climate change is often treated as synonymous with anthropogenic global
warming. Within scientific
journals, however, global warming refers to surface temperature increases, while climate change
includes global warming and everything else that increasing greenhouse gas amounts will affect.

Proposed Scheme
In this section we present a weather prediction model based on CPT+, a sequence prediction
approach that stores its training sequences without loss. Given a set of
training sequences, the problem of sequence prediction consists in finding the next element of a
target sequence by only observing its previous items. The number of applications associated with
this problem is extensive. It includes applications such as web page pre-fetching, consumer
product recommendation, weather forecasting and stock market prediction. The literature on this
subject is extensive and there are many different approaches. Two of the most popular are PPM
(Prediction by Partial Matching) and DG (Dependency Graph). Over the years, these models
have been greatly improved in terms of time or memory efficiency but their performance remains
more or less the same in terms of prediction accuracy. Markov Chains are also widely used for
sequence prediction. However, they assume that sequences are Markovian. Other approaches
exist such as neural networks and association rules. However, all of these approaches build lossy
prediction models from training sequences and therefore do not use all the information available
in the training sequences when making predictions. In this paper, we propose a novel approach for
sequence prediction that uses all of the information in the training sequences to perform
predictions. The hypothesis is that doing so increases prediction accuracy.

A Decision Tree

A Decision Tree is a flow-chart-like tree structure. Each internal node denotes a test on an
attribute. Each branch represents an outcome of the test. Leaf nodes represent class distribution.
The decision tree structure provides an explicit set of if-then rules (rather than abstract
mathematical equations), making the results easy to interpret. In the tree structures, leaves
represent classifications and branches represent conjunctions of features that lead to those
classifications. In decision analysis, a decision tree can be used visually and explicitly to
represent decisions and decision making. The concept of information gain is used to decide the
splitting value at an internal node. The splitting value that would provide the most information
gain is chosen. Formally, information gain is the reduction in entropy produced by a split. In
order to improve the accuracy and generalization of classification and regression trees, various
techniques such as boosting and pruning were introduced.
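
To make the splitting criterion concrete, the following is a minimal sketch of entropy and
information gain for a single threshold split. It is an illustration under stated assumptions,
not the implementation used in this project: the class name InfoGain, the binary rain/no-rain
label, and the sample temperatures are all hypothetical.

```java
/** Sketch: entropy and information gain for a binary split on one numeric attribute. */
class InfoGain {

    /** Shannon entropy (in bits) of a class distribution given as per-class counts. */
    static double entropy(int... counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) {
                continue; // 0 * log(0) is treated as 0
            }
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    /**
     * Information gain of the split "value <= threshold".
     * values[i] is the attribute (e.g. maximum temperature), rain[i] the class label.
     */
    static double gain(double[] values, boolean[] rain, double threshold) {
        int n = values.length;
        if (n == 0) {
            return 0.0;
        }
        int leftYes = 0, leftNo = 0, rightYes = 0, rightNo = 0;
        for (int i = 0; i < n; i++) {
            if (values[i] <= threshold) {
                if (rain[i]) leftYes++; else leftNo++;
            } else {
                if (rain[i]) rightYes++; else rightNo++;
            }
        }
        double parent = entropy(leftYes + rightYes, leftNo + rightNo);
        double left = entropy(leftYes, leftNo);
        double right = entropy(rightYes, rightNo);
        // Gain = parent entropy minus the size-weighted entropy of the two children.
        return parent
                - ((double) (leftYes + leftNo) / n) * left
                - ((double) (rightYes + rightNo) / n) * right;
    }

    public static void main(String[] args) {
        // Hypothetical sample: maximum temperature and whether it rained.
        double[] maxTemp = {30, 31, 24, 22, 33, 21};
        boolean[] rain = {false, false, true, true, false, true};
        System.out.println(gain(maxTemp, rain, 25.0));
    }
}
```

In this made-up sample a threshold of 25 separates the rainy and dry days completely, so the
gain equals the full parent entropy. A tree builder would evaluate such a gain for every
candidate threshold and keep the largest.
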
Compact Prediction Tree
The Compact Prediction Tree (CPT) is a recently proposed prediction model [5]. Its main
distinctive characteristics with respect to other prediction models are that (1) CPT stores a
compressed representation of training sequences with no loss or a small loss and (2) CPT
measures the similarity of a sequence to the training sequences to perform a prediction. The
similarity measure is noise tolerant and thus allows CPT to predict the next items of
subsequences that have not been previously seen in training sequences, whereas other proposed
models such as PPM and All-K-Order Markov cannot perform prediction in such cases. The
training process of CPT takes as input a set of training sequences and generates three distinct
structures: (1) a Prediction Tree (PT), (2) a Lookup Table (LT) and (3) an Inverted Index. During
training, sequences are considered one by one to incrementally build these three structures.
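
The three structures can be sketched as follows. This is a simplified illustration rather than
the CPT or CPT+ reference implementation: the class and method names are hypothetical, and the
predict method uses plain frequency counting in place of the scoring used by the published
model. It assumes weather observations have already been discretized into string items.

```java
import java.util.*;

/** Simplified sketch of CPT training: Prediction Tree, Lookup Table, Inverted Index. */
class CptSketch {

    /** Prediction Tree node: one item plus a link back to its parent. */
    static final class Node {
        final String item;
        final Node parent;
        final Map<String, Node> children = new HashMap<>();

        Node(String item, Node parent) {
            this.item = item;
            this.parent = parent;
        }
    }

    final Node root = new Node(null, null);
    /** Lookup Table: sequence id -> last node of that sequence in the tree. */
    final Map<Integer, Node> lookupTable = new HashMap<>();
    /** Inverted Index: item -> ids of the training sequences that contain it. */
    final Map<String, BitSet> invertedIndex = new HashMap<>();
    private int nextId = 0;

    /** Inserts one training sequence, updating all three structures incrementally. */
    void train(List<String> sequence) {
        int id = nextId++;
        Node current = root;
        for (String item : sequence) {
            final Node parent = current;
            current = parent.children.computeIfAbsent(item, k -> new Node(k, parent));
            invertedIndex.computeIfAbsent(item, k -> new BitSet()).set(id);
        }
        lookupTable.put(id, current);
    }

    /** Rebuilds a training sequence by walking from its last node up to the root. */
    List<String> sequence(int id) {
        LinkedList<String> items = new LinkedList<>();
        for (Node n = lookupTable.get(id); n != null && n.parent != null; n = n.parent) {
            items.addFirst(n.item);
        }
        return items;
    }

    /** Ids of training sequences containing every query item, in any order. */
    BitSet similar(Collection<String> query) {
        BitSet ids = new BitSet();
        ids.set(0, nextId);
        for (String item : query) {
            ids.and(invertedIndex.getOrDefault(item, new BitSet()));
        }
        return ids;
    }

    /** Simplified prediction: the most frequent item that follows the query items. */
    Optional<String> predict(List<String> query) {
        Set<String> q = new HashSet<>(query);
        Map<String, Integer> counts = new HashMap<>();
        BitSet ids = similar(q);
        for (int id = ids.nextSetBit(0); id >= 0; id = ids.nextSetBit(id + 1)) {
            List<String> seq = sequence(id);
            int last = -1;
            for (int i = 0; i < seq.size(); i++) {
                if (q.contains(seq.get(i))) {
                    last = i;
                }
            }
            // Count everything after the last query item as a candidate prediction.
            for (String item : seq.subList(last + 1, seq.size())) {
                counts.merge(item, 1, Integer::sum);
            }
        }
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }
}
```

Because the similarity check ignores the order of the query items, a query such as
(sunny, cloudy) also matches a training sequence that contains (cloudy, sunny). This is one
simple way to realize the tolerance to unseen subsequences described above.
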

SOFTWARE ENVIRONMENT
Java
Java is a general-purpose computer programming language that is concurrent, class-
based, object-oriented, and specifically designed to have as few implementation dependencies as
possible. It is intended to let application developers "write once, run anywhere" (WORA),
meaning that compiled Java code can run on all platforms that support Java without the need for
recompilation. Java applications are typically compiled to bytecode that can run on any Java
virtual machine (JVM) regardless of computer architecture. As of 2016, Java is one of the most
popular programming languages in use, particularly for client-server web applications, with a
reported 9 million developers. Java was originally developed by James Gosling at Sun
Microsystems (which has since been acquired by Oracle Corporation) and released in 1995 as a
core component of Sun Microsystems' Java platform. The language derives much of its syntax
from C and C++, but it has fewer low-level facilities than either of them.

The original and reference implementation Java compilers, virtual machines, and class
libraries were originally released by Sun under proprietary licences. As of May 2007, in
compliance with the specifications of the Java Community Process, Sun relicensed most of its
Java technologies under the GNU General Public License. Others have also developed
alternative implementations of these Sun technologies, such as the GNU Compiler for Java
(bytecode compiler), GNU Classpath (standard libraries), and IcedTea-Web (browser plugin for
applets).
At the time of writing, the latest version is Java 8, which is the only version currently
supported for free by Oracle, although earlier versions are supported by both Oracle and other
companies on a commercial basis.
Eclipse
Eclipse is an integrated development environment (IDE) used in computer programming,
and is one of the most widely used Java IDEs. It contains a base workspace and an extensible plug-in
system for customizing the environment. Eclipse is written mostly in Java and its primary use is
for developing Java applications, but it may also be used to develop applications in other
programming languages through the use of plugins, including: Ada, ABAP, C, C++, COBOL, D,
Fortran, Haskell, JavaScript, Julia, Lasso, Lua, NATURAL, Perl, PHP, Prolog, Python, R, Ruby
(including Ruby on Rails framework), Rust, Scala, Clojure, Groovy, Scheme, and Erlang. It can
also be used to develop documents with LaTeX (through the use of the TeXlipse plugin) and
packages for the software Mathematica. Development environments include the Eclipse Java
development tools (JDT) for Java and Scala, Eclipse CDT for C/C++ and Eclipse PDT for PHP,
among others.

The initial codebase originated from IBM VisualAge. The Eclipse software development kit
(SDK), which includes the Java development tools, is meant for Java developers. Users can
extend its abilities by installing plug-ins written for the Eclipse Platform, such as development
toolkits for other programming languages, and can write and contribute their own plug-in
modules. Since the introduction of Equinox, plug-ins can be installed and stopped dynamically
and are known as OSGi bundles.
CloudSim Simulation Framework

CloudSim is a framework for modeling and simulation of cloud computing infrastructures
and services. Originally built primarily at the Cloud Computing and Distributed Systems
(CLOUDS) Laboratory, The University of Melbourne, Australia, CloudSim has become one of
the most popular open source cloud simulators in research and academia. CloudSim is
written entirely in Java.
CloudSim is a software framework that supports several core functions of a cloud, such as
job/task queues, processing of events, creation of cloud entities, communication between
entities, and implementation of broker policies. The toolkit allows users to:

1. Test application services in a repeatable and controllable environment.

2. Tune system bottlenecks before deploying applications in an actual cloud.

3. Experiment with different workload mixes and resource performance scenarios on
simulated infrastructure for developing and testing adaptive application provisioning
techniques.

Core features of CloudSim are:

1. Support for modeling and simulation of large-scale computing environments.

2. A self-contained platform for modeling clouds, service brokers, provisioning and
allocation policies.

3. Support for simulation of network connections among the simulated system elements.

4. Facility for simulation of a federated cloud environment that inter-networks resources
from both private and public domains.

5. Availability of a virtualization engine that aids in the creation and management of
multiple independent and co-hosted virtual services on a data center node.

6. Flexibility to switch between space-shared and time-shared allocation of processing cores
to virtualized services.

MySQL

MySQL is an open-source relational database management system (RDBMS). Its name is a
combination of "My", the name of co-founder Michael Widenius' daughter, and "SQL", the
abbreviation for Structured Query Language. The MySQL development project has made its source
code available under the terms of the GNU General Public License, as well as under a variety of
proprietary agreements. MySQL was owned and sponsored by a single for-profit firm, the Swedish
company MySQL AB, which is now owned by Oracle Corporation. For proprietary use, several paid
editions are available that offer additional functionality.

MySQL is a central component of the LAMP open-source web application software stack. LAMP is
an acronym for "Linux, Apache, MySQL, Perl/PHP/Python". Applications that use the MySQL
database include TYPO3, MODx, Joomla, WordPress, phpBB, MyBB, and Drupal. MySQL is also used
in many high-profile, large-scale websites, including Google, Facebook, Twitter, Flickr, and
YouTube.

MySQL is written in C and C++. Its SQL parser is written in yacc, but it uses a home-
brewed lexical analyzer. MySQL works on many system platforms, including AIX, BSDi,
FreeBSD, HP-UX, eComStation, i5/OS, IRIX, Linux, macOS, Microsoft Windows, NetBSD,
Novell NetWare, OpenBSD, OpenSolaris, OS/2 Warp, QNX, Oracle Solaris, Symbian, SunOS,
SCO OpenServer, SCO UnixWare, Sanos and Tru64. A port of MySQL to OpenVMS also exists.

The MySQL server software itself and the client libraries use dual-licensing distribution.
They are offered either under GPL version 2, beginning from 28 June 2000 (extended in 2009
with a FLOSS License Exception), or under a proprietary license.

SYSTEM DESIGN AND DEVELOPMENT


Software design sits at the technical kernel of the software engineering process and is
applied regardless of the development paradigm and area of application. Design is the first step
in the development phase for any engineered product or system. The designer's goal is to
produce a model or representation of an entity that will later be built. Once system
requirements have been specified and analyzed, system design is the first of the three technical
activities (design, code and test) that are required to build and verify software.

INPUT DESIGN

Input design is one of the most important phases of system design. It is the process in which
the inputs received by the system are planned and designed so as to obtain the necessary
information from the user while eliminating information that is not required. The aim of input
design is to ensure the maximum possible level of accuracy and to ensure that the input is
accessible to and understood by the user. Input design is a part of overall system design that
requires very careful attention. If the data going into the system is incorrect, the
processing and output will magnify the errors.

Admin Login form

In this admin login form, labels are used to display the text and text boxes are used to get
the username and password. The admin has a unique username and password. If the username and
password entered in this form are correct, the admin can access the website.

New user registration:

In this module the admin adds an employee and his or her details, such as name, department
and phone number. The admin can also edit the employee details.

Existing system Form

In this form the admin can perform the weather prediction operation by applying the
decision tree algorithm.

Proposed system Form

In this form the admin can perform the weather prediction by applying the CPT+ algorithm.

OUTPUT DESIGN

The output of the system is delivered either on screen or as hard copy. Output design aims
at communicating the results of the processing to the users. The reports are generated to suit
the needs of the users and have to be generated at appropriate levels of detail. In our project,
outputs are generated by ASP as HTML pages. Because this is a web application, the output is
designed to be user-friendly and is delivered on screen most of the time.

New user registration reports


This report contains the registered user details, including the user's personal details.
Existing system report
This report provides the output based on the dataset, which contains the maximum
temperature, minimum temperature, rain possibility, etc.
Proposed system Reports
This report provides the output based on the dataset, which contains the maximum
temperature, minimum temperature, rain possibility, etc.
DATABASE DESIGN

The overall objective in the development of database technology has been to treat data as
an organizational resource and as an integrated whole. A DBMS allows data to be protected and
organized separately from other resources. A database is an integrated collection of data. The
most significant form of data as seen by programmers is data as stored on direct access storage
devices. This is the difference between logical and physical data. Database design is the
process of designing the database files, which are the key source of information for the
system. The files should be properly designed and planned for collecting, accumulating, editing
and retrieving the required information.

[Figure: data flow diagram notation (source or destination of data, data flow, process, storage)]

Testing Methodologies
Testing
The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub-assemblies, assemblies and/or a finished product. It is the
process of exercising software with the intent of ensuring that the software system meets its
requirements and user expectations and does not fail in an unacceptable manner. There are
various types of test. Each test type addresses a specific testing requirement.

Types Of Tests
Unit testing
In this system, unit testing is performed by separating the whole project into units such
as functions, blocks and classes. It involves the design of test cases that validate that the
internal program logic is functioning properly and that program inputs produce valid outputs.
All decision branches and internal code flow should be validated. Unit testing is the testing of
individual software units of the application; it is done after the completion of an individual
unit and before integration.

This is structural testing, which relies on knowledge of the unit's construction and is invasive.
Unit tests perform basic tests at component level and test a specific business process,
application, and/or system configuration. Unit tests ensure that each unique path of a business
process performs accurately to the documented specifications and contains clearly defined inputs
and expected results.

Integration testing
Integration tests are performed to test the integration between forms, i.e. whether the
forms work together correctly. For example, the login page hands over to the next form once
validation is complete. This test is designed to check whether integrated software components
actually run as one program.

Testing is event driven and is more concerned with the basic outcome of screens or fields.
Integration tests demonstrate that, although the components were individually satisfactory, as
shown by successful unit testing, the combination of components is also correct and consistent.
Integration testing is specifically aimed at exposing the problems that arise from the
combination of components.

TEST CASES:

Screen Name: Admin login

| TEST ID | TEST CONDITIONS | TEST DESCRIPTION | TEST DATA | EXPECTED RESULT | ACTUAL RESULT | FINAL RESULT |
|---|---|---|---|---|---|---|
| TC01 | Admin should enter the user name | Admin enters a valid user name | admin | System should accept the data | System accepts the data | Pass |
| TC01 | Admin should enter the user name | Admin enters an invalid user name | maha | System should not accept the data | System does not accept the data | Pass |
| TC02 | Admin should enter the password | Admin enters a valid password | admin | System should accept the data | System accepts the data | Pass |
| TC02 | Admin should enter the password | Admin enters an invalid password | hhg | System should not accept the data | System does not accept the data | Pass |
| TC03 | Admin should click the login button | Admin clicks the login button | Sign up | System should redirect to the home page. | System redirects to the home page. | Pass |

Screen Name: new user Registration

| TEST ID | TEST CONDITIONS | TEST DESCRIPTION | TEST DATA | EXPECTED RESULT | ACTUAL RESULT | FINAL RESULT |
|---|---|---|---|---|---|---|
| TC01 | User should enter details in the registration page | User enters valid data in the fields | siva,******,******,sivaram@gmail.com | System should accept the data | System accepts the data | Pass |
| TC01 | User should enter details in the registration page | User enters invalid data in the email id field | Sivaram*gmail-com | Msg displays invalid mail id | Msg displayed as invalid mail id | Pass |
| TC02 | User should enter data in password and confirm password | User enters the name and different data in password and confirm password | ***,***** | Msg displays as: password mismatch | Msg displayed as password mismatch | Pass |
| TC02 | User should enter the password | User enters 15 digits in the password field | ******************** | System should not accept the data | System accepts the data | Fail |
| TC03 | User should click the check box | User does not click the check box "I accept" | | System can't allow signup process | System not allowing sign up | Pass |
| TC03 | User should click the check box | User clicks the check box "I accept" | | System allows signup process | Signup process completed successfully | Pass |
| TC04 | User should click the signup button | User clicks the signup button | Sign up | Msg must display as registration successful and move to sign in page | Msg displayed as registration successful and moved to sign in page | Pass |
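
The registration checks exercised by the test cases above could be sketched as a single
validation routine. This is an illustrative sketch only: the class name, the error messages,
the deliberately simple e-mail pattern, and the maximum password length of 14 are assumptions
(the source states only that a 15-digit password should be rejected, not what the limit is).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch of the registration form checks; names and limits are hypothetical. */
class RegistrationValidator {

    // Deliberately simple shape check: something@something.tld
    private static final Pattern EMAIL =
            Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    // Assumed limit: any value below 15 would reject the 15-digit password test case.
    static final int MAX_PASSWORD_LENGTH = 14;

    /** Returns the list of validation errors; an empty list means the form is accepted. */
    static List<String> validate(String name, String password, String confirm,
                                 String email, boolean termsAccepted) {
        List<String> errors = new ArrayList<>();
        if (name == null || name.trim().isEmpty()) {
            errors.add("name required");
        }
        if (email == null || !EMAIL.matcher(email).matches()) {
            errors.add("invalid mail id");
        }
        if (password == null || password.isEmpty()
                || password.length() > MAX_PASSWORD_LENGTH) {
            errors.add("invalid password length");
        } else if (!password.equals(confirm)) {
            errors.add("password mismatch");
        }
        if (!termsAccepted) {
            errors.add("terms not accepted");
        }
        return errors;
    }

    public static void main(String[] args) {
        // Mirrors the invalid e-mail test case from the table.
        System.out.println(validate("siva", "secret", "secret", "Sivaram*gmail-com", true));
    }
}
```

A length check of this kind addresses the one failing test case, in which a 15-digit password
was accepted although it should have been rejected.
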
Conclusion

The proposed scheme in this paper enhances the technique of weather
prediction compared to the existing methodologies. Our approach is incremental,
offers a low time complexity for its training phase and is easily adaptable for
different applications and contexts. Results show that CPT yields higher accuracy
on most datasets (up to 12% more than the second-best approach) and has a better
training time.
There is always room for improvement in any approach. Although this approach
improves on existing methods in some respects, other areas could still be
improved.
