Knowledge discovery in databases (KDD) is an umbrella term used to describe all activities involved in making sense of data stored in large and complex databases. It encompasses a number of terms that are currently receiving attention, namely data warehousing, data marts and data mining.
Copyright:
Attribution Non-Commercial (BY-NC)

- An exciting recent movement in the database area is knowledge discovery in databases (KDD).
- KDD is an umbrella term used to describe all activities involved in making sense of data stored in large and complex databases.
- KDD encompasses a number of terms that are currently receiving attention, namely data warehousing, data marts and data mining.

Data warehousing
- A database consists of data stored on a computer in a way that facilitates retrieval.
- Data warehousing is a refinement of the database concept that makes an improved data resource available to users.
- It enables users to manipulate and use data in intuitive ways.
- The key concept is that it encompasses a very wide range of computer-based data.
- The data resource is called a data warehouse; it is typically very large, of very high quality and highly retrievable.
- The large size of the data does not come at the cost of poor quality.
- This is because of extensive data cleaning, i.e. removing incorrect and inconsistent data and converting what remains into higher-quality data.
- One statistical technique is clustering, which arranges the data in the ways users want to view it.
- This is similar to the way like goods are arranged together in a supermarket.
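The cleaning and arranging steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, field names and cleaning rules are invented, and grouping by a known category is a much simpler stand-in for statistical clustering, not the technique a warehouse product would actually use.

```python
from collections import defaultdict

# Hypothetical raw records; field names and values are invented for illustration.
raw_sales = [
    {"item": "Milk", "category": "dairy", "price": 1.20},
    {"item": "cheese", "category": "Dairy ", "price": 3.50},   # inconsistent spelling
    {"item": "Bread", "category": "bakery", "price": -2.00},   # incorrect: negative price
    {"item": "Rolls", "category": "BAKERY", "price": 0.80},
    {"item": "Butter", "category": None, "price": 2.10},       # incomplete: no category
]


def clean(records):
    """Drop incorrect records and normalise inconsistent spellings."""
    cleaned = []
    for rec in records:
        if rec["category"] is None or rec["price"] < 0:
            continue  # remove incorrect or incomplete data
        cleaned.append({
            "item": rec["item"].strip().title(),
            "category": rec["category"].strip().lower(),
            "price": rec["price"],
        })
    return cleaned


def group_by_category(records):
    """Place like items together, as goods are shelved in a supermarket."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["category"]].append(rec["item"])
    return dict(groups)


warehouse_view = group_by_category(clean(raw_sales))
print(warehouse_view)
```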
- Data warehousing is typically performed on mainframe computers because of the extremely large amount of stored data.
- The data is stored in a relational database.
- DBMS vendors such as Oracle, Sybase and Informix are promoting the use of their products as data warehouse platforms.
- IBM has actively positioned itself as the builder of computer hardware that supports data warehousing activity.

The data mart
- Achieving a data warehouse sounds like a big challenge, so experts recommend taking a more modest approach.
- A data mart is a database that contains data describing only a segment of the firm's operations.
- A firm may have a marketing data mart, a human resources data mart and so on.

Data mining
- A term often used in conjunction with data warehousing and data marts is data mining.
- It is the process of finding relationships in data that are unknown to the user.
- Data mining helps the user by discovering the relationships and presenting them in an understandable way.
- The relationships may provide the basis for decision making.
- Data mining enables the user to discover knowledge in databases that the user may not know exists.
- It does not present the same data in a different format; it shows relationships that were not previously recognised.
- Take the example of a bank that has decided to offer mutual funds to its customers.
- Bank management wants to aim promotional materials at the customers.
- They want to target the customer segment that offers the greatest potential for business.
- For this, data mining is applied to the database of customers and prospects.

Verification-driven data mining
- One approach is for the managers to identify the characteristics they believe the members of the target group will have.
- Assume the managers want to target young, married, two-income, high-net-worth customers.
- The query can be entered into the DBMS and the appropriate records will be retrieved.
- Such an approach, which begins with the user's hypothesis of how the data is related, is called verification-driven data mining.
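As a rough illustration of such a query, the sketch below uses Python's built-in sqlite3 module with an in-memory table. The table layout, the sample customers and the thresholds for "young" and "high net worth" are all assumptions made for this example; the slides do not specify them.

```python
import sqlite3

# In-memory database holding a hypothetical customer table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE customers (
           name TEXT, age INTEGER, married INTEGER,
           household_incomes INTEGER, net_worth REAL)"""
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        ("Asha", 29, 1, 2, 450000.0),
        ("Ben", 31, 1, 1, 520000.0),
        ("Carla", 67, 1, 2, 610000.0),
        ("Dev", 27, 0, 1, 90000.0),
        ("Elena", 33, 1, 2, 380000.0),
    ],
)

# Assumed thresholds that encode the managers' hypothesis.
MAX_AGE = 35
MIN_NET_WORTH = 300000.0

# The user's hypothesis, written as a query: young, married,
# two-income, high-net-worth customers.
rows = conn.execute(
    """SELECT name FROM customers
       WHERE age <= ? AND married = 1
         AND household_incomes = 2 AND net_worth >= ?
       ORDER BY name""",
    (MAX_AGE, MIN_NET_WORTH),
).fetchall()

targets = [name for (name,) in rows]
print(targets)
conn.close()
```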
- The shortcoming of this approach is that the retrieval process is guided entirely by the user.
- The selected information can be no better than the user's view of the data.

Discovery-driven data mining
- Another approach enables a data mining system to identify the best customers for the promotion.
- The system analyses the database and looks for groups with common characteristics.
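One way to picture this is a program that groups customers by every combination of characteristics it sees and reports the groups that look most promising, without being told in advance which group to look for. The sketch below is a toy version under loud assumptions: the customer records, the use of account balance as a measure of potential and the threshold are all hypothetical, and real mining systems rely on statistical or machine-learning methods rather than this simple grouping.

```python
from collections import defaultdict

# Hypothetical customer records: (age_band, marital_status, incomes, balance).
customers = [
    ("young", "married", 2, 42000),
    ("young", "married", 2, 38000),
    ("young", "single", 1, 6000),
    ("middle", "married", 1, 15000),
    ("retired", "married", 2, 51000),
    ("retired", "married", 2, 47000),
    ("retired", "single", 1, 9000),
]


def discover_segments(records, min_avg_balance):
    """Group records by shared characteristics and keep the high-value groups."""
    balances = defaultdict(list)
    for age_band, marital, incomes, balance in records:
        balances[(age_band, marital, incomes)].append(balance)

    found = []
    for segment, values in balances.items():
        average = sum(values) / len(values)
        if average >= min_avg_balance:  # assumed measure of business potential
            found.append((segment, average))

    # Most promising segment first.
    return sorted(found, key=lambda pair: pair[1], reverse=True)


segments = discover_segments(customers, min_avg_balance=30000)
for segment, average in segments:
    print(segment, average)
```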
- In the bank example, the mining system would target not only the young married group but also retired married couples with incomes, thereby recommending a promotional campaign for both groups.

Combined discovery and verification data mining
- This concept enables the user and the computer to work together to solve a problem.
- The user applies expertise in the problem domain and the computer performs the data analysis.
- This combination selects the appropriate data and puts it in the right form for decision making.

- The speed of data transmission over the public telephone system is slower than between two computers connected directly by a wire.
- Computers need extremely reliable connections, but the humans who use the telephone can understand communication even when there is static on the line.
- Protocols for the public telephone system were established to meet the minimum criteria of voice transmission.
- The quality of the telephone system is significantly below the needs of computer data transmission.

- Networks are differentiated by the size of the audience that is served.
- Technology plays a role because there are physical limits to the distance between computers, depending on the communications medium used.
- The distinction between different types of networks has blurred as communication technologies and the quality of data transmission improve.
- To be included on a network, each device (each computer, printer or similar device) must be attached to the communications medium.
- This is done using a network interface card.
- The network interface card (NIC) acts as an intermediary for the data moving to and from the computer or other device.
- The NIC is more than just a buffer that allows data storage.
- It deciphers information from the packets to determine whether the data is meant to be captured.
- It also decides whether the data should be allowed to pass down the communications medium.

Local area networks
- A LAN is a group of computers and other devices (such as printers) that are connected together by a common medium.
- LANs typically join computers that are physically close together, such as in the same room or building.
- Only a limited number of computers and other devices can be connected on a single LAN.
- The limitations vary based on the medium connecting the computers and devices as well as the LAN software being used.
- As a general rule, a LAN will cover a total distance of only half a mile.
- The distance between computers linked by the communication medium is typically at least 2 feet and not more than 60 feet.
- These distances are only guidelines, since the specifications imposed by the type of communication medium, the network interface card used and the LAN software dictate the actual distances.
- The current transmission speed of data along a LAN generally runs from 10 million bits per second (10 Mbps) to 100 Mbps.
- LANs use only private network media; they do not transfer data over the public telephone system.
- Only a single network protocol, such as Ethernet or token ring, can be used on a single LAN.

LAN topology and implementation
- LANs use three separate configurations for connecting the computers and other devices.
- The network configuration is called the topology, and three major topologies are used.
- The three are the ring, bus and star (hub) topologies, which are named after the shape of their arrangement in the network.
- The importance of stars and hubs to most professionals has less to do with the technology and more to do with the communication.
- Managers and professional staff became more dependent on computer resources.
- They realised that passing information from one person to another was difficult and time-consuming.

Advantages of LAN
- LANs allowed work groups to share computer-based data and to use computer resources (such as a laser printer) located not on the worker's desk but on the network.
- It became possible to send electronic messages to co-workers.
- The ability to share costly hardware such as a laser printer proved to be a cost-saving strategy.
- Sharing electronic messages allowed individual users to act as a group.
- The benefits of group decisions became apparent to firms.
- They started to take advantage of other network technologies to link local groups to other local groups and then to the entire company.

Wireless LAN
- A wireless LAN is an extension of an ordinary LAN that features a wireless interface, permitting the inclusion of small portable terminals.
- The wireless LAN can be connected to a fixed LAN in which the users' position is fixed.
- A WLAN consists of services provided by vendors who offer nationwide email service and access to fixed hosts on a fee basis.

Internet
- An internet is a collection of networks that can be joined together.
- If you have a LAN in one office and a LAN in a different office, you can join them to create an internet.
- Using roads as the medium, you can travel two blocks to meet a friend, which is a small-scale example of an internet; with an interconnected set of roads and plane routes, however, you can travel virtually anywhere in the world.
- The Internet is public: anyone who has a computer and access to the communication medium can travel the Internet.
- An organisation seeking new customers can reach a wide range of them.
- However, a person using the Internet may retrieve data that the company wished to keep private.
- Organisations can limit access to their networks to members of their own organisation by using an intranet.
- An intranet uses the same network protocols as the Internet but limits accessibility of computer resources to a select group.
- A LAN has no physical connection to another network; an intranet has a connection to another network but protects it with software, hardware or a combination of both, called a firewall.
- A firewall prevents communication from devices other than those authorised to use the intranet.
- Some authorised users may be outside the boundaries of the organisation.
- A supplier might need access to the computer-based records of inventory levels.
- When an intranet is expanded to include users beyond the organisation, it is called an extranet.
- Only trusted customers and business partners are afforded extranet access, and firewalls keep out unauthorised users.

World Wide Web
- The World Wide Web is an information space on the Internet where documents are stored and retrieved by means of a unique addressing scheme.
- Rather than handling only textual material, it can also store and retrieve hypermedia: multimedia consisting of text, graphics, audio and video.
- The World Wide Web is also called the Web, WWW and W3.
- The Internet provides the network architecture and the Web provides the method for storing and retrieving its documents.
- The Internet is the global communication network that connects millions of computers.
- The WWW is the collection of computers acting as Internet servers that host documents formatted to allow viewing of text, graphics and audio as well as links to other documents on the Web.

WWW terminologies:
- Website: a computer linked to the Internet containing hypermedia that can be accessed from any other computer in the network by means of hypertext links.
- Hypertext link: a pointer consisting of text or a graphic that is used to access hypertext stored at any website; the text is underlined and displayed in blue.
- Web page: a hypermedia file stored at a website, which is identified by a unique address.
- Home page: the first page of a website; other pages at the site can be reached from the home page.
- URL (Uniform Resource Locator): the address of a web page.
- A protocol is a set of standards that govern the communication of data.
- HTTP is the protocol for hypertext; the letters stand for Hypertext Transfer Protocol.
- The protocol name is followed by a colon and two slashes.
- A domain name is the address of the website where the web page is stored.
- The last three letters of the domain name indicate the type of website: edu (education), com (commercial), org (non-profit organisation) and gov (government).
- The domain name is followed by a single slash.
- The path can identify a certain directory or subdirectory and file at the website.
- html or htm is the suffix for the program code that designates hypertext files.

Browser
- A browser is a software system that enables you to retrieve hypermedia by typing in search parameters or clicking on a graphic.
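- This capability relieves you of having to know the URL of the web page that contains the information needed.
- A browser is sometimes loosely called a search engine, but the two differ: a browser retrieves and displays pages, while a search engine finds pages that match search parameters.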
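The parts of a URL described earlier (protocol, domain name, path and file suffix) can be picked apart with Python's standard urllib.parse module. This is a small sketch only; the address used is a made-up example, not one taken from the slides.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical address, invented for illustration.
url = "http://www.example.edu/courses/mis/index.html"
parts = urlparse(url)

protocol = parts.scheme                  # the part before "://"
domain = parts.netloc                    # address of the website
site_type = domain.rsplit(".", 1)[-1]    # last label of the domain name
path = PurePosixPath(parts.path)         # everything after the domain's slash
directory = str(path.parent)             # directory/subdirectory
file_name = path.name                    # file at the website
suffix = path.suffix.lstrip(".")         # html or htm for hypertext files

print(protocol, domain, site_type, directory, file_name, suffix)
```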

File Transfer Protocol
- File Transfer Protocol (FTP) refers to software that enables you to copy files onto your computer from a website.
- To do this, the URL of the website must be known.
- Many FTP sites offer transfer of data in one direction only.
- Firms have used Internet sites off their premises to provide files to users, such as product information, news releases, etc.