You are on page 1of 3

1

Propositionalization

Propositionalization is an approach which is proposed for multi-relational data mining which can be defined as a transformation of relational learning problems into attribute-value representation suitable for conventional data mining systems. Attributes are called as features and they form the basis for columns in single table representations of data. During propositionalization, features are constructed from relational background knowledge and structural properties of a database, results can be served as inputs for many processes of data mining. Propositionalizations can be either complete or partial. In complete propositionalization, no information is lost in the process whereas in partial, information is lost and the representation change is incomplete: the goal is to automatically generate a small relevant set of structural features. Further, general-purpose approaches to propositionalization can be distinguished from special-purpose approaches that could be domain-dependent or applicable to a limited problem class only. Relational data, especially relational databases, are a widespread means for managing data in many institutions. Data preparation for relational data mining is costly, since it has to be accomplished by experts in the domain to be analyzed and in KDD. Eective, ecient, and easy to use tools such as those for propositionalization could help signicantly. Data is very important in any organization and propositionalization is the efficient way of organizing relational data into a structured tabular representation which can further be stored in database. This database can be further used to reporting and analysis. This process is known as Data Warehousing and the database is known as Data Warehouse. The directive of data warehouse is over data storage; in this process data is cleaned, refined, transformed, organized and made available for use by higher management in the organization for data mining processes, online analytical processing, market research and decision support.

1.1

How Propositionalization related to Data Warehousing & OLAP?

As we discussed in the section above, Data warehouse is the database which stores and process data in a structural manner so that it can be used by managements in organization for research and decision support. For this

data to be organized in a structural manner, we need to follow certain approach which should be effective and efficient. Propositionalization is the approach which is used for relational data representation into a single structured tabular representation. By effective, we meant the quality of data and by efficiency we meant, time taken to give the required results. A data warehouse is a subject-oriented, integrated, time varying, non-volatile collection of data that is used primarily in organizational decision making. In an organization, the data warehouse is maintained separately operational databases. The data warehouse supports on-line analytical processing (OLAP), which is quite different from the functional and performance requirements of the on-line transaction processing (OLTP) applications which are traditionally supported by the operational databases. OLTP applications typically execute clerical data processing tasks or daily transactions such as order entry and banking transactions within an organization. These tasks are structured, repetitive, and short isolated transactions. Database consistency and recoverability are critical factors, and maximizing transaction efficiency is the key performance metric. So, the database is designed to reflect the operational capabilities of known applications, and, in particular, to reduce concurrency conflicts. Data warehouses, in contrast, targets for the area of decision support. Historical data, consolidated and summarized is more important than detailed, individual records. Since data warehouses contain consolidated data, from several operational databases, over long periods of time, has much more significant importance than operational databases; enterprise data warehouses are projected to be hundreds of gigabytes to terabytes in size. Query performance and lower response times are more important than transaction performance. Most databases implement the relational model for data storage, to use this data in propositional way, a data transformation process or a propositionalization task has to take place. Propositionalization as the process of transforming multi-relational dataset, containing structured data into a propositioned dataset with derived attribute-value features describing the structural propertied of data. The process can be defined as summarising data stored in multiple tables in a single table. The ultimate aim of this process, is to pre-process multi-relational data for subsequent analysis.

You might also like