
LANDMARK TECHNICAL PAPER

Users are Increasing Their Demand for Metadata. Can the Industry Meet Their Expectations?

Presenter: Janet Hicks, Senior Manager, Strategy and Business Management, Information Management
Presented at Petroleum Network Education Conference (PNEC) 2009

What is Metadata?
One of the most common applications of metadata is to provide additional contextual information to unstructured
documents or files. A library card catalog is a classic example: the catalog entry provides the metadata for the book
or item it references. Digital document management systems such as Documentum and FileNet are focused on
managing unstructured information along with this type of metadata. In both cases, metadata is used to speed up and
enrich searches of unstructured data, which is otherwise more difficult to search than data stored in a database.
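
To make the data/metadata distinction concrete, here is a minimal sketch of a catalog-style record in Python; the fields and values are illustrative, not drawn from any particular catalog or document management system.

```python
# A card-catalog entry: metadata describing a book, not the book itself.
# Field names and values are invented for illustration.
catalog_entry = {
    "title": "Principles of Sedimentology",
    "author": "Smith, J.",
    "subject": ["Geology", "Sedimentology"],
    "publisher": "Example Press",
    "year": 1998,
    "call_number": "QE471 .S55",
}

# Searching the metadata is fast; scanning the full text of every book is not.
def matches(entry, term):
    term = term.lower()
    return any(term in str(value).lower() for value in entry.values())

print(matches(catalog_entry, "sedimentology"))  # True
```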

What role can metadata play for a structured data solution?


In the case of document indexing, the distinction between data and metadata is clear. For structured database data, the
distinction between data and metadata is more complex. Many of the items in typical Exploration and Production (E&P)
data models found in OpenWorks, Finder, and other data repositories could be classified as metadata. True measured
data is largely well logs and seismic data. A majority of related information in the data model adds context to this
information. Most of what we capture about interpretation data beyond geometry, topology, and property data could
be considered metadata. Historically, the industry has captured common types of metadata for use by applications,
including the creator of the data, date stamp, type of data, source, etc.
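
As an illustration, the common metadata fields listed above could be modeled as in the sketch below; the class and field names are assumptions made for the example, not the OpenWorks or Finder schemas.

```python
from dataclasses import dataclass
from datetime import datetime

# A sketch of the common metadata the paper lists for an E&P data object:
# creator, date stamp, type of data, and source.
@dataclass
class DataObjectMetadata:
    creator: str       # who created the data
    created: datetime  # date stamp
    data_type: str     # e.g. "well log", "seismic volume", "horizon"
    source: str        # originating application or repository

log_meta = DataObjectMetadata(
    creator="jhicks",
    created=datetime(2009, 5, 1),
    data_type="well log",
    source="OpenWorks",
)
```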

Are we capturing all the metadata that users need to efficiently and effectively do their work?
Should we be extending the way data managers think of metadata to incorporate more of what end users actually
need? And finally, are the tools in place to carry metadata effectively throughout a user's workflow?

Metadata Standards
Several organizations have made significant progress on defining metadata standards. A few key initiatives are
discussed below.

Dublin Core Metadata Initiative


The Dublin Core Metadata Initiative (DCMI) provides an open forum for developing online metadata standards that
support a broad range of purposes. Originating in Dublin, Ohio, as part of a workshop sponsored by the Online
Computer Library Center (OCLC), the initiative provides a basic but extensible metadata element set. DCMI continues
the initiative through working groups, global conferences and workshops, standards liaisons, and educational efforts
to promote widespread acceptance of metadata standards and practices. The stated vision of DCMI is to provide
simple standards that facilitate the finding, sharing, and management of information. While DCMI provides a good
framework for classifying and handling metadata, further work needs to be done to make it applicable to our vertical
industry.
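
For reference, the element names in the sketch below are the 15 elements of the Dublin Core Metadata Element Set; the values show how a record for an E&P document might look and are purely illustrative.

```python
# The 15 Dublin Core elements applied to a hypothetical E&P document.
# Element names are from DCMI; all values are invented for illustration.
dublin_core_record = {
    "title": "North Basin Prospect Evaluation Report",
    "creator": "Exploration Team A",
    "subject": "prospect evaluation",
    "description": "Summary of seismic interpretation results",
    "publisher": "Example Oil Co.",
    "contributor": "Data Management Group",
    "date": "2009-03-15",
    "type": "Text",
    "format": "application/pdf",
    "identifier": "doc-001234",
    "source": "corporate document store",
    "language": "en",
    "relation": "well-A-12",      # links the report to a data object
    "coverage": "North Basin",    # geographic extent
    "rights": "internal use only",
}
```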

International Organization for Standardization


The International Organization for Standardization (ISO) has addressed metadata in a number of its standards,
including schemas for describing geographic information (ISO 19115), XML metadata exchange formats, and
specifications for the structure of a metadata registry with the basic attributes required to describe metadata items
(ISO/IEC 11179). Of these efforts, the standard defining the schema for describing geographic information and
services appears to have the widest interest in the E&P sector.

US Geological Survey
The US Geological Survey (USGS) hosts three initiatives focused on geospatial metadata standards: the Federal
Geographic Data Committee (FGDC), the Geospatial One-Stop (GOS), and the National Map. The FGDC focuses on
defining standards and policies for geospatial metadata. Under US Executive Order 12906, all Federal agencies
must use this standard to document geospatial data created as of January 1995. The standard has also been adopted
beyond the federal level by state and local governments. It is supported in the ESRI tools commonly used in E&P.

Metadata in E&P
Metadata is information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage
information. It adds understanding and context to data. While interest in preserving and standardizing metadata has
traditionally been a concern of data managers, there is a growing awareness of metadata and its importance to end
users as well.

So does this mean that users are now asking their companies to invest in better metadata management? What users
need is a set of best practices associated with good data management.

Some of the common problems that end users face are:

• easily finding the data they need,
• understanding the data provenance, and
• capturing data lineage and the processes that were used to produce the data.

We know that users spend too much time locating relevant data. Studies over the last 20 years have shown that
locating data can consume up to 80 percent of a user's time. Even though companies continuously work to improve
users' ability to find the data they need, data types and volumes continue to grow exponentially, along with the
applications where the data originates and the number of databases where the data is stored. Metadata that
links data types together based on relationships appropriate to E&P is needed to allow better searching and retrieval
of complete data records. For instance, if a user locates a well that they need for their interpretation, they should be
able to easily find all the related data, like well logs, well tests, picks, etc. In addition, metadata that identifies data
by coverage or geography, such as country, basin, or region, would provide users with additional ways of searching
and locating data. This type of data relation is inherent in relational databases, but less prevalent in unstructured data
like documents. In addition, this data is often spread among multiple databases, making it difficult to connect the data
together. Common viewers that can logically aggregate data for users have partially solved this problem for many
companies. However, due to the number of types and locations of data, end users find it difficult to know if the data
they find is the most complete or current available.
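
A minimal sketch of what such relationship and coverage metadata could look like, assuming a simple in-memory index; the identifiers, stores, and field names are invented for illustration.

```python
# An index that links data objects by E&P relationships and tags them with
# geographic coverage, so a complete record can be assembled across stores.
index = {
    "well-A-12": {
        "data_type": "well",
        "coverage": {"country": "Norway", "basin": "North Basin"},
        "related": ["log-557", "test-0092", "picks-A-12"],
    },
    "log-557": {"data_type": "well log", "store": "OpenWorks project P1"},
    "test-0092": {"data_type": "well test", "store": "corporate database"},
    "picks-A-12": {"data_type": "picks", "store": "OpenWorks project P2"},
}

def related_data(object_id):
    """Return every record linked to one object (logs, tests, picks, ...)."""
    return [index[r] for r in index[object_id].get("related", [])]

def find_by_basin(basin):
    """Locate data by geographic coverage rather than by name."""
    return [oid for oid, rec in index.items()
            if rec.get("coverage", {}).get("basin") == basin]

print(related_data("well-A-12"))
print(find_by_basin("North Basin"))  # ['well-A-12']
```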

A major concern that users have is understanding data provenance. Users are hesitant to reuse interpretations created
by others because they don't know the origin of the objects or the relationships between them (for example, horizons
derived from other horizons, fault polygons made from a horizon, etc.). Consequently, they have to retrieve data in a piecemeal
fashion, and are not confident they have found all the relevant and most current data. This search is made more difficult
in environments that have multiple application and project databases, but lack corporate databases that distill the
best data the company has. Even in the case where a consolidated corporate database is present with quality checked
data available to the user, they may decide not to use the data without essential information related to objects, such
as the interpreter who created it, when it was created, and the status of the object (initial vs. final). Often the stated
ownership of the data goes beyond an individual interpreter, and may be additionally characterized by the role (data
loader, data QC) and location of the creating group.
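
A provenance record along these lines might be sketched as follows; the field names and values are illustrative assumptions, not an existing schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Provenance metadata as described above: origin, ownership (including role
# and group), status, and derivation links to other objects.
@dataclass
class Provenance:
    object_id: str
    created_by: str          # interpreter who created the object
    role: str                # e.g. "interpreter", "data loader", "data QC"
    group: str               # location of the creating group
    created: datetime
    status: str              # "initial" or "final"
    derived_from: List[str]  # e.g. fault polygons derived from a horizon

fault_polygons = Provenance(
    object_id="faultpoly-17",
    created_by="asmith",
    role="interpreter",
    group="Stavanger asset team",
    created=datetime(2009, 2, 10),
    status="final",
    derived_from=["horizon-top-brent"],
)
```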

Another problem that users have is tracking how data evolves as it is created in their technical workflows. This
involves documenting the input and output of processes, as well as the process parameters that were used to generate
the resultant data. For instance, if a user created an amplitude extraction from a seismic volume and an interpreted
horizon, information about each of the input files, as well as the parameters used to generate the amplitude horizon,
should be stored with the resultant data. If at a later date a reprocessed seismic volume was produced, the user
might wish to re-run the initial process using the new data. While the rerun might happen in a workflow manager
application, it is not possible without the metadata about the original process, and the work would have to
be regenerated from scratch. An example of a data model that incorporates this type of metadata is the R5000 version
of the OpenWorks model, which includes some significant metadata areas, including seismic processing history and
mapping data parameters. Another example of metadata derived during the analysis and interpretation process is the
capture of the geological environment that relates to the data. Metadata of this type would include corporate standard
geological categories, such as source rock, depositional environment, etc.
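
The amplitude-extraction example could be captured in a process record like the sketch below; the structure, function, and names are assumptions for illustration, not the R5000 model.

```python
# Lineage metadata for one processing step: inputs, parameters, and output
# are recorded so the step can be re-run when a reprocessed volume arrives.
process_record = {
    "process": "amplitude_extraction",
    "inputs": ["seismic-vol-2008", "horizon-top-brent"],
    "parameters": {"window_ms": 20, "attribute": "rms_amplitude"},
    "output": "amp-horizon-001",
}

def rerun(record, replacements):
    """Re-run a recorded process with one or more inputs swapped out,
    reusing the stored parameters."""
    new_inputs = [replacements.get(i, i) for i in record["inputs"]]
    return {**record, "inputs": new_inputs, "output": record["output"] + "-v2"}

# Substitute the newly reprocessed volume; everything else is reused.
rerun(process_record, {"seismic-vol-2008": "seismic-vol-2008-reproc"})
```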

At decision gates and milestones, it is also important to capture the results of decisions made. Documenting additional
information about the decision that may not be inherent in the data itself is important to ensure the data is reused in
future work. Capturing a milestone may entail renaming data to align with corporate standards, and will likely also
include associating final reports and documents to the data being captured. An advantage of capturing the metadata at
a decision gate or milestone is having a full record of decisions for regulatory compliance.

Metadata that is captured automatically by applications is preferable to relying on users to populate the fields
manually. While manual population cannot be avoided in some cases, automatic capture not only ensures that the
metadata is recorded, it also allows for corporate standards and business rules to be applied.
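
A sketch of automatic capture might look like this, assuming an invented corporate naming rule as the business rule; nothing here reflects a particular application's API.

```python
from datetime import datetime, timezone

# Automatic metadata capture: the application stamps the record at save time
# and applies a corporate standard, instead of relying on manual entry.
def save_with_metadata(name, data, user, data_type):
    # Business rule (illustrative): horizon names must carry a "HZ_" prefix.
    if data_type == "horizon" and not name.startswith("HZ_"):
        name = "HZ_" + name
    return {
        "name": name,
        "data": data,
        "creator": user,  # captured from the session, not typed by the user
        "created": datetime.now(timezone.utc).isoformat(),
        "data_type": data_type,
    }

save_with_metadata("top_brent", b"...", user="jhicks", data_type="horizon")
```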

Metadata in Multivendor Environments


Capturing the metadata that users require is relatively easy to do within the context of a single vendor's applications.
Since most of a vendor's applications share common databases, storing metadata captured during a workflow is fairly
straightforward. The real challenge comes with sharing metadata of this nature across several vendors' applications.
The organizations that should be interested in sharing data include those that transmit data using structures such
as XML or Web services, as well as those that would like to break down silos of information captured within
applications or proprietary file formats. Inconsistency in the way different applications use and store metadata makes
mapping this data between vendors daunting. If this problem is to be solved, what must we do as an industry?

There are some focused standardization efforts for the petroleum industry. Many of these initiatives come from
government agencies that dictate the way data is delivered to them. Groups like the USGS, the British Geological
Survey, the Australian government, and others require certain metadata types and formats for the data they collect. In
addition, we are seeing public consortiums like the Professional Petroleum Data Management Association (PPDM)
and Energistics tackling metadata through their work with their members. The PPDM Association, for example, is in
the process of creating best-practice recommendations on metadata standards and how they should be used in
conjunction with a PPDM database.

A potential source of metadata that can be used cross-vendor is the expansion of metadata coverage in common
data formats like SEG-Y, WITSML, DLIS, LAS, etc. Since these formats are commonly understood by multiple
vendors' applications, carrying more metadata in them would be a viable way of passing metadata between
applications. Little metadata is carried in these formats beyond the information required to load and use the data.
Typical data included in their headers today describes the acquisition parameters, when the data was acquired, units
of measure, etc. While applications should make every effort to capture the existing metadata during the load process,
opportunities for expanding the metadata coverage should be examined.
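
As an illustration of the header metadata these formats already carry, here is a simplified reader for the ~Well Information section of an LAS file. Real LAS parsing has more edge cases (wrapped lines, colons inside values), and a dedicated library such as lasio would be a better choice in practice.

```python
def read_las_well_section(path):
    """Collect mnemonic -> value pairs from the ~Well Information section
    of an LAS file. Simplified sketch: ignores units and wrapped lines."""
    metadata, in_well = {}, False
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if line.startswith("~"):                 # section marker
                in_well = line.upper().startswith("~W")
                continue
            if not in_well or not line or line.startswith("#"):
                continue
            head, _, _desc = line.partition(":")     # description follows the colon
            mnem, _, rest = head.partition(".")      # mnemonic precedes the dot
            # A unit, if present, is attached directly to the dot; if the dot
            # is followed by whitespace, the unit is empty and the remainder
            # is the value.
            if rest[:1].strip():
                _unit, _, value = rest.partition(" ")
            else:
                value = rest
            metadata[mnem.strip()] = value.strip()
    return metadata

# Example (file and values hypothetical): read_las_well_section("a12.las")
# might return {"WELL": "A-12", "FLD": "NORTH BASIN", "DATE": "1998-07-01"}.
```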

Another possible solution is the development of an industry-wide metadata registry. A metadata registry is a central
location where metadata definitions are stored and maintained. Metadata registries are used whenever data must
be used consistently within an organization or group of organizations. Most of the metadata standards organizations
mentioned at the beginning of this paper have developed metadata registries, along with their corresponding terms
and vocabularies.
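
A metadata registry can be sketched minimally as a catalog of element definitions with controlled vocabularies that applications validate against; the elements and terms below are invented for illustration.

```python
# A minimal metadata registry: central definitions and controlled
# vocabularies that keep terms consistent across organizations.
registry = {
    "coverage.basin": {
        "definition": "Sedimentary basin the data pertains to",
        "vocabulary": {"North Basin", "Viking Graben", "Permian Basin"},
    },
    "status": {
        "definition": "Lifecycle state of an interpreted object",
        "vocabulary": {"initial", "final"},
    },
}

def validate(element, value):
    """Reject values that are not registered terms for the element."""
    entry = registry.get(element)
    if entry is None:
        raise KeyError(f"unregistered metadata element: {element}")
    if value not in entry["vocabulary"]:
        raise ValueError(f"{value!r} is not a registered term for {element}")
    return True

validate("status", "final")  # True
```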

Conclusion
Users continue to struggle to find and effectively leverage the right data for their workflows. The result is more time
spent locating all the data they need, duplication of work that has already been done due to a lack of confidence in the
data they find, and the loss of resulting data that is not effectively captured at the end of an analysis. Metadata that
documents data provenance, as well as captures data lineage and the processes used to produce the data, is
becoming more important. Metadata can be captured and effectively managed within a single vendor's applications.
The real challenge is managing metadata consistently in a workflow that uses proprietary applications and those from
multiple vendors. In order to meet this challenge, the E&P industry must develop and publish standards that can be
leveraged by petroleum and service companies.
www.landmarksoftware.com

© 2013 Halliburton. All rights reserved. Sales of Halliburton products and services will be in accord solely with the terms and conditions contained in the
contract between Halliburton and the customer that is applicable to the sale.
H010326 2013
