
Data Quality Environment

Data Discovery Installation Guide

AB INITIO SOFTWARE LLC 201 Spring St. Lexington MA 02421 Voice +1 781.301.2000 support@abinitio.com
Wells Fargo Enterprise Data Analytics : S/N: 41774
NOTICE
This document contains confidential and proprietary information of Ab Initio. Use and disclosure are
restricted by license and/or non-disclosure agreements. You may not access, read, and/or copy this
document unless you (directly or through your employer) are obligated to Ab Initio to maintain its
confidentiality and to use it only as authorized by Ab Initio. You may not copy the printed version of
this document, or transmit this document to any recipient unless the recipient is obligated to Ab Initio
to maintain its confidentiality and to use it only as authorized by Ab Initio.



Data Quality Environment
VERSION 3.3.4

Data Discovery Installation Guide

February 2018 Part Number AB4573


Intellectual Property Rights & Warranty Disclaimer
COPYRIGHTS
Copyright © 2015-2018 Ab Initio. All Rights Reserved.

Reproduction, adaptation, or translation without prior written permission is prohibited, except as allowed under copyright law or license from Ab Initio.

CONFIDENTIAL & PROPRIETARY

All provided documentation is confidential and a trade secret of Ab Initio. This documentation is furnished under a license and may be used only in accordance with the terms of
that license and with the inclusion of the copyright notice set forth below.

TRADEMARKS

The following are worldwide trademarks or service marks of or licensed to Ab Initio (those marked ® are registered in the U.S. Trademark Office, and may be registered in other
countries):
® ® ®
> Cooperating I>O
® ®
Ab Initio Data>Profiler Init.com
® ® ®
Ab Initio I>O Director INIT
® ® ®
Abinitio.com Dynamic Data Mart Meta Operating System
® ® ®
BRE E2E Meta>Operating System
® ® ®
Co>Operating Enterprise EME Meta OS
® ® ®
Co>Operating System EME Desktop Portal Meta>OS
® ®
Co>Operating EME Management Console Metadata Portal
® ® ®
Co>Operation EME Portal Plan>It
® ®
Co>Operative Engine by Ab Initio Query>It
® ® ®
Co>OpSys Enterprise Meta>Environment Re>Posit
® ® ®
Co>Ordinate Enterprise Metadata Environment Re>Source
® ® ®
Co>Ordinator Enterprise MetaEnvironment Server++
® ® ®
Conduct>It Express>It Server+Server
® ® ®
Continuous Flows GDE Shop for Data
® ®
Continuous>Flows Graphical Development Environment The Company Operating System
® ®
Cooperating Enterprise Graph It
® ®
Cooperating System Graph>It

Certain product, service, or company designations for companies other than Ab Initio are mentioned in this documentation for identification purposes only. Such designations are
often claimed as trademarks or service marks. In instances where Ab Initio is aware of a claim, the designation appears in initial capital or all capital letters. However, readers
should contact the appropriate companies for more complete information regarding such designations and their registration status.

RESTRICTED RIGHTS LEGEND

If any Ab Initio software or documentation is acquired by or on behalf of the United States of America, its agencies and/or instrumentalities (the “Government”), the Government
agrees that such software or documentation is provided with Restricted Rights, and is “commercial computer software” or “commercial computer software documentation.” Use,
duplication, or disclosure by the Government is subject to restrictions as set forth in the Rights in Technical Data and Computer Software provisions at DFARS 252.227-7013(c)(1)(ii)
or the Commercial Computer Software – Restricted Rights provisions at 48 CFR 52.227-19, as applicable. Manufacturer is Ab Initio Software LLC, 201 Spring Street, Lexington, MA
02421.

WARRANTY DISCLAIMER

The information in this documentation is subject to change without notice. Ab Initio makes no warranty of any kind with regard to this material, including, but not limited to, the
implied warranties of merchantability and fitness for a particular purpose. Ab Initio shall not be liable for errors contained herein or for incidental or consequential damage in
connection with the furnishing, performance, or use of this material.



Contents
1. Installing Data Discovery
   Ab Initio software requirements
   Installation prerequisites
   Installation overview
   Installing the Data Discovery software
   Upgrading from a previous version

2. Post-installation tasks
   Managing Data Discovery project specifications
   Setting up a Data Discovery private project
   Customizing the Metadata Hub for use with Data Discovery

A. Parameter reference
   Required parameters
   Parameters required for use with the Metadata Hub
   Optional parameters

AB INITIO CONFIDENTIAL AND PROPRIETARY — DO NOT COPY 5



About this book
This book explains how to install Ab Initio’s Data Discovery.

Audience
This document is intended for technical staff who install and administer Data Discovery applications in
Express>It.

Documentation conventions
Unless otherwise noted, this documentation uses the conventions described below.

Typographic conventions in code examples and DML function syntax

Following are the typographic conventions for code examples and DML function syntax:

Convention: Bold text, symbols, and punctuation
Meaning:    Literal text that must be entered exactly as shown.
Examples:   • CLI command: m_env -version
            • DML function syntax (literal text shaded)

Convention: Italic text
Meaning:    Arguments or variables that must be replaced with valid values or expressions.
Examples:   • CLI command: ab-key add pathname
            • DML function syntax (replaceable text shaded)

Convention: Non-bold, non-italic text (in DML function syntax only)
Meaning:    The data type of the return value of a function, and the data types of function
            arguments.
Examples:   • DML function syntax (data types shaded)

Symbol conventions in syntax descriptions

Following are the conventions for non-bold symbols used in syntax descriptions, such as descriptions of
commands and functions:



Convention   Meaning

=            (DML function syntax only) An equal sign indicates, and is followed by, an argument’s
             default value.
⇒            An arrow indicates the result of a computation.
...          An ellipsis indicates that the preceding item can be repeated one or more times.
{ }          Curly braces group the enclosed items.
[ ]          Square brackets group the enclosed items and indicate that the group is optional.
|            A vertical bar separates alternatives.

These conventions are illustrated in the following examples:

Example        Meaning

               The default value of the method argument is 0.
2 + 2 ⇒ 4      2 plus 2 equals 4
a b ...        a followed by at least one b
{ a b } ...    One or more instances of a b
[ a b ] ...    Zero or more instances of a b
a[,b[,c]]      a or a, b or a, b, c
a|bc|de        a or b c or d e
[a|b]          a or b or nothing
{a|bc}d        a d or b c d
[a|bc]d        a d or b c d or d

DML core function syntax conventions

The following annotated example illustrates the typographic and symbol conventions used in the syntax
description of a DML core function (the function signature):

1. The return type of the function

2. The function name (literal text)

3. An opening parenthesis (literal text)



4. An opening curly brace (the start of a group)

5. The argument’s data type

6. The name of the argument (text that must be replaced by a valid value or expression)

7. A vertical bar indicating alternative arguments

8. Between arguments, a comma separator (literal text), followed by a space

9. An equal sign followed by the argument’s default value

10. A closing parenthesis (literal text)

Byte conventions

All -byte terms refer to powers of 2 rather than powers of 10.

Term        Abbreviation   Number of bytes

kilobyte    kB             1,024 (2 to the 10th power)
megabyte    MB             1,048,576 (2 to the 20th power)
gigabyte    GB             1,073,741,824 (2 to the 30th power)
terabyte    TB             1,099,511,627,776 (2 to the 40th power)
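
The byte counts in the table can be checked with shell arithmetic. The following is a minimal sketch and not part of the product: the function name bytes_per_unit is hypothetical, and the sketch assumes a shell whose arithmetic expansion uses at least 64-bit integers.

```shell
# bytes_per_unit UNIT: print the number of bytes in one kB, MB, GB, or TB,
# using the power-of-2 definitions from the table above.
# Assumes the shell evaluates $(( )) with at least 64-bit integers.
bytes_per_unit() {
    case "$1" in
        kB) echo $((1 << 10)) ;;
        MB) echo $((1 << 20)) ;;
        GB) echo $((1 << 30)) ;;
        TB) echo $((1 << 40)) ;;
        *)  echo "unknown unit: $1" >&2
            return 1 ;;
    esac
}
```

For example, bytes_per_unit MB prints 1048576.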

Conventions for graphical representations of data

Unless otherwise specified, numeric byte values are in base 10 (decimal).

The following is an example of the graphical representation of data:

’J’ ’o’ ’h’ ’n’ ’4’ ’2’ ’ ’ ’ ’ ’ ’

The example represents a block of nine bytes. The bytes contain, in order, the native codes for the characters
J, o, h, n, 4, 2, and three spaces. Note the following:
• Each cell represents a single byte of data.

• Successive bytes run left to right.
  The “first” byte in memory (the one with the lowest address) is leftmost.

• Characters are shown in single quotes.
  Unless otherwise specified, characters are in the native character set of the computer running the Ab
  Initio software.

Getting assistance
Product documentation is available in online help and, for most books, as PDFs. You can also find the
documentation, reusable solutions, and user discussions through the Ab Initio online discussion browser.
(To install the browser, see Ab Initio Help.) To report documentation issues, please send email to
documentation@abinitio.com.



To contact Ab Initio Support, send email to support@abinitio.com or call +1 781-301-2100.

When reporting a problem, include the following:


• The Co>Operating System version and, if applicable, other Ab Initio product software versions

• The platform (operating system and version) your Co>Operating System is running on

• The complete error message (if any)

• A description of what you were doing when the error message (if any) appeared

• For database issues:
    • The type and version of the database (for example, DB2 EEE version 7.2)
    • The platform the database server is on
    • The JDBC driver version (if applicable)



1. Installing Data Discovery
This chapter describes how to install Ab Initio’s Data Discovery. It covers the following topics:

• Ab Initio software requirements
• Installation prerequisites
• Installation overview
• Installing the Data Discovery software
• Upgrading from a previous version



Ab Initio software requirements
Data Discovery requires the following Ab Initio software:
• Co>Operating System Version 3.2 or later, with a software activation key that enables data profiling and
  Conduct>It.
  NOTE: Hadoop and Hive support requires Co>Operating System Version 3.3.2.5 or later.

  For information on installing the Co>Operating System, see the Server Software Installation and
  Administration Guides.

• Express>It Version 3.2.2 or later.

  For information on installing Express>It, see the Express>It Installation and Administration Guide. If you
  plan to use the Metadata Hub with Data Discovery, install Express>It with the Metadata Hub integrated.

• (Optional) Metadata Hub Version 3.2.2 or later.

  For information on installing the Metadata Hub, see the Metadata Hub Installation Guide.



Installation prerequisites
Following are the prerequisites for installing Data Discovery:
• Verify that Express>It is installed, running, and operating as expected.

• Ensure that the Ab Initio Environment has been installed. For more information, see the following topics
  in the Ab Initio Environment Guide and Reference:
    • About the Ab Initio Environment
    • Installing and configuring the Ab Initio Environment

• If you plan to use Data Discovery with the Metadata Hub, ensure that the following configuration
  variables are set:
    • AB_MHUB_HOME: The directory in which the Metadata Hub administration and import tools are
      installed.
    • AB_MHUB_DEPLOYMENT_DIR: The Metadata Hub deployment directory.

• Ensure that the user who installs Data Discovery has write permission to the directory specified by
  the AB_APPCONF_ROOT_DIR configuration variable. Write permission is required so that the icons
  for the Data Discovery user interface can be written to the $AB_APPCONF_ROOT_DIR/images/datad directory.
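
The variable and permission checks above can be scripted before you start the installer. The following is a minimal sketch and not part of the product: the function name check_dd_prereqs is hypothetical, and the sketch assumes that the configuration variables are exported as environment variables in the installing user's shell. A value that is set only in a configuration file would not be visible to this check.

```shell
# check_dd_prereqs [USE_MHUB]
# Preflight check for the prerequisites listed above (hypothetical helper).
# Pass 1 as USE_MHUB if Data Discovery will be used with the Metadata Hub.
# Returns 0 if every check passes; otherwise reports each problem on stderr.
check_dd_prereqs() {
    use_mhub=${1:-0}
    status=0

    # The installing user needs write permission to AB_APPCONF_ROOT_DIR.
    if [ -z "${AB_APPCONF_ROOT_DIR:-}" ]; then
        echo "AB_APPCONF_ROOT_DIR is not set" >&2
        status=1
    elif [ ! -d "$AB_APPCONF_ROOT_DIR" ] || [ ! -w "$AB_APPCONF_ROOT_DIR" ]; then
        echo "no write permission to $AB_APPCONF_ROOT_DIR" >&2
        status=1
    fi

    # These two variables matter only when the Metadata Hub is used.
    if [ "$use_mhub" = 1 ]; then
        for var in AB_MHUB_HOME AB_MHUB_DEPLOYMENT_DIR; do
            eval "value=\${$var:-}"
            if [ -z "$value" ]; then
                echo "$var is not set" >&2
                status=1
            fi
        done
    fi

    return $status
}
```

Run check_dd_prereqs 1 if you plan to use the Metadata Hub, or check_dd_prereqs 0 otherwise, and correct any reported problem before you install.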



Installation overview
This section provides an overview of the tasks you must complete to install or upgrade Data Discovery.
1. Install the required Ab Initio software. See “Ab Initio software requirements”.

2. Set up your environment, as described in “Installation prerequisites”.

3. Do one of the following:

   • If this is a new installation, install Data Discovery as described in “Installing the Data Discovery
     software”.

   • If you are upgrading Data Discovery from a previous version, follow the instructions in “Upgrading
     from a previous version”.

4. Configure Data Discovery as described in “Post-installation tasks”.



Installing the Data Discovery software
This section explains how to install the Data Discovery software.

If you are upgrading to the current version of Data Discovery from a previous version, see “Upgrading from
a previous version”.

► To install Data Discovery:


1. Unpack the Data Discovery installation files.

2. In the directory where the installation files are located, run the dd_install.ksh script.
   NOTE: You must have Technical Repository administrator privileges to run this script.

   Follow the prompts to install the software.

When you have finished installing Data Discovery, continue with “Post-installation tasks”.



Upgrading from a previous version
This section explains how to upgrade to the current version of Data Discovery from a previous version.

► To perform an upgrade of Data Discovery:


1. Log in to the Co>Operating System host as the owner of the Ab Initio bridge.

2. Set the AB_AIR_ROOT, AB_APPCONF_ROOT_DIR, and AB_AIR_BRANCH configuration variables.

3. Log in to Express>It in order to refresh the contents of all private project sandboxes containing Data
Discovery configurations.

4. Back up the psets of all private projects containing Data Discovery configurations.

5. Check in all Data Discovery configurations in all projects to the technical repository.

6. Install the new version of the Data Discovery software by running the dd_install.ksh script in the
directory where Data Discovery was previously installed.

7. Refresh all Data Discovery configurations by running the ac-appconf refresh utility as follows:

       ac-appconf -i app-id refresh path-to-config-name.appconf [--import-changes]

   Where:
   • app-id is the application identifier that tells the utility where to perform the command.

   • path-to-config-name.appconf is the filesystem path of the existing .appconf file to be refreshed. To
     refresh multiple Data Discovery dataset configurations with a single command, supply the path of
     each, separated by spaces. Wildcards are also supported.

   For more information, see “ac-appconf refresh” in the Express>It Installation and Administration Guide.
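
When many configurations need refreshing, the refresh command in step 7 can be wrapped in a small helper that first shows the command it would run. The following is a minimal sketch and not part of the product: the function name refresh_dd_configs and the DRY_RUN switch are hypothetical. The sketch uses only the ac-appconf syntax shown above and leaves out the optional --import-changes flag.

```shell
# refresh_dd_configs APP_ID FILE.appconf [FILE.appconf ...]
# Build the ac-appconf refresh command for one or more configurations.
# With DRY_RUN=1 (the default in this sketch) the command is only printed;
# set DRY_RUN=0 to run it.
refresh_dd_configs() {
    if [ "$#" -lt 2 ]; then
        echo "usage: refresh_dd_configs app-id file.appconf [file.appconf ...]" >&2
        return 2
    fi
    app_id=$1
    shift
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Paths are passed through unchanged, separated by spaces.
        echo ac-appconf -i "$app_id" refresh "$@"
    else
        ac-appconf -i "$app_id" refresh "$@"
    fi
}
```

Review the printed command first, and then repeat the call with DRY_RUN=0 once the list of paths looks right.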



2. Post-installation tasks
This chapter describes post-installation tasks that you must perform to configure Data Discovery. It contains
the following topics:

• Managing Data Discovery project specifications
• Setting up a Data Discovery private project
• Customizing the Metadata Hub for use with Data Discovery



Managing Data Discovery project specifications
Express>It provides a project specification file in which you designate projects that will be checked out into
each user's private sandboxes from the technical repository. Specify the following built-in Data Discovery
projects in the Express>It .projects.xml file:
• The datad public project — Includes application templates, graphs, plans, and other artifacts that are
required for Data Discovery. This project must be included in every private project in which data profiling
will be run.

• The expressit_common public project — Included in the datad project. The expressit_common project
includes compound control templates, dynamic subgraphs, and other artifacts that are required for Data
Discovery source data.

• The data_discovery private project — Includes useful examples, such as examples demonstrating data
profiling and functional dependency calculation. This project is a good starting place for users who are
new to Data Discovery.

The datad and expressit_common public projects are typically checked out as global projects. For more
information, see “Checking out common projects to the global directory” in the Express>It Installation and
Administration Guide.

For more information about the .projects.xml file, see “Managing project specifications” in the Express>It
Installation and Administration Guide.

Once you have finished adding the Data Discovery projects to the project specification file, continue with
“Setting up a Data Discovery private project”.



Setting up a Data Discovery private project
This section describes how to create and configure a Data Discovery private project.

► To create and configure a Data Discovery private project:


1. Run the create-project command to create the private project:
create-project -rel-loc relative_pathname_in_repository -type private -checkin Y

For more information about this command, see “create-project” in the Co>Operating System Graph
Developer’s Guide.

2. Navigate to the project directory that you just created.

3. Lock the project's pset by running the air sandbox lock command:
air sandbox lock -parameters -set

For more information about this command, see “air sandbox lock” in the Technical Repository Command
Reference.

4. Include the global datad project in the private project by running the air sandbox parameter command:
air sandbox parameter -basedir . datad -common datad-project-sandbox-path

For more information about this command, see “air sandbox parameter” in the Technical Repository
Command Reference.

5. Create the required data directories for the new project by running the project-directories command:
project-directories -create

For more information about this command, see “project-directories” in the Co>Operating System Ab
Initio Environment Guide and Reference.

6. In the directory to which you extracted the installation package, run the dd_setup.ksh script.

This script prompts you to provide parameter values for the project you just created. For more
information about these parameters, see “Parameter reference”.

7. Check the private project back in to the technical repository.

8. Add the private project to the Express>It .projects.xml project specification file.

For more information about adding projects to this file, see “Managing project specifications” in the
Express>It Installation and Administration Guide.

If you will be using Data Discovery with the Metadata Hub, continue with “Customizing the Metadata Hub
for use with Data Discovery”.
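
Steps 1 through 5 of the procedure above can be strung together in a script. The following is a minimal sketch and not part of the product: the names run and setup_dd_private_project and the DRY_RUN switch are hypothetical. The project directory is passed as an argument because this guide does not state where create-project places the new project. The interactive dd_setup.ksh script, the check-in, and the .projects.xml change (steps 6 through 8) are left as manual steps.

```shell
# run COMMAND [ARG ...]: print the command when DRY_RUN=1 (the default in
# this sketch); otherwise execute it in the current shell.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# setup_dd_private_project REL_LOC PROJECT_DIR DATAD_SANDBOX
#   REL_LOC        relative pathname of the new project in the repository
#   PROJECT_DIR    directory of the newly created project (hypothetical
#                  argument; supply the location used at your site)
#   DATAD_SANDBOX  path of the global datad project sandbox
setup_dd_private_project() {
    rel_loc=$1
    project_dir=$2
    datad_sandbox=$3

    # Step 1: create the private project.
    run create-project -rel-loc "$rel_loc" -type private -checkin Y || return 1
    # Step 2: navigate to the project directory.
    run cd "$project_dir" || return 1
    # Step 3: lock the project's pset.
    run air sandbox lock -parameters -set || return 1
    # Step 4: include the global datad project.
    run air sandbox parameter -basedir . datad -common "$datad_sandbox" || return 1
    # Step 5: create the required data directories.
    run project-directories -create || return 1
}
```

With DRY_RUN left at its default, the function prints the five commands in order so that they can be reviewed before anything is changed.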



Customizing the Metadata Hub for use with Data Discovery
This section describes the customization steps that you must perform before you can use Data Discovery
with the Metadata Hub. You perform this procedure on the host on which the Metadata Hub is installed.

► To customize the Metadata Hub for use with Data Discovery:


1. Stop the Metadata Hub application server.

2. Navigate to the datad project sandbox, and then navigate to the mhub/customizations/load
subdirectory.

3. Load the required extensions into the Metadata Hub datastore.

   a. Load the schema extensions from the 00.EntitySchemaExtensions.xml file into the Metadata
      Hub datastore:

          mh-admin datastore Metadata-Hub-datastore-name extend-object-model \
              -extensions-file mhub/customizations/load/00.EntitySchemaExtensions.xml

   b. Unzip the 02.EntityViewCustomizations.zip and 03.DD_DataSetViews.zip files, which contain view
      customization extension sets:

          unzip mhub/customizations/load/02.EntityViewCustomizations.zip
          unzip mhub/customizations/load/03.DD_DataSetViews.zip

   c. Load the view customization extensions into the Metadata Hub datastore:

          mh-admin datastore Metadata-Hub-datastore-name extension-set -load -input \
              mhub/customizations/load/02.EntityViewCustomizations
          mh-admin datastore Metadata-Hub-datastore-name extension-set -load -input \
              mhub/customizations/load/03.DD_DataSetViews

      For more information, see “Loading extension sets into a Metadata Hub datastore” in the Metadata
      Hub Customization Guide.

4. Start the Metadata Hub application server.

5. Refresh the Metadata Hub datastore import model:

       mh-import model refresh

6. Save the Data_Discovery_DataSet_Import.rule import rule to the datad project sandbox directory:

       mh-import rule save . /mhub/ruledef/Data_Discovery_DataSet_Import.rule

   For more information about running this command, see “mh-import rule save” in the Metadata Hub
   Import Command Reference.

7. Add the following lines to the Metadata Hub import.profile file:

   • To cleanly shut down the micrograph service:
         export AB_MHUB_MICROGRAPH_SHUTDOWN=CLEAN
   • To tell the Metadata Hub importer how MSLI files should be created:
         export AB_MHUB_XFR_SPLITS_INPUT=ALWAYS
   • To start Metadata Hub imports properly from within an Express>It job:
         unset AB_JOB_PREFIX



For more information about this file, see “About the import.profile file” in the Metadata Hub Installation
Guide.
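
The three import.profile lines from step 7 can be added in a way that is safe to repeat, for example after an upgrade. The following is a minimal sketch and not part of the product: the function names append_once and add_dd_import_profile_lines are hypothetical. The path of the import.profile file is passed as an argument because this guide does not give its location.

```shell
# append_once LINE FILE: append LINE to FILE unless that exact line is
# already present, so repeated runs do not create duplicates.
append_once() {
    line=$1
    file=$2
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# add_dd_import_profile_lines PROFILE: add the three settings that this
# guide lists for the Metadata Hub import.profile file.
add_dd_import_profile_lines() {
    profile=$1
    append_once 'export AB_MHUB_MICROGRAPH_SHUTDOWN=CLEAN' "$profile" || return 1
    append_once 'export AB_MHUB_XFR_SPLITS_INPUT=ALWAYS' "$profile" || return 1
    append_once 'unset AB_JOB_PREFIX' "$profile" || return 1
}
```

The exact-line match means that a line that is already present with different spacing or a different value is not detected, so review the file after running the helper.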



A. Parameter reference
This appendix describes private project override parameters whose values you are prompted to specify when
you run the dd_setup.ksh setup script. It contains the following topics:
• Required parameters

• Parameters required for use with the Metadata Hub

• Optional parameters

For more information about parameters, see “Parameters” in the Co>Operating System Graph Developer's
Guide.



Required parameters
You must specify a value for the following parameters when you are prompted to do so by the Data Discovery
setup script:

Name            Override value

PRIVATE_DB      $AI_DB
PRIVATE_DML     $AI_DML
PRIVATE_MP      $AI_MP
PRIVATE_PSET    $AI_PSET
PRIVATE_RUN     $AI_RUN
PRIVATE_XFR     $AI_XFR



Parameters required for use with the Metadata Hub
If you want to use the Metadata Hub with Data Discovery, you must specify a value for the following
parameters when you are prompted to do so by the Data Discovery setup script:

Name Override value

AI_DATAD_MHUB_APPLICATION
    The Metadata Hub application to which Data Discovery datasets are to be assigned. This is typically
    the name of the specific private project.

    Set the default to the following PDL (Parameter Definition Language) expression:

        $[ string_substring(PROJECT_DIR,string_rindex(PROJECT_DIR,"/")+1, length_of(PROJECT_DIR)) ]

    For more information about PDL, see “Parameter Definition Language” in the Co>Operating System
    Parameter Reference.

AI_DATAD_MHUB_DEPLOYMENT_DIR
    The location of the Metadata Hub deployment directory. The recommended setting is the value of the
    AB_MHUB_LOCAL_DIR configuration variable.

AB_MHUB_HOME
    The location of the directory in which the Metadata Hub administration and import tools are
    installed.

AI_DATAD_MHUB_INSTALLED
    A boolean specifying whether Data Discovery is used with the Metadata Hub.

    Default: 0 (no Metadata Hub)

AI_DATAD_MHUB_SYSTEM
    The Metadata Hub system to which Data Discovery datasets are to be assigned.

    Default: Data Discovery
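
As we read it, the PDL expression recommended for AI_DATAD_MHUB_APPLICATION keeps the part of PROJECT_DIR that follows the last "/", which is the name of the project directory. This reading assumes that string_rindex returns the position of the last "/" and that string_substring takes a start position and a length. The following shell sketch illustrates the same result with parameter expansion. It is not PDL, and the function name and the sample path are hypothetical.

```shell
# mhub_application_default PROJECT_DIR: print the part of PROJECT_DIR after
# the last "/", which is what the PDL default appears to compute.
mhub_application_default() {
    project_dir=$1
    # ${var##*/} removes the longest prefix that ends in "/".
    printf '%s\n' "${project_dir##*/}"
}
```

For example, mhub_application_default /home/jdoe/sand/my_dd_project prints my_dd_project.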



Optional parameters
You may specify a value for the following parameters when you are prompted to do so by the Data Discovery
setup script:

Name Override value

AI_DATAD_DEFAULT_EMAIL_SUFFIX
    The default email suffix; for example, @your-company-name.com.

AI_DATAD_DML_BROWSE_ROOT_DIRECTORY
    The root directory in which you can browse for record format (.dml) files in the datad public
    project. If required, this parameter can be set to the top-level data directory location, to which
    access will automatically be limited.

    Default: $PRIVATE_DML

AI_DATAD_DML_BROWSE_START_DIRECTORY
    The start directory under the specified root directory in which you can browse for record format
    (.dml) files. Defined in conjunction with the AI_DATAD_DML_BROWSE_ROOT_DIRECTORY parameter.

    Default: $PRIVATE_DML

AI_DATAD_DO_NOT_SAVE_VALUE_CENSUS
    Whether the value census file is to be saved to disk.

    By default, Data Discovery saves the data profile value census file to disk. If data discovery tasks
    in your private project will be limited to data profiling and the calculation of functional
    dependencies, set this parameter to 1 to prevent the value census from being saved.
    NOTE: If you do not save the value census to disk, the discovery of cross-field dependencies
          will not be enabled in the private project.

    Default: 0 (value census is saved)


AI_DATAD_EMULATE_DP_EMPTY_BLANK_STRING_HANDLING
    A boolean that determines the treatment of empty and blank strings in the computation of
    cross-field relationships:
    • 1 causes Data Discovery to treat these strings as valid values.
    • 0 causes Data Discovery to ignore these strings when computing cross-field relationships.

    Default: 0 (strings are ignored)

AI_DATAD_ENABLE_EMAIL
    Whether an email notification is sent when a Data Discovery configuration finishes running.

    Default: 0 (no notification)

AI_DATAD_ENABLE_FD
    A boolean to enable or disable the calculation of functional dependencies.

    Default: 1 (calculation is enabled)

AI_DATAD_HADOOP_HOST_DEPTH
    The level of parallelism on each data node; the value is used for fixed or dynamic layouts of
    Hadoop data. For example, if you specify a host depth of 4, the software runs four ways parallel
    on each node. The minimum value is 1.
    NOTE: The host depth value specifies the number of ways parallel in the Hadoop filesystem;
          it is not an Ab Initio multifile system parameter.

    If you change the value of this parameter, any configurations that have Hadoop input data
    sources must be opened and saved so that they use the parameter's new value.

AI_DATAD_HADOOP_HOST_LIST
    A comma-separated list of hosts in the Hadoop cluster on which Data Discovery configurations
    are to be run.

AI_DATAD_HIVE_DEFAULT_DB
    The default Hive database to be used when a new Hive Data Discovery dataset is created.

AI_DATAD_HIVE_LAYOUT
    The host on which Data Discovery is to perform its initial processing. This computer must have
    access to Hadoop and Hive. Typically, it is an edge node.


AI_DATAD_PRIMARY_KEY_CONTROL_ALLOWED
    A boolean to enable primary key computation options for datasets and dataset groups:
    • 0 (default): A primary key is automatically computed when datasets and dataset groups
      are profiled (the default behavior in previous releases of Data Discovery).
    • 1: Primary key computation during profiling for datasets and dataset groups is optional.
      With a setting of 1, Data Discovery users can turn primary key computation on or off for a
      dataset, for all datasets in a dataset group, or for particular datasets in a dataset group.

AI_DATAD_FILE_BROWSE_ROOT_DIRECTORY
    The root directory in which you can browse for files in the datad public project. If required, this
    parameter can be set to the top-level data directory location, to which access is limited.

    Default: $AI_SERIAL

AI_DATAD_FILE_BROWSE_START_DIRECTORY
    The start directory under the defined root directory in which you can browse for data files
    in the datad public project. Defined in conjunction with the
    AI_DATAD_FILE_BROWSE_ROOT_DIRECTORY parameter.

    Default: $AI_SERIAL

AI_DATAD_FROM_EMAIL_ADDRESS
    The "from" address for Data Discovery emails. This parameter is typically set to a specific email
    address; for example, janedoe@acompany.com.

AI_DATAD_FROM_EMAIL_ADDRESS_READABLE_NAME
    The project’s readable "from" address for Data Discovery emails; for example, no-reply.

    Default: Data Discovery

AI_DATAD_OBEY_DS_LOCKS
    A boolean that enables or disables the dataset locks that prevent multiple users from
    simultaneously accessing the same Data Discovery configuration.

    Default: 1 (locks are enabled)


AI_DATAD_PARALLEL_WORK_DIR
    The parallel working directory used in the calculation of functional dependencies. After
    functional dependencies are calculated, the contents of this directory are deleted.

    Default: $AI_MFS_TEMP

AI_DATAD_PROFILE_COMMON_VALUES
    The maximum number of common values and patterns to be computed for a data profile. This
    parameter is typically overridden in a Data Discovery Dataset configuration. It can be set to
    any value from 5 through 1000, inclusive.

    Default: 10

AI_DATAD_SERIAL_PROFILE_RESULT_DIR
    The directory to which data profiles are to be written.

    Default: $AI_SERIAL

AI_DATAD_SERIAL_WORK_DIR
    The serial working directory used in the calculation of functional dependencies. After
    functional dependencies are calculated, the contents of this directory are deleted.

    Default: $AI_SERIAL_TEMP

AI_DATAD_SHOW_ADDL_PROFILE_ATTR
    A boolean to enable or disable the profiling options Common Values, Deciles, and
    Histograms.

    Default: 1 (options are enabled)

AI_DATAD_SMTP_SERVER_PORT
    The default SMTP port number used for sending email from a Data Discovery application.

    Default: 25

AI_DATAD_SMTP_SERVER_HOST
    The name or IP address of the email server host.


AI_DATAD_USE_USER_NAME
    A boolean to specify whether Data Discovery is to add a subdirectory to both browsable root
    directories (that is, the values of the AI_DATAD_FILE_BROWSE_ROOT_DIRECTORY and
    AI_DATAD_DML_BROWSE_ROOT_DIRECTORY parameters) using the current user login name as
    the name of the new subdirectory.

    For example, if this parameter is set to 1, the browsable root data directory is
    $AI_SERIAL/data, and the user login name is jdoe, then the browsable root data directory for
    user jdoe is set to $AI_SERIAL/data/jdoe.

    Default: 0 (no new subdirectory)

AI_DATAD_VALUE_CENSUS_RESULT_DIR
    The directory to which the data profile value census is to be written.

    Default: $AI_MFS

DATAD_JAVA_HOME
    The Java home directory. The recommended setting is $JAVA_HOME.
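
Several of the optional parameters have documented value constraints, and it can help to check candidate values before you answer the setup script's prompts. The following is a minimal sketch and not part of the product: the function names is_uint, is_bool, check_dd_param, and browse_root are hypothetical. The sketch assumes that boolean parameters are entered as 0 or 1, as in the defaults listed in this appendix. The browse_root function illustrates the AI_DATAD_USE_USER_NAME behavior described above.

```shell
# is_uint VALUE: succeed if VALUE is a non-empty string of decimal digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# is_bool VALUE: succeed if VALUE is 0 or 1.
is_bool() {
    [ "$1" = 0 ] || [ "$1" = 1 ]
}

# check_dd_param NAME VALUE: succeed if VALUE meets the constraint that this
# appendix documents for NAME. Names without a documented constraint pass.
check_dd_param() {
    case "$1" in
        AI_DATAD_PROFILE_COMMON_VALUES)
            is_uint "$2" && [ "$2" -ge 5 ] && [ "$2" -le 1000 ] ;;
        AI_DATAD_HADOOP_HOST_DEPTH)
            is_uint "$2" && [ "$2" -ge 1 ] ;;
        AI_DATAD_ENABLE_FD | AI_DATAD_OBEY_DS_LOCKS | AI_DATAD_USE_USER_NAME)
            is_bool "$2" ;;
        AI_DATAD_SHOW_ADDL_PROFILE_ATTR | AI_DATAD_PRIMARY_KEY_CONTROL_ALLOWED)
            is_bool "$2" ;;
        AI_DATAD_EMULATE_DP_EMPTY_BLANK_STRING_HANDLING)
            is_bool "$2" ;;
        *)
            return 0 ;;
    esac
}

# browse_root ROOT USE_USER_NAME LOGIN: print the effective browsable root
# directory. When USE_USER_NAME is 1, LOGIN is appended as a subdirectory.
browse_root() {
    if [ "$2" = 1 ]; then
        printf '%s/%s\n' "$1" "$3"
    else
        printf '%s\n' "$1"
    fi
}
```

For example, check_dd_param AI_DATAD_PROFILE_COMMON_VALUES 4 fails because the documented minimum is 5, and browse_root /data/serial/data 1 jdoe prints /data/serial/data/jdoe.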
