Document Number: D20484 Rev. 1 Software Release: 4.5.2 Revised: January 30, 2009
Netezza Corporation Corporate Headquarters 26 Forest St., Marlborough, Massachusetts 01752 tel 508.382.8200 fax 508.382.8300 www.netezza.com
The specifications and information regarding the products described in this manual are subject to change without notice. All statements, information, and recommendations in this manual are believed to be accurate. Netezza makes no representations or warranties of any kind, express or implied, including, without limitation, those of merchantability, fitness for a particular purpose, and noninfringement, regarding this manual or the products' use or performance. In no event will Netezza be liable for indirect, incidental, consequential, special, or economic damages (including lost business profits, business interruption, loss or damage of data, and the like) arising out of the use or inability to use this manual or the products, regardless of the form of action, whether in contract, tort (including negligence), breach of warranty, or otherwise, even if Netezza has been advised of the possibility of such damages. Copyright 2005-2009 Intelligent Integration Systems, Inc. Portions of this publication were derived from PostgreSQL documentation. For those portions of the documentation that were derived originally from PostgreSQL documentation, and only for those portions, the following applies: PostgreSQL is copyright 1996-2001 by the PostgreSQL global development group and is distributed under the terms of the license of the university of california below. Postgres95 is copyright 1994-5 by the Regents of the University of California. Permission to use, copy, modify, and distribute this documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies. 
In no event shall the University of California be liable to any party for direct, indirect, special, incidental, or consequential damages, including lost profits, arising out of the use of this documentation, even if the University of California has been advised of the possibility of such damage. The University of California specifically disclaims any warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The documentation provided hereunder is on an "as-is" basis, and the University of California has no obligations to provide maintenance, support, updates, enhancements, or modifications. Netezza, the Netezza logo, NPS, Snippet, Snippet Processing Unit, SPU, Snippet Processing Array, SPA, Performance Server, Netezza Performance Server, Asymmetric Massively Parallel Processing, AMPP, Intelligent Query Streaming, SQL-Blast and other marks are trademarks or registered trademarks of Netezza Corporation in the United States and/or other countries. All rights reserved. The Netezza implementation of the ODBC driver is an adaptation of an open source driver, Copyright 2000, 2001, Great Bridge LLC. The source code for this driver and the object code of any Netezza software that links with it are available upon request to source-request@netezza.com. Red Hat is a trademark or registered trademark of Red Hat, Inc. in the United States and/or other countries. Linux is a trademark or registered trademark of Linus Torvalds in the United States and/or other countries. D-CC, D-C++, Diab+, FastJ, pSOS+, SingleStep, Tornado, VxWorks, Wind River, and the Wind River logo are trademarks, registered trademarks, or service marks of Wind River Systems, Inc. Tornado patent pending. APC and the APC logo are trademarks or registered trademarks of American Power Conversion Corporation. All document files and software of the above named third-party suppliers are provided "as is" and may contain deficiencies. 
Netezza and its suppliers disclaim all warranties of any kind, express or implied, including, without limitation, those of merchantability, fitness for a particular purpose, and noninfringement. In no event will Netezza or its suppliers be liable for indirect, incidental, consequential, special, or economic damages (including lost business profits, business interruption, loss or damage of data, and the like), or the use or inability to use the above-named third-party products, even if Netezza or its suppliers have been advised of the possibility of such damages. All other trademarks mentioned in this document are the property of their respective owners. Document Number: 20484 Software Release Number: 4.5.2 NPS SQL Extensions Toolkit Users Guide Copyright 2009 Netezza Corporation. All rights reserved. Regulatory Notices Install the NPS 8000 Series in a restricted-access location. Ensure that only those trained to operate or service the equipment have physical access to it. Install each AC power outlet near the NPS rack that plugs into it, and keep it freely accessible. You must provide all disconnect devices and over-current protection devices. Product may be powered by redundant power sources. Disconnect ALL power sources before servicing. FCC Statement This equipment has been tested and found to comply with the limits for a Class A digital device, pursuant to part 15 of the FCC rules. These limits are designed to provide reasonable protection against harmful interference when the equipment is operated in a commercial environment. This equipment generates, uses, and can radiate radio-frequency energy and, if not installed and used in accordance with the instruction manual, may cause harmful interference to radio communications. Operation of this equipment in a residential area is likely to cause harmful interference, in which case users will be required to correct the interference at their own expense. 
CSA Statement
This Class A digital apparatus meets all requirements of the Canadian Interference-Causing Equipment Regulations (ICES-003). Cet appareil numérique de la classe A est conforme à la norme NMB-003 du Canada.
CE Statement (Europe)
This product complies with the European Low Voltage Directive 73/23/EEC and EMC Directive 89/336/EEC as amended by European Directive 93/68/EEC. Warning: This is a class A product. In a domestic environment this product may cause radio interference in which case the user may be required to take adequate measures.
Contents

Preface

1 Installation and Setup
Licensing Information
NPS Administration Information
    NPS System Prerequisites
    Installing the Netezza SQL Extensions Toolkit
    Enabling SQL Functions Support in a Database
    User Account Permissions and Requirements
    Displaying the SQL Extensions Toolkit Version
    Upgrading the SQL Extensions Toolkit
    Disabling the SQL Extensions Toolkit in a Database
    Removing the SQL Extensions Toolkit
    Using Different Versions of the SQL Extensions Toolkit
    Best Practices for Upgrading NPS Systems with the SQL Extensions Toolkit
    Best Practices for Backups and Restores of the NPS Data
Known Issues

2 XML Data
User Type XML
Referencing Columns
Getting Started: Publishing SQL Data as XML
Using XPath Expressions
XML Function Reference
    IsValidXML
    IsXML
    XMLAGG
    XMLAttributes
    XMLConcat
    XMLElement
    XMLExistsNode
    XMLExtract
    XMLExtractValue
    XMLParse

3 Data Transformation
Data Transformation Function Reference
    compress
    decompress
    encrypt/decrypt
    uuencode
    uudecode

4 Hashing
Hash Function Reference
    hash
    hash4
    hash8

6 Text Analytics
Word Comparison Function Reference
    word_diff
    word_find
    word_key
    word_key_tochar
    word_keys_diff
    word_stem
Regular Expression Function Reference
    The Flags Argument
    regexp_extract
    regexp_extract_all
    regexp_extract_all_sp
    regexp_extract_sp
    regexp_instr
    regexp_like
    regexp_match_count
    regexp_replace
    regexp_replace_sp

7 Text Utility
Text Utility Function Reference
    hextoraw
    rawtohex
    replace
    strleft
    strright

8 Array
Array Function Reference
    add_element
    array
    array_combine
    array_concat
    array_count
    array_split
    array_type

9 Collection
User Type Collection
Collection Function Reference
    collection
    element_type

10 Miscellaneous
Miscellaneous Function Reference
    greatest
    least
    mt_random
    corr
    covar_pop
    covar_samp

Index
List of Tables

Table 1-1: Known Issues
Table 3-1: Uuencoding, Part I
Table 3-2: Uuencoding, Part II
Table 4-1: Algorithms Supported for Cryptographic Hashing
Table 6-1: Algorithms Supported for Phonetic Encoding
Table 6-2: Flags used in Regular Expressions Functions
Table 8-1: Array Types
Preface
This document describes the SQL Extensions Toolkit for the Netezza platform. The Netezza SQL Extensions Toolkit was developed by NDN innovator Intelligent Integration Systems, Inc.
Audience
This guide is intended for users who require the additional capabilities provided by the SQL Extensions functions, which enable users to manipulate SQL data in more sophisticated ways. Users should be familiar with the basic operation and concepts of the NPS system. Users should also be familiar with C-style function declarations, because the API defined in this document uses C-style declarations rather than SQL-style declarations.
Topics | See
System prerequisites, installation, version information, upgrading, disabling, and removing the toolkit, using different toolkit versions, backups, and restores. | Installation and Setup on page 1-1
Importing and storing XML data in a SQL database, manipulating XML within the database, and publishing both XML and conventional SQL data in XML form. | XML Data on page 2-1
Transforming data by compressing, encrypting, or uuencoding, and restoring to the original form using decompress, decrypt, and uudecode. | Data Transformation on page 3-1
Using hash functions for cryptography, checksums, and lookups. | Hashing on page 4-1
Using date and time functions to compare values of type date or of type timestamp. | Date and Time Comparisons on page 5-1
Performing fuzzy comparisons (approximately matching a search key) and using regular expressions to match precise patterns of characters. | Text Analytics on page 6-1
Converting between ASCII hexadecimal and ASCII, substituting strings, and extracting strings. | Text Utility on page 7-1
Creating, combining, and splitting arrays, and retrieving, deleting, replacing, and counting array elements. | Array on page 8-1
Grouping heterogeneous pieces of data, that is, data of different types. | Collection on page 9-1
Determining the greatest/least value, correlation coefficient, covariance, and generating random numbers. | Miscellaneous on page 10-1
CHAPTER 1
Installation and Setup
What's in this chapter
Licensing Information
NPS Administration Information
Known Issues
The Netezza SQL Extensions Toolkit is an optional package for Netezza Performance Server (NPS) systems. This toolkit was developed by NDN innovator, Intelligent Integration Systems, Inc. This chapter provides information on installing and configuring the Netezza SQL Extensions Toolkit on an NPS system, as well as special information for managing backups and upgrades.
Licensing Information
Netezza customers can obtain the toolkit from the Netezza FTP server in the Releases area. The software kit is contained in two files, libnetcrypto-version.tar.gz and libnetxml-version.tar.gz, where version indicates the currently released version of the software kit. The software kit contains a readme file, libraries, the object files for the functions, and scripts which ease the process of defining and using the toolkit functions in an NPS database, as well as disabling and removing the functions.
The command extracts two files, libnetcrypto-version.tar.gz and libnetxml-version.tar.gz.
4. Extract the software files and compiled objects in the libnetcrypto-version.tar.gz file:
tar -xzf libnetcrypto-version.tar.gz
The tar command uncompresses and untars the contents to a directory named libnetcrypto/version in the current directory, where version is the version number of the SQL Extensions Toolkit.
5. Extract the software files and compiled objects in the libnetxml-version.tar.gz file:
tar -xzf libnetxml-version.tar.gz
The tar command uncompresses and untars the contents to a directory named libnetxml/version in the current directory, where version is the version number of the SQL Extensions Toolkit.
3. Run the following command, specifying the database in which you want to define the SQL Extensions functions and the NPS user account (with its password) that will own the functions:
./install -d <dbname> -u <username> -W <password>
The command could take up to one minute to run. Upon completion, the command displays the message Successfully Installed Crypto Library to <dbname>.
Note: If your database name uses spaces or mixed-case letters such as myDatabase, make sure that you specify double quotation marks around the database name and escape the quotes. For example:
./install -d \"myDatabase\" -u user -W password
4. Change to the directory where the second part of the SQL Extensions library files resides, where dir is the directory in which you untarred the files:
cd <dir>/libnetxml/version
5. Run the following command, specifying the database in which you want to define the SQL Extensions functions and the NPS user account (with its password) that will own the functions:
./install -d <dbname> -u <username> -W <password>
The command could take up to one minute to run. Upon completion, the command displays the message Successfully Installed XML Library to <dbname>.
These commands define the SQL Extensions functions and register them in the specified database. The NPS user account you specify becomes the owner of the functions. After this procedure, NPS administrators can manage the SQL Extensions functions as objects in the NPS database, and users who have permission to use the SQL Extensions functions can include them in queries. Figure 1-1 shows a sample NzAdmin window for an NPS system that has the SQL Extensions Toolkit.
Figure 1-1: NzAdmin Interface with the Netezza SQL Extensions Toolkit Functions
To display the version of the rest of the functions available in the SQL Extensions toolkit, use the following SQL command:
SELECT CRYPTO_VERSION();
3. Run the following command and specify the database name, NPS user name, and password for your system:
./install -R -d <dbname> -u <username> -W <password>
The command displays the message Successfully Uninstalled XML Library from <dbname> when it completes.
4. Change to the installation location of the rest of the functions in the toolkit, for example:
cd <install-dir>/libnetcrypto/version
5. Run the following command and specify the database name, NPS user name, and password for your system:
./install -R -d <dbname> -u <username> -W <password>
The command displays the message Successfully Uninstalled Crypto Library from <dbname> when it completes.
6. Repeat Steps 2-5 for each database in which you want to disable the SQL Extensions query support.
When run with the -R option, the install command uses the DROP FUNCTION and DROP AGGREGATE commands to drop the SQL Extensions functions that were added by the install script.
Best Practices for Upgrading NPS Systems with the SQL Extensions Toolkit
After you install the Netezza SQL Extensions Toolkit, take special precautions before you patch or upgrade the NPS software on your system. While most patch and service pack updates should not affect the operation of the toolkit functions, it is possible that an upgrade could stop the functions from working. For example, an upgrade from one major release to another could require you to obtain a new toolkit installation package with new function object files. Before you upgrade the NPS software on your system, make sure that you consult with Netezza Support to ensure that the planned upgrade will not affect your toolkit functions. The NPS Release Notes or the service pack readme file identifies any known situations where an update or upgrade can impact the functions.
Known Issues
This release of the Netezza SQL Extensions Toolkit has the following known issues:

Table 1-1: Known Issues
Reference | Issue Description
44849 | XMLAgg() can only aggregate VARCHAR columns, not CHAR columns. For example, if emp.name is defined as CHAR(12), the following SELECT returns an error:
SELECT XMLElement ('emp', XMLAgg (XMLElement ('name', name))) from emp;
ERROR: 0 : XML: Corrupted XML Block
44894 | Only arrays of type varchar support replacing elements by name. For example, given an array of integers, attempting to replace the array element named one with the integer 22 returns an error:
SELECT replace_element(myarray,'one',22);
ERROR: 16 : Expected string argument
44384
CHAPTER 2
XML Data
What's in this chapter
User Type XML
Referencing Columns
Getting Started: Publishing SQL Data as XML
Using XPath Expressions
XML Function Reference
"One of the most intriguing and urgent requirements to arise from the appearance of XML is a well-defined relationship between XML and SQL. Vast quantities of business data are currently stored in SQL database systems and great demand exists for the ability to present that data in XML form to various client applications." (Special Interest Group on Management of Data, ACM)
The XML functions provided by Netezza as extensions to the SQL language are modeled after the SQL/XML specification contained in SQL-2003. The SQL/XML specification defines ways of importing and storing XML data in a SQL database, manipulating it within the database, and publishing both XML and conventional SQL data in XML form.
Publishing conventional SQL data in XML form enables you to transform the flat (non-hierarchical) result sets of SQL queries into hierarchically structured XML data; one important use of this transformation is to make this data available via web services. The functions used to publish SQL data in XML format are XMLRoot, XMLElement, XMLConcat, XMLAgg, and XMLAttributes.
Data that is already stored in the database as XML can be queried, manipulated, and updated using functions such as XMLExistsNode, XMLExtract, XMLExtractValue, and XMLUpdate. Because XML data consists of a tree of nodes, these functions rely on W3C XPath expressions to locate individual XML nodes within the tree.
Note: Certain features of the SQL-2003 SQL/XML specification, including the ability to pass column names into functions and the ability to construct sets, are not supported by Netezza user-defined functions (UDFs). For more information on industry standards for SQL extensions, refer to ISO/IEC 9075-14.
Referencing Columns
The SQL/XML specification supports the ability to pass column names directly into functions. Netezza user-defined functions (UDFs) do not support this ability. Therefore, element names must be explicitly specified as additional parameters, as in the following example:
SELECT XMLElement('Employee', XMLAttributes('EID', a.id), a.name) from employees a;
It is very important to note that the output from the XMLElement function is a value of type XML, which is the Netezza compiled representation of the XML element. If you run the preceding SELECT statement, the result is the type name XML:
XMLELEMENT
----------
XML
(1 row)
To see the actual XML element created by an XMLElement call (for example, <Parent>Parent Text</Parent>), you need to wrap the XMLElement call with XMLSerialize. For example:
select XMLSerialize(XMLElement('Parent', 'Parent Text'));
The real power of XMLElement is that the function calls can be nested to produce the hierarchical structure required for XML data. For example:
select XMLElement('Parent', XMLElement('Child', 'Child text'));
The publishing functions can be nested as required, up to a limit of 10,000 nested calls. For example:
select XMLElement('Parent', XMLElement('Child', XMLElement('GrandChild', 'Grandchild text')));
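Because XMLElement returns the compiled XML type, a nested call only becomes readable once it is serialized. The following is a minimal sketch that combines the nesting shown above with the XMLSerialize wrapper from the earlier example. It uses only functions that appear in this chapter, and it makes no claim about the exact whitespace of the returned text.

```sql
-- Wrap the outermost XMLElement call in XMLSerialize so that the query
-- returns the XML text rather than the type name XML.
select XMLSerialize(
    XMLElement('Parent',
        XMLElement('Child',
            XMLElement('GrandChild', 'Grandchild text'))));
```

Only the outermost call is wrapped here; the inner XMLElement calls pass their compiled XML values to the enclosing call as content.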
As a more realistic example, suppose there is a DEPARTMENTS table that contains three columns: DEPTNO, DEPTNAME, and DEPTLOC:
DEPTNO  DEPTNAME     DEPTLOC
------  -----------  --------
10      MARKETING    BOSTON
20      HR           BOSTON
30      SALES        NEW YORK
40      ENGINEERING  NEW YORK
A plain SQL query to list all departments would look like the following:
select * from departments;
But suppose you needed to return all four rows of department data as XML, with one <Dept> node for each department, and each <Dept> node containing three child nodes, <Number>, <Name>, and <Location>, as shown in the following XML document:
<Departments>
  <Dept>
    <Number>10</Number>
    <Name>MARKETING</Name>
    <Location>BOSTON</Location>
  </Dept>
  <Dept>
    <Number>20</Number>
    <Name>HR</Name>
    <Location>BOSTON</Location>
  </Dept>
  <Dept>
    <Number>30</Number>
    <Name>SALES</Name>
    <Location>NEW YORK</Location>
  </Dept>
  <Dept>
    <Number>40</Number>
    <Name>ENGINEERING</Name>
    <Location>NEW YORK</Location>
  </Dept>
</Departments>
To create this XML document, you would use a SELECT statement modeled after the following:
SELECT XMLElement('Departments',
    XMLAGG(
        XMLElement('Dept',
            XMLConcat(
                XMLElement('Number', d.deptno),
                XMLElement('Name', d.deptname),
                XMLElement('Location', d.deptloc)))))
from departments d;
In each of the first two XMLElement calls, the content of the element is created by a nested XML function call. To create a hierarchically structured XML document of parent and child nodes, you nest the XMLElement calls within a SQL statement. The first XMLElement function in the query creates the top-level <Departments> node:
XMLElement('Departments', XMLAgg (
The XMLAgg call is used as the second argument, indicating that the content of the top-level <Departments> node is a group of aggregated nodes; these nodes become child nodes of a single parent node. The second XMLElement call establishes <Dept> as the name of each child node of the <Departments> parent node, and then relies on the next three embedded XMLElement calls for the contents of each <Dept> child node:
XMLElement('Dept',
    XMLConcat(
        XMLElement('Number', d.deptno),
        XMLElement('Name', d.deptname),
        XMLElement('Location', d.deptloc)))))
Together, these calls create as many <Dept> child nodes as necessary to wrap the rows of data returned from the DEPARTMENTS table.
It is very important to understand the use of the XMLAgg function. This function aggregates child nodes under their parent node, which in the preceding example means that there is a single parent <Departments> node that contains all four <Dept> nodes. Without the XMLAgg call, the XML produced would contain four <Departments> nodes, each containing a single <Dept> node, which would result in an invalid XML document, as shown here:
<Departments>
  <Dept>
    <Number>10</Number>
    <Name>MARKETING</Name>
    <Location>BOSTON</Location>
  </Dept>
</Departments>
<Departments>
  <Dept>
    <Number>20</Number>
    <Name>HR</Name>
    <Location>BOSTON</Location>
  </Dept>
</Departments>
<Departments>
  <Dept>
    <Number>30</Number>
    <Name>SALES</Name>
    <Location>NEW YORK</Location>
  </Dept>
</Departments>
<Departments>
  <Dept>
    <Number>40</Number>
    <Name>ENGINEERING</Name>
    <Location>NEW YORK</Location>
  </Dept>
</Departments>
This is not valid XML syntax (the document is not well-formed) because there are four instances of the <Departments> document element, and an XML document must have exactly one. This demonstrates how important it is to use the IsValidXML function to ensure that the XML you create with the function library can be parsed as XML. Furthermore, if you are using schemas, you are also responsible for returning schema-valid XML (XML that conforms to the structure specified by the schema).
As another example, suppose you want to return a list of employees by department, tagged as follows:
<EmployeesByDepartment>
  <Dept DeptNo="10">
    <Name>ACCOUNTING</Name>
    <Location>NEW YORK</Location>
    <Employees>
      <Employee EmpNo="7782">
        <Name>CLARK</Name>
        <Job>MANAGER</Job>
        <Manager>7839</Manager>
        <Salary>2450</Salary>
      </Employee>
      <Employee EmpNo="7839">
        <Name>KING</Name>
        <Job>PRESIDENT</Job>
        <Salary>5000</Salary>
      </Employee>
      ...
    </Employees>
  </Dept>
  ...
</EmployeesByDepartment>
To return employees by department, two select statements are required: first create an employee grouping and then group the employees by department:
CREATE TEMP TABLE emp_grouping AS
SELECT emp.deptno,
       XMLElement('Employees',
         XMLAGG(
           XMLElement('Employee',
             XMLAttributes('EmpNo', empno),
             XMLConcat(
               XMLElement('Name', name),
               XMLElement('Job', job),
               XMLElement('Manager', mgr),
               XMLElement('Salary', sal),
               XMLElement('Comm', comm))))) AS xml
FROM emp INNER JOIN dept ON emp.deptno = dept.deptno
GROUP BY emp.deptno;

SELECT XMLElement('EmployeesByDepartment',
         XMLAGG(
           XMLElement('Dept',
             XMLAttributes('DeptNo', dept.deptno),
             XMLConcat(
               XMLElement('Name', dept.dname),
               XMLElement('Location', dept.loc),
               emp_grouping.xml))))
FROM dept INNER JOIN emp_grouping ON dept.deptno = emp_grouping.deptno;
//  *  []  nodename
... bookstore, no matter where they are under the bookstore element.
.              Selects the current node.
..             Selects the parent of the current node.
@              Selects attributes. For example, //@lang selects all attributes that are named lang.
functionname   XPath supports a set of built-in functions such as substring(), round(), and not(). In addition, user-defined functions can be made available using namespaces.
IsValidXML
Determines whether or not a character string can be parsed as XML.
Description
The IsValidXML function has the following syntax:
boolean = IsValidXML(varchar input);
Returns
The function returns true if the character string input can be parsed as XML; otherwise, the function returns false. For example:
select IsValidXML('<tag1>12</tag1>');
select IsValidXML('<tag1><tag2>');
The first example returns true; the second example returns false.
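As a rough illustration of what "can be parsed as XML" means, the following Python sketch performs a comparable check with the standard library parser. It is not the Netezza implementation; the helper name is_parseable_xml is hypothetical, and the third call adds the multiple-root case to the two inputs used in the example above.

```python
import xml.etree.ElementTree as ET


def is_parseable_xml(text: str) -> bool:
    """Return True if text parses as a single well-formed XML document.

    Hypothetical helper for illustration; not part of the toolkit.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_parseable_xml("<tag1>12</tag1>"))  # complete element
print(is_parseable_xml("<tag1><tag2>"))     # tags are never closed
print(is_parseable_xml("<a/><a/>"))         # two document elements
```

Only the first input parses; the other two are rejected because they are not well-formed.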
IsXML
Determines whether the input argument is a compiled Netezza XML document; in other words, whether the input argument is of type XML.
Description
The IsXML function has the following syntax:
bool = IsXML(XML input);
Returns
The function returns true if the input is a compiled Netezza XML document; otherwise, it returns false. It is important to explicitly check whether the XML you produce by embedding SQLX functions within your SQL is well-formed, since the underlying SQLX engine does not perform any error checking or validation. Note that if you are using schemas, then you are also responsible for returning valid XML (meaning that it conforms to the structure specified by the schema). For example:
select IsXML(XMLParse('<tag1>12345</tag1>'));
XMLAGG
This publishing function aggregates the set of XML inputs into a single XML object.
Description
The XMLAGG function has the following syntax:
XML = XMLAGG(Set(XML) inputs);
The inputs value specifies the set of XML inputs to aggregate into a single XML object.
Returns
The function returns a compiled representation (type XML) of a single XML object which has been aggregated from a set of XML inputs. For example:
SELECT XMLElement('Departments',
         XMLAGG(
           XMLElement('Dept',
             XMLConcat(
               XMLElement('Number', d.deptno),
               XMLElement('Name', d.deptname),
               XMLElement('Location', d.deptloc)))))
FROM departments d;
Assuming that the query returns three rows of data, a possible return value might look like this:
<Departments>
  <Dept>
    <Number>10</Number>
    <Name>MARKETING</Name>
    <Location>BOSTON</Location>
  </Dept>
  <Dept>
    <Number>20</Number>
    <Name>HR</Name>
    <Location>BOSTON</Location>
  </Dept>
  <Dept>
    <Number>30</Number>
    <Name>SALES</Name>
    <Location>NEW YORK</Location>
  </Dept>
</Departments>
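The effect of aggregation can be sketched outside the database. The following Python fragment is only an analogy, not how XMLAGG is implemented: it builds one Dept element per row and then either places all of them under a single Departments parent (the aggregated case) or wraps each one in its own parent (the non-aggregated case). The row data and the helper names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Sample rows standing in for the departments table (illustrative data).
ROWS = [(10, "MARKETING", "BOSTON"), (20, "HR", "BOSTON"), (30, "SALES", "NEW YORK")]


def dept_element(number, name, location):
    """Build one <Dept> element, like the nested XMLElement calls."""
    dept = ET.Element("Dept")
    ET.SubElement(dept, "Number").text = str(number)
    ET.SubElement(dept, "Name").text = name
    ET.SubElement(dept, "Location").text = location
    return dept


def aggregated(rows):
    """All <Dept> children under a single <Departments> parent."""
    root = ET.Element("Departments")
    root.extend([dept_element(*row) for row in rows])
    return ET.tostring(root, encoding="unicode")


def not_aggregated(rows):
    """One <Departments> parent per row, concatenated as text."""
    parts = []
    for row in rows:
        root = ET.Element("Departments")
        root.append(dept_element(*row))
        parts.append(ET.tostring(root, encoding="unicode"))
    return "".join(parts)


def parses(text):
    """Return True if text is a single well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses(aggregated(ROWS)), parses(not_aggregated(ROWS)))
```

The aggregated string parses as one document; the non-aggregated string has several document elements and is rejected by the parser.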
XMLAttributes
This publishing function constructs an XML Attribute object. This object is not a valid XML object; rather, it must be assigned as an attribute value of an XMLElement.
Description
The XMLAttributes function has the following syntax:
XML_Attrib = XMLAttributes(varchar name, varchar value);
The name value specifies the name of the XML attribute to construct. The value value specifies the value of the XML attribute to construct.
Returns
The function returns an XML Attribute object. The following example produces an Emp element for each employee, with an ID and name attribute:
SELECT XMLELEMENT('Emp', XMLATTRIBUTES(e.id, e.fname || ' ' || e.lname AS "name")) AS "result" FROM employees e WHERE e.id > 200;
XMLConcat
This publishing function concatenates two XML objects (either two elements or two attributes) to produce a single XML object.
Description
The XMLConcat function has two forms, one for concatenating elements and another for concatenating attributes:
XML = XMLConcat(XML inputa, XML inputb);
XML_Attrib = XMLConcat(XML_Attrib inputa, XML_Attrib inputb);
The inputa value specifies the first XML object to concatenate. The inputb value specifies the second XML object to concatenate.
Returns
The function returns a compiled representation (type XML) of the concatenated XML input objects as a single XML object. If either of the input XML objects is null, the function returns null. For an example of the use of XMLConcat, see the example for XMLAgg.
XMLElement
This publishing function constructs an XML Element. The XMLElement function is typically nested to produce a hierarchically structured XML document.
Description
The XMLElement function has the following syntax:
XML = XMLElement(varchar name, [XML_Attrib attrib,] varchar value);
The name value specifies the name of the enclosing tag for the XML element. If the name specified is NULL, then no element is returned. Note that the name cannot be a column name or column reference, a difference from the SQL/XML specification. One or more optional attrib values specify one or more name-value pairs that create attributes for the XML element. The value value specifies the content of the newly constructed XML element. This can be either a scalar value or a nested XMLElement call.
Returns
The function returns a compiled representation (type XML) of an XML element with the specified name, content, and optionally a collection of attributes. It does not create prolog information. For example:
select XMLElement('Parent', XMLElement('Child', 'Child text'));
XMLExistsNode
Determines whether using an XPath to traverse the XML input document results in at least a single XML element or text node.
Description
The XMLExistsNode function has the following syntax:
bool = XMLExistsNode(XML input, varchar XPath);
The input value specifies a compiled representation of an XML file. Values can be any built-in SQL type. The XPath value specifies the XPath of the XML node to test for.
Returns
Returns true if the XPath leads to an XML element or text node in the XML input object. Otherwise returns false. For example:
SELECT person FROM MAILINGLIST WHERE XMLExistsNode(person, '/MailingList[Occupation="Doctor"]') = 1;
This example returns rows from MAILINGLIST only if nodes exist that satisfy the condition.
Note: When using the XMLExistsNode() function in a query, it must always be specified in the WHERE clause, not in the SELECT list.
XMLExtract
Finds the XML node(s) specified by the XPath expression. The extracted nodes can be elements, attributes, or text nodes. XMLExtract can be used to extract:
- Numerical values on which function-based indexes can be created to speed up processing.
- Collection expressions for use in the FROM clause of SQL statements.
- XML fragments to be combined into a single XML document.
Description
The XMLExtract function has the following syntax:
XML = XMLExtract(XML input, varchar XPath);
The input value specifies the XML file from which to extract the node. The XPath value specifies an XPath query which specifies an XML node within the XML file.
Returns
If more than one item is found by this function, only the first will be returned. If no item is found, null is returned. The following example uses XMLExtract to query the value of the Reference column for orders with SpecialInstructions set to Rush:
SELECT XMLExtract(object_value, '/PurchaseOrder/Reference') "REFERENCE" FROM PURCHASEORDER WHERE XMLExistsNode(object_value, '/PurchaseOrder[SpecialInstructions="Rush"]') = 1;
XMLExtractValue
Extracts the actual (scalar) value from the XML input object specified by the XPath parameter. The result of the XPath query must be a single node and either an element, a text node, or an attribute. If a specific datatype is desired, XMLExtractValue can be wrapped with a conversion function, for example a function that converts the varchar to a date.
Description
The XMLExtractValue function has the following syntax:
varchar = XMLExtractValue(XML input, varchar XPath);
The input value specifies an XML file. The XPath value specifies the XPath query.
Returns
If the result is an element then it must have a single text node as its child; the child node provides the text content for the scalar return value. If the node does not exist, this function returns null. If more than one node is returned by the XPath expression or if the expression points to an element node with anything other than a single text child node, this function returns an error. For example, the following query extracts the scalar value of the Reference column:
SELECT XMLExtractValue(object_value, '/PurchaseOrder/Reference') "REFERENCE" FROM PURCHASEORDER WHERE XMLExistsNode(object_value, '/PurchaseOrder[SpecialInstructions="Rush"]') = 1;
An example of a possible return value is shown below. Note the difference from the return value for the similar example for XMLExtract. In that example, each line of data is wrapped with a <Reference> element. Here, just the scalar value is extracted and returned:
JSMITH-20021009123336271PDT
ABELL-20021009123336321PDT
JDOE-20021009123337303PDT
GWASHINGTON-20021009123337123PDT
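The distinction between extracting a node and extracting its scalar value can be illustrated with a short Python sketch. This is an analogy using the standard library, not the Netezza functions themselves. The sample document is made up for illustration, and because ElementTree paths are relative to the element they are called on, the path 'Reference' here plays the role of '/PurchaseOrder/Reference'.

```python
import xml.etree.ElementTree as ET

# Illustrative document; the structure mirrors the PurchaseOrder example.
DOC = ET.fromstring(
    "<PurchaseOrder>"
    "<Reference>JSMITH-20021009123336271PDT</Reference>"
    "<SpecialInstructions>Rush</SpecialInstructions>"
    "</PurchaseOrder>"
)

# Node extraction: the result is still wrapped in its <Reference> element.
node = DOC.find("Reference")
as_node = ET.tostring(node, encoding="unicode")

# Value extraction: only the text content of the single text child.
as_value = node.text

print(as_node)
print(as_value)
```

The first line printed keeps the enclosing tags; the second is the bare string, which could then be passed to a conversion function.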
XMLParse
Converts a value of type varchar to a value of type XML, which is the Netezza compiled representation of an XML object (stripping white space by default). The inverse function is XMLSerialize. Note: XMLParse is not intended for parsing and loading external data into XML columns. Though it is possible to call XMLParse as a part of an external table load, the resulting XML datatype is stored as a VARCHAR which has a maximum size of 64000 bytes.
Description
The XMLParse function has the following syntax:
XML = XMLParse(varchar input)
Returns
The function returns the Netezza compiled representation of an XML object. If the input varchar resolves to null, the function returns null. For example:
select XMLParse('<Parent>Parent Text</Parent>');
This example returns a value of type XML which is the compiled representation of the XML object <Parent>Parent Text</Parent>.
XMLRoot
This publishing function creates a new XML value by providing the version and standalone properties in the XML root information (prolog) of the specified value of type XML. This creates the root node if it does not already exist. Typically, this is done to ensure data-model compliance.
Description
The XMLRoot function has the following syntax:
XML = XMLRoot(XML input, float version, bool standalone);
The input value specifies the XML object to update. The version value specifies the version property of the input XML object. The standalone value specifies the standalone property of the input XML object.
Returns
The function returns the updated object. If a prolog already exists, an error is returned. For example:
INSERT INTO employees (id, xvalue) VALUES (1001, XMLROOT(XMLPARSE('<Emp> John Smith </Emp>'), 1.0, true));
XMLSerialize
Converts a value of type XML to a value of type varchar. The inverse function is XMLParse.
Description
The XMLSerialize function has the following syntax:
varchar = XMLSerialize(XML input);
The input value specifies a value of type XML, which is the Netezza compiled representation of an XML file. Values can be any built-in SQL type.
Returns
The function returns the varchar representation of the input XML object. For example:
select XMLSerialize(XMLElement('Parent', 'Parent Text'));
Without the XMLSerialize call, the XMLElement call returns the type name XML:
XMLELEMENT
----------
XML
(1 row)
XMLUpdate
Updates the portion of an XML document (elements, attributes, or nodes) identified by XPath with a new value. The datatypes of the XPath target and the new value must match. XMLUpdate cannot be directly used to insert a new node or delete an existing node, element, or attribute. Instead, you need to update the containing parent element with the new value.
Description
The XMLUpdate function has two forms, one to update the XML document with a scalar (varchar) value and another to update the XML document with an XML document:
XML = XMLUpdate(XML input, varchar XPath, varchar value); XML = XMLUpdate(XML input, varchar XPath, XML value);
The input value specifies an XML document that contains the fragment to be updated. The XPath value specifies the XPath expression used to locate the XML fragment to update. If XPath identifies an XML element, then the corresponding value must be of type XML. If XPath identifies an attribute or text node, then the value can be any scalar datatype. The value value specifies the new value to assign to the XML fragment.
Returns
The function returns an XML document that contains an updated fragment. For example:
update sales_tab set order = XMLUpdate(order, '/order/company/name', XMLParse('<Name>Netezza</Name>')) where sales_person = 'John Smith';
This example updates the company name in order XML documents to Netezza, where the salesperson is John Smith.
CHAPTER 3
Data Transformation
What's in this chapter
Data Transformation Function Reference
The functions in this chapter transform data into a different representation, for the purposes of security, space savings, or transmission time savings. The functions in many cases rely on industry-standard algorithms, as noted in the function descriptions. For more information on these algorithms, refer to the publicly available documentation. Note: Compressed and encrypted data exists in a binary format that is not readable. To display this data, it must first be decompressed/decrypted to avoid output alignment problems. If table columns contain compressed or encrypted data, selects on that table need to use the decompress/decrypt functions to process the binary data in those columns properly.
compress
Compresses a varchar using the open source zlib software library. The zlib library uses the DEFLATE compression algorithm, which combines LZ77 (Lempel-Ziv 1977) with Huffman coding. Compression is the process of encoding data so that it uses fewer bits. For example, a simple compression scheme replaces runs of contiguous, repeated characters with a single character and a count. Compressed data must be decompressed before it can be used.
Description
The compress function has the following syntax:
varchar = compress(varchar input[, int level]);
The input value specifies the varchar to be compressed. The level value specifies the compression level used. It can be between 0 and 9, with 0 indicating the least compression (in zlib, level 0 stores the data without compressing it) and 9 indicating the most compression. The default is 6. Increasing the compression level increases the processing time.
Returns
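The function returns the compressed varchar. Because the compressed value is binary, the following example compresses a string and immediately decompresses it, returning the original value: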
select decompress (compress('1234567890'));
decompress
Decompresses a previously compressed varchar.
Description
The decompress function has the following syntax:
varchar = decompress(varchar input);
Returns
The function returns the decompressed varchar. For example:
select decompress (compress('1234567890'));
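The same round trip can be tried against zlib directly. The following Python sketch is an illustration of the underlying library, not of the Netezza functions: it compresses a repetitive input at three levels and confirms that decompression restores the original bytes. The sample input and the helper name are assumptions.

```python
import zlib

# A repetitive input compresses well; the content is illustrative only.
ORIGINAL = b"1234567890" * 100


def compressed_size(data: bytes, level: int) -> int:
    """Compress at the given zlib level, verify the round trip, return the size."""
    packed = zlib.compress(data, level)
    assert zlib.decompress(packed) == data
    return len(packed)


for level in (0, 6, 9):
    print(level, compressed_size(ORIGINAL, level))
```

At level 0 the output is slightly larger than the input because zlib stores the data uncompressed and adds framing; at levels 6 and 9 the output is much smaller than the 1000 input bytes.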
encrypt/decrypt
Encrypts or decrypts the input varchar using the supplied key. Encryption is the process of transforming data in order to maintain its secrecy; the data can be read (decrypted) only if the recipient has the required key. The Netezza implementation uses symmetric encryption, also known as private or secret key encryption, because the same secret key is used to encrypt and to decrypt data. This means that this secret key must be made available on any server that is decrypting previously encrypted data. You can choose which symmetric encryption algorithm the function uses to encrypt/decrypt the data, either AES (Advanced Encryption Standard) or RC4. Symmetric encryption is typically much faster than public key encryption, but it depends entirely on maintaining the secrecy of the key, so you should periodically change the private key and take steps to ensure that it cannot be discovered in use, in storage, or in distribution (see the description of the key argument below for Netezza specific security recommendations).
Note: This is field level encryption, not database encryption.
Description
The encrypt function has the following syntax:
varchar = encrypt(varchar text, varchar key [, int algorithm]);
The key value specifies the key to use to encrypt/decrypt the value. Care must be taken to secure the key; otherwise the security is compromised. Keep in mind the architecture of the Netezza system when designing your security system, including the following:
- SQL functions are logged in the pg.log file on the Netezza host, so executing encrypt(secret_column, my_secret_key) reveals your key to anyone who can read the pg.log file.
- ODBC/JDBC conversations are easily captured with any number of diagnostic/hacking tools. If your key is transmitted as part of the SQL, it can be compromised during this process.
For these reasons it is recommended that the secret key be stored in a table and passed into the encrypt/decrypt functions through a table join. For example:
SELECT decrypt(a.value, b.key) FROM my_table a, my_keys b WHERE b.key_id = 1;
The algorithm value can be either RC4 or one of the versions of AES, as shown in the following list. RC4, although widely used (for example, by SSL and WEP), is not cryptographically secure and is vulnerable to attacks. The Advanced Encryption Standard (AES) is the encryption standard adopted by the United States government and is approved for protecting classified information. The three versions of AES differ in key length. While all three key lengths are sufficient to protect classified information up to the SECRET level, TOP SECRET information requires the use of key lengths 192 or 256.
0  RC4 (default if no algorithm given)
1  AES 128
2  AES 192
3  AES 256
Returns
The function returns an encrypted/decrypted varchar. For example:
Select decrypt (encrypt('123456',100,0),100,0);
uuencode
Encodes a binary value as ASCII using the Unix UUencode format. The encoding translates the binary value into ASCII character codes in the range 32 and above. Uuencoding has historically been used to encode files destined for e-mail transmission. The uudecode function reverses the effect of uuencode, recreating the original binary file exactly. The uuencode algorithm does the following:
1. Divides the binary value into groups of three bytes (24 bits), adding zeroes to the end of the binary value if necessary to create a final group of three bytes.
2. Splits the 24 bits into four groups of six bits each. This creates four decimal numbers which lie in the range 0 to 63.
3. Adds decimal 32 to each number to create ASCII characters in the range 32 (space) to 95 (underscore).
Step 1 is illustrated by the following table.
Table 3-1: Uuencoding, Part I
ASCII Input   ASCII Decimal   ASCII Binary (8 bit)
h             104             01101000
a             97              01100001
t             116             01110100
Steps 2 and 3 are illustrated by the following table. Note the transformation of the three 8 bit ASCII Binary values in the preceding table to the four 6 bit Binary values in the first column of the table:
Table 3-2: Uuencoding, Part II
6 Bit Binary   Decimal Equivalent   Decimal + 32   Uuencoding
011010         26                   58             :
000110         6                    38             &
000101         5                    37             %
110100         52                   84             T
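The three steps can be sketched in a few lines of Python. This is a minimal illustration of the steps as described above, not the Netezza implementation; the function name uuencode_body is hypothetical. Note that the full Unix uuencode file format also adds a length character to each line and begin/end framing, which this sketch leaves out.

```python
def uuencode_body(data: bytes) -> str:
    """Encode bytes with the three steps described above (no line framing)."""
    # Step 1: pad with zero bytes to a multiple of three bytes.
    padded = data + b"\x00" * (-len(data) % 3)
    out = []
    for i in range(0, len(padded), 3):
        group = int.from_bytes(padded[i:i + 3], "big")  # 24 bits
        # Step 2: split the 24 bits into four 6-bit numbers (0 to 63).
        for shift in (18, 12, 6, 0):
            six_bits = (group >> shift) & 0x3F
            # Step 3: add 32 to land in the ASCII range 32 to 95.
            out.append(chr(six_bits + 32))
    return "".join(out)


print(uuencode_body(b"hat"))  # prints :&%T
```

The output for 'hat' matches the last column of Table 3-2.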
Description
The uuencode function has the following syntax:
varchar = uuencode(varchar input);
Returns
The function returns a UUencoded string. For example:
select uuencode ('hat');
uudecode
Decodes an ASCII value that was previously encoded using the Unix UUencode format.
Description
The uudecode function has the following syntax:
varchar = uudecode(varchar input);
Returns
The function returns a UUdecoded string. For example:
select uudecode (':&%T');
CHAPTER 4
Hashing
What's in this chapter
Hash Function Reference
Hashing functions are used to encode data, transforming the input into a hash code or hash value. The hash algorithm is designed to minimize the chance that two inputs will have the same hash value, termed a collision. Hashing functions are used to speed up the retrieval of data records (simple one-way lookups), for the validation of data (checksums), and for cryptography. For lookups, the hash code is used as an index into a hash table which contains a pointer to the data record. For checksums, the hash code is computed for the data before storage/transmission and then recomputed afterward to verify data integrity; if the hash codes do not match, the data is corrupt. Cryptographic hash functions are used for data security. Some common use cases for hashing functions include:
- Detect duplicated records. Because duplicates hash to the same bucket in the hash table, the task reduces to scanning buckets that have two or more records, a much faster method than sorting and comparing each record in the file. (With a hash function chosen so that similar keys hash to contiguous buckets, this same technique can be used to find similar records, because the search can be limited to those buckets.)
- Locate points that are near each other. Applying a hashing function to spatial data effectively partitions the space being modeled into a grid, and as in the previous example, the retrieval/comparison time is greatly reduced because only contiguous cells in the grid need to be searched. This same technique works for other types of spatial data, such as shapes and images.
- Verify message integrity. A hash (message digest) of the message is computed both before and after transmission, and the two hash values are compared to determine whether the message was corrupted.
- Verify passwords. During authentication, the user's login credentials are hashed and this value is compared with the hashed password stored for that user.
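The duplicate-detection use case can be sketched as follows. This Python fragment is a simplified illustration, not a description of how the Netezza system processes queries: records are grouped into buckets by their hash, and only buckets holding two or more records are examined. The function name, the sample records, and the choice of SHA-256 as the bucketing hash are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def find_duplicates(records):
    """Return the sorted list of record values that occur more than once."""
    buckets = defaultdict(list)
    for record in records:
        digest = hashlib.sha256(record.encode("utf-8")).digest()
        buckets[digest].append(record)

    duplicates = set()
    for group in buckets.values():
        if len(group) < 2:
            continue  # a bucket with one record cannot hold a duplicate
        for record in group:
            # Compare within the bucket in case two different records collided.
            if group.count(record) > 1:
                duplicates.add(record)
    return sorted(duplicates)


print(find_duplicates(["ann", "bob", "ann", "cy", "bob", "dee"]))
```

Only the buckets for 'ann' and 'bob' contain more than one record, so only those are compared.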
hash
Returns a 128 bit, 160 bit, or 256 bit hash of the input data, depending on the algorithm selected. This function provides between 2^128 and 2^256 distinct return values and is intended for cryptographic purposes.
- hash() is generally much slower to calculate than hash4() or hash8().
- The return type is a 16 to 32 byte binary varchar. This can make hash comparisons slower than a simple integer comparison.
- On the Netezza platform, a column of these hashes cannot make use of zone maps and other performance enhancements.
Description
The hash function has the following syntax:
varchar = hash(varchar data [, int algorithm]);
The data value specifies the varchar to hash. The algorithm value is specified by an integer code (defaults to 0). The available algorithms and the size of the resulting hash value are shown in the following table:
Table 4-1: Algorithms Supported for Cryptographic Hashing
Code   Description   Result
0      MD5           128 bit
1      SHA-1         160 bit
2      SHA-2         256 bit
Both the MD5 and SHA algorithms are message digest algorithms derived from MD4. The SHA (Secure Hash Algorithm) hash functions are the result of an effort by the National Security Agency (NSA) to provide strong cryptographic hashing capabilities. Security flaws have been identified in both SHA-1 and MD5. SHA-2 is still considered secure as of the publication date of this manual, but SHA-3 development is currently underway to prepare for any future security flaw discovered in SHA-2.
Returns
The function returns the hashed input data. For example:
select hash('Netezza',0);
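The result sizes in Table 4-1 can be checked with any standard implementation of these algorithms. The following Python sketch uses the standard hashlib module and assumes that the 256 bit SHA-2 variant is SHA-256. The code-to-algorithm mapping mirrors the table, but the function is a hypothetical stand-in, and its output is not guaranteed to match the byte layout returned by the Netezza hash function.

```python
import hashlib

# Algorithm codes from Table 4-1; "sha256" for code 2 is an assumption.
ALGORITHMS = {0: "md5", 1: "sha1", 2: "sha256"}


def digest(data: str, algorithm: int = 0) -> bytes:
    """Return the raw digest of data for the given algorithm code."""
    return hashlib.new(ALGORITHMS[algorithm], data.encode("utf-8")).digest()


for code in sorted(ALGORITHMS):
    size_bytes = len(digest("Netezza", code))
    print(code, ALGORITHMS[code], size_bytes * 8, "bit")
```

The digests are 16, 20, and 32 bytes long, which corresponds to the 16 to 32 byte binary varchar described above.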
hash4
Returns the 32 bit checksum hash of the input data. This function provides 2^32 (approximately 4 billion) distinct return values and is intended for data retrieval (lookups).
Description
The hash4 function has the following syntax:
int4 = hash4(varchar data [, int algorithm]);
The data value specifies the varchar to hash. The algorithm can be one of the following (defaults to Adler):
0  Adler
1  CRC32
Adler is the fastest checksum hash that is provided. However, it has poor coverage when the messages are less than a few hundred bytes (poor coverage means that the checksums use only part of the 32 bit range, so two different inputs are more likely to hash to the same value, referred to as a collision). In this case, use the CRC32 algorithm, or switch to hash8 instead.
Returns
The function returns the hashed input data. For example:
select hash4('Netezza',0);
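The coverage problem for short inputs follows from how the Adler checksum is built: it packs two running sums into the high and low 16 bits, and for a short message neither sum grows large. The following Python sketch implements the standard Adler-32 algorithm and compares it with zlib's version. It illustrates the algorithm only; the values are not claimed to match what hash4 returns.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16


def adler32(data: bytes) -> int:
    """Standard Adler-32: two running sums packed into 32 bits."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER  # sum of the bytes, plus one
        b = (b + a) % MOD_ADLER     # sum of the successive values of a
    return (b << 16) | a


short = b"Netezza"
print(hex(adler32(short)), hex(zlib.crc32(short)))
```

For a seven-byte input, the high-order bits of the Adler value stay at zero because the running sums are still small, while CRC32 has no such restriction on its 32 bit output.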
hash8
Returns the 64 bit hash of the input data. The function provides 2^64 distinct return values and is intended for data retrieval (lookups).
Description
The hash8 function has the following syntax:
int8 = hash8(varchar data [, int algorithm]);
The data value specifies the varchar to hash. Only one algorithm value is supported for this hashing function, 0, which indicates the Jenkins algorithm.
Returns
The function returns the hashed input data. For example:
select hash8('Netezza');
CHAPTER 5
Date and Time Comparisons
What's in this chapter
Date and Time Function Reference
There are three types associated with the date and time functions: date, time, and timestamp. The timestamp type is implicitly converted to date and time and can therefore be passed into any of the date/time functions. The date type is implicitly converted to type timestamp (but not time) and can therefore be supplied to any function that takes either a date or a timestamp. Values of type time cannot be converted into anything and therefore can only be supplied to functions that take this type. For example, although the signature for the next_month function indicates that the function takes an input value of type date, it is permissible to pass an input value of type timestamp into the next_month function.
day
Determine the day value in the specified date. Note: This can also be accomplished using the Netezza date_part() function.
Description
The day function has the following syntax:
int1 = day(date input);
Returns
Returns an integer representation of the day in the specified input. For example:
select day('1996-2-29');
days_between
Determine the truncated number of full days between two timestamps.
Description
The days_between function has the following syntax:
int = days_between(timestamp t1, timestamp t2);
The t1 value specifies the beginning timestamp. The t2 value specifies the ending timestamp.
Returns
Returns the truncated number of full days between t1 and t2. For example:
select days_between('1996-02-27 06:12:33' , '1996-03-01 07:12:33');
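The phrase "truncated number of full days" can be made concrete with a small Python sketch. It is an illustration of the arithmetic, not the Netezza implementation: the elapsed time is computed in seconds and divided by the length of the unit, discarding any remainder. The function name units_between is hypothetical, and truncation toward zero for a negative interval is an assumption, since the behavior when t2 precedes t1 is not stated here. The same arithmetic applies to hours, minutes, seconds, and weeks by changing the unit length.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S"
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400


def units_between(t1: str, t2: str, unit_seconds: int) -> int:
    """Number of full units from t1 to t2, truncated toward zero (assumed)."""
    delta = datetime.strptime(t2, FORMAT) - datetime.strptime(t1, FORMAT)
    seconds = delta.days * SECONDS_PER_DAY + delta.seconds
    full_units = abs(seconds) // unit_seconds
    return full_units if seconds >= 0 else -full_units


start, end = "1996-02-27 06:12:33", "1996-03-01 07:12:33"
print(units_between(start, end, SECONDS_PER_DAY))
print(units_between(start, end, SECONDS_PER_HOUR))
```

The interval in the example is three days and one hour (1996 is a leap year), so the sketch yields 3 full days and 73 full hours.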
hour
Determine the hours value in the specified time. Note: This can also be accomplished using the Netezza date_part function.
Description
The hour function has the following syntax:
int1 = hour(time input);
Returns
Returns an integer representation of the hour in the specified time. For example:
select hour ('01:12:55');
hours_between
Determine the truncated number of full hours between two timestamps.
Description
The hours_between function has the following syntax:
int = hours_between(timestamp t1, timestamp t2);
The t1 value specifies the beginning timestamp. The t2 value specifies the ending timestamp.
Returns
Returns the truncated number of full hours between t1 and t2. For example:
select hours_between('1996-02-27 06:12:33' , '1996-03-01 07:12:33');
minute
Determine the minutes value in the specified time. Note: This can also be accomplished using the Netezza date_part function.
Description
The minute function has the following syntax:
int1 = minute(time input);
Returns
Returns an integer representation of the minute in the specified time. For example:
select minute ('01:12:55');
minutes_between
Determine the truncated number of full minutes between two timestamps.
Description
The minutes_between function has the following syntax:
int = minutes_between(timestamp t1, timestamp t2);
The t1 value specifies the beginning timestamp. The t2 value specifies the ending timestamp.
Returns
Returns the truncated number of full minutes between t1 and t2. For example:
select minutes_between('1996-02-27 06:12:33' , '1996-02-27 07:12:00');
month
Determine the month in the specified date. Note: This can also be accomplished using the Netezza date_part function.
Description
The month function has the following syntax:
int1 = month(date input);
Returns
Returns an integer representation of the month in the specified input. For example:
select month('1996-2-29');
next_month
Determine the first day of the next month after the specified date.
Description
The next_month function has the following syntax:
date = next_month(date input);
Returns
Returns a date value representing the first day of the next month after the month specified by the input. For example:
select next_month('1996-2-29');
next_quarter
Determine the first day of the next quarter after the quarter specified by the input.
Description
The next_quarter function has the following syntax:
date = next_quarter(date input);
Returns
Returns a date value representing the first day of the next quarter after the quarter specified by the input. For example:
select next_quarter('1996-2-29');
next_year
Determine the first day of the next year after the year specified by the input.
Description
The next_year function has the following syntax:
date = next_year(date input);
Returns
Returns a date value representing the first day of the next year after the year specified by the input. For example:
select next_year('1996-2-29');
second
Determine the seconds value in the specified time. Note: This can also be accomplished using the Netezza date_part function.
Description
The second function has the following syntax:
int1 = second(time input);
Returns
Returns an integer representation of the seconds value in the specified time. For example:
select second ('01:12:55');
seconds_between
Determine the truncated number of full seconds between two timestamps.
Description
The seconds_between function has the following syntax:
int = seconds_between(timestamp t1, timestamp t2);
The t1 value specifies the beginning timestamp. The t2 value specifies the ending timestamp.
Returns
Returns the truncated number of full seconds between t1 and t2. For example:
select seconds_between('1996-02-27 06:12:33','1996-02-27 06:55:22');
this_month
Determine the first day of the month in the specified date. Note: This functionality is also provided by the Netezza date_trunc() function.
Description
The this_month function has the following syntax:
date = this_month(date input);
Returns
Returns a date representing the first day of the month specified by input. For example:
select this_month('1996-2-29');
this_quarter
Determine the first day of the quarter in which the specified date occurs.
Description
The this_quarter function has the following syntax:
date = this_quarter(date input);
Returns
Returns a date value representing the first day of the specified quarter. For example:
select this_quarter('1996-2-29');
this_week
Determine the first day of the week in the specified date.
Description
The this_week function has the following syntax:
date = this_week(date input);
Returns
Returns a date value representing the first day of the week specified by input. For example:
select this_week('1996-2-29');
this_year
Determine the first day of the year in the specified date. Note: This functionality is also provided by the Netezza date_trunc() function.
Description
The this_year function has the following syntax:
date = this_year(date input);
Returns
Returns a date value representing the first day of the year specified by input. For example:
select this_year('1996-2-29');
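The this_ and next_ functions for months, quarters, and years all reduce to finding the first day of a period. The following Python sketch shows one way to express that arithmetic. It is an illustration under the assumption that quarters are calendar quarters beginning in January, April, July, and October; the functions reuse the names from this chapter for readability but are not the Netezza implementations.

```python
from datetime import date


def this_month(d: date) -> date:
    return d.replace(day=1)


def next_month(d: date) -> date:
    # December rolls over into January of the following year.
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def this_quarter(d: date) -> date:
    # Months 1-3 map to 1, 4-6 to 4, 7-9 to 7, 10-12 to 10 (assumed calendar quarters).
    return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)


def next_quarter(d: date) -> date:
    start = this_quarter(d)
    return date(start.year + 1, 1, 1) if start.month == 10 else date(start.year, start.month + 3, 1)


def this_year(d: date) -> date:
    return date(d.year, 1, 1)


def next_year(d: date) -> date:
    return date(d.year + 1, 1, 1)


sample = date(1996, 2, 29)
for fn in (this_month, next_month, this_quarter, next_quarter, this_year, next_year):
    print(fn.__name__, fn(sample).isoformat())
```

For the date used in the examples, 1996-02-29, the sketch gives 1996-02-01 and 1996-03-01 for the month, 1996-01-01 and 1996-04-01 for the quarter, and 1996-01-01 and 1997-01-01 for the year.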
weeks_between
Determine the truncated number of full weeks between two timestamps.
Description
The weeks_between function has the following syntax:
int = weeks_between(timestamp t1, timestamp t2);
The t1 value specifies the beginning timestamp. The t2 value specifies the ending timestamp.
Returns
Returns the truncated number of full weeks between t1 and t2. For example:
select weeks_between('1996-02-27 06:12:33' , '1996-03-05 07:12:33');
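The "truncated number of full units" wording used by seconds_between and weeks_between can be illustrated with a small Python model. The helper name full_units_between is hypothetical, and truncating toward zero when t2 precedes t1 is an assumption; the text only describes the case where t1 is the earlier timestamp.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def full_units_between(t1: str, t2: str, unit_seconds: int) -> int:
    """Whole units of unit_seconds from t1 to t2, truncated toward zero."""
    delta = datetime.strptime(t2, FMT) - datetime.strptime(t1, FMT)
    return int(delta.total_seconds() / unit_seconds)

def seconds_between(t1: str, t2: str) -> int:
    return full_units_between(t1, t2, 1)

def weeks_between(t1: str, t2: str) -> int:
    return full_units_between(t1, t2, 7 * 24 * 3600)

# 42 minutes 49 seconds apart
print(seconds_between('1996-02-27 06:12:33', '1996-02-27 06:55:22'))  # 2569
# 7 days 1 hour apart: one full week, the extra hour is discarded
print(weeks_between('1996-02-27 06:12:33', '1996-03-05 07:12:33'))    # 1
```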
year
Determine the year in the specified date. Note: This can also be accomplished using the Netezza date_part function.
Description
The year function has the following syntax:
int2 = year(date input);
Returns
Returns an integer representation of the year in the specified date. For example:
select year('1996-2-29');
CHAPTER 6
Text Analytics
What's in this chapter
Word Comparison Function Reference
Regular Expression Function Reference
The functions in this chapter fall into two distinct groupings. The word comparison functions are useful for fuzzy comparisons, finding records in a database that approximately match a search key, phonetically or lexically. The regular expression functions identify precise patterns of characters and are useful for data validation, for example, type checks, range checks, and checks for illegal characters.
word_diff
Finds the number of modifications that are required to change the first string into the second string. Adding, deleting, substituting, or changing the case of a single character in the string each count as one modification. Transposing two adjacent characters counts as two modifications in all but the Damerau-Levenshtein algorithm, which counts transposition as a single modification. Note: Using the word_diff function with the Soundex or Double-Metaphone algorithms achieves the same result as using the combination of the word_key function to convert the strings to their phonetic encodings and then using the word_keys_diff function to compare those encodings. The word_diff function both converts the strings to their phonetic encodings and compares those encodings.
Description
The word_diff function has the following syntax:
int1 = word_diff(varchar word1, varchar word2 [, int algorithm]);
The word1 value specifies the first word in the comparison. The word2 value specifies the second word in the comparison. The algorithm value is one of the following:
0 Soundex-Miracode
1 Soundex-Simplified
2 Soundex-SQLServer
3 Double-Metaphone (default if no algorithm given)
10 Levenshtein
11 Damerau-Levenshtein
Note: The built-in Netezza le_dst() function is equivalent to using the word_diff() function with the Levenshtein algorithm. The built-in Netezza dle_dst() function is equivalent to using the word_diff() function with the Damerau-Levenshtein algorithm.
Returns
Returns an integer that indicates how similar or different the two strings are. A value of 0 indicates the strings are the same. The results vary depending on the algorithm chosen. For example:
select word_diff('anderson','andrsn',0);
This example returns 0, because the Soundex algorithms consider only the initial vowel, not subsequent vowels. Suppose the algorithm is changed to Damerau-Levenshtein, as in the following example:
select word_diff('anderson','andrsn',11);
This call returns 2, because Damerau-Levenshtein accounts for the missing vowels e and o in the second string.
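The counting rules described for the Levenshtein and Damerau-Levenshtein algorithms can be sketched in Python as follows. This is an illustrative model, not the toolkit's code: the function name edit_distance is hypothetical, and the transposition handling uses the common "optimal string alignment" form of Damerau-Levenshtein, which is an assumption about the variant.

```python
def edit_distance(a: str, b: str, transpositions: bool = False) -> int:
    """Levenshtein distance between a and b.

    With transpositions=True, swapping two adjacent characters costs 1
    (optimal-string-alignment form of Damerau-Levenshtein) instead of 2.
    The comparison is case-sensitive, so a case change is a substitution.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                      # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j                      # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]

print(edit_distance('anderson', 'andrsn', transpositions=True))  # 2
print(edit_distance('form', 'from'))                             # 2
print(edit_distance('form', 'from', transpositions=True))        # 1
```

The last two calls show the difference called out in the text: a transposition counts as two modifications under plain Levenshtein and as one under Damerau-Levenshtein.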
word_find
Searches the input varchar text for the first word that matches the input parameter word within the specified tolerance.
Description
The word_find function has the following syntax:
int4 = word_find(varchar word, varchar text, int1 difference [, int algorithm1 [, int algorithm2 [, int algorithm3]]]);
The word value specifies the word you want to search for in text. The text value specifies the varchar text to search. The difference value specifies the tolerance used by each specified algorithm when searching for a match. Each specified algorithm is used to try to find a match within the tolerance defined by difference. If no algorithms are specified, or if the only algorithm specified is a stemming algorithm, then an exact (case-insensitive) match is required. Each algorithm value is one of the following:
Returns
Returns the position of the first character of the matching string. For example:
select word_find('swimming', 'she swims in the competition in red wsimwear', 0, 11, 100);
word_key
Phonetically encode a word, according to its pronunciation in English, using the Double Metaphone algorithm or one of the three supported varieties of the Soundex algorithm. The phonetically encoded words can subsequently be compared with the word_keys_diff function for a fuzzy comparison. Words with the same pronunciation but different spellings are encoded the same; depending on the algorithm selected, similar sounding words might also be encoded the same. The goal is to enable you to match names based on their pronunciation and reduce misses that might result from spelling variations. For example, this type of fuzzy comparison can be used to find duplicate records resulting from spelling errors; another use is to find ancestor names in a genealogical database when the spelling has changed slightly over time. The phonetic matching functions are case-insensitive comparisons: the phonetic representations are the same for two strings that have the same spelling but different letter casing. The functions ignore any characters outside the ASCII subset.
Description
The word_key function has the following syntax:
int4 = word_key(varchar word [, int algorithm]);
The word value specifies the varchar word to be given a phonetic encoding.
The algorithm value is specified by an integer code (defaults to 3). The available algorithms are listed in the following table:
Table 6-1: Algorithms Supported for Phonetic Encoding
Code  Name  Description
0  Soundex-Miracode  The original Soundex algorithm used to encode surnames in the United States census between 1880 and 1930. All surnames are encoded as a four-character string: the first character represents the first letter of the person's last name, and characters two, three, and four are integer encodings for the remaining consonants in the name, ignoring vowels, collapsing duplicate encodings to a single value, and right-padding with zeroes if necessary.
1  Soundex-Simplified  An updated form of the original Soundex algorithm, it is identical to Miracode except that it does not encode H or W.
2  Soundex-SQLServer  The version of the Soundex algorithm implemented in Microsoft SQL Server. It does not encode H or W, and similarity grouping starts after the first character.
3  Double-Metaphone  Encodes most English words, not just names. The algorithm better quantifies the rules of English pronunciation and also recognizes a subset of non-Latin characters, making it a much better choice than Soundex (it is the algorithm used by most spell checkers). Whereas Soundex encodes all names with a key of the same length, Double-Metaphone outputs variable-length encodings that more accurately represent the sounds of the word. The algorithm also handles the case in which a word has an alternate pronunciation by returning a primary and a secondary encoding.
Note: The Netezza built-in dbl_mp() function is equivalent to using the word_key() function with the Double Metaphone algorithm. The Netezza built-in nysiis() function is roughly equivalent to using the word_key() function with the Soundex-Simplified algorithm.
Returns
The function returns the word_key code of a word as an integer. These codes can be compared using the word_keys_diff() function. For example:
select word_key('persistent',1);
word_key_tochar
Returns the varchar representation of the phonetic encoding produced by the word_key function.
Description
The word_key_tochar function has the following syntax:
varchar = word_key_tochar(int wordkey [, int algorithm]);
The wordkey value specifies the word_key encoding to be given a varchar representation. The algorithm value is one of the following:
0 Soundex-Miracode
1 Soundex-Simplified
2 Soundex-SQLServer
3 Double-Metaphone (default if no algorithm given)
Returns
The call word_key_tochar(word_key('Ashcroft', 0), 0) returns A261. For example:
select word_key_tochar(word_key('PERsisteNT',2),2);
word_keys_diff
Computes the lexical difference between phonetic encodings produced by the word_key function. Note: Soundex word keys can be compared for an exact match by comparing the int4 keys directly without using this function.
Description
The word_keys_diff function has the following syntax:
int1 = word_keys_diff(int4 wordkey1, int4 wordkey2 [, int algorithm]);
The wordkey1 value specifies the first word_key encoding in the comparison. The wordkey2 value specifies the second word_key encoding in the comparison. The algorithm value is one of the following:
0 Soundex-Miracode
1 Soundex-Simplified
2 Soundex-SQLServer
3 Double-Metaphone (default if no algorithm given)
Returns
Soundex will return a value between 0 and 4. 0 represents an exact match. 1-4 represent increasing degrees of inexactness. For example:
select word_keys_diff(word_key('Johnson',0),word_key('Jeppeson',0),0);
This example returns 1 because the two soundex encodings differ by 1 character; the soundex code for Johnson is J525 and the soundex code for Jeppeson is J125.
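The Soundex codes quoted in this chapter (A261, J525, J125) can be reproduced with a short Python sketch of the classic Soundex rules, including the rule that H and W do not separate consonants with the same code. This is an illustrative model that is assumed, not confirmed, to correspond to the Soundex-Miracode option. It produces the four-character text form rather than the int4 key that word_key returns, and the names soundex and keys_diff are hypothetical.

```python
# Letter-to-digit groups of classic Soundex; vowels, H, W, and Y have no code.
CODES = {letter: digit
         for digit, letters in {'1': 'BFPV', '2': 'CGJKQSXZ', '3': 'DT',
                                '4': 'L', '5': 'MN', '6': 'R'}.items()
         for letter in letters}

def soundex(word: str) -> str:
    """Four-character Soundex key; non-ASCII characters are ignored."""
    letters = [c for c in word.upper() if 'A' <= c <= 'Z']
    if not letters:
        return ''
    key = letters[0]
    prev = CODES.get(letters[0], '')
    for c in letters[1:]:
        if c in 'HW':
            continue              # H and W do not separate equal codes
        code = CODES.get(c, '')
        if code and code != prev:
            key += code
        prev = code               # a vowel resets prev, so a repeat is kept
    return (key + '000')[:4]      # right-pad with zeroes, keep four characters

def keys_diff(key1: str, key2: str) -> int:
    """Number of positions at which two equal-length keys differ."""
    return sum(x != y for x, y in zip(key1, key2))

print(soundex('Ashcroft'))                              # A261
print(soundex('Johnson'), soundex('Jeppeson'))          # J525 J125
print(keys_diff(soundex('Johnson'), soundex('Jeppeson')))  # 1
```

The same model encodes both anderson and andrsn as A536, which is consistent with the word_diff example that returns 0 for the Soundex algorithm.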
word_stem
Returns the root stem of the given varchar word. For example, fishing, fished, and fisher all return fish.
Description
The word_stem function has the following syntax:
varchar = word_stem(varchar word [, int algorithm]);
The word value specifies the varchar word whose root stem you want. The algorithm value has just one option, 100, which indicates the Porter algorithm. This is the default, so no algorithm need be specified.
Returns
The function returns the root stem of the given varchar word. For example:
select word_stem('fishing');
select word_stem('fisher');
regexp_extract
Pulls out the matching text item. Note: Analogous to the REGEXP_SUBSTR() function provided by some vendors.
Description
The regexp_extract function has the following syntax:
varchar = regexp_extract(varchar input, varchar pattern [, int start_pos [, int reference]] [, varchar flags]);
The input value specifies the varchar on which the regular expression is processed. The pattern value specifies the regular expression. The start_pos value specifies the character position at which to start the search (defaults to position 1). The reference value specifies which instance of the pattern to extract (defaults to 1). For a description of flags, see The Flags Argument on page 6-6.
Returns
For example:
select regexp_extract('hello to you', '.o',1,1);
select regexp_extract('hello to you', '.o',1,2);
select regexp_extract('hello to you', '.o',1,3);
The first example returns lo, the second returns to, and the third returns yo.
regexp_extract_all
Pulls out all the matching text items and returns them in a varchar array.
Description
The regexp_extract_all function has the following syntax:
array(varchar) = regexp_extract_all(varchar input, varchar pattern [, int start_pos] [, varchar flags]);
The input value specifies the varchar on which the regular expression is processed. The pattern value specifies the regular expression. The start_pos value specifies the character position at which to start the extract (defaults to position 1). For a description of flags, see The Flags Argument on page 6-6.
Returns
For example:
select array_combine(regexp_extract_all('Steven .Stephen are best player','Ste(v|ph)en'),'|');
regexp_extract_all_sp
Processes the specified regular expression on the varchar input. All sub-patterns are returned in an array with the first element (element 0) corresponding to the full match.
Description
The regexp_extract_all_sp function has the following syntax:
array(varchar) = regexp_extract_all_sp(varchar input, varchar pattern [, int start_pos][, varchar flags]);
The input value specifies the varchar on which the regular expression is processed. The pattern value specifies the regular expression. The start_pos value specifies the character position at which to start the extract (defaults to position 1). For a description of flags, see The Flags Argument on page 6-6.
Returns
For example:
select array_combine(regexp_extract_all_sp('Robert Szissel, 128 Folson St, Boston', '([^,]*),[[:space:][:digit:]]*([^[:space:]]*).*,[[:space:]]*(.*)'),'|' );
This example returns Robert Szissel, 128 Folson St, Boston|Robert Szissel|Folson|Boston
regexp_extract_sp
Processes the specified regular expression on the varchar input, returning the specified sub-pattern.
Description
The regexp_extract_sp function has the following syntax:
varchar = regexp_extract_sp(varchar input, varchar pattern , int start_pos , int reference[, varchar flags]);
The input value specifies the varchar on which the regular expression is processed. The pattern value specifies the regular expression. The start_pos value specifies the character position at which to start the extract (defaults to position 1). The reference value specifies which instance of the pattern to extract. For a description of flags, see The Flags Argument on page 6-6.
Returns
For example, consider the following table:
create table sample(col1 varchar(20));
CREATE TABLE
insert into sample values('bcaaabc');
INSERT 0 1
insert into sample values('abcbc');
INSERT 0 1
insert into sample values('bbb');
INSERT 0 1
insert into sample values('bcd');
INSERT 0 1
insert into sample values('bccdebc');
INSERT 0 1
insert into sample values('def');
INSERT 0 1
insert into sample values('efgbcbc');
INSERT 0 1
regexp_instr
Pulls out the index of the matching text item.
Description
The regexp_instr function has the following syntax:
int = regexp_instr(varchar input, varchar pattern [, int start_pos [, int reference]] [, varchar flags]);
The input value specifies the varchar on which the regular expression is processed. The pattern value specifies the regular expression. The start_pos value specifies the character position at which to start the search for a match (defaults to position 1). The reference value indicates a specific instance of the pattern. For a description of flags, see The Flags Argument on page 6-6.
Returns
If there is no match, or if there are fewer than reference occurrences of the pattern, the function returns 0. For example:
select regexp_instr('hello to you', '.o',1,1);
select regexp_instr('hello to you', '.o',1,2);
select regexp_instr('hello to you', '.o',1,3);
The first example returns 4, the second returns 7, and the third returns 10.
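The way start_pos and reference select one occurrence of a pattern can be modeled with Python's re module. The sketch below is illustrative only: the helper name nth_match is hypothetical, and Python's regular expression dialect differs from the toolkit's (for instance, POSIX classes such as [[:digit:]] are not supported by re). It returns both the 1-based position, as regexp_instr does, and the matched text, as regexp_extract does.

```python
import re

def nth_match(text: str, pattern: str, start_pos: int = 1, reference: int = 1):
    """Return (1-based position, matched text) of the reference-th match
    found at or after start_pos, or (0, None) if there are fewer matches."""
    regex = re.compile(pattern)
    matches = regex.finditer(text, start_pos - 1)   # convert to 0-based offset
    for n, m in enumerate(matches, start=1):
        if n == reference:
            return m.start() + 1, m.group(0)
    return 0, None

for ref in (1, 2, 3, 4):
    print(nth_match('hello to you', '.o', 1, ref))
# (4, 'lo')
# (7, 'to')
# (10, 'yo')
# (0, None)
```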
regexp_like
Returns true if there is at least one matching occurrence in input.
Description
The regexp_like function has the following syntax:
bool = regexp_like(varchar input, varchar pattern [, int start_pos] [, varchar flags]);
The input value specifies the varchar on which the regular expression is processed. The pattern value specifies the regular expression. The start_pos value specifies the character position at which to start the search for a match (defaults to position 1). For a description of flags, see The Flags Argument on page 6-6.
Returns
For example:
select regexp_like('my password is 09124 or 069az6','[0-9][^0-9]+[09]$');
regexp_match_count
Returns the number of matching occurrences in input.
Description
The regexp_match_count function has the following syntax:
int = regexp_match_count(varchar input, varchar pattern [, int start_pos] [, varchar flags]);
The input value specifies the varchar on which the regular expression is processed. The pattern value specifies the regular expression. The start_pos value specifies the character position at which to start the search for a match (defaults to position 1). For a description of flags, see The Flags Argument on page 6-6.
Returns
For example:
select regexp_match_count('Steven Jones and Stephen Smith are the best players','Ste(v|ph)en');
regexp_replace
Replaces each instance of pattern in input with the value in the varchar replacement.
Description
The regexp_replace function has the following syntax:
varchar = regexp_replace(varchar input, varchar pattern, varchar replacement [, int start_pos [, int reference]] [, varchar flags]);
The input value specifies the varchar on which the regular expression is processed. The pattern value specifies the regular expression. The replacement value specifies the value to substitute for each instance of pattern. The start_pos value specifies the character position at which to start the replace (defaults to position 1). The reference value specifies which instance of the pattern to replace. For a description of flags, see The Flags Argument on page 6-6.
Returns
If reference is set to 0 (or not specified) then all occurrences of the string will be replaced. For example:
select regexp_replace('Awake! Fear, Fire, Foes!','Foes','Flee');
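The reference rule stated above (0 or omitted replaces every occurrence, otherwise only the given occurrence) can be sketched with Python's re module. This is a hedged model rather than the toolkit's implementation: the Python function below only borrows the name regexp_replace, it omits the flags argument, and it uses Python's regex and replacement-template syntax, which is an assumption.

```python
import re

def regexp_replace(text: str, pattern: str, replacement: str,
                   start_pos: int = 1, reference: int = 0) -> str:
    """Replace matches found at or after start_pos (1-based).

    reference == 0 replaces every match; reference == n replaces only
    the n-th match and leaves the others untouched.
    """
    regex = re.compile(pattern)
    head, tail = text[:start_pos - 1], text[start_pos - 1:]
    seen = 0

    def pick(match):
        nonlocal seen
        seen += 1
        if reference in (0, seen):
            return match.expand(replacement)
        return match.group(0)          # keep this occurrence as it was

    return head + regex.sub(pick, tail)

print(regexp_replace('Awake! Fear, Fire, Foes!', 'Foes', 'Flee'))
# Awake! Fear, Fire, Flee!
print(regexp_replace('a-b-c', '-', '+', 1, 2))
# a-b+c
```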
regexp_replace_sp
Processes the specified regular expression on the varchar input and replaces each instance of a sub-pattern with the values in the array replacements.
Description
The regexp_replace_sp function has the following syntax:
varchar = regexp_replace_sp(varchar input, varchar pattern, array replacements [, int start_pos] [, varchar flags]);
The input value specifies the varchar on which the regular expression is processed. The pattern value specifies the regular expression. The replacements array specifies the values to substitute for each instance of the sub-pattern. The start_pos value specifies the character position at which to start the replace (defaults to position 1).
Returns
For example:
select regexp_replace_sp('Robert Szissel, 128 Folson St, Boston', '([[:digit:]]+)[^.]*,.*(Boston)', array_split('37000,Cleveland', ','));
CHAPTER 7
Text Utility
What's in this chapter
Text Utility Function Reference
The text utility functions in this chapter enable you to convert between ASCII hexadecimal and ASCII, substitute substrings, and extract substrings.
hextoraw
Interprets each pair of characters (left to right) in the input varchar as the hexadecimal code for an ASCII character and converts the hexadecimal sequence into a character string.
Description
The hextoraw function has the following syntax:
varchar = hextoraw(varchar input);
Returns
For example:
SELECT hextoraw('68656C6C6f');
rawtohex
Converts a character string into the ASCII hexadecimal representation.
Description
The rawtohex function has the following syntax:
varchar = rawtohex(varchar input);
Returns
For example:
SELECT rawtohex('hello');
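The pairwise hexadecimal conversion performed by hextoraw and rawtohex can be illustrated in Python. This sketch assumes pure ASCII input, and the lowercase hexadecimal output is a property of Python's bytes.hex(); the letter case that the toolkit produces is not stated in the text.

```python
def hextoraw(hex_text: str) -> str:
    """Read each pair of hex digits as one ASCII character code."""
    return bytes.fromhex(hex_text).decode('ascii')

def rawtohex(text: str) -> str:
    """Write each ASCII character as two hex digits (lowercase here)."""
    return text.encode('ascii').hex()

print(hextoraw('68656C6C6f'))  # hello
print(rawtohex('hello'))       # 68656c6c6f
```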
replace
Replaces each instance of pattern in input with the value in the varchar replacement.
Description
The replace function has the following syntax:
varchar = replace(varchar input, varchar pattern, varchar replacement);
The input value specifies the varchar in which the characters are replaced. The pattern value specifies the characters to replace. The replacement value specifies the characters to substitute for each instance of pattern.
Returns
For example:
select replace('persisaent','a','t');
strleft
Returns the left-most n characters from the varchar input.
Description
The strleft function has the following syntax:
varchar = strleft(varchar input, int n);
The input value specifies the varchar from which the characters are returned. The n value specifies the number of characters to return.
Returns
For example:
select strleft('1234567891',5);
strright
Returns the right-most n characters from the varchar input.
Description
The strright function has the following syntax:
varchar = strright(varchar input, int n);
The input value specifies the varchar from which the characters are returned. The n value specifies the number of characters to return.
Returns
For example:
select strright('1234567891',5);
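For comparison, strleft and strright correspond to simple slices in Python. The sketch below is illustrative; returning an empty string for n less than or equal to 0 and the whole string when n exceeds its length are assumptions, because the text does not describe those cases.

```python
def strleft(text: str, n: int) -> str:
    """Left-most n characters (empty string if n <= 0; an assumption)."""
    return text[:max(n, 0)]

def strright(text: str, n: int) -> str:
    """Right-most n characters (empty string if n <= 0; an assumption)."""
    return text[-n:] if n > 0 else ''

print(strleft('1234567891', 5))   # 12345
print(strright('1234567891', 5))  # 67891
```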
CHAPTER 8
Array
What's in this chapter
Array Function Reference
The array functions in the Netezza SQL Extensions Toolkit rely on the array data type. Because the Netezza database currently does not support user-defined types, the array type is stored in a varchar field. The maximum size of a varchar field is 64000 bytes. The array type consists of a sequence of name-value pairs. Names can be a maximum of 40 characters in width. Values can be any built-in SQL type, but must be the same type for the entire array. Elements can be referenced by either name or by the 1-based index.
add_element
Appends a new array element to the end of the input array and assigns it the specified value. This is an overloaded function, with 7 forms corresponding to the 7 data types.
Description
The syntax of the add_element function has seven forms, one for each data type:
array = add_element(array input, varchar value [, varchar name])
array = add_element(array input, nvarchar value [, varchar name])
array = add_element(array input, int8 value [, varchar name])
array = add_element(array input, double value [, varchar name])
array = add_element(array input, time value [, varchar name])
array = add_element(array input, date value [, varchar name])
array = add_element(array input, timestamp value [, varchar name]);
The input value specifies the array to which the element is appended. The value value specifies the value to store in the new array element. The optional name value specifies the name of the array element being appended.
Returns
For example:
add_element(my_array, 45)
Assuming my_array has four elements, this example appends a fifth element to the end of the array and stores the value 45 in that element.
array
Creates an array of the given type.
Description
The array function has the following syntax:
array = array(int type);
The type value specifies the type of array to create. The type takes an integer code between 1 and 11 that indicates the type, as shown in the following table:
Table 8-1: Array Types
Code 1. Size: 8 bit
Code 2. Size: 16 bit
Code 3. Size: 32 bit
Code 4. Size: 64 bit
Code 5. Size: Ranging from January 1, 0001, to December 31, 9999. Disk Usage: 4 bytes
Code 6. Size: Hours, minutes, and seconds to 6 decimal positions. Ranging from 00:00:00.000000 to 23:59:59.999999. Disk Usage: 8 bytes
Code 7. Type: Timestamp. Size: Has a date part and a time part, with seconds stored to 6 decimal positions. Ranging from January 1, 0001 00:00:00.000000 to December 31, 9999 23:59:59.999999. Disk Usage: 8 bytes
Code 8. Type: Varchar. Size: Variable length to a maximum length of n. No blank padding, stored as entered. The maximum character string size is 64,000. Uses N+2 or fewer bytes depending on the data.
Code 9. Type: NvarChar. Size: Variable-length Unicode data with a maximum length of 16000 characters. Using UTF-8 encoding, each Unicode code point requires 1-4 bytes of storage. So a 10-character string requires 10 bytes of storage if it is ASCII, up to 20 bytes if it is Latin, or as many as 40 bytes if it is pure Kanji (but typically 30 bytes).
Code 10. Type: Float. Size: Floating point number with precision 1 to 15. Precision less than 6 uses 4 bytes. Precision between 7 and 15 uses 8 bytes.
Code 11. Type: Double. Size: Equivalent to float with precision 15, using 8 bytes.
Returns
For example:
create table array_t(col1 int,col2 varchar(100));
array_combine
Combines the array elements in the array input into a single varchar delimited by delimiter.
Description
The array_combine function has the following syntax:
varchar = array_combine(array input, char delimiter);
The input value specifies the array to decompose into a single varchar. The delimiter value specifies the delimiter that distinguishes the array elements.
Returns
For example:
select array_combine(col2,'|')from array_t;
array_concat
Concatenates two arrays, creating a new array that contains all the elements in the first array followed by all the elements in the second array. Note: The two arrays must be of the same type and element names cannot be the same.
Description
The array_concat function has the following syntax:
array = array_concat(array array1, array array2);
The array1 value specifies the first of the two arrays to concatenate. The array2 value specifies the second of the two arrays to concatenate.
Returns
For example:
select (array_concat (array(2),array(2)));
array_count
Returns the number of elements in the array.
Description
The array_count function has the following syntax:
int = array_count(array input);
Returns
For example:
select array_count(col2)from array_t;
array_split
Parses the input for elements separated by a delimiter to create an array.
Description
The array_split function has the following syntax:
array = array_split(varchar input, varchar delimiter [, int type]);
The input value specifies a character delimited list of elements. The delimiter value specifies the delimiter used in the input. The optional type value specifies the type of the array; the type defaults to varchar.
Returns
For example:
select array_combine(array_split('1,2,3,4,5,6,7,8',','),'|');
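To make the array model described at the start of this chapter concrete, the following Python sketch represents an array as a list of (name, value) pairs and mirrors a few of the functions by name. It is purely illustrative: the toolkit stores arrays in a varchar field in a format the text does not describe, and the get_value helper here is a hypothetical stand-in for the typed get_value_type functions.

```python
def array_split(text, delimiter):
    """Build an array of unnamed elements from a delimited string."""
    return [(None, value) for value in text.split(delimiter)]

def add_element(array, value, name=None):
    """Return a new array with one element appended at the end."""
    return array + [(name, value)]

def array_combine(array, delimiter):
    """Join the element values into a single delimited string."""
    return delimiter.join(str(value) for _, value in array)

def array_count(array):
    return len(array)

def get_value(array, key):
    """Look an element up by 1-based index (int) or by name (str)."""
    if isinstance(key, int):
        return array[key - 1][1]
    for name, value in array:
        if name == key:
            return value
    raise KeyError(key)

arr = array_split('1,2,3,4,5,6,7,8', ',')
print(array_combine(arr, '|'))          # 1|2|3|4|5|6|7|8
arr = add_element(arr, '9', 'last')
print(array_count(arr))                 # 9
print(get_value(arr, 1), get_value(arr, 'last'))  # 1 9
```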
array_type
Returns the type of the array.
Description
The array_type function has the following syntax:
int = array_type(array input);
The input value specifies the array for which to get the type.
Returns
For example:
select array_type(array(4));
This example returns 4. The following second example determines the type of an array that is stored in a table:
select array_type(col2)from array_t;
delete_element
Deletes an element from the input array.
Description
The syntax for the delete_element function supports deleting by name or by index:
array = delete_element(array input, int index);
array = delete_element(array input, varchar name);
The input value specifies the array which contains the element to delete. The index value specifies the index of the element to delete from the input array. The name value specifies the name of the element to delete from the input array.
Returns
For example:
select delete_element(col2,1)from array_t;
element_name
Returns the name of an element if it exists.
Description
The element_name function has the following syntax:
varchar = element_name(array input, int index);
The input value specifies the array which contains the named element. The index value specifies the element for which to retrieve the name.
Returns
For example:
select element_name(add_element(array(4),4,'Netezza'),1);
get_value_type
Retrieves the value stored in the specified array element. The name of the function is of the form get_value_type, where type is the data type of the element to retrieve, for example get_value_varchar. There are seven data types, but there are two versions of the function for each data type, enabling you to retrieve array elements by index or by name.
Description
The get_value_type function has the following syntax:
varchar = get_value_varchar(array input, int index);
varchar = get_value_varchar(array input, varchar name);
nvarchar = get_value_nvarchar(array input, int index);
nvarchar = get_value_nvarchar(array input, varchar name);
int8 = get_value_int(array input, int index);
int8 = get_value_int(array input, varchar name);
double = get_value_double(array input, int index);
double = get_value_double(array input, varchar name);
time = get_value_time(array input, int index);
time = get_value_time(array input, varchar name);
date = get_value_date(array input, int index);
date = get_value_date(array input, varchar name);
time_tz = get_value_timestamp(array input, int index);
time_tz = get_value_timestamp(array input, varchar name);
The input value specifies the array which contains the element to retrieve. The index value specifies the index of the element to retrieve from the input array. The name value specifies the name of the element to retrieve from the input array.
Returns
This function attempts to perform type conversion if the specified element is of a different type than the function returns. If unsuccessful in conversion, or if the element does not exist, it will return an error. For example:
select get_value_int(col2,1)from array_t;
replace_element
Replaces an array element in the input array. This is an overloaded function, with 14 forms corresponding to the 7 data types (by name or by array index).
Description
The syntax of the replace_element function has 14 variations, two for each of the 7 data types (one for referencing an element by name and one for referencing an element by index):
array = replace_element(array input, int index, varchar value)
array = replace_element(array input, varchar name, varchar value)
array = replace_element(array input, int index, nvarchar value)
array = replace_element(array input, varchar name, nvarchar value)
array = replace_element(array input, int index, int8 value)
array = replace_element(array input, varchar name, int8 value)
array = replace_element(array input, int index, double value)
array = replace_element(array input, varchar name, double value)
array = replace_element(array input, int index, time value)
array = replace_element(array input, varchar name, time value)
array = replace_element(array input, int index, date value)
array = replace_element(array input, varchar name, date value)
array = replace_element(array input, int index, timestamp value)
array = replace_element(array input, varchar name, timestamp value);
The input value specifies the array in which the element is replaced. The index value specifies the position in the array at which the element is replaced. The name value specifies the name of the array element to replace. The value value specifies the new value for the specified array element.
Returns
For example:
select replace_element(col2,1,15)from array_t;
CHAPTER 9
Collection
What's in this chapter
User Type Collection
Collection Function Reference
Collections are useful for grouping together heterogeneous information; in other words, information of different data types can be stored in each element in the collection, unlike arrays in which each element must be of the same data type.
collection
Creates an empty collection.
Description
The collection function has the following syntax:
collection = collection();
Returns
For example:
create table collection_t(col1 int, col2 varchar(100));
element_type
Returns the type of the collection element.
Description
The element_type function has the following syntax:
int = element_type(collection input, int index);
int = element_type(collection input, varchar name);
The input value specifies the collection. The index value specifies the index of the element to find the type of. The name value specifies the name of the element to find the type of.
Returns
For example:
select element_type(col2,1)from collection_t;
CHAPTER 10
Miscellaneous
What's in this chapter
Miscellaneous Function Reference
This chapter contains those functions that do not fit neatly into the functional groupings in the preceding chapters of this manual.
greatest
Returns the largest of the input values, up to a maximum of four (variable length lists are not supported).
Description
The syntax of the function has three forms, depending on the data type of the values being compared:
int4 = Greatest(int4 value1, int4 value2, ...);
int8 = Greatest(int8 value1, int8 value2, ...);
double = Greatest(double value1, double value2, ...);
The value1 value specifies the first input to compare. The value2 value specifies the second input to compare. The value3 value specifies the third input to compare. The value4 value specifies the fourth input to compare.
Returns
For example:
select greatest(12,45,85);
least
Returns the smallest of the input parameters, up to a maximum of four (variable length lists are not supported).
Description
The syntax of the function has three forms, depending on the data type of the values being compared:
int4 = Least(int4 value1, int4 value2, ...);
int8 = Least(int8 value1, int8 value2, ...);
double = Least(double value1, double value2, ...);
The value1 value specifies the first input to compare. The value2 value specifies the second input to compare. The value3 value specifies the third input to compare. The value4 value specifies the fourth input to compare.
Returns
For example:
select least(14,45,75);
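The fixed-arity behavior of greatest and least can be sketched outside the database as follows. This is only an illustrative Python stand-in, not the toolkit's implementation: the lower bound of two arguments is an assumption read from the syntax lines, and NULL handling is not modeled because the text above does not describe it.

```python
from typing import Union

Number = Union[int, float]


def greatest(*values: Number) -> Number:
    """Illustrative stand-in: largest of two to four numeric inputs."""
    # Assumption: at least two inputs; the manual states a maximum of four.
    if not 2 <= len(values) <= 4:
        raise ValueError("expected between two and four values")
    return max(values)


def least(*values: Number) -> Number:
    """Illustrative stand-in: smallest of two to four numeric inputs."""
    if not 2 <= len(values) <= 4:
        raise ValueError("expected between two and four values")
    return min(values)


print(greatest(12, 45, 85))  # 85
print(least(14, 45, 75))     # 14
```

The two calls mirror the SQL examples shown for greatest and least; a fifth argument raises an error in this sketch, reflecting the stated limit of four inputs.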
mt_random
Returns a pseudo-random number between 0.0 and 1.0 using the Mersenne Twister pseudo-random number generator, an open source library that quickly generates high-quality pseudo-random numbers with a period of 2^19937 - 1 and very good distribution. The pseudo-random numbers are well suited to simulations, such as Monte Carlo simulations, and to polling, for example providing a random sample of 1000 records from a table of one million records.
This algorithm by itself is not suitable for cryptography, because observing as few as 624 iterations is enough to predict all future iterations. Wrapping this function with a hash function is likely sufficient to provide cryptographically secure random numbers.
Note: NPS offers a built-in random() function, which is based on the Linear Congruential Generator algorithm. The Mersenne Twister algorithm is often favored for certain randomness applications.
Description
The mt_random function has the following syntax:
mt_random = mt_random();
Returns
The function returns a pseudo-random number between 0.0 and 1.0.
The following example pulls a well-distributed random sample of 10 records from the Customer_Table:
SELECT * FROM Customer_Table ORDER BY mt_random() LIMIT 10;
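The ORDER BY mt_random() LIMIT 10 pattern above assigns each row a random sort key and keeps the first ten rows. The same idea can be sketched in Python, whose standard random module is documented as using the Mersenne Twister as its core generator. The helper name random_sample and the customers list are hypothetical and exist only for illustration; this is not how the database executes the query.

```python
import random


def random_sample(rows, k, seed=None):
    """Pick k rows by sorting on a per-row random key (hypothetical helper)."""
    rng = random.Random(seed)  # Mersenne Twister based generator
    # Pair every row index with a key in [0.0, 1.0), like ORDER BY mt_random().
    keyed = [(rng.random(), index) for index in range(len(rows))]
    keyed.sort()
    # Keep the first k rows, like LIMIT k.
    return [rows[index] for _, index in keyed[:k]]


# Illustrative data standing in for Customer_Table.
customers = [f"customer_{n}" for n in range(1000)]
sample = random_sample(customers, 10, seed=42)
print(len(sample))  # 10
```

Fixing the seed makes the sketch repeatable for testing. The manual does not describe seeding for mt_random, so no equivalent is implied here.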
corr
This aggregate function returns the correlation coefficient of the set of number pairs inputa and inputb.
Description
The corr function has the following syntax:
double = corr(Set(double) inputa, Set(double) inputb);
The inputa value specifies the first number of each pair in the set. The inputb value specifies the second number of each pair in the set.
Returns
For example, assuming a table function_t with the values 1.2, 1.4, and 1.6 in col1 and the values 1.4, 1.6, and 1.8 in col2:
select corr(col1,col2) from function_t;
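As a worked check of the example values, the following Python sketch computes the correlation coefficient by hand. It assumes that corr follows the standard Pearson definition, which the manual does not spell out, so treat the result as an expectation rather than documented output.

```python
import math


def corr(xs, ys):
    """Pearson correlation coefficient of paired values (assumed definition)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


col1 = [1.2, 1.4, 1.6]
col2 = [1.4, 1.6, 1.8]
print(round(corr(col1, col2), 6))  # 1.0
```

Because every col2 value is the matching col1 value plus 0.2, the two columns are perfectly linearly related, and the coefficient is 1 up to floating-point rounding.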
covar_pop
This aggregate function returns the population-based covariance of the set of number pairs inputa and inputb.
Description
The covar_pop function has the following syntax:
double = covar_pop(Set(double) inputa, Set(double) inputb);
The inputa value specifies the first number of each pair in the set. The inputb value specifies the second number of each pair in the set.
Returns
For example, assuming a table function_t with the values 1.2, 1.4, and 1.6 in col1 and the values 1.4, 1.6, and 1.8 in col2:
select covar_pop(col1,col2) from function_t;
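The example values can be worked through by hand with the following Python sketch. It assumes the standard definition of population covariance (the sum of cross-deviations divided by n); the manual does not state the formula, so the figure is an expectation rather than documented output.

```python
def covar_pop(xs, ys):
    """Population covariance of paired values (assumed standard definition)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Divide the sum of cross-deviations by n, the number of pairs.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


col1 = [1.2, 1.4, 1.6]
col2 = [1.4, 1.6, 1.8]
print(round(covar_pop(col1, col2), 6))  # 0.026667
```

The deviations from the means are -0.2, 0, and 0.2 in both columns, so the cross-deviations sum to 0.08, and dividing by the three pairs gives about 0.026667.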
covar_samp
This aggregate function returns the sample-based covariance of the set of number pairs inputa and inputb.
Description
The covar_samp function has the following syntax:
double = covar_samp(Set(double) inputa, Set(double) inputb);
The inputa value specifies the first number of each pair in the set. The inputb value specifies the second number of each pair in the set.
Returns
For example, assuming a table function_t with the values 1.2, 1.4, and 1.6 in col1 and the values 1.4, 1.6, and 1.8 in col2:
select covar_samp(col1,col2) from function_t;
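The sample-based figure for the example values can be checked with the following Python sketch. It assumes the standard definition of sample covariance (the sum of cross-deviations divided by n - 1); the manual does not state the formula, and the error raised for fewer than two pairs is a choice made for this sketch, not documented toolkit behavior.

```python
def covar_samp(xs, ys):
    """Sample covariance of paired values (assumed standard definition)."""
    n = len(xs)
    if n < 2:
        # Sketch-only choice: the n - 1 divisor is undefined for a single pair.
        raise ValueError("sample covariance needs at least two pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Divide the sum of cross-deviations by n - 1 instead of n.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


col1 = [1.2, 1.4, 1.6]
col2 = [1.4, 1.6, 1.8]
print(round(covar_samp(col1, col2), 6))  # 0.04
```

The cross-deviations sum to 0.08, and dividing by n - 1 = 2 gives 0.04, which is larger than the population-based value for the same data because of the smaller divisor.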
Index
A
accented characters 6-1 add_element 8-1 Adler algorithm 4-3 Advanced Encryption Standard 3-2 AES 3-2 Algorithms AES 3-2 algorithms Adler 4-3 CRC32 4-3 Damerau-Levenshtein 6-1 DEFLATE 3-1 Double-Metaphone 6-1, 6-3, 6-4, 6-5 Jenkins 4-3 MD5 4-2 Mersenne Twister 10-2 SHA 4-2 Soundex 6-1, 6-3, 6-4, 6-5 array data type 8-1, 8-2 array function 8-2 array_combine 8-3 array_concat 8-3 array_count 8-3 array_split 8-4 array_type 8-4 ASCII 6-1, 6-3 ASCII to hexadecimal conversions 7-1
D
Damerau-Levenshtein algorithm 6-1 data transformation functions 3-1 data type array 8-1, 8-2 array elements 8-2 collection 9-1 converting in XMLExtractValue 2-12 date 5-1 implicit conversion of date and time 5-1 SQL 2-2 time 5-1 timestamp 5-1 type checking 6-1 user defined types 2-2 database, registering SQL Extension functions in 1-2 date data type 5-1 day function 5-1 days_between 5-2 dbl_mp 6-4 decompress function 3-2 decrypt function 3-2 DEFLATE compression algorithm 3-1 delete_element 8-5 detecting duplicated records 4-1 dle_dst 6-2 double function 10-2 Double-Metaphone algorithm 6-1, 6-3, 6-4, 6-5 duplicate records, detecting 4-1
B
backups, for SQL Extensions toolkit 1-6
E
element_name 8-5 element_type 9-2 encrypt function 3-2 encryption private key 3-2 secret key 3-2 symmetric 3-2 examples XMLAgg 2-4 XMLConcat 2-4 XMLElement 2-2, 2-3, 2-4 XMLSerialize 2-2 expressions, XPath 2-7
C
characters, accented 6-1 checksum hash function 4-3 checksums 4-1 collection data type 9-1 compress function 3-1 conversion ASCII to hexadecimal 7-1 hexadecimal to ASCII 7-1 corr function 10-3 correlation coefficient 10-3 covar_pop 10-3 covar_samp 10-3 covariance population based 10-3 sample-based 10-3 CRC32 algorithm 4-3 cryptographic hash function 4-2 cryptography 4-1
F
fuzzy comparisons 6-1
G
get_value_date 8-5 get_value_double 8-5 get_value_int 8-5 get_value_nvarchar 8-5 get_value_time 8-5
N
Netezza SQL Extensions Toolkit backups and restores 1-6 disabling in a database 1-4 displaying version 1-4 installing 1-2 obtaining 1-1 registering functions in a database 1-2 removing 1-5 upgrading 1-4 next_month 5-4 next_quarter 5-4 next_year 5-4 nysiis 6-4 NzAdmin screenshot with functions 1-3
H
hash function 4-2 cryptographic 4-2 hash functions checksum 4-3 lookup 4-3 lookups 4-3 hash table 4-1 hash4 4-3 hash8 4-3 hexadecimal to ASCII conversions 7-1 hextoraw 7-1 hour function 5-2 hours_between 5-2
O
ODBC conversations 3-3
I
installation instructions 1-2 ISO/IEC 9075-14 2-1 IsValidXML 2-5, 2-8 IsXML 2-8
P
passwords, verifying 4-1 pattern matching 6-1 Perl 5 regular expressions 6-6 pg.log file 3-3 phonetic comparisons 6-1, 6-3 population-based covariance 10-3 Porter algorithm algorithms Porter 6-3 private key encryption 3-2 pseudo-random number 10-2 publishing XML data 2-2
J
JDBC conversations 3-3 Jenkins algorithm 4-3
K
key search attacks 3-2
R
random function 10-2 random number generator 10-2 range checks 6-1 rawtohex 7-1 regexp_extract 6-7 regexp_extract_all 6-7 regexp_extract_all_sp 6-8 regexp_extract_sp 6-8 regexp_instr 6-9 regexp_like 6-10 regexp_match_count 6-10 regexp_replace 6-11 regexp_replace_sp 6-11 regular expressions 6-1 flags argument 6-6 overview 6-6 removal instructions 1-5 replace function 7-2 replace_element 8-6 restores, for SQL Extensions toolkit 1-6
L
le_dst 6-2 least function 10-2 Levenshtein algorithm algorithms Levenshtein 6-2, 6-3 lexical comparisons 6-1 libnetsqlextensions.tar.gz file, untarring 1-2 license information 1-1 locating spatial points 4-1 lookup hash function 4-3 lookups 4-1
M
MD5 algorithm 4-2 Mersenne Twister algorithm 10-2 messages, verifying integrity 4-1 minute function 5-3 minutes_between 5-3 month function 5-3
S
sample-based covariance 10-3 second function 5-5 seconds_between 5-5 secret key encryption 3-2 SHA algorithm 4-2 Soundex algorithm 6-1, 6-3, 6-4, 6-5 spatial points, locating 4-1 SQL 2003 2-1, 2-2 SQL Extension functions, registering 1-2 SQL Functions toolkit disabling 1-4 strleft 7-2 strright 7-3 symmetric encryption 3-2 system prerequisites 1-1
X
XML data type 2-2, 2-13, 2-14 XML data, publishing 2-2 XML examples XMLAgg 2-4 XMLConcat 2-4 XMLElement 2-2, 2-3, 2-4 XMLSerialize 2-2 XML functions, nesting 2-3 XML standalone property 2-14 XML version property 2-14 XMLAGG 2-9 XMLAgg 2-1, 2-2 XMLAttributes 2-1, 2-2, 2-10 XMLConcat 2-1, 2-2, 2-10 XMLElement 2-1, 2-2, 2-11 XMLExistsNode 2-1, 2-11 XMLExtract 2-1, 2-12 XMLExtractValue 2-1, 2-12 XMLParse 2-13 XMLRoot 2-1, 2-14 XMLSerialize 2-14 XMLUpdate 2-1, 2-15 XPath expressions 2-7
T
text fuzzy comparisons 6-1 lexical comparisons 6-1 phonetic comparisons 6-1, 6-3 regular expressions 6-1 this_month 5-5 this_quarter 5-6 this_week 5-6 this_year 5-6 time data type 5-1 timestamp data type 5-1 transliterating accented characters 6-1 type checks 6-1
Y
year function 5-7
Z
zlib library 3-1 zone maps 4-2
U
UDFs 2-1, 2-2 uninstall instructions 1-5 user accounts, permissions 1-4 user defined types 2-2 uudecode 3-4 uuencode 3-3
V
verifying message integrity 4-1 verifying passwords 4-1 version, displaying for SQL Extensions toolkit 1-4
W
weeks_between 5-7 word_diff 6-1 word_find 6-2 word_key 6-3 word_key_tochar 6-4 word_keys_diff 6-5 word_stem 6-6