80827v00 Machine Learning Section4 Ebook v03

1. PHOENIX – Select a data source to work on

1.1 Assign the load(s) to yourself in Phoenix
1.2 If there are multiple loads from the same source data, assign them ALL to yourself
1.3 Copy data from Z:\ (keep only the most recent record)
1.3.1 Copy the files to your personal directory
1.4 Analysis – review the data source in Phoenix
1.4.1 Review the Comments fields
1.4.2 Review the Contact Tab – check if there is newer data available
If there is newer data, download it into an appropriately named folder on the server
(see BestPractices - RecordDates.docx) and save a zipped copy (see Section 1.1 of
BestPractices- Data.docx).
1.4.3 If the data is 3+ months old, check with your manager
1.4.4 Ensure that the Record Date for the Load(s) is appropriate in Work – Load Status
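The age check in step 1.4.3 can be sketched as follows. This is a minimal illustration only: it assumes the record date is available as a Python date and reads "3+ months" as three whole calendar months. The function name is hypothetical and is not part of Phoenix.

```python
from datetime import date


def is_stale(record_date: date, today: date, months: int = 3) -> bool:
    """Return True when record_date is at least `months` whole calendar months before today."""
    # Whole calendar months between the two dates.
    elapsed = (today.year - record_date.year) * 12 + (today.month - record_date.month)
    # If the day of the month has not been reached yet, the last month is incomplete.
    if today.day < record_date.day:
        elapsed -= 1
    return elapsed >= months
```

If the function returns True, the data falls under step 1.4.3 and should be raised with the manager before loading.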
2. EXCEL (.xlsx)
2.1 Clean the data if necessary, and save a copy
Check the data for hidden columns and aggregate (i.e., totals, subtotals) or blank rows. If
necessary, remove extraneous items from the source file. Check last row of data in Excel:
CTRL+SHIFT+END
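The blank-row and aggregate-row check in step 2.1 could be scripted once the sheet has been read into a list of rows. The sketch below is an assumption-laden illustration: the function name and the list of aggregate labels are hypothetical, and hidden columns cannot be detected from cell values, so they still need to be checked in Excel.

```python
# Assumed labels that mark aggregate rows; adjust to the source file.
AGGREGATE_WORDS = ("total", "subtotal", "grand total")


def find_suspect_rows(rows):
    """Return (row_number, reason) pairs for blank or aggregate-looking rows.

    rows is a list of lists of cell values; row numbers are 1-based, as in Excel.
    """
    suspects = []
    for number, row in enumerate(rows, start=1):
        cells = [str(c).strip() for c in row if c is not None]
        non_empty = [c for c in cells if c]
        if not non_empty:
            suspects.append((number, "blank"))
        elif any(c.lower() in AGGREGATE_WORDS for c in non_empty):
            suspects.append((number, "aggregate"))
    return suspects
```

The returned row numbers can be reviewed in Excel before anything is removed from the source file.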
2.2 Check MaxLength in Macro
If there are fields with >4000 characters, they will need to be fit into CLOB flex fields.
If the company name or address fields have values >80 characters, these will need to be
split in Oracle.
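The MaxLength check in step 2.2 can be approximated outside the macro. The sketch below assumes the data is already in memory as a header list plus rows; the function names are hypothetical, and the caller says which columns hold company name or address values.

```python
CLOB_LIMIT = 4000          # fields longer than this need a CLOB flex field (step 2.2)
NAME_ADDRESS_LIMIT = 80    # company name / address values longer than this need splitting


def max_lengths(header, rows):
    """Return {column_name: longest value length} across all rows."""
    lengths = {name: 0 for name in header}
    for row in rows:
        for name, value in zip(header, row):
            if value is not None:
                lengths[name] = max(lengths[name], len(str(value)))
    return lengths


def flag_columns(lengths, name_address_columns):
    """Return (columns needing CLOB, name/address columns needing a split)."""
    needs_clob = [n for n, length in lengths.items() if length > CLOB_LIMIT]
    needs_split = [n for n in name_address_columns
                   if lengths.get(n, 0) > NAME_ADDRESS_LIMIT]
    return needs_clob, needs_split
```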
2.3 Date fields are preserved using Text to Columns in Excel.
IT IS ESSENTIAL THAT THE DATA IN ORACLE MATCH THE DATA FROM THE SOURCE – THE
DATES WE SEE IN ORACLE MUST MATCH THE DATE FORMAT THAT WE SEE IN THE SOURCE
DATA.
2.4 Are LAT & LONG values integer or two decimals?
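To answer the question in step 2.4, the number of decimal places in the LAT and LONG values can be tallied. This is a sketch that assumes the coordinates have been read as text, so trailing zeros are preserved; the function names are hypothetical.

```python
def decimal_places(value: str) -> int:
    """Return the number of digits after the decimal point (0 for integers)."""
    text = value.strip()
    if "." not in text:
        return 0
    return len(text.split(".", 1)[1])


def summarize_precision(values):
    """Return {decimal_places: count} for all non-empty coordinate values."""
    counts = {}
    for v in values:
        if v is None or not str(v).strip():
            continue  # skip missing coordinates
        places = decimal_places(str(v))
        counts[places] = counts.get(places, 0) + 1
    return counts
```

A result dominated by 0 or 2 decimal places indicates coordinates that are integers or rounded to two decimals.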
3. PYTHON LOAD (.csv) – Ask Ian/Thomas to convert to (.accdb)

4. ACCESS (.accdb) – Create Access file
4.1 Import the required tables into MS Access
Import the data into MS Access by selecting the correct source file type (e.g., Excel, text,
.dbf). See Figure 8.
4.2 Avoid truncation, BINARY and DATE format conversions, and column name changes
When importing from the source file, import fields as “Text” or “Memo” to avoid date
conversions, binary conversions, and truncation
(Text: up to 255 characters; Memo: more than 255 characters)
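The Text/Memo choice in step 4.2 can be derived from each column's longest value. The sketch below is illustrative only; the function names are hypothetical, and it treats exactly 255 characters as Text because an Access Text field holds up to 255 characters.

```python
TEXT_LIMIT = 255  # maximum length of an Access Text field


def access_field_type(max_length: int) -> str:
    """Return "Text" for columns that fit in a Text field, otherwise "Memo"."""
    return "Text" if max_length <= TEXT_LIMIT else "Memo"


def plan_import_types(column_lengths):
    """Map {column_name: max_length} to {column_name: "Text" or "Memo"}."""
    return {name: access_field_type(length) for name, length in column_lengths.items()}
```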
4.3 Insert a dummy row of ‘Text’ so that Access automatically imports the column as text
4.3.1 Highlight the required date column by clicking on the header of the column.
4.3.2 Go to Data → Text to Columns. See screenshot below.
4.3.3 This brings up a pop-up screen. Leave the settings in this screen as default and click Next.
Delimited → Next
4.3.4 In the next screen, leave the settings as default.
Tab → Text qualifier: * → Next
4.3.5 In the final screen, select the Text radio button.
Text → Finish
4.4 If there are numerous Excel files, use VBA code to import them automatically
4.5 Run ReplaceSpecialCharactersInAccess.accdb
5. ORACLE – Match → Update
5.1 Load required tables from Access to Oracle
5.1.1 Open SQL Developer
5.1.2 Create a New Connection to import your table into SQL Developer
The New/Select Database Connection window will appear. Enter a Connection Name relevant
to the data source (any name is fine since connections are local to your SQL Developer – no
other users will see them), and click on Access to import an Access database
Navigate to the location of the Access file, click Open in the Open File dialog window, then click
Connect to complete the connection. You should now see your new connection in your list of
connections.
To copy a table from the Access database, expand the connection to see the tables. Once you
see the table you wish to import, right click and Copy to Oracle. Note that if you copy the table
into a shared connection, other users will see the table name.
5.1.3 Copy the table into the “Oracle” workplace. Click on the connection, expand “Table” and right
click on “Source Table”.
5.1.4 Click “Copy to Oracle”; the window below will pop up. Choose the correct connection
(US or Canada).
5.1.5 Click “Apply” and the table will be copied to Oracle. Click “OK” after the table has finished
copying.
5.1.6 To confirm the table was copied to the workplace, go to “ERIS_US_LOAD” and click on
“Tables”. This should list all the tables in the workplace, including the table that
was just imported and copied.
5.2 Oracle split screen → click 'New Document Tab Group'

5.3 Review old copies of the tables from the previous load
Create and save a copy of the existing load procedure by appending the eris_data record
date (the record date of the last load) to the end.
Rename procedure: LOAD_SOURCE_PROV_DDMMMYYYY
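The backup name can be generated rather than typed by hand. The sketch below assumes the DDMMMYYYY suffix uses an upper-case three-letter English month abbreviation and that SOURCE_PROV is passed in as a single string; the function names are hypothetical.

```python
from datetime import date

# Fixed English abbreviations so the result does not depend on the system locale.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def format_record_date(d: date) -> str:
    """Format a date as DDMMMYYYY, e.g. 05MAR2021."""
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year:04d}"


def backup_procedure_name(source_prov: str, record_date: date) -> str:
    """Build LOAD_<SOURCE_PROV>_<DDMMMYYYY> for the saved copy of the procedure."""
    return f"LOAD_{source_prov.upper()}_{format_record_date(record_date)}"
```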
5.4 Match previous table with current table
Name old table → SOURCE_PROV_OLD
Determining which records already exist in production, matching them to edw records, and
populating the eris_data_ids requires that the analyst:
• Understand the source provided primary key (if there is one)
o Is it unique?
o Are there cases where the same facility ID/registration number is assigned to
different facilities/locations? Is the source provided primary key being recycled?
• Understand the data
o Which fields can be used for matching records?
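The questions above about uniqueness and recycling of the source-provided primary key can be explored with a quick grouping. The sketch below is one possible approach, not the team's established method: it assumes records are dictionaries, and the field names passed in are whatever the source provides.

```python
from collections import defaultdict


def find_recycled_keys(records, key_field, location_fields):
    """Return {key: sorted distinct locations} for keys seen at more than one location."""
    locations = defaultdict(set)
    for rec in records:
        key = rec.get(key_field)
        if key is None:
            continue  # records without a key cannot be checked this way
        # Compare locations without regard to case or surrounding spaces.
        loc = tuple(str(rec.get(f) or "").strip().upper() for f in location_fields)
        locations[key].add(loc)
    return {k: sorted(v) for k, v in locations.items() if len(v) > 1}
```

Any key returned here appears with more than one distinct location and should be reviewed before it is relied on for matching.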
5.4.1 Document which fields were used to match records

5.4.2 Use trim( ) and upper( ) where appropriate
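The effect of trim( ) and upper( ) on matching can be illustrated outside Oracle. The sketch below is a simplified stand-in for the real match queries: it assumes old and new records are dictionaries, that old records carry an eris_data_id, and that the match fields are chosen by the analyst. The function names are hypothetical.

```python
def normalize(value):
    """Mimic trim() and upper(): drop surrounding spaces and ignore case."""
    return str(value or "").strip().upper()


def match_records(old_rows, new_rows, match_fields):
    """Return (updates, inserts).

    updates is a list of (new_row, eris_data_id) pairs for rows found in old_rows;
    inserts is a list of new rows with no match.
    """
    index = {}
    for row in old_rows:
        key = tuple(normalize(row.get(f)) for f in match_fields)
        index.setdefault(key, row)  # keep the first old row seen for each key
    updates, inserts = [], []
    for row in new_rows:
        key = tuple(normalize(row.get(f)) for f in match_fields)
        old = index.get(key)
        if old is not None:
            updates.append((row, old["eris_data_id"]))
        else:
            inserts.append(row)
    return updates, inserts
```

Without the normalization step, " Acme Corp " and "ACME CORP" would be treated as different facilities and the second would be inserted as a new record.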
5.4.3 Identifying records and setting to Insert
After matching, set edw ‘Update’ records back to ‘Insert’ (and set eris_data_id to null)
where these 3 conditions are met:
1) Status is ‘Update’
2) Length(EDW.address_orig) < 3 OR EDW.address_orig is null (invalid address)
3) Length(ED.address_orig) > 2 (valid address)
What happens is that we will Insert the new record, and the ‘old’ record in ED will be
moved to the receptacle.
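The three conditions can be written as a single predicate. The sketch below is an illustration in Python rather than the Oracle update itself; it assumes a missing address is represented as None or an empty string, and the function names are hypothetical.

```python
def should_reset_to_insert(status, edw_address_orig, ed_address_orig):
    """Return True when all three conditions from step 5.4.3 hold."""
    # Condition 2: the EDW address is missing or shorter than 3 characters (invalid).
    edw_invalid = edw_address_orig is None or len(edw_address_orig) < 3
    # Condition 3: the ED address is longer than 2 characters (valid).
    ed_valid = ed_address_orig is not None and len(ed_address_orig) > 2
    # Condition 1: the record is currently flagged as an update.
    return status == "Update" and edw_invalid and ed_valid


def apply_reset(record, ed_address_orig):
    """Set an EDW record back to Insert and clear its eris_data_id when the rule applies."""
    if should_reset_to_insert(record["status"], record.get("address_orig"), ed_address_orig):
        record["status"] = "Insert"
        record["eris_data_id"] = None
    return record
```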
5.5 Make a note in Load Comments

SY DDMMMYYYY
- no clob/sub/multi
- no flex changes
6. PHOENIX – Address Cleaning

Use Phoenix function PH 1789 – Work – Address Clean to clean addresses.
Click Run All → Match City
6.1 Populate address_dmti and city_dmti fields

SELECT DISTINCT city_orig FROM eris_data_work WHERE source = 'THE_SOURCE';

SELECT * FROM eris_data_work WHERE source = 'THE_SOURCE'
AND ((SUBSTR(city_orig, 1, 1) = SUBSTR(city_orig, 2, 1) AND
SUBSTR(city_orig, 2, 1) = SUBSTR(city_orig, 3, 1)) OR (LENGTH(city_orig) <= 3) OR
LOWER(city_orig) IN
7. QAS – standardize values for ADDRESS, CITY and Geocode
7.1 >> → Source: SOURCE_PROV → Start

7.2 Tools → Geocode
7.3 Tools → Update Coordinates
7.4 Review geocoding results in Oracle

Verify that the record count of the QAS and geocoding results makes sense – remember, only
records with values in ADDRESS_DMTI will be included.
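A rough way to sanity-check the counts is to tally how many work records are eligible in the first place. The sketch below assumes the records have been exported as dictionaries with an address_dmti key; the function name is hypothetical.

```python
def summarize_geocode_eligibility(records):
    """Return counts of records with and without a value in address_dmti."""
    eligible = sum(1 for r in records if str(r.get("address_dmti") or "").strip())
    return {
        "total": len(records),
        "eligible": eligible,              # expected to appear in QAS/geocoding results
        "skipped": len(records) - eligible,
    }
```

The "eligible" figure is the number to compare against the QAS and geocoding result counts.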
7.5 Query ALL unplottables with null address/city and source coordinates in state
7.6 PWSW (water wells) sources do not follow the same rules for using source provided
coordinates
Each well should be its own eris_data_work record.
select flex_longitude, flex_latitude, edw.get_state(flex_longitude,
flex_latitude, 'UT')
from eris_data_work
where source = 'SOURCE_UT'
and edw.get_state(flex_longitude, flex_latitude, 'UT') = 'UT';
After reviewing the SELECT results, run the update query:
update eris_data_work
set x = flex_longitude, y = flex_latitude, rcode = '800'
where source = 'SOURCE_UT'
and edw.get_state(flex_longitude, flex_latitude, 'UT') = 'UT';
8. Query and export Missing (Moved) Facilities

9. Populate and modify flex_work for new loads/significant FLEX changes
9.1 Update report labels to reflect new/modified ID field labels
10. Complete the Data Loads Checklist (appendix in Data Best Practices)
10.1 Perform mizu_check in SQL Developer
