Contents
Trademarks
Exercises configuration
Exercises description
iv BigInsights Analytics for Business Analysts Copyright IBM Corp. 2012, 2014
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V9.0
Instructor Exercises Guide
Trademarks
The reader should recognize that the following terms, which appear in the content of this training
document, are official trademarks of IBM or other companies:
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide.
The following are trademarks of International Business Machines Corporation, registered in many
jurisdictions worldwide:
BigInsights DB2 IBM Watson
InfoSphere Many Eyes Notes
Power
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Windows is a trademark of Microsoft Corporation in the United States, other countries, or both.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of
Oracle and/or its affiliates.
VMware and the VMware "boxes" logo and design, Virtual SMP and VMotion are registered
trademarks or trademarks (the "Marks") of VMware, Inc. in the United States and/or other
jurisdictions.
Other product and service names might be trademarks of IBM or other companies.
Instructor exercises overview
Each exercise depends on successful completion of the first exercise.
Exercises configuration
Configuration notes for the instructor:
Each student has a separate system and students work independently.
Exercises description
This course includes the following exercises:
Importing Data into a Workbook
Adding Sheets to a Workbook
Working with Functions
Big SQL Integration
Analyzing Social Media and Structured Data
In the exercise instructions, you can check off the line before each step as
you complete it to track your progress.
Most exercises include required sections which should always be completed.
It might be necessary to complete these sections before you can start later
exercises. Some exercises might also include optional sections that you
might want to complete if you have sufficient time and want an extra
challenge.
Exercise 1. Importing Data into a Workbook
Estimated time
0:30
Requirements
Requires the DW643 lab images.
Exercise instructions
Preface
The password for root is dalvm3. The password for biadmin is ibm2blue.
Classroom
__ 1. If you are not already logged on to your Windows machine, do so now. You must get the
userid and password from your instructor.
__ 2. Execute the VMPlayer program and choose to boot your lab image. This, too, might require
some instruction from your instructor.
__ 3. You might get prompted to create a new unique identifier. Choose to create a new one.
__ 4. When prompted for a userid and password, enter biadmin and a password of ibm2blue.
You now have two desktops. To avoid confusion during the exercises, I suggest that
you maximize the VMPlayer desktop.
eLabs
To log in to your eLab system, point your browser to a Citrix server that is hosted by IBM.
You have your own Citrix userid and password.
After you log in to Citrix, click the icon that represents your lab image. Your VMware
image should boot automatically.
__ 1. After your Linux image boots, enter biadmin and a password of ibm2blue.
You now have a desktop displayed in your browser window. To avoid confusion
during the exercises, I suggest that you maximize your browser desktop.
Important
If you used this image for some DW612 exercises, then you might have problems starting Hadoop.
(TaskTracker cannot start.) If this is the case, stop the monitoring processes and Hadoop.
$BIGINSIGHTS_HOME/bin/stop.sh monitoring
$BIGINSIGHTS_HOME/bin/stop.sh hadoop
Next, execute the following commands:
cd $BIGINSIGHTS_HOME/bin/hdm/IHC/bin
./start-dfs.sh
./start-mapred.sh
Verify that Map/Reduce and HDFS are started.
Note
Because you are accessing your desktop remotely, double-clicking to drill down into directories
might not work: network delay can prevent the two clicks from arriving within the required time
interval. If this happens, select the directory and press the Enter key.
__ 5. Select the GutenbergDocs directory and click the Upload icon.
__ 6. Click the Browse pushbutton and drill down to
File System->home->biadmin->labfiles->DW64. Select last_of_the_mohicans.txt. Click
Open.
__ 7. Then, click OK.
End of exercise
Exercise 2. Adding Sheets to a Workbook
Estimated time
0:30
Requirements
Requires the DW643 lab images.
Requires exercise 1 to be completed
Exercise instructions
Section 1: Background
In the previous exercise, you created a workbook based on the results of a web crawler. The crawler
was directed to extract information from a website about patents. Essentially, the web
crawler looked at a site that had a list of names. Each name is a hyperlink to a page that lists the
patents for that person.
__ 1. In your web browser, open a new tab by clicking the tab with the plus sign.
__ 2. Go to the following website:
http://www.ibm.com/software/ebusiness/jstart/bigsheets/demo/Patents.html
__ 3. There you see a list of names. Click any name to go to a page that lists all of
the patents registered to that individual. This gives you a frame of reference for
this exercise.
__ 4. You can close this newly opened tab. (If you did not open a new tab, then click BigInsights
Console in the bookmark toolbar to return to the web console.)
End of exercise
Exercise 3. Working with Functions
Estimated time
0:30
Requirements
Requires the DW643 lab images.
Requires exercise 1 to be completed
Exercise instructions
Section 1: Background
In the first exercise, you ran the Word Count program. This generated a record for each unique
character string found in the specified document. It then totaled the number of occurrences of each
unique character string. Your goal is to create a sheet that shows, for each occurrence count, how
many character strings occur exactly that number of times. (If that is confusing, just bear with me.)
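To make this goal concrete outside BigSheets, here is a minimal Python sketch of the two counting steps. It assumes the Word Count program splits text on whitespace only, with no case folding or punctuation handling; the real program's tokenization might differ. The function names and the sample sentence are hypothetical.

```python
from collections import Counter


def word_count(text: str) -> Counter:
    """Count occurrences of each whitespace-separated character string.

    Assumption: no case folding and no punctuation stripping.
    """
    return Counter(text.split())


def strings_per_count(counts: Counter) -> dict[int, int]:
    """Map each occurrence count to how many strings occur that many times."""
    return dict(Counter(counts.values()))


if __name__ == "__main__":
    sample = "the deer and the hunter and the river"
    counts = word_count(sample)  # the=3, and=2, deer=1, hunter=1, river=1
    print(sorted(strings_per_count(counts).items()))  # [(1, 3), (2, 1), (3, 1)]
```

In this sample, three strings (deer, hunter, river) occur once, one string (and) occurs twice, and one string (the) occurs three times. That second tally is what the sheet in this exercise is meant to show.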
__ 1. Log in to the BigInsights Console as biadmin with the password ibm2blue.
__ 2. Select the BigSheets tab.
__ 3. Select the Wordcount workbook. Each row displays a character string, followed by a
tab character, followed by a number. The number is the count of occurrences of that
character string in the last_of_the_mohicans.txt document.
Your mission is to count the number of character strings that occur the same number of
times.
In the previous exercise, you used Function sheets to apply functions to your data. This time,
you are going to add new columns to a sheet and code the needed functions directly.
any character. The plus sign means one or more occurrences. The \t means the tab
character. So this extracts one or more characters that are followed by the tab character.
(.+\t)
__ 7. After you type the expression above, you can see the portion of the test data that matches.
Next, append the following to your regular expression rule. Translation: [0-9] means any
digit zero through nine. The plus sign means one or more occurrences.
([0-9]+)
The entire expression should look like this (there is no space between the two groups):
(.+\t)([0-9]+)
__ 8. The entire test string(s) should now be highlighted. If you look in the Matched area (lower
right), you should see the following under sub-pattern.
1:[abc ]2:[33]
This indicates that the second group extracted the value 33.
Note
For those of you who work with regular expressions, you might have wondered why I did not have
you code (.+\t)(\d+). The reason is that when that expression is entered directly into a BigSheets
regular expression function for a column, it is incorrectly flagged as an error.
__ 9. You can close the regular expression wizard and then close Eclipse.
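If you want to experiment with the same pattern outside the Eclipse wizard, the following Python sketch applies it with the standard re module. The test row is a hypothetical Wordcount entry, and Python's regular expression engine is not the one BigSheets uses, so treat this as an illustration of the grouping only.

```python
import re

# The two-group pattern from the steps above: group 1 is one or more
# characters followed by a tab; group 2 is one or more digits.
PATTERN = re.compile(r"(.+\t)([0-9]+)")

# Hypothetical test row in the Wordcount format: string, tab, count.
row = "abc\t33"

match = PATTERN.fullmatch(row)
if match:
    print(repr(match.group(1)))  # 'abc\t'
    print(match.group(2))        # 33
```

Because .+ is greedy, the engine first consumes the whole row and then backtracks to the last tab, so group 1 keeps everything up to and including that tab. In Python, (.+\t)(\d+) is also accepted; for str patterns, \d matches any Unicode decimal digit, so [0-9] is the stricter choice.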
You reference the Wordcount sheet and, as stated, name this new column Occurrences.
The final parameter, 2, says to extract the second group from the regular
expression.
__ 7. Code the following in the fx field.
Wordcount!A1 : [Occurrences = GETGROUPMATCH(#Header,'(.+\t)([0-9]+)',2) ]
__ 8. Click the green checkmark.
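The following Python sketch mimics what the Occurrences column and the final tally compute. get_group_match is a hypothetical stand-in for GETGROUPMATCH: it assumes the function returns the requested group of the first match, which might not reproduce BigSheets' exact behavior (for example, when nothing matches). The sample rows are hypothetical.

```python
import re
from collections import Counter
from typing import Optional


def get_group_match(value: str, pattern: str, group: int) -> Optional[str]:
    """Hypothetical stand-in for GETGROUPMATCH: return the requested group
    of the first match, or None when the pattern does not match."""
    match = re.search(pattern, value)
    return match.group(group) if match else None


def tally_occurrences(rows: list[str]) -> dict[int, int]:
    """Extract the Occurrences value from each row, then count how many
    rows share each value."""
    occurrences = (get_group_match(row, r"(.+\t)([0-9]+)", 2) for row in rows)
    return dict(Counter(int(o) for o in occurrences if o is not None))


if __name__ == "__main__":
    # Hypothetical Wordcount rows: character string, tab, count.
    rows = ["the\t3", "and\t2", "deer\t1", "hunter\t1", "river\t1"]
    print(sorted(tally_occurrences(rows).items()))  # [(1, 3), (2, 1), (3, 1)]
```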
Note
Now you have some additional knowledge that can help you with the second exercise to create a
sheet that returns the number of patents that are associated with each owner. However, you need
one or two more pieces of information that you will get from the last exercise.
End of exercise
Exercise 4. Big SQL Integration
Estimated time
0:20
Requirements
Requires the DW643 lab images.
Exercise instructions
Section 1: Verify that the test data exists in the HDFS
Do the following steps to load news-data.txt and products.csv into HDFS.
__ 1. Log in to the BigInsights Console as biadmin with the password ibm2blue.
__ 2. Select the Files tab.
__ 3. Drill down to user/biadmin and select that directory.
__ 4. Click the Create Directory icon and create a directory called Watson.
__ 5. Select the Watson directory and click the Upload icon.
__ 6. Browse to home->biadmin->labfiles->DW64. Select news-data.txt and click Open.
__ 7. Then, click OK.
__ 18. You can do this through the BigInsights Console. Click the Welcome tab and, under
Quick Links, select Run Big SQL Queries. This opens a new browser
tab.
__ 19. In the text box, enter your Big SQL query. Start with a simple one:
select * from sheets.WatsonNews;
__ 20. Click the Run pushbutton. While the query runs, a spinning icon appears in the middle of the page.
__ 21. The results appear below, in the Results tab.
__ 22. Enter another query: select * from sheets.WatsonNews where country = 'US';
__ 23. Teaching Big SQL queries is outside the scope of this lab, so we will stop here. You have
now successfully created a table that you can run Big SQL queries
against.
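The quotes around US in step 22 matter. Big SQL is available only on the lab image, so this sketch uses Python's built-in sqlite3 module as a stand-in. That substitution is an assumption: SQLite is not Big SQL, but both follow the standard SQL rule that a quoted 'US' is a string literal, whereas an unquoted US is read as an identifier (a column name). The table contents are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Big SQL (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WatsonNews (title TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO WatsonNews VALUES (?, ?)",
    [("Watson story one", "US"), ("Watson story two", "FR")],
)

# Quoted: 'US' is a string literal, so rows are filtered by value.
us_rows = conn.execute(
    "SELECT * FROM WatsonNews WHERE country = 'US'"
).fetchall()
print(us_rows)  # [('Watson story one', 'US')]

# Unquoted: US is parsed as a column name, and no such column exists.
error_message = None
try:
    conn.execute("SELECT * FROM WatsonNews WHERE country = US")
except sqlite3.OperationalError as err:
    error_message = str(err)
print(error_message)

# From application code, binding a parameter avoids quoting mistakes.
bound_rows = conn.execute(
    "SELECT * FROM WatsonNews WHERE country = ?", ("US",)
).fetchall()
```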
You are taken to the BigSheets tab, where the new workbook you just created is displayed.
From here you can do anything you normally would with a workbook, except create a
table: that pushbutton is grayed out because this workbook came from a table.
End of exercise
Exercise 5. Analyzing Social Media and Structured Data
Estimated time
2:00
Requirements
Requires the DW643 lab images.
Requires exercise 1 to be completed
Exercise instructions
Section 1: Load the Test Data into HDFS
The social media data used in this exercise has been loaded on your local system. This data was
created using the Boardreader app that comes with BigInsights. (To use this app, however, you must
have a license.)
Also, data was exported from a database system into CSV format.
__ 1. Log in to the BigInsights Console as biadmin with the password ibm2blue.
__ 2. Select the Files tab.
__ 3. Drill down to biadmin and select that directory.
__ 4. Click the Create Directory icon and create a directory called DBMS.
__ 5. Select the Watson directory and click the Upload icon.
__ 6. Browse to File System->home->biadmin->labfiles->DW64. Select blogs-data.txt and click
Open.
__ 7. Select the DBMS directory and click the Upload icon.
__ 8. Browse to File System->home->biadmin->labfiles->DW64. Select RDBMS-data.csv and
click Open. Then, click OK.
__ 20. Scroll to the bottom of the workbook to view the workbook details. If you do not see the
details, click the Toggle from Normal to Fullscreen icon above the workbook data, in the
upper right of the BigSheets page.
__ 21. To add a tag, click the green plus sign. Type in a tag value and then click the green
checkmark. The tags to add are Watson, IBM, and Blogs.
__ 22. Next, using the same basic steps, create a second workbook for the news-data.txt file. Call
the workbook WatsonNews. Add Watson, IBM, and News as tags to this new workbook.
Start off by clicking the BigSheets tab.
__ 23. Click the BigSheets tab to get a list of all workbooks.
__ 24. Click the Tags pushbutton. A tag cloud is displayed. Click News, and only the
workbooks with that tag are displayed.
__ 25. In the filter field (to the left of the Tags pushbutton), type tag:Watson and press Enter.
This is another way of filtering on a tag.
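As a rough model of what the tag:Watson filter does, here is a Python sketch. The Workbook class and filter_workbooks function are hypothetical, and the sketch assumes exact, case-sensitive tag matching; the console's actual matching rules might differ. The tags come from the steps above, and the name of the first workbook is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Workbook:
    """Hypothetical model of a BigSheets workbook and its tags."""
    name: str
    tags: set[str] = field(default_factory=set)


def filter_workbooks(workbooks: list[Workbook], query: str) -> list[str]:
    """Return the names of workbooks carrying the tag in a 'tag:<value>' query.

    Assumption: tags match exactly and case-sensitively.
    """
    prefix = "tag:"
    if not query.startswith(prefix):
        raise ValueError("this sketch handles only tag:<value> queries")
    tag = query[len(prefix):]
    return [wb.name for wb in workbooks if tag in wb.tags]


if __name__ == "__main__":
    workbooks = [
        Workbook("Watson Blogs", {"Watson", "IBM", "Blogs"}),
        Workbook("WatsonNews", {"Watson", "IBM", "News"}),
        Workbook("Wordcount"),
    ]
    print(filter_workbooks(workbooks, "tag:News"))    # ['WatsonNews']
    print(filter_workbooks(workbooks, "tag:Watson"))  # ['Watson Blogs', 'WatsonNews']
```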
__ 35. For the Watson Blogs workbook, follow the same steps as above to remove unwanted
columns. Call this new workbook Watson Blogs Revised. Make sure to run your Watson
Blogs Revised workbook.
__ 54. Click the Add Columns to Sort drop-down list and select Type. Click the green plus sign.
Keep the default of Ascending.
__ 55. Click the green checkmark.
__ 56. Click the Fit column(s) pushbutton so that you can see both the Language and the Type
columns.
Note
The sort was performed only on a subset of the data, in this case the first 2000
records. When you save and run the workbook, the sort is applied to all of the data, so you might
see some differences. For example, the subset of data has only one record where the Language is
Vietnamese. This changes when all of the data is used.
You can change the sampling size in $BIGINSIGHTS_HOME/sheets/conf/m2config.ini. The
relevant entry is "sampleSize": 2000. After changing this property, you must restart the
console.
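The effect described in this note can be reproduced in a few lines of Python. The sketch assumes, as the note states, that the preview works on the first sampleSize records; the records and the tiny sample size are hypothetical, chosen only so that the difference is visible.

```python
def sort_rows(rows: list[dict]) -> list[dict]:
    """Sort by Language, then Type, both ascending."""
    return sorted(rows, key=lambda r: (r["Language"], r["Type"]))


def preview_sort(rows: list[dict], sample_size: int = 2000) -> list[dict]:
    """Sort only the first sample_size rows, as the workbook preview does."""
    return sort_rows(rows[:sample_size])


def count_language(rows: list[dict], language: str) -> int:
    """Count the rows whose Language equals the given value."""
    return sum(1 for r in rows if r["Language"] == language)


if __name__ == "__main__":
    # Hypothetical records; sample_size=3 stands in for the 2000-record sample.
    records = [
        {"Language": "English", "Type": "blog"},
        {"Language": "Vietnamese", "Type": "news"},
        {"Language": "Chinese", "Type": "blog"},
        {"Language": "Vietnamese", "Type": "blog"},
        {"Language": "English", "Type": "news"},
    ]
    print(count_language(preview_sort(records, sample_size=3), "Vietnamese"))  # 1
    print(count_language(sort_rows(records), "Vietnamese"))                    # 2
```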
__ 67. Name this new column Language_Revised. (Remember, there cannot be any
spaces in column names.)
After you save the column name, the cursor moves to the fx field. The idea is that you
provide a function that populates this new column.
__ 68. Here is the gist of the function. You want to look at the Language value for each row. If that
value begins with Chin, then you want the value in the Language_Revised column for that
row to be Chinese. Otherwise, you want the value to be what is in the Language column.
Type the following in the fx field.
IF(SEARCH('Chin*', #Language) > 0, 'Chinese', #Language)
__ 69. Click the green checkmark.
__ 70. Save, exit and run the workbook.
__ 71. Click the triangle on the Language Coverage tab at the bottom to modify the chart settings.
__ 72. Select Chart Settings.
__ 73. Change the Value field to Language_Revised. Click the green checkmark.
__ 74. Click the Language Coverage tab to bring up the modified chart.
__ 75. Click the Run pushbutton.
__ 76. Now you can see that the Chinese segment is the second largest.
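Outside BigSheets, the logic of the Language_Revised formula can be sketched in Python as follows. The text above describes a begins-with test, so the sketch uses str.startswith; BigSheets SEARCH might also match Chin elsewhere in a value, so treat this as an approximation. The language values are hypothetical.

```python
from collections import Counter


def revise_language(language: str) -> str:
    """Return 'Chinese' for values that begin with 'Chin'; otherwise return the value unchanged."""
    return "Chinese" if language.startswith("Chin") else language


if __name__ == "__main__":
    # Hypothetical Language values before revision.
    languages = [
        "English", "Chinese Simplified", "Chinese Traditional",
        "English", "Japanese", "Chinese Simplified",
    ]
    revised = Counter(revise_language(lang) for lang in languages)
    print(revised.most_common())  # [('Chinese', 3), ('English', 2), ('Japanese', 1)]
```

Merging the separate Chinese variants into one value is what makes the Chinese segment larger in the revised chart.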
__ 127. To start with a workbook that has all of the items you need, open the
WatsonNewsAndBlogs workbook.
__ 128. Build a new workbook and call it Watson Media Analytics.
__ 129. Again, you need the URLHOST column in your new workbook. So, add a sheet that
runs the URLHOST function and carries over all of the columns. Call the sheet URL Hosts.
__ 130. Add a sheet that loads the Media Contacts workbook into your new Watson Media
Analytics workbook. Call this sheet Media Contacts.
__ 131. To make the last column of the Media Contacts sheet clearer, rename it to Last_Contact.
Move the cursor over the header4 column and click the triangle. Choose to rename the
column.
__ 132. Change the name of the header3 column to URL.
__ 133. Join the data. Add a new sheet and select the Join icon.
__ 134. Call the sheet Join URLHOSTS and Contacts.
__ 135. Choose an inner join.
__ 136. In the Add sheets drop-down list, select URL Hosts and click the green plus sign. Then, add the
Media Contacts sheet and click the green plus sign.
__ 137. For the URL Hosts sheet, select the URLHOST column. For the Media Contacts sheet,
select the URL column.
__ 138. Click the green checkmark.
__ 139. To make your results more intuitive, you can reorder the columns, either with the
Organize Columns option or by dragging and dropping: left-click the letter above the
column name and drag the column to its new position. Also, do not forget about the
Fit column(s) pushbutton.
__ 140. Save, exit, and run the workbook.
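The URL Hosts sheet, the Media Contacts sheet, and the inner join in steps 129 through 138 can be modeled in Python as shown below. url_host uses urllib.parse as an approximation of URLHOST (an assumption: BigSheets might normalize host names differently), inner_join is a hypothetical helper, and all rows are hypothetical. Rows without a matching contact are dropped, which is what an inner join does.

```python
from urllib.parse import urlparse


def url_host(url: str) -> str:
    """Approximate URLHOST: return the lowercase host name of a URL, or '' if none."""
    return urlparse(url).hostname or ""


def inner_join(left: list[dict], right: list[dict],
               left_key: str, right_key: str) -> list[dict]:
    """Keep only pairs of rows where left[left_key] equals right[right_key]."""
    index: dict[str, list[dict]] = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [{**l_row, **r_row}
            for l_row in left
            for r_row in index.get(l_row[left_key], [])]


if __name__ == "__main__":
    articles = [
        {"Title": "Watson article", "Url": "http://news.example.com/a1"},
        {"Title": "Watson blog post", "Url": "https://blog.example.org/p/7"},
        {"Title": "Uncontacted source", "Url": "http://other.example.net/x"},
    ]
    # URL Hosts sheet: carry over all columns and add URLHOST.
    url_hosts = [{**a, "URLHOST": url_host(a["Url"])} for a in articles]
    # Media Contacts sheet, after the header3/header4 renames.
    contacts = [
        {"URL": "news.example.com", "Last_Contact": "2014-01-15"},
        {"URL": "blog.example.org", "Last_Contact": "2013-11-02"},
    ]
    for row in inner_join(url_hosts, contacts, "URLHOST", "URL"):
        print(row["Title"], row["URLHOST"], row["Last_Contact"])
```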
Section 12: Dashboard
This section shows you how to quickly and easily create and manage custom dashboards. Depending
on the types of widgets it manages, a custom dashboard gives you visibility into a set of data, a
system, or the analysis of a set of data. You create a simple dashboard with three widgets that
showcase charts and data from other parts of this lab.
__ 141. Click the Dashboard tab on the BigInsights Console.
__ 142. Add a new dashboard by clicking the New Dashboard icon.
End of exercise