You are on page 1of 7

1.

Introduction

BigQuery is a fully-managed, petabyte-scale, low-cost enterprise


data warehouse for analytics. BigQuery is serverless. You do not
need to set up and manage clusters.

A BigQuery dataset resides in a project. A dataset contains one or


more tables with data.

In this codelab, you will use the BigQuery web UI to query public
datasets, load your own data, and export data to a Cloud Storage
bucket.

Note: In addition to the BigQuery web UI, you can use


the bq command-line tool to perform operations on BigQuery
datasets. For more information about the bq command-line tool,
see bq Command-Line Tool.

For more information about BigQuery, see BigQuery


documentation.

What you'll learn


 How to query public datasets
 How to load data into a table
 How to export table data

What you'll need

To complete this lab, you need:

 The latest version of Google Chrome


 A Google Cloud Platform billing account

2. Getting set up

Create a project

Open the Cloud Platform Console. Click Select a project, and


then click the + icon to create a project.

Note: If you have one or more existing projects, click the project
name to the right of Google Cloud Platform in the title bar and
choose Create a project.

In the New Project dialog, for Project name, type BQProject.


Click Create.

3. Working with public datasets

Browse the Stackoverflow dataset

Open the Stackoverflow dataset. The BigQuery web UI opens in a


new browser tab. The BigQuery web UI displays the tables in the
Stackoverflow dataset. The navigation pane lists other public
datasets. Each dataset comprises one or more tables.

For more information about all the public datasets available in


BigQuery, see Google BigQuery Public Datasets.

Query the Stackoverflow dataset

Click Compose Query. In the New Query text area, copy and
paste the following SQL query. The query uses
the #standardSQL prefix to specify that BigQuery should treat this
query as standard SQL. The UI validates the query and displays a
green checkmark below the text area to indicate that the syntax is
valid.

#standardSQL
SELECT badge_name AS First_Gold_Badge,
COUNT(1) AS Num_Users,
ROUND(AVG(tenure_in_days)) AS Avg_Num_Days
FROM
(
SELECT
badges.user_id AS user_id,
badges.name AS badge_name,
TIMESTAMP_DIFF(badges.date, users.creation_date, DAY) AS
tenure_in_days,
ROW_NUMBER() OVER (PARTITION BY badges.user_id
ORDER BY badges.date) AS row_number
FROM
`bigquery-public-data.stackoverflow.badges` badges
JOIN
`bigquery-public-data.stackoverflow.users` users
ON badges.user_id = users.id
WHERE badges.class = 1
)
WHERE row_number = 1
GROUP BY First_Gold_Badge
ORDER BY Num_Users DESC
LIMIT 10
Click Run Query. The query generates the top 10 gold badges,
ranked by how many users got them as their first gold badges.
The query also determines how many days it took for these gold
badges to be obtained on average.

Row First_Gold_Badge Num_Users Avg_Num_Days

1 Famous Question 176982 1169.0

2 Fanatic 15085 618.0

3 Unsung Hero 12875 595.0

4 Great Answer 10641 1363.0

5 Electorate 5769 829.0

6 Populist 5491 1227.0

7 Steward 1005 993.0

8 Great Question 582 717.0

9 Copy Editor 253 595.0

You can also query sets of tables using wildcard table names. For
more information, see Querying sets of tables using wildcard
tables.

Load table data


Download the baby names zip file. The zip file, provided by the US
Social Security Administration, contains approximately 7 MB of
data about popular baby names.

Unzip the file in a local folder and note its location.

In the BigQuery web UI's navigation pane, next to the project


name BQProject, click the down arrow icon, and click Create new
dataset.

In the Create Dataset dialog, for Dataset ID, type babynames.


Leave all of the other default settings in place and click OK.

In the navigation pane, hover over the babynames dataset ID that


you just created. Click the down arrow icon next to the ID and
click Create new table.

In the Source Data section, click the Choose file button. Navigate
to the data you unzipped earlier, and select the yob2014.txt file.

In the Destination Table section, for the destination table name,


type names_2014

In the Schema section, click Edit as Text.

Replace the contents of the Schema input area with the following
schema:

name:string,gender:string,count:integer

Click Create Table.

Export table data


Switch to the Cloud Platform Console browser tab. In the Cloud
Platform Console's Products & Services menu, click Storage, and
then click Create bucket.

In the Create a bucket page, specify a unique bucket


name, Regional storage class, and us-central1 regional location.
Click Create. Note the bucket name. You will use this bucket to
store the data exported from BigQuery.

Switch to the BigQuery web UI browser tab. In the navigation


pane, under the babynames dataset, hover over
the names_yob2014 table. Click the down arrow icon, and then
click Export Table.

In the Export to Google Cloud Storage dialog, select CSV as the


export format.

For Google Cloud Storage URI, type <export bucket


name>/babynames2014.csv. Replace the angle brackets and the
text inside with the name of the bucket you created earlier.

Click OK.

After the export completes, switch to the Cloud Platform Console


browser tab. Refresh the page to view the complete contents of
the Cloud Storage bucket. The babynames2014.csv object in the
bucket contains the exported data.

4. Cleaning up

Delete the resources


In the BigQuery web UI's navigation pane, hover over
the babynames dataset ID. Click the down arrow icon next to the
ID and click Delete dataset.

In the Delete Dataset dialog, for Dataset ID, type babynames,


and click OK.

Switch to the Cloud Platform Console browser tab. In the


navigation pane, under Storage, click Browser.

Select the bucket that contains your BigQuery export data, and
click Delete.

Delete the project

In the Product & Services menu, click IAM & Admin.

Click Settings, then click the Delete.

In the Shut down project dialog, read the notes and then, type
the project ID.

Click Shut down.

Close the Cloud Platform Console.

Congratulations! You have now learned how to use the BigQuery


web UI to query public datasets, export data, and load your own
data. Note that you did not have to set up or manage clusters to
work with datasets.

You might also like