
A great deal of information has come to us from the official Human Genome Project, and from the official
projects for many other species as well. But other data has come from individual laboratories doing
traditional benchwork; some has come from the literature; and some has come from new
large-scale technologies that have arisen in the last few years, such as microarrays, next-generation
sequencing, and more.
So there are tremendous volumes of data available, and many places to try to find it. The UCSC
Genome Browser is a great resource because it organizes this material in one place. It uses the
sequence of the genome (the official reference sequence of the Human Genome Project, or the
official reference genome of other species) and combines this data with all kinds of other useful and
important biological information, such as chromosome banding patterns, known genes, gene
predictions, phenotype and disease information, enhancer and promoter data, expression data,
comparative genomics and evolutionary conservation, SNPs and other variations, and so on.
As I illustrate in this conceptual diagram, the data is organized along the official genomic sequence
reference coordinates. The other data types are referred to as Annotation Tracks and are aligned on
the genomic sequence framework. These tracks provide additional information about any given
genomic region of interest.
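To make the idea of tracks hanging on a shared coordinate framework concrete, here is a minimal Python sketch. It is not how the UCSC Genome Browser actually stores its data; it simply assumes that each annotation item can be reduced to a track name, a chromosome, and a start/end position, and shows how "what is in this region?" becomes an interval-overlap test. The `Feature` class, the `features_in_region` function, and all names and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One annotation item placed on reference coordinates (hypothetical model)."""
    track: str   # illustrative track name, e.g. "genes" or "snps"
    name: str
    chrom: str
    start: int   # assumed 1-based, inclusive
    end: int     # assumed 1-based, inclusive


def features_in_region(features, chrom, start, end):
    """Return the features on `chrom` that overlap the inclusive range [start, end]."""
    return [
        f for f in features
        if f.chrom == chrom and f.start <= end and f.end >= start
    ]


# Toy data: every name and coordinate here is made up for illustration.
tracks = [
    Feature("genes", "geneA", "chrX", 100, 500),
    Feature("snps", "snp1", "chrX", 250, 250),
    Feature("snps", "snp2", "chrX", 900, 900),
    Feature("genes", "geneB", "chrY", 100, 500),
]

for f in features_in_region(tracks, "chrX", 200, 600):
    print(f.track, f.name)
```

Because every track uses the same coordinates, one region query pulls matching items from all tracks at once, which is the essence of the stacked display in the browser.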
All of this data is aligned in one place so you can quickly find new information, and context, about
regions important for your work. In addition, all the data links out to other databases, web sites, and
literature so you can go as deep as you want into any specific topic in which you may be interested.

On the previous slide we had a conceptual diagram of the UCSC Genome Browser's representation of
the genome and the annotation data. Briefly, I wanted to show you a sample of the kind of data we will
examine as it actually looks in the Genome Browser viewer. Here you see a portion of the viewer, with
the base positions (the official genome reference sequence) at the top, and the many layers of data
(the annotation tracks) organized in that region. If you click on any of the features, you will be
presented with even more detail about those items on new pages. The detail pages
themselves link out to more resources, too. Shown here are some examples of Gene Details, multiple
species alignment data, and SNPs.
So much data, so well organized, is right at your fingertips now, thanks to the UCSC Genome
Bioinformatics Group team. You'll learn a great deal about any genomic region using the graphical
representations and the clickable features.

Shown here is the homepage for the UCSC Genome Bioinformatics site, captured on a day that happened
to feature a tribute to Charles Darwin. When you first arrive, you will see a page that is organized like this.
At the top there is a section that contains general information about the site. Next, there is a specific
section for News: new species, new features, software or data changes, and the current state of the data
that is available. This information is worth a quick check whenever you visit the site, in case there have been
changes since your last visit.
But the real substance of the site, the data and tools, is accessible in a couple of ways from this
page. Navigation bars at the top and along the left side give you access to all of the
available features. You will begin your experience at the UCSC Genome Browser by navigating from
these blue areas. Some features are available from both the top and the side; some are only along the
left. We won't be able to cover all of the great tools and details in this introduction. Separate
tutorials available on our site explore some of these, including the Table Browser & Custom Tracks,
the Gene Sorter, VisiGene, and more.
To actually get in and start performing basic searches in the database, there are several options. You
can search by text: gene name, gene symbol, keywords, ID, and so on. To do this we will use the
Genomes or the Genome Browser link. Either of these will take us to the Gateway page, where we
will begin to search.

Shown here is a portion of the Genome Browser Gateway page. By default the search is set to Human
and the current assembly when you first arrive, but we will see that you can change the species and
assembly later.
We will begin to talk about searching using the text search feature from this Genome Browser Gateway
page.
You can do a text search for information such as gene names, chromosome number, chromosome
region, your favorite gene or marker identification number (ID), GenBank submitter name, and more.
You can also use a keyword to find records. Examples of the kinds of searches you can do are shown on
the lower part of this page: see the request items and the expected responses from the Genome
Browser. You can check this section later for helpful reminders of the correct query
format when doing your own searches.
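Many of the position queries on the Gateway take the form chromosome:start-end. The Python sketch below assumes that form, with optional thousands separators as in `chr17:7,500,000-7,600,000` (an arbitrary example region), and splits it into its parts. The regular expression and the `parse_position` name are illustrative assumptions, not the browser's own parser, which accepts many more query types than this.

```python
import re

# chrom:start-end, e.g. "chr17:7,500,000-7,600,000"; commas are optional.
# Assumes a "chr" prefix; real browser queries are more flexible.
POSITION_RE = re.compile(r"(chr[\w.]+):([\d,]+)-([\d,]+)")


def parse_position(text):
    """Split a 'chrom:start-end' string into (chrom, start, end) integers."""
    match = POSITION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {text!r}")
    chrom, start, end = match.groups()
    # Drop the thousands separators before converting to integers.
    start = int(start.replace(",", ""))
    end = int(end.replace(",", ""))
    if start > end:
        raise ValueError(f"start {start} is greater than end {end}")
    return chrom, start, end


print(parse_position("chr17:7,500,000-7,600,000"))  # ('chr17', 7500000, 7600000)
```

Anything that does not match, such as a plain gene symbol, raises `ValueError`; in the browser, that kind of input would be treated as a text search instead.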
We are going to go a little deeper into your search options from this Gateway; we'll take each option
and explore what you can expect from a given search.

Here we are going to focus on the options that you have to search a genome using the Gateway page.
This screen shot isolates that part of the page for us so we can focus on the specific items that are
available to you.
The first option is clade, and then the second is the genome, or species, choice. At one time all of the
species were in a single list, but there are so many species now that they have been re-organized into
these menus. You will search one species at a time in the Genome Browser. Use the pulldown menus to
select and highlight the species name that you want to use in your search.
Next, you have to choose an assembly. Assembly refers to the official reference genomic sequence
that is used to create the framework on which to hang all the other data. The reference sequence
comes from the official groups who release genome sequence data; in the case of human, that is now
the GRC, or Genome Reference Consortium. These groups deposit sequence in GenBank, and then UCSC
obtains the official assembly and generates the annotation tracks for that genome. The source of the
assembly and any version number from the sequencing group are indicated in that species' Gateway
information section. You will also see other nicknames for the assembly, which are used in various
places. This can be confusing; it would be great if everyone used a standard assembly designation in the literature.
The Assembly menu lists each assembly by its official release date. Often you will want the most current
assembly, but sometimes you may want to look back at older data, and you can see that older assemblies
remain available here for a while.
Even older data is still available in the UCSC archives if you need it. Archives are accessible from a link
on the homepage left navigation menu.
"Position or search term" is the next option. This is where you enter the symbol, keyword, or ID
for the place you want to examine in the genome. You can put a symbol in the position
box, or use the handy gene search box to quickly find the right canonical gene. The gene box assists you
by suggesting text as you type.
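Together, the clade, genome, assembly, and position choices identify an assembly database and a location. As a hedged sketch that assumes the public browser at genome.ucsc.edu, a link can be built for the `hgTracks` page from a `db` parameter (for example `hg19`, the UCSC name for the February 2009 human assembly) and a `position` parameter. The helper name `browser_link` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed entry point: the public Genome Browser display page.
BROWSER_URL = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_link(db, position):
    """Build a browser link from an assembly name and a position or search term."""
    # urlencode percent-escapes characters such as ':' and ',' in coordinates.
    return f"{BROWSER_URL}?{urlencode({'db': db, 'position': position})}"


print(browser_link("hg19", "TP53"))
# https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=TP53
```

When the position is a text term such as `TP53` that matches several records, you would expect a results page to choose from, just as with the Gateway form, rather than a direct jump to a single location.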
The options described so far will get you to a genomic location. But if you are wondering where to find
the specific data types or annotation tracks, the track search button will enable that. Clicking that will
take you to a new search where you can explore the annotation track descriptions.
The last thing I'll point out here is the button for configuring tracks and displays. You can change
the display here, such as font sizes and feature appearance; later I'll show you a couple of
other places where you can access these settings as well. If you find that the text in the viewer is too small, or the
arrowhead features are difficult to see, you can adjust their size and other aspects of the viewer here.

Now that we have examined the search options, let's perform a sample search of this database.
The search I'll be demonstrating uses the human genome, February 2009 assembly. If you are
viewing these slides at a time when there is a later assembly, things might look slightly different.
For this example, I'm going to use the human TP53 gene. This is an important and medically relevant
gene that has been implicated in some cancers, and it is well characterized, which makes it a good example. I could
choose the gene suggestion box item for this search to get the canonical gene, but in this case I'd like
to explore the full set of results, so I will illustrate the plain search box method.
Once you have made the appropriate selections among the options and added your position or search text,
click the submit button and wait for your results, some of which we see below.
Here I show part of the results page for the text search for TP53. That text appears in a number of
different records within different annotation track sets, so you have to select the one you want from this
results page, depending on your needs. For my example I'll focus on UCSC Genes, a collection built from
several different gene resources gathered by the UCSC team.
Sometimes a search takes you directly to the browser; that can happen if you use a specific accession number.
With text searches, however, you will often have to select from the records. Usually I choose a
record that appears to have the correct gene symbol or name, and I look at the description text. If there
appear to be multiple entries that are likely to be splice variants, I may select the longest of them (as
indicated by the nucleotide range at the end of the link). We simply have to choose one to move to that
genomic region; as you will see, the other versions of that gene will be visible in the viewer when we
get there.
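The rule of thumb above, picking the splice variant with the longest nucleotide range, can be written down as a small selection function. This is a Python sketch under stated assumptions: the hypothetical `Hit` records stand in for result links, their ranges are treated as 1-based and inclusive, and the IDs and coordinates are invented for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One search-result record: an ID plus the range shown in its link (hypothetical)."""
    record_id: str
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


def span(hit):
    """Number of bases covered by an inclusive range."""
    return hit.end - hit.start + 1


def longest_hit(hits):
    """Pick the record with the longest genomic range; ties go to the first listed."""
    if not hits:
        raise ValueError("no records to choose from")
    return max(hits, key=span)


# Invented splice-variant records for illustration only.
hits = [
    Hit("variantA", "chr17", 1_000, 1_900),
    Hit("variantB", "chr17", 1_000, 2_500),
    Hit("variantC", "chr17", 1_200, 2_500),
]
print(longest_hit(hits).record_id)  # variantB
```

The choice only decides where the viewer opens; as noted above, the other variants of the gene are still displayed once you arrive in that region.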
For my example here, I will choose the link that says "uc002gij.3, tumor protein p53, variant 1." Clicking that
link takes you to the TP53 position in the genome with those nucleotide coordinates: we will go to
that range on chromosome 17 in the browser viewer. We'll pick up with the viewer in the
next section.
