2004 - 2005
Basic principle:
3. Protein-DNA interactions
• Transcription factors
• Recombination enzymes
• RNA replication factors
Spotted microarray technology
[Figure: wafer (12.7 cm) and probe array (1.28 cm); ≈ 60 million probes; bases A, T, C, G]
• Reverse transcriptase to make cDNA
• Heat fragmentation
• Hybridization: a specific target molecule will bind to PM but not to MM
• Using statistics, one signal per gene is computed from all probe pairs (PM-MM) after the scan
Data file types
• DAT
- Raw bitmap image of the Affymetrix GeneChip® scanned probe array
- Encodes essential information about the experiment and sample that the
image belongs to
• CEL
- Cell intensities and quality control information generated by aligning grid
to DAT file and using the Affymetrix Cell Analysis Algorithm to compute
intensity values for each X and Y coordinate
- Contains information about the image, sample, and experiment that the CEL data
are derived from
• CDF
- Affymetrix GeneChip® array library definition: maps each X and Y coordinate
on the probe array to a probe set ID, a feature number, and a probe type,
perfect match (PM) or mismatch (MM)
• CHP
- GCOS native file containing the probe (CEL) analysis output using the
GCOS probe analysis algorithm (which computes one expression signal
value and detection call from all the probe intensities in a probe set)
• EXP
- GCOS native file containing general experiment information such as name,
sample, probe array type, hybridization information, etc.
• RPT
- Summary analysis information gathered during GCOS probe analysis (CHP
file generation)
• TXT
- Tab-delimited text output of the CHP file containing probe set IDs, probe
set signals, and detection calls generated by the GCOS probe analysis
algorithm as well as important information about the experiment, sample,
and analysis algorithm parameters
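As a rough illustration of how the CDF layout and the CEL intensities fit together, the sketch below joins a coordinate map with per-cell intensities to obtain PM/MM pairs per probe set. It does not parse real CDF or CEL files: the probe set ID, coordinates, and intensities are made up, and the names `CdfEntry` and `probe_pairs` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class CdfEntry(NamedTuple):
    probe_set_id: str  # hypothetical probe set ID
    feature: int       # feature (probe pair) number within the probe set
    kind: str          # "PM" or "MM"


# Hypothetical CDF-like map: (x, y) coordinate -> probe description.
cdf = {
    (0, 0): CdfEntry("example_at", 0, "PM"),
    (0, 1): CdfEntry("example_at", 0, "MM"),
    (1, 0): CdfEntry("example_at", 1, "PM"),
    (1, 1): CdfEntry("example_at", 1, "MM"),
}

# Hypothetical CEL-like intensities: (x, y) coordinate -> cell intensity.
cel = {(0, 0): 1271.3, (0, 1): 310.0, (1, 0): 980.5, (1, 1): 405.2}


def probe_pairs(cdf_map, cel_map):
    """Group cell intensities into (PM, MM) pairs per probe set."""
    by_feature = defaultdict(dict)
    for xy, entry in cdf_map.items():
        by_feature[(entry.probe_set_id, entry.feature)][entry.kind] = cel_map[xy]
    result = defaultdict(list)
    for (probe_set_id, _feature), values in sorted(by_feature.items()):
        result[probe_set_id].append((values["PM"], values["MM"]))
    return dict(result)


pairs = probe_pairs(cdf, cel)
```

The resulting PM/MM pairs are the kind of input that a probe analysis algorithm (CHP generation) would work from.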
MIMAS repository
File types
[Figure: transcript with 3′ UTR and poly(A) tail (AAAAA)]
• DAT: image at the pixel level
[Figure: DAT and CEL images]
• Present (P) calls:
- Yeast: 75-80%
- Mammals: 35-45%
• TXT: export of CHP transcript-level data
Why do we need MIMAS?
• OPEN-SOURCE/FREE SOLUTIONS
- Do not get fundamental concepts right (which makes customized development for
our needs difficult)
- Not scalable
• MIMAS API
• Web Front-end
• REPOSITORY
- Feature-level intensities
- Meta-data
- Data warehouse schema design
• ONTOLOGY/ARRAY LIBRARY
- Controlled vocabulary used to describe microarray data
- Library can be extended by MIMAS users
- MIAME compliant and extensible
- Affymetrix GeneChip® information
• UPLOAD/STAGING DATABASE
- Persistent area to store uploads before they are ready to go into the repository
• MIMAS-Signet Loader
- Master script that takes uploaded/staged experiments and loads them into the
MIMAS Repository
- Runs daily (or more frequently depending on hardware)
- Integrity and redundancy checking of experiment, CEL, TXT data
- Integrity checking of CEL files
- CEL fingerprinting (avoid redundancy)
- Transformation and loading of meta-data (MIAME)
- Integrity checking of TXT files
- Emails user of success or failure
• External Job Execution
- Periodically scans the EXTERNAL_JOBS MIMAS table for requests and executes them
depending on available resources
- Recreation and archiving of sample CEL files for download
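The slides do not say how MIMAS computes a CEL fingerprint. One common way to detect redundant uploads is a content hash, sketched below under that assumption; `fingerprint` and `is_duplicate` are hypothetical names, not part of the MIMAS API.

```python
import hashlib


def fingerprint(path, chunk_size=1 << 20):
    """Return a SHA-256 hex digest of the file contents (assumed fingerprint)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large CEL files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_duplicate(path, known_fingerprints):
    """True if a file with identical contents has already been loaded."""
    return fingerprint(path) in known_fingerprints
```

A loader could keep the set of known fingerprints in the repository and reject a staged CEL file whose fingerprint is already present.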
Data computation
• Detection p-value
• Detection call
• Signal algorithm
Comparison analysis
• Normalization
• Change p-value
• Change call
• Signal log ratio algorithm
The CEL analysis algorithm
Pixel intensities (4 × 4 grid):
1243 1283 1346 1271
1158 1272 1254 1247
1247 1255 1192 1182
1254 1309 1241 1122
[Figure: histogram of the pixel intensities; x-axis: Bin (1140 to 1380), y-axis: Frequency (0 to 8), secondary axis: cumulative % (0 to 100); marked value: 1271.3]
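The value 1271.3 marked on this slide matches the 75th percentile of the sixteen pixel intensities, which is how Affymetrix documentation describes the cell intensity; the slide itself does not name the percentile, so treat that as an assumption. A minimal sketch with a hand-written, linearly interpolated percentile (the helper `percentile` is illustrative, not a GCOS function):

```python
def percentile(values, fraction):
    """Percentile with linear interpolation between neighbouring ranks."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


# Pixel intensities taken from the slide.
pixels = [
    1243, 1283, 1346, 1271,
    1158, 1272, 1254, 1247,
    1247, 1255, 1192, 1182,
    1254, 1309, 1241, 1122,
]

cell_intensity = percentile(pixels, 0.75)
print(cell_intensity)  # 1271.25, shown as 1271.3 on the slide
```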
Detection algorithm
Discrimination Score
Null hypothesis:
Add all the ranks associated with positive differences, giving the
T+ statistic. Finally, the P-value associated with this statistic is
identified in an appropriate table.
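The steps named on these slides (discrimination score, ranking, T+ statistic, P-value) can be sketched as follows. The score formula R = (PM - MM) / (PM + MM), the threshold `tau = 0.015`, and the call cut-offs `alpha1 = 0.04` and `alpha2 = 0.06` are taken from Affymetrix's published description of the algorithm, not from the slides, and should be read as assumptions. Instead of a lookup table, the sketch computes an exact one-sided P-value by enumerating all sign assignments, which is only practical for the small number of probe pairs in a probe set.

```python
from itertools import product


def discrimination_scores(pm, mm):
    """R = (PM - MM) / (PM + MM) for each probe pair (assumed formula)."""
    return [(p - m) / (p + m) for p, m in zip(pm, mm)]


def signed_rank_t_plus(diffs):
    """Rank |diffs| (average ranks for ties); return T+ and the rank list."""
    ordered = sorted((d for d in diffs if d != 0), key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average
        i = j + 1
    # T+ is the sum of the ranks that belong to positive differences.
    t_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    return t_plus, ranks


def one_sided_p_value(t_plus, ranks):
    """Exact P(T+ >= observed) under the null, by enumerating all signs."""
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        if sum(r for s, r in zip(signs, ranks) if s) >= t_plus:
            hits += 1
    return hits / 2 ** len(ranks)


def detection_p_value(pm, mm, tau=0.015):
    """Signed-rank test of the discrimination scores against tau."""
    diffs = [r - tau for r in discrimination_scores(pm, mm)]
    t_plus, ranks = signed_rank_t_plus(diffs)
    return one_sided_p_value(t_plus, ranks)


def detection_call(p_value, alpha1=0.04, alpha2=0.06):
    """Present, Marginal, or Absent (cut-offs are assumed defaults)."""
    if p_value < alpha1:
        return "P"
    if p_value < alpha2:
        return "M"
    return "A"


# Made-up intensities for four probe pairs.
p = detection_p_value([1200, 900, 1500, 800], [400, 450, 300, 600])
```

With only four probe pairs, even a probe set in which every PM exceeds its MM cannot reach a P-value below 1/16, which is one reason real probe sets use more pairs.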
Background
Signal calculation:
$$T_{bi} = \frac{\sum_i w(u_i)\, x_i}{\sum_i w(u_i)}$$
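The weighted mean above has the form of a one-step Tukey biweight estimate. The slide does not define `u` or `w`; the sketch below assumes the definitions given in Affymetrix's description of the signal algorithm: `u_i = (x_i - M) / (c * S + epsilon)` with `M` the median, `S` the median absolute deviation, `c = 5`, `epsilon = 0.0001`, and `w(u) = (1 - u^2)^2` for `|u| <= 1`, otherwise 0.

```python
from statistics import median


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight mean (c and epsilon are assumed defaults)."""
    m = median(values)
    # S: median absolute deviation from the median.
    s = median([abs(x - m) for x in values])
    weights = []
    for x in values:
        u = (x - m) / (c * s + epsilon)
        # Values far from the median (|u| > 1) get zero weight.
        weights.append((1 - u * u) ** 2 if abs(u) <= 1 else 0.0)
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Made-up example: the outlier 1000 gets zero weight.
robust = tukey_biweight([10, 11, 12, 13, 1000])
```

Compared with the plain mean of the same values (209.2), the biweight estimate stays close to the bulk of the data, which is the reason for using a robust estimator on probe-level values.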
Practical work
GCOS:
• Compute CEL, CHP files
• Determine presence/absence calls
• Data quality control
MIMAS:
• Annotate and upload files
Textbooks, Literature & web portals
Microarray Gene Expression Data Analysis: A Beginner's Guide
http://www.amazon.co.uk/exec/obidos/ASIN/1405106824/qid%3D1047375686/026-1898565-5814030
Bioinformatics: Sequence and Genome Analysis
David Mount, CSH Press, 2004
http://www.bioinformaticsonline.org/
Liu et al. Analysis of high density expression microarrays with signed-rank call algorithms. Bioinformatics 2002.
Hubbell et al. Robust estimators for expression analysis. Bioinformatics 2002.
Irizarry et al. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 2003.
Bolstad et al. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003.
Gautier et al. affy--analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 2004.
NETAFFX:
http://www.affymetrix.com/analysis/
http://www.nslij-genetics.org/microarray/