Cube Analyst Reference Guide
VERSION 6.1.0
Copyright 2007-2013 Citilabs, Inc. All rights reserved.

Citilabs is a registered trademark of Citilabs, Inc. All other brand names and product names used in this book are
trademarks, registered trademarks, or trade names of their respective holders.
The information contained in this document is the exclusive property of Citilabs. This work is protected under United
States copyright law and the copyright laws of the given countries of origin and applicable international laws, treaties,
and/or conventions. No part of this work may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying or recording, or by any information storage or retrieval system, except as expressly
permitted in writing by Citilabs.
Citilabs has carefully reviewed the accuracy of this document, but shall not be held responsible for any omissions or
errors that may appear. Information in this document is subject to change without notice.
60-010-1
April 24, 2013
Contents
About This Document

Chapter 1, Introduction
  What is Cube Analyst?
  Scope of this document
  What's new?
  Background
  Common elements and variations
  Reading this document
  Conventions used in this document
  Computing resources
  Cost information

Chapter 2, Estimation System
  Framework for handling different data consistently
  Objectives
  Handling data variability
  Options for users
  Considerations for users
    Deciding what information to input
    Inputting data
    Estimating the matrix
    Analyzing the estimated matrix
    Improving the estimated matrix
  Estimating highway and public transport matrices
  Overview of Cube Analyst

Chapter 3, Possible Data Inputs
  Types of data
    Link counts
    Turning counts
    Prior trip matrix
    Trip cost matrix
    Partial O-D matrix
    Trip ends
    Routing information
    Cost distribution function
    Part-trip data
  Sets of data

Chapter 4, Mathematical Background
  Mathematical notation
    Explaining the letters and symbols
    Notation used in the estimation equation
  Introduction to the mathematics in Cube Analyst
    Main mathematical features
    Estimation equation
    Model parameters
    Maximum likelihood objective function
    Describing the variation in data
    Optimizer: Finding the minimum value
  Mathematical summary
    Maximum likelihood method: Background theory
    Application of maximum likelihood to Cube Analyst
    Cube Analyst objective function
    Cube Analyst trip estimation model
    Estimating model parameters
    Optimization procedure
    Parameter errors
    Cell reliability
  Extensions to the calculations

Chapter 5, Data Preparation and Analysis
  Overview
  Matrices
  Trip ends
  Networks and traffic and passenger counts
  Screenlines
  Routings
    Highways
    Public transport
  Setting confidence levels
    Characteristics of the data
    Deciding on confidence values
  Tuning estimation performance
  Control of routing information
  Analyzing the results

Chapter 6, Estimation Process
  Study area
  Data
  Estimating the matrix
  Evaluation: Sensitivity analysis
  Including part-trip data

Chapter 7, Hierarchic Estimation
  Introduction to hierarchic estimation
    Approaches to estimating very large matrices
    Different levels of detail: Districts and zones
    Different approaches to hierarchic estimation
  Alternative approaches to hierarchic estimation
    Estimation with mixed district and zonal detail
    Local matrices
    Summary of the hierarchic estimation process
  Defining districts
  Running Cube Analyst for hierarchic estimation
    Parameter ZCONF

Chapter 8, Using Cube Analyst
  Input data: overview
  Outputs: overview
  Estimating large matrices (hierarchic estimation)
  Estimation process

Chapter 9, Reports
  Summary of Reports
  Sample reports
    Average confidence level
    Final five iterations
    Matrix totals and zone generation
    Zone attractions
    Average confidence level (part trip data)
    Part trip totals
    District matrix
    Local matrix

Chapter 10, Files

Chapter 11, Control Data
  &PARAM keywords
    Standard user control parameters
    Secondary user control parameters
    Tuning control
  &OPTION keywords

Chapter 12, Program Specific Data
  Screenline file
    Link count format
    Turning count format
  Trip end file
  Coordinate file
  Model parameter file
  Local matrix control file
  District definition file
  Intercept file
  Gradient search file

Chapter 13, Notes on Program Use
  Approaches to running Cube Analyst
    Initial estimation
    Constrained model parameters
    Controlling the optimization process
  Selection of model form
  Information in the optimization log file
  Computation times
  Running Cube Analyst from Cube Voyager
    Running Cube Analyst from a VOYAGER script
    Files

Chapter 14, Examples
  Estimation with prior trip and count data only
  Estimation with prior trip, count, and trip end data
  Estimation with warm start and cost data
  Estimation with highways part trip data
  Estimation with public transport part-trip data
  Hierarchic estimation
  Example of screenline volumes report

Index
About This Document
Welcome to Cube Analyst!

This document provides detailed reference information about
Cube Analyst.
This document contains the following chapters:
Chapter 1, Introduction
Chapter 2, Estimation System
Chapter 3, Possible Data Inputs
Chapter 4, Mathematical Background
Chapter 5, Data Preparation and Analysis
Chapter 6, Estimation Process
Chapter 7, Hierarchic Estimation
Chapter 8, Using Cube Analyst
Chapter 9, Reports
Chapter 10, Files
Chapter 11, Control Data
Chapter 12, Program Specific Data
Chapter 13, Notes on Program Use
Chapter 14, Examples
Introduction
This chapter introduces you to Cube Analyst. Topics include:
What is Cube Analyst?
Scope of this document
What's new?
Background
Common elements and variations
Reading this document
Conventions used in this document
Computing resources
Cost information
What is Cube Analyst?
Cube Analyst is a program which estimates an origin-destination
(O-D) trip matrix. It is an optional, standalone and separately
licensed module in the Cube suite.
Cube Analyst estimates one matrix at a time, and the data should
form a set related to this particular matrix; that is, the data should
correspond to the same time period (hour(s) of day, day of week,
time of year) as the matrix. It should also correspond to the same
units of flow as the matrix (vehicles, PCUs [passenger car units],
passengers, etc.).
The characteristic common to all estimation options offered by
Cube Analyst is that they make the best use, in a flexible way, of
commonly available data sources to contribute to the estimation
process.
The user assigns each data item a level of confidence, or reliability,
which conditions the influence of the different data sources on the
estimation. The estimation process is based on the maximum
likelihood technique, coupled with an optimization procedure.
Scope of this document
This document applies to all levels of functionality offered and
modes of operation of Cube Analyst. Features specific to a variant
are noted.
This document concentrates on Cube Analyst; wider matters on
matrix estimation, and the context within which Cube Analyst may
be used, are described in the Introduction to the Matrix Estimation
Programs. That document also explains the terms which have a
specific meaning for Cube Analyst and which are used in this document.
What's new?
Cube Analyst can now estimate Cube Voyager Public Transport
matrices by using an intercept file output by the Cube Voyager PT
program.
Background
Cube Analyst enables transport planners to estimate
origin-destination (O-D) trip matrices and to maintain the currency of
existing O-D matrices, while minimizing survey costs.
As is described in "Introduction to the mathematics in Cube
Analyst" on page 35, Cube Analyst is suitable for estimating
present-day matrices, but not for forecasting future-year trip matrices.
The software contains a number of novel and distinctive features. It
was first developed as a collaborative venture with the Dutch
Ministry of Transport, the Rijkswaterstaat. Subsequently, studies
and developments undertaken for Centro (the Passenger Transport
Executive for the West Midlands area of England) led to a
broadening of the software's capabilities to consider public
transport passenger matrices, as well as highway (vehicle) matrices,
and to estimate detailed matrices for very large study areas.
Common elements and variations
The characteristic common to all variants of Cube Analyst is that
they make the best use, in a flexible way, of most available data
sources in the estimation process. This includes not only vehicle
traffic or passenger flow counts and prior (old) matrices, but also
partially observed matrices, zonal trip end (generation and
attraction) data, vehicle routing, travel cost matrices, and even
previously calibrated trip cost distribution functions. An extension
is the use of a further form of data called "part-trip" data, described
in "Part-trip data" on page 29.
The user ascribes confidence, or reliability, levels to the data. This
conditions the influence of data when different data items
(inevitably) imply different trip matrix cell values. The estimation
process is based on a statistically rigorous procedure which takes
direct account of inherent traffic data variability. It uses the
maximum likelihood technique, coupled with a powerful
optimization procedure, to derive simultaneously an unusually
large set of model parameters. These then determine the estimated
trip cell values with correspondingly enhanced precision.
Nevertheless, the estimation process remains mathematically
underspecified and a feature of Cube Analyst is the information
available to assess the quality of the estimated matrix. This includes
comparative and sensitivity analyses, and reports which draw on a
range of graphical and tabular presentations. Statistical reports are
available which provide information on the standard errors of
model parameter values, and indicators of the stability of estimated
trip matrix cells (via a sensitivity matrix).
Cube Analyst provides a hierarchic approach to estimation, suited
for use with very large matrices, typically between 2,500 and 5,000
zones in size. Its basic approach is to estimate a general matrix, in
which zones are automatically grouped into districts. This
area-wide estimation is then used to control a set of detailed
estimations, which build up to provide a fully detailed estimate for
the entire study area.
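As a rough illustration of the relationship between zones and districts, the following Python sketch sums a small zone-level trip matrix into a district-level one. The function name, the zone-to-district mapping, and the trip values are hypothetical; Cube Analyst groups zones automatically, and its own grouping rules are not reproduced here.

```python
def aggregate_to_districts(zone_matrix, zone_to_district):
    """Sum a zone-to-zone trip matrix into a district-to-district matrix.

    zone_matrix      -- square list of lists, trips from zone i to zone j
    zone_to_district -- zone_to_district[i] is the district index of zone i
                        (hand-specified here purely for illustration)
    """
    n_districts = max(zone_to_district) + 1
    district_matrix = [[0.0] * n_districts for _ in range(n_districts)]
    for i, row in enumerate(zone_matrix):
        for j, trips in enumerate(row):
            origin = zone_to_district[i]
            destination = zone_to_district[j]
            district_matrix[origin][destination] += trips
    return district_matrix


# Four zones grouped into two districts (illustrative values only).
zone_trips = [
    [0, 10, 5, 2],
    [8, 0, 4, 1],
    [3, 6, 0, 7],
    [2, 1, 9, 0],
]
zone_to_district = [0, 0, 1, 1]
district_trips = aggregate_to_districts(zone_trips, zone_to_district)
```

The district matrix preserves the total number of trips, which is one simple check that an aggregation of this kind has been applied consistently.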
Reading this document
The introductory chapters provide:
An overview of Cube Analyst
A set of Standardized Procedures, suitable for different types of estimations
The document considers estimation of highway and public
transport matrices and all of the Cube Analyst features.
Highway and public transport estimation are very similar, apart
from obvious differences such as the use of line (service) data for
public transport. There are also differences in emphasis; for
example, count data is often more plentiful and reliable for
highways than for public transport. Where such differences arise,
they are noted.
When reading this document note that:
The next four chapters provide an essential overview of Cube Analyst
Chapter 6, "Estimation Process," documents an example of applying Cube Analyst
Chapter 7, "Hierarchic Estimation," is concerned with the specialist topic of hierarchic estimation
Conventions used in this document
The following conventions are used in this document:
Parameters, options, and selections appear in upper case.
For example: COSTM
Technical terms introduced for the first time appear in upper and lower case italics.
For example: Hessian
Terms and phrases with particular meaning in the context of Cube Analyst appear in quotes. These phrases may also appear in italics.
For example: "Sensitivity Matrix"
Computing resources
Cube Analyst is a major system. The programs ensure that the
mechanics of operation are straightforward for the user, but using
the system requires familiarity with a number of programs,
especially for data preparation and analysis of results. This should
be taken into account when planning to use it for the first time.
Cube Analyst is designed around a number of rigorous principles,
including the calibration of the mathematical estimation model
which the program undertakes. One consequence is that it is
computationally intensive: the differing sets of data are considered
simultaneously, and this requires relatively large amounts of
random access memory (RAM) and of disk space.
Cost information
For highways, cost data is produced by Citilabs products.
For public transport in TRIPS, cost data is produced by MVPUBM.
Estimation System
This chapter discusses the nature of the estimation system. Topics
include:
Framework for handling different data consistently
Objectives
Handling data variability
Options for users
Considerations for users
Estimating highway and public transport matrices
Overview of Cube Analyst
Framework for handling different data consistently
Cube Analyst provides a framework that is used to input a variety of
information to estimate an O-D matrix. The characteristics of the
system are that:
Some or all of the types of information introduced in "Common elements and variations" on page 6 may be used.
The system can work with little data, but the accuracy of the estimated matrix is improved as more data is input.
Different information is handled on a consistent basis.
The variability of data is explicitly accounted for.
Objectives
The aim of Cube Analyst is to maximize the value of existing data
and to limit the need for costly surveys. As such, it is mainly
concerned with processing information in the best (statistical)
manner; though the accuracy of the estimated matrix remains
strongly affected by the amount and the quality of the information
input by the user.
Besides its role in estimating matrices for individual studies, Cube
Analyst is suited for use with regular surveys designed to keep
matrix information up to date.
Handling data variability
Cube Analyst explicitly considers the variability of data. Inevitably,
there are inconsistencies in what the different data suggest that the
estimated matrix should be. The inherent variability means that
collected data items are merely a sample, and hence the values
(even of simple traffic counts) can only be considered to fall within
a range (a distribution). The width of this range reflects the
confidence that may be placed in particular items.
Cube Analyst therefore requires the user to input information
about how confident they are that each data item is representative
of the situation for which the matrix is to be estimated. The
information is input as a nominal percentage sample value. In
restricted circumstances, this may be an actual sample obtained in
a survey. This information about the variability is used to determine
what relative influence each item of data has in the estimation
process; it acts algebraically as a weighting value, and is referred
to as a "confidence level".
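To make the weighting idea concrete, the sketch below combines two conflicting values for the same quantity using their confidence levels as weights. This is a plain weighted average chosen for illustration only; it is not the maximum likelihood calculation that Cube Analyst performs, and the function name and numbers are hypothetical.

```python
def weighted_estimate(items):
    """Combine conflicting values, weighting each by its confidence level.

    items -- list of (value, confidence) pairs, where confidence is a
             nominal percentage sample value as described in the text.
    Note: a simple weighted average, used only to show how a higher
    confidence pulls the result toward that data item.
    """
    total_weight = sum(confidence for _, confidence in items)
    if total_weight <= 0:
        raise ValueError("at least one item needs a positive confidence")
    return sum(value * confidence for value, confidence in items) / total_weight


# A prior matrix suggests 120 trips (low confidence); counts imply 150
# trips (high confidence). Values are illustrative assumptions.
prior_value = (120.0, 20.0)
count_implied_value = (150.0, 80.0)
combined = weighted_estimate([prior_value, count_implied_value])
```

With these inputs the combined value lies much closer to the count-implied figure than to the prior, which is the qualitative behaviour the confidence level is intended to produce.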
Options for users

Cube Analyst does not have to be used in one fixed manner; rather,
it is used according to the information that is available and the
context within which the matrix is required. Typically, the user will
start with whatever information is to hand or may easily be collected.
This provides a fast means of obtaining an initial matrix that can
enable a study to proceed, at least for general investigations.
Analysis of the resulting matrix and estimation statistics will show
where there is greatest requirement for further quality data. Cube
Analyst is then used to integrate this new (and possibly different
type of) data to produce an improved estimated matrix.
Considerations for users
Cube Analyst involves the user in a number of stages:
Deciding what information to input
Inputting data
Estimating the matrix
Analyzing the estimated matrix
Improving the estimated matrix
Deciding what information to input

This will usually be all information already available, but new data
will normally be appropriate for those parts of the study area where
most change has taken place since previous surveys, or where
traffic schemes or policy proposals require detailed analysis.
Identify notable features and data sources

Feature                 Example                   Data
Car ownership           Traffic growth            Counts
Land use                New industry, shops;      Trip ends (generations and
                        new car parking           attractions)
Changes in:             New bypass;               Travel times, routing
road/public             traffic management;
transport network       new bus/rail services
Travel habits           Out-of-town shopping      Observed O-D patterns;
                                                  PT operators' boarding and
                                                  alighting surveys;
                                                  vehicle licence plate surveys
Appreciating key land uses
Inputting data
Information may be input in the form of matrices, as trip ends, or as
network-related information. This data is prepared by the user
within Cube, which offers a variety of modes of data entry. Extra
information is required on data variability. This is input in the same
form as the information to which it corresponds. Each data item, for
example each count, trip end, etc., may have an individual
confidence level attached to it, but in many cases global values will
be used.
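The relationship between individual and global confidence levels can be pictured with a short Python sketch. The class, field names, and the global value of 50 are assumptions made for illustration; they do not represent Cube Analyst file formats or keywords.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical global confidence level applied when an item has none of its own.
GLOBAL_CONFIDENCE = 50.0


@dataclass
class CountItem:
    """One observed count with an optional individual confidence level."""
    link: str
    value: float
    confidence: Optional[float] = None  # None means "use the global value"


def effective_confidence(item, global_default=GLOBAL_CONFIDENCE):
    """Return the item's own confidence level, or the global one if unset."""
    return global_default if item.confidence is None else item.confidence


counts = [
    CountItem("A-B", 1200.0),                   # falls back to the global value
    CountItem("B-C", 640.0, confidence=90.0),   # individually specified
]
levels = [effective_confidence(item) for item in counts]
```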

Estimating the matrix
The matrix estimation stage simply requires the user to input the
prepared files into Cube Analyst. As is described in "Overview of
Cube Analyst" on page 21, and in more detail in Chapter 4,
"Mathematical Background," Cube Analyst performs a set of
iterative calculations which automatically determine the
statistically most likely matrix for the set of input data values
provided.
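The general idea of iterating toward a statistically most likely value can be shown with a deliberately tiny example. The Python sketch below fits a single scale factor to a set of modelled link flows so that they best match observed counts, assuming the counts are Poisson distributed, and uses Newton steps on the negative log-likelihood. This is a generic textbook illustration, not the Cube Analyst estimation model (which is described in Chapter 4 and involves many parameters); all names and values are hypothetical.

```python
def fit_scale_factor(modelled_flows, counts, tol=1e-10, max_iter=50):
    """Maximum likelihood scale factor `a` such that counts ~ Poisson(a * flow).

    Minimizes f(a) = sum(a * v_k - c_k * log(a * v_k)) by Newton's method.
    The known closed-form answer is sum(counts) / sum(modelled_flows); the
    iteration is shown only to illustrate an iterative optimizer.
    """
    total_modelled = sum(modelled_flows)
    total_counted = sum(counts)
    if total_modelled <= 0 or total_counted <= 0:
        raise ValueError("flows and counts must have positive totals")
    a = 1.0  # start from the unscaled flows
    for _ in range(max_iter):
        gradient = total_modelled - total_counted / a   # f'(a)
        curvature = total_counted / a ** 2              # f''(a), always > 0
        step = gradient / curvature
        a -= step
        if a <= 0:
            raise ValueError("Newton step left the feasible region")
        if abs(step) < tol:
            break
    return a


# Flows implied by a prior matrix versus observed counts (illustrative).
modelled = [400.0, 250.0, 350.0]
observed = [460.0, 300.0, 390.0]
scale = fit_scale_factor(modelled, observed)
```

In this one-parameter case the iteration converges to the ratio of total counts to total modelled flows; a full estimation differs in scale and form, but follows the same pattern of repeatedly improving parameter values until the objective stops changing.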
The first time Cube Analyst is run, it creates a set of files which can
be used to reduce the run times of subsequent runs of Cube
Analyst. This is either because the need to restructure data is
avoided (the intercept file) or because an estimation can take
advantage of previously calculated results (the gradient search file
and the model parameter file).
This ability to benefit from a previous run of Cube Analyst (for the
same basic study) is usually used to assist in analyzing the
consequences of changes in data values. For lengthy runs on
large matrices, it can also provide a means of breaking an estimation
into more than one run, for convenience.
With an improved optimizer in Cube Analyst and more powerful
computers, such staging of estimations is now rarer, but it remains a
typical feature of hierarchic estimations of extremely large
matrices. This is assisted by the local matrix control file, which is
open to editing so that estimations are staged in a manner
convenient to the user.
Analyzing the estimated matrix

It is natural and desirable to want to check the quality of the
estimated matrix. A typical approach to checking quality might be
to compare the estimated matrix with some observed data which
has not been used in the estimation process. However, this
approach is not usually appropriate for Cube Analyst, which is
designed to take advantage of all reasonably observed data. For
example, if the estimated matrix implies that the link flows across a
screenline differ from those observed (this is easily checked
by assigning the estimated matrix to the network), then the
solution is to re-run the estimation, now incorporating the extra
observed data.
The approach to analyzing the quality of the estimated matrix is,
therefore, based on:
Comparing the estimated results with input data values
Checking the sensitivity of the results if data values are altered
Analyzing the estimation calculations
Besides information output by Cube Analyst itself, extensive use is
made of other Citilabs programs for creating tabulations and
graphic displays which highlight different characteristics of the
estimated matrix.
Improving the estimated matrix

Deficiencies in the quality of the estimated matrix, when they are
signalled by the results of the analysis phase, are remedied by
improving the quality or quantity, or both, of the input data. The
analysis phase can provide strong pointers as to which data is
contributing to quality problems and hence where the user can
focus attention.
Estimation System

For much of the time, it is not necessary to distinguish between the
cases of estimating matrices for use with highways and public
transport analysis; the same principles apply to each. However,
there are a number of points to note. The first one is that the units
of the matrices are usually in terms of vehicles for highways, and in
terms of passengers for public transport.
Much of the data and methods of processing are identical for both
highways and public transport, but the routing information is
derived in quite different ways. There is also the concept of line
groups, which only applies to public transport and not to highways.
Assumptions about the quality and quantity of data vary between
the modes. Link count data is more readily, and accurately,
available for highways than for public transport. Public transport is
often more reliant on part-trip data, as obtained from boarding and
alighting surveys. This form of data may be obtained from licence
plate matching surveys for highways.

Cube Analyst's operations can be considered as a series of activities:
1. Data input and restructuring
For the most part, Cube Analyst simply reads the set of user's
input data at this initial stage. However, Cube Analyst also
analyzes and restructures routing information (from the TRIPS
route choice probability (RCP) file or Cube Voyager path file)
and count data (from the screenline file) into a more concise
and efficient file, called the intercept file. This restructuring can
be relatively lengthy so, as noted in "Considerations for users"
on page 16, it is possible to re-use an intercept file once it has
been created. For Cube Voyager users, the creation of the
intercept file is handled by the HIGHWAY program.
2. Calculation initiation
The main Cube Analyst calculations may be viewed as a search
for the statistically most likely matrix, given the set of input
data values. As this search relates, typically, to many thousands
of matrix cell values, the manner of searching is a critical aspect
of Cube Analyst.
A calculation called the "method of scoring" directs the start of
the searching process. This calculation is always done as the
first stage of the estimation calculation, and it may be repeated
later, according to the settings of Cube Analyst's ITERH
parameter. (This determines the number of iterations between
gradient search matrix calculations.)
There is a strategy consideration here. The default method for
running Cube Analyst spends time with the "method of
scoring" calculation in order to limit subsequent calculations.
Cube Analyst also calculates a suitable value for ITERH.
However, it is open to the user to override this strategy by:
Changing the setting of the IHTYPE parameter (used to
determine the optimization process) of Cube Analyst from
its default in order to avoid the method of scoring. This
reduces the associated calculation time, but means that the
searching process is initially less well directed and so the
net calculation time may still be longer.
Setting ITERH to a lower value than the default, which
means that the searching process is re-appraised by further
application of the method of scoring. This may be suitable
when there are signs that the optimizer is not able to
determine a convergent solution in a reasonable number of
iterations.
The user should note that these options for tuning the
performance of Cube Analyst exist, but should not necessarily
be concerned to apply them, as the default operation is usually
entirely satisfactory. It requires some experience with a
particular estimation problem to determine its best strategy.
3. Function evaluation
"Function evaluation" is the term used to describe the
calculation of a series of estimation results. These are calculated
by way of an estimation equation (function). The estimation
equation calculates the values of the estimated cells according
to the current values of a series of model parameters. There are
a large number of model parameters; in fact, the number is
usually two times the number of zones, plus the number of
screenlines.
These model parameters have an initial value of 1.0, which has
the consequence that the initial function evaluation (usually)
results in an estimated matrix which is identical to the old
(prior) matrix; see "Prior trip matrix" on page 27.
4. Optimization
The optimizer is a central feature of Cube Analyst; there are two
critical elements to it:
a. Objective function: This provides a criterion by which the
optimizer can determine whether one value of a particular
cell is better than another value. "Maximum likelihood
objective function" on page 40 explains how this criterion
is derived from the statistical maximum likelihood theory
and rigorous mathematical calculation. Hence, Cube
Analyst defines "better" as "statistically more likely."
b. Set of search directions and a step length: The optimizer
alters the model parameter values, from their starting point
of 1.0, to seek an estimated matrix that is an improvement
on its current estimates. The search direction determines,
for any cell in the matrix, whether model parameters
should be increased or decreased, and the step length
defines by how much.
The final values of the model parameters are available to view
as the model parameter file, so it is possible to see how they
have been changed from 1.0.
5. Iterations and convergence
After the optimizer has calculated new model parameter
values, the function evaluation process is repeated to obtain
the latest estimated matrix (and its derivative values). This
overall process is repeated in a series of iterations; at each
iteration the optimizer will ensure that the new estimated
matrix is an improvement on (more likely than) the previous one.
Because there are so many cells to estimate, which Cube
Analyst does not constrain to integer values, it is normally
possible to make some improvement, however small.
Therefore, it is necessary to define a criterion to determine
when the iterations have reached an acceptable solution. In
Cube Analyst, this criterion is set by the UTOL (user tolerance)
control parameter. UTOL sets a minimum value on the step
length which the optimizer is allowed to use, as very small step
lengths indicate that the optimizer is making correspondingly
small changes to the estimated matrix. It is usual to leave UTOL
at its default value, and allow Cube Analyst to run until it
terminates with a "converged" message.
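The idea of stopping on a minimum step length can be sketched with a toy one-parameter search. This is only an illustration of the principle described above, not Cube Analyst's optimizer: the function name, the halving rule, and the `utol` argument (a stand-in for the role UTOL plays) are all assumptions made for the example.

```python
def minimize_with_step_tolerance(objective, gradient, start, utol=1e-6, max_iter=1000):
    """Toy search that stops once no step of length >= utol improves the objective.

    Hypothetical illustration only; `utol` mimics the role described for UTOL.
    """
    x = start
    step = 1.0
    for iteration in range(1, max_iter + 1):
        g = gradient(x)
        if g == 0.0:
            return x, iteration, "converged"
        # Search direction: move down if the gradient is positive, up otherwise.
        direction = -1.0 if g > 0 else 1.0
        # Shorten the step until it gives an improvement (a lower objective value).
        while step >= utol and objective(x + direction * step) >= objective(x):
            step /= 2.0
        if step < utol:
            # Only very small steps remain, so further changes would be negligible.
            return x, iteration, "converged"
        x += direction * step
    return x, max_iter, "iteration limit reached"


# Example: a simple objective with its minimum at 3.0.
best, iterations, status = minimize_with_step_tolerance(
    objective=lambda v: (v - 3.0) ** 2,
    gradient=lambda v: 2.0 * (v - 3.0),
    start=0.3,
)
print(status, round(best, 4), iterations)
```

A smaller `utol` lets the search continue with ever smaller steps, which costs more iterations for correspondingly small changes in the result.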
Possible Data Inputs
This chapter describes data inputs. Topics include:
Types of data
Sets of data

Types of data
Cube Analyst can operate using some or all of the following types
of data:
Link counts
Turning counts
Prior trip matrix
Trip cost matrix
Partial O-D matrix
Trip ends
Routing information
Cost distribution function
Part-trip data
NOTE: Cube Analyst requires confidence level information for all
data types except routing information and cost distribution
function.
Link counts
For highways, this information may be surveyed with considerable
accuracy and exploit automatic counters, but it may not show the
current demand for travel (which the O-D matrix should represent)
if congestion has constricted flows.
For public transport, this data is often obtained from estimates of
passenger numbers in buses and rail carriages, and is of inherently
limited accuracy (but may still be usefully exploited by Cube
Analyst).
For both modes, it should be observed that matrices normally
apply to average situations for which individual counts will match
to only some extent.
Link counts which are spread randomly across the network
contribute relatively little information to the estimation of matrix
cells. This may be less of a problem for public transport networks
offering limited alternative routes, than for highway networks with
inherently greater route choice options.
Turning counts
The same comments as for link counts apply. Note that turning
counts may only be applied when inputting a Cube Voyager path
file. They are not supported for an estimation using a TRIPS RCP file.
Prior trip matrix

This matrix might be an out-of-date matrix for the study area, or
possibly a previous study forecast for the present day. It is not
essential to input a prior trip matrix, but in practice a matrix is very
desirable for information about the pattern of trip movements.
Trip cost matrix

This matrix summarizes the cost of travel between zones, where
cost is normally defined as a user-specified combination of time
and distance, and any tolls or fares, etc. The trip cost matrix may be
used as a substitute when some or all of a prior matrix is not
available. The costs may be based on either modelled or surveyed
speed data.
Partial O-D matrix

This is simply another approach to providing the prior matrix that
makes it possible to use information that specifies some cells of the
matrix but not all. The user merely identifies a (relatively) high
confidence in those cells which have been observed and allows
other information to determine values in the remaining cells. This
may be data from the cost matrix, in which case the corresponding
prior matrix cells must be zero. Alternatively, non-observed cells are
given non-zero values with zero or low confidence levels. Zero
values in input matrices are taken to indicate that trips in
corresponding cells are impossible. Cost data are not used to
estimate trips for cells which have non-zero prior-trip values.
This approach makes Cube Analyst useful when surveys have been
conducted around critical parts of a study area (for example, town
centers, travel corridors, etc.), but there remains a need to estimate
the matrix for the rest of the area.
Trip ends
The total number of trips generated from and attracted to zones
(G&A) may be obtained either from surveys or from mathematical
land-use type models. Surveys are appropriate when zone
boundaries are such that traffic may be counted entering and
leaving zones on distinct trips, rather than merely passing through
the zone. This tends to occur only for some zones, for example a car
park or an industrial estate, but these are often important zones for
a study.
It is possible to use data derived from both methods, for example, a
few zones surveyed and the remainder derived from a model, with
the resulting trip ends distinguished through differing confidence
levels.
Routing information
It is possible to survey routing data, though this is rarely done. The
modelling of routing is often not a very good replication of actual
(erratic) driver or passenger routing, and it is often not possible to
place much reliability on this otherwise important data. Cube
Analyst is therefore designed to use routing information, as far as
possible, only where the precise routing does not matter. Thus, for
skim cost information small variations in routes may be ignored,
while count information is used in bottleneck situations where
the number of routes is limited to a few alternative links (ideally
one).
Cost distribution function

Many areas which have been the subject of previous studies will
have a previously calibrated mathematical trip-cost distribution
function, as used in the gravity model. Because Cube Analyst
contains its own calibration procedures, the information implied by
the distribution function is not normally used directly, although the
a and b parameters, discussed later, may be fixed with reference to
a previously calibrated gravity model.
Part-trip data
This data is surveyed in the form of matrices where the recorded
origin and destination are not necessarily the ultimate origin and
destination of the trip. This is illustrated in the figure Definition of
part-trip data that shows the recorded part of trip (S - E) relative to
the total trip (O - D). It is possible for one or both of points S and E
to coincide with the corresponding points O and D. For highways,
this data is typically obtained from licence plate matching surveys,
and from on-board surveys recording passenger boarding and
alighting points for public transport.
Definition of part-trip data

Sets of data
Cube Analyst estimates one matrix at a time, and the data should
form a set related to this particular matrix, that is, the data should
correspond to the same time period (hour(s) of day, day of week,
time of year) as the matrix. It should also correspond to the same
units of flow (vehicles, pcus, passengers, etc.). Sometimes the user
will have to transform data (for example, by factoring) to achieve
this, and this will usually imply a reduction (small or large) in
confidence levels for the transformed data.
Also, only one set of information may be input into Cube Analyst for
an estimation. Hence, if multiple sets exist, say, several traffic counts
for the same link, then the user must derive a single set. This may
simply be to choose the most recently surveyed set, or it might be a
weighted average of all available sets. Multiple sets of data usually
allow confidence levels to be increased relative to single sets of
data.
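As a minimal sketch of deriving a single value from several counts of the same link, the weighted average mentioned above could be computed as follows. The function name and the choice of weights are hypothetical; how the resulting confidence level should be raised is a matter of judgment and is not modelled here.

```python
def combine_counts(counts, weights=None):
    """Weighted average of repeated counts for one link (illustrative helper)."""
    if not counts:
        raise ValueError("at least one count is required")
    if weights is None:
        weights = [1.0] * len(counts)   # equal weights give a plain average
    if len(weights) != len(counts):
        raise ValueError("counts and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(c * w for c, w in zip(counts, weights)) / total_weight


# Two counts of the same link; the second (more recent) survey is weighted more heavily.
plain_average = combine_counts([1684, 1739])
recent_weighted = combine_counts([1684, 1739], weights=[1.0, 3.0])
print(plain_average, recent_weighted)
```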
Mathematical Background
This chapter describes the mathematics that Cube Analyst uses.

Topics include:
Mathematical notation
Introduction to the mathematics in Cube Analyst
Mathematical summary
Extensions to the calculations
This section discusses mathematical notation. Topics include:
Explaining the letters and symbols
Notation used in the estimation equation
Explaining the letters and symbols

This section uses mathematical notation, which can look daunting
for those who are not accustomed to it. So, first, a word of
background explanation. The notation can be made to appear
worse because of the use of Greek letters and some specialist
mathematical symbols. The problem is that the normal 26-letter
Roman alphabet is not sufficient, even considering upper and
lower case letters, and remembering that some letters have
traditional mathematical meanings and associations. The
mathematics which is presented here is only an extract of the full
Cube Analyst mathematics, which uses an even wider range of
letters. Also, some of the traditional mathematical notations are
cumbersome when used with vectors and matrices and their
elements, as Cube Analyst requires, hence it is better to use
alternative forms.
This is mainly a pronunciation guide, but some of the symbols and
letters are explained further:
Symbol    Description
α         alpha
β         beta
η         eta
θ         theta
λ         lambda
Ξ         xi (upper case)
ξ         xi (lower case)
Π         pi (upper case); symbol for multiplication (product)
Σ         sigma (upper case); symbol for summation
Φ         phi (upper case)
Ψ         psi (upper case)
∂         partial differential operator
∇         nabla; symbol for (partial) differentiation of matrix elements
exp       exponent
!         factorial operator (for example, 4! = 4 x 3 x 2 x 1)
The notation P(x|X) implies the probability of x, given the value X.
Similarly, L(x|X) is the likelihood of x, given X; M(x|X) refers to the
log-likelihood of x, given X. Note the use of bold in the last example
implies that x and X are multi-valued vectors (or matrices).
Notation used in the estimation equation

Notation        Description
i               Origin zone
j               Destination zone
k               Link count
K               Screenline count (from count sites .....)
a_i, b_j, X_K   Model parameters
                Mean travel cost
                Any one of the model parameters
H               Observed data item
h               Estimated data item

NOTE: H and h may take values as shown below:

H (observed)    h (estimated)   Description
t_ij            T_ij            Number of trips from i to j
                                Number of trips from origin i
                                Number of trips to destination j
                                Number of trips through link k

This notation is used in "Introduction to the mathematics in Cube
Analyst" on page 35 and "Mathematical summary" on page 48.

The design of Cube Analyst means that a user can estimate
matrices simply by supplying the program with the appropriate
input data and accepting the resulting matrix. However, it is
valuable to have some understanding of how Cube Analyst
calculates the value of the estimated matrix cells; this insight both
helps in providing confidence in the results and in guiding the
approach to input data, such as setting confidence levels and
considering the potential effects of extra data or improved data
quality.
This section provides additional information about how Cube
Analyst computes matrix cells. Topics include:
Main mathematical features
Estimation equation
Model parameters
Maximum likelihood objective function
Describing the variation in data
Optimizer: Finding the minimum value
Main mathematical features

This section is intended to cater to those Cube Analyst users who
are interested in the detailed mathematical and statistical
underpinnings of the estimation process. Users who are more
interested in other aspects of the model should proceed to
Chapter 5, Data Preparation and Analysis.
The basis of Cube Analyst's calculations is an application of the
standard statistical approach known as the maximum likelihood
method. This method allows estimates of a set of inputs to guide
the estimates of a corresponding set of outputs; the estimates of
the set of inputs are obtained from likelihood functions, which are
expressions of probability distribution functions (pdfs) associated
with the user's input data. The outputs are calculated from an
estimation equation, which must be provided. These points are
further explained below.
Given the range of possible input data, the full mathematical
expression of Cube Analyst is complex, but it involves some
principal components which we use to describe the essential
features of Cube Analyst. "Mathematical summary" on page 48
explains the standard Cube Analyst calculations by summarizing
the main mathematical steps. "Extensions to the calculations" on
page 57 shows how additional features are accommodated in the
calculations. This section continues with explaining Cube Analyst's
mathematics in largely descriptive terms, while introducing the
main equations. Throughout this section, the mathematical
notation is as defined in "Mathematical notation" on page 32, where it is
not otherwise clear from the text.
Estimation equation
The heart of the estimation is an equation (estimation model)
whose output, $T_{ij}$, corresponds to the values of the cells of Cube
Analyst's output matrix for trips between zones $i$ and $j$. The form
of this mathematical estimation model in Cube Analyst is:
$$T_{ij} = t_{ij} \, a_i \, b_j \prod_{K} X_K^{P_{ijK}} \tag{1}$$
This equation contains the following elements:
its output, $T_{ij}$
some data items:
$t_{ij}$: Prior observation of trips between $i$ and $j$
$P_{ijK}$: Probability of trips between zones $i$ and $j$ using screenline site $K$ (it
is possible for a screenline to correspond to a single count site, in
suitable circumstances)
some model parameters: $a_i$, $b_j$, $X_K$
$\prod_K$: Implies the product of $X_K^{P_{ijK}}$ over all the screenline count sites
If there is no prior observation for movements between some or
all possible origin-destination zone pairs, $ij$, then $t_{ij}$ may be
calculated by Cube Analyst from:
.....(2)
Equation (2) introduces further elements:
One data item:
$c_{ij}$ - the generalized cost of travel between zones $i$ and $j$
Two model parameters:
$\alpha$, $\beta$
It may be noted that screenlines are usually organized so that
$P_{ijK} = 0$ or $P_{ijK} = 1$. Also, because $t_{ij}$ provides an estimator of the output, as
well as possibly being an input data item, it may also be considered
as a model parameter. Hence, the data item $t_{ij}$ is also referred to by
a separate model parameter symbol. (That is, the two are numerically
identical, but are logically distinct.)
The form of equation (1) has been chosen primarily for reasons of
convenience, and for the appropriateness of its form according to
the data used in the estimation (as we discuss below). It is designed
to be efficient in assisting information to be processed, but is not
behavioral in nature. This implies that Cube Analyst is suitable for
estimating present day matrices, but not for forecasting which
would require some behavioral assumptions.
Equation (2) is borrowed from the well-known gravity model that
makes the behavioral assumption that people prefer lower cost
journeys to higher cost ones, but are influenced by the level of trips
generated by and attracted to different zones. This is a broad
assumption; it means that cost data may be used where no other
source of prior matrix data is available, but it is not a precise
approach to estimating individual matrix cells.
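The structure of equation (1) can be sketched in a few lines of Python. The sketch assumes the multiplicative form in which each prior cell is scaled by an origin parameter, a destination parameter, and one factor per screenline raised to the routing probability; the function name and the nested-list data layout are hypothetical and chosen only for illustration.

```python
import math


def estimate_matrix(prior, a, b, x, p):
    """Evaluate T[i][j] = prior[i][j] * a[i] * b[j] * product over K of x[K] ** p[i][j][K].

    prior: prior trips t_ij; a, b: origin and destination parameters;
    x: one parameter per screenline; p[i][j][K]: routing probability.
    (Assumed form and hypothetical layout, for illustration only.)
    """
    estimated = []
    for i, prior_row in enumerate(prior):
        row = []
        for j, t_ij in enumerate(prior_row):
            screenline_factor = math.prod(
                x_k ** p[i][j][k] for k, x_k in enumerate(x)
            )
            row.append(t_ij * a[i] * b[j] * screenline_factor)
        estimated.append(row)
    return estimated


# Two zones and one screenline; only the movement 0 -> 1 crosses the screenline.
prior = [[0.0, 120.0], [80.0, 0.0]]
p = [[[0.0], [1.0]], [[0.0], [0.0]]]

# With every parameter at 1.0 the estimate reproduces the prior matrix.
initial = estimate_matrix(prior, [1.0, 1.0], [1.0, 1.0], [1.0], p)
# Raising the screenline parameter scales only the cells routed across it.
scaled = estimate_matrix(prior, [1.0, 1.0], [1.0, 1.0], [1.5], p)
print(initial, scaled)
```

With routing probabilities of 0 or 1, a screenline parameter acts as a simple scaling factor on exactly those cells that cross the screenline.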
Model parameters
For Cube Analyst, therefore, the estimated matrix is entirely
dependent on the values given to the model parameters. Cube
Analyst is thus, in effect, solely concerned with establishing the most
appropriate values for these model parameters. (Cube Analyst's
calculations are in "parameter space," which accounts for some of
the behavior that may be observed in Cube Analyst's output to the
screen and log file while it is computing, where the values of the
matrix may change in an apparently erratic manner.) Cube Analyst's
calculations are mainly in the nature of a search for the "best"
model parameter values. Apart from the estimation equation itself,
the main features of the Cube Analyst calculations are:
Directing the search for model parameter values: optimization
Deciding whether the new model parameter values are the "best": function evaluation
We now describe the general issues for Cube Analyst when setting
model parameter values.
Unless the user supplies an input model parameter file (created
by an earlier run of Cube Analyst), the model parameters are
automatically initialized to 1.0. From equation (1), it may be seen
that the initial estimate is identical to the prior matrix (or based on
the cost matrix, equation (2), if no prior matrix value exists).
It is possible to compare the estimated matrix with all of the items
of the user's input data. For example, the sum of rows and columns
of the estimated matrix may be compared with input trip ends
("Mathematical summary" on page 48 shows this in mathematical
terms for all data items). If the result of this comparison indicates
that the current estimate is too low, then an improved estimated
matrix may be achieved by increasing the value of at least some
model parameters.
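A minimal sketch of the trip-end comparison just described is shown below. It only computes ratios of observed trip ends to the row and column totals of an estimated matrix; the function name is hypothetical, and every row and column total is assumed to be non-zero.

```python
def trip_end_ratios(estimated, observed_origins, observed_destinations):
    """Ratio of observed trip ends to estimated row and column totals.

    A ratio above 1.0 suggests the current estimate is too low for that zone.
    (Illustrative helper; assumes non-zero row and column totals.)
    """
    row_totals = [sum(row) for row in estimated]
    column_totals = [sum(column) for column in zip(*estimated)]
    origin_ratios = [obs / est for obs, est in zip(observed_origins, row_totals)]
    destination_ratios = [
        obs / est for obs, est in zip(observed_destinations, column_totals)
    ]
    return origin_ratios, destination_ratios


estimated = [[10.0, 20.0], [30.0, 40.0]]
origin_ratios, destination_ratios = trip_end_ratios(
    estimated, observed_origins=[60.0, 70.0], observed_destinations=[40.0, 60.0]
)
print(origin_ratios, destination_ratios)
```

Here the first origin has twice as many observed trips as the estimate implies, which points towards increasing a parameter associated with that origin.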
The problem for Cube Analyst is that there are many items of user
data, implying many comparisons of the type just described; some
of these comparisons may require the current estimate to be
improved in one way (increased, say), while other comparisons
need the estimate to be altered another way (decreased, say). The
large number of model parameters provides the basis for
reconciling these apparent conflicts; by definition there are (2 x the
number of zones) model parameters provided by the $a_i$'s and the
$b_j$'s alone. It may be demonstrated that these are sufficient for
equation (1) to define any possible combination of positive,
non-zero matrix cell values. Hence, if, by some means, suitable values of
the model parameters may be found, equation (1) can produce a
matrix which is consistent with all of the user's input data. That is, at
least, if the input data is self-consistent in the first place.
Of course, this consistency is never the case in real applications of
Cube Analyst, and the best that may be hoped for is to estimate the
matrix which is most likely, given the user's input data. Achieving
this most likely result is the next main topic to discuss, but we will
stay with model parameters to make a few more points.
In principle, there is nothing particular to distinguish the set of
model parameters; mathematically, they are equal and
each may be affected by any item of data. However, the form of the
estimation equation allows parameters to be associated naturally
with different types of data such as:
$a_i$, $b_j$: Trip ends, for trips generated at zone $i$ and attracted to zone $j$
$X_K$: Counts on screenline site $K$
$\alpha$, $\beta$: Trip cost information ($c_{ij}$)
$t_{ij}$: Prior trip matrix
This association is useful to the optimizer in reflecting the different
(quality) characteristics of the data sets. The nominally "redundant"
parameters provide extra degrees of freedom to handle data
inconsistencies. This is useful, as the matrix cells affected by a set of
screenline data are precisely defined by the $P_{ijK}$ routing
information.
Maximum likelihood objective function

When Cube Analyst establishes values for the model parameters, it
requires a criterion to determine if the corresponding $T_{ij}$ estimates
either are "correct" or are "better" than another set of model
parameter values. This criterion is provided by a mathematical
equation called an objective function. The objective function, $M$,
for Cube Analyst has the following form:
.....(3)
where:
$h$ - is an estimated data item
$H$ - is an observed data item
and each observed data item $H$ has an associated confidence level.
"Notation used in the estimation equation" on page 34 shows
which items $H$ and $h$ can represent but, in general terms, $H$ is the
input data which the user supplies and $h$ is the corresponding
value implied by the estimated matrix.
We have already discussed how the form of the estimation
equation (1) has been determined for reasons of effectiveness, but
which remain essentially arbitrary; also, how equation (2) derives a
weak behavioral basis from the gravity model. It is therefore
important to appreciate that in contrast, the objective function,
equation (3), is the result of a statistically rigorous procedure,
namely the maximum likelihood method.
The consequence of this is a guarantee, subject to some
qualifications which we consider below, that the estimated matrix
is the statistically most likely, given the data supplied by the user.
The correctness of the estimate remains, of course, dependent on
the quality of the input data. Maximum likelihood theory shows
that the most likely values are indicated when $M$ in equation (3),
which is negative, reaches its minimum possible value. (For reasons
of computational convenience, Cube Analyst minimizes the
negative of the log-likelihood objective function, rather than
maximizing the positive version, as the name "maximum
likelihood" might suggest.)
The qualifications mentioned before respectively concern the input
data sets representing independent observations, which is not
normally a problem for Cube Analyst users, and of the input data
being described by a probability distribution function, which we
now discuss. The derivation of equation (3) for the objective
function is outlined in Mathematical summary on page 48.
Describing the variation in data

The maximum likelihood method assumes that each item of input
data represents an observation from a random distribution of
possible values, but where the variation of values may be described
by a probability distribution function. That is, when the user
supplies Cube Analyst with, say, a screenline traffic count value of
1684 vph, this is not considered to be "the" count for that screenline
but, rather, a sample from a distribution. It is common experience
that counting the same screenline on another, but equivalent
occasion (for example, the same time the following week) will
provide another count value, say 1739 vph, simply on account of
the random variation which is inherent in all traffic (and passenger)
data.
The assumption is made, therefore, that all input data for Cube
Analyst is subject to variation which may be described by the
Poisson probability distribution function (pdf).
Illustration of a Poisson probability distribution function
The Poisson is a well-known pdf, often associated with data which
can involve many events (for example, 1684 vehicles passing an
observer in an hour). It has the statistical property that its mean
equals its variance. This is valuable for data such as count
information where the variation of 100 vph is significant when the
mean figure is 200 vph, but not when it is 1000 vph; alternatively, a
10% variation implies many vehicles on a mean of 5000 vph, but
not on 50 vph. The Poisson distribution reflects these changes in
significance in an appropriate way.
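The "mean equals variance" property can be checked numerically with a short, self-contained sketch. The helper names below are hypothetical; the code only illustrates the Poisson property and is not part of Cube Analyst.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events when the expected count is `mean`."""
    # Work in logarithms so that large counts do not overflow.
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def spread(mean):
    """Standard deviation and relative spread of a Poisson count with this mean."""
    std_dev = math.sqrt(mean)  # the variance equals the mean
    return std_dev, std_dev / mean


def numeric_variance(mean, upper=2000):
    """Variance summed directly from the pmf, as a check on the property."""
    return sum((k - mean) ** 2 * poisson_pmf(k, mean) for k in range(upper))


for mean_vph in (50, 200, 1000, 5000):
    std_dev, relative = spread(mean_vph)
    print(f"mean {mean_vph:5d} vph: std dev {std_dev:6.1f} vph, relative spread {relative:.1%}")
```

The absolute spread grows with the mean while the relative spread shrinks, so a difference of 100 vph is several times further from a mean of 200 vph (in standard deviations) than from a mean of 1000 vph.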
During the original development of Cube Analyst, alternative
assumptions about the pdf used to describe data variation were
reviewed (the Log-Normal distribution, for example), but these were
considered only to add complexity, rather than accuracy. It is
usually the case that the Poisson is a good way of describing traffic
and passenger data. The Poisson distribution also has the
considerable merit that it leads to some mathematical relationships
where the role of confidence levels is clearly apparent. In particular,
"Mathematical summary" on page 48 shows an element of the
calculation concerned with calculating the optimum value of the
objective function which has the following general form (see
equation (18) later for details):
The terms with subscript 1 represent, respectively, the confidence level,
observed ($H_1$), and estimated ($h_1$) values for the first data item,
and similarly for the second, third, etc., data items. The form of this
equation is directly attributable to the use of the Poisson pdf;
another pdf, the Normal pdf for example, would give a different
and more complex form.
The significance of equation (18) is two-fold. First, each and every
data item is represented in this equation; that is, each cell of the
prior matrix, each trip end, each screenline count, and so on. Thus,
all items of data are considered together, not in separate
categories. (It is not only equation (18) which shows this; most
significantly, so does equation (3), the objective function, amongst
others.) The second point is that the data contributes as:
1. A ratio of observed to estimated values
2. A linear combination (that is, simple addition (+)) of data items,
each multiplied (weighted) by its own confidence level
This enables the Cube Analyst user to view confidence levels as
simple weighting factors, even though the derivation of confidence levels is
originally from considerations of data sampling, as discussed in the
following section. This would not be the case if a non-Poisson pdf
had been used.
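To see why a Poisson assumption leads to ratios of observed to estimated values combined linearly with confidence weights, consider the following sketch. It assumes a generic weighted Poisson log-likelihood; the symbols $w_n$ (confidence level of item $n$) and $\theta$ (any one model parameter) are chosen here for illustration, and the signs and constants in the manual's own equations (3) and (18) may differ.

```latex
% Assumption: each observed item H_n is a Poisson sample with mean h_n.
% w_n is a hypothetical symbol for the confidence level of item n,
% and \theta stands for any one of the model parameters.
\begin{align*}
\ln P(H_n \mid h_n) &= H_n \ln h_n - h_n - \ln (H_n!) \\
M &= -\sum_n w_n \left( H_n \ln h_n - h_n - \ln (H_n!) \right) \\
\frac{\partial M}{\partial \theta}
  &= -\sum_n w_n \left( \frac{H_n}{h_n} - 1 \right) \frac{\partial h_n}{\partial \theta}
\end{align*}
```

Under these assumptions each data item enters the derivative through the ratio $H_n / h_n$, and the items are simply added together, each multiplied by its own weight.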
Optimizer: Finding the minimum value

We have already discussed how Cube Analyst is designed to adjust
the model parameters, from their initial value of 1.0, so that
equation (1) leads to a new value of , which provides a new set of
estimated data values, .
Equation (3) can then be used to determine if the new estimates are
more likely (more consistent with) the input data, . Cube
Analyst therefore incorporates a powerful optimizer to amend the
model parameters so that the value of
is minimized as much as
possible. This minimum is defined mathematically by locating the
point at which the gradient of the objective function, with respect
to the set of model parameters, , is zero, that is
. This
well-known approach to determining minimum or maximum
44
points is shown in the following figure, which shows in a schematic

fashion how the value of the objective function, , varies
according to the value of a parameter, .
Two dimensional schematic view of variations in objective function according to

model parameter values
It is at this stage, in particular, that Cube Analyst is operating in
parameter space. The principle is, simply, to adjust each
parameter by an amount (the step length) and in a search
direction (up or down). The optimizer ensures that Cube Analyst
only makes adjustments which improve the situation (that is, which
further reduce the objective function). Once a set of
(improving) adjustments has been made, the Cube Analyst
optimizer performs another iteration of adjustments to determine
whether more improvements are possible, and so on, until no
further decrease in the (negative) value of the objective function is
possible.
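The iteration described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming a single parameter and a simple step-halving rule; the names descend and example_objective, and the quadratic test function, are hypothetical and are not taken from Cube Analyst, whose optimizer is considerably more elaborate.

```python
def descend(objective, x0, step=1.0, tol=1e-8, max_iter=1000):
    """Minimise a one-parameter objective by trial steps up and down.

    A trial step is accepted only if it lowers the objective; if
    neither direction improves, the step length is halved. The search
    stops once the step length falls below tol.
    """
    x, fx = x0, objective(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for direction in (+1.0, -1.0):
            trial = x + direction * step
            f_trial = objective(trial)
            if f_trial < fx:  # accept improving adjustments only
                x, fx = trial, f_trial
                improved = True
                break
        if not improved:
            step *= 0.5  # no improvement possible: shorten the step
    return x, fx


def example_objective(a):
    """Hypothetical objective with its minimum at a = 3."""
    return (a - 3.0) ** 2 + 2.0


best_a, best_value = descend(example_objective, x0=1.0)
```

The shrinking step length plays the same role here as in the text: a very small step signals that the search is close to the minimum.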
This approach places several requirements on the optimizer:
Efficiency in determining optimum step lengths and directions
Avoidance of local minima and location of the global minimum (this means being sure that no values of step length and direction could lead to a better result)
Identification of the minimum point when in the neighborhood of one (this means achieving a stable convergence point)
There are several possible approaches to calculating optimum step
lengths and directions. These may be considered to represent a
spectrum characterized, at one end, by methods which use a
simple strategy to define a step length and direction, but spend
more time adjusting these elements through more iterations; at the
other end, the methods spend more effort calculating the optimum
step length and direction, but require fewer iterations.
The direction information is held by Cube Analyst in the gradient
search matrix file; this is also known as the Hessian matrix, as the
gradient search matrix is an approximation for the Hessian. The
degree of approximation depends on the method and certain
aspects of the calculation, notably the proximity to convergence
and the number of iterations since the gradient search was last recomputed (controlled in part by Cube Analyst control parameter
ITERH).
The significance of the Hessian matrix for Cube Analyst is that it
provides a mathematical description of the relationships between
model parameters; indeed the Hessian itself approximates to the
variance-covariance matrix. This can be exploited by the optimizer
to update the direction information in an optimum manner.
Through the Cube Analyst control parameter IHTYPE, the user can
select alternative methods. These are listed below in order of
increasing calculation effort given to the step length and direction:
1. Method of steepest descent
2. Newton's method
3. Quasi-Newton method
4. Method of scoring
The default procedure in Cube Analyst uses a combination of
methods 3 and 4. It starts by using the method of scoring to
calculate an approximation to the Hessian, which requires
considerable computational effort. Further improvements to the
solution are obtained by the quasi-Newton method, which needs
less computation. This method works well and requires very few
iterations if the solution is in the region of the optimum value.
Otherwise the gradient search matrix is recalculated using a
method to determine the exact Hessian matrix, a new step length is
adopted, and the process repeats itself. (If the exact Hessian cannot
be computed, maybe because the results are still far from a
converged solution, the method of scoring is automatically reapplied.)
As the solution approaches the optimum, the step length is
reduced, allowing the optimum to be located more precisely. A
very small step length indicates a close proximity to the optimum
value and so the search is terminated when the step length is
beneath the threshold defined by Cube Analyst control parameter
UTOL. This is a more practical method of determining when the
calculation should finish than monitoring the gradients
approaching zero.
This section presents a further explanation of Cube Analyst's
calculations, as given in "Introduction to the mathematics in Cube
Analyst" on page 35.
Topics include:
Maximum likelihood method: Background theory
Application of maximum likelihood to Cube Analyst
Cube Analyst objective function
Cube Analyst trip estimation model
Estimating model parameters
Optimization procedure
Parameter errors
Cell reliability
Maximum likelihood method: Background theory

Maximum likelihood is a standard method of estimating
parameters of mathematical modeling equations, based on sets of
relevant data observations. Given values of the model parameters,
the pdf defines the probability associated with the observed data.
When viewed as a function of the model parameters, the pdf is
called a Likelihood function. The values of the parameters which
maximize this function are called maximum likelihood estimates.
They correspond to a model in which the probability of the
observed data is maximized. The estimation process has two
elements: establishing the likelihood function, and
determining the optimum parameter values to maximize it.
Mathematically, the theory may be expressed as:
.....(4)
where:
= random variable
= observation
= parameter (or function of a parameter)
The likelihood function is then defined to be:
.....(5)
where:
that is,
is a set of
observations
The optimization process is to find the value of the parameter that
maximizes the likelihood function.
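In conventional textbook notation, the two elements of the method can be written as follows. The symbols are assumptions chosen for illustration (f for the pdf, x_1, ..., x_n for the observations, theta for the parameter) and may differ from those used elsewhere in this guide; the product form further assumes that the observations are independent.

```latex
\begin{align}
  L(\theta) &= \prod_{i=1}^{n} f(x_i;\theta)
    && \text{likelihood of the } n \text{ observations} \\
  \hat{\theta} &= \arg\max_{\theta} L(\theta)
     = \arg\max_{\theta} \sum_{i=1}^{n} \ln f(x_i;\theta)
    && \text{maximum likelihood estimate}
\end{align}
```

Because the logarithm is monotonic, maximizing the sum of log terms gives the same estimate as maximizing the product, which is why the later steps work with logarithms.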
Application of maximum likelihood to Cube Analyst

In accordance with the above theory, but with a slightly altered
notation, the following are defined:
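H = a data item (the observation above)
h = an estimated item (the parameter above)
It is assumed that the appropriate pdf is

.....(6)
where the coefficient applied to the data is called the weighting
factor. It can be seen that the weighted data item is a Poisson
random variable whose mean is the correspondingly weighted
estimated item. Thus the weighting factor can be considered a
scaling parameter which defines the time units in the underlying
Poisson process.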
A likelihood function may thus be defined as:
.....(7)
Taking logarithms of equation (7) leads to:
.....(8)
It may be noted that
= constant
Referring to equation (5), and considering all data items, H, a
likelihood function may be defined as:
.....(9)
For computational ease, the task of maximizing L may be converted
to the minimization of:
.....(10)
where
.....(11)
Equation (10) therefore represents the general form of the
objective function which is minimized by Cube Analyst.
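The step from a Poisson pdf to a function to be minimized can be sketched as follows. This is an illustration under stated assumptions, not necessarily the exact form of equations (6) to (11): the symbol s for the weighting factor is assumed here, H and h are the observed and estimated data items as defined above, and terms that do not depend on h are dropped because they do not affect the position of the minimum.

```latex
\begin{align}
  P(sH \mid h) &= \frac{(sh)^{sH}\, e^{-sh}}{(sH)!} \\
  \ln P(sH \mid h) &= sH \ln h - sh
     + \underbrace{sH \ln s - \ln\big((sH)!\big)}_{\text{independent of } h} \\
  M &= \sum_{\text{data items}} s\,\big(h - H \ln h\big) \\
  \frac{\partial}{\partial h}\, s\,\big(h - H \ln h\big)
     &= s\left(1 - \frac{H}{h}\right)
\end{align}
```

In this form each data item enters through the ratio of observed to estimated values, H/h, weighted by s, and the contribution of a single item is smallest when h = H.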
Cube Analyst objective function

Cube Analyst allows varied data items to be used in the estimation,
that is, H and h may represent different data items, as shown in the
following table:
Observed data, H, and estimated equivalents, h
Observed data value, H | Estimated data value, h | Description
Nij | tij | Number of trips with origin at zone i and destination at zone j
Oi | Σj tij | Number of trips with origin at zone i
Dj | Σi tij | Number of trips with destination at zone j
QK | Σij RijK tij | Number of trips through screenline K
where: RijK is the proportion of trips in matrix cell (i, j) using screenline K
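The estimated equivalents of trip ends and screenline flows are sums over the estimated matrix. A minimal sketch, assuming the usual definitions (row totals for origins, column totals for destinations, and cell values weighted by RijK for screenlines); the function name estimated_data and the small example values are hypothetical.

```python
def estimated_data(t, r):
    """Derive estimated trip ends and screenline flows from a matrix.

    t[i][j]    estimated trips from zone i to zone j
    r[k][i][j] proportion of trips in cell (i, j) using screenline k
    """
    n_orig = len(t)
    n_dest = len(t[0])
    origins = [sum(t[i][j] for j in range(n_dest)) for i in range(n_orig)]
    destinations = [sum(t[i][j] for i in range(n_orig)) for j in range(n_dest)]
    screenlines = [
        sum(r_k[i][j] * t[i][j] for i in range(n_orig) for j in range(n_dest))
        for r_k in r
    ]
    return origins, destinations, screenlines


# Hypothetical two-zone example.
t = [[0.0, 10.0],
     [20.0, 0.0]]
r = [
    [[0.0, 1.0], [0.0, 0.0]],  # screenline 0: all trips from zone 0 to zone 1
    [[0.0, 0.5], [0.5, 0.0]],  # screenline 1: half of each movement
]
origins, destinations, screenlines = estimated_data(t, r)
```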
Substituting these observed and estimated data items into
Equation (10) gives an objective function shown below, with the
source of the data indicated.
For reasons to do with function evaluation, the estimated tij is
treated as a least squares minimization in the objective function.
The objective function then becomes:
Objective function, M =
Comment
Screenline counts
Trip origins
Trip destinations
Prior matrix
Cost matrix derived
.... (12)
where the summation in the cost matrix term is over cells which are
zero in the prior matrix, but not in the cost matrix.
Cube Analyst trip estimation model

The objective function, equation (12) above, is used to calibrate the
trip estimation model of the form:
.....(13)
where tij = Nij
or
Estimating model parameters

It follows, by differentiation of equation (11):
.....(14)
.....(15)
(Note: undefined for h=0)
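The behaviour of a single data item's contribution can be checked numerically. The sketch below assumes the Poisson-derived per-item form s(h - H ln h), with s standing for the confidence level used as a weighting factor; the function names are hypothetical. It also shows why the derivative cannot be evaluated at h = 0.

```python
import math


def item_term(h, H, s=1.0):
    """Contribution s * (h - H * ln h) of one data item; needs h > 0."""
    if h <= 0.0:
        raise ValueError("estimated value h must be positive")
    return s * (h - H * math.log(h))


def item_gradient(h, H, s=1.0):
    """Derivative of item_term with respect to h: s * (1 - H / h)."""
    if h <= 0.0:
        raise ValueError("estimated value h must be positive")
    return s * (1.0 - H / h)


# With an observed value of 50, the term is smallest when h equals 50.
gradient_at_match = item_gradient(50.0, 50.0, s=0.8)
```

The gradient is negative when the estimate is below the observation and positive when it is above, so each item pulls its estimate towards the observed value with a strength set by s.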
The minimum value of the objective function, M, for a parameter is
found when the derivative of M with respect to that parameter is
zero.
The remaining steps are to:
1. Calculate tij using equation (13) and current values of the model
parameters.
2. Use the table "Observed data, H, and estimated equivalents, h"
on page 51 to calculate h for each set of input and estimated
data.
3. Calculate the required derivatives, as we show below, for each
set of estimated data.
Substitutions for equation (15)
leads to
.....(16)
where
.....(17)
and
.....(18)
Note:
are constants
is undefined if
or
In equation (16) we need to substitute for each set of model
parameters in turn. We start by determining the derivative of tij
for each parameter, reproducing model equation (13):
.....(13)
where
= constant
or
let
Then differentiating (13) gives:
(for each )
.....(19)
(for each )
.....(20)
(for
( )
) .....(21)
.....(22)
.....(23)
Finally, we substitute (19) to (23) into (16) for each parameter, and
use an optimization procedure to choose the parameter values that
minimize the objective function (9).
Optimization procedure
Given an initial guess for the model parameters, Cube Analyst
computes the maximum likelihood estimates by generating a sequence
of estimates. Each new estimate is obtained from the previous one by
moving a suitable step length along a search vector.
For the method of scoring used by Cube Analyst, the matrix used to
compute the search vector is equal to the expected value of the
Hessian matrix. It may be shown that this can be represented as an
expected value involving the gradient vector of the objective
function, M, with respect to the model parameters.
The
entry of the matrix
is given by
.....(24)
From equation (16) we can write
.....(25)
This leads to
.....(26)
The formulae for
and
are given in equations (19) to (23).
When the matrix is calculated by the quasi-Newton method (as
previously described in "Introduction to the mathematics in Cube
Analyst" on page 35), the Hessian matrix is updated from its
expected value using the BFGS update formula.
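For readers unfamiliar with it, the standard BFGS update can be sketched as follows. This is the textbook formula in its inverse-Hessian form, which is an assumption here: the guide does not state whether Cube Analyst stores the Hessian or its inverse. The function and variable names are hypothetical.

```python
def bfgs_update(h_inv, s, y):
    """Return the BFGS update of an inverse-Hessian approximation.

    h_inv  current approximation (n x n list of lists)
    s      change in the parameter vector between two iterations
    y      change in the gradient vector over the same iterations
    """
    n = len(s)
    sy = sum(si * yi for si, yi in zip(s, y))
    if sy <= 0.0:
        # Curvature condition not met: keep the old approximation.
        return [row[:] for row in h_inv]
    rho = 1.0 / sy
    # a = I - rho * s * y^T
    a = [[(1.0 if i == j else 0.0) - rho * s[i] * y[j] for j in range(n)]
         for i in range(n)]
    # tmp = a * h_inv
    tmp = [[sum(a[i][k] * h_inv[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
    # result = tmp * a^T + rho * s * s^T
    return [[sum(tmp[i][k] * a[j][k] for k in range(n)) + rho * s[i] * s[j]
             for j in range(n)]
            for i in range(n)]


# Hypothetical two-parameter example, starting from the identity matrix.
step = [1.0, 0.5]
gradient_change = [2.0, 1.5]
updated = bfgs_update([[1.0, 0.0], [0.0, 1.0]], step, gradient_change)
```

The updated matrix satisfies the secant condition (applying it to the gradient change reproduces the parameter step), which is how the method accumulates curvature information without recomputing the Hessian at every iteration.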
Parameter errors
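The optimization produces an estimate of the variance of each
parameter, and an estimate of the parameter value itself.
Therefore,
Standard Error = square root of the estimated variance
.....(27)
and the range within one Standard Error is the estimated parameter
value plus or minus the Standard Error.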
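As a small worked example of equation (27), the following sketch computes the standard error and the one-standard-error range from an estimated parameter value and its variance. The function name and the numbers are hypothetical.

```python
import math


def one_standard_error_range(estimate, variance):
    """Return (lower, upper) bounds one standard error either side."""
    standard_error = math.sqrt(variance)
    return estimate - standard_error, estimate + standard_error


# A parameter estimated as 1.2 with variance 0.04 has a standard
# error of 0.2.
lower, upper = one_standard_error_range(1.2, 0.04)
```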
Cell reliability
The sensitivity of the estimate of a matrix cell is defined to be
.....(28)
where M is the objective function, and the matrix in equation (28)
is a matrix of second differentials.

Hierarchic estimation is described in Chapter 7, Hierarchic
Estimation.
Hierarchic estimation calculates two forms of matrix, the district
matrix and a set of local matrices. Apart from the aggregation of
information which is implied by converting a zonal matrix to a
district matrix, the estimation of a district matrix is entirely similar
to a standard estimation. The estimation of the local matrices is,
equally, similar, but it introduces a new set of data, derived from the
district matrix, which are referred to as side constraints.
To understand this side constraint information, we show a local
matrix in a schematic form in the following figure.
Relationship of side constraints with local matrices
The set of side constraint variables, in terms of prior observed (H)
and estimated (h) data, and associated confidence levels, are:
H | h | Confidence level for
PZTZ | FZTZ = Σ Tij | PZTZ
PZTR | FZTR = Σ Ti1 | PZTR
PRTZ | FRTZ = Σ T1j | PRTZ
NOTE: The specifications of PZTZ (observed), FZTZ (estimated), etc.,
are indicated in the figure, "Relationship of side constraints with
local matrices."
Note that the corresponding confidence levels for PZTZ, PZTR and
PRTZ are all set by the user with Cube Analyst's ZCONF control
parameter.
(The confidence levels for the trip ends applied to the district
matrix are set according to the minimum values of the generation
and attraction trip ends confidence levels found in the trip end file.)
These values of H and h are then substituted in the same manner
which applies to other sets of data represented by H and h.
Data Preparation and Analysis
This chapter focuses on the tasks which the user undertakes as part
of the estimation process. Topics include:
Overview
Matrices
Trip ends
Networks and traffic and passenger counts
Screenlines
Routings
Setting confidence levels
Tuning estimation performance
Control of routing information
Analyzing the results
Overview
There are a series of data preparation tasks which are discussed in
the following sections. Most of the tasks only require data files to
be created in a relatively mechanistic manner, but two of the tasks
require the user to make considered choices. These are discussed in
"Screenlines" on page 65 and in "Setting confidence levels" on
page 70.
The final sections in this chapter explain the estimation stage in
terms of tasks facing the user. As Cube Analyst usually requires
minimal input from the user, apart from the supply of prepared
data files, the estimation stage is very straightforward. However,
advice is given on possible ways of improving the speed of
estimation. This may be achieved through:
Influencing the strategy used to calculate the Hessian matrix, which is used in the optimization stages of Cube Analyst (see "Tuning estimation performance" on page 73)
Avoiding unnecessary detail in the routing files, which can be burdensome for the data processing elements of Cube Analyst (see "Control of routing information" on page 74)
The final set of activities for the user is to analyze the results to
assess the quality of the estimation, partly to determine if and how
it might need to be improved. This topic is discussed in
"Analyzing the results" on page 75.
The ideas introduced in this chapter are subsequently illustrated in
later chapters with an example application of Cube Analyst, based
on an actual study. Further details on points covered in this section
are provided in the standardized estimation procedures.
Matrices
Cube is used to set or modify individual cells or ranges of cells. This
also permits confidence levels to be easily set to global or
individual values. For example, you can use a prior matrix (Table
101) to give information about basic trip patterns.
Prior matrix (Table 101)
+-------------------------------------------------------+
| TABLE = 101 (Prior)                                   |
|        1    2    3    4    5    6    7    8    9   10 |
+-------------------------------------------------------+
|  1:    1    1    0    5   45  126   50   21   30   55 |
|  2:    1    5    0   70  125   36   38   50   58   14 |
|  3:    1    1    0    2  108  119   90   69  148   44 |
|  4:   69    3    0    1    6    7    6    3   25    3 |
|  5:  100    1    0  192   71   20   12   11   14    7 |
|  6:   36    2    0   88   52    6    3    7   16   13 |
|  7:   62    3    0   32   36   58    9   63    9   61 |
|  8:    0    1    0   64   65   30  119   19  121   64 |
|  9:    0    7    0   57  123   70  178  279    7   38 |
| 10:    0   10    0    7   31    3    1   10   21    3 |
| 11:    0   13    0   19   35    4   96  170   28   29 |
| 12:    0    5    0   41  286   52  103  117   29   56 |
| 13:    0    9    0   24   99   50   90   91   23   12 |
| 14:    4    3   14   20   56   19   67   58   21    7 |
| 15:   28    2   36    1  185    1    1    2   15    1 |
+-------------------------------------------------------+

You can use an associated confidence matrix (Table 102) to
discriminate between data reliability for different groups of
movements.
Confidence levels (Table 102)
+-------------------------------------------------------+
| TABLE = 102 (Confidences)                             |
|        1    2    3    4    5    6    7    8    9   10 |
+-------------------------------------------------------+
|  1:   20   20   20   20   40   40   20   20   20   20 |
|  2:   20   20   20   20   40   40   20   20   20   20 |
|  3:   20   20   20   20   40   40   20   20   20   20 |
|  4:   20   20   20   20   40   40   20   20   20   20 |
|  5:   40   40   40   40   40   40   40   40   40   40 |
+-------------------------------------------------------+
Intrazonals can be included in the matrix. Note that because
routings only cover inter-zonal trips, the intrazonals will not be
affected by the screenline counts; they will affect only the trip
ends. As their role is limited, there is a case for omitting
intrazonals from the estimation. Note that if intrazonals are
included in the trip ends, then they should also be included in the
matrix. If the trip ends do not include intrazonals, the intrazonal
cells of the input matrices should be zero.
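Where trip ends exclude intrazonals, the matching adjustment to an input matrix is simply to zero its diagonal. A minimal sketch, assuming the matrix is held as a square list of rows indexed by zone; the function name is hypothetical and this is not a Cube command.

```python
def zero_intrazonals(matrix):
    """Return a copy of a square zone-to-zone matrix with a zero diagonal."""
    return [[0 if i == j else value for j, value in enumerate(row)]
            for i, row in enumerate(matrix)]


# Hypothetical three-zone prior matrix.
prior = [[5, 1, 4],
         [2, 7, 3],
         [6, 8, 9]]
prior_without_intrazonals = zero_intrazonals(prior)
```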
Trip ends
Trip ends may be determined either by reference to an existing
matrix, surveys (for example, of parking), or they may be calculated
from equations.


Networks and traffic and passenger counts
Cube is used for preparing networks. Traffic and passenger counts,
together with confidence level information, are input into the
volume field storage areas associated with each link.
Screenlines
Screenlines are used to minimize the effects of assignment errors.
Screenlines are defined as the set of count sites which intercept
traffic/passenger flows between sets of zones which share the
same general corridors of movement (across which the screenlines
are suitably located).
The extent of a screenline is determined by the number of
alternative (reasonable) paths which are available. In many public
transport networks where services are sparse, or in rural highway
networks, there may only be a single reasonable route between
one general area and another. In this case, screenlines may
correspond to single links (although they are still treated as
screenlines in this context of Cube Analyst). In general, however, a
screenline will represent a set of links.
In the case of highways, a useful type of screenline is provided by a
river or a railway line, that has only a few crossing points. In this
case all traffic must be routed through known points, and so
assignment error associated with the screenline will be minimized.
For Cube Analyst, there is no difference between a group of traffic
counts on separate links (that form a logical screenline) and a single
link count amalgamating the flows on separate traffic lanes.
There will normally be few, if any, screenlines that entirely bisect a
study area and so intercept all trips either side of it. Cube Analyst
therefore employs the concept of partial screen lines. They are
partial in the sense that they do not extend between the
boundaries of a study area, but they intercept all trips between, at
least, certain defined pairs of zones.
The method for defining such partial screenlines is manual, and
based partly on judgement and the availability of count data sites.
The routing information, together with user-defined screenlines, is
used to define the set of O-D pairs whose routes they intercept. The
aim is to group count sites into screenlines that balance the
objectives to:

1. Maximize the number of O-D pairs that have all routes passing
through a screenline.
2. Minimize the number of O-D pairs per screenline, as this
maximizes the information value of the counts for the
corresponding matrix cells.
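The first objective can be checked mechanically once routes are known. The sketch below assumes routing information is available as a list of routes (each a list of link identifiers) per O-D pair, and reports the pairs for which every route crosses a candidate screenline. The data layout and the function name are hypothetical; they do not describe the RCP or path file formats.

```python
def fully_intercepted_pairs(routes, screenline_links):
    """Return the O-D pairs whose every route uses a screenline link.

    routes            dict mapping (origin, destination) to a list of
                      routes, each route being a list of link ids
    screenline_links  link ids that make up the screenline
    """
    links = set(screenline_links)
    return sorted(
        od for od, od_routes in routes.items()
        if od_routes and all(links.intersection(route) for route in od_routes)
    )


# Hypothetical routes: pair (1, 3) has one route avoiding link "b".
routes = {
    (1, 2): [["a", "b"], ["c", "b"]],
    (1, 3): [["a", "d"], ["e"]],
    (2, 3): [["b", "d"]],
}
covered = fully_intercepted_pairs(routes, ["b"])
```

Counting the pairs returned for each candidate grouping of count sites gives a simple way to compare alternatives against both objectives.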
The following figure shows an example of screenlines for an
example urban area.
Typical screenline configuration for an urban area
Features that these screenline locations demonstrate are shown in
the following table.
Screenline location | Function
Northern | Screenline over a single link (for example, a bridge) intercepts all traffic to and from the North.
Western | Parallel, alternative routes from the West require a single screenline intercepting both routes for this corridor.
Southern Ring Road | Non-radial traffic is intercepted by (two) screenlines on orbital road.
Eastern | Similar parallel routes for long distance traffic to Western side, but parallel routes for local traffic require additional, shorter screenline. Note use of count location in more than one screenline.
Central Area | Detailed movements in centre intercepted with several short screenlines.

Routings
Matrix estimation requires information about which routes are
used to connect each pair of origin and destination zones, and the
probability that each route is used. Ideally this would come from
survey information, but this is onerous and not very practical, so
the method uses modeling instead. This routing information is one
of the outputs from the assignment process. For TRIPS users it is
stored in the route choice probability (RCP) file. For Cube Voyager
Highway users, it is stored in the Cube Voyager path file. For Cube
Voyager Public Transport users, it is stored in route files.
This section discusses two types of routings:
Highways
Public transport
Highways
The main requirement for Cube Analyst is for the routings to reflect
all reasonable alternative paths whilst avoiding spreading out too
much so that they become unrealistic.
For Cube Voyager users, the paths reflected in the Intercept file
derive from combining the all-or-nothing paths from each
assignment iteration into one set. This can be done directly in the
HIGHWAY program. Alternatively, HIGHWAY can be used to
generate a path file, and the appropriate path sets and volumes
selected from it for use in Cube Analyst.
TRIPS users could use a similar approach, or apply one of the
stochastic methods. When considering networks where congestion
is a factor, the assignment itself relies on the trip matrix that the
estimation is trying to provide. Hence it may be preferable to apply
routes derived using methods that can calculate multiple routes
between zones based on stochastic (statistical) methods, rather
than to rely on the paths from a capacity-restrained assignment.
TRIPS supports two such methods, known after their originators as
Burrell and Dial. Both methods can be used successfully with Cube
Analyst, but Burrell can have limitations in large networks when
routes traverse large numbers of links. In this case, the central limit
theorem of statistics means that the chances of routes having the
same cost under a different set of randomized link costs (which is
the approach used in Burrell) become higher as the number of links
on an average route increases. The consequence of this is that it is
more difficult to generate varied routes. (It can be noted, in
passing, that the length of routes in terms of distance is not a
problem for the implementation of Burrell used in TRIPS.) The Dial
method is not subject to this effect concerning routes with many
links, so it is the approach that is advised. Note that where
estimation is being used to update a matrix that is not anticipated
to have changed by very much (for instance, because it was obtained
from a relatively recent survey), the RCP file from an existing
converged capacity restraint assignment may be used in
preference to Dial. The choice here is a matter of judgement on the
relative accuracies of the RCP information.
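The principle behind a Dial-type stochastic method is a logit split of trips between alternative routes according to their costs. The sketch below illustrates that principle only, at the level of whole routes; it is not the TRIPS implementation, and the dispersion parameter theta and the function name are hypothetical.

```python
import math


def logit_route_shares(costs, theta=0.1):
    """Return the share of trips on each route under a logit split.

    costs  generalized cost of each alternative route
    theta  dispersion parameter; larger values concentrate trips on
           the cheapest route
    """
    cheapest = min(costs)
    # Subtracting the cheapest cost avoids overflow without changing
    # the resulting shares.
    weights = [math.exp(-theta * (cost - cheapest)) for cost in costs]
    total = sum(weights)
    return [weight / total for weight in weights]


# Two equal-cost routes and one dearer alternative.
shares = logit_route_shares([10.0, 10.0, 20.0], theta=0.1)
```

Shares of this kind are the sort of route choice probabilities that the estimation needs for each O-D pair.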
Public transport
Cube Voyager PT outputs to route files by user class. Many controls
affect the routing, but a factors file provides a means to determine
the extent of multirouting. TRIPS automatically produces
multiroute paths and can also store them in an RCP file. The
determination of which links are used to connect pairs of
origin-destination zones is a function of a path building algorithm
which generates a set of reasonable paths. These are based on
considerations of generalized cost, which reflect the user's data
about transit times, fares, boarding and transfer penalties, and so
on. A submode split model can be used to reflect passenger biases
when deciding if different modes (bus, metro, rail, etc.) are
candidates for inclusion into the set of reasonable paths.


Setting confidence levels
Mathematically, confidence levels have the dual facets of being
sampling rates and weighting factors. Confidence levels are
entered as percentages but, from both points of view, values of
greater than 100 are legitimate.
This section discusses:
Characteristics of the data
Deciding on confidence values
Characteristics of the data

The ability of a confidence level to help match an estimated data
item (trip end, screenline flow, matrix cell) to its corresponding
observed value is influenced by:
1. Data consistency
If data is consistent and free of errors, then the confidence
levels will have no influence as they, essentially, help to
mediate between different estimates implied by different data
items. Conversely, more discrepancies within the data increase
the importance of confidence levels.
2. Data quantity
As all data is present in the objective function (see "Maximum
likelihood objective function" on page 40), the quantity of data
is influential, besides the confidence levels. This means that, for
example, relatively large confidence levels applied to the prior
matrix, which has many data elements, will tend to restrict the
scope of a few count sites to influence the estimated matrix to a
significant degree. Of course, this may be the desired effect in
some circumstances.
An improved match with any data item can always be achieved
with an arbitrarily large confidence level, but it will normally be
necessary for users to check the appropriateness of confidence
levels that are input.
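The mediating role of confidence levels can be seen in the simplest possible case: two observations of the same quantity. Assuming the Poisson-derived per-item term s(h - H ln h), setting the derivative of the sum of the two terms to zero gives an estimate equal to the confidence-weighted mean of the observations. The sketch below is illustrative only; the function name and values are hypothetical.

```python
def reconciled_estimate(observations):
    """Return the confidence-weighted mean of (value, confidence) pairs."""
    total_confidence = sum(confidence for _, confidence in observations)
    return sum(value * confidence
               for value, confidence in observations) / total_confidence


# Consistent data: the confidence levels make no difference.
consistent = reconciled_estimate([(100.0, 20.0), (100.0, 80.0)])

# Inconsistent data: the estimate is pulled towards the observation
# with the higher confidence level.
inconsistent = reconciled_estimate([(100.0, 20.0), (200.0, 80.0)])
```

In a full estimation the data items overlap in more complicated ways, but the same balance between values and weights applies.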
Deciding on confidence values

A practical approach to setting confidence levels is often to
establish a dataset as a reference benchmark, and then set the
confidence levels of other data relative to this. For example, if a
program of automatic counting means that traffic counts are well
and recently observed, then these may be given a high confidence
level, say 100, and confidences for other data set relative to that
value.
Note that an implied range of 1 - 100 (or of that order of
magnitude) has been found to be suitable for many studies. Large
applications (say, of 500 zones or more) will tend to encounter a
greater range of absolute data values, which can imply the need for
a wider range of confidence levels (see the discussion above). The
need for this is suitably assessed by means of sensitivity analysis on
the confidence levels.
Some general observations applying to confidence levels for
different categories of data are given below, in descending order of
magnitude of confidence levels for most applications:
At least some count sites should have observations made over several days (weeks, etc.) to determine basic levels of variability associated with single observations.
Count confidences should be set with respect to the time period applying to the estimated matrix (for example, a series of counts made on Tuesdays is only a partial observation if the matrix is to correspond to an average working day).
In the case of highways, trip end confidences are unlikely to exceed count confidences, and will usually be less due to observational difficulties; in the case of public transport, the two sets of confidences are more likely to be similar.
Even when trip ends have been determined simply from the row and column totals of the prior matrix, the aggregation of the data means that the trip end confidences will be higher than the corresponding individual cell confidences. For this reason, trip end data should always be used when a prior matrix is input.
Prior matrix cells are, individually, unlikely to have high confidences even when collected by recent, good surveys because there are so many elements of the matrix. This becomes truer as the number of study area zones increases (due to the difficulty of observing all possible movements adequately).
Cost matrix data may be obtained reasonably reliably, but the relevant confidence concerns the use of this data for trip estimation and this normally only offers an approximation.


Tuning estimation performance
In general, Cube Analyst should be run with default parameter
settings. In the majority of cases this will lead to a converged
solution within a reasonable number of iterations.
In some cases an excessive number of iterations may be required or
Cube Analyst may be unable to find a converged solution. In the
latter case Cube Analyst will report that it has halted optimization
for a reason such as "No further progress possible - linear search
failed", rather than the successful message "Convergence detected".
Such a message is usually caused by excessively inconsistent data
being input to Cube Analyst, which pulls the optimizer in opposite
directions to the extent that no solution can be found.
To correct this, the user is normally required to check the input
data. However, Cube Analyst does provide an extra control in the
form of the parameter ITERH. This determines the frequency by
iteration for the calculation of the Hessian matrix (see Optimizer:
Finding the minimum value on page 44) which directs the
optimizer towards the solution. Although this calculation is a time
consuming process, it will result in the optimizer converging in
significantly fewer iterations. For the case of unconverged
problems, recalculation of the Hessian may provide the direction
which the optimizer needs to find a solution. For example, if a
problem was halted after 58 iterations, try setting ITERH=50 to see
if a new Hessian will allow the optimizer to converge.
In most cases, recalculation of the Hessian matrix will result in
longer run times. In particular, time will be wasted if ITERH is set to
low values such as 40 or less. Cube Analyst will determine a suitable
value for ITERH. It is only recommended for the user to set ITERH in
order to attempt to solve convergence problems (which are
encountered only exceptionally).


Control of routing information
For many estimation runs, the production of the O-D intercepts for
screenlines and/or part-trip data takes as much or even more time
than the actual estimation itself. Cube Analyst just needs the
reasonable paths so controlling the routing to avoid the production
of routes used only by a small proportion of trips is an important
aspect of achieving practical run times for the estimation. This is
particularly the case for public transport which can often supply a
huge variety of routes. For large models this could result in the
production of the intercepts requiring an excessive time to
complete; this can be an order of magnitude greater than if
parameters are given appropriate settings. Too many routes can
also result in file sizes becoming too large for practical use.
Routing information can be supplied to Cube Analyst in the form of
a TRIPS RCP file, or Cube Voyager path file. Cube Voyager can also
supply an intercept file via the Highway and PT programs. If an
intercept file is not input, then before starting the estimation
proper, Cube Analyst analyses the routes through screenlines
and/or part trip links to produce the intercepts which it saves in an
ICP file. It is important to note that this intercept file can be input
back into subsequent estimations as long as the links of the
screenlines and/or part trip data are not modified. This is achieved
by setting option INTCPT=T or WARMST=T as appropriate and will
result in a considerable time saving.
Analyzing the results
Cube Analyst produces its results as a set of tabulations for printing
or viewing, and as a set of files which may be subject to further
analysis; one of these files is the estimated matrix itself.
The tabulations in Cube Analyst's printout are ordered as follows,
after the standard header information:
1. Summary of input data characteristics, showing:
Data types used in the estimation
Average confidence levels, and their ranges
The number of data elements for each type of data
This information indicates the relative weighting of data in
the estimation process, which is important to know when
assessing the results.
2. A summary of the values of key indicators from the last five
iterations before the optimization halted. The indicators, and
their values, are the same as Cube Analyst outputs to the screen
during the course of its calculations. They are:
Iteration number
Step size
Value of the objective function
Estimated matrix total number of trips
The reason for halting is also shown, which will normally be
"convergence detected".
This information is mainly provided for confirmation that the
estimation calculations operated in an appropriate manner (for
example, that the objective function value never increased).
These two elements of Cube Analyst's printout are shown in
"Results of estimation - including part trip data" on page 89
(and in an abbreviated form in "Confidence and convergence
summary" on page 83).

3. The remainder of Cube Analysts tabulations are concerned
with comparisons between the users input data and the

corresponding values derived from the estimated matrix.
Comparative information is output, when applicable, for:
Trip matrix totals
Part-trip data
Total trip generations from zones
Total trip attractions to zones
Screenline flow counts
The general pattern of this comparative information from Cube
Analyst is shown in "Results of estimation - including part trip
data" on page 90 ("Trip end comparison of prior (observed) and
estimated values" on page 84 and "Screenline comparison of
prior (observed) and estimated values" on page 85 contain this
information in a slightly altered format).
"Results of estimation - including part trip data" on page 89
and "Results of estimation - including part trip data" on
page 90 illustrate the case for Cube Analyst including part-trip
data. Hierarchic estimation output conforms to this same basic
pattern, but extra information is provided, as explained in
Chapter 7, "Hierarchic Estimation," and illustrated in Figures
8.12a - 8.12d.
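The check mentioned above, that the objective function value never increased over the reported iterations, can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the values are assumed to have been copied by hand from the printout summary of the final iterations; Cube Analyst does not provide this function.

```python
def objective_never_increased(values, tolerance=0.0):
    """Return True if no objective value exceeds its predecessor.

    values: objective function values in iteration order (hypothetical
    input, typed in from the printout by the user).
    tolerance: allowance for rounding in the printed figures.
    """
    return all(later <= earlier + tolerance
               for earlier, later in zip(values, values[1:]))


# Example values as they might appear in a final-iterations summary.
final_five = [-8859208.83, -8859208.83, -8859208.83,
              -8859208.83, -8859208.83]
print(objective_never_increased(final_five))
```

A result of False would suggest looking more closely at the execution log before relying on the estimated matrix.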
As a rule, the user will be looking for good correspondences
between input data and estimated results. However, it is
important to note that a poor comparison between input and
estimated information is not, by itself, a sign of a poor quality
estimation. The reason is (or should be) that a data item with a
higher confidence level is dominating the estimation with
respect to data which is also relevant, but which has a lower
confidence level.
The approach to analyzing Cube Analyst's comparative results
is, therefore, to identify data which has not been matched well
in the estimation and to determine which other data might
be causing the discrepancy. Often this is straightforward: for
example, a screenline flow count with a markedly different
value from trip end values for adjacent zones. If the discrepancy
seems unwarranted, then this may be a cause to review either
the data values themselves or their confidence levels. (One
cause of discrepancies which may not be immediately
apparent is poor routing information, for example, on account
of inappropriate generalized cost parameters.)
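The way a high-confidence item can dominate a conflicting low-confidence item may be easier to see with a small numerical sketch. The Python below is not Cube Analyst's objective function; it assumes, purely for illustration, a confidence-weighted least-squares compromise between two data items that describe the same flow. The function name and the numbers are hypothetical.

```python
def weighted_compromise(items):
    """Return the value minimising sum(conf * (estimate - value)**2).

    items: list of (value, confidence) pairs. For this simplified,
    assumed objective the minimiser is the confidence-weighted mean.
    """
    total_weight = sum(conf for _, conf in items)
    return sum(value * conf for value, conf in items) / total_weight


# A count of 1000 (confidence 80) conflicts with a trip-end-implied
# flow of 1400 (confidence 40).
estimate = weighted_compromise([(1000.0, 80.0), (1400.0, 40.0)])
print(round(estimate, 1))            # compromise value
print(round(estimate - 1000.0, 1))   # mismatch against the count
print(round(estimate - 1400.0, 1))   # mismatch against the trip end
```

In this sketch the lower-confidence item shows the larger mismatch, which is the pattern to look for when reviewing the comparative tables.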

Estimation Process
This chapter discusses the estimation process in the Cube Analyst
application. Topics include:
Study area
Data
Evaluation: Sensitivity analysis
Including part-trip data
Study area
This section discusses a highway-based application of Cube
Analyst for an 82-zone study area for the town of Guildford in
Surrey, UK (pop. 100,000). The network shown has a major bypass
for the town, which is shown as a thicker line. Zone centroid
connectors are shown as pale blue lines. Eleven zones were
designated cordon-crossing zones at the study area boundary.
Guildford highway network
Data
The network was well provided with current traffic counts and these
were all given a confidence level of 80, which served as a
benchmark for other data confidences. Most of the trip end data
was synthesized, by disaggregation of UK Department of Transport
data with reference to zonal population and employment figures,
and was given confidence levels of 40. Higher confidence values of
80 were set for external trip ends, determined from a cordon
crossing survey, and for a set of five zones in the town center area
that were the subject of a car park survey. An out-of-date trip
matrix existed, which served as the prior matrix, and which was
given a uniformly low confidence of 5 for each cell. Sixteen
screenlines were defined, which are shown in the following figure.
Screenlines for Guildford
MVHWAY in TRIPS was used to calculate three sets of Burrell paths.
The degree of randomization was controlled by setting the SPREAD
parameter to 25, a relatively low value selected after viewing paths
for different values using MVGRAF and using local knowledge of
the network. MVHWAY was also used to prepare a cost matrix based
on minimum cost routes.
Estimating the matrix
Cube Analyst offers a number of controls on the calculation process
and convergence criteria, but these were left to take default values
and the process of running Cube Analyst itself was entirely
straightforward. However, a series of estimation runs were
undertaken, as described below.
The results of the first estimation provided by Cube Analyst are
shown below. These are extracts of the Cube Analyst printed reports,
from which a number of observations can be made.
Because each data item enters the objective function, the
number of elements associated with each different type of data
is significant, as well as their confidence levels.
Confidence and convergence summary
AVERAGE CONFIDENCE LEVELS (EXCLUDING ZERO VALUES)
                                    Average  Maximum  Minimum  Number of
                                                                Elements
Trip matrix confidence levels           5.0      5.0      5.0       6724
Screen line confidence levels          80.0     80.0     80.0         16
Trip end (dest) confidence levels      47.8     80.0     40.0         82
Trip end (orig) confidence levels      47.8     80.0     40.0         82
Optimisation halted because:
Convergence detected
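The figures in the confidence summary can be reproduced by hand, which is a useful check that confidence levels were entered as intended. The following Python sketch assumes that zero confidences are excluded from every column, including the element count, and that the 16 trip ends at confidence 80 are the eleven cordon-crossing zones plus the five town center zones; both points are assumptions made for illustration, and the function name is hypothetical.

```python
def confidence_summary(confidences):
    """Summarise non-zero confidence levels for one type of data."""
    nonzero = [c for c in confidences if c != 0]
    if not nonzero:
        return None
    return {
        "average": round(sum(nonzero) / len(nonzero), 1),
        "maximum": max(nonzero),
        "minimum": min(nonzero),
        "elements": len(nonzero),
    }


# 82 zones: 16 trip ends assumed at confidence 80, the other 66 at 40.
trip_end_conf = [80.0] * 16 + [40.0] * 66
print(confidence_summary(trip_end_conf))
```

Under these assumptions the result agrees with the trip end rows of the summary (average 47.8, maximum 80.0, minimum 40.0, 82 elements).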
The optimizer adjusts model parameter values and evaluates
the resulting cell estimations in a series of iterations. The
mathematics of the optimizer implies that it will converge to a
solution in a number of iterations which is less than the number
of model parameters.
Trip end comparison of prior (observed) and estimated values
MVESTM with Counts, Input Prior Matrix and Trip Ends Only
REPORTING OBSERVED/ESTIMATED GENERATIONS AND ATTRACTIONS
ZONE              GENERATIONS                          ATTRACTIONS
NO        OBS.      EST.  OBS-EST        %        OBS.      EST.  OBS-EST      %
   1    4869.0    4324.3    544.7    11.2%      3657.0    3591.6     65.4    1.8
   2    3825.0    3745.0     80.0     2.1%      2984.0    3571.1   -587.1  -19.7
   3    1798.0    2559.5   -761.5   -42.4%      5715.0    5710.1      4.9    0.1
   4     419.0     383.2     35.8     8.5%       558.0     528.3     29.7    5.3
   5    1256.0    1572.5   -316.5   -25.2%      2018.0    2156.1   -138.1   -6.8
   6    2045.0    1731.1    313.9    15.4%      2084.0    1998.6     85.4    4.1
   7    1935.0    1815.4    119.6     6.2%      2112.0    2194.3    -82.3   -3.9
   8    1794.0    1894.8   -100.8    -5.6%      2673.0    2815.2   -142.2   -5.3
   9    3662.0    3364.9    297.1     8.1%      4763.0    4247.7    515.3   10.8
  10     430.0     388.9     41.1     9.5%       273.0     307.1    -34.1  -12.5
Some missing....
  30    3870.0    3176.5    693.5    17.9%      2370.0    2375.0     -5.0   -0.2
  31    2778.0    2618.2    159.8     5.8%      1304.0    1616.1   -312.1  -23.9
  32    5450.0    4633.8    816.2    15.0%      3257.0    3175.7     81.3    2.5
  33    2943.0    2741.1    201.9     6.9%      3006.0    2807.4    198.6    6.6
  34     736.0     806.5    -70.5    -9.6%      1151.0    1107.2     43.8    3.8
  35     368.0     785.9   -417.9  -113.5%       930.0     909.7     20.3    2.2
  36    4042.0    4062.2    -20.2    -0.5%      1523.0    1570.5    -47.5   -3.1
  37    1821.0    1964.4   -143.4    -7.9%      2026.0    2083.9    -57.9   -2.9
  38    4719.0    4763.3    -44.3    -0.9%      2683.0    2326.7    356.3   13.3
  39    3116.0    3440.8   -324.8   -10.4%      6410.0    6234.9    175.1    2.7
  40    3030.0    3369.2   -339.2   -11.2%      5227.0    6016.3   -789.3  -15.1
Some missing....
  70    1829.0    1639.9    189.1    10.3%      1251.0    1214.6     36.4    2.9
  71    1089.0    1160.4    -71.4    -6.6%      1298.0    1364.2    -66.2   -5.1
  72    4396.0    4122.0    274.0     6.2%      4226.0    3952.8    273.2    6.5
  73   10600.0   11231.3   -631.3    -6.0%     11100.0   11146.4    -46.4   -0.4
  74    6950.0    6931.0     19.0     0.3%      5806.0    5720.2     85.8    1.5
  75    9200.0    9605.6   -405.6    -4.4%      9200.0    9384.8   -184.8   -2.0
  76   14423.0   15045.9   -622.9    -4.3%     14313.0   14109.6    203.4    1.4
  77    1008.0     824.4    183.6    18.2%       722.0     655.4     66.6    9.2
  78    2270.0    2217.8     52.2     2.3%      2270.0    2236.2     33.8    1.5
  79    5665.0    5396.6    268.4     4.7%      5665.0    5465.7    199.3    3.5
  80   26660.0   26727.9    -67.9    -0.3%     28912.0   27872.0   1040.0    3.6
  81    5310.0    5258.3     51.7     1.0%      5990.0    5940.6     49.4    0.8
  82    6033.0    6390.8   -357.8    -5.9%      6085.0    6601.1   -516.1   -8.5
Cube Analyst prints basic comparisons of input and estimated
data for:
Trip ends ("Trip end comparison of prior (observed) and
estimated values")
Screenline inputs ("Screenline comparison of prior
(observed) and estimated values")
This information must be interpreted with care, as a difference
may be a good feature, indicating that some other, more
reliable information has determined the estimated result.
Screenline comparison of prior (observed) and estimated values
MVESTM with Counts, Input Prior Matrix and Trip Ends Only
REPORTING OBSERVED/ESTIMATED SCREEN LINE COUNTS
SCRLINE NO    OBSERVED   ESTIMATED    OBS-ESTM        %
         1     11677.0     11301.1       375.9     3.2%
         2     11677.0     11925.8      -248.8    -2.1%
         3     27947.0     26234.4      1712.6     6.1%
         4     25504.0     25213.3       290.7     1.1%
         5     28539.0     31075.9     -2536.9    -8.9%
         6     28431.0     30261.4     -1830.4    -6.4%
         7     18981.0     15441.2      3539.8    18.6%
         8     18809.0     18445.5       363.5     1.9%
         9     24000.0     23770.1       229.9     1.0%
        10     24435.0     23585.0       850.0     3.5%
        11      7225.0      7635.8      -410.8    -5.7%
        12      7225.0      8479.7     -1254.7   -17.4%
        13     16285.0     16367.7       -82.7    -0.5%
        14     22670.0     23883.7     -1213.7    -5.4%
        15      6261.0      6511.4      -250.4    -4.0%
        16      6022.0      6886.0      -864.0   -14.3%
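The difference and percentage columns in these comparison reports follow directly from the observed and estimated values. As a minimal sketch (the function name is hypothetical, and the rounding to one decimal place is assumed from the printed figures), the calculation can be written in Python as:

```python
def compare(observed, estimated):
    """Return (OBS-EST, percentage of observed), rounded to 1 decimal."""
    diff = observed - estimated
    pct = 100.0 * diff / observed if observed else float("nan")
    return round(diff, 1), round(pct, 1)


# Screenline 7 from the report above.
print(compare(18981.0, 15441.2))
# Screenline 2 from the report above.
print(compare(11677.0, 11925.8))
```

Note that these reports use observed minus estimated, whereas the reports that include part-trip data use estimated minus observed (ESTM-OBSV), so the signs are reversed between the two formats.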
Evaluation: Sensitivity analysis
The estimated matrix was also evaluated by examining how
sensitive the results were to changes in the input data:
Alterations in confidence levels: The effect of assumptions in
setting confidence levels was tested by increasing the
confidence levels from 80 to 200 on two screenlines for the
major traffic carrying road (the town bypass).
Using the previously calculated model parameter and gradient
search (Hessian) matrix, the re-estimation, in this case, required
only six iterations. The differences between observed and
estimated screenline counts were correspondingly improved.
                Flow differences      Flow % differences
Screenline        (i)      (ii)         (i)      (ii)
Before (80)      1713       291         6.1       1.1
After (200)      1094       219         3.9       0.9
Elsewhere, other screenlines were marginally affected, both
better and worse, apart from one screenline where the
improvement was much more noticeable.
In general, the results suggested that small changes in
confidence levels were not significant, but that improvements
were obtainable where it was possible to refine values of
confidence levels rationally.
Matches of estimated and input data can always be improved
for individual data items by increasing the corresponding
confidence, but this will only have a net improvement on the
estimated matrix when it does not exacerbate data
inconsistencies.
Including part-trip data
The original estimation of the Guildford matrix was later updated
using a set of data which corresponded to a license plate match
survey taken around the center of the town. The data was
preprocessed and converted into a set of link flows, as illustrated in
terms of bandwidths; this also serves to indicate the extent of the
survey.
Part-trip data shown as link flows, using bandwidths
The estimation was re-run, now incorporating the following sets of
information:
Prior matrix
Trip ends
Link counts
Part-trip data
The figure "Part-trip data and link counts" shows the two sets of link
flow information which were used. Link counts are shown as open
bandwidths, and part-trip data as shown previously in the figure
"Part-trip data shown as link flows, using bandwidths" on page 87. It may
be noted that some links had both link count data and part-trip
data. In this application, the confidence levels for link counts were
set higher, at 80 or more, than those for part-trip data, which were
set at 60 in recognition of the sampling process inherent in license
plate surveys.
Part-trip data and link counts

Results of new estimation
Extracts of the Cube Analyst results of the new estimation are shown in
"Results of estimation - including part trip data" on page 89 and
"Results of estimation - including part trip data" on page 90. These
are similar to those presented in "Estimating the matrix" on
page 83, but with additional information concerning part-trip data,
and with some differences of presentation format. From "Results of
estimation - including part trip data" on page 90, it may be noted
that the estimated part-trip flows match the overall number of
observed part trips, in this case, to within 1.9%. This, of course,
partly reflects their relatively high confidence levels and number
of elements, which are reported near the top in "Results of
estimation - including part trip data." Number of elements for
part-trip data is the number of (one-way) links with part-trip data.
The figures "Part-trip data shown as link flows, using bandwidths"
on page 87 and "Part-trip data and link counts" on page 88, in fact,
show respectively estimated and observed part-trip data, but the
difference is too small to make clear graphically in this particular
application. It is therefore useful to view the correspondence as a
tabulation. This report is shown in "Report on observed and
estimated part trip data" on page 92, which is headed by a key
explaining the storage of information in volume fields.
Results of estimation - including part trip data
------------------------------------------------
                                    Average  Maximum  Minimum  Number of
Trip matrix confidence levels           5.0      5.0      5.0       6642
Screen line confidence levels          95.0    200.0     80.0         16
Trip end (dest) confidence levels      47.8     80.0     40.0         82
Trip end (orig) confidence levels      47.8     80.0     40.0         82
Part Trip confidence levels            60.0     60.0     60.0        226
SUMMARY OF FINAL FIVE ITERATIONS
--------------------------------
Iteration   Stepsize                Objective       Matrix
            (Tolerance=0.00010)     Value           Total
       34   0.0003559               -8859208.83     229655.8
       35   0.0001208               -8859208.83     229656.0
       36   0.0001890               -8859208.83     229655.9
       37   0.0001123               -8859208.83     229655.9
       38   0.0000580               -8859208.83     229655.9
Optimisation halted after 38 iterations because:
Results of estimation - including part trip data
REPORTING PRIOR/ESTIMATED MATRIX TOTALS
CONFIDENCE      PRIOR   ESTIMATED  ESTM-PRIOR  (ESTM-PRIOR)/PRIOR(%)
       5.0   238498.0    229655.9     -8842.1                  -3.7%
REPORTING OBSERVED/ESTIMATED PART TRIP FLOW TOTALS
CONFIDENCE   OBSERVED   ESTIMATED   ESTM-OBSV  (ESTM-OBSV)/OBSV(%)
      60.0   972944.0    991158.2     18214.2                 1.9%
GENERATIONS
ZONE NO  CONFIDENCE  OBSERVED  ESTIMATED  ESTM-OBSV  (ESTM-OBSV)/OBSV(%)
      1        40.0    4869.0     4714.7     -154.3                -3.2%
      2        40.0    3825.0     3756.0      -69.0                -1.8%
      3        40.0    1798.0     2015.2      217.2                12.1%
      4        40.0     419.0      398.8      -20.2                -4.8%
      5        40.0    1256.0     1381.2      125.2                10.0%
      6        40.0    2045.0     1879.7     -165.3                -8.1%
      7        40.0    1935.0     1866.6      -68.4                -3.5%
      8        40.0    1794.0     1786.8       -7.2                -0.4%
      9        40.0    3662.0     3490.7     -171.3                -4.7%
     10        40.0     430.0      411.9      -18.1                -4.2%
Some missing....
ATTRACTIONS
      1        40.0    3657.0     3661.8        4.8                 0.1%
      2        40.0    2984.0     3142.7      158.7                 5.3%
      3        40.0    5715.0     5668.6      -46.4                -0.8%
      4        40.0     558.0      535.7      -22.3                -4.0%
      5        40.0    2018.0     2067.6       49.6                 2.5%
      6        40.0    2084.0     2000.6      -83.4                -4.0%
      7        40.0    2112.0     2092.6      -19.4                -0.9%
      8        40.0    2673.0     2629.0      -44.0                -1.6%
      9        40.0    4763.0     4437.3     -325.7                -6.8%
     10        40.0     273.0      279.5        6.5                 2.4%
Some missing....
SCREENLINE            CONFIDENCE  OBSERVED  ESTIMATED  ESTM-OBSV  (ESTM-OBSV)/  NO OF
NO & NAME                                                         OBSV(%)       ODs
1 A'shot Rd W-E             80.0   11677.0    11370.7     -306.3        -2.6%    219
2 A'shot Rd E-W             80.0   11677.0    11651.1      -25.9        -0.2%    221
3 A3-Hogs Back S-N         200.0   27947.0    26670.6    -1276.4        -4.6%    153
4 A3-Hogs Back N-S         200.0   25504.0    24896.4     -607.6        -2.4%    154
5 A3-Parkway W-E            80.0   28539.0    29956.5     1417.5         5.0%    538
Some missing....
Report on observed and estimated part trip data
NETWORK IDENTIFIER <Network with Estimated Part Trip Flows>
VOLUME FIELD 1 NAME <Obsv> - Observed Link Counts
VOLUME FIELD 2 NAME <Conf> - Confidences Levels for Link Counts
VOLUME FIELD 3 NAME <PrtT> - Observed Part Trip Data
VOLUME FIELD 4 NAME <PrtC> - Confidence Levels for Part Trip Data
VOLUME FIELD 5 NAME <EPtr> - Estimated Part Trip Data
Print Comparisons of Part Trip Data and Estimates
REPORT 4: LINK VOLUME FIELDS
ANODE 2119    BNODE       2105       2112       2644
              ---------- ---------- ----------
1 <Obsv>                 8552.     18809.         0.
2 <Conf>                   80.        80.         0.
3 <PrtT>                 6871.     17775.      9073.
4 <PrtC>                   60.        60.        60.
5 <EPtr>                 7139.     18420.     10808.
REPORT 4: LINK VOLUME FIELDS
ANODE 2120    BNODE       2127       2207       2212
              ---------- ---------- ----------
1 <Obsv>                    0.      5387.         0.
2 <Conf>                    0.        80.         0.
3 <PrtT>                 4906.      3497.      2821.
4 <PrtC>                   60.        60.        60.
5 <EPtr>                 4565.      3356.      2822.
ANODE 2843    BNODE       2113       2194       2841
              ---------- ---------- ----------
1 <Obsv>                 1226.         0.         0.
2 <Conf>                   80.         0.         0.
3 <PrtT>                 3635.       213.         0.
4 <PrtC>                   60.        60.        60.
5 <EPtr>                 3325.       231.         0.
Hierarchic Estimation
This chapter discusses hierarchic estimation. Topics include:
Introduction to hierarchic estimation
Alternative approaches to hierarchic estimation
Defining districts
Running Cube Analyst for hierarchic estimation

Introduction to hierarchic estimation
This section provides an overview of hierarchic estimation. Topics
include:
Approaches to estimating very large matrices
Different levels of detail: Districts and zones
Different approaches to hierarchic estimation
Approaches to estimating very large matrices

There are formidable data processing and computational issues to
be faced when estimating very large matrices, whose size may lie in
the range of 2,500 to 10,000 zones for major transport studies.
Theoretically, the matrices can have between 2,500² and 10,000²
(6,250,000 to 100,000,000) cells to estimate, although the practical
number of cells with non-zero trips will only be a fraction of this.
Nevertheless, the number of cells to be estimated in typical
applications will be of the order of 250,000 to 750,000 cells.
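The theoretical cell counts quoted above are simply the square of the number of zones. A minimal Python sketch of the arithmetic (the function name is hypothetical):

```python
def theoretical_cells(zones):
    """Number of origin-destination cells in a square zonal matrix."""
    return zones * zones


for zones in (2500, 10000):
    print(zones, theoretical_cells(zones))
```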
The natural approach, which is used in hierarchic matrix estimation,
is to reduce the estimation problem to a more manageable size by
grouping information. However, it is necessary to recognize that
the pattern of trips across many large study areas, such as
conurbations, is not readily partitioned. For example, a data item
such as a flow count or a trip end may relate to trips with dispersed
origins and destinations which may not easily be grouped.
It is therefore a feature of Cube Analyst hierarchic estimation that
each of the different approaches to estimation offered, and which
are described below, always considers all of the trips in the entire
study area.
Different levels of detail: Districts and zones

The approaches offered by Cube Analyst hierarchic estimation
consider the OD matrix at two levels of detail:
Fine level, which is the original zoning system and results in a
zonal matrix
Coarser level, which aggregates (groups) sets of zones into a
limited number of districts, from which a corresponding district
matrix may be produced
The total number of trips in the zonal and district matrices is the
same.
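The relationship between the two levels can be illustrated with a short Python sketch that aggregates a small zonal matrix into a district matrix and confirms that the total number of trips is unchanged. The function, the 4-zone matrix, and the zone-to-district allocations are all hypothetical. Separate origin and destination allocations are used because, as noted later in this chapter, the district matrix may be non-square; this is not a description of Cube Analyst's internal procedure.

```python
def aggregate_to_districts(zonal, origin_district, dest_district,
                           n_origin, n_dest):
    """Sum zonal cells into the district cells they belong to.

    origin_district[i] and dest_district[j] give the (0-based) district
    index of origin zone i and destination zone j.
    """
    district = [[0.0] * n_dest for _ in range(n_origin)]
    for i, row in enumerate(zonal):
        for j, trips in enumerate(row):
            district[origin_district[i]][dest_district[j]] += trips
    return district


zonal = [[0, 10, 5, 2],
         [8, 0, 4, 1],
         [3, 6, 0, 7],
         [2, 1, 9, 0]]
origin_district = [0, 0, 1, 1]   # hypothetical allocation of origins
dest_district = [0, 0, 0, 1]     # hypothetical allocation of destinations

district = aggregate_to_districts(zonal, origin_district, dest_district, 2, 2)
print(district)
print(sum(map(sum, zonal)), sum(map(sum, district)))
```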
Different approaches to hierarchic estimation

The main method is called hierarchic estimation as the estimated
district matrix is used to control a series of estimations primarily
conducted at the zonal level. This process leads to a fully updated
zonal matrix.
Hierarchic estimation also allows a variant method in which the
district matrix is defined as a mixture of district and zonal detail.
The resulting district matrix which is estimated includes some
cells estimated at the zonal level. The output estimated matrix has
fewer rows and columns than the input matrix, but there will be a
direct correspondence between certain of the cells as selected by
the user. This variant is valuable when the application needs to
update only the cells relating to parts of the large study area,
for example, to update cells for an administrative borough
within a large city region. The method only requires a single
estimation, rather than the series of estimations used in the main
hierarchic estimation process. This hierarchic estimation variant is
referred to as combined district and zonal estimation.
The underlying estimation process is common to all Cube Analyst
runs but there are differences in how information is grouped in
hierarchic estimation. Apart from differences in information
grouping, the combined district and zonal estimation is very similar
to a standard estimation. The hierarchic estimation method
introduces a new concept, which is called a local matrix. This is
explained in Local matrices on page 98.

Alternative approaches to hierarchic estimation
This section describes alternative approaches to hierarchic
estimation. Topics include:
Estimation with mixed district and zonal detail
Local matrices
Summary of the hierarchic estimation process
Estimation with mixed district and zonal detail

The majority of this section is concerned with hierarchic estimation,
but it begins with a view of the approach for combined district and
zonal estimation, shown in the figure Combined estimation of
selected zones and districts on page 97. This shows the estimated
matrix where the sides of the cells have been scaled according to
the geographical size of the areas to which they relate. That is, the
large sides correspond to districts and the small sides to zones. This
has resulted in three types of cells:
Large squares: All information is estimated at district level
Small squares: All information is estimated at zonal level
Rectangles: Information is estimated at a mixture of district
and zonal detail
Combined estimation of selected zones and districts
The user may choose whether to retain information at mixed levels
of detail, as shown, or (manually) to extract the cells fully
estimated at zonal detail (the small squares in the figure) to update a
portion of the zonal prior matrix.
As shown in the figure, the detailed estimation has been for trips
traveling from one part of the study area to another; if the small
squares were located on the diagonal of the main square shown,
then the detailed estimation would be for all trips within, and
traveling to and from, a particular part of the study area, such as a
town center area.
Some points to note about this approach are:
Although the terms zonal and district have been used to
indicate different levels of detail, Cube Analyst considers this
form of estimation as a special form of district estimation,
without recognizing that a selected number of districts are
simply individual zones.
There must be the same number of origin and destination
districts, which is not the case for hierarchic estimation.
This approach requires a single estimation.
Local matrices
When using hierarchic estimation, Cube Analyst first estimates a
district matrix, which is used to influence the calculation of a set of
local matrices. These local matrices contain a mixture of zonal detail
and district-based information. The estimated zonal detail is
captured automatically by Cube Analyst and, as each local matrix is
estimated, is used to develop progressively an update of the entire
matrix at the zonal level of detail. The district matrix simply
represents the zonal matrix aggregated into a district matrix,
although the district matrix may be non-square, that is, there may
be a different number of origin and destination districts. Further
information about districts is given later in this section.
Consider a local matrix that is an extension of the combined district
and zonal matrix shown and discussed in Estimation with mixed
district and zonal detail on page 96.
Zonal estimation controlled by district matrix
In this diagram all of the large squares, where information is only
estimated at district level, have been shaded. This is because this
portion of the matrix is treated in a local matrix as a single unit,
termed Rest-of-(the)-World (RoW).
A local matrix, therefore, has the following elements:
Detailed zonal level set of cells (the small squares)
Trips in the Rest-of-World (shaded area)
Trips from RoW to zonal level area (rectangular cells)
Trips to RoW from zonal level area (rectangular cells)
A local matrix is defined for each origin and destination district pair
(the unshaded part in the figure represents one such pair), and the
fully estimated (zonal) matrix is produced when all local matrices
have been estimated.
Information involving trips from the RoW is obtained from the
district matrix. This element, and the fact that the total number of
trips is the same (in principle) for each local matrix, ensures that
consistency is maintained across the entire study area, even though
detail is calculated separately in estimations for different parts.
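The four elements of a local matrix can be pictured by collapsing a small zonal matrix around one origin and destination district pair. The Python sketch below is illustrative only: in Cube Analyst the Rest-of-World information comes from the estimated district matrix, whereas here, as a simplifying assumption, it is summed from a hypothetical zonal matrix so that the structure and the preserved trip total are easy to see. All names are hypothetical.

```python
def build_local_matrix(zonal, origin_zones, dest_zones):
    """Split a square zonal matrix into the elements of a local matrix."""
    n = len(zonal)
    o_set, d_set = set(origin_zones), set(dest_zones)
    # Detailed zonal cells (the small squares).
    detail = [[zonal[i][j] for j in dest_zones] for i in origin_zones]
    # Trips to RoW from each zonal-level origin (rectangular cells).
    to_row = [sum(zonal[i][j] for j in range(n) if j not in d_set)
              for i in origin_zones]
    # Trips from RoW to each zonal-level destination (rectangular cells).
    from_row = [sum(zonal[i][j] for i in range(n) if i not in o_set)
                for j in dest_zones]
    # Trips entirely within the Rest-of-World (shaded area).
    row_to_row = sum(zonal[i][j]
                     for i in range(n) if i not in o_set
                     for j in range(n) if j not in d_set)
    return detail, to_row, from_row, row_to_row


zonal = [[0, 10, 5, 2],
         [8, 0, 4, 1],
         [3, 6, 0, 7],
         [2, 1, 9, 0]]
detail, to_row, from_row, row_to_row = build_local_matrix(
    zonal, origin_zones=[0, 1], dest_zones=[2, 3])
total = sum(map(sum, detail)) + sum(to_row) + sum(from_row) + row_to_row
print(detail, to_row, from_row, row_to_row, total)
```

The total of the four elements equals the total of the zonal matrix, which mirrors the statement above that the total number of trips is, in principle, the same for each local matrix.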
Summary of the hierarchic estimation process

The hierarchic estimation process may be summarized in four
stages:
Creation of districts from zones
Estimate district matrix
Estimate local matrices
Build-up full estimated matrix
Creation of districts from zones
The following figure shows a study area divided into many (small)
zones (denoted by ij). These are grouped into a number of fewer
(and larger) districts (denoted by IJ). Subsequent topics in this
chapter give more information about creating districts.
Districts (I,J) and zones (i,j)

Estimate district matrix
This is the first operation by Cube Analyst, which estimates a small
matrix for the 5 to 15 origin and destination districts which are
typically defined.
One of the cells, corresponding to a pair of origin and destination
districts, which contribute to a local matrix, is referenced as Mij. The
figure "Estimate district matrix" on page 101 indicates the
information in the district matrix estimation: the prior matrix and
trip ends are automatically aggregated from the user's input
zonal-level information. Internally, Cube Analyst creates a condensed
network but does not aggregate the screenline count data. This
treatment of data is reflected in Cube Analyst's reports on the
district matrix (see Figure 7.12b).

Estimate local matrices
Cube Analyst can estimate all Local matrices in one run, but the
user may exercise considerable control over this process.
This example relates to a single Local matrix, but this stage is
repeated for all Local matrices. The example considers the same
structural elements introduced in the discussion on Zonal
estimation controlled by district matrix on page 98. The
information used to estimate Zonal cells, referenced as Mij,
includes:
Prior matrix and trip ends are used at zonal level in the
estimation.
Count data is used as input where relevant to the local matrix.
Other items are obtained from the corresponding district
matrix estimation.
This use of information is reflected in Cube Analyst's reports on
local matrices (see Figures 7.12c and 7.12d).
Estimate local matrix

Build-up full estimated matrix
This example indicates the construction of the fully estimated
matrix from detailed information (Mij) calculated from a set of local
matrices. When the matrix is in the form shown in the figure (with
only some of the cells estimated), it is referred to as the partially
estimated matrix. Those cells in the partially estimated matrix
which have not yet been estimated contain copies of the
corresponding prior matrix cells.
(This can provide another means of estimating just part of a study
area, namely, by restricting the estimation to selected
districts/zones of interest.)
When all cells of the partially estimated matrix have been
estimated, it becomes the final fully estimated zonal matrix.
Combine local matrices in partially estimated matrix
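The build-up of the partially estimated matrix can be sketched as follows: start from a copy of the prior matrix and overwrite the zonal cells covered by each local matrix as it is estimated. This Python sketch uses hypothetical function names and toy numbers, and is only an illustration of the bookkeeping described above, not Cube Analyst's implementation.

```python
import copy


def start_partial(prior):
    """Begin with copies of the prior matrix cells."""
    return copy.deepcopy(prior)


def apply_local_estimate(partial, origin_zones, dest_zones, detail):
    """Overwrite the zonal cells estimated by one local matrix."""
    for r, i in enumerate(origin_zones):
        for c, j in enumerate(dest_zones):
            partial[i][j] = detail[r][c]


prior = [[1.0, 2.0],
         [3.0, 4.0]]
partial = start_partial(prior)
# One local matrix has estimated the cell from zone 0 to zone 1.
apply_local_estimate(partial, origin_zones=[0], dest_zones=[1],
                     detail=[[9.5]])
print(partial)
```

Cells not yet covered by any local matrix still hold their prior values, which is what makes the intermediate result a partially estimated matrix.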
Defining districts
Hierarchic estimation is a heuristic method which approximates the
formal mathematical methodology provided by a standard run of
Cube Analyst. It is most appropriate when the study area is large
enough to encompass sub-areas which can become districts where
the travel patterns are reasonably independent of one another.
The purpose of the estimated district matrix is, largely, to consider
the inter-district movements, while the focus of local matrices is the
intra-district movements. Because precision (greater detail) is
associated with the latter, it is desirable to minimize the amount of
inter-district movements.
The number of local matrices is approximately the square of the
number of districts. It therefore can make a considerable difference
to computational times whether, say, 10 districts are chosen (about
100 local matrix estimations) or 8 districts (about 64 local matrix
estimations).
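Since a local matrix is defined for each origin and destination district pair, the number of local matrix estimations can be approximated as the product of the two district counts. The short Python sketch below reproduces the figures quoted above; the function name is hypothetical, and the actual number of estimations in a run may differ from this approximation.

```python
def local_matrix_count(n_origin_districts, n_dest_districts=None):
    """Approximate number of local matrices for a district system."""
    if n_dest_districts is None:
        n_dest_districts = n_origin_districts
    return n_origin_districts * n_dest_districts


print(local_matrix_count(10))     # 10 origin and 10 destination districts
print(local_matrix_count(8))      # 8 origin and 8 destination districts
print(local_matrix_count(10, 6))  # a non-square district matrix
```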
Not all study area zones may be allocated to districts in this way,
either because some or all trips from or to a zone do not pass
through a screenline, or because allocation of the zone to a district
would violate the maximum number of zones per district. Zones
are then allocated to the adjacent district, based on the coordinates
associated with zone centroids. Allocating zones to districts on a
basis other than routing behavior can potentially worsen the
effects of the approximation implicit in hierarchic
estimation. In many cases, this worsening may be negligible in
practice, but it will be more significant if those zones involve
relatively large numbers of trips, or if a significant proportion of
zones are involved. It is this latter consideration which makes it
inadvisable to use hierarchic estimation on study areas with fewer
than 500 zones.
The considerations involved in defining districts may be
summarized as:
The fewer districts the better
The maximum local matrix size is determined by the maximum
size of standard estimation that may be conveniently run on
the available computer (say 1000 - 2500 zones)
The more allocation of zones to districts on the basis of
routings through screenlines the better
Note that it is a feature of hierarchic estimation districts that there
may be a different number of origin and destination districts (that
is, the district matrix may be non-square), and the allocation of
origin zones to origin districts is independent of the allocation of
that same zone to a destination district. This enables the
asymmetries of trip patterns to be reflected, as, for example, in a
morning peak matrix when trips originate from many zones in the
suburbs and head for only a few destination zones in the city
center. This is of value to the estimation process, but means that the
district matrix and the local matrices cannot be reported directly.

Running Cube Analyst for hierarchic estimation
Cube Analyst is run in a similar manner to non-hierarchic
estimation except that:
Option DSTRCT=T is set, to indicate calculation/use of a district
matrix
LMC and DDF files are input additionally
Parameter ZCONF is set
If Cube Analyst is run with an incomplete LMC file, then the
estimated matrix is a partially estimated matrix. This matrix
provides an additional input file when further local matrices are to
be estimated.
The model parameter file only ever contains information relating to
the district matrix (and not any local matrices), and the execution
log file contains brief summary information for both district and
local matrix estimations.
The printout file for hierarchic estimation contains the same type of
information as for non-hierarchic estimation, as illustrated in
Estimating the matrix on page 83. However, there may be many
sets of this information: the first set of information always refers to
the district matrix estimation. This is followed by a set of
information for each local matrix being estimated, noting that this
may be none in the case of a combined district and zone
estimation. (Because estimations involving many local matrices can
generate very large print files, it can be convenient to edit the local
matrix control file to create a series of runs of Cube Analyst in which
the size of individual print files is reduced.)
An additional item of information is provided for hierarchic
estimation concerning the influence of the district matrix on each
local matrix estimation. The table with this information, shown in
Figure 7.12c, is labeled "Side constraints on matrix totals." This term
refers to the constraints of the district matrix on various sides (and
elements) of the local matrix, as illustrated previously in "Estimate
local matrix" on page 102. "Reporting Hierarchic Estimation Results"
discusses the printout for hierarchic estimation.
Parameter ZCONF
The extent of the constraining effect of the district matrix on the
local matrices is determined by Cube Analyst parameter ZCONF,
which acts as a confidence level, treating the district matrix as
observed data and the local matrix as estimated. For the local
matrix estimation, therefore, the district matrix is just another item
of observed data and ZCONF should be set in relation to
confidence levels for other items of observed data.
From the user's point of view, the setting of ZCONF should be a
reflection of the degree and importance of the interaction between
districts, in terms of trips which cross more than one origin or
destination district boundary. (An effect of the automatic
generation of districts is to minimize such boundary crossings.) The
district matrix contains information about these interactions; if they
are important then the district matrix should be made
correspondingly significant with a relatively high setting of ZCONF.
A low value of ZCONF allows local matrices to reflect local data
more precisely, at the expense of the larger picture across the
entire study area. A possible symptom of an inappropriate setting
of ZCONF might be an unwarranted distortion of the distribution of
trip costs/lengths in the estimated matrix.
Using Cube Analyst
This chapter discusses the process for using Cube Analyst. Topics
include:
Input data: overview
Outputs: overview
Estimating large matrices (hierarchic estimation)
Estimation process
Using Cube Analyst


The data that can be used in estimating the new O-D matrix may
include some or all of the following types of data:
110
A prior (existing) trip matrix
Traffic generations and attractions of zones
Traffic counts on links and/or turns
Modeled (multiple) paths between zones
Cost of travel between zones
Parameters of a calibrated trip distribution function
Part-trip data, where trips are observed traveling between
points which are not necessarily their ultimate origins and
destinations
Outputs: overview
The outputs from Cube Analyst are:
The estimated O-D matrix
Summary Reports, in the form of a print (*.prn) file, describing
the differences between input data and corresponding values
implied by the estimated matrix. The print file also provides a
return code indicating problems during execution, or a
successful completion.
For more information, see Reports on page 115.
A set of files with information on:
Model parameter values
A log of the optimization steps
Internal gradient search and intercept data
Estimating large matrices (hierarchic estimation)
Cube Analyst provides a hierarchic approach to estimation for use
with very large matrices, typically those with more than 2,500
zones. This is required to make the process more manageable and
less time consuming.
The basic approach is to estimate a general matrix, in which zones
are automatically grouped into districts. This area-wide estimation
is then used to control a set of detailed estimations, which build up
to provide a fully detailed estimate for the entire study area. This is
discussed in detail in Chapter 7, Hierarchic Estimation.
Estimation process
The only program directly involved in the estimation process itself
is Cube Analyst, although other Cube programs play an important
part in the pre- and post-processing of the data.
The data used may be some or all of the data described earlier in
Input data: overview on page 110.
Cube Analyst may also use model parameters, gradient search, and
intercept files from a previous run of Cube Analyst for the current
estimation to warm start the calculations.
Internally, Cube Analyst can be considered to be made up of two
main parts, which are executed alternately:
Estimation model
Given particular values of the model parameters, this calculates
the estimated matrix, trip ends, screenline volumes, etc., and
also performs the likelihood calculation.
Optimization step
This procedure attempts to change the values of the model
parameters to improve the likelihood value (the objective
function).
These two stages are carried out alternately in a series of iterations
until no further improvement can be made.
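The alternation between the two parts can be pictured with a small sketch. This is only an illustration under stated assumptions, not the algorithm used by Cube Analyst: the toy objective (negative squared error on zone generations), the single scaling parameter per origin zone, the damped update rule, and the function names are all hypothetical. Only the overall shape (evaluate the model, adjust the parameters, stop when the step is small or an iteration limit is reached) follows the description above.

```python
def estimation_model(params, prior_rows, observed_gens):
    """Estimation model: for given parameters, compute estimated values
    and the objective. The objective here is a toy stand-in, not the
    likelihood used by Cube Analyst."""
    estimated = [a * sum(row) for a, row in zip(params, prior_rows)]
    objective = -sum((e - o) ** 2 for e, o in zip(estimated, observed_gens))
    return estimated, objective


def optimization_step(params, prior_rows, observed_gens, damping=0.5):
    """Optimization step: move each parameter part of the way toward the
    value that would reproduce the observed generations (hypothetical rule)."""
    new_params = []
    for a, row, obs in zip(params, prior_rows, observed_gens):
        target = obs / sum(row)
        new_params.append(a + damping * (target - a))
    return new_params


def estimate(prior_rows, observed_gens, utol=1e-4, mxiter=3000):
    """Alternate the two parts until the largest parameter change is
    below utol or mxiter iterations have been carried out."""
    params = [1.0] * len(prior_rows)
    _, objective = estimation_model(params, prior_rows, observed_gens)
    iterations = 0
    for iterations in range(1, mxiter + 1):
        candidate = optimization_step(params, prior_rows, observed_gens)
        step = max(abs(c - p) for c, p in zip(candidate, params))
        params = candidate
        _, objective = estimation_model(params, prior_rows, observed_gens)
        if step < utol:
            break
    return params, objective, iterations


params, objective, iterations = estimate(
    [[10.0, 20.0], [30.0, 40.0]],  # prior matrix rows
    [60.0, 35.0],                  # observed generations per origin zone
)
```

The names utol and mxiter are borrowed from the UTOL and MXITER control parameters purely as labels; the real stopping tests are described with those parameters.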
Reports
This chapter discusses reports you can prepare with Cube Analyst.
Topics include:
Summary of Reports
Sample reports
Summary of Reports
The Analyst reports described in this chapter are saved in a print
(*.prn) file during program execution. The reports include:
A listing of input parameters and options, and input binary
header information.
Mean, minimum, and maximum confidence levels set by the
user for each type of input data are given.
Memory requirements.
A report of each iteration of the optimization process, during
execution in interactive mode. This shows the current value of
the objective function, the gradient tolerance, and the sum of
all the estimated matrix elements. These values for the last five
iterations are always reported.
On completion, Cube Analyst provides summary reports on the
comparison between sets of input data and the corresponding
estimated values, with the confidence levels that apply. Where
relevant data is input to Cube Analyst, these reports are
produced giving comparisons for prior and estimated:
    Matrices     Matrix totals
    Trip ends    Zone generations and attractions, with input
                 zone generations and attractions
    Link flows   Screenline volumes and input screenline
                 volumes
    Part trips   Part trip matrix totals, distinguished by line
                 groups, where appropriate.
Finally, Analyst provides a return code indicating problems
during execution, or a successful completion. The codes are:
0 = Normal Termination
4 = Warning. Review the print file and find the (W) tag for
information on the warning(s).
8 = Fatal Error, Non Immediate Termination. Review the
print file for information on the error(s).
16 = Fatal Error, Immediate Termination. Review the print
file for information on the error(s).
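A calling script that has already obtained the return code could interpret it with a small lookup such as the one below. This is a minimal sketch: the function names are hypothetical, and how the code is retrieved (for example, by scanning the print file) is outside its scope.

```python
# Return codes and their meanings, as listed above.
RETURN_CODES = {
    0: "Normal Termination",
    4: "Warning",
    8: "Fatal Error, Non Immediate Termination",
    16: "Fatal Error, Immediate Termination",
}


def describe_return_code(code):
    """Return the documented meaning of a code, or a fallback text for
    values that are not in the list above."""
    return RETURN_CODES.get(code, "Unknown return code")


def is_fatal(code):
    """True for the two codes documented as fatal errors."""
    return code in (8, 16)
```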
Further information may be obtained by using Cube programs to
report on the estimated matrix file.
For an estimation using part-trip data, the output network file
contains detailed information on estimated part-trip link flows
(equivalent to an assignment of the estimated Part Trip matrix).
Cube Analyst reporting for hierarchic estimations
Cube Analyst reporting for a hierarchic estimation varies according
to whether the estimation is for a district or a local matrix. The
reports for district estimation are the same as for other levels,
except, of course, the results apply to districts rather than zones.
For local matrices, Cube Analyst additionally provides summaries of
the row and column side constraints from the district matrix, and
equivalent values from the prior matrix. The first reported zone
corresponds to the Rest-of-the-World (RoW), while the other
reported zones are the set of zones relevant to that local matrix. No
screenline reports are produced for local matrices.
The execution log file is output by the optimization step of Cube
Analyst, and three levels of report may be produced. These are
controlled by the IREP parameter. The contents of the log file will
not normally be of interest to general users, but are of assistance in
summarizing the progress of the calculation should investigation
be required.
Sample reports
This section contains examples of reports:
Average confidence level
Final five iterations
Matrix totals and zone generation
Zone attractions
Average confidence level (part trip data)
Part trip totals
District matrix
Local matrix
Average confidence level

-------------------------------------------------
Average   Maximum   Minimum   Number of
                              Elements
   20.0      20.0      20.0        6724
   95.0     200.0      80.0          16
   47.8      80.0      40.0          82
   47.8      80.0      40.0          82
Final five iterations

SUMMARY OF FINAL FIVE ITERATIONS
--------------------------------
Iteration   Stepsize              Objective      Matrix
            (Tolerance= 0.0001)   Value          Total
      149   0.0004152             -4735264.48    239547.4
      150   0.0005055             -4735264.48    239548.6
      151   0.0003342             -4735264.49    239550.3
      152   0.0002368             -4735264.49    239551.0
      153   0.0000781             -4735264.49    239551.0
Optimization halted after 153 iterations because:

Final Value of Maximum Search Step, UMAX = 0.01
Matrix totals and zone generation

CONFIDENCE      PRIOR   ESTIMATED   ESTM-PRIOR   (ESTM-PRIOR)/PRIOR(%)
      20.0   238498.0    239551.2       1053.2                    0.4%
GENERATIONS
   1   40.0   4869.0   4387.9   -481.1    -9.9%
   2   40.0   3825.0   3763.3    -61.7    -1.6%
   3   40.0   1798.0   2562.3    764.3    42.5%
   4   40.0    419.0    386.7    -32.3    -7.7%
   5   40.0   1256.0   1574.9    318.9    25.4%
   6   40.0   2045.0   1743.0   -302.0   -14.8%
   7   40.0   1935.0   1827.1   -107.9    -5.6%
   8   40.0   1794.0   1904.4    110.4     6.2%
   9   40.0   3662.0   3288.3   -373.7   -10.2%
  10   40.0    430.0    381.3    -48.7   -11.3%
  11   80.0   9200.0   9347.4    147.4     1.6%
<continued>
Zone attractions
ATTRACTIONS
ZONE NO   CONFIDENCE   OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)
      1         40.0     3657.0      3586.4       -70.6                 -1.9%
      2         40.0     2984.0      3500.3       516.3                 17.3%
      3         40.0     5715.0      5556.7      -158.3                 -2.8%
      4         40.0      558.0       518.4       -39.6                 -7.1%
      5         40.0     2018.0      2162.9       144.9                  7.2%
      6         40.0     2084.0      1948.0      -136.0                 -6.5%
      7         40.0     2112.0      2129.7        17.7                  0.8%
      8         80.0      976.0      1030.0        54.0                  5.5%
      9         40.0     2673.0      2804.5       131.5                  4.9%
     10         40.0        0.0         0.0         0.0                  n/a%
     11         80.0     5665.0      5549.3      -115.7                 -2.0%
<continued>
The trip end summaries can also be produced with the zone labels. Short
zone labels are printed if NODLAB=T, LNGLAB=F:
ATTRACTIONS
ZONE NO,NAME   CONFIDENCE   OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)
1 <Beaumont>         40.0     3777.0      3382.2      -394.8                -10.5%
2 <Cross_Ro>         40.0     3482.0      3441.1      -400.9                -10.4%
3 <Binley_S>         40.0     5815.0      5220.2      -594.8                -10.2%
<continued>
Long zone labels are printed if NODLAB=T and LNGLAB=T. The example below
shows hierarchic zone numbers and long zone labels in the report:
ZONE                               CONFIDENCE   OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)
NUMBER & NAME
28480 <Beaumont Avenue>                  40.0     5069.0      5544.5       475.5                  9.4%
28172 <Cross Roads, town centre>         40.0     4025.0      4392.2       367.2                  9.1%
27848 <Binley Street>                    40.0     1898.0      2076.7       178.7                  9.4%
<continued>
Average confidence level (part trip data)

------------------------------------------------
Average   Maximum   Minimum   Number of
                              Elements
Part Trip confidence levels
   10.0      10.0      10.0        1083
   80.0      80.0      80.0           2
   46.7      80.0      40.0          95
   46.7      80.0      40.0          95
    7.0       7.0       7.0         594
Part trip totals

This report is produced if option PRTTRP=T.
For Public Transport data, the report is as follows:
GROUP       CONFIDENCE    OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)
ALL                7.0   1386723.0   1232440.0   -154283.0                -11.1%
1 Local            7.0    624295.0    597702.1    -26592.9                 -4.3%
2 Express          7.0    521925.0    532005.7     10080.7                  1.9%
For Highways data, the report is as follows:
CONFIDENCE    OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)
      20.0   1590478.0   1606103.8     15625.8                  1.0%
District matrix
CONFIDENCE      PRIOR   ESTIMATED   ESTM-PRIOR   (ESTM-PRIOR)/PRIOR(%)
      20.0   238498.0    240291.2       1793.2                    0.8%
GENERATIONS
DISTRICT   CONFIDENCE   OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)
       1         40.0    14616.0     13368.5     -1247.5                 -8.5%
       2         40.0    48050.0     47995.1       -54.9                 -0.1%
       3         40.0     7855.0      7711.1      -143.9                 -1.8%
       4         40.0    40478.0     42008.8      1530.8                  3.8%
       5         40.0    62530.0     59877.4     -2652.6                 -4.2%
       6         40.0    15462.0     16832.4      1370.4                  8.9%
       7         40.0    18734.0     19158.2       424.2                  2.3%
       8         40.0     6744.0      6707.7       -36.3                 -0.5%
       9         40.0    26890.0     26631.9      -258.1                 -1.0%
ATTRACTIONS
DISTRICT   CONFIDENCE   OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)
       1         40.0    21562.0     22434.0       872.0                  4.0%
       2         40.0    43850.0     43476.7      -373.3                 -0.9%
       3         40.0    43963.0     44217.7       254.7                  0.6%
       4         40.0    21627.0     20809.8      -817.2                 -3.8%
       5         40.0    30926.0     30638.1      -242.9                 -0.8%
       6         40.0    37198.0     39445.7      2247.7                  6.0%
       7         40.0     8070.0      7973.4       -96.6                 -1.2%
       8         40.0    15332.0     16906.5      1574.5                 10.3%
       9         40.0    14423.0     14344.4       -78.6                 -0.5%
SCREENLINE        CONFIDENCE   OBSERVED   ESTIMATED   ESTM-OBSV   OBSV(%)   NO OF ODs
NO & NAME
1 A'shot Rd W-E         80.0    11677.0     11303.8      -373.2     -3.2%         244
2 A'shot Rd E-W         80.0    11677.0     11947.6       270.6      2.3%         244
<continued>
Local matrix
REPORTING SIDE CONSTRAINTS ON MATRIX TOTALS
                  DISTRICT     IN-PRIOR   ESTIMATED   ESTM-DISTRICT   (ESTM-DISTRICT)
                  CONSTRAINT                                          /ZONAL(%)
WITHIN DISTRICT       1506.2      958.0      1456.7           -49.5             -3.3%
FROM DISTRICT         6204.9     6276.0      5966.2          -238.7             -3.8%
TO DISTRICT          19303.5    16610.0     19181.9          -121.6             -0.6%
NOT IN DISTRICT     213276.5   214654.0    214338.2          1061.7              0.5%
MATRIX TOTAL        240291.2   238498.0    240943.0
GENERATIONS
R-o-W   40.0   233504.0   233520.1     16.1    -0.0%
   25   40.0     1557.0     1625.1     68.1    -4.4%
   26   40.0     1753.0     1654.5    -98.5    -5.6%
   27   40.0      378.0      338.8    -44.2   -11.7%
   28   40.0     1535.0     1339.8   -195.2   -12.7%
   55   40.0      211.0      232.5     21.5    10.2%
   56   40.0      875.0      878.7      3.7     0.4%
   60   40.0      268.0      296.6     28.6    10.7%
   61   40.0     1278.0     1061.8   -216.2   -16.9%
ATTRACTIONS
ZONE NO   CONFIDENCE   OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)
R-o-W           40.0   215324.0    220304.4      4980.4                  2.3%
   30           40.0     2370.0      2431.7        91.7                  3.9%
   44           40.0    12392.0     11794.8      -597.2                 -4.8%
   48           80.0     1708.0      1757.0        49.0                  2.9%
   53           40.0      209.0       273.1        64.1                 30.7%
   72           80.0     4226.0      3691.0      -535.0                -12.7%
   77           80.0      722.0       661.0       -61.0                 -8.5%

NO & NAME                                                  OBSV(%)   NO OF ODs
1 A'shot Rd W-E    80.0    11677.0    11459.1    -217.9      -1.9%         244
2 A'shot Rd E-W    80.0    11677.0    12580.3     903.3       7.7%         244
Note that, as for standard estimations, short and long zone labels
can be shown in the trip end reports. The label for the R-o-W
(Rest-of-the-World) will be left blank.
10
Files
This chapter lists permanent files found in Cube Analyst.
Symbolic        File      I/O   File description                Required (R)/optional (O) (3)
parameter (1)   ext (2)
                .CTL      I     Control Data File               (R)
IMAT1           .MAT      I     Prior Trip/Cost Matrix File     (R)
IDAT1           .DAT      I     Trip End Records                (R) If TRPEND=T
IDAT2           .DAT      I     Model Parameters File           (R) If MODPAR=T or if WARMST=T
IDAT3           .GDS      I     Gradient Search File            (R) If WARMST=T
IDAT4           .DAT      I     Screenline File                 (R) If SCRFIL=T
IDAT5           .ICP      I     Intercept File                  (R) If WARMST=T or if INTCPT=T
INET1           .NET      I     Network File                    (R) If PRTTRP=T
PATH1           .PTH      I     VOYAGER Path File               (R) If WARMST=F and INTCPT=F and
                                                                no IDAF1 input
IDAF1           .RCP      I     Route Choice Probability File   (R) If WARMST=F and INTCPT=F and
                                                                no PATH input
IDAF2           .PTL      I     Lines File                      (R) If PRTTRP=T and if network
                                                                is Public Transport
IDAF3           .DDF      I     District Definition File        (R) If DSTRCT=T
IDAT6           .DAT      I     Local Matrix Control File       (R) If DSTRCT=T
IMAT2           .MAT      I     Partial Estimated Matrix        (R) If DSTRCT=T and if WARMST=T
IDAT7           .DAT      I     Coordinate File                 (R) If NODLAB=T or using
                                                                hierarchic numbering
OMAT1           .MAT      O     Estimated Trip Matrix           (R)
ODAT1           .DAT      O     Model Parameter File            (R)
ODAT2           .GDS      O     Gradient Search File            (R)
ODAT3           .DAT      O     Optimization Log File           (R)
ODAT4           .ICP      O     Intercept File                  (R) If WARMST=F and INTCPT=T and
                                                                either SCRFIL=T or PRTTRP=T
ODAT5           .DAT      O     Text Intercept File             (O)
ONET1           .NET      O     Network File                    (R) If PRTTRP=T and DSTRCT=F
OPRN            .PRN      O     Print File                      (R)
1. The SYMBOLIC PARAMETERS are those that would appear in an &FILES
record to control file opening.
2. The file extension shown is that used conventionally when running in
Application Manager.
3. File requirements can vary according to the combination of program
PARAMETERS and OPTIONS chosen.
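The conditions in the table can be collected into a small checklist helper. The sketch below is only one reading of the table, under stated assumptions: the function name and its arguments are hypothetical, option defaults are taken from the &OPTION keyword descriptions, and the choice between PATH1 and IDAF1 is passed in by the caller because the table treats them as alternatives.

```python
# Option defaults as given in the &OPTION keyword descriptions.
DEFAULT_OPTIONS = {
    "SCRFIL": True, "TRPEND": True, "INTCPT": False, "MODPAR": False,
    "WARMST": False, "NODLAB": False, "PRTTRP": False, "DSTRCT": False,
}


def required_inputs(options=None, route_source="PATH1",
                    public_transport=False, hierarchic_numbering=False):
    """List the symbolic names of the input files that the table marks as
    required for a given combination of options (hypothetical helper).
    The control data file has no symbolic name and is not listed."""
    opt = dict(DEFAULT_OPTIONS)
    opt.update(options or {})
    files = ["IMAT1"]
    if opt["TRPEND"]:
        files.append("IDAT1")
    if opt["MODPAR"] or opt["WARMST"]:
        files.append("IDAT2")
    if opt["WARMST"]:
        files.append("IDAT3")
    if opt["SCRFIL"]:
        files.append("IDAT4")
    if opt["WARMST"] or opt["INTCPT"]:
        files.append("IDAT5")
    if opt["PRTTRP"]:
        files.append("INET1")
    if not opt["WARMST"] and not opt["INTCPT"]:
        files.append(route_source)  # either "PATH1" or "IDAF1"
    if opt["PRTTRP"] and public_transport:
        files.append("IDAF2")
    if opt["DSTRCT"]:
        files.extend(["IDAF3", "IDAT6"])
    if opt["DSTRCT"] and opt["WARMST"]:
        files.append("IMAT2")
    if opt["NODLAB"] or hierarchic_numbering:
        files.append("IDAT7")
    return files
```

With all options left at their defaults, the helper returns IMAT1, IDAT1, IDAT4 and PATH1; footnote 3 still applies, so the result should be treated as a checklist rather than a definitive rule.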
11
Control Data
This chapter discusses control data. Topics include:
&PARAM keywords
&OPTION keywords
&PARAM keywords
It is usual to leave Cube Analyst control parameters to their default
values, with the user only setting the parameters associated with
data input and output file definition. These are described in
Standard user control parameters.
Of the remaining parameters, there is a set which is sometimes
changed (Secondary user control parameters on page 133) and
another which is rarely changed (Tuning control on page 135).
Most of the parameters in this last set are connected to the
operation of Cube Analyst's optimization process, and hence are
only of interest when there is evidence of poor performance in
achieving convergence.
Topics in this section include:
Standard user control parameters
Secondary user control parameters
Tuning control
Standard user control parameters

TABLES
Type = Integer(4)
Default = 101, 102, 0, 0
Example: TABLES=101,102,103,104
The input matrix numbers to be used. They are respectively the
prior trip matrix and confidence levels, and the cost matrix and
confidence levels.
MATID
Type = Character(60)
Default = Blank
Example: MATID='Estimated Matrix for Study Area'
Matrix identifier. Up to 60 alphanumeric characters can be used to
describe the contents of the output matrix. The identifier should be
enclosed in single quotes (').
WIDEND
Type = Integer
Default = 2 if hierarchic numbering, 0 otherwise
Range = 0-3
Example: WIDEND = 0
Indicates the format of the Screenline File:
0 = Cube Analyst establishes the format automatically. For the
format to be determined unambiguously, all numeric entries in the
file must be right justified.
1 = Version six format. This supports the record format where both
link flow and link toll data are stored in the same record type.
2 = Version seven format. Two types of screenline record format can
be defined at version seven; link flow records (S in column one) or
link toll records (T in column one). The former type must be used
for Cube Analyst runs.
3 = Version seven format with a CNode column inserted to support
the input of turning counts in addition to link counts.
If hierarchic numbering is in use (HIERND = T), WIDEND must be set
to 2 to use the version seven format.
MFORM
Type = Integer
Default = 0
Range = 0-4
Example: MFORM = 0
Indicates the format of the output matrix:
0 = Save the matrix in the same format as the input matrix. This is
the default action.
1 = TRIPS
2 = TP+/VOYAGER
3 = TRANPLAN
4 = MINUTP
DEC
Type = Character(1)
Default = Blank
Range = '0' to '9', 'S', 'D', or blank
Example: DEC='4'
Defines the precision with which to store values in the output
matrix:
Blank = Uses same precision as in the input trip matrix. If just a cost
matrix is input, a value of 2 is used.
'0' to '9' = Store numbers in the matrix as integers representing
values to the specified number of decimal places.
'S' or 'D' = Store numbers as floating point numbers in either single
or double precision. Double precision gives more accuracy to a
greater number of decimal places than single. These values give the
best representation in the output matrix, but will generally produce
a bigger output file. This option is only available if the output
matrix format is TP+/VOYAGER.
PSETS
Type = Integer(50)
Default = 1
Range = 1 to number of path sets in the input VOYAGER path file
Example: PSETS = 1,3
Applies only when a VOYAGER path file is input. It defines the path
sets to apply when building the intercepts for the screenlines. At
least one set must be specified, and sets are referenced by their
number rather than by their name.
PVOLS
Type = Integer(50)
Default = 1
Range = 0 to number of volumes in the input VOYAGER path file
Example: PVOLS = 1,2
Applies only when a VOYAGER path file is input. It defines the
volumes to apply when building the intercepts for the screenline. If
a value of 0 is specified, the volumes will be ignored, and the
weighting of alternative routes will be solely defined by the
iteration factors. Otherwise, PVOLS is a list of numbers that are 1 or
more, representing the selected volumes.
NETID
Type = Character(40)
Default = Blank
Example: NETID='Network with estimated link flows'
Network identifier. Up to 40 alphanumeric characters can be used
to describe the contents of the output network. The identifier
should be enclosed in single quotes ('). NETID is only used if reading
part trip data (PRTTRP=T).
EFLOW
Type = Integer
Default = 2
Range = 1-20
Example: EFLOW = 4
The number of the volume field in the output network into which
the total link flows estimated by Cube Analyst will be written.
EFLOW is only used if PRTTRP=T.
ELINEn
Type = Integer
Default = ELINE(n)=2+n*2
PT Only
Range = 1-20
Example: ELINE1 = 6
ELINE(n) is the number of the volume field in the output network
into which the link flows estimated by Cube Analyst will be written
for line group n. ELINEn is only used if PRTTRP=T and doing a Public
Transport matrix estimation.
NFLOW
Type = Character(4)
Default = 'EFLW'
Example: NFLOW = 'EFLW'
Volume field identifier. Up to four alphanumeric characters can be
used to indicate the contents of the volume field specified by
EFLOW. The identifier should be enclosed in single quotes (').
NFLOW is only used if reading part trip data (PRTTRP=T).
NLINEn
Type = Character(4)*8
Default = NLINEn='ELGn'
PT Only
Example: NLINE1 = 'ELG1'
Volume field identifier. Up to four alphanumeric characters can be
used to indicate the contents of the volume field specified by
ELINEn. The identifier should be enclosed in single quotes (').
NLINEn is only used if PRTTRP=T and doing a Public Transport
matrix estimation.
ZCONF
Type = Integer
Default = 100
Range = 1-10000
Example: ZCONF = 200
Confidence level for the side constraints that are applied to local
matrices and derived from the estimated district matrix.
Secondary user control parameters

The parameters described in this section would only be used to try
to reduce the processing times required to achieve convergence.
Refer to Computation times on page 159.
MXITER
Type = Integer
Default = 3000
Range = 1-999999
Example: MXITER = 1500
The maximum number of iterations. Cube Analyst will stop if this
number of iterations has been reached and no convergence has
been achieved. The model parameter and gradient search files are
written out and can be used to restart Cube Analyst (from the
position it was in when it stopped) and the optimization continued.
The currently estimated matrix is also output.
ITERH
Type = Integer
Default = Generated by Cube Analyst
Range = 1-9999
Example: ITERH = 4000
The number of iterations between recalculations of the estimated
Hessian matrix.
UTOL
Type = Real
Default = 0.0001
Range = 0.0-99.0
Example: UTOL = 0.05
The accuracy tolerance in detecting convergence or failure. When
the maximum absolute size of the search vector is less than this
value then the procedure will be deemed to have converged.
IREP
Type = Integer
Default = 3
Range = 1-3
Example: IREP = 2
Reporting level for the optimization log file. See Information in the
optimization log file on page 157.
IHTYPE
Type = Integer
Default = 4
Range = 0-4
Example: IHTYPE = 2
This controls the type of optimization process used by Cube
Analyst, as shown in the following table. The differences between
values 1 - 4 correspond to differences in the way the initial Hessian
matrix, H0, is calculated.
Methods of optimization
Optimization process    Value of IHTYPE   Comments
Steepest Descent        0                 Simple searching
Quasi-Newton            1,2,3             1 = H0 set to unit matrix
                                          2 = H0 read from file (warm start)
                                          3 = H0 computed every iteration
Newton/Hybrid Newton    4                 Hessian calculated regularly,
                                          according to setting of ITERH
Tuning control
The parameters documented in this section would normally be
changed only in response to an error message generated by Cube
Analyst. In the event of this occurring, please contact
support@citilabs.com for advice.
MXCALL
Type = Integer
Default = 5000
Range = 1-999999
Example: MXCALL = 10000
The maximum number of function evaluations. (This should be
greater than MXITER. At least one function evaluation is required at
each iteration, possibly more.)
MXFREE
Type= Integer
Default= 4
Range= 1-10
Example: MXFREE = 7
The number of times a parameter may be freed from its bounds.
UMAX
Type = Real
Default = 1.0
Range = 0.0-1000.0
Example: UMAX = 0.5
The maximum allowed search step. If the maximum absolute value
of the search vector (called UNORMX in the log report) is greater
than this then the entire search vector is multiplied by a term
UMAX/UNORMX so that the new maximum entry is equal to UMAX.
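The UMAX rescaling, together with the UTOL convergence test described earlier, can be written out as follows. This is a minimal sketch assuming the search vector is available as a plain list of numbers; the function names are hypothetical and the code is not taken from Cube Analyst.

```python
def limit_search_step(search_vector, umax=1.0):
    """Scale the whole search vector by UMAX/UNORMX when its largest
    absolute entry (UNORMX) exceeds UMAX; otherwise return it unchanged."""
    unormx = max(abs(s) for s in search_vector)
    if unormx > umax:
        factor = umax / unormx
        return [s * factor for s in search_vector]
    return list(search_vector)


def has_converged(search_vector, utol=0.0001):
    """Convergence test as described for UTOL: the maximum absolute size
    of the search vector is less than the tolerance."""
    return max(abs(s) for s in search_vector) < utol
```

For example, with UMAX = 1.0 the vector (2.0, -4.0, 1.0) has UNORMX = 4.0, so every entry is multiplied by 0.25 and the largest entry becomes exactly UMAX in absolute value.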
&OPTION keywords
Note: Options TRIPM and COSTM work in conjunction with one
another.
TRIPM
Type = Logical
Default = True
If TRIPM = T then the input matrix file will contain at least two
tables. The first will be the prior trip matrix; the second will be the
associated confidence levels.
If COSTM = F then these will be the only two matrices present in
the file.
TRIPM = F is only allowed if COSTM = T.
COSTM
Type = Logical
Default = False
If COSTM = T and TRIPM = T, then the input matrix file will contain
four tables. The first two are as described above; the third will
be the cost matrix and the fourth will be the associated confidence
levels.
If COSTM = T and TRIPM = F, then the cost and confidence level
matrices will be the first and second supplied in the file.
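The way TRIPM and COSTM together determine the order of tables in the input matrix file can be summarized in a few lines. The helper below is a hypothetical illustration: the positions are ordinal positions within the file as described above, not the table numbers given by the TABLES parameter.

```python
def matrix_table_layout(tripm=True, costm=False):
    """Return a mapping from position in the input matrix file (1-based)
    to the role of that table, following the TRIPM/COSTM rules above."""
    if not tripm and not costm:
        raise ValueError("TRIPM = F is only allowed if COSTM = T")
    roles = []
    if tripm:
        roles += ["prior trip matrix", "trip confidence levels"]
    if costm:
        roles += ["cost matrix", "cost confidence levels"]
    return {position: role for position, role in enumerate(roles, start=1)}
```

Calling the helper with the default option values (TRIPM = T, COSTM = F) gives two tables; with both options true it gives four; with TRIPM = F and COSTM = T the cost matrix and its confidence levels occupy positions 1 and 2.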
SCRFIL
Type = Logical
Default = True
If SCRFIL = T then an input screenline file is supplied. See
Screenline file on page 140.
TRPEND
Type = Logical
Default = True
If TRPEND = T then an input trip end data file is supplied. See Trip
end file on page 142.
INTCPT
Type= Logical
Default= False
If INTCPT=T then an input Intercept file is supplied. See Intercept
file on page 149.
MODPAR
Type= Logical
Default= False
If MODPAR = T then an input model parameter file is supplied. See
Model parameter file on page 144.
WARMST
Type= Logical
Default= False
If WARMST = T then gradient search, model parameter, and
intercept files are supplied to warm start the estimation calculation.
The input of these files from a previous run of the same model
should assist the speed of optimization, but see Approaches to
running Cube Analyst on page 152.
HIERND
Type= Logical
Default= Set automatically from RCP input file; False if no RCP file
input.
HIERND=T indicates that a hierarchic node numbering system is in
use. This option only needs to be set if no RCP file is input. If there is
an RCP input file, the setting of HIERND in the control file will be
ignored.
NODLAB
Type= Logical
Default= False
If NODLAB=T, zone labels will be included in the Cube Analyst
reports. A coordinate file must be supplied containing node labels. See
Coordinate file on page 143.
LNGLAB
Type= Logical
Default= False
Set LNGLAB=T to include long zone labels in the Cube Analyst reports.
Note that NODLAB = T must also be set to use long labels. If
NODLAB=T and LNGLAB=F, the short zone labels will be used. A
coordinate file must be supplied containing node labels. See
Coordinate file on page 143.
PRTTRP
Type= Logical
Default= False
If PRTTRP = T then an input network file is supplied which contains
part-trip link flows.
DSTRCT
Type= Logical
Default= False
If DSTRCT = T then a local matrix control file and district definition
file must be supplied. See Local matrix control file on page 147
and District definition file on page 148.
12
Program Specific Data
This chapter discusses files containing Cube Analyst data. Topics
include:
Screenline file
Trip end file
Coordinate file
Model parameter file
Local matrix control file
District definition file
Intercept file
Gradient search file
Screenline file
This file is required if SCRFIL = T.
The screenline file is used to supply link/turn count and confidence
level data to Cube Analyst.
Two formats of the file are supported. The original format
(indicated by parameter WIDEND=2) supports only link counts. An
alternative format (WIDEND=3) has an extra column to allow
turning counts to be specified.
This section describes both formats:
Link count format
Turning count format
Link count format

The format of the file containing just link counts is as follows:
Columns    Type        Contents
1          Character   S screenline record identifier
2 - 5      Integer     Screenline number
6 - 14     Integer     Anode of link
15 - 23    Integer     Bnode of link
24 - 33    Real        Link traffic volume count
34 - 40    Integer     Confidence level. A number between 1 and 10000,
                       but usually in the range 1-100, that expresses the
                       user's confidence in the link traffic volume count. This
                       is used only by Cube Analyst.
41 - 58    Character   Screenline name, up to 18 characters (optional)
60         Integer     Direction code. For purposes of matrix estimation this
                       must be set to 1.
61 - 70    Integer     X-coordinate (optional) at which to display screenline
                       name on the screen.
71 - 80    Integer     Y-coordinate at which to display screenline name on
                       the screen.
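The fixed-column layout above can be read with simple string slicing. The following is a minimal sketch, not a tool supplied with Cube Analyst: the function name and the dictionary keys are hypothetical, blank lines are skipped as an assumption, and comment records (asterisk in column one) are skipped as described in the notes for this file.

```python
def parse_link_count_record(line):
    """Parse one link count record (WIDEND=2 layout) into a dictionary.
    Returns None for comment records and, as an assumption, blank lines."""
    if not line.strip() or line.startswith("*"):
        return None
    if line[0] != "S":
        raise ValueError("expected S in column one")
    line = line.rstrip("\n").ljust(80)

    def field(first, last):
        # Columns are 1-based and inclusive, as in the table above.
        return line[first - 1:last].strip()

    def optional_int(text):
        return int(text) if text else None

    return {
        "screenline": int(field(2, 5)),
        "anode": int(field(6, 14)),
        "bnode": int(field(15, 23)),
        "count": float(field(24, 33)),
        "confidence": int(field(34, 40)),
        "name": field(41, 58),
        "direction": optional_int(field(60, 60)),
        "x": optional_int(field(61, 70)),
        "y": optional_int(field(71, 80)),
    }


# Build an illustrative record with right-justified numeric fields
# (the node numbers and screenline name are made up for the example).
example_record = (
    "S" + f"{1:>4}{101:>9}{102:>9}{11677.0:>10.1f}{80:>7}"
    + f"{'Ashot Rd W-E':<18}" + " 1"
)
parsed = parse_link_count_record(example_record)
```

Slicing by column position, rather than splitting on whitespace, keeps screenline names containing spaces intact and tolerates omitted optional fields.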

Turning count format

The format of the file that supports turning counts is as follows:
Columns    Type        Contents
1          Character   S screenline record identifier
2 - 5      Integer     Screenline number
6 - 14     Integer     Anode of link/turn
15 - 23    Integer     Bnode of link/turn
24 - 32    Integer     Cnode of turn (leave blank for link counts)
33 - 42    Real        Link/Turn traffic volume count
43 - 49    Integer     Confidence level. A number between 1 and 10000,
                       but usually in the range 1-100, that expresses the
                       user's confidence in the traffic volume count. This
                       is used only by Cube Analyst.
50 - 67    Character   Screenline name, up to 18 characters (optional)
70         Integer     Direction code. For purposes of matrix estimation
                       this must be set to 1.
Notes:
If a screenline contains more than one link/turn, then Cube
Analyst calculates the screenline count as the sum of the
counts for each link/turn in the screenline. Also, the screenline
confidence level is set as the weighted average of the
individual link/turn count confidence levels.
The file can contain a mixture of link and turning counts. For
link counts, the Cnode should be left blank.
Comment records, which have an asterisk (*) in column one,
may appear anywhere in the file.
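The first note can be illustrated with a short calculation. The weights used for the average are not stated in this guide; the sketch below assumes, purely for illustration, that each confidence level is weighted by its count. The function name is hypothetical.

```python
def aggregate_screenline(records):
    """Combine (count, confidence) pairs for the links/turns of one
    screenline. The count is the sum of the individual counts; the
    confidence is a count-weighted average (the weighting is an
    assumption, not taken from the guide)."""
    if not records:
        raise ValueError("a screenline needs at least one link/turn")
    total = sum(count for count, _ in records)
    if total == 0:
        # Fall back to a plain average when all counts are zero.
        confidence = sum(conf for _, conf in records) / len(records)
    else:
        confidence = sum(count * conf for count, conf in records) / total
    return total, confidence
```

Under this assumption, two links with counts 100 and 300 and confidence levels 80 and 40 give a screenline count of 400 and a confidence level of 50.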
Trip end file

This file is required if TRPEND = T.
The trip end file format for Cube Analyst is as follows:
Columns    Type      Content
1 - 10     Integer   Zone Number
11 - 20    Real      Generations
21 - 40              unused
41 - 50    Real      Attractions
51 - 60    Integer   Confidence Level for Generations
61 - 70    Integer   Confidence Level for Attractions
Comment records, which have an asterisk (*) in column one, may
appear anywhere within the file.
Coordinate file
The input coordinate file must be supplied if option NODLAB has
been set to TRUE. The file supplies the correspondence between
node numbers and their hierarchic equivalents.
The format of the file is summarized below:
Columns     Type        Content
*1 - 10     Integer     Node number (sequential)
11 - 20     Integer     X coordinate
21 - 30     Integer     Y coordinate
*31 - 40    Integer     Hierarchic node number
41 - 48     Character   Text for node; short label (optional)
49 - 80     Character   Text for node; long label (optional)
Notes:
Items marked * must be coded for hierarchic processing.
Sequential node numbers must be unique.
If hierarchic node numbers are being used then:
    Hierarchic node numbers must be unique.
    Columns 31-40 must be coded on each record.
Text labels should normally be left justified in their respective
fields.
Blank records will be ignored.
Records with an asterisk (*) in column 1 will be treated as
comment records.
Node coordinates are used by the graphics programs and are
therefore optional.
Model parameter file
This file is required if MODPAR = T.
This file contains data describing the model parameters and their
attributes. It would not normally be constructed by the user, as on
initiating a run of Cube Analyst the model parameters take a
default value, as shown in the table below.
However, at the end of the Cube Analyst run a file is generated
containing the new model parameters calculated. The amended
file, or indeed the unedited file, can be re-input to Cube Analyst to
invoke a warm start; that is to continue the estimation process
from where the last run finished.
The general format of the file is as follows:
Record                        Description
Record one                    Header record defining the number/type of
                              parameters in the file.
The next ZONES records        Values for the A(i) parameters
The next ZONES records        Values for the B(j) parameters
The next SCREENLINE records   Values for the X(k) parameters
The next two records          The a and b parameters of the Distribution
                              Model (if COSTM = T)
Where:
ZONES is defined as the number of zones in the matrix.
SCREENLINE is defined as the number of screenlines specified.
Note that comment records, which have an asterisk (*) in column
one, can appear anywhere in the file.
The format of the individual record types is as follows:

First record:
[Must not be edited]
Columns    Type        Content
1 - 23     Character
24 - 31    Integer     Number of model parameters
32 - 39    Integer     Number of origin zones
40 - 47    Integer     Number of destination zones
48 - 55    Integer     Number of screenlines
56 - 63    Integer     1 if using cost data, otherwise 0.
64 - 77    Real        Value of objective function
78 - 91    Real        Step size
92 - 99    Integer     Number of iterations completed
Remaining records
Columns      Type      Content                      Default if parameter
                                                    not defined
1 - 8        Integer   Parameter number
10 - 22      Real      Parameter value              1.0
24 - 36      Real      Lower bound for parameter    0.1 E-6
38 - 50      Real      Upper bound for parameter    1.0 E10
52 - 64      Real      Scale factor for parameter   1.0
65 - 89                Reserved for Cube Analyst
100 - 107
If the file is not supplied by the user, it is created by Cube
Analyst and the default values shown above are used for each of
the model parameters.
If the second, third, and fourth fields all have the same value, the
parameter is deemed to be fixed at this value. Cube Analyst
requires that:
At least one parameter must be free; otherwise a fatal error is
reported and the program stops.
At least one parameter must be fixed. If the user does not fix one,
Cube Analyst fixes A(1).
A file of identical format is created at the end of a Cube Analyst run,
containing the revised parameter values. This allows Cube Analyst
to be restarted from where the last run finished, if required, or the
file to be used as a basis for fixing parameter values. Note that
Cube Analyst adds up to three extra columns at the end of each
record for its own internal use. The user should not edit the
information put there.
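The record layout and the fixed-parameter rule described above can be sketched in Python. This is only an illustration, not part of Cube Analyst: the names `ModelParameter`, `parse_parameter_record`, and `check_fixed_and_free` are hypothetical, and the handling of Fortran-style reals with embedded blanks (such as `0.1 E-6`) is an assumption about how values may be written.

```python
from dataclasses import dataclass
from typing import List, Optional

# Column ranges (1-based, inclusive) of the "remaining records",
# copied from the record format table. Blank fields take the defaults.
FIELDS = {
    "number": (1, 8),
    "value": (10, 22),
    "lower": (24, 36),
    "upper": (38, 50),
    "scale": (52, 64),
}
DEFAULTS = {"value": 1.0, "lower": 0.1e-6, "upper": 1.0e10, "scale": 1.0}


@dataclass
class ModelParameter:
    number: int
    value: float
    lower: float
    upper: float
    scale: float

    @property
    def is_fixed(self) -> bool:
        # Fixed when value, lower bound, and upper bound all coincide.
        return self.value == self.lower == self.upper


def _field(record: str, name: str) -> str:
    start, end = FIELDS[name]
    return record[start - 1:end].strip()


def _to_float(text: str) -> float:
    # Assumption: tolerate embedded blanks and a D exponent marker.
    return float(text.replace(" ", "").replace("D", "E").replace("d", "e"))


def parse_parameter_record(record: str) -> Optional[ModelParameter]:
    """Parse one parameter record; comment records (* in column 1) give None."""
    if record.startswith("*"):
        return None
    values = {}
    for name in ("value", "lower", "upper", "scale"):
        text = _field(record, name)
        values[name] = _to_float(text) if text else DEFAULTS[name]
    return ModelParameter(number=int(_field(record, "number")), **values)


def check_fixed_and_free(params: List[ModelParameter]) -> bool:
    """Raise if no parameter is free; return whether any parameter is fixed."""
    fixed = sum(1 for p in params if p.is_fixed)
    if fixed == len(params):
        raise ValueError("at least one model parameter must be free")
    # False means the user fixed nothing (the text says A(1) is then fixed).
    return fixed > 0
```

A record whose value and both bounds are all 1.0 parses as fixed, while a record that supplies only a parameter number and value picks up the default bounds and scale factor.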
Local matrices control file
This file is required if DSTRCT = T. The format of the file is as follows:

Columns     Type        Content
1 - 10      Integer     Origin District
11 - 20     Integer     Destination District

Comment records, which have an asterisk (*) in column one, may
appear anywhere in the file.
District definition file
This file is required if DSTRCT = T or WARMST = T.
The user may affect the operation of the estimation according to
the grouping of zones into origin and destination districts. The
district definition file input to Cube Analyst is a direct-access
file, so it cannot be altered directly by the user.
Intercept file
This file is required if INTCPT = T or WARMST = T.
Output by Cube Analyst and by Cube Voyager HIGHWAY and PT, this
binary file stores information on routings and screenlines in a
concise format. Once established, it may be re-input to Cube
Analyst to save substantial processing time when Cube Analyst is
estimating or re-estimating for data where neither the routings nor
the screenline location definitions have been altered. This file
cannot be edited by the user.
Note that a text version of the intercept file can also be output. It
is for information only; it is not intended for subsequent input to
Cube Analyst or any other program. The text file is written only if
a name is given for it, and it is generated from either the input or
the output binary intercept file, depending upon which is used. For
each screenline it shows:
The number of intercepting I-J pairs.
A sub-header under the screenline for each origin I that has
routes that intercept the screenline.
Under each origin, a list of pairs of numbers. The first number of
the pair represents the destination zone J. The second number
of the pair represents the percentage of traffic travelling from
the origin to the destination that routes through the screenline.
Gradient search file
This file is required if WARMST = T.
This is a binary file output by Cube Analyst which is re-read by Cube
Analyst when warm starting a run (WARMST = T). It contains
information used by Cube Analyst's optimizer and cannot be edited
by the user.
13
Notes on Program Use
This chapter contains information you might find helpful when
using Cube Analyst. Topics include:
Approaches to running Cube Analyst
Selection of model form
Information in the optimization log file
Computation times
Running Cube Analyst from Cube Voyager
Approaches to running Cube Analyst
There are several approaches that the user may adopt towards
running Cube Analyst, which vary according to the information
available to the user about the estimation of any particular matrix.
The approaches may be categorized as:
Initial estimation
Constrained model parameters
Controlling the optimization process
Initial estimation
Only basic input data is required, as contained in the routes (RCP for
TRIPS, PATH or ICP for Cube Voyager users), matrix, trip end, and
screenline files (as well as optionally a network file with part-trip
data). Program control parameters are allowed to take default
values. Occasionally, an input model parameter file may be
required to influence the model form by fixing some parameter
values.
Re-Estimation with altered data: Warm starting
In this case the model parameter, gradient search, and intercept
files from the last (or initial) estimation run are input, in addition
to the user input data.
Warm starting is only valid when the structure of the estimation is
unaltered. This means that the number of data items, screenline
locations, and routings should not be altered. However, data values
and confidence levels may be changed.
Warm starting is useful either to split an estimation into more than
one run of Cube Analyst, for the sake of convenience, or to
undertake sensitivity analysis on the effects of altered data or
confidence level values. When a run of Cube Analyst is split for
convenience, and no input data is changed, then it is efficient to set
the parameter UMAX to the value reported by Cube Analyst at the
final iteration of the previous run.
Constrained model parameters

Model parameters may be constrained to:
Reflect a user-specified value (for example, of the α and β
parameters)
Partition an estimation into sub-problems to be
accommodated within computing resources
Alter the nature of the trip estimation equation
If model parameters are fixed or freed from run to run then the
gradient search file from one run should not be used in the next.
Note that Cube Analyst may itself constrain model parameters. This
occurs when the estimated Hessian matrix is found to be,
mathematically speaking, non-positive definite, which arises
when one or more model parameters are degenerate. That is, a
model parameter is not contributing independently of another
model parameter.
In these circumstances, Cube Analyst gives a message reporting
how many such model parameters it is constraining, which is of the
form:
ME (I): XXX MODEL PARAMETERS ARE NOT CONTRIBUTING TO THE ESTIMATION
The constrained model parameters are listed in Cube Analyst's log
file.
It is not necessarily a cause for concern when Cube Analyst
constrains model parameters in this way, although it is a signal that
not all data is of value to the estimation because it is strongly
correlated with other data. For instance, link flow counts on
adjacent links of a main road may refer to substantially the same
trips and hence one count is (mathematically) redundant. It is thus
most frequently the case that X(k) model parameters are
constrained by Cube Analyst, although other model parameters
may be constrained too.
Controlling the optimization process

Normally program control parameters should be allowed to take
their default values. However, computation times may be improved
by judicious setting of the control parameters. This is discussed
further in "Computation times" and in "Tuning estimation
performance."
Selection of model form
Cube Analyst lets the user control the structure of the solution
and how it is achieved. This section describes these possibilities.
However, Cube Analyst is usually run with the full, default model
form. It may be shown that in this form the X(k) parameter is,
strictly speaking, redundant, although it is of value in providing
extra degrees of freedom by which Cube Analyst can handle the
effects of errors and inconsistencies in the input data.
The number of possible parameters in a model is:
Two for each zone (that is, the a(i) and b(j))
One for each screenline (the X(k))
Two more if any cost data is to be used (α and β)
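The count above is simple arithmetic, sketched here in Python. The function name `model_parameter_count` is hypothetical and the example figures are invented purely for illustration.

```python
def model_parameter_count(zones: int, screenlines: int, use_cost: bool) -> int:
    """Number of possible model parameters, following the list above."""
    count = 2 * zones       # one a(i) and one b(j) per zone
    count += screenlines    # one X(k) per screenline
    if use_cost:
        count += 2          # the two cost parameters
    return count


# Illustrative figures only: 100 zones, 12 screenlines, cost data in use.
print(model_parameter_count(zones=100, screenlines=12, use_cost=True))
```

The same total is the number of parameter records expected in the model parameter file, given the record structure described for that file.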
The model parameter file contains a value for the parameter, the
upper and lower bounds for the parameter, and a scaling factor.
The scaling factor is used only to assist the optimization process in
ensuring that maximum accuracy is obtained. It should be set
equal to the expected value of the parameter in the final
solution; it is only necessary to make this scaling factor of the
same order of magnitude if there are difficulties in ensuring
convergence. If no such difficulties are apparent then the scaling
factor can just be set to 1.0.
The lower and upper bounds for the parameters allow the user to
specify the degrees of freedom which are permitted. In particular, if
a model parameter is set to 1.0 and its lower and upper bounds are
also set to this value, then a number of standard forms for the
matrix estimation process can be achieved. For example:
Setting all a(i), b(j), and X(k), together with their bounds, to a
fixed value (for example, 1.0) reduces the problem to a Gravity
model driven only by the cost data (that is, only α and β are
allowed to vary). (Note that varying values of a(i) and b(j)
can be used to scale the numbers of trips per zone.)
Setting α and β and their bounds to a particular value allows
the estimation process to use cost data which has been
previously calibrated.
Setting a(i), b(j), α, and β to a fixed value and allowing X(k) to
be free provides a link constraint model.
Setting X(k), α, and β to a fixed value gives a growth factor model.
Setting α and β to a fixed value allows a growth factor link
model (note that α and β are only defined if cost data is
supplied).
Setting a(i) and b(j) to a fixed value defines a link gravity model.
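The reduced model forms listed above amount to a lookup from the set of fixed parameter groups to a model name. The Python sketch below is illustrative only: the group labels (`"a"`, `"b"`, `"X"`, and `"cost"` for the two cost parameters) and the function `model_form` are hypothetical names, not Cube Analyst keywords.

```python
# Parameter groups: a(i), b(j), X(k), and the pair of cost parameters.
GROUPS = frozenset({"a", "b", "X", "cost"})

# Which groups are fixed (value = lower bound = upper bound) in each form.
MODEL_FORMS = {
    frozenset(): "full model",
    frozenset({"a", "b", "X"}): "gravity",
    frozenset({"a", "b", "cost"}): "link constraint",
    frozenset({"X", "cost"}): "growth factor",
    frozenset({"cost"}): "growth factor link",
    frozenset({"a", "b"}): "link gravity",
}


def model_form(fixed_groups) -> str:
    """Name the model form implied by the fixed parameter groups."""
    fixed = frozenset(fixed_groups)
    unknown = fixed - GROUPS
    if unknown:
        raise ValueError(f"unknown parameter groups: {sorted(unknown)}")
    # Combinations not named in the text are reported as non-standard.
    return MODEL_FORMS.get(fixed, "non-standard combination")


print(model_form({"a", "b", "X"}))   # gravity
print(model_form({"X", "cost"}))     # growth factor
```

Fixing every group is deliberately left out of the table, because at least one parameter must remain free.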
In some of these cases, input data is not used by the estimation
process even though it is defined and requested by the program.
The data that is used in these special cases is summarized in the
following table.

Data used in different reduced model forms

Data / model type            Growth factor   Gravity   Growth factor link   Link gravity   Link constraint
Trip matrix & confidences
Trip end & confidences
Routing information (RCP)
Trip costs
It should be observed that there are no specific model parameters
associated with part-trip data, so this form of data is not relevant to
discussions of model forms.
Information in the optimization log file
The levels of report, determined by the setting of the IREP
parameter, are as follows:
IREP=1 A report is only produced at the end of the run. This shows:
  The reason that the optimization has been halted
  The value of the objective function (the maximum likelihood)
  The current step size
  The minimum tolerance step size
  In addition, a number of important variables defining the size of
  the problem and the parameters input to Cube Analyst are also
  displayed.
IREP=2 A report is produced as for IREP=1 and also a report at
each iteration. This shows:
  The iteration number and whether or not progress was made at
  this iteration.
  The number of evaluations made so far in this run. This is the
  number of times that a matrix and associated trip and
  screenline data have been calculated together with the
  likelihood function.
  The current step size.
  The step size tolerance.
IREP=3 A report is produced as for IREP=2 and also a report each
time the model is evaluated to calculate the effects of a
particular choice of model parameter. This shows:
  The evaluation number
  The step size multiplier (used if no progress is made in the
  first evaluation to reduce the step size, ALPHA)
  The step size (STEP)
  The objective function at the last iteration and at the point
  at which the function evaluation is made (FTRIAL and FBEST)
  A measure of the gradient at the last iteration and at the point
  at which the function evaluation is made
  Other internal variables used only for intermediate
  calculations (FGOLD, FJJS)
In addition, reports may be output to the execution log file if the
gradient search matrix is found to be unstable and has to be
reinitialized.
Computation times
Program control parameters should usually take their default
values. However, computation times may be improved by setting
the parameters shown in the following table.

Parameters for influencing computation times

Control parameter   Comment
MXITER              MXITER should only be used to terminate an estimation
                    prematurely when there is some evidence that it is safe
                    to do so, that is, after Cube Analyst has been initially
                    allowed to reach convergence.
ITERH               There is a trade-off between the reduction in the
                    number of iterations and the number of times the
                    Hessian matrix must be re-calculated. The default value
                    of ITERH represents an average best value, but it is
                    worth experimenting with the value of ITERH for
                    different types of estimation problem.
                    In some cases, lowering the value of ITERH may guide
                    the optimizer to a solution which it otherwise could not
                    find. In other cases, estimating the Hessian too
                    frequently will add to the run time, sometimes
                    significantly.
UTOL                Examination of the log file, and the screen display, will
                    show how the convergence indicator, UNORM, is
                    approaching the target value set by UTOL. Larger values
                    of UTOL increase the risk of Cube Analyst terminating
                    significantly away from the most likely value of the
                    estimated matrix for the set of input data, while lower
                    values of UTOL imply lower standard errors for the
                    model parameters.
Running Cube Analyst from Cube Voyager
You can run Cube Analyst from Cube Voyager. This section
discusses:
Running Cube Analyst from a VOYAGER script
Files
Running Cube Analyst from a VOYAGER script

Cube Analyst can be executed via a RUN PGM statement, where the
program name needs to be specified as MVESTM71.
Cube Analyst needs to be told the name of the control file to use,
and how much memory it can take. The former is achieved via the
CTLFILE keyword. The memory setting is achieved by specifying a
PARAMETERS entry on the RUN PGM command of the form
PARAMETERS="/m=14", where the number 14 indicates the
amount of memory to use in MB. If no PARAMETERS entry is
specified, a default amount is applied, which will be insufficient
for larger problems.
For example, the following command:
RUN PGM="MVESTM71", CTLFILE="C:\Test\Me_Test.ctl", PARAMETERS="/m=100"
ENDRUN
will run Cube Analyst, providing 100 MB of memory.
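When many runs are scripted, the RUN PGM statement shown above can be generated rather than typed. The following Python sketch only assembles the text of the statement from the keywords already described (PGM, CTLFILE, PARAMETERS); the helper name `build_run_statement` is hypothetical, and it does not invoke Cube Voyager.

```python
from typing import Optional


def build_run_statement(ctl_file: str,
                        memory_mb: Optional[int] = None,
                        program: str = "MVESTM71") -> str:
    """Return the text of a RUN PGM ... ENDRUN block for a Voyager script."""
    parts = [f'RUN PGM="{program}"', f'CTLFILE="{ctl_file}"']
    if memory_mb is not None:
        # Omitting PARAMETERS leaves the default memory allocation in force.
        parts.append(f'PARAMETERS="/m={memory_mb}"')
    return ", ".join(parts) + "\nENDRUN"


print(build_run_statement(r"C:\Test\Me_Test.ctl", memory_mb=100))
```

Called with the control file and memory size from the example above, the helper reproduces that example statement.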
Files
For Cube Voyager users, the intercept file can be generated by
HIGHWAY and PT. In this case the intercept file name should always
be defined, along with the option INTCPT=T. Alternatively,
HIGHWAY can be used to generate a Cube Voyager Path file, which
is input to Cube Analyst, which will create the screenline Intercepts
from the paths.
14
Examples
This chapter contains a set of examples. Topics include:
Estimation with prior trip and count data only
Estimation with prior trip, count, and trip end data
Estimation with warm start and cost data
Estimation with highways part trip data
Estimation with public transport part-trip data
Hierarchic estimation
Example of screenline volumes report
Estimation with prior trip and count data only
The following control data would be appropriate for an initial
estimation when the only data available for updating a matrix is
from count sites.
Column
1...5...10...15...20...25...30...35..40..45..50..55..60
Estimate with Old Matrix and Count Data
&FILES IMAT1='PRIOR.MAT',
IDAT4='SCRL.DAT',
IDAF1='ROUTES.RCP',
OPRN='MVESTM.PRN',
OMAT1='ESTM.MAT',
ODAT1='PARM.DAT',
ODAT2='GRAD.GDS',
ODAT3='LOG.DAT',
ODAT4='INTCPT.ICP' &END
&PARAM MATID='Estimated Matrix' &END
&OPTION TRPEND=F &END
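Control data in this style can also be written by a small script. The Python sketch below is a hypothetical helper, not a Citilabs tool: it assumes the title is the first line of the control file and that each group is written as `&NAME key=value, ... &END`, as in the example above. Only two of the file entries from the example are used, to keep the sketch short.

```python
def format_value(value) -> str:
    """Render a Python value in the style used by the control data."""
    if isinstance(value, bool):          # logical options are written T or F
        return "T" if value else "F"
    if isinstance(value, str):           # file names and labels are quoted
        return f"'{value}'"
    if isinstance(value, (list, tuple)):
        return ", ".join(format_value(v) for v in value)
    return str(value)


def format_group(name: str, entries: dict) -> str:
    """Render one group such as &FILES, one entry per line."""
    items = [f"{key}={format_value(val)}" for key, val in entries.items()]
    if not items:
        return f"&{name} &END"
    lines = [f"&{name} {items[0]}"] + [f"    {item}" for item in items[1:]]
    return ",\n".join(lines) + " &END"


def build_control_file(title: str, files: dict, params: dict, options: dict) -> str:
    """Assemble title line plus the FILES, PARAM, and OPTION groups."""
    groups = [
        format_group("FILES", files),
        format_group("PARAM", params),
        format_group("OPTION", options),
    ]
    return "\n".join([title] + groups) + "\n"


text = build_control_file(
    "Estimate with Old Matrix and Count Data",
    files={"IMAT1": "PRIOR.MAT", "IDAT4": "SCRL.DAT"},
    params={"MATID": "Estimated Matrix"},
    options={"TRPEND": False},
)
print(text)
```

Generating the groups from dictionaries keeps the separators and the closing &END consistent, which is where hand-edited control files most often go wrong.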
Estimation with prior trip, count, and trip end data
This run is similar to the previous example, but trip end data is
also available. In addition, short zone labels are included in the
reports.
Column
1...5...10...15...20...25...30...35..40..45..50..55..60
Estimate with an Old Matrix, Counts, and Trip End Data
IDAT1='TEND.DAT',
IDAT4='SCRL.DAT',
IDAT7='COORD.DAT',
IDAF1='ROUTES.RCP',
OPRN='MVESTM.PRN',
OMAT1='ESTM.MAT',
ODAT1='PARM.DAT',
ODAT2='GRAD.GDS',
ODAT3='LOG.DAT',
ODAT4='INTCPT.ICP' &END
&PARAM MATID='Estimated Matrix - including Trip End Data' &END
&OPTION NODLAB=T, LNGLAB=F &END
Estimation with warm start and cost data
The following control data would be suitable if, say, some
confidence levels had been altered in data input files, and where
the data included both trip and cost matrices, as well as trip end
and count data. Long zone labels are to be included in the reports.
Column
1...5...10...15...20...25...30...35..40..45..50..55..60
Re-Estimation with altered Confidence Levels
IDAT1='TEND.DAT',
IDAT2='PARM.DAT',
IDAT3='GRAD.DAT',
IDAT4='SCRL.DAT',
IDAT7='COORD.DAT',
IDAT5='INTCPT.ICP',
IDAF1='ROUTES.RCP',
OPRN='MVESTM.PRN',
OMAT1='ESTM2.MAT',
ODAT1='PARM2.DAT',
ODAT2='GRAD2.GDS',
ODAT3='LOG.DAT' &END
&PARAM TABLES=101, 102, 103, 104, MATID='Re-Estimated Matrix' &END
&OPTION TRIPM=T,
COSTM=T,
NODLAB=T,
LNGLAB=T,
WARMST=T &END
Estimation with highways part trip data
The following control data would be suitable where the data
included part-trip data as well as a trip matrix, trip ends, and count
data.
Column
1...5...10...15...20...25...30...35..40..45..50..55..60
Estimation including Part Trip data
IDAT1='TEND.DAT',
IDAT4='SCRL.DAT',
IDAF1='ROUTES.RCP',
INET1='PTRIPS.NET',
OPRN='MVESTM.PRN',
OMAT1='ESTM.MAT',
ODAT1='PARM2.DAT',
ODAT2='GRAD2.GDS',
ODAT3='LOG.DAT',
ODAT4='INTCPT.ICP',
ONET1='ESTMPTRP.NET' &END
&PARAM MATID='Estimated Matrix using Part Trip data',
NETID='Estimated Flows (with Part Trip Flows)',
EFLOW=7,
NFLOW='ESTM' &END
&OPTION PRTTRP=T &END
Estimation with public transport part-trip data
The following control data would be suitable for estimating a
public transport matrix where the data included part-trip data,
organized into three line groups, as well as a trip matrix, trip ends,
and count data.
Column
1...5...10...15...20...25...30...35..40..45..50..55..60
Estimation including Part Trip data by line group
IDAT2='TEND.DAT',
IDAT4='SCRL.DAT',
IDAF1='ROUTES.RCP',
IDAF2='LINES.PTL',
INET1='PTRIPS.NET',
OPRN='MVESTM.PRN',
OMAT1='ESTM.MAT',
ODAT1='PARM.DAT',
ODAT2='GRAD.GDS',
ODAT3='LOG.DAT',
ODAT4='INTCPT.ICP'
NETID='Estimated Flows (Prt Trp Flows by Line Group)'
ELINE1=4,
ELINE4=6,
ELINE5=8,
NLINE1='EXPR',
NLINE4='LOCL',
NLINE5='AIRP' &END
&OPTION PRTTRP=T &END
Hierarchic estimation
The following control data would be suitable for estimating a very
large public transport matrix where the data included part-trip data
as well as a trip matrix, trip ends, and count data.
Column
1...5...10...15...20...25...30...35..40..45..50..55..60
Title:
Large PT Estimation using Part Trip Total Link Flow
IDAT1='TEND.DAT',
IDAT4='SCRL.DAT',
IDAF1='ROUTES.RCP',
IDAF2='LINES.PTL',
IDAF3='DISTRICT.DDF',
IDAT6='LMCTL.DAT',
OPRN='MVESTM.PRN',
OMAT1='ESTM.MAT',
ODAT1='PARM.DAT',
ODAT2='GRAD.GDS',
ODAT3='LOG.DAT',
ODAT4='INTCPT.ICP'
NETID='Estimated Flows (total Part Trip flow)',
EFLOW=2,
NFLOW='ETOT' &END
&OPTION PRTTRP=T,
DSTRCT=T &END
Example of screenline volumes report

NO & NAME                                                        OBSV(%)   NO OF ODs
1 A'shot Rd W-E          80.0    11677.0    11335.4     -341.6     -2.9%         244
2 A'shot Rd E-W          80.0    11677.0    11734.9       57.9      0.5%         244
3 A3-Hogs Back S-N      200.0    27947.0    26672.2    -1274.8     -4.6%         160
4 A3-Hogs Back N-S      200.0    25504.0    25160.5     -343.5     -1.3%         160
5 Onslow St S-N          80.0    18981.0    17479.2    -1501.8     -7.9%         383
6 Onslow St N-S          80.0    18809.0    18285.2     -523.8     -2.8%         687
7 Town Centre E-W        80.0    16285.0    16556.5      271.5      1.7%         904
8 Town Centre W-E        80.0    22670.0    22494.5     -175.5     -0.8%         870
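The figures in the report are internally consistent: in every row, the fourth number equals the third minus the second, and the percentage equals that difference divided by the second number. The Python sketch below checks this. Treating the second and third numbers as observed and estimated volumes is an assumption, since the report does not label them here, and the name `percent_difference` is hypothetical.

```python
# (name, observed?, estimated?, difference, percent) copied from the report.
# The "observed" and "estimated" roles are assumed, not stated in the report.
ROWS = [
    ("A'shot Rd W-E", 11677.0, 11335.4, -341.6, -2.9),
    ("A'shot Rd E-W", 11677.0, 11734.9, 57.9, 0.5),
    ("A3-Hogs Back S-N", 27947.0, 26672.2, -1274.8, -4.6),
    ("A3-Hogs Back N-S", 25504.0, 25160.5, -343.5, -1.3),
    ("Onslow St S-N", 18981.0, 17479.2, -1501.8, -7.9),
    ("Onslow St N-S", 18809.0, 18285.2, -523.8, -2.8),
    ("Town Centre E-W", 16285.0, 16556.5, 271.5, 1.7),
    ("Town Centre W-E", 22670.0, 22494.5, -175.5, -0.8),
]


def percent_difference(observed: float, estimated: float) -> float:
    """Difference as a percentage of the observed value, to one decimal."""
    return round(100.0 * (estimated - observed) / observed, 1)


for name, observed, estimated, diff, pct in ROWS:
    assert round(estimated - observed, 1) == diff, name
    assert percent_difference(observed, estimated) == pct, name
print("all rows consistent")
```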
Citilabs, Inc.
1211 Miccosukee
Tallahassee FL 32308 USA
World Wide Web
www.citilabs.com
