Sections
1.1 Vectors
1.2 Vector Addition
1.3 Scalar Vector Multiplication
1.4 Inner Product
1.5 Complexity Of Vector Computations
Exercises
1.1 Vectors
| -1.1 |
| 0.0 |
| 3.6 |
| 5.2 |
The elements of the vector are called entries, elements, components, or coefficients
of that vector.
The size (or dimension, or length) of the vector is the number of entries it
contains.
The vector above has size 4, and its third entry (note: indexing starts at 1) is 3.6.
Two vectors a and b are equal iff they have the same size and the same
elements in the same order.
This is written a = b. If a and b are n-vectors, a = b iff a_1 = b_1 and a_2 = b_2
and ... and a_n = b_n.
The numbers that are the elements in a vector are called scalars.
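We focus on the case where the elements of a vector are real numbers (other kinds of
vectors exist, e.g. vectors whose entries are complex numbers).
Stacked (block) vectors: vectors can be stacked to form a larger vector, e.g.
v = | a |
    | b |
    | c |
So (1, a), where a is a 3-vector, is the same as the 4-vector (1, a_1, a_2, a_3):
| 1 |     | 1   |
| a |  =  | a_1 |
          | a_2 |
          | a_3 |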
Subvectors/Slices
a_r:s is the vector containing the elements a_r through a_s inclusive; it has size
s - r + 1.
The subscript r:s is called an index range.
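In code, the 1-based index range r:s has to be translated into the host language's indexing. A minimal Python sketch, assuming vectors are stored as plain lists (0-based, with slices that exclude the end index); the helper name `subvector` is made up for illustration.

```python
def subvector(a, r, s):
    """Return a_r:s, the entries a_r through a_s (1-based, inclusive)."""
    if not 1 <= r <= s <= len(a):
        raise IndexError("need 1 <= r <= s <= len(a)")
    # 1-based inclusive r..s corresponds to the 0-based half-open slice [r-1, s)
    return a[r - 1:s]

a = [-1.1, 0.0, 3.6, 5.2]
print(subvector(a, 2, 3))       # [0.0, 3.6]
print(len(subvector(a, 2, 3)))  # 2, which is s - r + 1
```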
Zero Vector: A vector whose elements are all 0. Written as 0_n or just 0, the size
being figured out from the context.
A standard unit vector is an n-vector with all its elements equal to zero *except*
one element which is equal to 1.
The i-th unit vector, written e_i, is the standard unit vector whose i-th element equals 1.
Here e_i denotes a unit *vector*, not the i-th element of a vector e. This is an
example of notational ambiguity.
(e_i)_j = 0 if i != j
          1 if i == j
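A vector is sparse if many of its entries are zero; nnz(x) denotes the number of
nonzero entries of x.
Unit vectors are sparse since they have only one nonzero entry: nnz(e_i) = 1.
The zero vector is the sparsest possible vector: nnz(0) = 0.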
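A small Python sketch of unit vectors and the nnz count, assuming vectors are plain lists and keeping the notes' 1-based index i; `unit_vector` and `nnz` are illustrative names, not a library API.

```python
def unit_vector(i, n):
    """Return the n-vector e_i (i is 1-based, as in the notes)."""
    return [1.0 if j == i else 0.0 for j in range(1, n + 1)]

def nnz(x):
    """Count the nonzero entries of x."""
    return sum(1 for xi in x if xi != 0)

e2 = unit_vector(2, 4)
print(e2)              # [0.0, 1.0, 0.0, 0.0]
print(nnz(e2))         # 1
print(nnz([0.0] * 4))  # 0: the zero vector
```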
Example: feature vectors.
The elements of an n-vector denote n different quantities relating to attributes
of a single thing or object. The entries of a feature vector are called features or
attributes.
e.g: a 6-vector v could hold the age, height, weight, blood pressure, temperature,
and gender of a patient admitted to a hospital, with gender encoded as, say, 0 for
male and 1 for female. Note that the quantities have different physical units.
1.2 Vector Addition
Two vectors *of the same size* can be added together by adding the corresponding
elements; the result is a new vector called the sum vector.
Example of vector addition
| 0 | | 1 | | 1 |
| 7 | + | 2 | = | 9 |
| 3 | | 0 | | 3 |
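Vector subtraction is defined similarly, by subtracting corresponding elements. The
result is called the difference of the two vectors.
1.3 Scalar Vector Multiplication
Multiplying a vector by a scalar multiplies every element by that scalar.
e.g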
| 1 | | -2 |
(-2)| 9 | = | -18 |
| 6 | | -12 |
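The three element-by-element operations above can be sketched in a few lines of Python, assuming vectors are plain lists; the function names are hypothetical. The size check mirrors the rule that only vectors of the same size can be added or subtracted.

```python
def add(a, b):
    """Sum vector a + b; defined only for vectors of the same size."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same size")
    return [ai + bi for ai, bi in zip(a, b)]

def subtract(a, b):
    """Difference a - b, computed element by element."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same size")
    return [ai - bi for ai, bi in zip(a, b)]

def scale(alpha, a):
    """Scalar-vector product: every element of a is multiplied by alpha."""
    return [alpha * ai for ai in a]

print(add([0, 7, 3], [1, 2, 0]))       # [1, 9, 3]
print(subtract([1, 9, 3], [1, 2, 0]))  # [0, 7, 3]
print(scale(-2, [1, 9, 6]))            # [-2, -18, -12]
```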
If a_1, a_2, ..., a_m are n-vectors and alpha_1, ..., alpha_m are scalars, then
the n-vector alpha_1 * a_1 + ... + alpha_m * a_m is a linear combination of the
vectors a_1 through a_m.
The scalars alpha_1 through alpha_m are called the coefficients of the combination.
(KEY) Any m-vector b can be written as a linear combination of the unit vectors of size
m:
b = b_1 * e_1 + ... + b_m * e_m,
where b_i is the i-th entry of b and e_i is the i-th unit vector.
e.g
| 2 |       | 1 |       | 0 |       | 0 |
| 9 | = 2 * | 0 | + 9 * | 1 | + 6 * | 0 |
| 6 |       | 0 |       | 0 |       | 1 |
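A minimal Python sketch of a linear combination, assuming vectors are plain lists (the name `linear_combination` is illustrative). It rebuilds the example above: using the entries of b as coefficients on the unit vectors gives back b.

```python
def linear_combination(coeffs, vectors):
    """Return alpha_1 * a_1 + ... + alpha_m * a_m for n-vectors a_1, ..., a_m."""
    if len(coeffs) != len(vectors):
        raise ValueError("need one coefficient per vector")
    n = len(vectors[0])
    result = [0] * n
    for alpha, a in zip(coeffs, vectors):
        if len(a) != n:
            raise ValueError("all vectors must have the same size")
        for j in range(n):
            result[j] += alpha * a[j]
    return result

# Rebuild b = (2, 9, 6) from e_1, e_2, e_3, using b's own entries as the coefficients
units = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
b = [2, 9, 6]
print(linear_combination(b, units))  # [2, 9, 6]
```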
Examples
1. When vectors represent displacements, a linear combination represents the sum
of scaled displacements.
2. When vectors represent audio signals over the same time period (each called a
'track'), a linear combination is called a mixture or a 'mix'. A
producer in a studio or a sound engineer at a rock show chooses alpha_1 through
alpha_m to balance the different instruments and voices.
3. When vectors represent cash flows, a linear combination represents a
replication of a cash flow.
4. When a and b are different vectors, the affine combination
c = (1 - theta) * a + theta * b, where theta is a scalar, is a point on the line through a and b.
When 0 <= theta <= 1, the affine combination is called a convex combination of
a and b, and the point is said to lie on the *segment* between a and b.
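1.4 Inner Product
The inner product of two n-vectors a and b is a T b = a_1 * b_1 + a_2 * b_2 + ... + a_n * b_n.
e.g:
| -1 | T |  1 |
|  2 |   |  0 | = (-1 * 1) + (2 * 0) + (2 * -3) = -1 + 0 + (-6) = -7
|  2 |   | -3 |
note: when n = 1, the inner product reduces to the product of two numbers.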
Properties:
1. Commutativity: a T b = b T a
2. Associativity with scalar multiplication: alpha * (a T b) = (alpha * a) T b
3. Distributivity with vector addition: (a + b) T c = a T c + b T c
General Examples
1. e_i T a = a_i: the inner product of a vector with the i-th unit vector
gives, or 'picks out', the i-th component of a.
2. 1 T a gives the sum of the elements of the vector (1 is the vector of all ones).
3. ((1/n) 1) T a gives the average of the elements of the vector.
4. a T a = a_1 ^ 2 + a_2 ^ 2 + ... + a_n ^ 2 = sum of squares of the
elements of the vector.
5. Let b be a vector all of whose entries are 0 or 1. Then b T a is the sum
of those elements a_i of a for which b_i = 1.
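The worked example and the special cases listed above can be checked with a short Python sketch, assuming vectors are plain lists; `inner` is an illustrative name. The vector a = (2, 4, 6, 8) is chosen so that 1/n = 0.25 is exact in binary and the average prints cleanly.

```python
def inner(a, b):
    """a T b = a_1 * b_1 + ... + a_n * b_n; a and b must have the same size."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same size")
    return sum(ai * bi for ai, bi in zip(a, b))

print(inner([-1, 2, 2], [1, 0, -3]))  # -7, the worked example

a = [2, 4, 6, 8]
n = len(a)
print(inner([0, 1, 0, 0], a))  # 4: e_2 T a picks out a_2
print(inner([1] * n, a))       # 20: sum of the elements
print(inner([1 / n] * n, a))   # 5.0: average of the elements
print(inner(a, a))             # 120: sum of squares
print(inner([1, 0, 1, 0], a))  # 8: a_1 + a_3, selected by a 0/1 vector
```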
If vectors a and b are block vectors (REM: here the components are
themselves vectors), then
| a_1 | T | b_1 |
| ... |   | ... | = (a_1 T b_1) + ... + (a_k T b_k)
| a_k |   | b_k |
provided that each a_i has the same size as b_i, i.e. a_i and b_i *conform*.
Applications:
1. If a and b are m-vectors that describe occurrences, i.e. each of their
elements is 0 or 1, then their inner product gives the total number of indices for
which both components are 1.
2. When vector a represents features of an object and vector b represents a
list of weights, then a T b is a weighted sum of features, sometimes called
a score. Thus a credit score can be obtained from a feature vector (age, income,
etc.) and a weight vector.
3. Price-quantity (as in a bill of goods): if one vector represents quantities
of goods and another vector represents their unit prices, then the inner product
is the total price of the goods.
4. If one vector represents the probabilities of m outcomes (which sum to 1)
and another represents the value of a variable for each outcome, then their inner
product is the expected value of that variable.
5. Polynomial evaluation:
Consider the polynomial
p(x) = c0 + c1 x + c2 x^2 + ... + cn x^n (note: n + 1 terms. I use zero
indexing here vs 1 indexing in the text to make the exponent and coefficient
numbers match up)
Then p(x) = c T z, where c = (c0, c1, ..., cn) and z = (1, x, x^2, ..., x^n) are (n + 1)-vectors.
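Polynomial evaluation as an inner product, sketched in Python with zero-based coefficients c0, ..., cn as in the note; `poly_eval` is a hypothetical name. The vector z of powers of x is built explicitly so the code matches c T z term by term.

```python
def poly_eval(c, x):
    """Evaluate p(x) = c0 + c1 x + ... + cn x^n as the inner product c T z."""
    z = [x ** k for k in range(len(c))]  # z = (1, x, x^2, ..., x^n)
    return sum(ck * zk for ck, zk in zip(c, z))

# p(x) = 1 - 2x + 3x^2 at x = 2: 1 - 4 + 12 = 9
print(poly_eval([1, -2, 3], 2))  # 9
```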
1.5 Complexity Of Vector Computations
Sparse vectors are stored in a way that keeps track of only the indices and values of
the nonzero entries.
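The notes do not fix a storage format, so as one common choice (an assumption here) a sparse vector can be stored as its size plus a dictionary mapping each 1-based index to its nonzero value. Operations then only touch nonzeros; the inner product below does work proportional to the number of nonzeros rather than to n. The names `to_sparse` and `sparse_inner` are hypothetical.

```python
def to_sparse(x):
    """Keep the size and a dict {1-based index: value} of the nonzero entries only."""
    return {"size": len(x), "nz": {i + 1: xi for i, xi in enumerate(x) if xi != 0}}

def sparse_inner(u, v):
    """Inner product of two sparse vectors: only indices nonzero in both contribute."""
    if u["size"] != v["size"]:
        raise ValueError("vectors must have the same size")
    # Loop over the vector with fewer nonzeros, looking up matches in the other
    small, large = (u, v) if len(u["nz"]) <= len(v["nz"]) else (v, u)
    return sum(val * large["nz"].get(i, 0) for i, val in small["nz"].items())

x = to_sparse([0, 0, 3.0, 0, -1.0])
y = to_sparse([2.0, 0, 4.0, 0, 0])
print(x["nz"])             # {3: 3.0, 5: -1.0}
print(sparse_inner(x, y))  # 12.0
```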
Roundoff Errors
When computers carry out floating-point operations (flops), the results are rounded to
the nearest floating-point number. The very small difference between the exact and the
rounded value is called floating-point (roundoff) error. For most applications it can be
ignored.
The study of floating-point errors and how to mitigate them is part of numerical
analysis, and is not considered in this book.
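A quick Python illustration of roundoff (Python floats are IEEE 754 double precision): 0.1 and 0.2 have no exact binary representation, so their rounded sum differs very slightly from the rounded value of 0.3.

```python
import math

total = 0.1 + 0.2
print(total)                     # 0.30000000000000004, not 0.3
print(total == 0.3)              # False: exact comparison is fooled by roundoff
print(math.isclose(total, 0.3))  # True: compare with a tolerance instead
```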
FLOP counts.
A very rough estimate of the time it takes to carry out vector (and matrix and
tensor) operations can be obtained by counting the total number of floating-point
operations (flops) the operation requires. The speed at which a computer can
perform flops is expressed in Gflop/s (billions of flops per second). The actual time
it takes a computer to perform a linear algebra operation depends on many factors other
than the flop count, so flop counts only need to be approximate (factors of 2, for
example, are often ignored).
Complexity.
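In this book we use the term 'complexity' to denote the number of flops
required to carry out a linear algebra operation by the best method.
Vector addition of two m-vectors takes m additions, one for each pair a_i, b_i, i.e. m flops.
The inner product of two m-vectors takes m multiplications and m - 1 additions,
i.e. 2m - 1 flops, or about 2m.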
How long does it take a computer that can do a billion flops per second to
compute the inner product of two vectors having a million entries each?
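A back-of-the-envelope answer in Python, assuming the usual count for an inner product of two m-vectors (m multiplications and m - 1 additions, about 2m flops) and treating the flop count as the only cost, which the notes warn is a crude estimate.

```python
n = 10**6            # entries in each vector
flops = 2 * n - 1    # n multiplications + (n - 1) additions
rate = 10**9         # 1 Gflop/s
seconds = flops / rate
print(seconds)       # about 0.002 s, i.e. roughly 2 milliseconds
```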
2. vector addition -
3. scalar multiplication -
4. linear combination:
suppose c1, c2, c3 are m-vectors representing cash flows from loans, investments, etc.
The linear combination (beta1 * c1) + (beta2 * c2) + (beta3 * c3)
is a new cash flow, *replicated* (built) from the original cash flows c1, c2,
c3.
let
c1 = (1, -1.1, 0): a $1 loan received in period 1 and paid back with $1.10 in
period 2; no money coming in or going out in period 3.
c2 = (0, 1, -1.1): no cash flow in period 1; a $1 loan received in period 2 and
paid back with $1.10 in period 3.
5. inner product
let c be an m-vector representing a cash flow, with c_i the cash received
(paid out, when c_i < 0) in period i (so there are m periods).
let d be the m-vector (1, 1/(1 + r), 1/(1 + r)^2, ..., 1/(1 + r)^(m - 1)),
where r >= 0 is an interest rate.
Then the inner product d T c is the discounted total of the cash flow, or its
net present value (NPV), with interest rate r.
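A Python sketch tying the two cash-flow examples together, assuming cash flows are plain lists; `npv` is a hypothetical helper computing d T c with d = (1, 1/(1 + r), ..., 1/(1 + r)^(m - 1)). The coefficients (1, 1.1) are chosen for illustration, with c3 left out since it is not defined here: taking out the second loan exactly pays off the first, leaving a two-period loan repaid with $1.21 in period 3. Results are "approximately" because of roundoff.

```python
def npv(c, r):
    """Net present value d T c, with d = (1, 1/(1+r), ..., 1/(1+r)^(m-1))."""
    d = [1 / (1 + r) ** i for i in range(len(c))]
    return sum(di * ci for di, ci in zip(d, c))

c1 = [1.0, -1.1, 0.0]  # borrow $1 in period 1, repay $1.10 in period 2
c2 = [0.0, 1.0, -1.1]  # borrow $1 in period 2, repay $1.10 in period 3

# Linear combination c1 + 1.1 * c2: the second loan pays off the first
c = [a + 1.1 * b for a, b in zip(c1, c2)]
print(c)               # approximately [1.0, 0.0, -1.21]
print(npv(c1, 0.10))   # approximately 0: at its own 10% rate the loan breaks even
print(npv(c, 0.10))    # approximately 0 as well
```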
Exercises
1. Vector equations. Determine whether each of the equations below is true, false,
or contains bad notation (and therefore does not make sense)
(a)
| 1 |
| 2 | = (1, 2, 1)
| 1 |
true
(b) | 1 |
| 2 | = [1,2,1]
| 1 |
bad notation: square brackets [] instead of parentheses ()
2. Which of the following use correct notation? When the expression makes sense,
calculate the length. In the following, a and b are 10-vectors and c is a 20-vector.
a. (a + b) - c_3:12
this is correct. The result is a 10-vector.
b. (a, b, c_3:13)
this is correct. This is a stacked vector with 31 elements.
c. 2a + c
this is incorrect: it tries to add a 10-vector and a 20-vector.
d. (a, 1) + (c_1, b)
this is correct. The result is an 11-vector.
e. ((a, b), a)
this is correct. The result is a 30-vector (a stacked vector with 20 + 10 entries).
f. [a, b] + 4c
since we haven't learned matrices yet, this is incorrect (and in any case you can't add
a matrix and a vector of nonconforming dimensions).
g. | a | + 4c
   | b |
this is correct. The stacked vector (a, b) is a 20-vector, as is 4c, so the result is
a 20-vector.
answer:
w = (d, d, ..64 times .. d)
d = w_{1:64}
Exercise 1.5
Interpreting sparsity. Suppose the n-vector x is sparse, i.e., has only a few
nonzero entries.
Give a short sentence or two explaining what this means in each of the following
contexts.
(a) x represents the daily cash flow of some business over n days
on most days the business has no cash flow: no money comes in or goes out.
Exercise 1.6
Exercise 1.7
Transforming between two encodings for Boolean vectors. A Boolean n-vector is one
for which all entries are either 0 or 1.
Such vectors are used to encode whether each of n conditions holds, with a_i = 1
meaning that condition i holds.
Another common encoding of the same information uses the two values −1 and +1 for
the entries.
For example, the Boolean vector (0, 1, 1, 0) would be written using this alternative
encoding as (−1, +1, +1, −1).
Suppose that x is a Boolean vector with entries that are 0 or 1, and y is a vector
encoding the same information using the values −1 and +1.
Express y in terms of x using vector notation. Express x in terms of y.
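One possible answer is y = 2x - 1 and x = (1/2)(y + 1), where 1 is the all-ones vector. A Python sketch checking it on the example above, assuming vectors are lists of integers; `to_pm1` and `to_01` are hypothetical names.

```python
def to_pm1(x):
    """0/1 encoding to -1/+1 encoding: y = 2x - 1 (1 is the all-ones vector)."""
    return [2 * xi - 1 for xi in x]

def to_01(y):
    """-1/+1 encoding to 0/1 encoding: x = (1/2)(y + 1)."""
    # y_i + 1 is 0 or 2, so integer division by 2 is exact
    return [(yi + 1) // 2 for yi in y]

x = [0, 1, 1, 0]
print(to_pm1(x))         # [-1, 1, 1, -1]
print(to_01(to_pm1(x)))  # [0, 1, 1, 0]
```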
Exercise 1.8
The n-vector p gives the profit, in dollars per unit, for each of the n items.
(The entries of p are typically positive, but a few items might have negative
entries.
These items are called loss leaders, and are used to increase customer engagement in
the hope that the customer will make
other, profitable purchases.)
The n-vector s gives the total sales of each of the items, over some period (such
as a month), i.e., s_i is the total number of units of item i sold. (These are also
typically nonnegative, but negative entries can be used to reflect items that were
purchased in a previous time period and returned in this one.)
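Solution:
There are n items.
p gives the profit per unit of each item.
s gives the net sales (sales minus returns) of each item.
Total profit = p T s ;; inner product: sum over items of (profit per unit) * (units sold)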
Exercise 1.9
(b) The patient exhibits five out of the first ten symptoms.
Exercise 1.10
The record for each student in a class is given as a 10-vector r, where r1, ... ,
r8 are the grades for the 8 homework assignments, each on a 0-10
scale, r9 is the midterm exam grade on a 0-120 scale, and r10 is the final exam score
on a 0-160 scale.
The student’s total course score s, on a 0-100 scale, is based 25% on the homework,
35% on the midterm exam, and 40% on the final exam.
Express s as w T r for a 10-vector w. You can give the coefficients of w to 4 digits
after the decimal point.
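;; workthrough
doing manually
ths = 1 T r_{1:8}    (total homework score, out of 80)
homework fraction   = ths / 80
midterm fraction    = r_9 / 120
final exam fraction = r_10 / 160
s = 100 * (0.25 * ths/80 + 0.35 * r_9/120 + 0.40 * r_10/160)
  = (100/480) * (1.5 ths + 1.4 r_9 + 1.2 r_10)
  = 0.3125 ths + 0.2917 r_9 + 0.2500 r_10
Since ths = r_1 + ... + r_8, s = w T r with
w = (0.3125, 0.3125, 0.3125, 0.3125, 0.3125, 0.3125, 0.3125, 0.3125, 0.2917, 0.2500)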
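A Python check of the weights, using exact fractions so the only rounding is the final 4-digit display. The weights are 25 points spread over 80 homework points, 35 over 120 midterm points, and 40 over 160 final-exam points; a student with perfect grades should score exactly 100.

```python
from fractions import Fraction

# Exact weights on a 0-100 scale: 25/80 per homework point (8 entries),
# then 35/120 for the midterm and 40/160 for the final exam
w = [Fraction(25, 80)] * 8 + [Fraction(35, 120), Fraction(40, 160)]
print([round(float(wi), 4) for wi in w])
# [0.3125, 0.3125, 0.3125, 0.3125, 0.3125, 0.3125, 0.3125, 0.3125, 0.2917, 0.25]

perfect = [10] * 8 + [120, 160]  # maximum possible grades
print(sum(wi * ri for wi, ri in zip(w, perfect)))  # 100
```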
Exercise 1.11
Suppose the n-vector w is the word count vector associated with a document and a
dictionary of n words.
For simplicity we will assume that all words in the document appear in the
dictionary.
(a) What is 1 T w?
The total number of words in the document.
(c) Let h be the n-vector that gives the histogram of the word counts, i.e., h_i is
the fraction of the words in the document that are word i.
Use vector notation to express h in terms of w. (You can assume that the document
contains at least one word.)
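For (c), dividing the word counts by the total number of words gives h = (1/(1 T w)) w. A Python sketch with made-up counts for a 3-word dictionary; `histogram` is a hypothetical name.

```python
def histogram(w):
    """h = w / (1 T w): h_i is the fraction of the document's words that are word i."""
    total = sum(w)  # 1 T w, the total number of words
    if total == 0:
        raise ValueError("the document must contain at least one word")
    return [wi / total for wi in w]

counts = [3, 0, 1]        # hypothetical counts for a 3-word dictionary
print(histogram(counts))  # [0.75, 0.0, 0.25]
```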
Exercise 1.12
Total cash value. An international company holds cash in five currencies: USD (US
dollar), RMB (Chinese yuan), EUR (euro), GBP (British pound), and JPY (Japanese
yen), in amounts given by the 5-vector c.
For example, c2 gives the number of RMB held. Negative entries in c represent
liabilities or amounts owed.
Express the total (net) value of the cash in USD, using vector notation.
Be sure to give the size and define the entries of any vectors that you introduce
in your solution.
answer:
I introduce a 5-vector e, where e_i is the value in USD of one unit of currency i
(so e_1 = 1, since c_1 is already in USD).
Then the inner product e T c gives the total (net) value of the holdings in USD.
Exercise 1.13
(You can assume that x != 0, and that there is no one in the population over age
99.)
1 T x
(b) The total number of people in the population age 65 and over.
1 T x_{66:100}
(c) The average age of the population. (You can use ordinary division of numbers in
your expression.)
((0, 1, ..., 99) T x) / (1 T x)
Exercise 1.14
Let f be an n-vector that encodes whether each asset is in some specific industry
or sector, e.g., pharmaceuticals or consumer electronics.
Specifically, we take f_i = 1 if asset i is in the sector, and f_i = 0 if it is
not.
Let the n-vector h denote a portfolio, with h_i the dollar value held in asset i
(with negative meaning a short position).
The inner product f T h is called the (dollar value) exposure of our portfolio to
the sector.
It gives the net dollar value of the portfolio that is invested in assets from the
sector.
A portfolio h is called long only if each entry is nonnegative, i.e., h_i ≥ 0 for
each i. This means the portfolio does not include any short positions.
work through.
n assets
practically, consider only one sector, call it pharma
n-vector f is a Boolean vector which denotes whether asset i (imagine a specific
share here, say GOOG) is in the sector or not.
a second n-vector h contains the dollar value of each asset in a portfolio
inner product f T h == exposure of portfolio h to the sector
defn: h is neutral to the sector iff f T h = 0 (i.e. either we hold nothing in
that sector, or the long and short positions in the sector cancel, so their dollar
values sum to zero)
defn: long-only portfolio = a portfolio in which all entries are nonnegative
Putting the two defns together: since every h_i >= 0, the sum f T h of the holdings
in the sector can only be zero if each of them is zero. Or in vector terms, h_i = 0
for every i with f_i = 1, i.e. the portfolio holds nothing in that sector.
Exercise 1.15
Cheapest supplier.
You must buy n raw materials in quantities given by the n-vector q where q_i is the
amount of raw material i that you must buy.
A set of K potential suppliers offer the raw materials at prices given by the n-
vectors p1, ... , p_K. (Note that p_k is an
n-vector; (p_k)_i is the price that supplier k charges per unit of raw material i.)
workthru:
n raw materials
n-vector q with amounts of raw materials to buy. all quantities positive.
K potential suppliers
p_1, ..., p_K are n-vectors of prices, one per supplier. All prices are
positive.
If you must choose just one supplier, how would you do it? Your answer should use
vector notation.
Choose the supplier k with the lowest total cost, i.e. the smallest value of the
inner product q T p_k.
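A Python sketch of the choice: compute each supplier's total cost q T p_k and take the smallest. The quantities and prices are made-up numbers for illustration, and `cheapest_supplier` is a hypothetical name; the returned index is 1-based to match p_1, ..., p_K.

```python
def cheapest_supplier(q, prices):
    """Return (k, cost): the 1-based supplier index with the smallest total cost q T p_k."""
    costs = [sum(qi * pi for qi, pi in zip(q, p)) for p in prices]
    k = min(range(len(costs)), key=costs.__getitem__)
    return k + 1, costs[k]

q = [10, 5]                                    # hypothetical quantities of 2 raw materials
prices = [[2.0, 3.0], [1.5, 4.0], [2.5, 1.0]]  # hypothetical p_1, p_2, p_3
print(cheapest_supplier(q, prices))            # (3, 30.0)
```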
Exercise 1.16
Exercise 1.18
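Let b1 = p1 a1 + p2 a2
    b2 = p3 a1 + p4 a2      (p_n are scalars)
then
c = p5 b1 + p6 b2
== substituting for b1 and b2
p5 (p1 a1 + p2 a2) + p6 (p3 a1 + p4 a2)
== algebra
(p5p1 + p6p3) a1 + (p5p2 + p6p4) a2
== arithmetic
p7 a1 + p8 a2, where p7 = p5p1 + p6p3 and p8 = p5p2 + p6p4
so c, a linear combination of linear combinations of a1 and a2, is itself a linear
combination of a1 and a2.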
Exercise 1.19
Auto-regressive model. Suppose that z1, z2, ... is a time series, with the number
z_t giving
the value in period or time t.
For example z_t could be the gross sales at a particular store on day t.
An auto-regressive (AR) model is used to predict z_(t+1) from the previous M values,
z_t, z_(t−1), ... , z_(t−M+1):
p_(t+1) = (z_t, z_(t−1), ..., z_(t−M+1)) T β, t = M, M + 1, ....
Here p_(t+1) denotes the AR model’s prediction of z_(t+1), M is the memory length of
the AR model, and the M-vector β is the AR model coefficient vector.
For this problem we will assume that the time period is daily, and M = 10. Thus,
the AR model predicts
tomorrow’s value, given the values over the last 10 days.
For each of the following cases, give a short interpretation or description of the
AR model
in English, without referring to mathematical concepts like vectors, inner product,
and
so on. You can use words like ‘yesterday’ or ‘today’.
(a) β ≈ e1.
tomorrow's sales are predicted to be the same as today's sales
(b) β ≈ 2e1 − e2.
tomorrow's sales are predicted to be twice today's sales minus yesterday's sales,
i.e. today's sales plus the change from yesterday to today
(c) β ≈ e6.
tomorrow's sales are predicted to be the same as the sales five days before today
(six days before tomorrow)
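A Python sketch of the AR prediction for the three coefficient vectors, using made-up daily sales that rise by 1 each day so the effect of each β is easy to read off. The history is stored oldest first (an assumption of this sketch), so today's value z_t is the last element; `ar_predict` and `unit` are hypothetical names.

```python
def ar_predict(history, beta):
    """AR prediction p_(t+1) = (z_t, z_(t-1), ..., z_(t-M+1)) T beta.

    history is ordered oldest first, so today's value z_t is history[-1].
    """
    M = len(beta)
    if len(history) < M:
        raise ValueError("need at least M past values")
    recent = history[::-1][:M]  # (z_t, z_(t-1), ..., z_(t-M+1))
    return sum(zi * bi for zi, bi in zip(recent, beta))

def unit(i, M=10):
    """The M-vector e_i (1-based)."""
    return [1 if j == i else 0 for j in range(1, M + 1)]

z = list(range(100, 112))  # hypothetical daily sales 100, 101, ..., 111; today is 111
beta_b = [2 * p - q for p, q in zip(unit(1), unit(2))]  # 2 e_1 - e_2
print(ar_predict(z, unit(1)))  # 111: tomorrow = today
print(ar_predict(z, beta_b))   # 112: today + (today - yesterday)
print(ar_predict(z, unit(6)))  # 106: the value five days before today
```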
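Exercise 1.20
How many bytes does it take to store 100 vectors of length 10^5?
100 * 10^5 * 8 bytes = 8 * 10^7 bytes = 80 MB (8 bytes per floating-point entry)
How many flops does it take to form a linear combination of them (with 100 nonzero
coefficients)?
100 scalar-vector multiplications (10^5 flops each) plus 99 vector additions
(10^5 flops each): 1.99 * 10^7, or about 2 * 10^7 flops.
About how long would this take on a computer capable of carrying out 1 Gflop/s?
2 * 10^7 / 10^9 seconds = 0.02 seconds, i.e. about 20 milliseconds.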
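The same arithmetic in Python, assuming 8 bytes per entry (one 64-bit float) and counting flops as in the notes: n flops per scalar-vector multiplication and n per vector addition.

```python
num_vectors = 100
n = 10**5
bytes_per_entry = 8  # assumption: one 64-bit (8-byte) float per entry

storage = num_vectors * n * bytes_per_entry
# 100 scalar-vector products (n flops each) + 99 vector additions (n flops each)
flops = num_vectors * n + (num_vectors - 1) * n
seconds = flops / 10**9  # at 1 Gflop/s

print(storage)  # 80000000 bytes, i.e. 80 MB
print(flops)    # 19900000, about 2 * 10^7
print(seconds)  # about 0.02 s
```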