
RELATIVITY

K Young
Department of Physics
The Chinese University of Hong Kong

December 2000

© Copyright
Preface
A course on relativity, including both the special theory and the general theory, was first
offered in the academic year 1994/95. It was intended for both final year undergraduates and
for graduate students, both on an elective basis. From the very beginning, it was found that no
textbook was quite suitable. Broadly speaking, the available books fall into two categories. The
first are often qualitative in nature, relying very heavily on intuition and heuristic arguments,
and are therefore unsuitable for a course at this level. The second type are exhaustive tomes,
exemplified by the works of Weinberg, and of Misner, Thorne and Wheeler. These are really
meant for those who intend quite seriously to specialize in general relativity or an associated
field, and would be far too much for a one-term course.
Because of the lack of suitable textbooks or indeed references, it was planned to develop
a relatively complete set of lecture notes. These started in the first year (1994/95) as copies
of the overhead transparencies handed out to students. In the second year that the course is
offered (1995/96), roughly half of the chapters have been turned into more formal lecture notes,
though of course in a somewhat preliminary form that shall require many rounds of revisions
and polishing in the years to come. It is thought that the other chapters will be completed in
one more year, and then improvements will be made in the light of the feedback.
The approach to the subject is somewhat different from most textbooks on relativity.
Most textbooks try to cover a lot of material, often to the forefront of research; this is true
both of the more heuristic texts and also of the more rigorous, exhaustive ones. But if one
were to look at the textbooks for electromagnetism (with which relativity has much formal
and conceptual resemblance), the norm now is to learn the subject in several stages. The
first is often an integral approach to the field equations, necessarily restricted to very idealized
geometries (spheres, infinite planes). The second round introduces the differential formalism
(and in particular the mathematics of vector calculus), but again stays with simple geometries
and simple situations (statics in the main). In neither of these stages is the student seriously
brought to the forefront of research, but instead, a firm conceptual foundation is laid, with a
reasonable degree of rigour. Only in the third stage, which is perhaps taken only by a minority,
would the students be introduced to more sophisticated techniques and problems closer to
research.
The approach in these lecture notes is essentially analogous to the first two stages de-
scribed above. Just as vector calculus figures prominently in electromagnetism, so differential
geometry also figures prominently here. Fortunately, very idealized situations already suffice to
discuss the two systems of greatest interest: the Schwarzschild metric and the Robertson-Walker
metric.
Incidentally, differential geometry is dealt with from the embedding approach. In essence,
the only thing that the students are asked to believe is the existence of a finite-dimensional
embedding in flat space. Though no longer fashionable among mathematicians, this approach
is probably easier to grasp for physics students. In the same vein, differential forms are not
mentioned, since an extra set of symbols will only burden the first-time learners.
Chapter 8 is optional, and is not needed for the rest of the course.
I am grateful to the United College Student Campus Work Scheme for funding to support
the typing of the manuscript. I thank Mr Lai Chi Wai, a student who took the course in the
first year that it was offered, for doing the initial typing. Mrs Alice Mak kindly performed the
final editing. Both steps are onerous because of the many complicated equations involved.
Because it is hoped that the lecture notes will be improved as the course evolves, students
(and colleagues) are requested to send in suggestions and to point out errors, of which there
must be many.
Relativity

1 Introduction


1.1 How does one phenomenon appear to different observers
1.2 How do physical laws appear to different observers
1.3 Experimental basis
1.4 Consequences and applications

2 Rotation
2.1 Derivation of rotational transformation
2.2 Combining two transformations

3 Moving Reference Frame - I


3.1 Galilean transformation
3.2 Speed of light and M-M experiment
3.3 Derivation of the Lorentz transformation
3.4 Choice of units
3.5 Difference form

4 Moving Reference Frame - II


4.1 Length contraction
4.2 Time dilation
4.3 Space-time diagrams
4.4 Transformation of velocity

5 Mathematics of four-vectors


5.1 Mathematics of three-vectors
5.2 Mathematics of four-vectors
5.3 Scalar, vector and tensor fields
5.4 Basis vectors
5.5 Differentiation
6 Relativistic Kinematics
6.1 Momentum
6.2 Analysis of collisions
6.3 Center of momentum frame
6.4 Relativistic invariants
6.5 Frequency and wave number

7 Particle Dynamics and Electromagnetism


7.1 Overview
7.2 Definition of force and Lorentz force law
7.3 Four-force
7.4 Four-vector potential and field tensor
7.5 Covariant form of the Lorentz force
7.6 Transformation of fields
7.7 Maxwell equations

8 Action Formalism
8.1 General principles
8.2 Action principle in non-relativistic physics
8.3 Action principle for a relativistic free particle
8.4 Action principle for a particle in the electromagnetic field
8.5 Action principle for Maxwell's equation

9 Gravity as Spacetime Curvature - An Introduction


9.1 Principle of equivalence
9.2 Gravitational redshift
9.3 Tidal gravitational force
9.4 Curvature
10 Mathematics of Curved Space I: The Metric
10.1 Introduction
10.2 Coordinates
10.3 Distances and curvature - qualitative discussion
10.4 Riemannian geometry
10.5 General method applied to the sphere
10.6 Expression for distance - general discussion
10.7 Homogeneous manifolds
10.8 Other examples

11 Poor Man's Cosmology


11.1 Introduction
11.2 Observational evidence: Homogeneity and expansion
11.3 Kinematics
11.4 Dynamics
11.5 The belief in the 1920s
11.6 The belief in the 1960s
11.7 The belief in the 21st century
11.8 Other issues and further reading

12 Mathematics of Curved Space II: Vectors


12.1 Displacement vector and tangent plane
12.2 Embedding in flat space
12.3 Basis vectors
12.4 Velocity and momentum
12.5 Transformation of vectors
12.6 Gradient of a vector
12.7 Local Cartesian system and physical components
13 Mathematics of Curved Space III: Differentiation
13.1 Example of differentiating a vector
13.2 General embedding definition
13.3 Intrinsic definition
13.4 Applications in general relativity
13.5 Covariant differentiation

14 Motion of Point Particles


14.1 Law of motion: Derivation I
14.2 Law of motion: Derivation II
14.3 Weak fields
14.4 Cosmological model
14.5 Deflection of light
14.6 Gravitational redshift
14.7 Precession of perihelion
14.8 Conservation laws

15 Black Holes
15.1 Introduction
15.2 Formation of black hole
15.3 Questions about black holes
15.4 Coordinates around a black hole
15.5 Infinite redshift
15.6 Motion into a black hole
16 Mathematics of Curved Space IV: Curvature
16.1 Introduction
16.2 Parallel transport of vectors
16.3 Riemann curvature tensor - calculation
16.4 Riemann curvature tensor - properties
16.5 Ricci tensor and curvature scalar
16.6 Examples
16.7 Curvature tensor and flatness
16.8 Einstein tensor

17 Einstein's Field Equations


17.1 Introduction
17.2 The source
17.3 The energy-momentum tensor
17.4 Conservation of energy and momentum
17.5 Einstein's equations
17.6 Some solutions of Einstein's equations
1 Introduction
1.1 How does one phenomenon appear to different observers

Figure 1

The theme in relativity is: How does the same phenomenon appear to different observers?
Figure 1 shows the classic experiment of Galileo: two stones released at the same time strike
the ground at the same time. This fact of physics is the same whether we discuss it in
rectangular coordinates that are horizontal and vertical,
rectangular coordinates that are tilted,
polar coordinates, or
generalized coordinates of any type.
Therefore we say

Physics is absolute.
Coordinates are arbitrary.
Therefore physics should be independent of coordinates.

To make sure that physics is independent of coordinates, we must first ask: What happens
under coordinate transformations?
Rectangular coordinates are often denoted by the components of a vector, so we first
introduce the notation for vectors.

Notation for vectors


Vectors in 3-d are denoted as x, y, p etc.
In cartesian coordinates $\mathbf{x} = (x^1, x^2, x^3)$.
The column vector is denoted as $[x]$.
The components are collectively denoted as $x^i$.
An index such as i, j runs from 1 to 3.
The axes are called the 1-axis, 2-axis etc.
The length of x is denoted as $x$.
Unit basis vectors are $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$.
Their components are $e_j^{\,i}$ = i-th component of $\mathbf{e}_j$ = $\delta_{ij}$.

Sometimes we also denote the components of x as x, y, z and the axes as the x, y, z axes.
Sometimes the unit vectors are also denoted as i, j, k.

Figure 2

Example 1
Figure 2a shows a rod of length L inclined at an angle φ to the 1-axis. Let its tip be x. Then
in this coordinate system S,

$$ x^1 = L\cos\phi, \qquad x^2 = L\sin\phi \qquad (1.1) $$

Figure 2b shows a new coordinate system S' with the 1'-axis aligned along the rod. In S',

$$ x'^1 = L, \qquad x'^2 = 0 $$

So the coordinate components are different; they are relative.

Notation
Some books denote the components in the coordinate system S' as $x^{1'}, x^{2'}$, to emphasize that
"x" is the same (i.e., the same vector), and only "1", "2"
are different (i.e., different components).
Transformation of coordinates
In relativity we deal with three types of coordinate transformations.
1. Rotation of coordinates

2. Moving coordinates

3. General coordinate transformation (of space and time)

Figure 3

Rotation of coordinates leads to the concept of vectors.


Moving coordinates lead to special relativity, and the concept of 4-vectors.
General coordinate transformations lead to general relativity, and the concepts of differ-
ential geometry.
We first use the rotation of coordinates to illustrate the ideas.
Rotation of coordinates

Figure 4
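The general rule for the rotation of coordinates in 2-d is as follows. Let

α = angle between the two frames
φ = polar angle of x with respect to 1-axis
φ' = polar angle of x with respect to 1'-axis

so that φ' = φ - α. Then

$$ x'^1 = x\cos\phi' = x\cos(\phi - \alpha) = x\cos\phi\cos\alpha + x\sin\phi\sin\alpha = x^1\cos\alpha + x^2\sin\alpha $$

$$ x'^2 = x\sin\phi' = x\sin(\phi - \alpha) = x\sin\phi\cos\alpha - x\cos\phi\sin\alpha = x^2\cos\alpha - x^1\sin\alpha $$

This can be written in matrix form

$$ \begin{bmatrix} x'^1 \\ x'^2 \end{bmatrix} = \begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} x^1 \\ x^2 \end{bmatrix} \qquad (1.3) $$

or

$$ [x'] = [R][x] \qquad (1.4) $$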

The matrix [R] depends only on the relation between the two sets of axes. It is the same for
all vectors x.
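
As a numerical illustration of the rotation rule, the following Python sketch applies the component formulas to the rod of Example 1. The function name `rotate_coordinates` and the sample values of L and φ are illustrative assumptions, not part of the notes.

```python
import math

def rotate_coordinates(x1, x2, alpha):
    """Components of the same vector in axes rotated by alpha (radians)."""
    x1_new = x1 * math.cos(alpha) + x2 * math.sin(alpha)
    x2_new = x2 * math.cos(alpha) - x1 * math.sin(alpha)
    return x1_new, x2_new

# Rod of length L inclined at angle phi to the 1-axis (sample values).
L = 2.0
phi = math.radians(30.0)
x1, x2 = L * math.cos(phi), L * math.sin(phi)

# Rotate the axes by alpha = phi, so that the 1'-axis lies along the rod.
x1_new, x2_new = rotate_coordinates(x1, x2, phi)
print(x1_new, x2_new)  # first component close to L, second close to 0
```

The same function works for any vector, which reflects the statement that [R] depends only on the relation between the two sets of axes.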
We can also write (1.3) or (1.4) in terms of components:

$$ x'^i = \sum_j R^{ij} x^j \qquad (1.5) $$
Index notation
An index such as i in (1.5) is called a free index.
It appears once on each side.
It can take on any value (1, 2 in 2-d; 1, 2, 3 in 3-d).
Unless otherwise specified, it is allowed to take on each of these values successively.
An index such as j in (1.5) is called a dummy index.
It appears twice in the same term.
It is summed over all allowed values.

Summation convention
To save some writing, every index (i, j, ...) that appears twice in the same term is understood
to be summed over. Thus (1.5) can be written simply as

$$ x'^i = R^{ij} x^j $$
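
The roles of the free index and the dummy index can be made concrete with explicit loops. This is a minimal Python sketch; the function name `transform` and the sample matrix (a rotation by 90 degrees) are assumptions chosen for illustration.

```python
def transform(R, x):
    """Return the list of x'^i, each equal to the sum over j of R^{ij} x^j."""
    n = len(x)
    x_new = []
    for i in range(n):        # free index: one equation for each value of i
        total = 0.0
        for j in range(n):    # dummy index: summed over all allowed values
            total += R[i][j] * x[j]
        x_new.append(total)
    return x_new

R = [[0.0, 1.0], [-1.0, 0.0]]  # rotation matrix with cos = 0, sin = 1
x = [3.0, 4.0]
x_new = transform(R, x)
print(x_new)
```

The outer loop variable survives in the result, while the inner one disappears after the sum, which is why a dummy index can be renamed freely.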

Problem 1
Write out the following equations explicitly in terms of components. Choose any one value for
each free index.
(a) $S = a^i b^i$
(b) $a^i = A^{ij} b^j$
(c) $C^{ik} = A^{ij} B^{jk}$
(d) $C^{ik} = A^{ij} B^{kj}$
(e) $S = A^{ii}$
(f) $S = A^{ij} B^{ji}$

Problem 2
Write the above expressions in terms of the matrices [A] ,[B], [C] and the column vectors [a] ,[b].

Problem 3
Denote the matrix in (1.3) as [R(α)]. Verify that

$$ [R(\alpha)][R(-\alpha)] = [I] $$

where [I] is the identity matrix. What is the physical meaning of this mathematical relationship?

Problem 4
Show that $[R(-\alpha)] = [R(\alpha)]^T$. Hence show that

$$ [R(\alpha)]^T [R(\alpha)] = [I] $$

Matrices satisfying this condition are said to be orthogonal.


Problem 5
Verify from (1.3) that

What is the physical meaning?

1.2 How do physical laws appear to different observers


Figure 5

We consider a trivial "law". Let there be 2 rods, of length L and 2L, along the same direction.
Each rod has one end at the origin O (Figure 5). Let the other end-points be x and y. Then
the "law" is y = 2x. According to one observer S, the components satisfy

$$ y^i = 2x^i \qquad (1.7) $$

Or in column vector form

$$ [y] = 2[x] \qquad (1.8) $$

Multiply by the rotation matrix [R(α)], and use $[y'] = [R][y]$ and $[x'] = [R][x]$:

$$ [y'] = 2[x'] \qquad (1.9) $$

or back in component form

$$ y'^i = 2x'^i \qquad (1.10) $$

Compare (1.7) and (1.10). Or compare (1.8) and (1.9). Although the variables change ($y'^1 \neq y^1$),
a valid law of physics takes exactly the same form in the two coordinate systems. We say
the variables are covariant (they transform in the same way). We say the laws are invariant
(they stay in the same form).
Although y = 2x is a very trivial "law", this concept generalizes to all laws of physics.
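
The statement that the components change while the law keeps its form can be checked numerically. The sketch below is only an illustration: the helper `rotate`, the sample rod end-point and the rotation angle are assumed values.

```python
import math

def rotate(v, alpha):
    """Components of a 2-d vector v in axes rotated by alpha (radians)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return (c * v[0] + s * v[1], -s * v[0] + c * v[1])

x = (1.0, 0.5)                   # end-point of the shorter rod in S
y = (2 * x[0], 2 * x[1])         # the "law" y = 2x in S
alpha = math.radians(40.0)

x_new = rotate(x, alpha)         # components in S'
y_new = rotate(y, alpha)

# The law holds component by component in S' as well.
law_holds = all(math.isclose(y_new[i], 2 * x_new[i]) for i in range(2))
# The individual components are nevertheless different in the two frames.
components_changed = not math.isclose(y_new[0], y[0])
print(law_holds, components_changed)
```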
Principle of relativity
The above idea is elevated to a principle.

All valid laws of physics should take the same
form in different coordinate systems S and S'.
Thus there are two types of questions.
How are the variables in different coordinate systems related to each other?
What laws of physics are compatible with the principle of relativity? Only these laws are
allowed.

1.3 Experimental basis


The theory of relativity must be based on experimental facts.
The special theory of relativity is based on the Michelson-Morley experiment: the speed
of light is the same for all observers.
The general theory of relativity is based on the fact that all objects fall at the same
acceleration in a gravitational field.
Both facts are known to great precision, and are believed to be strictly correct. They will be
discussed in greater detail later.

1.4 Consequences and applications


Although relativity is very mathematical, in the end we must ask: what are its physical
consequences and applications? A very brief summary is given here.

Relativistic kinematics and dynamics

Figure 6
Figure 6 illustrates how we can deal with the kinematics and dynamics of particles moving
rapidly ("relativistically") in a coordinate system S. We transform to a co-moving coordinate
system S', in which the particles are moving slowly. In the latter, we can apply well-known
Newtonian laws. The result is then transformed back to S. Actually it is not necessary to do
it individually every time. We only need to do the transformation once to establish the laws of
relativistic kinematics and dynamics.

Mass-energy equivalence
One result from relativistic kinematics and dynamics is the mass-energy equivalence E = mc2,
which is very important in nuclear physics and high energy physics.

Relation between electricity and magnetism

Figure 7

Magnetism is due to currents, e.g., moving charges. Figure 7a shows a charge q which is moving
according to observer S. Figure 7b shows the situation as seen by a co-moving observer S'; now
q is stationary. But a stationary charge is only subject to an electrical force. Thus, we can go
through the following steps.
• If we understand electricity, we would know how q moves in S'.

• By a transformation, we would know how q moves in S.

• Thus, we would know the magnetic force in S.

Such considerations show that electricity and magnetism are closely related.

Problem 6
(a) A positive charge q is moving at a constant velocity v in the +x direction, and enters a
magnetic field created by a horseshoe magnet that points in the +y direction. In which direction
does the charge accelerate? Which law of physics did you use to arrive at this conclusion?
(b) Now go to another frame S' which is moving in the direction +x at velocity v. According
to this observer, the charge q is initially static, but the horseshoe magnet is moving towards it
at a speed v in the -x direction. Since the phenomenon must be the same according to both
observers, in which direction does q accelerate, according to S'?
(c) How would the observer in S' explain this observation? Which laws of physics would he use
to arrive at this conclusion?
Gravity


Figure 8

Figure 8a shows a ball inside a room falling under gravity. It accelerates downwards at g, and
hits the floor eventually. Figure 8b also shows a ball inside a room. There is no gravity, but the
room accelerates upwards at $a_0$. Eventually the floor hits the ball. If $a_0 = g$, there is no way
we can tell the situations apart by observations inside the room. This is called the principle of
equivalence: equivalence between gravity and acceleration. So if we know how to transform to
an accelerating frame, we would begin to understand gravity. The general theory of relativity
is in reality a theory of gravity.
However, the situation is slightly more complicated in general. Let there be two balls
inside a room near the earth. They are subject to slightly different gravitational accelerations
(Figure 9). This situation is not equivalent to the room accelerating upwards. So non-uniform
gravity makes the theory of general relativity more complicated mathematically.

Problem 7
Refer to Figure 8. Let the x-axis point upwards and let the broken line be the origin x = 0.
The height of the room is h, and in situation (a) the ball is released at rest at the roof of the
room at t = 0. In situation (b), the room is at rest at t = 0, and then accelerates upwards at
$a_0$. In this problem, assume $a_0 = g$. Let $x_b(t)$ be the coordinate of the ball, and $x_f(t)$ be the
coordinate of the floor.
1. For situation (a), write formulas for $x_b(t)$ and $x_f(t)$. Sketch them together on one graph.
Also find the time $t_1$ when the ball hits the floor by solving $x_b(t_1) = x_f(t_1)$.
2. Do the same for situation (b).
3. Is the value of $t_1$ the same in the two cases?

Astrophysics
Gravity is especially important both for large systems and for compact astrophysical objects.
(This may seem paradoxical, since "compact" means small!) We can understand this fact as
follows. For a system of mass M and characteristic dimension R, the gravitational potential
energy is (Newtonian, but roughly correct)

$$ |PE| \sim \frac{GM^2}{R} $$

The ratio of this to the rest energy $Mc^2$ is

$$ \epsilon \sim \frac{GM}{Rc^2} \qquad (1.11) $$

If this is not small (i.e., not $\ll 1$), then gravity is important, in fact so important that it is
necessary to use general relativity.
First regard the density ρ as fixed and the size R as variable. In other words, think of
a nearly uniform system, but of variable size. Since $M \sim \rho R^3$, it is then convenient to write (1.11) as

$$ \epsilon \sim \frac{G\rho R^2}{c^2} $$

So for R sufficiently large, gravity must become important.

Problem 8
(The mass of the sun is $2 \times 10^{30}$ kg, 1 pc = $3 \times 10^{16}$ m, $G = 7 \times 10^{-11}$ N m$^2$ kg$^{-2}$.)
(a) A typical galaxy contains $10^{11}$ stars, each one like the sun, in a radius of 15 kpc. Find ρ
and ε. Is gravity very important?
(b) For the universe as a whole, $\rho \sim$ … kg m$^{-3}$, $R \sim 10^4$ Mpc. Estimate ε. Is gravity very
important?

On the other hand, consider a very compact star of mass M. Suppose other objects are
moving at different distances R from it. It is then more convenient to regard M as fixed and
R as variable. Then from (1.11) $\epsilon \propto R^{-1}$, and gravity becomes important when R is small.
In fact, the Schwarzschild radius of a black hole corresponds to ε = 1/2. A heuristic
derivation is as follows. Consider a point mass m at a distance R from a compact star of mass
M. To escape to infinity, it must have

$$ KE > |PE| = \frac{GMm}{R} $$

Using Newtonian physics but setting $v \le c$, the maximum KE possible is $(1/2)mc^2$. So escape
is only possible if

$$ \frac{1}{2} mc^2 > \frac{GMm}{R} $$

that is, if $R > 2GM/c^2$. The limiting radius $R = 2GM/c^2$ gives ε = 1/2 in (1.11).
We are using a "mixture" of Newtonian and relativistic physics. This is not really legitimate,
but is good enough for an order-of-magnitude estimate.
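
To get a feeling for the numbers, the ratio ε and the limiting radius from the heuristic argument can be evaluated for the sun. This is a rough Python sketch: G and the solar mass are the rounded values quoted in Problem 8, while the speed of light, the solar radius and the function names are assumptions added here for illustration.

```python
G = 7e-11        # N m^2 kg^-2, rounded value quoted in Problem 8
C = 3e8          # m s^-1, speed of light (assumed rounded value)
M_SUN = 2e30     # kg, value quoted in Problem 8
R_SUN = 7e8      # m, approximate solar radius (assumed, not from the notes)

def epsilon(mass, radius):
    """Ratio of gravitational potential energy to rest energy, GM/(R c^2)."""
    return G * mass / (radius * C**2)

def limiting_radius(mass):
    """Radius at which the heuristic escape condition fails, 2GM/c^2."""
    return 2 * G * mass / C**2

eps_sun = epsilon(M_SUN, R_SUN)
r_s_sun = limiting_radius(M_SUN)
print(eps_sun)   # of order 1e-6, so gravity is weak at the solar surface
print(r_s_sun)   # about 3 km
```

With these rounded inputs ε at the solar surface is tiny, whereas evaluating `epsilon` at the limiting radius returns 1/2 by construction.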

Problem 9
Estimate the Schwarzschild radius for a star of 3 solar masses = $6 \times 10^{30}$ kg.

Cosmology
The most important large system dominated by gravity is the entire universe. Cosmology is
the study of the evolution of the universe. It relies heavily on general relativity.

Constraining other laws of physics


If the laws of physics have to be invariant under the coordinate transformations of relativity,
then the laws can only take on very restricted forms. For example, given the transformation
of special relativity, the laws of electromagnetism become very natural. These constraints are
useful when we try to guess new laws of interactions.

Chl-2.tex; June 18, 1997


2 Rotation
Recall that we are concerned with three types of coordinate transformations:
• transformations to a new coordinate system which is rotated relative to the original one;
• transformations to a new coordinate system which is moving relative to the original one
(special relativity); and
• general coordinate transformations to curvilinear coordinates (general relativity).
In this Chapter we use the analogy of rotations to develop some concepts and tools which will
be used for special relativity, and later, also for general relativity. In this Chapter we pretend
we do not know trigonometry.

2.1 Derivation of rotational transformation


Figure 1

Basic object
The basic object is a point P, whose coordinates are
$\mathbf{x} = (x^1, x^2, x^3)$ in S
$\mathbf{x} = (x'^1, x'^2, x'^3)$ in S'
The components are different, but the vectors are the same (see Figure 1); so as a vector, they
are both denoted by the same x.
The question is: How are the components related?

Linear assumption
First assume the origins of S and S' coincide. (If they do not, we simply add a shift, which is
trivial.) Next, assume that the relationship between the components is linear. Then the most
general transformation is

Component notation    $x'^i = R^{ij} x^j$

Matrix notation    $[x'] = [R][x]$    (2.1)

where [R] is an unknown matrix, called the rotation matrix. Note that the summation conven-
tion is employed.
Notation
We write the ij component of a matrix [A] as $[A]^{ij} = A^{ij}$.

Identify an invariant
Although the components change (e.g., $x'^1 \neq x^1$), there is an invariant, i.e., a quantity which
is the same in both frames. By Pythagoras' theorem

$$ \sigma^2 = (x^1)^2 + (x^2)^2 + (x^3)^2, \qquad \sigma'^2 = (x'^1)^2 + (x'^2)^2 + (x'^3)^2 $$

are the same. This is an experimental result. Ants living on the surface of a sphere would find
that Pythagoras' theorem does not hold.
The invariant condition can be written as follows:

Component notation    $x'^i x'^i = x^i x^i$

Matrix notation    $[x'^T][x'] = [x^T][x]$    (2.2)


Note:

Condition on rotation matrix


If we choose an arbitrary matrix [R], the result of the transformation (2.1) would not satisfy
the invariant condition (2.2). In other words, the existence of an invariant places conditions on
[R].
We now derive these conditions in three equivalent ways.

(a) Explicitly in components for 2-d


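Let

$$ [R] = \begin{bmatrix} p & s \\ q & r \end{bmatrix} \qquad (2.3) $$

then (2.1) gives

$$ x'^1 = p\,x^1 + s\,x^2, \qquad x'^2 = q\,x^1 + r\,x^2 \qquad (2.4) $$

Hence

$$ \sigma'^2 = (x'^1)^2 + (x'^2)^2 = (p^2 + q^2)(x^1)^2 + 2(ps + qr)\,x^1 x^2 + (s^2 + r^2)(x^2)^2 $$

But this must equal

$$ \sigma^2 = 1 \cdot (x^1)^2 + 0 \cdot x^1 x^2 + 1 \cdot (x^2)^2 $$

as an identity. Hence we get three conditions

$$ p^2 + q^2 = 1, \qquad ps + qr = 0, \qquad s^2 + r^2 = 1 \qquad (2.5) $$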

Problem 1
Let [R] be given by (2.3). Derive (2.5) and show that the solution is (in terms of s)

$$ [R] = \begin{bmatrix} \sqrt{1 - s^2} & s \\ -s & \sqrt{1 - s^2} \end{bmatrix} $$

Another solution is discarded. Explain.


(b) In general using index notation
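$$ x'^i = R^{ij} x^j $$

$$ x'^i = R^{ik} x^k $$

Note we use different dummy indices. Multiplying,

$$ \sigma'^2 = x'^i x'^i = R^{ij} R^{ik} x^j x^k $$

But

$$ \sigma^2 = x^j x^j = \delta^{jk} x^j x^k $$

Since they must be equal as an identity, we get

$$ R^{ij} R^{ik} = \delta^{jk} \qquad (2.6) $$

Here, j and k are free indices, while i is a dummy index. Also, both sides are symmetric under
$j \leftrightarrow k$, so there are 3 conditions in 2-d (jk = 11, 22, 12) and 6 conditions in 3-d (jk =
11, 22, 33, 12, 23, 31).
Let us check that in the case of 2-d, (2.6) agrees with (2.5). We put the summation sign
back explicitly. As an example, for j = k = 1

$$ \sum_i R^{i1} R^{i1} = (R^{11})^2 + (R^{21})^2 = p^2 + q^2 = 1 $$

etc.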

(c) Using matrix notation

$$ [x'] = [R][x] $$

$$ [x'^T] = [x^T][R^T] $$

$$ \sigma'^2 = [x'^T][x'] = [x^T][R^T][R][x] $$

$$ \sigma^2 = [x^T][x] = [x^T][I][x] $$

Thus

$$ [R^T][R] = [I] \qquad (2.7) $$

Let us check that this is the same as (2.6).

$$ \{[R^T][R]\}^{jk} = [I]^{jk} = \delta^{jk} $$

But for any matrices [A], [B],

$$ \{[A][B]\}^{jk} = [A]^{ji} [B]^{ik} $$

Hence

$$ [R^T]^{ji} [R]^{ik} = R^{ij} R^{ik} = \delta^{jk} $$

Problem 2
Show that the matrix equation $[R^T][R] = [I]$ leads to the same equations as (2.5).
Express in terms of minimum number of parameters
Consider the number of free parameters in the matrix [R].
Dimension    Parameters    Conditions    Free parameters
2-d          2² = 4        3             1
3-d          3² = 9        6             3
Thus in 2-d, we should be able to express the most general rotation in terms of 1 parameter
only, obviously the angle of rotation. In 3-d, we should be able to express the most general
rotation in terms of 3 parameters (e.g., the 3 Euler angles). We shall not go into the 3-d case
in detail.
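
The orthogonality condition can be checked numerically for the one-parameter 2-d rotation, and contrasted with an arbitrary matrix that does not satisfy it. This is a small Python sketch; the helper names and the sample shear matrix are assumptions for illustration only.

```python
import math

def rotation(alpha):
    """2-d rotation matrix in the convention of these notes."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, s], [-s, c]]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_identity(A, tol=1e-12):
    n = len(A)
    return all(math.isclose(A[i][j], 1.0 if i == j else 0.0, abs_tol=tol)
               for i in range(n) for j in range(n))

R = rotation(math.radians(25.0))
shear = [[1.0, 0.5], [0.0, 1.0]]   # an arbitrary matrix, not a rotation

rotation_is_orthogonal = is_identity(matmul(transpose(R), R))
shear_is_orthogonal = is_identity(matmul(transpose(shear), shear))
print(rotation_is_orthogonal, shear_is_orthogonal)
```

The shear matrix changes lengths, so it fails the condition, in line with the remark that an arbitrary matrix would not preserve the invariant.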

Problem 3
Of the 4 parameters p, q, r, s in (2.4), regard s as the free parameter and define s = sin α. Find
p, q, r in terms of α by using the three equations in (2.5). You will need to choose the sign of a
square root. Explain the physical meaning of your choice. Your answer should be

$$ [R] = \begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix} $$

2.2 Combining two transformations


Consider successive rotations about a fixed axis. We know that rotational angles add (Figure
2).

Figure 2
Mathematically, this can be stated as follows

$$ [R(\alpha_2)][R(\alpha_1)] = [R(\alpha_1 + \alpha_2)] \qquad (2.8) $$

Note that if we apply the left-hand side to a vector x, we get

$$ [R(\alpha_2)][R(\alpha_1)][x] = [R(\alpha_2)]\,\{[R(\alpha_1)][x]\} $$

The rotation $[R(\alpha_1)]$ is done first. So in all these matrix products, the logical sequence is from
right to left. The order of operations does not matter in 2-d, but is important in 3-d.
We can specify the rotation in two ways.

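(a) By the angle α. Then

$$ [R] = \begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix} \qquad (2.9) $$

(b) By the parameter s = sin α. Then

$$ [R] = \begin{bmatrix} \sqrt{1 - s^2} & s \\ -s & \sqrt{1 - s^2} \end{bmatrix} $$

The second method is inconvenient for two reasons. First, there is a square root. Secondly, s is not
additive but α is additive.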
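
A quick numerical check of the composition rule, and of the remark that s is not additive, might look as follows. The helper names and the two sample angles are assumptions made for this sketch.

```python
import math

def rotation(alpha):
    """2-d rotation matrix in the convention of these notes."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, s], [-s, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

a1 = math.radians(20.0)
a2 = math.radians(35.0)

combined = matmul(rotation(a2), rotation(a1))   # rotation by a1 is applied first
direct = rotation(a1 + a2)

angles_add = all(math.isclose(combined[i][j], direct[i][j], abs_tol=1e-12)
                 for i in range(2) for j in range(2))

# The parameter s = sin(alpha) does not simply add.
s1, s2 = math.sin(a1), math.sin(a2)
s_total = math.sin(a1 + a2)
s_adds = math.isclose(s_total, s1 + s2)
print(angles_add, s_adds)
```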

Problem 4
Using (2.8) and (2.9), derive the addition laws for $\cos(\alpha_1 + \alpha_2)$, $\sin(\alpha_1 + \alpha_2)$.

Problem 5
Let $v_1 = \tan\alpha_1$, $v_2 = \tan\alpha_2$, $v = \tan(\alpha_1 + \alpha_2)$. From the result of Problem 4, show that the law of
addition for v's is

$$ v = \frac{v_1 + v_2}{1 - v_1 v_2} $$

Ch2-2.tex; July 17, 1997


3 Moving Reference Frame - I
In relativity, the basic object is not a point, but an event E. For example:
E = a short pulse of light is emitted from a lamp.
Each event occurs at a definite time and a definite place. Thus it is characterized by 4 coordi-
nates:
$E = (t, x^1, x^2, x^3)$
We shall be concerned with how these 4 coordinates transform between a frame S and another
frame S' which is moving at a velocity V relative to S. For simplicity, we shall assume that V points along the 1-axis,

$$ \mathbf{V} = (V, 0, 0) $$

and the two origins coincide at t = 0 (Figure 1).

Figure 1

3.1 Galilean transformation


Let us start with the familiar Galilean transformations. These will turn out to be incorrect.
There are two assumptions.
(a) Clocks are not affected by motion.

$$ t' = t \qquad (3.1) $$

(b) Lengths are not affected by motion.

Figure 2
Then from Figure 2

$$ x' = x - Vt \qquad (3.2) $$

where x is the coordinate of a particle in S and x' is the coordinate in S'.


The velocities

$$ v = \frac{dx}{dt}, \qquad v' = \frac{dx'}{dt'} $$

are then related by

$$ v' = v - V $$
3.2 Speed of light and the Michelson-Morley Experiment

Figure 3

Let the speed of light be c with respect to a fixed observer S. To send a pulse of light to a
distance L and then reflect it back requires a time Δt (Figure 3):

$$ \Delta t = \frac{2L}{c} $$

Now do the same measurement on a "train" which is moving at a velocity V. According
to the Galilean transformation, the speed of light as observed by S' moving with the train is
c - V one way and c + V the other way. So the elapsed time should be

$$ \Delta t' = \frac{L}{c - V} + \frac{L}{c + V} = \frac{2Lc}{c^2 - V^2} = \frac{2L}{c}\,\frac{1}{1 - V^2/c^2} $$

$$ \Delta t' = \frac{\Delta t}{1 - V^2/c^2} \quad \text{(Galilean)} \qquad (3.7) $$
So by measuring the time on the train, in principle one can detect the absolute motion of the
train.
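
The size of the effect predicted by the Galilean argument can be estimated numerically. The sketch below uses the earth's orbital speed and the path length quoted in these notes; the value of c and the variable names are assumptions for illustration.

```python
C = 3.0e8   # m s^-1, speed of light (assumed rounded value)
V = 3.0e4   # m s^-1, orbital speed of the earth as quoted in the notes
L = 3.0     # m, one-way path length as in Problem 1

dt_rest = 2 * L / C                      # round trip for an observer at rest
dt_moving = L / (C - V) + L / (C + V)    # Galilean prediction on the "train"

fractional_excess = dt_moving / dt_rest - 1.0
print(dt_rest)             # 2e-08 seconds
print(fractional_excess)   # of order V^2/c^2, i.e. about 1e-8
```

A fractional change of one part in a hundred million in a time of order 10 nanoseconds is far too small to measure with a clock, which is why an interference method is needed.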
The Michelson-Morley (M-M) experiment near the end of the 19th century showed that
there is no such time difference. Therefore, experimentally (not by any theoretical deduction),
(3.7) is wrong, so the Galilean transformations (3.1) and (3.2) are wrong.
Outline of the M-M experiment
• In this case, the "train" is the earth itself, which is moving in the solar system at a speed
$V \approx 2\pi$ A.U./year $\approx 3 \times 10^4$ m s$^{-1}$, or $V/c \approx 10^{-4}$.
• Thus the difference expected from Galilean theory is $\sim V^2/c^2 \sim 10^{-8}$. The experiment
had to be accurate beyond this level in order to conclude that there is no effect and that
the Galilean transformation is wrong. This level of accuracy is obtained by means of
optical interference.
• There is no way to stop this "train" and compare the times for V = 0 and V ≠ 0.
Instead, compare the time for a round trip along the direction of motion with the time
for a round trip perpendicular to the direction of motion. In Galilean theory the latter is
affected only half as much by the motion (to leading order in $V^2/c^2$), so there should be
a difference between the two cases. Actually there is no difference.
Please read about the details of the experiment. The following problem gives some typical
orders of magnitude.

Problem 1
In the Michelson-Morley experiment, the earth is moving at about $3 \times 10^4$ m s$^{-1}$. Rays of light
are compared on two paths: (i) along the direction of motion of the earth, and (ii) perpendicular
to the direction of motion of the earth. The one-way length of each path is L = 3.0 m. In each
case, the rays traverse the return paths 10 times.
(a) According to Galilean theory, what is the difference At between the times needed on the
two paths?
(b) Express this difference as a phase Δφ, assuming a reasonable wavelength.

No absolute motion
If Galilean transformations were correct, there would be the concept of absolute motion. We
would be able to determine the velocity V of a train by doing experiments on the train, without
referring to the outside. The simplest experiment is this (Figure 4): measure the velocity of
light coming from the front and the velocity of light coming from the back. In one case it should
be c + V; in the other case it should be c - V. So the difference reveals absolute motion.

Figure 4
Actually, such is not the case. The M-M experiment shows that even for a moving
observer:
• The speed of light is the same in all directions (isotropic).
• The speed of light is still c.

Instead of trying to explain this fact, we shall use it as the starting point to derive the trans-
formation between moving coordinates.

3.3 Derivation of the Lorentz transformation


The derivation follows very closely the analog of rotations in Chapter 2.

Basic object
The basic object is an event E, whose coordinates are
$E = (t, x^1, x^2, x^3)$ in S
$E = (t', x'^1, x'^2, x'^3)$ in S'
How are these coordinates related?

Linear assumption
We shall assume that they are related linearly:

$$ t' = a_0 t + a_1 x^1 + a_2 x^2 + a_3 x^3 $$

etc. There are 16 coefficients. It would be more compact if we can write the relationship using
index notation or matrix notation like (2.1). For this purpose, it is convenient to call time
the "zero component". (Some authors call it the "fourth component"; it does not matter.)
Secondly, it is better if the zero component has the same unit as the other components. Since the speed of light is a
universal constant, we define

$$ x^0 = ct $$

Index notation and summation convention


Compare the similar discussion in Chapter 1.
• Vectors in 4-d are denoted as $\vec{x}$, $\vec{y}$, $\vec{p}$ etc.
• $\vec{x} = (x^0, \mathbf{x}) = (x^0, x^1, x^2, x^3)$.
• The column vector is denoted as $[x]$.
• The components are collectively denoted as $x^\mu$.
• An index such as μ, ν runs from 0 to 3.
• If an index such as μ appears twice in the same term, once as an upper index and once
as a lower index, it is understood to be summed from 0 to 3:

$$ a^\mu b_\mu \equiv \sum_{\mu=0}^{3} a^\mu b_\mu $$

• Thus it is necessary to distinguish between an upper index and a lower index. The
coordinate vector is defined with upper indices. We shall construct vectors with lower
indices later.
• Sometimes we also denote the components as t, x, y, z.

General transformation
With this convention, the most general linear transformation is

Component notation    $x'^\mu = L^\mu{}_\nu x^\nu$    (3.9)

Matrix notation    $[x'] = [L][x]$
Note that [L] is a constant matrix. Its first index is an upper index and its second
index is a lower index. It specifies the linear transformation. The matrix form looks a little
"dangerous" because it does not distinguish between upper and lower indices. However, this
does not really matter because they will all work out correctly in the end according to the
following rules.
• A free index [such as μ in (3.9)] must appear once in every term, and must always be
either an upper index in every term or a lower index in every term.

$a^\mu = b^\mu$    Yes
$a^\mu = b_\mu$    No
• A dummy index must appear twice in one term, once as an upper index and once as a
lower index.

$a^\mu b_\mu$    Yes
$a^\mu b^\mu$    No
Identify an invariant
We now claim that the M-M experiment tells us that the following is an invariant

$$ \sigma^2 = -(x^0)^2 + (x^1)^2 + (x^2)^2 + (x^3)^2 \qquad (3.10) $$

In other words, we claim that it is equal to

$$ \sigma'^2 = -(x'^0)^2 + (x'^1)^2 + (x'^2)^2 + (x'^3)^2 $$

Although we write it as σ², the quantity can in fact be negative.


Unlike the case of rotations, the spatial part x is different in the two frames, and in general $|\mathbf{x}|^2 \neq |\mathbf{x}'|^2$. This is true even
for the Galilean transformation (3.1), (3.2).
We now argue that σ² = σ'², in three steps.

(a) Proportional
Suppose a short pulse of light is emitted from the origin at the time when the two origins
coincide, i.e., at $(ct, x^1, x^2, x^3) = (ct', x'^1, x'^2, x'^3) = (0,0,0,0)$. Let the event E be the receiving
of the pulse by an observer. Because the velocity of light is exactly c in S:

$$(x^1)^2 + (x^2)^2 + (x^3)^2 = (\text{distance})^2 = (ct)^2, \quad \text{i.e.,} \quad \sigma^2 = 0$$

Similarly, because the velocity of light is also exactly c in S':

$$\sigma'^2 = 0$$

In other words, $\sigma^2 = 0$ if and only if $\sigma'^2 = 0$. Thus these two quantities are proportional:

$$\sigma'^2 = A(V)\,\sigma^2 \qquad (3.11)$$

where the proportionality constant may depend on the relative velocity. Although we use the
example of $\sigma^2 = 0$ to derive the proportionality, the coefficients $L^\mu{}_\nu$ in the transformation are
independent of the coordinates; therefore the proportionality holds always.

(b) A(V) depends only on |V|


The quantities $\sigma^2$ and $\sigma'^2$ are scalars under rotation. So they (and hence their ratio)
cannot depend on the direction of any vector. Therefore A(V) depends only on |V|. In
particular

$$A(-V) = A(V) \qquad (3.12)$$

(c) Consider reverse transformation


Consider the reverse transformation. Interchange the roles of $\sigma^2$ and $\sigma'^2$. Because the
relative velocity is now opposite, A(V) is changed to A(-V):

$$\sigma^2 = A(-V)\,\sigma'^2 \qquad (3.13)$$

Combining (3.11), (3.12), (3.13)

$$\sigma^2 = [A(V)]^2\,\sigma^2$$

Hence

$$A(V) = \pm 1$$

(Consider the case of A(V) for V small. Obviously $x'^\mu \approx x^\mu$, $A(V) \approx 1$, so we take the + sign.
The sign cannot change suddenly, so we must always take the + sign.)
This proves that

$$\sigma^2 = -(x^0)^2 + (x^1)^2 + (x^2)^2 + (x^3)^2 \qquad (3.15)$$

is an invariant. It is called the invariant interval.


The invariant is a mathematical way of stating the result of the M-M experiment. It
is the analog of Pythagoras' theorem in 3-d. A space where the invariant is given by (3.15) is
called Minkowski space in contrast to Euclidean space.

Metric and lower indices


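We can write $\sigma^2$ as

$$\sigma^2 = \eta_{\mu\nu}\, x^\mu x^\nu$$

where

$$\eta_{\mu\nu} = \begin{cases} -1 & \mu = \nu = 0 \\ +1 & \mu = \nu = 1 \text{ or } 2 \text{ or } 3 \\ \ \ 0 & \mu \neq \nu \end{cases} \qquad (3.17)$$

The matrix $\eta_{\mu\nu}$ takes the place of $\delta_{ij}$ in usual Euclidean space. In fact, if we change $\eta \to \delta$, we
would obtain the familiar results of the last Chapter.
For any vector $x^\mu$ with upper indices, we define the corresponding vector $x_\mu$ with lower
indices by

$$x_\mu = \eta_{\mu\nu}\, x^\nu$$

To raise an index

$$x^\mu = \eta^{\mu\nu}\, x_\nu$$

where $\eta^{\mu\nu}$ is obviously the inverse of $\eta_{\mu\nu}$. In this case the two matrices are numerically equal:

$$[\eta^{\mu\nu}] = [\eta_{\mu\nu}] = \mathrm{diag}(-1, 1, 1, 1)$$

In matrix notation

$$\sigma^2 = [x^T]^{\bullet}\, [\eta]_{\bullet\bullet}\, [x]^{\bullet}$$
$$[x]_{\bullet} = [\eta]_{\bullet\bullet}\, [x]^{\bullet}$$
$$[x]^{\bullet} = [\eta]^{\bullet\bullet}\, [x]_{\bullet}$$
$$[\eta]^{\bullet\bullet} = \left([\eta]_{\bullet\bullet}\right)^{-1}$$

The dots have been added to indicate whether the index is upper or lower; they are not really
needed once you become more familiar with the notation.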
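
The lowering rule and the invariant can be sketched in a few lines. The names `ETA`, `lower` and `interval` are assumptions made for this illustration, and the sample vector is hypothetical; units with c = 1 are used so that the 0-component is simply a time.

```python
# Metric of special relativity: -1 for the 0-component, +1 for the spatial ones.
ETA = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]


def lower(x_up):
    """Lower an index: x_mu = eta_{mu nu} x^nu (sum over nu)."""
    return [sum(ETA[mu][nu] * x_up[nu] for nu in range(4)) for mu in range(4)]


def interval(x_up):
    """Invariant sigma^2 = x_mu x^mu = eta_{mu nu} x^mu x^nu."""
    x_low = lower(x_up)
    return sum(x_low[mu] * x_up[mu] for mu in range(4))


x = [5.0, 3.0, 0.0, 4.0]    # hypothetical event: 5 time units, 5 spatial units away
print(lower(x))             # [-5.0, 3.0, 0.0, 4.0]
print(interval(x))          # 0.0
```

Lowering only flips the sign of the 0-component, and the chosen event gives $\sigma^2 = 0$ because its spatial distance equals the elapsed time.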

The minus sign


Actually, raising or lowering an index just means changing the sign of the 0 component, e.g.

$$x_0 = -x^0, \qquad x_1 = x^1$$

and the metric $\eta$ is just a way of taking care of this minus sign, which is just the same minus
sign as that in (3.15) - i.e., the square of the 0 component enters the invariant with a minus
sign.
A matrix $[\eta]$ in principle has 16 elements; why do we need something so complicated
to deal with a mere minus sign? The answer is that this notation provides a good stepping
stone to general relativity, in which we have curved coordinates, and $\eta_{\mu\nu}$ is replaced by a more
general (and position-dependent) matrix $g_{\mu\nu}$. Thus, in terms of mathematical structure, to go
from Euclidean space to special relativity to general relativity simply involves

$$\delta_{ij} \;\to\; \eta_{\mu\nu} \;\to\; g_{\mu\nu}(x)$$

Another question is often asked: Can we define

$$x^0 = ict \quad ???$$

instead? If so, (3.15) would appear with plus signs, and all formulas would be familiar. Some
elementary books do this, but it is very bad practice, for two reasons.
First, in Euclidean space, a subset such as $\sigma^2 = 1$ is a bounded domain, because each
component is bounded (in fact in this case at most 1 in magnitude). But in Minkowski space, a subset such as
$\sigma^2 = 1$ is unbounded - the components can be as large as you want. Thus there is a difference
in topology, which is obscured if we "hide" the minus sign.
Secondly, in quantum mechanics, the factor i appears, and we have expressions such as
$\psi^*\psi$. For such expressions, $\psi^*$ means changing $i \to -i$. But this applies only to the genuine
i's, not to the "fake" i's that are introduced in $x^0 = ict$. It would be a nightmare to keep track
of the genuine i's versus the "fake" i's.
So, for these reasons, we shall continue with the matrix $[\eta]$.
Condition on transformation matrix
The existence of an invariant places conditions on the transformation matrix $[L]$. We now
derive these equations in three equivalent ways.

(a) Explicitly in components for (1+1)d


Notation: (1+1)d means 1 space + 1 time dimension.
Consider a case where the relative velocity V is along the 1-direction. Then $x^2$ and $x^3$
are not transformed:

$$x'^2 = x^2, \qquad x'^3 = x^3$$

and we only need to transform $x^0$, $x^1$. The matrix $[L]$ is reduced to 2 × 2.

Let

so that explicitly

Let $x^2 = x^3 = 0$. Then

But this must equal

$$\sigma^2 = -1\cdot(x^0)^2 + 0\cdot x^0 x^1 + 1\cdot(x^1)^2$$

as an identity. Hence we get three conditions on the four coefficients in $[L]$:


Except for some minus signs, these conditions are the same as in Chapter 2. The minus signs
come about because of the minus sign in the invariant $\sigma^2$.
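
The coefficient matching in this step can be written out as follows. This is only a sketch under an assumed labelling of the four coefficients, namely $x'^0 = p\,x^0 + q\,x^1$ and $x'^1 = r\,x^0 + s\,x^1$; the labelling used in (3.20) and (3.21) may differ, in which case the letters below should be permuted accordingly.

```latex
% Assumed labelling (hypothetical): x'^0 = p x^0 + q x^1,  x'^1 = r x^0 + s x^1.
\begin{align*}
\sigma'^2 &= -(x'^0)^2 + (x'^1)^2 \\
          &= -(p\,x^0 + q\,x^1)^2 + (r\,x^0 + s\,x^1)^2 \\
          &= (-p^2 + r^2)\,(x^0)^2 + 2\,(-pq + rs)\,x^0 x^1 + (-q^2 + s^2)\,(x^1)^2 .
\end{align*}
% Comparing term by term with sigma^2 = -1 (x^0)^2 + 0 x^0 x^1 + 1 (x^1)^2:
\begin{align*}
-p^2 + r^2 = -1, \qquad -pq + rs = 0, \qquad -q^2 + s^2 = +1 .
\end{align*}
```

With $\delta$ in place of $\eta$ (all plus signs) the same steps give the orthogonality conditions for a rotation, which is the sense in which only some minus signs change.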

Problem 2
Derive the results in (3.22) from the 3 conditions on p, q, r, s.
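(b) In general using index notation

$$\sigma'^2 = \eta_{\mu\nu}\, x'^\mu x'^\nu = \eta_{\mu\nu}\, L^\mu{}_\rho\, L^\nu{}_\sigma\, x^\rho x^\sigma$$
$$\sigma^2 = \eta_{\rho\sigma}\, x^\rho x^\sigma$$

Since they must be equal as an identity

$$\eta_{\mu\nu}\, L^\mu{}_\rho\, L^\nu{}_\sigma = \eta_{\rho\sigma} \qquad (3.23)$$

Compare with the analogous equations in Chapter 2; in particular, note that if we replace $\eta$ by
$\delta$, these equations would reduce exactly to those in Chapter 2.
Let us check that in the case of (1+1)d, (3.23) agrees with (3.22). We put the summation
sign back explicitly. Also, in (1+1)d, the summation goes only over 0, 1.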

Since

this reduces to
But from (3.20)

hence

This gives the first equation in (3.22).

Problem 3
Consider the other cases and derive the other two equations in (3.22).

Problem 4
Consider the case of (3 + 1)d.
(a) How many conditions are there in (3.23)? Hint: note the symmetry under $\rho \leftrightarrow \sigma$.
(b) Hence determine how many free parameters there are in $L^\mu{}_\nu$. Half of these correspond to
rotation and half of these correspond to relative velocity. Explain physically why there are this
number of free parameters.

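(c) Using matrix notation

$$[x']^{\bullet} = [L]^{\bullet}{}_{\bullet}\, [x]^{\bullet}$$
$$[x'^T]^{\bullet} = [x^T]^{\bullet}\, [L^T]_{\bullet}{}^{\bullet} \qquad \text{(note the dots transposed)}$$
$$\sigma'^2 = [x'^T]^{\bullet}\, [\eta]_{\bullet\bullet}\, [x']^{\bullet} = [x^T]^{\bullet}\, [L^T]_{\bullet}{}^{\bullet}\, [\eta]_{\bullet\bullet}\, [L]^{\bullet}{}_{\bullet}\, [x]^{\bullet}$$
$$\sigma^2 = [x^T]^{\bullet}\, [\eta]_{\bullet\bullet}\, [x]^{\bullet}$$

Since these are equal as an identity

$$[L^T]\,[\eta]\,[L] = [\eta]$$

Let us check that this agrees with (3.23). Writing out the components,
$([L^T][\eta][L])_{\rho\sigma} = L^\mu{}_\rho\, \eta_{\mu\nu}\, L^\nu{}_\sigma$, which is the left-hand side of (3.23).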


Express in terms of minimum number of parameters
We do this only for (1+1)d. The situation is similar to the case of rotations and is left as a
problem.

Problem 5
Of the 4 parameters p, q, r, s in (3.21), regard s as the free parameter and define $s = -\sinh\alpha$.
Find p, q, r in terms of $\alpha$ by using the three equations in (3.22). You will need to choose the
sign of a square root. Explain the physical meaning of your choice. Your answer should be

$$[L] = \begin{bmatrix} \cosh\alpha & -\sinh\alpha \\ -\sinh\alpha & \cosh\alpha \end{bmatrix}$$

Note that the two off-diagonal entries have the same sign.
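
The condition on the transformation matrix can be checked numerically for this answer. The sketch below tests the matrix statement $[L^T][\eta][L] = [\eta]$ in (1+1)d; the helper names (`boost`, `matmul`, `transpose`, `preserves_interval`) and the sample value of $\alpha$ are assumptions made only for this illustration.

```python
import math

ETA2 = [[-1.0, 0.0],
        [0.0, 1.0]]          # metric in (1+1)d


def boost(alpha):
    """The (1+1)d matrix [L] quoted as the answer to Problem 5."""
    c, s = math.cosh(alpha), math.sinh(alpha)
    return [[c, -s],
            [-s, c]]


def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]


def preserves_interval(L, tol=1e-9):
    """True if [L^T][eta][L] equals [eta] within tol."""
    M = matmul(matmul(transpose(L), ETA2), L)
    return all(abs(M[i][j] - ETA2[i][j]) < tol
               for i in range(2) for j in range(2))


print(preserves_interval(boost(0.7)))   # True
```

An ordinary rotation matrix built from cos and sin fails the same test, which shows that the minus sign in the metric is what forces the hyperbolic functions.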

Relate to relative velocity


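It is true that we can use $\alpha$ to specify the transformation. But it is more usual to use the
relative speed V. We need to relate the two.
Let S' be moving along the +1-axis at a speed V relative to S. Thus the origin of S'
is described by

$$(t', x'^1) = (t', 0)$$

in S', and by $(x^0, x^1) = (ct, Vt)$ in S. Put this into the transformation

$$x'^1 = -\sinh\alpha\; x^0 + \cosh\alpha\; x^1$$

$$0 = -\sinh\alpha\,(ct) + \cosh\alpha\,(Vt)$$
$$c\,\sinh\alpha = V\cosh\alpha$$
$$\tanh\alpha = \frac{V}{c} \equiv \beta \qquad (3.26)$$

It is conventional to define $\beta$ as the dimensionless relative velocity in this way. For "ordinary"
speeds, $|\beta| \ll 1$; for relative motion near the speed of light, $\beta \approx 1$. From (3.26)

$$\cosh\alpha = \gamma, \qquad \sinh\alpha = \gamma\beta$$

where

$$\gamma = \frac{1}{\sqrt{1-\beta^2}} \qquad (3.28)$$

For small V, $\gamma \approx 1$. For $V \approx c$, $\gamma \to \infty$.


Putting these back into $[L]$, we get the usual form of the Lorentz transformation

$$x'^0 = \gamma\,(x^0 - \beta\, x^1)$$
$$x'^1 = \gamma\,(-\beta\, x^0 + x^1) \qquad (3.29)$$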
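
As a numerical sketch of what the Lorentz transformation achieves, the code below applies $x'^0 = \gamma(x^0 - \beta x^1)$, $x'^1 = \gamma(x^1 - \beta x^0)$ to a sample event and compares $\sigma^2$ before and after; the Galilean rule is included for contrast. The function names and the sample numbers are hypothetical, and only the (1+1)d case is treated.

```python
import math


def gamma(beta):
    """gamma = 1 / sqrt(1 - beta^2), valid for |beta| < 1."""
    return 1.0 / math.sqrt(1.0 - beta ** 2)


def lorentz(x0, x1, beta):
    """Lorentz transformation in (1+1)d, with x0 = ct."""
    g = gamma(beta)
    return g * (x0 - beta * x1), g * (x1 - beta * x0)


def galilean(x0, x1, beta):
    """Galilean transformation written with x0 = ct: x1' = x1 - V t."""
    return x0, x1 - beta * x0


def sigma2(x0, x1):
    """Invariant interval in (1+1)d."""
    return -x0 ** 2 + x1 ** 2


x0, x1, beta = 2.0, 5.0, 0.6                 # hypothetical event and speed
print(sigma2(x0, x1))                        # 21.0
print(sigma2(*lorentz(x0, x1, beta)))        # equal to 21 up to rounding
print(sigma2(*galilean(x0, x1, beta)))       # a different value
```

The Lorentz-transformed coordinates reproduce the same $\sigma^2$, while the Galilean ones do not, which is the content of the invariance claim.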

Problem 6
Verify that $[L(\alpha_2)][L(\alpha_1)] = [L(\alpha_1 + \alpha_2)]$.

Problem 7
Show that when two transformations are performed one after another, the relative velocities $\beta$
"add" as

$$\beta = \frac{\beta_1 + \beta_2}{1 + \beta_1\beta_2}$$

Hint: Use the addition law for $\tanh\alpha$.
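
A quick numerical check, not a proof, of the statement in Problems 6 and 7: if the parameters $\alpha$ simply add, the resulting $\beta = \tanh\alpha$ follows the quoted "addition" rule. The function names are hypothetical and the speeds are sample values.

```python
import math


def compose(beta1, beta2):
    """Combine two relative velocities by adding the parameters alpha = atanh(beta)."""
    return math.tanh(math.atanh(beta1) + math.atanh(beta2))


def add_velocities(beta1, beta2):
    """The closed-form rule (beta1 + beta2) / (1 + beta1 * beta2)."""
    return (beta1 + beta2) / (1.0 + beta1 * beta2)


print(compose(0.5, 0.5), add_velocities(0.5, 0.5))   # both close to 0.8
print(compose(0.9, 0.9), add_velocities(0.9, 0.9))   # both below 1
```

Two speeds of 0.5 combine to 0.8 rather than 1.0, and no combination of speeds below 1 reaches 1.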

Problem 8
Start from (3.29) and solve for $x^0$, $x^1$ in terms of $x'^0$, $x'^1$. Show that the reverse transformation
derived this way has the same form, but with $\beta \to -\beta$.

Signs
The signs can be remembered as follows:
• The "diagonal" terms (i.e., relating $x'^0$ to $x^0$, or $x'^1$ to $x^1$) always have a +
sign.
• The "off-diagonal" terms with $\beta$ both have the same sign.

Whether the "off-diagonal" sign is + or - is easily determined by considering $V \to 0$, $\gamma \to 1$,
for which the Galilean transformation should be valid. For example, from the second
equation in (3.29)

$$x'^1 \to x^1 - Vt$$

The sign of the Vt term is easily checked with reference to Figure 2.

Problem 9
Explicitly find the coefficients of the transformation if S' is a spaceship travelling at V =
$9.0 \times 10^{7}$ m s$^{-1}$ relative to an observer S.

Nonrelativistic limit
Although the Galilean transformation is conceptually wrong, it must be nearly correct if the
relative speed V is low. Otherwise it would contradict many experiments and observations in
daily life (all concerning low speeds) which seem to confirm the Galilean transformation. So let
us check the nonrelativistic limit ($V \to 0$) and estimate the correction to the Galilean law.
Write the second equation in (3.29) as

$$x'^1 = \gamma\,(x^1 - Vt)$$
$$\phantom{x'^1} = (x^1 - Vt) + (\gamma - 1)(x^1 - Vt)$$

The first term is the Galilean result and the second term is the correction. The fractional
correction is about

$$\gamma - 1 \approx \tfrac{1}{2}\beta^2 \qquad (3.30)$$

For a nonrelativistic transformation ($|\beta| \ll 1$), this is negligible.


Similarly, consider the first equation in (3.29), and write it as

$$t' = \gamma\left(t - \frac{V x^1}{c^2}\right) = t + (\gamma - 1)\,t - \gamma\,\frac{V x^1}{c^2}$$

The first term is the Galilean result and the other two terms are corrections. The $(\gamma - 1)$
correction is the same as (3.30), while the last term gives a fractional correction

$$\gamma\,\frac{V x^1}{c^2\, t} \approx \frac{V x^1}{c^2\, t}$$

Now suppose $x^1$ and t refer to a particle moving at velocity v: $x^1/t = v$. (This must not be
confused with the velocity of the frame V.) Then the correction is

$$\frac{V v}{c^2} = \beta\,\frac{v}{c} \qquad (3.31)$$

where $v/c$
is the dimensionless velocity of the particle.
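
To get a feeling for the size of these corrections, the sketch below evaluates $\gamma - 1$ and $Vv/c^2$ for sample speeds. The numbers (a frame moving at 30 km/s and a particle at 3 m/s) are hypothetical, and the function names are assumptions for this example. Computing $\gamma - 1$ by direct subtraction loses precision at low speed, so an algebraically equivalent form is used.

```python
import math

C = 3.00e8   # speed of light in m/s (rounded value, as used in the text)


def gamma_minus_one(V):
    """gamma - 1, written as beta^2 / (r (1 + r)) with r = sqrt(1 - beta^2).

    This equals 1/r - 1 exactly, but avoids subtracting two nearly equal numbers.
    """
    beta2 = (V / C) ** 2
    r = math.sqrt(1.0 - beta2)
    return beta2 / (r * (1.0 + r))


def time_correction(V, v):
    """Fractional correction V v / c^2 to the Galilean rule t' = t."""
    return V * v / C ** 2


V = 3.0e4                            # hypothetical frame speed in m/s
print(gamma_minus_one(V))            # about 5e-9
print(0.5 * (V / C) ** 2)            # leading approximation beta^2 / 2
print(time_correction(V, 3.0))       # about 1e-12
```

Even at tens of kilometres per second the corrections are of order $10^{-9}$ or smaller, which is why the Galilean law works so well in daily life.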

Problem 10
(a) S is an air-traffic controller at the airport and S' is an aeroplane flying at 300 m s$^{-1}$ (≈
1000 km hr$^{-1}$). If the Galilean transformation is used to describe the relationship between S
and S', estimate the percentage error (3.30).
(b) A passenger is walking on the plane at v = 3 m s$^{-1}$. Estimate the percentage error (3.31).

Problem 11
Choose the x axis downwards, and g = 10 m s$^{-2}$. A stone is released at t = 0, x = 0 and falls
to the ground at a distance h = 10 m below. Let the event E be the stone hitting the ground.
(a) Give the coordinates (t, x) of E.
(b) Another observer is in an elevator moving uniformly upwards at V = 3 m s$^{-1}$, and the
origins of the two observers coincide at t = t' = 0. Give the coordinates $(t'_G, x'_G)$ for the event
E, according to the Galilean transformations.
(c) Give the coordinates $(t'_L, x'_L)$ for the event E, according to the Lorentz transformation. Give
$(t'_L - t'_G)$ and $(x'_L - x'_G)$ to at least 1 significant figure accuracy.

3.4 Choice of units


Analogy
A stupid surveyor measures distances along the east-west direction ($x^1$) in m and distances along
the north-south direction ($x^2$) in km. His "Pythagoras theorem" would read

$$\sigma^2 = (x^1)^2 + k^2\,(x^2)^2 = \text{invariant}$$

where k = 1000 is just a conversion factor to get his measurements into the same units. It
would be much smarter to use the same units for both directions and get rid of k.
In the same way, the factors of c that appear, e.g., in

$$(ct') = \gamma\,(ct - \beta x^1)$$

are there because we do not use the same unit for the zeroth direction.
Choice of c = 1
We can get rid of c by setting

$$c = 1$$

All formulas become simpler. When we get a final result that is dimensionally "wrong", we
simply multiply by a suitable power of c (= 1).

Example 1
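(a) Convert a time of 3.0 m to conventional units.

$$\text{time} = 3.0\ \text{m} = \frac{3.0\ \text{m}}{3.00 \times 10^{8}\ \text{m s}^{-1}} = 1.0 \times 10^{-8}\ \text{s}$$

(b) Convert an energy of $10^{-10}$ kg to conventional units.

$$\text{energy} = 10^{-10}\ \text{kg} = 10^{-10}\ \text{kg} \times (3.00 \times 10^{8}\ \text{m s}^{-1})^2 = 9.0 \times 10^{6}\ \text{J}$$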
Actual units
There are certain units in which c is really 1.
(a) Measure time in years and distances in light-years.

$$c = \frac{1\ \text{light-year}}{1\ \text{year}} = 1\ \text{unit}$$

This is sometimes convenient in astrophysics.
(b) Measure time in ns and distances in units of

$$1\ \text{unit} = 0.3\ \text{m}$$

Then

$$c = \frac{0.3\ \text{m}}{1\ \text{ns}} = \frac{1\ \text{unit}}{1\ \text{unit}} = 1\ \text{unit}$$

This is convenient when dealing with high energy particles, which typically travel distances of
a few m in times of a few ns.
Standard of length and time
When we come to general relativity, it will be very important to give a clear prescription of how
length and time (i.e., distances in space-time) are measured. The modern definitions of length
and time standards ("rods" and "clocks") both rely on atomic transitions. Conceptually it is
easiest to imagine the following.
• Consider a particular atomic transition A → B which emits electromagnetic waves.
• Each period of the wave is defined as 1 "tick".

1 "tick" = 1 period of the wave T

• Each wavelength of the wave is defined as 1 "rod".

1 "rod" = 1 wavelength of the wave $\lambda$

Thus

$$c = \frac{\lambda}{T} = \frac{1\ \text{"rod"}}{1\ \text{"tick"}} = 1$$

The actual definition uses more complicated numbers, but the idea is the same.

3.5 Difference form


Consider an event A, with coordinates $x^\mu_A$ in coordinate system S and $x'^\mu_A$ in coordinate system S'.
Similarly, consider another event B. Then

$$x'^\mu_A = L^\mu{}_\nu\, x^\nu_A$$
$$x'^\mu_B = L^\mu{}_\nu\, x^\nu_B$$

Define the difference

$$\Delta x^\mu = x^\mu_B - x^\mu_A$$

then, because the transformation is linear, we get immediately

$$\Delta x'^\mu = L^\mu{}_\nu\, \Delta x^\nu \qquad (3.32)$$

For example, consider relative velocity $V = \beta c$ along the $x^1$ direction and denote
$x^1 \to x$, $x^0 \to t$. In units c = 1,

$$\Delta t' = \gamma\,(\Delta t - \beta\,\Delta x)$$
$$\Delta x' = \gamma\,(\Delta x - \beta\,\Delta t)$$

The difference form is often convenient for applications, because we want to consider the interval
between two events. Correspondingly, instead of $\sigma^2$, we consider

$$(\Delta s)^2 = -(\Delta x^0)^2 + (\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2$$

From (3.32), we can also write

The difference (and differential) form has another advantage. When we come to general
relativity, space-time will be described as curved. But if we look at a small portion of a curved
surface, it is nearly flat. So locally, i.e., in terms of small displacements $\Delta x^\mu$ (strictly speaking
infinitesimal displacements $dx^\mu$), general relativity looks very similar to (3.32) - (3.35).

Ch3-2.tex September 29, 1997


4 Moving Reference Frame - II
This chapter starts with the Lorentz transformation derived in Chapter 3 and discusses some
applications. Some further concepts and mathematical tools are developed.

4.1 Length contraction

Figure 1
Let S be an observer on the ground and S' be an observer on a train moving at $V = \beta c$. A rod
of length $L_0$ is fixed on the train. What is the length L as seen by S?
To save some writing, we use units with c = 1 and put

Operational definition of "length"


For S to determine the length L, he has to
• measure the coordinate $x_A$ of one end of the rod (event A); and
• measure the coordinate $x_B$ of the other end of the rod (event B) at the same time.

Then

$$L = x_B - x_A = \Delta x$$

The condition "at the same time" is crucial. We cannot measure one end now and measure the
other end one year later (the rod is moving!) and call the difference the length. "At the same
time" means

$$\Delta t = t_B - t_A = 0$$

Use of Lorentz transformation


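We have two (reverse) transformations that can be used:

$$\Delta x = \gamma\,(\Delta x' + \beta\,\Delta t') \qquad (4.1)$$
$$\Delta x' = \gamma\,(\Delta x - \beta\,\Delta t) \qquad (4.2)$$

Both are correct. But which one is more convenient?
• Since the rod is at rest in S', $\Delta x' = L_0$ always.
• Because the two events satisfy $\Delta t = 0$, (4.2) is more convenient.

$$\underbrace{\Delta x'}_{L_0} = \gamma\,(\underbrace{\Delta x}_{L} - \beta\,\underbrace{\Delta t}_{0})$$

$$L = \frac{L_0}{\gamma}$$

Since $\gamma > 1$, the rod appears contracted.


What would happen if we use (4.1) instead?

$$L = \Delta x = \gamma\,(L_0 + \beta\,\Delta t')$$

Would we get $L = \gamma L_0$ (an increase in length)? No! Because $\Delta t' \neq 0$. In fact we can solve for
$\Delta t'$:

$$\Delta t' = \frac{1}{\beta}\left(\frac{1}{\gamma}\,\Delta x - \Delta x'\right) = \frac{1}{\beta}\left(\frac{1}{\gamma^2} - 1\right) L_0 = -\beta L_0$$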

Simultaneity
So we have two events which are simultaneous in one frame ($\Delta t = 0$), but not simultaneous
in another frame ($\Delta t' \neq 0$). Thus simultaneity is not an absolute concept. We always have to
specify "simultaneous according to which observer".
In general, simultaneity cannot hold in both frames. This can be seen from the other
equations for the Lorentz transformation, e.g.,

$$\Delta t' = \gamma\,(\Delta t - \beta\,\Delta x)$$

If two events do not occur at the same place ($\Delta x \neq 0$), it is impossible to have simultaneity in
both frames ($\Delta t = 0$ and $\Delta t' = 0$).
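
The length measurement and the loss of simultaneity can be checked together with a few numbers. In the sketch below the rod length and speed are hypothetical sample values, the function names are assumptions for this example, and units with c = 1 are used.

```python
import math


def gamma(beta):
    return 1.0 / math.sqrt(1.0 - beta ** 2)


def to_train_frame(dt, dx, beta):
    """Difference form of the Lorentz transformation from S to S' (c = 1)."""
    g = gamma(beta)
    return g * (dt - beta * dx), g * (dx - beta * dt)


L0, beta = 100.0, 0.8            # hypothetical rest length and speed
L = L0 / gamma(beta)             # contracted length measured in S

# The two end-point measurements are simultaneous in S: dt = 0, dx = L.
dt_prime, dx_prime = to_train_frame(0.0, L, beta)
print(L)          # close to 60
print(dx_prime)   # close to 100, the rest length L0
print(dt_prime)   # close to -80, i.e. -beta * L0, not zero
```

The same pair of events that S regards as simultaneous is separated by $\Delta t' = -\beta L_0$ in S', which is why using the reverse transformation naively does not give $L = \gamma L_0$.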
Problem 1
An aeroplane has a length of exactly 50 m.
(a) When it is flying at 300 m s$^{-1}$ (≈ 1000 km hr$^{-1}$), by how much does it appear to be shortened
when observed by someone on the ground?
(b) What if it is flying at $10^{8}$ m s$^{-1}$?

Problem 2
A train of length $2L_0$ is travelling at a speed $\beta$. Observer S is on the ground, and observer S'
is on the train. S' stands in the middle of the train (x' = 0), and according to him
• at t' = 0, he sends two pulses of light, one forward and one backward (Event A);
• the first pulse reaches the front of the train, and is reflected by a mirror (Event B);
• the pulse reflected from the front returns to the middle of the train (Event C);
• the second pulse reaches the back of the train, and is reflected by a mirror (Event D);
• the pulse reflected from the back returns to the middle of the train (Event E).
(a) Give the coordinates according to S', namely $t'_A, x'_A;\ t'_B, x'_B;\ \ldots;\ t'_E, x'_E$.
(b) Are B and D simultaneous? Are C and E simultaneous (i.e., do the two pulses of light
reach the middle of the train at the same time)?
(c) Find the coordinates according to S, namely $t_A, x_A;\ t_B, x_B;\ \ldots;\ t_E, x_E$, by using the Lorentz
transformation.
(d) Are B and D simultaneous? Are C and E simultaneous (i.e., do the two pulses of light
reach the middle of the train at the same time)? Discuss the relationship with the answer in
(b).
Problem 3
This Problem continues with Problem 2, but tries to analyse the situation according to S
directly, without using the Lorentz transformation. Check all your answers against Problem 2.
(a) What is the length 2L of the train according to S?
(b) Draw a sketch showing the situation at time $t_B$. (For convenience, just show the front half
of the train.) Based on this diagram, show that $V t_B + L = c\,t_B$. Hence find $t_B$.
(c) Likewise find $t_D$.
(d) Also find the times $t_C - t_B$ and $t_E - t_D$ for the return trips. (Hint: These times are
respectively equal to $t_D$ and $t_B$. Why?)
(e) Hence find $t_C$ and $t_E$. According to this calculation, do the two pulses return to the middle
of the train at the same time?

Lack of symmetry
In relativity there is supposed to be no privileged frame (such as a frame "absolutely at rest").
All reference frames and all observers are supposed to be equivalent (Figure 2).
Figure 2 ("We are equivalent")

If this is the case, then should we have both these statements:

(a) Length in S (L) < Length in S' ($L_0$)


and by reversing the roles of the two observers

(b) Length in S' ($L_0$) < Length in S (L)


Statement (a) is TRUE (as derived above). Statement (b) is FALSE. So why is there such a
lack of symmetry?

Figure 3 ("I am special")

The reason is as follows. There are three things: S, S' and the rod. The rod is at rest
in S'; it is not at rest in S. With the rod, the symmetry is destroyed. The frame S' is special.
It is moving together with the rod.
Paradox
Many of these concepts are combined in the following paradox. There is a hole of length $L_0$ on
the ground. A rod also of length $L_0$ is moving rapidly past it. The observer S is fixed to the
ground. The observer S' is moving with the rod (Figure 4).

Figure 4
According to S, the rod is contracted to $L < L_0$. So at the same time, he pushes the
two ends of the rod down (Figure 5). The rod passes through the hole.

Figure 5: As seen by S
Now what happens as seen by S'? The rod is stationary, of length $L_0$. Now the hole
is contracted to $L < L_0$ - the concept of symmetry (Figure 6a). Yet the rod must still pass
through the hole - this is an objective fact! How can this happen?

Figure 6: As seen by S'
The answer is that the two events A, B shown in Figure 5 are simultaneous in S ($\Delta t = 0$),
but are not simultaneous in S' ($\Delta t' \neq 0$) - simultaneity is not absolute. In fact, from (4.6),
$\Delta t' < 0$, i.e., A occurs before B. So the sequence of events is as shown in Figure 6a, b.
4.2 Time dilation

Again let S be an observer on the ground and S' be an observer on a train moving at $V = \beta c$.
A clock is fixed to the train. Two ticks of the clock are separated by an interval $\Delta t'$ according
to S'. What is the time interval $\Delta t$ according to S?
Again we have two (reverse) transformations that can be used:

$$\Delta t = \gamma\,(\Delta t' + \beta\,\Delta x')$$
$$\Delta t' = \gamma\,(\Delta t - \beta\,\Delta x)$$

Again both are correct. But which one is more convenient?
• $\Delta t'$ is given.
• Because the clock is fixed to S', the two ticks occur at the same x': $\Delta x' = 0$.

So the first form is more convenient, and gives

$$\Delta t = \gamma\,\Delta t'$$

The observer who is not comoving measures a longer time - time seems to be dilated.
Again there is no symmetry. S' is special. It is the comoving frame.

Twin paradox
Consider twin brothers S and S' who were initially together. Let them carry identical clocks.
S' travels at high speed to a distant galaxy, and comes back. Which of the two clocks shows a
longer time interval? In this case the two clocks are at the same place both at the beginning
and also at the end.
We know that a moving observer appears to measure shorter time. The paradox is this.
S thinks S' has been moving, so $\Delta t > \Delta t'$. But S' thinks S has been moving, so $\Delta t' > \Delta t$.
Only one of these can be true. So how is the paradox resolved?
If S' has travelled to a distant galaxy and come back, then he must have experienced
some acceleration. Acceleration is absolute; it can be determined even by a person locked in
a room. So it is an absolute fact that S' has been moving, and not S. The situation is not
symmetrical. So $\Delta t > \Delta t'$.
We have mentioned "clocks". But the aging process is just another "clock" (further
explanations below), so S has aged more.
There is clear experimental proof. The "twin brothers" are radioactive nuclei or elementary
particles. Let them have an average life-time T at rest. One group (S) is at rest. After a
time t, the remaining number is

$$N = N_0\, e^{-t/T}$$

where $N_0$ is the original number. Another group (S') is sent around a circular accelerator at
constant speed $V = \beta c$. The time elapsed according to this group of observers is only $t' = t/\gamma$.
So the remaining number is

$$N' = N_0\, e^{-t'/T} = N_0\, e^{-t/(\gamma T)}$$

If $\gamma \gg 1$, there would be very few decays. We can also say that the apparent average life-time
has been increased from T to $\gamma T$.
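
The effect on the decay law can be illustrated with a short calculation. The sketch below compares the surviving fraction $e^{-t/T}$ for a group at rest with $e^{-t/(\gamma T)}$ for a group circulating at speed $\beta$; the function name and the sample values of t, T and $\beta$ are hypothetical.

```python
import math


def surviving_fraction(t, T, beta=0.0):
    """N / N0 after laboratory time t for particles of rest life-time T moving at beta."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return math.exp(-t / (g * T))


T = 1.0                                       # rest life-time, arbitrary units
print(surviving_fraction(5.0, T))             # at rest: e^-5, under 1 percent
print(surviving_fraction(5.0, T, 0.995))      # gamma is about 10: most survive
```

After five rest life-times almost nothing is left of the group at rest, while the fast group has barely begun to decay, consistent with an apparent life-time of $\gamma T$.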

Problem 4
Muons ($\mu$) have a mean lifetime in their own rest frame of $2.2 \times 10^{-6}$ s. A beam of muons is
travelling at 0.90 c.
(a) What would be their apparent lifetime?
(b) How far would they travel (on average) before they decay?

Different clocks
In the twin paradox (including the example of muons), we find that the moving clock "slows
down". This is true for different kinds of clocks, e.g.,
• clocks made of light pulses bouncing between mirrors,
• atomic clocks based on electromagnetic oscillations associated with an atomic decay process,
• quartz watches,
• clocks such as the decay of muons which are governed by the weak interaction,
• clocks such as the decay of other particles that are governed by the strong interaction,
• biological clocks that control aging.

Can we show explicitly that each one of these "slows down"?


The simplest are light pulses bouncing between mirrors. We know the law: speed is
always c; so we can analyze the situation in detail, and indeed we find such "slowing down".
See Problems 2 and 3 for example.
The other clocks are too complicated to analyze in detail. We can adopt two approaches.
First, we simply invoke the Principle of Relativity. If, in a moving frame, a biological
clock does not "slow down" to the same extent as a clock made of light pulses, then by simply
measuring the discrepancy we can detect absolute motion. (The discrepancy can be measured
by a person locked inside a room; he does not need to look outside or refer to another observer.)
In other words, we can define a preferred frame as the one where these clocks agree. Such a
state of affairs would contradict the Principle of Relativity and is therefore not allowed.
Secondly, we can study the basic laws that govern these clocks. For example, quartz
clocks, atomic clocks and even biological clocks are controlled by electromagnetism. (Biological
processes are really chemical processes; all chemical processes are really electric interactions -
consider for example the binding of a hydrogen atom.) Thus, if we can prove that the laws
of electromagnetism agree with the Lorentz transformation, we can then ensure that all these
clocks do indeed "slow down" as predicted. Likewise, if we can prove that the laws of weak
interaction and strong interaction agree with the Lorentz transformation, those clocks would
behave in the same manner as well. In this course, we shall study the laws of electromagnetism
in relation to relativity. Weak and strong interactions will not be dealt with, but they too agree
with the special theory of relativity in the same way.
In other words, we can deal with these issues at two levels.
• We deal with one phenomenon at a time (e.g., Problems 2 and 3). We have to do a bit
of this to gain a feeling for what is involved (and there will be some more of this in the
rest of this Chapter). This is the typical approach in elementary courses. But this is
an unending process; after we have discussed N phenomena, you can always raise the
(N+1)th and ask for an explanation.
• We deal with the laws that govern these phenomena. For example, once we have dealt
with the Lorentz force law and the Maxwell equations, there is no further need to worry
about any individual phenomena involving electromagnetism.

4.3 Spacetime diagrams


In this section we adopt the notation

Diagrams

Figure 8
Purely spatial relationships are illustrated by spatial diagrams like Figure 8a. This shows 2
of the dimensions (x, y) and the third (z) is out of the page. A point P is located by its
coordinates.
Similarly, spacetime relationships are illustrated by spacetime diagrams like Figure 8b.
The t axis and one spatial axis (x) are shown; the other two are understood to be "into the
page". You should imagine rotating Figure 8b about the t axis to turn x into y and z.
An event is represented by a point P in spacetime. The point P is located by its
coordinates.
The 90° angle between the x, y axes has a physical meaning. The angle between the x, t
axes has no physical meaning. It is usually shown as 90° (Figure 8b), but it is also valid to
show it at some angle (Figure 8c).
The point O need not be the origin. Then instead of the coordinates $x^\mu$ of P, we can
refer to the difference $\Delta x^\mu$ between P and O.

Light cones

Figure 9

Construct 45° lines in the spacetime diagram, i.e., t = x, t = -x as shown by the broken lines
in Figure 9. These lines should be regarded as cones if we imagine rotating the diagram about
the t axis. They are light cones: if light is emitted from O, it travels along a path |x| = ct = t,
i.e., on the surface of the cones.

Space-like separation
Consider points such as B, B', B''. Relative to O,

$$|\Delta x| > |\Delta t| \qquad (4.10)$$


Therefore it is impossible to send a signal from O to B, B' or B'' (or from B, B', B'' to O). This
is because such a signal would need to have a speed

$$\left|\frac{\Delta x}{\Delta t}\right| > 1 \qquad (4.11)$$

which is impossible. Points in this region of spacetime can have no causal relationship with
O. Note that O cannot influence B even though B is in the future ($\Delta t > 0$); and B'' cannot
influence O even though B'' is in the past ($\Delta t < 0$).
The condition (4.11) implies that the spatial component is larger; so we call such intervals
$\Delta x^\mu$ space-like. This condition is invariant, in the sense that if $\Delta x^\mu$ is space-like, then so is
$\Delta x'^\mu$ in another frame. The best way to see this is to write (4.11) as

$$(\Delta s)^2 = -(\Delta t)^2 + (\Delta x)^2 > 0 \qquad (4.12)$$

which is clearly an invariant condition.


In contrast, conditions such as $\Delta t > 0$, $\Delta t = 0$ or $\Delta t < 0$ in the space-like region are
not invariant. In fact, given a space-like separation $\Delta x^\mu$, it is always possible to find another
frame S' in which $\Delta t' = 0$, i.e., the separation is purely spatial.

Problem 5
In reference frame S, the point B is displaced from the origin O by

(a) Draw a spacetime diagram, on it sketch the light cone, and label the point B.
(b) Another reference frame S' is travelling at $V = \beta c$ with respect to S along the x-axis. Show
that for a suitable choice of $\beta$ ($|\beta| < 1$), $\Delta t' = 0$ (in other words, B is simultaneous with O).
(c) What is the value of $\Delta x'$ in this case? Do this in two ways: (i) by explicitly using the
Lorentz transformation and (ii) by considering the invariant quantity $(\Delta s)^2$.

Proper distance
From (4.12), and also Problem 5, we arrive at the following understanding of $\Delta s$ for a space-like
interval: If $\Delta x^\mu$ is space-like, there is a frame S' in which the two events are simultaneous
($\Delta t' = 0$). The quantity $\Delta s$ is equal to the spatial distance $|\Delta x'|$ in this frame. Therefore we
sometimes call $\Delta s$ the proper distance.
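
The statement about proper distance can be tried out numerically in (1+1)d. The sketch below classifies an interval by the sign of $(\Delta s)^2$ and, for a space-like one, finds the frame speed $\beta = \Delta t/\Delta x$ for which $\Delta t' = 0$. The function names and the sample interval are hypothetical, and units with c = 1 are assumed.

```python
import math


def classify(dt, dx):
    """Classify an interval by the sign of (Delta s)^2 = -dt^2 + dx^2."""
    s2 = -dt ** 2 + dx ** 2
    if s2 > 0:
        return "space-like"
    if s2 < 0:
        return "time-like"
    return "light-like"


def simultaneous_frame(dt, dx):
    """For a space-like interval, return (beta, dx') of the frame where dt' = 0."""
    beta = dt / dx                       # makes dt' = gamma (dt - beta dx) vanish
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return beta, g * (dx - beta * dt)


dt, dx = 3.0, 5.0                        # hypothetical space-like interval
print(classify(dt, dx))                  # space-like
beta, dx_prime = simultaneous_frame(dt, dx)
print(beta, dx_prime)                    # 0.6 and close to 4.0
```

The distance found in the simultaneous frame equals $\sqrt{(\Delta x)^2 - (\Delta t)^2} = 4$, i.e. $\Delta s$, and the required $|\beta|$ is below 1 exactly because the interval is space-like.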

Problem 6
In the example of length contraction, the two events are the measurements of the positions of
the two ends of the rod. What is the proper distance between these two events? Discuss from
both frames of reference.

Time-like separation
Consider points such as A and C. Relative to O,

$$|\Delta t| > |\Delta x|$$

Therefore it is possible to send a signal from O to A, or from C to O. This is because such a
signal requires a speed

$$\left|\frac{\Delta x}{\Delta t}\right| < 1$$

Points such as A are in the future light-cone, and can be influenced by O. Points such as C are
in the past light-cone, and can influence O.
The condition (4.13) implies that the time component is larger; so we call such intervals
$\Delta x^\mu$ time-like. This condition is invariant, in the sense that if $\Delta x^\mu$ is time-like, then so is $\Delta x'^\mu$
in another frame. Again we can see this through the invariant condition

$$(\Delta s)^2 = -(\Delta t)^2 + (\Delta x)^2 < 0 \qquad (4.13)$$


In fact, given a time-like interval $\Delta x^\mu$, it is always possible to find another frame S' in
which $\Delta x' = 0$, i.e., the two events occur at the same point.

Problem 7
In the reference frame S, the point A is displaced from the origin O by

(a) Draw a spacetime diagram, on it sketch the light cone, and label the point A.
(b) Another observer S' is travelling at $V = \beta c$ with respect to S along the x-axis. Show that
for any choice of $\beta$ with $|\beta| < 1$, $\Delta t'$ remains positive.
(c) Show that for a suitable choice of $\beta$ ($|\beta| < 1$), $\Delta x' = 0$.
(d) What is the value of $\Delta t'$ in this case? Do this in two ways: (i) by explicitly using the
Lorentz transformation and (ii) by considering the invariant quantity $(\Delta s)^2$.

Another way of writing invariant interval


Remember the definition

$$(\Delta s)^2 = -(\Delta t)^2 + (\Delta x)^2$$

This does not mean that this quantity is always positive. It is positive for a space-like interval,
and negative for a time-like interval. It is therefore better to write this as

$$(\Delta s)^2 = -(\Delta \tau)^2 = -(\Delta t)^2 + (\Delta x)^2$$

We shall use $\Delta s$ for space-like intervals and $\Delta \tau$ for time-like intervals. Both of these quantities
are real. We never use $\Delta s$ for time-like intervals or $\Delta \tau$ for space-like intervals; they would be
imaginary and inconvenient.

Proper time
Let two events have a time-like separation $\Delta x^\mu$. Suppose the two events are two ticks of a
clock. Go to a frame S' in which the two events occur at the same place, i.e., as if the clock
has not moved. In other words, the clock and observer S' are moving together. The frame S'
is called the co-moving frame. The quantity $\Delta \tau$ is equal to the elapsed time $\Delta t'$ in this frame.
We call $\Delta \tau$ the proper time interval.
Consider a particle moving at a speed $V = \beta c$. Then

$$\Delta x^\mu = (\Delta t, \Delta x)$$

and

$$\Delta x = V\,\Delta t$$

So, in units c = 1 (where $V = \beta$),

$$\Delta \tau = \sqrt{(\Delta t)^2 - (\Delta x)^2} = \sqrt{1 - \beta^2}\;\Delta t = \frac{\Delta t}{\gamma}$$

Note that the proper time is always less. This relation is just the reverse of time-dilation.
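
The relation between proper time and coordinate time can be illustrated by adding up $\Delta\tau$ along a trajectory made of straight segments. The sketch below uses a hypothetical round trip (out at $\beta = 0.6$, back at $\beta = -0.6$) in units with c = 1; the function name and the numbers are assumptions for this example, and the turnaround is idealised as instantaneous.

```python
import math


def proper_time(dt, dx):
    """Delta tau = sqrt(dt^2 - dx^2) for one time-like segment (c = 1)."""
    return math.sqrt(dt ** 2 - dx ** 2)


# Hypothetical round trip as seen in S: 5 time units out, 5 time units back.
legs = [(5.0, 3.0), (5.0, -3.0)]          # (dt, dx) for each leg, |beta| = 0.6
traveller = sum(proper_time(dt, dx) for dt, dx in legs)
stay_at_home = proper_time(10.0, 0.0)

print(traveller, stay_at_home)            # 8.0 10.0
```

The travelling clock accumulates 8 units while the clock that stays at x = 0 accumulates 10, in line with $\Delta\tau = \Delta t/\gamma$ on each leg with $\gamma = 1.25$.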

Light-like separation
Finally, a point on the light-cone satisfies

$$|\Delta t| = |\Delta x| \qquad (4.17)$$
Such an interval $\Delta x^\mu$ is said to be light-like and is described by the invariant condition

$$(\Delta s)^2 = -(\Delta \tau)^2 = -(\Delta t)^2 + (\Delta x)^2 = 0 \qquad (4.18)$$

Particle trajectories
What do particle trajectories look like on a spacetime diagram? Refer, for example, to Figure
10. Actually, these are nothing new: they are just like displacement-time graphs you learnt
about in secondary school, except that they are turned around by 90° - the t axis is drawn
vertically.
First consider a uniformly moving particle, say starting from O. Since $\Delta x/\Delta t$ = constant,
the trajectory is a straight line in the spacetime diagram. How about the slope? Since
it cannot move faster than light

$$\left|\frac{\Delta x}{\Delta t}\right| < 1$$

So it must move along a time-like trajectory, as shown in Figure 10a. The slope is larger than
45°.
More generally, if the motion is not uniform, then the trajectory is not a straight line
(Figure 10b). But every small section is nearly straight, and each section has a slope of more
than 45°.

Relation between reference frames


First of all consider rotations, e.g.,

$x' = \cos\alpha\; x + \sin\alpha\; y$
$y' = -\sin\alpha\; x + \cos\alpha\; y$

where $0 < \alpha < \pi/2$. The $x'$-axis is defined by $y' = 0$,

$\frac{y}{x} = \tan\alpha$

This is a line lying in the first quadrant.
Likewise, the $y'$-axis is defined by $x' = 0$,

$\frac{y}{x} = -\cot\alpha$    (4.21)

This is a line which lies in the second quadrant. Moreover, the lines defined by (4.21) and (4.22)
are perpendicular (see Problem below). So the situation is as shown in Figure 11a.
Similarly consider Lorentz transformations, e.g.,

$t' = \cosh\alpha\; t - \sinh\alpha\; x$
$x' = -\sinh\alpha\; t + \cosh\alpha\; x$    (4.22)

where $0 < \alpha$. Note that the two sinh terms have the same sign. The $x'$ axis is defined by $t' = 0$,

$0 = t' = \cosh\alpha\; t - \sinh\alpha\; x$

$\frac{t}{x} = \tanh\alpha$

This is a line lying in the first quadrant.
Likewise the $t'$ axis is defined by $x' = 0$,

$\frac{t}{x} = \coth\alpha$    (4.24)

This is also a line lying in the first quadrant. The lines (4.24) and (4.25) do not make a right
angle. But as we stated earlier, the angle between the t and x axes has no physical meaning,
and it does not matter. The $t'$ and $x'$ axes are therefore as shown in Figure 11b.

Problem 8
(a) The positive $x'$-axis is defined by $y' = 0$ and $x' > 0$. Show that the line lies in the first (and
not the third) quadrant. Likewise determine the quadrant assignments for the lines in (4.21),
(4.22), (4.24), (4.25).
(b) Show that the lines (4.21) and (4.22) are perpendicular.
(c) Show that the lines (4.24) and (4.25) are not perpendicular and that the angle between
(4.23) and the x-axis equals the angle $\theta_2$ between (4.24) and the t-axis.
(d) What happens to the $t'$ and $x'$ axes in Figure 11b if $\beta \to 1$?
(e) Re-draw Figure 11b if $0 > \alpha$.

4.4 Transformation of velocity

Figure 12

Let the frame $S'$ move with velocity $V = \beta c$ with respect to $S$. Let a particle P have velocity
$v'$ as seen by $S'$. What is its velocity $v$ as seen by $S$? This is the problem of the transformation
of velocity. For simplicity we consider all velocities along the x-direction.
According to Galilean transformation (Figure 12),

$v = \frac{dx}{dt} , \quad v' = \frac{dx'}{dt}$    (same $t$!)

$v = v' + V$    (Galilean)    (4.26)

So this problem is also called the addition of velocities. (Incidentally, we consider the
transformation $v' \to v$ here because the reverse would be the subtraction of velocities, which is a little
bit less convenient.)

Derivation using Lorentz transformation


The transformation law (4.26) is not correct according to relativity. We present two derivations
of the correct law. The first makes direct use of the Lorentz transformation. Consider two
observations of the particle, and in particular the separation $\Delta x^\mu$.
$\Delta x = \gamma(\Delta x' + \beta\Delta t')$
$\Delta t = \gamma(\beta\Delta x' + \Delta t')$    (4.27)
Note that $t$ and $t'$ are different. The sign of $\beta$ on the right hand side can be easily deduced
with reference to (4.26). Divide the equations in (4.27).

$\frac{\Delta x}{\Delta t} = \frac{\Delta x' + \beta\Delta t'}{\beta\Delta x' + \Delta t'} = \frac{\Delta x'/\Delta t' + \beta}{\beta\,\Delta x'/\Delta t' + 1}$

However
$\frac{\Delta x}{\Delta t} = v , \quad \frac{\Delta x'}{\Delta t'} = v'$
and in our units of $c = 1$, $\beta = V$:

$v = \frac{v' + V}{1 + v'V}$    (4.28)

If we want to restore the factors of c, then obviously

$v = \frac{v' + V}{1 + v'V/c^2}$    (4.29)

If either the velocity of the $S'$ frame ($V$) or the velocity of the particle ($v'$) is much smaller than
c, then (4.29) reduces to (4.26). Thus the law of addition of velocities does not contradict our
"common sense", which is based on experience and experiments at low speeds.
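
The two limits just described can be checked numerically. The sketch below implements the relativistic law $v = (v' + V)/(1 + v'V/c^2)$ next to the Galilean law $v = v' + V$; the function names and the sample speeds are illustrative assumptions, and speeds are given as fractions of c by default.

```python
def add_velocities(v_prime, V, c=1.0):
    """Relativistic addition of collinear velocities: the velocity seen by S,
    given the velocity v_prime seen by S' and the velocity V of S'."""
    return (v_prime + V) / (1.0 + v_prime * V / c**2)

def add_velocities_galilean(v_prime, V):
    """Galilean addition of velocities."""
    return v_prime + V

# Low speeds: the two laws agree to high accuracy.
slow = add_velocities(1.0e-4, 2.0e-4)
# High speeds: the relativistic result stays below c.
fast = add_velocities(0.5, 0.5)
# A light signal moves at c in every frame.
light = add_velocities(1.0, 0.5)

print(slow, add_velocities_galilean(1.0e-4, 2.0e-4))
print(fast, add_velocities_galilean(0.5, 0.5))
print(light)
```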

Problem 9
(a) An aeroplane is flying at 300 m s$^{-1}$ ($\sim 10^3$ km hr$^{-1}$). It sees a second aeroplane flying at
300 m s$^{-1}$ relative to itself, in the same direction. Find the velocity of the second aeroplane
according to a person on the ground, according to (i) Galilean transformations, (ii) Lorentz
transformations. Find the percentage difference in the two results.
(b) Repeat the above if these are space-ships and the two given velocities are $3.0 \times 10^7$ m s$^{-1}$ ($0.1c$)
instead of 300 m s$^{-1}$.
(c) Repeat again if the two given velocities are $2.0 \times 10^8$ m s$^{-1}$ ($2c/3$).

Problem 10
Let $V$ be fixed (say $0.5c$). Plot $v$ vs $v'$. Hence, or otherwise, show that (4.29) cannot lead to a
velocity larger than c.

Problem 11
Start with (4.28) and solve for $v'$ in terms of $v$ and $V$. Show that the result can be obtained
by $v \leftrightarrow v'$, $V \to -V$.
Derivation using two transformations
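Recall from (3.25) that a Lorentz transformation is represented by

$[L] = \begin{bmatrix} \cosh\alpha & -\sinh\alpha \\ -\sinh\alpha & \cosh\alpha \end{bmatrix}$    (4.30)

where
$\tanh\alpha = \beta$    (4.31)
Now consider two Lorentz transformations in succession, with relative velocities $\beta_1$, $\beta_2$. The
resultant should be equivalent to a single transformation with relative velocity $\beta$ obtained by
"adding" $\beta_1$ and $\beta_2$. In terms of (4.30)

$[L(\alpha)] = [L(\alpha_2)]\,[L(\alpha_1)]$

Note the analogy with the case of rotations in (2.9).

Writing this out

$\begin{bmatrix} \cosh\alpha & -\sinh\alpha \\ -\sinh\alpha & \cosh\alpha \end{bmatrix} = \begin{bmatrix} \cosh\alpha_2 & -\sinh\alpha_2 \\ -\sinh\alpha_2 & \cosh\alpha_2 \end{bmatrix} \begin{bmatrix} \cosh\alpha_1 & -\sinh\alpha_1 \\ -\sinh\alpha_1 & \cosh\alpha_1 \end{bmatrix}$

Take the 00 component

$\cosh\alpha = \cosh\alpha_2\cosh\alpha_1 + \sinh\alpha_2\sinh\alpha_1 = \cosh(\alpha_1 + \alpha_2)$

Hence we see that in composing two transformations, the parameter $\alpha$ simply adds, exactly like
the angle in rotation. This makes the parameter $\alpha$ very convenient:

$\alpha = \alpha_1 + \alpha_2$

Now use (4.31) and the addition law for tanh:

$\beta = \tanh(\alpha_1 + \alpha_2) = \frac{\tanh\alpha_1 + \tanh\alpha_2}{1 + \tanh\alpha_1\tanh\alpha_2} = \frac{\beta_1 + \beta_2}{1 + \beta_1\beta_2}$

which is exactly the same as the addition law (4.28). This derivation has the advantage that
it is obvious that $|\beta| < 1$, because it is $\tanh\alpha$. A second advantage is seen when making many
transformations. Also compare with Chapter 2, Problem 2.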
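
The additivity of $\alpha$ can be verified numerically. The sketch below builds the 2×2 matrix acting on $(t, x)$ with entries $\cosh\alpha$ and $-\sinh\alpha$, multiplies two such matrices, and compares the result with a single matrix of parameter $\alpha_1 + \alpha_2$. The helper names `boost` and `matmul` and the sample values of $\beta_1$, $\beta_2$ are illustrative assumptions.

```python
import math

def boost(alpha):
    """2x2 Lorentz matrix acting on (t, x), with tanh(alpha) = beta."""
    ch, sh = math.cosh(alpha), math.sinh(alpha)
    return [[ch, -sh], [-sh, ch]]

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

beta1, beta2 = 0.5, 0.7
a1, a2 = math.atanh(beta1), math.atanh(beta2)

product = matmul(boost(a2), boost(a1))   # two transformations in succession
single = boost(a1 + a2)                  # one transformation with added alpha

beta_from_alpha = math.tanh(a1 + a2)
beta_from_law = (beta1 + beta2) / (1.0 + beta1 * beta2)
print(beta_from_alpha, beta_from_law)
```

Because the result is a tanh, it stays below 1 however large the two input velocities are.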

Multiple addition of velocities


To illustrate the addition of velocities, and especially the use of the "angle" $\alpha$, we consider the
following example. Let the velocity of a spaceship be $V$, and let $V = 0$ initially. The proper
acceleration is $a$ = constant, in the following sense. At any instant $t$, let there be an inertial
observer $S'$ with velocity $V$, instantaneously co-moving with the spaceship. According to this
observer $S'$, after a time $\Delta t'$, the spaceship has gained a velocity $\Delta v' = a\Delta t'$. Given that $a$ =
constant, what is $V$ at any time?
First of all we note that $\Delta t'$ is measured in the co-moving frame, so it is really the proper
time interval $\Delta\tau$. So it is convenient to measure all times in terms of $\tau$, and the question is:
What is $V(\tau)$?

Using law of addition of velocities


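Let the velocity at $\tau + \Delta\tau$ be $V + \Delta V$. This velocity is obtained by "adding" the velocity of
the $S'$ frame, $V$, and the velocity $a\Delta\tau$ relative to $S'$:

$V + \Delta V = \frac{V + a\Delta\tau}{1 + V a\Delta\tau} \approx V + (1 - V^2)\,a\Delta\tau$

$\frac{dV}{d\tau} = a(1 - V^2)$    (4.35)

By inspection, it is already clear that $V$ can never reach 1 in any finite $\tau$. The reason is that
when $V \to 1$, RHS $\to 0$, so $dV/d\tau \to 0$ and $V$ no longer increases. More precisely, we can solve
(4.35).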

Problem 12
Integrate the above equation with the initial condition that $V = 0$ when $\tau = 0$, and show that
the result is

$V = \frac{e^{2a\tau} - 1}{e^{2a\tau} + 1} = \tanh a\tau$    (4.36)

Note two limits. (a) For $\tau \to 0$, $V \approx a\tau$, which is the expected Galilean result. (b) For $\tau \to \infty$,
$V \to 1$.
Using the "angle" $\alpha$
We consider a series of co-moving frames at different times. The transformation from one to
the next involves increasing $\alpha$ by $\Delta\alpha$, where

$\tanh\Delta\alpha = \beta = a\Delta\tau$

Since $\Delta\tau \to 0$, we get
$\Delta\alpha = a\Delta\tau$
Since $\alpha$ is additive, the overall result is equivalent to

$\alpha = a\tau$

and the final $V$ is

$V = \tanh\alpha = \tanh a\tau$
in agreement with (4.36), but the mathematics is much simpler.
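
The same result can be reproduced by brute force: apply the law of addition of velocities many times, each time adding a small velocity $a\Delta\tau$ in the instantaneous co-moving frame, and compare the outcome with $\tanh a\tau$. This is only a numerical sketch; the function name, the step count and the sample values of $a$ and $\tau$ are illustrative assumptions (units with c = 1).

```python
import math

def velocity_by_repeated_addition(a, tau, steps=10000):
    """Velocity after proper time tau under constant proper acceleration a,
    built up from many small relativistic velocity additions."""
    d_tau = tau / steps
    V = 0.0
    for _ in range(steps):
        dv = a * d_tau                 # velocity gained in the co-moving frame
        V = (V + dv) / (1.0 + V * dv)  # relativistic addition of V and dv
    return V

a, tau = 1.0, 2.0
V_numeric = velocity_by_repeated_addition(a, tau)
V_exact = math.tanh(a * tau)
print(V_numeric, V_exact)
```

The stepwise value approaches $\tanh a\tau$ as the number of steps increases, and never reaches 1.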

The displacement $\Delta x^\mu = (\Delta t, \Delta\mathbf{x})$ transforms like a 4-vector, i.e., like (3.9). But the velocity
transforms in a complicated manner because we divide by $\Delta t$ (see (4.27)-(4.28)), and $\Delta t$ is not
an invariant. If we divide by an invariant quantity instead, i.e., a quantity that is the same in
every frame, the result would again transform like a 4-vector. The obvious choice is the proper
time $\Delta\tau$, and we are led to define the 4-velocity $u^\mu$ by

$u^\mu = \frac{\Delta x^\mu}{\Delta\tau}$

Or, taking infinitesimal intervals

$u^\mu = \frac{dx^\mu}{d\tau}$

But using (4.16), we get

$u^\mu = \gamma\,\frac{dx^\mu}{dt}$

In particular, the components are

$u^0 = \gamma , \quad u^i = \gamma v^i$

Hence

$u^\mu = \gamma\,(1, \mathbf{v})$

Problem 13
Since $u^\mu$ is a 4-vector, its "length"

$u^\mu u_\mu = \eta_{\mu\nu} u^\mu u^\nu$

must be an invariant. What is its value?

Ch4-2.tex; September 29, 1997


5 Mathematics of Four-Vectors
In this chapter we gather in one place the mathematics of 4-vectors. We start by reviewing the
analogous mathematics of 3-vectors.

5.1 Mathematics of three-vectors

A 3-scalar is any quantity that remains unchanged under a rotation of axes. Examples: mass,
temperature, electric potential, mass density, time.

Basic 3-vector
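The basic 3-vector $\Delta\mathbf{x}$ is a line joining two neighbouring points, in other words the displacement.

Figure 1

Its cartesian coordinates are

$\Delta x^i = (\Delta x^1, \Delta x^2, \Delta x^3) = (\Delta x, \Delta y, \Delta z)$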
We discuss a short displacement rather than the coordinate x (which would be a long displace-
ment starting from the origin) because eventually we wish to generalize to curved space. In
curved space (e.g., the surface of the earth), a short displacement is a straight arrow, and is a
vector. A long displacement is not a straight arrow, and cannot be thought of as a vector. We
shall come back to this point later.

Transformation laws
Under a rotation of axes, the components change according to

$\Delta x'^i = R^{ij}\Delta x^j , \quad [\Delta x'] = [R][\Delta x]$
where the matrix [R] is independent of the vector.
Other 3-vectors
Any three quantities

$v^i = (v^1, v^2, v^3)$
which transform in the same way, i.e.,

$v'^i = R^{ij}v^j , \quad [v'] = [R][v]$    (5.4)
are a 3-vector by definition. Obviously 3-vectors may be obtained by
• adding or subtracting 3-vectors; or
• multiplying or dividing a 3-vector by a 3-scalar (or invariant).
The time elapsed, $\Delta t$, is a 3-scalar; so is the mass $m$. So the velocity and the momentum

$\mathbf{v} = \frac{\Delta\mathbf{x}}{\Delta t} , \quad \mathbf{p} = m\mathbf{v}$

are also 3-vectors.

Length of a vector
The length of a displacement vector $\Delta\mathbf{x}$ is $\Delta s$:

$(\Delta s)^2 = \Delta\mathbf{x}\cdot\Delta\mathbf{x} = \Delta x^i\Delta x^i$
The lengths of other vectors are defined in the same way.

Condition on [R]
Since the length has to be an invariant (i.e., a 3-scalar), a condition is imposed on [R]. From
(2.6)

$R^{ij}R^{ik} = \delta^{jk} , \quad [R^T][R] = [I]$    (5.7)
To anticipate some other development, we can write this in a more complicated way as

$\delta^{il}R^{ij}R^{lk} = \delta^{jk} , \quad [R^T][I][R] = [I]$    (5.8)

Dot product
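Consider a vector $\mathbf{z} = \mathbf{x} + \alpha\mathbf{y}$, where $\alpha$ is an arbitrary scalar. Now the following is a 3-scalar

$\mathbf{z}\cdot\mathbf{z} = \mathbf{x}\cdot\mathbf{x} + 2\alpha\,\mathbf{x}\cdot\mathbf{y} + \alpha^2\,\mathbf{y}\cdot\mathbf{y}$

where

$\mathbf{x}\cdot\mathbf{y} = x^i y^i$    (5.9)
Since this is a 3-scalar for any $\alpha$, it follows that $\mathbf{x}\cdot\mathbf{y}$ is also a 3-scalar, called the dot product.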

Basic rank-2 tensor
Let $\mathbf{x}$, $\mathbf{y}$ be two vectors and define the following nine quantities as the basic rank-2 tensor.

$t^{ij} = x^i y^j$

Its transformation law is

$t'^{ij} = R^{ik}R^{jl}\,t^{kl}$    (5.11)

General rank-2 tensor
Any 9 quantities $t^{ij}$ which transform in this way are called a rank-2 tensor.

Higher rank tensors
A rank 3 tensor is $3^3$ quantities which transform as
$t'^{ijk} = R^{il}R^{jm}R^{kn}\,t^{lmn}$

The definitions for higher ranks are similar.

$\delta^{ij}$ is a tensor
We cannot simply write down any $3^2$ quantities $t^{ij}$ and say it is a tensor. We must check the
transformation laws. Consider the Kronecker $\delta$. In every frame

$\delta^{ij} = 1$ if $i = j$, and $\delta^{ij} = 0$ otherwise

In other words $\delta'^{ij} = \delta^{ij}$. Is it a rank 2 tensor? According to (5.11), we have to check

$\delta'^{ij} = \delta^{ij} \overset{?}{=} R^{ik}R^{jl}\,\delta^{kl}$    (5.14)
But this condition is exactly the same as (5.8) (up to a trivial relabelling of indices). So $\delta^{ij}$ is
indeed a tensor.

Example of a rank 2 tensor


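Consider a mass $m$ at a position $\mathbf{x}$, and

$I^{ij} = m\,(x^2\delta^{ij} - x^i x^j)$

Because
• $x^i x^j$ is the fundamental rank 2 tensor
• $\delta^{ij}$ is a tensor
• $x^2 = |\mathbf{x}|^2$ is a scalar
so $I^{ij}$ is a rank 2 tensor.
More generally, if there are masses $m_\alpha$ at positions $\mathbf{x}_\alpha$, $\alpha = 1, \cdots, N$, and

$I^{ij} = \sum_\alpha m_\alpha\,(x_\alpha^2\,\delta^{ij} - x_\alpha^i x_\alpha^j)$    (5.16)

then this is also a tensor. In fact this is the moment of inertia tensor.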
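
The tensor property can be checked numerically: compute the moment of inertia from rotated positions, and compare with the rank-2 law $t'^{ij} = R^{ik}R^{jl}t^{kl}$ applied to the original moment of inertia. The sketch below uses a rotation about the z-axis; the helper names, masses and positions are illustrative assumptions.

```python
import math

def rotation_z(alpha):
    """Rotation of axes about z: x' = cos(a) x + sin(a) y, y' = -sin(a) x + cos(a) y."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def rotate_vector(R, x):
    """Components of a vector in the rotated frame: x'^i = R^{ij} x^j."""
    return [sum(R[i][j] * x[j] for j in range(3)) for i in range(3)]

def rotate_tensor(R, t):
    """Rank-2 transformation law: t'^{ij} = R^{ik} R^{jl} t^{kl}."""
    return [[sum(R[i][k] * R[j][l] * t[k][l] for k in range(3) for l in range(3))
             for j in range(3)] for i in range(3)]

def inertia(masses, positions):
    """Moment of inertia: sum over masses of m (x^2 delta^{ij} - x^i x^j)."""
    I = [[0.0] * 3 for _ in range(3)]
    for m, x in zip(masses, positions):
        r2 = sum(c * c for c in x)
        for i in range(3):
            for j in range(3):
                delta = 1.0 if i == j else 0.0
                I[i][j] += m * (r2 * delta - x[i] * x[j])
    return I

masses = [1.0, 2.0, 0.5]
positions = [[1.0, 0.0, 2.0], [-1.0, 3.0, 0.5], [0.2, -0.7, 1.1]]
R = rotation_z(0.4)

# Route 1: rotate the positions, then compute the moment of inertia.
I_from_rotated_positions = inertia(masses, [rotate_vector(R, x) for x in positions])
# Route 2: compute the moment of inertia, then transform it as a tensor.
I_transformed = rotate_tensor(R, inertia(masses, positions))

max_diff = max(abs(I_from_rotated_positions[i][j] - I_transformed[i][j])
               for i in range(3) for j in range(3))
print(max_diff)
```

The two routes agree to rounding error, which is what it means for $I^{ij}$ to transform as a rank-2 tensor.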

Contraction theorem
Instead of stating this theorem generally, we consider an example. Let $t^{ij}$ be a tensor and $x^j$
be a vector. Then

$y^i = t^{ij}x^j$    (5.17)

is a vector. In other words, the dummy index $j$ is contracted away, leaving a free index $i$. The
theorem states that this remaining free index $i$ transforms like a vector. To prove this

$y'^i = t'^{ij}x'^j = R^{ik}R^{jl}\,t^{kl}\,R^{jm}x^m = R^{ik}\,\delta^{lm}\,t^{kl}x^m = R^{ik}\,t^{kl}x^l = R^{ik}y^k$

Thus $y^i$ transforms like a vector. The generalization to more or fewer indices is obvious. The
case of contracting until there are no indices left, e.g., in analogy to (5.17)

$y = t^j x^j$
giving a scalar, is already known.

Reverse theorem
The reverse theorem is also true. Again we deal with an example. Let $t^{ij}$ be a tensor and $x^j$
be any three numbers. Further suppose that it is known that

$y^i = t^{ij}x^j$    (5.20)
holds in every frame and $y^i$ transforms like a vector, then $x^j$ is also a vector. To prove this,
first we have

$y'^i = t'^{ij}x'^j = R^{in}R^{jm}\,t^{nm}\,x'^j$

But we also have

$y'^i = R^{in}y^n = R^{in}\,t^{nm}\,x^m$

By comparing the last two equations, and if these hold for several different tensors $t$ (we leave
it as an exercise to determine how many different $t$'s are required), then the r.h.s. must be
equal even if we peel off the factor $t^{nm}$, so that $R^{jm}x'^j = x^m$, or equivalently

$x'^i = R^{im}x^m$

This then shows that $x^i$ transforms as a 3-vector.

Problem 1
It is given that

and that $\varphi$ is a scalar, $x^i$ is a vector. Prove that $y^i$ is a vector. State the necessary conditions
clearly. (As stated above, the conditions are not quite sufficient.)

5.2 Mathematics of four-vectors

A 4-scalar is any quantity that remains unchanged under both a rotation of axes and a trans-
formation to another uniformly moving frame. Example: the mass (also called the rest mass).
Note that time is not a 4-scalar.

Basic 4-vector
The basic 4-vector $\Delta\vec{x}$ is a line joining two neighbouring points in spacetime, in other words
the spacetime displacement.

Its cartesian coordinates are

$\Delta x^\mu = (\Delta x^0, \Delta x^1, \Delta x^2, \Delta x^3) = (\Delta t, \Delta\mathbf{x})$

Transformation laws
Under a transformation, the components change according to

$\Delta x'^\mu = L^\mu{}_\nu\,\Delta x^\nu , \quad [\Delta x'] = [L][\Delta x]$    (5.22)
where the matrix [L] is independent of the vector.
Note that rotations can be regarded as a special case of Lorentz transformations, with

$[L] = \begin{bmatrix} 1 & 0 \\ 0 & [R] \end{bmatrix}$

Other 4-vectors
Any four quantities

$v^\mu = (v^0, v^1, v^2, v^3)$
which transform in the same way, i.e.,

$v'^\mu = L^\mu{}_\nu\,v^\nu , \quad [v'] = [L][v]$
are a 4-vector by definition. Obviously 4-vectors may be obtained by
• adding or subtracting 4-vectors
• multiplying or dividing a 4-vector by a 4-scalar (or invariant)

"Length" of a vector
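The length of a displacement vector $\Delta\vec{x}$ is defined through

$(\Delta s)^2 = \eta_{\mu\nu}\,\Delta x^\mu\Delta x^\nu = -(\Delta t)^2 + (\Delta\mathbf{x})^2$

The length of other 4-vectors may be defined in the same way. Note that $(\Delta s)^2$ may be +, -
or 0.

Metric
The matrix $\eta_{\mu\nu}$ is defined by

$[\eta_{\mu\nu}] = \mathrm{diag}(-1, 1, 1, 1)$    (5.26)

Lower indices
Define

$\Delta x_\mu = \eta_{\mu\nu}\,\Delta x^\nu , \quad \Delta x^\mu = \eta^{\mu\nu}\,\Delta x_\nu$    (5.27)

In (5.27), $\eta^{\mu\nu} = \eta_{\mu\nu}$ is the inverse matrix. The definition (5.27) may be applied to any vector,
and in fact also to tensors. We adopt the following names.

$v^\mu$ = $\binom{1}{0}$ vector or contravariant vector

$v_\mu$ = $\binom{0}{1}$ vector or covariant vector
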
Condition on [L]
Since the length has to be an invariant (i.e., a 4-scalar), a condition is imposed on [L]. From
(3.23) and (3.24)

$\eta_{\mu\nu}\,L^\mu{}_\rho\,L^\nu{}_\sigma = \eta_{\rho\sigma} , \quad [L^T][\eta][L] = [\eta]$    (5.28)

We do not need to specify whether the indices in the matrix form are up or down. The rules
are as follows.
• The "outer" indices must be consistent, e.g., we choose them to be all down.

$\{[L^T][\eta][L]\}_{\rho\sigma} = [\eta]_{\rho\sigma}$

• The "inner" indices must be paired, one up, one down.

$[L^T]_\rho{}^\mu\,[\eta]_{\mu\nu}\,[L]^\nu{}_\sigma = [\eta]_{\rho\sigma}$
Dot product
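Consider a 4-vector $\vec{z} = \vec{x} + \alpha\vec{y}$, where $\alpha$ is an arbitrary scalar. Now the following is a 4-scalar

$\vec{z}\cdot\vec{z} = \vec{x}\cdot\vec{x} + 2\alpha\,\vec{x}\cdot\vec{y} + \alpha^2\,\vec{y}\cdot\vec{y}$

where

$\vec{x}\cdot\vec{y} = \eta_{\mu\nu}\,x^\mu y^\nu$    (5.29)
Since this is a 4-scalar for any $\alpha$, it follows that $\vec{x}\cdot\vec{y}$ is also a 4-scalar, called the dot product.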

Basic rank-2 tensor


Let $\vec{x}$, $\vec{y}$ be two 4-vectors and define the 16 quantities as the basic rank 2 tensor:

$t^{\mu\nu} = x^\mu y^\nu$

To be specific, this is a $\binom{2}{0}$ tensor, and we can likewise define

$t_\mu{}^\nu = x_\mu y^\nu$    $\binom{1}{1}$ tensor

$t_{\mu\nu} = x_\mu y_\nu$    $\binom{0}{2}$ tensor
etc. These are related to each other through the raising or lowering of indices by $\eta^{\mu\nu}$ and $\eta_{\mu\nu}$.
The transformation law for $t^{\mu\nu}$ is

$t'^{\mu\nu} = L^\mu{}_\rho\,L^\nu{}_\sigma\,t^{\rho\sigma}$

General rank-2 tensor

Any 16 quantities $t^{\mu\nu}$ which transform in this way are called a rank-2 tensor.

Higher rank tensors

A rank 3 tensor is $4^3$ quantities which transform as

$t'^{\mu\nu\rho} = L^\mu{}_\alpha\,L^\nu{}_\beta\,L^\rho{}_\gamma\,t^{\alpha\beta\gamma}$

The definitions for higher ranks, and for lower indices, are similar.

Transformation law for covariant index


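How does a covariant (i.e., lower) index transform? We can view this in two ways.

$\Delta x'_\mu = \eta_{\mu\nu}\,\Delta x'^\nu$    (definition of lower index)
$\qquad = \eta_{\mu\nu}\,L^\nu{}_\rho\,\Delta x^\rho$    (transformation of $\Delta x^\rho$)
$\qquad = \eta_{\mu\nu}\,L^\nu{}_\rho\,\eta^{\rho\sigma}\,\Delta x_\sigma$    (definition of lower index)    (5.34)

The first way to view (5.34) is that the two $[\eta]$ factors simply raise and lower the indices on
[L], i.e.,

$\eta_{\mu\nu}\,L^\nu{}_\rho\,\eta^{\rho\sigma} = L_\mu{}^\sigma$    (5.35)
Hence

$\Delta x'_\mu = L_\mu{}^\sigma\,\Delta x_\sigma$

In other words, transform by [L]; keep indices in the usual way.
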

The second way is to rewrite (5.34) as
Now the matrix in { } is, using the fact that $[\eta^T] = [\eta]$,

In the above, we have used two relations:


From (5.28)

Also

Put (5.38) into (5.37)

Thus: transform by $[L^{-1}]$ rather than $[L]$, but on the right.


The second point of view is particularly convenient when we want to show that con-
tracting an upper index with a lower index leads to an invariant.

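$\eta_{\mu\nu}$ is a $\binom{0}{2}$ tensor

If $\eta_{\mu\nu}$, given by (5.26) in every frame, is to be a $\binom{0}{2}$ tensor, we must have

$\eta_{\mu\nu} = L_\mu{}^\rho\,L_\nu{}^\sigma\,\eta_{\rho\sigma}$    (5.40)

Problem 2
Show that (5.40) follows from (5.28). Also show that $\eta^{\mu\nu}$ is a $\binom{2}{0}$ tensor.

The tensor $\eta^\mu{}_\nu$
What happens if we raise one index in $\eta_{\mu\nu}$? In general

$t^\mu{}_\nu = \eta^{\mu\rho}\,t_{\rho\nu}$

So putting $t \to \eta$

$\eta^\mu{}_\nu = \eta^{\mu\rho}\,\eta_{\rho\nu}$    (5.41)

Problem 3
Show that $\eta^\mu{}_\nu$ as defined by (5.41) is identical with $\delta^\mu{}_\nu$. Hence $\delta^\mu{}_\nu$ is a $\binom{1}{1}$ tensor.
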
Contraction theorem

Again we consider an example. Let $t^{\mu\nu}$ be a $\binom{2}{0}$ tensor and $x_\nu$ be a $\binom{0}{1}$ tensor, i.e., a
covariant vector. Then

$y^\mu = t^{\mu\nu}x_\nu$    (5.42)

is a $\binom{1}{0}$ vector, i.e., a contravariant vector.

The proof is similar to the case of 3-vectors and is left as an exercise.

Problem 4
Prove the reverse theorem: If in (5.42), it is known that $t^{\mu\nu}$ is a $\binom{2}{0}$ tensor and $y^\mu$ is a
$\binom{1}{0}$ tensor, then $x_\nu$ is a $\binom{0}{1}$ tensor. Also state clearly the conditions for the validity of this
theorem.

Summary
Although at first glance the mathematics of 4-vectors looks slightly complicated, the index
notation automatically keeps track of everything. This is really all that needs to be remembered.

5.3 Scalar, vector and tensor fields


A field is a quantity that depends on the position, i.e., a function of $x^i$ ($i = 1,2,3$) in 3-space,
or a function of $x^\mu$ ($\mu = 0,1,2,3$) in 4-d spacetime.

Scalar field in 3-space

Consider for example the electrostatic potential $\varphi$. It is a scalar field. By this we mean:
• Field: to every point P there is a $\varphi(P)$.
• Scalar: $\varphi(P)$ has the same value in all frames,

$\varphi(P) = \varphi'(P)$    (5.44)

Figure 3

Different functional forms

Suppose the point P has coordinates $(x^1, x^2, x^3)$ in frame $S$, and coordinates $(x'^1, x'^2, x'^3)$ in $S'$,
then (5.44) can be written more explicitly as

$\varphi(x^1, x^2, x^3) = \varphi'(x'^1, x'^2, x'^3)$

Example 1
The electrostatic potential is

Under a 45° rotation of axes

So $\varphi$ and $\varphi'$ have the same value, but different functional forms.

Example of 3-scalar fields


The following are some examples: electrostatic potential, temperature, pressure, fluid density.

Scalar field in spacetime
In the same way, suppose there is a quantity $\varphi$ such that:
• Field: to every spacetime point P (i.e., event), there is a number $\varphi(P)$.
• Scalar: $\varphi(P)$ has the same value in all frames.

In this case, by "all frames" we mean all frames related by a Lorentz transformation. Rotations
may be considered special cases of Lorentz transformation.
In exactly the same way, a scalar field in spacetime would have the same value, but
different functional forms, in different reference frames.
There is no known example of classical scalar fields. There has to be a special reason
(roughly speaking some gauge symmetry) in order for a field to be long-ranged. This special
reason holds for some vector and tensor fields, but cannot hold for scalar fields. If the field is
long-ranged, the potential goes as $1/r$ and the force goes as $1/r^2$; it can be "felt" far away and
detected classically. If the field is short-ranged, the potential goes as $(1/r)\,e^{-r/\lambda}$, where $\lambda$ is
typically of nuclear dimension. Therefore such a field would be difficult to detect classically.

Vector fields
We skip 3-vector fields and come directly to 4-vector fields. Suppose at each point P in
spacetime (i.e., each event), there are 4 quantities

$A^\mu(P) = (A^0(P), A^1(P), A^2(P), A^3(P))$
such that under a coordinate transformation

$A'^\mu(P) = L^\mu{}_\nu\,A^\nu(P)$    (5.45)

where [L] is the transformation matrix for coordinate displacements. Then $A^\mu$ is said to be a
4-vector field.
In terms of coordinates, (5.45) becomes

$A'^\mu(x'^0, x'^1, x'^2, x'^3) = L^\mu{}_\nu\,A^\nu(x^0, x^1, x^2, x^3)$    (5.46)
Again, the functional forms can be quite different.
The most important example is the 4-vector potential in electromagnetism:

$A^\mu = (\varphi, \mathbf{A})$
where $\varphi$ is the scalar potential and $\mathbf{A}$ is the vector potential, i.e.,

$\mathbf{B} = \nabla\times\mathbf{A}$    (5.48)
We can either try to check that $(\varphi, \mathbf{A})$ as defined by (5.48) transforms like (5.46), or
alternatively, we can postulate that there is a 4-vector potential $A^\mu$ and show that (5.48) follows.
This is what we shall do later in this course.

Tensor fields
Suppose at each point P in spacetime (i.e., each event), there are $4^2$ quantities

$A^{\mu\nu}(P) , \quad \mu, \nu = 0,1,2,3$

such that under a coordinate transformation

$A'^{\mu\nu}(P) = L^\mu{}_\rho\,L^\nu{}_\sigma\,A^{\rho\sigma}(P)$    (5.49)

Then $A^{\mu\nu}$ is said to be a $\binom{2}{0}$ tensor field. Tensor fields of higher rank are defined in a similar
manner.
The most important example is the gravitational field in relativity. Weak fields can be
regarded as a tensor $h_{\mu\nu}(x)$ on flat spacetime.

5.4 Basis vectors


Basis vectors in 3-space
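In 3-space define basis vectors

$\mathbf{e}_1 = \hat{\mathbf{i}} , \quad \mathbf{e}_2 = \hat{\mathbf{j}} , \quad \mathbf{e}_3 = \hat{\mathbf{k}}$

Thus a general displacement vector is

$\Delta\mathbf{x} = \Delta x^i\,\mathbf{e}_i$    (5.50)

(In 3-space there is really no need to distinguish upper and lower indices.)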

Basis vectors in spacetime


Likewise in spacetime we can define basis vectors

$\vec{e}_0 , \ \vec{e}_1 , \ \vec{e}_2 , \ \vec{e}_3$
(Note that vectors in spacetime, i.e., 4-vectors, are denoted by an arrow rather than a bold-face letter.)
These are illustrated in Figure 4. Then in analogy to (5.50)

$\Delta\vec{x} = \Delta x^\mu\,\vec{e}_\mu$

Figure 4

Now if we change, for example, only $x^1$ but keep the other components fixed, then

$\Delta\vec{x} = \Delta x^1\,\vec{e}_1$    ($x^0, x^2, x^3$ fixed)

Going to the limit $\Delta x^1 \to 0$ and generalizing to other components

$\vec{e}_\mu = \frac{\partial\vec{x}}{\partial x^\mu}$

Sometimes, if we denote a point by P, then we can write

$\vec{e}_\mu = \frac{\partial P}{\partial x^\mu}$

Think of $\Delta P$ as the displacement of a point P when one of the coordinates $x^\mu$ is changed by
$\Delta x^\mu$.

Properties of basis vectors


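Consider two displacement vectors $\Delta\vec{x}$, $\Delta\vec{y}$

$\Delta\vec{x}\cdot\Delta\vec{y} = (\Delta x^\mu\,\vec{e}_\mu)\cdot(\Delta y^\nu\,\vec{e}_\nu) = (\vec{e}_\mu\cdot\vec{e}_\nu)\,\Delta x^\mu\Delta y^\nu$

On the other hand

$\Delta\vec{x}\cdot\Delta\vec{y} = \eta_{\mu\nu}\,\Delta x^\mu\Delta y^\nu$

By comparison we see

$\vec{e}_\mu\cdot\vec{e}_\nu = \eta_{\mu\nu}$

In other words

$\vec{e}_0\cdot\vec{e}_0 = -1 , \quad \vec{e}_1\cdot\vec{e}_1 = \vec{e}_2\cdot\vec{e}_2 = \vec{e}_3\cdot\vec{e}_3 = 1 , \quad \vec{e}_\mu\cdot\vec{e}_\nu = 0 \ (\mu \neq \nu)$

Incidentally, we can reduce all these statements to 3-space by just setting all time components
(e.g. $\Delta x^0$, $\Delta y^0$) to zero.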

Transformation property
Under a change of axes, the vector as a whole does not change, even though the components
change. This is illustrated in Figure 5 in the case of rotations.

Figure 5

Thus
$\Delta\vec{x} = \Delta x'^\mu\,\vec{e}\,'_\mu = (L^\mu{}_\nu\,\Delta x^\nu)\,\vec{e}\,'_\mu$

At the same time

$\Delta\vec{x} = \Delta x^\nu\,\vec{e}_\nu$
Comparing coefficients of $\Delta x^\nu$:

$\vec{e}_\nu = L^\mu{}_\nu\,\vec{e}\,'_\mu = \vec{e}\,'_\mu\,L^\mu{}_\nu$

(We write it in this form so that the repeated indices are next to each other, in "natural" order.)

Multiply on the right by $[L^{-1}]$.

$\vec{e}\,'_\mu = \vec{e}_\nu\,[L^{-1}]^\nu{}_\mu$

This is exactly the same as the transformation of a lower index in (5.39), i.e., it is multiplied
by the inverse matrix on the right. In short, we only need to remember that the lower index in
$\vec{e}_\mu$ behaves just like any lower index.

5.5 Differentiation
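Consider a point $x = (x^0, x^1, x^2, x^3)$ in spacetime. In anticipation of the discussion on curved
spacetime, we shall not put an arrow on $x$. The case of 3-space is easily recovered by setting
$x^0 = 0$. Moreover, in this section all displacements $\Delta x$ are understood to be infinitesimal.

Differentiation of a scalar
We start with a scalar field $\varphi(x)$ and compare its value at two neighbouring points. Then

$\Delta\varphi = \varphi(x + \Delta x) - \varphi(x) = \frac{\partial\varphi}{\partial x^\mu}\,\Delta x^\mu = \varphi_{,\mu}\,\Delta x^\mu$

Here we introduce the comma notation

$\varphi_{,\mu} = \frac{\partial\varphi}{\partial x^\mu} = \partial_\mu\varphi$

Now $\Delta\varphi$ is a scalar, and $\Delta x^\mu$ is a $\binom{1}{0}$ tensor. Hence, by the inverse contraction
theorem, $\varphi_{,\mu}$ is a $\binom{0}{1}$ tensor.

The gradient operator

$\partial_\mu = \frac{\partial}{\partial x^\mu}$

In other words, $\partial_\mu$ transforms like a $\binom{0}{1}$ tensor operator. The corresponding $\binom{1}{0}$ tensor
operator is $\partial^\mu$.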

Differentiation of a vector
Let $\vec{A}(x)$ be a vector field and consider the difference in value between two near-by points
(Figure 6).

Figure 6

$\Delta\vec{A} = \vec{A}(x + \Delta x) - \vec{A}(x)$
$\qquad = [A^\mu(x + \Delta x)\,\vec{e}_\mu] - [A^\mu(x)\,\vec{e}_\mu]$    (5.63)
The crucial point is that $\vec{e}_\mu$ (i.e., $\hat{\mathbf{i}}$, $\hat{\mathbf{j}}$, $\hat{\mathbf{k}}$ in the case of 3-space) are constant vectors; they are
the same at $x$ and $x + \Delta x$, as illustrated in Figure 7.
Hence

$\Delta\vec{A} = (\Delta A^\mu)\,\vec{e}_\mu$    (5.64)

where by definition
$\Delta A^\mu = A^\mu(x + \Delta x) - A^\mu(x)$
Let us take the $\mu$ component in (5.64).

$(\Delta\vec{A})^\mu$ = coefficient of $\vec{e}_\mu$ in (5.63)
$\qquad = \Delta A^\mu$
$\qquad = \Delta(A^\mu)$

$(\Delta\vec{A})^\mu = \Delta(A^\mu)$    (5.65)

This is an important (and well-known) concept. Let us specify clearly what we mean.
Let $\mu = 1$.

The left hand side $(\Delta\vec{A})^1$ means

• Take the vectors at $x$ and $x + \Delta x$ (two vectors).
• Subtract the two vectors.
• In the result take the 1 component.

The right hand side $\Delta(A^1)$ means

• Take the 1 component at $x$ and $x + \Delta x$ (two numbers).
• Subtract the two numbers.

The relation (5.65) states that these two processes are equivalent. In effect, it teaches us how
to subtract (and hence how to differentiate) vectors: simply do it component by component.
We can write this in differential form.

$d\vec{A} = d(A^\mu\,\vec{e}_\mu) = (dA^\mu)\,\vec{e}_\mu$
Writing out the change $dA^\mu$

$d\vec{A} = A^\mu{}_{,\nu}\,dx^\nu\,\vec{e}_\mu$
We read off $(d\vec{A})^\mu$ as the coefficient of $\vec{e}_\mu$ in the above expression

$(d\vec{A})^\mu = A^\mu{}_{,\nu}\,dx^\nu$

Since $(d\vec{A})^\mu$ is a $\binom{1}{0}$ tensor and $dx^\nu$ is also a $\binom{1}{0}$ tensor, this shows that $A^\mu{}_{,\nu}$ must be a
$\binom{1}{1}$ tensor.

Ch5-2.tex; December 29, 1997


6 Relativistic Kinematics
The theme of relativity is: All laws of physics must take the same form in all reference frames.
Two important laws of physics are:
• Newton's second law $F = ma$, and
• the law of conservation of momentum when there is no external force.

How must these laws be modified in order that they take the same form in all reference frames?
In this Chapter we concentrate on the conservation of momentum, because it is more
fundamental; we come to forces in the next Chapter.
Let us specify more clearly the requirement that laws take the same form in all frames.
Let there be a law $L$ in frame $S$. Suppose we transform to frame $S'$ by the transformation law
$T$. Then we should get the same law in $S'$, which we denote as $L'$. In symbols

$L \xrightarrow{\ T\ } L'$

6.1 Momentum
Newtonian momentum + Galilean transformation
First we show that Newtonian momentum ($L(N)$) is compatible with Galilean transformation
($T(G)$), or in shorthand

$L(N) \xrightarrow{\ T(G)\ } L'(N)$    Yes

It is easiest to illustrate with an example first.

(a) Before (b) After

Figure 1

Problem 1
(a) Particle a of mass 1 unit is moving at a speed of 1/3 unit and hits particle b of mass 2 units
at rest (Figure 1a). After the collision, they move along the original direction, at speeds $u$, $v$
respectively (Figure 1b). Assume Newtonian momentum = (mass) × (velocity) is conserved,
and Newtonian kinetic energy = (1/2) × (mass) × (velocity)$^2$ is also conserved; find $u$ and $v$.
[This is $L(N)$.]
(b) Another observer is moving to the right at velocity $V$ = 1/5 units. Find the velocities of a
and b before the collision, and also after the collision, as seen by this observer. Start with the
velocities in (a) and use the Galilean transformation for velocities. [This is $T(G)$.]
(c) Check whether momentum and kinetic energy are conserved in the frame $S'$. [This is $L'(N)$.]

Problem 2
Prove the above relationship in general.

Problem 3
We can be even more ambitious. Assume only that Newtonian kinetic energy is conserved in
every frame, and that Galilean transformations apply. Prove that Newtonian momentum must
also be conserved.

Newtonian momentum + Lorentz transformation


But we know that we should not use the Galilean transformation, but the Lorentz transformation
($T(L)$). We next show that Newtonian momentum ($L(N)$) is not compatible with the Lorentz
transformation ($T(L)$), or in shorthand

$L(N) \xrightarrow{\ T(L)\ } L'(N)$    No

Again, it is easiest to illustrate with an example first.

Problem 4
(a) Particle a of mass 1 unit is moving at a speed of $c/3$ and hits particle b of mass 2 units
at rest (Figure 1a). After the collision, they move along the original direction, at speeds $u$, $v$
respectively (Figure 1b). Assume Newtonian momentum = (mass) × (velocity) is conserved,
and Newtonian kinetic energy = (1/2) × (mass) × (velocity)$^2$ is also conserved; find $u$ and $v$.
[This is $L(N)$. You can make use of the results from Problem 1.]
(b) Another observer is moving to the right at velocity $V = \beta c$ with $\beta = 1/5$. Find the
velocities of a and b before the collision, and also after the collision, as seen by this observer.
Start with the velocities in (a) and use the Lorentz transformation for velocities. [This is $T(L)$.]
(c) Check whether momentum and kinetic energy are conserved in the frame $S'$. [This is $L'(N)$.]

We do not need a general proof: one counter-example is enough.

The problem comes from the fact that velocity does not transform linearly. We are led to
consider quantities that transform linearly under the Lorentz transformation. For this purpose,
consider the 4-momentum of a particle, defined as

$\vec{p} = m\vec{u}$

where $\vec{u}$ is the 4-velocity, namely

$u^\mu = \gamma\,(1, \mathbf{v})$

This quantity has the following properties.
• For non-relativistic speeds, $v/c \ll 1$, $\gamma \to 1$, and the spatial components reduce to the
"ordinary" Newtonian momentum.
• Under a Lorentz transformation, $\vec{p}$ transforms linearly:

$p'^\mu = L^\mu{}_\nu\,p^\nu$    (6.3)

Four-momentum + Lorentz transformation


The linear property ensures that if $\vec{p}$ is conserved in one frame, then it is also conserved in
another frame. In other words, relativistic 4-momentum ($L(R)$) is compatible with the Lorentz
transformation ($T(L)$). In shorthand

$L(R) \xrightarrow{\ T(L)\ } L'(R)$    Yes

To see this explicitly, consider a collision

$a + b \to c + d$

where c may denote a after the collision, and d may denote b after the collision. The conservation
of momentum in frame $S$ takes the form

$p_a^\mu + p_b^\mu = p_c^\mu + p_d^\mu$

Define

$P^\mu = p_a^\mu + p_b^\mu - p_c^\mu - p_d^\mu$

then

$P^\mu = 0 , \quad \mu = 0,1,2,3$
By (6.3), then, we also have

$P'^\mu = 0 , \quad \mu = 0,1,2,3$
so momentum is also conserved in the $S'$ frame.
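
The argument can be illustrated numerically in one space dimension. The sketch below sets up an elastic collision in a frame where the total momentum is zero (an illustrative choice, with made-up masses and momenta), transforms every 4-momentum $(E, p)$ to a second frame, and checks that the total 4-momentum is still conserved there, whereas the Newtonian sum of (mass) × (velocity) is not. All names and numbers are assumptions made for the example; units have c = 1.

```python
import math

def four_momentum(m, p):
    """(E, p) for a particle of mass m and momentum p in one dimension."""
    return (math.sqrt(p * p + m * m), p)

def lorentz(P, beta):
    """Transform (E, p) to a frame moving with velocity beta."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    E, p = P
    return (g * (E - beta * p), g * (p - beta * E))

def total(Ps):
    """Component-wise sum of a list of 4-momenta."""
    return tuple(sum(c) for c in zip(*Ps))

def newtonian_momentum(masses, Ps):
    """Sum of (mass) x (velocity), with velocity v = p / E."""
    return sum(m * P[1] / P[0] for m, P in zip(masses, Ps))

masses = [1.0, 2.0]
# Frame S: equal and opposite momenta before; each particle reverses afterwards.
before_S = [four_momentum(1.0, 1.0), four_momentum(2.0, -1.0)]
after_S = [four_momentum(1.0, -1.0), four_momentum(2.0, 1.0)]

beta = 0.2
before_Sp = [lorentz(P, beta) for P in before_S]
after_Sp = [lorentz(P, beta) for P in after_S]

print(total(before_Sp), total(after_Sp))         # equal: 4-momentum conserved in S'
print(newtonian_momentum(masses, before_Sp),
      newtonian_momentum(masses, after_Sp))      # unequal: Newtonian sum not conserved
```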

Spatial components
The spatial components of the 4-momentum are $\mathbf{p} = m\mathbf{u}$ or

$\mathbf{p} = m\gamma\mathbf{v} = \frac{m\mathbf{v}}{\sqrt{1 - v^2}}$

Some books call $m$ the rest mass, and $M = m\gamma = m/\sqrt{1 - v^2}$ the relativistic mass. Then
$\mathbf{p} = M\mathbf{v}$ takes the usual Newtonian form. This is an extremely bad convention, and will not
be adopted here. The reason is that it suggests (incorrectly) that all Newtonian formulas can
be made correct by changing $m \to M$.

The behavior of $p = |\mathbf{p}|$ is as follows (Figure 2):

• For $v \ll c$, $p \approx mv$
• For $v \to c$, $p \to \infty$

Figure 2

Time component
Next consider the time component of the 4-momentum

$p^0 = m\gamma = \frac{m}{\sqrt{1 - v^2}}$

This is also conserved, in a way that is intimately related to the conservation of momentum.
To recognize what $p^0$ is, consider the non-relativistic case $v \ll 1$.

$p^0 c^2 = \frac{mc^2}{\sqrt{1 - v^2/c^2}} = mc^2 + \frac{1}{2}mv^2 + \cdots = \text{const} + \frac{1}{2}mv^2 + \cdots$    (6.11)

We have restored the factors of c (= 1), and also specialized to a case where $m$ does not change.
It is recognized that up to an additive constant, $p^0$ is the Newtonian kinetic energy
(approximately). (So far we are not considering any potential energy.) Thus we call $p^0$ the
energy $E$:

$E = p^0 = m\gamma$

The behavior of the total energy $E$ in (6.11) is as follows.

• For $v \ll c$, $E = mc^2 + \frac{1}{2}mv^2 + \cdots$
• For $v \to c$, $E \to \infty$

In particular, it takes an infinite amount of energy (if $m \neq 0$) to reach velocity c. Thus it is
never possible (if $m \neq 0$) to attain the velocity of light.

Figure 3

Kinetic energy
From (6.11), we see that even at rest, there is an additive constant $E = mc^2$. The kinetic
energy $K$ is defined as the energy $E$ minus this constant

$E = mc^2 + K$
so that nonrelativistically

$K \approx \frac{1}{2}mv^2$

but generally

$K = mc^2(\gamma - 1)$
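
To see how the exact kinetic energy departs from the Newtonian one, the sketch below compares the standard relativistic expression $K = m(\gamma - 1)$ (units with c = 1) with $\frac{1}{2}mv^2$ at a few speeds. The function names and the sample speeds are illustrative assumptions.

```python
import math

def gamma(v):
    """Lorentz factor for speed v (units with c = 1)."""
    return 1.0 / math.sqrt(1.0 - v * v)

def kinetic_energy(m, v):
    """Relativistic kinetic energy: total energy m*gamma minus the rest energy m."""
    return m * (gamma(v) - 1.0)

def kinetic_energy_newtonian(m, v):
    """Newtonian kinetic energy."""
    return 0.5 * m * v * v

# The ratio tends to 1 at low speed and grows without bound as v approaches 1.
for v in (0.01, 0.1, 0.6, 0.99):
    ratio = kinetic_energy(1.0, v) / kinetic_energy_newtonian(1.0, v)
    print(v, ratio)
```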

Application to collisions
Consider energy conservation in a collision $a + b \to c + d$. Then, in obvious notation

$E_a + E_b = E_c + E_d$

We can distinguish two situations.

(a) In "classical" collisions, for example the collision between billiard balls, the masses do not
change, say $m_a = m_c$, $m_b = m_d$. Then

$K_a + K_b = K_c + K_d$

This is the familiar situation.

(b) In the collision of nuclei and elementary particles, it is possible that new particles are
created, and the total mass is not conserved. Let

$\Delta m = (m_a + m_b) - (m_c + m_d)$

Then

$K_c + K_d = K_a + K_b + \Delta m\,c^2$

In these cases, the additive "constant" is not really constant, and there is an effect. Heuristically,
we can say that a certain amount of mass, $\Delta m$, has been converted to energy.

However, in the analysis of collisions, it is usually more convenient not to separate out
the kinetic energy.

Analogy
The equation $E = mc^2$ is famous. An equivalent equation is $Q = \Delta m\,c^2$. It is common to say
that mass is converted to energy, and that they are quite different things. Actually, the modern
view is that mass is energy, and the factor $c^2$ is just an "exchange rate".
This point of view is best illustrated with an analogy. Let us assume that in a certain
country there are only (a) paper money in bills of $1000, and (b) coins in $1. A Martian
who arrives in this country first discovers two separate conservation laws governing monetary
transactions: (a) the law of conservation of paper money ($m$), and (b) the law of conservation
of coins ($E$). Later, he finds that $m$ can be converted into $E$, at a rate $E = m \times 1000$, and that
really only the sum of the two is conserved.
• Paper money and coins are conceptually the same.
• The conversion between the two is not a real transaction, or anything important.
• The conversion rate is not fundamental. It is just a consequence of the fact that we use
different units for paper money and coins.
It is best to think of the "conversion" between mass and energy in the same way.

Relation between E and p


In Newtonian physics we have

$p = mv , \quad E = \frac{1}{2}mv^2$
By eliminating $v$, we get a direct relation between $E$ and $p$,

$E = \frac{p^2}{2m}$
In the same way, relativistically we have

$p = m\gamma v , \quad E = m\gamma$

By eliminating $v$, we get

$E^2 = p^2 + m^2$

or, restoring the factors of c,

$E^2 = p^2c^2 + m^2c^4$

This relationship can also be derived by the use of the invariant $p^\mu p_\mu$. (See Problems below.)
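
The invariance behind this relation can be checked directly. The sketch below uses the standard expressions $E = m\gamma$ and $p = m\gamma v$ (units with c = 1), confirms that $E^2 - p^2 = m^2$, and then transforms $(E, p)$ to another frame to show that the combination is unchanged. The function names and the sample values are illustrative assumptions.

```python
import math

def energy_momentum(m, v):
    """(E, p) of a particle of mass m moving at velocity v along x (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return g * m, g * m * v

def lorentz(E, p, beta):
    """Transform (E, p) to a frame moving with velocity beta along x."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return g * (E - beta * p), g * (p - beta * E)

def invariant_mass(E, p):
    """Mass recovered from the invariant E^2 - p^2."""
    return math.sqrt(E * E - p * p)

E, p = energy_momentum(2.0, 0.6)       # E = 2.5, p = 1.5 for m = 2, v = 0.6
E2, p2 = lorentz(E, p, 0.3)            # same particle seen from another frame

print(invariant_mass(E, p), invariant_mass(E2, p2))
```

Both frames return the same mass, even though $E$ and $p$ individually differ.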

Problem 5
Derive (6.23) from (6.22).

6.2 Analysis of collisions


In this section, we illustrate the analysis of collisions through a series of examples, from the
simplest to the more complicated:
linear, elastic
linear, inelastic
oblique, elastic
oblique, inelastic
In relativistic collision theory, an elastic collision is one in which the particle identities (and
hence masses) are unchanged. In an inelastic collision, the particle identities (and hence masses)
are changed; new particles may be produced.
In general, there are several steps:
• count the given and the unknowns,
• write down enough equations for the unknowns,
• solve the equations.

The last step is only algebra and not physics. It may be messy but is not conceptually important.
Nevertheless there are some standard tricks.

Linear, elastic collisions


Example 1
Refer to Problem 4 and solve for u, v relativistically.
Solution
By using the conservation of momentum and energy, we have the following equations for u and
v.

We leave the rest as an exercise.


However, it is often better to analyse the situation in terms of momenta and energies
rather than velocities. There are several reasons.
• Momenta and energies are more directly measured and are usually quoted.
• For high energy collisions, all velocities are very close to unity and it is inconvenient to
quote their values.
• The result would be applicable to massless particles (like photons).

We now do this for a general elastic collision in a straight line, as illustrated below:
              a   +   b   →   c   +   d
Mass          m       M       m       M
Momentum      P       0       p_c     p_d
Energy        E       M       E_c     E_d
The unknowns are $p_c$, $p_d$. We need not regard the E's as unknowns, since

Since there are two unknowns, we need two conservation laws:


The general tactics for solving these equations are as follows.
• We need to square these equations, and eliminate the energies by $E_c^2 = p_c^2 + m^2$, $E_d^2 =
p_d^2 + M^2$.
• Squaring leads to a quadratic equation and there will be two solutions. (Actually there
are two solutions even before squaring.) It is obvious that the extra solution describes no
collision at all, i.e., $p_c = P$, $p_d = 0$.
• If this is the case, it is better to eliminate $p_c$ and solve for $p_d$ rather than the other way
round, because the extra solution $p_d = 0$ is easier to recognize and remove.

With these in mind, we can now proceed. From (6.27)

From (6.26)

Square both of these

Subtract

This is a quadratic equation for $p_d$. Collect terms:

$$[(E + M)^2 - P^2]\,p_d^2 - [2(E + M)MP]\,p_d = 0$$


The terms without $p_d$ have cancelled. This is guaranteed, because we know that $p_d = 0$ is a
solution. The first square bracket can be simplified.
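
The non-trivial root of this quadratic can be checked numerically. The Python sketch below (units with c = 1; the function name and the sample numbers are illustrative assumptions, not values from these notes) solves for $p_d$, obtains $p_c = P - p_d$ from momentum conservation, and then verifies that energy is conserved as well.

```python
import math

def linear_elastic(P, m, M):
    """Beam particle (mass m, momentum P) hits a target (mass M) at rest; c = 1.

    Returns (p_c, p_d): final momenta of the beam and target particles.
    """
    E = math.sqrt(P * P + m * m)
    # Non-trivial root of [(E+M)^2 - P^2] p_d^2 - 2(E+M) M P p_d = 0
    p_d = 2.0 * (E + M) * M * P / ((E + M) ** 2 - P * P)
    p_c = P - p_d  # momentum conservation
    return p_c, p_d

if __name__ == "__main__":
    P, m, M = 5.0, 0.14, 0.94  # illustrative values in GeV
    p_c, p_d = linear_elastic(P, m, M)
    E_in = math.sqrt(P * P + m * m) + M
    E_out = math.sqrt(p_c * p_c + m * m) + math.sqrt(p_d * p_d + M * M)
    print(p_c, p_d, E_in - E_out)  # the last number should be close to zero
```

Calling `linear_elastic` with m equal to M gives a vanishing $p_c$, which is the equal-mass behaviour asked about in the problems.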
Problem 6
Use (6.28) and (6.29) to solve for Example 1.

Problem 7
Re-do this derivation in the Newtonian case and solve for $p_c$, $p_d$. Show that in this limit, (6.28)
and (6.29) reduce to the same result.

Problem 8
Show that if the two masses are equal, all the momentum is transferred to the second particle.
Can this be derived in a simple way?
Problem 9
Show that for very high energies ($E \gg m, M$), nearly all the momentum is again transferred
to the second particle. This can be understood heuristically as follows: At very high energies,
the masses make no difference, so the equal mass case must be applicable.

Linear, inelastic collisions


Many collisions involve the production of particles. The most famous is the production of
antiprotons ($\bar{p}$).

Example 2
Antiprotons were first produced in the following reaction:
              p   +   p   →   p + p + p + p̄
Momentum      P       0       P
Energy        E       M       E'
At threshold, i.e., at the minimum energy required to produce $\bar{p}$, the four particles in the final
state move together without any relative velocity, and therefore behave like a single particle of
mass 4M. Find the threshold energy E. The mass of the proton is 0.94 GeV/$c^2$.

Solution

$$E + M = E'$$
$$E^2 + 2EM + M^2 = E'^2$$
$$(P^2 + M^2) + 2EM + M^2 = P^2 + (4M)^2$$
$$2EM + 2M^2 = 16M^2$$
$$E = 7M = 6.58\ \text{GeV}$$
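
The same threshold argument works for any reaction on a target at rest: the invariant $(E + M)^2 - P^2$ must equal the square of the sum of the final masses. A minimal Python sketch under that assumption (c = 1; the helper name `threshold_energy` is hypothetical):

```python
def threshold_energy(m_beam, m_target, final_masses):
    """Minimum total beam energy (c = 1) for a beam hitting a target at rest."""
    m_final = sum(final_masses)
    # At threshold: (E + M)^2 - P^2 = (sum of final masses)^2, with E^2 - P^2 = m^2,
    # so m^2 + M^2 + 2 E M = m_final^2.
    return (m_final ** 2 - m_beam ** 2 - m_target ** 2) / (2.0 * m_target)

if __name__ == "__main__":
    M = 0.94  # GeV, proton mass
    # Antiproton production: four particles of mass M in the final state.
    print(threshold_energy(M, M, [M, M, M, M]))
```

If the final state has the same masses as the initial state, the function returns the beam rest energy, i.e., no kinetic energy is needed.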
Example 3
Consider the following reaction, in which an electron hits a proton at rest, and produces a $\Delta$.
What is the minimum energy of the electron for this to occur?

              e⁻  +   p   →   e⁻ + Δ
Mass          m       M       m + M'
Momentum      P       0       P
Energy        E       M       E'

The masses are m = 0.5 MeV, M = 0.94 GeV, and M' = 1.24 GeV.
Solution
Again, at threshold we may take the final state to be a single particle, with mass m + M'.

Check: If M' = M , E = m, i.e., the reaction is possible no matter how small the kinetic energy.
This is what we expect.

In this case, $m \approx 0$, and we can have a simpler expression

$$E = \frac{M'^2 - M^2}{2M} = 0.35\ \text{GeV}$$
Example 4
Consider the same reaction $e^- + p \to e^- + \Delta$ as in Example 3. The electron energy is 18 GeV.
Find the energy of the electron after the collision.

Solution
After the collision, the $e^-$ and the $\Delta$ move separately and we can no longer regard the system
as a single particle. So the situation is as follows.

              e⁻  +   p   →   e⁻  +   Δ
Mass          m       M       m       M'
Momentum      P       0       P'      Q
Energy        E       M       E'      W
We regard $P'$ and $Q$ as the unknowns. The energies $E'$ and $W$ can be expressed in terms of
the momenta. So we need two equations. These are given by the conservation laws:

Solve for W and Q, and square

$$W^2 = (E + M - E')^2 = E^2 + M^2 + E'^2 + 2EM - 2E'M - 2EE'$$
$$Q^2 = (P - P')^2 = P^2 + P'^2 - 2PP'$$
Subtract and use $W^2 - Q^2 = M'^2$, $E^2 - P^2 = m^2$, $E'^2 - P'^2 = m^2$.

In order not to get too involved with the arithmetic, let us neglect the electron mass:
$$m = 0, \qquad E = P, \qquad E' = |P'|$$
We further assume $P' > 0$. (This has to be checked later.) Then

In the present case, we have

$$E' = 18 + \frac{0.94^2 - 1.24^2}{2 \times 0.94}\ \text{GeV} = 17.65\ \text{GeV}$$

Problem 10
Suppose the electron bounces backwards, then $E' = -P'$. Find the value of $E'$ in this case.

Problem 11
Return to the general case described by (6.31). Square this equation and express $E'^2$ in terms
of $P'^2$. Hence obtain the algebraic solution for $P'$. Explain why there are two solutions.
Oblique, elastic collisions

Example 5
A proton (mass = 0.94 GeV) travelling at momentum 30 GeV hits another proton at rest. The
incident proton scatters at 35°, while the target proton recoils without changing its identity.
Find (a) the final momentum of the incident proton, (b) the final momentum of the target
proton, and (c) the direction of recoil of the target proton.

Figure 4

              p   +   p   →   p   +   p
Mass          M       M       M       M
Momentum      P       0       P'      Q
Energy        E       M       E'      W
The three unknowns are $P'$, $Q$, $\phi$, where $\phi$ is the angle of recoil. The energies can be expressed
in terms of the momenta. Thus we need 3 conservation laws:

As usual, we first solve for $W^2$ and $Q^2$

$$W^2 = (E + M - E')^2 = E^2 + M^2 + E'^2 + 2EM - 2E'M - 2EE'$$
$$Q^2 = (P - P'\cos\theta)^2 + (P'\sin\theta)^2 = P^2 + P'^2 - 2PP'\cos\theta$$

Subtract

Move all $E'$ to the left


Square and use $E'^2 = P'^2 + M^2$

$$(E + M)^2 (P'^2 + M^2) = [(E + M)M + PP'\cos\theta]^2$$
$$(E + M)^2 (P'^2 + M^2) = (E + M)^2 M^2 + 2(E + M)MPP'\cos\theta + P^2 P'^2 \cos^2\theta$$
This is a quadratic equation for $P'$. The constant term cancels; thus one solution is $P' = 0$.
It could have been guessed from the start that a solution is $P' = 0$, $Q = P$, $\phi = 0$. (The two
particles exchange roles, so obviously everything is conserved.) Thus it is seldom necessary to
solve a quadratic.

The other solution is:

$$P' = \frac{2M(E + M)P\cos\theta}{(E + M)^2 - P^2\cos^2\theta}$$

Once this is obtained, the other parts are trivial.

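
To see how the remaining quantities follow once the scattered momentum is known, here is a hedged Python sketch for equal-mass elastic scattering off a target at rest (c = 1). It uses the non-zero root of the quadratic in the scattered momentum, then gets the recoil momentum and angle from momentum conservation. The function name and the sample inputs are illustrative assumptions.

```python
import math

def oblique_elastic(P, M, theta_deg):
    """Equal-mass elastic scattering off a target at rest (c = 1).

    Returns (P1, Q, phi_deg): scattered momentum, recoil momentum, recoil angle.
    """
    theta = math.radians(theta_deg)
    E = math.sqrt(P * P + M * M)
    c = math.cos(theta)
    # Non-zero root of the quadratic in the scattered momentum
    P1 = 2.0 * M * (E + M) * P * c / ((E + M) ** 2 - (P * c) ** 2)
    # Momentum conservation, component by component
    Qx = P - P1 * c
    Qy = P1 * math.sin(theta)  # magnitude of the transverse recoil momentum
    return P1, math.hypot(Qx, Qy), math.degrees(math.atan2(Qy, Qx))

if __name__ == "__main__":
    P, M, theta_deg = 30.0, 0.94, 35.0  # sample inputs in GeV and degrees
    P1, Q, phi = oblique_elastic(P, M, theta_deg)
    E_in = math.sqrt(P * P + M * M) + M
    E_out = math.sqrt(P1 * P1 + M * M) + math.sqrt(Q * Q + M * M)
    print(P1, Q, phi, E_in - E_out)  # the last number should be close to zero
```

The energy balance printed at the end is a useful sanity check, because the algebra involved squaring and could in principle introduce a spurious root.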
Problem 12
Complete Example 5 and give numerical answers.

Problem 13
A photon strikes an electron (mass m) at rest, and is scattered at an angle $\theta$. The energy of
the photon is $E = hc/\lambda$, where h is Planck's constant. Find the increase in wavelength $\Delta\lambda$.
This is the famous formula for Compton scattering.

Oblique, inelastic collisions


We start with a decay problem, which involves only 3 bodies.

Example 6
A $Z^0$ particle ($M \approx 90$ GeV) is travelling with momentum P = 150 GeV. It decays into $e^+$, $e^-$.
Find the angle between the $e^+$ and the $e^-$.

Figure 5
Solution
From the conservation of energy and momentum

In this case $m \approx 0$. So
$$\cos\theta = \frac{P}{\sqrt{P^2 + M^2}} = 0.857$$
$$\theta = 30.96^\circ, \qquad 2\theta = 61.92^\circ$$
Example 7
A neutral particle X decays by

$$X \to p + \pi^-$$
where the masses are 0.94 GeV and 0.14 GeV for p and $\pi^-$ respectively. The original particle
X, being neutral, was not observed, but the momenta of the final particles were measured to
be 20 GeV for p and 15 GeV for $\pi^-$. The angle between them was found to be 18°. Find the
mass of X.

Figure 6 (p with momentum 20 GeV, $\pi^-$ with momentum 15 GeV)

Solution
First determine the angle $\theta$ by considering the y component of momentum

$$20\sin\theta = 15\sin(18^\circ - \theta)$$
$$\tan\theta = \frac{15\sin 18^\circ}{20 + 15\cos 18^\circ}$$
$$\theta = 7.70^\circ, \qquad 18^\circ - \theta = 10.30^\circ$$
Next determine the momentum P of X:
$$P = 20\cos 7.70^\circ + 15\cos 10.30^\circ = 34.578$$
Also determine the energy E of X

The mass M of X is
$$M = \sqrt{E^2 - P^2} = 5.56\ \text{GeV}$$
This is one way to determine the mass of unstable particles.
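
The mass reconstruction in Example 7 can be sketched in a few lines of Python. Instead of finding the angle first, this version simply adds the energies and momentum components of the decay products and forms $M = \sqrt{E^2 - |\mathbf{P}|^2}$ (c = 1). The function name `invariant_mass` and the tuple layout are illustrative choices.

```python
import math

def invariant_mass(particles):
    """particles: iterable of (mass, momentum, angle_deg) for coplanar tracks.

    Returns sqrt(E_total^2 - |P_total|^2) in the same units as the inputs.
    """
    E = px = py = 0.0
    for m, p, angle_deg in particles:
        a = math.radians(angle_deg)
        E += math.sqrt(p * p + m * m)
        px += p * math.cos(a)
        py += p * math.sin(a)
    return math.sqrt(E * E - px * px - py * py)

if __name__ == "__main__":
    # Proton along the reference axis, pion at 18 degrees to it (Example 7 inputs)
    print(invariant_mass([(0.94, 20.0, 0.0), (0.14, 15.0, 18.0)]))
```

Because the result is an invariant, the choice of reference axis for the angles does not matter; only the opening angle between the tracks enters.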

Example 8
Electrons with energy 18 GeV hit a stationary proton target. In one event, the electron is
scattered at 1°, with an energy 17 GeV. The target proton recoils, and is excited to become a
new particle X with mass M'. Find M'.

Solution

Figure 7

Let X have momentum Q and energy W.

For simplicity, we ignore the electron mass. Hence $P = E$, $P' = E'$.

$$M'^2 = M^2 + 2M(E - E') - 2EE'(1 - \cos\theta)$$

In this example, we find

$$M'^2 = 0.94^2 + 2 \times 0.94 \times 1 - 2 \times 18 \times 17 \times 1.523 \times 10^{-4}$$

$$M' = 1.63\ \text{GeV}$$
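
The arithmetic of Example 8 is easy to get wrong by a sign, so a short numerical check may help. The Python sketch below evaluates $M'^2 = M^2 + 2M(E - E') - 2EE'(1 - \cos\theta)$ with the electron mass neglected (c = 1). The function name `missing_mass` is an assumption made for illustration.

```python
import math

def missing_mass(E, E1, theta_deg, M):
    """Mass of the recoiling system when a massless beam particle of energy E
    scatters off a target of mass M at rest, leaving with energy E1 at angle theta."""
    theta = math.radians(theta_deg)
    m2 = M * M + 2.0 * M * (E - E1) - 2.0 * E * E1 * (1.0 - math.cos(theta))
    return math.sqrt(m2)

if __name__ == "__main__":
    # Inputs of Example 8: 18 GeV beam, 17 GeV scattered electron at 1 degree
    print(missing_mass(18.0, 17.0, 1.0, 0.94))
```

Scanning `E1` at fixed angle with this function gives the mapping from measured electron energy to recoil mass that the spectrometer method relies on.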


These experiments are usually done in the following manner. A detector (called a
spectrometer) is placed at a fixed angle $\theta$. Electrons that are scattered at that angle enter the
detector, and their energies $E'$ are measured. From (6.33), each $E'$ corresponds to a value of
$M'^2$. So a plot of the number of events, N, versus $E'$ gives a distribution of $M'^2$ values. Any
particles that are produced will appear as a peak in this distribution.

Figure 8
Problem 14
In Figure 8, the width of the peak is $\Delta E'$ = 0.3 GeV. Assume parameters as in Example 8.
Find the uncertainty in the mass M'.

6.3 Center of momentum frame


Suppose we go to a frame moving with velocity $\beta$, such that the total momentum of a colliding
system is zero. This frame is called the center of momentum (CM) frame. The concept is
illustrated in the following example.

Example 9
A proton with momentum 3 GeV hits another proton at rest. Find the velocity $\beta$ of the CM
frame. Also find (a) the momentum of each proton, and (b) the total energy in the CM frame.

Solution
In the laboratory frame, the momentum of each particle, and the total momentum are

The energies are


We now transform to a new frame.

We want this to be zero.

In this new frame, the total energy is

The momenta of the individual particles are

Thus the picture in the CM frame before the collision is as shown in Figure 9a. Because the
momentum is zero after the collision, the only possible situation is as shown in Figure 9b.
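
The numbers in Example 9 can be reproduced with a short Python sketch. It assumes the standard boost along the beam direction, $p' = \gamma(p - \beta E)$ and $E' = \gamma(E - \beta p)$ with c = 1, and chooses $\beta$ so that the total momentum vanishes. The function name `to_cm_frame` is illustrative only.

```python
import math

def to_cm_frame(P, M):
    """Beam of momentum P on a target at rest, both of mass M (c = 1).

    Returns (beta, p_beam_cm, p_target_cm, E_cm_total).
    """
    E_beam = math.sqrt(P * P + M * M)
    E_tot = E_beam + M
    beta = P / E_tot  # makes gamma * (P - beta * E_tot) vanish
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    p_beam = gamma * (P - beta * E_beam)
    p_target = gamma * (0.0 - beta * M)
    E_cm = gamma * (E_tot - beta * P)
    return beta, p_beam, p_target, E_cm

if __name__ == "__main__":
    beta, p_beam, p_target, E_cm = to_cm_frame(3.0, 0.94)
    print(beta, p_beam, p_target, E_cm)
```

The two CM momenta come out equal and opposite, which is the defining property of the frame.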

Figure 9

For these reasons, it is often convenient to do the calculation in 3 steps.

• Transform to CM frame.
• Find the final situation in CM frame.
• Transform back to laboratory frame.

Energy in CM frame
Refer to Example 9. We see that of the total energy $E_t$ = 4.084 GeV in the lab frame, a part is
related to the overall forward motion. Only the part $E_t^*$ = 2.771 GeV is really available in the
CM frame, e.g., for creating new particles. The next Example considers this in a general way.
Example 10
A particle of mass M and energy E hits a target particle also of mass M. Find the total energy
E* in the CM frame.

Solution
The total momentum and energy in the lab frame are

The transformation to the CM frame is given by

The total energy in the CM frame is

Note that as $E \to \infty$, $E^* \propto \sqrt{E}$. So increasing the beam energy (E) in a fixed target experiment
is very inefficient compared with colliding beam experiments, which are by definition in the CM
frame.

Example 11
In a colliding beam experiment, a proton beam of energy 100 GeV collides head-on with a
second beam of the same energy, travelling in the opposite direction. What would be the
equivalent beam energy if the same experiment is done in a fixed target situation? Take
$M \approx 1$ GeV.

Solution

$$E^* = \sqrt{2M(E + M)} \approx \sqrt{2E} = 200 \qquad \text{(all in units of GeV)}$$
$$2E = 4 \times 10^4$$
$$E \approx 2 \times 10^4\ \text{GeV}$$
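
For readers who want the estimate without the approximation, the relation $E^{*2} = 2M(E + M)$ for equal masses can be inverted exactly. A minimal Python sketch (c = 1, hypothetical function name):

```python
def equivalent_fixed_target_energy(E_cm_total, M):
    """Beam energy needed on a fixed target of mass M (equal beam mass)
    to reach total CM energy E_cm_total, from E*^2 = 2 M (E + M)."""
    return E_cm_total ** 2 / (2.0 * M) - M

if __name__ == "__main__":
    # Two 100 GeV beams give E* = 200 GeV; take M = 1 GeV as in Example 11
    print(equivalent_fixed_target_energy(200.0, 1.0))
```

The exact value differs from the estimate only by the rest mass M, which is negligible at these energies.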
6.4 Relativistic invariants
Available energy
From the last Example, it is clear that specifying the energy could be misleading: a large energy
(e.g., $E \approx 2 \times 10^4$ GeV) in the lab frame actually corresponds to a much smaller energy (e.g.,
$E^* \approx 200$ GeV) in the CM frame. The real physical situation should be expressed in terms of
a relativistic invariant, i.e., a quantity which is the same in every frame. For a collision such as

the total 4-momentum available is

and the only relativistic invariant that can be constructed is

Example 12
Refer to Example 10. Find $P^\mu$ and $s$ in the lab frame and the CM frame.

Solution

$$P'^\mu = (E^*, 0, 0, 0) \qquad \text{(momentum is zero by definition)}$$
$$s = -P'_\mu P'^\mu = E^{*2} = 2M(E + M)$$
In fact, this is a simpler way of doing Example 10.

Example 13
Consider the reaction $\pi + p \to \ldots$, where the mass of the pion is m and the mass of the proton
is M. If the energy of the $\pi$ is E, and the proton is at rest, find the energy E* in the CM.

Solution
Momentum transfer

Figure 10

Many scattering events are of the type

where c is the same as particle a, but deflected, as shown in Figure 10a. Figure 10a shows
that particle b was originally at rest, but recoils, and may even break up. We see that some
momentum is transferred from the beam particle to the target. This is illustrated more clearly
in Figure 10b, where the wavy line denotes the transfer of momentum (often of other quantum
numbers as well). Thus we define the 4-momentum transfer

The relativistic invariant is

Example 14
Refer to Example 8 and find an expression for t in terms of the incident energy E, the scattered
energy $E'$ and the scattering angle $\theta$. Assume that E and $E'$ are large enough that the electron
mass may be neglected.

Solution

$$P_a = (E, 0, 0, E)$$

$$P_c = (E', E'\sin\theta, 0, E'\cos\theta)$$


$$Q = (E' - E,\ E'\sin\theta,\ 0,\ E'\cos\theta - E)$$
$$t = (E' - E)^2 - (E'\sin\theta)^2 - (E'\cos\theta - E)^2$$
$$= E'^2 - 2E'E + E^2 - E'^2 - E^2 + 2EE'\cos\theta$$
$$= -2EE'(1 - \cos\theta)$$

Note that t is the same in every reference frame, but the right hand side refers only to the
laboratory frame.
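
The algebra of Example 14 can be cross-checked numerically. The Python sketch below builds the two 4-momenta for a massless beam particle, forms the difference, and compares the invariant (time component squared minus spatial components squared) with the closed form $-2EE'(1 - \cos\theta)$. Function names are illustrative.

```python
import math

def t_from_four_vectors(E, E1, theta_deg):
    """Invariant momentum transfer computed directly from the 4-vectors (c = 1)."""
    th = math.radians(theta_deg)
    Pa = (E, 0.0, 0.0, E)                                    # incident, along z
    Pc = (E1, E1 * math.sin(th), 0.0, E1 * math.cos(th))     # scattered
    Q = tuple(c - a for c, a in zip(Pc, Pa))
    return Q[0] ** 2 - Q[1] ** 2 - Q[2] ** 2 - Q[3] ** 2

def t_closed_form(E, E1, theta_deg):
    """Closed-form result of Example 14."""
    return -2.0 * E * E1 * (1.0 - math.cos(math.radians(theta_deg)))

if __name__ == "__main__":
    # Kinematics of Example 8: E = 18 GeV, E' = 17 GeV, theta = 1 degree
    print(t_from_four_vectors(18.0, 17.0, 1.0), t_closed_form(18.0, 17.0, 1.0))
```

With this t, the recoil mass of Example 8 can be written compactly as $M'^2 = M^2 + 2M(E - E') + t$.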

6.5 Frequency and wave number


The (angular) frequency $\omega$ and the wave number $\mathbf{k}$ of a wave also form a 4-vector:

$$k^\mu = (\omega, \mathbf{k})$$
There are two ways to see this.
(a) From quantum mechanics

Since $p^\mu$ is a 4-vector, $k^\mu$ must also be a 4-vector.


(b) A wave disturbance goes as, e.g.,

$$\Phi = \cos(\omega t - \mathbf{k}\cdot\mathbf{x}) = \cos\theta$$


where

is the phase. For example, $\theta = 0, 2\pi, \ldots$ are the peaks. But the phase is an invariant (a peak
is a peak in any coordinate frame), hence $k^\mu x_\mu$ is invariant. But $x_\mu$ is already known to be a
4-vector, therefore $k^\mu$ is a 4-vector. (Contraction theorem.)

Ch6-2.tex December 29, 1997


7 Particle Dynamics and Electromagnetism
7.1 Overview
Newtonian mechanics contains three elements:
(a) Definition of momentum $\mathbf{p} = m\mathbf{v}$
(b) Newton's second law $\mathbf{F} = d\mathbf{p}/dt$
(c) Some force law, e.g., $F = -kx$ for a spring
The three together determine motion. Take a spring as an example:

which gives the equations of motion.


Our task is to generalize all these to relativistic situations, i.e., velocities which are not
small. Part (a) is already accomplished in the last Chapter. We have introduced the concept
of a 4-momentum $p^\mu$

$$p^\mu = m(\gamma, \gamma\mathbf{v})$$
This has the following properties.
• It is conserved.
• It reduces to the usual momentum ($\mu = 1, 2, 3$) when $v \ll c$.
• It transforms simply under Lorentz transformations.

Therefore it remains for us to deal with (b) and (c) in this Chapter.

Figure 1

However, most forces are not relativistic - they do not assume the same force law in all
reference frames, but have a special reference frame in which the force law would be simplest.
For example, consider the force F on the mass m in Figure la. This force has a special frame
- the frame in which M is at rest. There is no reason to believe that the force law in other
frames would be equally simple. Next consider the force F on the mass m in Figure lb. This
is a frictional force due to the table, and will be simplest in the frame in which the table is at
rest. There is not much point in discussing the relativistic version of these forces.
The situation is different for electromagnetism. The force is due to the electric field E
and the magnetic field B, which exist in vacuum. Vacuum is the same to all observers. There
is no such thing as a special frame in which the vacuum (or "ether") is at rest. Therefore the
laws of electromagnetism should be the same in all reference frames. So we shall focus on this
force in this Chapter.
To take a broader view, there are 4 fundamental forces. In order of decreasing strength,
they are:
• the strong interaction, responsible for nuclear binding
• the electromagnetic interaction
• the weak interaction, responsible for $\beta$ decay
• gravitation

(How we divide the different types of forces, or in reverse, how we integrate them, depends
on the level of understanding. One hundred and fifty years ago, electricity and magnetism
would be regarded as two types of interactions; now they are regarded as unified - which is
one of the successes of relativity, as will be discussed below. Recent research has unified the
electromagnetic and weak interactions through the standard model, and to some extent also
the strong interaction, but we ignore these for the moment.)
Of these four interactions, two are short ranged: the strong interaction has a range of
about 1 fm (1 fm = $10^{-15}$ m), while the weak interaction has a range of approximately $10^{-3}$ fm.
Hence, they are manifested only microscopically and in quantum phenomena, but not in
macroscopic, classical phenomena. The other two - electromagnetism and gravitation - are long
ranged and manifested macroscopically. For this course we shall be concerned with them.
All these forces are transmitted by fields (like the electric and magnetic field) which
reside in vacuum, and thus have no special frame. The relativistic transformation of these
forces is therefore of central importance.

7.2 Definition of force and Lorentz force law


We start with the relativistic definition of the 4-momentum

$$p^\mu = (E, \mathbf{p})$$

and define force as

We emphasize that F does not transform simply. Then it is found, experimentally, that the
force on a charge q is

in obvious notation. This law is well known in the non-relativistic case ($v \ll c$). Everything
remains valid, even for large velocities, provided p is taken as the relativistic momentum (7.2).
This result can be understood at two levels. First, we can simply accept it as an
experimental fact; later we shall see the experimental consequences. Secondly we can ask how
this law fits into a more consistent overall picture.
Deflection in a magnetic field

Consider charged particles passing through a magnetic field B, say out of the page
(Figure 2a). Since the magnetic field does no work, the magnitude of the momentum does not
change; only its direction changes. Therefore the trajectory is an arc of radius R. What does
R depend on?
Consider two moments $t$ and $t + \Delta t$, and compare the momenta. In this time, the
momentum vector has changed direction by $\Delta\theta$, so (Figure 2b)

But since the angle changes by $2\pi$ in a period T,

Note that R is not proportional to $mv$.

Example 1
Compare the radii of curvature for particles travelling at (a) 0.99 c and (b) 0.9999 c.

Solution
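For $v \approx c$, $p = m\gamma v \approx m\gamma c \propto \gamma$.
(a) $\gamma = \dfrac{1}{\sqrt{1 - 0.99^2}} = 7.09$
(b) $\gamma = \dfrac{1}{\sqrt{1 - 0.9999^2}} = 70.7$
The radius in case (b) is therefore about 10 times larger, although the speeds differ by only 1%.
This kind of experimental observation verifies that it is correct to use the Lorentz force law
with the relativistic momentum $m\gamma\mathbf{v}$.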
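
The comparison in Example 1 is a one-line calculation. The sketch below (Python) assumes, as in the discussion of magnetic deflection, that the radius of curvature is proportional to the relativistic momentum $m\gamma v$, so that for speeds close to c the ratio of radii is essentially the ratio of $\gamma$ factors. The function name `gamma` is illustrative.

```python
import math

def gamma(beta):
    """Lorentz factor for speed beta (in units of c)."""
    return 1.0 / math.sqrt(1.0 - beta * beta)

if __name__ == "__main__":
    g_a, g_b = gamma(0.99), gamma(0.9999)
    # Ratio of momenta (and hence of radii) for the two speeds
    ratio = (g_b * 0.9999) / (g_a * 0.99)
    print(g_a, g_b, ratio)
```

A Newtonian estimate with R proportional to mv would instead predict a ratio of about 1.01, which is why such measurements discriminate between the two.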

Motion under an electric field: parallel case


Consider a charged particle travelling along a constant electric field E.

where we have assumed that the motion starts from rest, and

Note that T can be interpreted as the time taken, according to Newtonian physics, for the
particle to attain velocity c.

t
Figure 3

For small t, the denominator $\approx 1$, and the situation is given by line 1, which is the Newtonian
result. For large t, the velocity saturates at $v \to 1$. Thus, the fact that velocities cannot be
larger than the speed of light is built in. Also, in general, qE can be replaced by the force F.
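
The saturation of the velocity can be illustrated numerically. The Python sketch below assumes a particle starting from rest under a constant force, so that $m\gamma v = qEt$, which gives $v = (t/T)/\sqrt{1 + (t/T)^2}$ with $T = m/(qE)$ in units with c = 1. The function name and the sample times are illustrative.

```python
import math

def velocity(t, T):
    """Speed (in units of c) at time t for a particle starting from rest
    under a constant force; T = m/(qE) is the Newtonian time to reach c."""
    x = t / T
    return x / math.sqrt(1.0 + x * x)

if __name__ == "__main__":
    T = 1.0
    for t in (0.01, 0.1, 1.0, 10.0, 100.0):
        # Compare with the Newtonian prediction v = t/T
        print(t, velocity(t, T), t / T)
```

For t much smaller than T the two columns agree, while for large t the relativistic speed approaches but never reaches 1.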
We can analyse the motion in another way.

Put this into (7.6)

Thus the effective mass is increased by $\gamma^3$.

Motion under the electric field: perpendicular case


Consider a particle moving originally along x, but subject to a constant electric field along y.

Therefore

where we consider any force F perpendicular to the direction of motion. The first equation
can be analysed simply: because the force is perpendicular to the motion, the energy does not
increase, and $\gamma$ = constant; thus $v_x$ is constant as well. Next carry out the differentiation for
(7.11).

The first term is zero, because the instantaneous value of $v_y$ is zero. Thus

Magnetic deflection belongs to the second case of force being perpendicular to the
direction of motion, and from (7.5), it is also seen that the radius of curvature is modified by a
factor of $\gamma$ (not $\gamma^3$).
There is a very important lesson. It is not true that all Newtonian formulas can be made
correct by replacing the mass m by the "relativistic mass" $M = m\gamma$; such a replacement works
for perpendicular forces but not for parallel forces. For this reason, the idea of a "relativistic
mass" is seldom used nowadays.

Recall that velocity is

Although $\Delta\mathbf{x}$ transforms simply as (the spatial components of) a 4-vector, the denominator $\Delta t$
is not a 4-scalar. Therefore $\mathbf{v}$ is not a 4-vector, and transforms in a complicated way. Instead,
it is better to consider the 4-velocity

This involves dividing a 4-vector $\Delta x^\mu$ by a 4-scalar; the result is guaranteed to be a 4-vector,
and will therefore transform in a simple way.
For exactly the same reason

transforms in a messy way. We are led to define the 4-force

Explicitly,
Since the rate of change of the energy E is the work done per unit time, i.e., $\mathbf{F}\cdot\mathbf{v}$,

If we can state the Lorentz force law in terms of $K^\mu$ rather than $\mathbf{F}$, then the covariance properties
would be more apparent.

7.4 Four-vector potential and field tensor


To discuss electromagnetism, we start with the postulate that the scalar potential $\phi$ and the
vector potential $\mathbf{A}$ together form a 4-vector field

The proof of this postulate will come from the transformation of the fields.
From the 4-potential we can form the field tensor

By construction, this must be a tensor. The components are as follows

$$= (\nabla \times \mathbf{A})_3 = B_3$$

In general
In explicit matrix form

The first index is the row, so $F^{01}$ is row 0, column 1, i.e., the entry $E_1$.

7.5 Covariant form of the Lorentz force


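We claim that the covariant force $K^\mu$ defined by (7.15) is given by

Let us check this component by component. First of all, note that $u^\mu = (\gamma, \gamma\mathbf{v})$, $u_\mu = (-\gamma, \gamma\mathbf{v})$.

$$K^0 = qF^{0\nu}u_\nu \qquad (\nu \text{ can only be } 1, 2, 3)$$
$$= qF^{0i}u_i$$
$$= qE_i(\gamma v_i)$$
$$= \gamma(qE_i)v_i = \gamma F_i v_i = \gamma(\mathbf{F}\cdot\mathbf{v})$$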

Thus, combining (7.14) with (7.21), we have the law of motion in covariant form

Assuming that $A^\mu$ does indeed transform like a 4-vector, and hence that $F^{\mu\nu}$ does indeed
transform like a $\binom{2}{0}$ tensor, it is guaranteed that (7.22) leads to the same physical consequence
in every reference frame.
7.6 Transformation of fields

Figure 4
Consider a frame S' moving at speed $V = \beta c$ along the x-direction, relative to a frame S. The
Lorentz transformation is given by

and all other entries are 0.


It is then straightforward to work out the transformation of the fields

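There are only 4 choices of $(\mu\nu)$, namely $(\mu\nu) = (0\,0), (0\,1), (1\,0), (1\,1)$. So

$$E'_1 = F'^{01} = L^0{}_0 L^1{}_0 F^{00} + L^0{}_0 L^1{}_1 F^{01} + L^0{}_1 L^1{}_0 F^{10} + L^0{}_1 L^1{}_1 F^{11}$$
$$= (L^0{}_0 L^1{}_1 - L^0{}_1 L^1{}_0)F^{01}$$
$$= [\gamma\cdot\gamma - (-\gamma\beta)(-\gamma\beta)]E_1$$
$$= \gamma^2(1 - \beta^2)E_1 = E_1$$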

The only allowed $\nu$ is $\nu = 2$.


Likewise

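$$E'_3 = \gamma(E_3 + \beta B_2)$$
Since $\boldsymbol{\beta}$ can be regarded as a vector in the +x direction, all these can be summarized as

Here $\parallel$ and $\perp$ refer to the direction of relative motion, i.e., that of $\boldsymbol{\beta}$. Next for the magnetic
field

The only nonzero term is $\mu = 2$, $\nu = 3$.

$$B'_2 = \gamma(B_2 + \beta E_3)$$
Likewise

$$B'_3 = \gamma(B_3 - \beta E_2)$$
All these can be summarized as

$$\mathbf{B}'_\perp = \gamma(\mathbf{B}_\perp - \boldsymbol{\beta}\times\mathbf{E}_\perp)$$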
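
The component formulas can be verified by brute force. The Python sketch below builds the field tensor with the convention $F^{0i} = E_i$, $F^{12} = B_3$ (and cyclic), applies a boost along x as $F'^{\mu\nu} = L^\mu{}_\alpha L^\nu{}_\beta F^{\alpha\beta}$, and reads off the transformed fields. It also evaluates the two combinations $B^2 - E^2$ and $\mathbf{E}\cdot\mathbf{B}$, which should be unchanged. All function names and the sample field values are illustrative assumptions.

```python
import math

def field_tensor(E, B):
    """F^{mu nu} with F^{0i} = E_i, F^{12} = B_3, F^{23} = B_1, F^{31} = B_2."""
    E1, E2, E3 = E
    B1, B2, B3 = B
    return [[0.0, E1, E2, E3],
            [-E1, 0.0, B3, -B2],
            [-E2, -B3, 0.0, B1],
            [-E3, B2, -B1, 0.0]]

def boost_x(beta):
    """Lorentz transformation matrix for a frame moving at speed beta along x."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[g, -g * beta, 0.0, 0.0],
            [-g * beta, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform(F, L):
    """F'^{mu nu} = L^mu_a L^nu_b F^{ab}."""
    return [[sum(L[m][a] * L[n][b] * F[a][b] for a in range(4) for b in range(4))
             for n in range(4)] for m in range(4)]

def fields(F):
    """Extract (E, B) from a field tensor."""
    return (F[0][1], F[0][2], F[0][3]), (F[2][3], F[3][1], F[1][2])

if __name__ == "__main__":
    E, B = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)   # arbitrary sample fields
    Ep, Bp = fields(transform(field_tensor(E, B), boost_x(0.6)))
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    print(Ep, Bp)
    print(dot(B, B) - dot(E, E), dot(Bp, Bp) - dot(Ep, Ep))  # should agree
    print(dot(E, B), dot(Ep, Bp))                            # should agree
```

Working with the full tensor avoids having to remember the sign of each component formula, at the cost of a few more multiplications.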
Example of field transformation

Figure 5
We consider how the same phenomenon appears to two different observers, in order to illustrate
what happens under field transformations. Consider two capacitor plates which create an
electric field

Let a particle of mass m and charge q traverse this space at speed v (Figure 5). In the lab
frame S, this particle experiences a force in the y direction, and accelerates at

(See (7.13).)
Now go to the co-moving frame S'. What are the fields?

Note that a magnetic field appears! In the co-moving frame, the particle is non-relativistic, so
Newtonian mechanics applies:

Although there is a magnetic field, it does not matter, because the particle has zero velocity in
the co-moving frame, and hence does not couple to the magnetic field.
Are (7.25) and (7.26) consistent? To check this, we see that

$$\frac{dt}{dt'} = \gamma \qquad (t' \text{ is proper time})$$
Hence (7.26) becomes

$$\frac{d^2y}{dt^2} = \frac{1}{\gamma}\frac{qE}{m}$$
which is identical with (7.25).
An important aspect of the field transformation is that E and B are mixed. They are
really different aspects of the same thing, i.e., different components of the field tensor $F^{\mu\nu}$.

Checking invariance
This example shows that we can check invariance (i.e., that phenomena appear to all observers
in a consistent way) by three different methods.
• We can check one phenomenon at a time, as in this example. There are infinitely many
phenomena to be checked.
• We can check the non-covariant equations of motion, again as in this example.
• We can check the covariant equation of motion, which is much simpler. We check these
once and for all.

Relativistic invariants
Out of the tensor $F^{\mu\nu}$, we can construct two quadratic invariants. The first is

Consider the two terms

Next consider the two terms



This transforms like a 4-scalar, i.e., it is the same in every frame.


If there is a pure B field (E = 0) in one frame, then $I_1 > 0$, and it is impossible to
transform to another frame so that it becomes a pure E field (B = 0), since the sign of $I_1$
would be changed. The reverse is also true.
In a plane wave, $|\mathbf{B}| = |\mathbf{E}|$, so $I_1 = 0$. This is consistent since a plane wave ($I_1 = 0$)
stays as a plane wave ($I'_1 = 0$) in every frame.
To introduce the second invariant, we have to start with the dual tensor $\tilde{F}^{\mu\nu}$

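The totally antisymmetric symbol is defined as

$$\epsilon^{\mu\nu\alpha\beta} = +1, \quad \text{if } \mu\nu\alpha\beta = \text{even permutation of } 0123$$
$$\epsilon^{\mu\nu\alpha\beta} = -1, \quad \text{if } \mu\nu\alpha\beta = \text{odd permutation of } 0123$$
$$\epsilon^{\mu\nu\alpha\beta} = 0, \quad \text{if any two indices are equal}$$

It transforms as a $\binom{4}{0}$ tensor - this is left as an exercise. Thus the dual tensor $\tilde{F}^{\mu\nu}$ transforms like a
tensor.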

Problem 1
The transform of the totally antisymmetric tensor in another frame is given by

and we want to show that this is exactly the same as $\epsilon^{\mu\nu\rho\sigma}$. Take, for example, $\mu = 0$, $\nu = 1$,
$\rho = 2$, $\sigma = 3$, then $\epsilon^{\mu\nu\rho\sigma} = 1$, and we need to prove:

Prove the above identity. (Hint: Consider determinants. Also, you need to make an assumption
on a sign.)

To see explicitly what the dual tensor $\tilde{F}^{\mu\nu}$ is:

Thus
Referring to (7.20), we see

With the dual tensor $\tilde{F}^{\mu\nu}$, we can construct another invariant

To evaluate this, consider

Hence

Again, this must be the same in every frame. For example, a plane wave is characterized
by $\mathbf{B}\cdot\mathbf{E} = 0$, and we now see that this condition is the same in every frame.

7.7 Maxwell equations


The theory of electromagnetism consists of two parts.
• How do the fields affect the charges and currents - Lorentz force law
• How do the charges and currents generate the fields - Maxwell equations
We have dealt with the first part relativistically. Now we deal with the second part.
Four Current
First, we show that the charge density $\rho$ and the current density $\mathbf{J}$ together form a 4-vector.

To see this, we have to write down the expressions for $\rho$ and $\mathbf{J}$. Suppose there is a single charge
q at position $\mathbf{X}$. Then

Integrating this over space gives the correct charge. The more general case with charges $q_n$ at
positions $\mathbf{X}_{(n)}$ would be

but it is sufficient to deal with the transformation properties of a single term. Next, the current
density is
$$\mathbf{J} = \text{charge density} \times \text{velocity of charge} = q\mathbf{v}\,\delta^3(\mathbf{x} - \mathbf{X})$$
where $\mathbf{v} = d\mathbf{X}/dt$. We insert the factor

where T is an arbitrary constant time.


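$$\rho = \int q\,\delta^3(\mathbf{x} - \mathbf{X})\,\delta(t - T)\,dt$$
$$\mathbf{J} = \int q\mathbf{v}\,\delta^3(\mathbf{x} - \mathbf{X})\,\delta(t - T)\,dt$$
Now
$$\delta^3(\mathbf{x} - \mathbf{X})\,\delta(t - T) = \delta^4(x^\mu - X^\mu)$$
is a scalar, where we have introduced $X^\mu = (T, \mathbf{X})$. Hence
$$(\rho, \mathbf{J}) \propto (dt, \mathbf{v}\,dt) = (dt, d\mathbf{x}) = dx^\mu$$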
Hence it transforms like a 4-vector. This is a formal proof. You can work out a more physical
proof through the following Problem.

Problem 2
(a) There are N charges, each of magnitude q, in a rectangular volume of area A and length L.
The charges are not moving. Find $J^\mu$.
(b) Now go to a frame which is moving at a speed $\beta$ along the length direction. In this frame,
the charges are (i) moving at a speed $-\beta$, and (ii) contained in a volume A' × L', where A' = A
and L' = L/$\gamma$. Find $J'^\mu$.
(c) Show that $J^\mu$ and $J'^\mu$ are related exactly by the Lorentz transformation.

We can now write down Maxwell's equation in two groups. We simply write down the
covariant form, and check that they give the correct result.
Homogeneous equations

First, if any 2 indices are the same, this equation is trivial. For example, let the indices be
$\mu = 1$, $\nu = 2$, $\rho = 2$. Then (7.37) gives

which is a trivial identity. So we only have to consider the case of all 3 indices being distinct.
(a) Missing index is time-like, i.e., 0

$$\nabla\cdot\mathbf{B} = 0$$
(b) Missing index is space-like, e.g., 1

$$(\nabla\times\mathbf{E})_1 = -\frac{\partial}{\partial t}B_1$$

Thus the two homogeneous Maxwell equations:

• (7.38) - no magnetic charge
• (7.39) - Faraday's law

are components of the same covariant equation (7.37).


We can also write (7.37) in terms of the dual tensor as follows:

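To see this, recall that

$$\partial_\mu \tilde{F}^{\mu\nu} = \tfrac{1}{2}\,\epsilon^{\mu\nu\alpha\beta}\,\partial_\mu F_{\alpha\beta}$$
Since $\mu$, $\nu$, $\alpha$, $\beta$ have to be all different, once we choose $\nu$, it is just the missing index among
$\partial_\mu F_{\alpha\beta}$. So it recovers (7.37).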
Inhomogeneous equation

or we can write it as

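where
$$\partial_\mu = \frac{\partial}{\partial x^\mu}$$
(a) $\nu$ is time-like, i.e., 0

Note, there is no $\mu = 0$ term.

(b) $\nu$ is space-like, e.g., $\nu = 1$


$$\partial_\mu F^{\mu 1} = -4\pi J^1$$

$$\nabla\times\mathbf{B} = 4\pi\mathbf{J} + \frac{\partial}{\partial t}\mathbf{E}$$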
Thus the two inhomogeneous equations:
(7.42) - Gauss' law
(7.43) - Ampere's law with displacement current
are again components of the same covariant equation (7.41).
To summarize, Maxwell's equations are
In terms of potential

then the homogeneous Maxwell equations are automatic. For example

because the six terms cancel in pairs, e.g., the two terms underlined.
The inhomogeneous equation becomes

Gauge transformation
The potential is convenient, but contains "too much" information. In other words, we can make
a change - called a gauge transformation - on $A^\mu$ and not affect the physics. Let $\Lambda(x)$ be a
4-scalar field, and let

$$A^\mu \to A^\mu + \partial^\mu\Lambda$$
$$F^{\mu\nu} \to \partial^\mu(A^\nu + \partial^\nu\Lambda) - \partial^\nu(A^\mu + \partial^\mu\Lambda)$$
$$= (\partial^\mu A^\nu - \partial^\nu A^\mu) + (\partial^\mu\partial^\nu - \partial^\nu\partial^\mu)\Lambda$$
$$= F^{\mu\nu}$$

since the order of differentiation in a mixed derivative does not matter. Since classical
electromagnetism depends only on $F^{\mu\nu}$, it is invariant under the gauge transformation.
We can make use of gauge transformations to choose any value of $\partial\cdot A = \partial_\mu A^\mu$.

Summary
All of electromagnetism is contained in
Ch7-2.tex; December 29, 1997
8 Action Formalism
8.1 General principles
Different ways of specifying dynamics
We have now come across two ways of writing down the dynamic evolution of a system. Consider
for example the motion of a charged particle under the influence of an electromagnetic field.
The first method is

$$\frac{d\mathbf{p}}{dt} = \mathbf{F} = q(\mathbf{E} + \mathbf{v}\times\mathbf{B}) \qquad (8.1)$$
This is not covariant. The second method, which is slightly better, is

This is explicitly covariant. However, it may still look slightly unnatural.


Now we come to the third way of doing dynamics - the least action principle, which is
• explicitly covariant, therefore well suited to discussions in relativity;
• particularly simple, so that the Lorentz force law comes about "naturally".
We start by reviewing the general properties of the least action principle, and then specialize
to electromagnetism.

Newtonian mechanics

Figure 1

Forget about electromagnetism for the moment, and go back to Newtonian mechanics
in one dimension. The variable is $x(t)$. Suppose it is given that
What is x(t) in between? Graphically, this means determining the correct path (solid line)
among all possible paths (say the broken line) in a t-x diagram. To conform with the usual
practice in spacetime diagrams, we draw t vertically, even though it is usually the independent
variable (Figure 1). In the equation of motion approach, we say that the correct path is the
one which satisfies the differential equation of motion, such as (8.1), or a Newtonian equation,
e-g.,

together with the conditions (8.3). The system is specified by F(x) or V(x), e.g., for a spring,
F = -kx, or V = kx2/2.
The least action approach takes a completely different point of view.

Least action principle


1. Consider any path P between (tl, xl) and (t2,x2). It need not be the correct path. Each
path P corresponds to a function [x(t)].

2. Give a way of calculating a number S[P] = S[x(t)] for each path P = [x(t)]. S is called
the action. For all cases we shall consider, S is made up of contributions ΔS for each
segment of path (additivity assumption).

3. Consider all possible paths, and select the one whose action is minimum. This is the
correct path.
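
The three steps above can be sketched numerically. The example below is an illustration under stated assumptions: it uses the spring Lagrangian L = m ẋ²/2 - k x²/2, with m = k = 1 and end points chosen here for convenience, and it compares the action of the true path with that of a few trial paths sharing the same end points. The time interval is kept shorter than half a period, so that the stationary path is a genuine minimum.

```python
import math

M, K = 1.0, 1.0          # assumed mass and spring constant (angular frequency 1)
T1, T2 = 0.0, 1.0        # end times; T2 - T1 is less than half a period
N = 2000                 # number of path segments


def action(path):
    """Sum of L*dt over segments, with L = m*xdot^2/2 - k*x^2/2 (additivity)."""
    dt = (T2 - T1) / N
    S = 0.0
    for i in range(N):
        ta, tb = T1 + i * dt, T1 + (i + 1) * dt
        xa, xb = path(ta), path(tb)
        xdot = (xb - xa) / dt          # velocity on this segment
        xmid = 0.5 * (xa + xb)         # position at the centre of the segment
        S += (0.5 * M * xdot**2 - 0.5 * K * xmid**2) * dt
    return S


def true_path(t):
    """Solution of m*x'' = -k*x with x(0) = 0 and x(1) = sin(1)."""
    return math.sin(t)


def trial_path(eps):
    """Same end points, displaced by eps*sin(pi*t) in between."""
    return lambda t: math.sin(t) + eps * math.sin(math.pi * t)


S_true = action(true_path)
for eps in (-0.2, -0.1, 0.1, 0.2):
    # Each difference is positive: the trial paths have a larger action.
    print(eps, action(trial_path(eps)) - S_true)
```

The differences scale as eps squared, which is what one expects if the first order change in S vanishes at the correct path.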

Advantage of least action principle

Figure 2
A path P is independent of the coordinates used to describe the path. Figures 2a and 2b
show the same path P in two different coordinate systems. Here we are thinking of a Lorentz
transformation; e.g., see Chapter 4, Figure 8. If the action S(P) does not depend on the
coordinate system, then the principle of least action will select the same path in any coordinates.
It guarantees that the physics is invariant. Thus, the principle of least action is especially
convenient for
• discussions in relativity, where invariance is a central issue, and
• even in Newtonian physics, when using generalized coordinates - otherwise it is difficult
  to verify that the equations in different generalized coordinates give the same physics.

We shall continue with Newtonian physics, and discuss
• how we can guess the form of S; and
• how to obtain the equation of motion from S.

8.2 Action principle in non-relativistic physics


Choice of action
We consider a single particle of mass m, moving in 1 dimension under a potential V(x).
Generalization to higher dimensions is straightforward.

(a) Because of the additivity assumption, we need only consider a small segment of path, centred
at (t, x) and of length (Δt, Δx); see Figure 3.

Figure 3

The action ΔS must be proportional to Δt:

    \Delta S = L \, \Delta t

L is called the Lagrangian.

(b) L can only depend on t, x and Δx/Δt = ẋ:

    L = L(x, \dot{x}, t)

(c) Assume the system has no explicit time dependence; then t does not appear.

(d) Assume the system knows about the position x only through the potential V(x):

    L = L(V(x), \dot{x})

and moreover assume that L is linear in V(x):

    L = A(\dot{x}) + B(\dot{x}) \, V(x)

where A, B are unknown functions.

(e) Expand A, B. The lowest order expansion should be adequate for small velocities:

    A(\dot{x}) = a_0 + a_1 \dot{x} + a_2 \dot{x}^2 + \cdots , \qquad B(\dot{x}) = b_0 + \cdots

(f) The term a_0 would contribute

    \int_{t_1}^{t_2} a_0 \, dt = a_0 (t_2 - t_1)

which is the same for all paths satisfying the boundary conditions. It is irrelevant for picking
the minimum, and we shall set it to zero.
The term a_1 ẋ would change sign under reflection (x → -x). It is not allowed if the
physics is reflection-symmetric, so a_1 = 0.

(g) The lowest order guess is therefore

    L = a_2 \dot{x}^2 + b_0 V(x)

We shall show that the choice

    a_2 = \frac{m}{2} , \qquad b_0 = -1

gives the Newtonian equation of motion. (Actually, only the ratio b_0/a_2 matters, since
multiplying S by a constant has no effect.)
Euler-Lagrange equation
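We now start with

    S = \int_{t_1}^{t_2} dt \left[ \frac{m}{2} \dot{x}^2 - V(x) \right]

and derive the Newtonian equation of motion. Let x(t) be the correct path, and let x(t) + η(t)
be a neighbouring path; η is considered a small quantity (Figure 4).

Figure 4

Since the neighbouring paths must satisfy the same initial and final conditions,

    \eta(t_1) = \eta(t_2) = 0        (8.8)

Now compare the two actions. To first order in η,

    \Delta S = S[x + \eta] - S[x]
             = \int_{t_1}^{t_2} dt \left[ m \dot{x} \dot{\eta} - V'(x) \eta \right]
             = \int_{t_1}^{t_2} dt \, \frac{d}{dt} ( m \dot{x} \eta ) - \int_{t_1}^{t_2} dt \left[ m \ddot{x} + V'(x) \right] \eta

The first term can be integrated exactly:

    \int_{t_1}^{t_2} dt \, \frac{d}{dt} ( m \dot{x} \eta ) = m \dot{x} \eta \Big|_{t_1}^{t_2} = 0

because of (8.8). Thus we are left with

    \Delta S = - \int_{t_1}^{t_2} dt \left[ m \ddot{x} + V'(x) \right] \eta(t)

If the original x(t) gives a minimum, then ΔS must vanish for all first order changes. This
means [ ] must be zero:

    m \ddot{x} = -V'(x)

which is the Newtonian equation of motion.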

More general form of action


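We shall later come across more general forms of action. So let us derive the equation of motion
for a general L. To save some writing, and to unify notation,
• η(t) is usually written as δx(t),
• we can integrate by parts (A \dot{B} → -\dot{A} B) and not worry about the surface term.

    \delta S = \int dt \left[ \frac{\partial L}{\partial x} \delta x + \frac{\partial L}{\partial \dot{x}} \delta \dot{x} \right]
             = \int dt \left[ \frac{\partial L}{\partial x} - \frac{d}{dt} \frac{\partial L}{\partial \dot{x}} \right] \delta x

Hence

    \frac{d}{dt} \frac{\partial L}{\partial \dot{x}} = \frac{\partial L}{\partial x}        (8.11)

We call ∂L/∂ẋ the conjugate momentum π. It may or may not be the same as the "ordinary"
or mechanical momentum p.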
8.3 Action principle for a relativistic free particle
We now consider relativity, but start with a free particle.

Choice of action
Consider a single particle of mass m, moving in 3 dimensions. So think of Figures 1-3 as
space-time diagrams, and x → \mathbf{x}. A point is now denoted as x^μ and the segment in Figure
3 is Δx^μ = (Δt, Δ\mathbf{x}). Since the segment represents a possible path, it must be time-like, i.e.,
|Δt| > |Δ\mathbf{x}|, Δx^μ Δx_μ < 0.
We now try to repeat the arguments in section 8.2, but with two differences:
• Because of Lorentz invariance, the choice is much more limited.
• It is no longer so natural to use L. The reason is that ΔS is invariant, but Δt is not. So
  L = ΔS/Δt is not invariant. Conceptually, it is better to deal directly with the invariant
  quantity ΔS.

(a) Because of the additivity assumption, we can focus on a small segment Δx^μ (Figure 3).

(b) There is no dependence on t. If the particle is free, every position is equivalent, so there is
no dependence on x. Hence ΔS depends only on Δx^μ.

(c) By Lorentz invariance, ΔS can only depend on the product Δx^μ Δx_μ.

(d) Additivity implies linearity. If we double a small interval, ΔS must be doubled. Therefore,

    \Delta S \propto \sqrt{ -\Delta x^\mu \Delta x_\mu }

The - sign must be inserted because Δx^μ Δx_μ < 0.

(e) The proportionality constant does not matter. For a free particle, this is the only term, and
multiplying S by an overall constant does not change which path gives the least action. So this
constant is a matter of convention. We take

    S = -m \int \sqrt{ -dx^\mu dx_\mu }        (8.13)

We shall next discuss how to deal with such an object.

Equation of motion: noncovariant approach


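Let us divide and multiply by dt:

    S = -m \int dt \sqrt{ 1 - \left( \frac{d\mathbf{x}}{dt} \right)^2 } = \int L \, dt

We know that S is invariant, because of (8.13). But now we have written it in a way which is
not explicitly covariant.
With the Lagrangian, we can immediately apply the result of section 8.2.

    L = -m \sqrt{ 1 - \mathbf{v}^2 }

where \mathbf{v} = \dot{\mathbf{x}}, so that

    \boldsymbol{\pi} = \frac{\partial L}{\partial \mathbf{v}} = \frac{m \mathbf{v}}{\sqrt{1 - \mathbf{v}^2}} = \mathbf{p}

So the conjugate momentum π is exactly the same as the mechanical momentum p. From the
Euler-Lagrange equation (8.11), we then get

    \frac{d}{dt} \left( \frac{m \mathbf{v}}{\sqrt{1 - \mathbf{v}^2}} \right) = 0

We see that the relativistic momentum emerges very naturally.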
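
As a quick numerical cross-check of the statement that the conjugate momentum equals the relativistic momentum, the sketch below differentiates L = -m sqrt(1 - v²) by finite differences. It is restricted to motion in one dimension, and the value of the mass is an arbitrary assumption (units with c = 1).

```python
import math

M = 2.0  # assumed rest mass, in units with c = 1


def lagrangian(v):
    """Free-particle Lagrangian L = -m*sqrt(1 - v^2) for 1-D velocity v."""
    return -M * math.sqrt(1.0 - v * v)


def conjugate_momentum(v, h=1e-6):
    """dL/dv estimated by a central difference."""
    return (lagrangian(v + h) - lagrangian(v - h)) / (2 * h)


def mechanical_momentum(v):
    """Relativistic momentum m*v/sqrt(1 - v^2)."""
    return M * v / math.sqrt(1.0 - v * v)


for v in (0.1, 0.5, 0.9):
    # The two columns agree to within the finite-difference error.
    print(v, conjugate_momentum(v), mechanical_momentum(v))
```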
Equation of motion: covariant approach
In the non-covariant approach, we use t as the independent variable, and express x = x(t), v =
v(t) etc. However, the four components of x^μ have the same status, so such a different treatment
of one component is not elegant. A better approach is to use an arbitrary path parameter s as
the independent variable, and let x^μ = x^μ(s). For example we can have

    t = s + 0.3 s^2 , \quad 0 < s < 1
    x = \sin \pi s
    y = \cos \pi s
    z = 0

The path parameter has no physical meaning. For example, we can let s = s'^2:

    t = s'^2 + 0.3 s'^4 , \quad 0 < s' < 1
    x = \sin \pi s'^2
    y = \cos \pi s'^2
    z = 0

It describes exactly the same path, but simply with different labelling. In using a path
parameter, there are really two types of invariance that need to be preserved:
• Lorentz invariance (plus any other invariance of the dynamics); and
• Invariance under re-labelling such as s → s'.

For simplicity, we shall not deal with the latter.
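
The statement that relabelling does not change the path can be illustrated numerically: the action built from segment contributions, -m sqrt(-Δx^μ Δx_μ), comes out the same for both labellings. The sketch below is an assumption-laden illustration: the coefficients of the path are chosen here so that the path is time-like everywhere (speed below 1), and they are not the ones quoted in the text.

```python
import math

M = 1.0  # assumed mass (units with c = 1)


def path(s):
    """Hypothetical time-like path (t, x, y, z) labelled by s in [0, 1]."""
    return (s + 0.3 * s**2,
            0.2 * math.sin(math.pi * s),
            0.2 * math.cos(math.pi * s),
            0.0)


def relabelled(sp):
    """The same path labelled by s' in [0, 1], with s = s'^2."""
    return path(sp**2)


def action(curve, n=20000):
    """S = -m * sum over segments of sqrt(-dx^mu dx_mu), signature (-,+,+,+)."""
    total = 0.0
    prev = curve(0.0)
    for i in range(1, n + 1):
        cur = curve(i / n)
        dt, dx, dy, dz = (c - p for c, p in zip(cur, prev))
        total += math.sqrt(dt**2 - dx**2 - dy**2 - dz**2)
        prev = cur
    return -M * total


S1 = action(path)
S2 = action(relabelled)
print(S1, S2)  # equal up to discretization error
```

The two labellings cut the path into different segments, yet the sums agree because each ΔS depends only on the segment itself and not on how it is labelled.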


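With a path-parameter s, we then write (8.13) as

    S = -m \int_{s_1}^{s_2} ds \sqrt{ - \frac{dx^\mu}{ds} \frac{dx_\mu}{ds} }

We assume all paths are labelled on the interval [s_1, s_2]. Because of the freedom to relabel, this
does not impose any real restrictions. The boundary conditions are

    x^\mu(s_1) = x_1^\mu , \qquad x^\mu(s_2) = x_2^\mu        (8.20)

Now consider a change x^μ(s) → x^μ(s) + η^μ(s), with η^μ(s_1) = η^μ(s_2) = 0 because of (8.20).
To first order, and writing \dot{} = d/ds,

    \delta S = m \int_{s_1}^{s_2} ds \, \frac{ \dot{x}^\mu \dot{\eta}_\mu }{ \sqrt{ -\dot{x}^\nu \dot{x}_\nu } }

Note that we have used different dummy variables in the different products.

Integrate by parts and use (8.20) to discard the integrated term. Also call η^μ(s) = δx^μ(s):

    \delta S = -m \int_{s_1}^{s_2} ds \, \frac{d}{ds} \left( \frac{ \dot{x}^\mu }{ \sqrt{ -\dot{x}^\nu \dot{x}_\nu } } \right) \delta x_\mu        (8.21)

Thus the equation of motion is

    \frac{d}{ds} \left( \frac{ \dot{x}^\mu }{ \sqrt{ -\dot{x}^\nu \dot{x}_\nu } } \right) = 0 \qquad \text{(any } s \text{)}        (8.22)

As indicated, this is valid for any path parameter s.

The equation of motion becomes simpler if we choose s to be the proper time interval:

    ds^2 = d\tau^2 = -dx^\mu dx_\mu

In other words, label each point on the path by the proper time τ elapsed along that path.
Then

    \frac{dx^\mu}{ds} \frac{dx_\mu}{ds} = -1

and (8.22) simplifies to

    \frac{d^2 x^\mu}{d\tau^2} = 0 \qquad (\tau = \text{proper time})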
8.4 Action principle for a particle in the electromagnetic field
Choice of action
We assume that there is a 4-vector field A^μ(x), which describes electromagnetism. (We could
ask: what sort of theory would be obtained if there is some 4-scalar field? It would indeed be a
simpler theory, but it does not correspond to electromagnetism empirically.) We now construct
the action step by step.

(a) We assume the effect is to add a term to S, i.e.,

    S = S_0 + S_I

where the interaction term S_I depends on A^μ. We further assume that A^μ enters linearly.

(b) For a small segment of path, ΔS_I is
• linear in A^μ
• linear in Δx^μ (additivity assumption)
• a 4-scalar
The only possibility is

    \Delta S_I = q A_\mu(x) \Delta x^\mu

where q is a constant, which will turn out to be the charge. Hence

    S = -m \int \sqrt{ -dx^\mu dx_\mu } + q \int A_\mu(x) \, dx^\mu        (8.25)

Equation of motion: non-covariant approach


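If S in (8.25) is written as \int L \, dt, then

    L = -m \sqrt{1 - \mathbf{v}^2} + q A_\mu(x) \frac{dx^\mu}{dt}

Note that A_μ(x) is to be evaluated at x^ν = (t, \mathbf{x}). Since the form is not covariant anyway, let
us separate the time and space components in the second term. With

    A^\mu = (\phi, \mathbf{A})

we have

    L = -m \sqrt{1 - \mathbf{v}^2} - q \phi + q \mathbf{A} \cdot \mathbf{v}

We can now use the standard techniques to derive the Euler-Lagrange equation. But first note:
• L ≠ KE - PE, showing that the latter is not general.
• The conjugate momentum will turn out to be different from the mechanical momentum.

The Euler-Lagrange equation is

    \frac{dp_i}{dt} = q \left( -\frac{\partial \phi}{\partial x_i} - \frac{\partial A_i}{\partial t} \right) + q v_j \left( \frac{\partial A_j}{\partial x_i} - \frac{\partial A_i}{\partial x_j} \right)
                    = q ( \mathbf{E} + \mathbf{v} \times \mathbf{B} )_i

where

    \mathbf{E} = -\nabla \phi - \frac{\partial \mathbf{A}}{\partial t} , \qquad \mathbf{B} = \nabla \times \mathbf{A}

Thus the somewhat odd looking Lorentz force law comes naturally from the very simple
term S_I = q \int A_\mu dx^\mu.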
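
The step from the index expression v_j (∂A_j/∂x_i - ∂A_i/∂x_j) to the vector form (v × B)_i, with B = ∇ × A, can be verified numerically. The sketch below uses a made-up vector potential and an arbitrary velocity; both are assumptions for illustration only.

```python
def A(x):
    """Hypothetical vector potential A(x, y, z); any smooth field would do."""
    X, Y, Z = x
    return [Y * Z, X**2 - Z, X * Y + Y**2]


def jacobian(f, x, h=1e-4):
    """J[i][j] = dA_j/dx_i, estimated by central differences."""
    J = []
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        fp, fm = f(xp), f(xm)
        J.append([(fp[j] - fm[j]) / (2 * h) for j in range(3)])
    return J


def cross(a, b):
    """Ordinary 3-vector cross product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


x0 = [0.4, -1.2, 0.7]     # arbitrary point
v = [0.3, 0.1, -0.2]      # arbitrary velocity
J = jacobian(A, x0)

# Index form: v_j (dA_j/dx_i - dA_i/dx_j), summed over j
index_form = [sum(v[j] * (J[i][j] - J[j][i]) for j in range(3)) for i in range(3)]

# Vector form: v x B with B = curl A
B = [J[1][2] - J[2][1], J[2][0] - J[0][2], J[0][1] - J[1][0]]
vector_form = cross(v, B)

print(index_form)
print(vector_form)  # same three numbers, up to rounding
```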

Equation of motion: covariant approach


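Now we re-derive the equation of motion using an explicitly covariant approach.

    S = S_0 + S_I

where S_0 is the same as for the free particle case. Thus from (8.21), and choosing s = τ = proper
time,

    \delta S_0 = -m \int d\tau \, \frac{d^2 x^\mu}{d\tau^2} \, \delta x_\mu        (8.29)

Secondly,

    S_I = q \int A^\mu(x) \, dx_\mu = q \int ds \, A^\mu(x) \, \dot{x}_\mu

Under a variation

    x^\mu(s) \to x^\mu(s) + \delta x^\mu(s)

The change is

    \delta S_I = q \int ds \left[ (\partial^\nu A^\mu) \, \delta x_\nu \, \dot{x}_\mu + A^\mu \, \delta \dot{x}_\mu \right]

Upon integrating by parts, and discarding the integrated term, this is equivalent to

    \delta S_I = q \int ds \left[ (\partial^\nu A^\mu) \, \dot{x}_\mu \, \delta x_\nu - (\partial^\nu A^\mu) \, \dot{x}_\nu \, \delta x_\mu \right]
               = q \int ds \, (\partial^\mu A^\nu - \partial^\nu A^\mu) \, \dot{x}_\nu \, \delta x_\mu

Here \dot{} = d/ds, or, upon choosing s = τ, \dot{} = d/dτ, and

    F^{\mu\nu} = \partial^\mu A^\nu - \partial^\nu A^\mu

Hence

    \delta S_I = q \int d\tau \, F^{\mu\nu} \, \frac{dx_\nu}{d\tau} \, \delta x_\mu        (8.30)

Combine (8.29) and (8.30):

    \delta S = \int d\tau \left[ -m \frac{d^2 x^\mu}{d\tau^2} + q F^{\mu\nu} \frac{dx_\nu}{d\tau} \right] \delta x_\mu

Hence

    m \frac{d^2 x^\mu}{d\tau^2} = q F^{\mu\nu} \frac{dx_\nu}{d\tau}

which is the Lorentz force law in covariant form.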

A technical note
We put s (a general path parameter) → τ (proper time) only after the variation. Can we set
s → τ from the beginning? The answer is no, since we have assumed that the range of s is
the same for all paths, e.g., the two paths in Figure 4. This is possible if they are labelled by
s - since s is general, we can always re-scale one of the paths. But this is not possible if all
paths are labelled by the proper time to start with.

8.5 Action principle for Maxwell's equation


Overview and notation
(a) Previously we used x^μ to refer to the spacetime position of a particle. Instead, we now let

    x^\mu_{(a)} = spacetime position of particle a
    x^\mu = general spacetime position

The reason is that when we talk about Maxwell's equation, we have to refer to E and B (or A^μ)
at an arbitrary spacetime point, whether or not there is a particle at that point. In contrast,
when we were concerned with only the Lorentz force law, we only needed to be concerned with
E and B at the position of the particle.
Also, we write the particle label in ( ) so as not to confuse it with a space-time index.

(b) The action of a group of free particles is therefore

We allow a different path parameter for each particle. The second form is a short-hand only;
in actual calculations, we must go back to the first line.

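(c) The interaction term is now

    S_I = \sum_a q_a \int A_\mu(x_{(a)}) \, dx^\mu_{(a)}        (8.34)

However we can also re-write it in the following way.
• Insert a factor \int \delta^3(\mathbf{x} - \mathbf{x}_{(a)}) \, d^3x = 1.
• Replace dx^\mu_{(a)} \to (dx^\mu_{(a)}/dt) \, dt.
• With the \delta^3(\mathbf{x} - \mathbf{x}_{(a)}), the argument of A_μ can be changed to x.
• Combine d^3x \, dt \to d^4x.
Then

    S_I = \int d^4x \, A_\mu(x) \sum_a q_a \frac{dx^\mu_{(a)}}{dt} \, \delta^3(\mathbf{x} - \mathbf{x}_{(a)}(t))

But

    J^\mu(x) = \sum_a q_a \frac{dx^\mu_{(a)}}{dt} \, \delta^3(\mathbf{x} - \mathbf{x}_{(a)}(t))

(See (7.35) and (7.36); change x → x_{(a)}.) Hence

    S_I = \int d^4x \, A_\mu(x) J^\mu(x)        (8.36)

The two forms (8.34) and (8.36) are equivalent. The first form is more convenient when we
want to vary the particle paths. The second form is more convenient when we want to vary
A^μ(x) at an arbitrary space-time point.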

(d) We now want to add a third term which depends only on the fields:

    S_F = S_F[A^\mu(x)]

Then

    S = S_0 + S_I + S_F

For this to be a minimum, the first order variation when changing x^μ_{(a)}(s_a) → x^μ_{(a)}(s_a) +
δx^μ_{(a)}(s_a), A^μ(x) → A^μ(x) + δA^μ(x) must be zero. In general

    \delta S = \sum_a \int ds_a \, [ \; \cdots \; ]_\mu \, \delta x^\mu_{(a)}(s_a) + \int d^4x \, [ \; \cdots \; ]_\mu \, \delta A^\mu(x)

• The first [ ] comes from S_0 and S_I, since S_F does not depend on the particle positions.
  It gives the Lorentz force law for particle a. We have already derived this, although at
  that time we did not include the label a.
• The second [ ] comes from S_I and S_F, since S_0 does not depend on the fields. It should
  give the Maxwell equations. We shall derive this in the rest of this section.

Choice of action
Let us now try to guess the action S_F.
(a) We assume it is quadratic in A^μ; this will lead to linear equations. This assumption is based
on the principle of superposition.

(b) We assume there is gauge invariance; then A^μ can only enter through F^{μν}. Note that the
following quantities, for example, are not gauge-invariant:

    (\partial^\mu A^\nu + \partial^\nu A^\mu)(\partial_\mu A_\nu + \partial_\nu A_\mu)
    (\partial_\mu A^\mu)^2
    A^\mu A_\mu

(c) It must be a 4-scalar. The only choice is proportional to

    F^{\mu\nu} F_{\mu\nu}

So let us assume

    S_F = k \int d^4x \, F^{\mu\nu} F_{\mu\nu}

The constant k will be specified later.

Choice of units
In fact, the choice of k has no physical meaning, and only reflects a choice of units for
electromagnetism. We can see this as follows.

(a) The unit of S is [energy]·[time] = J s. This is readily seen from S = \int L dt, L = KE - PE, for
example. This unit has nothing to do with electricity.

(b) A term in S_I goes as

    S_I \sim -q \int \phi \, dt

This means qφ must have the unit of energy, i.e., J. If we choose to measure q in Coulombs,
then

    [q][\phi] = C \times J \, C^{-1}

However, we can choose to measure q in some other units, say esu:

    [q][\phi] = esu \times J \, esu^{-1}

In other words, we can make a transformation

    q \to \alpha q , \qquad \phi \to \alpha^{-1} \phi

and it does not change anything.

(c) Of course, to preserve Lorentz invariance, we have to change all 4 components of the
potential, so

    A^\mu \to \alpha^{-1} A^\mu

(d) Under such a transformation

    F^{\mu\nu} F_{\mu\nu} \to \alpha^{-2} F^{\mu\nu} F_{\mu\nu}

This can be compensated by

    k \to \alpha^2 k

Therefore, changing the value of k corresponds to changing units.
A simpler example
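We shall be concerned with performing a variation on a field variable A^μ(x). To introduce such
an idea, let us first consider a simpler example. Consider a region of space V bounded by a
surface S.

Figure 5

The electrostatic potential φ(x) is prescribed on S:

    \phi(\mathbf{x}) = \text{given for } \mathbf{x} \text{ on } S

There are no charges in V. What is φ(x) in V? We know the solution satisfies

    \nabla^2 \phi = 0

subject to the boundary condition. We now show that this also follows from minimizing the
energy (written here up to a constant factor that depends on the units)

    U = \frac{1}{2} \int_V d^3x \, (\nabla \phi)^2

Let φ be the correct solution, and consider a small variation φ → φ + δφ:

    \delta U = \int_V d^3x \, \nabla \phi \cdot \nabla \delta\phi        (8.43)

Now consider the identity

    \nabla \cdot ( \delta\phi \, \nabla \phi ) = \nabla \phi \cdot \nabla \delta\phi + \delta\phi \, \nabla^2 \phi

Put this into (8.43):

    \delta U = \int_V d^3x \, \nabla \cdot ( \delta\phi \, \nabla \phi ) - \int_V d^3x \, ( \nabla^2 \phi ) \, \delta\phi

The first term has the form

    \int_V d^3x \, \nabla \cdot ( \delta\phi \, \nabla \phi ) = \oint_S d\mathbf{S} \cdot \nabla \phi \, ( \delta\phi )        (Gauss' theorem)

This is zero because δφ = 0 on the surface. In other words, we only consider those φ satisfying
the boundary condition, so there is no variation on the surface. Hence

    \delta U = - \int_V d^3x \, ( \nabla^2 \phi ) \, \delta\phi        (8.45)

This is really the same as the familiar integration by parts. The rule is

    \int u \, dv = - \int du \, v + \text{surface term}

• Move the derivative to "the other" factor.
• Change sign.

In this case

    (\nabla \phi) \cdot (\nabla \delta\phi) = - [ \nabla \cdot (\nabla \phi) ] \, \delta\phi + \text{surface term}

If φ is the correct solution, then (8.45) has to be zero for any δφ, and hence ∇²φ = 0.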
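
A one-dimensional analogue shows the same idea numerically. In the sketch below (an illustration, with an assumed grid and assumed boundary values), the potential is fixed at the two ends of an interval and the discrete energy is lowered one grid point at a time. With the neighbours held fixed, the energy is minimized by the average of the two neighbours, which is the discrete form of d²φ/dx² = 0.

```python
N = 10                      # number of intervals on [0, 1]
phi = [0.0] * (N + 1)
phi[N] = 1.0                # assumed boundary values: phi(0) = 0, phi(1) = 1


def energy(p):
    """Discrete version of U = (1/2) * integral of (dphi/dx)^2 dx."""
    h = 1.0 / N
    return 0.5 * sum(((p[i + 1] - p[i]) / h) ** 2 for i in range(N)) * h


U_start = energy(phi)
for sweep in range(500):
    for i in range(1, N):
        # Minimizing U with respect to phi[i] alone gives the neighbour average;
        # the boundary points phi[0] and phi[N] are never varied.
        phi[i] = 0.5 * (phi[i - 1] + phi[i + 1])

U_end = energy(phi)
print(U_start, U_end)       # the energy has decreased
print(phi)                  # close to the straight line phi = x
```

The end result is the straight line φ = x, which is the solution of the one-dimensional Laplace equation with these boundary values.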

Derivation of Maxwell's equation

Figure 6
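The term S_0 does not matter if we vary only A^μ. The integral is over a region of spacetime
bounded by two time-like surfaces S_1 and S_2 (Figure 6). The values of A^μ(x) are specified on
S_1, S_2, i.e.,

    \delta A^\mu(x) = 0 \quad \text{on } S_1, S_2

This will allow us to discard the integrated surface terms.

    \delta ( F^{\mu\nu} F_{\mu\nu} ) = 2 F_{\mu\nu} \, \delta F^{\mu\nu}
        = 2 F_{\mu\nu} ( \partial^\mu \delta A^\nu - \partial^\nu \delta A^\mu )
        = -4 F_{\mu\nu} \, \partial^\nu \delta A^\mu        (because of the antisymmetry of F_{\mu\nu})
        = 4 ( \partial^\nu F_{\mu\nu} ) \, \delta A^\mu + \text{surface term}

With S_I = \int d^4x \, A^\mu J_\mu and S_F = k \int d^4x \, F^{\mu\nu} F_{\mu\nu}, this gives

    \delta S = \int d^4x \left[ J_\mu + 4k \, \partial^\nu F_{\mu\nu} \right] \delta A^\mu

This has to vanish for all δA^μ, hence

    \partial^\nu F_{\mu\nu} = -\frac{1}{4k} J_\mu

Interchange μ ↔ ν and interchange upper/lower indices:

    \partial_\mu F^{\mu\nu} = \frac{1}{4k} J^\nu

Compare with (7.44),

    \frac{1}{4k} = -4\pi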

Note that in these units


Summary
• All of electromagnetism is contained in

    S = S_0 + S_I + S_F

• This form is obviously Lorentz invariant.

Ch8-2.tex; January 22, 1998


9 Gravity as Spacetime Curvature--An Introduction
9.1 Principle of equivalence
Millikan oil drop experiment
Consider doing a Millikan oil drop experiment (no gravity, no viscosity). Oil drops of mass m
and charge q are introduced into the space between two capacitor plates, where there is an electric
field E (Figure 1). They will accelerate.

Figure 1

The field E is a property of the region of space; it is the same for every drop. But the different
drops have different ratios of q/m. So they accelerate differently. The ratio is

    \frac{q}{m} \sim \frac{\text{unit of force}}{\text{unit of inertia}}
Free fall
Now consider objects in free fall, e.g. oil drops, under the influence of gravity g (Figure 2).

Figure 2

We have

    m a = m g

The factor m on the left comes from Newton's second law. It measures inertia. The factor m
on the right comes from the law of gravitation F = GMm/R^2 = mg; it is like q in (9.1), and
measures the unit of force. There is no reason why they should be the same. So let us denote
them as

    m_i = inertial mass
    m_g = gravitational mass

If the ratio m_g/m_i were different for different objects, then they would accelerate at
different rates. Yet it is well known that in free fall, all objects accelerate at the same rate.
(This was known to a few percent at the time of Galileo, and to an accuracy of at least 1 part
in 10^12 nowadays. We omit the experimental determination of this fact.) This means m_g/m_i is
a universal constant. By convention we take it to be 1.

Postulate
The starting point of general relativity is that m_g/m_i = 1 is not an accident; therefore we
believe it is exactly 1.

    The ratio m_g/m_i = 1 exactly for all objects.

In other words,

    All objects accelerate at exactly the same rate during free fall.

Energy gravitates
Each atom (~ 1 GeV rest mass per nucleon) contains about 10^-3 of its rest mass as nuclear binding
energy (~ 1 MeV per nucleon), and about 10^-8 of its rest mass as electrostatic binding
energy (~ 10 eV per nucleon, e.g., 13.6 eV for H). Since m_g/m_i is unity to an accuracy of
10^-12, this means that

    Energy gravitates the same way as "ordinary" matter.

This is known to an accuracy of 10^-12/10^-3 ~ 10^-9 for nuclear energy, and 10^-12/10^-8 ~ 10^-4
for electrostatic energy. We believe it to be exact.
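
The arithmetic behind these estimates is short enough to spell out. The sketch below uses only the round numbers quoted in the text; the variable names are chosen here for illustration.

```python
REST_ENERGY_EV = 1e9        # ~1 GeV rest mass per nucleon
NUCLEAR_EV = 1e6            # ~1 MeV nuclear binding energy per nucleon
ELECTROSTATIC_EV = 10.0     # ~10 eV electrostatic binding energy per nucleon
TESTED_ACCURACY = 1e-12     # accuracy to which m_g/m_i = 1 has been tested

# Fraction of the rest mass carried by each kind of binding energy
nuclear_fraction = NUCLEAR_EV / REST_ENERGY_EV
electrostatic_fraction = ELECTROSTATIC_EV / REST_ENERGY_EV

# If a given form of energy failed to gravitate, m_g/m_i would be off by its fraction,
# so the statement "this energy gravitates" is tested to accuracy / fraction.
print(nuclear_fraction, TESTED_ACCURACY / nuclear_fraction)
print(electrostatic_fraction, TESTED_ACCURACY / electrostatic_fraction)
```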

The immediate consequence is that even light (photons) will exert a gravitational force
and will also be affected by an external gravitational force.

Principle of equivalence

Figure 3

Figure 3a shows an enclosed box in a gravitational field g. Various objects are falling
down, all at the same acceleration g. Figure 3b shows another enclosed box not subject to
gravity, but the box is accelerating upwards, with an acceleration a = g. In Figure 3b, the floor
will rise and hit the "floating" objects. For an observer inside the boxes, there is no way to tell
the two situations apart.

The principle of equivalence states that

    it is impossible to distinguish between
    a uniform gravitational field \vec{g}, and
    the observer accelerating with -\vec{g}.

An example
Figure 4 illustrates the trajectory of a ball at three instants (i), (ii), (iii). The situation is
observed by two people:

(a) This observer thinks there is gravity, so the ball travels in a parabola.
(b) This observer thinks there is no gravity, so the ball travels in a straight line, but the floor
accelerates up.
Both observers agree that
(i) the ball is on the floor,
(ii) the ball is a distance h above the floor,
(iii) the ball is on the floor again.
In fact, by observation inside the boxes, it is not possible to tell (a) and (b) apart.

Figure 4
Other ways of stating equivalence principle
There are two other ways of stating the equivalence principle.

(a)

    If the observer has an acceleration \vec{a}_0, the physics
    appears to be the same as if there is a gravitational
    field \vec{g} = -\vec{a}_0, i.e., every mass m is subject to a
    pseudo-force \vec{F}_{ps} = m\vec{g} = -m\vec{a}_0.

We use the symbol \vec{a}_0 to emphasize that it is the acceleration of the observer, not the acceleration
of the mass m.
This statement is easily understood by reference to Figure 3.

(b) Imagine a freely-falling observer (i.e., falling under the influence of gravity and no other
force). Then all other objects would appear not to be accelerating, and therefore there appears
to be no gravitational force.

    A freely falling observer cannot
    detect a uniform gravitational field.

(If the field is non-uniform, e.g., a nearby ball feels a stronger field, then of course it can be
detected.) This statement is related to "weightlessness".
There is another way of explaining this result. According to the falling observer:

    Total "force" on mass m = gravitational force + pseudo-force
                            = m\vec{g} + (-m\vec{a}_0)

If the observer is freely falling, then \vec{a}_0 = \vec{g}, and this vanishes.

9.2 Gravitational redshift


From the above, a uniform gravitational field is in a sense not "real". It can be transformed
away by going to a freely falling observer. Therefore it is relatively easy to understand.
An important phenomenon in this regard is the gravitational redshift.

Figure 5

We wish to ask the following question: A photon of frequency ω is emitted at z = 0 and
"climbs" to z = h. What would be the frequency ω' observed at the upper position? We give
two arguments.

Using work done


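The photon at z = 0 has an energy

    E = \hbar \omega

and hence a "mass"

    m = \frac{E}{c^2} = \frac{\hbar \omega}{c^2}

In climbing to height h, the work done is

    W = m g h = \hbar \omega \, \frac{g h}{c^2}

Hence the energy at the top is

    E' = E - W = \hbar \omega ( 1 - g h / c^2 )

This must be related to the frequency at the top as

    \hbar \omega' = E' = \hbar \omega ( 1 - g h / c^2 )
    \omega' = \omega ( 1 - g h / c^2 )

More generally, if g is not uniform, then

    g h \to \Phi = gravitational potential

Also we work in units where c = 1:

    \omega' = \omega ( 1 - \Phi )

So, in "climbing up", the frequency is lowered - a redshift.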


This argument relies on the principle of equivalence: it assumes that photons ("pure
electromagnetic energy") gravitate in the same way as ordinary matter.
Note that |Φ| << 1. A typical example might be

Even for the field of the earth from ground level to infinity

    \Phi = \frac{G M}{R c^2} \approx 7 \times 10^{-10}

The above argument is only correct to first order in Φ. The reason is that in calculating the
work done, we have used the "uncorrected" m. (Moreover, these Newtonian ideas are not valid
to high order.)
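
To get a feeling for the size of Φ, the following sketch evaluates gh/c² for a laboratory-scale height and GM/(Rc²) for the earth. The height h = 20 m is an assumption made here for illustration and is not a value from these notes; the constants are standard rounded values.

```python
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_EARTH = 5.972e24   # mass of the earth, kg
R_EARTH = 6.371e6    # radius of the earth, m
g = 9.8              # surface gravity, m/s^2

h = 20.0             # m, an assumed laboratory height (hypothetical example)

phi_lab = g * h / C**2                          # uniform-field estimate g*h/c^2
phi_escape = G * M_EARTH / (R_EARTH * C**2)     # from ground level to infinity

print(phi_lab)       # of order 1e-15
print(phi_escape)    # a little under 1e-9
```

Both numbers are tiny compared with 1, which is why the first order treatment is adequate.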

Using Lorentz transformation

Figure 6

The second argument relies on the principle of equivalence in a more explicit way: Replace the
gravitational field g by an upward acceleration of the box. Then the emission of the photon
(t = 0) and its reception (t = t) would be as shown in Figure 6. We have assumed that the box
initially had zero velocity. The photon has travelled h + (1/2)gt^2 in a time t, so

    c t = h + \frac{1}{2} g t^2        (9.7)

We claim (1/2)gt^2 is negligible. If this is the case

    t \approx \frac{h}{c}

To check whether (1/2)gt^2 is really negligible, put this approximate answer back into the right
hand side of (9.7):

    \frac{\text{2nd term}}{\text{1st term}} \sim \frac{(1/2) g (h/c)^2}{h} = \frac{1}{2} \frac{g h}{c^2} = \frac{1}{2} \Phi

Provided |Φ| << 1, it is indeed correct to neglect (1/2)gt^2.
Now, by the time the photon is received, the observer is already travelling at a speed

    v = g t \approx \frac{g h}{c} , \qquad \beta = \frac{v}{c} = \frac{g h}{c^2} = \Phi

Therefore, using the Lorentz transformation, and remembering that k^μ = (ω, k) is a 4-vector,

    \omega' = \gamma ( \omega - v k )

Since k = ω/c,

    \omega' = \gamma \omega ( 1 - \beta )

Since we deal with |β| << 1,

    \omega' \approx \omega ( 1 - \beta ) = \omega ( 1 - \Phi )

giving the same result as before.
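
How good is the first order result? A small sketch, comparing the exact Doppler factor γ(1 - β) with 1 - β, suggests that the two differ only at order β². The values of β used below are deliberately exaggerated assumptions; for realistic values of order 10^-15 the difference would be hidden by floating-point rounding.

```python
import math


def doppler_exact(beta):
    """omega'/omega = gamma * (1 - beta) for a receiver receding at speed beta (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return gamma * (1.0 - beta)


def doppler_first_order(beta):
    """First-order result omega'/omega = 1 - beta, with beta = g*h/c^2."""
    return 1.0 - beta


for beta in (1e-2, 1e-3, 1e-4):
    diff = doppler_exact(beta) - doppler_first_order(beta)
    # The last column approaches 1/2, i.e. the error is about beta^2 / 2.
    print(beta, diff, diff / beta**2)
```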


The gravitational redshift has indeed been verified in the laboratory. Note that the
correction is typically one part in 10^14 or 10^15, and the experiment is extremely difficult.

9.3 Tidal gravitational force


Let us imagine throwing two balls upwards (Figure 7a). The two balls are initially separated
by Δz(0). Suppose there is no gravitational field. Then both balls move uniformly, as shown
in Figure 7b. The z-t graphs are parallel straight lines, and after some time, the separation
remains the same: Δz(t) = Δz(0).
Next suppose there is a uniform gravitational field g. Both balls will decelerate:

    \frac{d^2 z}{dt^2} = -g

The trajectories would be as shown in Figure 7c. The paths are curved, but still parallel.
After time t, the separation remains the same: Δz(t) = Δz(0). According to the principle of
equivalence, it is impossible to distinguish this situation from an accelerating observer without
gravity. So in a sense, there is no intrinsic difference from Figure 7b, except for a moving
observer.
Finally, suppose there is a non-uniform gravitational field, such as near the surface of
the earth. The lower ball experiences a stronger field, and its trajectory will bend more, as
shown in Figure 7d. The separation Δz(t) increases: Δz(t) > Δz(0). The trajectories in Figure
7d are not parallel.

Figure 7

The final situation is not equivalent to an accelerating observer. The fact that the two
balls move apart is an objective fact, independent of observer, and cannot be transformed away.
It must be attributed to the gravitational field. In short

    A uniform gravitational field is equivalent to an accelerating observer, and
    also can be eliminated by transforming to an accelerating observer.

    An inhomogeneous gravitational field (i.e., the difference in \vec{g} between
    one point and another point) cannot be eliminated in such a way:
    it is a "real" effect.

The inhomogeneous part of the gravitational field is called the tidal gravitational field.
The reason for this name is explained below.

The paradox of tides

Figure 8
Let us look at a naive explanation of tides (Figure 8). The sun attracts the water on
the near side, forming a bulge - high tide H. The opposite side forms low tide L.
There are two problems with this explanation.

(a) The earth rotates once a day. Refer to Figure 8 and imagine the shaded sphere (solid earth)
rotating, but the oval (the oceans) not moving. In each rotation, every point on earth would
come across H once and L once. Thus there should be one high tide every 24 hours. In fact,
there is one high tide every 12 hours (approximately).

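(b) The strength of the effect due to the sun would be

    \frac{G M_s}{R_s^2}

where M_s = mass of sun, R_s = distance to sun. Likewise the effect due to the moon is

    \frac{G M_m}{R_m^2}

where M_m = mass of moon, R_m = distance to moon. The ratio is

    \frac{G M_s / R_s^2}{G M_m / R_m^2} = \frac{M_s}{M_m} \left( \frac{R_m}{R_s} \right)^2 \approx 180

But in fact, the lunar effect is stronger.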


Before we give the correct explanation, it is useful to consider a simple analogy.

Problem 1
A very tall elevator is undergoing free fall downwards. Because the elevator is tall, different
parts experience different strengths of gravity: the middle of the elevator B feels g = 9.80; the
top of the elevator C, being farther from the center of the earth, feels g = 9.79; the bottom of
the elevator A feels g = 9.81 (all in units of m s^-2). However, the whole elevator falls at an
acceleration given by the strength felt by the center-of-mass, i.e., at 9.80.
(a) What is the net force + pseudoforce experienced by a mass m at the three points A, B and
C? Pay attention to the directions.
(b) For an observer falling with the elevator, what would he say about the direction and the
magnitude of the "gravitational" force?
The correct explanation for tides is similar, and is illustrated in Figure 9.

Figure 9

The earth is freely falling towards the sun (or moon), with an acceleration

    a_0 = \frac{G M}{R^2}

This is the centripetal acceleration of circular motion, and R is determined by the distance
from the sun or moon to the center of the earth B.
Now look at the total "force" in this freely-falling frame. The total "force" on a mass m at a
distance R + r from the sun (or moon) is

    gravitational force + pseudo-force
        = \frac{G M m}{(R + r)^2} + (-m a_0)
        = \frac{G M m}{(R + r)^2} - \frac{G M m}{R^2}
        = \frac{G M m}{R^2} \left[ \left( 1 + \frac{r}{R} \right)^{-2} - 1 \right]
        \approx \frac{G M m}{R^2} \left( -2 \frac{r}{R} \right) = -2 \frac{G M m}{R^3} r

We have taken the direction towards the sun as positive, so this is away from the sun. For the
point A, we simply change r → -r:

    \text{Force}_A = 2 \frac{G M m}{R^3} r

which is towards the sun.
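
The quality of the first order expansion can be checked with numbers for the earth-moon system. The sketch below is illustrative: the constants are standard rounded values that are not quoted in these notes, and r is taken to be the radius of the earth.

```python
G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
M_MOON = 7.35e22       # mass of the moon, kg
R = 3.84e8             # earth-moon distance, m
r = 6.37e6             # radius of the earth, m
m = 1.0                # test mass, kg

# Exact difference between the pull at distance R + r and the pull at the centre
exact = G * M_MOON * m / (R + r)**2 - G * M_MOON * m / R**2

# First-order (tidal) expression -2 G M m r / R^3
first_order = -2.0 * G * M_MOON * m * r / R**3

print(exact, first_order, exact / first_order)   # ratio close to 1
```

Both numbers are negative (away from the moon on the far side), and they agree to within a few percent because r/R is small.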

Figure 10

(a) We see there are two bulges. So in each day, every point on earth rotates once, and meets
two high tides. Thus high tides are 12 hours apart.

(b) From (9.11), we see that the effect goes as GM/R^3, not GM/R^2. So the ratio of the solar
effect to the lunar effect is

    \frac{M_s}{M_m} \left( \frac{R_m}{R_s} \right)^3 \approx 0.46

This explains why the lunar tide is stronger.

(c) From (9.10), we see that the entire effect is due to the difference in gravitational force at
two positions, R and R + r. Thus differences, or inhomogeneities, in the gravitational field are
called tidal gravitational forces.
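
The resolution of the paradox comes down to one extra power of R_m/R_s. The sketch below evaluates both ratios; the masses and distances are standard rounded values assumed here, not numbers taken from these notes.

```python
M_SUN, M_MOON = 1.989e30, 7.35e22      # masses, kg
R_SUN, R_MOON = 1.496e11, 3.84e8       # distances from the earth, m

mass_ratio = M_SUN / M_MOON
distance_ratio = R_MOON / R_SUN

# Solar/lunar ratio if the effect went as GM/R^2 (the naive explanation)
naive_ratio = mass_ratio * distance_ratio**2

# Solar/lunar ratio with the effect going as GM/R^3 (the tidal explanation)
tidal_ratio = mass_ratio * distance_ratio**3

print(naive_ratio)   # well above 1: the sun would dominate
print(tidal_ratio)   # below 1: the moon dominates
```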

9.4 Curvature
To summarize the discussion so far:
• A uniform gravitational field does not matter; it can be transformed away by the principle
  of equivalence.
• What remains is the tidal gravitational force, which is best represented (e.g., Figure 7d)
  by the divergence or convergence of lines which "should be" parallel.
So how do we explain the convergence or divergence of lines that "should be" parallel?
Note that we are referring to lines on a spacetime diagram, e.g., the z-t plane in Figure 7d.
There are two ways:
• We can say there is a gravitational force (or more precisely different gravitational forces)
  acting on the particles.
• We can say that it is due to the underlying spacetime curvature.

The whole idea of general relativity is to adopt the second approach. This is possible because
all lines that start together (i.e., same z(0), ż(0)) will always keep together (Figure 11a); this
happens because m_i = m_g, so all particles (so long as they are at the same point) will experience
the same acceleration. Note that this would not be possible for other forces, e.g., electric forces.
In this case, the acceleration is proportional to q/m, which is different for different particles.
So particle lines that start together will in general diverge (Figure 11b). If this is the case, we
cannot blame the effect on the underlying spacetime.
Thus the principle of equivalence allows the spacetime explanation, and in fact makes
the spacetime explanation natural.

Figure 11: (a) Gravity (b) Electrostatics

Spatial analogy
The central theme is Figure 7d - two lines on the z-t plane diverge, even when they start off
parallel. To make the introduction of curvature more natural, let us look at a spatial analogy,
on the y-x plane (Figure 12).

Figure 12

Two persons are on the x-axis, separated by Δx. They move in the perpendicular
direction, along y, through the same distance. What is their separation Δx'? "Normally", we
would have Δx' = Δx. This comes about because of Euclid's axiom on flat space: parallel lines
maintain their separation.
However, consider the surface of the earth, and two points on the same latitude (say
30°S), as shown in Figure 13. Choose the x-axis to the east and y-axis to the north. Let two
persons start at these points, and again move by the same distance. Their new separation Δx'
would be larger.
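
The growth of the separation can be made quantitative with elementary spherical geometry. In the sketch below the separation is measured along the circle of latitude (the local x direction); the longitude difference and the distance travelled are assumed values chosen only for illustration.

```python
import math

A_EARTH = 6371.0           # km, radius of the sphere


def separation(lat_deg, dlon_deg):
    """East-west distance along a circle of latitude between two meridians."""
    return A_EARTH * math.cos(math.radians(lat_deg)) * math.radians(dlon_deg)


def latitude_after(lat_deg, distance_north):
    """Latitude reached after moving a given distance due north along a meridian."""
    return lat_deg + math.degrees(distance_north / A_EARTH)


lat0 = -30.0               # starting latitude: 30 degrees south
dlon = 1.0                 # assumed longitude difference between the two persons, degrees
d = 1000.0                 # assumed distance moved north by each person, km

lat1 = latitude_after(lat0, d)
print(lat1)                                         # closer to the equator
print(separation(lat0, dlon), separation(lat1, dlon))   # the second is larger
```

Moving north from 30°S brings both persons towards the equator, where the meridians are farthest apart, so the separation grows even though each person moved "straight ahead".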

Figure 13

If we draw this on a plane (Figure 14), the situation would be very like Figure 7d: two
lines which "should be" parallel have diverged.

Figure 14

There are two "explanations" for this phenomenon:
• There are "forces" acting on these persons.
• The x-y "plane" is curved.
Obviously the second explanation is better.
In exactly the same way, we try to explain Figure 7d by saying that the z-t plane, i.e.,
spacetime, is curved. In the next few chapters, we shall develop the necessary mathematical
apparatus to handle curved spacetime.

Ch9-2.tex; December 30, 1997


10 Mathematics of Curved Space. I: The Metric
10.1 Introduction
The last Chapter provided physical motivation for describing spacetime as curved. For this
purpose, we need to develop some mathematical concepts, namely
• distance and metric - this Chapter
• vectors - Chapter 13
• differentiation - Chapter 14
• curvature - Chapter 15

These concepts together form the basis of differential geometry.

Nomenclature
We start with two familiar concepts: Euclidean space and Minkowski space.
Euclidean space
Euclidean space of N dimensions (to be denoted as E^N) has the following properties.
• The points are labeled by N coordinates (x^1, ..., x^N).
• The distance between neighboring points is given by Pythagoras' theorem

    ds^2 = (dx^1)^2 + (dx^2)^2 + \cdots + (dx^N)^2

We "usually" think of our space as E^3.


Minkowski space
Minkowski space of (N_1, N_2) dimensions (to be denoted as M^{N_1,N_2}) has the following properties.
• The points are labeled by N = N_1 + N_2 coordinates (x^1, ..., x^N).
• The distance between neighboring points is given by a generalization of Pythagoras'
  theorem, with the squares of the first N_1 coordinates appearing with + signs, and the squares
  of the last N_2 coordinates appearing with - signs:

    ds^2 = (dx^1)^2 + \cdots + (dx^{N_1})^2 - (dx^{N_1+1})^2 - \cdots - (dx^N)^2

We "usually" think of spacetime as M^{3,1}.
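
The two definitions can be combined into a single small function, since Euclidean space is the special case with no minus signs. This is only a sketch; the function name and argument order are choices made here for illustration.

```python
def interval_squared(dx, n_plus, n_minus):
    """ds^2 in M^{n_plus, n_minus}: the first n_plus squares enter with +, the rest with -."""
    if len(dx) != n_plus + n_minus:
        raise ValueError("dx must have n_plus + n_minus components")
    plus = sum(c * c for c in dx[:n_plus])
    minus = sum(c * c for c in dx[n_plus:])
    return plus - minus


# Euclidean E^3 is the special case with no minus signs.
print(interval_squared([1.0, 2.0, 2.0], 3, 0))        # 9.0

# M^{3,1}: three spatial displacements followed by the time displacement (c = 1).
print(interval_squared([3.0, 0.0, 0.0, 5.0], 3, 1))   # -16.0
```

A negative value in M^{3,1} corresponds to a time-like separation, in line with the convention Δx^μ Δx_μ < 0 used for particle paths.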


Mathematicians sometimes use the generic term "space" to refer to a curved object; we
shall avoid it since the curved object in question is often spacetime. Therefore we shall use a
more formal term: a manifold M is a set of points which is locally like a Euclidean space or a
Minkowski space (Figure 1). For example, the surface of a sphere is locally like a 2-dimensional
plane E^2; so a sphere is a 2-dimensional manifold.

Figure 1 (locally like flat space)

Example
Consider the 2-dimensional surface of a sphere of radius a, to be denoted as $S^2(a)$. We can
think of this as the earth; we shall also refer to ants living on the surface of a ball.

Embedding view vs intrinsic view


We can describe the surface of the earth in two ways.
• Ride in an airplane and fly above the earth. We see that $S^2(a)$ is really a part of $E^3$.
Since we know the geometry of $E^3$, it is very easy to describe anything on $S^2(a)$. This is
called the embedding approach - the manifold $M = S^2(a)$ is embedded in $E^3$.
• Stay on the surface, as people did many centuries ago. This is called the intrinsic approach.
The intrinsic approach is more difficult: people many centuries ago could not easily tell
that the earth is round. However, the actual manifold M we are concerned with is 4-dimensional
spacetime. There are many ways of embedding it in higher dimensional Minkowski space; all
the extra dimensions are fictitious and meaningless. So we must make sure that our results
only depend on properties on M, and are independent of the extra dimensions.
As a compromise we shall do the following.
• We shall use the embedding approach in intermediate steps, in order to get a more intuitive
picture.
• But in the end we aim for expressions that do not involve the extra dimensions or the
embedding, i.e., the results will be intrinsic.

10.2 Coordinates
First, we need to label the points on the manifold M. To do so, draw a coordinate patch on
M (Figure 2), and in terms of this define N coordinates, denoted collectively as

$$x = (x^1, \ldots, x^N)$$

Figure 2

The following properties will become clear through the examples below.
• The coordinates need not have the unit of length; in fact the different components can
have different units.
• Do not think of x as a vector; this will be explained in detail in the next Chapter.
• The coordinates are described by upper indices. There will be no such thing as $x_1$ etc.;
coordinate indices cannot be lowered.
• The same manifold M can be described by different coordinates, and an important issue
is how to ensure that the physics is independent of coordinates.

Examples

Example A1
Let M be 2-dimensional Euclidean space $E^2$, and use rectangular coordinates $(x^1, x^2) = (x, y)$
(Figure 3a).
Example B1
Take the same manifold as in Example A1, but use polar coordinates $(x^1, x^2) = (r, \phi)$ (Figure 3b).
In this case, $x^1$ and $x^2$ have different units.
The two examples describe the same thing, and the relationship between the coordinates
is

$$x = r\cos\phi, \qquad y = r\sin\phi$$

This is an example of a general coordinate transformation, which is different from the linear
transformations considered in the early part of the course.

Example C1
Take the manifold M to be $S^2(a)$, and use polar coordinates $(\theta, \phi)$, in which (Figure 4a)
$\theta$ = 90° - latitude
$\phi$ = longitude
Imagine ants living on a small region of linear dimension $L \ll a$ near the north pole;
mathematically we can say $L/a \to 0$, or simply $a \to \infty$. In this limit, space would appear
to be nearly flat, and the ants would "normally" describe space by either Example A1 using
coordinates x, y or Example B1 using coordinates r, $\phi$. It is therefore convenient to cast Example
C1 in a form such that the limiting case becomes apparent. For this, we go on to the next
example.

Example D1
Take the same manifold as in the previous example, i.e., $S^2(a)$, and take the north pole ($\theta = 0$)
as the origin. Use the same coordinate patch of latitudes and longitudes as before. Again, each
longitude is labeled by $\phi$. For each latitude, define

$$\text{circumference} = 2\pi r \qquad (10.5)$$

The latitude is labeled by r rather than by $\theta$. Thus, the coordinates are $(x^1, x^2) = (r, \phi)$.

The relationship between Example C1 and Example D1 can be seen from Figure 4b:
r is the radius measured through the plane containing the latitude. The relationship between
r and $\theta$ is

$$r = a\sin\theta \qquad (10.6)$$
We note two properties.
• The parameter r is not the distance s from the origin (measured along M).
• Rather, it is defined in terms of the circumference. Hence r is called the circumferential
radius.
• The same definition can be made for any space that has rotational symmetry about one
axis.
What is the advantage of Example D1 over Example C1? In Example D1, we use one
length r and one angle $\phi$. This is the same as the case of flat 2-dimensional space in Example
B1, which also uses one length r and one angle $\phi$. We shall later see more clearly the connection
between the two, in particular the property that Example D1 approaches Example B1 when
$a \to \infty$; see (10.16) below.
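
The difference between the circumferential radius r and the distance s measured along the surface can be illustrated numerically. The sketch below assumes that the distance from the north pole along the sphere is s = a θ (which follows from the north-south distance a dθ), so that r = a sin(s/a); the function name and the sample values are illustrative only.

```python
import math

def circumferential_radius(s, a):
    """r = a sin(theta), with theta = s / a for arc distance s from the pole."""
    return a * math.sin(s / a)

# Fix the distance along the surface and let the sphere grow.
s = 1.0
for a in (1.0, 10.0, 100.0, 1000.0):
    r = circumferential_radius(s, a)
    print(f"a = {a:7.1f}   r = {r:.6f}   r/s = {r / s:.6f}")
```

For small a the ratio r/s is noticeably below 1; as a grows it approaches 1, which is the sense in which Example D1 approaches Example B1.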

10.3 Distances and curvature - qualitative discussion


The central theme of differential geometry is:
• We can tell curvature from the distances on a manifold M.
• All distances can be built up from a knowledge of infinitesimal distances between
neighboring points.
In the next few Chapters we shall develop this concept mathematically and generally. But here
we first introduce the main ideas by means of an example, in fact, through comparing Example
B and Example D.

Examples
Example B2
We continue with Example B1 and try to write down the distance ds between the point
$P = (r, \phi)$ and $Q = (r + dr, \phi + d\phi)$. Refer to Figure 5. The radial distance is dr, and
the tangential distance is $r\,d\phi$. These two distances are perpendicular, so using Pythagoras'
theorem, we have

$$ds^2 = dr^2 + r^2 d\phi^2$$

(In all such formulas, $dr^2$ means $(dr)^2$, etc., and the brackets will be dropped whenever there
is no danger of confusion.)
Thus in these coordinates, distances are not given by the simple formula in (10.1).
Figure 5
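
As a quick numerical check of the polar-coordinate distance formula $ds^2 = dr^2 + r^2 d\phi^2$, the sketch below compares it with the straight-line distance computed in rectangular coordinates for a small displacement. The function names and the sample point are assumptions made for illustration.

```python
import math

def polar_ds(r, dr, dphi):
    """Distance from ds^2 = dr^2 + r^2 dphi^2 (valid for small dr, dphi)."""
    return math.sqrt(dr**2 + (r * dphi)**2)

def cartesian_distance(r, phi, dr, dphi):
    """Exact distance between (r, phi) and (r + dr, phi + dphi) via x, y."""
    x1, y1 = r * math.cos(phi), r * math.sin(phi)
    x2, y2 = (r + dr) * math.cos(phi + dphi), (r + dr) * math.sin(phi + dphi)
    return math.hypot(x2 - x1, y2 - y1)

r, phi = 2.0, 0.7
dr, dphi = 1e-4, 2e-4
approx = polar_ds(r, dr, dphi)
exact = cartesian_distance(r, phi, dr, dphi)
relative_error = abs(approx - exact) / exact
print(approx, exact, relative_error)
```

The two numbers agree up to terms of higher order in the displacements, as expected for a formula that holds between neighboring points.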

Example C2
In this case, we want to find the distance between the point $P = (\theta, \phi)$ and $Q = (\theta + d\theta, \phi + d\phi)$.
Refer to Figure 6. The distance along the north-south direction is $a\,d\theta$, where a is the radius of
the sphere. To calculate the distance in the east-west direction, we first notice that the length
of the latitude is $2\pi r = 2\pi a\sin\theta$, which corresponds to the longitude changing by $\Delta\phi = 2\pi$.
So for a small longitude change $d\phi$, the distance is $r\,d\phi = a\sin\theta\,d\phi$. Again, the two distances
are perpendicular, so

$$ds^2 = a^2 d\theta^2 + a^2\sin^2\theta\,d\phi^2 \qquad (10.9)$$

Figure 6

Example D2
In this case, we want to describe the same distance as in Example C2, but in terms of the
coordinates r and $\phi$, rather than $\theta$ and $\phi$. Let us first do it graphically in a general way. The
east-west distance is the same as before:

$$ds_1 = r\,d\phi \qquad (10.10)$$

To find the north-south distance, refer to Figure 7. The parameter r can be considered to be the
radii of the circles projected onto a plane, whereas the perpendicular distance $ds_2$ is measured
along the surface. The two are proportional for infinitesimal displacements, but not equal, so

$$ds_2 = h\,dr \qquad (10.11)$$

where h = h(r). Combining (10.10) and (10.11), and denoting $f(r) = h(r)^2$, we have

$$ds^2 = f(r)\,dr^2 + r^2 d\phi^2 \qquad (10.12)$$

Figure 7

In fact, this derivation is valid for any surface that has rotational symmetry about one
axis. For any such surface, the ants living on it can make the following measurements on the
surface, i.e., intrinsically, and determine whether the surface is curved.
• Measure the circumference of each circle (i.e., the set of points equidistant from the origin),
and hence determine the parameter r.
• For two neighboring circles, find the difference in circumferences and determine dr.
• Measure the perpendicular distance between these circles. From this, determine f.
If f is always unity, the space is flat. If f is not unity, the space is curved.
Formula for a sphere
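We now derive the formula for f(r) in the case of a spherical surface, which will be important
later on. From (10.6), we have

$$dr = a\cos\theta\,d\theta$$

Hence the perpendicular distance $ds_2$ in (10.11) is

$$ds_2 = a\,d\theta = \frac{dr}{\cos\theta}$$

Hence the factor h(r) in (10.11) is $1/\cos\theta = 1/\sqrt{1 - r^2/a^2}$, and consequently

$$f(r) = \frac{1}{1 - r^2/a^2}$$

Thus the distance formula can be written as

$$ds^2 = \frac{dr^2}{1 - r^2/a^2} + r^2 d\phi^2 \qquad (10.16)$$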
Let us compare the two choices of coordinates for the surface of a sphere.
• Example C makes it obvious that we are talking about a sphere - all points are equivalent.
• Example D is more convenient because we see that the distance reduces to Example
B when $a \to \infty$. In particular, if the ants on this sphere started their mathematical
education with Euclidean geometry, it is much easier for them to think in terms of one
radius r and one polar angle $\phi$.
In the end, both properties are important. This is one reason why we often need to transform
between different coordinate systems.
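
The intrinsic measurement procedure described above can be sketched in code. The example assumes the surface is a sphere, so that $f(r) = 1/(1 - r^2/a^2)$; the "measurements" are generated from a hypothetical sphere of radius 6400 km (these numbers are made up for illustration and are not the data of any problem in the text), and the function name is likewise an assumption.

```python
import math

def sphere_radius_from_circles(c1, c2, separation):
    """Estimate the sphere radius a from two nearby concentric circles.

    c1, c2     : measured circumferences of the two circles
    separation : perpendicular distance between them, measured on the surface
    Assumes a sphere, where f = (ds_2/dr)^2 = 1 / (1 - r^2/a^2) and hence f > 1.
    """
    r1 = c1 / (2 * math.pi)           # circumferential radii
    r2 = c2 / (2 * math.pi)
    r = 0.5 * (r1 + r2)               # midpoint value of r
    dr = r2 - r1
    f = (separation / dr) ** 2
    return r / math.sqrt(1.0 - 1.0 / f)

# Hypothetical sphere used only to generate consistent "measurements".
a_true = 6400.0                        # km
theta, dtheta = 0.5, 1e-6              # polar angle of the circles, in radians
c1 = 2 * math.pi * a_true * math.sin(theta)
c2 = 2 * math.pi * a_true * math.sin(theta + dtheta)
separation = a_true * dtheta

a_est = sphere_radius_from_circles(c1, c2, separation)
print(a_est)
```

Only quantities measurable on the surface enter the function; the embedding is used solely to manufacture the test data.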

Problem 1
Some ants live on a 2-dimensional surface which they know to be spherical. They measure the
circumferences of two nearby concentric circles to be

and the perpendicular distance between these two concentric circles is found to be 0.001 000 200
km. Find the radius of the sphere.
This example shows that

Curvature can be determined intrinsically


on a manifold M by measuring distances.

Example E2
All the examples given above lead to expressions for ds2 which do not contain cross terms.
Although such is often the case when there is a high degree of symmetry, this is not a general
property. To emphasize this point, we now give an example which contains a cross term.

Figure 8

Refer to Figure 8. The coordinate grid consists of the following.


• The u axis is horizontal, and marked with grids at a constant separation a.
• The v axis is inclined at an angle $\gamma$ to the horizontal ($\gamma \neq \pi/2$), and marked with grids
at a constant separation b.
The transformation to rectangular coordinates is given by

$$x = a\,u + b\,v\cos\gamma, \qquad y = b\,v\sin\gamma$$

Note that $(u, v) = (x^1, x^2)$ are the variables, while a, b, $\gamma$ are constants.

Problem 2
Find ds2 in terms of du and dv.

10.4 Riemannian geometry


Riemannian geometry is the study of manifolds on which $ds^2$ is given by a quadratic expression
of the displacements $dx^\mu$. All the examples discussed belong to this category. In this Section,
we generalize these examples and give the general expression for the distance. In the next two
Sections, we show that the Riemannian expression for distance is the most general one if the
manifold is obtained by embedding in a higher-dimensional flat space.
Consider two neighboring points P and Q, with coordinates

$$P: \; x^\mu, \qquad Q: \; x^\mu + dx^\mu$$

The distance ds between these two points is given by a quadratic expression

$$ds^2 = A\,dx^1 dx^1 + B\,(dx^1 dx^2 + dx^2 dx^1) + \cdots \qquad (10.18)$$

where the coefficients A, B are not constants, but may depend on position. Note that B is
defined so that it multiplies two off-diagonal terms.
Let us write this more systematically.

$$ds^2 = g_{11}\,dx^1 dx^1 + g_{12}\,dx^1 dx^2 + g_{21}\,dx^2 dx^1 + \cdots + g_{NN}\,dx^N dx^N$$

Note that we have broken up the two terms associated with B. Thus

$$g_{11} = A, \qquad g_{12} = g_{21} = B$$

In shorthand we then have

$$ds^2 = \sum_{\mu,\nu=1}^{N} g_{\mu\nu}\,dx^\mu dx^\nu \qquad (10.20)$$

We shall later come across many formulas involving many indices, and it can be confusing when
you first see them. It is important to concentrate on the overall structure and forget the indices;
the indices can always be worked out quite simply or by reference to a book. So to start this
habit, we write the above in the schematic form

$$ds^2 = g\,dx\,dx$$

Line element and metric


The whole subject of differential geometry begins with the quadratic expression (10.20) called
the line element. The coefficients $g_{\mu\nu}$ are by definition symmetric. These coefficients together
form the metric tensor. Once the metric tensor is given, the geometry of the manifold has been
specified intrinsically, i.e., without reference to the "extra" dimensions of the embedding. (We
have not yet explained what a tensor is; this will be done in Chapter 13.)
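
The double sum in the line element is easy to mistype by hand, so a small sketch may help. It evaluates $ds^2 = \sum_{\mu,\nu} g_{\mu\nu}\,dx^\mu dx^\nu$ for a metric stored as a nested list; the function name and the sample matrices (including the one with off-diagonal entries) are illustrative assumptions.

```python
import math

def line_element(g, dx):
    """Return ds^2 = sum over mu, nu of g[mu][nu] * dx[mu] * dx[nu]."""
    n = len(dx)
    return sum(g[mu][nu] * dx[mu] * dx[nu]
               for mu in range(n) for nu in range(n))

# Flat plane in polar coordinates (r, phi) at r = 2: ds^2 = dr^2 + r^2 dphi^2.
g_polar = [[1.0, 0.0],
           [0.0, 2.0**2]]
ds2_polar = line_element(g_polar, [0.1, 0.2])

# A hypothetical symmetric metric with a cross term, g_12 = g_21 = 0.5.
g_cross = [[2.0, 0.5],
           [0.5, 3.0]]
ds2_cross = line_element(g_cross, [1.0, 1.0])
print(ds2_polar, ds2_cross)
```

Because the sum runs over both orderings of the indices, each off-diagonal coefficient is counted twice, which is why B in (10.18) multiplies two terms.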

Relation with special relativity


Compare with the distance expression in special relativity. We see that the line element reduces
to the case of special relativity if $g_{\mu\nu} \to \eta_{\mu\nu}$, where the latter is the constant, diagonal matrix. In
fact, many formulas in general relativity can be obtained by the reverse replacement $\eta_{\mu\nu} \to g_{\mu\nu}$.

Problem 3
Write out $g_{\mu\nu}$ for each of Example A, B, C, D and E.
10.5 General method applied to the sphere
(This Section may be skipped in the first reading.)
In the last Section, we postulated that $ds^2$ is given by a quadratic expression in $dx^\mu$. We
now show that this must be the case if the manifold is embedded in a higher-dimensional flat
space. In this Section, we show this for the special case of the surface of a sphere; in the next
Section, we give the formalism in general. Both Sections may be skipped if you are willing to
accept the postulate for the time being.

Step 1
We start with Euclidean space in 3 dimensions, and denote the coordinates as (x, y, z). The
distance is

$$ds^2 = dx^2 + dy^2 + dz^2 \qquad (10.21)$$

Step 2
Introduce polar coordinates $(\theta, \phi, R)$. (We denote the radial coordinate by R, to distinguish
from r in Example D2. Also, we put the radius last, for reasons that will be apparent later.)
The rectangular coordinates are related to these by

$$x = R\sin\theta\cos\phi, \qquad y = R\sin\theta\sin\phi, \qquad z = R\cos\theta \qquad (10.22)$$

Step 3
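In view of (10.22), we have the differentials, for example

$$dx = (R\cos\theta\cos\phi)\,d\theta + (-R\sin\theta\sin\phi)\,d\phi + (\sin\theta\cos\phi)\,dR$$
$$\phantom{dx} = A^x_\theta\,d\theta + A^x_\phi\,d\phi + A^x_R\,dR \qquad (10.23)$$

For later convenience, we have introduced

$$\frac{\partial x}{\partial\theta} \equiv A^x_\theta = R\cos\theta\cos\phi$$
$$\frac{\partial x}{\partial\phi} \equiv A^x_\phi = -R\sin\theta\sin\phi$$
$$\frac{\partial x}{\partial R} \equiv A^x_R = \sin\theta\cos\phi \qquad (10.24)$$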

Problem 4
Write down similar expressions for dy and dz, and give explicit expressions for $A^y_\theta$, etc.
Step 4
Put these expressions into (10.21). The result will be a quadratic expression in the 3 differentials
$d\theta$, $d\phi$, $dR$. Thus there could in principle be terms proportional to $d\theta^2$, $d\theta\,d\phi$, etc. In this case,
it turns out that all the cross terms cancel (this is not a general property), and

$$ds^2 = R^2 d\theta^2 + R^2\sin^2\theta\,d\phi^2 + dR^2 \qquad (10.25)$$

This still describes flat Euclidean space $E^3$, but using polar coordinates. One point should be
noticed immediately: although the line element is not of Pythagoras form, the space is still flat.

Problem 5
Derive the above expression for ds2.

Step 5
To reduce to the surface of a sphere of radius a, all we need to do is to set

R = a = constant
dR = 0

Then (10.25) becomes

$$ds^2 = a^2 d\theta^2 + a^2\sin^2\theta\,d\phi^2$$

This is then the distance expression (10.9) on the sphere, using the coordinates of Example C2.
There are only two coordinates left.

Step 6
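The expression for the distance can be written in a way that is easier to generalize.

$$ds^2 = \sum_{i=x,y,z}\left(A^i_\theta\,d\theta + A^i_\phi\,d\phi\right)^2 = g_{\theta\theta}\,d\theta\,d\theta + g_{\theta\phi}\,d\theta\,d\phi + g_{\phi\theta}\,d\phi\,d\theta + g_{\phi\phi}\,d\phi\,d\phi$$

Note that we have dropped all terms that involve dR from the very beginning, because on a
sphere R is constant. Because the final result must be a quadratic in $d\theta$ and $d\phi$, we have in the
final form defined the coefficients as $g_{\theta\theta}$ and $g_{\theta\phi}$ etc.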

In the next Section, we shall deal with this problem in general. But as a warm-up, we
consider another example as an exercise.
Problem 6
A spheroid is defined by

Introduce a general set of coordinates r, $\phi$, $\alpha$ by

The spheroid is given by $\alpha = 1$. Go through the above steps and find the distance expression
on the spheroid. In particular, set b = a and try to recover the result of Example D2. Also note
that before we set $\alpha = 1$, the formula for the distance contains cross terms involving $d\alpha\,dr$.

10.6 Expression for distance - general discussion


(This Section may be skipped in the first reading.)
Now we can develop the above method for the general case. Every step below is the
exact counterpart of the corresponding step in the last Section.

Step 1
We start with Euclidean space in M dimensions, and denote the coordinates as $(z^1, \ldots, z^M)$.
The distance is

$$ds^2 = (dz^1)^2 + \cdots + (dz^M)^2 \qquad (10.31)$$


Step 2
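Introduce new coordinates $(x^1, \ldots, x^M)$. The rectangular coordinates are related to these by

$$z^i = z^i(x^1, \ldots, x^M) \qquad (10.32)$$

Step 3
In view of (10.32), we have the differentials

$$dz^i = \sum_{\mu=1}^{M} A^i_\mu\,dx^\mu \qquad (10.33)$$

For later convenience, we have denoted the partial derivatives as

$$A^i_\mu = \frac{\partial z^i}{\partial x^\mu} \qquad (10.34)$$
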
Step 4
Put these expressions into (10.31). The result will be a quadratic expression in the differentials
$dx^1, \ldots, dx^M$; e.g., there could be terms proportional to $(dx^1)^2$, $dx^1 dx^2$, etc. Most generally, we
can write such a quadratic expression as

$$ds^2 = \sum_{\mu,\nu=1}^{M} g_{\mu\nu}\,dx^\mu dx^\nu \qquad (10.35)$$

which defines the coefficients $g_{\mu\nu}$. This still describes flat Euclidean space $E^M$, but using the
new coordinates.

Step 5
To reduce to the surface of the manifold M, all we need to do is to set the last M - N
coordinates to constants.

$$x^\rho = c^\rho = \text{constant}, \qquad dx^\rho = 0, \qquad \rho = N+1, \ldots, M \qquad (10.36)$$

Then (10.35) becomes

$$ds^2 = \sum_{\mu,\nu=1}^{N} g_{\mu\nu}\,dx^\mu dx^\nu \qquad (10.37)$$

The only differences are that (a) the sums go only up to N rather than M, and (b) the values of
the last M - N coordinates are set to the given constant values in evaluating these expressions.
From now on, the summation convention will be used on the Greek indices, and the indices are
understood to go up to N, the dimensionality of the manifold M:

$$ds^2 = g_{\mu\nu}\,dx^\mu dx^\nu \qquad (10.38)$$

Step 6
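The expression for the coefficients can now be written as follows.

$$ds^2 = \sum_{i=1}^{M} (dz^i)^2 = \sum_{i=1}^{M} A^i_\mu A^i_\nu\,dx^\mu dx^\nu \qquad (10.39)$$

By comparison with (10.38), we see that

$$g_{\mu\nu} = \sum_{i=1}^{M} A^i_\mu A^i_\nu \qquad (10.40)$$

Although most of the examples we deal with in this course have diagonal $g_{\mu\nu}$, the general
expression (10.40) allows off-diagonal $g_{\mu\nu}$ as well. In fact, an example of off-diagonal $g_{\mu\nu}$ can
be seen from Example E2.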
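
The general recipe, in which the metric coefficients are sums of products of the partial derivatives of the embedding, can be checked numerically. The sketch below approximates the derivatives by central finite differences and applies the recipe to the sphere embedded in $E^3$ with R held fixed at a; the helper names, the step size h and the sample point are all assumptions for illustration.

```python
import math

def embed_sphere(theta, phi, a=1.0):
    """Rectangular coordinates (x, y, z) of the point (theta, phi) on a sphere of radius a."""
    return (a * math.sin(theta) * math.cos(phi),
            a * math.sin(theta) * math.sin(phi),
            a * math.cos(theta))

def induced_metric(embed, x, h=1e-6):
    """g[mu][nu] = sum_i A[mu][i] * A[nu][i], with A[mu][i] = d z^i / d x^mu
    estimated by central finite differences of the embedding function."""
    n = len(x)
    A = []
    for mu in range(n):
        xp, xm = list(x), list(x)
        xp[mu] += h
        xm[mu] -= h
        zp, zm = embed(*xp), embed(*xm)
        A.append([(p - m) / (2 * h) for p, m in zip(zp, zm)])
    return [[sum(ai * bi for ai, bi in zip(A[mu], A[nu])) for nu in range(n)]
            for mu in range(n)]

a = 2.0
theta, phi = 0.8, 1.3
g = induced_metric(lambda t, p: embed_sphere(t, p, a), [theta, phi])
print(g)
```

For this embedding the off-diagonal entries come out numerically zero and the diagonal entries match $a^2$ and $a^2\sin^2\theta$, in agreement with (10.9).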

Problem 7
In order to understand the abstract notation, go back to the case of the sphere $S^2(a)$. Calculate
all the transformation coefficients $A^i_\mu$ as in (10.24), and substitute these results into (10.40) to
obtain the explicit expressions for $g_{\mu\nu}$.
We should also mention one minor generalization: in some cases (including all cases
referring to spacetime), it is necessary to embed not in Euclidean space but in Minkowski space
- otherwise there is no way to have ds2 < 0. Then some signs have to be changed in (10.24)
and in a few subsequent places, but otherwise the results (including (10.38) without any sign
changes) remain valid.
In the rest of this Chapter, we give some examples of metrics that are important in
general relativity.

10.7 Homogeneous manifolds


In the next Chapter we shall deal with cosmology. In cosmology we need to first describe space
by itself, i.e., forget time for the moment. Cosmological models of space have the following
properties.
• It is 3-dimensional.
• It is homogeneous: all points P are equivalent.
• It is isotropic: from a given point P, all directions are equivalent.

The last two conditions are very stringent, and there are only three possibilities, which we
describe below.

Closed manifold
If a 2-dimensional manifold is (a) closed, (b) homogeneous, and (c) isotropic, then there is only
one possibility. We take 3-dimensional Euclidean space $E^3$, and in it embed the surface of a
2-dimensional sphere $S^2(a)$. The only remaining degree of freedom is the radius a of the sphere.
This has been described in Examples C and D.
Now the generalization is obvious. We need a 3-dimensional manifold that is (a) closed,
(b) homogeneous, and (c) isotropic. Again there is only one possibility. We take 4-dimensional
Euclidean space $E^4$, and in it embed the surface of a 3-dimensional sphere $S^3(a)$. The only
remaining degree of freedom is the radius a of the sphere.
To describe points on $S^3(a)$, we can either use (a) three angles, or (b) one length r and
two angles $\theta, \phi$. The latter is closer to our "usual" thinking - because in the limit $a \to \infty$, it
clearly reduces to flat 3-dimensional space. By analogy with (10.16), it is easy to see that the
line element is

$$ds^2 = \frac{dr^2}{1 - r^2/a^2} + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right) \qquad (10.41)$$

We note the following features.
• The parameter r sets the scale for all distances in the tangential directions (i.e., associated
with $d\theta$ and $d\phi$); so again, r is the circumferential radius.
• But the radial separation is not dr, so the manifold is curved.
• It is not immediately obvious that this manifold is homogeneous and isotropic, but this
property becomes apparent when we realize that this is the surface of a sphere.
• One can guess that the manifold is finite and closed, because (10.41) suggests that there
is a maximum value of r, i.e., r = a. This argument is not a proof; the singularity could
be a singularity of the coordinate system. But this property is easily proved by showing
that this is $S^3(a)$.

Flat manifold
An even simpler possibility is flat 3-dimensional space, i.e., the "usual" $E^3$ that we learn about
in high school Euclidean geometry. The line element in polar coordinates is

$$ds^2 = dr^2 + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right) \qquad (10.42)$$

which is the same as (10.41) if we set $a = \infty$. This is not surprising - the surface of a large
sphere is nearly flat. In fact, we can write both of these cases as

$$ds^2 = \frac{dr^2}{1 - k r^2} + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right) \qquad (10.43)$$

where $k = a^{-2} > 0$ for the closed manifold, and k = 0 for the flat manifold. This clearly shows
that the latter is a limiting case of the former.
This then suggests that the case k < 0 should also be possible, which gives the third
possibility.

Open manifold
If $k = -|k|$, we can put $-kr^2 = +|k|r^2 \equiv (r/a)^2$. Then the metric is

$$ds^2 = \frac{dr^2}{1 + r^2/a^2} + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right) \qquad (10.44)$$

The embedding description of this manifold is a little complicated, because we need to start
with a Minkowski space. However, it is simpler to regard this as the "continuation" of the
previous two cases, so that it will "inherit" most of the relevant properties.
This manifold is again homogeneous and isotropic.
But (10.44) shows that there is no limit to the value of r; the manifold is infinite and
open.
If $a = \infty$, we again recover the flat manifold. The situation is illustrated in Figure 9.

Figure 9 (panels labeled: open, flat, closed)

Uniform treatment
It is convenient to write the three cases in one unified way. We introduce the parameter
K = +1, 0, -1 to denote the three cases, and put the line element as

$$ds^2 = \frac{dr^2}{1 - K(r/a)^2} + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)$$

Note that in the case of the flat manifold, we have introduced an arbitrary length parameter a,
which does not matter since it appears only in a term multiplied by K = 0. The parameter a
in all cases will be called the scale parameter.
We shall later deal with an expanding universe, i.e., a increasing with time. The positions
of the galaxies will expand with it, i.e., r will increase together with a. For this reason, it
is often better to concentrate on the ratio r/a, which should be constant. To anticipate this
development, we introduce the new coordinate $\tilde{r} = r/a$. In terms of this, we have

$$r = a\,\tilde{r}, \qquad dr = a\,d\tilde{r}$$

so that the line element becomes

$$ds^2 = a^2\left[\frac{d\tilde{r}^2}{1 - K\tilde{r}^2} + \tilde{r}^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)\right]$$

which has the advantage that the scale parameter a is factored out.
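
To see how the three cases differ in practice, one can compare the proper radial distance from the origin out to coordinate r, i.e. the integral of $dr/\sqrt{1 - K(r/a)^2}$ at fixed angles. The sketch below uses the standard closed forms of this integral and cross-checks them with a midpoint-rule sum; the function names and sample values are assumptions for illustration.

```python
import math

def radial_distance(r, a, K):
    """Proper radial distance from the origin to coordinate r, for K = +1, 0, -1."""
    if K == 1:
        return a * math.asin(r / a)    # closed case; requires r <= a
    if K == 0:
        return r                       # flat case
    if K == -1:
        return a * math.asinh(r / a)   # open case
    raise ValueError("K must be +1, 0 or -1")

def radial_distance_numeric(r, a, K, steps=100_000):
    """Midpoint-rule estimate of the integral of dr / sqrt(1 - K (r/a)^2)."""
    h = r / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += h / math.sqrt(1.0 - K * (x / a) ** 2)
    return total

for K in (1, 0, -1):
    print(K, radial_distance(0.5, 1.0, K), radial_distance_numeric(0.5, 1.0, K))
```

For the same circumferential radius r, the radial distance is largest in the closed case and smallest in the open case, with the flat case in between.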

Robertson-Walker metric
Example F2
To go to the description of spacetime in cosmology, we have to add two ingredients.
• There is an additional coordinate, namely "time" t. Because the space is homogeneous,
the time elapsed is the same in all places. (This is a heuristic statement, but essentially
correct.)
• The scale parameter a depends on t, i.e., there could be expansion or contraction.

So the line element, now on a 4-dimensional manifold, is

$$ds^2 = -dt^2 + a(t)^2\left[\frac{d\tilde{r}^2}{1 - K\tilde{r}^2} + \tilde{r}^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)\right]$$

where the first term describes the time elapsed. If K = 0, this reduces back to the Minkowski
metric we studied in the earlier part of this course; the only effect now is that the spatial part
may be curved. All the next Chapter will be devoted to discussing how a ( t ) varies.

10.8 Other examples


Schwarzschild metric
Example G2
In electromagnetism, the simplest situation is the field generated by a point charge q - which
is also valid outside any spherical distribution of charge. In general relativity, the corresponding
situation is the gravitational field or metric generated by a point mass M - which is also valid
outside any spherical distribution of mass, e.g., outside a star. We shall simply write down the
metric here; the derivation of this metric (and indeed also of the Robertson-Walker metric) will
be given in Chapter 16.

$$ds^2 = -\left(1 - \frac{2GM}{r}\right)dt^2 + \left(1 - \frac{2GM}{r}\right)^{-1}dr^2 + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right) \qquad (10.48)$$

Here G is Newton's gravitational constant. Although we do not give the derivation here, we
shall at least describe some of the properties.
• Far away ($r \to \infty$), this approaches flat Minkowski space.
• The last two terms involving $d\theta$ and $d\phi$ measure tangential distances, and as usual, identify
r as the circumferential radius.
• The radial separation is not dr. It has a similar structure as (10.12), but with a specific
form of f(r). Since $f(r) \neq 1$, space is curved.
• The first term is not simply $dt^2$. The meaning is as follows. If we increase t by 1 unit,
the proper time elapsed is not the same in all places. In other words, clocks run faster
or slower in different places - a consequence of gravitational redshift.
• The departure from flat Minkowski space is given by the ratio $\epsilon \equiv GM/r$. This is discussed
below.
Consider a test particle of mass m at a distance r from a star of mass M, and use
Newtonian mechanics. We have
Potential energy = $-GMm/r$
Rest energy = $mc^2 = m$
Hence we see

$$\epsilon = \frac{\text{magnitude of PE}}{\text{rest energy}}$$

When this ratio is very small, spacetime is almost flat.
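
To get a feeling for how small this ratio is in ordinary situations, the sketch below evaluates it in SI units, where the factor of $c^2$ must be restored: $\epsilon = GM/(rc^2)$. The constants and the masses and radii of the sun and the earth are rounded reference values supplied here as assumptions, and the function name is illustrative.

```python
G = 6.674e-11      # Newton's constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8        # speed of light, m/s (rounded)

def epsilon(mass_kg, r_m):
    """Dimensionless ratio |potential energy| / rest energy = G M / (r c^2)."""
    return G * mass_kg / (r_m * C**2)

# Rounded reference values, assumed for illustration.
eps_sun_surface = epsilon(1.989e30, 6.957e8)     # at the surface of the sun
eps_earth_surface = epsilon(5.972e24, 6.371e6)   # at the surface of the earth
print(eps_sun_surface, eps_earth_surface)
```

Both values are far below unity, so in these settings spacetime is very nearly flat.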

Weak fields
Example H2
If the gravitational field is weak, we should be able to describe the situation entirely in terms
of the Newtonian potential $\Phi$ (such that $m\Phi$ is the potential energy of a test particle of mass
m). It can be shown that in this case, the appropriate line element is

$$ds^2 = -(1 + 2\Phi)\,dt^2 + (1 - 2\Phi)\left(dx^2 + dy^2 + dz^2\right) \qquad (10.50)$$

where the spatial part has been written in rectangular coordinates, which is often convenient.
This form is valid to first order in $\Phi$, and is applicable in the solar system, or close to earth.

Problem 8
Consider the earth as a test particle in the field of the sun, and estimate the order of magnitude
of $\Phi$ (expressed without dimensions).

Consider the Schwarzschild metric and identify $\Phi = -GM/r$. If we reduce (10.48) to
first order in $\Phi$, we should get the weak field case. But in fact, if we do this naively (i.e.,
assuming $r^2 = x^2 + y^2 + z^2$), we do not get (10.50).

Problem 9
Show that the Schwarzschild metric reduces to (10.50) to first order if we identify $x^2 + y^2 + z^2 \equiv R^2$,
and $R = r - GM$.
Ch10-2.tex; January 5, 1998
11 Poor Man's Cosmology
(Earlier versions of this Chapter dealt with the classical model of cosmology without a cosmological
constant or the idea of inflation, because these concepts were too tentative. However, recent advances
have made these ideas much more reliable. Accordingly, this Chapter has been totally rewritten in
2001 to reflect the new understanding, in order to bring students taking even this introductory course
close to the frontier of research.)

11.1 Introduction
The problem of cosmology
Cosmology studies the universe in the large, i.e., averaged over large distances, and is especially
concerned with two questions.
• The spatial structure of the universe is characterized by two parameters: (a) a discrete
parameter (K = 1, 0, -1) indicating the topology (spatially closed, flat, open); (b) a
continuous parameter a indicating the "size of the universe".
• The temporal development is described by the t dependence of a(t); this includes the
history (e.g., the origin and age of the universe) and the future.

Distance scales
We first give a rough idea of the distance scales. To each characteristic distance L we also
associate a typical time $T_L = L/c$. All numbers given below are orders-of-magnitude only.

                              L in m         T_L in s
atom                          10^-10         3 x 10^-19
human being                   2              6 x 10^-9
radius of earth               6 x 10^6       2 x 10^-2
distance to sun               1.5 x 10^11    5 x 10^2
distance to closest stars     3 x 10^16      1 x 10^8
size of galaxy                10^21          3 x 10^12
distant clusters              10^24          3 x 10^15
"size of universe"            10^26          3 x 10^17

• We shall talk about features averaged over distances large compared to the size of galaxies,
in fact, large compared to the separation between galaxies, say $10^{23}$ m $\sim 10^7$ light years.
• The "size of the universe" (more appropriately the Hubble distance to be defined below)
is about $10^{26}$ m $\sim 10^{10}$ light years.
• The corresponding characteristic time is $10^{10}$ years (10 Gyr) or $3 \times 10^{17}$ s.
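
The time column of the table is just the light travel time $T_L = L/c$, which can be reproduced in a few lines. The sketch below assumes rounded values for the speed of light and for the length of a light year; the dictionary of scales copies the order-of-magnitude lengths from the table.

```python
C = 2.998e8             # speed of light in m/s (rounded)
LIGHT_YEAR = 9.461e15   # metres in one light year (rounded)

scales_m = {
    "atom": 1e-10,
    "human being": 2.0,
    "radius of earth": 6e6,
    "distance to sun": 1.5e11,
    "distance to closest stars": 3e16,
    "size of galaxy": 1e21,
    "distant clusters": 1e24,
    "size of universe": 1e26,
}

def light_travel_time(length_m):
    """Typical time T_L = L / c associated with a distance L."""
    return length_m / C

for name, length in scales_m.items():
    print(f"{name:28s} L = {length:8.1e} m   T_L = {light_travel_time(length):8.1e} s")

# The "size of the universe" expressed in light years.
hubble_distance_ly = scales_m["size of universe"] / LIGHT_YEAR
print(f"{hubble_distance_ly:.2e} light years")
```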

Limitations of present approach


This Chapter gives an introduction to cosmology using minimum mathematics and without the
full apparatus of general relativity - the "poor man's" approach.
• Although spacetime is described by Einstein's equations, we can use mostly Newtonian
ideas, by focusing on a small volume of space. This gives correctly the effect of the gravity
of matter.
• It turns out that vacuum may generate an additional effect with two peculiar features:
(a) it is repulsive, and (b) the density $\rho'$ is constant as the universe expands. This effect is
known as the cosmological constant $\Lambda$, and sometimes in the popular press as dark energy
(not to be confused with dark matter).
• These two effects - the attraction of matter and the repulsion of vacuum - determine the
acceleration. When the equation of motion is integrated once, the constant of integration
K' is like the total energy in one-dimensional motion. We need one result from general
relativity: K' is the same as the constant K describing the spatial structure. We simply
state this result (and discuss its significance), but the derivation will have to wait till
Chapter 16.

11.2 Observational evidence: Homogeneity and expansion


Isotropy
If we observe the universe in the large (beyond the local cluster),

The universe looks the same in all directions - it is isotropic.

Thus the situation in Figure 1a is allowed, but the situation in Figure 1b is not allowed. Here
the dots schematically represent galaxies.

Figure 1

Principle of cosmology
Although we can observe the universe only from one point P, we believe in the principle of
cosmology:

Our position is not special.


Therefore the universe must also be isotropic when viewed from any other point Q. Thus the
situation in Figure 2a is allowed, but the situation in Figure 2b is not allowed - both are
isotropic from P, but only the former is isotropic from all other points Q as well.

Figure 2

Homogeneity
So isotropy together with the principle of cosmology implies that every point is equivalent:

• The universe is homogeneous.

Expansion - Hubble's constant


By the 1930s, observations had shown that all galaxies are moving away from us. Moreover,
for nearby galaxies, their velocities v are proportional to their distances s from us (Hubble's
law); see Figure 3 (the Hubble diagram). Thus

$$v = H s \qquad (11.1)$$

where H is Hubble's constant. The unit of H is [velocity]/[distance] = [time]$^{-1}$. It is convenient
to define $T \equiv H^{-1}$, so

$$s = v T \qquad (11.2)$$

Although H is called a constant, actually it may change with time; Figure 3 only shows
what we obtain now - it could have had a different slope a billion years ago (see below). Thus
H = H(t) and T = T(t). We adopt the convention that now is t = 0 and all quantities at t = 0
are denoted with a subscript 0: $H(t=0) \equiv H_0$.

Figure 3

We next indicate how distances, velocities and $H_0$ are determined observationally.

Problem 1
A certain galaxy A is known to be of the brightest type, and its absolute luminosity (energy
emitted per unit time) can be assumed to be L = 1.45 x W (a "standard candle"). Its
apparent luminosity (energy received on earth per unit time per unit area) is measured to be
1 = 1.00 x lo-'* W m-'. Find its distance s from us. [Hint: 1 = L/4xs2.]

Problem 2
(a) If a galaxy is receding from us at a velocity $v \ll c$, find an expression for the red-shift
parameter z in terms of v/c, where z is defined by

$$z = \frac{\lambda - \lambda_0}{\lambda_0}$$

Here $\lambda_0$ is the wavelength of an optical line when emitted and $\lambda$ is the red-shifted wavelength
that is received. [Hint: the angular frequency and the wave number form a 4-vector.]
(b) For galaxy A, the red-shift is measured to be z = 0.1. Find v.

Problem 3
Using the above data, estimate $H_0$ and $T_0 \equiv H_0^{-1}$.

Astronomers do things slightly differently in two trivial respects.


• Instead of Figure 3, they plot the red-shift parameter z (closely related to the velocity v)
versus the apparent magnitude m ($\propto \log l$ or $\log s$, assuming L is fixed).
• Velocities are expressed in km s$^{-1}$ and distances in Mpc (1 pc = 3.26 light years). Thus,
the conventional unit of H is km s$^{-1}$ Mpc$^{-1}$.


The most important "standard candles" until recently were Cepheid variable stars. These
stars pulsate; the larger they are, the slower they pulsate, and also the brighter they are. So the
observed period in the variation of light intensity can be used to infer the absolute luminosity.
The earliest estimates by Hubble in the 1930s gave $H_0$ of about 500 units - way too
high. Decades of improvement have resulted in estimates of 50 - 100 units, still a factor-of-two
uncertainty. However, recent measurements$^{1,2,3,4,5}$ have pinned it down to ±10%:

$$H_0 \approx 70 \ \text{km s}^{-1}\,\text{Mpc}^{-1}, \qquad T_0 \approx 14 \ \text{Gyr} \qquad (11.3)$$

Problem 4
As a simple exercise in the conversion of units, evaluate To from Ho.

Astronomical measurements come with considerable uncertainties, but to keep this introductory
account simple, we shall use the above value without quoting ranges around it.
The parameter $H_0$ sets the scale for all quantities in cosmology. Thus, the characteristic
time scale is the Hubble time $T_0 = H_0^{-1}$, and the characteristic length scale is the Hubble
distance $L_0 = cT_0$. Using these scales, all relevant physical quantities can be expressed in
dimensionless parameters, many of which we shall discuss below.
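
As a rough numerical illustration of these two scales, the following sketch converts a Hubble constant quoted in km s^-1 Mpc^-1 into a Hubble time in Gyr and a Hubble distance in Mpc. The conversion constants and the function names are our own choices for illustration, not part of the notes.

```python
# Sketch: Hubble time T0 = 1/H0 and Hubble distance L0 = c*T0.
# The constants below are standard approximate values (assumed here).
KM_PER_MPC = 3.0857e19        # kilometres in one megaparsec
SECONDS_PER_GYR = 3.156e16    # seconds in 10^9 years
C_KM_PER_S = 2.998e5          # speed of light in km/s


def hubble_time_gyr(h0_km_s_mpc):
    """Return T0 = 1/H0 in Gyr for H0 given in km/s/Mpc."""
    h0_per_second = h0_km_s_mpc / KM_PER_MPC   # H0 in 1/s
    return 1.0 / h0_per_second / SECONDS_PER_GYR


def hubble_distance_mpc(h0_km_s_mpc):
    """Return L0 = c/H0 in Mpc for H0 given in km/s/Mpc."""
    return C_KM_PER_S / h0_km_s_mpc


if __name__ == "__main__":
    print(hubble_time_gyr(70.0))      # close to 14 Gyr
    print(hubble_distance_mpc(70.0))  # a few thousand Mpc
```

With $H_0 = 70$ units the Hubble time comes out close to the 14 Gyr quoted in (11.3).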

11.3 Kinematics
In this Section we discuss the kinematics, i.e., the description of how various quantities change
with time.

Changing H - Analogy
Although s = vT, the motion is not uniform, as illustrated by the following analogy.

Problem 5
A collection of beads i = 1, 2, ... all have the same mass m and are projected to travel horizontally
in the same viscous medium, with retarding force -kv, where v is the velocity and k
is the same for all the beads. However, the initial velocities $V_i$ of the beads are different.
(a) Show that the equation of motion is

    $m\,dv_i/dt = -k v_i$

(b) Hence show that the velocity and position of particle i at time t are given by ($\gamma = k/m$)

    $v_i(t) = V_i e^{-\gamma t}$ ,  $s_i(t) = (V_i/\gamma)(1 - e^{-\gamma t})$

(c) Compare different particles at the same time: plot $v_i$ against $s_i$ at fixed time t. Show
that the plot is a straight line, i.e., H = v/s is independent of i, and express H in terms of $\gamma$
and t. Does $s \propto v$ indicate uniform motion? Explain. Is $T = H^{-1}$ the same as the "age" of the
system?
(d) Compare the situation at different times. To do so, introduce an arbitrary velocity scale V
(say the mean value of the initial velocities $V_i$).
Show that the distance and the velocity can be written as

    $s_i(t) = \tilde{s}_i a(t)$ ,  $v_i(t) = \tilde{s}_i \dot{a}(t)$

where $\tilde{s}_i = V_i/V$ is independent of time, and the universal scale parameter

    $a(t) = (V/\gamma)(1 - e^{-\gamma t})$

is independent of i. Sketch a(t) versus t. Think of a(t) as a displacement. Does this graph
indicate uniform motion?
(e) Show that $H(t) = \dot{a}(t)/a(t)$. On the sketch of a(t) versus t, draw the tangent at the time t
and relate $T = H^{-1}$ to the horizontal intercept. Show graphically that if there is deceleration
(acceleration), then T is more (less) than the age of the system.

[1] W. L. Freedman et al., Astrophysical Journal 553, 47 (2001)
[2] http://www.journals.uchicago.edu/ApJ/journal/issues/ApJ/v553nl/524l7/524l7.web.pdf
[3] http://xxx.lanl.gov/astro-phy/9801080
[4] W. L. Freedman, Scientific American, March (1998)
[5] http://www.sciam.com.specialissues/0398cosmos/O398freedman.html

With this analogy, we can now consider the problem of cosmological expansion, and
focus attention on a universal scale factor.
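
The bead analogy can be checked numerically. The sketch below assumes the standard solution of linear viscous drag, $v = V_i e^{-\gamma t}$ and $s = (V_i/\gamma)(1 - e^{-\gamma t})$ with all beads starting at s = 0; the function names and the sample numbers are ours. It shows that v/s is the same for every bead at a given time, and that $T = 1/H$ exceeds the elapsed time because the beads are decelerating.

```python
import math


def bead_state(v_initial, gamma, t):
    """Position and velocity of a bead under drag -k*v (gamma = k/m),
    assuming it starts from s = 0 with speed v_initial."""
    v = v_initial * math.exp(-gamma * t)
    s = (v_initial / gamma) * (1.0 - math.exp(-gamma * t))
    return s, v


def hubble_ratio(gamma, t):
    """H = v/s, which does not depend on the initial velocity."""
    return gamma / math.expm1(gamma * t)


gamma, t = 0.5, 3.0
# v/s for three beads with different initial velocities, all at time t
ratios = []
for v0 in (1.0, 2.0, 5.0):
    s, v = bead_state(v0, gamma, t)
    ratios.append(v / s)

if __name__ == "__main__":
    print(ratios)                          # three equal numbers
    print(1.0 / hubble_ratio(gamma, t), t)  # T = 1/H is larger than t
```

A linear plot of v against s at one instant therefore says nothing about whether each bead moves uniformly; it only reflects the common time dependence a(t).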

Spatial structure
Because we believe the universe to be homogeneous, i.e., all points are equivalent, only three
types of spatial structure are allowed, namely, closed, flat or open, corresponding to K =
1, 0, -1; see Chapter 10.
To make this Chapter self-contained, we give a qualitative discussion of the spatial
structure. For simplicity, we reduce to an analogy in one lower dimension: 2-dimensional
space.
• The flat case (K = 0) is like a flat piece of paper. On such a flat piece of paper, if we
draw a circle of radius $r_1$, the circumference is exactly $2\pi r_1$.
• The closed case (K = 1) is like the surface of a sphere. To generate a spherical surface,
we stand at one point on the piece of paper, and bend the paper down in all directions.
The circumference of a circle is less than $2\pi r_1$. Such a surface is characterized by a radius
of curvature a, which is just the radius of the sphere.
• The open case (K = -1) is like the surface of a saddle. To generate such a surface, we
stand at one point on the piece of paper, and bend the paper up along the x-axis, and
down along the y-axis. The circumference of a circle is more than $2\pi r_1$. If we take a cross-section
along either axis, there is a radius of curvature a. (However this description is not
exactly right, since not all points of a saddle are equivalent, unless we embed the surface
not in 3-dimensional Euclidean space but in 3-dimensional space with a Minkowski-like
metric.)

Scale parameter
Consider a flat spatial structure as an example. Space (or a 2-dimensional analog) is like a
sheet of rubber that is being stretched. All distances are magnified by the same ratio as time
increases. So draw a spatial coordinate grid that also expands with the universe (Figure 4).
Since all lengths expand together,

    The galaxies are "stuck" onto the coordinate grid.

Let one grid (or any fixed number of grids) be a; this changes with time, so a = a(t). Distances
r can be expressed as

    $r = \tilde{r} a(t)$    (11.4)

and the statement that galaxies are "stuck" to the coordinate grid means that

    The reduced coordinates $\tilde{r}$ are independent of time.

The reduced coordinate is analogous to $\tilde{s}_i$ in Problem 5, and to $\tilde{r}$ in Chapter 10. It remains to
consider the evolution of a single function a(t) - the scale factor of the universe.

Figure 4

However, notice that there is an arbitrary multiplicative constant in a(t). (a) In Problem
5, the velocity scale V is arbitrary. (b) In the present discussion, a can be one grid, or two grids,
etc. Thus we have the freedom $a(t) \mapsto Ba(t)$, and physical quantities must be independent of
the arbitrary scale B. We shall fix the scale only when we come to (11.16).

General form
As the universe expands, the possible behavior of a(t) is shown schematically in Figure 5.
• If gravity is very strong, the expansion will have a maximum and the universe ultimately
contracts, eventually back to a point. (Curve 1)
• In the marginal case, gravity slows the expansion to zero rate eventually. (Curve 2)
• If gravity is weak, the expansion goes on forever. (Curve 3)
• If for any reason there is net repulsion, then the expansion accelerates. (Curve 4)

Figure 5

Hubble constant

Figure 6

Because of (11.4), the distances $s_i$ to galaxy i and the corresponding velocities $v_i$ are given by
(compare Problem 5)

    $s_i = \tilde{s}_i a(t)$ ,  $v_i = \tilde{s}_i \dot{a}(t)$

From this we obtain two definitions of H, namely

    $H = v_i/s_i$ ,  $H = \dot{a}/a$

where

• The first of these is just the same as the observational evidence in (11.1) or Figure 3, and
allows H to be determined from data at one time.
• The second relates H to the evolution of a(t). In particular, it allows H to be determined
from the time $T = H^{-1}$ illustrated in Figure 6.

Deceleration parameter
Until recently, it was believed that the expansion is decelerating (i.e., cases 1 to 3, but not 4),
because gravity is attractive. So conventionally we talk about the deceleration $-\ddot{a}$, or better
yet the dimensionless deceleration parameter q defined by

    $q = -a\ddot{a}/\dot{a}^2$    (11.8)

Note that q can change with time, and the present value is denoted as $q_0$.
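
To get a feeling for q, the sketch below evaluates $q = -a\ddot{a}/\dot{a}^2$ by finite differences for any scale factor supplied as a function. The helper name, the step size and the two trial forms of a(t) are our own illustrative choices. A power law $a \propto t^{2/3}$ gives q = 1/2, an exponential gives q = -1, and multiplying a(t) by a constant leaves q unchanged.

```python
import math


def deceleration_parameter(a, t, h=1e-4):
    """Estimate q = -a * a'' / a'^2 at time t using central differences.
    `a` is any callable returning the scale factor; h is the step size."""
    a_dot = (a(t + h) - a(t - h)) / (2.0 * h)
    a_ddot = (a(t + h) - 2.0 * a(t) + a(t - h)) / h**2
    return -a(t) * a_ddot / a_dot**2


def power_law(t):
    """Illustrative decelerating expansion, a proportional to t^(2/3)."""
    return t ** (2.0 / 3.0)


def exponential(t):
    """Illustrative accelerating expansion, a proportional to exp(t)."""
    return math.exp(t)


if __name__ == "__main__":
    print(deceleration_parameter(power_law, 1.0))    # about 0.5
    print(deceleration_parameter(exponential, 1.0))  # about -1
    # Rescaling a(t) -> B a(t) does not change q
    print(deceleration_parameter(lambda t: 7.0 * power_law(t), 1.0))
```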

Problem 6
Let $q = -\ddot{a}\,a^m \dot{a}^n$.
(a) By requiring q to be dimensionless, determine m and n.
(b) Show that the freedom in scaling $a(t) \mapsto Ba(t)$ does not change q.

Determination from the data: Conceptual definition

Conceptually, we can determine H(t) for all past times and hence also the deceleration. This
is illustrated in the next problem.

Problem 7
(a) In Figure 6, the tangent is drawn for the "present", t = 0. Draw another tangent at an
earlier time t = -At. In which case is T larger? In which case is H larger?
(b) To be more quantitative, prove that

    $dT/dt = 1 + q$

Thus, in principle, q can be determined if we know H in the past.


(c) Suppose that present measurements give T(0) = 14.0 Gyr, and an earlier civilization on
earth left a record that 0.1 Gyr ago, T(-At) = 13.8 Gyr. Assume there are no errors in the
data. Estimate qo.
Actual determination
But we do not have past records. We give a much simplified and qualitative account of the
actual determination of deceleration.
• The slope of the Hubble diagram (v versus s) gives H.
• In particular, the slope near s = 0 (i.e., for the nearest galaxies) gives the present value
$H_0$.
• Light from distant galaxies was emitted a long time ago. Therefore the slope for the
distant part of the Hubble diagram gives H in the past.
• For example, if H was larger in the past, then the earlier data would have a larger slope,
as shown in an exaggerated way in Figure 7.
• So the deviation of the Hubble diagram upwards or downwards determines q.
• Because there are other minor complications arising from the propagation of light in an
expanding universe, the Hubble diagram is not strictly linear even if there is no deceleration.
Moreover, in practice one does not plot s versus v but magnitude versus red-shift z
(or ln z). So the deviation does not refer to the deviation from a straight line, but from
the theoretical line corresponding to q = 0.

Figure 7

The modern methodology is more refined: Measuring the slope for every part of the
Hubble diagram in principle gives H for all times in the past. In practice, this is too difficult;
so we assume a form of a(t) (and hence of H(t)) with a few parameters, and fit the Hubble
diagram to determine these parameters. But first we need a model of the dynamics.

11.4 Dynamics
Newtonian formalism
Consider a small part of space, within a distance $r = \tilde{r}a(t)$ of the origin, with $\tilde{r} \ll 1$. Assume
all particles are "stuck" to the coordinate grid, so $\tilde{r}$ is independent of time. Such a small part
of a curved surface is nearly flat, and Newtonian concepts apply. A test particle m at the edge
of this part of space (Figure 8) satisfies

    $m\ddot{r} = -GM(r)m/r^2$ ,  $M(r) = (4\pi/3) r^3 \rho$    (11.9)

where M(r) is the total mass inside this small part, and $\rho$ is the density. Here we have assumed
as usual that all the mass inside acts as if it is concentrated at the origin and all the mass
outside has no effect. But using $r = \tilde{r}a(t)$, where $\tilde{r}$ is independent of time, we find that $\tilde{r}$
cancels:

    $\ddot{a} = -(4\pi G/3)\rho a$    (11.10)

Thus we get an equation for the scale parameter a(t) without reference to $\tilde{r}$. This should be
expected: $\tilde{r}$ is an arbitrary choice, and should not appear in the final result.

Figure 8

Vacuum repulsion
However, Einstein's equations allow another effect: Vacuum contains another density $\rho'$ which
repels, with a "gravitational" constant G'. Thus (11.10) becomes instead

    $\ddot{a} = -(4\pi G/3)\rho a + (4\pi G'/3)\rho' a$    (11.11)

Matter domination
As the universe expands, $\rho$ decreases inversely as the volume:

    $\rho = \rho_0 (a_0/a)^3$    (11.12)

In the earlier history of the universe (when radiation dominates), this would not be true;
instead $\rho$ would then scale as $(a_0/a)^4$. For this reason, (11.12) is called the assumption of
matter domination (as opposed to radiation domination).
Problem 8
If the density $\rho$ is due to radiation (i.e., photons), then

    $\rho$ = (no. of photons / vol) × $\hbar\omega$    (11.13)

As the universe expands, wavelengths expand proportionately: the nodes of an electromagnetic
wave are also stuck to the coordinate grid. Hence derive the property that $\rho = \rho_0 (a_0/a)^4$
for radiation.
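
The two scaling laws can be compared side by side. In the sketch below (function names and sample values are ours), the number density of particles or photons is diluted as $a^{-3}$, while each photon additionally loses energy as its wavelength stretches with a, giving one more factor of $a^{-1}$ for radiation.

```python
def number_density(n0, size_ratio):
    """Number per unit volume when the universe has size a = size_ratio * a0."""
    return n0 * size_ratio ** -3


def matter_density(rho0, size_ratio):
    """Matter: the mass per particle is fixed, so rho scales as a^-3."""
    return rho0 * size_ratio ** -3


def radiation_density(rho0, size_ratio):
    """Radiation: dilution (a^-3) times the red-shift of each photon (a^-1)."""
    energy_per_photon_factor = 1.0 / size_ratio   # frequency scales as 1/a
    return rho0 * size_ratio ** -3 * energy_per_photon_factor


if __name__ == "__main__":
    # When the universe was half its present size:
    print(matter_density(1.0, 0.5))     # 8 times the present value
    print(radiation_density(1.0, 0.5))  # 16 times the present value
```

Going back in time, radiation therefore gains on matter by one power of $a_0/a$, which is why it dominates at sufficiently early times.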

Cosmological constant
On the other hand, the density $\rho'$ always has the same value - vacuum does not get
"thinned" as the universe expands. Thus $\rho'$ = constant and we define the cosmological constant

    $\Lambda = 4\pi G'\rho'$    (11.14)

The individual factors G' and $\rho'$ are introduced only for a heuristic discussion, and have no
meaning.

Equation of motion
Putting (11.12) and (11.14) in (11.11) then gives

    $\ddot{a} = -(4\pi G/3)\rho_0 a_0^3/a^2 + (\Lambda/3) a$    (11.15)

This can be thought of as the Newtonian equation of motion for a unit mass, subject to an
attractive inverse-square force $\propto -a^{-2}$ and a repulsive Hookian force $\propto +a$; the latter is like a
spring with the opposite sign. The conservation of energy then leads to

    $\frac{1}{2}\dot{a}^2 - (4\pi G/3)\rho_0 a_0^3/a - (\Lambda/6) a^2 = -K'/2$    (11.16)

The second and third terms on the left are the potential energy and $-K'/2$ is the total energy.

Problem 9
Show that integrating (11.15) gives (11.16). Or, equivalently and more simply, show that
differentiating (11.16) with respect to time gives (11.15).

Result from general relativity


Both (11.15) and (11.16) are valid for any definition of the scale factor a, i.e., a being one grid,
two grids, etc. But K' cannot have a simple meaning if a is arbitrary. Henceforth we define a
not to be any scale factor, but the "size" of the universe, in the sense that the spatial metric is
(see Chapter 10)
The physical interpretation is simplest when K = +1 (the topology is that of the surface of a
sphere); then a is just the radius of the sphere. If K = -1 (the topology is like a saddle), then
a is the radius of curvature, namely the distance where the curvature becomes significant.
Incidentally the scale of a remains undefined (and unimportant) for K = 0.
Now we quote, without proof, a key result from general relativity:

    The constant K' is exactly the same as the constant K.

Note that K gives the spatial structure whereas K' affects the temporal development; general
relativity relates the two. This used to be an important result; however, according to current
understanding, it is no longer important, even though it is still correct (see below).

Simplifying the equations


We now simplify the acceleration equation (11.15) and the velocity equation (11.16) as follows.
• Henceforth write K instead of K'.
• Choose the present size of the universe as the unit of length, i.e., define the dimensionless
size variable

    $R = a/a_0$    (11.18)

• Choose the present Hubble time as the unit of time, i.e., define the dimensionless time
variable

    $\tau = (t - t_0)/T_0$    (11.19)

• Define dimensionless constants

    $\Omega_M \propto G\rho_0$ ,  $\Omega_\Lambda \propto \Lambda$ ,  $\Omega_K \propto -K$    (11.20)

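Then the two equations can be written as

    $\frac{1}{R}\frac{d^2R}{d\tau^2} = -\frac{\Omega_M}{2R^3} + \Omega_\Lambda$    (11.21)

    $\frac{1}{R^2}\left(\frac{dR}{d\tau}\right)^2 = \frac{\Omega_M}{R^3} + \Omega_\Lambda + \frac{\Omega_K}{R^2}$    (11.22)

The omitted proportionality constants in (11.20) are to be chosen so that these two equations
take the simple form as above, and in fact

    $\Omega_M = \frac{8\pi G\rho_0}{3H_0^2}$ ,  $\Omega_\Lambda = \frac{\Lambda}{3H_0^2}$ ,  $\Omega_K = -\frac{K}{a_0^2 H_0^2}$    (11.23)
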
Problem 10
Derive (11.21) , (11.22) and (11.23).

Problem 11
Show that
(a) $dR(\tau=0)/d\tau = 1$;
(b) $(1/R)\,d^2R/d\tau^2 = -(H/H_0)^2 q$ in general and
(c) $(1/R)\,d^2R(\tau=0)/d\tau^2 = -q_0$.

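Hence, evaluating at the present, (11.21) and (11.22) become

    $-q_0 = -\frac{1}{2}\Omega_M + \Omega_\Lambda$    (11.24)

    $1 = \Omega_M + \Omega_\Lambda + \Omega_K$    (11.25)

The simple form of (11.25) explains why we chose the constants in this particular way. Because
of (11.25), there are only two independent constants, and it is conventional to represent all
possibilities on the $\Omega_M$-$\Omega_\Lambda$ plane. We need two pieces of input to determine their values.
The key equation (11.25) refers to the present. We can write a similar one for any other
time. To do so, go back to (11.22) and note that the LHS is $(H/H_0)^2$. Moving this factor to
the other side, we get

    $1 = \left[\Omega_M R^{-3} + \Omega_\Lambda + \Omega_K R^{-2}\right](H_0/H)^2 \equiv \Omega_M(\tau) + \Omega_\Lambda(\tau) + \Omega_K(\tau)$    (11.26)
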
We should imagine that these parameters were "set" as initial conditions at a very early
time $\tau_*$ by other processes, e.g., inflation (see below). Therefore eventually we should try to
explain the values $\Omega_M(\tau_*)$, $\Omega_\Lambda(\tau_*)$ and $\Omega_K(\tau_*)$ rather than the present ones. The latter have
simply evolved from the "original" ones.
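
The way the three parameters evolve can be sketched numerically. The code below assumes the standard dimensionless velocity equation $(H/H_0)^2 = \Omega_M R^{-3} + \Omega_\Lambda + \Omega_K R^{-2}$ with $\Omega_K = 1 - \Omega_M - \Omega_\Lambda$, and the relation $(1/R)\,d^2R/d\tau^2 = -(H/H_0)^2 q$ from Problem 11. The function names and the sample parameter values are ours and purely illustrative.

```python
def expansion_rate_squared(omega_m, omega_lambda, r):
    """(H/H0)^2 when the universe has relative size R = r (assumed form)."""
    omega_k = 1.0 - omega_m - omega_lambda
    return omega_m / r**3 + omega_lambda + omega_k / r**2


def omegas_at(omega_m, omega_lambda, r):
    """The three time-dependent parameters at relative size r; they sum to 1."""
    omega_k = 1.0 - omega_m - omega_lambda
    e2 = expansion_rate_squared(omega_m, omega_lambda, r)
    return (omega_m / r**3 / e2, omega_lambda / e2, omega_k / r**2 / e2)


def deceleration_at(omega_m, omega_lambda, r):
    """q at relative size r, from (1/R) R'' = -(H/H0)^2 q."""
    e2 = expansion_rate_squared(omega_m, omega_lambda, r)
    return (0.5 * omega_m / r**3 - omega_lambda) / e2


if __name__ == "__main__":
    # Illustrative present-day values with a small curvature term
    print(omegas_at(0.3, 0.6, 1.0))     # present: (0.3, 0.6, 0.1)
    print(omegas_at(0.3, 0.6, 1e-3))    # early: matter dominates, curvature tiny
    print(deceleration_at(0.3, 0.7, 1.0))
```

Running the early-time case shows the curvature fraction shrinking as R decreases, which is the observation behind the flatness problem.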

Interpretation of the constants
From now on we refer to $\Omega_M$, $\Omega_\Lambda$, $\Omega_K$ instead of $G\rho_0$, $\Lambda$, K. So it is necessary to have some
physical interpretation of these dimensionless constants.
• $\Omega_M$ gives the amount of "normal" gravitational attraction due to matter; it causes deceleration.
• $\Omega_\Lambda$ gives the amount of repulsion due to the cosmological constant; it causes acceleration.
• $\Omega_K$ relates to the spatial structure, as explained below.
First consider the sign of $\Omega_K$. Since $\Omega_K \propto -K$, space is closed, flat or open according
to whether $\Omega_K$ is negative, zero or positive. Next consider its magnitude (irrelevant in the case
K = 0). We have

    $|\Omega_K| = (T_0/a_0)^2$

The age of the universe is $t_0 \sim T_0$, and the size of the universe is $a_0$. Thus

    $|\Omega_K| \sim$ (age of universe / size of universe)$^2$

But light from any galaxy farther than the distance light travels in this time would not reach us. Thus

    $|\Omega_K| \sim$ (size of visible universe / size of universe)$^2$    (11.29)

If $|\Omega_K| \sim 1$, then most of the universe is visible. If $|\Omega_K| \ll 1$, then only a small part of the
universe is visible, and this small part will appear as nearly flat.

The big question
The big question in observational cosmology is: What are the values of these parameters? It
turns out that our belief has gone through three main stages, as explained separately in the
next three Sections.

The mass parameter
Previously, people tried to obtain $\Omega_M$ by "counting" the amount of matter. We can estimate
the mass of galaxies, and how far they are from us (using the Hubble relationship). Thus we
know the density of matter in the universe, $\rho_0$. Together with $H_0$, this would give $\Omega_M$. This
method gives approximately (with very large errors)

po - kg rn-3 , flM N 0.1 (11.30)


Much effort went into refining the estimate, but in the 1970s and 1980s, it was increasingly
recognized that there is a lot of dark matter in the universe - which would be missed in any
"counting". Thus, "counting" is now abandoned, except for two purposes:
• The luminous contribution as in (11.30) gives a lower bound to $\Omega_M$.
• If we have another way of determining $\Omega_M$, then this tells us the percentage of matter in
the universe that is dark.
A topic of current research interest is: what is the nature of the dark matter, and how
is it distributed in the universe?

Other effects of the cosmological constant


The cosmological constant $\Lambda$ describes a new type of "gravitational" interaction. Can we search
for its effect in other situations? Consider the example of the earth in orbit around the sun.
The "normal" gravitational effect is $GM/r^2$, where M is the solar mass and r is the
distance of the earth to the sun, i.e., 1 AU. The new effect is $G'\rho' r$; note this increases with the
size of the system because the density rather than the mass is constant. The ratio is

    Ratio $= \frac{G'\rho' r}{GM/r^2} = \frac{G'\rho' r^3}{GM} \sim \frac{G'\rho'}{G\rho_s}$

where $\rho_s = M/(4\pi r^3/3) \sim 10^{-3}$ kg m$^{-3}$ is the effective density of the solar system if we imagine
the mass to be distributed uniformly within 1 AU.
On the other hand, we shall see later that $\Omega_M$ and $\Omega_\Lambda$ are of the same order (why this
should be the case is a separate issue; see below), so

    $G'\rho' \sim G\rho_0$

Dividing these equations we get

    Ratio $\sim \rho_0/\rho_s$

with $\rho_0$ as in (11.30) and $\rho_s \sim 10^{-3}$ kg m$^{-3}$; this ratio is extremely small.
Thus the cosmological constant cannot be detected in small systems.
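
A rough numerical version of this estimate is sketched below. Since only orders of magnitude matter, the sketch uses the critical density $3H_0^2/(8\pi G)$ with $H_0 = 70$ units as a stand-in for the cosmological density; this substitution, the physical constants and the variable names are our assumptions, not values taken from the notes.

```python
import math

# Standard approximate constants in SI units (assumed here)
G = 6.674e-11            # gravitational constant
M_SUN = 1.989e30         # solar mass in kg
AU = 1.496e11            # earth-sun distance in m
METRES_PER_MPC = 3.0857e22

# H0 = 70 km/s/Mpc converted to 1/s
H0 = 70.0e3 / METRES_PER_MPC

# Effective density if the solar mass is spread uniformly within 1 AU
rho_solar = M_SUN / (4.0 / 3.0 * math.pi * AU**3)

# Critical density, used only as an order-of-magnitude proxy for rho_0
rho_cosmic = 3.0 * H0**2 / (8.0 * math.pi * G)

# Relative size of the cosmological-constant effect in the solar system
ratio = rho_cosmic / rho_solar

if __name__ == "__main__":
    print(rho_solar, rho_cosmic, ratio)
```

The ratio comes out more than twenty orders of magnitude below unity, which is the sense in which the effect is undetectable in small systems.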

11.5 The belief in the 1920s


Steady state
When Einstein first proposed the general theory of relativity in the 1920s, it was not known
that galaxies are moving away from us. People thought that the universe was in a steady state.
Therefore both the acceleration and the velocity are zero.

Value of the parameters


However, we need to be careful on one technical point. If all velocities were zero, then H =
v/s = 0, and we cannot use $H_0 = 0$ to set the scale in the definition of $\Omega_M$, $\Omega_\Lambda$, $\Omega_K$. Rather,
we take $H_0$ to be some arbitrary scale, say $H_0 = (10\ \mathrm{Gyr})^{-1}$. The LHS of (11.25) is $(da/dt)^2$ in
such dimensionless units; therefore it should be 0 instead of 1. Hence (11.24) and (11.25) give

    $0 = -\frac{1}{2}\Omega_M + \Omega_\Lambda$ ,  $0 = \Omega_M + \Omega_\Lambda + \Omega_K$

Thus

    $\Omega_\Lambda = \frac{1}{2}\Omega_M > 0$ ,  $\Omega_K = -\frac{3}{2}\Omega_M < 0$

and in fact all three are of order unity.


Thus Einstein concluded that
• there is a significant positive cosmological constant; and
• the spatial structure of the universe is closed.
In retrospect the steady-state model is unlikely.

Problem 12
Refer to (11.16) as the conservation of energy. The terms associated with $\Omega_M$ and $\Omega_\Lambda$ are
the potential energy V(a). If there is no acceleration, then a is at an equilibrium position:
V'(a) = 0. Sketch V(a) and state whether this is a stable equilibrium.
Of course the cosmological model is only approximate. Any deviations from an unstable
equilibrium will grow rapidly; it is unlikely that the universe remains static in a position of
unstable equilibrium.
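
The instability can be seen numerically. The sketch below works in dimensionless form and assumes the potential $V(R) = -\Omega_M/(2R) - \Omega_\Lambda R^2/2$, so that $\frac{1}{2}(dR/d\tau)^2 + V(R)$ is constant; the parameter values are illustrative and chosen so that R = 1 is the static point ($\Omega_\Lambda = \Omega_M/2$). The derivatives are taken by finite differences.

```python
def potential(r, omega_m, omega_lambda):
    """Dimensionless potential: attractive -1/R term plus repulsive -R^2 term."""
    return -omega_m / (2.0 * r) - 0.5 * omega_lambda * r**2


def first_derivative(f, x, h=1e-4):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2


# Illustrative static-universe parameters: no acceleration at R = 1
OMEGA_M = 1.0
OMEGA_LAMBDA = 0.5


def v_static(r):
    return potential(r, OMEGA_M, OMEGA_LAMBDA)


if __name__ == "__main__":
    print(first_derivative(v_static, 1.0))   # close to 0: equilibrium
    print(second_derivative(v_static, 1.0))  # negative: a maximum of V
```

A negative second derivative means the static point sits at a maximum of V, so any small displacement grows.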

11.6 The belief in the 1960s


Expansion observed
In the 1930s, Hubble observed the recession of galaxies and established Hubble's law (Figure
3). This shows that the universe is not static, but expanding.
When Einstein learnt of this result, he said that including $\Lambda$ was the biggest mistake he
ever made. He said that he should have believed in simplicity and dropped $\Lambda$; then he could
have predicted that the universe expands, before it was discovered by Hubble. He was only
partially correct, because we now believe that (a) the universe is expanding, but (b) $\Lambda \neq 0$.
(See the next Section.)
However, the idea of an expanding universe was so contrary to entrenched belief that,
for a long time, skeptics engaged in a series of rear-guard actions.
• Some questioned whether the observed red-shifts were due to the Doppler effect. Could
these be gravitational red-shifts instead? But it is difficult to understand why gravitational
red-shifts would be correlated with distance.
• Hoyle and others tried to reconcile expansion with a steady-state universe. They claimed
that as the universe expands, matter is created, so that the density of matter remains the
same. This theory of continuous creation ran into so many difficulties that eventually it
was given up.

Big bang
If the universe is expanding, and we go back in time ("reverse the movie"), then the universe
would contract. The contraction would end up in a "point": the universe must have begun by
expanding from a tiny volume and a huge density - the big bang. (Although the name is now
established, it is misleading: there was never any "explosion".) The time from the big bang to
now is the age of the universe.

Cosmic microwave background


More importantly, as we "reverse the movie", the universe is getting compressed and, like a
gas, becomes very hot, with a lot of black-body radiation. The amount and spectrum of black-body
radiation at these early times are determined by the temperature alone. Then, as the
universe expands, the radiation cools - but cannot disappear. So there should be some cool
(temperature a few kelvin) black-body radiation left in the universe today.
In the 1960s, such radiation was discovered (entirely by accident). Its temperature was
found to be about 3 K; now it has been determined with great accuracy to be 2.73 K. At this
temperature, the spectrum peaks at a wavelength of ~ 2 mm, namely in the microwave range;
so the phenomenon is now called the cosmic microwave background (CMB).

Why should there be such identical radiation coming from everywhere in space unless
the various parts of space were once in close contact (more about this later)? This discovery
and its interpretation killed the steady-state models and established the big-bang model; it was
also assumed (actually on no solid evidence) that $\Omega_\Lambda = 0$.

Value of the parameters


The key equations (11.24) and (11.25) then become

    $-q_0 = -\frac{1}{2}\Omega_M$    (11.36)

    $1 = \Omega_M + \Omega_K$    (11.37)

The first relation (11.36) has the simple interpretation "acceleration = gravitational
force". The second equation tells us that the universe is spatially closed ($\Omega_K < 0$) if and only
if $\Omega_M > 1$ (there is "sufficient" mass in the universe), or equivalently, $q_0 > 1/2$ (the universe is
observed to be decelerating at a sufficient rate). Therefore much effort went into the estimation
of $q_0$. The best that could be said was

which was not very helpful in settling whether $q_0 > 1/2$. The estimate was poor because only
relatively nearby galaxies could be measured, so $q_0$ came from comparing the slopes in two
nearby portions of the Hubble diagram, and it was therefore difficult to see how much the curve
bends.
Incidentally, the critical value $q_0 = 1/2$ can be understood from a purely Newtonian
example (which of course does not contain the cosmological constant).

Problem 13
Throw a ball of mass m upwards from the earth. Denote the distance from the centre of the
earth as a. Define q in the same way as before. Show that the ball will escape if and only if
q < 1/2.
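
The Newtonian example can be tried with actual numbers. The sketch below assumes a ball moving radially under the earth's gravity only, so that $\ddot{a} = -GM/a^2$ and hence $q = -a\ddot{a}/\dot{a}^2 = GM/(a\dot{a}^2)$; the constants and function names are ours. It compares q with the sign of the total energy for a launch speed below and above the escape speed.

```python
GM_EARTH = 3.986e14   # G times the earth's mass, in m^3 s^-2 (approximate)
R_EARTH = 6.371e6     # radius of the earth in m (approximate)


def ball_q(speed, distance):
    """q = -a*a''/a'^2 for radial motion with a'' = -GM/a^2."""
    return GM_EARTH / (distance * speed**2)


def escapes(speed, distance):
    """The ball escapes if its total energy per unit mass is positive."""
    return 0.5 * speed**2 - GM_EARTH / distance > 0.0


if __name__ == "__main__":
    for speed in (10.0e3, 12.0e3):   # m/s, below and above escape speed
        print(speed, ball_q(speed, R_EARTH), escapes(speed, R_EARTH))
```

In both cases the ball escapes exactly when q is below 1/2, since q < 1/2 is the same statement as positive total energy.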

Problem 14
Show that $\Omega_M$ can be written as $\rho_0/\rho_c$, where $\rho_c$ is the critical mass density to make the universe
closed. Give an expression for $\rho_c$ in terms of G and $H_0$, and give the numerical value based on
$H_0 = 70$ units. (However this result is valid only if $\Omega_\Lambda = 0$; therefore the critical density is no
longer much referred to.)
11.7 The belief in the 21st century
Value of the parameters
Current belief can be summarized by

It is believed that $\Omega_K$ is very very small, and for all practical purposes can be taken to be
exactly zero. The uncertainties in $\Omega_M$ and $\Omega_\Lambda$ are at least 0.1. But it seems definite that $\Omega_\Lambda$ is
non-zero and positive.
The qualitative conclusions are as follows.
• The universe is nearly flat ($\Omega_K \approx 0$).
• The cosmological constant is more important than "normal" gravity.
• There is net repulsion and thus the expansion is accelerating.
• The universe will therefore expand forever.

We shall next explain the evidence for these beliefs, and also some of the consequences.

Homogeneity problem (Horizon problem)


The CMB coming from any direction in space is a black-body spectrum, and can be characterized
by a temperature T ($\approx$ 2.73 K). There are however tiny inhomogeneities $\Delta T/T$.
• Overall, $\Delta T/T \sim 10^{-3}$.
• Most of this is due to our absolute motion in the universe: the motion blue-shifts (increases
T) the radiation in front and red-shifts (decreases T) the radiation from the back. We can
transform to a frame to remove this effect; incidentally this gives a good measurement of
our absolute velocity.
• In the new frame (at rest with respect to the universe as a whole), the remaining inhomogeneity
$\Delta T/T$ is far smaller still, and is due to thermal fluctuations. But the mystery is: why are the
fluctuations so small, or in other words, why is the CMB so homogeneous?
The problem and its resolution can be explained heuristically by an analogy. Suppose
you are given a temperature chart (without scale factor) of a part of HK: it could be nearly all
of HK (say 10 km x 10 km); a district (say 1 km x 1 km); a street (say 100 m x 100 m); all
the way down to a part of a room (say 1 m x 1 m); or even a tiny part of a room (say 1 mm
x 1 mm). There must be a relation between $\Delta T/T$ and the scale - in particular, if the chart
shows a very small $\Delta T/T$, it must be the chart of a tiny portion of HK, probably as small as 1 mm
x 1 mm. The reason is that we expect the thermal processes that homogenize the temperature
to operate only on small scales. (To make this argument more precise, we have to estimate the
distance that effects can propagate - the horizon - and the size of the universe at the time
when the CMB was emitted by matter.)
Since all of the visible universe (i.e., all that we can see) has a highly uniform temperature,
we conclude that it must be a tiny portion of the whole universe. Referring to (11.29),
this gives

    $|\Omega_K| \ll 1$    (11.40)

Size of the universe
This in turn means that the size of the universe is

    $a_0 \gg T_0$    (11.41)

A more stringent bound will be given later.


We note that (11.21) and (11.22) only give the size relative to the present. The actual
size must be multiplied by the scale (11.41), and it is very large.

Spatial structure no longer important


A major question of the 1960s was the spatial structure of the universe: Is the universe (a)
closed (K = 1), like the surface of a sphere of radius $a_0$; (b) flat (K = 0); or (c) open (K = -1),
like a saddle with a radius of curvature $a_0$?
However, this question now becomes essentially irrelevant. First, from the temporal
point of view, the evolution is governed by $\Omega_K$; but its magnitude is so small that it does not
matter whether it is positive, negative or exactly zero. We may as well take it to be exactly
zero. Second, from the spatial point of view, we are (reducing the dimension by 1 for a simple
analogy) like ants living on a 2-dimensional surface. However, we can only see a
tiny part of the surface (the visible universe). Then we cannot tell (and do not care about) the
sign of the curvature, whether it is a sphere, flat or a saddle. We may as well assume it to be
flat.

Flatness problem
The universe is nearly flat in the sense that $\Omega_K$ is small. More precisely, (11.25) can be
interpreted as three contributions to the KE (the LHS), of which the curvature term $\Omega_K$ is by
far the smallest. As a ratio,

But we can make a similar analysis at an earlier time $\tau$, in fact the time $\tau_*$ when these
parameters were "set". Then the appropriate equation to look at is (11.26) and the relevant
ratio is

where $R = R(\tau_*) \ll 1$. This ratio is many orders of magnitude smaller still, and we would have
to explain this very very small number $\epsilon(\tau_*)$. This is called the flatness problem.

Inflation
Both the homogeneity problem and the flatness problem have to do with: Why is the size of the
universe so large (compared to its age)? The currently accepted solution is that the universe
at an early stage went through a short period of very rapid (in fact exponential) expansion,
called inflation. Quantum field theory suggests that inflation probably occurred from s to
s after the beginning of the universe, and during this time the linear size of the universe
increased by at least a factor ~ 1 (making 0 ~ E ( T~) -lo-'' immediately after infiation). Thus
inflation supplies the initial condition that a starts off very very large and $\Omega_K$ is very very small.
In fact, $\Omega_K$ is so small that we shall take it to be exactly zero, and the size of the universe to
be effectively infinite.
The physics of inflation cannot be discussed beyond this superficial level without quantum
field theory.

Corroboration
It may come as a surprise that the universe is (very nearly) flat. In fact, it may be a bit of a
disappointment - perhaps we do not need to learn so much Riemannian geometry if space is
flat! (But spacetime is not flat.) In any event, is there other evidence that $\Omega_K \approx 0$?
Here we come back to CMB and look at the tiny temperature fluctuations $\Delta T/T$, in
particular the angular distribution, which roughly speaking tells us how tightly correlated in space
the fluctuations are.
How did the fluctuations come about? For this, we have to trace back to the time when
the universe was much smaller ($R \ll 1$) and denser, so that matter and radiation were in
contact. The density fluctuations at that time can be calculated using our knowledge of the
properties of the matter at that time (a plasma); these lead to predicted fluctuations in CMB.
How these then propagate to us now can be traced by solving for the evolution of photons in
the universe with parameters ($\Omega_M$, $\Omega_\Lambda$); different parameter values lead to different patterns of
fluctuations seen today. Turning this around, the measured pattern of fluctuations can be used
to fit the parameter values. It gives $\Omega_K \approx 0$ with a high confidence [6, 7, 8, 9, 10].

Notice that CMB has contributed in three stages to cosmology, in an increasingly sophisticated
fashion.
• The very existence of CMB suggests a big bang, i.e., a time in the past when the universe
was small and hot.
• The extraordinary isotropy of CMB shows that the visible universe is a small part of the
whole universe, leading to the theory of inflation.
• The details of the tiny fluctuations of CMB provide information on the way the universe
is structured and the way it has expanded since the CMB was created.

Supernova observations
With $\Omega_K = 0$, we still need one more piece of data to fix the parameters - the deceleration
parameter q or something equivalent.
Recently, the red-shifts and distances of another group of "standard candles" - a class
of supernovas (SNs) - have been carefully measured [11, 12, 13]. SNs are extremely bright objects,
so even those at very large distances s can be observed (if one can catch them during their brief
existence - and this is one of the technical breakthroughs). Light from very distant objects
came from a long time ago, so these observations probe the ancient history of the universe. As
the authors of these papers say, the results "provide a record of changes in the expansion rate
over the past several billion years".

[6] Physics Today, July 2000
[7] P. de Bernardis et al., Nature 404, 955 (2000)
[8] http://xxx.lanl.gov/astro-ph/0005004
[9] http://xxx.lanl.gov/astro-ph/0005123
[10] http://xxx.lanl.gov/astro-ph/0005124

It is conventional to express distance and age by the red-shift z, because this quantity
is directly measured. Most of these SNs are in the range z = 0.3 - 0.7, with one case as large
as z = 0.83. Recall that wavelengths scale as 1+z, together with the size of the universe. So the
light observed originated when the universe had a size relative to the present of $1/(1+z) \approx 0.5$
for the case of the largest z. Assuming nearly uniform expansion (which is correct as an order-of-magnitude
estimate), it also follows that the age of the universe was roughly half of the
present value. Thus the paper is entitled "Discovery of a Supernova Explosion at Half the Age
of the Universe and its Cosmological Implications".
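
The relation between red-shift and relative size is simple enough to tabulate directly. The short sketch below (function name ours) evaluates $R = 1/(1+z)$ for the red-shifts mentioned in the text.

```python
def relative_size(z):
    """Size of the universe, relative to now, when light of red-shift z was emitted."""
    return 1.0 / (1.0 + z)


if __name__ == "__main__":
    for z in (0.3, 0.5, 0.7, 0.83):
        print(z, relative_size(z))
```

For z = 0.83 this gives a relative size of about 0.55, consistent with the "half the age" estimate under roughly uniform expansion.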
With data extending much further on the Hubble diagram, this allows the curvature of
the diagram to be determined much more accurately. Remember that the curvature is related
to the deceleration: the more deceleration, the more the curve bends upwards.
The great surprise is that the SN data show that

    The expansion is accelerating: $q_0 < 0$.

This means there must be a repulsive force, i.e., $\Omega_\Lambda > 0$, and the fitted parameters (assuming
$\Omega_K = 0$) are given in (11.39).

We can give a little bit more detail, in a heuristic way. The SN data are centered at
red-shifts $z_s \approx 0.5$. Since we compare the slopes at $z \approx z_s$ and $z = 0$, in effect we determine
the acceleration at the midpoint $z \approx 0.25$. Recall that wavelengths scale by $1+z$, together with
the size of the universe. So the acceleration $-q$ thus determined does not refer to the present,
but to a time when the size of the universe (relative to now) is $R \approx 1/1.25$. In other
words we determine the LHS of (11.24) at this value of $R$, yielding the combination

The coefficient of $\Omega_M$ turns out to be $-1$ by accident, on account of the red-shift at which the data are centered.
The SN results give

whereas $\Omega_K = 0$ gives

11 S. Perlmutter et al., Nature 391, 51 (1998)
12 http://xxx.lanl.gov/astro-ph/9712212
13 http://www.lbl.gov/supernova
and these together give the results in (11.39).
Astronomers now regard the SN results as quite reliable, because of the cross-checks
possible. We mention only one example. Normal stars hardly change over times of 0.1 or
even 1 Gyr. In contrast, SNs have very short time scales - their luminosities change in days.
Therefore the brightness curve over time reveals the dynamics of the explosion, giving a very
good cross-check on the nature and intrinsic brightness of the SN. Moreover, the rise and decay
of the brightness curve is time-dilated by the same red-shift factor $1+z$, so matching this curve
to the "standard" one expected gives another way of determining z.
Although these numbers are basically right, they will no doubt be refined and updated
in the years ahead, and the latest results can be obtained from several sources.14,13
The expansion is not always accelerating. Refer to (11.24); the expansion is accelerating
only for

and before the universe got to this size, there was deceleration.
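The two numerical statements in this subsection (the coefficient of about -1 for the matter term, and the size at which acceleration sets in) can be illustrated with a short calculation. This is only a sketch under a stated assumption: it takes the acceleration equation in the standard dimensionless form (1/R) d^2R/dtau^2 = -Omega_M/(2 R^3) + Omega_Lambda, which may differ in detail from (11.24) as written in these notes. The function names are introduced here for illustration.

```python
def accel_over_R(R, omega_m, omega_l):
    """ASSUMED form of (1/R) d^2R/dtau^2 for a matter + Lambda universe."""
    return -omega_m / (2.0 * R**3) + omega_l

def turnover_size(omega_m, omega_l):
    """Size R at which the assumed acceleration changes sign."""
    return (omega_m / (2.0 * omega_l)) ** (1.0 / 3.0)

R_mid = 1.0 / 1.25                    # size at the midpoint red-shift z = 0.25
coeff_m = -1.0 / (2.0 * R_mid**3)     # coefficient multiplying Omega_M at R_mid
R_turn = turnover_size(0.3, 0.7)      # parameter values quoted in the text

print(f"coefficient of Omega_M at R = {R_mid:.2f}: {coeff_m:.3f}")
print(f"acceleration begins at R = {R_turn:.3f}, i.e. z = {1 / R_turn - 1:.2f}")
```

Under this assumption the matter coefficient at R = 0.8 is within a few percent of -1, and the change from deceleration to acceleration happens when the universe was roughly 0.6 of its present size.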

Why these values?


Although we now have a good idea of what the values of $\Omega_M$ and $\Omega_\Lambda$ are, we do not know why.
Can we predict or understand them in some way?
It may seem "natural" that $\Omega_M$ and $\Omega_\Lambda$ have the same order of magnitude, i.e.,

In fact, this argument is misleading. We defined these parameters for convenience at the present time, i.e.,
to make (11.25) look simple. There is nothing special about now, so the proper question to ask
is the relative magnitude of the analogous terms in (11.26) at the very early time $\tau_*$ when the
initial conditions were set, when $R$ was very very small. So we have to understand

This is very very small and is very much of a mystery at the moment.

Detailed kinematics and age of universe


If we take the parameters in (11.39), then (11.21) or (11.22) completely determines the evolution
of the universe, including the age of the universe. With the values $\Omega_M = 0.3$, $\Omega_\Lambda = 0.7$, it is
found that the age of the universe is almost the same as $T_0$, so

Age of universe is $t_0 \approx 14$ Gyr

However, the ratio $t_0/T_0$ is not necessarily unity.

Problem 15
(a) Take (11.21) with the "initial" conditions $R(\tau=0) = 1$, $dR(\tau=0)/d\tau = 1$ and integrate the
equation both forward and backward numerically. Plot $R$ versus $\tau$.
(b) The size $R$ becomes zero at some negative time $-\tau_0$. Give an accurate value for $\tau_0$. Give
the age of the universe $t_0 = H_0^{-1} \tau_0$ in Gyr.
Do this problem for three sets of parameters $(\Omega_M, \Omega_\Lambda)$: (i) (0.0, 1.0), (ii) (0.3, 0.7) and (iii)
(1.0, 0.0).

Problem 16
Another way of determining the age of the universe is to use (11.22). Show that it can be
written in the form

Carry out the integral numerically and hence find $\tau_0$ as well as $t_0$ in Gyr. (The integral is not
singular at the lower limit, but a bit of care is needed for accurate answers.) Again do it for
the three sets of parameter values.
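One possible way to organize the numerical integral is sketched below. It is not the notes' own solution: it assumes the flat-universe form dR/dtau = sqrt(Omega_M/R + Omega_Lambda R^2), so that tau_0 is the integral of dR over this quantity from 0 to 1, and it assumes a Hubble time of 14 Gyr. The substitution R = u^2 is one way of taking the "bit of care" needed at the lower limit; the names `tau0` and `HUBBLE_TIME_GYR` are introduced here.

```python
import math

def tau0(omega_m, omega_l, n=2000):
    """Dimensionless age: integral of dR / sqrt(omega_m/R + omega_l*R^2) from 0 to 1.

    The substitution R = u^2 turns the integrand into
    2 u^2 / sqrt(omega_m + omega_l u^6), which is smooth at u = 0
    (this requires omega_m > 0). Simpson's rule with n (even) intervals.
    """
    def f(u):
        return 2.0 * u * u / math.sqrt(omega_m + omega_l * u**6)
    h = 1.0 / n
    total = f(0.0) + f(1.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3.0

HUBBLE_TIME_GYR = 14.0  # ASSUMED value of 1/H0, roughly H0 = 70 km/s/Mpc

for om, ol in [(0.3, 0.7), (1.0, 0.0)]:
    t = tau0(om, ol)
    print(f"Omega_M = {om}, Omega_Lambda = {ol}: tau0 = {t:.4f}, t0 = {t * HUBBLE_TIME_GYR:.1f} Gyr")
```

Parameter set (i) is deliberately left out of the loop: with $\Omega_M = 0$ the integrand behaves like $1/R$ near $R = 0$, so under the assumed form the integral does not converge and that case needs separate thought.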

Summary
We summarize how our understanding of cosmology has evolved since Einstein formulated
the general theory of relativity. For simplicity we do not show uncertainties in the currently
accepted values.

1960s I Present

-.

I velocity
&I acceleration -q

10-20 Gyr
unimportant
14 Gyr
t"__size 10-20 Glyr

11.8 Other issues and further reading


In this short introductory account we have omitted many topics - both aspects that are now
regarded as known (but complicated and involving knowledge of other branches of physics) as
well as problems that are still open. Also we have glossed over the observational techniques
that have led to the determination of the cosmological parameters. To fill in some of these
gaps, we give a very brief sketch of the current belief in cosmology, in chronological order.

Quantum gravity era
At a time of s and before (see Problem 17 below), quantum gravity must have been
important.
What is quantum gravity? In classical mechanics, the position x has a definite value;
in quantum mechanics, we talk about the probability amplitude $\psi(x)$ instead. So similarly,
in classical general relativity, the size of the universe $a$ (more generally the metric $g_{\mu\nu}$) has
a definite value; in quantum gravity, we talk about the probability amplitude $\psi(a)$ (or more
generally $\psi(g_{\mu\nu})$) instead. No proper theory of quantum gravity exists at the moment - there
are too many degrees of freedom and summing over them (as you would have to do in quantum
perturbation theory) gives nonsense. Some attempts have been made to get an approximation
by throwing away most of the degrees of freedom, and keeping only one or two (e.g., only a). The
most famous of these speculations is due to Hawking and others16 and made popular in his book
A Brief History of Time. However, this book is not recommended, as it gives a totally wrong
impression about how scientists actually approach these problems. Many alternate proposals
have been given, including one by Suen and Young.17 However, none of these are likely to be
correct, because there are too many unknowns before we get to such early times and small sizes.
Works such as these are just attempts to push our current theories as far as possible, as a way
of exploration.

Problem 17
The typical length scale $l_P$ where gravity becomes essentially quantum is called the Planck
length, and the corresponding time is the Planck time $t_P = l_P/c$. We expect $l_P$ to depend only
on $G$, $\hbar$ and $c$: $l_P = G^\alpha \hbar^\beta c^\gamma$. Find expressions for $l_P$ and $t_P$ and evaluate their values in MKS
units.

Inflation era
At s to S, inflation occurred. For this era, we need quantum field theory but not
quantum gravity, i.e., spacetime can be regarded as classical, but all other particle degrees of
freedom have to be treated like quantum fields. This is an exciting testing ground where high
energy physics and cosmology intersect, but much remains open.

Post-inflation era
After inflation, the development is basically known, even though details are still being fine-
tuned. This era began at very high temperatures ($\sim 10^{11}$ K say) and densities. Everything was
in thermal equilibrium, for which all quantities can be calculated in terms of the temperature.
As the universe expands and cools, protons, neutrons and eventually nuclei are formed. This
16 See, e.g., J. B. Hartle and S. W. Hawking, Phys. Rev. D 28, 2960 (1983)
17 W. M. Suen and K. Young, Phys. Rev. D 39, 2201 (1989)
takes place up to a time of 200 s. A very good account is given by Weinberg.18 This account
of the era after inflation remains basically correct even though it was first written in 1977.
Thereafter, the evolution is well described by our equations (11.24) and (11.25), except
that up to $\sim 10^{11}$ s, radiation dominated. At about $10^{15}$ s, galaxies formed. The present time
is about $10^{18}$ s or $\sim 10$ Gyr. See for example Figure 28.1 in Misner, Thorne and Wheeler.19

Further reading
For further reading, especially of recent developments, a good text is Bergström and Goobar.20
For historical interest in original works, there is a nice collection of reprints.21
Developments in the last few years are best found on the web, using a search engine to
look for key words such as "cosmological constant". The resources include the following types:
• Electronic versions of original papers and preprints, for example at xxx.lanl.gov/astro-ph.
Most of these are probably too difficult for students in this course.
• Electronic versions of semi-popular journals such as Nature, Science, Scientific American,
Physics Today.
• There are also web pages of astronomy and cosmology courses in various universities
around the world.
Some useful sites are given below. Nature maintains a "Science Updates"22, and there is
a similar one at Physics Web23 and at Lawrence Berkeley Lab24. More general sources include
Physics Today25 and Science News26.
Surveys and news can be found at27. There are pedagogical introductions to the cosmological
constant28,29,30,31, the SN work32, and the flatness problem33.
Many of these are cross-linked as well. With knowledge developed in this Chapter,
students can begin to access much of this exciting new development.

Ch11-3.tex; January 8, 2002


18 S. Weinberg, The First Three Minutes: A Modern View of the Origin of the Universe, Basic Books (1993)
19 C. W. Misner, K. S. Thorne and J. A. Wheeler, Gravitation, Freeman (1973)
20 L. Bergström and A. Goobar, Cosmology and Particle Astrophysics, Wiley (1999)
21 E. W. Kolb and M. S. Turner, The Early Universe: Reprints, Addison-Wesley (1988)
22 http://www.nature.com/nsu
23 http://physicsweb.org
24 http://www.lbl.gov/Science-Articles
25 http://www.aip.org/pt
26 http://www.sciencenews.org
27 http://www.cosmologymodels.com
28 http://xxx.lanl.gov/abs/astro-ph/9807128
29 http://super.colorado.edu/~michaele/lambda.html
30 http://pancake.uchicago.edu/~carroll/encyc
31 http://astron.berkeley.edu/~jcohn/chaut/references.html
32 http://www.lbl.gov/supernova
33 http://archive.ncsa.uiuc.edu/Cyberia/Cosmos/FlatnessProblem.html
12 Mathematics of Curved Space. II: Vectors
12.1 Displacement vector and tangent plane
The object of this Chapter is to define and study vectors on an N-dimensional manifold M,
for example the surface of a sphere of radius a: $S^2(a)$.
Recall the familiar case of Euclidean space. We start with the definition of a primary
vector, namely the displacement; then all other vectors (velocity, momentum, force, electric
field, etc.) follow from it, and have the same transformation properties. We did the same
thing earlier for Minkowski space in special relativity. So as before, we have to start with
displacements.

"Usual" definition
On flat space, a displacement from A to B is represented by the straight line joining the two
points; it is a straight arrow - the most important property of a vector (Figure 1a). But on a
curved manifold, the displacement from A to B is not a straight arrow (Figure 1b); it cannot
be called a vector.

Proper definition
However, each small part of M is nearly flat, and infinitesimal displacements $\overrightarrow{dx}$ are vectors.
They can be represented by straight arrows (Figure 2a).

Figure 2
Let the coordinates of the neighboring points be

Then the vector $\overrightarrow{dx}$ has components

The following points should be noted.


• $(x^1, \ldots, x^N)$ is not a vector.
• $(dx^1, \ldots, dx^N)$ is a vector. (That is why we write the arrow over the entire combination,
as in $\overrightarrow{dx}$, rather than as $d\vec{x}$.)
• The different components may have different units. For example, if we use polar coordinates
$x^1 = \theta$, $x^2 = \phi$, $x^3 = R$ (see, e.g., (10.16)), then $[dx^1]$ = dimensionless, $[dx^3]$ =
length.

Tangent plane
In Euclidean space, say E3, the points belong to E3, and we also think of the vectors as
belonging to E3. But for a curved manifold M, we should not think of vectors as belonging to
M. We should think of the displacement $\overrightarrow{dx}$ as belonging to the tangent plane at x, denoted
as $T_P(x)$ (Figure 2b).
At a given point x, $T_P(x)$ is a flat space, like Euclidean or Minkowski space, so all vector
operations can be defined in the usual way. (We shall come to a few of these below.)
But in general $T_P(x) \neq T_P(y)$.
So vectors in different tangent planes cannot be added or subtracted naively. This will
be the focus of the next Chapter.

Example of a vector
Consider M = $S^2(a)$ (see, e.g., (10.20)), and

In words:
• Start with the point $(\theta, \phi)$.
• Move an infinitesimal distance $(d\theta, d\phi)$.
• This displacement is the vector.

Thus the vector is specified not just by $(d\theta, d\phi)$, but also by $(\theta, \phi)$.
In physics, we need not distinguish between infinitesimal displacements and sufficiently
small finite displacements. For example, we can think of the displacement from Hong Kong to
Macau as a vector on the surface of the globe.
12.2 Embedding in flat space
An expression such as (12.1) becomes easier to visualize if M is embedded in a higher-
dimensional flat space, say of M dimensions, M > N.

Step 1: Cartesian coordinates and unit vectors


Let this space be described by the cartesian coordinates

and corresponding unit vectors


$\hat{e}_1, \ldots, \hat{e}_M$, e.g., $\hat{i}$, $\hat{j}$, $\hat{k}$
The unit vectors satisfy

Step 2: Change to generalized coordinates


Now change to a new set of generalized coordinates

and reduce to the manifold M by setting the "extra" coordinates to constants (e.g., see (10.26)
or (10.36))

$x^\mu = c^\mu$ = constant (for $\mu > N$)
$dx^\mu = 0$

In other words, any point

$x = (x^1, \ldots, x^N)$

on M is now thought of as the corresponding point

$x = (x^1, \ldots, x^N, c^{N+1}, \ldots, c^M)$


in the larger space.

Step 3: Vectors
In the larger space, x itself is a vector, and we can write it as $\vec{x}$:

(All sums over i are from 1 to M.) Hence a small displacement is


The unit vectors $\hat{e}_i$ are constant, so they are not differentiated. Now change to the generalized
coordinates (see (10.33)):

The index $\mu$ is understood to be summed. In general, the sum is over the dimension of the
larger space, i.e., $\mu = 1, \ldots, M$; however, if the displacement stays on the manifold M, $dx^\mu = 0$
for $\mu > N$, so from now on, the sum over $\mu$ is understood to be from 1 to N only.
As before, define (see (10.34))

so we have

$\overrightarrow{dx} = dx^\mu \left( \sum_i A^i{}_\mu \, \hat{e}_i \right)$

Example B3 (Compare each step with the derivation of (12.5))


In this example, we write out a general vector on a plane using polar coordinates, in terms of
the underlying Cartesian basis vectors.

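$$
\begin{aligned}
\vec{x} &= x\,\hat{i} + y\,\hat{j} \\
\overrightarrow{dx} &= dx\,\hat{i} + dy\,\hat{j} \\
&= d(r\cos\phi)\,\hat{i} + d(r\sin\phi)\,\hat{j} \\
&= (dr\cos\phi - r\sin\phi\,d\phi)\,\hat{i} + (dr\sin\phi + r\cos\phi\,d\phi)\,\hat{j} \\
&= dr\,(\cos\phi\,\hat{i} + \sin\phi\,\hat{j}) + d\phi\,(-r\sin\phi\,\hat{i} + r\cos\phi\,\hat{j})
\end{aligned}
$$

The vector $\overrightarrow{dx} = (dr, d\phi)$ in 2-d polar coordinates means precisely this.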

Example C3 (Compare each step with the derivation of (12.5))


In this example, we write out a general vector on the surface of a sphere of radius a, in terms
of the underlying Cartesian basis vectors.

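$$
\begin{aligned}
\vec{x} &= x\,\hat{i} + y\,\hat{j} + z\,\hat{k} \qquad (12.8) \\
\overrightarrow{dx} &= dx\,\hat{i} + dy\,\hat{j} + dz\,\hat{k} \\
&= d(a\sin\theta\cos\phi)\,\hat{i} + d(a\sin\theta\sin\phi)\,\hat{j} + d(a\cos\theta)\,\hat{k} \\
&= (a\cos\theta\cos\phi\,d\theta - a\sin\theta\sin\phi\,d\phi)\,\hat{i} \\
&\quad + (a\cos\theta\sin\phi\,d\theta + a\sin\theta\cos\phi\,d\phi)\,\hat{j} \\
&\quad + (-a\sin\theta\,d\theta)\,\hat{k} \\
&= d\theta\,(a\cos\theta\cos\phi\,\hat{i} + a\cos\theta\sin\phi\,\hat{j} - a\sin\theta\,\hat{k}) \\
&\quad + d\phi\,(-a\sin\theta\sin\phi\,\hat{i} + a\sin\theta\cos\phi\,\hat{j})
\end{aligned}
$$

The vector $\overrightarrow{dx} = (d\theta, d\phi)$ means precisely this.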
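The coefficients of the two coordinate differentials in Example C3 can be checked numerically by differentiating the embedding directly. The following is a minimal sketch (the function names `embed`, `basis_numeric` and `basis_analytic` are introduced here, and the sample point and radius are arbitrary): it compares central-difference derivatives of the Cartesian position with the analytic coefficients read off from the last step of the example.

```python
import math

def embed(theta, phi, a=1.0):
    """Cartesian position of the point (theta, phi) on a sphere of radius a."""
    return (a * math.sin(theta) * math.cos(phi),
            a * math.sin(theta) * math.sin(phi),
            a * math.cos(theta))

def basis_numeric(theta, phi, a=1.0, h=1e-6):
    """Central-difference estimates of d(x)/d(theta) and d(x)/d(phi)."""
    e_theta = tuple((p - m) / (2 * h) for p, m in
                    zip(embed(theta + h, phi, a), embed(theta - h, phi, a)))
    e_phi = tuple((p - m) / (2 * h) for p, m in
                  zip(embed(theta, phi + h, a), embed(theta, phi - h, a)))
    return e_theta, e_phi

def basis_analytic(theta, phi, a=1.0):
    """Coefficients of d(theta) and d(phi) read off from Example C3."""
    e_theta = (a * math.cos(theta) * math.cos(phi),
               a * math.cos(theta) * math.sin(phi),
               -a * math.sin(theta))
    e_phi = (-a * math.sin(theta) * math.sin(phi),
             a * math.sin(theta) * math.cos(phi),
             0.0)
    return e_theta, e_phi

num = basis_numeric(0.7, 1.2, a=2.0)
ana = basis_analytic(0.7, 1.2, a=2.0)
max_err = max(abs(n - m) for vn, va in zip(num, ana) for n, m in zip(vn, va))
print(f"largest difference between numeric and analytic components: {max_err:.2e}")
```

The difference should be at the level of the finite-difference error, which supports reading the bracketed combinations as derivatives of the position with respect to each coordinate.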

12.3 Basis vectors

Example B4
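We continue with Example B3. Define the basis vectors $\vec{e}_r$ and $\vec{e}_\phi$ respectively to be the
coefficients of $dr$ and $d\phi$ in $\overrightarrow{dx}$. Thus

$\vec{e}_r = \cos\phi\,\hat{i} + \sin\phi\,\hat{j}$, $\quad \vec{e}_\phi = -r\sin\phi\,\hat{i} + r\cos\phi\,\hat{j}$

Example C4
We continue with Example C3. Define the basis vectors $\vec{e}_\theta$ and $\vec{e}_\phi$ respectively to be the
coefficients of $d\theta$ and $d\phi$ in $\overrightarrow{dx}$. Thus

$\vec{e}_\theta = a\cos\theta\cos\phi\,\hat{i} + a\cos\theta\sin\phi\,\hat{j} - a\sin\theta\,\hat{k}$, $\quad \vec{e}_\phi = -a\sin\theta\sin\phi\,\hat{i} + a\sin\theta\cos\phi\,\hat{j}$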

General embedding definition


Refer to (12.5). The general definition of the basis vectors in terms of the embedding space is

General intrinsic definition


It is also useful to have a definition of the basis vectors without reference to the embedding.
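We have said earlier that $\vec{e}_\mu$ is just the coefficient of $dx^\mu$ in the vector $\overrightarrow{dx}$. So

$\overrightarrow{dx} = dx^\mu\,\vec{e}_\mu$

From this we obtain

$\vec{e}_\mu = \dfrac{\partial \vec{x}}{\partial x^\mu}$ (12.14)

The point x is a vector in the embedding space, but is only a point P, not a vector, on the
manifold M. Thus, we also write

$\vec{e}_\mu = \dfrac{\partial P}{\partial x^\mu}$ (12.15)

Since P is just an arbitrary point, mathematicians sometimes write simply

$\vec{e}_\mu = \dfrac{\partial}{\partial x^\mu}$

Regard this as just a shorthand for (12.15).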


Compare with the analogous equations (5.52) and (5.53) for Minkowski space.
In words, the above equations mean the following.
• Choose $\mu$ (e.g., $\mu = 1$).
• Set all coordinates other than $x^\mu$ (e.g., $x^2, \ldots, x^N$) to be constants - that is the meaning
of the partial derivative.
• The one remaining degree of freedom describes a line, along which only $x^\mu$ changes. Move
a little bit $\Delta x^\mu$ along this line.
• The result is a short vector $\overrightarrow{\Delta x}$.
• Then $\vec{e}_\mu = \overrightarrow{\Delta x} / \Delta x^\mu$ (in the limit $\Delta x^\mu \to 0$).

$\vec{e}_\mu$ is a vector in the direction of increasing $x^\mu$, keeping all other
components fixed. Its length is the displacement per unit change
in $x^\mu$.

Note that all these operations can be performed on M; so this is an intrinsic definition.

Length of basis vectors


Consider Example C4 and (12.11).
Problem 1
Refer to Example B4. Are the basis vectors orthogonal? What are their lengths?
In general, we see that

The natural basis defined above is not an orthonormal basis.

What is the dot product between two basis vectors?

Embedding derivation

Intrinsic derivation
We can derive the same result even more directly by using the intrinsic point of view. First,
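we note that the distance between neighbouring points is

$$ ds^2 = \overrightarrow{dx} \cdot \overrightarrow{dx} = (dx^\mu \vec{e}_\mu) \cdot (dx^\nu \vec{e}_\nu) = (\vec{e}_\mu \cdot \vec{e}_\nu)\, dx^\mu dx^\nu $$

But we also know that this should equal

$$ ds^2 = g_{\mu\nu}\, dx^\mu dx^\nu $$

Comparing these two expressions, we see that

$$ \vec{e}_\mu \cdot \vec{e}_\nu = g_{\mu\nu} $$

This implies that the lengths of the basis vectors are

$$ |\vec{e}_\mu| = \sqrt{|g_{\mu\mu}|} \quad \text{(no sum over } \mu\text{)} $$

The absolute value sign is inserted to deal with time-like vectors. For example, in special
relativity, $g_{00} = \eta_{00} = -1$, but we still say that the time-like basis vector has a length 1.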
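The relation between dot products of basis vectors and the metric can be illustrated on the sphere of Example C4. The sketch below is an illustration only (function names and the sample point are introduced here): it builds the two basis vectors in Cartesian components, forms all dot products, and prints the resulting 2x2 array together with the lengths of the basis vectors.

```python
import math

def basis_vectors(theta, phi, a):
    """Natural basis vectors on a sphere of radius a, in Cartesian components."""
    e_theta = (a * math.cos(theta) * math.cos(phi),
               a * math.cos(theta) * math.sin(phi),
               -a * math.sin(theta))
    e_phi = (-a * math.sin(theta) * math.sin(phi),
             a * math.sin(theta) * math.cos(phi),
             0.0)
    return e_theta, e_phi

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def metric_from_basis(theta, phi, a):
    """g_{mu nu} = e_mu . e_nu as a 2x2 nested list."""
    basis = basis_vectors(theta, phi, a)
    return [[dot(em, en) for en in basis] for em in basis]

a, theta, phi = 2.0, 0.7, 1.2
g = metric_from_basis(theta, phi, a)
lengths = [math.sqrt(abs(g[i][i])) for i in range(2)]
print("g =", [[round(x, 6) for x in row] for row in g])
print("lengths of e_theta, e_phi:", [round(l, 6) for l in lengths])
```

The diagonal entries should reproduce $a^2$ and $a^2\sin^2\theta$, and the off-diagonal entries should vanish, so the natural basis on the sphere is orthogonal but not normalized.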

Problem 2
Continue with Problem 1 and check that the dot products agree with (12.17).

12.4 Velocity and momentum


So far we have concentrated on the primary vector - the infinitesimal displacement $\overrightarrow{dx}$. If we
consider a curved space or curved spatial coordinates (not spacetime), then time is an invariant;
thus the velocity and momentum

$\vec{v} = \dfrac{\overrightarrow{dx}}{dt}, \qquad \vec{p} = m\vec{v}$

are also vectors. (If we are in a curved spacetime, then all $dt$ should be replaced by $d\tau$; however,
here we illustrate with more familiar examples in curved space only.)
Let us see what this means in terms of the example of polar coordinates on a plane.

Example B5
For polar coordinates on a plane

Thus we identify the components of the velocity as

where $v^\phi$ is recognized to be just the angular velocity $\omega$.


Likewise the momentum components are

It is also interesting to calculate the components with lower indices. Because the metric is
diagonal, the calculation is simple.

Problem 3
Show that $p_\phi$ is just the angular momentum J.
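A numerical illustration of the difference between upper and lower components may help here. The sketch below is not a proof and is only one possible set-up: it follows a free particle moving on a straight line in the plane (mass, initial position and velocity are arbitrary values chosen here), converts to polar components, and lowers the index with $g_{rr} = 1$, $g_{\phi\phi} = r^2$.

```python
import math

def polar_momenta(m, x, y, vx, vy):
    """Return ((p^r, p^phi), (p_r, p_phi)) for a particle of mass m."""
    r = math.hypot(x, y)
    r_dot = (x * vx + y * vy) / r          # v^r = dr/dt
    phi_dot = (x * vy - y * vx) / r**2     # v^phi = dphi/dt = omega
    p_up = (m * r_dot, m * phi_dot)
    # Lower the index with the diagonal metric g_rr = 1, g_phiphi = r^2.
    p_down = (p_up[0], r**2 * p_up[1])
    return p_up, p_down

m, x0, y0, vx, vy = 2.0, 1.0, 0.5, -0.3, 0.8   # free particle, arbitrary units
history = []
for t in (0.0, 1.0, 2.0):
    x, y = x0 + vx * t, y0 + vy * t
    p_up, p_down = polar_momenta(m, x, y, vx, vy)
    history.append((p_up[1], p_down[1]))
    print(f"t = {t}: p^phi = {p_up[1]:.4f}, p_phi = {p_down[1]:.4f}")
```

Along the trajectory the upper component changes with time, while the lower component stays fixed at the Cartesian value $m(x v_y - y v_x)$.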

12.5 Transformation of vectors


Generalized coordinate transformations
Previously, under rotations and Lorentz transformations, we considered only linear transformations
of the coordinates, e.g.,

$x'^\mu = a^\mu{}_\nu\, x^\nu$ (12.25)
where $a^\mu{}_\nu$ are constants. On a general manifold M, there is no special set of cartesian coordinates,
so we must consider completely general coordinate transformations:

$x'^\mu = x'^\mu(x^1, \ldots, x^N)$ (12.26)

The simplest example is the transformation $(x, y, z) \to (\theta, \phi, R)$

which is of course nonlinear.


Another example is on the curved surface $S^2(a)$, and the relation between Example
C2 and Example D2 (see Chapter 10). There the transformation $(\theta, \phi) \to (r, \phi)$ is given by
$r = a \sin\theta$, which is again nonlinear.
Vector transformation
Under the general transformation (12.26), where $x^\mu \to x'^\mu$, the differentials transform as

$dx'^\mu = \dfrac{\partial x'^\mu}{\partial x^\nu}\, dx^\nu \equiv a^\mu{}_\nu\, dx^\nu$

• The infinitesimal displacements transform linearly, just like (12.25), even though the
coordinate transformations need not be linear.
• On a given tangent plane $T_P(x)$, $a^\mu{}_\nu$ is a constant matrix.
• On different tangent planes $a^\mu{}_\nu$ can be different.

General definition of a vector


Consider N quantities $v^\mu$ ($\mu = 1, \ldots, N$), defined at the point $x \in M$. If, under a coordinate
transformation

$v'^\mu = a^\mu{}_\nu\, v^\nu$

then $v^\mu$ forms a vector - more precisely a $\binom{1}{0}$ vector.

Vectors with lower indices


Recall from (5.39) that vectors with lower indices are to be transformed by $[b] = [a]^{-1}$ on the
right. So consider N quantities $v_\mu$ ($\mu = 1, \ldots, N$), defined at the point $x \in M$. If, under a
coordinate transformation they transform as

$v'_\mu = v_\nu\, b^\nu{}_\mu$

then $v_\mu$ forms a vector - more precisely a $\binom{0}{1}$ vector.


To summarize

• An upper index is transformed by $[a]$ on the left.


• A lower index is transformed by $[b] = [a]^{-1}$ on the right.

This definition ensures that the contraction of an upper index with a lower index gives an
invariant. The proof is analogous to the case of the linear transformations in special relativity.
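The invariance of a contraction can be illustrated with the nonlinear change from Cartesian to polar coordinates in the plane. This is a sketch only (the sample point and the component values are arbitrary, and the function names are introduced here): the matrix $[a]$ is the Jacobian of $(r, \phi)$ with respect to $(x, y)$ and $[b]$ is its inverse, the Jacobian of $(x, y)$ with respect to $(r, \phi)$.

```python
import math

def jacobians(x, y):
    """a = d(r, phi)/d(x, y) and its inverse b = d(x, y)/d(r, phi) at the point (x, y)."""
    r = math.hypot(x, y)
    phi = math.atan2(y, x)
    a = [[x / r, y / r],
         [-y / r**2, x / r**2]]
    b = [[math.cos(phi), -r * math.sin(phi)],
         [math.sin(phi), r * math.cos(phi)]]
    return a, b

def transform_upper(a, v):
    """v'^mu = a^mu_nu v^nu  (matrix [a] acting on the left)."""
    return [sum(a[mu][nu] * v[nu] for nu in range(2)) for mu in range(2)]

def transform_lower(b, w):
    """w'_mu = w_nu b^nu_mu  (matrix [b] acting on the right)."""
    return [sum(w[nu] * b[nu][mu] for nu in range(2)) for mu in range(2)]

a, b = jacobians(1.0, 2.0)
v_up = [0.3, -0.7]      # components v^mu in Cartesian coordinates
w_down = [1.5, 0.4]     # components w_mu in Cartesian coordinates

before = sum(v * w for v, w in zip(v_up, w_down))
after = sum(v * w for v, w in zip(transform_upper(a, v_up), transform_lower(b, w_down)))
print(f"contraction before: {before:.12f}, after: {after:.12f}")
```

The individual components change under the transformation, but the contracted sum agrees to rounding error because $[b][a]$ is the unit matrix at each point.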
Higher rank tensors
Higher rank tensors are objects that transform in the following way, e.g.,
$t'^{\mu\nu} = a^\mu{}_\rho\, a^\nu{}_\sigma\, t^{\rho\sigma}$
$t'_{\mu\nu} = t_{\rho\sigma}\, b^\rho{}_\mu\, b^\sigma{}_\nu$ (12.31)

Comparison with Lorentz transformations


Even though the coordinate transformation is general, the vector transformation is linear, with
the matrix [a] and the inverse matrix [b]. Thus, essentially everything derived in Chapter 5
about the transformation of 4-vectors can be taken over, with two exceptions.
• The transformation matrix [L] for a Lorentz transformation is now replaced by a general
transformation matrix [a].
• $\eta_{\mu\nu}$ can no longer be used to raise and lower indices. (See below.)
In particular, the contraction theorem works as before.

Metric tensor

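Since $ds^2 = g_{\mu\nu}\, dx^\mu dx^\nu$ is an invariant, and $dx^\mu$ is a vector, $g_{\mu\nu}$ must be a $\binom{0}{2}$ tensor.
Hence it must transform as

$g'_{\mu\nu} = g_{\rho\sigma}\, b^\rho{}_\mu\, b^\sigma{}_\nu$

Lowering of indices
We define

$A_\mu = g_{\mu\nu} A^\nu$

and similarly for higher rank tensors. By the contraction theorem, $A_\mu$ so defined is a $\binom{0}{1}$
tensor.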

The displacement vector


Return to the displacement vector, and compare $dx^\mu$ and $dx_\mu$. We note that $dx^\mu$ has two
meanings:
• It is the $\mu$ component of $\overrightarrow{dx}$, i.e., $(\overrightarrow{dx})^\mu$.
• It is the change in the coordinate $x^\mu$, i.e., $d(x^\mu)$.
In contrast, $dx_\mu$ has only one meaning:
• It is the $\mu$ component of $\overrightarrow{dx}$, i.e., $(\overrightarrow{dx})_\mu$.

But it is not the change in a coordinate $x_\mu$; in general there is no such thing as a coordinate
$x_\mu$. Coordinates are not vectors, and indices cannot be lowered.
Problem 4
(a) Consider the differential $A\,dx^1 + B\,dx^2$ in terms of the coordinates $x^1, x^2$, where
$A = A(x^1, x^2)$, $B = B(x^1, x^2)$. If it is to be the change in a certain quantity F (i.e., an exact
differential), we would have

$dF = A\,dx^1 + B\,dx^2$ (12.34)

Show that a necessary condition for such a function F to exist is

$\dfrac{\partial A}{\partial x^2} = \dfrac{\partial B}{\partial x^1}$

(Hint: Compare mixed second partial derivatives.)
(b) Now consider Example B2, with $x^1 = r$ and $x^2 = \phi$. We have

$dx_2 = g_{22}\, dx^2 = r^2\, d\phi$

This is of the form (12.34), with $A = 0$ and $B = r^2$. Can we write $dx_2$ as the change in a
certain quantity F?

Raising of indices
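Since raising is the opposite of lowering, it must be done by the inverse of $g_{\mu\nu}$. Hence define

$[g^{\mu\nu}] = [g_{\mu\nu}]^{-1}$ (12.35)

or more explicitly

$g^{\mu\nu} g_{\nu\rho} = \delta^\mu{}_\rho$

The raising operation is then

$A^\mu = g^{\mu\nu} A_\nu$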
12.6 Gradient of a vector

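Let $\Phi(x)$ be a scalar field, and consider its change. By Taylor's theorem

$d\Phi(x) = \Phi_{,\mu}(x)\, dx^\mu$
where

$\Phi_{,\mu} \equiv \dfrac{\partial \Phi}{\partial x^\mu}$

Now $d\Phi$ is a scalar and $dx^\mu$ is a $\binom{1}{0}$ vector. So $\Phi_{,\mu}$ is a $\binom{0}{1}$ vector. It is the gradient of $\Phi$.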


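Example C4
Let $\Phi = \sin^2\theta \cos^2\phi$ on $S^2(a)$. Calculate its gradient.
Solution
$\Phi_{,\theta} = \dfrac{\partial\Phi}{\partial\theta} = 2 \sin\theta \cos\theta \cos^2\phi$
$\Phi_{,\phi} = \dfrac{\partial\Phi}{\partial\phi} = -2 \sin^2\theta \cos\phi \sin\phi$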
12.7 Local Cartesian system and physical components


In many situations we encounter metrics that are diagonal. In any event, by a coordinate
transformation, it is always possible to make the metric diagonal at one point $x_0$. Thus, in this
Section, we consider diagonal metrics. This means that the basis vectors are orthogonal (see
(12.26)); but they are not necessarily unit vectors. Most of the examples we have dealt with are
in this category.
The concepts are in fact very simple, and can be illustrated by the following Problem.

Figure 3

Problem 5
(a) On a flat 2-d plane, the basis vector $\vec{e}_1$ points east, and the basis vector $\vec{e}_2$ points north.
However, they are not unit vectors, but have lengths $l_1 = 5$, $l_2 = 3$ respectively (Figure 3). A
certain vector $\vec{v}$ is given by

Find $v^1, v^2$ as well as $v_1, v_2$.


(c) What would you say are the easterly and northerly components of $\vec{v}$?
Let us now discuss the situation generally. Suppose we have a diagonal metric (at one
point):

$ds^2 = g_{11} (dx^1)^2 + g_{22} (dx^2)^2 + \cdots$ (12.40)
This means that the lengths of the basis vectors are given by $l_\mu = |\vec{e}_\mu| = \sqrt{g_{\mu\mu}}$. We therefore
define unit basis vectors $\hat{e}_\mu$ by

$\hat{e}_\mu = \vec{e}_\mu / l_\mu$ (no sum)

Thus, for any vector $\vec{v}$,

$\vec{v} = v^\mu\, \vec{e}_\mu = \sum_\mu (l_\mu v^\mu)\, \hat{e}_\mu$

We therefore identify the coefficient of the unit basis vector as the physical component of the
vector $\vec{v}$:

(physical component)$^\mu$ $= l_\mu v^\mu = \sqrt{g_{\mu\mu}}\, v^\mu$ (no sum) (12.43)

Because these are physical components (e.g., the easterly and northerly components),
we have

Moreover, the physical components all carry proper units, e.g., for a velocity, the physical
components would have unit m s$^{-1}$ for every component.
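As an illustration of physical components and their units, consider a velocity on a sphere with a diagonal metric. The sketch below is a hypothetical example (the radius, colatitude and angular rates are values chosen here, and the function names are introduced for illustration): the coordinate components are angular rates in rad/s, while the physical components come out in m/s.

```python
import math

def physical_components(g_diag, v_up):
    """Physical components sqrt(g_mumu) * v^mu for a diagonal metric (no sum)."""
    return [math.sqrt(g) * v for g, v in zip(g_diag, v_up)]

def lower_components(g_diag, v_up):
    """v_mu = g_mumu * v^mu for a diagonal metric (no sum)."""
    return [g * v for g, v in zip(g_diag, v_up)]

# Hypothetical example: sphere of radius a (in metres), coordinates (theta, phi).
a, theta = 6.4e6, math.radians(60)
g_diag = [a**2, (a * math.sin(theta))**2]
v_up = [1e-6, 2e-6]                      # d(theta)/dt, d(phi)/dt in rad/s

v_phys = physical_components(g_diag, v_up)
v_down = lower_components(g_diag, v_up)
speed_sq_contraction = sum(u * d for u, d in zip(v_up, v_down))
speed_sq_physical = sum(p * p for p in v_phys)

print("physical components (m/s):", [round(p, 3) for p in v_phys])
print("speed^2 from v^mu v_mu:", speed_sq_contraction, " from physical components:", speed_sq_physical)
```

The squared speed computed from the contraction of upper and lower components agrees with the sum of squares of the physical components, as one expects for components along orthonormal directions.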

Example B6
For polar coordinates in 2 d

Problem 6
For polar coordinates in 2 d, express v i in terms of r and w. Also express #, p i and in
terms of r and w. Which of these is conserved?

Ch12-2.tex; January 22, 1998


13 Mathematics of Curved Space. III: Differentiation
From the last Chapter we see that differentiating a scalar is easy, and the resulting gradient is
a vector:

However, differentiating a vector is more complicated. We first describe, through a familiar


example, why this is the case; then we show how the differentiation should be done.

13.1 Example of differentiating a vector


Consider 2-d flat space, and the velocity vector $\vec{v}$. (We shall be using examples from 2-d and
3-d space to motivate the formalism for spacetime; therefore, we shall use a common notation,
e.g., $\vec{v}$ rather than $\mathbf{v}$, to denote vectors.)

In other words, we differentiate component by component. The reason is that

The basis vectors $\hat{i}$, $\hat{j}$ are constants

as illustrated in Figure 1a.

Figure 1
Example B7
Now go to polar coordinates:

where the basis vectors $\vec{e}_r$ and $\vec{e}_\phi$ introduced in Chapter 12 are not constant - see the two sets
in Figure 1b. Therefore the derivative is

The second line is new; this is the main feature in this Chapter.

In a general coordinate system, the basis vectors are
not constants, and must be differentiated as well.

By continuing with this example, we shall see that this extra contribution is actually
familiar. In this example, the basis vectors $\vec{e}_r$ and $\vec{e}_\phi$ are given by

so the spatial derivatives of the basis vector are

and the time derivatives are


Hence using the chain rule we have

We shall pay attention to the radial component, i.e., the coefficient of $\vec{e}_r$.

This example shows that for curved coordinates (in flat space or in curved space), when
we differentiate a vector
• we do not simply differentiate the components; but
• there are additional terms due to the change of the basis vectors.

These additional terms account for the difference between the following operations, which can
be seen from (13.11).
• First differentiate a vector and then take the $\mu$ component - like $(d\vec{v}/dt)^r$ on the LHS.
• First take the $\mu$ component and then differentiate - like $dv^r/dt$ on the RHS.
The acceleration is given by the first of these two operations acting on the velocity. The extra
term in (13.11) is $-r(v^\phi)^2 = -r\omega^2$: it is just the centripetal acceleration.
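The difference between the two operations can be seen numerically. The sketch below uses a hypothetical trajectory chosen here (radius growing linearly in time, constant angular velocity), so that $v^r$ is constant and $dv^r/dt = 0$; the radial component of the true acceleration is obtained by differentiating the Cartesian position twice and then projecting on $\vec{e}_r$.

```python
import math

R0, R_DOT, OMEGA = 1.0, 0.5, 0.8   # hypothetical trajectory: r = R0 + R_DOT*t, phi = OMEGA*t

def position(t):
    r, phi = R0 + R_DOT * t, OMEGA * t
    return (r * math.cos(phi), r * math.sin(phi))

def cartesian_acceleration(t, h=1e-4):
    """Second central difference of the Cartesian position."""
    xp, x0, xm = position(t + h), position(t), position(t - h)
    return tuple((p - 2 * c + m) / h**2 for p, c, m in zip(xp, x0, xm))

def radial_component(vec, phi):
    """Coefficient of e_r = (cos phi, sin phi); e_r is a unit vector orthogonal to e_phi."""
    return vec[0] * math.cos(phi) + vec[1] * math.sin(phi)

t = 1.0
r = R0 + R_DOT * t
a_r = radial_component(cartesian_acceleration(t), OMEGA * t)
dvr_dt = 0.0                         # v^r = R_DOT is constant along this trajectory
print(f"(dv/dt)^r = {a_r:.6f},  dv^r/dt - r*omega^2 = {dvr_dt - r * OMEGA**2:.6f}")
```

Even though the radial velocity component never changes, the radial component of the acceleration is nonzero and equals the centripetal term, up to the finite-difference error.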

Problem 1
Take a point P at rectangular coordinates (1,0) and another point Q at $(\cos\alpha, \sin\alpha)$, where
$0 < \alpha \ll 1$. At these two points, construct unit vectors pointing in the x direction.
(a) If we first subtract these two vectors and then take the r component, what is the result?
(b) If we first take the r component in each vector and then subtract, what is the result?
(Note: calculate to first order in $\alpha$ only.)

Problem 2
What can you say about the "extra" term in the coefficient of $\vec{e}_\phi$ in (13.10)?
13.2 General embedding definition
We see that all the complication comes from the changes in the basis vectors.

Algebraic definition
Go back to (13.7). It is clear that all the extra terms come from contributions such as

$\dfrac{\partial \vec{e}_r}{\partial \phi} = 0\,\vec{e}_r + \dfrac{1}{r}\,\vec{e}_\phi$ (13.12)

The coefficients in (13.12) tell us how the basis vectors are changing. These coefficients are
associated with three directions. Take the second coefficient $1/r$ in (13.12) as an example.
• Take the basis vector in the r direction (in general the $\mu$ direction).
• Differentiate it with respect to $\phi$ (in general the $\nu$ coordinate).
• The result is still a vector; look at its $\phi$ component (in general the $\rho$ component).

In general, the RHS will be a sum over $\rho$ (the two terms in (13.12)).
Thus we can define these coefficients by the general formula

$\dfrac{\partial \vec{e}_\mu}{\partial x^\nu} = \Gamma^\rho_{\mu\nu}\, \vec{e}_\rho$

It is also useful to write this relationship in the following form:

The coefficients $\Gamma^\rho_{\mu\nu}$ are called Christoffel symbols. A more modern mathematical term is the
connection 1-form; but we shall not use the latter terminology since we shall not go into the
theory of p-forms in this course. The Christoffel symbols tell you how the basis vectors are
changing; once they are known, you can differentiate vectors and tensors.
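The three-step recipe above (take a basis vector, differentiate it, read off a component) can be carried out numerically for the one case discussed in the text, the derivative of the radial basis vector with respect to the angle in plane polar coordinates. This is a sketch only; the helper names are introduced here, and the expansion step relies on the polar basis being orthogonal.

```python
import math

def e_r(r, phi):
    """Radial basis vector in Cartesian components."""
    return (math.cos(phi), math.sin(phi))

def e_phi(r, phi):
    """Angular basis vector in Cartesian components (length r, not a unit vector)."""
    return (-r * math.sin(phi), r * math.cos(phi))

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def d_dphi(basis_fn, r, phi, h=1e-6):
    """Central-difference derivative of a basis vector with respect to phi."""
    plus, minus = basis_fn(r, phi + h), basis_fn(r, phi - h)
    return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))

def components(vec, r, phi):
    """Expand vec = c_r e_r + c_phi e_phi, using the orthogonality of the polar basis."""
    er, ep = e_r(r, phi), e_phi(r, phi)
    return dot(vec, er) / dot(er, er), dot(vec, ep) / dot(ep, ep)

r, phi = 2.0, 0.6
c_r, c_phi = components(d_dphi(e_r, r, phi), r, phi)
print(f"d e_r / d phi = {c_r:.6f} e_r + {c_phi:.6f} e_phi   (1/r = {1 / r:.6f})")
```

The two printed coefficients are the two terms on the right-hand side of the relation quoted in the text: a vanishing radial coefficient and an angular coefficient equal to $1/r$.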

Warning

Although $\Gamma^\rho_{\mu\nu}$ carries 1 upper index and 2 lower indices, it is not a $\binom{1}{2}$ tensor. We shall not
discuss its transformation properties.

Problem 3 [Example B8]


Continue with the example of polar coordinates in 2 d, with $(x^1, x^2) = (r, \phi)$. From (13.7), read
off all the components of $\Gamma^\rho_{\mu\nu}$. In particular, what can you say about the relationship between
$\Gamma^\rho_{\mu\nu}$ and $\Gamma^\rho_{\nu\mu}$ in this example?
Geometric interpretation

Figure 2a shows two basis vectors $\vec{e}_r$ at different radii $r$ and $r + dr$; both are unit vectors. From
this diagram, we see that $\partial\vec{e}_r/\partial r = 0$. Figure 2b shows two basis vectors $\vec{e}_r$ at slightly different
angles $\phi$ and $\phi + d\phi$. The difference is in the $\vec{e}_\phi$ direction; thus $\partial\vec{e}_r/\partial\phi \propto +\vec{e}_\phi$.

Problem 4 [Example B9]


(a) What can you say about the Christoffel symbols in Example B8 from the statements in the
above paragraph?
(b) Draw diagrams similar to Figure 2 for the basis vector $\vec{e}_\phi$. Write down a descriptive
statement similar to the paragraph just before this Problem. Then say what other conclusions
you can draw from these statements.

Problem 5 [Example C5]
We now try to do the same thing for polar coordinates in 3 d, with $(x^1, x^2, x^3) = (\theta, \phi, R)$.
(a) The displacement vector is

From this, express the basis vectors, e.g.,

in terms of the rectangular basis vectors.


(b) Next calculate all the derivatives, e.g.,

in terms of the rectangular basis vectors.


(c) Eliminate the rectangular basis vectors and express the above derivatives in terms of $\theta, \phi, R$
and $\vec{e}_\theta, \vec{e}_\phi, \vec{e}_R$, similar to (13.7).
(d) Hence determine all the Christoffel symbols in this Example.
These two Examples concern curved coordinates in flat space. But it is now easy to
discuss curved space - by just embedding it in flat space.

Problem 6 [Example C6]


Consider the surface of a sphere of radius a. This is described by two coordinates $(\theta, \phi)$, and is
readily obtained from the previous Example by setting R = a = constant, and eliminating all
R components of vectors. Find all the Christoffel symbols for this Example.

13.3 Intrinsic definition


The embedding definition given above is intuitive, and relates the Christoffel symbols to how
the basis vectors $\vec{e}_\mu$ change in terms of the constant basis vectors of the embedding space.
However, there are some disadvantages to this method.
• Embedding is not unique; the same manifold M can be embedded in two different flat
spaces. How do we know that the Christoffel symbols thus calculated are independent of
the embedding, and are an intrinsic property of M?
• Eventually, we shall be interested in M being spacetime. The embedding space must be
very high dimensional (at least 5 dimensions, possibly many more). So it becomes quite
difficult to visualize what is happening in the embedding space.
For these reasons, we have to develop an intrinsic formula for the Christoffel symbol: a way
of calculating it without ever leaving the surface of M. We simply quote the result here and
learn to use it; the derivation is given in Appendix A. You can also check the intrinsic formula
by comparing the results in this Section with the results in the last Section.

$\Gamma^\rho_{\mu\nu} = \tfrac{1}{2}\, g^{\rho\lambda} \left( g_{\lambda\mu,\nu} + g_{\lambda\nu,\mu} - g_{\mu\nu,\lambda} \right)$ (13.15)

Remember that
• The dummy index $\lambda$ is summed over.
• Commas denote differentiation.
• $g^{\rho\lambda}$ is the inverse of $g_{\rho\lambda}$.

This formula implies that $\Gamma$ is symmetric in its two lower indices:

$\Gamma^\rho_{\mu\nu} = \Gamma^\rho_{\nu\mu}$

To illustrate the use of the intrinsic formula (13.15), we show the example of polar
coordinates in 2 d.

Example BlO
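The metric is

$[g_{\mu\nu}] = \begin{pmatrix} 1 & 0 \\ 0 & r^2 \end{pmatrix}$

Since this matrix is diagonal, it is easy to obtain the inverse matrix

$[g^{\mu\nu}] = \begin{pmatrix} 1 & 0 \\ 0 & 1/r^2 \end{pmatrix}$

Moreover, the only nonzero derivative of the $g_{\mu\nu}$ is

$g_{\phi\phi,r} = 2r$

Now consider

$\Gamma^r_{\phi\phi} = \tfrac{1}{2}\, g^{r\lambda} \left( g_{\lambda\phi,\phi} + g_{\lambda\phi,\phi} - g_{\phi\phi,\lambda} \right) = \tfrac{1}{2}\, g^{rr} \left( g_{r\phi,\phi} + g_{r\phi,\phi} - g_{\phi\phi,r} \right) = -r$

In the above, since $g^{\rho\lambda}$ is diagonal, the only allowed value of $\lambda$ is $\lambda = r$; and $g_{\phi r,\phi} = g_{r\phi,\phi} = 0$,
$g_{\phi\phi,r} = 2r$.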
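The intrinsic calculation can also be automated. The sketch below assumes the standard form of the intrinsic formula, Gamma^rho_{mu nu} = (1/2) g^{rho lambda} (g_{lambda mu, nu} + g_{lambda nu, mu} - g_{mu nu, lambda}), restricted to diagonal metrics so that the inverse metric is just the reciprocal of each diagonal entry; the metric derivatives are taken by central differences. All function names are introduced here for illustration.

```python
def metric_polar(x):
    """Metric of plane polar coordinates x = (r, phi): diag(1, r^2)."""
    r, phi = x
    return [[1.0, 0.0], [0.0, r * r]]

def metric_derivs(metric, x, h=1e-5):
    """dg[l][m][n] = d g_{mn} / d x^l by central differences."""
    n = len(x)
    dg = []
    for l in range(n):
        xp, xm = list(x), list(x)
        xp[l] += h
        xm[l] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)] for i in range(n)])
    return dg

def christoffel(metric, x):
    """gamma[rho][mu][nu] from the intrinsic formula, for a DIAGONAL metric only."""
    n = len(x)
    g = metric(x)
    dg = metric_derivs(metric, x)
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for rho in range(n):
        g_inv = 1.0 / g[rho][rho]   # diagonal metric: only lambda = rho contributes
        for mu in range(n):
            for nu in range(n):
                gamma[rho][mu][nu] = 0.5 * g_inv * (
                    dg[nu][rho][mu] + dg[mu][rho][nu] - dg[rho][mu][nu])
    return gamma

R_INDEX, PHI_INDEX = 0, 1
gamma = christoffel(metric_polar, [2.0, 0.6])
print(f"Gamma^r_phiphi at r = 2: {gamma[R_INDEX][PHI_INDEX][PHI_INDEX]:.6f}")
```

At $r = 2$ the printed component should agree with the value $-r$ obtained by hand in this Example; the same function can be pointed at other diagonal metrics by supplying a different metric function.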

Problem 7
Continue with the above Example and calculate all the Christoffel symbols. Compare with the
results of Problem 3.

Problem 8 [Example C7]
On the surface of a sphere of radius a

Follow the above steps and calculate the Christoffel symbols intrinsically. Compare with the
results of Problem 5.

Problem 9 [Example D3]


The sphere of radius a can be described by different coordinates $(r, \phi)$, where $r = a \sin\theta$. The
metric is

Find the Christoffel symbols.


13.4 Applications in general relativity
We now apply the intrinsic definition to examples in general relativity. This discussion will be
structured as a series of Problems.

Problem 10 [Example H3: Weak fields]

For weak fields, we have, to first order in $\Phi$,

$$ds^2 = -(1 + 2\Phi)\,dt^2 + (1 - 2\Phi)\,(dx^2 + dy^2 + dz^2)$$

and we shall assume that $\Phi = \Phi(x, y, z)$.


(a) Show that

(b) Hence calculate r:, and

Problem 11 [Example F3: Robertson-Walker metric]

The Robertson-Walker metric for describing cosmology is given by

(a) Write down all the nonzero elements of $g_{\mu\nu}$.

(b) Write down all the nonzero elements of $g^{\mu\nu}$.
(c) Find all the nonzero elements of $g_{\mu\nu,\lambda}$.
(d) Calculate the following elements of the Christoffel symbols, in terms of $a$ and $\dot{a}$:

Problem 12 [Example G3: Schwarzschild metric]

The Schwarzschild metric for a point mass M (e.g., a star) is given by

where $A = A(r) = (1 - 2GM/r)$ and $B = B(r) = A(r)^{-1}$. In the following, do not use the
explicit form of $A$ and $B$, but just express the answers in terms of $A$, $B$, $A'$, $B'$ etc.
(a) Write down all the nonzero elements of $g_{\mu\nu}$.
(b) Write down all the nonzero elements of $g^{\mu\nu}$.
(c) Find all the nonzero elements of $g_{\mu\nu,\lambda}$.
(d) Calculate the following elements of the Christoffel symbols.

13.5 Covariant differentiation
Differentiating a scalar
Differentiation just means comparing a function at two neighbouring points $x$ and $x + dx$.
Consider first a scalar field $\Phi$; the difference in its values between the two points is

$$\Phi(x + dx) - \Phi(x) \equiv d\Phi(x) \qquad (13.17)$$

This difference must be linear in the displacement components $dx^\mu$; we therefore define the
proportionality constants by

$$\Phi(x + dx) - \Phi(x) \equiv d\Phi(x) = \Phi_{,\mu}(x)\, dx^\mu \qquad (13.18)$$

By considering the case where $dx^\mu$ has only one nonzero component, e.g., $dx = (dx^1, 0, \ldots, 0)$,
it is easy to see that these coefficients are just partial derivatives

$$\Phi_{,\mu} = \frac{\partial \Phi}{\partial x^\mu}$$

Differentiating a vector
Now consider a vector field $\vec{A}(x)$, and compare its value at two neighbouring points. There are
two ways of doing this comparison.
Compare components
We can first take the components (say the $\mu$ component) at these two points, and subtract, i.e.,
consider the difference

$$A^\mu(x + dx) - A^\mu(x) \equiv dA^\mu(x)$$

Since the component $A^\mu$ is just a function like $\Phi$, we again have

$$A^\mu(x + dx) - A^\mu(x) \equiv dA^\mu(x) = A^\mu{}_{,\nu}(x)\, dx^\nu \qquad (13.21)$$

Compare vectors
Another method is to compare the two vectors directly, i.e., take the difference vector:

$$\vec{A}(x + dx) - \vec{A}(x) \equiv d\vec{A}(x)$$

In particular, we take its $\mu$ component; this must again be linear in the displacement, so

$$\left( \vec{A}(x + dx) - \vec{A}(x) \right)^\mu \equiv \left( d\vec{A}(x) \right)^\mu = A^\mu{}_{;\nu}(x)\, dx^\nu \qquad (13.23)$$

which defines the covariant derivatives $A^\mu{}_{;\nu}(x)$.


We shall now use an example to emphasize the difference between the ordinary derivative
and the covariant derivative, and then come to the general formula for the covariant derivative.

Problem 13 [Refer also to Problem 1]

In 2-d, a vector field is defined by $\vec{A} = \ldots$ everywhere. Use polar coordinates and compare
the two points $P = (r, \phi) = (1, 0)$ and $Q = (r + dr, \phi + d\phi) = (1, \alpha)$, where $\alpha$ is regarded as
infinitesimal.
(a) What is $A^r$ at each point? What is $dA^r$? Hence find $A^r{}_{,\phi}$.
(b) What is $d\vec{A}$? What is its $r$ component? Hence find $A^r{}_{;\phi}$.

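Formula for covariant derivative

We now derive the general formula for the covariant derivative of a vector field $\vec{A}(x)$.

$$\vec{A}(x) = A^\mu \vec{e}_\mu$$
$$d\vec{A}(x) = dA^\mu\, \vec{e}_\mu + A^\mu\, d\vec{e}_\mu$$
We use (13.21) for $dA^\mu$ and (13.14) for $d\vec{e}_\mu$:

$$d\vec{A}(x) = (A^\mu{}_{,\nu}\, dx^\nu)\, \vec{e}_\mu + A^\mu\, (\Gamma^\rho_{\mu\nu}\, \vec{e}_\rho\, dx^\nu)$$
$$= (A^\mu{}_{,\nu}\, dx^\nu)\, \vec{e}_\mu + A^\rho\, (\Gamma^\mu_{\rho\nu}\, \vec{e}_\mu\, dx^\nu)$$
$$= (A^\mu{}_{,\nu} + A^\rho\, \Gamma^\mu_{\rho\nu})\, dx^\nu\, \vec{e}_\mu \qquad (13.26)$$
where in the second line we have interchanged the dummy indices $\mu$ and $\rho$ in the second term.
Now let us take the $\mu$ component, i.e., the coefficient of $\vec{e}_\mu$:
$$(d\vec{A})^\mu = (A^\mu{}_{,\nu} + A^\rho\, \Gamma^\mu_{\rho\nu})\, dx^\nu \equiv A^\mu{}_{;\nu}\, dx^\nu$$
From this we can read off the covariant derivative

$$A^\mu{}_{;\nu} = A^\mu{}_{,\nu} + \Gamma^\mu_{\rho\nu} A^\rho$$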

Transformation properties
We consider the definition

$$(d\vec{A})^\mu = A^\mu{}_{;\nu}\, dx^\nu$$

Now the LHS are the components of a vector, and $dx^\nu$ are also the components of a
vector. Therefore the covariant derivatives $A^\mu{}_{;\nu}$ transform like a $\binom{1}{1}$ tensor.
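To make the difference between the ordinary and the covariant derivative concrete, here is a small numerical sketch. It assumes, purely for illustration, the constant field $\vec{A} = \hat{y}$ in the plane (this choice is an assumption and need not be the field of Problem 13), written in polar components as $A^r = \sin\phi$, $A^\phi = \cos\phi / r$, and uses the polar Christoffel symbols $\Gamma^r_{\phi\phi} = -r$, $\Gamma^\phi_{r\phi} = \Gamma^\phi_{\phi r} = 1/r$. The function names are hypothetical.

```python
import math

# Illustrative assumption: the constant Cartesian field A = y-hat, expressed in the
# polar coordinate basis (e_r, e_phi), where e_phi has length r.
def field(r, phi):
    """Polar components [A^r, A^phi] of the constant field y-hat."""
    return [math.sin(phi), math.cos(phi) / r]

def gamma(r, phi):
    """Christoffel symbols G[mu][rho][nu] = Gamma^mu_{rho nu} for polar coordinates."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r                       # Gamma^r_{phi phi}
    G[1][0][1] = G[1][1][0] = 1.0 / r     # Gamma^phi_{r phi} = Gamma^phi_{phi r}
    return G

def partial_derivative(r, phi, h=1e-6):
    """d[mu][nu] = A^mu_{,nu}, estimated by central differences."""
    x = [r, phi]
    d = [[0.0, 0.0], [0.0, 0.0]]
    for nu in range(2):
        xp, xm = list(x), list(x)
        xp[nu] += h
        xm[nu] -= h
        Ap, Am = field(*xp), field(*xm)
        for mu in range(2):
            d[mu][nu] = (Ap[mu] - Am[mu]) / (2.0 * h)
    return d

def covariant_derivative(r, phi):
    """c[mu][nu] = A^mu_{;nu} = A^mu_{,nu} + Gamma^mu_{rho nu} A^rho."""
    d = partial_derivative(r, phi)
    G = gamma(r, phi)
    A = field(r, phi)
    return [[d[mu][nu] + sum(G[mu][rho][nu] * A[rho] for rho in range(2))
             for nu in range(2)] for mu in range(2)]

if __name__ == "__main__":
    print("A^r_,phi at (1, 0):", partial_derivative(1.0, 0.0)[0][1])
    print("A^r_;phi at (1, 0):", covariant_derivative(1.0, 0.0)[0][1])
```

For this field the components change from point to point (so the ordinary derivative is nonzero) even though the vector itself does not change, and the covariant derivative comes out as zero up to rounding.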


Differentiation of higher-rank tensor fields
Consider a tensor field which is a product of two vectors:

$$C^{\mu\nu} = A^\mu B^\nu \qquad (13.30)$$

Its covariant derivative is obtained by appending a Christoffel symbol term to each index:

$$C^{\mu\nu}{}_{;\sigma} = C^{\mu\nu}{}_{,\sigma} + \Gamma^\mu_{\rho\sigma}\, C^{\rho\nu} + \Gamma^\nu_{\rho\sigma}\, C^{\mu\rho}$$

This is readily shown by considering the tensor

$$C^{\mu\nu}\, \vec{e}_\mu \otimes \vec{e}_\nu \qquad (13.32)$$
where $\otimes$ denotes an outer (tensor) product. However we shall not go into this derivation.

Appendix A
In this Appendix, we derive the intrinsic definition of the Christoffel symbol from the embedding
definition. The cartesian coordinates of the embedding space are denoted by indices $i, j, \ldots$,
while the coordinates of the manifold are denoted by Greek indices $\mu, \nu, \ldots$. Thus, an intrinsic
expression is one in which the indices $i, j, \ldots$ do not appear in the end.

Step 1: Express in terms of coordinates without using basis vectors


We start with cartesian coordinates 2 and unit vectors 2. A general point 5 in the embedding
space is

The basis vectors in the curvilinear coordinates are

We are here concerned with the changes in these basis vectors, so we consider

Now the inverse of (13.34) is


(Sums over Greek indices are understood.) Put (13.36) into (13.35), and we get

Thus we read off the Christoffel symbol as

It is convenient to introduce the (nonstandard) notation

(13.39)

The last quantity is symmetric in the two lower indices because of the property of mixed partial
derivatives.
This allows (13.38) to be expressed compactly as

This is a simple expression for the Christoffel symbol in terms of the two sets of coordinates,
without mentioning the basis vectors.

Step 2: Express in terms of metric

The formula (13.40) is still not intrinsic, because the embedding cartesian coordinates still
appear. To eliminate them, we start with the following expression for the metric, from Chapter
10.
Differentiating with respect to x X then gives

Now from (13.40)

= c (;J (p)
(;) (;J (:) (;)
c (;J
=

= 6:

= (i")
In the above, we have used the fact that

are inverse matrices.


From (13.43) and changing $j \to i$, we have

Now put (13.44) in (13.42).

The Christoffel symbols can be taken outside the sum over $i$, and the remaining factors can be expressed
back in terms of $g_{\mu\nu}$ by (13.41):
Thus we get

The cartesian coordinates no longer appear; this is an intrinsic expression.


Now permute the three indices p, v and A.

There are three pairs of terms as shown. Take B + C - A.

$$g_{\lambda\sigma}\, \Gamma^\sigma_{\mu\nu} = \frac{1}{2} \left( g_{\mu\lambda,\nu} + g_{\nu\lambda,\mu} - g_{\mu\nu,\lambda} \right)$$
Multiplying by the inverse matrix $g^{\alpha\lambda} = g^{\lambda\alpha}$

$$\Gamma^\alpha_{\mu\nu} = \frac{1}{2}\, g^{\alpha\lambda} \left( g_{\mu\lambda,\nu} + g_{\nu\lambda,\mu} - g_{\mu\nu,\lambda} \right)$$

This is the relation that we seek; it expresses $\Gamma^\alpha_{\mu\nu}$ in terms of the metric and its derivatives, and
is therefore intrinsic.
To be explicit where a curved manifold comes in, let us consider the case of a sphere
embedded in 3-d. Here, the Greek indices (e.g., $\mu$) would range over 1, 2, 3 or $(\theta, \phi, R)$, while
the Latin indices (e.g., $i$) in the intermediate steps would range over 1, 2, 3 or $(x, y, z)$. Thus
the Christoffel symbols obtained, e.g., (13.51), refer to curvilinear coordinates in 3-d flat space.
However, by just setting $R = a = \mathrm{const}$, and allowing the Greek indices to range over only 1, 2
or $(\theta, \phi)$, the whole discussion is restricted to the surface of the sphere, and the same expression
now refers to a curved manifold.

Chl3-l.tex; January 26, 1998


14 Motion of Point Particles
Electromagnetism consists of two parts.
• How do the fields act on charges and currents? The answer is the Lorentz force law.
• How do the charges and currents produce the fields? The answer is the Maxwell equations.
Similarly, in gravitation, there are two parts.
• How does the gravitational field act on the particles? We answer this question in this
Chapter.
• How do the particles (which give rise to energy and momentum) produce the fields? The
answer is the Einstein field equation, which we deal with later.
We shall derive the equation of motion for point particles, and then apply it to
• Weak fields
• The cosmological model
• Deflection of light
• Gravitational redshift
• Precession of perihelion
• Conservation laws
• Motion in the vicinity of a black hole (next Chapter)

14.1 Law of motion: Derivation I


The central theme of relativity is: Once we have taken account of spacetime curvature, then
we can forget about gravity locally, and regard the particles as "free". Then the law of motion
should be

The momentum does not change.

Mathematically

Figure 1
Figure 1 illustrates two neighbouring points A, B on the path of a particle in spacetime,
with

The statement (14.1) means

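$$d\vec{p} \equiv \vec{p}_B - \vec{p}_A = 0 \qquad (14.3)$$
In other words we subtract the two vectors, then take the components, so

$$(d\vec{p})^\mu = d(p^\mu) + \Gamma^\mu_{\nu\rho}\, p^\nu\, dx^\rho = 0 \qquad (14.4)$$
Divide by the proper time $d\tau$ for this interval

$$\frac{dp^\mu}{d\tau} + \Gamma^\mu_{\nu\rho}\, p^\nu \frac{dx^\rho}{d\tau} = 0 \qquad (14.5)$$

We can write this in terms of the coordinate $x^\mu$: put $p^\mu = m\, dx^\mu / d\tau$

$$m \frac{d^2 x^\mu}{d\tau^2} + m\, \Gamma^\mu_{\nu\rho} \frac{dx^\nu}{d\tau} \frac{dx^\rho}{d\tau} = 0 \qquad (14.6)$$

Force
An alternate expression is

$$m \frac{d^2 x^\mu}{d\tau^2} = -m\, \Gamma^\mu_{\nu\rho} \frac{dx^\nu}{d\tau} \frac{dx^\rho}{d\tau}$$

Compare with the Lorentz force law:

$$m \frac{d^2 x^\mu}{d\tau^2} = q\, F^\mu{}_\nu \frac{dx^\nu}{d\tau}$$

The structure is very similar.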


• Gravity couples to 4-velocity quadratically.
• $-\Gamma^\mu_{\nu\rho}$ is analogous to the field tensor $F^\mu{}_\nu$.
• The $m$ on the left is the inertial mass $m_I$; the $m$ on the right is the gravitational mass
$m_G$. We have $m_I = m_G$ automatically.
• In electromagnetism, the part of the force that depends on the spatial components of the
velocity is the magnetic force. It is proportional to $v^i \sim \beta$. Likewise, in gravity part of
the force is proportional to the spatial components of the velocity, and will be called the
"magnetic" part. These terms go as $v^i \sim \beta$ and $v^i v^j \sim \beta^2$.
Geodesic equation
A second way is to cancel $m$ from (14.6); this is actually more convenient for calculations.

$$\frac{d^2 x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\rho} \frac{dx^\nu}{d\tau} \frac{dx^\rho}{d\tau} = 0 \qquad (14.9)$$

The fact that $m_I = m_G$ or, equivalently, that $m$ cancels in (14.9), is the principle of equivalence.
This property comes out naturally.

14.2 Law of motion: Derivation II


(This subsection is not needed for the subsequent material.)

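We now give a second approach, based on the principle of least action. First review section 8.3.
There, for flat Minkowski space, the action for each small segment of path is

$$\Delta S = -m (-\Delta\vec{x} \cdot \Delta\vec{x})^{1/2} = -m (-\Delta x^\mu \Delta x_\mu)^{1/2} = -m (-\eta_{\mu\nu} \Delta x^\mu \Delta x^\nu)^{1/2}$$
$$dS = -m (-\eta_{\mu\nu}\, dx^\mu dx^\nu)^{1/2}$$

(See (8.13) for example.) In flat space, the metric tensor is

$$\eta_{\mu\nu} = \mathrm{diag}(-1, 1, 1, 1)$$

In curvilinear coordinates (and hence also for curved space) the generalization of distance
is simply $\eta_{\mu\nu} \to g_{\mu\nu}$. So the expression for $dS$ above should be modified to

$$dS = -m (-g_{\mu\nu}\, dx^\mu dx^\nu)^{1/2}$$

Introduce any path parameter $s$

$$S = -m \int ds \left[ -g_{\mu\nu} \frac{dx^\mu}{ds} \frac{dx^\nu}{ds} \right]^{1/2}$$

The correct path is the one that makes $S$ stationary, i.e., $\delta S = 0$ under a first-order change

$$x^\mu(s) \to x^\mu(s) + \delta x^\mu(s), \qquad \delta x^\mu(s) = \eta^\mu(s)$$

where $\eta^\mu$ is small, and $\eta^\mu = 0$ at the end-points. The change in the square root is

$$\delta [\ ]^{1/2} = \frac{1}{2} [\ ]^{-1/2} \left\{ -g_{\mu\nu,\rho}\, \delta x^\rho \frac{dx^\mu}{ds} \frac{dx^\nu}{ds} - 2 g_{\mu\nu} \frac{dx^\mu}{ds} \frac{d\,\delta x^\nu}{ds} \right\}$$

where $[\ ]$ denotes the expression under the square root.
The two terms

$$\frac{d\,\delta x^\mu}{ds} \frac{dx^\nu}{ds}, \qquad \frac{dx^\mu}{ds} \frac{d\,\delta x^\nu}{ds}$$
have been combined, hence the factor of 2. In the first term in $\{\ \}$

$$\text{First term} = \frac{1}{2} [\ ]^{-1/2} \left\{ -g_{\mu\nu,\rho} \frac{dx^\mu}{ds} \frac{dx^\nu}{ds} \right\} \eta^\rho \qquad (14.12)$$

In the second term, $\delta x^\nu = \eta^\nu$. Since the expression is under the integral sign, we can integrate
by parts.

$$\text{Second term} = -[\ ]^{-1/2}\, g_{\mu\nu} \frac{dx^\mu}{ds} \frac{d\eta^\nu}{ds} = \frac{d}{ds} \left\{ [\ ]^{-1/2}\, g_{\mu\nu} \frac{dx^\mu}{ds} \right\} \eta^\nu + \text{surface term} = \frac{d}{ds} \left\{ [\ ]^{-1/2}\, g_{\mu\rho} \frac{dx^\mu}{ds} \right\} \eta^\rho \qquad (14.13)$$

In the last step we have changed dummy variables $\nu \to \rho$, and dropped the surface term, which
vanishes because $\eta^\mu = 0$ at the end-points. Thus the two terms together yield

$$\delta S = -m \int ds\, \{\ \}\, \eta^\rho \qquad (14.14)$$
where $\{\ \}\eta^\rho$ = (14.12) + (14.13). Since $\eta^\rho(s)$ is arbitrary, we put $\{\ \}$ in (14.14) equal to
zero.

$$\frac{d}{ds} \left\{ [\ ]^{-1/2}\, g_{\mu\rho} \frac{dx^\mu}{ds} \right\} - \frac{1}{2} [\ ]^{-1/2}\, g_{\mu\nu,\rho} \frac{dx^\mu}{ds} \frac{dx^\nu}{ds} = 0 \qquad (14.15)$$

The above is for any arbitrary path parameter $s$. If we choose in particular $ds = d\tau$

$$d\tau^2 = -g_{\mu\nu}\, dx^\mu dx^\nu$$

then in (14.15) $[\ ] = 1$, and (14.15) simplifies to

$$\frac{d}{d\tau} \left( g_{\mu\rho} \frac{dx^\mu}{d\tau} \right) - \frac{1}{2} g_{\mu\nu,\rho} \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} = 0 \qquad (14.16)$$
(We shall come back to this in section 14.8.) Carrying out the differentiation

$$g_{\mu\rho} \frac{d^2 x^\mu}{d\tau^2} + \frac{dg_{\mu\rho}}{d\tau} \frac{dx^\mu}{d\tau} - \frac{1}{2} g_{\mu\nu,\rho} \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} = 0$$

But

$$\frac{dg_{\mu\rho}}{d\tau} = g_{\mu\rho,\nu} \frac{dx^\nu}{d\tau}$$
Hence

$$g_{\mu\rho} \frac{d^2 x^\mu}{d\tau^2} + \left( g_{\mu\rho,\nu} - \frac{1}{2} g_{\mu\nu,\rho} \right) \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} = 0$$

Since $(dx^\mu/d\tau)(dx^\nu/d\tau)$ is symmetric under $\mu \leftrightarrow \nu$, the first term inside the bracket can be
written as

$$\frac{1}{2} \left( g_{\mu\rho,\nu} + g_{\nu\rho,\mu} \right)$$

Hence

$$g_{\mu\rho} \frac{d^2 x^\mu}{d\tau^2} + \frac{1}{2} \left( g_{\mu\rho,\nu} + g_{\nu\rho,\mu} - g_{\mu\nu,\rho} \right) \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} = 0$$

Multiply by $g^{\sigma\rho}$

$$\frac{d^2 x^\sigma}{d\tau^2} + \Gamma^\sigma_{\mu\nu} \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} = 0$$

which is the same as (14.9).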

14.3 Weak fields


As the first application, consider a particle in a weak gravitational field. The gravitational
potential $\Phi$ is regarded as a small quantity: $\Phi = O(\epsilon)$. For simplicity, we also limit to slow
motion, i.e., $v^2 = O(\epsilon)$ as well. The calculation is carried to first order in $\epsilon$.

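Problem 1
(a) Estimate the numerical value of $\Phi$ and $v^2$ for the motion of the earth around the sun.
(b) Using Newtonian mechanics, derive a relation between the time-averaged value of $\Phi$ and of
$v^2$, for a bound orbit. [Hint: consider $d(\mathbf{p} \cdot \mathbf{r})/dt$. For bound orbits, $\mathbf{p} \cdot \mathbf{r}$ stays within bounded
limits, say $\pm B$. So for long times $T$, $|\langle d(\mathbf{p} \cdot \mathbf{r})/dt \rangle| \le 2B/T \to 0$. Express $d(\mathbf{p} \cdot \mathbf{r})/dt$ in terms
of $\Phi$ and $v^2$.]
The equation of motion is

$$\ddot{x}^i + \Gamma^i_{\nu\rho}\, \dot{x}^\nu \dot{x}^\rho = 0$$

(The 0 component is not needed.) In the above, $\dot{\ } = d/d\tau$. The velocity is $O(\epsilon^{1/2})$; $d/dt$ and
$d/d\tau$ differ by $O(\gamma - 1) = O(v^2) = O(\epsilon)$. So in the first term, if we replace $d/d\tau \to d/dt$, the
error is of higher order in $\epsilon$, which can be neglected. Also $\Gamma = O(\epsilon)$, so we only need to take $O(\epsilon^0)$ in
$\dot{x}^\nu \dot{x}^\rho$. This means we only take $\nu = \rho = 0$, with $\dot{x}^0 \approx 1$, and

$$\frac{d^2 x^i}{dt^2} + \Gamma^i_{00} = 0$$

Hence, with $\Gamma^i_{00} = \partial\Phi / \partial x^i$ for the weak-field metric,

$$\frac{d^2 x^i}{dt^2} = -\frac{\partial \Phi}{\partial x^i}$$

So we recover the Newtonian law of motion.

Note that $-\Gamma^i_{00}$ (both lower indices being time components) is like the electric field. To
this approximation, gravity is analogous to Coulomb's law for static electricity.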
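To get a feeling for how small the expansion parameter is, the sketch below estimates the dimensionless potential $|\Phi| = GM/(rc^2)$ and $v^2/c^2$ for the earth's orbit. The numerical constants are standard approximate values supplied here as assumptions; they are not taken from the notes.

```python
# Rough size of the weak-field parameters for the earth's orbit around the sun.
# The constants below are standard approximate values (assumptions of this sketch).
GM_SUN = 1.327e20      # G * M_sun in m^3 s^-2
R_ORBIT = 1.496e11     # mean earth-sun distance in m
V_ORBIT = 2.978e4      # mean orbital speed of the earth in m/s
C = 2.998e8            # speed of light in m/s

def dimensionless_potential(gm, r, c=C):
    """|Phi| = GM / (r c^2), the potential in units where c = 1."""
    return gm / (r * c * c)

def beta_squared(v, c=C):
    """v^2 / c^2."""
    return (v / c) ** 2

if __name__ == "__main__":
    print("|Phi|   ~ %.2e" % dimensionless_potential(GM_SUN, R_ORBIT))
    print("v^2/c^2 ~ %.2e" % beta_squared(V_ORBIT))
```

Both numbers come out of order $10^{-8}$, and they are nearly equal because the orbit is nearly circular.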

14.4 Cosmological model

Figure 2

We now return to the Robertson-Walker metric. If $k = +1$, space is like the surface of a
spherical balloon with radius $a(t)$. The spatial coordinate system $(\bar{r}, \theta, \phi)$ can be thought of as
being painted on this balloon, and expands with the balloon (Figure 2).
Although the coordinate system expands with the balloon, the motion of particles may
be different. We claim that particles (i.e., galaxies) are "stuck to" the coordinate grid as the
latter expands. Mathematically we have to show

$$\bar{r}(t) = \mathrm{const}, \qquad \theta(t) = \mathrm{const}, \qquad \phi(t) = \mathrm{const} \qquad (14.19)$$

can be an exact solution to the equations of motion. We shall assume (14.19) and check that
it satisfies the equations of motion.
From the metric in Chapter 13,

$$d\tau^2 = -ds^2 = dt^2$$

since $d\bar{r}/dt = d\theta/dt = d\phi/dt = 0$ by assumption. Hence the solution is (up to an irrelevant
additive constant)

$$\tau = t \qquad (14.20)$$

The equations of motion (14.9) have to be checked:

$$\frac{d^2 x^\mu}{dt^2} + \Gamma^\mu_{\nu\rho} \frac{dx^\nu}{dt} \frac{dx^\rho}{dt} \overset{?}{=} 0 \qquad (14.21)$$
First we note that from (14.19) and (14.20)
$$\frac{dx^\nu}{dt} = \begin{cases} 1 & \nu = t \\ 0 & \text{otherwise} \end{cases} \qquad (14.22)$$
Hence the second term in (14.21) reduces simply to $\Gamma^\mu_{tt}$. Moreover, in the first term we note,
by differentiating (14.22), that
$$\frac{d^2 x^\mu}{dt^2} = 0 \quad \text{for all } \mu$$
Hence we only need to check

$$\Gamma^\mu_{tt} \overset{?}{=} 0 \quad \text{for all } \mu$$

This is verified from Chapter 13. More directly we use the definition

$$\Gamma^\mu_{tt} = \frac{1}{2}\, g^{\mu\lambda} \left( g_{\lambda t,t} + g_{\lambda t,t} - g_{tt,\lambda} \right)$$

In the first two terms, because $g_{\mu\nu}$ is diagonal, $\lambda$ must be $t$, and $g_{tt,t} = 0$. In the last term,
because $g_{tt} = -1$, $g_{tt,\lambda} = 0$. In fact this shows that we rely only on the following properties of
the metric:
• $g_{t\lambda} = 0$ for $\lambda \neq t$.
• $g_{tt}$ = constant.
Thus this calculation shows that galaxies can be "stuck to" the expanding coordinate system.
Note that we say "can be", and not "must be". The reason is that, depending on initial
conditions, there are also solutions that move with respect to the coordinate system. This is
clearly allowed physically - we can launch a satellite at high speed to travel from galaxy 1 at
$(\bar{r}_1, \theta_1, \phi_1)$ to galaxy 2 at $(\bar{r}_2, \theta_2, \phi_2)$. In that case $\bar{r}, \theta, \phi$ are not constants.

14.5 Deflection of light


Introduction
Figure 3 shows schematically a ray of light passing at impact parameter $R$ close to a mass $M$.
If gravity has no effect, then the path would be a straight line:

Figure 3

However, gravity distorts spacetime around $M$, and the ray will be deflected by an angle
$\alpha$. Referring to Figure 4, we see that $\alpha$ is simply related to the asymptotic momentum:
Figure 4
In short, we have to calculate the change in $p^y$. The calculation is made simple by two
observations:
• We calculate only to first order in $G$.
• Because of the nearly straight path (Figure 3), rectangular coordinates would be
convenient.

Equation of motion
It is most convenient to write the equation of motion as

where $p$ is the momentum of the photon. Consider $\mu = y$, and note that $\Gamma = O(G)$, so the
other factors can be calculated to $O(G^0)$, i.e., for the straight line path. Thus we get

where to be slightly more general we consider a particle moving at velocity $\beta c$. Thus from
(14.27),

Hence

We therefore arrive at a very simple interpretation:

The deflection of light directly measures the


curvature of spacetime as expressed by r.
Here it is important to stress another advantage of the rectangular coordinates: $\Gamma \neq 0$ is only
due to curvature, i.e., $\Gamma = O(G)$. In polar coordinates, $\Gamma$ is nonzero even without curvature,
i.e., $\Gamma = O(G^0)$. Thus (14.29) and the corresponding simple interpretation are only valid in
rectangular coordinates.
Note that a slow particle ($\beta \ll 1$) would only sense the "electric" component, but a fast
particle ($\beta \sim 1$) would sense the "magnetic" components as well, i.e., those with one or more
lower spatial indices, and which give rise to velocity-dependent forces.
Since we need only $O(G)$, it will be adequate to use the weak field approximation from
Chapter 13:

hence

Evaluate $\alpha$

$$\Phi = -\frac{GM}{r}, \qquad \Phi_{,y} = \frac{GMy}{r^3}$$

The integration goes along a straight line $y = R$, $r = \sqrt{x^2 + R^2}$.

We have inserted a factor $1/c^2$ to get the units right. This formula is valid whenever the particle
goes along a nearly straight trajectory at nearly constant speed.
Note that the order-of-magnitude of the effect is (say for a test mass $m$)

$$\alpha \sim \frac{GMm/R}{mc^2} \sim \frac{\text{potential energy}}{\text{rest energy}}$$

which is a typical general-relativistic effect.

Deflection of light
For light, $\beta = 1$, and

$$\alpha = \frac{4GM}{Rc^2} \approx 1.75''$$

This value is appropriate for $R$ = radius of sun, i.e., for a ray passing close to the rim of the
sun. Of course this can only be observed during an eclipse, and was so confirmed in 1919. It
was one of the triumphs of general relativity.
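As a numerical illustration of the size of the effect, the sketch below evaluates the standard general-relativistic result $\alpha = 4GM/(Rc^2)$ for a ray grazing the sun, together with half that value, which is what the naive Newtonian estimate discussed in this section would give. The solar constants are standard approximate values supplied here as assumptions.

```python
import math

# Standard approximate constants (assumptions of this sketch, not from the notes).
GM_SUN = 1.327e20     # G * M_sun in m^3 s^-2
R_SUN = 6.957e8       # solar radius in m
C = 2.998e8           # speed of light in m/s
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi

def light_deflection_rad(gm, impact_parameter, c=C):
    """Deflection angle alpha = 4 G M / (R c^2) for light, to first order in G."""
    return 4.0 * gm / (impact_parameter * c * c)

if __name__ == "__main__":
    alpha = light_deflection_rad(GM_SUN, R_SUN)
    print("full deflection : %.2f arcsec" % (alpha * ARCSEC_PER_RAD))
    print("naive Newtonian : %.2f arcsec" % (0.5 * alpha * ARCSEC_PER_RAD))
```

The factor of 2 between the two printed values is the "magnetic" contribution described in the text.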

Non-relativistic limit
Suppose the particle is non-relativistic, $|\beta| \ll 1$. Then (14.33) becomes

Let us try to recover this result by Newtonian physics. In Newtonian physics, and referring to
Figure 3,

Hence, putting into (14.26), which is always valid

which is just the first term in (14.31), and will lead to (14.34).
However, if we
• do the Newtonian calculation, and
• naively put $\beta = 1$,
then the answer would be off by a factor of 2. The reason is that we would have missed the
$\Gamma^y_{xx}$ contribution, i.e., the second term in (14.33). This is the "magnetic" contribution. So
the experimental observation verifies that there is a velocity-dependent "magnetic" force which
corrects Newton's law of gravity. Of course, light is one of the best ways to detect such a force,
which would be negligible for $\beta \ll 1$.

14.6 Gravitational redshift


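Consider a photon that "climbs" through a height $z$, in a gravitational field $\Phi(z)$. Again we
need only calculate to first order in the field. Start with

$$dp^\mu + \Gamma^\mu_{\nu\rho}\, p^\nu\, dx^\rho = 0 \qquad (14.36)$$
Because $\Gamma$ is already first-order, we can calculate $p^\nu dx^\rho$ for free space.

$$p^z = p^t, \qquad dz = dt$$
Choose $\mu = t$ in (14.36).

$$dp^t + \left( \Gamma^t_{tt} + 2\Gamma^t_{tz} + \Gamma^t_{zz} \right) p^t\, dz = 0$$

We have, for weak fields

$$\Gamma^t_{tt} = 0, \qquad \Gamma^t_{tz} = \Phi_{,z}, \qquad \Gamma^t_{zz} = 0$$

Hence

$$p^t(z) = [1 - 2\Phi(z)]\, p^t(0)$$

For simplicity we have assumed $\Phi(0) = 0$. But since the metric goes as

$$ds^2 = -(1 + 2\Phi)\, dt^2 + \cdots$$

the physical energy is

$$E = \sqrt{-g_{tt}}\; p^t = (1 + \Phi)\, p^t$$
Thus

$$E(z) = [1 - \Phi(z)]\, E(0)$$

For a photon, $\omega \propto E$, hence

$$\omega(z) = [1 - \Phi(z)]\, \omega(0)$$

This is the formula for gravitational redshift, correct to $O(\Phi)$.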
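For a sense of scale, the following sketch evaluates the fractional frequency change $\omega(z)/\omega(0) - 1 = -\Phi(z)$ for a photon climbing a small height near the surface of the earth, taking $\Phi(z) = gz/c^2$ with $\Phi(0) = 0$. The uniform-field form of $\Phi$, the value of $g$ and the 22.5 m height are illustrative assumptions, not taken from the notes.

```python
# Fractional gravitational redshift to first order in Phi, with Phi(0) = 0.
# Assumptions of this sketch: uniform field, Phi(z) = g z / c^2.
G_SURFACE = 9.81      # m/s^2, approximate surface gravity of the earth
C = 2.998e8           # speed of light in m/s

def fractional_frequency_shift(height_m, g=G_SURFACE, c=C):
    """Return omega(z)/omega(0) - 1 = -Phi(z) for a photon climbing height_m."""
    phi = g * height_m / (c * c)
    return -phi

if __name__ == "__main__":
    print("shift over 22.5 m: %.2e" % fractional_frequency_shift(22.5))
```

The shift is of order $10^{-15}$ for a climb of a few tens of metres, which shows why very precise frequency measurements are needed to see it.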


14.7 Precession of perihelion
Preparation
We now consider the precession of the perihelion of the orbit of a planet. This is mathematically
more complicated because we have to evaluate to O(G2) in the intermediate steps. But first,
let us prepare the ground by reviewing the Newtonian results on Kepler orbits.

Problem 2
(a) Show that the equations of motion for a particle in a gravitational field are

(Hint: the second equation is the conservation of angular momentum, and the second term in
the first equation is the centrifugal force.)
(b) Use the angular momentum equation to show

(c) In the radial equation, replace the dependent variable by

and replace the independent variable $t$ by $\phi$, using the above relationship between $d/dt$ and
$d/d\phi$. As an example

Hence show that the equation for the orbit (i.e., a relationship between $r$ and $\phi$, without $t$) is

(d) Hence show that (by a suitable choice of the coordinates), a solution is

where $e$ and $r_0$ are constants. What is the meaning of $e$ and of $r_0$?

(e) Show that the orbit is closed.
(f) Suppose that the relativistic corrections result in

where $|\alpha| \ll 1$. In terms of $\alpha$, by how much would the perihelion (the point of smallest $r$)
advance in each revolution?
Metric
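The metric is that due to the sun, regarded as a point particle.

$$ds^2 = -B(r)\, dt^2 + A(r)\, dr^2 + r^2 d\theta^2 + r^2 \sin^2\theta\, d\phi^2 \qquad (14.43)$$

Equation of motion
The equations are

$$0 = \frac{d^2 x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\rho} \frac{dx^\nu}{d\tau} \frac{dx^\rho}{d\tau} \qquad (14.45)$$
There are 4 such equations, corresponding to different choices of $\mu$.

Evaluation of $\Gamma$
The nonzero elements are

$$\Gamma^r_{rr} = \frac{A'}{2A}, \quad \Gamma^r_{\theta\theta} = -\frac{r}{A}, \quad \Gamma^r_{\phi\phi} = -\frac{r \sin^2\theta}{A}, \quad \Gamma^r_{tt} = \frac{B'}{2A}$$
$$\Gamma^\theta_{r\theta} = \frac{1}{r}, \quad \Gamma^\theta_{\phi\phi} = -\sin\theta \cos\theta, \quad \Gamma^\phi_{r\phi} = \frac{1}{r}, \quad \Gamma^\phi_{\theta\phi} = \cot\theta, \quad \Gamma^t_{tr} = \frac{B'}{2B}$$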

Problem 3
Derive the results shown above for I?:,. I?;,.

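The 4 equations
Putting these into (14.45), we get the following 4 equations.

$$0 = \frac{d^2 r}{d\tau^2} + \frac{A'}{2A} \left( \frac{dr}{d\tau} \right)^2 - \frac{r}{A} \left( \frac{d\theta}{d\tau} \right)^2 - \frac{r \sin^2\theta}{A} \left( \frac{d\phi}{d\tau} \right)^2 + \frac{B'}{2A} \left( \frac{dt}{d\tau} \right)^2 \qquad (14.46)$$

$$0 = \frac{d^2 \theta}{d\tau^2} + \frac{2}{r} \frac{d\theta}{d\tau} \frac{dr}{d\tau} - \sin\theta \cos\theta \left( \frac{d\phi}{d\tau} \right)^2 \qquad (14.47)$$

$$0 = \frac{d^2 \phi}{d\tau^2} + \frac{2}{r} \frac{d\phi}{d\tau} \frac{dr}{d\tau} + 2 \cot\theta\, \frac{d\phi}{d\tau} \frac{d\theta}{d\tau}$$

$$0 = \frac{d^2 t}{d\tau^2} + \frac{B'}{B} \frac{dt}{d\tau} \frac{dr}{d\tau}$$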
Problem 4
Show the derivation of these 4 equations.

Reduce to equatorial plane


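Since the planet moves only in a plane, say $\theta = \pi/2$, the above can be simplified. Moreover,
the $\theta$ equation is no longer necessary.

$$0 = \frac{d^2 r}{d\tau^2} + \frac{A'}{2A} \left( \frac{dr}{d\tau} \right)^2 - \frac{r}{A} \left( \frac{d\phi}{d\tau} \right)^2 + \frac{B'}{2A} \left( \frac{dt}{d\tau} \right)^2 \qquad (14.50)$$
$$0 = \frac{d^2 \phi}{d\tau^2} + \frac{2}{r} \frac{d\phi}{d\tau} \frac{dr}{d\tau} \qquad (14.51)$$
$$0 = \frac{d^2 t}{d\tau^2} + \frac{B'}{B} \frac{dt}{d\tau} \frac{dr}{d\tau} \qquad (14.52)$$

Constants of motion
The last two equations can be integrated once. From (14.51),

$$r^2 \frac{d\phi}{d\tau} = J = \text{constant} \qquad (14.53)$$
Obviously $J$ is the angular momentum per unit mass. From (14.52),

$$B(r) \frac{dt}{d\tau} = K = \text{constant}$$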

Simplify radial equation


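We are now left with the radial equation (14.50). Put $A = B^{-1}$, $A'/A = -B'/B$.

$$\frac{d^2 r}{d\tau^2} - \frac{B'}{2B} \left( \frac{dr}{d\tau} \right)^2 - rB \left( \frac{d\phi}{d\tau} \right)^2 + \frac{BB'}{2} \left( \frac{dt}{d\tau} \right)^2 = 0 \qquad (14.55)$$

To eliminate $(dt/d\tau)^2$, we use the definition of $d\tau$, from (14.43), but specialized to $\theta = \pi/2$.

$$d\tau^2 = B\, dt^2 - B^{-1} dr^2 - r^2 d\phi^2 \quad \Rightarrow \quad B \left( \frac{dt}{d\tau} \right)^2 = 1 + \frac{1}{B} \left( \frac{dr}{d\tau} \right)^2 + r^2 \left( \frac{d\phi}{d\tau} \right)^2$$
Put this into the last term in (14.55).

$$\frac{d^2 r}{d\tau^2} - \frac{B'}{2B} \left( \frac{dr}{d\tau} \right)^2 - rB \left( \frac{d\phi}{d\tau} \right)^2 + \frac{B'}{2} \left[ 1 + \frac{1}{B} \left( \frac{dr}{d\tau} \right)^2 + r^2 \left( \frac{d\phi}{d\tau} \right)^2 \right] = 0$$

The $(dr/d\tau)^2$ terms cancel.

$$\frac{d^2 r}{d\tau^2} - \frac{J^2}{r^3} = -\frac{GM}{r^2} - \frac{3GMJ^2}{r^4} \qquad (14.58)$$

where we have put in $B = 1 - 2GM/r$, $B'/2 = GM/r^2$, $d\phi/d\tau = J/r^2$.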

Effective Newtonian system


We need to solve (14.58) together with (14.53). The radial equation (14.58) is equivalent to
motion in a potential (in addition to the usual centrifugal term)

$$V(r) = -\frac{GM}{r} - \frac{GMJ^2}{r^3}$$

The ratio between the two terms is

$$\text{Ratio} = \frac{J^2}{r^2} = \left( r \frac{d\phi}{d\tau} \right)^2 \sim v^2 = O(\epsilon)$$

Thus we arrive at the following interpretation.

• Relativistic corrections lead to an extra effective potential that goes as $r^{-3}$.
• Thus, the inner planets will be affected more.

Solve effective Newtonian system


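If we drop the last term in (14.58), it would be exactly the same as the Newtonian case, so we
use exactly the same technique. Introduce

$$u = \frac{1}{r} \qquad (14.60)$$

and from (14.53)

$$\frac{d}{d\tau} = \frac{d\phi}{d\tau} \frac{d}{d\phi} = J u^2 \frac{d}{d\phi} \qquad (14.61)$$

Then putting (14.60) and (14.61) into (14.58), we have

$$\frac{d^2 u}{d\phi^2} + u = \frac{GM}{J^2} + 3GM u^2 \qquad (14.63)$$

The "extra" post-Newtonian term is the last one, $3GMu^2$.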

Newtonian limit
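Let us first ignore the last term in (14.63). Note that the $u$ term in (14.63) has a coefficient
exactly 1. Let

$$u(\phi) = \frac{GM}{J^2} + \Delta u(\phi)$$

Then $\Delta u$ satisfies $d^2 \Delta u / d\phi^2 + \Delta u = 0$, so

$$\Delta u(\phi) \propto \cos\phi$$

The $\sin\phi$ term can be ignored by adding a constant phase (i.e., redefine what we mean by
$\phi = 0$). Introducing a constant $e$ (which will turn out to be the eccentricity),

$$u(\phi) = \frac{GM}{J^2} (1 + e \cos\phi) \equiv u_0 (1 + e \cos\phi) \qquad (14.64)$$

This means $r$ is minimum whenever $\cos\phi = 1$, i.e., $\phi = 2n\pi$. The path is closed (Kepler's first
law), because $r(\phi) = r(\phi + 2\pi)$.
Correction
Again we assume that

$$u(\phi) = u_0 + \Delta u(\phi)$$

Planetary orbits in the solar system have small eccentricities, i.e., the changes in $u$ are small
compared to the mean value of $u$: $|\Delta u| \ll u_0$. Then

$$u^2 \approx u_0^2 + 2 u_0 \Delta u(\phi)$$
and (14.63) becomes

$$\left[ u_0 - \frac{GM}{J^2} - 3GM u_0^2 \right] + \left[ \frac{d^2 \Delta u}{d\phi^2} + (1 - 6GM u_0)\, \Delta u \right] = 0$$

We choose $u_0$ so that the first $[\ ]$ vanishes. Then

$$\frac{d^2 \Delta u}{d\phi^2} + (1 - 2\alpha)\, \Delta u = 0$$

where

$$\alpha = 3GM u_0 \approx \frac{3(GM)^2}{J^2}$$
Thus, following the same procedure as before,

$$\Delta u(\phi) = u_0\, e \cos(1 - \alpha)\phi$$

where we have used
$$(1 - 2\alpha)^{1/2} \approx 1 - \alpha$$
Hence

$$u(\phi) = u_0 \left[ 1 + e \cos(1 - \alpha)\phi \right]$$

If $\phi = 0$ is one case of minimum $r$, then the next occurrence is

$$\phi \approx (1 + \alpha)\, 2\pi$$
Thus the perihelion shifts forward in each revolution by

$$\delta\phi = 2\pi\alpha = \frac{6\pi (GM)^2}{J^2}$$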
Apply to Mercury

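Eliminate $GM$ by using the relations for a nearly circular orbit of speed $v$,

$$\frac{GM}{r} = v^2, \qquad J = rv$$

Hence

$$\delta\phi = 6\pi \frac{v^2}{c^2} \qquad (14.71)$$

This makes it very clear that it is a relativistic correction. Further, we express it in terms of
radius and period.

$$\delta\phi = \frac{6\pi}{c^2} \left( \frac{2\pi r}{T} \right)^2 = \frac{24 \pi^3 r^2}{c^2 T^2} \qquad (14.72)$$

Putting in numbers for Mercury

$$r = 5.8 \times 10^{10} \ \text{m}$$
$$T = 88 \ \text{days} = 7.6 \times 10^6 \ \text{s}$$
$$\delta\phi = 4.8 \times 10^{-7} \ \text{radians/revolution}$$

Express in another way, as arcseconds ($''$) per century:

$$\delta\phi = \left( 4.8 \times 10^{-7} \times \frac{180}{\pi} \times 3600'' \right) \times \frac{100 \times 365}{88} \ \text{cent}^{-1} \approx 41'' \ \text{cent}^{-1}$$

A more accurate calculation gives $43''$ cent$^{-1}$. (In the above, we have assumed a nearly circular
orbit, i.e., only to lowest order in the eccentricity $e$.)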
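The arithmetic for Mercury can be reproduced in a few lines. This sketch evaluates (14.72) with the radius and period quoted above; the value used for the speed of light and the function names are the only additions.

```python
import math

C = 2.998e8                  # speed of light in m/s (standard value)
R_MERCURY = 5.8e10           # orbit radius in m, as quoted in the text
T_MERCURY = 88 * 86400.0     # orbital period in s (88 days)

def advance_per_revolution(r, period, c=C):
    """Perihelion advance per revolution, 24 pi^3 r^2 / (c^2 T^2), in radians."""
    return 24.0 * math.pi ** 3 * r ** 2 / (c ** 2 * period ** 2)

def advance_per_century_arcsec(r, period_days, c=C):
    """Advance accumulated over 100 years of 365 days, in arcseconds."""
    per_rev = advance_per_revolution(r, period_days * 86400.0, c)
    revolutions = 100.0 * 365.0 / period_days
    return per_rev * (180.0 / math.pi) * 3600.0 * revolutions

if __name__ == "__main__":
    print("per revolution: %.2e rad" % advance_per_revolution(R_MERCURY, T_MERCURY))
    print("per century   : %.1f arcsec" % advance_per_century_arcsec(R_MERCURY, 88.0))
```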
Problem 5
The result (14.72) is written in a form that depends on both the orbit radius $r$ and the period
$T$. However, the two are related by Kepler's third law. Hence write the advance per revolution
$\delta\phi$ as

Find the exponent $n$, and give a numerical value for the prefactor $b$, if $r$ is expressed in AU.
Write a similar expression for the shift per century.
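The perihelion shift derived in this section can also be cross-checked by integrating the standard relativistic orbit equation $d^2u/d\phi^2 + u = GM/J^2 + 3GMu^2$ (with $u = 1/r$ and $c = 1$) numerically and locating two successive perihelia. The sketch below uses a hand-written fourth-order Runge-Kutta step; the parameter values ($GM = 1$, $J = 20$, $e = 0.1$) and the function name are illustrative assumptions chosen so that the correction is small but visible.

```python
import math

def perihelion_advance(GM=1.0, J=20.0, e=0.1, h=1e-3):
    """Integrate u'' + u = GM/J^2 + 3 GM u^2 from one perihelion to the next.

    Returns the angle travelled between successive perihelia minus 2 pi.
    Units with c = 1; the starting point u = (GM/J^2)(1 + e), u' = 0 is a
    maximum of u, i.e. a perihelion.
    """
    def rhs(u, w):
        # First-order system: u' = w, w' = GM/J^2 + 3 GM u^2 - u
        return w, GM / J ** 2 + 3.0 * GM * u * u - u

    u, w, phi = (GM / J ** 2) * (1.0 + e), 0.0, 0.0
    for _ in range(int(4.0 * math.pi / h)):
        k1u, k1w = rhs(u, w)
        k2u, k2w = rhs(u + 0.5 * h * k1u, w + 0.5 * h * k1w)
        k3u, k3w = rhs(u + 0.5 * h * k2u, w + 0.5 * h * k2w)
        k4u, k4w = rhs(u + h * k3u, w + h * k3w)
        u_new = u + h * (k1u + 2 * k2u + 2 * k3u + k4u) / 6.0
        w_new = w + h * (k1w + 2 * k2w + 2 * k3w + k4w) / 6.0
        if w > 0.0 and w_new <= 0.0:
            # u' changes sign from + to -: next maximum of u (perihelion).
            phi_peri = phi + h * w / (w - w_new)   # linear interpolation
            return phi_peri - 2.0 * math.pi
        u, w, phi = u_new, w_new, phi + h
    raise RuntimeError("no second perihelion found within two revolutions")

if __name__ == "__main__":
    numeric = perihelion_advance()
    first_order = 6.0 * math.pi * 1.0 ** 2 / 20.0 ** 2
    print("numerical advance  :", numeric)
    print("6 pi (GM)^2 / J^2  :", first_order)
```

The two numbers agree to within a few percent for these parameters; the residual difference comes from terms of higher order in $(GM/J)^2$, which the first-order formula drops.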

14.8 Conservation laws


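In classical mechanics, there is a well-known theorem: If $x^\alpha$ does not appear in the Lagrangian,
then $p_\alpha$ is conserved. (In this case there is no need to distinguish between upper and lower
indices.) Now we have an analogous theorem for the motion of a point particle in a metric $g_{\mu\nu}$.

If $g_{\mu\nu}$ is independent of $x^\alpha$, then $p_\alpha$ is conserved.

Note that we refer to $p_\alpha$ and not to $p^\alpha$ - in this case there is a difference.

To prove this, consider

$$\frac{dp_\alpha}{d\tau} = \frac{d}{d\tau} \left( g_{\alpha\beta}\, p^\beta \right) = g_{\alpha\beta} \frac{dp^\beta}{d\tau} + g_{\alpha\beta,\nu} \frac{dx^\nu}{d\tau}\, p^\beta$$

In the first term we use the geodesic equation (14.5) for $dp^\beta/d\tau$, and in the second term we
have used the chain rule. Now multiply throughout by $m$, and note that $g_{\alpha\beta}$ contracts with $g^{\beta\lambda}$
in $\Gamma$ to give $\delta^\lambda_\alpha$,

$$m \frac{dp_\alpha}{d\tau} = -\frac{1}{2} \left( g_{\alpha\mu,\nu} + g_{\alpha\nu,\mu} - g_{\mu\nu,\alpha} \right) p^\mu p^\nu + g_{\alpha\beta,\nu}\, p^\beta p^\nu$$

In the last term, change $\beta \to \mu$ and collect terms

$$m \frac{dp_\alpha}{d\tau} = \left( -\frac{1}{2} g_{\alpha\mu,\nu} - \frac{1}{2} g_{\alpha\nu,\mu} + \frac{1}{2} g_{\mu\nu,\alpha} + g_{\alpha\mu,\nu} \right) p^\mu p^\nu$$

The first, second and fourth terms cancel, because they multiply the symmetric $p^\mu p^\nu$. Hence
we are left with

$$m \frac{dp_\alpha}{d\tau} = \frac{1}{2}\, g_{\mu\nu,\alpha}\, p^\mu p^\nu$$

So if $g_{\mu\nu}$ is independent of $x^\alpha$, then $p_\alpha$ is constant.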


Incidentally, if you had followed Section 14.2, then a simpler derivation follows directly
from (14.16).
Problem 6
In fact, several results obtained previously are special cases of this general theorem.
(a) If the metric is independent of $\phi$, then $p_\phi$ is conserved. Show that this is just the angular
momentum.
(b) If the metric is independent of $t$, then $p_t$ is conserved. Use this result to derive the
gravitational redshift.
(c) The metric for a star is independent of $\phi$. Relate the conservation of $p_\phi$ to the constant of
motion $J$.
(d) The metric for a star is also independent of $t$. Relate the conservation of $p_t$ to the constant
of motion $K$.

Ch14-4.tex; December 22, 2000


15 Black holes
15.1 Introduction
In this Chapter, we describe the spacetime around a black hole (BH), and briefly discuss the
motion of point particles near a BH.

Heuristic argument
A BH is a region of space where gravity is so strong that nothing can escape, not even light -
hence it is "black".
Consider a mass $M$ at the origin. A test mass $m$ at a radius $r$ can escape only if
KE > |PE|, i.e.,
$$\frac{1}{2} m v^2 > \frac{GMm}{r}$$
But since $v \le c$, a necessary condition is $r > R_s$, where

$$R_s = \frac{2GM}{c^2}$$

is the critical radius. Anything closer than $R_s$ cannot escape; a particle farther than
$R_s$ could escape if its velocity is high enough.
• The above is a mixture of Newtonian concepts (e.g., $(1/2)mv^2$) and relativistic concepts
(e.g., $v \le c$), which is not really legitimate. It is however acceptable for an
order-of-magnitude estimate.
• The form of the potential assumes a point mass $M$. If the mass $M$ (say a star) has
a physical radius $R$ larger than $R_s$, the BH condition $r < R_s$ is never satisfied in the
exterior of the star. But if the physical radius is less than $R_s$, then the BH condition can
be satisfied in the exterior.
Thus we conclude: If a star of mass $M$ collapses to a radius $R < R_s = 2GM/c^2$, then it
becomes a BH.

Example
Consider a star with the mass of the Sun: $M = M_{\rm sun} = 2 \times 10^{30}$ kg. Then

$$R_s = \frac{2GM}{c^2} \approx 3 \ \text{km}$$

Thus a star has to collapse to a very small size to become a BH.
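The number in the Example is easy to reproduce. The sketch below evaluates $2GM/c^2$ for a solar mass; the values of $G$ and $c$ are standard constants supplied here, and the function name is illustrative.

```python
# Critical (Schwarzschild) radius 2 G M / c^2.
G_NEWTON = 6.674e-11   # m^3 kg^-1 s^-2 (standard value)
C = 2.998e8            # m/s (standard value)
M_SUN = 2.0e30         # kg, as in the Example

def critical_radius(mass_kg, G=G_NEWTON, c=C):
    """Return 2 G M / c^2 in metres."""
    return 2.0 * G * mass_kg / (c * c)

if __name__ == "__main__":
    print("critical radius for a solar mass: %.2f km" % (critical_radius(M_SUN) / 1e3))
```

Since the radius is proportional to the mass, a star of ten solar masses would have to shrink to about 30 km.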


15.2 Formation of black hole
Next we give a qualitative and brief discussion on two questions.
• What determines the size of a star?
• How is a black hole formed?

Figure 1

The size of a star is determined by the balance between two forces (Figure 1): the force
of gravity pulling in, and the pressure force pushing out.

Normal star
In a normal star such as the Sun, the pressure is due to the hot gas:
$$P = \frac{NkT}{V} \propto T$$
When the nuclear fuel is exhausted, $T$ decreases and so does $P$; the pressure therefore cannot support the force of
gravity, and the star starts to collapse.

Degeneracy pressure

Figure 2

So let us go to the limit of $T = 0$. Classically, $P \propto T = 0$. But in quantum mechanics
that is not the case. This is illustrated in Figure 2. If particles are confined to a box of length
$L$, the corresponding wavefunctions must exactly fit into $L$, with

$$n \frac{\lambda}{2} = L$$
and the corresponding momentum is

$$p = \frac{h}{\lambda} = \frac{nh}{2L}$$

We can only put 2 particles (electrons, protons or neutrons) into each energy level (Pauli
exclusion principle). Thus, if there are many particles, they have to fit into highly excited
levels (large $n$) with large momenta $p$. Even at absolute zero, the particles cannot be at rest.
This motion gives rise to degeneracy pressure.
To derive the degeneracy pressure $P$, start with the formula in kinetic theory

$$P = \frac{1}{3} n m v^2$$

It is better to write this as

$$P = \frac{1}{3} n p v \qquad (15.5)$$

which is valid even relativistically. Recall that the momentum delivered per collision with the
wall is $2p$, and the frequency of collisions is $1/(2L/v) \propto v$, accounting for the two factors.
To estimate the typical momentum $p$ of the particles, we note that in 1D, each state
occupies

$$\Delta x\, \Delta p \sim h$$

So similarly in 3D, each state occupies

$$V\, V_p \sim h^3$$

where $V$ is the volume in space and $V_p$ is the volume in momentum space. Thus $N$ particles
would occupy $V$ and $V_p$, where

$$V\, V_p \sim \frac{N}{2} h^3$$

and the factor of 2 accounts for spin. Assume all states in momentum space up to $p_{\max}$ are
occupied, i.e., $V_p$ is a sphere of radius $p_{\max}$, then

$$V \cdot \frac{4\pi}{3} p_{\max}^3 \sim \frac{N}{2} h^3$$

where the typical momentum is $p \sim p_{\max}$ (say up to a factor of 2). Thus the typical momentum
is

$$p \sim h\, n^{1/3}$$
where $n = N/V$ is the number density. This formula clearly shows that the effect is quantum-
mechanical.

Nonrelativistic case
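We use (15.5) and convert $pv \to p^2/m$:

$$P \sim \frac{n p^2}{m} \sim \frac{h^2}{m}\, n^{5/3}$$

Thus

$$P \propto R^{-5}$$

since the number density goes as $n \propto V^{-1} \propto R^{-3}$ for a sphere of radius $R$.

Ultra-relativistic case
We again use (15.5) and convert $pv \to pc$ for the ultra-relativistic case.

$$P \sim n p c \sim hc\, n^{4/3}$$

Thus

$$P \propto R^{-4}$$

since $n \propto V^{-1} \propto R^{-3}$ for a sphere of radius $R$.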


Balance of forces
The pressure force must balance against the force of compression generated by gravity, which
can be estimated as

$$P \sim \frac{\text{force}}{\text{area}} \sim \frac{GM^2/R^2}{4\pi R^2} \propto GM^2 R^{-4} \qquad (15.11)$$
In the nonrelativistic case, setting the degeneracy pressure $CR^{-5}$ equal to (15.11) gives

However, if the star is very heavy, $R$ becomes very small. The spatial volume $V$ is small, so the
particles must occupy a large volume $V_p$ in momentum space; hence they must move at high
speeds. Eventually, we have to go over to the ultra-relativistic case. Now setting the degeneracy
pressure $CR^{-4}$ (the value of $C$ is different) equal to (15.11) gives the limit

Note that $R$ disappears (both sides go as $R^{-4}$), and we get a constraint, representing the
maximum $M$. In other words, the ultra-relativistic case ($v \to c$) provides an upper limit, called
the Chandrasekhar limit $M_c$.
Thus to summarize
• Even at absolute zero, there is a pressure that can support gravity.
• But this does not work if $M > M_c$. The star has to further collapse.

White dwarfs
So when a star exhausts its nuclear fuel, it becomes supported by the degeneracy pressure of
the electrons. This stage is called a white dwarf. If the mass is too large, the white dwarf is
unstable, and the star further collapses.

Neutron stars
As the star further collapses, the electrons get "squeezed" back into the protons, forming
neutrons - in effect the inverse of $\beta$-decay:
$$e^- + p \to n + \nu$$
and a neutrino escapes. One then has a collection of neutrons only. The star is then supported
by the degeneracy pressure of neutrons, and is very dense. This stage is called a neutron star.
If the mass is so large that it exceeds the Chandrasekhar limit for neutron degeneracy
pressure, then nothing further can support the force of gravity, and the star collapses to a radius
less than $R_s$. Then a BH is formed. We shall see later that once the star collapses to less than
$R_s$, the collapse cannot stop, and it has to collapse all the way to a point.

15.3 Questions about black holes


We can ask various questions about BHs.
• How can gravity be so strong that it manages to overcome the pressure and cause a star
to collapse to $< R_s$?
• What happens during the collapse?
• Once the BH is formed, what are its properties?
• What is the observational evidence for BHs?

In the rest of this Chapter, we shall deal with only the third question. We assume a
star has been formed that is smaller than $R_s$, and try to understand the spacetime and the
motion of particles near it. This question can be approached within the framework of a static
spacetime, i.e., we assume that the BH is formed and is no longer changing, and likewise the
spacetime around it is not changing.

15.4 Coordinates around a black hole


Recall the Schwarzschild metric external to a mass $M$

$$ds^2 = -B\, dt^2 + B^{-1} dr^2, \qquad B = 1 - \frac{R_s}{r} \qquad (15.14)$$

where $R_s = 2GM$, $c = 1$, and the angular terms are omitted because we shall consider only
radial motion, so that $d\theta = d\phi = 0$.
Clearly something funny happens at $R_s$, called the Schwarzschild radius.
The example earlier shows that the Schwarzschild radius is 3 km if $M = M_{\rm sun}$. A BH is
formed only if the physical radius is smaller than the Schwarzschild radius; otherwise (15.14)
would not be valid at $R_s$, which would be inside the star.

What is time?
Let us now examine the coordinates near the BH. It is useful to go back to the case of a flat
Minkowski space; ignore y and z.

    ds^2 = -dt^2 + dx^2        (15.15)

We recognize t as the "time" not because it is called "t" - physics cannot depend on which
letter of the alphabet we use. If we write (15.15) as ds^2 = -dξ^2 + dη^2, we would recognize ξ as
the time. We can make several related remarks.
• A change in t (dt ≠ 0) leads to ds^2 < 0; this characterizes time-like displacements.
• Particle trajectories are time-like. This can be seen because

    ds^2 = -dt^2 + dx^2 = -(1 - v^2)\,dt^2 < 0 , \qquad v = |dx/dt| < 1

• The time coordinate has only one sign, i.e., time "flows" only into the future and not into
the past. Thus, the last equation can be expressed as

for a particle trajectory.


Let us apply this chain of reasoning to the Schwarzschild metric.

• For r > R_0, B > 0, so we recognize t as the time coordinate; this is the "usual" situation.
• For r < R_0, B < 0, so we have to call r the time coordinate, i.e., changes in r lead to
ds^2 < 0.

Thus the roles of t and r are reversed for r < R_0. The coordinate r is time. It can
flow only one way, i.e., particle trajectories can only go along dr < 0. (We shall later give a
heuristic argument why it is not the reverse. See Figure 5 below and the arguments immediately
following it.) The situation is illustrated in the spacetime diagram in Figure 3.

Figure 3

Light cone
Next consider the light cone, defined by

    ds^2 = -B\,dt^2 + B^{-1}\,dr^2 = 0

giving

    dt/dr = \pm B^{-1} = \pm (1 - R_0/r)^{-1}

Problem 1
On a t-r plane (draw t vertical), sketch the light cone at r/R_0 = ∞, 2, 1.1, 0.9, 0.5, 0.2, paying
attention to the angle it makes with the vertical. Which side of the light cone is time-like?
Determine this by checking the sign of ds2 in each region. One case is shown for you in Figure
4.

Figure 4

Particle trajectory

Figure 5

Particle trajectories have to point in a time-like direction. Therefore the situation must
be qualitatively as shown in Figure 5.
• Trajectories are nearly "vertical" (i.e., in the t direction) near R_0.
• Trajectories point "inwards" (i.e., in the r direction, with dr < 0) near r = 0.

Thus, once inside r = R_0 = 2GM, r becomes the "time" coordinate and can only
decrease. A particle cannot come back out from r < R_0 - thus, it is a black hole.
The surface r = R_0 which divides the "inside" from the "outside" is called the event
horizon.
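
The tilting of the light cone can be made concrete with a short numerical sketch. It assumes the radial metric ds^2 = -B dt^2 + dr^2/B with B = 1 - R_0/r, so that the light cone satisfies |dr/dt| = |B|. The function name `light_cone` and the sample radii are illustrative choices, not part of the notes.

```python
import math

def light_cone(r_over_R0):
    """Light-cone data at radius r (in units of R_0) for ds^2 = -B dt^2 + dr^2/B.

    Returns (B, angle_deg, time_coordinate), where angle_deg is the angle the
    light cone makes with the vertical (t) axis, tan(angle) = |dr/dt| = |B|.
    """
    B = 1.0 if math.isinf(r_over_R0) else 1.0 - 1.0 / r_over_R0
    angle_deg = math.degrees(math.atan(abs(B)))
    # The coordinate whose change alone gives ds^2 < 0 is the time-like one.
    time_coordinate = "t" if B > 0 else "r"
    return B, angle_deg, time_coordinate

for x in (math.inf, 2.0, 1.1, 0.9, 0.5, 0.2):
    B, angle, tc = light_cone(x)
    print(f"r/R_0 = {x:>4}: B = {B:+.3f}, angle = {angle:5.1f} deg, "
          f"time-like coordinate: {tc}")
```

The angle shrinks towards zero on both sides of r = R_0 (the cone closes up around the t direction), and inside R_0 the time-like side of the cone is the one containing the r direction.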

15.5 Infinite redshift


We show that a photon climbing out of a BH, starting from just outside R_0, will suffer an
infinite redshift when detected at spatial infinity. It is easier to derive the reverse - that a
photon falling in will suffer infinite blueshift when it arrives just outside R_0.
We start by noting that the metric is t-independent, so from the last Chapter, p_t is a
constant:

    p_t = \text{constant} \equiv -\tilde{E}        (15.19)

We pay attention to p^t and the physical energy E:

    p^t = g^{tt} p_t = \tilde{E}/B

The physical energy E is essentially the geometric average of the two:

    E = (-p_t\, p^t)^{1/2} = \tilde{E}\, B^{-1/2}

The above equation for r → ∞ shows that \tilde{E} is just the energy at infinity: \tilde{E} = E(∞).
Thus

    \omega(r) = \omega(\infty)\,(1 - R_0/r)^{-1/2}

since E = ħω. So

    \omega(r) \to \infty

when r → R_0. In other words, when the photon reaches R_0, there is an infinite blueshift.
Reversing the argument, if a photon climbs out of the BH starting from R_0, there is an infinite
redshift.
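We can also give a heuristic argument that leads to "nearly" the same result.

    \hbar\omega(\infty) = \text{energy at } \infty
                        = \text{energy at } r - \text{work done}
                        = \hbar\omega - m\,\Delta\Phi
                        = \hbar\omega - \frac{\hbar\omega}{c^2}\,\frac{GM}{r}
                        = \hbar\omega \left( 1 - \frac{GM}{c^2 r} \right)

Using units where c = 1, we have

    \omega(\infty) = \omega\,(1 - GM/r)              Heuristic derivation
    \omega(\infty) = \omega\,(1 - 2GM/r)^{1/2}       Actual result

The two agree to first order in GM.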
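
As a quick numerical check of the statement that the two results agree to first order in GM, the sketch below evaluates both formulas at several radii, in units where c = 1 and GM = 1. The function names are illustrative, not from the notes.

```python
import math

def omega_inf_heuristic(omega, r, GM=1.0):
    """Heuristic result: omega(inf) = omega (1 - GM/r), with c = 1."""
    return omega * (1.0 - GM / r)

def omega_inf_actual(omega, r, GM=1.0):
    """Actual result: omega(inf) = omega (1 - 2GM/r)^(1/2), valid for r >= 2GM."""
    return omega * math.sqrt(1.0 - 2.0 * GM / r)

for r in (1000.0, 100.0, 10.0, 3.0, 2.0):
    h = omega_inf_heuristic(1.0, r)
    a = omega_inf_actual(1.0, r)
    print(f"r/GM = {r:7.1f}: heuristic = {h:.6f}, actual = {a:.6f}, "
          f"difference = {h - a:.2e}")
```

Far from the BH the difference is of second order in GM/r. At r = 2GM the actual result gives zero frequency at infinity (infinite redshift), while the heuristic formula does not.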

Coordinate singularity
The above result suggests that there is some sort of singularity at r = R_0. This is apparent
from the metric as well: g_tt = 0 and g_rr = ∞ at r = R_0. Is this a physical problem, or is it
just a problem with the coordinate system? The former is a problem with the geometry of the
surface (e.g., a kink, corner, edge); the latter would only be a problem with the grid we draw
on the surface.
As an example, consider the surface of a sphere of radius a. This is obviously a perfectly
smooth surface, with no singularities anywhere. Now in polar coordinates

    ds^2 = a^2 (d\theta^2 + \sin^2\theta\, d\phi^2)

so at the poles (θ = 0, π), we have g_φφ = 0. But there is nothing special about the poles. It is
only a coordinate problem - the polar coordinates become degenerate at the poles. It is easy
to find another set of coordinates which is regular there, e.g., a set of polar coordinates with
the poles defined in some other place.
As a second example, consider a plane:

ds2 = dx2 + dy2


Now transform to

so that

Although we have g_XX = 0 and g_YY = ∞ at various places (e.g., the origin in the X-Y plane),
the surface is actually perfectly smooth. The only problem is with the coordinates.
In just the same way, the problem at r = R_0 is only a singularity of the coordinate
system, not of spacetime itself. If this is the case, then one should be able to transform to
a new set of coordinates in which there is no singularity - called the Kruskal coordinates.
We shall not write out these coordinates. But we emphasize that the transformation between
regular coordinates (Kruskal) and singular coordinates (Schwarzschild) must itself be singular
- because the regular transform of regular coordinates must be regular.
There are two other ways of seeing whether there is a real problem at r = R_0.
• Calculate the physical components of the curvature tensor. We shall find that they are finite
at r = R_0. (So long as they are finite, there would be no problem.)
• Calculate the motion of a particle into a BH, and show that nothing singular happens at
r = R_0.

The former is a rather tedious calculation and we shall omit it. The latter will be done in the
next Section, just to prove this point.

15.6 Motion into a black hole
For simplicity, consider only radial motion, so that dθ = dφ = 0. Because the metric is
independent of t, we have

    p_t = \text{constant} \equiv -\tilde{E}

Thus

    \frac{dt}{d\tau} = \frac{p^t}{m} = \frac{\tilde{E}}{mB}

We shall only be interested in the qualitative behavior as r → R_0. So let

    r = R_0 (1 + \epsilon) , \qquad \epsilon \ll 1

Then the leading behavior is

    \frac{dt}{d\tau} \sim \frac{1}{\epsilon}

This means that there is infinite time dilation as the particle approaches the event horizon.
Each unit of proper time dτ becomes an infinite amount of time dt according to a distant
observer. (We know that t is the time measured by a distant observer, because as r → ∞, the
metric becomes the standard flat Minkowski space, where t is obviously the time.)
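Now from the metric

    -B = -B^2 \left( \frac{dt}{d\tau} \right)^2 + \left( \frac{dr}{d\tau} \right)^2
       = -\frac{\tilde{E}^2}{m^2} + \left( \frac{dr}{d\tau} \right)^2

The term on the left is negligible, and we have

    \left| \frac{dr}{d\tau} \right| \approx \frac{\tilde{E}}{m} \sim 1

In other words, it takes only a finite proper time τ to fall through the event horizon.
From dr/dτ ∼ 1 and dt/dτ ∼ 1/ε we get

    \left| \frac{dt}{dr} \right| \sim \frac{1}{\epsilon} , \qquad t \sim -\ln \epsilon

Thus ε → 0 ⇒ t → ∞, i.e., it takes an infinite amount of t to fall through the event horizon.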
To summarize, the process of falling through the event horizon takes finite time according
to a comoving observer, but to an observer at infinity, it seems to take forever - the motion
is "slowed down". In fact, this is the same effect as the redshift - what takes one period of a
photon still appears as one period to the distant observer, but the period is infinitely redshifted.
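
The contrast between finite proper time and infinite coordinate time can be illustrated numerically. The sketch below makes a specific assumption not stated in the notes: the particle falls radially from rest at infinity, so that dr/dτ = -(R_0/r)^(1/2) and dt/dτ = 1/B. Units are R_0 = 1, c = 1, and the function names are illustrative.

```python
import math

def proper_time(r_start, r_end):
    """Proper time to fall from r_start to r_end (units R_0 = 1, c = 1),
    using dr/dtau = -sqrt(1/r) for a particle released from rest at infinity."""
    return (2.0 / 3.0) * (r_start ** 1.5 - r_end ** 1.5)

def coordinate_time(r_start, eps, steps=20000):
    """Coordinate time t to fall from r_start to r = 1 + eps.

    Integrates dt/dr = -r^(3/2)/(r - 1) by the midpoint rule in the variable
    u = ln(r - 1), in which the integrand (1 + e^u)^(3/2) is smooth.
    """
    u_lo, u_hi = math.log(eps), math.log(r_start - 1.0)
    du = (u_hi - u_lo) / steps
    total = 0.0
    for i in range(steps):
        u = u_lo + (i + 0.5) * du
        total += (1.0 + math.exp(u)) ** 1.5 * du
    return total

r_start = 2.0
for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    tau = proper_time(r_start, 1.0 + eps)
    t = coordinate_time(r_start, eps)
    print(f"eps = {eps:.0e}: proper time = {tau:.4f}, coordinate time = {t:.4f}")
```

The proper time converges to a finite value as ε → 0, while the coordinate time grows by roughly ln 100 each time ε is reduced by a factor of 100, i.e., t ∼ -ln ε.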

Physical singularity at the origin


Although the event horizon at r = R_0 is not a real singularity, the origin r = 0 is a real
singularity. We make this argument in three steps.
• Consider a single particle. Once it falls inside the event horizon, it has to keep falling
inwards, because we have already shown that dr can have only one sign (it is like a "time",
in other words the future light cone points towards dr < 0). Thus, the particle has to end
up at r = 0. Note however that we can never see this motion from the outside, because
just reaching r = R_0 already takes t = ∞.
• Next consider a spherical star that is collapsing, and assume it has already collapsed
inside the event horizon (Figure 6). Look at a tiny part of the star near the surface. It
is just like one single particle; and by the previous argument, must keep falling to r = 0.
This argument applies to every part of the surface of the star.
• In general, one can show rigorously that even if the star is not spherically symmetric, the
above result still holds, and the whole star collapses to r = 0.

Figure 6

So ultimately all matter falls to r = 0. Thus we have infinite density and a real singularity.
This is the BH singularity.

Naked singularity?
On the one hand, we know there must be singularities in general relativity (the spherically
symmetric BH is one example). On the other hand we do not like singularities in physics -
we believe all physical quantities should be finite. In this example there is an escape: the
singularity exists but cannot be seen from the outside. We say that the singularity is "clothed" and
not "naked". There is a question whether general relativity allows naked singularities. The
question is not completely settled.

Black holes have no hair
The star inside the event horizon may be very complicated. But we cannot see any of the
complications from the outside. How much can we tell from the outside? It has been shown that
we can tell only three properties - the mass M , the charge Q and the angular momentum J.
This is called the property that "black holes have no hair", i.e., no complications or "extraneous"
properties.

Chl5-1.tex; January 22, 2001


16 Mathematics of Curved Space IV: Curvature
16.1 Introduction
Extrinsic and intrinsic curvature
We speak about the curvature of a space, or spacetime, or a manifold. What exactly do we
mean?
There are two ways to define curvature (Figure 1): the extrinsic curvature seen by an
observer outside the space, and the intrinsic curvature seen by an observer confined to the
space. The former depends on the embedding; the latter does not. Since the extra dimensions
of the embedding are a mathematical artefact, physics can only depend on the intrinsic curvature.
So how can we define curvature intrinsically, using quantities measured on the space itself?

Figure 1

Metric?
Can we use the metric g_μν? Can we say that a space is flat if g_μν = δ_μν (if it is locally Euclidean)
or g_μν = η_μν (if it is locally Minkowski), or more generally if g_μν is constant? No! Consider
polar coordinates in two dimensions:

    ds^2 = dr^2 + r^2\, d\phi^2        (16.1)

In this case, g_μν is not constant, yet the space is flat.


The metric is no good: it mixes the properties of the space with the properties of the
coordinate system - and the latter is quite arbitrary.
Sum of angles
Another definition is intuitive and well known: the space is flat if the sum of the interior angles
of any triangle is exactly π.

Figure 2
As an example, consider a triangle on the surface of the earth (Figure 2). Let

    P = the north pole
    M = 0 deg E on the equator
    N = 90 deg E on the equator

Each of ∠P, ∠M and ∠N is π/2. So the sum of interior angles is 3π/2. We conclude that the
surface of the earth is not flat - which is correct.
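
The angle sum for this triangle can be checked directly. The sketch below is an illustration only: it places the three vertices on a unit sphere embedded in three dimensions (the embedding is just a computational convenience) and measures each interior angle between the great-circle arcs meeting at a vertex. The helper names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def interior_angle(vertex, p, q):
    """Angle at `vertex` between the great-circle arcs towards p and q.
    All three points are unit vectors from the centre of the sphere."""
    def tangent(target):
        # Remove the component along `vertex` to get the arc direction at the vertex.
        d = dot(target, vertex)
        t = [target[i] - d * vertex[i] for i in range(3)]
        n = math.sqrt(dot(t, t))
        return [c / n for c in t]
    c = dot(tangent(p), tangent(q))
    return math.acos(max(-1.0, min(1.0, c)))

P = (0.0, 0.0, 1.0)   # north pole
M = (1.0, 0.0, 0.0)   # 0 deg E on the equator
N = (0.0, 1.0, 0.0)   # 90 deg E on the equator

angles = [interior_angle(P, M, N), interior_angle(M, N, P), interior_angle(N, P, M)]
print([a / math.pi for a in angles], sum(angles) / math.pi)
```

Each angle comes out as π/2, so the sum is 3π/2 rather than π.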

Exterior angles
Equivalently, consider the exterior angles. Refer to Figure 3. (All such diagrams project
a possibly curved surface onto a flat piece of paper, so do not rely on "normal" geometric
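intuition.) In obvious notation,

Figure 3

    \text{Ext}\,\angle P = \pi - \angle P
    \text{Ext}\,\angle M = \pi - \angle M
    \text{Ext}\,\angle N = \pi - \angle N
    \sum \text{Ext}\,\angle = 3\pi - \sum \text{Int}\,\angle        (16.2)

Thus, the sum of interior angles is π if and only if the sum of exterior angles is 2π.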
Generalize to polygon
We generalize this idea to a polygon of N sides (Figure 4): the space is flat if and only if the
sum of exterior angles α_i is 2π.

Figure 4

This idea can be given another interpretation. Imagine that we march along the polygon.
We turn by α_i at each vertex, for a total of Σ_i α_i. Do we turn exactly 2π? In Figure 2, the
answer is no - we turn a total of 3π/2 in going round the triangle MNP.
We can further generalize. By letting N → ∞, the path can become any closed curve.
This qualitative discussion leads us to the key idea of parallel transport of vectors.

16.2 Parallel transport of vectors


Introduction
Again consider the surface of the earth, and walk around triangle MNP. But at the same time,
carry a rod represented by a vector A, and keep its direction fixed - this is called parallel
transport. The direction of A provides a fixed reference (because it is not changing) against
which we measure the direction of walking.

Figure 5

Refer to Figure 5. Initially, at the point M, the rod A and the side s_1 make an angle
∠(A, s_1). At the point N, the side turns from s_1 to s_2, through an angle α_1. Then the rod
makes an angle ∠(A, s_2) given by

    \angle(A, s_2) = \angle(A, s_1) + \alpha_1

Going around the triangle, we have

    \angle(A, s_1) = \angle(A, s_1) + \sum_i \alpha_i        (16.4)

To be more precise, let us denote the final direction as A', so (16.4) should more properly be
written as

    \angle(A', s_1) = \angle(A, s_1) + \sum_i \alpha_i

Thus A returns to the same direction (i.e., A' = A) if and only if Σ_i α_i = 2π. So the condition
for flatness is whether parallel transport around a closed loop brings a vector back to itself.

Example
Return to the example in Figure 2. The situation is illustrated in Figure 6.

Figure 6 (panels: Step 1, Step 2, Step 3)

• Start at M with the vector A pointing north.
• March along the equator to N. The rod A remains always at 90 deg to the path. When
we get to N, the rod makes 90 deg with the path (the equator), so it is pointing north,
towards the pole.
• Turn to walk along NP. The rod started pointing north (towards P) and stays pointing
north.
• At P, turn to walk along PM. The rod was along NP, which is now pointing west. As
we walk along, the rod stays pointing west. When we get back to M, the rod points west.
Thus we have (Figure 7)

    A  = north   (start)
    A' = west    (end)

Figure 7

A space (manifold) is flat if and only if the parallel transport of every vector around
every closed loop gives back the original vector A, i.e., if and only if

    A' = A

Figure 8
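
The failure of a vector to return to itself can also be seen numerically. The following is a minimal sketch under stated assumptions: it uses the unit sphere with polar coordinates (θ, φ), the standard Christoffel symbols Γ^θ_φφ = -sin θ cos θ and Γ^φ_θφ = cot θ, and the parallel-transport rule dA^μ = -Γ^μ_νρ A^ν dx^ρ. To avoid the pole, where polar coordinates degenerate, the loop is a circle of latitude θ = θ_0 rather than the triangle MNP. The function name is hypothetical.

```python
import math

def transport_around_latitude(theta0, A_theta, A_phi, steps=20000):
    """Parallel-transport (A^theta, A^phi) once around the circle theta = theta0
    on the unit sphere, integrating dA^mu = -Gamma^mu_{nu rho} A^nu dx^rho
    with a fourth-order Runge-Kutta scheme in phi."""
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(a_th, a_ph):
        # Only dx^phi is nonzero along the circle.
        # Gamma^theta_{phi phi} = -sin cos,  Gamma^phi_{theta phi} = cot.
        return (s * c * a_ph, -(c / s) * a_th)

    h = 2.0 * math.pi / steps
    for _ in range(steps):
        k1 = rhs(A_theta, A_phi)
        k2 = rhs(A_theta + 0.5 * h * k1[0], A_phi + 0.5 * h * k1[1])
        k3 = rhs(A_theta + 0.5 * h * k2[0], A_phi + 0.5 * h * k2[1])
        k4 = rhs(A_theta + h * k3[0], A_phi + h * k3[1])
        A_theta += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        A_phi += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return A_theta, A_phi

for theta0 in (math.pi / 2, math.pi / 3):
    a_th, a_ph = transport_around_latitude(theta0, 1.0, 0.0)
    print(f"theta0 = {theta0:.4f}: A' = ({a_th:+.6f}, {a_ph:+.6f})")
```

Around the equator (θ_0 = π/2, a great circle) the vector returns unchanged. Around θ_0 = π/3 it comes back rotated by 2π cos θ_0 = π, i.e., reversed, so A' ≠ A and the sphere is not flat.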

The difference vector

The two vectors A and A' are defined at the same point; so their difference is also a vector. (The
difference between two vectors at different points is not a vector.) More precisely, because both
vectors refer to the same basis vectors, doing vector subtraction is the same as subtracting
components:

    \Delta A^\mu = A'^\mu - A^\mu

and the question is whether this is zero. So, instead of following the vector A around a closed loop, we
follow its component A^μ and compute the total change.
Reduce to small loops
As with the usual proof of Stokes' theorem, the above computation can be reduced to checking
small (i.e., infinitesimal) loops, as indicated schematically in Figure 9.

Figure 9

We therefore consider a small "rectangle", whose vertices are

    A = x , \quad B = x + \xi , \quad C = x + \xi + \eta , \quad D = x + \eta

where ξ, η are infinitesimal displacements, hence vectors (Figure 10). We now take a vector
around the loop: A → B → C → D → A, and consider ΔA^μ. This quantity must be
proportional to A itself, and to the sides ξ and η:

    \Delta A^\mu \propto A^\nu \xi^\rho \eta^\sigma        (16.8)

All the indices are different and independent because, e.g., ΔA^1 may depend on A^2 etc.

Figure 10
16.3 Riemann curvature tensor - calculation
Definition
We are led to the definition of the Riemann curvature tensor, which carries 4 indices:

    \Delta A^\mu = R^\mu{}_{\nu\rho\sigma}\, A^\nu \xi^\rho \eta^\sigma        (16.9)

Tensor character
We have already stated that ΔA is a vector (because it is the difference between two vectors at
the same point). On the right hand side of (16.9), A is a vector, and so are the infinitesimal
displacements ξ and η. Therefore by the contraction theorem, R^μ_νρσ is a tensor, more specifically
a (1,3) tensor.
Crucially, this implies that under a coordinate transformation, R^μ_νρσ transforms linearly
- in particular, if it is zero in one coordinate system, it is zero in all coordinate systems. So
whether the curvature tensor is zero is an objective property independent of the coordinate
system; it is a property of the space itself. Contrast with the property g_μν = const; this is not
an objective property of the space itself.

The indices
We note several properties about the indices.
• The last two indices ρ and σ refer to the displacements. Therefore they relate to the
underlying manifold and run over 0,1,2,3 in the case of spacetime.
• The first two indices μ and ν refer to the vector being transported. In this case, the vector
A is a tangent vector, similar to the displacements, so the indices again run over 0,1,2,3
in the case of spacetime. In other applications, we can consider the parallel transport
of other vectors, e.g., a complex number (a quantum-mechanical wavefunction), which
would have only two directions (1 = real, 2 = imaginary). In such cases, the range of
the μ, ν indices could be different.
• Because μ and ν are the same type of index, it is often convenient to put them both
"together" as subscripts:

    R_{\mu\nu\rho\sigma} = g_{\mu\lambda}\, R^\lambda{}_{\nu\rho\sigma}        (16.10)

Evaluate in terms of metric
Remember that we need to calculate the total change of A^μ around a closed loop - we deal
with just one component and not the vector. The total change is the sum of the changes in
each small step:

    \Delta A^\mu = \oint dA^\mu        (16.11)

But we know that under parallel transport,

    0 = (DA)^\mu = dA^\mu + \Gamma^\mu_{\nu\rho}\, A^\nu dx^\rho        (16.12)


Putting this into (16.11), we have

    \Delta A^\mu = -\oint \Gamma^\mu_{\nu\rho}\, A^\nu dx^\rho = -\oint C_\rho\, dx^\rho

where in C_ρ we have suppressed the index μ. Now, similar to the usual derivation of Stokes'
theorem (see Appendix), the line integral around the loop in Figure 10 can be changed to a
surface integral, namely

But since

    C_\rho = \Gamma^\mu_{\nu\rho}\, A^\nu        (16.15)
we find

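This expression contains the derivative of A^ν, which we eliminate via (16.12), now written
as

    0 = (DA)^\nu = (A^\nu{}_{,\sigma} + \Gamma^\nu_{\lambda\sigma}\, A^\lambda)\, dx^\sigma

giving

    A^\nu{}_{,\sigma} = -\Gamma^\nu_{\lambda\sigma}\, A^\lambda

When this is put into (16.17), we find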

where in the last step we have interchanged the dummy indices λ ↔ ν in the second term.
Finally, comparing (16.19) with the definition of the curvature tensor in (16.9), we find

which is the crucial relation for calculating the curvature.


Discussion
Recall that the Christoffel symbol is obtained by differentiating the metric; schematically

    \Gamma \sim \partial g        (16.21)

and the curvature tensor is obtained by another differentiation as well as a nonlinear term

    R \sim \partial\Gamma + \Gamma\Gamma \sim \partial\partial g + \text{nonlinear}

The important property - to be proved later - is that the space is flat if and only if
the curvature tensor is identically zero.

16.4 Riemann curvature tensor - properties


We now summarize some important properties of R^μ_νρσ.

Tensor character
We have already stated that from (16.9), R^μ_νρσ is a tensor. Thus, if it vanishes in one coordinate
system, it vanishes in every coordinate system.

Antisymmetry in last two indices

The curvature tensor satisfies

    R^\mu{}_{\nu\rho\sigma} = -R^\mu{}_{\nu\sigma\rho}        (16.22)

which can be proved in two ways:
• Explicitly from (16.20).
• By noting that ρ ↔ σ is equivalent to interchanging the two displacements ξ and η, which
is in turn equivalent to traversing the loop in the reverse direction.

Antisymmetry in first two indices

The curvature tensor also satisfies

    R_{\mu\nu\rho\sigma} = -R_{\nu\mu\rho\sigma}        (16.23)

To prove this, consider the scalar S = g_μν A^μ B^ν, constructed out of two vectors A and
B. We parallel transport both vectors and hence S. Clearly, a scalar does not change under
parallel transport, so

    0 = \Delta S = g_{\mu\nu}\,(\Delta A^\mu\, B^\nu + A^\mu\, \Delta B^\nu)

There is no Δg_μν because when we complete the loop, g_μν returns to the original value. There
is also no ΔA^μ ΔB^ν term because we are entitled to consider a small loop, and hence calculate
only first-order changes.
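Now use the curvature tensor to express the changes in the vectors upon parallel transport:

    0 = (R_{\nu\lambda\rho\sigma}\, A^\lambda B^\nu + R_{\mu\lambda\rho\sigma}\, A^\mu B^\lambda)\, \xi^\rho \eta^\sigma

where we have lowered the first index in the curvature tensor using the factor g_μν. Finally,
change the dummy variable by λ → μ in the first term and λ → ν in the second term:

    0 = [R_{\nu\mu\rho\sigma} + R_{\mu\nu\rho\sigma}]\, A^\mu B^\nu \xi^\rho \eta^\sigma

Since this has to be zero for arbitrary A, B, ξ and η, the square bracket is zero, thus proving
(16.23).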

Symmetry in pairs
We state without proof that the curvature tensor is symmetric under the interchange of the
first pair of indices with the second pair:

    R_{\mu\nu\rho\sigma} = R_{\rho\sigma\mu\nu}

Cyclic property
We also state without proof the cyclic property: fix the first index and add the cyclic permutations
of the other three; the result is zero:

    R^\mu{}_{\nu\rho\sigma} + R^\mu{}_{\rho\sigma\nu} + R^\mu{}_{\sigma\nu\rho} = 0

Bianchi identity
Finally we state without proof an identity concerning the derivative of the curvature tensor. If
we perform a (covariant) differentiation on the curvature tensor, we get 5 indices: R^μ_νρσ;τ. Fix
the first 2 indices and add the cyclic permutations of the other 3; the result is zero:

    R^\mu{}_{\nu\rho\sigma;\tau} + R^\mu{}_{\nu\tau\rho;\sigma} + R^\mu{}_{\nu\sigma\tau;\rho} = 0

Number of independent components


In N dimensions, each of the 4 indices can take on N different values, so superficially there
are N^4 components. For N = 4, this amounts to 4^4 = 256 components - which is a lot. But
because of all these identities, the number of independent components of the curvature tensor
is as follows.
We shall skip the derivation of these numbers, which is in fact not difficult.

Table 1: Number of independent components of curvature tensor

    Dimension N      Components
    1                0
    2                1
    3                6
    4                20
    N                N^2 (N^2 - 1)/12

16.5 Ricci tensor and curvature scalar


Motivation
Einstein's equations will take the form

    \text{Curvature} \sim \text{Source}        (16.30)

where, as will be discussed later, the source is the energy-momentum tensor - a rank 2 tensor.
Thus, on the LHS, we need to construct a rank 2 tensor as well. We do this by contracting a
pair of indices in the Riemann curvature tensor. We choose to contract the 1st and 3rd indices.
(Contracting the 1st and 2nd indices gives zero, since these two indices are antisymmetric;
contracting the 1st and 4th is the same as contracting the 1st and 3rd, up to a minus sign; etc.
It is easily seen that this is essentially the only choice.)

Ricci tensor
Thus we define the Ricci tensor

    R_{\nu\rho} = R^\mu{}_{\nu\mu\rho}
                = g^{\mu\sigma} R_{\mu\nu\sigma\rho}

Symmetry
It is readily shown that the Ricci tensor is symmetric (interchanging the two indices ν and ρ
above is equivalent to interchanging the 1st and 3rd index, but these are identical and summed):

    R_{\nu\rho} = R_{\rho\nu}

Curvature scalar
We can further construct a scalar by contracting the two indices in the Ricci tensor:

    R = g^{\nu\rho} R_{\nu\rho}

16.6 Examples
Polar coordinates in 2D
We take polar coordinates in flat 2D space, with (x^1, x^2) = (r, φ), and the metric

    ds^2 = dr^2 + r^2\, d\phi^2

so that the metric components are

    g_{11} = 1 , \quad g_{22} = r^2 , \quad g_{12} = 0

and the nonzero elements of the Christoffel symbol are

    \Gamma^1_{22} = -r , \qquad \Gamma^2_{12} = \Gamma^2_{21} = 1/r

We now consider the curvature tensor R_μνρσ. The only nontrivial element is (μν) = (12)
and (ρσ) = (12):

    R_{1212} = 0        (16.37)

We should have known this beforehand, without having to do any calculations, for the
following reason.
• This space is flat.
• So there is a Cartesian coordinate system.
• In that system, the curvature tensor is R^μ_νρσ = 0 because ∂g = 0.
• Now change to polar coordinates. R^μ_νρσ transforms linearly and stays zero in the new
coordinate system.
In other words, our explicit computation in (16.37) verifies that R_μνρσ = 0 is an objective
property independent of coordinate system.

Polar coordinates on a sphere


We take polar coordinates on the surface of a sphere of radius a, with (x^1, x^2) = (θ, φ), and the
metric

    ds^2 = a^2 (d\theta^2 + \sin^2\theta\, d\phi^2)

so that the metric components are

    g_{11} = a^2 , \quad g_{22} = a^2 \sin^2\theta , \quad g_{12} = 0        (16.39)

and the nonzero elements of the Christoffel symbol are

    \Gamma^2_{12} = \cot\theta , \qquad \Gamma^1_{22} = -\sin\theta \cos\theta        (16.40)

We now consider the curvature tensor R_μνρσ. Again, the only nontrivial element is
(μν) = (12) and (ρσ) = (12):

This is not zero! This proves that this space is not flat - which is of course correct.
Although whether R_μνρσ = 0 is coordinate-independent, the actual value (if it is not
zero) is coordinate-dependent. Therefore we have to be careful and not say that the "amount
of curvature" goes as a^2 (as might seem to be the case from (16.41)). Such a conclusion is
counter-intuitive and clearly wrong - if a → ∞, the space becomes nearly flat rather than
more curved. We need a coordinate-independent measure to indicate the "amount of curvature";
this will be shown below.
Next we calculate the Ricci tensor in this example.

Finally we calculate the curvature scalar in this example.

Note that R is a constant - this is expected since this space is homogeneous, and every
point is equivalent. A related property is that R_μν ∝ g_μν. A heuristic way to understand this
is as follows. Try to separate the space itself from the coordinates, i.e., the grid that we choose
to draw on the space. For the space itself (without the grid), there are no preferred or special
directions; thus directions are distinguished only by the grid. Thus, the tensor indices of R_μν
must be carried by the indices related to the coordinate system, namely g_μν.
Since R is independent of coordinates, it is a useful overall measure of the "amount of
curvature". We see that in this example, it goes as a^{-2}, which is sensible - as a → ∞, the
space becomes nearly flat.
We shall apply the concept of curvature to physics in the next Chapter. But for now,
the important point is this: given the metric, we can calculate the curvature tensor.
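
To illustrate the point that the curvature follows mechanically from the metric, here is a minimal numerical sketch for two-dimensional metrics, using finite differences. It assumes one common sign convention for the curvature tensor (written out in the `riemann` docstring), which may differ from the convention of these notes by an overall sign; whether the curvature vanishes does not depend on that choice. All function names and the step size H are illustrative choices.

```python
import math

H = 1e-4  # finite-difference step; an illustrative choice

def inverse_2x2(g):
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def shifted(x, k, h):
    """Copy of the point x with coordinate k shifted by h."""
    y = list(x)
    y[k] += h
    return y

def christoffel(metric, x, m, n, r):
    """Gamma^m_{nr} = (1/2) g^{ml} (d_n g_{lr} + d_r g_{ln} - d_l g_{nr})."""
    ginv = inverse_2x2(metric(x))
    def dg(k, i, j):  # central difference for d_k g_{ij}
        return (metric(shifted(x, k, H))[i][j]
                - metric(shifted(x, k, -H))[i][j]) / (2 * H)
    return 0.5 * sum(ginv[m][l] * (dg(n, l, r) + dg(r, l, n) - dg(l, n, r))
                     for l in range(2))

def riemann(metric, x, m, n, r, s):
    """R^m_{nrs} in the ASSUMED convention
    d_r Gamma^m_{ns} - d_s Gamma^m_{nr}
    + Gamma^m_{lr} Gamma^l_{ns} - Gamma^m_{ls} Gamma^l_{nr}."""
    def dgamma(k, a, b, c):
        return (christoffel(metric, shifted(x, k, H), a, b, c)
                - christoffel(metric, shifted(x, k, -H), a, b, c)) / (2 * H)
    quad = sum(christoffel(metric, x, m, l, r) * christoffel(metric, x, l, n, s)
               - christoffel(metric, x, m, l, s) * christoffel(metric, x, l, n, r)
               for l in range(2))
    return dgamma(r, m, n, s) - dgamma(s, m, n, r) + quad

def curvature_scalar(metric, x):
    """R = g^{ns} R_{ns}, with R_{ns} = R^m_{nms} (contract 1st and 3rd indices)."""
    ginv = inverse_2x2(metric(x))
    return sum(ginv[n][s] * riemann(metric, x, m, n, m, s)
               for m in range(2) for n in range(2) for s in range(2))

def flat_polar(x):   # ds^2 = dr^2 + r^2 dphi^2
    return [[1.0, 0.0], [0.0, x[0] ** 2]]

def sphere(a):       # ds^2 = a^2 (dtheta^2 + sin^2(theta) dphi^2)
    return lambda x: [[a * a, 0.0], [0.0, (a * math.sin(x[0])) ** 2]]

print("flat polar:", curvature_scalar(flat_polar, [1.5, 0.7]))
for a in (1.0, 2.0, 10.0):
    print("sphere, a =", a, ":", curvature_scalar(sphere(a), [1.0, 0.3]))
```

In this convention the flat polar metric gives a curvature scalar that vanishes up to rounding error, while the sphere gives a value close to 2/a^2, which decreases as a grows, in line with the remark that the sphere becomes nearly flat as a → ∞.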

16.7 Curvature tensor and flatness


We have said many times that the curvature tensor indicates whether the space is flat. We now
establish the equivalence of zero curvature tensor and flatness.

Flatness implies zero curvature tensor

Assume that the space is flat. Then there are Cartesian coordinates \bar{x}^\mu. In these coordinates,
\bar{g}_{\mu\nu} is constant, and hence \bar{\Gamma}^\mu_{\nu\rho} = 0, \bar{R}^\mu{}_{\nu\rho\sigma} = 0. But the curvature tensor transforms linearly
under coordinate transformations, so R^μ_νρσ = 0 in every coordinate system. In fact, we have
already used this argument in an earlier example.

Zero curvature tensor implies flatness


The reverse is also true, but slightly more difficult to prove.
We choose a point P, and set up a local orthogonal system with basis vectors

    \vec{e}_1(P), \ldots, \vec{e}_N(P)

whose dot products are

    \vec{e}_\mu(P) \cdot \vec{e}_\nu(P) = \delta_{\mu\nu}        (16.45)

if the space is locally Euclidean (or η_μν if the space is locally Minkowski).
We now parallel-transport each of these basis vectors to another point Q. Suppose it is
given that R_μνρσ = 0. Then the result of the parallel-transport would be unique, independent
of path. (Otherwise, if the results for path 1 and path 2 are different, we can transport along
path 1 and the reverse of path 2, and on the closed path get a difference at the end - this
would then contradict the vanishing of the curvature tensor.)
Thus, we establish a corresponding set of basis vectors \vec{e}_1(Q), \ldots, \vec{e}_N(Q) at the point
Q. Next notice that dot products are scalars, which are therefore preserved under parallel
transport, so at Q we also have

    \vec{e}_\mu(Q) \cdot \vec{e}_\nu(Q) = \delta_{\mu\nu}

Since Q is arbitrary, we have constructed a global orthogonal Cartesian system. If the space
has such a system, it must be flat.

16.8 Einstein tensor
Introduction
As we have said before, the field equation will take the form

    \text{Curvature} \sim \text{Source}        (16.47)

where the RHS is (a) a rank 2 tensor and (b) conserved. Thus, on the LHS, we also need a tensor
with these two properties. The Ricci tensor satisfies (a) but not (b); the Einstein tensor G_μν is
closely related and satisfies both. In particular, the conservation property is

    G^{\mu\nu}{}_{;\nu} = 0        (16.48)

Conservation
To review the concept of conservation, recall charge conservation in flat space, in obvious
notation:

    \frac{\partial \rho}{\partial t} + \nabla \cdot \vec{J} = 0

or, using 4-vector notation

    J^\mu{}_{,\mu} = 0

The analogous statement in curvilinear coordinates (or on curved space) is obtained by simply
changing the derivatives to covariant derivatives:

    J^\mu{}_{;\mu} = 0

This explains that (16.48) is the analog of such a conservation law.

Construction of Einstein tensor


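We start with the Bianchi identity

    R^\mu{}_{\nu\alpha\rho;\tau} + R^\mu{}_{\nu\tau\alpha;\rho} + R^\mu{}_{\nu\rho\tau;\alpha} = 0

Set α = μ and sum, and recall the definition of the Ricci tensor:

    R_{\nu\rho;\tau} - R_{\nu\tau;\rho} + R^\mu{}_{\nu\rho\tau;\mu} = 0

Raise the ν index

    R^\nu{}_{\rho;\tau} - R^\nu{}_{\tau;\rho} + R^{\mu\nu}{}_{\rho\tau;\mu} = 0

Then set ρ = ν and sum:

    R^\nu{}_{\nu;\tau} - R^\nu{}_{\tau;\nu} + R^{\mu\nu}{}_{\nu\tau;\mu} = 0
    R_{;\tau} - R^\nu{}_{\tau;\nu} - R^\mu{}_{\tau;\mu} = 0
    R_{;\tau} - 2 R^\nu{}_{\tau;\nu} = 0
    (R\,\delta^\nu_\tau - 2 R^\nu{}_\tau)_{;\nu} = 0

We define the Einstein tensor to be -(1/2) times the quantity in the brackets:

    G^\nu{}_\tau = R^\nu{}_\tau - \frac{1}{2} R\, \delta^\nu_\tau

or lowering the upper index

    G_{\nu\tau} = R_{\nu\tau} - \frac{1}{2} R\, g_{\nu\tau}

To summarize, the Einstein tensor is symmetric, conserved and related to curvature. We
expect that the field equation will relate the Einstein tensor to the source.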

Appendix
Stokes' theorem should be familiar. In this Appendix, we prove it in a form that is valid in any
dimension. We consider the integral ∮ C_ρ dx^ρ over the "rectangle" in Figure 11:

Figure 11

Note that the mid-points of the 4 sides are

where we have omitted component indices, i.e., x stands for the point whose coordinates are x^μ.
Also, we have drawn this as a rectangle just to emphasize the similarity with the situation that
you are familiar with, but there is actually no need for ξ and η to be perpendicular; compare
Figure 10.
Then the integral is obtained by adding the 4 contributions. For each contribution, we
evaluate (schematically)

    C \cdot \Delta x        (16.58)
where C is the value at the mid-point of the segment and Δx is the displacement. Thus

The 1st and 3rd terms together give

In the above, after a factor of ξη has been explicitly taken out, we can evaluate C anywhere in
the small rectangle, to the order of accuracy required. In the last line, we have interchanged
dummy indices ρ ↔ σ.
Likewise the 2nd and 4th terms together give

Putting (16.60) and (16.61) together then gives

proving the result we need.

Chl6-l.tex; November 16, 2000


17 Einstein's Field Equations
17.1 Introduction
In this Chapter, we "derive" Einstein's field equations and find some solutions, especially solu-
tions for the examples discussed earlier.

Compare electromagnetism
Einstein's general theory of relativity can be viewed from two perspectives: as a description
of spacetime curvature, or as a theory of gravity. We start from the latter point of view, and
compare the theory of gravity with electromagnetism (em). Both em and gravity consist of two
parts.

Part 1
First, we describe how the fields (assumed to be given) act on the particles. In em, this is given
by the Lorentz force law:

    \vec{F} = q\,(\vec{E} + \vec{v} \times \vec{B})        (17.1)

or in covariant notation

    \frac{dp^\mu}{d\tau} = q\, F^\mu{}_\nu\, u^\nu        (17.2)

where we have indicated schematically that the force goes as (charge) x (field) x (4-velocity).
The parallel statement for gravity is

    \frac{dp^\mu}{d\tau} = -m\, \Gamma^\mu_{\nu\rho}\, u^\nu u^\rho        (17.3)

The force goes as the second power of the 4-velocity, and the field Γ carries one more index.
Otherwise, the force law is quite similar to em.
In both cases, if the fields are given, the motion of point particles is in principle deter-
mined. We have already discussed some examples in Chapters 14 and 15.

Part 2
Secondly, we have to say how the particles (more generally the sources) generate the fields.
Let us trace the development in the case of em. We start with Coulomb's law:

This can be written in terms of Poisson's equation for the electrostatic potential Φ:

where ρ is the charge density. Going from (17.4) to (17.5) involves no new physics.
However, both Coulomb's law (17.4) and Poisson's equation (17.5) are only valid for
statics. When charges move, the full description is given by Maxwell's equations (which we shall
not display). Going from Coulomb's law to Maxwell's equations does involve new physics, e.g.:
• While Coulomb's law gives the electric effect of static charges, Maxwell's equations also
give the magnetic effect of moving charges.
• Maxwell's equations also take care of time delay - to be further discussed below.
In much the same way, gravity starts with Newton's inverse-square law:

which can again be written in terms of Poisson's equation for the gravitational potential Φ:

where ρ is now the mass density. Going from (17.6) to (17.7) also involves no new physics.
As with em, Newton's theory of gravity is only valid for static masses, and we need a
generalization (similar to Maxwell's equations) which has the following properties:
• While Newton's law gives the "electric" effect of static masses, the new theory also gives
the "magnetic" effect of moving masses.
• The new theory will also take care of time delay - to be further discussed below.

We use the terms "electric" and "magnetic" to denote forces that are velocity-independent and
velocity-dependent respectively.
The new theory is given by Einstein's field equations - the analog of Maxwell's equations
for em. Both Maxwell's equations and Einstein's equations are partial differential equations
(PDEs). However, there is one major difference: Maxwell's equations are linear PDEs, but
Einstein's equations are nonlinear PDEs. We shall explain the physical reason behind this
later. Thus, to summarize:

    Coulomb (1/r^2)  →  Maxwell (linear PDE)
    Newton (1/r^2)   →  Einstein (nonlinear PDE)        (17.8)

There is one historical difference. In the case of em, the magnetic part (e.g., Ampere's
law) was discovered experimentally, and the overall theoretical framework developed later. In
the case of gravity, the overall theoretical framework was developed first, and the "magnetic"
part etc. were predicted from the theory. There is a reason for this, which we shall come to
later.

What is wrong with Newton


Einstein's equations provide a theory of gravity that improves upon (and where the corrections
are significant, replaces) the Newtonian theory, so we should ask "What is wrong with the
Newtonian theory of gravity?" Interestingly, and unlike most advances in physics, the problem
with the Newtonian theory was not any readily measured disagreement with experiment, but
rather conceptual problems.
Covariance
The Newtonian theory is not covariant: the time $t$ and the spatial coordinates $x, y, z$ show up
differently. Also, the source is mass ($\sim$ energy), but momentum does not appear at all. Yet,
energy and momentum form a 4-vector $(E, \mathbf{P})$.
No delay
In Newtonian theory, there is no delay between cause and effect. In (17.6) the force $F$ at time
$t$ is determined by the position $r$ of the masses at the same time $t$. In (17.7), the potential
$\Phi$ at time $t$ is determined by the mass density $\rho$ also at time $t$. No matter how far away the
masses are, their effects are felt instantaneously, with no time delay. This is called action at a
distance and contradicts the principle that effects propagate at most with the speed of light $c$.
Coulomb's law suffers from the same problem. We shall elaborate on this point below.
No effect of current
This problem is closely related to the lack of covariance. Density and current form a 4-vector
and should appear together - the density of energy and the flow of energy here, or the charge
density and the flow of charge in em. We expect the current, i.e., the flow of energy, to produce
a "magnetic" effect. Newton's theory of gravity does not contain this element.

No nonlinear effect
This problem is peculiar to gravity and has no counterpart in em, and we give a heuristic
argument here. Let there be a mass $M$. It creates a gravitational field $g \propto GM$. This
field carries a field energy $U \propto (1/G)\,g^2 \propto GM^2$. (Recall that in em, the field energy is
$U \propto \epsilon_0 E^2$ and replace $1/4\pi\epsilon_0 \mapsto -G$, $E \mapsto g$.) But since energy is equivalent to mass,
the field energy is equivalent to an extra mass $\Delta M = U/c^2 \propto GM^2/c^2$, and an extra field
$\Delta g \propto G\,\Delta M \propto G^2M^2/c^2$. We can repeat this argument and show that there are terms
proportional to $M^3$, $M^4$, .... Thus the total field is not proportional to $M$ alone. In this sense,
we expect that a proper gravitational theory should be nonlinear.
In this Chapter, we shall see that Einstein's equations solve all these problems.
Delay effect

Figure 1

Let us explain the delay effect in more detail, by considering the em case. Figure 1 shows a
test charge $q$ at a distance from a source $Q$; the force felt by $q$ is

$$F = \frac{1}{4\pi\epsilon_0}\frac{qQ}{r^2} \qquad (17.9)$$

where $r$ is the instantaneous separation. Let us move $Q$ a little bit. Then $r$ changes immediately,
and $F$ changes immediately, so $q$ feels the change immediately. But we know that signals
propagate at most at the speed of light, so $q$ should not feel any change until a time $t = r/c$
later. Newton's inverse-square law of gravity suffers from exactly the same problem.
This problem is solved by introducing the concept of fields. Instead of (17.9) stating
the force, we consider the electric field E. In pictorial terms, this means we draw the field lines
(Figure 2).

Figure 2

Mathematically, the fields are described by PDEs of the schematic form

$$\frac{\partial E}{\partial t} = \text{spatial derivatives} , \qquad \frac{\partial B}{\partial t} = \text{spatial derivatives} \qquad (17.10)$$

We shall not write out the RHS, but simply note that a spatial differentiation tells us about the
field "at the next point" in space. Since the effect propagates only "to the next point", it takes
a finite time to propagate a finite distance, and there is delay. This situation can be understood
pictorially by imagining the field lines to be like strings, and changes to be like "bumps" that
propagate along the strings with a finite speed (Figure 3).

Figure 3

In short, fields and PDEs automatically introduce delay, and ensure consistency with
causality.
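The statement that a PDE passes changes only "to the next point" can be illustrated by a minimal numerical sketch. This example is not taken from the text: it assumes a one-dimensional wave equation $\partial^2 u/\partial t^2 = c^2\,\partial^2 u/\partial x^2$ as a stand-in for the field equations, discretized by a standard leapfrog scheme with $c\,\Delta t = \Delta x$; the function name and the grid sizes are hypothetical.

```python
def first_arrival_step(source=20, detector=120, n_cells=200, max_steps=400):
    """Return the first time step at which a bump released at `source`
    produces a non-zero field at `detector`.

    Leapfrog update of u_tt = c^2 u_xx with c*dt/dx = 1 and fixed ends.
    """
    prev = [0.0] * n_cells
    prev[source] = 1.0      # localized bump at t = 0
    curr = prev[:]          # same profile one step later: zero initial velocity
    for step in range(1, max_steps + 1):
        nxt = [0.0] * n_cells
        for i in range(1, n_cells - 1):
            # each point is updated from its nearest neighbours only
            nxt[i] = curr[i + 1] + curr[i - 1] - prev[i]
        prev, curr = curr, nxt
        if curr[detector] != 0.0:
            return step
    return None


print(first_arrival_step())  # source and detector are 100 cells apart
```

With the bump at cell 20 and the detector at cell 120, the detector stays exactly at zero until step 100, i.e., until a time equal to (distance)/c in grid units. The delay comes purely from the local, nearest-neighbour form of the update.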
Motivate form of equations
What should be the form of Einstein's equations? Start with the Newtonian theory, i.e., (17.7),
which we know to be basically correct. It has the form

$$\partial\partial\Phi \sim \rho \qquad (17.11)$$

Now recall that for weak fields,

$$g_{00} \approx -(1 + 2\Phi) \qquad (17.12)$$

hence it is natural to replace

$$\partial\partial\Phi \rightarrow \partial\partial g \qquad (17.13)$$

Note that the 1 in (17.12) does not contribute, which is physically correct - gravity relates to
the departure of the metric from the Minkowski metric.
The Christoffel symbol $\Gamma$ and the curvature tensor $R$ are related by

$$\Gamma \sim \partial g , \qquad R \sim \partial\Gamma + \Gamma\Gamma \qquad (17.14)$$

So, up to nonlinear terms (which in any case we would not be able to guess from Newtonian
theory), we expect that the LHS of the Newtonian equation is replaced by

$$\partial\partial\Phi \rightarrow \partial\partial g \sim \partial\Gamma \sim R \qquad (17.15)$$

and the LHS of Einstein's equations involves the curvature tensor $R$ (or something constructed
out of it, but without further differentiations).
On the RHS would be the source: the mass density $\rho$ in Newtonian theory, and something
related to energy-momentum in relativity. Thus, we expect

$$R \sim \text{source} \qquad (17.16)$$

We next discuss the source, which will also tell us what exactly is on the LHS.

17.2 The source


Energy
In Newtonian theory, the source of gravity is the mass $m$. In relativity we have mass-energy
equivalence, $m = E/c^2$, so we expect that the source now becomes energy.
Let us make the above formal statement more physical. If energy is also the source
of gravity, then light (or em fields) can generate gravity. Since action = reaction, light also
responds to gravity. That is an experimentally known fact - from the deflection of light.
Momentum
But the energy $E$ (say of a particle) is the time component $P^0$ of a 4-vector $P^\mu$. If $P^0$ generates
gravity, then by covariance, all components $P^\mu$ must also generate gravity. Thus we conclude

$$\text{source} \sim P^\mu \qquad (17.17)$$

Count components in em
In em, the source is the charge $Q$ (1 global conserved quantity). Locally, it gives rise to
• The charge density $\rho$ ($Q$ per unit volume) - 1 local quantity.
• The charge flux or current $J^j$ (flow of $Q$ per unit area per unit time, in a certain direction
$j$) - 3 local quantities.
The 4 local quantities form a 4-vector

$$J^\mu = (\rho, \mathbf{J}) \qquad (17.18)$$

Then the field also has 4 components, namely the 4-vector potential $A^\mu$, and Maxwell's
equations take the schematic form

$$\partial\partial A^\mu \sim J^\mu \qquad (17.19)$$

Count components in gravity


Now do a similar count for gravity. The first part of the source is the energy $E$ (1 global
conserved quantity). Locally, it gives rise to
• The energy density $T^{00}$ ($E$ per unit volume) - 1 local quantity.
• The energy flux $T^{0j}$ (flow of $E$ per unit area per unit time, in a certain direction $j$) - 3
local quantities.
In addition, there is also the source $P^i$ (3 global conserved quantities). Locally, it gives rise to
• The momentum density $T^{i0}$ ($P^i$ per unit volume) - 3 local quantities.
• The momentum flux $T^{ij}$ (flow of $P^i$ per unit area per unit time, in a certain direction $j$)
- 9 local quantities.
Thus we have a total of 16 quantities, summarized as $T^{\mu\nu}$. The components of $T^{\mu\nu}$ have the
following meaning.

$T^{00}$ = density of energy
$T^{0j}$ = flux of energy in direction $j$
$T^{i0}$ = density of $i$ component of momentum
$T^{ij}$ = flux of $i$ component of momentum in direction $j$

We expect Einstein's equations to take the form (compare (17.19))

$$\partial\partial g \sim T^{\mu\nu}$$

or, in view of the earlier discussion for the LHS,

$$R \sim T^{\mu\nu}$$

where $R$ denotes the curvature tensor (or something related to it).

17.3 The energy-momentum tensor


The tensor $T^{\mu\nu}$ gives the distribution and flow of energy and momentum due to matter. We
provide several models for calculating it; we start in flat Minkowski spacetime.

The model
If the energy and momentum are due to a collection of particles, then

$$T^{\mu\nu} = n \left\langle \frac{P^\mu P^\nu}{E} \right\rangle$$

where

$n$ = number density
$P^\mu$ = 4-momentum of particles
$E = P^0$ = energy of the particles

all calculated at a certain point in spacetime, with $\langle \cdots \rangle$ denoting an average.
For example,

$$T^{00} = n \left\langle \frac{P^0 P^0}{E} \right\rangle = n \langle E \rangle$$

(since $P^0 = E$) which is clearly the correct expression for the energy density. The derivation
for the other components is given in the Appendix.
We note that $T$ is symmetric:

$$T^{\mu\nu} = T^{\nu\mu}$$

which is in fact a general property. It says, e.g., that the energy flux is equal to the momentum
density.
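The particle model can be turned into a short numerical sketch. The code below simply evaluates $n\langle P^\mu P^\nu/E\rangle$ for a list of particle velocities, in units with $c = 1$; the function name, the sample velocities and the value of $n$ are hypothetical choices for illustration only.

```python
import math


def energy_momentum_tensor(n, velocities, m=1.0):
    """Return T^{mu nu} = n <P^mu P^nu / E> as a 4x4 nested list (c = 1).

    `velocities` is a list of 3-velocities (vx, vy, vz), each with |v| < 1.
    """
    total = [[0.0] * 4 for _ in range(4)]
    for v in velocities:
        gamma = 1.0 / math.sqrt(1.0 - sum(vi * vi for vi in v))
        # P^0 = E = m*gamma, P^i = m*gamma*v^i
        P = [m * gamma] + [m * gamma * vi for vi in v]
        E = P[0]
        for mu in range(4):
            for nu in range(4):
                total[mu][nu] += P[mu] * P[nu] / E
    count = len(velocities)
    return [[n * entry / count for entry in row] for row in total]


# A beam: every particle moves along the 1-axis with speed 0.6
beam = energy_momentum_tensor(n=2.0, velocities=[(0.6, 0.0, 0.0)] * 5)
print(beam[0][0], beam[0][1], beam[1][0])
```

For this beam the energy density is $n m\gamma$, and the energy flux $T^{01}$ and the momentum density $T^{10}$ both equal $n m\gamma v$, in line with the symmetry of $T$.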
Problem 1
Consider the following situation. A box of size $L^3$ contains $N$ particles of mass $m$, all moving
along the 1-axis with speed $v$. (a) What is the energy density? (b) What is the 1-component
of the momentum density? (c) How much energy crosses the 2-3 plane per unit time? Hence
calculate the 1-component of the energy flux. (d) Hence show that $T^{01} = T^{10}$.
Because the number density $n$ and the energy $E$ are both the 0-components of vectors,
they transform in the same way, and the combination $n/E$ is invariant (Appendix). Thus $T^{\mu\nu}$
transforms like $P^\mu P^\nu$, i.e., like a $\binom{2}{0}$ tensor.
We now consider three sub-models:
• Dust, for $v = 0$
• Perfect fluid
• Radiation, for $v = c$

Dust
The dust model consists of particles that are not moving, or which have negligible velocities.
Thus $v = 0$ and the spatial components are $P^i = 0$. On the other hand, $P^0 = E = m$ (in units
where $c = 1$). There is no flow of energy since the particles are not moving. Thus

$$T^{00} = nm , \qquad \text{all other } T^{\mu\nu} = 0$$

In other words, there is only energy density, no momentum density, no energy flux, no momentum
flux.

Perfect fluid
This model consists of particles moving at velocities v. For simplicity we assume that the fluid
as a whole is not in motion. (Otherwise we consider the following in the local rest frame and
perform a Lorentz transformation.)
The 00 component is simply the energy density, to be denoted as $\rho$ (see (17.25)).
Since the fluid is at rest, there is no momentum and no energy flux: $T^{i0} = T^{0i} = 0$. So
it remains to calculate $T^{ij}$. Recall that $P^i = m\gamma v^i$ and $E = m\gamma$, so

$$T^{ij} = n \left\langle \frac{P^i P^j}{E} \right\rangle = n \langle m\gamma\, v^i v^j \rangle$$

First, this vanishes if $i \neq j$. For example $\langle m\gamma\, v^1 v^2 \rangle = 0$ because the system must be invariant
under the reversal of the 1-axis, $v^1 \mapsto -v^1$. Secondly, we claim that all the diagonal entries are
just the pressure:

$$T^{11} = T^{22} = T^{33} = p$$
Figure 4
To prove this, consider the usual derivation of the pressure in kinetic theory (Figure
4). Each collision of a particle with a wall (in the 2-3 plane) delivers a momentum $2P^1$. The
particle has to travel a distance of $2L$ before the next collision with the same wall, so the time
between collisions is $2L/v^1$, and the frequency of collisions is $v^1/(2L)$. This must be multiplied
by the number of particles in the box, namely $N = nL^3$, and the total force is delivered on a
wall of area $L^2$. Thus the pressure is

$$p = \text{no. of particles} \times \text{frequency of collisions} \times \text{momentum delivered} \times \frac{1}{\text{area}}$$
$$\phantom{p} = nL^3 \times \frac{v^1}{2L} \times 2P^1 \times \frac{1}{L^2} = n P^1 v^1 = n\, m\gamma\, v^1 v^1$$

Since the speeds of the particles are different, the above derivation should refer to the average.
Thus, we get exactly (17.28). In short,

$$T^{ij} = p\,\delta^{ij}$$

and for the tensor as a whole

$$T = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & p & 0 & 0 \\ 0 & 0 & p & 0 \\ 0 & 0 & 0 & p \end{pmatrix} \qquad (17.32)$$

However, $\rho$ and $p$ are not independent; they are related by an equation of state: $p = p(\rho)$,
or both in terms of the temperature $\theta$: $\rho = \rho(\theta)$, $p = p(\theta)$.
In fact, the dust model can be regarded as a special case of the perfect fluid model, with
$p = 0$.
Problem 2
Consider a parcel of fluid moving with overall velocity V along the 1-direction. In its own rest
frame, the energy-momentum tensor is given by (17.32). By using a Lorentz transformation,
find the energy-momentum tensor in the lab frame.
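A hand calculation for Problem 2 can be checked numerically. The sketch below assumes units with $c = 1$ and a boost along the 1-direction with the standard matrix $\Lambda$; it applies $T'^{\mu\nu} = \Lambda^\mu{}_\alpha \Lambda^\nu{}_\beta T^{\alpha\beta}$ to the rest-frame tensor diag$(\rho, p, p, p)$. The function name and the sample values are hypothetical.

```python
import math


def boost_perfect_fluid(rho, p, V):
    """Lab-frame T^{mu nu} of a perfect fluid moving with velocity V along axis 1 (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - V * V)
    # Boost taking rest-frame components to the lab frame, where the fluid moves with +V
    lam = [
        [g, g * V, 0.0, 0.0],
        [g * V, g, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
    rest = [[0.0] * 4 for _ in range(4)]
    rest[0][0] = rho
    for i in (1, 2, 3):
        rest[i][i] = p
    # T'^{mu nu} = lam^mu_a lam^nu_b T^{ab}
    return [
        [
            sum(lam[mu][a] * lam[nu][b] * rest[a][b] for a in range(4) for b in range(4))
            for nu in range(4)
        ]
        for mu in range(4)
    ]


lab = boost_perfect_fluid(rho=1.0, p=0.2, V=0.5)
print(lab[0][0], lab[0][1], lab[1][1], lab[2][2])
```

For $\rho = 1$, $p = 0.2$, $V = 0.5$ the printed values are approximately 1.4, 0.8, 0.6 and 0.2. The transverse pressure is unchanged, while the energy density, energy flux and longitudinal stress all pick up factors of $\gamma^2$.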
Radiation
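In this model, the particles are photons, or other massless particles (e.g., neutrinos). It also
applies to situations where massive particles are moving very rapidly, so that $|\mathbf{P}| \approx E$.
As before $T^{00} = \rho$, and $T^{ij} = 0$ if $i \neq j$. By isotropy,

$$T^{11} = T^{22} = T^{33} = \frac{1}{3}\, n \sum_i \left\langle \frac{P^i P^i}{E} \right\rangle = \frac{1}{3}\, n \langle E \rangle = \frac{\rho}{3}$$

since $\sum_i P^i P^i = \mathbf{P}^2 = E^2$ (the last step using the fact that the mass is zero or negligible).
So for the tensor as a whole

$$T = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & \rho/3 & 0 & 0 \\ 0 & 0 & \rho/3 & 0 \\ 0 & 0 & 0 & \rho/3 \end{pmatrix} \qquad (17.34)$$

In fact, the radiation model is a special case of the perfect fluid model with $p = \rho/3$.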
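The isotropy argument can be checked by a small Monte Carlo sketch. It assumes massless particles of equal energy with directions drawn uniformly on the sphere (by normalizing three Gaussian random numbers), and evaluates $n\langle P^\mu P^\nu/E\rangle$ directly; the function name, sample size and seed are hypothetical.

```python
import math
import random


def radiation_tensor(num_photons=50_000, n=1.0, energy=1.0, seed=0):
    """Estimate T^{mu nu} = n <P^mu P^nu / E> for an isotropic photon gas (c = 1)."""
    rng = random.Random(seed)
    total = [[0.0] * 4 for _ in range(4)]
    for _ in range(num_photons):
        # A normalized Gaussian triple gives a uniformly distributed direction
        gx, gy, gz = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(gx * gx + gy * gy + gz * gz)
        # Massless particle: |P| = E
        P = [energy, energy * gx / norm, energy * gy / norm, energy * gz / norm]
        for mu in range(4):
            for nu in range(4):
                total[mu][nu] += P[mu] * P[nu] / energy
    return [[n * entry / num_photons for entry in row] for row in total]


T = radiation_tensor()
rho = T[0][0]
print(rho, T[1][1], T[2][2], T[3][3], T[1][2])
```

Up to statistical noise, the three diagonal spatial entries come out close to $\rho/3$ and the off-diagonal entries close to zero, while the sum $T^{11} + T^{22} + T^{33}$ equals $\rho$ to rounding accuracy because $\mathbf{P}^2 = E^2$ for every particle.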

Curved spacetime
The above calculations were implicitly performed in flat spacetime, but the results apply to
curved spacetime as well. The chain of argument is straightforward. (a) Each little piece of
spacetime can be regarded as flat. We do the calculation in this small piece of flat spacetime,
using cartesian coordinates. (b) We transform to generalized coordinates locally. The equations
keep the same forms because both sides are tensors.

Embedding space
The above is also valid if the curved manifold is embedded in a larger flat space. Let the manifold
be defined by $x^\mu = 0$ for $\mu > N$, i.e., these are the "extra" dimensions. Then the particles
do not move outside the manifold, and we have $P^\mu = 0$ for $\mu > N$. Accordingly, $T^{\mu\nu} = 0$ for
$\mu > N$ or $\nu > N$. We shall use this property below for a simple proof of conservation laws.

17.4 Conservation of energy and momentum


Conservation of charge in flat spacetime
Recall that, in flat spacetime, the conservation of charge takes the local form

$$\frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{J} = 0$$

or the covariant form

$$\partial_\mu J^\mu = 0 \qquad (17.36)$$

To show that (17.36) leads to the conservation of charge globally, consider

$$\partial_0 J^0 + \partial_i J^i = 0$$

Integrate this over a certain volume $V$:

$$\frac{dQ}{dt} + \oint \mathbf{J}\cdot d\mathbf{S} = 0$$

where $Q$ is the total charge in the volume, and we have used Gauss' theorem to convert the
second term to a surface integral over the boundary of $V$. Thus, the rate of increase of $Q$ is
equal to minus the rate of outflow.
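The global statement can be seen in a discrete setting. The sketch below is an illustration only: it assumes a one-dimensional charge density carried along at constant velocity $u > 0$, updated in conservative (flux) form with a simple upwind rule; the function name, grid and parameters are hypothetical.

```python
def advect_and_track(steps=50, cells=100, lo=30, hi=60, u=1.0, dx=1.0, dt=0.5):
    """Advect a block of charge and compare the change of Q in cells [lo, hi)
    with the time-integrated net outflow through the two boundary faces."""
    rho = [1.0 if 20 <= i < 40 else 0.0 for i in range(cells)]
    q_start = sum(rho[lo:hi]) * dx
    net_outflow = 0.0
    for _ in range(steps):
        # flux[i] = charge per unit time crossing the left face of cell i (upwind, u > 0);
        # flux[0] = 0 means nothing enters through the left edge of the grid
        flux = [0.0] + [u * r for r in rho]
        net_outflow += (flux[hi] - flux[lo]) * dt
        # discrete form of d(rho)/dt = -d(J)/dx
        rho = [rho[i] + dt / dx * (flux[i] - flux[i + 1]) for i in range(cells)]
    q_end = sum(rho[lo:hi]) * dx
    return q_start, q_end, net_outflow


q_start, q_end, net_outflow = advect_and_track()
print(q_end - q_start, -net_outflow)
```

The two printed numbers agree: the change of the charge inside the chosen region equals minus the net outflow through its boundary, which is the discrete counterpart of applying Gauss' theorem to the local conservation law.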

Conservation of energy and momentum in flat spacetime


Similarly, in flat spacetime, the conservation of the 4-momentum component $P^\mu$ is given by a
local conservation law:

$$\partial_\nu T^{\mu\nu} = 0 \qquad (17.39)$$

Note that $\mu$ is a free index while $\nu$ is summed. To show that (17.39) leads to the conservation
of $P^\mu$ globally, consider

$$\partial_0 T^{\mu 0} + \partial_i T^{\mu i} = 0$$

Integrate this over a certain volume $V$:

$$\frac{d}{dt}\int_V T^{\mu 0}\, d^3x + \oint T^{\mu i}\, dS_i = 0$$

where in the first term we note that $T^{\mu 0}$ is the density of $P^\mu$ and in the second term we note
that $T^{\mu i}$ is the flow of $P^\mu$ per unit area per unit time, in the $i$ direction. In exactly the
same way, this equation can be interpreted as: the rate of increase of $P^\mu$ is equal to minus the
rate of outflow.
Change to curved spacetime
It is easy to convert these statements to curved spacetime, namely

$$J^\mu{}_{;\mu} = 0 , \qquad T^{\mu\nu}{}_{;\nu} = 0$$

where ; denotes the covariant derivative.

We give a heuristic argument for this result, and do so only for the case of the energy-momentum
tensor.
Imagine N-dimensional spacetime (curved) embedded into M-dimensional flat spacetime.
Now by using cartesian coordinates in the M-dimensional embedding space, we have
$T^{\mu\nu}{}_{,\nu} = 0$ (as derived above), and hence $T^{\mu\nu}{}_{;\nu} = 0$ since all Christoffel symbols are zero in
the cartesian system. But the covariant derivative is a tensor, and transforms linearly; thus
$T^{\mu\nu}{}_{;\nu} = 0$ in all coordinate frames.
Choose a coordinate system such that the physical manifold is given by $x^\nu$ = const for
$\nu > N$. Particles move only along the physical manifold, so $P^\nu = 0$ for $\nu > N$, i.e., there are
no momentum components pointing out of the physical manifold. Thus

$$T^{\mu\nu} = 0 \qquad (17.43)$$

if $\mu > N$ or $\nu > N$. The expression $T^{\mu\nu}{}_{;\nu}$ originally implies summing over $\nu = 1, \ldots, M$; but
in view of (17.43), we can sum over only $\nu = 1, \ldots, N$, i.e., only along the manifold. Thus
$T^{\mu\nu}{}_{;\nu} = 0$ on the manifold.

17.5 Einstein's equations


Motivation
Newton's law of gravity takes the form

$$\nabla^2\Phi \sim \rho \qquad (17.44)$$

and since changes in $\Phi$ are related to changes in the metric tensor $g$, we expect

$$\partial\partial g \sim \text{source} \qquad (17.45)$$

Also recall that the curvature tensor (schematically $R$) is related to $\partial\partial g$, and we have just
shown that the source should be $T^{\mu\nu}$. So, the equation should be

$$R \sim T \qquad (17.46)$$

Here, the RHS is the energy-momentum tensor, which has the following properties:
• It is a rank 2 tensor, i.e., it carries two indices.
• It is symmetric.
• It is conserved: $T^{\mu\nu}{}_{;\nu} = 0$.
Hence the LHS must be something made from the Riemann curvature tensor, also with these
properties. From the last Chapter, we see that the only choice is the Einstein tensor $G_{\mu\nu}$:

$$G_{\mu\nu} = R_{\mu\nu} - \frac{1}{2} R\, g_{\mu\nu} \qquad (17.47)$$

Thus, we have "derived" Einstein's equation

$$G_{\mu\nu} = K\, T_{\mu\nu} \qquad (17.48)$$

where only the constant $K$ remains to be determined. (We have chosen to write this with lower
indices, but of course the free indices on both sides can be raised or lowered together.)
Needless to say, the "derivation" is hardly rigorous. We can regard the foregoing as
motivation only, with the Einstein equations eventually justified by comparison of their predictions
with measurements.

Consider weak fields


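The value of the constant $K$ can be established by studying weak fields, which should also be
describable by Newton's theory. In this case the metric is

$$ds^2 = -(1 + 2\Phi)\,dt^2 + (1 - 2\Phi)\left(dx^2 + dy^2 + dz^2\right)$$

and it is straightforward to show that in this case

$$G_{00} \approx -2\nabla^2\Phi$$

On the other hand, on the RHS of the Einstein equation we have

$$T_{00} \approx T^{00} = \rho = \text{mass density}$$

Thus the 00 component of the equation gives

$$-2\nabla^2\Phi = K\rho$$

But Newton's theory gives

$$\nabla^2\Phi = 4\pi G\rho$$

Comparing these two then gives $K = -8\pi G$, giving finally

$$G_{\mu\nu} = -8\pi G\, T_{\mu\nu} \qquad (17.54)$$
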
Nature of the equations
We now examine some properties of this set of equations.
• The equation (17.54) allows free choices of $\mu$ and $\nu$, and so contains 4 × 4 equations.
But because both sides are symmetric in $\mu \leftrightarrow \nu$, there are in fact only 10 independent
equations.
• Recall that $R_{\mu\nu}$ etc. involve derivatives of $g_{\mu\nu}$ etc., up to second order. Thus these are
second-order differential equations for the 10 metric components $g_{\mu\nu}$. Of course these are
PDEs.
• The equations are coupled, i.e., $g_{11}$, $g_{22}$ etc. do not occur separately in different equations.
• The equations are nonlinear. We can see this mathematically in at least two ways. First,
the curvature tensor contains a term quadratic in the Christoffel symbol: $R \sim \partial\Gamma + \Gamma\Gamma$.
Second, even in calculating $\Gamma$, we need the inverse matrix elements $g^{11}$ etc., and these
depend nonlinearly on $g_{11}$ etc.
In short, the Einstein equations are coupled, nonlinear PDEs.

Compare electromagnetism
Maxwell's equations take the form

$$\partial\partial A^\mu \sim J^\mu \qquad (17.55)$$

These are 4 equations for the unknowns $A^\mu$. They are again PDEs, and they are again coupled.
(However, in a suitable gauge, they can be decoupled so that each $A^\mu$ appears by itself without
the other components.) But crucially, Maxwell's equations are linear.
The spatial components of (17.55) describe magnetism: $J^i$ gives rise to $A^i$ and hence
the magnetic field. In just the same way, the spatial components $T^{0i}$ and $T^{ij}$ give rise to the
"magnetic" effects of gravity, i.e., additional gravitational effects caused not by masses, but by
the motion of masses.

Nonlinearity
We wish to understand the physical origin of the nonlinearity in a heuristic way. In em, a charge
$Q$ produces a field $E \propto Q$. The field itself is not charged, so the story ends. Thus $E \propto Q$ and
the theory is linear.
In gravity, a mass $M$ produces a field (or acceleration due to gravity) $g \propto M$. But the
field itself carries energy $U \propto g^2$ and hence mass, $\Delta M \propto U \propto g^2 \propto M^2$, which produces a field
$\Delta g \propto \Delta M \propto M^2$. The story goes on, and the total field is not simply proportional to $M$.
The crucial difference is: the electric field (or the photon) has no charge, but the gravitational
field (or graviton) has mass or energy.
This leads to an important consequence. A linear theory obeys the principle of superposition:
the sum of two solutions is a solution. So all of em can be reduced to point charges
- if we know the effect of one point charge, then for any problem we simply have to add (or
integrate). Thus, there are no really difficult problems in em - we simply add things up by
Coulomb's law (electrostatics), the Biot-Savart law (magnetostatics) or the Liénard-Wiechert
potential (moving charges). In gravity, the situation is totally different. Even if we know the
solution for one point mass, we cannot simply add these up to obtain the solution for two point
masses. If we know the solution for one point mass $M$, we cannot simply multiply by 2 to get
the solution for one point mass $2M$. Therefore very few analytic solutions are known.
In some sense, however, this limitation is now becoming irrelevant, because equations
can always be solved numerically on the computer.
Nevertheless, in the next Section, we show two exact solutions that are well known,
and that have been used in previous Chapters - now we finally justify the assumptions made
previously.

Magnetic effect
In em, the magnetic effect is small, but still easy to detect experimentally. To understand this,
first remember that the electric field is produced by charges, but the magnetic field is produced
by currents:

$$E \sim Q , \qquad B \sim Qv$$

where $v$ is the typical velocity of the charges. Now, acting on a test charge $q$, we have

$$\frac{\text{Electric force}}{\text{Magnetic force}} \sim \frac{qE}{qvB} \sim \frac{qQ}{qQv^2}$$

Thus, magnetic forces are down by a factor of $v^2 = \beta^2$. (In the above, for simplicity we have
assumed that the speed of the source and the speed of the test charge are the same order; it is
easy to remove this restriction.)
For example, consider two beams of electrons, moving in parallel but some distance
apart, say at a speed of $v = 300$ m s$^{-1}$, i.e., $\beta = 10^{-6}$. There is a Coulomb force of repulsion,
and also a magnetic force of attraction between two parallel currents. However, the latter is
smaller by a factor $\beta^2 \approx 10^{-12}$. If we really do this experiment, we need at least 12-figure
accuracy to detect the magnetic force.
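The arithmetic of this estimate is easy to reproduce. The sketch below only evaluates $\beta = v/c$ and $\beta^2$ for a given speed; the function name is hypothetical, and the order-of-magnitude factors dropped in the text are dropped here as well.

```python
C_LIGHT = 299_792_458.0  # speed of light in m/s


def magnetic_to_electric_ratio(v):
    """Order-of-magnitude ratio (magnetic force)/(electric force) ~ beta^2."""
    beta = v / C_LIGHT
    return beta * beta


print(magnetic_to_electric_ratio(300.0))
```

For $v = 300$ m s$^{-1}$ the ratio is about $10^{-12}$, which is the source of the "12-figure accuracy" quoted above; for $v = c$ the ratio is 1, so for a photon the "magnetic" terms are as important as the "electric" ones.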
But you have done this experiment (in fact with much smaller values of v) even in high
school - but in a slightly different way. The beams are not in free space, but are electrons
moving in parallel wires. In the wires are positive ions of exactly equal density. Thus, the
net charge in each wire is zero, and the electric effect cancels exactly. This leaves the much
smaller magnetic force as the largest remaining contribution, and is therefore easily detected.
The cancellation of the electric field depends on the existence of opposite charges, and the fact
that the strong Coulomb attraction makes them combine until neutrality is achieved.
However, we do not have opposite signs of mass. So the "electric" effect is never cancelled
to reveal the "magnetic" effect. The latter appears only as a tiny correction to the Newtonian
force. There is one important exception: if the test particle is a photon, $\beta = 1$, and the
"magnetic" terms are of the same order. We have actually seen this in the discussion of the
deflection of light, where naive adaptation of the Newtonian theory is wrong by a factor of 2.
This explains why the magnetic effect in em was first discovered experimentally, whereas
the corresponding effect in gravity was first postulated theoretically.
Comparison
We summarize the comparison between em and gravity.

                        em            gravity
Equations               Maxwell       Einstein
Static approx.          Coulomb       Newton
Nature                  PDE           PDE
Order                   2             2
Basic variables         $A_\mu$       $g_{\mu\nu}$
No. of eq.              4             10
Linear?                 Yes           No
Superposition?          Yes           No
Two signs of source?    Yes           No

17.6 Some solutions of Einstein's equations


Schwarzschild metric
The Schwarzschild solution describes one single point mass $M$ which is static and non-rotating.
Under these assumptions, the most general form of the line element is

$$ds^2 = -B(r)\,dt^2 + A(r)\,dr^2 + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)$$

The $\theta$ and $\phi$ terms must take this form because of spherical symmetry, and $r$ is the circumferential
radius. In the rest of the metric, the coefficients cannot depend on $t$ because the situation
is assumed to be static, and cannot depend on $\theta$ and $\phi$ because of spherical symmetry. Thus,
there can be only two functions depending on $r$ alone, denoted as $A(r)$ and $B(r)$.
Our object is to use the Einstein equations to determine A and B. Remember: we
expect to get second-order differential equations for A and B.
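First, we calculate the Christoffel symbols. Some of the non-zero ones turn out to be,
e.g.,

$$\Gamma^r_{tt} = \frac{B'}{2A} , \qquad \Gamma^t_{tr} = \frac{B'}{2B} , \qquad \Gamma^r_{rr} = \frac{A'}{2A}$$

As an illustration, we show how one of these is calculated (using $g_{tt} = -B$, $g_{rr} = A$):

$$\Gamma^r_{tt} = \frac{1}{2}\, g^{rr}\left(2\,\partial_t g_{rt} - \partial_r g_{tt}\right) = \frac{1}{2}\cdot\frac{1}{A}\cdot B' = \frac{B'}{2A}$$

In general, note that $\Gamma$ involves up to one derivative of $A$ and $B$.
Problem 3
Calculate all the components of $\Gamma$.
Next we calculate the Ricci tensor. Schematically $R \sim \partial\Gamma + \Gamma\Gamma$. So we expect terms that
go like $A''$, $B''$ as well as $(A')^2$, $A'B'$, $(B')^2$. A straightforward but slightly tedious computation
shows that the non-zero elements are

$$R_{tt} = -\frac{B''}{2A} + \frac{B'}{4A}\left(\frac{A'}{A} + \frac{B'}{B}\right) - \frac{B'}{rA}$$
$$R_{rr} = \frac{B''}{2B} - \frac{B'}{4B}\left(\frac{A'}{A} + \frac{B'}{B}\right) - \frac{A'}{rA}$$
$$R_{\theta\theta} = -1 + \frac{r}{2A}\left(-\frac{A'}{A} + \frac{B'}{B}\right) + \frac{1}{A}$$
$$R_{\phi\phi} = \sin^2\theta\, R_{\theta\theta}$$

Problem 4
Calculate all the components of the Ricci tensor and verify the above results.
Now, away from the point mass, there is no energy and momentum, i.e. $T_{\mu\nu} = 0$ and
Einstein's equation becomes $R_{\mu\nu} = 0$. We consider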

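$$\frac{R_{rr}}{A} + \frac{R_{tt}}{B} = -\frac{1}{rA}\left(\frac{A'}{A} + \frac{B'}{B}\right) = -\frac{1}{rA^2B}\left(BA' + AB'\right) = 0$$

showing that

$$AB = \text{constant} = 1 \qquad (17.65)$$

The value 1 is obtained by evaluating at $r \to \infty$, where space must be flat. The relation (17.65)
implies

$$A = \frac{1}{B} , \qquad \frac{A'}{A} = -\frac{B'}{B}$$

We put all these back into the $\theta\theta$ equation to give

$$R_{\theta\theta} = -1 + rB' + B = 0$$

This gives

$$\frac{d}{dr}(rB) = 1$$

leading to

$$rB = r + C$$

where $C$ is a constant. So

$$B = 1 + \frac{C}{r}$$

To determine $C$, we consider $r \to \infty$, where the field must be weak, and should therefore
agree with

$$g_{tt} = -B \approx -(1 + 2\Phi) = -\left(1 - \frac{2GM}{r}\right)$$

Comparison then gives $C = -2GM$. Although the derivation made use of large $r$, once the
value of the constant is obtained, it is of course valid for all $r$. So finally we have

$$ds^2 = -\left(1 - \frac{2GM}{r}\right)dt^2 + \left(1 - \frac{2GM}{r}\right)^{-1}dr^2 + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)$$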

Although we referred to a point mass, it is easy to see that the derivation goes through
for the space outside any static, spherical distribution of mass. Thus it is valid outside a
non-rotating star. We have already used this formula many times, e.g. in the discussion of the
black-hole phenomenon, especially at the point $r = 2GM$. We have also used the results for
the deflection of light and the advance of perihelion in the gravitational field of a star.
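The vacuum solution can be checked numerically. The sketch below is an independent check, not the derivation in the text: it assumes the line element $ds^2 = -B\,dt^2 + A\,dr^2 + r^2 d\Omega^2$ with $A = 1/B$, the standard expressions for the Ricci components of this metric (in the sign convention in which $R_{\theta\theta} = -1 + rB' + B$ when $A = 1/B$), and central finite differences for the derivatives. The function name and step size are hypothetical.

```python
def ricci_components(B, r, h=1e-4):
    """Return (R_tt, R_rr, R_thth) at radius r for the metric functions B(r) and A = 1/B.

    Derivatives are approximated by central finite differences with step h.
    """
    A = lambda x: 1.0 / B(x)

    def d1(f):
        return (f(r + h) - f(r - h)) / (2.0 * h)

    def d2(f):
        return (f(r + h) - 2.0 * f(r) + f(r - h)) / (h * h)

    a, b = A(r), B(r)
    a1, b1, b2 = d1(A), d1(B), d2(B)
    s = a1 / a + b1 / b
    R_tt = -b2 / (2.0 * a) + 0.25 * (b1 / a) * s - b1 / (r * a)
    R_rr = b2 / (2.0 * b) - 0.25 * (b1 / b) * s - a1 / (r * a)
    R_thth = -1.0 + (r / (2.0 * a)) * (-a1 / a + b1 / b) + 1.0 / a
    return R_tt, R_rr, R_thth


GM = 1.0
schwarzschild = lambda r: 1.0 - 2.0 * GM / r
print(ricci_components(schwarzschild, r=10.0))
```

For $B = 1 - 2GM/r$ all three components vanish to within the finite-difference error, whereas a different trial function such as $B = 1 - 2GM/r^2$ leaves $R_{\theta\theta}$ clearly non-zero.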
Robertson-Walker metric
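In our discussion of cosmology, we had introduced the Robertson-Walker metric with the following
line element

$$ds^2 = -dt^2 + a^2(t)\left[\frac{dr^2}{1 - Kr^2} + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)\right]$$
$$\phantom{ds^2} = -dt^2 + a^2(t)\,\tilde g_{ij}\,dx^i dx^j$$

• In the above, all $r$ are really dimensionless, i.e., a length measured in units of $a(t)$.
• The factor $a(t)$ is the scale factor of the universe.
• The structure of the spatial terms is dictated by the requirements of homogeneity and
isotropy.
• The parameter $K$ is either +1, 0 or -1, indicating whether space is closed, flat or open.
• In the second line above, we have introduced the notation $\tilde g_{ij}$ to simplify the writing of
the expression in the square brackets. The reduced metric $\tilde g_{ij}$ describes a unit 3D sphere
in 4D space (if $K = 1$).
Now it remains to determine $a(t)$, i.e., to see how the universe evolves.
As usual, we begin by calculating the Christoffel symbol. The results are

$$\Gamma^t_{ij} = a\dot a\,\tilde g_{ij} , \qquad \Gamma^i_{tj} = \frac{\dot a}{a}\,\delta^i_j , \qquad \Gamma^i_{jk} = \tilde\Gamma^i_{jk}$$

where $\tilde\Gamma^i_{jk}$ is the Christoffel symbol of the reduced metric $\tilde g_{ij}$.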
Problem 5
Calculate all the components of $\Gamma$.
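Next we calculate the Riemann curvature tensor and then the Ricci tensor, with the
results

$$R_{tt} = \frac{3\ddot a}{a} , \qquad R_{ti} = 0 , \qquad R_{ij} = -\left(a\ddot a + 2\dot a^2 + 2K\right)\tilde g_{ij}$$

Note that $K$ appears only in the last equation above.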


Problem 6
Check the above results for the Ricci tensor.
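We now impose Einstein's field equations

$$G_{\mu\nu} = -8\pi G\,T_{\mu\nu}$$

It can be shown (Appendix) that this can also be written in the form

$$R_{\mu\nu} = -8\pi G\,S_{\mu\nu} , \qquad S_{\mu\nu} = T_{\mu\nu} - \frac{1}{2}\, g_{\mu\nu} T \qquad (17.77)$$

where $T = T^\mu{}_\mu$. This form has the advantage of moving some of the complications from the
LHS (involving unknowns) to the RHS (involving knowns).
To evaluate the RHS of (17.77), we assume a non-relativistic matter-dominated universe.
Thus

$$T_{tt} = \rho , \qquad \text{all other } T_{\mu\nu} = 0 , \qquad T = -\rho$$

and consequently

$$S_{tt} = \frac{1}{2}\rho , \qquad S_{ij} = \frac{1}{2}\rho\, g_{ij} = \frac{1}{2}\rho\, a^2\, \tilde g_{ij}$$

Now consider the $tt$ equation

$$R_{tt} = -8\pi G\, S_{tt}$$

Evaluate the two sides:

$$\text{LHS} = \frac{3\ddot a}{a}$$
$$\text{RHS} = -4\pi G\rho$$

Putting these together gives

$$\ddot a = -\frac{4\pi G}{3}\,\rho\, a \qquad (17.82)$$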

This is exactly the same as the Newtonian equation, showing that our previous derivation
by considering a small piece of the universe was indeed valid. We next integrate (17.82) once
in time, giving

$$\dot a^2 - \frac{8\pi G}{3}\,\rho\, a^2 = -K' \qquad (17.83)$$

where $K'$ is a constant of integration. The derivation from (17.82) to (17.83) follows the usual
calculation that proves conservation of energy from the equation of motion. In (17.83), the first
term is like the KE, the second term is like the PE and the term on the RHS is like the total
energy. The sign of $K'$ indicates whether the variable $a$ "escapes" to infinity, i.e., whether the
universe expands forever. So far, we have no new information beyond what is already known.
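The step from the equation of motion to its first integral can be checked numerically. The sketch below assumes a matter-dominated universe in which $\rho a^3$ stays constant (an assumption made here to close the system), so that $\ddot a = -k/a^2$ with $k = (4\pi G/3)\rho a^3$, and integrates with the velocity Verlet method. The function names and parameter values are hypothetical.

```python
def integrate_scale_factor(a0=1.0, adot0=1.0, k=0.4, dt=1e-3, steps=2000):
    """Integrate a'' = -k / a^2 with velocity Verlet; k = (4 pi G / 3) rho a^3 is held constant."""
    a, adot = a0, adot0
    acc = -k / (a * a)
    for _ in range(steps):
        a += adot * dt + 0.5 * acc * dt * dt
        new_acc = -k / (a * a)
        adot += 0.5 * (acc + new_acc) * dt
        acc = new_acc
    return a, adot


def first_integral(a, adot, k=0.4):
    """adot^2 - 2k/a, which equals adot^2 - (8 pi G / 3) rho a^2."""
    return adot * adot - 2.0 * k / a


start = first_integral(1.0, 1.0)
a_end, adot_end = integrate_scale_factor()
print(start, first_integral(a_end, adot_end))
```

The two printed values agree to within the integration error, so the combination $\dot a^2 - (8\pi G/3)\rho a^2$ is indeed constant along the solution. With these initial values it is positive, which corresponds to a negative $K'$ and hence to a universe that expands forever.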
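Next consider the $ij$ equation:

$$R_{ij} = -8\pi G\, S_{ij}$$

Evaluate the two sides:

$$\text{LHS} = -\left(a\ddot a + 2\dot a^2 + 2K\right)\tilde g_{ij}$$
$$\text{RHS} = -4\pi G\rho\, g_{ij} = -4\pi G\rho\, a^2\, \tilde g_{ij}$$

Putting these together then gives

$$a\ddot a + 2\dot a^2 + 2K = 4\pi G\rho\, a^2$$

In the above, eliminate $\ddot a$ using (17.82):

$$-\frac{4\pi G}{3}\,\rho\, a^2 + 2\dot a^2 + 2K = 4\pi G\rho\, a^2$$

Some simplification then leads to

$$\dot a^2 - \frac{8\pi G}{3}\,\rho\, a^2 = -K$$

which is exactly the same as (17.83) with $K'$ (the constant of integration in time) replaced by
$K$ (the constant describing the structure in space). Thus we have derived the important result

$$K = K'$$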

The physical meaning is: the universe is closed in space ($K = 1$) if and only if the
universe is closed in time ($K' = 1$). In short, we have proved the key assumptions used in
earlier discussion of "poor man's cosmology". This is one major achievement of Einstein's
equations, which opened the way to the modern study of cosmology.
Of course, there are still 3 possibilities, namely $K = K'$ being either +1, 0 or -1. The
3 cases can only be distinguished by observations.

Appendix
Invariant combination
We show that the ratio $n/E$ is invariant. Instead of a general proof, we simply demonstrate it
for one simple example, which is more instructive. Consider $N$ particles of mass $m$, at rest in
a box of size $L^3$. Then $n = N/L^3$ and $E = m$, so $n/E = N/(mL^3)$.
Now transform to a reference frame that is moving at velocity $v$, say along the 1-direction.
The cross-section area in the 2-3 plane is unchanged, but the length in the 1-direction is
contracted by a factor $\gamma$, hence the volume is $V' = L^2 L' = L^3/\gamma$. Thus

$$n' = N/V' = \gamma n$$

At the same time, the energy of each particle now becomes

$$E' = \gamma m = \gamma E$$

so the ratio $n'/E'$ is the same as $n/E$.
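The same example can be put into numbers. The sketch below follows the argument step by step in units with $c = 1$; the function name and the sample values of $N$, $m$, $L$ and $v$ are hypothetical.

```python
import math


def n_over_E_in_moving_frame(N, m, L, v):
    """Ratio n'/E' for N particles of mass m, at rest in a box L^3, seen from a frame moving at speed v."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    volume = L ** 3 / gamma      # length contraction along the direction of motion only
    n_prime = N / volume         # n' = gamma * n
    E_prime = gamma * m          # E' = gamma * E
    return n_prime / E_prime


rest_value = 1000 / (2.0 * 3.0 ** 3)   # n/E = N/(m L^3) in the rest frame
print(rest_value, n_over_E_in_moving_frame(N=1000, m=2.0, L=3.0, v=0.8))
```

The two printed numbers coincide, since the factor $\gamma$ in $n'$ cancels the factor $\gamma$ in $E'$.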

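Alternate form of Einstein's equations

Start with Einstein's equations in their original form

$$R_{\mu\nu} - \frac{1}{2}\, g_{\mu\nu} R = -8\pi G\, T_{\mu\nu} \qquad (17.92)$$

and take the trace. Note that the trace of $g_{\mu\nu}$ is 4. (We have to first raise one index and then
sum; with one upper index and one lower index, $g$ becomes the Kronecker delta, with +1 down
the diagonal.) Then we get

$$R - 2R = -8\pi G\, T , \qquad \text{i.e.,} \quad R = 8\pi G\, T$$

Put this back into (17.92) and we have

$$R_{\mu\nu} = -8\pi G\left(T_{\mu\nu} - \frac{1}{2}\, g_{\mu\nu} T\right)$$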
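The equivalence of the two forms is purely algebraic and can be checked with numbers. The sketch below assumes, for simplicity, the Minkowski metric diag$(-1, 1, 1, 1)$ and units with $G = 1$; it builds $R_{\mu\nu}$ from an arbitrary symmetric $T_{\mu\nu}$ using the alternate form, and then verifies that the original form is recovered. The function name and the sample tensor are hypothetical.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal of the metric; the inverse metric has the same entries


def einstein_from_trace_reversed(T, K=-8.0 * math.pi):
    """Given T_{mu nu}, build R_{mu nu} = K (T_{mu nu} - g_{mu nu} T / 2)
    and return (G_{mu nu}, R) with G_{mu nu} = R_{mu nu} - g_{mu nu} R / 2."""
    def metric(m, n):
        return ETA[m] if m == n else 0.0

    trace_T = sum(ETA[m] * T[m][m] for m in range(4))   # g^{mu nu} T_{mu nu}
    ricci = [[K * (T[m][n] - 0.5 * metric(m, n) * trace_T) for n in range(4)] for m in range(4)]
    trace_R = sum(ETA[m] * ricci[m][m] for m in range(4))
    einstein = [[ricci[m][n] - 0.5 * metric(m, n) * trace_R for n in range(4)] for m in range(4)]
    return einstein, trace_R


# An arbitrary symmetric sample tensor
T_sample = [
    [2.0, 0.3, 0.0, 0.1],
    [0.3, 0.5, 0.2, 0.0],
    [0.0, 0.2, 0.7, 0.4],
    [0.1, 0.0, 0.4, 0.6],
]
G_sample, R_scalar = einstein_from_trace_reversed(T_sample)
print(G_sample[0][0], -8.0 * math.pi * T_sample[0][0])
```

The resulting $G_{\mu\nu}$ equals $-8\pi G\, T_{\mu\nu}$ component by component, and the curvature scalar equals $8\pi G\, T$, as in the trace step above.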

Chl7-1.tex; December 11, 2000
