K Young
Department of Physics
The Chinese University of Hong Kong
December 2000
© Copyright
Preface
A course on relativity, including both the special theory and the general theory, was first
offered in the academic year 1994/95. It was intended for both final year undergraduates and
for graduate students, both on an elective basis. From the very beginning, it was found that no
textbook was quite suitable. Broadly speaking, the available books fall into two categories. The
first are often qualitative in nature, relying very heavily on intuition and heuristic arguments,
and are therefore unsuitable for a course at this level. The second type are exhaustive tomes,
exemplified by the works of Weinberg, and of Misner, Thorne and Wheeler. These are really
meant for those who intend quite seriously to specialize in general relativity or an associated
field, and would be far too much for a one-term course.
Because of the lack of suitable textbooks or indeed references, it was planned to develop
a relatively complete set of lecture notes. These started in the first year (1994/95) as copies
of the overhead transparencies handed out to students. In the second year that the course is
offered (1995/96), roughly half of the chapters have been turned into more formal lecture notes,
though of course in a somewhat preliminary form that will require many rounds of revision
and polishing in the years to come. It is thought that the other chapters will be completed in
one more year, and then improvements will be made in the light of the feedback.
The approach to the subject is somewhat different from most textbooks on relativity.
Most textbooks try to cover a lot of material, often to the forefront of research; this is true
both of the more heuristic texts and also of the more rigorous, exhaustive ones. But if one
were to look at the textbooks for electromagnetism (with which relativity has much formal
and conceptual resemblance), the norm now is to learn the subject in several stages. The
first is often an integral approach to the field equations, necessarily restricted to very idealized
geometries (spheres, infinite planes). The second round introduces the differential formalism
(and in particular the mathematics of vector calculus), but again stays with simple geometries
and simple situations (statics in the main). In neither of these stages is the student seriously
brought to the forefront of research, but instead, a firm conceptual foundation is laid, with a
reasonable degree of rigour. Only in the third stage, which is perhaps taken only by a minority,
would the students be introduced to more sophisticated techniques and problems closer to
research.
The approach in these lecture notes is essentially analogous to the first two stages
described above. Just as vector calculus figures prominently in electromagnetism, so differential
geometry also figures prominently here. Fortunately, very idealized situations already suffice to
discuss the two systems of greatest interest: the Schwarzschild metric and the Robertson-Walker
metric.
Incidentally, differential geometry is dealt with from the embedding approach. In essence,
the only thing that the students are asked to believe is the existence of a finite-dimensional
embedding in flat space. Though no longer fashionable among mathematicians, this approach
is probably easier to grasp for physics students. In the same vein, differential forms are not
mentioned, since an extra set of symbols will only burden the first-time learners.
Chapter 8 is optional, and is not needed for the rest of the course.
I am grateful to the United College Student Campus Work Scheme for funding to support
the typing of the manuscript. I thank Mr Lai Chi Wai, a student who took the course in the
first year that it was offered, for doing the initial typing. Mrs Alice Mak kindly performed the
final editing. Both steps are onerous because of the many complicated equations involved.
Because it is hoped that the lecture notes will be improved as the course evolves, students
(and colleagues) are requested to send in suggestions and to point out errors, of which there
must be many.
Relativity
Rotation
2.1 Derivation of rotational transformation
2.2 Combining two transformations
8 Action Formalism
8.1 General principles
8.2 Action principle in non-relativistic physics
8.3 Action principle for a relativistic free particle
8.4 Action principle for a particle in the electromagnetic field
8.5 Action principle for Maxwell's equation
15 Black Holes
15.1 Introduction
15.2 Formation of black hole
15.3 Questions about black holes
15.4 Coordinates around a black hole
15.5 Infinite redshift
15.6 Motion into a black hole
16 Mathematics of Curved Space IV: Curvature
Introduction
Parallel transport of vectors
Riemann curvature tensor - calculation
Riemann curvature tensor - properties
Ricci tensor and curvature scalar
Examples
Curvature tensor and flatness
Einstein tensor
Figure 1
The theme in relativity is: How does the same phenomenon appear to different observers?
Figure 1 shows the classic experiment of Galileo: two stones released at the same time strike
the ground at the same time. This fact of physics is the same whether we discuss it in
rectangular coordinates that are horizontal and vertical,
rectangular coordinates that are tilted,
polar coordinates, or
generalized coordinates of any type.
Therefore we say
Physics is absolute.
Coordinates are arbitrary.
Therefore physics should be independent of coordinates.
To make sure that physics is independent of coordinates, we must first ask: What happens
under coordinate transformations?
Rectangular coordinates are often denoted by the components of a vector, so we first
introduce the notation for vectors.
Sometimes we also denote the components of x as x, y, z and the axes as the x, y, z axes.
Sometimes the unit vectors are also denoted as i, j, k.
Figure 2
Example 1
Figure 2a shows a rod of length $L$ inclined at an angle $\phi$ to the 1-axis. Let its tip be x. Then
in this coordinate system S,
$x^1 = L\cos\phi, \quad x^2 = L\sin\phi$ (1.1)
Figure 2b shows a new coordinate system S' with the 1'-axis aligned along the rod.
$x'^1 = L, \quad x'^2 = 0$
So the coordinate components are different; they are relative.
Notation
Some books denote the components in the coordinate system S' as $x^{1'}, x^{2'}$, to emphasize that
"x" is the same (i.e., the same vector), and only "1", "2"
are different (i.e., different components).
Transformation of coordinates
In relativity we deal with three types of coordinate transformations.
1. Rotation of coordinates
2. Moving coordinates
Figure 3
Figure 4
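The general rule for the rotation of coordinates in 2-d is as follows. Let the vector x have
magnitude $x$ and make an angle $\phi$ with the 1-axis, and let the axes of S' be rotated by an
angle $\alpha$ relative to those of S, so that the vector makes an angle $\phi' = \phi - \alpha$ with the 1'-axis. Then
$x'^1 = x\cos\phi'$
$= x\cos(\phi - \alpha)$
$= x\cos\phi\cos\alpha + x\sin\phi\sin\alpha$
$= x^1\cos\alpha + x^2\sin\alpha$
$x'^2 = x\sin\phi'$
$= x\sin(\phi - \alpha)$
$= x\sin\phi\cos\alpha - x\cos\phi\sin\alpha$
$= x^2\cos\alpha - x^1\sin\alpha$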
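This can be written in matrix form
$\begin{bmatrix} x'^1 \\ x'^2 \end{bmatrix} = \begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} x^1 \\ x^2 \end{bmatrix}$ (1.3)
$[x'] = [R][x]$ (1.4)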
The matrix [R] depends only on the relation between the two sets of axes. It is the same for
all vectors x.
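As a quick numerical check of the rotation rule, the following Python sketch applies the component formulas to the rod of Example 1. The function name `rotate_components` and the values chosen for L and the angle are illustrative assumptions, not part of the notes.

```python
import math

def rotate_components(x1, x2, alpha):
    """Components of the same vector in axes rotated by alpha (radians)."""
    x1_new = x1 * math.cos(alpha) + x2 * math.sin(alpha)
    x2_new = x2 * math.cos(alpha) - x1 * math.sin(alpha)
    return x1_new, x2_new

# A rod of length L inclined at angle phi to the 1-axis (illustrative values).
L, phi = 2.0, math.radians(30.0)
x1, x2 = L * math.cos(phi), L * math.sin(phi)

# Rotate the axes by alpha = phi so that the 1'-axis lies along the rod.
x1_new, x2_new = rotate_components(x1, x2, phi)
print(x1_new, x2_new)  # approximately L and 0
```

With the axes rotated through the same angle as the rod, the new components reduce to (L, 0) up to rounding, as in Figure 2b.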
We can also write (1.3) or (1.4) in terms of components.
$x'^i = \sum_j R_{ij} x^j$ (1.5)
Index notation
An index such as i in (1.5) is called a free index.
It appears once on each side.
It can take on any value (1, 2 in 2-d; 1, 2, 3 in 3-d).
Unless otherwise specified, it is allowed to take on each of these values successively.
An index such as j in (1.5) is called a dummy index.
It appears twice in the same term.
It is summed over all allowed values.
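The roles of the two kinds of index can be made concrete in a few lines of Python. This is only a sketch: the helper name `transform` and the sample numbers are assumptions made for illustration. The free index i labels the entries of the result, while the dummy index j is the loop variable of the sum.

```python
def transform(R, x):
    """Return x'^i = sum over j of R_ij x^j."""
    n = len(x)
    # i is the free index: one output entry per value of i.
    # j is the dummy index: it is summed over and does not survive.
    return [sum(R[i][j] * x[j] for j in range(n)) for i in range(n)]

# Rotation of the axes by 90 degrees: cos = 0, sin = 1.
R = [[0.0, 1.0],
     [-1.0, 0.0]]
x = [3.0, 4.0]
x_new = transform(R, x)
print(x_new)  # [4.0, -3.0]
```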
Summation convention
To save some writing, every index (i, j, ...) that appears twice in the same term is understood
to be summed over. Thus (1.5) can be written simply as
$x'^i = R_{ij} x^j$
Problem 1
Write out the following equations explicitly in terms of components. Choose any one value for
each free index.
(a) $S = a^i b^i$
(b) a' =
(c) $C_{ik} = A_{ij} B_{jk}$
(d) $C_{ik} = A_{ij} B_{kj}$
(e) $S = A_{ii}$
(f) $S = A_{ij} B_{ji}$
Problem 2
Write the above expressions in terms of the matrices [A], [B], [C] and the column vectors [a], [b].
Problem 3
Denote the matrix in (1.3) as $[R(\alpha)]$. Verify that
$[R(-\alpha)][R(\alpha)] = [I]$
where [I] is the identity matrix. What is the physical meaning of this mathematical relationship?
Problem 4
Show that $[R(-\alpha)] = [R(\alpha)]^T$. Hence show that
$[R(\alpha)]^T [R(\alpha)] = [I]$
We consider a trivial "law". Let there be 2 rods, of length $L$ and $2L$, along the same direction.
Each rod has one end at the origin O (Figure 5). Let the other end-points be x and y. Then
the "law" is y = 2x. According to one observer S, the components satisfy
$y^i = 2x^i$ (1.7)
$[y] = 2[x]$ (1.8)
Multiply by the rotation matrix $[R(\alpha)]$:
$[y'] = 2[x']$ (1.9)
$y'^i = 2x'^i$ (1.10)
Compare (1.7) and (1.10). Or compare (1.8) and (1.9). Although the variables change ($y'^1 \ne y^1$),
a valid law of physics takes exactly the same form in the two coordinate systems. We say
the variables are covariant (they transform in the same way). We say the laws are invariant
(they stay in the same form).
Although y = 2x is a very trivial "law", this concept generalizes to all laws of physics.
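The statement that the components change while the law keeps its form can be checked numerically. The sketch below is illustrative only: the function name `rotate`, the rotation angle and the sample components are assumptions.

```python
import math

def rotate(v, alpha):
    """Components of a 2-d vector in axes rotated by alpha (radians)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [c * v[0] + s * v[1], -s * v[0] + c * v[1]]

x = [1.0, 0.5]
y = [2.0 * component for component in x]   # the "law" y = 2x in frame S

alpha = 0.7                                # arbitrary rotation angle
x_new, y_new = rotate(x, alpha), rotate(y, alpha)

# The components are different in S' ...
print(y, y_new)
# ... but the relation between them has exactly the same form.
print([y_new[i] - 2.0 * x_new[i] for i in range(2)])  # approximately [0, 0]
```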
Principle of relativity
The above idea is elevated to a principle.
Figure 6
Figure 6 illustrates how we can deal with the kinematics and dynamics of particles moving
rapidly ("relativistically") in a coordinate system S. We transform to a co-moving coordinate
system S', in which the particles are moving slowly. In the latter, we can apply well-known
Newtonian laws. The result is then transformed back to S. Actually it is not necessary to do
it individually every time. We only need to do the transformation once to establish the laws of
relativistic kinematics and dynamics.
Mass-energy equivalence
One result from relativistic kinematics and dynamics is the mass-energy equivalence E = mc2,
which is very important in nuclear physics and high energy physics.
Figure 7
Magnetism is due to currents, e.g., moving charges. Figure 7a shows a charge q which is moving
according to observer S. Figure 7b shows the situation as seen by a co-moving observer S'; now
q is stationary. But a stationary charge is only subject to an electrical force. Thus, we can go
through the following steps.
• If we understand electricity, we would know how q moves in S'.
Such considerations show that electricity and magnetism are closely related.
Problem 6
(a) A positive charge q is moving at a constant velocity v in the +x direction, and enters a
magnetic field created by a horseshoe magnet that points in the +y direction. In which direction
does the charge accelerate? Which law of physics did you use to arrive at this conclusion?
(b) Now go to another frame S' which is moving in the direction +x at velocity v. According
to this observer, the charge q is initially static, but the horseshoe magnet is moving towards it
at a speed v in the -x direction. Since the phenomenon must be the same according to both
observers, in which direction does q accelerate, according to S'?
(c) How would the observer in S' explain this observation? Which laws of physics would he use
to arrive at this conclusion?
Gravity
Figure 8
Figure 8a shows a ball inside a room falling under gravity. It accelerates downwards at g, and
hits the floor eventually. Figure 8b also shows a ball inside a room. There is no gravity, but the
room accelerates upwards at $a_0$. Eventually the floor hits the ball. If $a_0 = g$, there is no way
we can tell the situations apart by observations inside the room. This is called the principle of
equivalence - equivalence between gravity and acceleration. So if we know how to transform to
an accelerating frame, we would begin to understand gravity. The general theory of relativity
is in reality a theory of gravity.
However, the situation is slightly more complicated in general. Let there be two balls
inside a room near the earth. They are subject to slightly different gravitational accelerations
(Figure 9). This situation is not equivalent to the room accelerating upwards. So non-uniform
gravity makes the theory of general relativity more complicated mathematically.
Problem 7
Refer to Figure 8. Let the x-axis point upwards and let the broken line be the origin x = 0.
The height of the room is h, and in situation (a) the ball is released at rest at the roof of the
room at t = 0. In situation (b), the room is at rest at t = 0, and then accelerates upwards at
$a_0$. In this problem, assume $a_0 = g$. Let $x_b(t)$ be the coordinate of the ball, and $x_f(t)$ be the
coordinate of the floor.
1. For situation (a), write formulas for $x_b(t)$ and $x_f(t)$. Sketch them together on one graph.
Also find the time $t_1$ when the ball hits the floor by solving $x_b(t_1) = x_f(t_1)$.
2. Do the same for situation (b).
3. Is the value of $t_1$ the same in the two cases?
Astrophysics
Gravity is especially important both for large systems and for compact astrophysical objects.
(This may seem paradoxical, since "compact" means small!) We can understand this fact as
follows. For a system of mass $M$ and characteristic dimension $R$, the gravitational potential
energy, as a fraction of the rest energy $Mc^2$, is (Newtonian, but roughly correct)
$\epsilon \sim \frac{GM}{Rc^2}$ (1.11)
If this is not small (i.e., not $\ll 1$), then gravity is important, in fact so important that it is
necessary to use general relativity.
First regard the density p as fixed and the size R as variable. In other words, think of
a nearly uniform system, but of variable size. It is then convenient to write (1.11) as
$\epsilon \sim \frac{G\rho R^2}{c^2}$
Problem 8
(The mass of the sun is $2\times10^{30}$ kg, 1 pc = $3\times10^{16}$ m, $G = 7\times10^{-11}$ N m$^2$ kg$^{-2}$.)
(a) A typical galaxy contains $10^{11}$ stars, each one like the sun, in a radius of 15 kpc. Find $\rho$
and $\epsilon$. Is gravity very important?
(b) For the universe as a whole, $\rho \sim$ ... kg m$^{-3}$, $R \sim 10^4$ Mpc. Estimate $\epsilon$. Is gravity very
important?
On the other hand, consider a very compact star of mass M. Suppose other objects are
moving at different distances $R$ from it. It is then more convenient to regard $M$ as fixed and
$R$ as variable. Then from (1.11) $\epsilon \propto R^{-1}$ and becomes important when $R$ is small.
In fact, the Schwarzschild radius of a black hole corresponds to $\epsilon = 1/2$. A heuristic
derivation is as follows. Consider a point mass m at a distance R from a compact star of mass
M. To escape to infinity, it must have
$KE > |PE| = \frac{GMm}{R}$
Using Newtonian physics but setting $v \le c$, the maximum KE possible is $(1/2)mc^2$. So escape
is only possible if
$\frac{1}{2}mc^2 > \frac{GMm}{R}$
We are using a "mixture" of Newtonian and relativistic physics. This is not really legitimate,
but is good enough for an order-of-magnitude estimate.
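The order-of-magnitude estimate can be put into numbers with a short Python sketch. It assumes the dimensionless measure $\epsilon = GM/(Rc^2)$, which is the form consistent with the escape condition above; the function names, the rounded constants and the solar radius of $7\times10^8$ m are assumptions made for illustration.

```python
G = 7e-11      # N m^2 kg^-2, rounded
C = 3e8        # m s^-1, rounded
M_SUN = 2e30   # kg, rounded
R_SUN = 7e8    # m, assumed solar radius (not given in the notes)

def epsilon(mass, radius):
    """Assumed measure of the importance of gravity: G M / (R c^2)."""
    return G * mass / (radius * C**2)

def radius_for_epsilon_half(mass):
    """Radius at which epsilon = 1/2, i.e. (1/2) m c^2 = G M m / R."""
    return 2.0 * G * mass / C**2

print(epsilon(M_SUN, R_SUN))            # about 2e-6: gravity is weak at the sun's surface
print(radius_for_epsilon_half(M_SUN))   # about 3e3 m for one solar mass
```

Compressing one solar mass from its present radius down to a few kilometres is what it would take to reach $\epsilon = 1/2$.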
Problem 9
Estimate the Schwarzschild radius for a star of 3 solar masses = $6\times10^{30}$ kg.
Cosmology
The most important large system dominated by gravity is the entire universe. Cosmology is
the study of the evolution of the universe. It relies heavily on general relativity.
• transformations to a new coordinate system which is moving relative to the original one
Figure 1
Basic object
The basic object is a point P, whose coordinates are
x = $(x^1, x^2, x^3)$ in S
x = $(x'^1, x'^2, x'^3)$ in S'
The components are different, but the vectors are the same (see Figure 1); so as a vector, they
are both denoted by the same x.
The question is: How are the components related?
Linear assumption
First assume the origins of S and S' coincide. (If they do not, we simply add a shift, which is
trivial.) Next, assume that the relationship between the components is linear. Then the most
general transformation is
$x'^i = R_{ij} x^j$ (2.1)
where [R] is an unknown matrix, called the rotation matrix. Note that the summation convention
is employed.
Notation
We write the ij component of a matrix [A] as $[A]_{ij} = A_{ij}$.
Identify an invariant
Although the components change (e.g., $x'^1 \ne x^1$), there is an invariant, i.e., a quantity which
is the same in both frames. By Pythagoras' theorem
$\sigma^2 = (x^1)^2 + (x^2)^2 + (x^3)^2$
$\sigma'^2 = (x'^1)^2 + (x'^2)^2 + (x'^3)^2$
are the same. This is an experimental result. Ants living on the surface of a sphere would find
that Pythagoras' theorem does not hold.
The invariant condition can be written as follows:
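Hence
$\sigma'^2 = (x'^1)^2 + (x'^2)^2$
$= (p^2 + q^2)(x^1)^2 + 2(ps + qr)x^1 x^2 + (s^2 + r^2)(x^2)^2$
$\sigma^2 = 1\cdot(x^1)^2 + 0\cdot x^1 x^2 + 1\cdot(x^2)^2$
Since these must be equal for all $x^1$, $x^2$, the coefficients must match:
$p^2 + q^2 = 1, \quad ps + qr = 0, \quad s^2 + r^2 = 1$ (2.5)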
Problem 1
Let [R] be given by ( 2 . 3 ) . Derive (2.5) and show that the solution is (in terms of s )
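$x'^i = R_{ij} x^j$
$x'^i = R_{ik} x^k$
Note we use different dummy indices. Multiplying
$\sigma'^2 = x'^i x'^i = R_{ij} R_{ik} x^j x^k$
$\sigma^2 = x^j x^j = \delta_{jk} x^j x^k$
Since they must be equal as an identity, we get
$R_{ij} R_{ik} = \delta_{jk}$ (2.6)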
Here, jk are free indices, while i is a dummy index. Also, both sides are symmetric under
$j \leftrightarrow k$, so there are 3 conditions in 2-d (jk = 11, 22, 12) and 6 conditions in 3-d (jk =
11,22,33,12,23,31).
Let us check that in the case of 2-d, (2.6) agrees with (2.5). We put the summation sign
back explicitly. As an example, for j = k = 1
$\sum_i R_{i1} R_{i1} = (R_{11})^2 + (R_{21})^2 = 1$
etc.
$[x'] = [R][x]$
$[x'^T] = [x^T][R^T]$
$\sigma'^2 = [x'^T][x'] = [x^T][R^T][R][x]$
$\sigma^2 = [x^T][x] = [x^T][I][x]$
Thus
$[R^T][R] = [I]$
$\{[R^T][R]\}_{jk} = [I]_{jk}$
But for any matrices [A], [B],
$\{[A][B]\}_{jk} = [A]_{ji}[B]_{ik}$
Hence
$R_{ij} R_{ik} = \delta_{jk}$
Problem 2
Show that the matrix equation $[R^T][R] = [I]$ leads to the same equations as (2.5).
Express in terms of minimum number of parameters
Consider the number of free parameters in the matrix [R].
Dimension   Parameters   Conditions   Free Parameters
2-d         2 x 2 = 4    3            1
3-d         3 x 3 = 9    6            3
Thus in 2-d, we should be able to express the most general rotation in terms of 1 parameter
only, obviously the angle of rotation. In 3-d, we should be able to express the most general
rotation in terms of 3 parameters (e.g., the 3 Euler angles). We shall not go into the 3-d case
in detail.
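The counting generalizes to any dimension: an n x n matrix has $n^2$ entries, and the symmetric condition on jk supplies $n(n+1)/2$ independent equations. A minimal Python sketch of this bookkeeping follows; the function name is an assumption.

```python
def count_rotation_parameters(n):
    """Return (parameters, conditions, free parameters) for rotations in n dimensions."""
    parameters = n * n                 # entries of the matrix [R]
    conditions = n * (n + 1) // 2      # independent components of a symmetric n x n condition
    return parameters, conditions, parameters - conditions

for n in (2, 3):
    print(n, count_rotation_parameters(n))
# 2 (4, 3, 1)
# 3 (9, 6, 3)
```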
Problem 3
Of the 4 parameters p, q, r, s in (2.4), regard s as the free parameter and define $s = \sin\alpha$. Find
p, q, r in terms of $\alpha$ by using the three equations in (2.5). You will need to choose the sign of a
square root. Explain the physical meaning of your choice. Your answer should be
$[R] = \begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix}$
Figure 2
Mathematically, this can be stated as follows
$[R(\alpha_2)][R(\alpha_1)] = [R(\alpha_1 + \alpha_2)]$
Note that if we apply the left hand side to a vector x, we get
$[R(\alpha_2)][R(\alpha_1)][x] = [R(\alpha_2)]\{[R(\alpha_1)][x]\}$
The rotation $[R(\alpha_1)]$ is done first. So in all these matrix products, the logical sequence is from
right to left. The order of operations does not matter in 2-d, but is important in 3-d.
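The composition rule and the condition $[R^T][R] = [I]$ can both be verified numerically. The following is a minimal sketch in plain Python; the helper names and the two sample angles are assumptions.

```python
import math

def rotation(alpha):
    """2-d rotation matrix for axes rotated by alpha (radians)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, s], [-s, c]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

a1, a2 = 0.3, 1.1
# The rotation by a1 acts first, so it stands on the right.
product = matmul(rotation(a2), rotation(a1))
print(max_abs_diff(product, rotation(a1 + a2)))   # of order 1e-16

identity = [[1.0, 0.0], [0.0, 1.0]]
print(max_abs_diff(matmul(transpose(rotation(a1)), rotation(a1)), identity))
```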
We can specify the rotation in two ways.
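(a) By the angle $\alpha$. Then
$[R] = \begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix}$
(b) By the parameter $s = \sin\alpha$. Then
$[R] = \begin{bmatrix} \sqrt{1 - s^2} & s \\ -s & \sqrt{1 - s^2} \end{bmatrix}$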
Problem 4
Using (2.8) and (2.9), derive the addition laws for $\cos(\alpha_1 + \alpha_2)$, $\sin(\alpha_1 + \alpha_2)$.
Problem 5
Let $v_1 = \tan\alpha_1$, $v_2 = \tan\alpha_2$, $v = \tan(\alpha_1 + \alpha_2)$. From the result in (a), show that the law of
addition for v's is
$v = \frac{v_1 + v_2}{1 - v_1 v_2}$
Figure 1
$t' = t$
(b) Lengths are not affected by motion.
Figure 2
Then from Figure 2
$x' = x - Vt$
Figure 3
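Let the speed of light be c with respect to a fixed observer S. To send a pulse of light to a
distance L and then reflect it back requires a time $\Delta t$ (Figure 3).
$\Delta t = \frac{2L}{c}$
If the source and mirror are carried on a train moving at speed $V$, the Galilean transformation
gives light speeds $c - V$ and $c + V$ relative to the train on the two legs, so that
$\Delta t' = \frac{L}{c - V} + \frac{L}{c + V} = \frac{2Lc}{c^2 - V^2}$
$= \frac{2L}{c}\,\frac{1}{1 - V^2/c^2}$
$\Delta t' = \frac{\Delta t}{1 - V^2/c^2}$ (Galilean)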
So by measuring the time on the train, in principle one can detect the absolute motion of the
train.
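The size of the Galilean prediction is easy to evaluate. The Python sketch below adds the two one-way times directly and compares the result with the closed form; the function name and the sample values of L and V are illustrative assumptions.

```python
C = 3.0e8  # m s^-1, rounded speed of light

def round_trip_time_galilean(L, V, c=C):
    """Round-trip time on the train if light moved at c - V and c + V relative to it."""
    return L / (c - V) + L / (c + V)

L = 3.0          # m, illustrative path length
V = 0.5 * C      # deliberately large so that the effect is visible
dt_rest = 2.0 * L / C
dt_moving = round_trip_time_galilean(L, V)

print(dt_moving / dt_rest)            # ratio of the two times
print(1.0 / (1.0 - (V / C) ** 2))     # closed form 1 / (1 - V^2/c^2)
```

For V = c/2 both lines give 4/3; for everyday speeds the ratio differs from 1 only by about $V^2/c^2$.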
The Michelson-Morley (M-M) experiment near the end of the last century showed that
there is no such time difference. Therefore, experimentally (not by any theoretical deduction),
(3.7) is wrong, so the Galilean transformation (3.1) and (3.2) are wrong.
Outline of the M-M experiment
• In this case, the "train" is the earth itself, which is moving in the solar system at a speed
$V \approx 2\pi$ A.U./year $\approx 3\times10^4$ m s$^{-1}$, or $V/c \approx 10^{-4}$.
Thus the difference expected from Galilean theory is $\sim V^2/c^2 \approx 10^{-8}$. The experiment
had to be accurate beyond this level in order to conclude that there is no effect and that
the Galilean transformation is wrong. This level of accuracy is obtained by means of
optical interference.
• There is no way to stop this "train" and compare the times for $V = 0$ and $V \ne 0$.
Instead, compare the time for a round trip along the direction of motion with the time
for a round trip perpendicular to the direction of motion. The latter should be unchanged
by the motion. There should be a difference between the two cases. Actually there is no
difference.
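The orders of magnitude quoted in this outline follow from the size of the earth's orbit. The sketch below uses standard values for the astronomical unit and the length of the year, which are assumptions not stated in the notes.

```python
import math

AU = 1.496e11                  # m, astronomical unit (assumed standard value)
YEAR = 365.25 * 24 * 3600      # s
C = 3.0e8                      # m s^-1, rounded

V = 2.0 * math.pi * AU / YEAR  # orbital speed for a circular orbit
print(V)                       # about 3e4 m/s
print(V / C)                   # about 1e-4
print((V / C) ** 2)            # about 1e-8, the size of the expected Galilean effect
```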
Please read about the details of the experiment. The following problem gives some typical
orders of magnitude.
Problem 1
In the Michelson-Morley experiment, the earth is moving at about $3\times10^4$ m s$^{-1}$. Rays of light
are compared on two paths: (i) along the direction of motion of the earth, and (ii) perpendicular
to the direction of motion of the earth. The one-way length of each path is L = 3.0 m. In each
case, the rays traverse the return paths 10 times.
(a) According to Galilean theory, what is the difference $\Delta t$ between the times needed on the
two paths?
(b) Express this difference as a phase $\Delta\varphi$, assuming a reasonable wavelength.
No absolute motion
If Galilean transformations were correct, there would be the concept of absolute motion. We
would be able to determine the velocity V of a train by doing experiments on the train, without
referring to the outside. The simplest experiment is this (Figure 4): measure the velocity of
light coming from the front and the velocity of light coming from the back. In one case it should
be $c + V$; in the other case it should be $c - V$. So the difference reveals absolute motion.
Figure 4
Actually, such is not the case. The M-M experiment shows that even for a moving
observer:
• The speed of light is the same in all directions (isotropic).
• The speed of light is still c.
Instead of trying to explain this fact, we shall use it as the starting point to derive the trans-
formation between moving coordinates.
Basic object
The basic object is an event E, whose coordinates are
$E = (t, x^1, x^2, x^3)$ in S
$E = (t', x'^1, x'^2, x'^3)$ in S'
How are these coordinates related?
Linear assumption
We shall assume that they are related linearly:
etc. There are 16 coefficients. It would be more compact if we can write the relationship using
index notation or matrix notation like (2.1). For this purpose, it is convenient to call time
the "zero component". (Some authors call it the "fourth component"; it does not matter.)
Secondly, it is better if the zero component has the same unit. Since the speed of light is a
universal constant, we define
$x^0 = ct$
Thus it is necessary to distinguish between an upper index and a lower index. The
coordinate vector is defined with upper indices. We shall construct vectors with lower
indices later.
Sometimes we also denote the components as t, x,y, z.
General transformation
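With this convention, the most general linear transformation is
$x'^\mu = L^\mu{}_\nu x^\nu$
• A free index must appear at the same level (upper or lower) on both sides of an equation.
$a^\mu = b^\mu$ Yes
$a^\mu = b_\mu$ No
• A dummy index must appear twice in one term, once as an upper index and once as a
lower index.
$a^\mu b_\mu$ Yes
$a^\mu b^\mu$ No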
Identify an invariant
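We now claim that the M-M experiment tells us that the following is an invariant
$\sigma^2 = -(x^0)^2 + (x^1)^2 + (x^2)^2 + (x^3)^2$
In other words, we claim that it is equal to
$\sigma'^2 = -(x'^0)^2 + (x'^1)^2 + (x'^2)^2 + (x'^3)^2$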
(a) Proportional
Suppose a short pulse of light is emitted from the origin at the time when the two origins
coincide, i.e., at $(ct, x^1, x^2, x^3) = (ct', x'^1, x'^2, x'^3) = (0, 0, 0, 0)$. Let the event E be the receiving
of the pulse by an observer. Because the velocity of light is exactly c in S:
$\sigma^2 = 0$
Because the velocity of light is also exactly c in S':
$\sigma'^2 = 0$
In other words, $\sigma^2 = 0$ if and only if $\sigma'^2 = 0$. Thus these two quantities are proportional:
$\sigma'^2 = A(V)\,\sigma^2$
where the proportionality constant may depend on the relative velocity. Although we use the
example of $\sigma^2 = 0$ to derive the proportionality, the coefficients $L^\mu{}_\nu$ in the transformation are
independent of the coordinates; therefore the proportionality holds always.
Hence
(Consider the case of $A(V)$ for $V$ small. Obviously $x'^\mu \approx x^\mu$, $A(V) \approx 1$, so we take the + sign.
The sign cannot change suddenly, so we must always take the + sign.)
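This proves that
$\sigma'^2 = \sigma^2$
$\eta_{\mu\nu} = \begin{cases} -1 & \mu = \nu = 0 \\ +1 & \mu = \nu = 1 \text{ or } 2 \text{ or } 3 \\ 0 & \mu \ne \nu \end{cases}$ (3.17)
The matrix $\eta_{\mu\nu}$ takes the place of $\delta_{ij}$ in usual Euclidean space. In fact, if we change $\eta \to \delta$, we
would obtain the familiar results of the last Chapter.
For any vector $x^\mu$ with upper indices, we define the corresponding vector $x_\mu$ with lower
indices by
$x_\mu = \eta_{\mu\nu} x^\nu$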
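To raise an index
$x^\mu = \eta^{\mu\nu} x_\nu$
$\sigma^2 = [x^T]^{\cdot}\,[\eta]_{\cdot\cdot}\,[x]^{\cdot}$
$[x]_{\cdot} = [\eta]_{\cdot\cdot}\,[x]^{\cdot}$
$[x]^{\cdot} = [\eta]^{\cdot\cdot}\,[x]_{\cdot}$
$[\eta]^{\cdot\cdot} = [\eta^{-1}]_{\cdot\cdot}$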
The dots have been added to indicate whether the index is upper or lower; they are not really
needed once you become more familiar with the notation.
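Lowering an index and forming the invariant amount to a multiplication by the metric, which a few lines of Python can illustrate. This is a sketch only: the function names and the sample event (chosen to lie on the path of a light pulse) are assumptions.

```python
# Metric with the sign convention used here: -1 for the 0 component, +1 for 1, 2, 3.
ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]

def lower(x_up):
    """x_mu = eta_{mu nu} x^nu (sum over nu)."""
    return [sum(ETA[m][n] * x_up[n] for n in range(4)) for m in range(4)]

def invariant(x_up):
    """sigma^2 = x_mu x^mu (sum over mu)."""
    x_dn = lower(x_up)
    return sum(x_dn[m] * x_up[m] for m in range(4))

# An event reached by a light pulse from the origin: (ct)^2 = 3^2 + 0^2 + 4^2.
x = [5.0, 3.0, 0.0, 4.0]
print(lower(x))       # only the 0 component changes sign
print(invariant(x))   # zero for a light signal
```

In this convention the matrix is its own inverse, so raising an index uses the same numbers.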
and the metric $\eta$ is just a way of taking care of this minus sign, which is just the same minus
sign as that in (3.15) - i.e., the square of the 0 component enters the invariant with a minus
sign.
A matrix $[\eta]$ in principle has 16 elements; why do we need something so complicated
to deal with a mere minus sign? The answer is that this notation provides a good stepping
stone to general relativity, in which we have curved coordinates, and $\eta_{\mu\nu}$ is replaced by a more
general (and position-dependent) matrix $g_{\mu\nu}$. Thus, in terms of mathematical structure, to go
from Euclidean space to special relativity to general relativity simply involves
$\delta_{ij} \to \eta_{\mu\nu} \to g_{\mu\nu}$
Why not define
$x^0 = ict$ ???
instead? If so, (3.15) would appear with plus signs, and all formulas would be familiar. Some
elementary books do this, but it is very bad practice, for two reasons.
First, in Euclidean space, a subset such as $\sigma^2 = 1$ is a bounded domain, because each
component is bounded (in fact in this case at most 1). But in Minkowski space, a subset such as
$\sigma^2 = 1$ is unbounded - the components can be as large as you want. Thus there is a difference
in topology, which is obscured if we "hide" the minus sign.
Secondly, in quantum mechanics, the factor i appears, and we have expressions such as
$\psi^*\psi$. For such expressions, $\psi^*$ means changing $i \to -i$. But this applies only to the genuine
i's, not to the "fake" i's that are introduced in $x^0 = ict$. It would be a nightmare to keep track
of the genuine i's versus the "fake" i's.
So, for these reasons, we shall continue with the matrix $[\eta]$.
Condition on transformation matrix
The existence of an invariant places conditions on the transformation matrix [L]. We now
derive these equations in three equivalent ways.
$x'^2 = x^2, \quad x'^3 = x^3$
and we only need to transform $x^0, x^1$. The matrix [L] is reduced to 2 x 2.
Let
so that explicitly
Let x 2 = x3 = 0. Then
$\sigma^2 = -1\cdot(x^0)^2 + 0\cdot x^0 x^1 + 1\cdot(x^1)^2$
Problem 2
Derive the results in (3.22) from the 3 conditions on p, q, r, s.
(b) In general using index notation
Compare with the analogous equations in Chapter 2; in particular, note that if we replace $\eta$ by
$\delta$, these equations would reduce exactly to those in Chapter 2.
Let us check that in the case of (1+1)d, (3.23) agrees with (3.22). We put the summation
sign back explicitly. Also, in (1+1)d, the summation goes only over 0, 1.
Since
this reduces to
But from (3.20)
hence
Problem 3
Consider the other cases and derive the other two equations in (3.22).
Problem 4
Consider the case of (3 + 1)d.
(a) How many conditions are there in (3.23)? Hint: note the symmetry under interchange of the two free indices.
(b) Hence determine how many free parameters there are in $L^\mu{}_\nu$. Half of these correspond to
rotation and half of these correspond to relative velocity. Explain physically why there are this
number of free parameters.
$[x'^T]^{\cdot} = [x^T]^{\cdot}\,[L^T]_{\cdot}{}^{\cdot}$ (note the dots transposed)
Problem 5
Of the 4 parameters p, q, r, s in (3.21), regard s as the free parameter and define $s = -\sinh\alpha$.
Find p, q, r in terms of $\alpha$ by using the three equations in (3.22). You will need to choose the
sign of a square root. Explain the physical meaning of your choice. Your answer should be
$[L] = \begin{bmatrix} \cosh\alpha & -\sinh\alpha \\ -\sinh\alpha & \cosh\alpha \end{bmatrix}$
Note that the two off-diagonal entries have the same sign.
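A numerical check makes the hyperbolic form less mysterious: applying this matrix to any event leaves $-(x^0)^2 + (x^1)^2$ unchanged, and two successive transformations combine by adding their parameters. The sketch below is illustrative; the function names, the sample event and the values of the parameter are assumptions.

```python
import math

def boost(alpha):
    """(1+1)-d transformation matrix with both off-diagonal entries equal to -sinh(alpha)."""
    c, s = math.cosh(alpha), math.sinh(alpha)
    return [[c, -s], [-s, c]]

def apply(L, x):
    return [L[0][0] * x[0] + L[0][1] * x[1],
            L[1][0] * x[0] + L[1][1] * x[1]]

def interval(x):
    """Invariant -(x^0)^2 + (x^1)^2."""
    return -x[0] ** 2 + x[1] ** 2

event = [2.0, 0.5]            # (x^0, x^1), arbitrary sample event
alpha1, alpha2 = 0.4, 0.9

once = apply(boost(alpha1), event)
twice = apply(boost(alpha2), once)
direct = apply(boost(alpha1 + alpha2), event)

print(interval(event), interval(once), interval(twice))  # all equal up to rounding
print(twice, direct)                                      # the same components
```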
It is conventional to define $\beta$ as the dimensionless relative velocity in this way. For "ordinary"
speeds, $|\beta| \ll 1$; for relative motion near the speed of light, $\beta \approx 1$. From (3.26)
where
(3.28)
Problem 6
Verify that $[L(\alpha_2)][L(\alpha_1)] = [L(\alpha_1 + \alpha_2)]$.
Problem 7
Show that when two transformations are performed one after another, the relative velocities $\beta$
"add" as
$\beta = \frac{\beta_1 + \beta_2}{1 + \beta_1 \beta_2}$
Problem 8
Start from (3.29) and solve for $x^0, x^1$ in terms of $x'^0, x'^1$. Show that the reverse transformation
derived this way has the same form, but with $\beta \to -\beta$.
Signs
The signs can be remembered as follows:
• The "diagonal" terms (i.e., relating $x'^0$ to $x^0$, or $x'^1$ to $x^1$) always have a +
sign.
• The "off-diagonal" terms with $\beta$ both have the same sign.
• Whether the "off-diagonal" sign is + or - is easily determined by considering $V \to 0$, $\gamma \to 1$,
for which the Galilean transformation should be valid. For example, from the second
equation in (3.29)
$x' \to x - Vt$
The sign of the Vt term is easily checked with reference to Figure 2.
Problem 9
Explicitly find the coefficients of the transformation if S' is a spaceship travelling at $V =$
$9.0\times10^7$ m s$^{-1}$ relative to an observer S.
Nonrelativistic limit
Although the Galilean transformation is conceptually wrong, it must be nearly correct if the
relative speed V is low. Otherwise it would contradict many experiments and observations in
daily life (all concerning low speeds) which seem to confirm the Galilean transformation. So let
us check the nonrelativistic limit ($V \to 0$) and estimate the correction to the Galilean law.
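To get a feeling for how small the corrections are, the following Python sketch evaluates $\gamma - 1$ for a few speeds and compares it with the leading term $\beta^2/2$. It assumes the standard definition $\gamma = 1/\sqrt{1 - \beta^2}$ with $\beta = V/c$; the function name and the sample speeds are illustrative. The rewritten formula $\gamma - 1 = \beta^2\gamma^2/(\gamma + 1)$ is used only to avoid round-off when $\beta$ is tiny.

```python
import math

C = 3.0e8  # m s^-1, rounded speed of light

def gamma_minus_one(V):
    """gamma - 1, computed as beta^2 gamma^2 / (gamma + 1) to avoid cancellation."""
    beta2 = (V / C) ** 2
    gamma = 1.0 / math.sqrt(1.0 - beta2)
    return beta2 * gamma ** 2 / (gamma + 1.0)

for V in (30.0, 3.0e3, 3.0e7):   # a car, a fast rocket, one tenth of c
    print(V, gamma_minus_one(V), 0.5 * (V / C) ** 2)
```

For the first two speeds the two columns agree to many digits; only at a tenth of the speed of light does the difference reach the percent level.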
Write the second equation in (3.29) as
The first term is the Galilean result and the second term is the correction. The fractional
correction is about
The first term is the Galilean result and the other two terms are corrections. The $(\gamma - 1)$
correction is the same as (3.30), while the last term gives a fractional correction
Now suppose x1 and t refer to a particle moving at velocity v : xl/t = v. (This must not be
confused with the velocity of the frame V.) Then the correction is
where
Problem 10
(a) S is an air-traffic controller at the airport and S' is an aeroplane flying at 300 m s$^{-1}$ (~
1000 km hr$^{-1}$). If the Galilean transformation is used to describe the relationship between S
and S', estimate the percentage error (3.30).
(b) A passenger is walking on the plane at $v = 3$ m s$^{-1}$. Estimate the percentage error (3.31).
Problem 11
Choose the x axis downwards, and $g = 10$ m s$^{-2}$. A stone is released at t = 0, x = 0 and falls
to the ground at a distance h = 10 m below. Let the event E be the stone hitting the ground.
(a) Give the coordinates (t, x) of E.
(b) Another observer is in an elevator moving uniformly upwards at $V = 3$ m s$^{-1}$, and the
origins of the two observers coincide at $t = t' = 0$. Give the coordinates $(t'_G, x'_G)$ for the event
E, according to the Galilean transformations.
(c) Give the coordinates $(t'_L, x'_L)$ for the event E, according to the Lorentz transformation. Give
$(t'_G - t'_L)$ and $(x'_G - x'_L)$ to at least 1 significant figure accuracy.
$\sigma^2 = (x^1)^2 + k^2 (x^2)^2$ = invariant
where $k = 1000$ is just a conversion factor to get his measurements into the same units. It
would be much smarter to use the same units for both directions and get rid of k.
In the same way, the factors of c that appear, e.g., in
are there because we do not use the same unit for the zeroth direction.
Choice of c = 1
We can get rid of c by setting
$c = 1$
All formulas become simpler. When we get a final result that is dimensionally "wrong", we
simply multiply by a suitable power of c (= 1).
Example 1
(a) Convert a time of 3.0 m to conventional units.
time = 3.0 m = $\frac{3.0\ \mathrm{m}}{3.00\times10^{8}\ \mathrm{m\,s^{-1}}} = 1.0\times10^{-8}$ s
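The same bookkeeping can be written as a pair of conversion helpers. This is a minimal sketch; the function names are assumptions, and c is rounded to $3.00\times10^8$ m s$^{-1}$ as in the example.

```python
C = 3.00e8  # m s^-1

def metres_to_seconds(time_in_metres):
    """Restore conventional units: divide a time quoted in metres by c."""
    return time_in_metres / C

def seconds_to_metres(time_in_seconds):
    """Go to units with c = 1: multiply a time in seconds by c."""
    return time_in_seconds * C

print(metres_to_seconds(3.0))      # 1e-08 s, as in the example
print(seconds_to_metres(1.0e-9))   # about 0.3 m, the distance light travels in 1 ns
```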
Actual units
There are certain units in which c is really 1.
(a) Measure time in years and distances in light-years.
$c = \frac{1\ \text{light year}}{1\ \text{year}} = 1$ unit
This is sometimes convenient in astrophysics.
(b) Measure time in ns and distances in units of 0.3 m.
Then
$c = \frac{0.3\ \mathrm{m}}{1\ \mathrm{ns}} = \frac{1\ \text{unit}}{1\ \text{unit}} = 1$ unit
This is convenient when dealing with high energy particles, which typically travel distances of
a few m in times of a few ns.
Standard of length and time
When we come to general relativity, it will be very important to give a clear prescription of how
length and time (i.e., distances in space-time) are measured. The modern definitions of length
and time standards ("rods" and "clocks") both rely on atomic transitions. Conceptually it is
easiest to imagine the following.
• Consider a particular atomic transition A → B which emits electromagnetic waves.
• Each wavelength of the wave is defined as 1 "rod".
• Each period of the wave is defined as 1 "tick".
Thus
$c = \frac{\lambda}{T} = \frac{1\ \text{"rod"}}{1\ \text{"tick"}} = 1$
The actual definition uses more complicated numbers, but the idea is the same.
$x'^\mu_1 = L^\mu{}_\nu x^\nu_1$
Define the difference
$\Delta x'^\mu = L^\mu{}_\nu\,\Delta x^\nu$ (3.32)
For example, consider relative velocity $V = \beta c$ along the $x^1$ direction and denote
$x^1 \to x$, $x^0 \to t$. In units c = 1,
The difference form is often convenient for applications, because we want to consider the interval
between two events. Correspondingly, instead of $\sigma^2$, we consider
The difference (and differential) form has another advantage. When we come to general
relativity, space-time will be described as curved. But if we look at a small portion of a curved
surface, it is nearly flat. So locally, i.e., in terms of small displacements $\Delta x^{\mu}$ (strictly speaking
infinitesimal displacements $dx^{\mu}$), general relativity looks very similar to (3.32) - (3.35).
Figure 1
Let S be an observer on the ground and S' be an observer on a train moving at $V = \beta c$. A rod
of length $L_0$ is fixed on the train. What is the length L as seen by S?
To save some writing, we use units with c = 1 and put
Then
The condition "at the same time" is crucial. We cannot measure one end now and measure the
other end one year later (the rod is moving!) and call the difference the length. "At the same
time" means
$\Delta x' = \gamma(\Delta x - \beta\,\Delta t)$, with $\Delta x' = L_0$, $\Delta x = L$, $\Delta t = 0$
Would we get $L = \gamma L_0$ (an increase in length)? No! Because $\Delta t' \neq 0$. In fact we can solve for
$\Delta t'$:
$\Delta t' = \gamma(\Delta t - \beta\,\Delta x) = -\gamma\beta\,\Delta x$
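As a numerical illustration of the argument, the sketch below measures both ends of the rod at the same time in S ($\Delta t = 0$) and transforms the displacement to S'. It assumes the standard Lorentz transformation in units c = 1; the values $\beta = 0.6$ and $L_0 = 1$, and all function names, are hypothetical choices.

```python
import math

def gamma(beta):
    """Lorentz factor for speed beta (units with c = 1)."""
    return 1.0 / math.sqrt(1.0 - beta * beta)

def boost(dt, dx, beta):
    """Transform a displacement (dt, dx) in S to (dt', dx') in S'."""
    g = gamma(beta)
    return g * (dt - beta * dx), g * (dx - beta * dt)

beta = 0.6            # hypothetical train speed
L0 = 1.0              # rest length of the rod, measured in S'
L = L0 / gamma(beta)  # contracted length seen by S

# Both ends are measured at the same time in S, so dt = 0 and dx = L.
dt_prime, dx_prime = boost(0.0, L, beta)
print(L, dx_prime, dt_prime)
```

The transformed displacement has $\Delta x' = L_0$ as required, while $\Delta t'$ comes out negative: the two measurements are not simultaneous in S'.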
Simultaneity
So we have two events which are simultaneous in one frame ($\Delta t = 0$), but not simultaneous
in another frame ($\Delta t' \neq 0$). Thus simultaneity is not an absolute concept. We always have to
specify "simultaneous according to which observer".
In general, simultaneity cannot hold in both frames. This can be seen from the other
equations for the Lorentz transformation, e.g.,
If two events do not occur at the same place ($\Delta x \neq 0$), it is impossible to have simultaneity in
both frames ($\Delta t = 0$ and $\Delta t' = 0$).
Problem 1
An aeroplane has a length of exactly 50 m.
(a) When it is flying at 300 m s$^{-1}$ (~ 1000 km hr$^{-1}$), by how much does it appear to be shortened
when observed by someone on the ground?
(b) What if it is flying at $10^8$ m s$^{-1}$?
Problem 2
A train of length $2L_0$ is travelling at a speed $\beta$. Observer S is on the ground, and observer S'
is on the train. S' stands in the middle of the train (x' = 0), and according to him
• at t' = 0, he sends two pulses of light, one forward and one backward (Event A);
• the first pulse reaches the front of the train, and is reflected by a mirror (Event B);
• the pulse reflected from the front returns to the middle of the train (Event C);
• the second pulse reaches the back of the train, and is reflected by a mirror (Event D);
• the pulse reflected from the back returns to the middle of the train (Event E).
(a) Give the coordinates according to S', namely $t'_A, x'_A$; $t'_B, x'_B$; ...; $t'_E, x'_E$.
(b) Are B and D simultaneous? Are C and E simultaneous (i.e., do the two pulses of light
reach the middle of the train at the same time)?
(c) Find the coordinates according to S, namely $t_A, x_A$; $t_B, x_B$; ...; $t_E, x_E$, by using the Lorentz
transformation.
(d) Are B and D simultaneous? Are C and E simultaneous (i.e., do the two pulses of light
reach the middle of the train at the same time)? Discuss the relationship with the answer in
(b).
Problem 3
This Problem continues with Problem 2, but tries to analyse the situation according to S
directly, without using the Lorentz transformation. Check all your answers against Problem 2.
(a) What is the length 2 L of the train according to S?
(b) Draw a sketch showing the situation at time $t_B$. (For convenience, just show the front half
of the train.) Based on this diagram, show that $v t_B + L = c t_B$. Hence find $t_B$.
(c) Likewise find $t_D$.
(d) Also find the times $t_C - t_B$ and $t_E - t_D$ for the return trips. (Hint: These times are
respectively equal to $t_D$ and $t_B$. Why?)
(e) Hence find $t_C$ and $t_E$. According to this calculation, do the two pulses return to the middle
of the train at the same time?
Lack of symmetry
In relativity there is supposed to be no privileged frame (such as a frame "absolutely at rest").
All reference frames and all observers are supposed to be equivalent (Figure 2).
Figure 2: "We are equivalent"
Figure 3: "I am special"
The reason is as follows. There are three things: S, S' and the rod. The rod is at rest
in S'; it is not at rest in S. With the rod, the symmetry is destroyed. The frame S' is special.
It is moving together with the rod.
Paradox
Many of these concepts are combined in the following paradox. There is a hole of length Lo on
the ground. A rod also of length Lo is moving rapidly past it. The observer S is fixed to the
ground. The observer S' is moving with the rod (Figure 4).
Figure 4
According to S, the rod is contracted to $L < L_0$. So at the same time, he pushes the
two ends of the rod down (Figure 5). The rod passes through the hole.
Figure 5: As seen by S
Now what happens as seen by S'? The rod is stationary, of length $L_0$. Now the hole
is contracted to $L < L_0$ - the concept of symmetry (Figure 6a). Yet the rod must still pass
through the hole - this is an objective fact! How can this happen?
Figure 6: As seen by S'
The answer is that the two events A, B shown in Figure 5 are simultaneous in S ($\Delta t = 0$),
but are not simultaneous in S' ($\Delta t' \neq 0$) - simultaneity is not absolute. In fact, from (4.6),
$\Delta t' < 0$, i.e., A occurs before B. So the sequence of events is as shown in Figure 6a, b.
4.2 Time dilation
Again let S be an observer on the ground and S' be an observer on a train moving at $V = \beta c$.
A clock is fixed to the train. Two ticks of the clock are separated by an interval $\Delta t'$ according
to S'. What is the time interval $\Delta t$ according to S?
Again we have two (reverse) transformations that can be used:
The observer who is not comoving measures a longer time - time seems to be dilated.
Again there is no symmetry. S' is special. It is the comoving frame.
Twin paradox
Consider twin brothers S and S' who were initially together. Let them carry identical clocks.
S' travels at high speed to a distant galaxy, and comes back. Which of the two clocks shows a
longer time interval? In this case the two clocks are at the same place both at the beginning
and also at the end.
We know that a moving observer appears to measure a shorter time. The paradox is this.
S thinks S' has been moving, so $\Delta t > \Delta t'$. But S' thinks S has been moving, so $\Delta t' > \Delta t$.
Only one of these can be true. So how is the paradox resolved?
If S' has travelled to a distant galaxy and come back, then he must have experienced
some acceleration. Acceleration is absolute; it can be determined even by a person locked in
a room. So it is an absolute fact that S' has been moving, and not S. The situation is not
symmetrical. So $\Delta t > \Delta t'$.
We have mentioned "clocks". But the aging process is just another "clock" (further
explanations below), so S has aged more.
There is clear experimental proof. The "twin brothers" are radioactive nuclei or elementary
particles. Let them have an average life-time T at rest. One group (S) is at rest. After a
time t, the remaining number is
where $N_0$ is the original number. Another group (S') is sent around a circular accelerator at
constant speed $V = \beta c$. The time elapsed according to this group of observers is only $t' = t/\gamma$.
So the remaining number is
If $\gamma \gg 1$, there would be very few decays. We can also say that the apparent average life-time
has been increased from T to $\gamma T$.
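The effect on the surviving population can be sketched numerically. The snippet below assumes the usual exponential decay law $N = N_0 e^{-t/T}$ for the group at rest, and replaces t by $t/\gamma$ for the moving group; the function name and the sample numbers are hypothetical.

```python
import math

def surviving(n0, t, mean_life, beta=0.0):
    """Number remaining after lab time t, for particles moving at speed beta (c = 1).

    Assumes exponential decay with mean life `mean_life` in the rest frame,
    so the elapsed proper time is t / gamma.
    """
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return n0 * math.exp(-t / (g * mean_life))

n_rest = surviving(1000.0, t=1.0, mean_life=1.0)              # group S, at rest
n_moving = surviving(1000.0, t=1.0, mean_life=1.0, beta=0.6)  # group S', gamma = 1.25
print(n_rest, n_moving)
```

More of the moving group survives, which is the same statement as an apparent life-time of $\gamma T$.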
Problem 4
Muons ($\mu$) have a mean lifetime in their own rest frame of $2.2 \times 10^{-6}$ s. A beam of muons is
travelling at 0.90 c.
(a) What would be their apparent lifetime?
(b) How far would they travel (on average) before they decay?
Different clocks
In the twin paradox (including the example of muons), we find that the moving clock "slows
down". This is true for different kinds of clocks, e.g.,
• clocks made of light pulses bouncing between mirrors,
• atomic clocks based on electromagnetic oscillations associated with an atomic decay process,
• quartz watches,
• clocks such as the decay of muons which are governed by the weak interaction,
• clocks such as the decay of other particles that are governed by the strong interaction,
with the Lorentz force law and the Maxwell equations, there is no further need to worry
about any individual phenomena involving electromagnetism.
Diagrams
Figure 8
Purely spatial relationships are illustrated by spatial diagrams like Figure 8a. This shows 2
of the dimensions (x, y) and the third (z) is out of the page. A point P is located by its
coordinates.
Similarly, spacetime relationships are illustrated by spacetime diagrams like Figure 8b.
The t axis and one spatial axis (x) are shown; the other two are understood to be "into the
page". You should imagine rotating Figure 8b about the t axis to turn x into y and z.
An event is represented by a point P in spacetime. The point P is located by its
coordinates.
The 90° angle between the x, y axes has a physical meaning. The angle between the x, t
axes has no physical meaning. It is usually shown as 90° (Figure 8b), but it is also valid to
show it at some angle (Figure 8c).
The point O need not be the origin. Then instead of the coordinates $x^{\mu}$ of P, we can
refer to the difference $\Delta x^{\mu}$ between P and O.
Light cones
Figure 9
Construct 45° lines in the spacetime diagram, i.e., t = x, t = -x as shown by the broken lines
in Figure 9. These lines should be regarded as cones if we imagine rotating the diagram about
the t axis. They are light cones: if light is emitted from O, it travels along a path |x| = ct = t,
i.e., on the surface of the cones.
Space-like separation
Consider points such as B, B', B". Relative to 0,
Problem 5
In reference frame S, the point B is displaced from the origin 0 by
(a) Draw a spacetime diagram, on it sketch the light cone, and label the point B.
(b) Another reference frame S' is travelling at $V = \beta c$ with respect to S along the x-axis. Show
that for a suitable choice of $\beta$ ($|\beta| < 1$), $\Delta t' = 0$ (in other words, B is simultaneous with O).
(c) What is the value of $\Delta x'$ in this case? Do this in two ways: (i) by explicitly using the
Lorentz transformation and (ii) by considering the invariant quantity $(\Delta s)^2$.
Proper distance
From (4.12), and also Problem 3, we arrive at the following understanding of $\Delta s$ for a space-like
interval: If $\Delta x^{\mu}$ is space-like, there is a frame S' in which the two events are simultaneous
($\Delta t' = 0$). The quantity $\Delta s$ is equal to the spatial distance $|\Delta x'|$ in this frame. Therefore we
sometimes call $\Delta s$ the proper distance.
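A small numerical sketch may make the idea of proper distance concrete. It assumes the standard Lorentz transformation with c = 1 and the sign convention $(\Delta s)^2 = -(\Delta t)^2 + (\Delta x)^2$; the displacement (3, 5) and the function names are hypothetical choices made for illustration.

```python
import math

def boost(dt, dx, beta):
    """Transform a displacement (dt, dx) in S to (dt', dx') in S' (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return g * (dt - beta * dx), g * (dx - beta * dt)

def interval_squared(dt, dx):
    """Invariant (Delta s)^2 = -(Delta t)^2 + (Delta x)^2."""
    return -dt * dt + dx * dx

dt, dx = 3.0, 5.0   # hypothetical space-like displacement, |dt| < |dx|
beta = dt / dx      # the frame in which the two events are simultaneous
dt_prime, dx_prime = boost(dt, dx, beta)
proper_distance = math.sqrt(interval_squared(dt, dx))
print(dt_prime, dx_prime, proper_distance)
```

In the frame with $\beta = \Delta t/\Delta x$ the time separation vanishes, and the remaining spatial separation agrees with $\Delta s$ computed from the invariant.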
Problem 6
In the example of length contraction, the two events are the measurements of the positions of
the two ends of the rod. What is the proper distance between these two events? Discuss from
both frames of reference.
Time-like separation
Consider points such as A and C. Relative to 0 ,
Problem 7
In the reference frame S, the point A is displaced from the origin O by
(a) Draw a spacetime diagram, on it sketch the light cone, and label the point A.
(b) Another observer S' is travelling at $V = \beta c$ with respect to S along the x-axis. Show that
for any choice of $\beta$ with $|\beta| < 1$, $\Delta t'$ remains positive.
(c) Show that for a suitable choice of $\beta$ ($|\beta| < 1$), $\Delta x' = 0$.
(d) What is the value of $\Delta t'$ in this case? Do this in two ways: (i) by explicitly using the
Lorentz transformation and (ii) by considering the invariant quantity $(\Delta s)^2$.
This does not mean that this quantity is always positive. It is positive for a space-like interval,
and negative for a time-like interval. It is therefore better to write this as
We shall use $\Delta s$ for space-like intervals and $\Delta\tau$ for time-like intervals. Both of these quantities
are real. We never use $\Delta s$ for time-like intervals or $\Delta\tau$ for space-like intervals; they would be
complex and inconvenient.
Proper time
Let two events have a time-like separation $\Delta x^{\mu}$. Suppose the two events are two ticks of a
clock. Go to a frame S' in which the two events occur at the same place, i.e., as if the clock
has not moved. In other words, the clock and observer S' are moving together. The frame S'
is called the co-moving frame. The quantity $\Delta\tau$ is equal to the elapsed time $\Delta t'$ in this frame.
We call $\Delta\tau$ the proper time interval.
Consider a particle moving at a speed $V = \beta c$. Then
Note that the proper time is always less. This relation is just the reverse of time-dilation.
Light-like separation
Finally, a point on the light-cone satisfies
$\Delta t = |\Delta x|$ (4.17)
Such an interval $\Delta x^{\mu}$ is said to be light-like and is described by the invariant condition
$(\Delta s)^2 = -(\Delta\tau)^2 = -(\Delta t)^2 + (\Delta x)^2 = 0$ (4.18)
Particle trajectories
What do particle trajectories look like on a spacetime diagram? Refer, for example, to Figure
10. Actually, these are nothing new: they are just like displacement-time graphs you learnt
about in secondary school, except that they are turned around by 90° - the t axis is drawn
vertically.
First consider a uniformly moving particle, say starting from O. Since $\Delta x/\Delta t$ = constant,
the trajectory is a straight line in the spacetime diagram. How about the slope? Since
it cannot move faster than light
So it must move along a time-like trajectory, as shown in Figure 10a. The slope is larger than
45°.
More generally, if the motion is not uniform, then the trajectory is not a straight line
(Figure 10b). But every small section is nearly straight, and each section has a slope of more
than 45°.
$\dfrac{y}{x} = \tan\alpha$
This is a line lying in the first quadrant.
Likewise, the y'-axis is defined by x' = 0,
$\dfrac{y}{x} = -\cot\alpha$ (4.21)
This is a line which lies in the second quadrant. Moreover, the lines defined by (4.21) and (4.22)
are perpendicular (see Problem below). So the situation is as shown in Figure 11a.
Similarly consider Lorentz transformations, e.g.,
where $0 < \alpha$. Note that the two sinh terms have the same sign. The x' axis is defined by t' = 0,
$\dfrac{t}{x} = \tanh\alpha$
This is a line lying in the first quadrant.
Likewise the t' axis is defined by x' = 0,
$\dfrac{t}{x} = \coth\alpha$ (4.24)
x
This is also a line lying in the first quadrant. The lines (4.24) and (4.25) do not make a right
angle. But as we stated earlier, the angle between the t and x axes has no physical meaning,
and it does not matter. The t' and x' axes are therefore as shown in Figure 11b.
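The geometry of the primed axes can be checked numerically. The sketch below takes the two slopes quoted above, t/x = tanh α for the x' axis and t/x = coth α for the t' axis, and computes the angles they make with the unprimed axes. The function name and the sample value α = 0.5 are hypothetical; the function requires α > 0.

```python
import math

def primed_axis_angles(alpha):
    """Angles (in radians) of the x' and t' axes for a boost with parameter alpha > 0."""
    slope_x_axis = math.tanh(alpha)        # x' axis: t/x = tanh(alpha)
    slope_t_axis = 1.0 / math.tanh(alpha)  # t' axis: t/x = coth(alpha)
    theta1 = math.atan(slope_x_axis)        # angle between the x' axis and the x axis
    theta2 = math.atan(1.0 / slope_t_axis)  # angle between the t' axis and the t axis
    between = math.pi / 2 - theta1 - theta2 # angle between the x' and t' axes
    return theta1, theta2, between

theta1, theta2, between = primed_axis_angles(0.5)
print(theta1, theta2, between)
```

The two axes tilt towards the light cone by the same angle, so the angle between them is less than 90° for any α > 0.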
Problem 8
(a) The positive x'-axis is defined by y' = 0 and x' > 0. Show that the line lies in the first (and
not the third) quadrant. Likewise determine the quadrant assignments for the lines in (4.21),
(4.22), (4.24), (4.25).
(b) Show that the lines (4.21) and (4.22) are perpendicular.
(c) Show that the lines (4.24) and (4.25) are not perpendicular and that the angle between
(4.23) and the x-axis equals the angle $\theta_2$ between (4.24) and the t-axis.
(d) What happens to the t' and x' axes in Figure 11b if $\beta \to 1$?
(e) Re-draw Figure 11b if $0 > \alpha$.
Figure 12
Let the frame S' move with velocity $V = \beta c$ with respect to S. Let a particle P have velocity
v' as seen by S'. What is its velocity v as seen by S? This is the problem of the transformation
of velocity. For simplicity we consider all velocities along the x-direction.
According to Galilean transformation (Figure 12),
$v = \dfrac{dx}{dt}$, $v' = \dfrac{dx'}{dt}$ (same t!)
$v = v' + V$ (Galilean)
So this problem is also called the addition of velocities. (Incidentally, we consider the transformation
$v' \to v$ here because the reverse would be the subtraction of velocities, which is a little
bit less convenient.)
If either the velocity of the S' frame (V) or the velocity of the particle (v') is much smaller than
c, then (4.29) reduces to (4.26). Thus the law of addition of velocities does not contradict our
"common sense", which is based on experience and experiments at low speeds.
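The low-speed limit can be seen numerically. The sketch below compares the Galilean rule v = v' + V with the standard relativistic addition law $v = (v' + V)/(1 + v'V/c^2)$, which is assumed here to be the content of (4.29); the function names and sample speeds are hypothetical.

```python
def add_galilean(v_prime, V):
    """Galilean addition of velocities."""
    return v_prime + V

def add_relativistic(v_prime, V, c=1.0):
    """Standard relativistic addition of collinear velocities."""
    return (v_prime + V) / (1.0 + v_prime * V / c**2)

# Low speeds (fractions of c): the two rules agree closely.
slow_galilean = add_galilean(1.0e-6, 1.0e-6)
slow_relativistic = add_relativistic(1.0e-6, 1.0e-6)

# High speeds: the relativistic result stays below c = 1.
fast_galilean = add_galilean(0.5, 0.5)
fast_relativistic = add_relativistic(0.5, 0.5)
print(slow_galilean, slow_relativistic, fast_galilean, fast_relativistic)
```

At one millionth of c the two results differ only at the level of one part in $10^{12}$, while at half the speed of light the Galilean sum reaches c and the relativistic one does not.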
Problem 9
(a) An aeroplane is flying at 300 m s$^{-1}$ (~ $10^3$ km hr$^{-1}$). It sees a second aeroplane flying at
300 m s$^{-1}$ relative to itself, in the same direction. Find the velocity of the second aeroplane
according to a person on the ground, according to (i) Galilean transformations, (ii) Lorentz
transformations. Find the percentage difference in the two results.
(b) Repeat the above if these are space-ships and the two given velocities are $3.0 \times 10^7$ m s$^{-1}$ (0.1c)
instead of 300 m s$^{-1}$.
(c) Repeat again if the two given velocities are $2.0 \times 10^8$ m s$^{-1}$ (2c/3).
Problem 10
Let V be fixed (say 0.5c). Plot v vs v'. Hence, or otherwise, show that (4.29) cannot lead to a
velocity larger than c.
Problem 11
Start with (4.28) and solve for v' in terms of v and V. Show that the result can be obtained
by $v \leftrightarrow v'$, $V \to -V$.
Derivation using two transformations
Recall from (3.25) that a Lorentz transformation is represented by
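$\begin{bmatrix} \cosh\alpha & -\sinh\alpha \\ -\sinh\alpha & \cosh\alpha \end{bmatrix}$
$= \begin{bmatrix} \cosh\alpha_2 & -\sinh\alpha_2 \\ -\sinh\alpha_2 & \cosh\alpha_2 \end{bmatrix} \begin{bmatrix} \cosh\alpha_1 & -\sinh\alpha_1 \\ -\sinh\alpha_1 & \cosh\alpha_1 \end{bmatrix}$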
Take the 00 component
Hence we see that in composing two transformations, the parameter $\alpha$ simply adds, exactly like
the angle in rotation. This makes the parameter $\alpha$ very convenient:
$\alpha = \alpha_1 + \alpha_2$
which is exactly the same as the addition law (4.28). This derivation has the advantage that
it is obvious that $|\beta| < 1$, because it is $\tanh\alpha$. A second advantage is seen when making many
transformations. Also compare with Chapter 2, Problem 2.
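The additivity of $\alpha$ is easy to verify numerically. The sketch below multiplies two 2×2 matrices of the cosh/sinh form and compares the product with a single matrix for $\alpha_1 + \alpha_2$; it also checks that $\tanh(\alpha_1 + \alpha_2)$ agrees with the standard velocity addition law (assumed form, c = 1). The helper names and the sample values are hypothetical.

```python
import math

def boost_matrix(alpha):
    """2x2 Lorentz transformation matrix acting on (t, x), with parameter alpha."""
    c, s = math.cosh(alpha), math.sinh(alpha)
    return [[c, -s], [-s, c]]

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add_velocities(beta1, beta2):
    """Standard relativistic addition of collinear velocities (c = 1)."""
    return (beta1 + beta2) / (1.0 + beta1 * beta2)

alpha1, alpha2 = 0.4, 0.9
composed = matmul(boost_matrix(alpha2), boost_matrix(alpha1))
direct = boost_matrix(alpha1 + alpha2)
beta_from_alpha = math.tanh(alpha1 + alpha2)
beta_from_addition = add_velocities(math.tanh(alpha1), math.tanh(alpha2))
print(composed, direct, beta_from_alpha, beta_from_addition)
```

Because the composite speed is always a tanh, it is automatically less than 1 however many transformations are composed.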
By inspection, it is already clear that V can never reach 1 in any finite $\tau$. The reason is that
when $V \to 1$, RHS $\to 0$, so $dV/d\tau \to 0$ and V no longer increases. More precisely, we can solve
(4.35).
Problem 12
Integrate the above equation with the initial condition that V = 0 when $\tau = 0$, and show that
the result is
$V = \dfrac{e^{2a\tau} - 1}{e^{2a\tau} + 1} = \tanh a\tau$
Note two limits. (a) For $\tau \to 0$, $V \approx a\tau$, which is the expected Galilean result. (b) For $\tau \to \infty$,
$V \to 1$.
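The closed form can be compared with a direct numerical integration. The sketch below assumes that the equation of motion has the form $dV/d\tau = a(1 - V^2)$, which is what differentiating $V = \tanh a\tau$ gives, and integrates it with a fourth-order Runge-Kutta step; the function name, step count and sample values are hypothetical.

```python
import math

def integrate_velocity(a, tau_end, steps=1000):
    """Integrate dV/dtau = a (1 - V^2) from V(0) = 0 with classical RK4 (c = 1)."""
    def rhs(v):
        return a * (1.0 - v * v)

    h = tau_end / steps
    v = 0.0
    for _ in range(steps):
        k1 = rhs(v)
        k2 = rhs(v + 0.5 * h * k1)
        k3 = rhs(v + 0.5 * h * k2)
        k4 = rhs(v + h * k3)
        v += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return v

v_numeric = integrate_velocity(a=1.0, tau_end=2.0)
v_exact = math.tanh(1.0 * 2.0)
print(v_numeric, v_exact)
```

The numerical velocity tracks $\tanh a\tau$ and approaches, but never reaches, 1.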
Using the "angle" $\alpha$
We consider a series of co-moving frames at different times. The transformation from one to
the next involves increasing $\alpha$ by $\Delta\alpha$, where
$\tanh\Delta\alpha = \beta = a\,\Delta\tau$
Since $\Delta\tau \to 0$, we get
Since $\alpha$ is additive, the overall result is equivalent to
$V = \tanh\alpha = \tanh a\tau$
in agreement with (4.36), but the mathematics is much simpler.
The displacement $\Delta x^{\mu} = (\Delta t, \Delta\mathbf{x})$ transforms like a 4-vector, i.e., like (3.9). But the velocity
transforms in a complicated manner because we divide by $\Delta t$ (see (4.27)-(4.28)), and $\Delta t$ is not
an invariant. If we divide by an invariant quantity instead, i.e., a quantity that is the same in
every frame, the result would again transform like a 4-vector. The obvious choice is the proper
time $\Delta\tau$, and we are led to define the 4-velocity $u^{\mu}$ by
$u^{\mu} = \dfrac{\Delta x^{\mu}}{\Delta\tau}$
Or, taking infinitesimal intervals
Hence
Problem 13
Since $u^{\mu}$ is a 4-vector, its "length"
A 3-scalar is any quantity that remains unchanged under a rotation of axes. Examples: mass,
temperature, electric potential, mass density, time.
Basic 3-vector
The basic 3-vector $\Delta\mathbf{x}$ is a line joining two neighbouring points, in other words the displacement.
Figure 1
We discuss a short displacement rather than the coordinate x (which would be a long displace-
ment starting from the origin) because eventually we wish to generalize to curved space. In
curved space (e.g., the surface of the earth), a short displacement is a straight arrow, and is a
vector. A long displacement is not a straight arrow, and cannot be thought of as a vector. We
shall come back to this point later.
Transformation laws
Under a rotation of axes, the components change according to
$\Delta x'^{i} = R^{ij}\,\Delta x^{j}$, $[\Delta x'] = [R][\Delta x]$
where the matrix [R] is independent of the vector.
Other 3-vectors
Any three quantities
$v^{i} = (v^1, v^2, v^3)$
which transform in the same way, i.e.,
$v'^{i} = R^{ij} v^{j}$, $[v'] = [R][v]$ (5.4)
is a 3-vector by definition. Obviously 3-vectors may be obtained by
• adding or subtracting 3-vectors; or
• multiplying or dividing a 3-vector by a 3-scalar (or invariant).
The time elapsed, At, is a 3-scalar; so is the mass m. So the velocity and the momentum
Length of a vector
The length of a displacement vector $\Delta\mathbf{x}$ is $\Delta s$:
$(\Delta s)^2 = \Delta\mathbf{x} \cdot \Delta\mathbf{x} = \Delta x^{i}\,\Delta x^{i}$
The lengths of other vectors are defined in the same way.
Condition on [R]
Since the length has to be an invariant (i.e., a 3-scalar), a condition is imposed on [R]. From
(2.6)
$R^{ij} R^{ik} = \delta^{jk}$, $[R]^{T}[R] = [I]$ (5.7)
To anticipate some other development, we can write this in a more complicated way as
Dot product
Consider a vector $\mathbf{z} = \mathbf{x} + \alpha\mathbf{y}$, where $\alpha$ is an arbitrary scalar. Now the following is a 3-scalar
where
$\mathbf{x} \cdot \mathbf{y} = x^{i} y^{i}$ (5.9)
Since this is a 3-scalar for any $\alpha$, it follows that $\mathbf{x} \cdot \mathbf{y}$ is also a 3-scalar, called the dot product.
Basic rank-2 tensor
Let x, y be two vectors and define the following nine quantities as the basic rank-2 tensor.
General rank-2 tensor
Any 9 quantities $t^{ij}$ which transform in this way are called a rank-2 tensor.
Higher rank tensors
A rank 3 tensor is $3^3$ quantities which transform as
$t'^{ijk} = R^{il} R^{jm} R^{kn}\, t^{lmn}$
$\delta^{ij}$ is a tensor
We cannot simply write down any $3^2$ quantities $t^{ij}$ and say it is a tensor. We must check the
transformation laws. Consider the Kronecker $\delta$. In every frame
$\delta'^{ij} = \delta^{ij} \overset{?}{=} R^{ik} R^{jl}\, \delta^{kl}$ (5.14)
But this condition is exactly the same as (5.8) (up to a trivial relabeling of indices). So $\delta^{ij}$ is
indeed a tensor.
Because
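$x^{i} x^{j}$ is the fundamental rank 2 tensor
$\delta^{ij}$ is a tensor
$x^2 = |\mathbf{x}|^2$ is a scalar
so $I^{ij}$ is a rank 2 tensor.
More generally, if there are masses $m_\alpha$ at positions $\mathbf{x}_\alpha$, $\alpha = 1, \ldots, N$, and
$I^{ij} = \sum_{\alpha} m_\alpha \left( x_\alpha^2\, \delta^{ij} - x_\alpha^{i} x_\alpha^{j} \right)$ (5.16)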
then this is also a tensor. In fact this is the moment of inertia tensor.
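The tensor property can be tested numerically: computing the moment of inertia from rotated positions should give the same result as rotating the original components with two factors of [R]. The sketch below assumes the standard form $I^{ij} = \sum_\alpha m_\alpha (x_\alpha^2 \delta^{ij} - x_\alpha^i x_\alpha^j)$ and uses a rotation about the z axis; the masses, positions, angle and function names are all hypothetical.

```python
import math

def inertia_tensor(masses, positions):
    """I^{ij} = sum over particles of m (x^2 delta^{ij} - x^i x^j)."""
    tensor = [[0.0] * 3 for _ in range(3)]
    for m, x in zip(masses, positions):
        r2 = sum(c * c for c in x)
        for i in range(3):
            for j in range(3):
                delta = 1.0 if i == j else 0.0
                tensor[i][j] += m * (r2 * delta - x[i] * x[j])
    return tensor

def rotation_z(theta):
    """Matrix [R] for a rotation of axes by theta about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def rotate_vector(R, x):
    """x'^i = R^{ij} x^j."""
    return [sum(R[i][j] * x[j] for j in range(3)) for i in range(3)]

def rotate_tensor(R, t):
    """t'^{ij} = R^{ik} R^{jl} t^{kl}."""
    return [[sum(R[i][k] * R[j][l] * t[k][l] for k in range(3) for l in range(3))
             for j in range(3)] for i in range(3)]

masses = [1.0, 2.0]
positions = [[1.0, 0.0, 0.0], [0.0, 1.0, 2.0]]
R = rotation_z(0.3)

from_rotated_positions = inertia_tensor(masses, [rotate_vector(R, x) for x in positions])
from_tensor_law = rotate_tensor(R, inertia_tensor(masses, positions))
max_diff = max(abs(from_rotated_positions[i][j] - from_tensor_law[i][j])
               for i in range(3) for j in range(3))
print(max_diff)
```

The two routes agree to rounding error, which is the content of the statement that the nine quantities form a rank 2 tensor.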
Contraction theorem
Instead of stating this theorem generally, we consider an example. Let $t^{ij}$ be a tensor and $x^{j}$
be a vector. Then
is a vector. In other words, the dummy index j is contracted away, leaving a free index i. The
theorem states that this remaining free index i transforms like a vector. To prove this
Thus $y^{i}$ transforms like a vector. The generalization to more or fewer indices is obvious. The
case of contracting until there are no indices left, e.g., in analogy to (5.17)
$y = t^{j} x^{j}$
giving a scalar, is already known.
Reverse theorem
The reverse theorem is also true. Again we deal with an example. Let $t^{ij}$ be a tensor and $x^{j}$
be any three numbers. Further suppose that it is known that
$y^{i} = t^{ij} x^{j}$ (5.20)
holds in every frame and $y^{i}$ transforms like a vector, then $x^{j}$ is also a vector. To prove this,
first we have
By comparing the last two equations, and if these hold for several different tensors t (we leave
it as an exercise to determine how many different t's are required), then the r.h.s. must be
equal even if we peel off the factor $t^{nm}$:
$x'^{i} = R^{im} x^{m}$
Problem 1
It is given that
and that $\varphi$ is a scalar, $x^{i}$ is a vector. Prove that $y^{i}$ is a vector. State the necessary conditions
clearly. (As stated above, the conditions are not quite sufficient.)
5.2 Mathematics of four-vectors
A 4-scalar is any quantity that remains unchanged under both a rotation of axes and a trans-
formation to another uniformly moving frame. Example: the mass (also called the rest mass).
Note that time is not a 4-scalar.
Basic 4-vector
The basic 4-vector $\Delta\vec{x}$ is a line joining two neighbouring points in spacetime, in other words
the spacetime displacement.
Transformation laws
Under a transformation, the components change according to
Other 4-vectors
Any four quantities
which transform in the same way, i.e.,
"Length" of a vector
The length of a displacement vector $\Delta\vec{x}$ is defined through
The length of other 4-vectors may be defined in the same way. Note that $(\Delta s)^2$ may be +, -
or 0.
Metric
The matrix $\eta_{\mu\nu}$ is defined by
Lower indices
Define
In (5.27), $\eta^{\mu\nu} = \eta_{\mu\nu}$ is the inverse matrix. The definition (5.27) may be applied to any vector,
and in fact also to tensors. We adopt the following names.
$[L]^{T}[\eta][L] = [\eta]$
The "inner" indices must be paired, one up, one down.
$([L]^{T})_{\rho}{}^{\mu}\,[\eta]_{\mu\nu}\,[L]^{\nu}{}_{\sigma} = [\eta]_{\rho\sigma}$
Dot product
Consider a 4-vector $\vec{z} = \vec{x} + \alpha\vec{y}$, where $\alpha$ is an arbitrary scalar. Now the following is a 4-scalar
where
$\vec{x} \cdot \vec{y} = \eta_{\mu\nu}\, x^{\mu} y^{\nu}$ (5.29)
Since this is a 4-scalar for any $\alpha$, it follows that $\vec{x} \cdot \vec{y}$ is also a 4-scalar, called the dot product.
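The invariance of the dot product can be checked numerically. The sketch below assumes the metric $\eta_{\mu\nu} = \mathrm{diag}(-1, 1, 1, 1)$, which matches the sign convention $(\Delta s)^2 = -(\Delta t)^2 + (\Delta x)^2$, and a standard boost along the x direction with c = 1; the sample vectors and function names are hypothetical.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal of the assumed metric eta_{mu nu}

def dot(x, y):
    """Dot product eta_{mu nu} x^mu y^nu of two 4-vectors."""
    return sum(e * a * b for e, a, b in zip(ETA, x, y))

def boost_x(vec, beta):
    """Lorentz boost along x applied to a 4-vector (t, x, y, z)."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    t, x, y, z = vec
    return [g * (t - beta * x), g * (x - beta * t), y, z]

x_vec = [2.0, 1.0, 0.5, -1.0]
y_vec = [1.0, 3.0, -2.0, 0.0]
beta = 0.6

before = dot(x_vec, y_vec)
after = dot(boost_x(x_vec, beta), boost_x(y_vec, beta))
print(before, after)
```

The components of both vectors change under the boost, but the dot product does not.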
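$t^{\mu\nu} = x^{\mu} y^{\nu}$
$t^{\mu}{}_{\nu} = x^{\mu} y_{\nu}$, a $\binom{1}{1}$ tensor
etc. These are related to each other through the raising or lowering of indices by $\eta_{\mu\nu}$ and $\eta^{\mu\nu}$.
The transformation law for $t^{\mu\nu}$ is
$t'^{\mu\nu} = L^{\mu}{}_{\rho}\, L^{\nu}{}_{\sigma}\, t^{\rho\sigma}$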
The definitions for higher ranks, and for lower indices, are similar.
The first way to view (5.34) is that the two $[\eta]$ factors simply raise and lower the indices on
[L], i.e.,
qpu LVp
qpu = VpV qUPLVp3 Lp4 (5.35)
Hence
Also
is a (i) tensor
Problem 2
Show that (5.40) follows from (5.28). Also show that is a
~ p " (i) tensor.
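The tensor $\eta_{\mu}{}^{\nu}$
What happens if we raise one index in $\eta_{\mu\nu}$? In general
So putting $t \to \eta$
Problem 3
Show that $\eta_{\mu}{}^{\nu}$ as defined by (5.41) is identical with $\delta_{\mu}{}^{\nu}$. Hence $\delta_{\mu}{}^{\nu}$ is a $\binom{1}{1}$ tensor.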
Contraction theorem
Again we consider an example. Let $t^{\mu\nu}$ be a $\binom{2}{0}$ tensor and $x_{\nu}$ be a $\binom{0}{1}$ tensor, i.e., a
covariant vector. Then
Problem 4
Prove the reverse theorem: If in (5.42), it is known that $t^{\mu\nu}$ is a $\binom{2}{0}$ tensor and $y^{\mu}$ is a
$\binom{1}{0}$ tensor, then $x_{\nu}$ is a $\binom{0}{1}$ tensor. Also state clearly the conditions for the validity of this
theorem.
Summary
Although at first glance the mathematics of 4-vectors looks slightly complicated, the index
notation automatically keeps track of everything. This is really all that needs to be remembered.
In this case, by "all frames" we mean all frames related by a Lorentz transformation. Rotations
may be considered special cases of Lorentz transformation.
In exactly the same way, a scalar field in spacetime would have the same value, but
different functional forms, in different reference frames.
There is no known example of classical scalar fields. There has to be a special reason
(roughly speaking some gauge symmetry) in order for a field to be long-ranged. This special
reason holds for some vector and tensor fields, but cannot hold for scalar fields. If the field is
long-ranged, the potential goes as $1/r$ and the force goes as $1/r^2$; it can be "felt" far away and
detected classically. If the field is short-ranged, the potential goes as $(1/r)\,e^{-r/\lambda}$, where $\lambda$ is
typically of nuclear dimension. Therefore such a field would be difficult to detect classically.
Vector fields
We skip 3-vector fields and come directly to 4-vector fields. Suppose at each point P in space-
time (i.e., each event), there are 4 quantities
$A^{\mu}(P) = (A^0(P), A^1(P), A^2(P), A^3(P))$
such that under a coordinate transformation
where [L] is the transformation matrix for coordinate displacements. Then $A^{\mu}$ is said to be a
4-vector field.
In terms of coordinates, (5.46) becomes
$A^{\mu} = (\varphi, \mathbf{A})$
where $\varphi$ is the scalar potential and $\mathbf{A}$ is the vector potential, i.e.,
$\mathbf{B} = \nabla \times \mathbf{A}$ (5.48)
We can either try to check that $(\varphi, \mathbf{A})$ as defined by (5.48) transforms like (5.46), or
alternatively, we can postulate that there is a 4-vector potential $A^{\mu}$ and show that (5.48) follows.
This is what we shall do later in this course.
Tensor fields
Suppose at each point P in spacetime (i.e., each event), there are $4^2$ quantities
$A^{\mu\nu}(P)$, $\mu, \nu = 0, 1, 2, 3$
Then $A^{\mu\nu}$ is said to be a $\binom{2}{0}$ tensor field. Tensor fields of higher rank are defined in a similar
manner.
The most important example is the gravitational field in relativity. Weak fields can be
regarded as a tensor $h_{\mu\nu}(x)$ on flat spacetime.
(In 3-space there is really no need to distinguish upper and lower indices.)
$\vec{e}_0, \vec{e}_1, \vec{e}_2, \vec{e}_3$
(Note that vectors in spacetime, i.e., 4-vectors, are denoted by an arrow rather than bold-face letters.)
These are illustrated in Figure 4. Then in analogy to (5.50)
Figure 4
Now if we change, for example, only x1 but keep the other components fixed, then
Think of $\Delta P$ as the displacement of a point P when one of the coordinates $x^{\mu}$ is changed by
$\Delta x^{\mu}$.
In other words
Incidentally, we can reduce all these statements to 3-space by just setting all time components
(e.g. $\Delta x^0$, $\Delta y^0$) to zero.
Transformation property
Under a change of axes, the vector as a whole does not change, even though the components
change. This is illustrated in Figure 5 in the case of rotations.
Figure 5
Thus
$\Delta\vec{x} = \Delta x'^{\mu}\,\vec{e}\,'_{\mu} = (L^{\mu}{}_{\nu}\,\Delta x^{\nu})\,\vec{e}\,'_{\mu}$
(We write it in this form so that the repeated indices are next to each other, in "natural" order.)
Multiply on the right by $[L^{-1}]$.
This is exactly the same as the transformation of a lower index in (5.39), i.e., it is multiplied
by the inverse matrix on the right. In short, we only need to remember that the lower index in
$\vec{e}_{\mu}$ behaves just like any lower index.
5.5 Differentiation
Consider a point $x = (x^0, x^1, x^2, x^3)$ in spacetime. In anticipation of the discussion on curved
spacetime, we shall not put an arrow on x. The case of 3-space is easily recovered by setting
$x^0 = 0$. Moreover, in this section all displacements $\Delta x$ are understood to be infinitesimal.
Differentiation of a scalar
We start with a scalar field $\varphi(x)$ and compare its value at two neighbouring points. Then
Now $\Delta\varphi$ is a scalar, and $\Delta x^{\mu}$ is a $\binom{1}{0}$ tensor. Hence, by the inverse contraction
theorem,
The gradient operator
In other words, $\partial_{\mu}$ transforms like a $\binom{0}{1}$ tensor operator. The corresponding $\binom{1}{0}$ tensor
operator is $\partial^{\mu}$.
Differentiation of a vector
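Let $\vec{A}(x)$ be a vector field and consider the difference in value between two near-by points
(Figure 6).
Figure 6
$\Delta\vec{A} = \vec{A}(x + \Delta x) - \vec{A}(x)$
$= [A^{\mu}(x + \Delta x)\,\vec{e}_{\mu}] - [A^{\mu}(x)\,\vec{e}_{\mu}]$ (5.63)
The crucial point is that $\vec{e}_{\mu}$ (i.e., $\hat{\imath}, \hat{\jmath}, \hat{k}$ in the case of 3-space) are constant vectors; they are
the same at x and $x + \Delta x$, as illustrated in Figure 7:
Hence
where by definition
$\Delta A^{\mu} = A^{\mu}(x + \Delta x) - A^{\mu}(x)$
Let us take the $\mu$ component in (5.64).
$(\Delta\vec{A})^{\mu}$ = coefficient of $\vec{e}_{\mu}$ in (5.63)
$= \Delta A^{\mu}$
$= \Delta(A^{\mu})$
$(\Delta\vec{A})^{\mu} = \Delta(A^{\mu})$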
This is an important (and well-known) concept. Let us specify clearly what we mean.
Let $\mu = 1$.
(5.65)
The relation (5.65) states that these two processes are equivalent. In effect, it teaches us how
to subtract (and hence how to differentiate) vectors-simply do it component by component.
We can write this in differential form.
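$d\vec{A} = d(A^{\mu}\vec{e}_{\mu}) = (dA^{\mu})\,\vec{e}_{\mu}$
Writing out the change $dA^{\mu}$
$A^{\mu}{}_{,\nu}\,dx^{\nu}\,\vec{e}_{\mu}$
We read off $(d\vec{A})^{\mu}$ as the coefficient of $\vec{e}_{\mu}$ in the above expression
Since $(d\vec{A})^{\mu}$ is a $\binom{1}{0}$ tensor and $dx^{\nu}$ is also a $\binom{1}{0}$ tensor, this shows that $A^{\mu}{}_{,\nu}$ must be a
tensor.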
How must these laws be modified in order that they take the same form in all reference frames?
In this Chapter we concentrate on the conservation of momentum, because it is more fundamental;
we come to forces in the next Chapter.
Let us specify more clearly the requirement that laws take the same form in all frames.
Let there be a law L in frame S. Suppose we transform to frame S' by the transformation law
T. Then we should get the same law in S', which we denote as L'. In symbols
6.1 Momentum
Newtonian momentum + Galilean transformation
First we show that Newtonian momentum (L(N)) is compatible with Galilean transformation
(T(G)), or in shorthand
Figure 1
Problem 1
(a) Particle a of mass 1 unit is moving at a speed of 1/3 unit and hits particle b of mass 2 units
at rest (Figure 1a). After the collision, they move along the original direction, at speeds u, v
respectively (Figure 1b). Assume Newtonian momentum = (mass) × (velocity) is conserved,
and Newtonian kinetic energy = (1/2) × (mass) × (velocity)$^2$ is also conserved; find u and v.
[This is L(N).]
(b) Another observer is moving to the right at velocity V = 1/5 units. Find the velocities of a
and b before the collision, and also after the collision, as seen by this observer. Start with the
velocities in (a) and use the Galilean transformation for velocities. [This is T(G).]
(c) Check whether momentum and kinetic energy are conserved in the frame S'. [This is L'(N).]
Problem 2
Prove the above relationship in general.
Problem 3
We can be even more ambitious. Assume only that Newtonian kinetic energy is conserved in
every frame, and that Galilean transformations apply. Prove that Newtonian momentum must
also be conserved.
Problem 4
(a) Particle a of mass 1 unit is moving at a speed of c/3 and hits particle b of mass 2 units
at rest (Figure 1a). After the collision, they move along the original direction, at speeds u, v
respectively (Figure 1b). Assume Newtonian momentum = (mass) × (velocity) is conserved,
and Newtonian kinetic energy = (1/2) × (mass) × (velocity)$^2$ is also conserved; find u and v.
[This is L(N). You can make use of the results from Problem 1.]
(b) Another observer is moving to the right at velocity $V = \beta c$ with $\beta = 1/5$. Find the
velocities of a and b before the collision, and also after the collision, as seen by this observer.
Start with the velocities in (a) and use the Lorentz transformation for velocities. [This is T(L).]
(c) Check whether momentum and kinetic energy are conserved in the frame S'. [This is L'(N).]
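The kind of check asked for in these Problems can be sketched numerically. The example below deliberately uses different, hypothetical numbers (masses 1 and 3, incoming speed 0.6 in units of c, observer speed 0.5), the textbook formulas for a one-dimensional Newtonian elastic collision, and the standard velocity transformations; all function names are illustrative.

```python
def elastic_newtonian(m_a, u_a, m_b):
    """Final velocities of a and b in a 1-D Newtonian elastic collision, b initially at rest."""
    u = (m_a - m_b) / (m_a + m_b) * u_a
    v = 2.0 * m_a / (m_a + m_b) * u_a
    return u, v

def galilean(v, V):
    """Velocity seen by an observer moving at V (Galilean)."""
    return v - V

def lorentz(v, V):
    """Velocity seen by an observer moving at V (Lorentz, units with c = 1)."""
    return (v - V) / (1.0 - v * V)

def newtonian_momentum(masses, velocities):
    return sum(m * v for m, v in zip(masses, velocities))

masses = [1.0, 3.0]
before = [0.6, 0.0]
after = list(elastic_newtonian(masses[0], before[0], masses[1]))
V = 0.5

p_gal_before = newtonian_momentum(masses, [galilean(v, V) for v in before])
p_gal_after = newtonian_momentum(masses, [galilean(v, V) for v in after])
p_lor_before = newtonian_momentum(masses, [lorentz(v, V) for v in before])
p_lor_after = newtonian_momentum(masses, [lorentz(v, V) for v in after])
print(p_gal_before, p_gal_after, p_lor_before, p_lor_after)
```

With the Galilean transformation the Newtonian momentum is still conserved in the moving frame; with the Lorentz transformation the "before" and "after" totals no longer agree.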
The problem comes from the fact that velocity does not transform linearly. We are led to
consider quantities that transform linearly under the Lorentz transformation. For this purpose,
consider the 4-momentum of a particle, defined as
where c may denote a after the collision, and d may denote b after the collision. The conservation
of momentum in frame S takes the form
Define
then
$P^\mu = 0 , \quad \mu = 0, 1, 2, 3$
By (6.3), then, we also have
$P'^\mu = 0 , \quad \mu = 0, 1, 2, 3$
so momentum is also conserved in the S' frame.
Spatial Components
The spatial components of the 4-momentum are $\mathbf{p} = m\mathbf{u}$, or
$$\mathbf{p} = \frac{m\mathbf{v}}{\sqrt{1 - v^2}}$$
Some books call m the rest mass, and $M = m\gamma = m/\sqrt{1 - v^2}$ the relativistic mass. Then
p = Mv takes the usual Newtonian form. This is an extremely bad convention, and will not
be adopted here. The reason is that it suggests (incorrectly) that all Newtonian formulas can
be made correct by changing m → M.
Figure 2
Time component
Next consider the time component of the 4-momentum, $p^0 = m\gamma$.
This is also conserved, in a way that is intimately related to the conservation of momentum.
To recognize what $p^0$ is, consider the non-relativistic case v << 1:
$$p^0 = \frac{m}{\sqrt{1 - v^2}} = mc^2 + \frac{1}{2} m v^2 + \cdots = \text{const} + \frac{1}{2} m v^2 + \cdots \qquad (6.11)$$
We have restored the factors of c (= 1), and also specialized to a case where m does not change.
It is recognized that up to an additive constant, $p^0$ is the Newtonian kinetic energy
(approximately). (So far we are not considering any potential energy.) Thus we call $p^0$ the
energy E:
Figure 3
Kinetic energy
From (6.11), we see that even at rest, there is an additive constant $E = mc^2$. The kinetic
energy K is defined as the energy E minus this constant
$$E = mc^2 + K$$
so that nonrelativistically
$$K \approx \frac{1}{2} m v^2$$
but generally
$$K = mc^2 (\gamma - 1)$$
Application to collisions
Consider energy conservation in a collision a + b → c + d. Then, in obvious notation
Then
In these cases, the additive "constant" is not really constant, and there is an effect. Heuristically,
we can say that a certain amount of mass, Δm, has been converted to energy.
However, in the analysis of collisions, it is usually more convenient not to separate out
the kinetic energy.
Analogy
The equation $E = mc^2$ is famous. An equivalent equation is $Q = \Delta m \, c^2$. It is common to say
that mass is converted to energy, and that they are quite different things. Actually, the modern
view is that mass is energy, and the factor $c^2$ is just an "exchange rate".
This point of view is best illustrated with an analogy. Let us assume that in a certain
country there are only (a) paper money in bills of $1000, and (b) coins in $1. A Martian
who arrives in this country first discovers two separate conservation laws governing monetary
transactions: (a) the law of conservation of paper money (m), and (b) the law of conservation
of coins (E). Later, he finds that m can be converted into E, at a rate E = m × 1000, and that
really only the sum of the two is conserved.
• Paper money and coins are conceptually the same.
• The conversion between the two is not a real transaction, or anything important.
• The conversion rate is not fundamental. It is just a consequence of the fact that we use
different units for paper money and coins.
It is best to think of the "conversion" between mass and energy in the same way.
$$p = mv , \qquad E = \frac{1}{2} m v^2$$
By eliminating v, we get a direct relation between E and p,
$$E = \frac{p^2}{2m}$$
In the same way, relativistically we have
By eliminating v, we get
This relationship can also be derived by the use of the invariant $p^\mu p_\mu$. (See Problems below.)
Problem 5
Derive (6.23) from (6.22).
The last step is only algebra and not physics. It may be messy but is not conceptually important.
Nevertheless there are some standard tricks.
• For high energy collisions, all velocities are very close to unity and it is inconvenient to
We now do this for a general elastic collision in a straight line, as illustrated below:
            a   +   b   →   c   +   d
Mass        m       M       m       M
Momentum    P       0       p_c     p_d
Energy      E       M       E_c     E_d
The unknowns are p_c, p_d. We need not regard the E's as unknowns, since
From (6.26)
Subtract
Problem 7
Re-do this derivation in the Newtonian case and solve for p_c, p_d. Show that in this limit, (6.28)
and (6.29) reduce to the same result.
Problem 8
Show that if the two masses are equal, all the momentum is transferred to the second particle.
Can this be derived in a simple way?
Problem 9
Show that for very high energies (E >> m, M), nearly all the momentum is again transferred
to the second particle. This can be understood heuristically as follows: At very high energies,
the masses make no difference, so the equal mass case must be applicable.
Example 2
Antiprotons were first produced in the following reaction:
            p   +   p   →   p + p + p + $\bar{p}$
Momentum    P       0       P
Energy      E       M       E'
At threshold, i.e., at the minimum energy required to produce $\bar{p}$, the four particles in the final
state move together without any relative velocity, and therefore behave like a single particle of
mass 4M. Find the threshold energy E. The mass of the proton is 0.94 GeV/c^2.
Solution
$$E + M = E'$$
$$E^2 + 2EM + M^2 = E'^2$$
$$(P^2 + M^2) + 2EM + M^2 = P^2 + (4M)^2$$
$$2EM + 2M^2 = 16M^2$$
$$E = 7M = 6.58 \text{ GeV}$$
Example 3
Consider the following reaction, in which an electron hits a proton at rest, and produces a Δ.
What is the minimum energy of the electron for this to occur?
            e⁻  +   p   →   e⁻ + Δ
Mass        m       M       m + M'
Momentum    P       0       P
Energy      E       M       E'
The masses are m = 0.5 MeV, M = 0.94 GeV, and M' = 1.24 GeV.
Solution
Again, at threshold we may take the final state to be a single particle, with mass m + MI
Check: If M' = M , E = m, i.e., the reaction is possible no matter how small the kinetic energy.
This is what we expect.
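The threshold calculations of Examples 2 and 3 follow the same pattern, which can be sketched as a small helper. This is only an illustration under the stated assumptions (target at rest, final state treated as a single particle, c = 1); the function name `threshold_energy` is hypothetical. It uses $(E + M)^2 - P^2 = M_f^2$ together with $E^2 - P^2 = m^2$.

```python
def threshold_energy(m_beam, m_target, m_final):
    """Total beam energy at threshold for a target at rest (c = 1).

    m_final is the mass of the single effective particle formed at threshold.
    Derived from (E + m_target)^2 - P^2 = m_final^2 with E^2 - P^2 = m_beam^2.
    """
    return (m_final**2 - m_beam**2 - m_target**2) / (2 * m_target)

M = 0.94                      # proton mass in GeV
E_pbar = threshold_energy(M, M, 4 * M)          # Example 2: final mass 4M

m_e, M_delta = 0.0005, 1.24   # electron and Delta masses in GeV
E_delta = threshold_energy(m_e, M, m_e + M_delta)   # Example 3: final mass m + M'

# The check quoted in Example 3: if M' = M the threshold is E = m
E_check = threshold_energy(m_e, M, m_e + M)

print(E_pbar, E_delta, E_check)
```

The first value reproduces E = 7M, and the last reproduces the check E = m when M' = M.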
Solution
After the collision, the e- and the A move separately and we can no longer regard the system
as a single particle. So the situation is as follows.
            e⁻  +   p   →   e⁻  +   Δ
Mass        m       M       m       M'
Momentum    P       0       P'      Q
Energy      E       M       E'      W
We regard P' and Q as the unknowns. The energies E' and W can be expressed in terms of
the momenta. So we need two equations. These are given by the conservation laws:
$$W^2 = (E + M - E')^2 = E^2 + M^2 + E'^2 + 2EM - 2E'M - 2EE'$$
$$Q^2 = (P - P')^2 = P^2 + P'^2 - 2PP'$$
Subtract and use $W^2 - Q^2 = M'^2$, $E^2 - P^2 = m^2$, $E'^2 - P'^2 = m^2$.
In order not to get too involved with the arithmetic, let us neglect the electron mass:
$m \approx 0$, $E = P$, $E' = |P'|$
We further assume P' > 0. (This has to be checked later.) Then
Problem 10
Suppose the electron bounces backwards, then E' = -P'. Find the value of E' in this case.
Problem 11
Return to the general case described by (6.31). Square this equation and express $E'^2$ in terms
of $P'^2$. Hence obtain the algebraic solution for P'. Explain why there are two solutions.
Oblique, elastic collisions
Example 5
A proton (mass = 0.94 GeV) travelling at momentum 30 GeV hits another proton at rest. The
incident proton scatters at 35°, while the target proton recoils without changing its identity.
Find (a) the final momentum of the incident proton, (b) the final momentum of the target
proton, and (c) the direction of recoil of the target proton.
Figure 4
            p   +   p   →   p   +   p
Mass        M       M       M       M
Momentum    P       0       P'      Q
Energy      E       M       E'      W
The three unknowns are P', Q, φ where φ is the angle of recoil. The energies can be expressed
in terms of the momenta. Thus we need 3 conservation laws:
$$W^2 = (E + M - E')^2 = E^2 + M^2 + E'^2 + 2EM - 2E'M - 2EE'$$
$$Q^2 = (P - P'\cos\theta)^2 + (P'\sin\theta)^2 = P^2 + P'^2 - 2PP'\cos\theta$$
Subtract
$$(E + M)^2 (P'^2 + M^2) = [(E + M)M + PP'\cos\theta]^2$$
$$(E + M)^2 (P'^2 + M^2) = (E + M)^2 M^2 + 2(E + M) M P P' \cos\theta + P^2 P'^2 \cos^2\theta$$
This is a quadratic equation for P'. The constant term cancels; thus one solution is P' = 0.
It could have been guessed from the start that a solution is P' = 0, Q = P, φ = 0. (The two
particles exchange roles, so obviously everything is conserved.) Thus it is seldom necessary to
solve a quadratic.
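As an illustration, the non-trivial root of the quadratic can be evaluated numerically for the data of Example 5. The sketch below assumes c = 1 and takes the scattering angle as 35°; the function name is hypothetical. After the constant term cancels, dividing by P' gives $P' = 2(E + M) M P \cos\theta / [(E + M)^2 - P^2\cos^2\theta]$; the recoil momentum Q and angle φ then follow from momentum conservation, and energy conservation serves as a check.

```python
import math

def elastic_scatter(P, M, theta_deg):
    """Equal-mass elastic scattering off a target at rest (c = 1).

    Returns (P', Q, phi in degrees) for incident momentum P, mass M and
    scattering angle theta of the incident particle.
    """
    theta = math.radians(theta_deg)
    E = math.hypot(P, M)                      # incident energy sqrt(P^2 + M^2)
    c = math.cos(theta)
    # Non-zero root of the quadratic in P'
    P1 = 2 * (E + M) * M * P * c / ((E + M) ** 2 - (P * c) ** 2)
    # Momentum conservation fixes the recoil momentum vector
    Qx = P - P1 * c
    Qy = -P1 * math.sin(theta)
    Q = math.hypot(Qx, Qy)
    phi = math.degrees(math.atan2(-Qy, Qx))   # recoil angle on the other side of the beam
    return P1, Q, phi

P, M = 30.0, 0.94
P1, Q, phi = elastic_scatter(P, M, 35.0)

# Energy conservation check
E_in = math.hypot(P, M) + M
E_out = math.hypot(P1, M) + math.hypot(Q, M)
print(P1, Q, phi, E_in - E_out)
```

The energy difference printed at the end should vanish up to rounding, which confirms that the chosen root satisfies all three conservation laws.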
Problem 13
A photon strikes an electron (mass m) at rest, and is scattered at an angle θ. The energy of
the photon is E = hc/λ, where h is Planck's constant. Find the increase in wavelength Δλ.
This is the famous formula for Compton scattering.
Example 6
A $Z^0$ particle (M ≈ 90 GeV) is travelling with momentum P = 150 GeV. It decays into e+, e-.
Find the angle between the e+ and the e-.
Figure 5
Solution
From the conservation of energy and momentum
In this case m ≈ 0. So
$$\cos\theta = \frac{P}{\sqrt{P^2 + M^2}} = 0.857$$
$$\theta = 30.96^\circ , \qquad 2\theta = 61.93^\circ$$
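The arithmetic of Example 6 can be reproduced with a few lines. This is a sketch assuming massless electrons and c = 1, with each lepton taken to carry half the energy and to make the same angle θ with the original direction; the variable names are illustrative.

```python
import math

M, P = 90.0, 150.0                 # Z0 mass and momentum in GeV
E = math.hypot(P, M)               # total energy sqrt(P^2 + M^2)
# Each massless lepton has energy E/2 and momentum E/2 at angle theta,
# so momentum conservation gives 2 (E/2) cos(theta) = P.
theta = math.degrees(math.acos(P / E))
opening_angle = 2 * theta
print(theta, opening_angle)
```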
Example 7
A neutral particle X decays by
where the masses are 0.94 GeV and 0.14 GeV for p and π⁻ respectively. The original particle
X, being neutral, was not observed, but the momenta of the final particles were measured to
be 20 GeV for p and 15 GeV for π⁻. The angle between them was found to be 18°. Find the
mass of X.
Figure 6
Solution
First determine the angle θ by considering the y component of momentum
$$20 \sin\theta = 15 \sin(18^\circ - \theta)$$
$$\tan\theta = \frac{15 \sin 18^\circ}{20 + 15 \cos 18^\circ}$$
$$\theta = 7.70^\circ , \qquad 18^\circ - \theta = 10.30^\circ$$
Next determine the momentum P of X:
$$P = 20 \cos 7.70^\circ + 15 \cos 10.30^\circ = 34.578$$
Also determine the energy E of X
The mass M of X is
$$M = \sqrt{E^2 - P^2} = 5.56 \text{ GeV}$$
This is one way to determine the mass of unstable particles.
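The same mass can be obtained without first finding θ, by summing the energies and the momentum components of the decay products directly. The following is a minimal sketch (c = 1); the helper name `invariant_mass` and the choice of measuring angles from the proton direction are assumptions made for illustration.

```python
import math

def invariant_mass(particles):
    """Invariant mass of a set of particles moving in a plane (c = 1).

    particles: list of (mass, momentum magnitude, angle in degrees).
    """
    E = sum(math.hypot(p, m) for m, p, _ in particles)
    px = sum(p * math.cos(math.radians(a)) for _, p, a in particles)
    py = sum(p * math.sin(math.radians(a)) for _, p, a in particles)
    return math.sqrt(E**2 - px**2 - py**2)

# Proton along the reference axis, pion at 18 degrees to it (masses and momenta in GeV)
M_X = invariant_mass([(0.94, 20.0, 0.0), (0.14, 15.0, 18.0)])
print(M_X)
```

Because $E^2 - P^2$ is invariant, the result does not depend on how the axes are oriented.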
Example 8
Electrons with energy 18 GeV hit a stationary proton target. In one event, the electron is
scattered at 1°, with an energy 17 GeV. The target proton recoils, and is excited to become a
new particle X with mass M'. Find M'.
Solution
Figure 7
M R =.
Figure 8
Problem 14
In Figure 8, the width of the peak is ΔE' = 0.3 GeV. Assume parameters as in Example 8.
Find the uncertainty in the mass M'.
Example 9
A proton with momentum 3 GeV hits another proton at rest. Find the velocity β of the CM
frame. Also find (a) the momentum of each proton, and (b) the total energy in the CM frame.
Solution
In the laboratory frame, the momentum of each particle, and the total momentum are
Thus the picture in the CM frame before the collision is as shown in Figure 9a. Because the
momentum is zero after the collision, the only possible situation is as shown in Figure 9b.
Figure 9
Energy in CM frame
Refer to Example 9. We see that of the total energy $E_t$ = 4.084 GeV in the lab frame, a part is
related to the overall forward motion. Only the part $E^*$ = 2.771 GeV is really available in the
CM frame, e.g., for creating new particles. The next Example considers this in a general way.
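The numbers quoted from Example 9 can be reproduced as follows. This is a sketch with c = 1 and illustrative variable names: the CM velocity is the ratio of total momentum to total energy, the CM energy is the invariant $\sqrt{E_t^2 - P^2}$, and the CM momentum of each proton is cross-checked with an explicit Lorentz boost.

```python
import math

M, P = 0.94, 3.0                       # proton mass and beam momentum in GeV
E_beam = math.hypot(P, M)              # energy of the moving proton
E_total = E_beam + M                   # total lab energy (target at rest)

beta = P / E_total                     # velocity of the CM frame
E_cm = math.sqrt(E_total**2 - P**2)    # total energy in the CM frame
p_cm = math.sqrt((E_cm / 2)**2 - M**2) # each proton carries E_cm / 2 in the CM frame

# Cross-check: boost the beam proton's momentum into the CM frame
gamma = 1 / math.sqrt(1 - beta**2)
p_boosted = gamma * (P - beta * E_beam)

print(beta, E_total, E_cm, p_cm, p_boosted)
```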
Example 10
A particle of mass M and energy E hits a target particle also of mass M. Find the total energy
E* in the CM frame.
Solution
The total momentum and energy in the lab frame are
Example 11
In a colliding beam experiment, a proton beam of energy 100 GeV collides head-on with a
second beam of the same energy, travelling in the opposite direction. What would be the
equivalent beam energy if the same experiment is done in a fixed target situation? Take
M ≈ 1 GeV.
Solution
$$E^* = \sqrt{2M(E + M)} \approx \sqrt{2E} \qquad \text{(all in units of GeV)}$$
$$2E = 4 \times 10^4$$
$$E \approx 2 \times 10^4 \text{ GeV}$$
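For comparison, the exact relation $E^{*2} = 2M(E + M)$ can be inverted without the approximation E >> M. A minimal sketch (c = 1, hypothetical function name):

```python
def fixed_target_beam_energy(E_cm, M):
    """Beam energy on a target at rest that gives CM energy E_cm.

    Inverts E_cm^2 = 2 M (E + M) for equal beam and target masses M (c = 1).
    """
    return E_cm**2 / (2 * M) - M

E_cm = 2 * 100.0                                     # two 100 GeV beams colliding head-on
E_equiv = fixed_target_beam_energy(E_cm, 1.0)        # with M taken as 1 GeV
E_equiv_proton = fixed_target_beam_energy(E_cm, 0.94)  # with M = 0.94 GeV
print(E_equiv, E_equiv_proton)
```

With M = 1 GeV the exact result is 19999 GeV, so the approximation $E \approx 2 \times 10^4$ GeV is very good.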
6.4 Relativistic invariants
Available energy
From the last Example, it is clear that specifying the energy could be misleading: a large energy
(e.g., E ~ 2 × 10^4 GeV) in the lab frame actually corresponds to a much smaller energy (e.g.,
E* ~ 200 GeV) in the CM frame. The real physical situation should be expressed in terms of
a relativistic invariant, i.e., a quantity which is the same in every frame. For a collision such as
Example 12
Refer to Example 10. Find $P^\mu$ and s in the lab frame and the CM frame.
Solution
$P'^\mu = (E^*, 0, 0, 0)$ (momentum is zero by definition)
$s = -P'_\mu P'^\mu = E^{*2}$
$= 2M(E + M)$
In fact, this is a simpler way of doing Example 10.
Example 13
Consider the reaction π + p → ..., where the mass of the pion is m and the mass of the proton
is M. If the energy of the π is E, and the proton is at rest, find the energy E* in the CM.
Solution
Momentum transfer
Figure 10
where c is the same as particle a, but deflected, as shown in Figure 10a. Figure 10a shows
that particle b was originally at rest, but recoils, and may even break up. We see that some
momentum is transferred from the beam particle to the target. This is illustrated more clearly
in Figure 10b, where the wavy line denotes the transfer of momentum (often of other quantum
numbers as well). Thus we define the 4-momentum transfer
Example 14
Refer to Example 8 and find an expression for t in terms of the incident energy E, the scattered
energy E' and the scattering angle θ. Assume that E and E' are large enough that the electron
mass may be neglected.
Solution
Pa = (E, 0, 0, E)
Note that t is the same in every reference frame, but the right hand side refers only to the
laboratory frame.
$k^\mu = (\omega, \mathbf{k})$
There are two ways to see this.
(a) From quantum mechanics
is the phase. For example, θ = 0, 2π, ... are the peaks. But the phase is an invariant (a peak
is a peak in any coordinate frame), hence $k^\mu x_\mu$ is invariant. But $x_\mu$ is already known to be a
4-vector, therefore $k^\mu$ is a 4-vector. (Contraction theorem.)
$p^\mu = m(\gamma, \gamma\mathbf{v})$
This has the following properties.
• It is conserved.
• It reduces to the usual momentum (μ = 1, 2, 3) when v << c.
Therefore it remains for us to deal with (b) and (c) in this Chapter.
Figure 1
However, most forces are not relativistic - they do not assume the same force law in all
reference frames, but have a special reference frame in which the force law would be simplest.
For example, consider the force F on the mass m in Figure 1a. This force has a special frame
- the frame in which M is at rest. There is no reason to believe that the force law in other
frames would be equally simple. Next consider the force F on the mass m in Figure 1b. This
is a frictional force due to the table, and will be simplest in the frame in which the table is at
rest. There is not much point in discussing the relativistic version of these forces.
The situation is different for electromagnetism. The force is due to the electric field E
and the magnetic field B, which exist in vacuum. Vacuum is the same to all observers. There
is no such thing as a special frame in which the vacuum (or "ether") is at rest. Therefore the
laws of electromagnetism should be the same in all reference frames. So we shall focus on this
force in this Chapter.
To take a broader view, there are 4 fundamental forces. In order of decreasing strength,
they are:
• the strong interaction, responsible for nuclear binding
• the electromagnetic interaction
• the weak interaction
• gravitation
(How we divide the different types of forces, or in reverse, how we integrate them, depends
on the level of understanding. One hundred and fifty years ago, electricity and magnetism
would be regarded as two types of interactions; now they are regarded as unified - which is
one of the successes of relativity, as will be discussed below. Recent research has unified the
electromagnetic and weak interactions through the standard model, and to some extent also
the strong interaction, but we ignore these for the moment.)
Of these four interactions, two are short ranged: the strong interaction has a range of
about 1 fm (1 fm = $10^{-15}$ m), while the weak interaction has a still shorter range.
Hence, they are manifested only microscopically and in quantum phenomena, but not in
macroscopic, classical phenomena. The other two - electromagnetism and gravitation - are long
ranged and manifested macroscopically. For this course we shall be concerned with them.
All these forces are transmitted by fields (like the electric and magnetic field) which
reside in vacuum, and thus have no special frame. The relativistic transformation of these
forces is therefore of central importance.
$p^\mu = (E, \mathbf{p})$
We emphasize that F does not transform simply. Then it is found, experimentally, that the
force on a charge q is
in obvious notation. This law is well known in the non-relativistic case ( v << c). Everything
remains valid, even for large velocities, provided p is taken as the relativistic momentum (7.2).
This result can be understood at two levels. First, we can simply accept it as an
experimental fact; later we shall see the experimental consequences. Secondly we can ask how
this law fits into a more consistent overall picture.
Deflection in a magnetic field
Consider charged particles passing through a magnetic field B, say out of the page
(Figure 2a). Since the magnetic field does no work, the magnitude of the momentum does not
change; only its direction changes. Therefore the trajectory is an arc of radius R. What does
R depend on?
Consider two moments t and t + Δt, and compare the momenta. In this time, the
momentum vector has changed direction by Δθ, so (Figure 2b)
Example 1
Compare the radii of curvature for particles travelling at (a) 0.99 c and (b) 0.9999 c.
Solution
For v ≈ c, $p = m\gamma v \approx m\gamma c \propto \gamma$.
(a) $\gamma = \dfrac{1}{\sqrt{1 - 0.99^2}} = 7.09$
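Both cases of Example 1 can be evaluated with a short script. This sketch assumes only that the radius of curvature is proportional to the momentum $m\gamma v$ for a fixed field and charge; the function name is illustrative.

```python
import math

def gamma(beta):
    """Lorentz factor for speed beta (in units of c)."""
    return 1.0 / math.sqrt(1.0 - beta**2)

g_a = gamma(0.99)      # case (a)
g_b = gamma(0.9999)    # case (b)

# R is proportional to p = m * gamma * v; the factor beta is nearly 1 in both cases
ratio = (g_b * 0.9999) / (g_a * 0.99)
print(g_a, g_b, ratio)
```

The two speeds differ by only 1%, yet the radii differ by roughly a factor of ten, which is what makes the measurement a sensitive test of the relativistic momentum.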
This kind of experimental observation verifies that it is correct to use the Lorentz force law
with the relativistic momentum myv.
where we have assumed that the motion starts from rest, and
Note that T can be interpreted as the time taken, according to Newtonian physics, for the
particle to attain velocity c.
Figure 3
For small t, the denominator ≈ 1, and the situation is given by line 1, which is the Newtonian
result. For large t, the velocity saturates at v → 1. Thus, the fact that velocities cannot be
larger than the speed of light is built in. Also, in general, qE can be replaced by the force F.
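The saturation can be seen numerically. The sketch below assumes the standard result for a constant force acting from rest, $p = qEt = m\gamma v$, which gives $v/c = (t/T)/\sqrt{1 + (t/T)^2}$ with $T = mc/(qE)$; this closed form is a reconstruction consistent with the description above, and the function name is hypothetical.

```python
import math

def velocity(t, T=1.0):
    """v/c at time t for a particle starting from rest under a constant force.

    T = m c / (q E) is the Newtonian time needed to reach speed c.
    """
    x = t / T
    return x / math.sqrt(1.0 + x * x)

for t in (0.01, 0.1, 1.0, 10.0, 1000.0):
    # Newtonian prediction is simply t / T, which exceeds 1 for t > T
    print(t, velocity(t), "Newtonian:", t)
```

For t << T the two columns agree; for t >> T the relativistic velocity approaches 1 but never exceeds it.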
We can analyse the motion in another way.
Therefore
where we consider any force F perpendicular to the direction of motion. The first equation
can be analysed simply: because the force is perpendicular to the motion, the energy does not
increase, and γ = constant; thus $v_x$ is constant as well. Next carry out the differentiation for
(7.11).
The first term is zero, because the instantaneous value of $v_y$ is zero. Thus
Magnetic deflection belongs to the second case of force being perpendicular to the direction
of motion, and from (7.5), it is also seen that the radius of curvature is modified by a
factor of γ (not $\gamma^3$).
There is a very important lesson. It is not true that all Newtonian formulas can be made
correct by replacing the mass m by the "relativistic mass" M = mγ; such a replacement works
for perpendicular forces but not for parallel forces. For this reason, the idea of a "relativistic
mass" is seldom used nowadays.
This involves dividing a 4-vector $\Delta x^\mu$ by a 4-scalar; the result is guaranteed to be a 4-vector,
and will therefore transform in a simple way.
For exactly the same reason
Explicitly,
Since the rate of change of the energy E is the work done per unit time, i.e., F · v, thus
If we can state the Lorentz force law in terms of $K^\mu$ rather than F, then the covariance properties
would be more apparent.
The proof of this postulate will come from the transformation of the fields.
From the 4-potential we can form the field tensor
$= (\nabla \times \mathbf{A})_3 = B_3$
In general
In explicit matrix form
The first index is the row, so $F^{01}$ is row 0, column 1, i.e., the entry $E_1$.
Let us check this component by component. First of all, note that $u^\mu = (\gamma, \gamma\mathbf{v})$, $u_\mu = (-\gamma, \gamma\mathbf{v})$.
Thus, combining (7.14) with (7.21), we have the law of motion in covariant form
Assuming that $A^\mu$ does indeed transform like a 4-vector, and hence that $F^{\mu\nu}$ does indeed
transform like a $\binom{2}{0}$ tensor, it is guaranteed that (7.22) leads to the same physical consequence
in every reference frame.
7.6 Transformation of fields
Figure 4
Consider a frame S' moving at speed V = pc along the x-direction, relative to a frame S. The
Lorentz transformation is given by
$$E'_1 = L^0{}_0 L^1{}_0 F^{00} + L^0{}_0 L^1{}_1 F^{01} + L^0{}_1 L^1{}_0 F^{10} + L^0{}_1 L^1{}_1 F^{11}$$
$$= (L^0{}_0 L^1{}_1 - L^0{}_1 L^1{}_0) F^{01}$$
$$= [\gamma \cdot \gamma - (-\gamma\beta)(-\gamma\beta)] E_1$$
$$= \gamma^2 (1 - \beta^2) E_1$$
$$= E_1$$
$$E'_3 = \gamma (E_3 + \beta B_2)$$
Since β can be regarded as a vector in the +x direction, all these can be summarized as
Here ∥ and ⊥ refer to the direction of relative motion, i.e., that of β. Next for the magnetic
field
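$$B'_2 = \gamma (B_2 + \beta E_3)$$
Likewise
$$B'_3 = \gamma (B_3 - \beta E_2)$$
All these can be summarized as
$$\mathbf{B}'_\perp = \gamma (\mathbf{B}_\perp - \boldsymbol{\beta} \times \mathbf{E}_\perp)$$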
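The component formulas can be checked by carrying out the tensor transformation $F'^{\mu\nu} = L^\mu{}_\alpha L^\nu{}_\beta F^{\alpha\beta}$ numerically. The sketch below is written in plain Python and assumes the conventions used above: $F^{0i} = E_i$, $F^{12} = B_3$ (and cyclic), c = 1, and a boost along x. The function names are illustrative only.

```python
import math

def field_tensor(E, B):
    """Contravariant field tensor with F^{0i} = E_i and F^{ij} = eps_{ijk} B_k."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [
        [0.0,  Ex,   Ey,   Ez],
        [-Ex,  0.0,  Bz,  -By],
        [-Ey, -Bz,   0.0,  Bx],
        [-Ez,  By,  -Bx,   0.0],
    ]

def boost_x(beta):
    """Lorentz transformation matrix L^mu_nu for a frame moving at beta along x."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [
        [g, -g * beta, 0.0, 0.0],
        [-g * beta, g, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform(F, L):
    """F'^{mu nu} = L^mu_a L^nu_b F^{ab}."""
    return [[sum(L[m][a] * L[n][b] * F[a][b] for a in range(4) for b in range(4))
             for n in range(4)] for m in range(4)]

def fields(F):
    """Read E and B back out of the field tensor."""
    E = (F[0][1], F[0][2], F[0][3])
    B = (F[2][3], F[3][1], F[1][2])
    return E, B

E, B, beta = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0), 0.6
E_new, B_new = fields(transform(field_tensor(E, B), boost_x(beta)))
print(E_new, B_new)
```

The parallel components are unchanged, the perpendicular components mix E and B as in the summary formulas, and the combinations $\mathbf{E} \cdot \mathbf{B}$ and $B^2 - E^2$ come out the same in both frames.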
Example of field transformation
Figure 5
We consider how the same phenomenon appears to two different observers, in order to illustrate
what happens under field transformations. Consider two capacitor plates which create an
electric field
Let a particle of mass m and charge q traverse this space at speed v (Figure 5). In the lab
frame S, this particle experiences a force in the y direction, and accelerates at
(See (7.13).)
Now go to the co-moving frame S'. What are the fields?
Note that a magnetic field appears! In the co-moving frame, the particle is non-relativistic, so
Newtonian mechanics applies:
Although there is a magnetic field, it does not matter, because the particle has zero velocity in
the co-moving frame, and hence does not couple to the magnetic field.
Are (7.25) and (7.26) consistent? To check this, we see that
$$\frac{d}{dt'} = \gamma \frac{d}{dt}$$
Hence (7.26) becomes
$$\frac{d^2 y}{dt^2} = \frac{qE}{\gamma m}$$
which is identical with (7.25).
An important aspect of the field transformation is that E and B are mixed. They are
really different aspects of the same thing, i.e., different components of the field tensor $F^{\mu\nu}$.
Checking invariance
This example shows that we can check invariance (i.e., that phenomena appear to all observers
in a consistent way) by three different methods.
• We can check one phenomenon at a time, as in this example. There are infinitely many
phenomena to be checked.
• We can check the non-covariant equations of motion, again as in this example.
• We can check the covariant equation of motion, which is much simpler. We check these
once and for all.
Relativistic invariants
Out of the tensor $F^{\mu\nu}$, we can construct two quadratic invariants. The first is
Problem 1
The transform of the totally antisymmetric tensor in another frame is given by
and we want to show that this is exactly the same as $\epsilon^{\mu\nu\rho\sigma}$. Take, for example, μ = 0, ν = 1,
ρ = 2, σ = 3, then $\epsilon^{\mu\nu\rho\sigma} = 1$, and we need to prove:
Prove the above identity. (Hint: Consider determinants. Also, you need to make an assumption
on a sign.)
Thus
Referring to (7.20), we see
Hence
Again, this must be the same in every frame. For example, a plane wave is characterized
by B · E = 0, and we now see that this condition is the same in every frame.
How do the charges and currents generate the fields - Maxwell equations
We have dealt with the first part relativistically. Now we deal with the second part.
Four Current
First, we show that the charge density ρ and the current density J together form a 4-vector.
To see this, we have to write down the expressions for ρ and J. Suppose there is a single charge
q at position X. Then
Integrating this over space gives the correct charge. The more general case with charges $q_{(a)}$ at
positions $\mathbf{X}_{(a)}$ would be
but it is sufficient to deal with the transformation properties of a single term. Next, the current
density is
J = charge density × velocity of charge
$= q \mathbf{v} \, \delta^3(\mathbf{x} - \mathbf{X})$
where v = dX/dt. We insert the factor
Problem 2
(a) There are N charges, each of magnitude q, in a rectangular volume of area A and length L.
The charges are not moving. Find $J^\mu$.
(b) Now go to a frame which is moving at a speed β along the length direction. In this frame,
the charges are (i) moving at a velocity -β, and (ii) contained in a volume A' × L', where A' = A
and L' = L/γ. Find $J'^\mu$.
(c) Show that $J^\mu$ and $J'^\mu$ are related exactly by the Lorentz transformation.
We can now write down Maxwell's equation in two groups. We simply write down the
covariant form, and check that they give the correct result.
Homogeneous equations
First, if any 2 indices are the same, this equation is trivial. For example, let the indices be
μ = 1, ν = 2, ρ = 2. Then (7.37) gives
which is a trivial identity. So we only have to consider the case of all 3 indices being distinct.
(a) Missing index is time-like, i.e., 0
$$\nabla \cdot \mathbf{B} = 0$$
(b) Missing index is space-like, e.g., 1
$$(\nabla \times \mathbf{E})_1 = -\frac{\partial}{\partial t} B_1$$
1
~p&u = ~ f r v o pP F " ~
Since p , v, a, have to be all different, once we choose v, it is just the missing index among
dfiFQP.So it recovers (7.37).
Inhomogeneous equation
or we can write it as
where
a a a a a
ax'.'
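(a) ν is time-like, i.e., 0
$$\nabla \cdot \mathbf{E} = 4\pi\rho \qquad (7.42)$$
(b) ν is space-like
$$\nabla \times \mathbf{B} = 4\pi \mathbf{J} + \frac{\partial \mathbf{E}}{\partial t} \qquad (7.43)$$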
Thus the two inhomogeneous equations:
(7.42) - Gauss' law
(7.43) - Ampere's law with displacement current
are again components of the same covariant equation (7.41).
To summarize, Maxwell's equations are
In terms of potential
because the six terms cancel in pairs, e.g., the two terms underlined.
The inhomogeneous equation becomes
Gauge transformation
The potential is convenient, but contains "too much" information. In other words, we can make
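a change - called a gauge transformation - on $A^\mu$ and not affect the physics. Let Λ(x) be a
4-scalar field, and let
$$A^\mu \to A^\mu + \partial^\mu \Lambda$$
$$F^{\mu\nu} \to \partial^\mu (A^\nu + \partial^\nu \Lambda) - \partial^\nu (A^\mu + \partial^\mu \Lambda)$$
$$= (\partial^\mu A^\nu - \partial^\nu A^\mu) + (\partial^\mu \partial^\nu - \partial^\nu \partial^\mu) \Lambda$$
$$= F^{\mu\nu}$$
since the order of differentiation in a mixed derivative does not matter. Since classical
electromagnetism depends only on $F^{\mu\nu}$, it is invariant under the gauge transformation.
We can make use of gauge transformations to choose any value of $\partial \cdot A = \partial_\mu A^\mu$.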
Summary
All of electromagnetism is contained in
Ch7-2.tex; December 29, 1997
8 Action Formalism
8.1 General principles
Different ways of specifying dynamics
We have now come across two ways of writing down the dynamic evolution of a system. Consider
for example the motion of a charged particle under the influence of an electromagnetic field.
The first method is
$$\frac{d\mathbf{p}}{dt} = \mathbf{F} = q(\mathbf{E} + \mathbf{v} \times \mathbf{B}) \qquad (8.1)$$
This is not covariant. The second method, which is slightly better, is
Newtonian mechanics
Figure 1
Forget about electromagnetism for the moment, and go back to Newtonian mechanics
in one-dimension. The variable is x ( t ) . Suppose it is given that
What is x(t) in between? Graphically, this means determining the correct path (solid line)
among all possible paths (say the broken line) in a t-x diagram. To conform with the usual
practice in spacetime diagrams, we draw t vertically, even though it is usually the independent
variable (Figure 1). In the equation of motion approach, we say that the correct path is the
one which satisfies the differential equation of motion, such as (8.1), or a Newtonian equation,
e.g.,
together with the conditions (8.3). The system is specified by F(x) or V(x), e.g., for a spring,
F = -kx, or V = kx2/2.
The least action approach takes a completely different point of view.
2. Give a way of calculating a number S[P] = S[x(t)] for each path P = [x(t)]. S is called
the action. For all cases we shall consider, S is made up of contributions ΔS for each
segment of path (additivity assumption).
3. Consider all possible paths, and select the one whose action is minimum. This is the
correct path.
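The selection rule can be illustrated numerically for the spring mentioned above (V = kx²/2). The sketch below is an assumption-laden illustration, not part of the original derivation: it takes the familiar Newtonian form L = mv²/2 - V(x), discretizes the action as a sum of contributions from small segments, uses m = k = 1, and chooses a time interval shorter than half a period so that the stationary path is a true minimum. All function and variable names are hypothetical.

```python
import math

def action(path, dt, m=1.0, k=1.0):
    """Discretized action: sum over segments of (KE - PE) * dt."""
    S = 0.0
    for x0, x1 in zip(path[:-1], path[1:]):
        v = (x1 - x0) / dt              # velocity on this segment
        x_mid = 0.5 * (x0 + x1)         # position at the segment midpoint
        S += (0.5 * m * v**2 - 0.5 * k * x_mid**2) * dt
    return S

N, T = 1000, 1.0
dt = T / N
times = [i * dt for i in range(N + 1)]

# Solution of x'' = -x with boundary conditions x(0) = 0, x(T) = 1
true_path = [math.sin(t) / math.sin(T) for t in times]

def perturbed(eps):
    """Neighbouring path with the same end points."""
    return [x + eps * math.sin(math.pi * t / T) for x, t in zip(true_path, times)]

straight_line = [t / T for t in times]

S_true = action(true_path, dt)
print("true path:      ", S_true)
print("perturbed +0.1: ", action(perturbed(0.1), dt))
print("perturbed -0.1: ", action(perturbed(-0.1), dt))
print("straight line:  ", action(straight_line, dt))
```

Every alternative path with the same end points gives a larger action than the path that satisfies the equation of motion.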
Figure 2
A path P is independent of the coordinates used to describe the path. Figures 2a and 2b
show the same path P in two different coordinate systems. Here we are thinking of a Lorentz
transformation; e.g., see Chapter 4, Figure 8. If the action S ( P ) does not depend on the
coordinate system, then the principle of least action will select the same path in any coordinates.
It guarantees that the physics is invariant. Thus, the principle of least action is specially
convenient for
discussions in relativity, where invariance is a central issue, and
even in Newtonian physics, when using generalized coordinates - otherwise it is difficult
to verify that the equations in different generalized coordinates give the same physics.
(a) Because of the additivity assumption, we need only consider a small segment of path, centred
at ( t ,x) and of length (At, Ax); see Figure 3.
Figure 3
The action ΔS must be proportional to Δt:
$L = L(x, \dot{x}, t)$
(c) Assume the system has no explicit time dependence, then t does not appear.
(d) Assume the system knows about the position x only through the potential V(x);
$L = L(V(x), \dot{x})$
and moreover assume that L is linear in V(x).
(e) Expand A, B. The lower order expansion should be adequate for small velocities
which is the same for all paths satisfying the boundary conditions. It is irrelevant for picking
the minimum, and we shall set it to zero.
The term $a_1 \dot{x}$ would change sign under reflection (x → -x). It is not allowed if the
physics is reflection-symmetric. Hence $a_1 = 0$.
gives the Newtonian equation of motion. (Actually, only the ratio $b_0/a_2$ matters, since
multiplying S by a constant has no effect.)
Euler-Lagrange equation
We now start with
and derive the Newtonian equation of motion. Let x(t) be the correct path, and let x(t) + η(t)
be a neighbouring path; η is considered a small quantity (Figure 4).
Figure 4
Since the neighbouring paths must satisfy the same initial and final conditions
$\eta(t_1) = \eta(t_2) = 0$
Now compare the two actions
The first term can be integrated exactly.
If the original x(t) gives a minimum, then ΔS must vanish for all first order changes. This
means [ ] must be zero:
Hence
We call $\partial L / \partial \dot{x}$ the conjugate momentum π. It may or may not be the same as the "ordinary"
or mechanical momentum p.
8.3 Action principle for a relativistic free particle
We now consider relativity, but start with a free particle.
Choice of action
Consider a single particle of mass m, moving in 3 dimensions. So think of Figures 1-3 as
space-time diagrams, and $x \to \mathbf{x}$. A point is now denoted as $x^\mu$ and the segment in Figure
3 is $\Delta x^\mu = (\Delta t, \Delta\mathbf{x})$. Since the segment represents a possible path, it must be time-like, i.e.,
$|\Delta t| > |\Delta\mathbf{x}|$, $\Delta x^\mu \Delta x_\mu < 0$.
We now try to repeat the arguments in section 2, but with two differences:
Because of Lorentz invariance, the choice is much more limited.
It is no longer so natural to use L. The reason is that ΔS is invariant, but Δt is not. So
L = ΔS/Δt is not invariant. Conceptually, it is better to deal directly with the invariant
quantity ΔS.
(a) Because of the additivity assumption, we can focus on a small segment $\Delta x^\mu$ (Figure 3).
(b) There is no dependence on t. If the particle is free, every position is equivalent, so there is
no dependence on x. Hence ΔS depends only on $\Delta x^\mu$.
(d) Additivity implies linearity. If we double a small interval, ΔS must be doubled. Therefore,
We know that S is invariant, because of (8.13). But now we have written it in a way which is
not explicitly covariant.
With the Lagrangian, we can immediately apply the result of section 8.2.
where $\mathbf{v} = \dot{\mathbf{x}}$
So the conjugate momentum π is exactly the same as the mechanical momentum p. From the
Euler-Lagrange equation (8.11), we then get
$$\frac{d\mathbf{p}}{dt} = 0$$
We see that the relativistic momentum emerges very naturally.
Equation of motion: covariant approach
In the non-covariant approach, we use t as the independent variable, and express x = x(t), v =
v(t) etc. However, the four components of $x^\mu$ have the same status, so such a different treatment
of one component is not elegant. A better approach is to use an arbitrary path parameter s as
the independent variable, and let $x^\mu = x^\mu(s)$. For example we can have
t = s + 0.3s^2 , 0 < s < 1
x = sin xs
y = cosxs
z = 0
The path parameter has no physical meaning. For example, we can let s = sI2
We assume all paths are labelled on the interval [sl, s2]. Because of the freedom to relabel, this
does not impose any real restrictions. The boundary conditions are
Integrate by parts and use (8.20) to discard the integrated term. Also call $\eta^\mu(s) = \delta x^\mu(s)$
As indicated, this is valid for any path parameter s.
In other words, label each point on the path by the proper time τ elapsed along that path.
Then
$$\frac{dx^\mu}{ds} \frac{dx_\mu}{ds} = -1 , \qquad s = \tau = \text{proper time}$$
8.4 Action principle for a particle in the electromagnetic field
Choice of action
We assume that there is a 4-vector field $A^\mu(x)$, which describes electromagnetism. (We could
ask: what sort of theory would be obtained if there is some 4-scalar field? It would indeed be a
simpler theory, but it does not correspond to electromagnetism empirically.) We now construct
the action step by step.
where the interaction term depends on $A^\mu$. We further assume that $A^\mu$ enters linearly.
$\Delta S_I = q A_\mu(x) \Delta x^\mu$
where q is a constant, which will turn out to be the charge. Hence
Note that $A_\mu(x)$ is to be evaluated at $x^\mu = (t, \mathbf{x})$. Since the form is not covariant anyway, let
us separate the time and space components in the second term.
$A^\mu = (\phi, \mathbf{A})$
We can now use the standard techniques to derive the Euler-Lagrange equation. But first note:
• L ≠ KE - PE, showing that the latter is not general.
• The conjugate momentum will turn out to be different from the mechanical momentum.
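$$\frac{dp_i}{dt} = q \left( -\frac{\partial \phi}{\partial x_i} - \frac{\partial A_i}{\partial t} \right) + q v_j \left( \frac{\partial A_j}{\partial x_i} - \frac{\partial A_i}{\partial x_j} \right)$$
where
Thus the somewhat odd looking Lorentz force law comes naturally from the very simple
term $S_I = q \int A_\mu \, dx^\mu$.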
where $S_0$ is the same as for the free particle case. Thus from (8.21), and choosing s = τ = proper
time
Secondly,
Under a variation
The change is
Upon integrating by parts, and discarding the integrated term, this is equivalent to
$$= (\partial^\mu A^\nu - \partial^\nu A^\mu)\,\dot{x}_\nu\,\delta x_\mu$$
Here the dot denotes $d/ds$, or, upon choosing $s = \tau$, $d/d\tau$, and
Hence
Hence
A technical note
We put $s$ (a general path parameter) $\to \tau$ (proper time) only after the variation. Can we set
$s \to \tau$ from the beginning? The answer is no, since we have assumed that the range of s is
the same for all paths, e.g., the two paths in Figure 4. This is possible if they are labelled by
s - since s is general, we can always re-scale one of the paths. But this is not possible if all
paths are labelled by the proper time to start with.
The reason is that when we talk about Maxwell's equations, we have to refer to E and B (or $A^\mu$)
at an arbitrary spacetime point, whether or not there is a particle at that point. In contrast,
when we were concerned with only the Lorentz force law, we needed E and B only at the
position of the particle.
Also, we write the particle label in ( ) so as not to confuse with a space-time index.
We allow a different path parameter for each particle. The second form is a short-hand only;
in actual calculations, we must go back to the first line.
The two forms (8.34) and (8.36) are equivalent. The first form is more convenient when we
want to vary the particle paths. The second form is more convenient when we want to vary
$A^\mu(x)$ at an arbitrary space-time point.
(d) We now want to add a third term which depends only on the fields:
$$S_F = S_F[A^\mu(x)]$$
Then
For this to be a minimum, the first order variation when changing
$x^\mu_{(a)}(s_a) \to x^\mu_{(a)}(s_a) + \delta x^\mu_{(a)}(s_a)$, $A^\mu(x) \to A^\mu(x) + \delta A^\mu(x)$
must be zero. In general
The first [ ] comes from $S_0$ and $S_I$, since $S_F$ does not depend on the particle positions.
It gives the Lorentz force law for particle a. We have already derived this, although at
that time we did not include the label a.
The second [ ] comes from $S_I$ and $S_F$, since $S_0$ does not depend on the fields. It should
give the Maxwell equations. We shall derive this in the rest of this section.
Choice of action
Let us now try to guess the action SF.
(a) We assume it is quadratic in $A^\mu$; this will lead to linear equations. This assumption is based
on the principle of superposition.
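(b) We assume there is gauge invariance; then $A^\mu$ can only enter through $F^{\mu\nu}$. Note that the
following quantities, for example, are not gauge-invariant
$$(\partial^\mu A^\nu + \partial^\nu A^\mu)(\partial_\mu A_\nu + \partial_\nu A_\mu)$$
$$(\partial_\mu A^\mu)^2$$
$$A^\mu A_\mu$$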
(c) It must be a 4-scalar. The only choice is proportional to
So let us assume
Choice of units
In fact, the choice of k has no physical meaning, and only reflects a choice of units for electro-
magnetism. We can see this as follows.
(a) The unit of S is [energy]·[time] = J s. This is readily seen from $S = \int L\,dt$, $L = \mathrm{KE} - \mathrm{PE}$, for
example. This unit has nothing to do with electricity.
This means $q\phi$ must have the unit of energy, i.e., J. If we choose to measure q in Coulombs,
then
$$[q][\phi] = \mathrm{C} \times \mathrm{J\,C^{-1}}$$
However, we can choose to measure q in some other units, say esu
$$[q][\phi] = \mathrm{esu} \times \mathrm{J\,esu^{-1}}$$
In other words, we can make a transformation
$$q \to \alpha q, \qquad \phi \to \alpha^{-1}\phi$$
and it does not change anything.
(c) Of course, to preserve Lorentz invariance, we have to change all 4 components of the
potential, so
Figure 5
subject to the boundary condition. We now show that this also follows from minimizing the
energy
$$\delta U = \int_V d^3x\; \nabla\phi \cdot \nabla\delta\phi$$
Now consider the identity
Put this into (8.43)
$$= \oint d\mathbf{S} \cdot \nabla\phi\,(\delta\phi) \qquad \text{(Gauss' theorem)}$$
This is zero because $\delta\phi = 0$ on the surface. In other words, we only consider those $\phi$ satisfying
the boundary condition, so there is no variation on the surface. Hence
This is really the same as the familiar integration by parts
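The rule is
$$\int u\,dv = -\int du\,v + \text{surface term}$$
In this case
$$(\nabla\phi)\cdot(\nabla\delta\phi) = -[\nabla\cdot(\nabla\phi)]\,\delta\phi + \text{surface term}$$
If $\phi$ is the correct solution, then (8.45) has to be zero for any $\delta\phi$, and hence $\nabla^2\phi = 0$.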
Figure 6
The term $S_0$ does not matter if we vary only $A_\mu$. The integral is over a region of spacetime
bounded by two time-like surfaces $S_1$ and $S_2$ (Figure 6). The values of $A^\mu(x)$ are specified on
$S_1$, $S_2$, i.e.,
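$$\delta(F^{\mu\nu}F_{\mu\nu}) = 2F_{\mu\nu}\,\delta F^{\mu\nu}$$
$$= 2F_{\mu\nu}\,(\partial^\mu \delta A^\nu - \partial^\nu \delta A^\mu)$$
$$= -4F_{\mu\nu}\,\partial^\nu \delta A^\mu \qquad \text{(because } F_{\mu\nu} \text{ is antisymmetric)}$$
$$= 4(\partial^\nu F_{\mu\nu})\,\delta A^\mu + \text{surface term}$$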
Figure 1
The field E is a property of the region of space; it is the same for every drop. But the different
drops have different ratios of $q/m$. So they accelerate differently. The ratio is
$$\frac{q}{m} \sim \frac{\text{unit of force}}{\text{unit of inertia}}$$
Free fall
Now consider objects in free fall, e.g. oil drops, under the influence of gravity g (Figure 2).
Figure 2
We have
The factor $m$ on the left comes from Newton's second law. It measures inertia. The factor $m$
on the right comes from the law of gravitation $F = GMm/R^2 = mg$; it is like q in (9.1), and
measures the unit of force. There is no reason why they should be the same. So let us denote
them as
$m_i$ = inertial mass
$m_g$ = gravitational mass
If the ratio $m_g/m_i$ were different for different objects, then they would accelerate at
different rates. Yet it is well known that in free fall, all objects accelerate at the same rate.
(This was known to a few percent at the time of Galileo, and to an accuracy of at least 1 part
in $10^{12}$ nowadays. We omit the experimental determination of this fact.) This means $m_g/m_i$ is
a universal constant. By convention we take it to be 1.
Postulate
The starting point of general relativity is that $m_g/m_i = 1$ is not an accident; therefore we
believe it is exactly 1.
In other words,
Energy gravitates
Each atom (~ 1 GeV rest mass per nucleon) contains about $10^{-3}$ of its rest mass as nuclear binding
energy (~ 1 MeV per nucleon), and about $10^{-8}$ of its rest mass as electrostatic binding
energy (~ 10 eV per nucleon, e.g., 13.6 eV for H). Since $m_g/m_i$ is unity to an accuracy of
$10^{-12}$, this means that
Energy gravitates the same way
as "ordinary" matter.
This is known to an accuracy of $10^{-12}/10^{-3} \sim 10^{-9}$ for nuclear energy, and $10^{-12}/10^{-8} \sim 10^{-4}$
for electrostatic energy. We believe it to be exact.
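The arithmetic behind these two accuracies can be checked in a few lines. The sketch below is illustrative only: the variable names are made up for this example, and the inputs are the order-of-magnitude figures quoted above.

```python
# Accuracy to which each form of binding energy is known to gravitate normally.
# All inputs are the order-of-magnitude values quoted in the text.
mass_ratio_bound = 1e-12        # accuracy to which m_g/m_i = 1 is tested
nuclear_fraction = 1e-3         # ~1 MeV out of ~1 GeV per nucleon
electrostatic_fraction = 1e-8   # ~10 eV out of ~1 GeV per nucleon

# If a fraction x of the rest mass is binding energy, a test of m_g/m_i at
# accuracy eps constrains that energy's gravitation to accuracy eps / x.
nuclear_accuracy = mass_ratio_bound / nuclear_fraction
electrostatic_accuracy = mass_ratio_bound / electrostatic_fraction

print(nuclear_accuracy, electrostatic_accuracy)
```

The smaller the fraction of rest mass a given form of energy contributes, the weaker the resulting bound, which is why the electrostatic case is tested far less precisely than the nuclear case.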
The immediate consequence is that even light (photons) will exert a gravitational force
and will also be affected by an external gravitational force.
Principle of equivalence
Figure 3
Figure 3a shows an enclosed box in a gravitational field g. Various objects are falling
down, all at the same acceleration g. Figure 3b shows another enclosed box not subject to
gravity, but the box is accelerating upwards, with an acceleration a = g. In Figure 3b, the floor
will rise and hit the "floating" objects. For an observer inside the boxes, there is no way to tell
the two situations apart.
An example
Figure 4 illustrates the trajectory of a ball at three instants (i), (ii), (iii). The situation is
observed by two people:
(a) This observer thinks there is gravity, so the ball travels in a parabola.
(b) This observer thinks there is no gravity, so the ball travels in a straight line, but the floor
accelerates up.
Both observers agree that
(i) the ball is on the floor,
(ii) the ball is a distance h above the floor,
(iii) the ball is on the floor again.
In fact, by observation inside the boxes, it is not possible to tell (a) and (b) apart.
Figure 4
Other ways of stating equivalence principle
There are two other ways of stating the equivalence principle.
We use the symbol $\mathbf{a}_0$ to emphasize that it is the acceleration of the observer, not the acceleration
of the mass m.
This statement is easily understood by reference to Figure 3.
(b) Imagine a freely-falling observer (i.e., falling under the influence of gravity and no other
force). Then all other objects would appear not to be accelerating, and therefore there appears
to be no gravitational force.
(If the field is non-uniform, e.g., a nearby ball feels a stronger field, then of course it can be
detected.) This statement is related to "weightlessness".
There is another way of explaining this result. According to the falling observer:
Figure 5
Even for the field of the earth from ground level to infinity
The above argument is only correct to first order in $\Phi$. The reason is that in calculating the
work done, we have used the "uncorrected" $m$. (Moreover, these Newtonian ideas are not valid
to high order.)
Figure 6
The second argument relies on the principle of equivalence in a more explicit way: Replace the
gravitational field g by an upward acceleration of the box. Then the emission of the photon
(t = 0) and its reception (t = t) would be as shown in Figure 6. We have assumed that the box
initially had zero velocity. The photon has travelled $h + \frac{1}{2}gt^2$ in a time t, so
$$d = h + \tfrac{1}{2} g t^2$$
We claim $\frac{1}{2}gt^2$ is negligible. If this is the case
To check whether $\frac{1}{2}gt^2$ is really negligible, put this approximate answer back into the right
hand side of (9.7)
$$\frac{\text{2nd term}}{\text{1st term}} \sim \frac{\frac{1}{2}\,g\,(h/c)^2}{h} = \frac{1}{2}\,\frac{gh}{c^2} = \frac{1}{2}\,\Phi$$
Provided $|\Phi| \ll 1$, it is indeed correct to neglect $\frac{1}{2}gt^2$.
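As a numerical check of this estimate, the sketch below solves the full relation $ct = h + \frac{1}{2}gt^2$ for the travel time and compares the size of the second term with the estimate $\frac{1}{2}gh/c^2$. The box height and the values of g and c are assumptions chosen for illustration; they are not taken from the text.

```python
import math

def photon_travel_time(h, g, c):
    """Smaller root of c*t = h + 0.5*g*t**2.

    Written as 2h / (c + sqrt(c^2 - 2gh)), which is algebraically equal to
    (c - sqrt(c^2 - 2gh)) / g but avoids subtracting two nearly equal numbers.
    """
    return 2.0 * h / (c + math.sqrt(c * c - 2.0 * g * h))

# Illustrative inputs (assumed values)
g = 9.8        # m s^-2
h = 22.5       # m, height of the box
c = 2.998e8    # m s^-1

t = photon_travel_time(h, g, c)
ratio_exact = 0.5 * g * t**2 / h       # (2nd term)/(1st term) with the true t
ratio_estimate = 0.5 * g * h / c**2    # the estimate (1/2) * g h / c^2

print(t, h / c)
print(ratio_exact, ratio_estimate)
```

For a laboratory-sized box the ratio is many orders of magnitude below unity, so replacing t by h/c is an excellent approximation.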
Now, by the time the photon is received, the observer is already travelling at a speed
Therefore, using the Lorentz transformation, and remembering that $k^\mu = (\omega, \mathbf{k})$ is a 4-vector,
Since $k = \omega/c$
Figure 7
The final situation is not equivalent to an accelerating observer. The fact that the two
balls move apart is an objective fact, independent of observer, and cannot be transformed away.
It must be attributed to the gravitational field. In short
The inhomogeneous part of the gravitational field is called the tidal gravitational field.
The reason for this name is explained below.
Figure 8
Let us look at a naive explanation of tides (Figure 8). The sun attracts the water on
the near side, forming a bulge - high tide H. The opposite side forms low tide L.
There are two problems with this explanation.
(a) The earth rotates once a day. Refer to Figure 8 and imagine the shaded sphere (solid earth)
rotating, but the oval (the oceans) not moving. In each rotation, every point on earth would
come across H once and L once. Thus there should be one high tide every 24 hours. In fact,
there is one high tide every 12 hours (approximately).
Problem 1
A very tall elevator is undergoing free fall downwards. Because the elevator is tall, different
parts experience different strengths of gravity: the middle of the elevator B feels g = 9.80; the
top of the elevator C, being farther from the center of the earth, feels g = 9.79; the bottom of
the elevator A feels g = 9.81 (all in units of m s$^{-2}$). However, the whole elevator falls at an
acceleration given by the strength felt by the center-of-mass, i.e., at 9.80.
(a) What is the net force + pseudoforce experienced by a mass m at the three points A, B and
C? Pay attention to the directions.
(b) For an observer falling with the elevator, what would he say about the direction and the
magnitude of the "gravitational" force?
The correct explanation for tides is similar, and is illustrated in Figure 9.
Figure 9
The earth is freely falling towards the sun (or moon), with an acceleration
$$g = \frac{GM}{R^2}$$
This is the centripetal acceleration of circular motion, and R is determined by the distance
from the sun or moon to the center of the earth B.
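Now look at the total "force" in this freely-falling frame. The total "force" on a mass $m$
at the point C, a distance $R + r$ from the sun, is
$$\text{Force}_C = \frac{GMm}{(R+r)^2} - \frac{GMm}{R^2} \approx \frac{GMm}{R^2}\left(-\frac{2r}{R}\right) = -2\,\frac{GMm}{R^3}\,r$$
We have taken the direction towards the sun as positive, so this is away from the sun. For the
point A, we simply change $r \to -r$
$$\text{Force}_A = 2\,\frac{GMm}{R^3}\,r$$
which is towards the sun.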
Figure 10
(a) We see there are two bulges. So in each day, every point on earth rotates once, and meets
two high tides. Thus high tides are 12 hours apart.
(b) From (9.11), we see that the effect goes as $GM/R^3$, not $GM/R^2$. So the ratio of the solar
effect to the lunar effect is
(c) From (9.10), we see that the entire effect is due to the difference in gravitational force at
two positions, $R$ and $R + r$. Thus differences, or inhomogeneities, in the gravitational field are
called tidal gravitational forces.
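To get a feel for the $GM/R^3$ scaling, the sketch below evaluates the leading-order tidal acceleration $2GMr/R^3$ for the sun and the moon, and compares the lunar value with the exact difference of the inverse-square attraction at $R - r$ and $R$. The masses and distances are rounded standard values supplied here as assumptions; they do not come from the text.

```python
G = 6.674e-11  # m^3 kg^-1 s^-2

def tidal_first_order(M, R, r):
    """Leading-order tidal acceleration per unit mass, 2 G M r / R^3."""
    return 2.0 * G * M * r / R**3

def tidal_exact_near_side(M, R, r):
    """Attraction at distance R - r minus the free-fall acceleration G M / R^2."""
    return G * M / (R - r)**2 - G * M / R**2

# Rounded astronomical values (assumed for illustration)
M_SUN, R_SUN = 1.989e30, 1.496e11      # kg, m
M_MOON, R_MOON = 7.342e22, 3.844e8     # kg, m
R_EARTH = 6.371e6                      # m, plays the role of r

solar = tidal_first_order(M_SUN, R_SUN, R_EARTH)
lunar = tidal_first_order(M_MOON, R_MOON, R_EARTH)
solar_to_lunar = solar / lunar

lunar_exact = tidal_exact_near_side(M_MOON, R_MOON, R_EARTH)

print(solar_to_lunar)
print(lunar / lunar_exact)
```

With these inputs the solar effect comes out somewhat under half the lunar one, even though the sun's direct pull $GM/R^2$ on the earth is far larger than the moon's. The first-order formula agrees with the exact difference to within a few percent, the discrepancy being of order $r/R$.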
9.4 Curvature
To summarize the discussion so far:
• A uniform gravitational field does not matter; it can be transformed away by the principle
of equivalence.
• What remains is the tidal gravitational force, which is best represented (e.g., Figure 7d)
by the divergence or convergence of lines which "should be" parallel.
So how do we explain the convergence or divergence of lines that "should be" parallel?
Note that we are referring to lines on a spacetime diagram, e.g., the z-t plane in Figure 7d.
There are two ways:
• We can say there is a gravitational force (or more precisely different gravitational forces)
The whole idea of general relativity is to adopt the second approach. This is possible because
all lines that start together (i.e., same $z(0)$, $\dot{z}(0)$) will always keep together (Figure 11a); this
happens because $m_i = m_g$, so all particles (so long as they are at the same point) will experience
the same acceleration. Note that this would not be possible for other forces, e.g., electric forces.
In this case, the acceleration is proportional to $q/m$, which is different for different particles.
So particle lines that start together will in general diverge (Figure 11b). If this is the case, we
cannot blame the effect on the underlying spacetime.
Thus the principle of equivalence allows the spacetime explanation, and in fact makes
the spacetime explanation natural.
Figure 11: (a) Gravity; (b) Electrostatics
Spatial analogy
The central theme is Figure 7d - two lines on the z-t plane diverge, even when they start off
parallel. To make the introduction of curvature more natural, let us look at a spatial analogy,
on the y-x plane (Figure 12).
Figure 12
Two persons are on the x-axis, separated by $\Delta x$. They move in the perpendicular
direction, along y, through the same distance. What is their separation $\Delta x'$? "Normally", we
would have $\Delta x' = \Delta x$. This comes about because of Euclid's axiom on flat space: parallel lines
maintain their separation.
However, consider the surface of the earth, and two points on the same latitude (say
30°S), as shown in Figure 13. Choose the x-axis to the east and y-axis to the north. Let two
persons start at these points, and again move by the same distance. Their new separation $\Delta x'$
would be larger.
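This can be made quantitative with a short sketch. On a sphere of radius a, two points on the same latitude whose longitudes differ by a small angle are separated, along that latitude circle, by a cos(latitude) times the longitude difference. The radius, longitude difference and step length below are assumed values for illustration only.

```python
import math

def separation_along_latitude(a, latitude_deg, dlongitude_rad):
    """East-west separation of two points on the same latitude circle."""
    return a * math.cos(math.radians(latitude_deg)) * dlongitude_rad

def latitude_after_moving_north(a, latitude_deg, distance):
    """New latitude after walking `distance` due north on a sphere of radius a."""
    return latitude_deg + math.degrees(distance / a)

a = 6371.0           # km, assumed radius of the earth
dlongitude = 0.01    # rad, assumed longitude difference between the two persons
start = -30.0        # degrees, i.e. 30 degrees south
step = 1000.0        # km walked north by each person (assumed)

before = separation_along_latitude(a, start, dlongitude)
after = separation_along_latitude(
    a, latitude_after_moving_north(a, start, step), dlongitude)

print(before, after)
```

Starting in the southern hemisphere and moving north brings both persons closer to the equator, where the latitude circles are longer, so the separation grows. Letting a become very large while keeping the step fixed makes the two separations agree, which is the flat-space result.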
Figure 13
If we draw this on a plane (Figure 14), the situation would be very like Figure 7d: two
lines which "should be" parallel have diverged.
Figure 14
Nomenclature
We start with two familiar concepts: Euclidean space and Minkowski space.
Euclidean space
Euclidean space of N dimensions (to be denoted as $E^N$) has the following properties.
• The points are labeled by N coordinates $(x^1, \dots, x^N)$.
• The distance between neighboring points is given by Pythagoras' theorem
Figure 1
Example
Consider the 2-dimensional surface of a sphere of radius a, to be denoted as $S^2(a)$. We can
think of this as the earth; we shall also refer to ants living on the surface of a ball.
Since we know the geometry of $E^3$, it is very easy to describe anything on $S^2(a)$. This is
called the embedding approach - the manifold $M = S^2(a)$ is embedded in $E^3$.
• Stay on the surface, as people did many centuries ago. This is called the intrinsic
approach.
The intrinsic approach is more difficult: people many centuries ago could not easily tell
that the earth is round. However, the actual manifold M we are concerned with is 4-dimensional
spacetime. There are many ways of embedding it in higher dimensional Minkowski space; all
the extra dimensions are fictitious and meaningless. So we must make sure that our results
only depend on properties on M, and are independent of the extra dimensions.
As a compromise we shall do the following.
• We shall use the embedding approach in intermediate steps, in order to get a more intuitive
picture.
• But in the end we aim for expressions that do not involve the extra dimensions or the
10.2 Coordinates
First, we need to label the points on the manifold M. To do so, draw a coordinate patch on
M (Figure 2), and in terms of this define N coordinates, denoted collectively as
Figure 2
The following properties will become clear through the examples below.
• The coordinates need not have the unit of length; in fact the different components can
have different units.
• Do not think of x as a vector; this will be explained in detail in the next Chapter.
• The coordinates are described by upper indices. There will be no such thing as $x_1$ etc.;
coordinate indices cannot be lowered.
• The same manifold M can be described by different coordinates, and an important issue
is how to ensure that the physics is independent of coordinates.
Examples
Example A1
Let M be 2-dimensional Euclidean space $E^2$, and use rectangular coordinates $(x^1, x^2) = (x, y)$
(Figure 3a).
Example B1
Take the same manifold as in Example A1, but use polar coordinates $(x^1, x^2) = (r, \phi)$ (Figure 3b).
In this case, $x^1$ and $x^2$ have different units.
The two examples describe the same thing, and the relationship between the coordinates
is
This is an example of a general coordinate transformation, which is different from the linear
transformations considered in the early part of the course.
Example C1
Take the manifold M to be $S^2(a)$, and use polar coordinates $(\theta, \phi)$, in which (Figure 4a)
$\theta = 90^\circ - \text{latitude}$
$\phi = \text{longitude}$
Imagine ants living on a small region of linear dimension $L \ll a$ near the north pole;
mathematically we can say $L/a \to 0$, or simply $a \to \infty$. In this limit, space would appear
to be nearly flat, and the ants would "normally" describe space by either Example A1 using
coordinates $x, y$ or Example B1 using coordinates $r, \phi$. It is therefore convenient to cast Example
C1 in a form such that the limiting case becomes apparent. For this, we go on to the next
example.
Example D1
Take the same manifold as in the previous example, i.e., $S^2(a)$, and take the north pole ($\theta = 0$)
as the origin. Use the same coordinate patch of latitudes and longitudes as before. Again, each
longitude is labeled by $\phi$. For each latitude, define
$$r = \frac{\text{circumference of the latitude}}{2\pi}$$
The relationship between Example C1 and Example D1 can be seen from Figure 4b:
r is the radius measured through the plane containing the latitude. The relationship between
$r$ and $\theta$ is
$$r = a \sin\theta \qquad (10.6)$$
We note two properties.
• The parameter r is not the distance s from the origin (measured along M).
radius.
• The same definition can be made for any space that has rotational symmetry about one
axis.
What is the advantage of Example D1 over Example C1? In Example D1, we use one
length $r$ and one angle $\phi$. This is the same as the case of flat 2-dimensional space in Example
B1, which also uses one length $r$ and one angle $\phi$. We shall later see more clearly the connection
between the two, in particular the property that Example D1 approaches Example B1 when
$a \to \infty$; see (10.16) below.
All distances can be built up from a knowledge of infinitesimal distances between neigh-
boring points.
In the next few Chapters we shall develop this concept mathematically and generally. But here
we first introduce the main ideas by means of an example, in fact, through comparing Example
B and Example D.
Examples
Example B2
We continue with Example B1 and try to write down the distance $ds$ between the point
$P = (r, \phi)$ and $Q = (r + dr, \phi + d\phi)$. Refer to Figure 5. The radial distance is $dr$, and
the tangential distance is $r\,d\phi$. These two distances are perpendicular, so using Pythagoras'
theorem, we have
$$ds^2 = dr^2 + r^2\,d\phi^2$$
(In all such formulas, $dr^2$ means $(dr)^2$, etc., and the brackets will be dropped whenever there
is no danger of confusion.)
Thus in these coordinates, distances are not given by the simple formula in (10.1).
Figure 5
Example C2
In this case, we want to find the distance between the point $P = (\theta, \phi)$ and $Q = (\theta + d\theta, \phi + d\phi)$.
Refer to Figure 6. The distance along the north-south direction is $a\,d\theta$, where a is the radius of
the sphere. To calculate the distance in the east-west direction, we first notice that the length
of the latitude is $2\pi r = 2\pi a \sin\theta$, which corresponds to the longitude changing by $\Delta\phi = 2\pi$.
So for a small longitude change $d\phi$, the distance is $r\,d\phi = a \sin\theta\,d\phi$. Again, the two distances
are perpendicular, so
$$ds^2 = a^2\,d\theta^2 + a^2 \sin^2\theta\,d\phi^2 \qquad (10.9)$$
Figure 6
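The line element on the sphere, $ds^2 = a^2 d\theta^2 + a^2 \sin^2\theta\, d\phi^2$, can be checked against the embedding picture. The sketch below places two neighboring points on a sphere embedded in 3-dimensional Euclidean space and compares their straight-line separation with the value given by the line element. The function names and the sample numbers are assumptions made for this illustration.

```python
import math

def embed(a, theta, phi):
    """Cartesian position of the point (theta, phi) on a sphere of radius a."""
    return (a * math.sin(theta) * math.cos(phi),
            a * math.sin(theta) * math.sin(phi),
            a * math.cos(theta))

def ds_from_metric(a, theta, dtheta, dphi):
    """Distance from the line element a^2 dtheta^2 + a^2 sin^2(theta) dphi^2."""
    return math.sqrt((a * dtheta)**2 + (a * math.sin(theta) * dphi)**2)

def ds_from_embedding(a, theta, phi, dtheta, dphi):
    """Straight-line distance in E^3 between the two neighboring points."""
    p = embed(a, theta, phi)
    q = embed(a, theta + dtheta, phi + dphi)
    return math.sqrt(sum((qc - pc)**2 for pc, qc in zip(p, q)))

# Sample values (assumed): a point away from the poles and a small displacement
a, theta, phi = 2.0, 1.0, 0.3
dtheta, dphi = 1e-5, 2e-5

intrinsic = ds_from_metric(a, theta, dtheta, dphi)
embedded = ds_from_embedding(a, theta, phi, dtheta, dphi)
print(intrinsic, embedded)
```

The two numbers agree up to corrections that are of higher order in the displacement, as expected for an infinitesimal distance.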
Example D2
In this case, we want to describe the same distance as in Example C2, but in terms of the
coordinates $r$ and $\phi$, rather than $\theta$ and $\phi$. Let us first do it graphically in a general way. The
east-west distance is the same as before:
$$ds_1 = r\,d\phi$$
To find the north-south distance, refer to Figure 7. The parameter r can be considered to be the
radii of the circles projected onto a plane, whereas the perpendicular distance $ds_2$ is measured
along the surface. The two are proportional for infinitesimal displacements, but not equal, so
$$ds_2 = f(r)\,dr \qquad (10.11)$$
Figure 7
In fact, this derivation is valid for any surface that has rotational symmetry about one
axis. For any such surface, the ants living on it can make the following measurements on the
surface, i.e., intrinsically, and determine whether the surface is curved.
• Measure the circumference of each circle (i.e., the set of points equidistant from the origin),
and hence determine the parameter r.
• For two neighboring circles, find the difference in circumferences and determine dr.
• Measure the perpendicular distance between these circles. From this, determine f.
If f is always unity, the space is flat. If f is not unity, the space is curved.
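The measuring procedure above can be written as a small function. The sketch below is one possible rendering: the function name and the two test surfaces (a flat plane and a sphere of radius 10) are assumptions chosen for illustration, and the "measurements" are generated from known geometry rather than taken from the text.

```python
import math

def measure_f(circumference_1, circumference_2, perpendicular_distance):
    """Return (r, dr, f) from purely intrinsic measurements.

    r  : circumference of the inner circle divided by 2*pi
    dr : difference of the two circumferential radii
    f  : perpendicular distance between the circles divided by dr
    """
    r = circumference_1 / (2.0 * math.pi)
    dr = (circumference_2 - circumference_1) / (2.0 * math.pi)
    return r, dr, perpendicular_distance / dr

# Flat plane: circles of radius 1.0 and 1.001 are a distance 0.001 apart.
_, _, f_flat = measure_f(2 * math.pi * 1.0, 2 * math.pi * 1.001, 0.001)

# Sphere of radius 10: latitude circles at polar angles 0.5 and 0.5001 rad.
a, theta, dtheta = 10.0, 0.5, 1e-4
c1 = 2 * math.pi * a * math.sin(theta)
c2 = 2 * math.pi * a * math.sin(theta + dtheta)
_, _, f_curved = measure_f(c1, c2, a * dtheta)

print(f_flat, f_curved)
```

On the plane f comes out as unity, while on the sphere it exceeds unity, so the ants can detect the curvature without ever leaving the surface.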
Formula for a sphere
We now derive the formula for $f(r)$ in the case of a spherical surface, which will be important
later on. From (10.6), we have
$$dr = a \cos\theta\,d\theta$$
Hence the perpendicular distance $ds_2$ in (10.11) is
$$ds_2 = a\,d\theta = \frac{dr}{\cos\theta}$$
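Since $r = a\sin\theta$ implies $\cos\theta = \sqrt{1 - r^2/a^2}$, the relation $ds_2 = dr/\cos\theta$ corresponds to $f(r) = 1/\sqrt{1 - r^2/a^2}$. The sketch below, with assumed sample values, integrates this factor numerically from the pole outward and compares the result with the arc length $a\theta$ along the surface.

```python
import math

def f_sphere(r, a):
    """Radial factor for a sphere: ds_2 = f(r) dr with f = 1/sqrt(1 - r^2/a^2)."""
    return 1.0 / math.sqrt(1.0 - (r / a)**2)

def distance_along_surface(r, a, steps=10000):
    """Midpoint-rule integral of f(r') dr' from 0 to r."""
    h = r / steps
    return h * sum(f_sphere((i + 0.5) * h, a) for i in range(steps))

a = 1.0   # radius of the sphere (assumed)
r = 0.5   # circumferential radius of the circle considered (assumed)

s_numeric = distance_along_surface(r, a)
s_exact = a * math.asin(r / a)   # a * theta, using r = a sin(theta)

print(s_numeric, s_exact)
print(f_sphere(1.0, 1e6))        # very large sphere: f is close to 1
```

The integrated distance exceeds r, reflecting the fact that r is not the distance from the origin measured along the surface, and f tends to unity as the radius of the sphere grows.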
Let us compare the two choices of coordinates for the surface of a sphere.
• Example C makes it obvious that we are talking about a sphere - all points are equivalent.
• Example D is more convenient because we see that the distance reduces to Example
B when $a \to \infty$. In particular, if the ants on this sphere started their mathematical
education with Euclidean geometry, it is much easier for them to think in terms of one
radius $r$ and one polar angle $\phi$.
In the end, both properties are important. This is one reason why we often need to transform
between different coordinate systems.
Problem 1
Some ants live on a 2-dimensional surface which they know to be spherical. They measure the
circumferences of two nearby concentric circles to be
and the perpendicular distance between these two concentric circles is found to be 0.001 000 200
km. Find the radius of the sphere.
This example shows that
Example E2
All the examples given above lead to expressions for ds2 which do not contain cross terms.
Although such is often the case when there is a high degree of symmetry, this is not a general
property. To emphasize this point, we now give an example which contains a cross term.
Figure 8
• The $v$ axis is inclined at an angle $\gamma$ to the horizontal ($\gamma \neq \pi/2$), and marked with grids
at a constant separation b.
The transformation to rectangular coordinates is given by
Note that $(u, v) = (x^1, x^2)$ are the variables, while $a$, $b$, $\gamma$ are constants.
Problem 2
Find ds2 in terms of du and dv.
$$ds^2 = g_{11}\,dx^1 dx^1 + g_{12}\,dx^1 dx^2 + g_{21}\,dx^2 dx^1 + \cdots + g_{NN}\,dx^N dx^N$$
Note that we have broken up the two terms associated with B. Thus
We shall later come across many formulas involving many indices, and it can be confusing when
you first see them. It is important to concentrate on the overall structure and forget the indices;
the indices can always be worked out quite simply or by reference to a book. So to start this
habit, we write the above in the schematic form
Problem 3
Write out $g_{\mu\nu}$ for each of Example A, B, C, D and E.
10.5 General method applied to the sphere
(This Section may be skipped in the first reading.)
In the last Section, we postulated that $ds^2$ is given by a quadratic expression in $dx^\mu$. We
now show that this must be the case if the manifold is embedded in a higher-dimensional flat
space. In this Section, we show this for the special case of the surface of a sphere; in the next
Section, we give the formalism in general. Both Sections may be skipped if you are willing to
accept the postulate for the time being.
Step 1
We start with Euclidean space in 3 dimensions, and denote the coordinates as (x, y , z). The
distance is
Step 2
Introduce polar coordinates $(\theta, \phi, R)$. (We denote the radial coordinate by R, to distinguish
from r in Example D2. Also, we put the radius last, for reasons that will be apparent later.)
The rectangular coordinates are related to these by
Step 3
In view of (10.22), we have the differentials, for example
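$$dx = (R\cos\theta\cos\phi)\,d\theta + (-R\sin\theta\sin\phi)\,d\phi + (\sin\theta\cos\phi)\,dR$$
$$\phantom{dx} = A^x{}_\theta\,d\theta + A^x{}_\phi\,d\phi + A^x{}_R\,dR \qquad (10.23)$$
$$\frac{\partial x}{\partial \theta} \equiv A^x{}_\theta = R\cos\theta\cos\phi$$
$$\frac{\partial x}{\partial \phi} \equiv A^x{}_\phi = -R\sin\theta\sin\phi$$
$$\frac{\partial x}{\partial R} \equiv A^x{}_R = \sin\theta\cos\phi \qquad (10.24)$$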
Problem 4
Write down similar expressions for $dy$ and $dz$, and give explicit expressions for $A^y{}_\theta$, etc.
Step 4
Put these expressions into (10.21). The result will be a quadratic expression in the 3 differentials
$d\theta$, $d\phi$, $dR$. Thus there could in principle be terms proportional to $d\theta^2$, $d\theta\,d\phi$, etc. In this case,
it turns out that all the cross terms cancel (this is not a general property), and
$$ds^2 = R^2\,d\theta^2 + R^2\sin^2\theta\,d\phi^2 + dR^2 \qquad (10.25)$$
This still describes flat Euclidean space $E^3$, but using polar coordinates. One point should be
noticed immediately: although the line element is not of Pythagoras form, the space is still flat.
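The cancellation of the cross terms can be checked numerically. The sketch below assumes the usual relations $x = R\sin\theta\cos\phi$, $y = R\sin\theta\sin\phi$, $z = R\cos\theta$, builds the table of partial derivatives of $(x, y, z)$ with respect to $(\theta, \phi, R)$, and forms the coefficient of each product of differentials by summing over x, y and z. The function names and the sample point are illustrative assumptions.

```python
import math

def jacobian(theta, phi, R):
    """Partial derivatives of (x, y, z) with respect to (theta, phi, R).

    Rows are x, y, z; columns are theta, phi, R.
    """
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [[R * ct * cp, -R * st * sp, st * cp],
            [R * ct * sp,  R * st * cp, st * sp],
            [-R * st,      0.0,         ct]]

def metric(theta, phi, R):
    """Coefficient of each product of differentials in dx^2 + dy^2 + dz^2."""
    A = jacobian(theta, phi, R)
    return [[sum(A[i][m] * A[i][n] for i in range(3)) for n in range(3)]
            for m in range(3)]

theta, phi, R = 0.7, 1.3, 2.5   # an arbitrary sample point (assumed)
g = metric(theta, phi, R)
for row in g:
    print(row)
```

At the sample point the diagonal entries reproduce $R^2$, $R^2\sin^2\theta$ and 1, and the off-diagonal entries vanish to rounding error, in line with (10.25).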
Problem 5
Derive the above expression for ds2.
Step 5
To reduce to the surface of a sphere of radius a, all we need to do is to set
R = a = constant
dR = 0
This is then the distance expression (10.9) on the sphere, using the coordinates of Example C2.
There are only two coordinates left.
Step 6
The expression for the distance can be written in a way that is easier to generalize.
Note that we have dropped all terms that involve dR from the very beginning, because on a
sphere R is constant. Because the final result must be a quadratic in $d\theta$ and $d\phi$, we have in the
final form defined the coefficients as $g_{\theta\theta}$ and $g_{\theta\phi}$ etc.
In the next Section, we shall deal with this problem in general. But as a warm-up, we
consider another example as an exercise.
Problem 6
A spheroid is defined by
The spheroid is given by a = 1. Go through the above steps and find the distance expression
on the spheroid. In particular, set b = a and try to recover the result of Example D2. Also note
that before we set a = 1, the formula for the distance contains cross terms involving da dr.
Step 1
We start with Euclidean space in M dimensions, and denote the coordinates as $(z^1, \dots, z^M)$.
The distance is
Step 3
In view of (10.32), we have the differentials
which defines the coefficients $g_{\mu\nu}$. This still describes flat Euclidean space $E^M$, but using the
new coordinates.
Step 5
To reduce to the surface of the manifold M, all we need to do is to set the last $M - N$
coordinates to constants.
$$x^\rho = c^\rho = \text{constant}$$
$$dx^\rho = 0, \qquad \rho = N+1, \dots, M \qquad (10.36)$$
The only differences are that (a) the sums go only up to N rather than M, and (b) the values of
the last $M - N$ coordinates are set to the given constant values in evaluating these expressions.
From now on, the summation convention will be used on the Greek indices, and the indices are
understood to go up to N, the dimensionality of the manifold M:
Step 6
The expression for the coefficients can now be written as follows.
By comparison with (10.38), we see that
Although most of the examples we deal with in this course have diagonal $g_{\mu\nu}$, the general
expression (10.40) allows off-diagonal $g_{\mu\nu}$ as well. In fact, an example of off-diagonal $g_{\mu\nu}$ can
be seen from Example E2.
Problem 7
In order to understand the abstract notation, go back to the case of the sphere S2(a). Calculate
all the transformation coefficients $A^i{}_\mu$ as in (10.24), and substitute these results into (10.40) to
obtain the explicit expressions for $g_{\mu\nu}$.
We should also mention one minor generalization: in some cases (including all cases
referring to spacetime), it is necessary to embed not in Euclidean space but in Minkowski space
- otherwise there is no way to have ds2 < 0. Then some signs have to be changed in (10.24)
and in a few subsequent places, but otherwise the results (including (10.38) without any sign
changes) remain valid.
In the rest of this Chapter, we give some examples of metrics that are important in
general relativity.
The last two conditions are very stringent, and there are only three possibilities, which we
describe below.
Closed manifold
If a 2-dimensional manifold is (a) closed, (b) homogeneous, and (c) isotropic, then there is only
one possibility. We take 3-dimensional Euclidean space $E^3$, and in it embed the surface of a
2-dimensional sphere $S^2(a)$. The only remaining degree of freedom is the radius a of the sphere.
This has been described in Examples C and D.
Now the generalization is obvious. We need a 3-dimensional manifold that is (a) closed,
(b) homogeneous, and (c) isotropic. Again there is only one possibility. We take 4-dimensional
Euclidean space $E^4$, and in it embed the surface of a 3-dimensional sphere $S^3(a)$. The only
remaining degree of freedom is the radius a of the sphere.
To describe points on $S^3(a)$, we can either use (a) three angles, or (b) one length $r$ and
two angles $\theta, \phi$. The latter is closer to our "usual" thinking - because in the limit $a \to \infty$, it
clearly reduces to flat 3-dimensional space. By analogy with (10.16), it is easy to see that the
line element is
$$ds^2 = \frac{dr^2}{1 - (r/a)^2} + r^2\,(d\theta^2 + \sin^2\theta\,d\phi^2) \qquad (10.41)$$
We note the following features.
• The parameter r sets the scale for all distances in the tangential directions (i.e., associated
• It is not immediately obvious that this manifold is homogeneous and isotropic, but this
property becomes apparent when we realize that this is the surface of a sphere.
• One can guess that the manifold is finite and closed, because (10.41) suggests that there
is a maximum value of r, i.e., $r = a$. This argument is not a proof; the singularity could
be a singularity of the coordinate system. But this property is easily proved by showing
that this is $S^3(a)$.
Flat manifold
An even simpler possibility is flat 3-dimensional space, i.e., the "usual" E3 that we learn about
in high school Euclidean geometry. The line element in polar coordinates is
which is the same as (10.41) if we set a = oo. This is not surprising - the surface of a large
sphere is nearly flat. In fact, we can write both of these cases as
where k = a-2 > 0 for the closed manifold, and k = 0 for the flat manifold. This clearly shows
that the latter is a limiting case of the former.
This then suggests that the case k < 0 should also be possible, which gives the third
possibility.
Open manifold
If $k = -|k|$, we can put $-kr^2 = +|k|r^2 = (r/a)^2$. Then
the metric is
The embedding description of this manifold is a little complicated, because we need to start
with a Minkowski space. However, it is simpler to regard this as the "continuation" of the
previous two cases, so that it will "inherit" most of the relevant properties.
• This manifold is again homogeneous and isotropic.
• But (10.44) shows that there is no limit to the value of $r$; the manifold is infinite and
open.
If $a = \infty$, we again recover the flat manifold. The situation is illustrated in Figure 9.
Figure 9: open, flat, closed
Uniform treatment
It is convenient to write the three cases in one unified way. We introduce the parameter
$K = +1, 0, -1$ to denote the three cases, and put the line element as
$$ds^2 = \frac{dr^2}{1 - K(r/a)^2} + r^2\,(d\theta^2 + \sin^2\theta\,d\phi^2)$$
Note that in the case of the flat manifold, we have introduced an arbitrary length parameter a,
which does not matter since it appears only in a term multiplied by $K = 0$. The parameter a
in all cases will be called the scale parameter.
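One way to see how the three cases differ is to look at the distance from the origin to a point with coordinate r along a radial line, i.e., the integral of $dr/\sqrt{1 - K(r/a)^2}$. The sketch below uses the closed forms of that integral (inverse sine for K = +1, the identity for K = 0, inverse hyperbolic sine for K = -1) and confirms by finite differences that each has the right slope. The function names and sample values are assumptions for illustration.

```python
import math

def radial_factor(r, a, K):
    """Square root of the coefficient of dr^2 in the unified line element."""
    return 1.0 / math.sqrt(1.0 - K * (r / a)**2)

def radial_distance(r, a, K):
    """Distance from the origin to coordinate r along a radial line."""
    if K == 1:
        return a * math.asin(r / a)    # closed manifold
    if K == 0:
        return r                       # flat manifold
    if K == -1:
        return a * math.asinh(r / a)   # open manifold
    raise ValueError("K must be +1, 0 or -1")

a, r, eps = 1.0, 0.6, 1e-6   # sample values (assumed)

# Central-difference slope of the distance, to compare with radial_factor
slopes = {K: (radial_distance(r + eps, a, K) - radial_distance(r - eps, a, K))
             / (2.0 * eps)
          for K in (1, 0, -1)}

for K in (1, 0, -1):
    print(K, radial_distance(r, a, K), slopes[K], radial_factor(r, a, K))
```

For the same coordinate r, the radial distance is largest in the closed case and smallest in the open case, with the flat case in between.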
We shall later deal with an expanding universe, i.e., a increasing with time. The positions
of the galaxies will expand with it, i.e., r will increase together with a. For this reason, it
is often better to concentrate on the ratio $r/a$, which should be constant. To anticipate this
development, we introduce the new coordinate $\tilde{r} = r/a$. In terms of this, we have
so that the line element becomes
which has the advantage that the scale parameter a is factored out.
Robertson-Walker metric
Example F2
To go to the description of spacetime in cosmology, we have to add two ingredients.
• There is an additional coordinate, namely "time" t. Because the space is homogeneous,
the time elapsed is the same in all places. (This is a heuristic statement, but essentially
correct.)
• The scale parameter a depends on t, i.e., there could be expansion or contraction.
where the first term describes the time elapsed. If K = 0, this reduces back to the Minkowski
metric we studied in the earlier part of this course; the only effect now is that the spatial part
may be curved. All the next Chapter will be devoted to discussing how $a(t)$ varies.
Here $G$ is Newton's gravitational constant. Although we do not give the derivation here, we
shall at least describe some of the properties.
• Far away ($r \to \infty$), this approaches flat Minkowski space.
• The last two terms involving $d\theta$ and $d\phi$ measure tangential distances, and as usual, identify
$r$ as the circumferential radius.
• The radial separation is not $dr$. It has a similar structure to (10.12), but with a specific
form of $f(r)$. Since $f(r) \neq 1$, space is curved.
• The first term is not simply $dt^2$. The meaning is as follows. If we increase $t$ by 1 unit,
the proper time elapsed is not the same in all places. In other words, clocks run faster
or slower in different places - a consequence of gravitational redshift.
• The departure from flat Minkowski space is given by the ratio $\epsilon \equiv GM/r$. This is discussed
below.
Consider a test particle of mass $m$ at a distance $r$ from a star of mass $M$, and use
Newtonian mechanics. We have
Potential energy = $-GMm/r$
Rest energy = $mc^2 = m$
Hence we see
$$\epsilon = \frac{\text{magnitude of PE}}{\text{rest energy}}$$
When this ratio is very small, spacetime is almost flat.
Weak fields
Example H2
If the gravitational field is weak, we should be able to describe the situation entirely in terms
of the Newtonian potential $\Phi$ (such that $m\Phi$ is the potential energy of a test particle of mass
$m$). It can be shown that in this case, the appropriate line element is
where the spatial part has been written in rectangular coordinates, which is often convenient.
This form is valid to first order in $\Phi$, and is applicable in the solar system, or close to earth.
Problem 8
Consider the earth as a test particle in the field of the sun, and estimate the order of magnitude
of $\Phi$ (expressed without dimensions).
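The estimate asked for here can be sketched numerically. This is only an illustration: the constants are rounded textbook values, the variable names are made up for this sketch, and the factor $c^2$ is restored because the notes use units with $c = 1$.

```python
# Rough size of the dimensionless potential G*M/(r*c^2) for the earth
# in the field of the sun. All constants are rounded, assumed values.
G = 6.674e-11        # Newton's constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # mass of the sun, kg
R_ORBIT = 1.5e11     # earth-sun distance, m


def dimensionless_potential(mass_kg, r_m):
    """Magnitude of potential energy divided by rest energy."""
    return G * mass_kg / (r_m * C**2)


epsilon = dimensionless_potential(M_SUN, R_ORBIT)
print(f"GM/(r c^2) ~ {epsilon:.1e}")
```

The result is of order $10^{-8}$, which is why a treatment to first order in $\Phi$ is adequate in the solar system.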
Problem 9
Show that the Schwarzschild metric reduces to (10.50) to first order if we identify $x^2 + y^2 + z^2 \equiv R^2$, and $R = r - GM$.
ChlO-2.tex; January 5, 1998
11 Poor Man's Cosmology
(Earlier versions of this Chapter dealt with the classical model of cosmology without a cosmological
constant or the idea of inflation, because these concepts were too tentative. However, recent advances
have made these ideas much more reliable. Accordingly, this Chapter has been totally rewritten in
2001 to reflect the new understanding, in order to bring students taking even this introductory course
close to the frontier of research.)
11.1 Introduction
The problem of cosmology
Cosmology studies the universe in the large, i.e., averaged over large distances, and is especially
concerned with two questions.
• The spatial structure of the universe is characterized by two parameters: (a) a discrete
parameter ($K = 1, 0, -1$) indicating the topology (spatially closed, flat, open); (b) a
continuous parameter $a$ indicating the "size of the universe".
• The temporal development is described by the $t$ dependence of $a(t)$; this includes the
history (e.g., the origin and age of the universe) and the future.
Distance scales
We first give a rough idea of the distance scales. To each characteristic distance $L$ we also
associate a typical time $T_L = L/c$. All numbers given below are orders-of-magnitude only.

                     L in m          T_L in s
atom                 10^-10          3 x 10^-19
human being          2               6 x 10^-9
radius of earth      6 x 10^6        2 x 10^-2
distance to sun      1.5 x 10^11     5 x 10^2

is about $10^{26}$ m $\sim 10^{10}$ light years.
The corresponding characteristic time is $10^{10}$ years (10 Gyr) or $3 \times 10^{17}$ s.
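The time column is just $L/c$, which can be checked with a short script. This is a sketch only: the dictionary name and the rounded value of $c$ are choices made here, and the lengths are the order-of-magnitude entries from the table.

```python
# Characteristic time T_L = L / c for the distance scales in the table.
C = 3.0e8  # speed of light in m/s, rounded

# Order-of-magnitude lengths in metres, as listed in the table above.
scales_m = {
    "atom": 1e-10,
    "human being": 2.0,
    "radius of earth": 6e6,
    "distance to sun": 1.5e11,
}

times_s = {name: length / C for name, length in scales_m.items()}

for name, t in times_s.items():
    print(f"{name:16s} L = {scales_m[name]:.1e} m   T_L = {t:.1e} s")
```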
Thus the situation in Figure 1a is allowed, but the situation in Figure 1b is not allowed. Here
the dots schematically represent galaxies.
Figure 1
Principle of cosmology
Although we can observe the universe only from one point P, we believe in the principle of
cosmology:
Figure 2
Homogeneity
So isotropy together with the principle of cosmology implies that every point is equivalent:
Although $H$ is called a constant, actually it may change with time; Figure 3 only shows
what we obtain now - it could have had a different slope a billion years ago (see below). Thus
$H = H(t)$ and $T = T(t)$. We adopt the convention that now is $t = 0$ and all quantities at $t = 0$
are denoted with a subscript 0: $H(t{=}0) \equiv H_0$.
Figure 3
Problem 1
A certain galaxy A is known to be of the brightest type, and its absolute luminosity (energy
emitted per unit time) can be assumed to be L = 1.45 x W (a "standard candle"). Its
apparent luminosity (energy received on earth per unit time per unit area) is measured to be
$\ell$ = 1.00 x lo-'* W m$^{-2}$. Find its distance $s$ from us. [Hint: $\ell = L/4\pi s^2$.]
Problem 2
(a) If a galaxy is receding from us at a velocity v << c, find an expression for the red-shift
parameter z in terms of v / c , where z is defined by
Here $\lambda_e$ is the wavelength of an optical line when emitted and $\lambda$ is the red-shifted wavelength
that is received. [Hint: the angular frequency and the wave number form a 4-vector.]
(b) For galaxy A, the red-shift is measured to be z = 0.1. Find v.
Problem 3
Using the above data, estimate $H_0$ and $T_0 \equiv H_0^{-1}$.
Problem 4
As a simple exercise in the conversion of units, evaluate To from Ho.
Astronomical measurements come with considerable uncertainties, but to keep this introductory
account simple, we shall use the above value without quoting ranges around it.
The parameter $H_0$ sets the scale for all quantities in cosmology. Thus, the characteristic
time scale is the Hubble time $T_0 = H_0^{-1}$, and the characteristic length scale is the Hubble
distance $L_0 = cT_0$. Using these scales, all relevant physical quantities can be expressed in
dimensionless parameters, many of which we shall discuss below.
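As a hedged illustration of these scales, the sketch below converts a Hubble constant into a Hubble time and Hubble distance. The value 70 is the one quoted in Problem 14; reading its units as km s$^{-1}$ Mpc$^{-1}$ is an assumption, as are the rounded conversion factors and all variable names.

```python
# Hubble time T0 = 1/H0 and Hubble distance L0 = c*T0.
# ASSUMPTION: H0 = 70 km/s/Mpc (the notes quote "70 units").
KM_PER_MPC = 3.0857e19       # kilometres in one megaparsec
SECONDS_PER_GYR = 3.156e16   # seconds in 10^9 years
C_M_PER_S = 2.998e8          # speed of light, m/s

H0_KM_S_MPC = 70.0

h0_per_second = H0_KM_S_MPC / KM_PER_MPC        # H0 in 1/s
hubble_time_s = 1.0 / h0_per_second             # T0 in s
hubble_time_gyr = hubble_time_s / SECONDS_PER_GYR
hubble_distance_m = C_M_PER_S * hubble_time_s   # L0 in m

print(f"T0 = {hubble_time_s:.2e} s = {hubble_time_gyr:.1f} Gyr")
print(f"L0 = {hubble_distance_m:.2e} m")
```

With these inputs the Hubble time comes out near 14 Gyr and the Hubble distance near $10^{26}$ m, consistent with the orders of magnitude quoted in the Introduction.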
11.3 Kinematics
In this Section we discuss the kinematics, i.e., the description of how various quantities change
with time.
Changing H - Analogy
Although $s = vT$, the motion is not uniform, as illustrated by the following analogy.
Problem 5
A collection of beads $i = 1, 2, \ldots$ all have the same mass $m$ and are projected to travel
horizontally in the same viscous medium, with retarding force $-kv$, where $v$ is the velocity and $k$
is the same for all the beads. However, the initial velocities of the beads are different.
(a) Show that the equation of motion is
(b) Hence show that the velocity and position of particle $i$ at time $t$ are given by ($\gamma = k/m$)
(c) Compare different particles at the same time - plot $v_i$ against $s_i$ at fixed time $t$. Show
that the plot is a straight line, i.e., $H = v/s$ is independent of $i$, and express $H$ in terms of $\gamma$
and $t$. Does $s \propto v$ indicate uniform motion? Explain. Is $T = H^{-1}$ the same as the "age" of the
system?
1 W. L. Freedman et al., Astrophysical Journal 553, 47 (2001)
2 http://www.journals.uchicago.edu/ApJ/journal/issues/ApJ/v553nl/524l7/524l7.web.pdf
3 http://xxx.lanl.gov/astro-phy/9801080
4 W. L. Freedman, Scientific American, March (1998)
5 http://www.sciam.com.specialissues/0398cosmos/O398freedman.html
(d) Compare the situation at different times. To do so, introduce an arbitrary velocity scale $V$
(say the mean value of x).
Show that the distance and the velocity can be written as
is independent of $i$. Sketch $a(t)$ versus $t$. Think of $a(t)$ as a displacement. Does this graph
indicate uniform motion?
(e) Show that $H(t) = \dot a(t)/a(t)$. On the sketch of $a(t)$ versus $t$, draw the tangent at the time $t$
and relate $T = H^{-1}$ to the horizontal intercept. Show graphically that if there is deceleration
(acceleration), then $T$ is more (less) than the age of the system.
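The bead analogy can be tried out numerically. The sketch below assumes the standard solution of $m\,dv/dt = -kv$ for beads that all start from $s = 0$ at $t = 0$, namely $v_i = v_i(0)e^{-\gamma t}$ and $s_i = v_i(0)(1 - e^{-\gamma t})/\gamma$; the value of $\gamma$, the initial velocities and the function names are arbitrary choices made here, not taken from the notes.

```python
import math

GAMMA = 1.0  # damping rate k/m in 1/s (hypothetical value)


def velocity(v0, t):
    """Velocity of a bead launched with speed v0, after time t."""
    return v0 * math.exp(-GAMMA * t)


def position(v0, t):
    """Distance travelled by the same bead, starting from s = 0."""
    return v0 * (-math.expm1(-GAMMA * t)) / GAMMA


def hubble(t):
    """Common ratio v/s, which does not depend on v0."""
    return GAMMA / math.expm1(GAMMA * t)


initial_velocities = [0.5, 1.0, 2.0, 4.0]
t = 1.5
ratios = [velocity(v0, t) / position(v0, t) for v0 in initial_velocities]

print("v/s for each bead:", [round(r, 6) for r in ratios])
print("H(t) =", round(hubble(t), 6), "  T = 1/H =", round(1 / hubble(t), 6), "  age =", t)
```

Every bead gives the same ratio $v/s$, yet the motion is decelerating, and $T = H^{-1}$ comes out larger than the elapsed time, in line with part (e).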
With this analogy, we can now consider the problem of cosmological expansion, and
focus attention on a universal scale factor.
Spatial structure
Because we believe the universe to be homogeneous, i.e., all points are equivalent, only three
types of spatial structure are allowed, namely, closed, flat or open, corresponding to K =
1,0, -1; see Chapter 10.
To make this Chapter self-contained, we give a qualitative discussion of the spatial
structure. For simplicity, we reduce to an analogy in one lower dimension - 2-dimensional
space.
• The flat case ($K = 0$) is like a flat piece of paper. On such a flat piece of paper, if we
draw a circle of radius $r_1$, the circumference is exactly $2\pi r_1$.
• The closed case ($K = 1$) is like the surface of a sphere. To generate a spherical surface,
we stand at one point on the piece of paper, and bend the paper down in all directions.
The circumference of a circle is less than $2\pi r_1$. Such a surface is characterized by a radius
of curvature $a$, which is just the radius of the sphere.
• The open case ($K = -1$) is like the surface of a saddle. To generate such a surface, we
stand at one point on the piece of paper, and bend the paper up along the x-axis, and
down along the y-axis. The circumference of a circle is more than $2\pi r_1$. If we take a
cross-section along either axis, there is a radius of curvature $a$. (However this description is not
exactly right, since not all points of a saddle are equivalent - unless we embed the surface
not in 3-dimensional Euclidean space but in 3-dimensional space with a Minkowski-like
metric.)
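The three statements about circumferences can be illustrated with the standard formulas for surfaces of constant curvature, where $r_1$ is the radius measured along the surface. This is a sketch under that assumption (the formulas $2\pi a\sin(r_1/a)$ and $2\pi a\sinh(r_1/a)$ are textbook results, not quoted from the notes), and the function name is made up here.

```python
import math


def circumference(r1, a, K):
    """Circumference of a circle of radius r1 (measured along the surface)
    on a 2-dimensional surface with scale a and K = +1, 0 or -1."""
    if K == 1:                       # closed: surface of a sphere of radius a
        return 2 * math.pi * a * math.sin(r1 / a)
    if K == -1:                      # open: constant negative curvature
        return 2 * math.pi * a * math.sinh(r1 / a)
    return 2 * math.pi * r1          # flat sheet of paper


r1, a = 1.0, 2.0
for K, name in [(1, "closed"), (0, "flat"), (-1, "open")]:
    ratio = circumference(r1, a, K) / (2 * math.pi * r1)
    print(f"{name:6s}: circumference / (2 pi r1) = {ratio:.4f}")
```

As $a$ grows with $r_1$ fixed, both curved cases approach the flat value, which is the sense in which a small patch of a curved surface is nearly flat.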
Scale parameter
Consider a flat spatial structure as an example. Space (or a 2-dimensional analog) is like a
sheet of rubber that is being stretched. All distances are magnified by the same ratio as time
increases. So draw a spatial coordinate grid that also expands with the universe (Figure 4).
Since all lengths expand together
Let one grid (or any fixed number of grids) be $a$; this changes with time, so $a = a(t)$. Distances
$r$ can be expressed as
$$r = \tilde r\, a(t)$$
and the statement that galaxies are "stuck" to the coordinate grid means that
The reduced coordinate $\tilde r$ is analogous to the reduced variable in Problem 5, and to $\tilde r$ in Chapter 10. It remains to
consider the evolution of a single function $a(t)$ - the scale factor of the universe.
Figure 4
However, notice that there is an arbitrary multiplicative constant in $a(t)$. (a) In Problem
5, the velocity scale $V$ is arbitrary. (b) In the present discussion, $a$ can be one grid, or two grids,
etc. Thus we have the freedom $a(t) \to Ba(t)$, and physical quantities must be independent of
the arbitrary scale $B$. We shall fix the scale only when we come to (11.16).
General form
As the universe expands, the possible behavior of $a(t)$ is shown schematically in Figure 5.
• If gravity is very strong, the expansion will have a maximum and the universe ultimately
contracts - eventually back to a point. (Curve 1)
• In the marginal case, gravity slows the expansion to zero rate eventually. (Curve 2)
• If gravity is weak, the expansion goes on forever. (Curve 3)
• If for any reason there is net repulsion, then the expansion accelerates. (Curve 4)
Figure 5
Hubble constant
Figure 6
Because of (11.4), the distances $s_i$ to galaxy $i$ and the corresponding velocities $v_i$ are given by
(compare Problem 5)
$$v_i = H s_i, \qquad H = \frac{\dot a}{a}$$
• The first of these is just the same as the observational evidence in (11.1) or Figure 3, and
allows $H$ to be determined from data at one time.
• The second relates $H$ to the evolution of $a(t)$. In particular, it allows $H$ to be determined
from the time $T = H^{-1}$ illustrated in Figure 6.
Deceleration parameter
Until recently, it was believed that the expansion is decelerating (i.e., cases 1 to 3, but not 4),
because gravity is attractive. So conventionally we talk about the deceleration $-\ddot a$, or better
yet the dimensionless deceleration parameter $q$ defined by
$$q = -\frac{\ddot a\, a}{\dot a^2} \qquad (11.8)$$
Note that $q$ can change with time, and the present value is denoted as $q_0$.
Problem 6
Let $q = -\ddot a\, a^m \dot a^n$.
(a) By requiring $q$ to be dimensionless, determine $m$ and $n$.
(b) Show that the freedom in scaling $a(t) \to Ba(t)$ does not change $q$.
Problem 7
(a) In Figure 6, the tangent is drawn for the "present", t = 0. Draw another tangent at an
earlier time t = -At. In which case is T larger? In which case is H larger?
(b) To be more quantitative, prove that
Ho .
• Light from distant galaxies was emitted a long time ago. Therefore the slope for the
distant part of the Hubble diagram gives $H$ in the past.
• For example, if $H$ was larger in the past, then the earlier data would have a larger slope,
as shown in an exaggerated way in Figure 7.
• So the deviation of the Hubble diagram upwards or downwards determines $q$.
• Because there are other minor complications arising from the propagation of light in an
expanding universe, the Hubble diagram is not strictly linear even if there is no deceleration.
Moreover, in practice one does not plot $s$ versus $v$ but magnitude versus red-shift $z$
(or $\ln z$). So the deviation does not refer to the deviation from a straight line, but from
the theoretical line corresponding to $q = 0$.
Figure 7
The modern methodology is more refined: Measuring the slope for every part of the
Hubble diagram in principle gives H for all times in the past. In practice, this is too difficult;
so we assume a form of a ( t ) (and hence of H ( t ) ) with a few parameters, and fit the Hubble
diagram to determine these parameters. But first we need a model of the dynamics.
11.4 Dynamics
Newtonian formalism
Consider a small part of space, within a distance $r = \tilde r a(t)$ of the origin, with $\tilde r \ll 1$. Assume
all particles are "stuck" to the coordinate grid, so $\tilde r$ is independent of time. Such a small part
of a curved surface is nearly flat, and Newtonian concepts apply. A test particle $m$ at the edge
of this part of space (Figure 8) satisfies
where $M(r)$ is the total mass inside this small part, and $\rho$ is the density. Here we have assumed
as usual that all the mass inside acts as if it is concentrated at the origin and all the mass
outside has no effect. But using $r = \tilde r a(t)$, where $\tilde r$ is independent of time, we find that $\tilde r$
cancels:
Thus we get an equation for the scale parameter $a(t)$ without reference to $\tilde r$. This should be
expected - $\tilde r$ is an arbitrary choice, and should not appear in the final result.
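The cancellation described in this paragraph can be written out as a short worked step. This is a sketch under the stated Newtonian assumptions, with $M(r)$ taken to be the mass of a uniform sphere of density $\rho$; it is not guaranteed to match the numbering or exact form of the equations in the original notes.

```latex
% Sketch only: M(r) = (4\pi/3) r^3 \rho is the assumed mass of a uniform sphere.
m\,\ddot r = -\frac{G\,M(r)\,m}{r^2},
\qquad M(r) = \frac{4\pi}{3}\, r^3 \rho
\quad\Longrightarrow\quad
\ddot r = -\frac{4\pi G}{3}\,\rho\, r .

% Substitute r = \tilde r\, a(t), with \tilde r independent of time:
\tilde r\, \ddot a = -\frac{4\pi G}{3}\,\rho\,\tilde r\, a
\quad\Longrightarrow\quad
\frac{\ddot a}{a} = -\frac{4\pi G}{3}\,\rho .
```

The factor $\tilde r$ drops out of both sides, so the result depends only on $a(t)$ and $\rho$.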
Figure 8
Vacuum repulsion
However, Einstein's equations allow another effect: Vacuum contains another density $\rho'$ which
repels, with a "gravitational" constant $G'$. Thus (11.10) becomes instead
Matter domination
As the universe expands, $\rho$ decreases inversely as the volume:
In the earlier history of the universe (when radiation dominates), this would not be true;
instead $\rho$ would then scale as $(1/a)^4$. For this reason, (11.12) is called the assumption of
matter domination (as opposed to radiation domination).
Problem 8
If the density p is due to radiation (i.e., photons), then
Cosmological constant
On the other hand, the density $\rho'$ always has the same value - vacuum does not get
"thinned" as the universe expands. Thus $\rho'$ = constant and we define the cosmological constant
$$\Lambda = 4\pi G'\rho' \qquad (11.14)$$
The individual factors $G'$ and $\rho'$ are introduced only for a heuristic discussion, and have no
meaning.
Equation of motion
Putting (11.12) and (11.14) in (11.11) then gives
This can be thought of as the Newtonian equation of motion for a unit mass, subject to an
attractive inverse-square force $\propto -a^{-2}$ and a repulsive Hookean force $\propto +a$; the latter is like a
spring with the opposite sign. The conservation of energy then leads to
The second and third terms on the left are the potential energy and 11"/2 is the total energy.
Problem 9
Show that integrating (11.15) gives (11.16). Or, equivalently and more simply, show that
differentiating (11.16) with respect to time gives (11.15).
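The step from an equation of motion of this type to an energy integral can be sketched generically. The constants $A$ and $B$ below are placeholders for the positive coefficients of the inverse-square and Hookean terms; they are assumptions introduced for this sketch and are not the notes' own symbols.

```latex
% Placeholder constants A, B > 0 (not the notes' notation).
\ddot a = -\frac{A}{a^2} + B\,a .

% Multiply by \dot a and collect terms:
\dot a\,\ddot a + \frac{A\,\dot a}{a^2} - B\,a\,\dot a = 0 .

% Each term is a total time derivative, so
\frac{d}{dt}\left( \frac{\dot a^2}{2} - \frac{A}{a} - \frac{B\,a^2}{2} \right) = 0 ,
\qquad\text{i.e.}\qquad
\frac{\dot a^2}{2} - \frac{A}{a} - \frac{B\,a^2}{2} = \text{constant}.
```

The first term is the kinetic energy of the unit mass, and the other two are the potential energies of the attractive and repulsive forces.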
Note that $K$ gives the spatial structure whereas $K'$ affects the temporal development; general
relativity relates the two. This used to be an important result; however, according to current
understanding, this is now no longer important, even though it is still correct (see below).
Choose the present Hubble time as the unit of time, i.e., define the dimensionless time
variable
The omitted proportionality constants in (11.20) are to be chosen so that these two equations
take the simple form as above, and in fact
Problem 10
Derive (11.21) , (11.22) and (11.23).
Problem 11
Show that
(a) $dR(\tau{=}0)/d\tau = 1$;
(b) $(1/R)\,d^2R/d\tau^2 = -(H/H_0)^2 q$ in general and
(c) $(1/R)\,d^2R(\tau{=}0)/d\tau^2 = -q_0$.
The simple form of (11.25) explains why we chose the constants in this particular way. Because
of (11.25), there are only two independent constants, and it is conventional to represent all
possibilities on the $\Omega_M$-$\Omega_\Lambda$ plane. We need two pieces of input to determine their values.
The key equation (11.25) refers to the present. We can write a similar one for any other
time. To do so, go back to (11.22) and note that the LHS is $(H/H_0)^2$. Moving this factor to
the other side, we get
We should imagine that these parameters were "set" as initial conditions at a very early
time $\tau_*$ by other processes, e.g., inflation (see below). Therefore eventually we should try to
explain the values $\Omega_M(\tau_*)$, $\Omega_\Lambda(\tau_*)$ and $\Omega_K(\tau_*)$ rather than the present ones. The latter have
simply evolved from the "original" ones.
Interpretation of the constants
From now on we refer to $\Omega_M$, $\Omega_\Lambda$, $\Omega_K$ instead of $G\rho_0$, $\Lambda$, $K$. So it is necessary to have some
physical interpretation of these dimensionless constants.
• $\Omega_M$ gives the amount of "normal" gravitational attraction due to matter; it causes
deceleration.
• $\Omega_\Lambda$ gives the amount of repulsion due to the cosmological constant; it causes acceleration.
• $\Omega_K$ relates to the spatial structure, as explained below.
First consider the sign of $\Omega_K$. Since $\Omega_K \propto -K$, space is closed, flat or open according
to whether $\Omega_K$ is negative, zero or positive. Next consider its magnitude (irrelevant in the case
$K = 0$). We have
The age of the universe is $t_0 \sim T_0$, and the size of the universe is $a_0$. Thus
(age of universe) / (size of universe)
But light from any galaxy farther than $d_0$ would not reach us. Thus
If $|\Omega_K| \sim 1$, then most of the universe is visible. If $|\Omega_K| \ll 1$, then only a small part of the
universe is visible, and this small part will appear as nearly flat.
The big question
The big question in observational cosmology is: What are the values of these parameters? It
turns out that our belief has gone through three main stages, as explained separately in the
next three Sections.
The mass parameter
Previously, people tried to obtain $\Omega_M$ by "counting" the amount of matter. We can estimate
the mass of galaxies, and how far they are from us (using the Hubble relationship). Thus we
know the density of matter in the universe, $\rho_0$. Together with $H_0$, this would give $\Omega_M$. This
method gives approximately (with very large errors)
• If we have another way of determining $\Omega_M$, then this tells us the percentage of matter in
the universe that is dark.
• A topic of current research interest is: what is the nature of the dark matter, and how
is it distributed in the universe?
Ratio N
po
- - kg rnW3
10-3 kg m-3
p,
_
Thus the cosmological constant cannot be detected in small systems.
Thus
Problem 12
Refer to (11.16) as the conservation of energy. The terms associated with $\Omega_M$ and $\Omega_\Lambda$ are
the potential energy $V(a)$. If there is no acceleration, then $a$ is at an equilibrium position:
$V'(a) = 0$. Sketch $V(a)$ and state whether this is a stable equilibrium.
Of course the cosmological model is only approximate. Any deviations from an unstable
equilibrium will grow rapidly; it is unlikely that the universe remains static in a position of
unstable equilibrium.
Big bang
If the universe is expanding, and we go back in time ("reverse the movie"), then the universe
would contract. The contraction would end up in a "point": the universe must have begun by
expanding from a tiny volume and a huge density - the big bang. (Although the name is now
established, it is misleading: there was never any "explosion".) The time from the big bang to
now is the age of the universe.
The first relation (11.36) has the simple interpretation "acceleration = gravitational
force". The second equation tells us that the universe is spatially closed ($\Omega_K < 0$) if and only
if $\Omega_M > 1$ (there is "sufficient" mass in the universe), or equivalently, $q_0 > 1/2$ (the universe is
observed to be decelerating at a sufficient rate). Therefore much effort went into the estimation
of $q_0$. The best that could be said was
which was not very helpful in settling whether $q_0 > 1/2$. The estimate was poor because only
relatively nearby galaxies could be measured, so $q_0$ came from comparing the slopes in two
nearby portions of the Hubble diagram, and it was therefore difficult to see how much the curve
bends.
Incidentally, the critical value $q_0 = 1/2$ can be understood from a purely Newtonian
example (which of course does not contain the cosmological constant).
Problem 13
Throw a ball of mass $m$ upwards from the earth. Denote the distance from the centre of the
earth as $a$. Define $q$ in the same way as before. Show that the ball will escape if and only if
$q < 1/2$.
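A quick numerical check of this statement is sketched below. It assumes purely Newtonian gravity, uses rounded values for the earth, and takes $q = -\ddot a\,a/\dot a^2$ as in (11.8); the function names and the test speeds are choices made for this illustration.

```python
import math

G = 6.674e-11       # m^3 kg^-1 s^-2 (rounded)
M_EARTH = 5.972e24  # kg (rounded)
R_EARTH = 6.371e6   # m (rounded)


def deceleration_parameter(a, speed):
    """q = -(a_ddot * a) / a_dot^2 for a ball at distance a moving radially."""
    a_ddot = -G * M_EARTH / a**2
    return -a_ddot * a / speed**2


def escapes(a, speed):
    """True if the total energy per unit mass is positive."""
    return 0.5 * speed**2 - G * M_EARTH / a > 0


v_escape = math.sqrt(2 * G * M_EARTH / R_EARTH)
for factor in (0.9, 1.1):
    v = factor * v_escape
    q = deceleration_parameter(R_EARTH, v)
    print(f"v = {factor} v_esc: q = {q:.3f}, escapes = {escapes(R_EARTH, v)}")
```

Since $q = GM/(a\dot a^2)$ here, the condition $q < 1/2$ is the same as $\dot a^2/2 > GM/a$, i.e. positive total energy.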
Problem 14
Show that $\Omega_M$ can be written as $\rho_0/\rho_c$, where $\rho_c$ is the critical mass density to make the universe
closed. Give an expression for $\rho_c$ in terms of $G$ and $H_0$, and give the numerical value based on
$H_0 = 70$ units. (However this result is valid only if $\Omega_\Lambda = 0$; therefore the critical density is no
longer much referred to.)
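For orientation, the sketch below evaluates the standard expression $\rho_c = 3H_0^2/(8\pi G)$. Both the use of this textbook formula and the reading of "70 units" as 70 km s$^{-1}$ Mpc$^{-1}$ are assumptions made here; the notes leave the derivation to the reader.

```python
import math

G = 6.674e-11            # m^3 kg^-1 s^-2 (rounded)
METRES_PER_MPC = 3.0857e22

# ASSUMPTION: "70 units" means 70 km/s/Mpc.
H0_SI = 70.0e3 / METRES_PER_MPC   # Hubble constant in 1/s

# Textbook critical density, assumed to be the expression intended here.
rho_critical = 3.0 * H0_SI**2 / (8.0 * math.pi * G)

print(f"H0    = {H0_SI:.3e} 1/s")
print(f"rho_c = {rho_critical:.2e} kg/m^3")
```

The result is of order $10^{-26}$ kg m$^{-3}$, equivalent to a few hydrogen atoms per cubic metre.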
11.7 The belief in the 21st century
Value of the parameters
Current belief can be summarized by
It is believed that $\Omega_K$ is very very small, and for all practical purposes can be taken to be
exactly zero. The uncertainties in $\Omega_M$ and $\Omega_\Lambda$ are at least 0.1. But it seems definite that $\Omega_\Lambda$ is
non-zero and positive.
The qualitative conclusions are as follows.
• The universe is nearly flat ($\Omega_K \approx 0$).
• The cosmological constant is more important than "normal" gravity.
We shall next explain the evidence for these beliefs, and also some of the consequences.
• Most of this is due to our absolute motion in the universe: the motion blue-shifts (increases
$T$) the radiation in front and red-shifts (decreases $T$) the radiation from the back. We can
transform to a frame to remove this effect; incidentally this gives a good measurement of
our absolute velocity.
• In the new frame (at rest with respect to the universe as a whole), the remaining
inhomogeneity $\Delta T/T$ is very small, due to thermal fluctuations. But the mystery is: why are the
fluctuations so small, or in other words, why is the CMB so homogeneous?
The problem and its resolution can be explained heuristically by an analogy. Suppose
you are given a temperature chart (without scale factor) of a part of HK: it could be nearly all
of HK (say 10 km x 10 km); a district (say 1 km x 1 km); a street (say 100 m x 100 m); all
the way down to a part of a room (say 1 m x 1 m); or even a tiny part of a room (say 1 mm
x 1 mm). There must be a relation between $\Delta T/T$ and the scale - in particular, if the chart
shows $\Delta T/T$ as small as this, it must be the chart of a tiny portion of HK, probably as small as 1 mm
x 1 mm. The reason is that we expect the thermal processes that homogenize the temperature
to operate only on small scales. (To make this argument more precise, we have to estimate the
distance that effects can propagate - the horizon - and the size of the universe at the time
when the CMB was emitted by matter.)
Since all of the visible universe (i.e., all that we can see) has a highly uniform
ature, we conclude that it must be a tiny portion of the whole universe. Referring to (11.29),
this gives
Size of the universe
This in turn means that the size of the universe is
Flatness problem
The universe is nearly flat in the sense that $\Omega_K$ is small. More precisely, (11.25) can be
interpreted as three contributions to the KE (the LHS), of which the curvature term $\Omega_K$ is by
far the smallest. As a ratio,
But we can make a similar analysis at an earlier time $\tau$, in fact the time $\tau_*$ when these
parameters were "set". Then the appropriate equation to look at is (11.26) and the relevant
ratio is
where $R = R(\tau_*) \ll 1$. This ratio is many orders of magnitude smaller still, and we would have
to explain this very very small number $\epsilon(\tau_*)$. This is called the flatness problem.
Inflation
Both the homogeneity problem and the flatness problem have to do with: Why is the size of the
universe so large (compared to its age)? The currently accepted solution is that the universe
at an early stage went through a short period of very rapid (in fact exponential) expansion,
called inflation. Quantum field theory suggests that inflation probably occurred from s to
s after the beginning of the universe, and during this time the linear size of the universe
increased by at least a factor ~ 1 (making 0 ~ E ( T~) -lo-'' immediately after infiation). Thus
inflation supplies the initial condition that $a$ starts off very very large and $\Omega_K$ is very very small.
In fact, OK is so small that we shall take it to be exactly zero, and the size of the universe to
be effectively infinite.
The physics of inflation cannot be discussed beyond this superficial level without quantum
field theory.
Corroboration
It may come as a surprise that the universe is (very nearly) flat. In fact, it may be a bit of a
disappointment - perhaps we do not need to learn so much Riemannian geometry if space is
flat! (But spacetime is not flat.) In any event, is there other evidence that $\Omega_K \approx 0$?
Here we come back to CMB and look at the tiny temperature fluctuations $\Delta T/T$, in
particular the angular distribution - roughly speaking telling us how tightly correlated in space
are the fluctuations.
How did the fluctuations come about? For this, we have to trace back to the time when
the universe was much smaller and denser, so that matter and radiation were in
contact. The density fluctuations at that time can be calculated using our knowledge of the
properties of the matter at that time (a plasma); these lead to predicted fluctuations in CMB.
How these then propagate to us now can be traced by solving for the evolution of photons in
the universe with parameters ($\Omega_M$, $\Omega_\Lambda$); different parameter values lead to different patterns of
fluctuations seen today. Turning this around, the measured pattern of fluctuations can be used
to fit the parameter values. It gives $\Omega_K \approx 0$ with high confidence.6,7,8,9,10
Notice that CMB has contributed in three stages to cosmology, in an increasingly
sophisticated fashion.
• The very existence of CMB suggests a big bang, i.e., a time in the past when the universe
was small and hot.
• The extraordinary isotropy of CMB shows that the visible universe is a small part of the
whole universe, leading to the theory of inflation.
• The details of the tiny fluctuations of CMB provide information on the way the universe
is structured and the way it has expanded since the CMB was created.
Supernova observations
With $\Omega_K = 0$, we still need one more piece of data to fix the parameters - the deceleration
parameter $q$ or something equivalent.
Recently, the red-shifts and distances of another group of "standard candles" - a class
of supernovas (SNs) - have been carefully measured.11,12,13 SNs are extremely bright objects,
so even those at very large distances $s$ can be observed (if one can catch them during their brief
existence - and this is one of the technical breakthroughs). Light from very distant objects
came from a long time ago, so these observations probe the ancient history of the universe. As
the authors of these papers say, the results "provide a record of changes in the expansion rate
over the past several billion years".
6 Physics Today, July 2000
7 P. de Bernardis et al., Nature 404, 955 (2000)
8 http://xxx.lanl.gov/astro-ph/0005004
9 http://xxx.lanl.gov/astro-ph/0005123
10 http://xxx.lanl.gov/astro-ph/0005124
It is conventional to express distance and age by the red-shift $z$, because this quantity
is directly measured. Most of these SNs are in the range $z = 0.3 - 0.7$, with one case as large
as $z = 0.83$. Recall that wavelengths scale as $1+z$, together with the size of the universe. So the
light observed originated when the universe had a size relative to the present of $1/(1+z) \approx 0.5$
for the case of the largest $z$. Assuming nearly uniform expansion (which is correct as an
order-of-magnitude estimate), it also follows that the age of the universe was roughly half of the
present value. Thus the paper is entitled "Discovery of a Supernova Explosion at Half the Age
of the Universe and its Cosmological Implications".
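The relation between red-shift and relative size used here is simple enough to tabulate. The following sketch evaluates $R = 1/(1+z)$ for the red-shifts mentioned in the text; the function name is made up for this illustration.

```python
def relative_size(z):
    """Size of the universe at emission, relative to now, for red-shift z."""
    return 1.0 / (1.0 + z)


for z in (0.3, 0.5, 0.7, 0.83):
    print(f"z = {z:4.2f}  ->  R = {relative_size(z):.3f}")
```

For the largest red-shift quoted, $z = 0.83$, this gives $R \approx 0.55$, which is the "roughly half" referred to above.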
With data extending much further on the Hubble diagram, this allows the curvature of
the diagram to be determined much more accurately. Remember that the curvature is related
to the deceleration: the more deceleration, the more the curve bends upwards.
The great surprise is that the SN data show that
- -
We can give a little bit more detail, in a heuristic way. The SN data are centered at
red-shifts $z_s \approx 0.5$. Since we compare the slopes at $z \approx z_s$ and $z = 0$, in effect we determine
the acceleration at the midpoint $z_m \approx 0.25$. Recall that wavelengths scale by $1+z$, together with
the size of the universe. So the acceleration $-q$ thus determined does not refer to the present,
but to a time when the size of the universe (relative to now) is $R = R_m = 1/1.25$. In other
words we determine the LHS of (11.24) at this value of $R$, yielding the combination
The coefficient of $\Omega_M$ turns out to be $-1$ by accident, on account of the value $z_m$ of the data.
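The "accident" can be checked with one line of arithmetic. The sketch assumes that (11.24) has the form $(1/R)\,d^2R/d\tau^2 = -\Omega_M/(2R^3) + \Omega_\Lambda$, which is the usual matter-plus-cosmological-constant acceleration equation; the equation itself is not reproduced in this section, so this form is an assumption, and the names are illustrative.

```python
# Coefficient multiplying Omega_M in the acceleration at the midpoint red-shift.
# ASSUMED form of (11.24): (1/R) d^2R/dtau^2 = -Omega_M / (2 R^3) + Omega_Lambda
Z_MID = 0.25
R_MID = 1.0 / (1.0 + Z_MID)          # relative size at the midpoint, 1/1.25

matter_coefficient = 1.0 / (2.0 * R_MID**3)

print(f"R_m = {R_MID:.2f}")
print(f"coefficient of Omega_M = -{matter_coefficient:.3f}")
```

Under this assumption the coefficient is about $-0.98$, so the measured combination is very nearly $\Omega_\Lambda - \Omega_M$.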
The SN results give
whereas $\Omega_K = 0$ gives
- -
and before the universe got to this size, there was deceleration.
In fact, this argument is misleading. We defined these parameters for convenience for now, i.e.,
to make (11.25) look simple. There is nothing special about now, so the proper question to ask
is the relative magnitude of the analogous terms in (11.26), at the very early time $\tau_*$ when the
initial conditions were set, when $R$ was very very small. So we have to understand
This is very very small and is very much of a mystery at the moment.
Problem 15
(a) Take (11.21) with the "initial" conditions $R(\tau{=}0) = 1$, $dR(\tau{=}0)/d\tau = 1$ and integrate the
equation both forward and backward numerically. Plot $R$ versus $\tau$.
(b) The size $R$ becomes zero at some negative time $-\tau_0$. Give an accurate value for $\tau_0$. Give
the age of the universe $t_0 = H_0^{-1}\tau_0$ in Gyr.
Do this problem for three sets of parameters ($\Omega_M$, $\Omega_\Lambda$): (i) (0.0, 1.0), (ii) (0.3, 0.7) and (iii)
(1.0, 0.0).
Problem 16
Another way of determining the age of the universe is to use (11.22). Show that it can be
written in the form
Carry out the integral numerically and hence find $\tau_0$ as well as $t_0$ in Gyr. (The integral is not
singular at the lower limit, but a bit of care is needed for accurate answers.) Again do it for
the three sets of parameter values.
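One possible way to carry out such an integral is sketched below. It assumes that (11.22) is equivalent to $(dR/d\tau)^2 = \Omega_M/R + \Omega_K + \Omega_\Lambda R^2$ with $\Omega_K = 1 - \Omega_M - \Omega_\Lambda$, so that $\tau_0 = \int_0^1 dR\,/\sqrt{\Omega_M/R + \Omega_K + \Omega_\Lambda R^2}$; this form, the substitution $R = x^2$, the Hubble time of 13.97 Gyr (for $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$) and all names are assumptions of this sketch.

```python
import math

HUBBLE_TIME_GYR = 13.97  # 1/H0, ASSUMING H0 = 70 km/s/Mpc


def age_in_hubble_times(omega_m, omega_l, n=2000):
    """tau_0 = integral_0^1 dR / sqrt(omega_m/R + omega_k + omega_l*R^2).

    The substitution R = x^2 gives the smooth integrand
    2 x^2 / sqrt(omega_m + omega_k x^2 + omega_l x^6), integrated here with
    Simpson's rule (n must be even). Requires omega_m > 0, except for the
    flat case (0, 1), where R never reaches zero and the age is infinite.
    """
    omega_k = 1.0 - omega_m - omega_l
    if omega_m == 0.0 and omega_k == 0.0:
        return math.inf

    def integrand(x):
        return 2.0 * x * x / math.sqrt(omega_m + omega_k * x * x + omega_l * x**6)

    h = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0


for omega_m, omega_l in [(0.0, 1.0), (0.3, 0.7), (1.0, 0.0)]:
    tau0 = age_in_hubble_times(omega_m, omega_l)
    print(f"(Omega_M, Omega_Lambda) = ({omega_m}, {omega_l}): "
          f"tau_0 = {tau0:.4f}, t_0 = {tau0 * HUBBLE_TIME_GYR:.2f} Gyr")
```

The substitution removes the square-root behaviour at the lower limit, which is the "bit of care" the problem mentions. For case (iii) the integral can be done by hand and gives $\tau_0 = 2/3$, a convenient check on the numerics.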
Summary
We summarize how our understanding of cosmology has evolved since Einstein formulated
the general theory of relativity. For simplicity we do not show uncertainties in the currently
accepted values.
1960s I Present
-.
I velocity
&I acceleration -q
10-20 Gyr
unimportant
14 Gyr
t"__size 10-20 Glyr
Quantum gravity era
At a time of s and before (see Problem 17 below), quantum gravity must have been
important.
What is quantum gravity? In classical mechanics, the position $x$ has a definite value;
in quantum mechanics, we talk about the probability amplitude $\psi(x)$ instead. So similarly,
in classical general relativity, the size of the universe $a$ (more generally the metric $g_{\mu\nu}$) has
a definite value; in quantum gravity, we talk about the probability amplitude $\psi(a)$ (or more
generally $\psi(g_{\mu\nu})$) instead. No proper theory of quantum gravity exists at the moment - there
are too many degrees of freedom and summing over them (as you would have to do in quantum
perturbation theory) gives nonsense. Some attempts have been made to get an approximation
by throwing away most of the degrees of freedom, and keeping only one or two (e.g., only $a$). The
most famous of these speculations is due to Hawking and others16 and made popular in his book
A Brief History of Time. However, this book is not recommended, as it gives a totally wrong
impression about how scientists actually approach these problems. Many alternate proposals
have been given, including one by Suen and Young.17 However, none of these are likely to be
correct, because there are too many unknowns before we get to such early times and small sizes.
Works such as these are just attempts to push our current theories as far as possible, as a way
of exploration.
Problem 17
The typical length scale $\ell_P$ where gravity becomes essentially quantum is called the Planck
length, and the corresponding time is the Planck time $t_P = \ell_P/c$. We expect $\ell_P$ to depend only
on $G$, $\hbar$ and $c$: $\ell_P = G^\alpha \hbar^\beta c^\gamma$. Find expressions for $\ell_P$ and $t_P$ and evaluate their values in MKS
units.
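The dimensional analysis asked for in Problem 17 can be sketched as a small linear system: one equation per base unit (metre, kilogram, second) for the three unknown exponents. The code below is only an illustration; the helper name `solve_exponents` and the rounded SI values of the constants are assumptions supplied here, not values quoted in the notes.

```python
from fractions import Fraction

# Powers of (metre, kilogram, second) carried by G, hbar and c.
DIMS = {"G": (3, -1, -2), "hbar": (2, 1, -1), "c": (1, 0, -1)}
# Rounded SI values (assumed for illustration).
G, HBAR, C = 6.674e-11, 1.0546e-34, 2.998e8

def solve_exponents(target):
    """Solve exactly for (alpha, beta, gamma) in G^alpha hbar^beta c^gamma."""
    cols = [DIMS["G"], DIMS["hbar"], DIMS["c"]]
    # Augmented matrix: one row per base unit.
    m = [[Fraction(cols[j][i]) for j in range(3)] + [Fraction(target[i])]
         for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination
        piv = next(r for r in range(col, 3) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(3):
            if r != col:
                fac = m[r][col]
                m[r] = [a - fac * b for a, b in zip(m[r], m[col])]
    return [m[r][3] for r in range(3)]

alpha, beta, gamma = solve_exponents((1, 0, 0))   # target dimension: a length
l_planck = G**float(alpha) * HBAR**float(beta) * C**float(gamma)
t_planck = l_planck / C
print(alpha, beta, gamma, l_planck, t_planck)
```

Using exact fractions keeps the half-integer exponents free of rounding; only the final evaluation uses floating point.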
Inflation era
At s to s, inflation occurred. For this era, we need quantum field theory but not
quantum gravity, i.e., spacetime can be regarded as classical, but all other particle degrees of
freedom have to be treated like quantum fields. This is an exciting testing ground where high
energy physics and cosmology intersect, but much remains open.
Post-inflation era
After inflation, the development is basically known, even though details are still being fine-
tuned. This era began at very high temperatures (-lo1' K say) and densities. Everything was
in thermal equilibrium, for which all quantities can be calculated in terms of the temperature.
As the universe expands and cools, protons, neutrons and eventually nuclei are formed. This
takes place up to a time of 200 s. A very good account is given by Weinberg.$^{18}$ This account
of the era after inflation remains basically correct even though it was first written in 1977.
$^{16}$ See, e.g., J. B. Hartle and S. W. Hawking, Phys. Rev. D 28, 2960 (1983).
$^{17}$ W. M. Suen and K. Young, Phys. Rev. D 39, 2201 (1989).
Thereafter, the evolution is well described by our equations (11.24) and (11.25), except
that up to -10'' S, radiation dominated. At about -1015 S, galaxies formed. The present time
is about $\sim 10^{18}$ s or $\sim 10$ Gyr. See for example Figure 28.1 in Misner, Thorne and Wheeler.$^{19}$
Further reading
For further reading, especially of recent developments, a good text is Bergström and Goobar.$^{20}$
For historical interest in original works, there is a nice collection of reprints.$^{21}$
Developments in the last few years are best found on the web, using a search engine to
look for key words such as "cosmological constant". The resources include the following types:
• Electronic versions of original papers and preprints, for example at xxx.lanl.gov/astro-ph.
Most of these are probably too difficult for students in this course.
• Electronic versions of semi-popular journals such as Nature, Science, Scientific American,
Physics Today.
• There are also web pages of astronomy and cosmology courses in various universities
around the world.
Some useful sites are given below. Nature maintains a "Science Updates",$^{22}$ and there is
a similar one at Physics Web$^{23}$ and at Lawrence Berkeley Lab.$^{24}$ More general sources include
Physics Today$^{25}$ and Science News.$^{26}$
Survey and news can be found at$^{27}$. There are pedagogical introductions to the cosmological
constant,$^{28,29,30,31}$ the SN work,$^{32}$ and the flatness problem.$^{33}$
Many of these are cross-linked as well. With knowledge developed in this Chapter,
students can begin to access much of this exciting new development.
"Usual" definition
On flat space, a displacement from A to B is represented by the straight line joining the two
points; it is a straight arrow - the most important property of a vector (Figure 1a). But on a
curved manifold, the displacement from A to B is not a straight arrow (Figure 1b); it cannot
be called a vector.
Proper definition
However, each small part of M is nearly flat, and infinitesimal displacements $d\vec{x}$ are vectors.
They can be represented by straight arrows (Figure 2a).
Figure 2
Let the coordinates of the neighboring points be
Tangent plane
In Euclidean space, say E3, the points belong to E3, and we also think of the vectors as
belonging to E3. But for a curved manifold M, we should not think of vectors as belonging to
M. We should think of the displacement $d\vec{x}$ as belonging to the tangent plane at $x$, denoted
as $T_P(x)$ (Figure 2b).
At a given point $x$, $T_P(x)$ is a flat space, like Euclidean or Minkowski space, so all vector
operations can be defined in the usual way. (We shall come to a few of these below.)
But in general $T_P(x) \neq T_P(y)$.
So vectors in different tangent planes cannot be added or subtracted naively. This will
be the focus of the next Chapter.
Example of a vector
Consider $M = S^2(a)$ (see, e.g., (10.20)), and
In words:
• Start with the point $(\theta, \phi)$.
• Move an infinitesimal distance $(d\theta, d\phi)$.
• This displacement is the vector.
Thus the vector is specified not just by $(d\theta, d\phi)$, but also by $(\theta, \phi)$.
In physics, we need not distinguish between infinitesimal displacements and sufficiently
small finite displacements. For example, we can think of the displacement from Hong Kong to
Macau as a vector on the surface of the globe.
12.2 Embedding in flat space
An expression such as (12.1) becomes easier to visualize if M is embedded in a higher-
dimensional flat space, say of M dimensions, M > N.
and reduce to the manifold M by setting the "extra" coordinates to constants (e.g., see (10.26)
or (10.36))
$x^\mu = c^\mu = \text{constant}$
$dx^\mu = 0$
$x = (x^1, \ldots, x^N)$
Step 3: Vectors
In the larger space, $x$ itself is a vector, and we can write it as $\vec{x}$:
The index $\mu$ is understood to be summed. In general, the sum is over the dimension of the
larger space, i.e., $\mu = 1, \ldots, M$; however, if the displacement stays on the manifold M, $dx^\mu = 0$
for $\mu > N$, so from now on, the sum over $\mu$ is understood to be from 1 to $N$ only.
As before, define (see (10.34))
so we have
d ! = dx* (C A',$)
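$\vec{x} = x\,\hat{i} + y\,\hat{j}$
$$
\begin{aligned}
d\vec{x} &= dx\,\hat{i} + dy\,\hat{j} \\
&= d(r\cos\phi)\,\hat{i} + d(r\sin\phi)\,\hat{j} \\
&= (dr\cos\phi - r\sin\phi\,d\phi)\,\hat{i} + (dr\sin\phi + r\cos\phi\,d\phi)\,\hat{j} \\
&= dr\,(\cos\phi\,\hat{i} + \sin\phi\,\hat{j}) + d\phi\,(-r\sin\phi\,\hat{i} + r\cos\phi\,\hat{j})
\end{aligned}
$$
The vector $d\vec{x} = (dr, d\phi)$ in 2-d polar coordinates means precisely this.
$\vec{x} = x\,\hat{i} + y\,\hat{j} + z\,\hat{k}$ (12.8)
$$
\begin{aligned}
d\vec{x} &= dx\,\hat{i} + dy\,\hat{j} + dz\,\hat{k} \\
&= d(a\sin\theta\cos\phi)\,\hat{i} + d(a\sin\theta\sin\phi)\,\hat{j} + d(a\cos\theta)\,\hat{k} \\
&= (a\cos\theta\cos\phi\,d\theta - a\sin\theta\sin\phi\,d\phi)\,\hat{i} \\
&\quad + (a\cos\theta\sin\phi\,d\theta + a\sin\theta\cos\phi\,d\phi)\,\hat{j} \\
&\quad + (-a\sin\theta\,d\theta)\,\hat{k} \\
&= d\theta\,(a\cos\theta\cos\phi\,\hat{i} + a\cos\theta\sin\phi\,\hat{j} - a\sin\theta\,\hat{k}) \\
&\quad + d\phi\,(-a\sin\theta\sin\phi\,\hat{i} + a\sin\theta\cos\phi\,\hat{j})
\end{aligned}
$$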
Example B4
We continue with Example B3. Define the basis vectors $\vec{e}_r$ and $\vec{e}_\phi$ respectively to be the
coefficients of $dr$ and $d\phi$ in $d\vec{x}$. Thus
Example C4
We continue with Example C3. Define the basis vectors $\vec{e}_\theta$ and $\vec{e}_\phi$ respectively to be the
coefficients of $d\theta$ and $d\phi$ in $d\vec{x}$. Thus
$\vec{e}_\mu = \frac{\partial \vec{x}}{\partial x^\mu}$ (12.14)
The point $x$ is a vector in the embedding space, but is only a point $P$, not a vector, on the
manifold M. Thus, we also write
$\vec{e}_\mu = \frac{\partial P}{\partial x^\mu}$ (12.15)
Note that all these operations can be performed on M; so this is an intrinsic definition.
Embedding derivation
Intrinsic derivation
We can derive the same result even more directly by using the intrinsic point of view. First,
we note that the distance between neighbouring points is
$$
ds^2 = d\vec{x}\cdot d\vec{x} = (dx^\mu\,\vec{e}_\mu)\cdot(dx^\nu\,\vec{e}_\nu) = (\vec{e}_\mu\cdot\vec{e}_\nu)\,dx^\mu dx^\nu
$$
The absolute value sign is inserted to deal with time-like vectors. For example, in special
relativity, $g_{00} = \eta_{00} = -1$, but we still say that the time-like basis vector has a length 1.
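The statement that the metric components are the dot products of the basis vectors can be checked numerically for the sphere of Example C3. The following is a minimal sketch: the function names (`embed`, `basis_vectors`, `metric`) are hypothetical, and the basis vectors are obtained by central differences of the embedding rather than analytically.

```python
import math

def embed(theta, phi, a=1.0):
    """Cartesian position of the point (theta, phi) on a sphere of radius a."""
    return (a * math.sin(theta) * math.cos(phi),
            a * math.sin(theta) * math.sin(phi),
            a * math.cos(theta))

def basis_vectors(theta, phi, a=1.0, h=1e-6):
    """e_mu = d(position vector)/d(x^mu), estimated by central differences."""
    def diff(p_plus, p_minus):
        return tuple((u - v) / (2 * h) for u, v in zip(p_plus, p_minus))
    e_theta = diff(embed(theta + h, phi, a), embed(theta - h, phi, a))
    e_phi = diff(embed(theta, phi + h, a), embed(theta, phi - h, a))
    return e_theta, e_phi

def metric(theta, phi, a=1.0):
    """g_{mu nu} = e_mu . e_nu, using the dot product of the embedding space."""
    e = basis_vectors(theta, phi, a)
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    return [[dot(e[i], e[j]) for j in range(2)] for i in range(2)]

g = metric(0.7, 1.2, a=2.0)
print(g)   # compare with a^2 and a^2 sin^2(theta) on the diagonal
```

The off-diagonal entries vanish to numerical accuracy, reflecting the orthogonality of $\vec{e}_\theta$ and $\vec{e}_\phi$.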
Problem 2
Continue with Problem 1 and check that the dot products agree with (12.17).
So far we have concentrated on the primary vector - the infinitesimal displacement $d\vec{x}$. If we
consider a curved space or curved spatial coordinates (not spacetime), then time is an invariant;
thus the velocity and momentum
are also vectors. (If we are in a curved spacetime, then all $dt$ should be replaced by $d\tau$; however,
here we illustrate with more familiar examples in curved space only.)
Let us see what this means in terms of the example of polar coordinates on a plane.
Example B5
For polar coordinates on a plane
It is also interesting to calculate the components with lower indices. Because the metric is
diagonal, the calculation is simple.
Problem 3
Show that $p_\phi$ is just the angular momentum $J$.
$x'^\mu = a^\mu{}_\nu\, x^\nu$ (12.25)
where $a^\mu{}_\nu$ are constants. On a general manifold M, there is no special set of cartesian coordinates,
so we must consider completely general coordinate transformations:
The infinitesimal displacements transform linearly, just like (12.25), even though the
coordinate transformations need not be linear.
• On a given tangent plane $T_P(x)$, $a^\mu{}_\nu$ is a constant matrix.
• On different tangent planes $a^\mu{}_\nu$ can be different.
This definition ensures that the contraction of an upper index with a lower index gives an
invariant. The proof is analogous to the case of the linear transformations in special relativity.
Higher rank tensors
Higher rank tensors are objects that transform in the following way, e.g.,
$t'^{\mu\nu} = a^\mu{}_\rho\, a^\nu{}_\sigma\, t^{\rho\sigma}$
$t'_{\mu\nu} = t_{\rho\sigma}\, b^\rho{}_\mu\, b^\sigma{}_\nu$ (12.31)
Metric tensor
Lowering of indices
We define
and similarly for higher rank tensors. By the contraction theorem, $A_\mu$ so defined is a
tensor.
This is of the form (12.34), with $A = 0$ and $B = r^2$. Can we write $dx_2$ as the change in a
certain quantity F?
Raising of indices
Since raising is the opposite of lowering, it must be done by the inverse of $g_{\mu\nu}$. Hence define
(12.35)
or more explicitly
Let $\Phi(x)$ be a scalar field, and consider its change. By Taylor's theorem
$d\Phi(x) = \Phi_{,\mu}(x)\,dx^\mu$
where
Figure 3
Problem 5
(a) On a flat 2-d plane, the basis vector $\vec{e}_1$ points east, and the basis vector $\vec{e}_2$ points north.
However, they are not unit vectors, but have lengths $l_1 = 5$, $l_2 = 3$ respectively (Figure 3). A
certain vector $\vec{v}$ is given by
$ds^2 = g_{11}(dx^1)^2 + g_{22}(dx^2)^2 + \cdots$ (12.40)
This means that the lengths of the basis vectors are given by $l_\mu = |\vec{e}_\mu| = \sqrt{|g_{\mu\mu}|}$. We therefore
define unit basis vectors $\hat{e}_\mu$ by
Thus, for any vector $\vec{v}$,
We therefore identify the coefficient of the unit basis vector as the physical component of the
vector $\vec{v}$:
PI- = (12.43)
Because these are physical components (e.g., the easterly and northerly components),
we have
Moreover, the physical components all carry proper units, e.g., for a velocity, the physical
components would have unit m s$^{-1}$ for every component.
Example B6
For polar coordinates in 2 d
Problem 6
For polar coordinates in 2 d, express v i in terms of r and w. Also express #, p i and in
terms of r and w. Which of these is conserved?
Figure 1
Example B7
Now go to polar coordinates:
where the basis vectors $\vec{e}_r$ and $\vec{e}_\phi$ introduced in Chapter 12 are not constant - see the two sets
in Figure 1b. Therefore the derivative is
The second line is new; this is the main feature in this Chapter.
By continuing with this example, we shall see that this extra contribution is actually
familiar. In this example, the basis vectors $\vec{e}_r$ and $\vec{e}_\phi$ are given by
We shall pay attention to the radial component, i.e., the coefficient of $\vec{e}_r$.
This example shows that for curved coordinates (in flat space or in curved space), when
we differentiate a vector
• we do not simply differentiate the components; but
• there are additional terms due to the change of the basis vectors.
These additional terms account for the difference between the following operations, which can
be seen from (13.11).
• First differentiate a vector and then take the $\mu$ component - like $(d\vec{v}/dt)^r$ on the LHS.
• First take the $\mu$ component and then differentiate - like $dv^r/dt$ on the RHS.
The acceleration is given by the first of these two operations acting on the velocity. The extra
term in (13.11) is $-r(v^\phi)^2 = -r\omega^2$: it is just the centripetal acceleration.
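The difference between the two orders of operation can be illustrated with uniform circular motion, where $v^r = dr/dt = 0$ at all times, so that "take the component, then differentiate" gives zero. The sketch below instead differentiates the velocity vector first (in cartesian components, by central differences) and then projects onto the radial direction; the function name and the sample values of $r$ and $\omega$ are illustrative assumptions.

```python
import math

def radial_acceleration(r, omega, t=0.3, h=1e-4):
    """Radial component of d(velocity vector)/dt for uniform circular motion."""
    def velocity(s):
        # Cartesian velocity of x = r cos(omega s), y = r sin(omega s)
        return (-r * omega * math.sin(omega * s), r * omega * math.cos(omega * s))

    vp, vm = velocity(t + h), velocity(t - h)
    # Step 1: differentiate the vector (its cartesian components).
    accel = ((vp[0] - vm[0]) / (2 * h), (vp[1] - vm[1]) / (2 * h))
    # Step 2: take the r component, using the unit radial vector.
    e_r = (math.cos(omega * t), math.sin(omega * t))
    return accel[0] * e_r[0] + accel[1] * e_r[1]

print(radial_acceleration(2.0, 3.0))   # compare with -r * omega**2
```

Since $dv^r/dt = 0$ here, the whole result comes from the change of the basis vectors.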
Problem 1
Take a point $P$ at rectangular coordinates (1, 0) and another point $Q$ at $(\cos\alpha, \sin\alpha)$, where
$0 < \alpha \ll 1$. At these two points, construct unit vectors pointing in the $x$ direction.
(a) If we first subtract these two vectors and then take the $r$ component, what is the result?
(b) If we first take the $r$ component in each vector and then subtract, what is the result?
(Note: calculate to first order in $\alpha$ only.)
Problem 2
What can you say about the "extra" term in the coefficient of $\vec{e}_\phi$ in (13.10)?
13.2 General embedding definition
We see that all the complication comes from the changes in the basis vectors.
Algebraic definition
Go back to (13.7). It is clear that all the extra terms come from contributions such as
The coefficients in (13.12) tell us how the basis vectors are changing. These coefficients are
associated with three directions. Take the second coefficient $1/r$ in (13.12) as an example.
• Take the basis vector in the $r$ direction (in general the $\mu$ direction).
• In general, the RHS will be a sum over $\rho$ (the two terms in (13.12)).
Thus we can define these coefficients by the general formula
The coefficients $\Gamma^\rho_{\mu\nu}$ are called Christoffel symbols. A more modern mathematical term is the
connection 1-form; but we shall not use the latter terminology since we shall not go into the
theory of p-forms in this course. The Christoffel symbols tell you how the basis vectors are
changing; once they are known, you can differentiate vectors and tensors.
Warning
Although $\Gamma^\rho_{\mu\nu}$ carries 1 upper index and 2 lower indices, it is not a $\binom{1}{2}$ tensor. We shall not
discuss its transformation properties.
Figure 2a shows two basis vectors $\vec{e}_r$ at different radii $r$ and $r + dr$; both are unit vectors. From
this diagram, we see that $\partial\vec{e}_r/\partial r = 0$. Figure 2b shows two basis vectors $\vec{e}_r$ at slightly different
angles $\phi$ and $\phi + d\phi$. The difference is in the $\vec{e}_\phi$ direction; thus $\partial\vec{e}_r/\partial\phi \propto +\vec{e}_\phi$.
Problem 5 [Example Cq
We now try to do the same thing for polar coordinates in 3 d, with $(x^1, x^2, x^3) = (\theta, \phi, R)$.
(a) The displacement vector is
Remember that
• The dummy index $\lambda$ is summed over.
• Commas denote differentiation.
Example B10
The metric is
Now consider
In the above, since $g^{r\lambda}$ is diagonal, the only allowed value of $\lambda$ is $\lambda = r$; and $g_{\phi r,\phi} = g_{r\phi,\phi} = 0$,
$g_{\phi\phi,r} = 2r$.
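The intrinsic recipe, $\Gamma^\sigma_{\mu\nu} = \tfrac{1}{2} g^{\sigma\lambda}(g_{\lambda\mu,\nu} + g_{\lambda\nu,\mu} - g_{\mu\nu,\lambda})$, can be turned into a few lines of code. The sketch below is restricted to diagonal metrics (so that the inverse is trivial) and takes the derivatives by central differences; the function name `christoffel` and its interface are assumptions made for illustration only.

```python
def christoffel(metric, x, h=1e-5):
    """Gamma[s][m][n] = 1/2 g^{s l} (g_{l m,n} + g_{l n,m} - g_{m n,l}).

    `metric(x)` must return the diagonal entries of a DIAGONAL metric,
    so the only contributing value of the dummy index l is l = s.
    """
    dim = len(x)
    g = metric(x)

    def dg(i, j, k):
        """Partial derivative of g_{ij} with respect to x^k."""
        if i != j:
            return 0.0
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        return (metric(xp)[i] - metric(xm)[i]) / (2 * h)

    return [[[0.5 / g[s] * (dg(s, m, n) + dg(s, n, m) - dg(m, n, s))
              for n in range(dim)] for m in range(dim)] for s in range(dim)]

# Polar coordinates on a plane: ds^2 = dr^2 + r^2 dphi^2, with x = (r, phi).
polar = lambda x: [1.0, x[0] ** 2]
gamma = christoffel(polar, [2.0, 0.5])
print(gamma[0][1][1], gamma[1][0][1])   # Gamma^r_{phi phi}, Gamma^phi_{r phi}
```

At $r = 2$ the two printed values can be compared with $-r$ and $1/r$. Swapping in another diagonal metric, for example that of the sphere in Problem 8, requires only a different `metric` function.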
Problem 7
Continue with the above Example and calculate all the Christoffel symbols. Compare with the
results of Problem 3.
Problem 8 [Example Cq
On the surface of a sphere of radius a
Follow the above steps and calculate the Christoffel symbols intrinsically. Compare with the
results of Problem 5.
where $A = A(r) = (1 - 2GM/r)$ and $B = B(r) = A(r)^{-1}$. In the following, do not use the
explicit form of $A$ and $B$, but just express the answers in terms of $A$, $B$, $A'$, $B'$ etc.
(a) Write down all the nonzero elements of $g_{\mu\nu}$.
(b) Write down all the nonzero elements of $g^{\mu\nu}$.
(c) Find all the nonzero elements of $g_{\mu\nu,\lambda}$.
(d) Calculate the following elements of the Christoffel symbols.
13.5 Covariant differentiation
Differentiating a scalar
Differentiation just means comparing a function at two neighbouring points x and x + dx.
Consider first a scalar field $\Phi$; the difference in its values between the two points is
$\Phi(x + dx) - \Phi(x) \equiv d\Phi(x) = \Phi_{,\mu}(x)\,dx^\mu$ (13.18)
By considering the case where $dx^\mu$ has only one nonzero component, e.g., $dx = (dx^1, 0, \ldots, 0)$,
it is easy to see that these coefficients are just partial derivatives
Differentiating a vector
Now consider a vector field $\vec{A}(x)$, and compare its value at two neighbouring points. There are
two ways of doing this comparison.
Compare components
We can first take the components (say the $\mu$ component) at these two points, and subtract, i.e.,
consider the difference
where
AP(x -
Since the component $A^\mu$ is just a function like $\Phi$, we again have
Compare vectors
Another method is to compare the two vectors directly, i.e., take the difference vector:
(13.23)
In particular, we take its $\mu$ component; this must again be linear in the displacement, so
$\vec{A}(x) = A^\mu\,\vec{e}_\mu$
$d\vec{A}(x) = dA^\mu\,\vec{e}_\mu + A^\mu\,d\vec{e}_\mu$
We use (13.21) for $dA^\mu$ and (13.14) for $d\vec{e}_\mu$:
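The substitution can be written out as follows. This is a sketch under two assumptions about equations not reproduced here: that (13.21) reads $dA^\mu = A^\mu{}_{,\nu}\,dx^\nu$, and that (13.14) reads $d\vec{e}_\mu = \Gamma^\rho_{\mu\nu}\,dx^\nu\,\vec{e}_\rho$. The index labels are chosen for illustration and may differ from those in the original equations.

```latex
\begin{aligned}
d\vec{A} &= dA^\mu\,\vec{e}_\mu + A^\mu\,d\vec{e}_\mu \\
         &= A^\mu{}_{,\nu}\,dx^\nu\,\vec{e}_\mu
            + A^\mu\,\Gamma^\rho_{\mu\nu}\,dx^\nu\,\vec{e}_\rho \\
         &= \left( A^\mu{}_{,\nu} + \Gamma^\mu_{\rho\nu}\,A^\rho \right) dx^\nu\,\vec{e}_\mu
\end{aligned}
```

In the last step the dummy indices $\mu$ and $\rho$ in the second term have been interchanged, so that both terms multiply the same basis vector $\vec{e}_\mu$. The bracket is then the $\mu$ component of the difference vector per unit displacement in the $\nu$ direction.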
Transformation properties
We consider the definition
Now the LHS are the components of a vector, and dx" are also the components of a
~ ;+ rpu
c ~= cpvlU P C
~ PV + p' C,P
PC
Cpv+gG',, (13.32)
where @ denotes an exterior product. However we shall not go into this derivation.
Appendix A
In this Appendix, we derive the intrinsic definition of the Christoffel symbol from the embedding
definition. The cartesian coordinates of the embedding space are denoted by indices $i, j, \ldots$,
while the coordinates of the manifold are denoted by Greek indices $\mu, \nu, \ldots$. Thus, an intrinsic
expression is one in which the indices $i, j, \ldots$ do not appear in the end.
We are here concerned with the changes in these basis vectors, so we consider
(13.39)
The last quantity is symmetric in the two lower indices because of the property of mixed partial
derivatives.
This allows (13.38) to be expressed compactly as
This is a simple expression for the Christoffel symbol in terms of the two sets of coordinates,
without mentioning the basis vectors.
= c (;J (p)
(;) (;J (:) (;)
c (;J
=
= 6:
= (i")
In the above, we have used the fact that
The Christoffel symbols can be taken outside the $\sum_i$, and the remaining factors can be expressed
back in terms of g,, by (13.41):
Thus we get
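$\Gamma^\rho_{\mu\nu}\,g_{\rho\lambda} = \frac{1}{2}\left(g_{\mu\lambda,\nu} + g_{\nu\lambda,\mu} - g_{\mu\nu,\lambda}\right)$
Multiplying by the inverse matrix $g^{\lambda\sigma} = g^{\sigma\lambda}$
This is the relation that we seek; it expresses $\Gamma^\sigma_{\mu\nu}$ in terms of the metric and its derivatives, and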
is therefore intrinsic.
To be explicit where a curved manifold comes in, let us consider the case of a sphere
embedded in 3 d. Here, the Greek indices (e.g., $\mu$) would range over 1, 2, 3 or $(\theta, \phi, R)$, while
the Latin indices (e.g., $i$) in the intermediate steps would range over 1, 2, 3 or $(x, y, z)$. Thus
the Christoffel symbols obtained, e.g., (13.51), refer to curvilinear coordinates in 3-d flat space.
However, by just setting $R = a = \text{const}$, and allowing the Greek indices to range over only 1, 2
or $(\theta, \phi)$, the whole discussion is restricted to the surface of the sphere, and the same expression
now refers to a curved manifold.
Mathematically
Figure 1
Figure 1 illustrates two neighbouring points A, B on the path of a particle in spacetime,
with
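$d\vec{p} \equiv \vec{p}_B - \vec{p}_A = 0$ (14.3)
In other words we subtract the two vectors, then take the components, so
$(d\vec{p})^\mu = d(p^\mu) + \Gamma^\mu_{\nu\rho}\,p^\nu\,dx^\rho = 0$ (14.4)
Divide by the proper time $d\tau$ for this interval
$\frac{dp^\mu}{d\tau} + m\,\Gamma^\mu_{\nu\rho}\,\frac{dx^\nu}{d\tau}\frac{dx^\rho}{d\tau} = 0$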
Force
An alternate expression is
$m_G$. We have $m_I = m_G$ automatically.
• In electromagnetism, the part of the force that depends on the spatial components of the
The fact that $m_I = m_G$, or equivalently that $m$ cancels in (14.9), is the principle of equivalence.
This property comes out naturally.
We now give a second approach, based on the principle of least action. First review section 8.3.
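There, for flat Minkowski space, the action for each small segment of path is
$$
\begin{aligned}
\Delta S &= -m\left(-\Delta\vec{x}\cdot\Delta\vec{x}\right)^{1/2} \\
&= -m\left(-\Delta x^\mu\,\Delta x_\mu\right)^{1/2} \\
&= -m\left(-\eta_{\mu\nu}\,\Delta x^\mu\,\Delta x^\nu\right)^{1/2}
\end{aligned}
$$
$dS = -m\left(-\eta_{\mu\nu}\,dx^\mu\,dx^\nu\right)^{1/2}$
In curvilinear coordinates (and hence also for curved space) the generalization of distance
is simply $\eta_{\mu\nu} \mapsto g_{\mu\nu}$. So (14.8) should be modified to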
The correct path is the one that makes S stationary, i.e., 6S = 0 under a first-order change
$\frac{d\,\delta x^\mu}{ds}\frac{dx^\nu}{ds}$ and $\frac{dx^\mu}{ds}\frac{d\,\delta x^\nu}{ds}$
have been combined, hence the factor of 2. In the first term in { }
dxp dxY
First term = 1[ ]-1/2 {-g,vlpxz
In the second term, $\delta x^\nu = \eta^\nu$. Since the expression is under the integral sign, we can integrate
by parts.
In the last step we have changed dummy variables $\nu \to \rho$. Thus the two terms together yield
where $\{\ \}\,\eta^\rho$ = (14.12) + (14.13). Since $\eta^\rho(s)$ is arbitrary, we put { } in (14.14) equal to
zero.
$d\tau^2 = -g_{\mu\nu}\,dx^\mu\,dx^\nu$
dxp dx"
-0
dr d r
(We shall come back to this in section 14.8.) Carrying out the differentiation
But
$\frac{dg_{\rho\mu}}{d\tau} = g_{\rho\mu,\nu}\,\frac{dx^\nu}{d\tau}$
Hence
Since $(dx^\mu/d\tau)(dx^\nu/d\tau)$ is symmetric under $\mu \leftrightarrow \nu$, the first term inside the bracket can be
written as
Hence
Multiply by $g^{\sigma\rho}$
Problem 1
(a) Estimate the numerical value of $\Phi$ and $v^2$ for the motion of the earth around the sun.
(b) Using Newtonian mechanics, derive a relation between the time-averaged value of $\Phi$ and of
$v^2$, for a bound orbit. [Hint: consider $d(\mathbf{p}\cdot\mathbf{r})/dt$. For bound orbits, $\mathbf{p}\cdot\mathbf{r}$ stays within bounded
limits, say $\pm B$. So for long times $T$, $|\langle d(\mathbf{p}\cdot\mathbf{r})/dt\rangle| \le 2B/T \to 0$. Express $d(\mathbf{p}\cdot\mathbf{r})/dt$ in terms
of $\Phi$ and $v^2$.]
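For part (a), a quick order-of-magnitude evaluation might look as follows. It is only a sketch: the orbit is treated as circular, both quantities are made dimensionless by dividing by $c^2$, and the rounded solar-system constants are assumptions supplied here rather than values from the notes.

```python
import math

GM_SUN = 1.327e20      # m^3 s^-2 (assumed rounded value)
R_ORBIT = 1.496e11     # m, 1 AU
YEAR = 3.156e7         # s
C = 2.998e8            # m/s

phi = -GM_SUN / R_ORBIT / C**2                     # dimensionless potential Phi/c^2
v2 = (2 * math.pi * R_ORBIT / YEAR) ** 2 / C**2    # dimensionless (v/c)^2

print(phi, v2)
```

Both numbers come out at the $10^{-8}$ level, which is the small parameter $\epsilon$ of the weak-field, slow-motion expansion; their near equality in magnitude is what part (b) asks you to explain.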
The equation of motion is
(The 0 component is not needed.) In the above, $\dot{\ } = d/d\tau$. The velocity is $O(\epsilon^{1/2})$; $d/dt$ and
$d/d\tau$ differ by $O(\gamma - 1) = O(v^2) = O(\epsilon)$. So in the first term, if we replace $d/d\tau \to d/dt$, the
error is only $O(\epsilon^2)$, which can be neglected. Also $\Gamma = O(\epsilon)$, so we only need to take $O(\epsilon^0)$ in
$\dot{x}^\nu \dot{x}^\rho$. This means we only take $\nu = \rho = 0$ and
Hence
Figure 2
We now return to the Robertson-Walker metric. If $k = +1$, space is like the surface of a
spherical balloon with radius $a(t)$. The spatial coordinate system $(\bar{r}, \theta, \phi)$ can be thought of as
being painted on this balloon, and expands with the balloon (Figure 2).
Although the coordinate system expands with the balloon, the motion of particles may
be different. We claim that particles (i.e., galaxies) are "stuck to" the coordinate grid as the
latter expands. Mathematically we have to show
since $d\bar{r}/dt = d\theta/dt = d\phi/dt = 0$ by assumption. Hence the solution is (up to an irrelevant
additive constant)
In the first two terms, because $g_{\mu\nu}$ is diagonal, $\lambda$ must be $t$, and $g_{tt,t} = 0$. In the last term,
because $g_{tt} = -1$, $g_{tt,\lambda} = 0$. In fact this shows that we rely only on the following properties of
the metric:
• $g_{t\lambda} = 0$ for $\lambda \neq t$.
• $g_{tt}$ = constant.
Thus this calculation shows that galaxies can be "stuck to" the expanding coordinate system.
Note that we say "can be", and not "must be". The reason is that, depending on initial
conditions, there are also solutions that move with respect to the coordinate system. This is
clearly allowed physically - we can launch a satellite at high speed to travel from galaxy 1 at
$(\bar{r}_1, \theta_1, \phi_1)$ to galaxy 2 at $(\bar{r}_2, \theta_2, \phi_2)$. In that case $\bar{r}$, $\theta$, $\phi$ are not constants.
Figure 3
However, gravity distorts spacetime around $M$, and the ray will be deflected by an angle
$\alpha$. Referring to Figure 4, we see that $\alpha$ is simply related to the asymptotic momentum
Figure 4
In short, we have to calculate the change in $p^y$. The calculation is made simple by two observations:
• We calculate only to first order in $G$.
• Because of the nearly straight path (Figure 3), rectangular coordinates would be convenient.
Equation of motion
It is most convenient to write the equation of motion as
where $p$ is the momentum of the photon. Consider $\mu = y$, and note that $\Gamma = O(G)$, so the
other factors can be calculated to $O(G^0)$, i.e., for the straight line path. Thus we get
where to be slightly more general we consider a particle moving at velocity $\beta c$. Thus from
(14.27),
Hence
hence
Evaluate $\alpha$
$\Phi = -\frac{GM}{r}$, $\Phi_{,y} = \frac{GMy}{r^3}$
We have inserted a factor $1/c^2$ to get the units right. This formula is valid whenever the particle
goes along a nearly straight trajectory at nearly constant speed.
Note that the order-of-magnitude of the effect is (say for a test mass m)
Deflection of light
For light, $\beta = 1$, and
This value is appropriate for $R$ = radius of sun, i.e., for a ray passing close to the rim of the
sun. Of course this can only be observed during an eclipse, and was so confirmed in 1919. It
was one of the triumphs of general relativity.
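The size of the deflection at the solar rim can be evaluated with a few lines of code. The sketch below assumes that (14.33) has the standard first-order form $\alpha = \frac{2GM}{bc^2}\,\frac{1+\beta^2}{\beta^2}$ for impact parameter $b$; that form, the function name and the rounded solar constants are assumptions, not quotations from the notes.

```python
import math

GM_SUN = 1.327e20   # m^3 s^-2 (assumed rounded value)
R_SUN = 6.96e8      # m (assumed rounded value)
C = 2.998e8         # m/s
ARCSEC_PER_RAD = 180 * 3600 / math.pi

def deflection_arcsec(beta=1.0, b=R_SUN):
    """Assumed first-order deflection (2GM / (b c^2)) (1 + beta^2) / beta^2, in arcsec."""
    return 2 * GM_SUN / (b * C**2) * (1 + beta**2) / beta**2 * ARCSEC_PER_RAD

light = deflection_arcsec(1.0)
newtonian_only = 2 * GM_SUN / (R_SUN * C**2) * ARCSEC_PER_RAD   # first term alone, beta = 1
print(light, newtonian_only)
```

The second printed number keeps only the velocity-independent term with $\beta = 1$, which is half of the full result for light.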
Non-relativistic limit
Suppose the particle is non-relativistic, $|\beta| \ll 1$. Then (14.33) becomes
Let us try to recover this result by Newtonian physics. In Newtonian physics, and referring to
Figure 3,
which is just the first term in (14.31), and will lead to (14.34).
However, if we
• do the Newtonian calculation, and
• naively put $\beta = 1$,
then the answer would be off by a factor of 2. The reason is that we would have missed the
$\Gamma^y_{xx}$ contribution, i.e., the second term in (14.33). This is the "magnetic" contribution. So
the experimental observation verifies that there is a velocity-dependent "magnetic" force which
corrects Newton's law of gravity. Of course, light is one of the best ways to detect such a force,
which would be negligible for $\beta \ll 1$.
pz = pt , dz = dt
Choose $\mu = t$ in (14.36).
Hence
E= mpt
= ( 1 + @)pt
Thus
Problem 2
(a) Show that the equations of motion for a particle in a gravitational field are
(Hint: the second equation is the conservation of angular momentum, and the second term in
the first equation is the centrifugal force.)
(b) Use the angular momentum equation to show
and replace the independent variable $t$ by $\phi$, using the above relationship between $d/dt$ and
$d/d\phi$. As an example
(c) Hence show that the equation for the orbit (i.e., a relationship between $r$ and $\phi$, without $t$) is
(d) Hence show that (by a suitable choice of the coordinates), a solution is
where $|\alpha| < 1$. In terms of $\alpha$, by how much would the perihelion (the point of smallest $r$)
advance in each revolution?
Metric
The metric is that due to the sun, regarded as a point particle.
Equation of motion
The equations are
$0 = \frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\rho}\,\frac{dx^\nu}{d\tau}\frac{dx^\rho}{d\tau}$
There are 4 such equations, corresponding to different choices of $\mu$.
Evaluation of $\Gamma$
The nonzero elements are
$\Gamma^\phi_{r\phi} = \frac{1}{r}$, $\Gamma^\phi_{\theta\phi} = \cot\theta$, $\Gamma^t_{tr} = \frac{B'}{2B}$
Problem 3
Derive the results shown above for I?:,. I?;,.
The 4 equations
Putting these into (14.45), we get the following 4 equations.
$0 = \frac{d^2\theta}{d\tau^2} + \frac{2}{r}\frac{d\theta}{d\tau}\frac{dr}{d\tau} - \sin\theta\cos\theta\left(\frac{d\phi}{d\tau}\right)^2$ (14.47)
0 =
0 =
Problem 4
Show the derivation of these 4 equations.
Constants of motion
The last two equations can be integrated once. From (14.51),
$r^2\,\frac{d\phi}{d\tau} = J = \text{constant}$
Obviously $J$ is the angular momentum/mass. From (14.52),
$B(r)\,\frac{dt}{d\tau} = K = \text{constant}$
To eliminate $(dt/d\tau)^2$, we use the definition of $d\tau$ from (14.43), but specialized to $\theta = \pi/2$.
Put this into the last term in (14.55).
Ratio - - (. )
J2
-
r2
2 d4
r 2 - 2
v = O(E)
Newtonian limit
Let us first ignore the last term in (14.63). Note that the u term in (14.63) has a coefficient
exactly 1. Let
$\Delta u(\phi) \propto \cos\phi$
The $\sin\phi$ term can be ignored by adding a constant phase (i.e., redefining what we mean by
$\phi = 0$). Introducing a constant $e$ (which will turn out to be the eccentricity),
$u(\phi) = \frac{GM}{J^2}(1 + e\cos\phi) \equiv u_0(1 + e\cos\phi)$ (14.64)
This means $r$ is minimum whenever $\cos\phi = 1$, i.e., $\phi = 2n\pi$. The path is closed (Kepler's first
law), because $r(\phi) = r(\phi + 2\pi)$.
Correction
Again we assume that
Planetary orbits in the solar system have small eccentricities, i.e., the changes in u are small
compared to the mean value of $u$: $|\Delta u| \ll u_0$. Then
$u^2 \approx u_0^2 + 2u_0\,\Delta u(\phi)$
and (14.63) becomes
where
$\alpha = \frac{3(GM)^2}{J^2}$
Thus, following the same procedure as before,
$\phi \approx (1 + \alpha)\,2\pi$
Thus the perihelion shifts forward in each revolution by
Apply to Mercury
Eliminate GM by
Hence
$\delta\phi = 6\pi\,\frac{v^2}{c^2}$ (14.71)
This makes it very clear that it is a relativistic correction. Further, we express it in terms of
radius and period.
$r = 5.8 \times 10^{10}$ m
$T$ = 88 days = $7.6 \times 10^6$ s
$\delta\phi = 4.8 \times 10^{-7}$ radians/revolution
$\delta\phi = \left(4.8 \times 10^{-7} \times \frac{180}{\pi} \times 3600''\right) \times \frac{100 \times 365}{88}\ \text{cent}^{-1} \approx 41''\ \text{cent}^{-1}$
A more accurate calculation gives $43''$ cent$^{-1}$. (In the above, we have assumed a nearly circular
orbit, i.e., only to lowest order in the eccentricity $e$.)
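The arithmetic above can be reproduced with a short script. The first function follows the circular-orbit estimate, $\delta\phi = 6\pi v^2/c^2$ with $v = 2\pi r/T$; the second uses the standard result for an eccentric orbit, $\delta\phi = 6\pi GM/[c^2 a(1-e^2)]$, which is not derived in these notes. The function names, the solar $GM$, and Mercury's semi-major axis and eccentricity are assumptions supplied for illustration.

```python
import math

C = 2.998e8                              # m/s
ARCSEC_PER_RAD = 180 * 3600 / math.pi

def shift_per_century_circular(r, period_days):
    """delta_phi = 6 pi v^2 / c^2 per revolution, with v = 2 pi r / T."""
    v = 2 * math.pi * r / (period_days * 86400.0)
    per_rev = 6 * math.pi * (v / C) ** 2
    return per_rev * ARCSEC_PER_RAD * (100 * 365 / period_days)

def shift_per_century_elliptic(gm, a, e, period_days):
    """Standard result 6 pi GM / (c^2 a (1 - e^2)) per revolution (assumed form)."""
    per_rev = 6 * math.pi * gm / (C**2 * a * (1 - e**2))
    return per_rev * ARCSEC_PER_RAD * (100 * 365.25 / period_days)

rough = shift_per_century_circular(5.8e10, 88.0)
better = shift_per_century_elliptic(1.327e20, 5.791e10, 0.2056, 87.97)
print(rough, better)   # arcseconds per century
```

The gap between the two numbers comes mainly from the factor $1/(1-e^2)$, which the lowest-order treatment in the eccentricity drops.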
Problem 5
The result (14.72) is written in a form that depends on both the orbit radius r and the period
T. However, the two are related by Kepler's third law. Hence write the advance per revolution
$\delta\phi$ as
Find the exponent n, and give a numerical value for the prefactor b, if r is expressed in AU.
Write a similar expression for the shift per century.
In the first term we have used the geodesic equation (14.5) for $, and in the second term we
have used the chain rule. Now multiply throughout by $m$, and note that $g_{\alpha\beta}$ contracts with $g^{\beta\gamma}$
in $\Gamma$ to give $\delta_\alpha^\gamma$,
The first, second and fourth terms cancel, because they multiply the symmetric $p^\mu p^\nu$. Hence
we are left with
Heuristic argument
A BH is a region of space where gravity is so strong that nothing can escape, not even light -
hence it is "black".
Consider a mass M at the origin. A test mass m at a radius r can escape only if
KE > |PE|, i.e.,
$\frac{1}{2}mv^2 > \frac{GMm}{r}$
But since $v \le c$, a necessary condition is $r > R_s$, where
$R_s = \frac{2GM}{c^2}$
is the critical radius. Anything closer than $R_s$ cannot escape; a particle farther than
$R_s$ could escape if its velocity is high enough.
The above is a mixture of Newtonian concepts (e.g., $(1/2)mv^2$) and relativistic concepts
(e.g., $v \le c$), which is not really legitimate. It is however acceptable for an order-of-magnitude
estimate.
The form of the potential assumes a point mass $M$. If the mass $M$ (say a star) has
a physical radius $R$ larger than $R_s$, the BH condition $r < R_s$ is never satisfied in the
exterior of the star. But if the physical radius is less than $R_s$, then the BH condition can
be satisfied in the exterior.
Thus we conclude: If a star of mass $M$ collapses to a radius $R < R_s = 2GM/c^2$, then it
becomes a BH.
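To get a feeling for the numbers, the critical radius $2GM/c^2$ can be evaluated directly. This is a minimal sketch; the function name and the rounded constants (including the Earth mass used as a second example) are assumptions supplied here.

```python
G = 6.674e-11      # m^3 kg^-1 s^-2
C = 2.998e8        # m/s
M_SUN = 2.0e30     # kg, rounded
M_EARTH = 5.97e24  # kg, rounded

def critical_radius(mass_kg):
    """Critical radius 2 G M / c^2, in metres."""
    return 2 * G * mass_kg / C**2

print(critical_radius(M_SUN) / 1e3, "km")     # solar mass
print(critical_radius(M_EARTH) * 1e3, "mm")   # Earth mass
```

Both values are tiny compared with the actual radii of the bodies, which is why neither satisfies the BH condition.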
Example
Consider a star with the mass of the Sun: $M = M_\odot = 2 \times 10^{30}$ kg. Then
Figure 1
The size of a star is determined by the balance between two forces (Figure 1): the force
of gravity pulling in, and the pressure force pushing out.
Normal star
In a normal star such as the Sun, the pressure is due to the hot gas:
$P = \frac{NkT}{V} \propto T$
When the nuclear fuel is exhausted, $T \downarrow$ and $P \downarrow$, which therefore cannot support the force of
gravity, and the star starts to collapse.
Degeneracy pressure
Figure 2
We can only put 2 particles (electrons, protons or neutrons) into each energy level (Pauli
exclusion principle). Thus, if there are many particles, they have to fit into highly excited
levels (large n) with large momenta p. Even at absolute zero, the particles cannot be at rest.
This motion gives rise to degeneracy pressure.
To derive the degeneracy pressure P, start with the formula in kinetic theory
which is valid even relativistically. Recall that the momentum delivered per collision with the
wall is $2p$, and the frequency of collisions is $1/(2L/v) \propto v$, accounting for the two factors.
To estimate the typical momentum $p$ of the particles, we note that in 1D, each state
occupies
where $V$ is the volume in space and $V_p$ is the volume in momentum space. Thus $N$ particles
would occupy $V$ and $V_p$, where
and the factor of 2 accounts for spin. Assume all states in momentum space up to $p_{\max}$ are
occupied, i.e., $V_p$ is a sphere of radius $p_{\max}$, then
Nonrelativistic case
We use (15.5) and convert $pv \mapsto p^2/m$:
Thus
since the number density goes as $n \propto V^{-1} \propto R^{-3}$ for a sphere of radius $R$.
Ultra-relativistic case
We again use (15.5) and convert $pv \mapsto pc$ for the ultra-relativistic case.
Thus
$P \sim \frac{\text{force}}{\text{area}} \sim \frac{GM^2/R^2}{4\pi R^2} \propto GM^2 R^{-4}$
In the nonrelativistic case, setting the degeneracy pressure $CR^{-5}$ equal to (15.11) gives
However, if the star is very heavy, $R$ becomes very small. The spatial volume $V$ is small, so the
particles must occupy a large volume $V_p$ in momentum space; hence they must move at high
speeds. Eventually, we have to go over to the ultra-relativistic case. Now setting the degeneracy
pressure $CR^{-4}$ (the value of $C$ is different) equal to (15.11) gives the limit
Note that $R$ disappears (both sides go as $R^{-4}$), and we get a constraint, representing the
maximum $M$. In other words, the ultra-relativistic case ($v \to c$) provides an upper limit, called
the Chandrasekhar limit $M_c$.
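The scaling argument can be turned into a rough number. Balancing an ultra-relativistic degeneracy pressure $P \sim \hbar c\,n^{4/3}$, with $n \sim M/(m_p R^3)$, against $GM^2R^{-4}$ gives $M \sim (\hbar c/G)^{3/2}/m_p^2$ once all numerical factors are dropped. The sketch below evaluates only this combination; it is not the precise limit, the notes' own prefactors are not reproduced, and the function name and constants are assumptions.

```python
G = 6.674e-11          # m^3 kg^-1 s^-2
HBAR = 1.0546e-34      # J s
C = 2.998e8            # m/s
M_PROTON = 1.6726e-27  # kg
M_SUN = 1.989e30       # kg

def limiting_mass_scale():
    """Order-of-magnitude mass (hbar c / G)^(3/2) / m_p^2, numerical factors dropped."""
    return (HBAR * C / G) ** 1.5 / M_PROTON**2

ratio = limiting_mass_scale() / M_SUN
print(ratio)   # in solar masses
```

The result is of order one solar mass, so the maximum mass set by quantum mechanics and gravity alone falls in the range of ordinary stellar masses.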
Thus to summarize
• Even at absolute zero, there is a pressure that can support gravity.
• But this does not work if $M > M_c$. The star has to further collapse.
White dwarfs
So when a star exhausts its nuclear fuel, it becomes supported by the degeneracy pressure of
the electrons. This stage is called a white dwarf. If the mass is too large, the white dwarf is
unstable, and the star further collapses.
Neutron stars
As the star further collapses, the electrons get "squeezed" back into the protons, forming
neutrons - in effect the inverse of $\beta$-decay:
and a neutrino escapes. One then has a collection of neutrons only. The star is then supported
by the degeneracy pressure of neutrons, and is very dense. This stage is called a neutron star.
If the mass is so large that it exceeds the Chandrasekhar limit for neutron degeneracy
pressure, then nothing further can support the force of gravity, and the star collapses to a radius
less than $R_s$. Then a BH is formed. We shall see later that once the star collapses to less than
$R_s$, the collapse cannot stop, and it has to collapse all the way to a point.
In the rest of this Chapter, we shall deal with only the third question. We assume a
star has been formed that is smaller than $R_s$, and try to understand the spacetime and the
motion of particles near it. This question can be approached within the framework of a static
spacetime, i.e., we assume that the BH is formed and is no longer changing, and likewise the
spacetime around it is not changing.
The metric outside the star is
$ds^2 = -B\,dt^2 + B^{-1}\,dr^2, \qquad B = 1 - \frac{R_0}{r}$ (15.14)
where $R_0 = 2GM$, $c = 1$, and the angular terms are omitted because we shall consider only
radial motion, so that $d\theta = d\phi = 0$.
Clearly something funny happens at $r = R_0$, called the Schwarzschild radius.
The example earlier shows that the Schwarzschild radius is 3 km if $M = M_{\text{sun}}$. A BH is
formed only if the physical radius is smaller than the Schwarzschild radius; otherwise (15.14)
would not be valid at $R_0$, which would be inside the star.
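As a quick numerical check of the 3 km figure, the sketch below evaluates $R_0 = 2GM/c^2$ in SI units. The rounded constants and the function name `schwarzschild_radius` are illustrative assumptions, not part of the notes (which use units with c = 1).

```python
# Rounded SI values (assumed for illustration only).
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg

def schwarzschild_radius(mass_kg):
    """Return R_0 = 2 G M / c^2 in metres."""
    return 2.0 * G * mass_kg / C**2

r_sun_km = schwarzschild_radius(M_SUN) / 1000.0
print(f"R_0 for one solar mass: {r_sun_km:.2f} km")
```

Because $R_0$ is linear in M, a star of ten solar masses would have to be squeezed inside roughly 30 km to form a BH.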
What is time?
Let us now examine the coordinates near the BH. It is useful to go back to the case of a flat
Minkowski space; ignore y and z.
We recognize t as the "time" not because it is called "t" - physics cannot depend on which
letter of the alphabet we use. If we write (15.15) as $ds^2 = -d\xi^2 + d\eta^2$, we would recognize $\xi$ as
the time. We can make several related remarks.
• A change in t ($dt \neq 0$) leads to $ds^2 < 0$; this characterizes time-like displacements.
• Particle trajectories are time-like. This can be seen because
• The time coordinate has only one sign, i.e., time "flows" only into the future and not into
the past. Thus, the last equation can be expressed as
• For $r > R_0$, $B > 0$, so we recognize t as the time coordinate; this is the "usual" situation.
• For $r < R_0$, $B < 0$, so we have to call r the time coordinate, i.e., changes in r lead to
$ds^2 < 0$.
Thus the roles of t and r are reversed for $r < R_0$. The coordinate r is time. It can
flow only one way, i.e., particle trajectories can only go along dr < 0. (We shall later give a
heuristic argument why it is not the reverse. See Figure 5 below and the arguments immediately
following it.) The situation is illustrated in the spacetime diagram in Figure 3.
Figure 3
Light cone
Next consider the light cone, defined by
Figure 4
Particle trajectory
Figure 5
Particle trajectories have to point in a time-like direction. Therefore the situation must
be qualitatively as shown in Figure 5.
• Trajectories are nearly "vertical" (i.e., in the t direction) near $R_0$.
Thus, once inside $r = R_0 = 2GM$, r becomes the "time" coordinate and can only
decrease. Particles cannot come back out from $r < R_0$; thus, it is a black hole.
The surface $r = R_0$ which divides the "inside" from the "outside" is called the event
horizon.
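The closing-up of the light cones near $R_0$ can be tabulated directly. The sketch below assumes the radial Schwarzschild metric $ds^2 = -B\,dt^2 + B^{-1}dr^2$ with $B = 1 - R_0/r$, for which $ds^2 = 0$ gives $dr/dt = \pm B$; the function name `light_cone_slope` is an illustrative choice.

```python
def light_cone_slope(r, r0=1.0):
    """dr/dt of an outgoing radial light ray: ds^2 = 0 gives dr/dt = +/- B."""
    return 1.0 - r0 / r

# Far away the slope tends to 1 (the flat-space light cone); near r0 it tends
# to 0, so the cones become narrow and "vertical" in a (r, t) diagram.
for r in (100.0, 2.0, 1.1, 1.001, 0.5):
    print(f"r/R_0 = {r:7.3f}   B = {light_cone_slope(r):+.4f}")
```

The negative value of B for $r < R_0$ is the sign change that exchanges the roles of t and r.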
$p_t = \text{constant} \equiv -\bar{E}$ (15.19)
We pay attention to $p_t$ and the physical energy E:
The above equation for $r \to \infty$ shows that $\bar{E}$ is just the energy at infinity: $\bar{E} = E(\infty)$.
Thus
since $E = \hbar\omega$. So
when $r \to R_0$. In other words, when the photon reaches $R_0$, there is an infinite blueshift.
Reversing the argument, if a photon climbs out of the BH starting from $R_0$, there is an infinite
redshift.
We can also give a heuristic argument that leads to "nearly" the same result.
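$\hbar\omega(\infty)$ = energy at $\infty$
= energy at r - work done
= $\hbar\omega - m\,\Delta\Phi$
= $\hbar\omega - \frac{\hbar\omega}{c^2}\,\frac{GM}{r}$
= $\hbar\omega \left( 1 - \frac{GM}{c^2 r} \right)$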
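To see how "nearly" the heuristic result agrees with the exact one, the sketch below compares the two redshift factors $\omega(\infty)/\omega(r)$. It assumes the standard exact factor $\sqrt{1 - R_0/r}$ for a static emitter in the Schwarzschild metric, and rewrites the heuristic factor as $1 - R_0/(2r)$ using $GM/c^2 = R_0/2$; both function names are illustrative.

```python
import math

def redshift_exact(r, r0=1.0):
    """omega(infinity) / omega(r) for a static emitter (assumed standard result)."""
    return math.sqrt(1.0 - r0 / r)

def redshift_heuristic(r, r0=1.0):
    """Same ratio from the energy-minus-work argument, with GM/c^2 = r0/2."""
    return 1.0 - r0 / (2.0 * r)

for r in (100.0, 10.0, 2.0, 1.0):
    print(f"r/R_0 = {r:6.1f}  exact = {redshift_exact(r):.5f}  "
          f"heuristic = {redshift_heuristic(r):.5f}")
```

The two agree to first order in $R_0/r$, but only the exact factor vanishes at $r = R_0$; the heuristic one would vanish at $R_0/2$.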
Coordinate singularity
The above result suggests that there is some sort of singularity at $r = R_0$. This is apparent
from the metric as well: $g_{tt} = 0$ and $g_{rr} = \infty$ at $r = R_0$. Is this a physical problem, or is it
just a problem with the coordinate system? The former is a problem with the geometry of the
surface (e.g., a kink, corner, edge); the latter would only be a problem with the grid we draw
on the surface.
As an example, consider the surface of a sphere of radius a. This is obviously a perfectly
smooth surface, with no singularities anywhere. Now in polar coordinates
$ds^2 = a^2 \left( d\theta^2 + \sin^2\theta \, d\phi^2 \right)$
so at the poles ($\theta = 0, \pi$), we have $g_{\phi\phi} = 0$. But there is nothing special about the poles. It is
only a coordinate problem - the polar coordinates become degenerate at the poles. It is easy
to find another set of coordinates which is regular there, e.g., a set of polar coordinates with
the poles defined in some other place.
As a second example, consider a plane:
so that
Although we have $g_{XX} = 0$ and $g_{YY} = \infty$ at various places (e.g., the origin in the X-Y plane),
the surface is actually perfectly smooth. The only problem is with the coordinates.
In just the same way, the problem at $r = R_0$ is only a singularity of the coordinate
system, not of spacetime itself. If this is the case, then one should be able to transform to
a new set of coordinates in which there is no singularity - called the Kruskal coordinates.
We shall not write out these coordinates. But we emphasize that the transformation between
regular coordinates (Kruskal) and singular coordinates (Schwarzschild) must itself be singular
- because the regular transform of regular coordinates must be regular.
There are two other ways of seeing whether there is a real problem at $r = R_0$.
• Calculate the physical components of the curvature tensor. We would find that they are finite
at $r = R_0$. (So long as they are finite, there is no problem.)
• Calculate the motion of a particle into a BH, and show that nothing singular happens at
$r = R_0$.
The former is a rather tedious calculation and we shall omit it. The latter will be done in the
next Section, just to prove this point.
15.6 Motion into a black hole
For simplicity, consider only radial motion, so that $d\theta = d\phi = 0$. Because the metric is
independent of t , we have
$p_t = \text{constant} \equiv -\bar{E}$
Thus
This means that there is infinite time dilation as the particle approaches the event horizon.
Each unit of proper time $d\tau$ becomes an infinite amount of time $dt$ according to a distant
observer. (We know that t is the time measured by a distant observer, because as $r \to \infty$, the
metric becomes the standard flat Minkowski space, where t is obviously the time.)
Now from the metric
In other words, it takes only a finite proper time $\tau$ to fall through the event horizon.
From $dr/d\tau \sim 1$ and $dt/d\tau \sim 1/\epsilon$ we get
Thus $\epsilon \to 0 \Rightarrow t \to \infty$, i.e., it takes an infinite amount of t to fall through the event horizon.
To summarize, the process of falling through the event horizon takes finite time according
to a comoving observer, but to an observer at infinity, it seems to take forever - the motion
is "slowed down". In fact, this is the same effect as the redshift - what takes one period of a
photon still appears as one period to the distant observer, but the period is infinitely redshifted.
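The contrast between the two times can be made concrete. The sketch below assumes the simplest case of a particle dropped from rest at infinity ($\bar{E} = m$), for which the standard results are $(dr/d\tau)^2 = R_0/r$ and $dt/d\tau = 1/B$ with $B = 1 - R_0/r$; these formulas, the units ($c = 1$, $R_0 = 1$) and the function names are assumptions made for illustration.

```python
import math

def proper_time(r_start, r_end, r0=1.0):
    """Proper time to fall from r_start to r_end, from d(tau)/dr = -sqrt(r/r0)."""
    return (2.0 / 3.0) * (r_start**1.5 - r_end**1.5) / math.sqrt(r0)

def coordinate_time(r_start, eps, r0=1.0, steps=20000):
    """Coordinate time t to fall from r_start to r = r0 * (1 + eps).

    Integrates dt/dr = -sqrt(r/r0) / (1 - r0/r) with the midpoint rule in the
    variable u = ln(r - r0), which absorbs the 1/(r - r0) factor.
    """
    u_lo = math.log(eps * r0)
    u_hi = math.log(r_start - r0)
    du = (u_hi - u_lo) / steps
    total = 0.0
    for k in range(steps):
        r = r0 + math.exp(u_lo + (k + 0.5) * du)
        total += math.sqrt(r / r0) * r * du
    return total

print("proper time from r = 10 to the horizon:", proper_time(10.0, 1.0))
for eps in (1e-2, 1e-4, 1e-8):
    print(f"eps = {eps:.0e}   coordinate time = {coordinate_time(10.0, eps):.3f}")
```

The proper time is finite, while the coordinate time grows like $R_0 \ln(1/\epsilon)$ as the stopping point $r = R_0(1 + \epsilon)$ approaches the horizon.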
inside the event horizon (Figure 6). Look at a tiny part of the star near the surface. It
is just like one single particle; and by the previous argument, must keep falling to r = 0.
This argument applies to every part of the surface of the star.
In general, one can show rigorously that even if the star is not spherically symmetric, the
above result still holds, and the whole star collapses to r = 0.
Figure 6
So ultimately all matter falls to r = 0. Thus we have infinite density and a real singu-
larity. This is the BH singularity.
Naked singularity?
On the one hand, we know there must be singularities in general relativity (the spherically
symmetric BH is one example). On the other hand we do not like singularities in physics -
we believe all physical quantities should be finite. In this example there is an escape: the
singularity exists but cannot be seen from the outside. We say that the singularity is "clothed" and
not "naked". There is a question whether general relativity allows naked singularities. The
question is not completely settled.
Black holes have no hair
The star inside the event horizon may be very complicated. But we cannot see any of the
complications from the outside. How much can we tell from the outside? It has been shown that
we can tell only three properties - the mass M , the charge Q and the angular momentum J.
This is called the property that "black holes have no hair", i.e., no complications or "extraneous"
properties.
Figure 1
Metric?
Can we use the metric $g_{\mu\nu}$? Can we say that a space is flat if $g_{\mu\nu} = \delta_{\mu\nu}$ (if it is locally Euclidean)
or $g_{\mu\nu} = \eta_{\mu\nu}$ (if it is locally Minkowski), or more generally if $g_{\mu\nu}$ is constant? No! Consider
polar coordinates in two dimensions:
Figure 2
As an example, consider a triangle on the surface of the earth (Figure 2). Let
Exterior angles
Equivalently, consider the exterior angles. Refer to Figure 3. (All such diagrams project
a possibly curved surface onto a flat piece of paper, so do not rely on "normal" geometric
intuition.) In obvious notation,
Figure 3
Ext $\angle P = \pi - \angle P$
Ext $\angle M = \pi - \angle M$
Ext $\angle N = \pi - \angle N$
$\sum \text{Ext}\,\angle = 3\pi - \sum \text{Int}\,\angle$ (16.2)
Thus, the sum of interior angles is $\pi$ if and only if the sum of exterior angles is $2\pi$.
Generalize to polygon
We generalize this idea to a polygon of N sides (Figure 4): the space is flat if and only if the
sum of exterior angles $\alpha_i$ is $2\pi$.
Figure 4
This idea can be given another interpretation. Imagine that we march along the polygon.
We turn by $\alpha_i$ at each vertex, for a total of $\sum_i \alpha_i$. Do we turn exactly $2\pi$? In Figure 2, the
answer is no - we turn a total of $3\pi/2$ in going round the triangle MNP.
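The turning-angle test can be checked numerically. The sketch below assumes that the triangle MNP of Figure 2 has three right angles (an equator segment plus two meridians 90 degrees apart), which is what a total turning of $3\pi/2$ implies. It also uses Girard's theorem, a standard result not derived in these notes, to relate the shortfall from $2\pi$ to the enclosed area. Function names are illustrative.

```python
import math

def exterior_angle_sum(interior_angles):
    """Total turning when marching once around a polygon."""
    return sum(math.pi - angle for angle in interior_angles)

def spherical_triangle_area(interior_angles, a=1.0):
    """Girard's theorem: area = a^2 * (sum of interior angles - pi)."""
    return a**2 * (sum(interior_angles) - math.pi)

flat_triangle = [math.pi / 3] * 3   # equilateral triangle in a plane
octant = [math.pi / 2] * 3          # assumed shape of triangle MNP

print("flat  :", exterior_angle_sum(flat_triangle) / math.pi, "pi")
print("sphere:", exterior_angle_sum(octant) / math.pi, "pi")
print("shortfall from 2 pi:", 2 * math.pi - exterior_angle_sum(octant))
print("area / a^2         :", spherical_triangle_area(octant))
```

The shortfall from $2\pi$ equals the enclosed area divided by $a^2$, one eighth of the sphere in this case, so it is a direct measure of curvature.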
We can further generalize. By letting $N \to \infty$, the path can become any closed curve.
This qualitative discussion leads us to the key idea of parallel transport of vectors.
Figure 5
Refer to Figure 5. Initially, at the point M, the rod $\vec{A}$ and the side $\vec{s}_1$ make an angle
$\angle(\vec{A}, \vec{s}_1)$. At the point N, the side turns from $\vec{s}_1$ to $\vec{s}_2$, through an angle $\alpha_1$. Then the rod
makes an angle $\angle(\vec{A}, \vec{s}_2)$ given by
Going around the triangle, we have
To be more precise, let us denote the final direction as $\vec{A}'$, so (16.4) should more properly be
written as
Example
Return to the example in Figure 2. The situation is illustrated in Figure 6.
Figure 6 (panels: Step 1, Step 2, Step 3)
• Start at M with the vector $\vec{A}$ pointing north.
• March along the equator to N. The rod $\vec{A}$ remains always at 90 deg to the path. When
we get to N, the rod makes 90 deg with the path (the equator), so it is pointing north,
towards the pole.
• Turn to walk along NP. The rod started pointing north (towards P) and stays pointing
north.
• At P, turn to walk along PM. The rod was along NP, which is now pointing west. As
we walk along, the rod stays pointing west. When we get back to M, the rod points west.
Thus we have (Figure 7)
$\vec{A}$ = north
$\vec{A}'$ = west
Figure 7
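The three steps above can be reproduced numerically. The sketch below embeds the unit sphere in 3D and uses the fact that parallel transport along a great-circle arc is the rigid rotation that carries the start point to the end point. The coordinates chosen for M, N and P, the convention that N lies east of M, and all function names are assumptions made for illustration.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rotate(v, axis, angle):
    """Rodrigues' formula: rotate v about the unit vector axis by angle."""
    c, s = math.cos(angle), math.sin(angle)
    kxv, kdv = cross(axis, v), dot(axis, v)
    return tuple(v[i] * c + kxv[i] * s + axis[i] * kdv * (1.0 - c) for i in range(3))

def transport(vec, start, end):
    """Parallel transport vec along the great-circle arc from start to end."""
    axis = cross(start, end)
    norm = math.sqrt(dot(axis, axis))
    axis = tuple(x / norm for x in axis)
    angle = math.atan2(norm, dot(start, end))
    return rotate(vec, axis, angle)

# M and N on the equator, 90 degrees apart; P is the north pole.
M, N, P = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
north_at_M = (0.0, 0.0, 1.0)
west_at_M = (0.0, -1.0, 0.0)   # pointing away from N, taking N to be east of M

A = north_at_M
for start, end in ((M, N), (N, P), (P, M)):
    A = transport(A, start, end)
print("final vector at M:", tuple(round(x, 12) for x in A))
```

The vector starts as north and returns as west, a rotation by $\pi/2$, which is the same as the shortfall of the total turning angle from $2\pi$.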
A space (manifold) is flat if and only if the parallel transport of every vector around
every closed loop gives back the original vector $\vec{A}$, i.e., if and only if
Figure 8
and the question is whether this is zero. So, instead of following $\vec{A}$ around a closed loop, we
follow its component $A^\mu$ and compute the total change.
Reduce to small loops
As with the usual proof of Stokes' theorem, the above computation can be reduced to checking
small (i.e., infinitesimal) loops, as indicated schematically in Figure 9.
Figure 9
where $\xi$, $\eta$ are infinitesimal displacements, hence vectors (Figure 10). We now take a vector
around the loop: $A \to B \to C \to D \to A$, and consider $\Delta A^\mu$. This quantity must be
proportional to the vector itself, and to the sides $\xi$ and $\eta$:
Figure 10
16.3 Riemann curvature tensor - calculation
Definition
We are led to the definition of the Riemann curvature tensor, which carries 4 indices:
Tensor character
We have already stated that $\Delta A$ is a vector (because it is the difference between two vectors at
the same point). On the right hand side of (16.9), $A$ is a vector, and so are the infinitesimal
displacements $\xi$ and $\eta$. Therefore by the contraction theorem, $R^\mu{}_{\nu\rho\sigma}$ is a tensor, more specifically
a $\binom{1}{3}$ tensor.
Crucially, this implies that under a coordinate transformation, $R^\mu{}_{\nu\rho\sigma}$ transforms linearly
- in particular, if it is zero in one coordinate system, it is zero in all coordinate systems. So
whether the curvature tensor is zero is an objective property independent of the coordinate
system; it is a property of the space itself. Contrast with the property $g_{\mu\nu} = \text{const}$; this is not
an objective property of the space itself.
The indices
We note several properties about the indices.
• The last two indices $\rho$ and $\sigma$ refer to the displacements. Therefore they relate to the
underlying manifold and run over 0,1,2,3 in the case of spacetime.
• The first two indices $\mu$ and $\nu$ refer to the vector being transported. In this case, the vector
$\vec{A}$ is a tangent vector, similar to the displacements, so the indices again run over 0,1,2,3
in the case of spacetime. In other applications, we can consider the parallel transport
of other vectors, e.g., a complex number (a quantum-mechanical wavefunction), which
would have only two directions (1 = real, 2 = imaginary). In such cases, the range of
the $\mu$, $\nu$ indices could be different.
• Because $\mu$ and $\nu$ are the same type of index, it is often convenient to put them both
"together" as subscripts:
Evaluate in terms of metric
Remember that we need to calculate the total change of $A^\mu$ around a closed loop - we deal
with just one component and not the vector. The total change is the sum of the changes in
each small step:
But we know that under parallel transport,
where in $C_\rho$ we have suppressed the index $\mu$. Now, similar to the usual derivation of Stokes'
theorem (see Appendix), the line integral around the loop in Figure 10 can be changed to a
surface integral, namely
But since
$C_\rho = \Gamma^\mu{}_{\nu\rho} A^\nu$ (16.15)
we find
This expression contains the derivative of $A^\nu$, which we eliminate via (16.12), now written
as
$0 = (DA)^\nu = \left( A^\nu{}_{,\sigma} + \Gamma^\nu{}_{\lambda\sigma} A^\lambda \right) dx^\sigma$
giving
$A^\nu{}_{,\sigma} = -\Gamma^\nu{}_{\lambda\sigma} A^\lambda$
When this is put into (16.17), we find
where in the last step we have interchanged the dummy indices $\lambda \leftrightarrow \nu$ in the second term.
Finally, comparing (16.19) with the definition of the curvature tensor in (16.9), we find
$\Gamma \sim \partial g$ (16.21)
and the curvature tensor is obtained by another differentiation as well as a nonlinear term
$R \sim \partial\Gamma + \Gamma\Gamma \sim \partial\partial g + \text{nonlinear}$
The important property - to be proved later - is that the space is flat if and only if
the curvature tensor is identically zero.
Tensor character
We have already stated that from (16.9), $R^\mu{}_{\nu\rho\sigma}$ is a tensor. Thus, if it vanishes in one coordinate
system, it vanishes in every coordinate system.
To prove this, consider the scalar $S = g_{\mu\nu} A^\mu B^\nu$, constructed out of two vectors $A$ and
$B$. We parallel transport both vectors and hence S. Clearly, a scalar does not change under
parallel transport, so
There is no $\Delta g_{\mu\nu}$ because when we complete the loop, $g_{\mu\nu}$ returns to the original value. There
is also no $\Delta A^\mu \Delta B^\nu$ term because we are entitled to consider a small loop, and hence calculate
only first-order changes.
Now use the curvature tensor to express the changes in the vectors upon parallel trans-
port:
where we have lowered the first index in the curvature tensor using the factor $g_{\mu\nu}$. Finally,
change the dummy variable by $\lambda \leftrightarrow \mu$ in the first term and $\lambda \leftrightarrow \nu$ in the second term:
Since this has to be zero for arbitrary $A$, $B$, $\xi$ and $\eta$, the square bracket is zero, thus proving
(16.23).
Symmetry in pairs
We state without proof that the curvature tensor is symmetric under the interchange of the
first pair of indices with the second pair:
Cyclic property
We also state without proof the cyclic property: fix the first index and add the cyclic permu-
tations of the other three, the result is zero:
Bianchi identity
Finally we state without proof an identity concerning the derivative of the curvature tensor. If
we perform a (covariant) differentiation on the curvature tensor, we get 5 indices: $R^\mu{}_{\nu\alpha\rho;\tau}$. Fix
the first 2 indices and add the cyclic permutation of the other 3; the result is zero:
dimension | components
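The number of independent components of the curvature tensor in each dimension follows from the symmetries listed above. The sketch below uses the standard counting formula $n^2(n^2 - 1)/12$, quoted here as a known result rather than derived in these notes; the function name is illustrative.

```python
def riemann_independent_components(n):
    """Independent components of the Riemann tensor in n dimensions."""
    return n * n * (n * n - 1) // 12

print("dimension | components")
for n in (1, 2, 3, 4):
    print(f"{n:9d} | {riemann_independent_components(n)}")
```

In 2 dimensions there is a single independent component, which is why the examples below need only the (12)(12) element.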
Ricci tensor
Thus we define the Ricci tensor
$R_{\nu\rho} = R^\mu{}_{\nu\mu\rho}$
$= g^{\mu\sigma} R_{\sigma\nu\mu\rho}$
Symmetry
It is readily shown that the Ricci tensor is symmetric (interchanging the two indices $\nu$ and $\rho$
above is equivalent to interchanging the 1st and 3rd index, but these are identical and summed):
Curvature scalar
We can further construct a scalar by contracting the two indices in the Ricci tensor:
16.6 Examples
Polar coordinates in 2D
We take polar coordinates in flat 2D space, with $(x^1, x^2) = (r, \phi)$, and the metric
We now consider the curvature tensor $R_{\mu\nu\rho\sigma}$. The only nontrivial element is $(\mu\nu) = (12)$
and $(\rho\sigma) = (12)$:
We should have known this beforehand, without having to do any calculations, for the
following reason.
• This space is flat.
• So there is a Cartesian coordinate system.
• In that system, the curvature tensor is $R^\mu{}_{\nu\rho\sigma} = 0$ because $\partial g = 0$.
• Now change to polar coordinates. $R^\mu{}_{\nu\rho\sigma}$ transforms linearly and stays zero in the new
coordinate system.
In other words, our explicit computation in (16.37) verifies that $R^\mu{}_{\nu\rho\sigma} = 0$ is an objective
property independent of coordinate system.
$g_{11} = a^2, \quad g_{22} = a^2 \sin^2\theta, \quad g_{12} = 0$ (16.39)
and the nonzero elements of the Christoffel symbol are
This is not zero! This proves that this space is not flat - which is of course correct.
Although whether $R_{\mu\nu\rho\sigma} = 0$ is coordinate-independent, the actual value (if it is not
zero) is coordinate-dependent. Therefore we have to be careful and not say that the "amount
of curvature" goes as $a^2$ (as might seem to be the case from (16.41)). Such a conclusion is
counter-intuitive and clearly wrong - if $a \to \infty$, the space becomes nearly flat rather than
more curved. We need a coordinate-independent measure to indicate the "amount of curvature";
this will be shown below.
Next we calculate the Ricci tensor in this example.
Note that R is a constant - this is expected since this space is homogeneous, and every
point is equivalent. A related property is that $R_{\mu\nu} \propto g_{\mu\nu}$. A heuristic way to understand this
is as follows. Try to separate the space itself from the coordinates, i.e., the grid that we choose
to draw on the space. For the space itself (without the grid), there are no preferred or special
directions; thus directions are distinguished only by the grid. Thus, the tensor indices of $R_{\mu\nu}$
must be carried by the indices related to the coordinate system, namely $g_{\mu\nu}$.
Since R is independent of coordinates, it is a useful overall measure of the "amount of
curvature". We see that in this example, it goes as $a^{-2}$, which is sensible - as $a \to \infty$, the
space becomes nearly flat.
We shall apply the concept of curvature to physics in the next Chapter. But for now,
the important point is this: given the metric, we can calculate the curvature tensor.
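The statement "given the metric, we can calculate the curvature tensor" can be illustrated with a small numerical sketch for 2D metrics. It assumes the common sign convention $R^\rho{}_{\sigma\mu\nu} = \partial_\mu\Gamma^\rho{}_{\nu\sigma} - \partial_\nu\Gamma^\rho{}_{\mu\sigma} + \Gamma^\rho{}_{\mu\lambda}\Gamma^\lambda{}_{\nu\sigma} - \Gamma^\rho{}_{\nu\lambda}\Gamma^\lambda{}_{\mu\sigma}$ with the Ricci tensor formed by contracting the 1st and 3rd indices; the overall sign may differ from the convention of these notes, but whether the result vanishes does not. Derivatives are taken by finite differences, and all function names are illustrative.

```python
import math

def inverse2(g):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def christoffel(metric, x, h=1e-4):
    """Gamma[a][b][c] = Gamma^a_{bc}, from central differences of the metric."""
    n = 2
    dg = []  # dg[k][i][j] = d g_{ij} / d x^k
    for k in range(n):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)]
                   for i in range(n)])
    ginv = inverse2(metric(x))
    return [[[0.5 * sum(ginv[a][d] * (dg[b][d][c] + dg[c][d][b] - dg[d][b][c])
                        for d in range(n))
              for c in range(n)] for b in range(n)] for a in range(n)]

def curvature_scalar(metric, x, h=1e-3):
    """Curvature scalar R = g^{sv} R^r_{s r v} at the point x."""
    n = 2
    gam = christoffel(metric, x)
    dgam = []  # dgam[m][a][b][c] = d Gamma^a_{bc} / d x^m
    for m in range(n):
        xp, xm = list(x), list(x)
        xp[m] += h
        xm[m] -= h
        gp, gm = christoffel(metric, xp), christoffel(metric, xm)
        dgam.append([[[(gp[a][b][c] - gm[a][b][c]) / (2 * h) for c in range(n)]
                      for b in range(n)] for a in range(n)])

    def riemann(r, s, m, v):
        quad = sum(gam[r][m][l] * gam[l][v][s] - gam[r][v][l] * gam[l][m][s]
                   for l in range(n))
        return dgam[m][r][v][s] - dgam[v][r][m][s] + quad

    ginv = inverse2(metric(x))
    return sum(ginv[s][v] * riemann(r, s, r, v)
               for r in range(n) for s in range(n) for v in range(n))

def polar_metric(x):            # x = (r, phi), flat plane
    return [[1.0, 0.0], [0.0, x[0] ** 2]]

def sphere_metric(x, a=2.0):    # x = (theta, phi), sphere of radius a
    return [[a ** 2, 0.0], [0.0, (a * math.sin(x[0])) ** 2]]

print("plane :", curvature_scalar(polar_metric, [1.5, 0.3]))
print("sphere:", curvature_scalar(sphere_metric, [1.0, 0.3]))
```

Up to finite-difference error, the plane gives zero even though its metric components are not constant, while the sphere of radius a = 2 gives $2/a^2$ in this convention, independent of the point chosen.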
$\vec{e}_1(P), \ldots, \vec{e}_N(P)$
whose dot products are
Since Q is arbitrary, we have constructed a global orthogonal Cartesian system. If the space
has such a system, it must be flat.
16.8 Einstein tensor
Introduction
As we have said before, the field equation will take the form
Conservation
To review the concept of conservation, recall charge conservation in flat space, in obvious
notation:
The analogous statement in curvilinear coordinates (or on curved space) is obtained by simply
changing the derivatives to covariant derivatives:
$J^\mu{}_{;\mu} = 0$
This explains that (16.48) is the analog of such a conservation law.
$R^\mu{}_{\nu\alpha\rho;\tau} + R^\mu{}_{\nu\tau\alpha;\rho} + R^\mu{}_{\nu\rho\tau;\alpha} = 0$
Set $\alpha = \mu$ and sum, and recall the definition of the Ricci tensor:
We define the Einstein tensor to be -(1/2) times the quantity in the brackets:
$G^\mu{}_\tau = R^\mu{}_\tau - \frac{1}{2} R\, \delta^\mu{}_\tau$
or lowering the $\mu$ index
Appendix
Stokes' theorem should be familiar. In this Appendix, we prove it in a form that is valid in any
dimension. We consider the integral $\oint C_\rho \, dx^\rho$ over the "rectangle" in Figure 11:
Figure 11
where we have omitted component indices, i.e., x stands for the point whose coordinates are $x^\mu$.
Also, we have drawn this as a rectangle just to emphasize the similarity with the situation that
you are familiar with, but there is actually no need for $\vec{\xi}$ and $\vec{\eta}$ to be perpendicular; compare
Figure 10.
Then the integral is obtained by adding the 4 contributions. For each contribution, we
evaluate (schematically)
$C \cdot \Delta x$ (16.58)
where C is the value at the mid-point of the segment and $\Delta x$ is the displacement. Thus
In the above, after a factor of $\xi\eta$ has been explicitly taken out, we can evaluate C anywhere in
the small rectangle, to the order of accuracy required. In the last line, we have interchanged
dummy indices $\rho \leftrightarrow \sigma$.
Likewise the 2nd and 4th terms together give
Compare electromagnetism
Einstein's general theory of relativity can be viewed from two perspectives: as a description
of spacetime curvature, or as a theory of gravity. We start from the latter point of view, and
compare the theory of gravity with electromagnetism (em). Both em and gravity consist of two
parts.
Part 1
First, we describe how the fields (assumed to be given) act on the particles. In em, this is given
by the Lorentz force law:
or in covariant notation
where we have indicated schematically that the force goes as (charge) x (field) x (4-velocity).
The parallel statement for gravity is
The force goes as the second power of the 4-velocity, and the field $\Gamma$ carries one more index.
Otherwise, the force law is quite similar to em.
In both cases, if the fields are given, the motion of point particles is in principle deter-
mined. We have already discussed some examples in Chapters 14 and 15.
Part 2
Secondly, we have to say how the particles (more generally the sources) generate the fields.
Let us trace the development in the case of em. We start with Coulomb's law:
This can be written in terms of Poisson's equation for the electrostatic potential $\Phi$:
where p is the charge density. Going from (17.4) to (17.5) involves no new physics.
However, both Coulomb's law (17.4) and Poisson's equation (17.5) are only valid for
statics. When charges move, the full description is given by Maxwell's equations (which we shall
not display). Going from Coulomb's law to Maxwell's equations does involve new physics, e.g.:
• While Coulomb's law gives the electric effect of static charges, Maxwell's equations also
give the magnetic effect of moving charges.
• Maxwell's equations also take care of time delay - to be further discussed below.
In much the same way, gravity starts with Newton's inverse-square law:
which can again be written in terms of Poisson's equation for the gravitational potential $\Phi$:
where p is now the mass density. Going from (17.6) to (17.7) also involves no new physics.
As with em, Newton's theory of gravity is only valid for static masses, and we need a
generalization (similar to Maxwell's equations) which has the following properties:
• While Newton's law gives the "electric" effect of static masses, the new theory also gives
the "magnetic" effect of moving masses.
• The new theory will also take care of time delay - to be further discussed below.
We use the terms "electric" and "magnetic" to denote forces that are velocity-independent and
velocity-dependent respectively.
The new theory is given by Einstein's field equations - the analog of Maxwell's equations
for em. Both Maxwell's equations and Einstein's equations are partial differential equations
(PDEs). However, there is one major difference: Maxwell's equations are linear PDEs, but
Einstein's equations are nonlinear PDEs. We shall explain the physical reason behind this
later. Thus, to summarize:
There is one historical difference. In the case of em, the magnetic part (e.g., Ampere's
law) was discovered experimentally, and the overall theoretical framework developed later. In
the case of gravity, the overall theoretical framework was developed first, and the "magnetic"
part etc. were predicted from the theory. There is a reason for this, which we shall come to
later.
No nonlinear effect
This problem is peculiar to gravity and has no counterpart in em, and we give a heuristic
argument here. Let there be a mass M. It creates a gravitational field $g \propto GM$. This
field carries a field energy $U \propto (1/G) g^2 \propto G M^2$. (Recall that in em, the field energy is
Figure 1
Let us explain the delay effect in more detail, by considering the em case. Figure 1 shows a
test charge q at a distance from a source Q; the force felt by q is
where r is the instantaneous separation. Let us move Q a little bit. Then r changes immediately,
and F changes immediately, so q feels the change immediately. But we know that signals
propagate at most at the speed of light, so q should not feel any change until a time t = r/c
later. Newton's inverse-square law of gravity suffers from exactly the same problem.
This problem is solved by introducing the concept of fields. Instead of (17.9) stating
the force, we consider the electric field E. In pictorial terms, this means we draw the field lines
(Figure 2).
Figure 2
$\partial \mathbf{E} / \partial t$ = spatial derivatives
$\partial \mathbf{B} / \partial t$ = spatial derivatives
We shall not write out the RHS, but simply note that a spatial differentiation tells us about the
field "at the next point" in space. Since the effect propagates only "to the next point", it takes
a finite time to propagate a finite distance, and there is delay. This situation can be understood
pictorially by imagining the field lines to be like a string, and changes to be like "bumps" that
propagate along the strings with a finite speed (Figure 3).
Figure 3
In short, fields and PDEs automatically introduce delay, and ensure consistency with
causality.
Motivate form of equations
What should be the form of Einstein's equations? Start with the Newtonian theory, i.e., (17.7),
which we know to be basically correct. It has the form
$\partial\partial\Phi \sim \rho$
Now recall that for weak fields,
Note that the 1 in (17.12) does not contribute, which is physically correct - gravity relates to
the departure of the metric from the Minkowski metric.
The Christoffel symbol $\Gamma$ and the curvature tensor R are related by
So, up to nonlinear terms (which in any case we would not be able to guess from Newtonian
theory), we expect that the LHS of the Newtonian equation is replaced by
and the LHS of Einstein's equations involves the curvature tensor R (or something constructed
out of it, but without further differentiations).
On the RHS would be the source: the mass density p in Newtonian theory, and something
related to energy-momentum in relativity. Thus, we expect
We next discuss the source, which will also tell us what exactly is on the LHS.
source $\sim P^\mu$ (17.17)
Count components in em
In em, the source is the charge Q (1 global conserved quantity). Locally, it gives rise to
• The charge density $\rho$ (Q per unit volume) - 1 local quantity.
• The charge flux or current $J^j$ (flow of Q per unit area per unit time, in a certain direction
j) - 3 local quantities.
The 4 local quantities form a 4-vector
$J^\mu = (\rho, \mathbf{J})$ (17.18)
Then the field also has 4 components, namely the 4-vector potential $A^\mu$, and Maxwell's
equations take the schematic form
The model
If the energy and momentum are due to a collection of particles, then
where
n = number density
$P^\mu$ = 4-momentum of particles
$E = P^0$ = energy of the particles
all calculated at a certain point in spacetime, with $\langle \cdots \rangle$ denoting an average.
For example,
(since $P^0 = E$) which is clearly the correct expression for the energy density. The derivation
for the other components is given in the Appendix.
We note that T is symmetric:
which is in fact a general property. It says, e.g., that the energy flux is equal to the momentum
density.
Problem 1
Consider the following situation. A box of size L3 contains N particles of mass m, all moving
along the 1-axis with speed v. (a) What is the energy density? (b) What is the 1-component
of the momentum density? (c) How much energy crosses the 2-3 plane per unit time? Hence
calculate the 1-component of the energy flux. (d) Hence show that $T^{01} = T^{10}$.
Because the number density n and the energy E are both the 0-components of vectors,
they transform in the same way, and the combination n/E is invariant (Appendix). Thus $T^{\mu\nu}$
transforms like $P^\mu P^\nu$, i.e., like a $\binom{2}{0}$ tensor.
We now consider three sub-models:
• Dust, for v = 0
• Perfect fluid
• Radiation, for v = c
The dust model consists of particles that are not moving, or which have negligible velocities.
Thus v = 0 and the spatial components are $P^i = 0$. On the other hand, $P^0 = E = m$ (in units
where c = 1). There is no flow of energy since the particles are not moving. Thus
In other words, there is only energy density, no momentum density, no energy flux, no momen-
tum flux.
Perfect fluid
This model consists of particles moving at velocities v. For simplicity we assume that the fluid
as a whole is not in motion. (Otherwise we consider the following in the local rest frame and
perform a Lorentz transformation.)
The 00 component is simply the energy density, to be denoted as $\rho$ (see (17.25)).
Since the fluid is at rest, there is no momentum and no energy flux: $T^{i0} = T^{0i} = 0$. So
it remains to calculate $T^{ij}$. Recall that $P^i = m\gamma v^i$ and $E = m\gamma$, so
First, this vanishes if $i \neq j$. For example $\langle m\gamma v^1 v^2 \rangle = 0$ because the system must be invariant
under the reversal of the 1-axis, $v^1 \to -v^1$. Secondly, we claim that all the diagonal entries are
just the pressure:
Figure 4
To prove this, consider the usual derivation of the pressure in kinetic theory (Figure
4). Each collision of a particle with a wall (in the 2-3 plane) delivers a momentum $2P^1$. The
particle has to travel a distance of 2L before the next collision with the same wall, so the time
between collisions is $2L/v^1$, and the frequency of collisions is $v^1/(2L)$. This must be multiplied
by the number of particles in the box, namely $N = nL^3$, and the total force is delivered on a
wall of area $L^2$. Thus the pressure is
p = no. of particles x frequency of collisions x momentum delivered x (1/area)
Since the speeds of the particles are different, the above derivation should refer to the average.
Thus, we get exactly (17.28). In short,
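$T = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & p & 0 & 0 \\ 0 & 0 & p & 0 \\ 0 & 0 & 0 & p \end{pmatrix}$
However, $\rho$ and $p$ are not independent; they are related by an equation of state: $p = p(\rho)$,
or both in terms of the temperature $\theta$: $\rho = \rho(\theta)$, $p = p(\theta)$.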
In fact, the dust model can be regarded as a special case of the perfect fluid model, with
p = 0.
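The kinetic-theory argument for the diagonal entries can be checked with numbers. The sketch below assumes a box in which every particle moves along the 1-axis with the same speed, and compares the wall-collision count with the expression $n P^1 P^1 / E$ (using $v^1 = P^1/E$ in units with c = 1). The numerical values and function names are illustrative assumptions.

```python
import math

def pressure_from_collisions(N, L, p1, v1):
    """p = (no. of particles) x (collision frequency) x (momentum delivered) / area."""
    frequency = v1 / (2.0 * L)        # collisions per unit time with one wall
    momentum_delivered = 2.0 * p1     # per collision
    return N * frequency * momentum_delivered / L**2

def pressure_from_tensor(n, p1, energy):
    """T^11 = n P^1 P^1 / E when all particles share the same |P^1| and E."""
    return n * p1 * p1 / energy

m, v, L, n = 1.0, 0.6, 3.0, 2.0       # mass, speed (c = 1), box size, number density
gamma = 1.0 / math.sqrt(1.0 - v**2)
p1, energy = m * gamma * v, m * gamma
N = n * L**3

print(pressure_from_collisions(N, L, p1, v))
print(pressure_from_tensor(n, p1, energy))
```

The two expressions agree because the box size L cancels, leaving $p = n P^1 v^1$.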
Problem 2
Consider a parcel of fluid moving with overall velocity V along the 1-direction. In its own rest
frame, the energy-momentum tensor is given by (17.32). By using a Lorentz transformation,
find the energy-momentum tensor in the lab frame.
Radiation
In this model, the particles are photons, or other massless particles (e.g., neutrinos). It also
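applies to situations where massive particles are moving very rapidly, so that $P \approx E$.
As before $T^{00} = \rho$, and $T^{ij} = 0$ if $i \neq j$. By isotropy,
$T^{11} = T^{22} = T^{33} = \frac{1}{3}\, n \left\langle \sum_i \frac{P^i P^i}{E} \right\rangle = \frac{1}{3}\, n \langle E \rangle = \frac{\rho}{3}$
since $\sum_i P^i P^i = P^2 = E^2$ (the last step using the fact that the mass is zero or negligible).
So for the tensor as a whole
$T = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & \rho/3 & 0 & 0 \\ 0 & 0 & \rho/3 & 0 \\ 0 & 0 & 0 & \rho/3 \end{pmatrix}$ (17.34)
In fact, the radiation model is a special case of the perfect fluid model with $p = \rho/3$.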
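The dust and radiation tensors can both be generated from the particle model. The sketch below assumes the form $T^{\mu\nu} = n \langle P^\mu P^\nu / E \rangle$, which is consistent with the transformation argument given earlier, and represents an isotropic photon gas by six photons moving along the coordinate axes. The sample momenta and the function name `stress_tensor` are illustrative assumptions.

```python
def stress_tensor(n, momenta):
    """T^{mu nu} = n <P^mu P^nu / E>, averaged over 4-momenta (E, Px, Py, Pz)."""
    count = len(momenta)
    return [[n * sum(P[mu] * P[nu] / P[0] for P in momenta) / count
             for nu in range(4)] for mu in range(4)]

E = 1.0
# Massless particles (|P| = E) moving along +/- each axis: a crude isotropic gas.
photons = [(E, E, 0.0, 0.0), (E, -E, 0.0, 0.0),
           (E, 0.0, E, 0.0), (E, 0.0, -E, 0.0),
           (E, 0.0, 0.0, E), (E, 0.0, 0.0, -E)]
# Particles at rest with mass m = 3 (P^0 = E = m, P^i = 0).
dust = [(3.0, 0.0, 0.0, 0.0)]

T_rad = stress_tensor(5.0, photons)
T_dust = stress_tensor(2.0, dust)
for row in T_rad:
    print(row)
```

For the photon gas the diagonal spatial entries come out as one third of $T^{00}$, while for dust only $T^{00}$ is nonzero.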
Curved spacetime
The above calculations were implicitly performed in flat spacetime, but the results apply to
curved spacetime as well. The chain of argument is straightforward. (a) Each little piece of
spacetime can be regarded as flat. We do the calculation in this small piece of flat spacetime,
using Cartesian coordinates. (b) We transform to generalized coordinates locally. The equations
keep the same forms because both sides are tensors.
Embedding space
The above is also valid if the curved manifold is embedded in a larger flat space. Let the manifold
be defined by $x^\mu = 0$ for $\mu > N$, i.e., these are the "extra" dimensions. Then the particles
do not move outside the manifold, and we have $P^\mu = 0$ for $\mu > N$. Accordingly, $T^{\mu\nu} = 0$ for
$\mu > N$ or $\nu > N$. We shall use this property below for a simple proof of conservation laws.
where Q is the total charge in the volume, and we have used Gauss' theorem to convert the
second term to a surface integral over the boundary of V. Thus, the rate of increase of Q is
equal to minus the rate of outflow.
Note that $\mu$ is a free index while $\nu$ is summed. To show that (17.39) leads to the conservation
of $P^\mu$ globally, consider
where in the first term we note that $T^{\mu 0}$ is the density of $P^\mu$ and in the second term we note
that $T^{\mu i}$ is the rate of flow of $P^\mu$ per unit area per unit time, in the i direction. In exactly the
same way, this equation can be interpreted as: the rate of increase of $P^\mu$ is equal to minus the
rate of outflow.
Change to curved spacetime
It is easy to convert these statements to curved spacetime, namely
if $\mu > N$ or $\nu > N$. The expression $T^{\mu\nu}{}_{;\nu}$ originally implies summing over $\nu = 1, \ldots, M$; but
in view of (17.43), we can sum over only $\nu = 1, \ldots, N$, i.e., only along the manifold. Thus
$T^{\mu\nu}{}_{;\nu} = 0$ on the manifold.
$\nabla^2 \Phi \sim \rho$ (17.44)
and since changes in $\Phi$ are related to changes in the metric tensor g, we expect
$R \sim T$ (17.46)
Here, the RHS is the energy-momentum tensor, which has the following properties:
• It is a rank 2 tensor, i.e., it carries two indices.
• It is symmetric.
• It is conserved: $T^{\mu\nu}{}_{;\nu} = 0$.
Hence the LHS must be something made from the Riemann curvature tensor, also with these
properties. From the last Chapter, we see that the only choice is the Einstein tensor $G_{\mu\nu}$:
$$G_{\mu\nu} = R_{\mu\nu} - \tfrac{1}{2} R\, g_{\mu\nu}$$
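Thus, we have "derived" Einstein's equation
$$G_{\mu\nu} = K\, T_{\mu\nu}$$
where $K$ is a constant still to be determined. In the weak-field, static limit, the 00 component reduces to
$$-2\nabla^2 \Phi = K\rho$$
But Newton's theory gives
$$\nabla^2 \Phi = 4\pi G \rho$$
Comparing these two then gives $K = -8\pi G$, giving finally
$$G_{\mu\nu} = -8\pi G\, T_{\mu\nu}$$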
Nature of the equations
We now examine some properties of this set of equations.
• The equation (17.54) allows free choices of $\mu$ and $\nu$, and so contains $4 \times 4$ equations.
But because both sides are symmetric in $\mu \leftrightarrow \nu$, there are in fact only 10 independent
equations.
• Recall that $R_{\mu\nu}$ etc. involve derivatives of $g_{\mu\nu}$ etc., up to second order. Thus these are
second-order differential equations for the 10 metric components $g_{\mu\nu}$. Of course these are
PDEs.
• The equations are coupled, i.e., $g_{11}$, $g_{22}$ etc. do not occur separately in different equations.
• The equations are nonlinear. We can see this mathematically in at least two ways. First,
the curvature tensor contains a term quadratic in the Christoffel symbol: $R \sim \partial\Gamma + \Gamma\Gamma$.
Second, even in calculating $\Gamma$, we need the inverse matrix elements $g^{11}$ etc., and these
depend nonlinearly on $g_{11}$ etc.
In short, the Einstein equations are coupled, nonlinear PDEs.
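The count of independent equations can be checked by brute force. The following is a minimal sketch that simply enumerates index pairs of a symmetric rank-2 tensor in 4 dimensions; the names `DIM`, `all_pairs` and `independent_pairs` are illustrative and do not come from the notes.

```python
from itertools import product

DIM = 4  # number of spacetime dimensions

# Every choice of (mu, nu): one equation per pair before using symmetry.
all_pairs = list(product(range(DIM), repeat=2))

# A symmetric tensor has the same (mu, nu) and (nu, mu) component,
# so only pairs with mu <= nu are independent.
independent_pairs = [(mu, nu) for mu, nu in all_pairs if mu <= nu]

num_all = len(all_pairs)
num_independent = len(independent_pairs)

print(num_all, num_independent)
```

The same count follows from the closed form $n(n+1)/2$ with $n = 4$.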
Compare electromagnetism
Maxwell's equations take the form
These are 4 equations for the unknowns $A^\mu$. They are again PDEs, and they are again coupled.
(However, in a suitable gauge, they can be decoupled so that each $A^\mu$ appears by itself without
the other components.) But crucially, Maxwell's equations are linear.
The spatial components of (17.55) describe magnetism: $J^i$ gives rise to $A^i$ and hence
the magnetic field. In just the same way, the spatial components $T^{0i}$ and $T^{ij}$ give rise to the
"magnetic" effects of gravity, i.e., additional gravitational effects caused not by masses, but by
the motion of masses.
Nonlinearity
We wish to understand the physical origin of the nonlinearity in a heuristic way. In em, a charge
$Q$ produces a field $E \propto Q$. The field itself is not charged, so the story ends. Thus $E \propto Q$ and
the theory is linear.
In gravity, a mass $M$ produces a field (or acceleration due to gravity) $g \propto M$. But the
field itself carries energy $U \propto g^2$ and hence mass, $\Delta M \propto U \propto g^2 \propto M^2$, which produces a field
$\Delta g \propto \Delta M \propto M^2$. The story goes on, and the total field is not simply proportional to $M$.
The crucial difference is: the electric field (or the photon) has no charge, but the gravitational
field (or graviton) has mass or energy.
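The heuristic chain above can be mimicked by a toy model. This is only a sketch and in no way a solution of Einstein's equations: the self-consistency condition $g = M + \epsilon g^2$, the coupling `epsilon` and the function name `toy_field` are all assumptions made for illustration, in units where the linear answer is simply $g = M$.

```python
def toy_field(mass, epsilon=0.05, iterations=200):
    """Solve the toy condition g = mass + epsilon * g**2 by fixed-point iteration.

    The term epsilon * g**2 stands in for the extra "mass" carried by
    the field energy U ~ g**2. This is a hypothetical model, not GR.
    """
    g = mass  # zeroth order: the linear answer g ~ M
    for _ in range(iterations):
        # Each pass feeds the field energy back in as an additional source.
        g = mass + epsilon * g * g
    return g


g_one = toy_field(1.0)   # field of a mass M
g_two = toy_field(2.0)   # field of a mass 2M
excess = g_two - 2.0 * g_one  # would vanish in a linear theory

print(g_one, g_two, excess)
```

In this toy model doubling the mass more than doubles the field, so the solution for $2M$ cannot be obtained by scaling the solution for $M$.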
This leads to an important consequence. A linear theory obeys the principle of superposition:
the sum of two solutions is a solution. So all of em can be reduced to point charges
- if we know the effect of one point charge, then for any problem we simply have to add (or
integrate). Thus, there are no really difficult problems in em - we simply add things up by
Coulomb's law (electrostatics), the Biot-Savart law (magnetostatics) or the Liénard-Wiechert
potential (moving charges). In gravity, the situation is totally different. Even if we know the
solution for one point mass, we cannot simply add these up to obtain the solution for two point
masses. If we know the solution for one point mass $M$, we cannot simply multiply by 2 to get
the solution for one point mass $2M$. Therefore very few analytic solutions are known.
In some sense, however, this limitation is now becoming less relevant, because the equations
can be solved numerically on the computer.
Nevertheless, in the next Section, we show two exact solutions that are well known,
and that have been used in previous Chapters - now we finally justify the assumptions made
previously.
Magnetic effect
In em, the magnetic effect is small, but still easy to detect experimentally. To understand this,
first remember that the electric field is produced by charges, but the magnetic field is produced
by currents:
$$E \sim Q, \qquad B \sim Qv$$
where $v$ is the typical velocity of the charges. Now, acting on a test charge $q$, we have
$$\text{Electric force} \sim qE \sim qQ$$
$$\text{Magnetic force} \sim qvB \sim qQv^2$$
Thus, magnetic forces are down by a factor of $v^2 = \beta^2$. (In the above, for simplicity we have
assumed that the speed of the source and the speed of the test charge are of the same order; it is
easy to remove this restriction.)
For example, consider two beams of electrons, moving in parallel but some distance
apart, say at a speed of $v = 300$ m s$^{-1}$, i.e., $\beta = 10^{-6}$. There is a Coulomb force of repulsion,
and also a magnetic force of attraction between two parallel currents. However, the latter is
smaller by a factor $\beta^2 \approx 10^{-12}$. If we really do this experiment, we need at least 12-figure
accuracy to detect the magnetic force.
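The size of the suppression factor is easy to check numerically. The sketch below assumes only the defined SI value of the speed of light; the function name `magnetic_to_electric_ratio` is illustrative.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s, exact by definition of the metre


def magnetic_to_electric_ratio(speed):
    """Return (beta, beta**2) for charges moving at the given speed in m/s."""
    beta = speed / SPEED_OF_LIGHT
    # The magnetic force is smaller than the electric force by roughly beta**2.
    return beta, beta ** 2


beta, ratio = magnetic_to_electric_ratio(300.0)
print(beta, ratio)
```

For $v = 300$ m s$^{-1}$ this gives $\beta$ very close to $10^{-6}$ and a force ratio very close to $10^{-12}$.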
But you have done this experiment (in fact with much smaller values of v) even in high
school - but in a slightly different way. The beams are not in free space, but are electrons
moving in parallel wires. In the wires are positive ions of exactly equal density. Thus, the
net charge in each wire is zero, and the electric effect cancels exactly. This leaves the much
smaller magnetic force as the largest remaining contribution, which is therefore easily detected.
The cancellation of the electric field depends on the existence of opposite charges, and the fact
that the strong Coulomb attraction makes them combine until neutrality is achieved.
However, we do not have opposite signs of mass. So the "electric" effect is never cancelled
to reveal the "magnetic" effect. The latter appears only as a tiny correction to the Newtonian
force. There is one important exception: if the test particle is a photon, $\beta = 1$, and the
"magnetic" terms are of the same order. We have actually seen this in the discussion of the
deflection of light, where naive adaptation of the Newtonian theory is wrong by a factor of 2.
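To put numbers on the factor of 2, the sketch below evaluates the deflection of a light ray grazing the Sun, using the standard textbook expressions $2GM/(c^2 b)$ for the naive Newtonian estimate and $4GM/(c^2 b)$ for general relativity. The solar constants are nominal values and the function name is illustrative; none of these appear in the notes.

```python
import math

GM_SUN = 1.32712440018e20      # m^3 s^-2, solar gravitational parameter (assumed value)
R_SUN = 6.957e8                # m, nominal solar radius (assumed value)
SPEED_OF_LIGHT = 299_792_458.0  # m/s
RAD_TO_ARCSEC = 180.0 * 3600.0 / math.pi


def deflection_arcsec(prefactor, gm=GM_SUN, impact_parameter=R_SUN):
    """Deflection angle prefactor * GM / (c^2 b), converted to arcseconds."""
    angle_rad = prefactor * gm / (SPEED_OF_LIGHT ** 2 * impact_parameter)
    return angle_rad * RAD_TO_ARCSEC


newtonian = deflection_arcsec(2.0)     # naive Newtonian estimate
relativistic = deflection_arcsec(4.0)  # includes the "magnetic" terms

print(newtonian, relativistic)
```

The relativistic value comes out near 1.75 arcseconds, twice the Newtonian estimate.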
This explains why the magnetic effect in em was first discovered experimentally, whereas
the corresponding effect in gravity was first postulated theoretically.
Comparison
We summarize the comparison between em and gravity.
                        em          gravity
Equations               Maxwell     Einstein
Static approx.          Coulomb     Newton
Nature                  PDE         PDE
Order                   2           2
Basic variables         $A_\mu$     $g_{\mu\nu}$
No. of eq.              4           10
Linear?                 Yes         No
Superposition?          Yes         No
Two signs of source?    Yes         No
The $\theta$ and $\phi$ terms must take this form because of spherical symmetry, and $r$ is the circumferential
radius. In the rest of the metric, the coefficients cannot depend on $t$ because the situation
is assumed to be static, and cannot depend on $\theta$ and $\phi$ because of spherical symmetry. Thus,
there can be only two functions depending on $r$ alone, denoted as $A(r)$ and $B(r)$.
Our object is to use the Einstein equations to determine A and B. Remember: we
expect to get second-order differential equations for A and B.
First, we calculate the Christoffel symbols. Some of the non-zero ones turn out to be, e.g.,
Problem 4
Calculate all the components of the Ricci tensor and verify the above results.
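Now, away from the point mass, there is no energy and momentum, i.e., $T_{\mu\nu} = 0$ and
Einstein's equation becomes $R_{\mu\nu} = 0$. We consider a combination of the $tt$ and $rr$ components that is proportional to
$$-\frac{1}{rA}\,(BA' + AB')$$
showing that
$$AB = \text{constant} = 1$$
The value 1 is obtained by evaluating at $r \to \infty$, where space must be flat. The relation (17.65)
implies
$$A = \frac{1}{B}$$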
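We put all these back into the $\theta\theta$ equation to give
$$rB' + B - 1 = 0$$
This gives
$$\frac{d}{dr}(rB) = 1$$
leading to
$$rB = r + C$$
where $C$ is a constant. So
$$B(r) = 1 + \frac{C}{r}$$
To determine $C$, we consider $r \to \infty$, where the field must be weak, and should therefore
agree with
$$B \approx 1 + 2\Phi = 1 - \frac{2GM}{r}$$
Comparison then gives $C = -2GM$. Although the derivation made use of large $r$, once the
value of the constant is obtained, it is of course valid for all $r$. So finally we have
$$B(r) = 1 - \frac{2GM}{r}, \qquad A(r) = \left(1 - \frac{2GM}{r}\right)^{-1}$$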
Although we referred to a point mass, it is easy to see that the derivation goes through
for the space outside any static, spherical distribution of mass. Thus it is valid outside a non-rotating
star. We have already used this formula many times, e.g. in the discussion of the
black-hole phenomenon, especially at the point $r = 2GM$. We have also used the results for
the deflection of light and the advance of perihelion in the gravitational field of a star.
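The result can be sanity-checked numerically. The sketch below assumes the standard Schwarzschild functions $B(r) = 1 - 2GM/r$ and $A(r) = 1/B(r)$ (in units with $c = 1$), and assumes that the vacuum $\theta\theta$ equation reduces to $rB' + B - 1 = 0$ once $A = 1/B$ is used. It checks that condition by finite differences and then evaluates $r = 2GM/c^2$ for the Sun. The solar constant and all function names are illustrative.

```python
GM_SUN = 1.32712440018e20       # m^3 s^-2 (assumed value)
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def metric_B(r, gm=1.0):
    """Coefficient B(r) = 1 - 2GM/r, in units with c = 1."""
    return 1.0 - 2.0 * gm / r


def metric_A(r, gm=1.0):
    """Coefficient A(r) = 1/B(r), from the condition AB = 1."""
    return 1.0 / metric_B(r, gm)


def theta_theta_residual(r, gm=1.0, h=1e-5):
    """Evaluate r*B' + B - 1 with a central finite difference for B'."""
    dB_dr = (metric_B(r + h, gm) - metric_B(r - h, gm)) / (2.0 * h)
    return r * dB_dr + metric_B(r, gm) - 1.0


# Radii are in units of GM; all lie outside r = 2GM.
residuals = [theta_theta_residual(r) for r in (3.0, 10.0, 100.0)]

# Restoring factors of c, the special radius r = 2GM becomes 2GM/c^2.
schwarzschild_radius_sun = 2.0 * GM_SUN / SPEED_OF_LIGHT ** 2  # metres

print(residuals, schwarzschild_radius_sun)
```

The residuals vanish to within finite-difference error, and the radius for the Sun is about 3 km, far inside the actual solar surface.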
Robertson-Walker metric
In our discussion of cosmology, we introduced the Robertson-Walker metric with the following
line element
• In the above, $r$ is really a dimensionless length measured in units of $a(t)$.
• The factor $a(t)$ is the scale factor of the universe.
• The structure of the spatial terms is dictated by the requirements of homogeneity and
isotropy.
• The parameter $K$ is either $+1$, 0 or $-1$, indicating whether space is closed, flat or open.
• In the second line above, we have introduced the notation $\tilde{g}_{ij}$ to simplify the writing of
the expression in the square brackets. The reduced metric $\tilde{g}_{ij}$ describes a unit 3D sphere
in 4D space (if $K = 1$).
Now it remains to determine a(t), i.e., to see how the universe evolves.
As usual, we begin by calculating the Christoffel symbol. The results are
Problem 5
Calculate all the components of $\Gamma$.
Next we calculate the Riemann curvature tensor and then the Ricci tensor, with the
results
where $T = T^\mu{}_\mu$. This form has the advantage of moving some of the complications from the
LHS (involving unknowns) to the RHS (involving knowns).
To evaluate the RHS of (17.77), we assume a non-relativistic matter-dominated universe.
Thus
and consequently
$$\text{LHS} = \frac{3\ddot{a}}{a}$$
$$\text{RHS} = -4\pi G \rho$$
This is exactly the same as the Newtonian equation, showing that our previous derivation
by considering a small piece of the universe was indeed valid. We next integrate (17.82) once
in time, giving
where $K'$ is a constant of integration. The derivation from (17.82) to (17.83) follows the usual
calculation that proves conservation of energy from the equation of motion. In (17.83), the first
term is like the KE, the second term is like the PE and the term on the RHS is like the total
energy. The sign of $K'$ indicates whether the variable $a$ "escapes" to infinity, i.e., whether the
universe expands forever. So far, we have no new information beyond what is already known.
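The energy-like first integral can be illustrated numerically. The sketch below assumes a matter-dominated universe with $\rho \propto a^{-3}$, so that $3\ddot{a}/a = -4\pi G\rho$ becomes $\ddot{a} = -k/a^2$ with $k = 4\pi G\rho_0/3$. It integrates this equation with a standard fourth-order Runge-Kutta step and checks that $\dot{a}^2 - 2k/a$ stays constant. The value of `K_SOURCE`, the initial conditions and the function names are arbitrary choices for illustration; the sign and normalisation of the constant in (17.83) are not assumed.

```python
def acceleration(a, k_source):
    """Right-hand side of a'' = -k/a^2 (matter-dominated, rho ~ a^-3)."""
    return -k_source / a ** 2


def energy_like(a, adot, k_source):
    """KE-like term minus PE-like term: adot^2 - 2k/a."""
    return adot ** 2 - 2.0 * k_source / a


def integrate_scale_factor(a0, adot0, k_source, dt, steps):
    """Integrate a(t) with classical RK4; returns a list of (a, adot) pairs."""
    a, v = a0, adot0
    trajectory = [(a, v)]
    for _ in range(steps):
        # RK4 stages for the first-order system a' = v, v' = -k/a^2.
        k1a, k1v = v, acceleration(a, k_source)
        k2a, k2v = v + 0.5 * dt * k1v, acceleration(a + 0.5 * dt * k1a, k_source)
        k3a, k3v = v + 0.5 * dt * k2v, acceleration(a + 0.5 * dt * k2a, k_source)
        k4a, k4v = v + dt * k3v, acceleration(a + dt * k3a, k_source)
        a += dt * (k1a + 2.0 * k2a + 2.0 * k3a + k4a) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        trajectory.append((a, v))
    return trajectory


K_SOURCE = 0.5  # stands for 4*pi*G*rho_0/3 in arbitrary units (assumption)
trajectory = integrate_scale_factor(1.0, 1.2, K_SOURCE, dt=1e-3, steps=2000)
energies = [energy_like(a, adot, K_SOURCE) for a, adot in trajectory]
max_drift = max(abs(e - energies[0]) for e in energies)

print(energies[0], max_drift)
```

With these initial conditions the conserved combination is positive, and the scale factor keeps growing, just as a particle with positive total energy escapes to infinity.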
Next consider the $ij$ equation:
$$R_{ij} = -8\pi G\, S_{ij}$$
Evaluate the two sides:
which is exactly the same as (17.83) with $K'$ (the constant of integration in time) replaced by
$K$ (the constant describing the structure in space). Thus we have derived the important result
$$K' = K$$
The physical meaning is: the universe is closed in space ($K = 1$) if and only if the
universe is closed in time ($K' = 1$). In short, we have proved the key assumptions used in
earlier discussion of "poor man's cosmology". This is one major achievement of Einstein's
equations, which opened the way to the modern study of cosmology.
Of course, there are still 3 possibilities, namely $K = K'$ being either $+1$, 0 or $-1$. The
3 cases can only be distinguished by observations.
Appendix
Invariant combination
We show that the ratio $n/E$ is invariant. Instead of a general proof, we simply demonstrate it
for one simple example, which is more instructive. Consider $N$ particles of mass $m$, at rest in
a box of size $L^3$. Then $n = N/L^3$ and $E = m$, so $n/E = N/(mL^3)$.
Now transform to a reference frame that is moving at velocity $v$, say along the 1-direction.
The cross-section area in the 2-3 plane is unchanged, but the length in the 1-direction is
contracted by a factor $\gamma$, hence the volume is $V' = L^2 L' = L^3/\gamma$. Thus
$$n' = N/V' = \gamma n$$
At the same time, the energy of each particle now becomes
$$E' = \gamma m = \gamma E$$
so the ratio $n'/E'$ is the same as $n/E$.
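The same example can be run numerically for several boost velocities. This is a minimal sketch in units with $c = 1$; the particle number, mass, box size and function names are arbitrary illustrative choices.

```python
import math


def lorentz_gamma(beta):
    """Lorentz factor for speed beta (in units of c)."""
    return 1.0 / math.sqrt(1.0 - beta ** 2)


def boosted_density_and_energy(num_particles, mass, box_length, beta):
    """Number density and energy per particle seen from a frame moving at beta."""
    gamma = lorentz_gamma(beta)
    volume = box_length ** 3 / gamma  # contraction along the 1-direction only
    density = num_particles / volume  # n' = gamma * n
    energy = gamma * mass             # E' = gamma * E, with E = m at rest
    return density, energy


N_PARTICLES, MASS, BOX_LENGTH = 1000, 2.0, 3.0
rest_ratio = N_PARTICLES / (MASS * BOX_LENGTH ** 3)

ratios = []
for beta in (0.0, 0.3, 0.9, 0.999):
    density, energy = boosted_density_and_energy(N_PARTICLES, MASS, BOX_LENGTH, beta)
    ratios.append(density / energy)

print(rest_ratio, ratios)
```

Both $n$ and $E$ grow by the same factor $\gamma$, so every entry of `ratios` agrees with the rest-frame value.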
and take the trace. Note that the trace of $g_{\mu\nu}$ is 4. (We have to first raise one index and then
sum; with one upper index and one lower index, $g$ becomes the Kronecker delta, with +1 down
the diagonal.) Then we get