You are on page 1of 11

MNRAS 000, 1–11 (2017) Preprint 27 December 2017 Compiled using MNRAS LATEX style file v3.

Measuring the mass distribution in stellar systems

Scott Tremaine1?
1 Institute for Advanced Study, Einstein Drive, Princeton, NJ 08540, USA

27 December 2017
arXiv:1712.08794v1 [astro-ph.GA] 23 Dec 2017

ABSTRACT
One of the fundamental tasks of dynamical astronomy is to infer the distribution
of mass in a stellar system from a snapshot of the positions and velocities of its
stars. The usual approach to this task (e.g., Schwarzschild’s method) involves fitting
parametrized forms of the gravitational potential and the phase-space distribution to
the data. We review the practical and conceptual difficulties with this approach and
describe a novel statistical method for determining the mass distribution that does
not require determining the phase-space distribution of the stars. We show that this
new estimator out-performs other distribution-free estimators for the harmonic and
Kepler potentials.
Key words: stars: kinematics and dynamics – galaxies: kinematics and dynamics –
methods: statistical

1 INTRODUCTION Bovy et al. (2010) use this example to motivate a wide-


ranging discussion of the application of Bayesian methods
The determination of the mass distribution in stellar systems
to dynamical inference.
has led to some of the most important discoveries in astron-
omy. Examples include Kapteyn’s (1922) measurement of The idealized problem we examine here has only limited
the local density in the Galactic disc; Zwicky’s (1933) dis- relevance to real astronomical systems, for several reasons.
covery that the dark mass in the Coma cluster of galaxies In most systems only two of the three spatial coordinates are
is several hundred times larger than the mass in stars; mea- known because the distance along the line of sight cannot be
surements of the mass distribution in low-luminosity dwarf determined, and only one of the three velocity components is
galaxies which show them also to be dominated by dark known because the proper motions are undetectably small.
matter (e.g., Aaronson 1983; McConnachie 2012); and the In addition, the survey region is often smaller than the sys-
determination of the mass of the central black hole in our tem, so only a subset of the stars is measured. Finally, the
Galaxy from the kinematics of the nuclear star cluster (e.g., measurement uncertainties in the velocities are often sub-
Chakrabarty & Saha 2001; Feldmeier-Krause et al. 2017). stantial. We shall not discuss these important practical is-
The scope and power of such measurements will grow in sues, in order to focus on the simplest version of the task of
the near future, when the Gaia spacecraft provides accurate mass determination, which is already complicated enough.
positions and velocities for ∼ 109 stars in the Galaxy. The test particles sample a distribution function (here-
In this paper we examine an idealized formulation of after df) F(x, v ), defined such that F(x, v ) dx dv is the prob-
this task. A set of N test particles orbit in a gravitational po- ability that a given particle is found in the small phase-space
volume dx dv . Thus F(x, v ) dx dv = 1. Since the system is

tential Φ(x | µ), where µ = (µ1, µ2, . . .) is a list of parameters
that determine the analytic form of the potential. We call in a steady state, F(x, v ) can only depend on the integrals of
these mass parameters since the mass distribution follows di- motion in the potential Φ(x | µ) (Jeans’s theorem). Otherwise
rectly from the potential via Poisson’s equation ∇2 Φ = 4πGρ. F is arbitrary so long as it is non-negative. Thus, the obser-
The system is assumed to be in a steady state, that is, the vations are consistent with a given set of mass parameters
particles are distributed randomly in orbital phase. We know µ if and only if the inferred density of the test particles is
the positions and velocities of the particles at some instant, constant and non-negative on phase-space surfaces on which
{x n, v n },, n = 1, . . . , N. What can we infer about the mass the integrals of motion are fixed.
parameters µ? For simplicity we assume that motion in the potential
Perhaps the simplest astronomical example of this prob- Φ(x | µ) is integrable. Then if the phase space has D de-
lem is to estimate the force law in the solar system from a grees of freedom (2D dimensions) there exist D action-angle
snapshot of the positions and velocities of the eight planets. pairs (j , θ) that form canonical momenta and coordinates
for each particle. The relation between the Cartesian coor-
dinates (x, v ) and the actions and angles is given by the
? E-mail: tremaine@ias.edu functions j = J (x, v | µ) and θ = Θ(x, v | µ). Since the actions

© 2017 The Authors


2 S. Tremaine
are
∫ integrals of motion, we can write the df as F(j ) with (iv) It is straightforward to generalize the Bayesian
F(j ) dj dθ = (2π)D F(j ) dj = 1.

Schwarzschild’s method to account for observational errors
in the positions and velocities. Magorrian’s (2014) paradox
arises when the orbit library is much larger than the number
1.1 Schwarzschild’s method of particles. Then as the observational errors shrink to zero
the mass parameters become less and less accurate, and in
The usual approach to estimating the mass parameters µ the limit where there are no errors there are no posterior
starts by assuming that the df depends on a set of param- constraints on the mass parameters. The reason is that the
eters λ = (λ1, λ2, . . . , λ L ), and then infers both µ and λ from region of phase space corresponding to the orbit j l always
the data. Since we are not interested in the properties of the has either zero or one particle in it, whatever the mass pa-
df (for the purposes of this paper), {λl } are called nuisance rameters may be.
parameters. In practice the df is assumed to have the form
Schwarzschild’s method has successfully passed many tests
on mock data, so it remains unclear whether these concep-
L
Õ tual concerns affect the many important results that have
F(j | λ) = λl ψl (j ), (1) been derived from real data using this approach.
l=1

where {ψl (j )} is a set of basis functions in action space and


1.2 Distribution-free methods
{λl } is a set of weights subject to the constraint that F(j | λ)
cannot be negative. In the simplest and most widely used By ‘distribution-free’ we mean, loosely, that the estimate of
variant, Schwarzschild’s method, the basis functions are in- the mass parameters does not rely on assumptions about the
dividual orbits, df of the system1 . The simplest example of a distribution-
free method is the virial theorem: if there is a single mass
ψl (j ) = δ(j − j l ), (2) parameter µ and the potential is Φ(x | µ), then an estimator
where {j l } is an orbit ‘library’ that covers the action space as of the mass, µ̂, is given implicitly by
well as possible. Most users of Schwarzschild’s method then N
Õ N
Õ
maximize the likelihood of the data over the parameters µ vn2 = x · ∇Φ(x | µ̂)|x =x n . (4)
and λ. This approach raises a number of concerns: n=1 n=1

(i) The dimensionality of λ is generally far larger than the For example, in the harmonic potential Φ(x|ω) = 21 ω2 x 2 the
dimensionality of µ. Thus the data are being used mainly to virial-theorem estimator takes the form
determine nuisance parameters rather than the mass distri- Í 2 ! 1/2
v
bution. ω̂ = Í n n2 . (5)
(ii) The number of nuisance parameters describing the n xn
df, L, is likely to be comparable to, or even exceed, the The variance in ω̂ for N  1 is
number of particles N. In this case maximum likelihood can
ω2 h j 2 i
give inconsistent results; in other words, when N ∼ L the σ2 = , (6)
maximum-likelihood estimate µ̂ may not converge to the 2N h ji 2
correct value µ as N → ∞ (e.g., Neyman & Scott 1948; where h·i denotes an average over phase space and j is the
Lancaster 2000). action (eq. 33). For the Kepler potential Φ(x | µ) = −µ/r,
(iii) An alternative is to use Bayesian rather than ÍN 2
maximum-likelihood methods. The Bayesian approach is to vn
µ̂ = Í Nn=1 . (7)
marginalize over all of the nuisance parameters λ using some n=1 |x n |
−1
suitable priors. Thus the posterior probability of the mass
Unfortunately virial-theorem estimators have several
parameters µ is
undesirable properties (e.g., Bahcall & Tremaine 1981; Be-
∫ N
Ö loborodov & Levin 2004):
p(µ| {x n, v n }) = C Pr (µ) dλ Pr (λ) F[J (x n, v n | µ)| λ]. (i) The estimator is generally biased, that is, h µ̂iθ , µ
n=1 where h·iθ denotes an average over the angles corresponding
(3) to the true mass. The bias is typically ∼ µ/N.
(ii) The estimator cannot be applied to systems in which
Here Pr (µ) and Pr (λ) are the prior probabilities of the pa-
there is more than one mass parameter.
rameters, and C is a normalization constant defined such
(iii) The estimator is inefficient. For example, if the parti-
that dµ p(µ| {x n, v n }) = 1. Magorrian (2006) argues that in

cles in a harmonic potential have a wide range of actions, the
Schwarzschild’s method the prior Pr (λ) should be indepen-
sums in both the numerator and denominator of equation (5)
dent of the partition of phase space (the choice of the orbit
are usually dominated by the particles with the largest or
library) and this requires that the priors are infinitely divisi-
smallest action, depending on the df. A more formal argu-
ble. The uniform and Jeffreys priors, Pr (λ) ∝ const and 1/λ
ment is that the ratio h j 2 i/h ji 2 in the expression (6) for the
respectively, do not have this property, but Magorrian de-
scribes how to find priors that do. Unfortunately, even with
infinitely divisible priors the results depend on the choice of 1 This is a weaker definition than the one often used in statistics,
prior. Moreover this approach is far more computationally in which all statistical properties of a distribution-free estimator
expensive than maximum likelihood. are independent of the underlying distribution.

MNRAS 000, 1–11 (2017)


Masses of stellar systems 3
variance always exceeds unity by the Cauchy–Schwarz in- be only slightly less than D90 (N), so µmin and µmax we be
equality, and can be much larger than unity if a wide range very close, leading to unrealistically small error bars.
of actions is present. Thus the kinematic information in most
of the particles is diluted. In the following sections we describe a distribution-free mass
(iv) The estimator can be inconsistent, that is, it may estimator for an arbitrary integrable potential Φ(x | µ). We
not converge in probability to the correct value of µ as N → call this the ‘generating-function’ estimator since it is based
∞. In a harmonic potential, this problem arises when the on the infinitesimal generating function that relates action-
number density of particles satisfies dn ∝ j −b dj as j → ∞ angle variables at nearby values of the mass parameters.
with 1 < b ≤ 3. Then h j 2 i diverges and the sums in (5) are The generating-function estimator is derived using an it-
dominated by a few particles with the largest actions, no erative approach: we assume a trial value µ t for the mass
matter how many particles are present. A similar problem parameters, derive an estimator µ̂, then replace µ t by µ̂ and
occurs in the Kepler potential when the number density of iterate to convergence. In the potentials we have examined,
particles in semimajor axis satisfies dn ∝ a−c da as a → 0 our estimator is consistent, that is, for all dfs the estimator
with 0 ≤ c < 1. µ̂ converges in probability to the true mass parameters µ
as N → ∞. The estimator is also unbiased in the following
A second distribution-free method is orbital roulette (Be- sense: if the trial mass parameter is equal to the true value µ,
loborodov & Levin 2004), which constrains µ by requiring then h µ̂iθ is also equal to µ, for any sample size N. Here h·iθ
that the distribution of angles Θ(x n, v n |µ) must be consis- denotes an average over the angle variables corresponding
tent with a uniform distribution. However this requirement to the true mass parameter.
is a method for hypothesis testing rather than parameter The estimator is derived in §2 and its applications to the
estimation, that is, it does not specify the distribution of harmonic and Kepler potentials are described in §§3 and 4.
angles that would obtain if the mass parameter were dif-
ferent from the one assumed. Applying hypothesis testing
to a parameter-estimation problem such as this one can be
misleading or inefficient. For example: 2 A DISTRIBUTION-FREE ESTIMATOR
BASED ON GENERATING FUNCTIONS
(i) One version of roulette described by Beloborodov &
Levin (2004) is the ‘mean orbital phase’ estimator. For any For simplicity the derivation in this section is for systems
trial value of the mass parameter, µt , the corresponding an- with a single degree of freedom and a single mass parameter.
gle variable θ t (chosen so θ t = 0 at periapsis) is folded and More general derivations for several degrees of freedom and
rescaled to a new variable mass parameters are given in the Appendix.
We have a set of N particles with positions and veloci-
1 θt , 0 ≤ θ t ≤ π,

gt = (8) ties {xn, vn }, n = 1, . . . , N. We choose a trial mass parameter
π 2π − θ t , π < θ t < 2π.
µt that is (hopefully) close to the true but unknown mass
The mean phase is then computed as parameter µ, with ∆µ ≡ µ − µt . We compute the actions and
angles of the particles in the potential corresponding to the
N
1 Õ trial mass parameter, and call these { jt,n, θ t,n }. We now seek
gt ≡ gt,n . (9)
N a function P( jt , θ t , µt ) that yields an estimator µ̂ of the true
n=1
mass µ through the following formula:
If the trial mass µt equals the true mass µ, then h g t iθ =
1 N
2 , where h·iθ denotes the average over the angle variable µ̂ = µt +
1 Õ
P( jt,n, θ t,n, µt ). (10)
θ corresponding to the true mass. The estimated mass µ̂ N
n=1
is the mass at which g t = 21 and its standard deviation σ
is defined such that over the domain µ̂ ± σ the range of We require that the estimator is unbiased to O(∆µ), by which
g t is 12 ± (12N)−1/2 . Beloborodov & Levin (2004) show that we mean the following. Suppose that the N particles have
this estimator works well for the Kepler potential, but when a common action j and a uniform probability distribution
applied to the harmonic potential it fails dramatically, since in the angle θ, both variables being defined in the potential
the first of equations (36) below shows that h g t iθ = 12 for corresponding to the true mass parameter µ. Denoting the
all values of the trial frequency ωt . Thus the mean orbital average over this distribution of angles by h·iθ , we require
phase estimator is useless in this case. that
(ii) Other versions of roulette are based on statistics such
as the Kolmogorov–Smirnov or Anderson–Darling tests of hP( jt , θ t , µ)iθ = µ − µt + O(∆µ)2 . (11)
uniformity. The K-S statistic D = maxg |FN (g) − g| where The transformation between (x, v) and ( j, θ) or ( jt , θ t )
FN (g) is the cumulative distribution of the trial phases is canonical so the transformation from ( j, θ) to ( jt , θ t )
{gt,n }; and if D > D90 (N) (where for example D90 (100) = is canonical as well. Therefore it can be described by a
0.1208), then the uniform hypothesis can be rejected at the mixed-variable generating function S( j, θ t , µt , ∆µ) = jθ t +
90 per cent level. Beloborodov & Levin (2004) argue that s( j, θ t , µt , ∆µ) with s( j, θ t , µt , ∆µ) = O(∆µ) and
if D > D90 (N) for all trial masses less than µmin or greater
than µmax , then the true mass parameter lies in the interval jt = ∂2 S( j, θ t , µt , ∆µ) = j + ∂2 s( j, θ t , µt , ∆µ),
(µmin, µmax ) at the 90 per cent confidence level. However, in θ = ∂1 S( j, θ t , µt , ∆µ) = θ t + ∂1 s( j, θ t , µt , ∆µ). (12)
this approach (i) there will be no acceptable mass parame-
ter in a significant fraction of cases; (ii) in some cases the Here and throughout we use the notation ∂n f to denote the
minimum value of the KS statistic over all trial masses will partial derivative of f with respect to its nth argument. The

MNRAS 000, 1–11 (2017)


4 S. Tremaine
action-angle variables for mass parameter µt can then be The estimator is distribution-free in the sense that we
written in terms of those for mass parameter µ as have made no assumptions about the df F(j ) in deriving
it. We have no general proof that the estimator is always
jt = j + ∂2 s( j, θ, µt , ∆µ) + O(∆µ)2,
consistent, but we show below that the GF0 estimator is
θ t = θ − ∂1 s( j, θ, µt , ∆µ) + O(∆µ)2 . (13) consistent for the harmonic and Kepler potentials.
We are free to choose the constant j?. The simplest
We can now write
approach is to set j? = 02 , and an estimator based on this
P( jt , θ t , µt ) = P + (∂1 P)(∂2 s) − (∂2 P)(∂1 s) + O(∆µ)2 ; (14) choice will be called a GF0 estimator. A potentially more
powerful approach is to choose j? to minimize the variance
in this formula the arguments of all functions on the right σ 2 . From equations (20) and (23) this condition implies
are ( j, θ, µt ) or ( j, θ, µt , ∆µ). The condition (11) now reads
jn /h[∂2 s1 ( j, θ, µt )]2 iθ
Í
hPiθ + h(∂1 P)(∂2 s) − (∂2 P)(∂1 s)iθ = ∆µ + O(∆µ)2 ; (15) j? = jmin ≡ Ín . (24)
n 1/h[∂2 s1 ( j, θ, µt )] iθ
2

To make the dependence on ∆µ on the left side explicit, An estimator based on this choice for j? will be called a
we expand the generating function s( j, θ t , µt , ∆µ) as a power GF1 estimator. Of course, initially we do not know the true
series in ∆µ, actions and angles so we must use an iterative procedure to
determine jmin . An approach that works well in our experi-
s( j, θ t , µt , ∆µ) = ∆µ s1 ( j, θ t , µt ) + O(∆µ)2 . (16)
ments is to (i) determine an estimate µ̂ for the mass param-
Then equation (15) becomes eter using j? = 0; (ii) use this estimate to compute actions
and angles, and insert these in equation (24) to determine
hPiθ + ∆µh(∂1 P)(∂2 s1 ) − (∂2 P)(∂1 s1 )iθ = ∆µ + O(∆µ)2 ; (17) jmin ; (iii) use j? = jmin to derive an improved estimate µ̂.
here the arguments of all functions are ( j, θ, µt ). Since s1 is The formula (10) that determines the estimated mass µ̂
periodic in the angle variable – this follows from equation in terms of the trial mass µt can be iterated to convergence,
(30) below – we may integrate the last term on the left side which occurs when µ̂ = µt or
by parts. Then (17) can be rewritten as N
Õ
P( jµ̂,n, θ µ̂,n, µ̂) = 0. (25)
hPiθ = 0, ∂1 hP(∂2 s1 )iθ = 1 (18)
n=1
or This is a non-linear equation for µ̂ and it may have more than
one solution. We have no general procedure for choosing the
hPiθ = 0, hP(∂2 s1 )iθ = j − j? (19)
correct solution but show how to do so for the harmonic and
where j? is an integration constant and the arguments of P Kepler potentials in §§3 and 4 respectively. Note that:
and s1 are ( j, θ, µt ). These requirements can be satisfied if (i) The generating-function estimators are unbiased only
we choose for a fixed value of the trial mass, so some bias is introduced
( j − j?) ∂2 s1 ( j, θ, µt ) when the estimator (25) is used. In our experiments this bias
P( j, θ, µt ) = P?( j, θ, µt ) ≡ ; (20)
h[∂2 s1 ( j, θ, µt )]2 iθ is always small compared to the standard deviation of the
estimator (see Figures 1 and 3).
we call these generating-function estimators. Other func-
(ii) The formula (23) for the variance is valid for a fixed
tions can satisfy the constraints (19) but P? is optimal in a
value of the trial mass, and if we iterate to convergence using
sense that we now describe.
(25) the variance in µ̂ will generally be larger.
If the trial mass is fixed and equal to the true mass,
then the variance in the estimator µ̂ is
σ 2 ≡ ( µ̂ − h µ̂iθ )2 θ = h µ̂2 iθ − h µ̂iθ2

2.1 The generating function
N To implement this method we need the first-order generating
1 Õh 2 i
= hP ( jn, θ, µ)iθ − hP( jn, θ, µ)iθ2 . (21) function s1 ( j, θ t ) for the canonical transformation between
2
N n=1 ( j, θ) and ( jt , θ t ) (eq. 16). We assume that the Hamiltonian
has the usual form H(x, v| µ) = 21 v 2 + Φ(x| µ). Then in the trial
Now hPiθ = 0 by (18), so
action-angle variables we have
N
σ2 =
1 Õ 2
hP ( jn, θ, µ)iθ . (22) H( jt | µt ) = H( j | µ) + Φ(x| µt ) − Φ(x| µ). (26)
N 2 n=1 Using equations (13) and working to first order in ∆µ =
µ − µt ,
Now write P = P? + ∆P. Then θ = hP2 i + 2( j − hP?2 iθ
j?)h(∆P)(∂2 s1 )iθ /h(∂2 s1 )2 iθ + h∆P2 iθ . The second of equa- ∂H ∂H ∂s1 ∂Φ
− = . (27)
tions (19) requires h(∆P)(∂2 s1 )iθ = 0. Therefore hP2 iθ = ∂µ ∂ j ∂θ ∂µ
hP?2 iθ + h∆P2 iθ ; this means that P? has the smallest vari- Any function g( j, θ, µ) can be split into angle-averaged and
ance among all estimators with a given value of the integra- oscillating parts,
tion constant j?. For this reason we adopt the estimator P? ∫ 2π
1
in preference to any other estimators that satisfy equations hgiθ = dθ g( j, θ, µ), {g}θ ≡ g − hgiθ . (28)
(19). The variance is then 2π 0

N
1 Õ ( jn − j?)2 2
σ2 = . (23) In principle, this involves no loss of generality because there are
N 2 n=1 h[∂2 s1 ( jn, θ, µ)]2 iθ action-angle variables (j 0, θ 0 ) in which j 0 = j − j? and θ 0 = θ.

MNRAS 000, 1–11 (2017)


Masses of stellar systems 5
Equation (27) can be re-arranged as consistent with (38).
To derive the generating-function estimator of the fre-
∂H ∂Φ ∂Φ ∂H ∂s1
   
− = + . (29) quency we need the result
∂µ ∂µ θ ∂µ θ ∂ j ∂θ
j2
The terms on the left side are independent of angle, and the h[∂2 s1 ( j, θ, ω)]2 iθ = . (41)
2ω2
terms on the right average to zero. Therefore the left and
From equation (20) we then have
right sides must both be zero, and we have
j − j?
∂H

∂Φ
 
∂H −1 θ
 ∫ 
∂Φ
 P?( j, θ, ω) = −2ω cos 2θ. (42)
= , s1 ( j, θ, µ) = − dθ . (30) j
∂µ ∂µ θ ∂j ∂µ θ The estimator (10) is then
The second equation implies that s1 ( j, θ) is a periodic func-  N   
2 Õ j?
tion of θ, a result we used in deriving (20). ω̂ = ωt 1 − 1− cos 2θ t,n . (43)
N jt,n
n=1
We may choose either j? = 0 for simplicity, which yields the
3 THE HARMONIC POTENTIAL GF0 estimator for the harmonic potential, or (eq. 24)
Í −1
We now apply the generating-function estimator to particles n jt,n
orbiting in the one-dimensional harmonic potential, j? = jmin = Í −2 (44)
n jt,n
Φ(x| ω) = 21 ω2 x 2 . (31) which yields the minimum-variance or GF1 estimator.
Here the mass parameter is ω, the oscillator frequency. The The variance in ω̂ at a fixed value of the trial frequency
Hamiltonian is can be determined from equation (23): If the trial frequency
ωt is equal to the true frequency ω then
H(x, v| ω) = 21 v 2 + 12 ω2 x 2 . (32) N 
2ω2 Õ j? 2

The solution of the equations of motion is σ2 = 2 1− . (45)
N jn
r n=1
2j When j? = 0, σ 2 = 2ω2 /N. Since σ 2 is finite and declines
x= cos θ, v = 2 jω sin θ;
p
(33)
ω as N −1 the GF0 estimator is consistent, and by the central-
here ( j, θ) are the action-angle variables for frequency ω and limit theorem the distribution of estimates ω̂ is Gaussian for
in these variables the Hamiltonian is large N.
For the minimum-variance estimator GF1, j? is given
H( j |ω) = ω j. (34) by equation (44) and the variance is
Suppose that the true value of the frequency is ω and 2ω2

h j −1 i 2

2
ωt is a trial approximation to ω. We denote the action and σ = 1 − −2 . (46)
N hj i
angle corresponding to ωt by jt and θ t . The relation between
the action-angle pairs ( j, θ) and ( jt , θ t ) is found by solving We now iterate this procedure, replacing the trial fre-
quency ωt by ω̂, re-evaluating the angles θ n , and repeating
the process. We have converged when ω̂ = ωt , which requires
r s
2j 2 jt
x= cos θ = cos θ t , v = 2 jω sin θ = 2 jt ωt sin θ t ,
p p
ω ωt
N  
Õ j?
(35) 1− cos 2θ ω̂,n = 0 (47)
jω̂,n
n=1
which yields
or
ωt ω ωt
 
N
θ = tan−1 tan θ t , j = jt 2
cos θ t + 2
sin θ t . (36) Õ (ω̂2 xn2 + vn2 − 2ω̂ j?)(ω̂2 xn2 − vn2 )
ω ωt ω = 0. (48)
n=1 (ω̂2 xn2 + vn2 )2
The generating function (16) for this transformation is
When j? = 0 the estimator (48) simplifies to
ωt
s( j, θ t , ωt , ∆ω) = j tan−1 tan θ t − jθ t , (37) N
ωt + ∆ω Õ ω̂2 xn2 − vn2
= 0. (49)
n=1 ω̂ xn + vn
2 2 2
where ∆ω = ω − ωt . Expanding as a power series in ∆ω, we
have s( j, θ t , ωt , ∆ω) = ∆ωs1 ( j, θ t , ωt ) + O(∆ω)2 where The left side approaches −N as ω̂ → 0 and approaches +N
j as ω̂ → ∞ and is monotonic in ω̂. Therefore (49) has one
s1 ( j, θ, ω) = − sin θ cos θ. (38) and only one solution for ω̂. When j? , 0 the asymptotic
ω
behavior of the left side is the same; thus there is at least one
This result can also be derived from equation (30). From
solution in the range 0 < ω̂ < ∞ but there may be more than
equation (31) ∂Φ(x|ω)/∂ω = ωx 2 and from equation (33)
one solution. To determine which one to use in practice, we
∂Φ ∂Φ first find the unique solution ω̂0 with j? = 0 from (49), then
   
= h2 j cos2 θiθ = j, = j(2 cos2 θ − 1). (39)
∂ω θ ∂ω θ find all the solutions with the given value of j? from (48)
and choose the one that is closest to ω̂0 .
Equation (34) yields ∂H/∂ j = ω so
For the harmonic potential, but not in general, the vari-
∫ θ
j j ance in ω̂ at a fixed value of the trial frequency is the same as
s1 ( j, θ, ω) = − dθ (2 cos2 θ − 1) = − sin θ cos θ, (40) the variance in ω̂ as determined by iterating to convergence.
ω ω

MNRAS 000, 1–11 (2017)


6 S. Tremaine
3.1 A maximum-likelihood estimator
5
The harmonic potential is one of the few for which a calcula-
tion of the likelihood can be carried out analytically without virial theorem
assumptions about the df, and this result can be compared 4
with the GF estimators derived above.
Let∫F(x, v) = F( j) be∫ the df, defined in §1 and normal- max likelihood, GF0
ized so F( j)djdθ = 2π F( j)dj = 1. Then the probability 3
that a particle lies in the small phase-space volume dxdv is
orbital roulette

ω̂
F( j)dxdv = F( 21 v 2 /ω + 12 ωx 2 )dxdv. Now let z ≡ shov/x and 2
ask for the probability density of z:

GF1
p(z) = dxdvF( 12 v 2 /ω + 12 ωx 2 )δ(z − v/x) 1

= dx xF( 12 x 2 z 2 /ω + 12 ωx 2 )
0
101 102 103
ω N


= 2 F( j)dj = . (50)
z + ω2 π(z 2 + ω2 )
Given a data set {zn }, the log-likelihood is then (cf. Magor- Figure 1. Performance of four estimators for N particles orbit-
rian 2014) ing in a harmonic potential. From top to bottom, the estimators
are the virial theorem (eq. 5), the maximum-likelihood estimator
N
Õ (52), orbital roulette, and the minimum-variance GF1 estimator
log L = const + N log ω − log(zn 2 + ω2 ). : (51) (eqs. 44 and 48). The mean estimate for the frequency ω and
n=1 the standard deviation are determined for 5000 realizations and
The maximum-likelihood estimate of the frequency, ω̂, is shown by the filled circles and error bars. The true frequency is
given by the solution of ∂ log L/∂ω = 0, which is ω = 1, and the distribution of amplitudes is given by equation
(54) with γ = 0 and Amax /Amin = 3. To minimize confusion, the
N N
Õ ω̂2 − zn 2 Õ ω̂2 xn2 − vn2 results for the different estimators have been displaced by integer
=0 or = 0. (52) offsets on the vertical axis.
n=1
ω̂2 + zn 2 n=1 ω̂ xn + vn
2 2 2

The variance is given by and zero otherwise.


∂ log L We compare four estimators: (i) the virial theorem (eq.
 2 
1 N
= − = − 2; (53) 5); (ii) the maximum-likelihood estimator (52), which for
σ2 ∂ω2 z 2ω
the harmonic potential is equivalent to the GF0 estima-
here h·iz denotes the average over the probability density tor with j? = 0 (eq. 49); (iii) orbital roulette, using the
(50). Anderson–Darling test for uniformity of the orbital phases
Despite its simplicity, this method is of limited useful- (Beloborodov & Levin 2004)3 ; (iv) the minimum-variance
ness for two reasons. First, the derivation works only for GF1 estimator (eqs. 44 and 48).
a limited set of potentials of the form Φ(x) = c|x| α , and The mean and standard deviation of the estimators ω̂
cannot be generalized to arbitrary one-dimensional systems are shown in Figure 1 over the range N = 5–1000, for a dis-
or to three-dimensional systems such as the Kepler poten- tribution of amplitudes having γ = 0 and Amax /Amin = 3.
tial. Second, by fitting only the distribution of z = v/x the All four estimators exhibit modest bias for small N, but are
method ignores the distribution of actions, which also helps consistent in the sense that the bias and standard deviation
to constrain the frequency ω – for example, at the correct approach zero as the sample size N → ∞. At N = 1000, the
frequency there should be no correlation between the actions standard deviations in ω̂, ordered by increasing magnitude,
and angles (Beloborodov & Levin 2004). are: GF1, 0.023; then virial theorem, 0.026, then GF0, max-
The estimator (52) and its variance (53) are identical imum likelihood, and orbital roulette tied at 0.045 (for all
to those of the GF0 method, equations (49) and (45) with methods except roulette, these results can be derived ana-
j? = 0. Thus for the harmonic potential, the GF0 estimator lytically; see equations 6, 45, 46, and 53).
is equivalent to maximizing the likelihood of the data set Figure 2 shows the performance of the estimators for
{zn }. fixed sample size N = 100, γ = 0, and a range of values for
Amax /Amin . Rather than plotting ω̂ as in Figure 1, we plot

(ω̂−1) N, which is independent of sample size. The behavior
3.2 Numerical tests
of the maximum-likelihood and orbital-roulette estimators is
To explore the performance of various estimators for the independent of the distribution of amplitudes A since they
frequency ω of a harmonic potential, we generate 5000 real- depend on the data only through the ratio z = v/x (eq.
izations of the positions and velocities {xn, vn } of N particles 51) or θ (eq. 35) whose distributions are independent of the
orbiting in the potential (31) with frequency ω = 1. The par- amplitude (this independence holds only for the harmonic
ticles are distributed uniformly
p random in phase, and ran- potential, not for most others). In this figure, the standard
domly in the amplitude A ≡ 2 j/ω (eq. 33) with probability
density 3 As we observed after equation (9), the version of orbital roulette
−γ
dp ∝ A d log A, Amin < A < Amax (54) based on the mean phase fails for the harmonic potential.

MNRAS 000, 1–11 (2017)


Masses of stellar systems 7
conjugate angle θ 1 . For brevity we now drop the subscript
so j1 → j and θ 1 → θ = `.
14 virial theorem We have ∂Φ/∂ µ = −1/r, r = a(1 − e cos u), and d` =
12 (1 − e cos u)du so

max likelihood, GF0 ∂Φ


  ∫ 2π
10 =−
d` 1 1
=− , (57)
∂µ θ 0 2π a(1 − e cos u) a
8
(ω̂−1) N
p

and
6 orbital roulette 
∂Φ

1 1 e cos u
4 =− + =− . (58)
∂µ θ r a a(1 − e cos u)
2 GF1 Then equation (30) yields the generating function
0 ( j + L)3 u
∫ 
∂Φ
 r
a
s1 ( j, θ, µ) = du(1 − e cos u) = e sin u,
21 µ2 ∂µ θ µ
2 3 4 5
Amax/Amin (59)
and the results
r
Figure 2. Performance of the same four estimators as in Figure 1 a e cos u
for N = 100 particles orbiting in a harmonic potential with ω = 1. ∂2 s1 ( j, θ, µ) = ,
µ 1 − e cos u
The distribution of amplitudes is given by equation (54) with γ = 0 a h i
and varying Amax /Amin (horizontal axis). To minimize confusion, h[∂2 s1 ( j, θ, µ)]2 iθ = (1 − e2 )−1/2 − 1 . (60)
the results for the different estimators have been displaced by
µ

integer offsets on the vertical axis. We plot (ω̂ − 1) N , which is In these formulae u, e, and a should be regarded as functions
independent of N for N  1. of the action j = (µa)1/2 [1 − (1 − e2 )1/2 ], the angle θ = ` = u −
e sin u, and the angular momentum L = [µa(1 − e2 )]1/2 . From
equations (10) and (20), the generating-function estimator
deviation of the generating-function estimator GF1 is always
of the mass is given by
the smallest of the four estimators. It is remarkable that
the estimator GF1 performs so much better than the others N  2 1/2
µ̂ j? et,n (1 − et,n ) cos ut,n

1 Õ
when Amax /Amin ∼ 1. =1+ 1− , (61)
µt N jt,n 1 − et,n cos ut,n
n=1

The estimator can be re-written in terms of observable quan-


4 THE KEPLER POTENTIAL tities, the radius r, speed v, and radial and tangential veloc-
ities vr and v⊥ = (v 2 − vr2 )1/2 . In making this conversion two
The Hamiltonian for the Kepler potential is useful identities are
µ
H(x, v | µ) = 12 v 2 + Φ(r | µ) = 21 v 2 − r (55) e2 sin2 u
v 2 r = µ(1 + e cos u), vr2 r = µ . (62)
1 − e cos u
where r = |x |, µ = GM, G is the gravitational constant, and
we loosely refer to µ as the mass of the central body. We find
An orbit is described by its semimajor axis a, eccen- µ̂ 1 Õ
N  1/2
j? (vn2 rn − µt )v⊥,n rn

tricity e, inclination i relative to some x-y reference plane, =1+ 1− . (63)
µt N jt,n µt (2µt − vn rn )1/2
2
longitude of node Ω, argument of periapsis ω, and mean n=1
anomaly `. The specific angular momentum is L = |x × v | = where
[µa(1 − e2 )]1/2 and the z-component of the specific angular   1/2
momentum is Lz = L cos i. A suitable set of action-angle jt,n =
rn  1/2 
µt − (2µt − vn2 rn )1/2 v⊥,n rn . (64)
variables is 2µt − vn2 rn
√  p p p 
j = ( j1, j2, j3 ) = µa 1 − 1 − e2, 1 − e2, 1 − e2 cos i The quantities within the square roots are positive so long
√ h p i  as 2µt ≥ vn2 rn , which holds so long as the particle is in a
= µa 1 − 1 − e2 , L, Lz , bound orbit at the trial mass.
The variance of the estimator µ̂ can be determined from
θ = (θ 1, θ 2, θ 3 ) = (`, ` + ω, Ω); (56)
equation (21) or (61). If the trial mass equals the true mass,
j1 is called the radial action since it vanishes for circular the variance at a fixed value of the trial mass is
orbits. In these variables the Hamiltonian is H = − 12 µ2 /( j1 +
σ 2 = ( µ̂ − h µ̂iθ )2 θ


j2 )2 . We shall also use the eccentric anomaly u, given by
N 
Kepler’s equation ` = u − e sin u. µ2 Õ j? 2

= 2 (1 − e2n )1/2 1 − (1 − e2n )1/2 .
 
1− (65)
Since the orientation of the orbit does not help to con- N jn
n=1
strain the mass µ, we can ignore the angles θ 2 and θ 3 and
the action Lz . Moreover the angular momentum L = |x × v | When j? = 0, the variance is bounded above by µ2 /(4N),
is independent of the mass µ for fixed position and velocity which implies that the estimator is consistent, and by the
(x, v ). Therefore the mass-dependent dynamics has only one central-limit theorem the distribution of estimates µ̂ is Gaus-
degree of freedom and is described by the action j1 and its sian for large N.

MNRAS 000, 1–11 (2017)


8 S. Tremaine
The variance is minimized when we choose
Í 2 1/2 a−1/2 5
n (1 − en ) n
j? = jmin = µ1/2 Í (66)
2 )1/2 a−1 1 − (1 − e2 )1/2 −1 virial theorem
 
n (1 − e n n n
Í 2
4
n (2µ − vn rn )v⊥,n
= Í  −1 .
2
n (2µ − vn rn )
3/2 v
 1/2
⊥,n µrn − (2µ − vn rn )
2 1/2 v
⊥,n rn
roulette (mean phase)
3
The estimate has converged when µ̂ = µt , which occurs
roulette (A-D)

µ̂
when
2
N  1/2
j? (vn2 rn − µ̂)v⊥,n rn
Õ 
1− = 0. (67) GF0
jµ̂,n (2 µ̂ − vn2 rn )1/2
n=1 1
When j? = 0 the estimator (67) simplifies to
N 1/2
(vn2 rn − µ̂)v⊥,n rn 0
Õ
= 0,
101 102 103
n=1 (2 µ̂ − vn2 rn )1/2
(68) N
and we denote the estimator based on this formula as GF0.
Figure 3. Performance of estimators for N particles orbiting
We denote the minimum-variance estimator based on equa-
in a Kepler potential with mass µ = 1. The squared eccentric-
tions (67) and (66) as GF1. Note that the generating- ities are distributed uniformly random between 0 and 1. From
function estimators are unbiased, and their variance is given top to bottom, the estimators are the virial theorem (eq. 7), or-
by (65), only for a fixed value of the trial mass. When iter- bital roulette using the mean phase, orbital roulette using the
ated to convergence there will generally be a small non-zero Anderson–Darling test, and the simple generating-function esti-
bias, and the variance will be larger. mator GF0 (j? = 0; eq. 68). The mean estimate for the mass
The smallest allowed mass is µ̂min = 21 vm 2 r where m
m and the standard deviation in µ̂ are determined for 5000 realiza-
is the index corresponding to the smallest of {vn2 rn }. As tions and shown by the filled circles and error bars. The distribu-
tion of semimajor axes is given by equation (71) with γ = 0 and
µ̂ → µ̂min from above, the left side of (68) approaches
3/2 amax /amin = 3. To minimize confusion, the results for the different
⊥,m rm ( µ̂ − µ̂min )
2−3/2 vm
2v −1/2 → +∞. As µ̂ → ∞, the left
estimators have been displaced by integer offsets on the vertical
1/2
−1/2 µ̂1/2 Í
side approaches −2 n v⊥,n rn → −∞. Moreover the axis.
left side is a monotonically decreasing function of µ̂. There-
fore (68) has one and only one solution for µ̂.
and zero otherwise. We have conducted tests with a variety
When j? , 0 there may be more than one solution to
of eccentricity distributions. We compare several estimators:
(67). To determine which one to use, we first find the unique
solution µ̂0 with j? = 0, then find all the solutions with the (i) the virial theorem (eq. 7);
chosen value of j? and choose the one that is closest to µ̂0 . (ii) orbital roulette, using both the mean-phase test and
These results are derived from the action-angle variables the Anderson–Darling test as described by Beloborodov &
(56). Other sets of action-angle variables yield different es- Levin (2004);
timators. For example, if we choose Delaunay elements (iii) the simple generating-function estimator GF0 ( j? =
√  p p  √ 0, eq. 68) and the minimum-variance estimator GF1 ( j? =
j = µa 1, 1 − e2, 1 − e2 cos i = ( µa, L, Lz ),
jmin , eqs. 67 and 66);
θ = (`, ω, Ω), (69) (iv) if the particles have a known eccentricity e then two
unbiased estimators of the mass are
the estimator analogous to (68) is
N
1 Õ
N
Õ 1/2
(vn2 rn − µ̂)v⊥,n rn µ̂ = 1
vn2 rn, (72)
2
1/2  = 0. (70) N(1 − 2 e ) n=1
n=1 (2 µ̂ − vn rn )
1/2 µ̂ − v
⊥,n rn (2 µ̂ − vn rn )
2 2

1/2
N
2 Õ 2
In general this performs less well than (68). For example, µ̂ = 2
vr,n rn, (73)
Ne n=1
if the estimated mass is correct and the orbits are circular,
then µ̂ = µ and vn2 rn = v⊥,n
2 r = µ, so both the numerator
n These estimators are not useful in practice since the eccen-
and denominator in each term of (70) vanish. tricities are not known, but they provide a useful benchmark
for assessing the performance of other estimators (see, e.g.,
An & Evans 2011).
4.1 Numerical tests
The mean and standard deviation of the virial-theorem and
Our tests are similar to those in §3.2 for the harmonic po-
orbital-roulette estimators, and of the generating-function
tential. We generate 5000 realizations of the positions and
estimator GF0 are shown in Figure 3 over the range N = 5–
velocities {x n, v n } of N particles orbiting in a point-mass po-
1000, for a distribution of semimajor axes having γ = 0 and
tential with unit mass, µ = 1. The particles are distributed
amax /amin = 3. In these simulations the squared eccentric-
uniformly random in phase, and randomly in the semimajor
ity is distributed uniformly random between 0 and 1, corre-
axis a with probability density
sponding to a df that is constant on an energy surface in
dp ∝ a−γ d log a, amin < a < amax, (71) canonical phase space. At N = 1000 the standard deviations

MNRAS 000, 1–11 (2017)


Masses of stellar systems 9
rent semimajor axes and eccentricities of the planets, sam-
pling the positions and velocities at a random time and ap-
plying the estimator GF0 yields the correct solar mass with
1.0 a fractional standard deviation of 2.1 per cent. Using the
standard deviation × N

positions and velocities on 2009 April 1 (taken from Table


p

1 of Bovy et al. 2010) yields µ̂/(GM ) = 1.028. Thus, GF0

2)
0.8

e /2
­ ®
2 v2r r /e2

(1−
) provides mass estimates accurate to within about 3 per cent

v2 ®
)
ase lette (A-D

r/
h in this case, even though it is given only 8 data points and

­
0.6 n p r o u
a no prior information on the eccentricity or semimajor axis
me
e( em
distributions.
0.4 lett eor
GF0
rou rial
th
vi
0.2
GF1 5 DISCUSSION
0.00.0 0.2 0.4 0.6 0.8 1.0 These results offer encouraging evidence of the power of
eccentricity these estimators, but unanswered questions remain.
Can the generating-function estimators be derived from
a maximum-likelihood approach? A possible signpost is that
Figure 4. Performance of estimators µ̂ for 100 particles orbiting the GF0 estimator in a harmonic potential yields the maxi-
in a Kepler potential, all with the same eccentricity
√ (horizontal mum likelihood of the frequency ω for the data set {vn /xn },
axis). The standard deviation of µ̂, multiplied by 100 to provide the unique combination of positions and velocities that has
a quantity independent of the number of particles, is plotted on
the same dimension as ω (see discussion at the end of §3.1).
the vertical axis for each of the following: virial theorem (eq. 7,
cyan line); estimator based on the mean of v 2 r (eq. 72, black line);
However, no analogous result is known for most other po-
estimator based on the mean of vr2 r (eq. 73, magenta line); orbital tentials, including the Kepler potential.
roulette using the mean phase (blue line); orbital roulette using What is the relation of the generating-function ap-
the Anderson–Darling test (red line); the generating-function es- proach to Bayesian estimates of the posterior distribution
timator GF0 (j? = 0; eq. 68, green line); and the generating- of the mass parameters given the data (e.g., Bovy et al.
function estimator GF1 (j? = jmin , blue line). The standard devi- 2010)?
ation is measured from 5000 realizations. The true mass is µ = 1, We have shown that the generating-function estimators
and the distribution of semimajor axes is given by equation (71) perform better than other distribution-free estimators such
with γ = 0 and amax /amin = 3. The results plotted near e = 0 and
as the virial theorem or orbital roulette when applied to
e = 1 are for e = 0.0001 and e = 0.9999.
the harmonic or Kepler potentials. We have also shown that
they avoid the conceptual difficulties associated with meth-
ods that estimate the distribution function simultaneously
in µ̂, ordered by increasing magnitude, are: GF0, 0.013; or- with the mass parameters, such as Schwarzschild’s method.
bital roulette using the Anderson–Darling test, 0.018; orbital We have not, however, directly compared the efficiency of
roulette using the mean phase, 0.022; and virial theorem, the latter methods to the efficiency of generating-function
0.031. Thus the performance of GF0 is substantially bet- estimators when applied to the same data.
ter than the other estimators. Note that the GF0 estimator In practical problems of mass estimation in astro-
is independent of the distribution of semimajor axes in the physics, the data are often available only over a subset of
sample, and its superior performance holds for a wide range the volume occupied by the stellar system, either because of
of possible semimajor axis distributions. the limited field of view of the survey or because of crowd-
The minimum-variance estimator GF1 does not perform ing near the core or background contamination in the out-
significantly better than GF0 in these simulations. The rea- skirts. How can generating-function estimators be modified
son can be traced to equation (66) for jmin . In these sim- to account for both incomplete phase-space sampling and
ulations we have assumed that the probability distribution observational errors?
of the squared eccentricity is uniform, dp = de2 . Then for a A difficult problem arises when not all of the phase-
given semimajor axis, the mean of the denominator in the space coordinates are known, typically because the distance
∫1
expression for jmin is proportional to 0 de e(1− e2 )1/2 [1−(1− along the line of sight cannot be determined accurately, and
e2 )1/2 ]−1 , which diverges logarithmically at its lower limit. the proper motions are too small to be detectable. Can the
Thus in any large sample jmin will be small, so the estimator generating-function estimators be generalized to the case
GF1 will perform similarly to GF0. On the other hand, for a where only some of the six phase-space coordinates of the
sample of particles with fixed non-zero eccentricity GF1 per- particle are known?
forms remarkably well. This is illustrated in Figure 4, which
shows the normalized standard deviation (i.e., the standard

deviation times N) of several mass estimators as a function
6 SUMMARY
of the eccentricity. GF1 is much more accurate than any of
its competitors. We have described a novel method based on generating func-
A final test, motivated by Bovy et al. (2010), is to ask tions for estimating the mass parameters of a gravitational
how well the generating-function estimators would predict potential, given that we have positions and velocities for
the solar mass, given a snapshot of the positions and veloc- a set of test particles orbiting in the potential in a steady
ities of the eight planets in the solar system. Given the cur- state (i.e., with random phases). The method we describe

MNRAS 000, 1–11 (2017)


10 S. Tremaine
is (i) distribution-free, in the sense that it requires no prior then hPα iθ = 0 and equation (A3) simplifies to
assumptions about the distribution of the particles in action M
space; (ii) iterative, in the sense that it produces an esti-
Õ ∂
cαγ ( j, µ)Sγβ ( j, µ) = δαβ (A5)
mate µ̂ for the mass given a trial value µt , (iii) unbiased, ∂j
γ=1
in the sense of equation (11). In practice we iterate the es-
timate until µ̂ = µt (eq. 25). We have demonstrated that where
this estimator is more powerful than other distribution-free Sγβ ( j, µ) = h∂2 sγ ∂2 sβ iθ . (A6)
estimators in the harmonic and Kepler potentials.
These results point the way towards more efficient and Now let S−1 denote the inverse of the M × M matrix S with
reliable mass estimators that could have broad applications components Sαγ . If we set cαγ = ( j − j?) Sαγ
−1 then equation

in astrophysical dynamics. (A6) is satisfied, and a suitable estimator is given by (A1)


with
M
Õ
−1
REFERENCES Pα = ( j − j?) Sαγ ∂2 sγ ( j, θ, µ). (A7)
γ=1
Aaronson, M. 1983, ApJ, 266, L11
An, J. H., & Evans, N. W. 2011, MNRAS, 413, 1744
Bahcall, J. N., & Tremaine, S. 1981, ApJ, 244, 805 A2 Generalization to several degrees of freedom
Beloborodov, A. M., & Levin, Y. 2004, ApJ, 613, 224
Bovy, J., Murray, I., & Hogg, D. W. 2010, ApJ, 711, 1157 Now suppose that there is a single mass parameter µ and
Chakrabarty, D., & Saha, P. 2001, AJ, 122, 232 D > 1 degrees of freedom, so the actions and angles become
Feldmeier-Krause, A., Zhu, L., Neumayer, N., et al. 2017, MN- D-dimensional vectors (j , θ). The analog to equation (17) is
RAS, 466, 4040
D 
∂P ∂s1 ∂P ∂s1

Kapteyn, J. C. 1922, ApJ, 55, 302 Õ
hPiθ + ∆µ − = ∆µ + O(∆µ)2 ; (A8)
Lancaster, T. 2000, J. Econometrics, 95, 391 ∂ jk ∂θ k ∂θ k ∂ jk θ
Magorrian, J. 2006, MNRAS, 373, 425 k=1
Magorrian, J. 2014, MNRAS, 437, 2230 here the average h·iθ is over all of the D angles. Let us write
McConnachie, A. W. 2012, AJ, 144, 4
Neyman, J., & Scott, E. L. 1948, Econometrica 16, 1 Õ
Zwicky, F. 1933, Helvetica Physica Acta, 6, 110 P(j , θ, µ) = qi (j , θ, µ) ∂s1 (j , θ, µ)/∂θ i . (A9)
i
Then equation (A8) requires
APPENDIX A: ESTIMATORS FOR SYSTEMS
D
WITH MULTIPLE MASS PARAMETERS AND ∂ ∂s1 ∂s1
Õ  
Tki (j , µ)qi (j , µ) = 1, where Tki (j , µ) = .
MULTIPLE DEGREES OF FREEDOM ∂ jk ∂θ k ∂θ i θ
k,i=1
A1 Generalization to several mass parameters (A10)
The results of §2 can be generalized to systems with M > 1 This condition is satisfied if
mass parameters µ = (µ1, . . . , µ M ) and one degree of free- D D
dom. The true but unknown mass parameters are described ?
Õ Õ
−1
qi (j , µ) = Tim αm ( jm − jm ) where αm = 1. (A11)
by the vector µ, and we choose a trial vector µ t that is close m=1 m=1
to µ, with ∆µ ≡ µ − µ t . The estimator of µ is (cf. eq. 10)
N
where T−1 is the inverse of the matrix T and j ? } is a con-
1 Õ stant vector.
µ̂ = µ t + P( jt,n, θ t,n, µ t ) (A1)
N
n=1
and we require that the estimator is consistent and unbi- A3 Generalization to several mass parameters
ased up to errors of O(|∆µ| 2 ). The analog to the generating and degrees of freedom
function in equation (16) is
It is straightforward to generalize the preceding results to
M
Õ
2 systems with M > 1 mass parameters and D > 1 degrees of
s( j, θ t , µ t , ∆µ) = ∆µα sα ( j, θ t , µ t ) + O(∆µ) , (A2)
freedom. The analogs to equations (A1)–(A3) are
α=1
N
and the condition on a component of the vector P analogous 1 Õ
to (17) is µ̂ = µ t + P(j t,n, θt,n, µ t ); (A12)
N
n=1
M
Õ
hPα iθ + ∆µβ h(∂1 Pα )(∂2 sβ ) − (∂2 Pα )(∂1 sβ )iθ M
Õ
β=1 s(j , θt , µ t , ∆µ) = ∆µα sα (j , θt , µ t ) + O(∆µ)2, (A13)
2
= ∆µα + O(∆µ) ; (A3) α=1

in this equation the arguments of all functions are ( j, θ, µ t ). M D 


∂Pα ∂sβ ∂Pα ∂sβ
Õ Õ 
Let hPα iθ + ∆µβ −
M ∂ jk ∂θ k ∂θ k ∂ jk θ
Õ β=1 k=1
Pα ( j, θ, µ) = cαγ ( j, µ)∂2 sγ ( j, θ, µ); (A4)
= ∆µα + O(∆µ)2 . (A14)
γ=1

MNRAS 000, 1–11 (2017)


Masses of stellar systems 11
Here Pα is a component of an M-dimensional vector
P(j , θ, µ). We choose this to have the form
M Õ
D
Õ ∂sγ (j , θ, µ)
Pα (j , θ, µ) = Q αiγ (j , θ, µ) . (A15)
∂θ i
γ=1 i=1

The condition (A14) becomes


M Õ
D
Õ ∂
Q αiγ (j , θ, µ)Uiγkβ (j , θ, µ) = δαβ (A16)
∂ jk
γ=1 i,k=1

where
∂sγ ∂sβ
 
Uiγkβ ≡ . (A17)
∂θ i ∂θ k θ
Now introduce a multi-index i where each i corresponds
to a 2-tuple (i, γ). Similarly the indices k and m correspond to
the pairs (k, β) and (m, α) respectively. Then equation (A17)
can be rewritten
D
Õ ∂ Õ
Q αi (j , θ, µ)Uik (j , θ, µ) = δαβ . (A18)
∂ jk
k=1 i ≤(D, M)

If we set
D D
? −1
Õ Õ
Q αi (j , θ, µ) ≡ αm ( jm − jm )Umi (j , θ, µ) where αm = 1,
m=1 m=1
(A19)
then the left side of equation (A18) becomes
D D M
Õ ∂ ?
Õ
−1
αm ( jm − jm ) Umi ((j , θ, µ)Uik (j , θ, µ)
∂ jk
k,m=1 i ≤(D, M)
D
Õ ∂ ?
= αm ( jm − jm )δαβ δmk = δαβ (A20)
∂ jk
k,m=1

so condition (A18) is satisfied.

MNRAS 000, 1–11 (2017)

You might also like