
Metrics and Statistics

Mauricio Esguerra Neira


May 2, 2015

1 Metrics
Let X be a set. A function d : X × X −→ R is said to be a metric for X provided that d
has the following properties:

• d(x, y) ≥ 0 for all x, y ∈ X.

• d(x, y) = 0 iff x = y, for all x, y ∈ X.

• d(x, y) = d(y, x) for all x, y ∈ X.

• d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ X (triangle inequality).

If d is a metric for X, then the pair (X, d) is called a metric space.[1]
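
The four properties can be spot-checked numerically on a finite sample of points. The following is a minimal sketch, not a proof: the helper name `is_metric_on_sample`, the tolerance, and the sample values are illustrative assumptions. Passing the check on a sample does not establish that d is a metric on all of X, but failing it shows that d is not one.

```python
import itertools
import math


def is_metric_on_sample(d, points, tol=1e-12):
    """Check the four metric properties for d on all pairs and triples of points.

    Hypothetical helper: it tests only the supplied sample, assumed to
    contain distinct points.
    """
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0:                       # non-negativity
            return False
        if (d(x, y) == 0) != (x == y):        # d(x, y) = 0 iff x = y
            return False
        if not math.isclose(d(x, y), d(y, x), abs_tol=tol):  # symmetry
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:  # triangle inequality
            return False
    return True


sample = [-2.0, 0.0, 0.5, 3.0]
absolute_difference = lambda x, y: abs(x - y)
squared_difference = lambda x, y: (x - y) ** 2

print(is_metric_on_sample(absolute_difference, sample))  # True
print(is_metric_on_sample(squared_difference, sample))   # False: triangle inequality fails
```

The squared difference fails because, for example, d(-2, 3) = 25 exceeds d(-2, 0) + d(0, 3) = 13.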

1.1 Minkowski’s Metric

\[ d(X, Y) = \left( \sum_{i=1}^{N} |x_i - y_i|^k \right)^{1/k} \tag{1} \]

k = 1 Manhattan distance.
k = 2 Euclidean distance.
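
As an illustration of equation 1, here is a small sketch in plain Python. The function name `minkowski_distance` and the example vectors are assumptions made for the example.

```python
def minkowski_distance(x, y, k):
    """Minkowski distance of order k between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1.0 / k)


x = [0.0, 0.0]
y = [3.0, 4.0]

print(minkowski_distance(x, y, 1))  # Manhattan distance: 7.0
print(minkowski_distance(x, y, 2))  # Euclidean distance: 5.0
```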

1.2 Root Mean Square Deviation (RMSD) Metric


A maximum likelihood definition of the root mean square deviation is implemented in the
Theseus software of Theobald and Wuttke.

\[ k = N \tag{2} \]
\[ \mathrm{RMSD}_{ML} = \sqrt{\frac{k}{\Sigma^{-1}}} \tag{3} \]

Where Σ represents the gamma distributed atomic covariance matrix, and ML stands for
maximum likelihood.

Xiang-Yun Lu’s definition.

\[ \mathrm{RMSD}_{LS} = \sqrt{\frac{\sum d^2}{N}} \tag{4} \]
Where d is the difference matrix, and LS stands for Least Squares.

General definition for two vectors.

\[ \mathrm{RMSD}(X, Y) = \sqrt{\frac{\sum_{i=1}^{N} |x_i - y_i|^2}{N}} \tag{5} \]

Where N denotes the total number of atoms.
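
A minimal sketch of equation 5 for two sets of atomic coordinates follows. It assumes each atom is given as an (x, y, z) tuple, reads |x_i − y_i| as the Euclidean distance between matching atoms, and performs no superposition of the two structures beforehand. The function name `rmsd` and the coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) atom positions.

    Assumption: atoms are already paired by index and no optimal
    superposition is applied before computing the deviation.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures need the same number of atoms")
    n = len(coords_a)
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        # squared Euclidean distance between the two positions of atom i
        total += sum((p - q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / n)


structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # same atoms shifted by 1 along z

print(rmsd(structure_a, structure_b))  # 1.0
```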

General Definition from Error.


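\[ \mathrm{RMSD} = \sqrt{\mathrm{MSE}} \tag{6} \]
\[ \mathrm{MSE} = E\left((\hat{\theta} - \theta)^2\right) \tag{7} \]
\[ E\left((\hat{\theta} - \theta)^2\right) = \frac{\sum_{i=1}^{n} (x_{1,i} - x_{2,i})^2}{n} \tag{8} \]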

1.3 Root Mean Square Fluctuation


\[ \mathrm{RMSF}(\nu) = \sqrt{\frac{1}{T} \sum_{t=1}^{T} (\nu_t - \bar{\nu})^2} \tag{9} \]

Where T stands for the number of “frames”. That is, the RMSF is the square root of the
mean, over all frames, of the squared difference between a coordinate and its mean value.
So, for, say, a hundred frames, one averages the squared difference between each frame and
the average position of a residue or atom, and then takes the square root.
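
The RMSF of a single coordinate followed over a trajectory can be sketched as below. The function name `rmsf` and the five-frame series are made up for illustration; a real trajectory would supply one such series per atom or residue coordinate.

```python
import math


def rmsf(values):
    """Root mean square fluctuation of one coordinate over T frames."""
    t = len(values)
    mean = sum(values) / t                      # time-averaged coordinate
    return math.sqrt(sum((v - mean) ** 2 for v in values) / t)


# Hypothetical coordinate of one atom in five consecutive frames.
frames = [1.0, 2.0, 3.0, 4.0, 5.0]

print(rmsf(frames))  # sqrt(2), about 1.4142
```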

1.4 Maximum Distance Metric

\[ d(X, Y) = \max_i |x_i - y_i| \tag{10} \]

where the distance between vectors X and Y is the largest absolute difference between
corresponding components.
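
The maximum distance is the limit of the Minkowski metric of section 1.1 as k grows. The sketch below illustrates this numerically. Both function names and the example vectors are assumptions for the example.

```python
def maximum_distance(x, y):
    """Largest absolute difference between corresponding components."""
    return max(abs(a - b) for a, b in zip(x, y))


def minkowski_distance(x, y, k):
    """Minkowski distance of order k, repeated here to keep the sketch self-contained."""
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1.0 / k)


x = [0.0, 0.0]
y = [3.0, 4.0]

print(maximum_distance(x, y))  # 4.0
for k in (1, 2, 10, 100):
    # the values decrease towards the maximum distance as k increases
    print(k, minkowski_distance(x, y, k))
```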

2 Statistics
2.1 Rosetta Stone
It is often the case that more than one term refers to the same concept in statistics.
The following is a Rosetta stone of sorts, meant to avoid confusion and to draw the
right parallels across disciplines.

deviation ∼ fluctuation
average squared deviation = variance ∼ mean square fluctuation

Notice that deviation and variance can be defined as metric functions.

- deviation
\[ \sum_{i=1}^{n} (x_i - \bar{x}) \tag{11} \]

- variance = mean square deviation


\[ \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{12} \]

- standard deviation = root mean square deviation


\[ \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{13} \]
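
Equations 11 to 13 can be sketched in a few lines. The function names and the data set are illustrative assumptions. Note that the plain sum of deviations is zero for any data set, which is one reason the deviations are squared before averaging.

```python
import math


def deviations(values):
    """Deviation of each value from the mean of the data set."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def variance(values):
    """Mean square deviation, using the 1/n normalization."""
    return sum(d ** 2 for d in deviations(values)) / len(values)


def standard_deviation(values):
    """Root mean square deviation from the mean."""
    return math.sqrt(variance(values))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(sum(deviations(data)))     # 0.0
print(variance(data))            # 4.0
print(standard_deviation(data))  # 2.0
```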

- covariance = correlation moment = mixed second moment


Covariance is a concept that extends to many uses, among them fluctuation analysis,
principal component analysis, and normal mode analysis. The largest eigenvalue of a
covariance matrix is a topic of importance in mathematics, related to the
Perron-Frobenius theorem (see biblio/bejan2005.pdf).
\[ \mathrm{Cov}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) \tag{14} \]

When you have one variable, or one dimension, you talk about variance, since Cov(X, X) =
Var(X); when considering two variables you use covariance.
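
A short sketch of equation 14, which also checks that the covariance of a variable with itself is its variance. The function name `covariance` and the two data series are assumptions made for the example.

```python
def covariance(x, y):
    """Covariance of two equal-length series, using the 1/N normalization."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

print(covariance(x, y))  # 2.5
print(covariance(x, x))  # 1.25, the variance of x
```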

- correlation
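\[ r = \frac{\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)}{n - 1} \tag{15} \]
\[ r(X_1, X_2) = \frac{\mathrm{Cov}(X_1, X_2)}{\sigma_1 \sigma_2} \tag{16} \]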
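
The two expressions for the correlation give the same number as long as the normalization is used consistently. The sketch below assumes that s_x and s_y in equation 15 are sample standard deviations (n − 1 in the denominator) and that equation 16 uses the 1/N covariance and standard deviations defined earlier; the text does not state this explicitly. Function names and data are illustrative.

```python
import math


def mean(values):
    return sum(values) / len(values)


def correlation_standardized(x, y):
    """Equation 15, assuming s_x and s_y are sample standard deviations (n - 1)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (n - 1)


def correlation_from_covariance(x, y):
    """Equation 16, with covariance and standard deviations normalized by n."""
    n = len(x)
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sigma_x = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sigma_y = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sigma_x * sigma_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

print(correlation_standardized(x, y))     # about 0.8
print(correlation_from_covariance(x, y))  # about 0.8
```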

2.2 The Normal or Gaussian Distribution


In general a Gaussian function is considered to be of the form:
\[ f(x) = e^{-x^2} \tag{17} \]
\[ f(x) = a e^{-\frac{(x-b)^2}{2c^2}} \tag{18} \]

When a = 1/(c√(2π)), the integral of equation 18 is equal to 1, and in that case the Gaussian
function is referred to specifically as the normal probability density function. c is then
referred to as the standard deviation σ and b as the distribution mean µ.
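
The normalization can be checked numerically. The sketch below integrates the Gaussian with a = 1/(c√(2π)) using a simple trapezoidal rule over ten standard deviations on either side of the mean. The helper names, the chosen µ and σ, and the integration limits are assumptions for the example.

```python
import math


def gaussian(x, a, b, c):
    """General Gaussian a * exp(-(x - b)^2 / (2 c^2))."""
    return a * math.exp(-((x - b) ** 2) / (2.0 * c ** 2))


def trapezoid(f, lower, upper, steps):
    """Composite trapezoidal rule on [lower, upper] with equal steps."""
    h = (upper - lower) / steps
    total = 0.5 * (f(lower) + f(upper))
    for i in range(1, steps):
        total += f(lower + i * h)
    return total * h


mu, sigma = 1.0, 2.0
a = 1.0 / (sigma * math.sqrt(2.0 * math.pi))  # normalizing prefactor

area = trapezoid(lambda x: gaussian(x, a, mu, sigma),
                 mu - 10.0 * sigma, mu + 10.0 * sigma, 20000)
print(area)  # very close to 1
```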

3 Physics
In physics, one of the most common and useful approaches to understanding fluctuations is
the solution of the one-dimensional diffusion equation, a homogeneous partial differential
equation.

\[ \frac{\partial f}{\partial t} = D \frac{\partial^2 f}{\partial x^2} \tag{19} \]
\[ f(x, t) = \frac{1}{\sqrt{4\pi D}} \frac{\exp(-x^2/4Dt)}{\sqrt{t}} \tag{20} \]

If the above solution in eq. 20 is interpreted as a probability function, then the mean square
distance traveled by a particle in time t is[3]:
\[ \langle x^2 \rangle = 2 \int_{0}^{\infty} f(x, t)\, x^2 \, dx \tag{21} \]
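
For the Gaussian solution of the diffusion equation, this integral evaluates to 2Dt, the variance of a normal distribution with σ² = 2Dt. The sketch below checks that value numerically with a trapezoidal rule. The function names, the values of D and t, and the finite upper integration limit are assumptions chosen for illustration.

```python
import math


def diffusion_density(x, t, d):
    """Gaussian solution of the one-dimensional diffusion equation."""
    return math.exp(-x * x / (4.0 * d * t)) / math.sqrt(4.0 * math.pi * d * t)


def mean_square_displacement(t, d, steps=20000):
    """Trapezoidal estimate of 2 * integral from 0 to infinity of f(x, t) x^2 dx.

    Assumption: the infinite upper limit is replaced by twelve standard
    deviations, beyond which the integrand is negligible.
    """
    upper = 12.0 * math.sqrt(2.0 * d * t)
    h = upper / steps

    def integrand(x):
        return 2.0 * diffusion_density(x, t, d) * x * x

    total = 0.5 * (integrand(0.0) + integrand(upper))
    for i in range(1, steps):
        total += integrand(i * h)
    return total * h


diffusion_coefficient = 0.5
time = 3.0

print(mean_square_displacement(time, diffusion_coefficient))  # close to 3.0
print(2.0 * diffusion_coefficient * time)                     # 2Dt = 3.0
```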

References
[1] Crump W. Baker, Introduction to Topology. Wm. C. Brown Publishers, 1991.

[2] Ken A. Dill, Molecular Driving Forces: Statistical Thermodynamics in Chemistry & Biology.
Garland Science, 2002, pp. 59, 230-232.

[3] D. K. MacDonald, Spontaneous Fluctuations. Reports on Progress in Physics, 12, 56, 1949.

[4] Andrei D. Polyanin and Alexei I. Chernoutsan, A Concise Handbook of Mathematics,
Physics, and Engineering Sciences. CRC Press, 2011.

[5] Peter Schuffler, Analysing MD Trajectories. Online website, 2010.
