Andrej Liptaj∗
May 7, 2007
Abstract
The general formula for higher order derivatives of the inverse function is presented. The formula is then used in a couple of mathematical applications: expansion of the inverse function into a Taylor series, solving equations, constructing random numbers with a given distribution from random numbers with the uniform distribution, and predicting the value of a function from its properties at a single point in the region beyond the radius of convergence of its Taylor series.
1 Introduction
Every reader probably knows the formula for the derivative of the inverse function
$$\frac{d}{dy}\, g(y)\Big|_{y=f(x_0)} = \frac{1}{\frac{d}{dx} f(x)\big|_{x=x_0}}, \qquad g = f^{-1},$$
which is usually part of a basic course of mathematical analysis. One could ask about analogous formulas for higher order derivatives. Even though these formulas are simple, they are not widely available. In this article we are going to write down the general formula for all higher order derivatives of the inverse function and show a couple of related examples and applications.
Figure 1: f (x) and g(y).
$$f_n = \frac{d^n}{dx^n} f(x)\Big|_{x=x_0}, \qquad g_n = \frac{d^n}{dy^n} g(y)\Big|_{y=y_0}.$$
The described objects are depicted in Figure 1.
2.2 Formulas
The main result of this paper is the formula for higher order derivatives of the inverse function. It has a recursive limit form and reads
$$g_1 = \frac{1}{f_1},$$
$$g_n = \lim_{\Delta x \to 0} n!\, \frac{\Delta x - \sum_{i=1}^{n-1} \frac{g_i}{i!} \left[ \sum_{j=1}^{n} \frac{f_j}{j!} (\Delta x)^j \right]^i}{\left[ \sum_{i=1}^{n} \frac{f_i}{i!} (\Delta x)^i \right]^n}. \qquad (1)$$
This gives us for the first five derivatives
$$g_1 = \frac{1}{f_1},$$
$$g_2 = -\frac{f_2}{f_1^3},$$
$$g_3 = \frac{3 f_2^2 - f_1 f_3}{f_1^5},$$
$$g_4 = \frac{-15 f_2^3 + 10 f_1 f_2 f_3 - f_1^2 f_4}{f_1^7},$$
$$g_5 = \frac{105 f_2^4 - 105 f_1 f_2^2 f_3 + 10 f_1^2 f_3^2 + 15 f_1^2 f_2 f_4 - f_1^3 f_5}{f_1^9}.$$
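The limit in the formula 1 can be evaluated exactly on a computer: treating Δx as a formal variable with rational coefficients, both the numerator and the denominator of the formula start at the order Δx^n, so gn is simply the ratio of their Δx^n coefficients. A minimal Python sketch of this idea (my own construction, not taken from the paper; all names are mine):

```python
from fractions import Fraction
from math import factorial

def poly_mul(p, q, deg):
    # multiply coefficient lists in dx, truncating at degree deg
    r = [Fraction(0)] * (deg + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= deg:
                r[i + j] += a * b
    return r

def inverse_derivatives(fs):
    # fs = [f1, ..., fN] -> exact [g1, ..., gN] via formula (1)
    N = len(fs)
    gs = []
    for n in range(1, N + 1):
        # S(dx) = sum_{j=1..n} f_j/j! dx^j as a coefficient list in dx
        S = [Fraction(0)] + [Fraction(fs[j - 1]) / factorial(j)
                             for j in range(1, n + 1)]
        # numerator of formula (1): dx - sum_{i=1..n-1} g_i/i! * S^i
        num = [Fraction(0)] * (n + 1)
        num[1] = Fraction(1)
        Si = [Fraction(1)] + [Fraction(0)] * n   # running power S^i
        for i in range(1, n):
            Si = poly_mul(Si, S, n)
            num = [a - gs[i - 1] / factorial(i) * b
                   for a, b in zip(num, Si)]
        # as dx -> 0 both numerator and S^n start at order dx^n,
        # so the limit is the ratio of their dx^n coefficients
        gs.append(factorial(n) * num[n] / Fraction(fs[0]) ** n)
    return gs

# f(x) = sin(x) at x0 = 0: f1..f5 = 1, 0, -1, 0, 1
print([str(g) for g in inverse_derivatives([1, 0, -1, 0, 1])])
```

For f(x) = sin(x) at x0 = 0 this returns exactly 1, 0, 1, 0, 9 (the derivatives of arcsin at 0), and for f(x) = x² at x0 = 2 it reproduces g3 = 3/256 from the example below.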
2.3 Examples
For the sake of convenience let us adopt the notation gn = F[f1, ..., fn]. This notation means that one uses the formula 1 to calculate gn as a function of f1, ..., fn.
$$g_3 = \frac{d^3}{dy^3} g(y)\Big|_{y=4} = \frac{3}{8\sqrt{y^5}}\Big|_{y=4} = \frac{3}{8\sqrt{4^5}} = \frac{3}{256} = F[f_1, f_2, f_3].$$
We conclude that we obtain the same results either by using the formula 1 or by explicitly differentiating the inverse function. It may however happen that the inverse function is not an elementary function. In such cases we can rely only on the formula 1.
2.3.2 Example 2: f (x) = sin(x), x0 = 0
One has
$$f_1 = \frac{d}{dx} f(x)\Big|_{x=0} = \cos(x)\big|_{x=0} = 1,$$
$$f_2 = \frac{d^2}{dx^2} f(x)\Big|_{x=0} = -\sin(x)\big|_{x=0} = 0,$$
$$f_3 = \frac{d^3}{dx^3} f(x)\Big|_{x=0} = -\cos(x)\big|_{x=0} = -1.$$
Using the formulas for the derivatives of the inverse function one gets
$$g_1 = F[f_1] = \frac{1}{f_1} = \frac{1}{1} = 1,$$
$$g_2 = F[f_1, f_2] = -\frac{f_2}{f_1^3} = -\frac{0}{1^3} = 0,$$
$$g_3 = F[f_1, f_2, f_3] = \frac{3 f_2^2 - f_1 f_3}{f_1^5} = \frac{3(0)^2 - (1)(-1)}{(1)^5} = \frac{1}{1} = 1.$$
Using the explicit form of the inverse function g(y) = arcsin(y) and calculating the derivatives at y0 = sin(0) = 0, we obtain once more an agreement
$$g_1 = \frac{d}{dy} g(y)\Big|_{y=0} = \frac{1}{\sqrt{1-y^2}}\Big|_{y=0} = \frac{1}{\sqrt{1-0^2}} = 1 = F[f_1],$$
$$g_2 = \frac{d^2}{dy^2} g(y)\Big|_{y=0} = \frac{y}{\sqrt{(1-y^2)^3}}\Big|_{y=0} = \frac{0}{\sqrt{(1-0^2)^3}} = 0 = F[f_1, f_2],$$
$$g_3 = \frac{d^3}{dy^3} g(y)\Big|_{y=0} = \left[\frac{1}{\sqrt{(1-y^2)^3}} + \frac{3y^2}{\sqrt{(1-y^2)^5}}\right]\Bigg|_{y=0} = 1 = F[f_1, f_2, f_3].$$
$$f_2 = \frac{d^2}{dx^2} f(x)\Big|_{x=1} = -\frac{1}{x^2}\Big|_{x=1} = -1,$$
$$f_3 = \frac{d^3}{dx^3} f(x)\Big|_{x=1} = \frac{2}{x^3}\Big|_{x=1} = 2.$$
The formulas 2 yield
$$g_1 = F[f_1] = \frac{1}{f_1} = \frac{1}{1} = 1,$$
$$g_2 = F[f_1, f_2] = -\frac{f_2}{f_1^3} = -\frac{(-1)}{1^3} = 1,$$
$$g_3 = F[f_1, f_2, f_3] = \frac{3 f_2^2 - f_1 f_3}{f_1^5} = \frac{3(-1)^2 - (1)(2)}{(1)^5} = \frac{1}{1} = 1.$$
One recognizes immediately the derivatives of the exponential function g(y) = f^{-1}(y) = e^y at the point y0 = ln(1) = 0. Once more the results agree.
• The formula 1 has a recursive and limit form. Observing the formulas 2 one might look for an expression that would be neither recursive nor in the form of a limit. Although some terms are easy to guess (for example, the denominator of each gn is (f1)^{2n-1}), it does not seem straightforward to find such a formula.
$$h_1 = \frac{1}{g_1} = \frac{1}{\frac{1}{f_1}} = f_1,$$
$$h_2 = -\frac{g_2}{g_1^3} = -\frac{-\frac{f_2}{f_1^3}}{\left(\frac{1}{f_1}\right)^3} = f_2,$$
$$h_3 = \frac{3 g_2^2 - g_1 g_3}{g_1^5} = \frac{3\left(-\frac{f_2}{f_1^3}\right)^2 - \frac{1}{f_1}\cdot\frac{3 f_2^2 - f_1 f_3}{f_1^5}}{\left(\frac{1}{f_1}\right)^5} = f_3.$$
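This self-consistency can also be checked numerically: applying the map F[·] twice to arbitrary test values must return the original ones. A minimal sketch (function and variable names are mine):

```python
def F3(d1, d2, d3):
    # g1, g2, g3 expressed through f1, f2, f3 (the formulas 2)
    return (1.0 / d1,
            -d2 / d1**3,
            (3.0 * d2**2 - d1 * d3) / d1**5)

f1, f2, f3 = 2.0, 3.0, 5.0           # arbitrary test values
g1, g2, g3 = F3(f1, f2, f3)          # derivatives of the inverse
h1, h2, h3 = F3(g1, g2, g3)          # applying F once more
print(h1, h2, h3)                    # recovers 2.0 3.0 5.0
```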
and using the fact that f and g are inverse to each other (considered true also for their linear approximations when Δx is small) one obtains
where we used aΔx = Δy. Comparing the starting and the ending expression one immediately has b = 1/a, which gives the relation between the coefficients a and b. The coefficients a and b are also the first derivatives of f and g at x0 and y0 respectively, a = f1 and b = g1. Thus one has
$$g_1 = \frac{1}{f_1}.$$
The derivation of the formula for g2 is somewhat more complicated; it however follows the scheme of the previous case. Let us make approximations of f and g up to quadratic terms.
Comparing the first and the last expression in the previous equations one arrives at
$$b_2 = -\frac{a_2 \Delta x^2}{a_1 \left( a_1^2 \Delta x^2 + a_2^2 \Delta x^4 + 2 a_1 a_2 \Delta x^3 \right)}.$$
This expression does depend on Δx (because, unlike in the case of the linear approximation, the inverse of a quadratic function is not a quadratic function). However, as we stressed before, our calculations are valid only for Δx → 0, where the power approximations of the functions hold. Taking in addition into account the relations between the polynomial coefficients and the derivatives, b2 = g2/2!, a2 = f2/2! and a1 = f1, one ends up with
$$g_2 = \lim_{\Delta x \to 0} \left[ -2!\, \frac{\frac{f_2}{2!} \Delta x^2}{f_1 \left( f_1^2 \Delta x^2 + \left(\frac{f_2}{2!}\right)^2 \Delta x^4 + 2 f_1 \frac{f_2}{2!} \Delta x^3 \right)} \right] = -\frac{f_2}{f_1^3}.$$
$$g(y_0 + \Delta y) \approx g(y_0) + b_1 \Delta y + b_2 \Delta y^2 + \ldots + b_n \Delta y^n = g(y_0) + \sum_{i=1}^{n} b_i \Delta y^i.$$
Then, using the inverse-function property at the point x0 + Δx, one has for small Δx
$$x_0 + \Delta x = g[f(x_0 + \Delta x)] = g\Big[f(x_0) + \sum_{j=1}^{n} a_j \Delta x^j\Big] \qquad \Big[\Delta y = \sum_{j=1}^{n} a_j \Delta x^j\Big]$$
$$= x_0 + \sum_{i=1}^{n} b_i \Big[\sum_{j=1}^{n} a_j \Delta x^j\Big]^i = x_0 + \sum_{i=1}^{n-1} b_i \Big[\sum_{j=1}^{n} a_j \Delta x^j\Big]^i + b_n \Big[\sum_{j=1}^{n} a_j \Delta x^j\Big]^n.$$
Comparing the starting and the ending expression one gets
$$b_n = \frac{\Delta x - \sum_{i=1}^{n-1} b_i \left[\sum_{j=1}^{n} a_j \Delta x^j\right]^i}{\left[\sum_{j=1}^{n} a_j \Delta x^j\right]^n}.$$
Taking into account that our calculations are valid only for small Δx, applying the limit Δx → 0 and using the relation between the polynomial coefficients and the derivatives, a_j = f_j/j! and b_i = g_i/i!, one arrives at the expression
$$g_n = \lim_{\Delta x \to 0} n!\, \frac{\Delta x - \sum_{i=1}^{n-1} \frac{g_i}{i!} \left[\sum_{j=1}^{n} \frac{f_j}{j!} \Delta x^j\right]^i}{\left[\sum_{j=1}^{n} \frac{f_j}{j!} \Delta x^j\right]^n},$$
which is identical to the formula 1. Although the presented derivation of the formula 1 might need some fine tuning (careful treatment of approximations and limits) to be regarded as a rigorous mathematical proof, it represents the essence of such a proof.
Figure 2: Solving equations
To know g(y0) one cannot choose y0 arbitrarily (otherwise one would directly choose y0 = 0). The only possibility is to choose an arbitrary x0 in some interval around xs where f is bijective. Then, using y0 = f(x0), one rewrites the previous equation in the form
$$x_s = g(0) = x_0 + \sum_{i=1}^{\infty} \frac{1}{i!}\, g_i\, [-f(x_0)]^i, \qquad (4)$$
where we explicitly see that all the information needed to solve the equation f(xs) = 0 is contained in the function f itself, and the function f is supposed to be known. As already mentioned, this conclusion relies on the supposition that the Taylor series of the inverse function g converges to g at y = 0.
4.3 A few comments
• Many transcendental equations do not have a solution which could be expressed in a closed form by means of elementary functions. Thus the presented method provides us with the most general solution of an equation in the form of a (Taylor) series - one cannot expect better. The presented method has however two drawbacks: the arbitrary choice of x0 and the fact that the Taylor series of the inverse function need not converge at y = 0.
• One can use a recursive approach to solve the equations. Since the length of the formulas for higher order derivatives of the inverse function grows rapidly (see the comment in 2.4.1), one might prefer to use the Taylor polynomial of the inverse function with a smaller number of terms recursively, rather than use it once with a large number of terms. If x0 is our starting point, then one could, for example, use as an approximation of the inverse function a Taylor polynomial with 10 terms to calculate the approximate value of the solution xs,approx. One could however also use a polynomial with 5 terms and calculate the first approximate value of the solution xs,approx1. Then, in the next iteration, one would choose xs,approx1 as the starting point, recalculate all the derivatives of the initial and the inverse function, construct the corresponding inverse-function polynomial and arrive at a more accurate approximation of the solution xs,approx2. In this way one avoids the long formulas when calculating higher order inverse derivatives. We will show this approach in the examples which follow.
4.4 Examples
In this section we will demonstrate the previous ideas on a couple of examples.
Only the rst example will be presented in all details and will show as well the
method as the results of the method. The second example will exactly follow
the method of the rst one and thus will focus only on results.
- we will consider only the first 8 derivatives. Then using the formula 1 we calculate g1 . . . g8 - the derivatives of the inverse function g(y) at the corresponding point y0 = −3. The numerical results for fn and gn are summarized in Table 1.
fn = d^n/dx^n (x^x − 7)|_{x=x0=2}        gn = d^n/dy^n g(y)|_{y=y0=−3}, g = (x^x − 7)^{−1}
f1 ≈ 6.77258        g1 ≈ 0.14765
f2 ≈ 13.46698       g2 ≈ −0.04335
f3 ≈ 28.57418       g3 ≈ 0.02460
f4 ≈ 64.50134       g4 ≈ −0.02070
f5 ≈ 151.34073      g5 ≈ 0.02313
f6 ≈ 370.18129      g6 ≈ −0.03227
f7 ≈ 929.27934      g7 ≈ 0.05406
f8 ≈ 2409.09527     g8 ≈ −0.10583
Now we just need to insert the calculated values into the formula 4:
$$x_{solution} \approx 2 + (0.14765)(3) + \frac{1}{2!}(-0.04335)(3)^2 + \ldots + \frac{1}{8!}(-0.10583)(3)^8$$
$$x_{solution} \approx 2.309117\ldots$$
We cross-check the obtained value: 2.309117...^{2.309117...} = 6.90620... We see that our approach indeed gives a good approximation of the solution. For a more precise value one should either include higher orders, have a better initial guess, or use an iterative procedure, as we are going to demonstrate now.
If we proceed in the same way as in the previous paragraph, but include only 4 terms in the Taylor polynomial, we obtain
$$x_1 \approx 2 + (0.14765)(3) + \frac{1}{2!}(-0.04335)(3)^2 + \ldots + \frac{1}{4!}(-0.02070)(3)^4$$
$$x_1 \approx 2.288709\ldots$$
We take the value x1 = 2.288709... as our starting point and recalculate at this point the first four derivatives f1, . . . , f4 of f. Subsequently we calculate the first four derivatives g1, . . . , g4 of g at the corresponding point y1 = 2.288709...^{2.288709...} − 7 = −0.34729... The results are summarized in Table 2. Using the formula 4 we get
$$x_{solution} \approx 2.288709 + (0.08222)(0.34729\ldots) + \ldots + \frac{1}{4!}(-0.00224)(0.34729\ldots)^4$$
$$x_{solution} \approx 2.31645\ldots$$
Calculating 2.31645...^{2.31645...} = 6.999999205... we observe that the precision becomes really good, better than in the previous calculation. From this one can learn that one does not need to include high orders in order to arrive at a good approximation of the solution. It seems to be more efficient to use a smaller number of orders in an iterative way. From the theoretical point of view it is however interesting to have the general expression for all higher orders, even though they may not be used in practice.
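The whole iterative scheme of this example fits in a few lines. Below is a sketch (function names and the restriction to three inverse derivatives are my own choices) that solves x^x = 7 starting from x0 = 2, re-deriving f1, f2, f3 analytically at each step:

```python
import math

def f(x):                         # the equation of 4.4.1: x**x = 7
    return x**x - 7.0

def f_derivs(x):                  # analytic f1, f2, f3 of x**x
    u, L = x**x, math.log(x) + 1.0
    f1 = u * L
    f2 = u * (L**2 + 1.0 / x)
    f3 = u * (L**3 + 3.0 * L / x - 1.0 / x**2)
    return f1, f2, f3

def solve(x0=2.0, iterations=8):
    x = x0
    for _ in range(iterations):
        y = f(x)                              # current y0 = f(x)
        f1, f2, f3 = f_derivs(x)
        g1 = 1.0 / f1                         # the formulas 2
        g2 = -f2 / f1**3
        g3 = (3.0 * f2**2 - f1 * f3) / f1**5
        # formula 4 truncated after three terms, applied repeatedly
        x += g1 * (-y) + g2 / 2.0 * (-y)**2 + g3 / 6.0 * (-y)**3
    return x

print(solve())   # ~ 2.31646, and solve()**solve() ~ 7
```

Even with only three inverse derivatives, the iteration reaches the root to machine precision within a few steps.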
$$\pi \approx 3.14159265357636280318084094735\ldots,$$
where 10 decimal places are correct. If we use the iterative way, including 4 derivatives and iterating just twice (as in the example in 4.4.1), we get
$$\pi \approx 3.14159265358979323846269213732\ldots,$$
where 22 decimal places are correct. For the second constant we obtain
$$e \approx 2.71828182832191473183292924148\ldots,$$
where 9 decimal places are correct. Using the iterative method with 4 derivatives we obtain
$$e \approx 2.71828182845904444095135772192\ldots,$$
where 14 decimal digits are correct.
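The exact setup behind these digits is not fully shown in this excerpt. Assuming, purely for illustration, that π is obtained by solving sin(x) = 0 with the starting point x0 = 3, the iterative method with four inverse derivatives converges in a few iterations (all names are mine; g4 is the standard fourth-order inverse-derivative formula):

```python
import math

def refine_pi(x0=3.0, iterations=4):
    # solve sin(x) = 0 near x0 using four inverse derivatives
    x = x0
    for _ in range(iterations):
        y = math.sin(x)
        f1, f2 = math.cos(x), -math.sin(x)
        f3, f4 = -math.cos(x), math.sin(x)
        g1 = 1.0 / f1
        g2 = -f2 / f1**3
        g3 = (3.0 * f2**2 - f1 * f3) / f1**5
        g4 = (-15.0 * f2**3 + 10.0 * f1 * f2 * f3 - f1**2 * f4) / f1**7
        x += (g1 * (-y) + g2 / 2.0 * (-y)**2
              + g3 / 6.0 * (-y)**3 + g4 / 24.0 * (-y)**4)
    return x

print(refine_pi())   # converges to math.pi in a few iterations
```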
fn = d^n/dx^n f(x)|_{x=x0=0}        gn = d^n/dy^n g(y)|_{y=y0=0.5}
f 0 = f (x0 = 0) = 0.5 g0 = g(y0 = 0.5) = 0
f 1 ≈ 0.398942 g1 ≈ 2.506628
f2 = 0 g2 = 0
f 3 ≈ −0.398942 g3 ≈ 15.749609
f4 = 0 g4 = 0
f 5 ≈ 1.196826 g5 ≈ 692.704024
f6 = 0 g6 = 0
f 7 ≈ −5.984134 g7 ≈ 78964.749174
f8 = 0 g8 = 0
f 9 ≈ 41.888939 g9 ≈ 17068346.560783
usually offer uniformly distributed numbers from the interval [0, 1), and it is up to the user/programmer to obtain from them numbers that are distributed according to a given probability density function.
The solution of the problem is simple:
• construct the distribution function $f(x) = \int_{-\infty}^{x} \rho(y)\, dy$,
• construct the function g inverse to f: g[f(x)] = f[g(x)] = x.
The methods presented in the article allow us to apply this solution, supposing that we know the value of the distribution function f at least at one point x0 (this is often the case - for example, for symmetric distributions f(x0 = 0) = 1/2). We simply construct the Taylor series for g using the formula 3. The example we are going to present will in the end turn out not to be so convincing; it will however demonstrate our ideas well.
Let our aim be the construction of a random number generator respecting the normal distribution $\rho(x) = \frac{1}{\sqrt{2\pi}} \exp(-\frac{x^2}{2})$. The corresponding distribution function $f(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp(-\frac{y^2}{2})\, dy$ is not an elementary function. We however know its value and the values of all its derivatives fn at the point x0 = 0. Using the formula 1 we calculate the derivatives gn of the inverse function g at the point y0 = f(x0 = 0) = 1/2. Taking into account the first 9 derivatives, we get numerical results which are summarized in Table 3. Straightforward use of the formula 3 (we use x for the variable name rather than y) yields
$$g(x) \approx 0 + (2.506628)(x - 0.5) + 0 + \frac{1}{3!}(15.749609)(x - 0.5)^3 + \ldots + \frac{1}{9!}(17068346.560783)(x - 0.5)^9.$$
Figure 3: The approximation of the function g, g = f^{-1}, $f(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp(-\frac{y^2}{2})\, dy$.
The graph of the obtained function is plotted in Figure 3. One clearly sees that our random number generator would not produce numbers smaller than approximately −2 or bigger than approximately 2. Since it should in principle produce numbers from −∞ to +∞, one cannot be really satisfied with the result. The difference between our generator and a standard one can be well observed in Figure 4, where high-statistics distributions from both generators are compared.
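Using the coefficients of Table 3, the 9-term polynomial for g and the resulting generator can be sketched as follows (the function names are mine; the numerical coefficients are taken from the table):

```python
import random

# Taylor coefficients g1, g3, g5, g7, g9 of g = f^{-1} around
# y0 = 1/2 (Table 3); even-order ones vanish by symmetry
G = {1: 2.506628, 3: 15.749609, 5: 692.704024,
     7: 78964.749174, 9: 17068346.560783}
FACT = {1: 1, 3: 6, 5: 120, 7: 5040, 9: 362880}

def g(u):
    # 9-term Taylor polynomial of the inverse distribution function
    t = u - 0.5
    return sum(G[n] / FACT[n] * t ** n for n in G)

def normal_sample():
    # transform a uniform number into an approximately normal one
    return g(random.random())

print(g(0.691462))   # f(0.5) ~ 0.691462, so this is ~ 0.5
```

Since f(0.5) ≈ 0.691462, the polynomial maps this value back close to 0.5, as one can verify directly.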
A simple way of healing our generator would be to include higher order terms in the Taylor polynomial approximating the function g. The calculation of such terms however relies on the calculation of the formulas for gn, and the calculation of these formulas starts to be complicated (i.e. time consuming) with increasing n (n > 10), even when computers are used. Nevertheless, this could be done once and then used forever. One should also check whether the Taylor series for g converges to g.
Although the presented example does not approximate the normal-distribution random number generator with great accuracy, the ideas presented in this paragraph provide a very general and powerful method of producing random numbers with a desired distribution from uniformly distributed random numbers.
Figure 4: Plots produced with high statistics of random numbers. Dashed line - standard Gaussian; solid line - our approximation of the Gaussian.
f at some point x1 only from the value of f and its derivatives at the point x0, even if x1 is beyond the radius of convergence of the Taylor series! One can demonstrate it very well on the example from section 2.3.3 - and we will do so.
Let f(x) = ln(x) and x0 = 1. There is a lot of knowledge about the function ln(x), but let us in this section allow only the knowledge of its value and of the values of its derivatives at x0 = 1. Using only this knowledge, let us try to predict the value of f(x) at the point x1 = 3. Having the set of numbers f0, f1, f2, . . . one would naturally think of constructing the Taylor series. This idea would however fail. It is a well known fact that the radius of convergence of the Taylor series of the function ln(x) at x0 = 1 is 1. The point x1 = 3 is simply too far from 1; the series diverges at this point. There is however another possibility how to calculate f(3). Let us first calculate the derivatives of the inverse function g = f^{-1} = exp(y) at y0 = f(x0 = 1) = 0. The result is obvious: 1 = g0 = g1 = g2 = . . . (see 2.3.3). Then one just needs to expand g = exp(y) into the Taylor series. It is a well known fact that this series converges for each y ∈ R! Thus we can draw the graph of the function g, and it is just a technicality to find the value y1 such that g(y1) = x1 = 3. The number y1 is the number we are looking for. Let us briefly summarize in a general way:
• We have the knowledge of the numbers f0, f1, f2, . . . - the value and the derivatives of the function f at the point x0.
• We calculate g1, g2, . . . - the derivatives of its inverse function g at the point y0 = f(x0).
• We expand the function g into the Taylor series. If it converges on an appropriate interval, we can calculate the value of f at some point x1 - we may use for that purpose some iterative procedure.
The iterative procedure in our example with f(x) = ln(x) and g(y) = exp(y) could look like this:
1. y := 0, step := 1, precision := 0.001
3. if [ g(y) < 3 and step < 0 ] then (step := −0.5 ∗ step and y := y + step)
5. if [ g(y) > 3 and step > 0 ] then (step := −0.5 ∗ step and y := y + step)
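A possible implementation of the step-halving search above (a sketch under the assumption that the steps not shown simply advance y by step; all names are mine) computes y1 = ln(3) from the Taylor series of exp:

```python
import math

def g(y, terms=30):
    # Taylor series of g(y) = exp(y) around y0 = 0 (all gn equal 1)
    return sum(y ** n / math.factorial(n) for n in range(terms))

def find_y(target=3.0, precision=0.001):
    # step-halving search in the spirit of steps 1, 3 and 5 above
    y, step = 0.0, 1.0
    while abs(step) >= precision:
        if (g(y) < target) != (step > 0):   # overshoot: turn around
            step = -0.5 * step
        y += step
    return y

print(find_y())   # ~ ln(3) = 1.0986...
```

Each reversal halves the step, so the search brackets the solution g(y1) = 3 ever more tightly until the step drops below the requested precision.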
6 Conclusion
In this article we presented the formula for the calculation of higher order derivatives of the inverse function. We also tried to demonstrate on examples that the knowledge of the derivatives of the inverse function leads to interesting mathematical applications. There are still many open questions. What is the relation between the radii of convergence of the Taylor series of the initial and of the inverse function? Is it possible to generalize the presented ideas to more complex mathematical objects, for example operators? Since many practical problems lead to functional equations, such a generalization could be of big interest.