Andrej Liptaj∗
May 7, 2007
Abstract
The general formula for higher order derivatives of the inverse function is presented. The formula is then used in a couple of mathematical applications: expansion of the inverse function into a Taylor series, solving equations, constructing random numbers with a given distribution from random numbers with the uniform distribution, and predicting the value of a function from its properties at a single point in the region beyond the radius of convergence of its Taylor series.
1 Introduction
Every reader probably knows the formula for the derivative of the inverse function
$$\frac{d}{dy}\, g(y)\Big|_{y=f(x_0)} = \frac{1}{\frac{d}{dx} f(x)\big|_{x=x_0}}, \qquad g = f^{-1},$$
which is usually part of a basic course of mathematical analysis. One could ask about analogous formulas for higher order derivatives. Even though these formulas are simple, they are not widely available. In this article we are going to write down the general formula for all higher order derivatives of the inverse function and show a couple of related examples and applications.
Figure 1: f (x) and g(y).
$$f_n = \frac{d^n}{dx^n} f(x)\Big|_{x=x_0}, \qquad g_n = \frac{d^n}{dy^n} g(y)\Big|_{y=y_0}.$$
The described objects are depicted in Figure 1.
2.2 Formulas
The main result of this paper is the formula for higher order derivatives of the inverse function. It has a recursive limit form and reads
$$g_1 = \frac{1}{f_1},$$
$$g_n = \lim_{\Delta x \to 0} n!\, \frac{\Delta x - \sum_{i=1}^{n-1} \frac{g_i}{i!} \left[ \sum_{j=1}^{n} \frac{f_j}{j!} (\Delta x)^j \right]^i}{\left[ \sum_{i=1}^{n} \frac{f_i}{i!} (\Delta x)^i \right]^n}. \qquad (1)$$
This gives us for the first five derivatives
$$g_1 = \frac{1}{f_1},$$
$$g_2 = -\frac{f_2}{f_1^3},$$
$$g_3 = \frac{3 f_2^2 - f_1 f_3}{f_1^5},$$
$$g_4 = \frac{-15 f_2^3 + 10 f_1 f_2 f_3 - f_1^2 f_4}{f_1^7},$$
$$g_5 = \frac{105 f_2^4 - 105 f_1 f_2^2 f_3 + 10 f_1^2 f_3^2 + 15 f_1^2 f_2 f_4 - f_1^3 f_5}{f_1^9}.$$
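The limit in the formula 1 can be evaluated exactly on a computer: treating Δx as a formal variable with rational coefficients, both the numerator and the denominator of the formula start at the order Δx^n, so gn is simply the ratio of their Δx^n coefficients. A minimal Python sketch of this idea (my own construction, not taken from the paper; all names are mine):

```python
from fractions import Fraction
from math import factorial

def poly_mul(p, q, deg):
    # multiply coefficient lists in dx, truncating at degree deg
    r = [Fraction(0)] * (deg + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= deg:
                r[i + j] += a * b
    return r

def inverse_derivatives(fs):
    # fs = [f1, ..., fN] -> exact [g1, ..., gN] via formula (1)
    N = len(fs)
    gs = []
    for n in range(1, N + 1):
        # S(dx) = sum_{j=1..n} f_j/j! dx^j as a coefficient list in dx
        S = [Fraction(0)] + [Fraction(fs[j - 1]) / factorial(j)
                             for j in range(1, n + 1)]
        # numerator of formula (1): dx - sum_{i=1..n-1} g_i/i! * S^i
        num = [Fraction(0)] * (n + 1)
        num[1] = Fraction(1)
        Si = [Fraction(1)] + [Fraction(0)] * n   # running power S^i
        for i in range(1, n):
            Si = poly_mul(Si, S, n)
            num = [a - gs[i - 1] / factorial(i) * b
                   for a, b in zip(num, Si)]
        # as dx -> 0 both numerator and S^n start at order dx^n,
        # so the limit is the ratio of their dx^n coefficients
        gs.append(factorial(n) * num[n] / Fraction(fs[0]) ** n)
    return gs

# f(x) = sin(x) at x0 = 0: f1..f5 = 1, 0, -1, 0, 1
print([str(g) for g in inverse_derivatives([1, 0, -1, 0, 1])])
```

For f(x) = sin(x) at x0 = 0 this returns exactly 1, 0, 1, 0, 9 (the derivatives of arcsin at 0), and for f(x) = x² at x0 = 2 it reproduces g3 = 3/256 from the example below.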
2.3 Examples
For the sake of convenience let us adopt the notation gn = F[f1, ..., fn]. This notation means that one uses the formula 1 to calculate gn as a function of f1, ..., fn.
$$g_3 = \frac{d^3}{dy^3} g(y)\Big|_{y=4} = \frac{3}{8\sqrt{y^5}}\Big|_{y=4} = \frac{3}{8\sqrt{4^5}} = \frac{3}{256} = F[f_1, f_2, f_3].$$
We conclude that we obtain the same results either by using the formula 1 or by explicitly differentiating the inverse function. It may however happen that the inverse function is not an elementary function. In such cases we can rely only on the formula 1.
2.3.2 Example 2: f (x) = sin(x), x0 = 0
One has
$$f_1 = \frac{d}{dx} f(x)\Big|_{x=0} = \cos(x)\big|_{x=0} = 1,$$
$$f_2 = \frac{d^2}{dx^2} f(x)\Big|_{x=0} = -\sin(x)\big|_{x=0} = 0,$$
$$f_3 = \frac{d^3}{dx^3} f(x)\Big|_{x=0} = -\cos(x)\big|_{x=0} = -1.$$
Using the formulas for the derivatives of the inverse function one gets
$$g_1 = F[f_1] = \frac{1}{f_1} = \frac{1}{1} = 1,$$
$$g_2 = F[f_1, f_2] = -\frac{f_2}{f_1^3} = -\frac{0}{1^3} = 0,$$
$$g_3 = F[f_1, f_2, f_3] = \frac{3 f_2^2 - f_1 f_3}{f_1^5} = \frac{3(0)^2 - (1)(-1)}{(1)^5} = \frac{1}{1} = 1.$$
Using the explicit form of the inverse function g(y) = arcsin(y) and calculating the derivatives at y0 = sin(0) = 0, we obtain once more an agreement
$$g_1 = \frac{d}{dy} g(y)\Big|_{y=0} = \frac{1}{\sqrt{1-y^2}}\Big|_{y=0} = \frac{1}{\sqrt{1-0^2}} = 1 = F[f_1],$$
$$g_2 = \frac{d^2}{dy^2} g(y)\Big|_{y=0} = \frac{y}{\sqrt{(1-y^2)^3}}\Big|_{y=0} = \frac{0}{\sqrt{(1-0^2)^3}} = 0 = F[f_1, f_2],$$
$$g_3 = \frac{d^3}{dy^3} g(y)\Big|_{y=0} = \left[\frac{1}{\sqrt{(1-y^2)^3}} + \frac{3y^2}{\sqrt{(1-y^2)^5}}\right]\Bigg|_{y=0} = 1 = F[f_1, f_2, f_3].$$
$$f_2 = \frac{d^2}{dx^2} f(x)\Big|_{x=1} = -\frac{1}{x^2}\Big|_{x=1} = -1,$$
$$f_3 = \frac{d^3}{dx^3} f(x)\Big|_{x=1} = \frac{2}{x^3}\Big|_{x=1} = 2.$$
The formulas 2 yield
$$g_1 = F[f_1] = \frac{1}{f_1} = \frac{1}{1} = 1,$$
$$g_2 = F[f_1, f_2] = -\frac{f_2}{f_1^3} = -\frac{(-1)}{1^3} = 1,$$
$$g_3 = F[f_1, f_2, f_3] = \frac{3 f_2^2 - f_1 f_3}{f_1^5} = \frac{3(-1)^2 - (1)(2)}{(1)^5} = \frac{1}{1} = 1.$$
One recognizes immediately the derivatives of the exponential function g(y) = f^{-1}(y) = e^y at the point y0 = ln(1) = 0. Once more the results agree.
• The formula 1 has a recursive and limit form. Observing the formulas 2 one might look for an expression that would be neither recursive nor in the form of a limit. Although some terms are easy to guess (for example, the denominator of each gn is (f1)^{2n-1}), it does not seem straightforward to find such a formula.
$$h_1 = \frac{1}{g_1} = \frac{1}{\frac{1}{f_1}} = f_1,$$
$$h_2 = -\frac{g_2}{g_1^3} = -\frac{-\frac{f_2}{f_1^3}}{\left(\frac{1}{f_1}\right)^3} = f_2,$$
$$h_3 = \frac{3 g_2^2 - g_1 g_3}{g_1^5} = \frac{3\left(-\frac{f_2}{f_1^3}\right)^2 - \frac{1}{f_1}\cdot\frac{3 f_2^2 - f_1 f_3}{f_1^5}}{\left(\frac{1}{f_1}\right)^5} = f_3.$$
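This self-consistency can also be checked numerically: applying the map F[·] twice to arbitrary test values must return the original ones. A minimal sketch (function and variable names are mine):

```python
def F3(d1, d2, d3):
    # g1, g2, g3 expressed through f1, f2, f3 (the formulas 2)
    return (1.0 / d1,
            -d2 / d1**3,
            (3.0 * d2**2 - d1 * d3) / d1**5)

f1, f2, f3 = 2.0, 3.0, 5.0           # arbitrary test values
g1, g2, g3 = F3(f1, f2, f3)          # derivatives of the inverse
h1, h2, h3 = F3(g1, g2, g3)          # applying F once more
print(h1, h2, h3)                    # recovers 2.0 3.0 5.0
```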
and using the fact that f and g are inverse to each other (considered true also for their linear approximations when Δx is small) one obtains
where we used aΔx = Δy. Comparing the starting and the ending expression one immediately has b = 1/a, which gives the relation between the coefficients a and b. The coefficients a and b are also the first derivatives of f and g at x0 and y0 respectively, a = f1 and b = g1. Thus one has
$$g_1 = \frac{1}{f_1}.$$
The derivation of the formula for g2 is somewhat more complicated; it however follows the scheme of the previous case. Let us make approximations of f and g up to quadratic terms.
Comparing the first and the last expression in the previous equations one arrives at
$$b_2 = -\frac{a_2 \Delta x^2}{a_1 \left( a_1^2 \Delta x^2 + a_2^2 \Delta x^4 + 2 a_1 a_2 \Delta x^3 \right)}.$$
This expression does depend on Δx (because, unlike in the case of the linear approximation, the inverse of a quadratic function is not a quadratic function). However, as we stressed before, our calculations are valid only for Δx → 0, where the power approximations of the functions hold. Taking in addition into account the relations between the polynomial coefficients and the derivatives, b2 = g2/2!, a2 = f2/2! and a1 = f1, one ends up with
$$g_2 = \lim_{\Delta x \to 0} \left[ -2!\, \frac{\frac{f_2}{2!} \Delta x^2}{f_1 \left( f_1^2 \Delta x^2 + \left(\frac{f_2}{2!}\right)^2 \Delta x^4 + 2 f_1 \frac{f_2}{2!} \Delta x^3 \right)} \right] = -\frac{f_2}{f_1^3}.$$
$$g(y_0 + \Delta y) \approx g(y_0) + b_1 \Delta y + b_2 \Delta y^2 + \ldots + b_n \Delta y^n = g(y_0) + \sum_{i=1}^{n} b_i \Delta y^i.$$
Then, using the inverse-function property at the point x0 + Δx, one has for small Δx
$$x_0 + \Delta x = g[f(x_0 + \Delta x)] = g\Big[f(x_0) + \sum_{j=1}^{n} a_j \Delta x^j\Big] \qquad \Big[\Delta y = \sum_{j=1}^{n} a_j \Delta x^j\Big]$$
$$= x_0 + \sum_{i=1}^{n} b_i \Big[\sum_{j=1}^{n} a_j \Delta x^j\Big]^i = x_0 + \sum_{i=1}^{n-1} b_i \Big[\sum_{j=1}^{n} a_j \Delta x^j\Big]^i + b_n \Big[\sum_{j=1}^{n} a_j \Delta x^j\Big]^n.$$
Comparing the starting and the ending expression one gets
$$b_n = \frac{\Delta x - \sum_{i=1}^{n-1} b_i \left[\sum_{j=1}^{n} a_j \Delta x^j\right]^i}{\left[\sum_{j=1}^{n} a_j \Delta x^j\right]^n}.$$
Taking into account that our calculations are valid only for small Δx, applying the limit Δx → 0 and using the relation between the polynomial coefficients and the derivatives, a_j = f_j/j! and b_i = g_i/i!, one arrives at the expression
$$g_n = \lim_{\Delta x \to 0} n!\, \frac{\Delta x - \sum_{i=1}^{n-1} \frac{g_i}{i!} \left[\sum_{j=1}^{n} \frac{f_j}{j!} \Delta x^j\right]^i}{\left[\sum_{j=1}^{n} \frac{f_j}{j!} \Delta x^j\right]^n},$$
which is identical to the formula 1. Although the presented derivation of the formula 1 might need some fine tuning (careful treatment of approximations and limits) to be regarded as a rigorous mathematical proof, it represents the essence of such a proof.
Figure 2: Solving equations
To know g(y0) one cannot choose y0 arbitrarily (otherwise one would directly choose y0 = 0). The only possibility is to choose an arbitrary x0 in some interval around xs where f is bijective. Then, using y0 = f(x0), one rewrites the previous equation in the form
$$x_s = g(0) = x_0 + \sum_{i=1}^{\infty} \frac{1}{i!}\, g_i\, [-f(x_0)]^i, \qquad (4)$$
where we explicitly see that all the information needed to solve the equation f(xs) = 0 is contained in the function f itself, and the function f is supposed to be known. As already mentioned, this conclusion relies on the supposition that the Taylor series of the inverse function g converges to g at y = 0.
4.3 A few comments
• Many transcendental equations do not have a solution which could be expressed in a closed form by means of elementary functions. Thus the presented method provides us with the most general solution of an equation in the form of a (Taylor) series - one cannot expect better. The presented method has however two drawbacks: the arbitrary choice of x0 and the fact that the Taylor series of the inverse function need not converge at y = 0.
• One can use a recursive approach to solve the equations. Since the length of the formulas for higher order derivatives of the inverse function grows rapidly (see the comment in 2.4.1), one might prefer to use the Taylor polynomial of the inverse function with a smaller number of terms recursively, rather than use it once with a large number of terms. If x0 is our starting point, then one could, for example, use as an approximation of the inverse function a Taylor polynomial with 10 terms to calculate the approximate value of the solution xs,approx. One could however also use a polynomial with 5 terms and calculate the first approximate value of the solution xs,approx1. Then, in the next iteration, one would choose xs,approx1 as the starting point, recalculate all the derivatives of the initial and the inverse function, construct the corresponding inverse-function polynomial and arrive at a more accurate approximation of the solution xs,approx2. In this way one avoids the long formulas when calculating higher order inverse derivatives. We will show this approach in the examples which follow.
4.4 Examples
In this section we will demonstrate the previous ideas on a couple of examples.
Only the rst example will be presented in all details and will show as well the
method as the results of the method. The second example will exactly follow
the method of the rst one and thus will focus only on results.
- we will consider only the first 8 derivatives. Then using the formula 1 we calculate g1 . . . g8 - the derivatives of the inverse function g(y) at the corresponding point y0 = −3. The numerical results for fn and gn are summarized in Table 1.
fn = d^n/dx^n (x^x − 7)|_{x=x0=2}        gn = d^n/dy^n g(y)|_{y=y0=−3}, g = (x^x − 7)^{−1}
f1 ≈ 6.77258        g1 ≈ 0.14765
f2 ≈ 13.46698       g2 ≈ −0.04335
f3 ≈ 28.57418       g3 ≈ 0.02460
f4 ≈ 64.50134       g4 ≈ −0.02070
f5 ≈ 151.34073      g5 ≈ 0.02313
f6 ≈ 370.18129      g6 ≈ −0.03227
f7 ≈ 929.27934      g7 ≈ 0.05406
f8 ≈ 2409.09527     g8 ≈ −0.10583
Now we just need to insert the calculated values into the formula 4:
$$x_{solution} \approx 2 + (0.14765)(3) + \frac{1}{2!}(-0.04335)(3)^2 + \ldots + \frac{1}{8!}(-0.10583)(3)^8$$
$$x_{solution} \approx 2.309117\ldots$$
We cross-check the obtained value: 2.309117...^{2.309117...} = 6.90620... We see that our approach indeed gives a good approximation of the solution. For a more precise value one should either include higher orders, have a better initial guess, or use an iterative procedure, as we are going to demonstrate now.
If we proceed in the same way as in the previous paragraph, but include only 4 terms in the Taylor polynomial, we obtain
$$x_1 \approx 2 + (0.14765)(3) + \frac{1}{2!}(-0.04335)(3)^2 + \ldots + \frac{1}{4!}(-0.02070)(3)^4$$
$$x_1 \approx 2.288709\ldots$$
We take the value x1 = 2.288709... as our starting point and recalculate at this point the first four derivatives f1, . . . , f4 of f. Subsequently we calculate the first four derivatives g1, . . . , g4 of g at the corresponding point y1 = 2.288709...^{2.288709...} − 7 = −0.34729... The results are summarized in Table 2. Using the formula 4 we get
$$x_{solution} \approx 2.288709 + (0.08222)(0.34729\ldots) + \ldots + \frac{1}{4!}(-0.00224)(0.34729\ldots)^4$$
$$x_{solution} \approx 2.31645\ldots$$
Calculating 2.31645...^{2.31645...} = 6.999999205... we observe that the precision becomes really good, better than in the previous calculation. From this one can learn that one does not need to include high orders in order to arrive at a good approximation of the solution. It seems to be more efficient to use a smaller number of orders in an iterative way. From the theoretical point of view it is however interesting to have the general expression for all higher orders, even though they may not be used in practice.
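The whole iterative scheme of this example fits in a few lines. Below is a sketch (function names and the restriction to three inverse derivatives are my own choices) that solves x^x = 7 starting from x0 = 2, re-deriving f1, f2, f3 analytically at each step:

```python
import math

def f(x):                         # the equation of 4.4.1: x**x = 7
    return x**x - 7.0

def f_derivs(x):                  # analytic f1, f2, f3 of x**x
    u, L = x**x, math.log(x) + 1.0
    f1 = u * L
    f2 = u * (L**2 + 1.0 / x)
    f3 = u * (L**3 + 3.0 * L / x - 1.0 / x**2)
    return f1, f2, f3

def solve(x0=2.0, iterations=8):
    x = x0
    for _ in range(iterations):
        y = f(x)                              # current y0 = f(x)
        f1, f2, f3 = f_derivs(x)
        g1 = 1.0 / f1                         # the formulas 2
        g2 = -f2 / f1**3
        g3 = (3.0 * f2**2 - f1 * f3) / f1**5
        # formula 4 truncated after three terms, applied repeatedly
        x += g1 * (-y) + g2 / 2.0 * (-y)**2 + g3 / 6.0 * (-y)**3
    return x

print(solve())   # ~ 2.31646, and solve()**solve() ~ 7
```

Even with only three inverse derivatives, the iteration reaches the root to machine precision within a few steps.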
$$\pi \approx 3.14159265357636280318084094735\ldots,$$
where 10 decimal places are correct. If we use the iterative way, including 4 derivatives and iterating just twice (as in the example in 4.4.1), we get
$$\pi \approx 3.14159265358979323846269213732\ldots,$$
where 22 decimal places are correct. For the second constant we obtain
$$e \approx 2.71828182832191473183292924148\ldots,$$
where 9 decimal places are correct. Using the iterative method with 4 derivatives we obtain
$$e \approx 2.71828182845904444095135772192\ldots,$$
where 14 decimal digits are correct.
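The exact setup behind these digits is not fully shown in this excerpt. Assuming, purely for illustration, that π is obtained by solving sin(x) = 0 with the starting point x0 = 3, the iterative method with four inverse derivatives converges in a few iterations (all names are mine; g4 is the standard fourth-order inverse-derivative formula):

```python
import math

def refine_pi(x0=3.0, iterations=4):
    # solve sin(x) = 0 near x0 using four inverse derivatives
    x = x0
    for _ in range(iterations):
        y = math.sin(x)
        f1, f2 = math.cos(x), -math.sin(x)
        f3, f4 = -math.cos(x), math.sin(x)
        g1 = 1.0 / f1
        g2 = -f2 / f1**3
        g3 = (3.0 * f2**2 - f1 * f3) / f1**5
        g4 = (-15.0 * f2**3 + 10.0 * f1 * f2 * f3 - f1**2 * f4) / f1**7
        x += (g1 * (-y) + g2 / 2.0 * (-y)**2
              + g3 / 6.0 * (-y)**3 + g4 / 24.0 * (-y)**4)
    return x

print(refine_pi())   # converges to math.pi in a few iterations
```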
fn = d^n/dx^n f(x)|_{x=x0=0}        gn = d^n/dy^n g(y)|_{y=y0=0.5}
f 0 = f (x0 = 0) = 0.5 g0 = g(y0 = 0.5) = 0
f 1 ≈ 0.398942 g1 ≈ 2.506628
f2 = 0 g2 = 0
f 3 ≈ −0.398942 g3 ≈ 15.749609
f4 = 0 g4 = 0
f 5 ≈ 1.196826 g5 ≈ 692.704024
f6 = 0 g6 = 0
f 7 ≈ −5.984134 g7 ≈ 78964.749174
f8 = 0 g8 = 0
f 9 ≈ 41.888939 g9 ≈ 17068346.560783
usually offer uniformly distributed numbers from the interval [0, 1), and it is up to the user/programmer to obtain from them numbers that are distributed according to a given probability density function.
The solution of the problem is simple:
• construct the distribution function $f(x) = \int_{-\infty}^{x} \rho(y)\, dy$,
• construct the function g inverse to f: g[f(x)] = f[g(x)] = x.
The methods presented in the article allow us to apply this solution, supposing that we know the value of the distribution function f at least at one point x0 (this is often the case - for example, for symmetric distributions f(x0 = 0) = 1/2). We simply construct the Taylor series for g using the formula 3. The example we are going to present will in the end turn out not to be so convincing; it will however demonstrate our ideas well.
Let our aim be the construction of a random number generator respecting the normal distribution $\rho(x) = \frac{1}{\sqrt{2\pi}} \exp(-\frac{x^2}{2})$. The corresponding distribution function $f(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp(-\frac{y^2}{2})\, dy$ is not an elementary function. We however know its value and the values of all its derivatives fn at the point x0 = 0. Using the formula 1 we calculate the derivatives gn of the inverse function g at the point y0 = f(x0 = 0) = 1/2. Taking into account the first 9 derivatives, we get numerical results which are summarized in Table 3. Straightforward use of the formula 3 (we use x for the variable name rather than y) yields
$$g(x) \approx 0 + (2.506628)(x - 0.5) + 0 + \frac{1}{3!}(15.749609)(x - 0.5)^3 + \ldots + \frac{1}{9!}(17068346.560783)(x - 0.5)^9.$$
Figure 3: The approximation of the function g, g = f^{-1}, $f(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp(-\frac{y^2}{2})\, dy$.
The graph of the obtained function is plotted in Figure 3. One clearly sees that our random number generator would not produce numbers smaller than approximately −2 or bigger than approximately 2. Since it should in principle produce numbers from −∞ to +∞, one cannot be really satisfied with the result. The difference between our generator and a standard one can be well observed in Figure 4, where high-statistics distributions from both generators are compared.
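Using the coefficients of Table 3, the 9-term polynomial for g and the resulting generator can be sketched as follows (the function names are mine; the numerical coefficients are taken from the table):

```python
import random

# Taylor coefficients g1, g3, g5, g7, g9 of g = f^{-1} around
# y0 = 1/2 (Table 3); even-order ones vanish by symmetry
G = {1: 2.506628, 3: 15.749609, 5: 692.704024,
     7: 78964.749174, 9: 17068346.560783}
FACT = {1: 1, 3: 6, 5: 120, 7: 5040, 9: 362880}

def g(u):
    # 9-term Taylor polynomial of the inverse distribution function
    t = u - 0.5
    return sum(G[n] / FACT[n] * t ** n for n in G)

def normal_sample():
    # transform a uniform number into an approximately normal one
    return g(random.random())

print(g(0.691462))   # f(0.5) ~ 0.691462, so this is ~ 0.5
```

Since f(0.5) ≈ 0.691462, the polynomial maps this value back close to 0.5, as one can verify directly.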
A simple way of healing our generator would be to include higher order terms in the Taylor polynomial approximating the function g. The calculation of such terms however relies on the calculation of the formulas for gn, and the calculation of these formulas starts to be complicated (i.e. time consuming) with increasing n (n > 10), even when computers are used. Nevertheless, this could be done once and then used forever. One should also check whether the Taylor series for g converges to g.
Although the presented example does not approximate the normal-distribution random number generator with great accuracy, the ideas presented in this paragraph provide a very general and powerful method of producing random numbers with a desired distribution from uniformly distributed random numbers.
Figure 4: Plots produced with high statistics of random numbers. Dashed line - standard Gaussian; solid line - our approximation of the Gaussian.
f at some point x1 only from the value of f and its derivatives at the point x0, even if x1 is beyond the radius of convergence of the Taylor series! One can demonstrate it very well on the example from section 2.3.3 - and we will do so.
Let f(x) = ln(x) and x0 = 1. There is a lot of knowledge about the function ln(x), but let us in this section allow only the knowledge of its value and of the values of its derivatives at x0 = 1. Using only this knowledge, let us try to predict the value of f(x) at the point x1 = 3. Having the set of numbers f0, f1, f2, . . . one would naturally think of constructing the Taylor series. This idea would however fail. It is a well known fact that the radius of convergence of the Taylor series of the function ln(x) at x0 = 1 is 1. The point x1 = 3 is simply too far from 1; the series diverges at this point. There is however another possibility how to calculate f(3). Let us first calculate the derivatives of the inverse function g = f^{-1} = exp(y) at y0 = f(x0 = 1) = 0. The result is obvious: 1 = g0 = g1 = g2 = . . . (see 2.3.3). Then one just needs to expand g = exp(y) into the Taylor series. It is a well known fact that this series converges for each y ∈ R! Thus we can draw the graph of the function g, and it is just a technicality to find the value y1 such that g(y1) = x1 = 3. The number y1 is the number we are looking for. Let us briefly summarize in a general way:
• We have the knowledge of the numbers f0, f1, f2, . . . - the value and the derivatives of the function f at the point x0.
• We calculate g1, g2, . . . - the derivatives of its inverse function g at the point y0 = f(x0).
• We expand the function g into the Taylor series. If it converges on an appropriate interval, we can calculate the value of f at some point x1 - we may use for that purpose some iterative procedure.
The iterative procedure in our example with f(x) = ln(x) and g(y) = exp(y) could look like this:
1. y := 0, step := 1, precision := 0.001
3. if [ g(y) < 3 and step < 0 ] then (step := −0.5 ∗ step and y := y + step)
5. if [ g(y) > 3 and step > 0 ] then (step := −0.5 ∗ step and y := y + step)
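A possible implementation of the step-halving search above (a sketch under the assumption that the steps not shown simply advance y by step; all names are mine) computes y1 = ln(3) from the Taylor series of exp:

```python
import math

def g(y, terms=30):
    # Taylor series of g(y) = exp(y) around y0 = 0 (all gn equal 1)
    return sum(y ** n / math.factorial(n) for n in range(terms))

def find_y(target=3.0, precision=0.001):
    # step-halving search in the spirit of steps 1, 3 and 5 above
    y, step = 0.0, 1.0
    while abs(step) >= precision:
        if (g(y) < target) != (step > 0):   # overshoot: turn around
            step = -0.5 * step
        y += step
    return y

print(find_y())   # ~ ln(3) = 1.0986...
```

Each reversal halves the step, so the search brackets the solution g(y1) = 3 ever more tightly until the step drops below the requested precision.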
6 Conclusion
In this article we presented the formula for the calculation of higher order derivatives of the inverse function. We also tried to demonstrate on examples that the knowledge of the derivatives of the inverse function leads to interesting mathematical applications. There are still many open questions. What is the relation between the radii of convergence of the Taylor series of the initial and of the inverse function? Is it possible to generalize the presented ideas to more complex mathematical objects, for example operators? Since many practical problems lead to functional equations, such a generalization could be of big interest.