An overview of Roots

The Roots package contains simple routines for finding zeros of continuous scalar functions of a single real variable. A zero of $f$ is a value $c$ where $f(c) = 0$. The basic interface is through the function find_zero, which through multiple dispatch can handle many different cases.

The NonlinearSolve package provides an alternative.

In the following, we will use ForwardDiff to take derivatives.

julia> using Roots, ForwardDiff

Basic usage

Consider the polynomial function $f(x) = x^5 - x + 1/2$. As a polynomial, its roots, or zeros, could be identified with the roots function of the Polynomials package. However, even that function uses a numeric method to identify the values, as no solution with radicals is available. That is, even for polynomials, non-linear root finders are needed to solve $f(x)=0$. (Though polynomial root-finders can exploit certain properties not available for general non-linear functions.)

The Roots package provides a variety of algorithms for this task. In this quick overview, only the default ones are illustrated.

For the function $f(x) = x^5 - x + 1/2$ a simple plot over $[-2,2]$ will show a zero somewhere between $-1.5$ and $-0.5$ and two zeros near $0.6$. ("Between", as the continuous function has different signs at $-1.5$ and $-0.5$.)
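Such a plot might be produced as follows (a minimal sketch; the Plots package, used again in later examples, is assumed to be installed):

using Plots
f(x) = x^5 - x + 1/2
plot(f, -2, 2)        # the function over the interval of interest
plot!(zero, -2, 2)    # the x axis, making the sign changes visible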

For the zero between two values at which the function changes sign, a bracketing method is useful, as bracketing methods are guaranteed to converge for continuous functions by the intermediate value theorem. A bracketing algorithm will be used when the initial data is passed as a tuple:

julia> using Roots

julia> f(x) =  x^5 - x + 1/2
f (generic function with 1 method)

julia> find_zero(f, (-3/2,  -1/2)) ≈ -1.0983313019186336
true

The default algorithm is guaranteed to have an answer nearly as accurate as is possible given the limitations of floating point computations.

For the zeros near a point, a non-bracketing method is often used, as generally the algorithms are more efficient and can be used in cases where a zero does not cross the $x$ axis. Passing just an initial guess will dispatch to such a method:

julia> find_zero(f,  0.6) ≈ 0.550606579334135
true

This finds the answer to the left of the starting point. To get the other nearby zero, a starting point closer to the answer can be used.
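For example (a sketch; the particular starting value 0.8 is an illustrative choice near the rightmost zero):

find_zero(f, 0.8)   # expected to converge to the zero near 0.77 rather than the one near 0.55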

However, an initial graph might convince one that any of the up-to-$5$ real roots will occur between $-2$ and $2$. The find_zeros function uses heuristics and a few of the algorithms to identify all zeros within a specified interval. Here the method successfully identifies all $3$:

julia> find_zeros(f, -2,  2)
3-element Vector{Float64}:
 -1.0983313019186334
  0.5506065793341349
  0.7690997031778959

This shows the two main entry points of Roots: find_zero to locate a zero between or near values using one of many methods and find_zeros to heuristically identify all zeros within some interval.

Bracketing methods

For a function $f$ (univariate, real-valued) a bracket is a pair $a < b$ for which $f(a) \cdot f(b) < 0$. That is, the function values have different signs at $a$ and $b$. If $f$ is a continuous function this ensures (Bolzano) there will be a zero in the interval $[a,b]$. If $f$ is not continuous, then there must be a point $c$ in $[a,b]$ where the function "jumps" over $0$.

Such values can be found, up to floating point round off. That is, given f(a) * f(b) < 0, a value c with a < c < b can be found where either f(c) == 0.0 or f(prevfloat(c)) * f(c) < 0 or f(c) * f(nextfloat(c)) < 0.

To illustrate, consider the function $f(x) = \cos(x) - x$. From trigonometry we can see readily that $[0,\pi/2]$ is a bracket.

The Roots package includes the bisection algorithm through find_zero. We use a structure for which extrema returns (a,b) with a < b, such as a vector or tuple, to specify the initial condition and Bisection() to specify the algorithm:

julia> f(x) = cos(x) - x;

julia> x = find_zero(f, (0, pi/2), Bisection())
0.7390851332151607

julia> x, f(x)
(0.7390851332151607, 0.0)

For this function we see that f(x) is 0.0.

Functions may be parameterized. The following is a similar function as above, still having $(0, \pi/2)$ as a bracket when $p>0$. By passing in values of p to find_zero, different, related problems may be solved.

julia> g(x, p=1) = cos(x) - x/p;

julia> x0, M = (0, pi/2), Bisection()
((0, 1.5707963267948966), Bisection())

julia> find_zero(g, x0, M) # as before, solve cos(x) - x = 0 using default p=1
0.7390851332151607

julia> find_zero(g, x0, M; p=2) # solves cos(x) - x/2 = 0
1.0298665293222589

julia> find_zero(g, x0, M, 2) # positional argument; useful with broadcasting
1.0298665293222589

Next consider $f(x) = \sin(x)$. A known zero is $\pi$. Trigonometry tells us that $[\pi/2, 3\pi/2]$ will be a bracket. The calling pattern for find_zero is find_zero(f, x0, M; kwargs...), where kwargs can specify details about parameters for the problem or tolerances for the solver. In this call Bisection() is not specified, as it is the default when the initial value is given as a bracketing interval of Float64 values:

julia> f(x) = sin(x);

julia> x = find_zero(f, (pi/2, 3pi/2))
3.141592653589793

julia> x, f(x)
(3.141592653589793, 1.2246467991473532e-16)

This value of x does not exactly produce a zero; however, it is as close as can be:

julia> f(prevfloat(x)) * f(x) < 0.0 || f(x) * f(nextfloat(x)) < 0.0
true

That is, at x the function is changing sign.

From a mathematical perspective, a zero is guaranteed for a continuous function. However, the computer algorithm doesn't assume continuity; it just looks for changes of sign. As such, the algorithm will identify discontinuities, not just zeros. For example:

julia> find_zero(x -> 1/x, (-1, 1))
0.0

The endpoints and function values can even be infinite for the default Bisection algorithm over the standard floating point types:

julia> find_zero(x -> Inf*sign(x), (-Inf, Inf))  # Float64 only
0.0

The basic algorithm used for bracketing when the values are simple floating point values is a modification of the bisection method, where the midpoint is taken over the bit representation of a and b.

For BigFloat values, bisection is the default (with non-zero tolerances), but its use is not suggested, as simple bisection over BigFloat values can take many more iterations. For the problem of finding a zero of sin in the interval (big(3), big(4)), the default bisection takes $252$ iterations, whereas the A42 method takes $4$.
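A sketch of this comparison over the bracket (big(3), big(4)) (passing verbose=true, described later, shows the steps each method takes):

find_zero(sin, (big(3), big(4)), Bisection())   # simple bisection: many iterations over BigFloat
find_zero(sin, (big(3), big(4)), A42())         # A42: just a few iterations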

The algorithms of Alefeld, Potra, and Shi and the well-known algorithm of Brent also start from a bracketing interval. For many problems these will take far fewer steps than the bisection algorithm to reach convergence. They may be called directly. For example,

julia> find_zero(sin, (3,4), A42())
3.141592653589793

By default, bisection will converge to machine tolerance. This may provide more accuracy than desired. A tolerance may be specified to terminate early, thereby utilizing fewer resources. For example, the following takes $4$ steps to reach accuracy to within $1/16$ (without specifying xatol it uses $53$ steps):

julia> rt = find_zero(sin, (3.0, 4.0), xatol=1/16)
3.125

julia> rt - pi
-0.016592653589793116

Non-bracketing methods

Bracketing methods have guaranteed convergence, but in general may require many more function calls than are otherwise needed to produce an answer and not all zeros of a function may be bracketed. If a good initial guess is known, then the find_zero function provides an interface to some different iterative algorithms that are more efficient. Unlike bracketing methods, these algorithms may not converge to the desired root if the initial guess is not well chosen.

The default algorithm is modeled after an algorithm used for HP-34 calculators. This algorithm is designed to be more forgiving of the quality of the initial guess at the cost of possibly performing more steps than other algorithms, as if the algorithm encounters a bracket, a bracketing method will be used (an efficient one, though).

For example, the answer to the earlier problem of solving $\cos(x) = x$ is visibly seen from a graph to be near 1. Given this, the zero is found through:

julia> f(x) = cos(x) - x;

julia> x = find_zero(f , 1)
0.7390851332151607

julia> x, f(x)
(0.7390851332151607, 0.0)

For the polynomial $f(x) = x^3 - 2x - 5$, an initial guess of $2$ seems reasonable:

julia> f(x) = x^3 - 2x - 5;

julia> x = find_zero(f, 2)
2.0945514815423265

julia> f(x), sign(f(prevfloat(x)) * f(x)), sign(f(x) * f(nextfloat(x)))
(-8.881784197001252e-16, 1.0, -1.0)

For even more precision, BigFloat numbers can be used

julia> x = find_zero(sin, big(3))
3.141592653589793238462643383279502884197169399375105820974944592307816406286198

julia> x, sin(x), x - pi
(3.141592653589793238462643383279502884197169399375105820974944592307816406286198, 1.096917440979352076742130626395698021050758236508687951179005716992142688513354e-77, 0.0)

Higher-order methods

The default call to find_zero uses a first-order method and then possibly bracketing, which potentially involves more function calls than necessary. When a more efficient algorithm is sought, a higher-order method might be better suited. There are algorithms Order1 (secant method), Order2 (Steffensen), Order5, Order8, and Order16. The order $1$ or $2$ methods are generally quite efficient in terms of the steps needed over floating point values. The even-higher-order ones are potentially useful when more precision is used. These algorithms are accessed by specifying the method after the initial starting point:

julia> f(x) = 2x - exp(-x);

julia> x = find_zero(f, 1, Order1())
0.3517337112491958

julia> x, f(x)
(0.3517337112491958, -1.1102230246251565e-16)

Similarly,

julia> f(x) = (x + 3) * (x - 1)^2;

julia> x = find_zero(f, -2, Order2())
-3.0

julia> x, f(x)
(-3.0, 0.0)

julia> x = find_zero(f, 2, Order8())
1.0000000131073141

julia> x, f(x)
(1.0000000131073141, 6.87206736323862e-16)

Starting at $2$ the algorithm converges towards $1$, showing that zeros need not be simple zeros to be found. A simple zero, $c,$ has $f(x) = (x-c) \cdot g(x)$ where $g(c) \neq 0$. Generally speaking, non-simple zeros are expected to take many more function calls, as the methods are no longer super-linear. This is the case here, where Order2 uses $51$ function calls, Order8 uses $42$, and Order0 takes $80$. The Roots.Order2B method is useful when a multiplicity is expected; on this problem it takes $17$ function calls.
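To illustrate with the Roots.Order2B method just mentioned (a sketch; the function and starting point are as above):

f(x) = (x + 3) * (x - 1)^2
find_zero(f, 2, Roots.Order2B())   # designed for non-simple zeros; converges to the double zero at 1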

To investigate an algorithm and its convergence, the argument verbose=true may be specified. A Roots.Tracks object can be used to store the intermediate values.

For some functions, adjusting the default tolerances may be necessary to achieve convergence. The tolerances include atol and rtol, which are used to check if $f(x_n) \approx 0$; xatol and xrtol, to check if $x_n \approx x_{n-1}$; and maxiters to limit the number of iterations in the algorithm.
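As a sketch, these are passed as keyword arguments to find_zero; the particular values below are only illustrative:

f(x) = cos(x) - x
find_zero(f, 1.0, Order1(); xatol=1e-4, maxiters=20)   # stop once successive x values agree to about 1e-4 (and f is small)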

Classical methods

The package provides some classical methods for root finding, such as Roots.Newton, Roots.Halley, and Roots.Schroder. (Currently these are not exported, so must be prefixed with the package name to be used.) We can see how each works on a problem studied by Newton. Newton's method uses the function and its derivative:

julia> f(x) = x^3 - 2x - 5;

julia> fp(x) = 3x^2 - 2;

julia> x = Roots.find_zero((f, fp), 2, Roots.Newton())
2.0945514815423265

julia> x, f(x)
(2.0945514815423265, -8.881784197001252e-16)

The functions are specified using a tuple, or through a single function returning (f(x), f(x)/f'(x)). The latter is convenient when f' is cheap to evaluate alongside f but would be expensive to compute on its own.
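For example, a sketch of the combined form (the helper name fΔf is just illustrative):

fΔf(x) = (x^3 - 2x - 5, (x^3 - 2x - 5) / (3x^2 - 2))   # returns (f(x), f(x)/f'(x))
find_zero(fΔf, 2, Roots.Newton())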

Halley's method has cubic convergence, as compared to Newton's quadratic convergence. It uses the second derivative as well:

julia> fpp(x) = 6x;

julia> x = Roots.find_zero((f, fp, fpp), 2, Roots.Halley())
2.0945514815423265

julia> x, f(x), sign(f(prevfloat(x)) * f(nextfloat(x)))
(2.0945514815423265, -8.881784197001252e-16, -1.0)

(Halley's method takes 3 steps, Newton's 4, but Newton's uses 5 function calls to Halley's 10.)

For many functions, their derivatives can be computed automatically. The ForwardDiff package provides a means. Here we define an operator D to compute a derivative:

julia> function D(f, n::Int=1)
           n <= 0 && return f
           n == 1 && return x -> ForwardDiff.derivative(f,float(x))
           D(D(f,1),n-1)
       end
D (generic function with 2 methods)

julia> dfᵏs(f,k) = ntuple(i->D(f,i-1), Val(k+1)) # (f, f′, f′′, …)
dfᵏs (generic function with 1 method)

julia> find_zero((f,D(f)), 2, Roots.Newton())
2.0945514815423265

Or, for Halley's method:

julia> find_zero((f, D(f), D(f,2)), 2, Roots.Halley())
2.0945514815423265

The family of solvers implemented in Roots.LithBoonkkampIJzerman(S,D) where S is the number of prior points used to generate the next, and D is the number of derivatives used, has both the secant method (S=2, D=0) and Newton's method (S=1, D=1) as members, but also provides others. By adding more memory or adding more derivatives the convergence rate increases, at the expense of more complicated expressions or more function calls per step.

julia> find_zero(dfᵏs(f, 0), 2, Roots.LithBoonkkampIJzerman(3,0)) # like secant
2.0945514815423265

julia> find_zero(dfᵏs(f, 1), 2, Roots.LithBoonkkampIJzerman(2,1)) # like Newton
2.0945514815423265

julia> find_zero(dfᵏs(f, 2), 2, Roots.LithBoonkkampIJzerman(2,2)) # like Halley
2.0945514815423265

The problem-algorithm-solve interface

The problem-algorithm-solve interface is a pattern popularized in Julia by the DifferentialEquations.jl suite of packages. The pattern consists of setting up a problem then solving the problem by specifying an algorithm. This is very similar to what is specified in the find_zero(f, x0, M) interface where f and x0 specify the problem, M the algorithm, and find_zero calls the solver.

To break this up into steps, Roots has the type ZeroProblem and methods for init, solve, and solve! from the CommonSolve.jl package.

Consider solving $\sin(x) = 0$ using the Secant method starting with the interval $[3,4]$.

julia> f(x) = sin(x);

julia> x0 = (3, 4)
(3, 4)

julia> M = Secant()
Secant()

julia> Z = ZeroProblem(f, x0)
ZeroProblem{typeof(f), Tuple{Int64, Int64}}(f, (3, 4))

julia> solve(Z, M)
3.141592653589793

Changing the method is easy:

julia> solve(Z, Order2())
3.1415926535897944

The solve interface works with parameterized functions, as well:

julia> g(x, p=1) = cos(x) - x/p;

julia> Z = ZeroProblem(g, (0.0, pi/2))
ZeroProblem{typeof(g), Tuple{Float64, Float64}}(g, (0.0, 1.5707963267948966))

julia> solve(Z, Secant(), 2) # uses p=2 as a positional argument
1.0298665293222589

julia> solve(Z, Bisection(); p=3, xatol=1/16) # p=3; the parameter and tolerances passed as keywords
1.1959535058744393

Positional arguments are useful for broadcasting over several parameter values.
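A sketch of such a broadcast, reusing Z from above and wrapping the non-collection arguments in Ref so they broadcast as scalars (depending on the version, the Ref wrappers may not be needed):

ps = 1:3
solve.(Ref(Z), Ref(Secant()), ps)   # solves cos(x) - x/p = 0 for p = 1, 2, 3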

Examples

Intersections

A Discourse post involved finding the roots of $\tan(x) = x/(B(\Lambda x^2 - 1))$. As the right-hand side decays, we can see that for each positive arm of the periodic tangent function, there will be one intersection point in $(k\pi, (k+1/2)\pi)$ for each $k=0,1,\dots$. The standard way to find when $f(x) = g(x)$ with this package is to define an auxiliary function $h(x) = f(x) - g(x)$, as below:

julia> k, B, Λ = 3, 1, 1;

julia> f(x) = tan(x); g(x) = x/(B*(Λ*x^2 - 1));

julia> h(x) = f(x) - g(x)
h (generic function with 1 method)

julia> x = find_zero(h, (k*pi, (k + 1/2)*pi)); x, h(x)
(9.530477156207574, 8.326672684688674e-16)

As of version 1.9 of Julia, an extension is provided so that when SymPy is loaded, an equation can be used to specify the left and right hand sides, as with:

using SymPy
@syms x
find_zero(tan(x) ~ x/(B*(Λ*x^2 - 1)), (k*pi, (k + 1/2)*pi))

Inverse functions

The find_zero function can be used to identify inverse functions. Suppose $f$ is a monotonic function on $[a,b]$. Then an inverse function solves $y = f(x)$ for $x$ given a $y$. This will do that task and return values in a function form:

function inverse_function(f, a, b, args...; kwargs...)
    fa, fb = f(a), f(b)
    m, M = fa < fb ? (fa, fb) : (fb, fa)
    y -> begin
        @assert m ≤ y ≤ M
        find_zero(x -> f(x) - y, (a,b), args...; kwargs...)
    end
end
inverse_function (generic function with 1 method)

The fact that $f$ is monotonic ensures that $[a,b]$ is a bracket whenever $y$ is between $f(a)$ and $f(b)$, so no initial guess is needed.

Here we numerically find the inverse function of $f(x) = x - \sin(x)$:

using Plots, Roots;
f(x) = x - sin(x)
a, b = 0, 5pi
plot(inverse_function(f, a, b), f(a), f(b))

Finding critical points

The D function, defined above, makes it straightforward to find critical points (typically where the derivative is $0$ but also where it is undefined). For example, the critical point of the function $f(x) = 1/x^2 + x^3, x > 0$ near $1.0$ is where the derivative is $0$ and can be found through:

julia> f(x) = 1/x^2 + x^3;

julia> find_zero(D(f), 1)
0.9221079114817278

For more complicated expressions, D may need some technical adjustments to be employed. In this example, we have a function that models the flight of an arrow on a windy day:

julia> function flight(x, theta)
         k = 1/2
         a = 200*cosd(theta)
         b = 32/k
         tand(theta)*x + (b/a)*x - b*log(a/(a-x))
       end
flight (generic function with 1 method)

The total distance flown is the value x > 0 where flight(x, theta) == 0.0. This can be solved for different values of theta with find_zero. Noting that log(a/(a-x)) has an asymptote at x = a, we start the search at a-5:

julia> function howfar(theta)
         a = 200*cosd(theta)
         find_zero(x -> flight(x, theta), a-5)  # starting point has type determined by `theta`.
        end
howfar (generic function with 1 method)

To visualize the trajectory if shot at $45$ degrees, we would have:

using Plots;
flight(x, theta) = (k = 1/2; a = 200*cosd(theta); b = 32/k; tand(theta)*x + (b/a)*x - b*log(a/(a-x)))
howfar(theta) = (a = 200*cosd(theta); find_zero(x -> flight(x, theta), a-5))
howfarp(t) = ForwardDiff.derivative(howfar,t)

theta = 45
tstar = find_zero(howfarp, 45)

plot(x -> flight(x,  theta), 0, howfar(theta))

To maximize the range we solve for the lone critical point of howfar within reasonable starting points.

As of version v"1.9" of Julia, the automatic differentiation provided by ForwardDiff will bypass working through a call to find_zero. Prior to this version, automatic differentiation will work if the initial point has the proper type (depending on an expression of theta in this case). As we use 200*cosd(theta)-5 for a starting point, this is satisfied.

julia> (tstar = find_zero(D(howfar), 45)) ≈ 26.2623089
true

This graph would show the differences in the trajectories:

plot(x -> flight(x, 45), 0, howfar(45))
plot!(x -> flight(x, tstar), 0, howfar(tstar))

Sensitivity

In the last example, the question of how the distance varies with the angle is clearly important.

In general, for functions with parameters, $f(x,p)$, derivatives with respect to the $p$ variable(s) may be of interest.

A first attempt, as shown above, may be to try and auto-differentiate the output of find_zero. For example:

f(x, p) = x^2 - p # p a scalar
p = 2
2
F(p) = find_zero(f, one(p), Order1(), p)
ForwardDiff.derivative(F, p)
Dual{ForwardDiff.Tag{typeof(Main.F), Int64}}(0.3535533905932738,-0.0)

Prior to version v"1.9" of Julia, there were issues with this approach, though in this case it finds the correct answer, as will be seen: a) it is not as performant as what we will discuss next, b) the subtle use of one(p) for the starting point is needed to ensure the type for the $x$ values is correct, and c) not all algorithms will work, in particular Bisection is not amenable to this approach.

F(p) = find_zero(f, (zero(p), one(p)), Roots.Bisection(), p)
ForwardDiff.derivative(F, 1/2)
Dual{ForwardDiff.Tag{typeof(Main.F), Float64}}(0.7071067811865475,-0.0)

The derivative would (incorrectly) be 0.0 if the differentiation were propagated through the bisection algorithm. With v"1.9" of Julia or later, the derivative is calculated correctly through the method described below.

Using the implicit function theorem and following these notes, this paper on the adjoint method, or the methods more generally applied in the ImplicitDifferentiation package we can auto-differentiate without pushing that machinery through find_zero.

The solution, $x^*(p)$, provided by find_zero depends on the parameter(s), $p$. Notationally,

\[f(x^*(p), p) = 0\]

The implicit function theorem has conditions guaranteeing the existence and differentiability of $x^*(p)$. Assuming these hold, taking the gradient (derivative) in $p$ of both sides, gives by the chain rule:

\[\frac{\partial}{\partial x}f(x^*(p),p) \frac{\partial}{\partial p}x^*(p) + \frac{\partial}{\partial p}f(x^*(p),p) I = 0.\]

Since the partial in $x$ is a scalar quantity, we can divide to solve:

\[\frac{\partial}{\partial p}x^*(p) = -\frac{ \frac{\partial}{\partial p}f(x^*(p),p) }{ \frac{\partial}{\partial x}f(x^*(p),p) }\]

For example, using ForwardDiff, we have:

xᵅ = find_zero(f, 1, Order1(), p)

fₓ = ForwardDiff.derivative(x -> f(x, p), xᵅ)
fₚ = ForwardDiff.derivative(p -> f(xᵅ, p), p)

- fₚ / fₓ
0.3535533905932738

This problem can be solved analytically, of course, to see $x^\alpha(p) = \sqrt{p}$, so we can easily compare:

ForwardDiff.derivative(sqrt, 2)
0.35355339059327373

The use with a vector of parameters is similar, only derivative is replaced by gradient for the p variable:

f(x, p) = x^2 - p[1]*x + p[2]
p = [3.0, 1.0]
x₀ = 1.0

xᵅ = find_zero(f, x₀, Order1(), p)

fₓ = ForwardDiff.derivative(x -> f(x, p), xᵅ)
fₚ = ForwardDiff.gradient(p -> f(xᵅ, p), p)

- fₚ / fₓ
2-element Vector{Float64}:
 -0.1708203932499369
  0.4472135954999579

The package provides a package extension to use ForwardDiff directly to find derivatives or gradients, as above, with version v"1.9" or later of Julia, and a ChainRulesCore.rrule and ChainRulesCore.frule implementation that should allow automatic differentiation packages relying on ChainRulesCore (e.g., Zygote) to differentiate in p using the same approach. (Thanks to @devmotion for much help here.)

Potential issues

The higher-order methods are basically various derivative-free versions of Newton's method (which has update step $x - f(x)/f'(x)$). For example, Steffensen's method (Order2()) essentially replaces $f'(x)$ with $(f(x + f(x)) - f(x))/f(x)$. This is a forward-difference approximation to the derivative with "$h$" being $f(x)$, which presumably is close to $0$ already. The methods with higher order combine this with different secant line approaches that minimize the number of function calls. These higher-order methods can be susceptible to some of the usual issues found with Newton's method: poor initial guess, small first derivative, or large second derivative near the zero.
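As a sketch, Steffensen's substitution can be written out directly (this helper is purely illustrative, not part of the package):

# one Steffensen-style step: a forward difference with h = f(x) standing in for f'(x)
steffensen_step(f, x) = x - f(x) / ((f(x + f(x)) - f(x)) / f(x))

steffensen_step(x -> cos(x) - x, 0.7)   # one step moves from 0.7 toward the zero near 0.739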

When the first derivative is near $0$, the value of the next step can be quite different, as the next step generally tracks the intersection point of the tangent line with the $x$ axis. We see that starting at $\pi/2$ causes this search to be problematic:

julia> try  find_zero(sin, pi/2, Order1()) catch err  "Convergence failed" end
"Convergence failed"

(Whereas, starting at pi/2 + 0.3, where the slope of the tangent points sufficiently close towards $\pi$, the search will converge to $\pi$.)

For a classic example where a large second derivative is the issue, we have $f(x) = x^{1/3}$:

julia> f(x) = cbrt(x);

julia> x = try  find_zero(f, 1, Order2())  catch err  "Convergence failed" end	# all of 2, 5, 8, and 16 fail or diverge towards infinity
"Convergence failed"

However, the default finds the root here, as a bracket is identified:

julia> x = find_zero(f, 1)
0.0

julia> x,  f(x)
(0.0, 0.0)

Finally, for many functions, all of these methods need a good initial guess. For example, the polynomial function $f(x) = x^5 - x - 1$ has its one real zero near $1.16$.

If we start far from the zero, convergence may happen, but it isn't guaranteed:

julia> f(x) = x^5 - x - 1;

julia> x0 = 0.1
0.1

julia> try find_zero(f, x0)  catch err  "Convergence failed" end
"Convergence failed"

A graph shows the issue. Running the following shows the first $15$ iterates of Newton's method, the other algorithms being somewhat similar:

xs = [0.1] # x0
n = 15
for i in 1:(n-1) push!(xs, xs[end] - f(xs[end])/D(f)(xs[end])) end  # Newton iterates
ys = [zeros(Float64,n)';map(f, xs)'][1:2n]                          # interleave 0 and f(xᵢ)
xs = xs[repeat(collect(1:n), inner=[2], outer=[1])]                 # repeat each xᵢ to pair with ys
plot(f, -1.25, 1.5, linewidth=3, legend=false)
plot!(zero, -1.25, 1.5, linewidth=3)
plot!(xs, ys)

Graphically only a few of the steps are discernible, as the function's relative maximum causes a trap for this algorithm. Starting to the right of the relative minimum–nearer the zero–would avoid this trap. The default method employs a trick to bounce out of such traps, though it doesn't always work.

Tolerances

Mathematically solving for a zero of a nonlinear function may be impossible, so numeric methods are utilized. However, using floating point numbers to approximate the real numbers leads to some nuances.

For example, consider the polynomial $f(x) = (3x-1)^5$ with one zero at $1/3$ and its equivalent expression $f1(x) = -1 + 15\cdot x - 90\cdot x^2 + 270\cdot x^3 - 405\cdot x^4 + 243\cdot x^5$. Mathematically these are the same, however not so when evaluated in floating point. Here we look at the 21 floating point numbers near $1/3$:

julia> f(x) = (3x-1)^5;

julia> f1(x) =  -1 + 15*x - 90*x^2 + 270*x^3 - 405*x^4 + 243*x^5;

julia> above = accumulate((x,y) -> nextfloat(x), 1:10, init=1/3);

julia> below = accumulate((x,y) -> prevfloat(x), 1:10, init=1/3);

julia> ns = sort([below...,1/3, above...]); # floating point numbers around 1/3

julia> maximum(abs.(f.(ns) - f1.(ns))) < 1e-14
true

The base-10 exponents of the differences are:

julia> zs = f.(ns) - f1.(ns);

julia> zs .|> abs .|> log10 .|> x -> floor(Int, x)
21-element Vector{Int64}:
 -16
 -16
 -16
 -15
 -15
 -16
 -76
 -77
   ⋮
 -15
 -16
 -75

We see the function values are close for each point, as the maximum difference is on the order of $10^{-15}$. This is roughly as expected, as even one addition may introduce a relative error as big as $2\cdot 10^{-16}$, and here there are several such operations.

Note

(These values are subject to the vagaries of floating point evaluation, so may differ depending on the underlying computer architecture.)

Generally this variation goes unnoticed, as the differences are negligible, but when we want to identify whether a value is a zero, these small differences might matter. Here we look at the signs of the function values for one run of the above:

julia> fs = sign.(f.(ns));

julia> f1s = sign.(f1.(ns));

julia> [ns.-1/3 fs f1s]
21×3 Matrix{Float64}:
 -5.55112e-16  -1.0  -1.0
 -4.996e-16    -1.0  -1.0
 -4.44089e-16  -1.0   1.0
 -3.88578e-16  -1.0   1.0
 -3.33067e-16  -1.0   1.0
 -2.77556e-16  -1.0   1.0
 -2.22045e-16  -1.0  -1.0
 -1.66533e-16  -1.0  -1.0
 -1.11022e-16  -1.0   1.0
 -5.55112e-17  -1.0   1.0
  0.0           0.0  -1.0
  5.55112e-17   0.0   1.0
  1.11022e-16   1.0   1.0
  1.66533e-16   1.0  -1.0
  2.22045e-16   1.0  -1.0
  2.77556e-16   1.0  -1.0
  3.33067e-16   1.0   1.0
  3.88578e-16   1.0   1.0
  4.44089e-16   1.0   1.0
  4.996e-16     1.0  -1.0
  5.55112e-16   1.0   0.0

Parsing this shows a few surprises. First, there are two zeros of f(x) identified–not just one as expected mathematically–the floating point value of 1/3 and the next largest floating point number.

julia> findall(iszero, fs)
2-element Vector{Int64}:
 11
 12

For f1(x) there is only one zero, but it isn't the floating point value for 1/3 but rather $10$ floating point numbers away.

julia> findall(iszero, f1s)
1-element Vector{Int64}:
 21

Further, there are several sign changes of the function values for f1s:

julia> findall(!iszero, diff(sign.(f1s)))
9-element Vector{Int64}:
  2
  6
  8
 10
 11
 13
 16
 19
 20

There is no guarantee that a zero will be present, but for a mathematical function that changes sign, there will be at least one sign change.

With this in mind, an exact zero of f would be either where iszero(f(x)) is true or where the function has a sign change (either f(x)*f(prevfloat(x))<0 or f(x)*f(nextfloat(x)) < 0).
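A sketch of that check written as a predicate (the name is illustrative):

is_exact_zero(f, x) = iszero(f(x)) ||
    f(prevfloat(x)) * f(x) < 0 ||
    f(x) * f(nextfloat(x)) < 0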

As mentioned, the default Bisection() method of find_zero identifies such zeros for f provided an initial bracketing interval is specified when Float64 numbers are used. However, if a mathematical function does not cross the $x$ axis at a zero, then there is no guarantee the floating point values will satisfy either of these conditions.


Now consider the function f(x) = exp(x) - x^4. The value x=8.613169456441398 is a zero in this sense, as there is a change of sign:

julia> f(x) = exp(x) - x^4;

julia> F(x) = sign(f(x));

julia> x=8.613169456441398
8.613169456441398

julia> F(prevfloat(x)), F(x), F(nextfloat(x))
(-1.0, -1.0, 1.0)

However, the value of f(x) is not as small as one might initially expect for a zero:

julia> f(x), abs(f(x)/eps(x))
(-2.7284841053187847e-12, 1536.0)

The value x is an approximation to the actual mathematical zero, call it $\alpha$. There is a difference between $f(\alpha)$ (the mathematical answer, $0$) and f(x) (the floating point answer). Roughly speaking, we expect f(x) to be about $f(\alpha) + f'(\alpha)\cdot \delta$, where $\delta$ is the difference between x and $\alpha$. This difference will be on the scale of abs(x) * eps(), so all told we expect an answer to be in the range of $0$ plus or minus this value:

julia> fp(x) = exp(x) - 4x^3; # the derivative

julia> fp(x) * abs(x) * eps()
5.637565490466956e-12

which is about what we see.


Bisection can be a slower method than others. For Float64 values, Bisection() takes no more than 64 steps, but other methods may be able to converge to a zero in 4-5 steps (assuming good starting values are specified).

When fewer function calls are desirable, then checking for an approximate zero may be preferred over assessing if a sign change occurs, as generally that will take two additional function calls per step. Besides, a sign change isn't guaranteed for all zeros. An approximate zero would be one where $f(x) \approx 0$.

By the above, we see that we must consider an appropriate tolerance. The first example shows differences in floating point evaluations from the mathematical ones might introduce errors on the scale of eps regardless of the size of x. As seen in the second example, the difference between the floating point approximation to the zero and the zero introduces a potential error proportional to the size of x. So a tolerance might consider both types of errors. An absolute tolerance is used as well as a relative tolerance, so a check might look like:

abs(f(x)) < max(atol, abs(x) * rtol)

This is different from Julia's isapprox(f(x), 0.0), as that would use abs(f(x)) as the multiplier, which renders a relative tolerance useless for this question.

One issue with relative tolerances is that for functions with sublinear growth, extremely large values will be considered zeros. Returning to an earlier example, with Thukral8 we have a misidentified zero:

julia> find_zero(cbrt, 1, Roots.Thukral8())
1.725042287244107e23

The algorithm rapidly marches off towards infinity, and since the cube root grows sublinearly, the relative tolerance $\approx |x| \cdot \epsilon$ used to check if $f(x) \approx 0$ eventually exceeds the far-from-zero value of $f(x)$, so convergence is declared.

Either the users must be educated about this possibility, or the relative tolerance should be set to $0$. In that case, the absolute tolerance must be relatively generous. A conservative choice of absolute tolerance might be sqrt(eps()), or about 1e-8, essentially the one made in SciPy.

Though this tolerance won't work for really large values of x:

julia> find_zero(x -> sqrt(eps()) - eps(x), (0,Inf))
9.981132799999999e7

This is not the choice made in Roots. The fact that bisection can produce zeros as exact as possible, and the fact that the error in function evaluation, $f'(x)|x|\epsilon$, is not typically on the scale of 1e-8, leads to a desire for more precision, if available.

In Roots, the faster algorithms use a check on both the size of f(xn) and the size of the difference between the last two xn values. The check on f(xn) is done with a tight tolerance, as is the check on $x_n \approx x_{n-1}$. If the function values get close to zero, an approximate zero is declared. Further, if the $x$ values get close to each other and the function value is close to zero with a relaxed tolerance, then an approximate zero is declared. In practice this seems to work reasonably well. The relaxed tolerance uses the cube root of the absolute and relative tolerances.

Searching for all zeros in an interval

The methods described above are used to identify one of possibly several zeros. The find_zeros function searches the interval $(a,b)$ for all zeros of a function $f$. It is straightforward to use:

julia> f(x) = exp(x) - x^4;

julia> a, b = -10, 10
(-10, 10)

julia> zs = find_zeros(f, a, b)
3-element Vector{Float64}:
 -0.8155534188089606
  1.4296118247255556
  8.613169456441398

The search interval, $(a,b)$, is specified either through two arguments or through a single argument using a structure, such as a tuple or vector, where extrema returns two distinct values in increasing order. It is assumed that neither endpoint is a zero.
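For instance, a sketch of the single-argument forms, continuing with f, a, b from above:

find_zeros(f, (a, b))   # same zeros as find_zeros(f, a, b)
find_zeros(f, [a, b])   # a vector works as well, since extrema returns (a, b)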


The algorithm used to search for all zeros in an interval is confounded by a few things:

  • too many zeros in the interval $(a,b)$
  • nearby zeros ("nearby" depends on the size of $(a,b)$, should this be very wide)

The algorithm is adaptive, so that it can succeed when there are many zeros, but it may be necessary to increase no_pts from the default of 12, at the cost of possibly taking longer for the search.
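A sketch of adjusting this keyword (the value 21 is only illustrative):

find_zeros(f, a, b; no_pts=21)   # denser initial sampling of (a, b); slower, but less likely to miss zeros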

Here the algorithm identifies all the zeros, despite there being several:

julia> f(x) = cos(x)^2 + cos(x^2); a,b = 0, 10;

julia> rts = find_zeros(f, a, b);

julia> length(rts)
32

For nearby zeros, the algorithm does pretty well, though it isn't perfect.

Here we see for $f(x) = \sin(1/x)$–with infinitely many zeros around $0$–it finds many:

julia> f(x) = iszero(x) ? NaN : sin(1/x);  # avoid sin(Inf) error

julia> rts = find_zeros(f, -1, 1);

julia> length(rts) # 88 zeros identified
88

The function $f(x) = (x-0.5)^3 \cdot (x-0.499)^3$ looks too much like $g(x) = x^6$ for find_zeros to succeed, as the two zeros are very close together:

julia> f(x) =  (x-0.5)^3 * (x-0.499)^3;

julia> find_zeros(f, 0, 1)
1-element Vector{Float64}:
 0.5

The issue here isn't just that the algorithm can't identify zeros within $0.001$ of each other, but that the high power makes many nearby values approximately zero.

The algorithm succeeds when the powers are smaller:

julia> f(x) =  (x-0.5)^2 * (x-0.499)^2;

julia> find_zeros(f, 0, 1)
2-element Vector{Float64}:
 0.49899999999999994
 0.5

It can have success for closer pairs of zeros:

julia> f(x) = (x-0.5) * (x - 0.49999);

julia> find_zeros(f, 0, 1)
2-element Vector{Float64}:
 0.49999
 0.5

Combinations of large (even) multiplicity zeros or very nearby zeros, can lead to misidentification.

IntervalRootFinding

The IntervalRootFinding package rigorously identifies isolating intervals for the zeros of a function. This example, from that package's README, is used to illustrate the differences:

julia> using IntervalArithmetic, IntervalRootFinding, Roots

julia> f(x) = sin(x) - 0.1*x^2 + 1
f (generic function with 1 method)

julia> rts = roots(f, -10..10)
4-element Vector{Root{Interval{Float64}}}:
 Root([3.14959, 3.1496], :unique)
 Root([-4.42654, -4.42653], :unique)
 Root([-3.10682, -3.10681], :unique)
 Root([-1.08205, -1.08204], :unique)

julia> find_zeros(f, -10, 10)
4-element Vector{Float64}:
 -4.426534982071949
 -3.1068165552293254
 -1.0820421327607177
  3.1495967624505226

Using the fact that, in this case, the intervals are bracketing intervals for f, we can find the zeros from the roots output with the following:

julia> [find_zero(f, (interval(u).lo, interval(u).hi)) for u ∈ rts if u.status == :unique]
4-element Vector{Float64}:
  3.1495967624505226
 -4.426534982071949
 -3.1068165552293254
 -1.082042132760718

IntervalRootFinding extension

As of version 1.9 of Julia an extension is provided so that when the IntervalRootFinding package is loaded, the find_zeros function will call IntervalRootFinding.roots to find the isolating brackets and find_zero to find the roots, when possible, if the interval is specified as an Interval object, as created by -1..1, say.
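A sketch of this usage (with IntervalArithmetic and IntervalRootFinding loaded, as above):

find_zeros(f, -10..10)   # the Interval argument dispatches to the extension described above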

Adding a solver

To add a solver the minimum needed is a type to declare the solver and an update_state method. In this example we also define a state object, as the algorithm, as implemented here, stores more values than the default.

The Wikipedia page for Brent's method suggests a modern improvement, Chandrapatla's method, described here. That description is mostly followed below and in the package implementation Roots.Chandrapatla.

To implement Chandrapatla's algorithm we first define a type to indicate the method and a state object which records the values $x_n$, $x_{n-1}$, and $x_{n-2}$, needed for the inverse quadratic step.

julia> struct Chandrapatla <: Roots.AbstractBracketingMethod end

julia> struct ChandrapatlaState{T,S} <: Roots.AbstractUnivariateZeroState{T,S}
    xn1::T
    xn0::T
    c::T
    fxn1::S
    fxn0::S
    fc::S
end

An init_state method can be used by some methods to add more detail to the basic state object. Here it initializes the older value, c, to a as a means of ensuring an initial bisection step.

julia> function Roots.init_state(::Chandrapatla, F, x₀, x₁, fx₀, fx₁)
    a, b, fa, fb = x₁, x₀, fx₁, fx₀
    c, fc = a, fa
    ChandrapatlaState(b, a, c, fb, fa, fc)
end

The main algorithm is implemented in the update_state method. The @reset macro from Accessors.jl is used to modify a state object, which otherwise is immutable.

julia> import Roots.Accessors: @reset;

julia> function Roots.update_state(::Chandrapatla, F, o, options, l=NullTracks())

    b, a, c = o.xn1, o.xn0, o.c
    fb, fa, fc = o.fxn1, o.fxn0, o.fc

    # encoding: a = xₙ, b = xₙ₋₁, c = xₙ₋₂
    ξ = (a - b) / (c - b)
    ϕ = (fa - fb) / (fc - fb)
    ϕ² = ϕ^2
    Δ = (ϕ² < ξ) && (1 - 2ϕ + ϕ² < 1 - ξ) # Chandrapatla's inequality to determine next step

    xₜ = Δ ? Roots.inverse_quadratic_step(a, b, c, fa, fb, fc) : a + (b-a)/2

    fₜ = F(xₜ)
    incfn(l)

    if sign(fₜ) * sign(fa) < 0
        a, b, c = xₜ, a, b
        fa, fb, fc = fₜ, fa, fb
    else
        a, c = xₜ, a
        fa, fc = fₜ, fa
    end

    @reset o.xn0 = a
    @reset o.xn1 = b
    @reset o.c = c
    @reset o.fxn0 = fa
    @reset o.fxn1 = fb
    @reset o.fc = fc

    return (o, false)
end

This algorithm chooses between an inverse quadratic step and a bisection step depending on the relationship between the computed ξ and ϕ. The tolerances are taken from the defaults for AbstractBracketingMethod.

To see that the algorithm works, we have:

julia> find_zero(sin, (3,4), Chandrapatla())
3.1415926535897927

julia> find_zero(x -> exp(x) - x^4, (8,9), Chandrapatla())
8.613169456441398

julia> find_zero(x -> x^5 - x - 1, (1,2), Chandrapatla())
1.1673039782614185