The significant digits in a number are the first nonzero digit and all
succeeding digits. Thus 1.7320 has five significant digits. 0.0491 has
only three significant digits.
It is not transparent to me why this definition is sensible.
We can naively define that x^ agrees with x to p significant digits
if x^ and x round to the same number when rounded to p significant digits.
This definition is seriously problematic. Consider the numbers (where xi denotes x rounded to i significant digits):
x = 0.9949, x1 = 1.0, x2 = 0.99, x3 = 0.995
y = 0.9951, y1 = 1.0, y2 = 1.0, y3 = 0.995
Here, y agrees with x to one and to three significant digits, but not to two,
since the roundings x2 and y2 do not agree even in the first significant digit.
§ Correct significant digits --- the correct definition
We say that x^ agrees with x to p significant digits if ∣x − x^∣ is less than half a unit in the pth significant digit of x.
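As a concrete sketch of this definition (my own illustration; the function name agrees_to_p_sig and the test values are not from any library or source): a unit in the pth significant digit of x is 10^(floor(log10|x|) − p + 1), so the test can be written directly.

#include <cmath>
#include <stdio.h>

// Sketch: x^ agrees with x to p significant digits if |x - x^| is less than
// half a unit in the p-th significant digit of x. A unit in that digit is
// 10^(floor(log10|x|) - p + 1). (Edge cases such as x == 0 are ignored.)
bool agrees_to_p_sig(double x, double xhat, int p) {
    double unit = pow(10.0, floor(log10(fabs(x))) - p + 1);
    return fabs(x - xhat) < 0.5 * unit;
}

int main() {
    // x = 0.9949 vs x^ = 0.9951: under this definition they agree to
    // 1, 2 and 3 significant digits (but not 4), with no oddity at p = 2.
    for (int p = 1; p <= 4; ++p)
        printf("p = %d: %d\n", p, agrees_to_p_sig(0.9949, 0.9951, p));
}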
Accuracy: the absolute or relative error of a quantity.
Precision: the accuracy with which the basic arithmetic operations +, -, *, / are performed. For floating point, this is measured by the unit roundoff (which we have not met yet).
Accuracy is not limited by precision: using fixed-precision arithmetic,
we can emulate arbitrary-precision arithmetic. The problem is that this
emulation is often too expensive to be useful.
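As a glimpse of how such emulation works (my own example, not from the text above): Knuth's TwoSum recovers the exact rounding error of a double-precision addition using only double-precision operations. Chaining such error terms is the basis of double-double and similar extended-precision schemes, at the cost of several extra operations per addition.

#include <stdio.h>

// Knuth's TwoSum: given doubles a and b, compute s = fl(a + b) and the exact
// rounding error e, so that a + b == s + e holds exactly in real arithmetic.
void two_sum(double a, double b, double *s, double *e) {
    *s = a + b;
    double bb = *s - a;
    *e = (a - (*s - bb)) + (b - bb);
}

int main() {
    double s, e;
    two_sum(1.0, 1e-20, &s, &e);
    // s alone has lost the 1e-20; the pair (s, e) still represents it exactly.
    printf("s = %.17g, e = %.17g\n", s, e);
}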
Let y=f(x), where f:R→R. Let us compute y^ as an approximation to
y, in an arithmetic of precision u. How do we measure the quality of y^?
In many cases, we may be happy with a y^ such that the relative error between y and y^ is of order u: we did the best we could with the precision we were given. This is the forward error.
An alternative question we can ask is: for what δx do we have y^ = f(x+δx)? That is, how far from the input do we need to stray to get a matching output? There may be many such δx, so we ask for the minimum ∣δx∣. We can divide this by ∣x∣ as a normalization factor. This is the backward error.
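A small numerical illustration of the two notions (my own example, with f(x) = √x): the forward error compares the computed y^ to the true √x, while the backward error asks for the δx with √(x + δx) = y^, which here is exactly δx = y^² − x.

#include <cmath>
#include <stdio.h>

int main() {
    double x = 2.0;
    double yhat = sqrt(x);  // computed approximation to y = sqrt(2)

    // Forward error: distance from yhat to the true answer.
    // (long double stands in for the exact value; this assumes long double
    // is wider than double on this platform.)
    long double y = sqrtl((long double)x);
    long double fwd = fabsl((long double)yhat - y) / y;

    // Backward error: the perturbation dx with sqrt(x + dx) == yhat,
    // i.e. dx = yhat^2 - x, computed with a single rounding via fma.
    double dx  = fma(yhat, yhat, -x);
    double bwd = fabs(dx) / x;

    printf("relative forward error:  %Lg\n", fwd);
    printf("relative backward error: %g\n", bwd);
}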
There are two reasons we prefer backward error.
It converts error analysis into "data analysis". The data itself tends to be uncertain. If the error given by the backward analysis is smaller than the uncertainty already present in the data, then we can write off our error: for all we know, our small perturbation merely 'fixed' the uncertain data.
It reduces the question of error analysis into perturbation theory, which is very well understood for a large class of problems.
A method for computing y=f(x) is called backward stable if it produces a y^ with small backward error. That is, we need a
small δx such that y^=f(x+δx).
We assume that addition and subtraction are backward stable, where u is the unit roundoff, i.e. the precision to which our arithmetic operations are performed: the computed result fl(x ± y) satisfies
fl(x ± y) = x(1 + Δ) ± y(1 + Δ) for some Δ with ∣Δ∣ ≤ u.
Another type of error we can consider is of the form:
y^ + Δy = f(x + Δx)
That is, by allowing a small perturbation Δy of the output, we get a
backward error of Δx. This is called mixed forward-backward error.
We say that an algorithm with mixed forward-backward error is stable iff
y^ + Δy = f(x + Δx),  ∣Δy∣/∣y^∣ ≤ ε,  ∣Δx∣/∣x∣ ≤ η,  with ε, η small.
This definition of stability is useful when rounding errors are the dominant
form of errors.
The relationship between forward and backward error is governed by conditioning:
the sensitivity of the solution to perturbations of the data. Suppose we have an approximate
solution y^ = f(x + δx). Then
y^ − y = f(x + δx) − f(x) ≈ f′(x) δx,
and dividing through by y = f(x),
∣y^ − y∣/∣y∣ ≲ c(x) · ∣δx∣/∣x∣, where c(x) = ∣x f′(x)/f(x)∣.
The quantity c(x), the condition number, is the scaling factor that takes a relative
change in the input to the corresponding relative change in the output. Note that this is a property
of the function f, not of any particular algorithm.
If f(x) = log x, then c(x) = ∣x (log x)′ / log x∣ = ∣1/log x∣. This
quantity is very large for x ≃ 1, so a small relative change in x can
produce a drastic relative change in log x around 1.
Note that the absolute change is quite small: log(x + δx) ≃ log x + δx/x. However, relative to log x, this change of δx/x is quite large.
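A quick numerical check of this (my own numbers): perturb an x near 1 by a tiny relative amount and compare the relative change in log x against the condition number 1/∣log x∣.

#include <cmath>
#include <stdio.h>

int main() {
    double x  = 1.000001;
    double dx = 1e-9 * x;              // relative perturbation of 1e-9 in x
    double rel_in  = dx / x;
    double rel_out = (log(x + dx) - log(x)) / log(x);
    // c(x) = |1 / log x| is huge near x = 1, so the tiny relative change
    // in the input is amplified enormously in the output.
    printf("relative change in x:       %g\n", rel_in);
    printf("relative change in log x:   %g\n", rel_out);
    printf("condition number 1/|log x|: %g\n", 1.0 / fabs(log(x)));
}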
If a method produces answers with forward errors of similar magnitude to those
produced by a backward stable method, then it is called forward stable.
Backward stability implies forward stability, but not vice versa: by the conditioning relation above, a backward stable method has forward error of roughly at most c(x) times its small backward error, which is as small as the conditioning of the problem allows; conversely, a method can have a small forward error without its output being exactly f of any nearby input.
Consider computing (1 − cos x)/x² for a small x: the computed value can come out much larger than 1/2. This is clearly wrong, because we know that (1 − cos x)/x² ≤ 1/2.
The reason for this terrible result is that:
we knew cos x to high accuracy, since x was some fixed quantity;
1 − cos x converted the error in cos x into its value;
so 1 − cos x has only one significant figure left.
This makes it practically useless for anything else we are interested in doing.
In general:
x ≡ 1 + ϵ   (error of order ϵ)
y ≡ 1 − x = −ϵ   (value of order ϵ)
That is, subtracting values close to each other (in this case, 1 and x)
converts an error of that order of magnitude into a value of that order of magnitude.
Alternatively put, it brings earlier errors into prominence as values.
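A sketch of the (1 − cos x)/x² failure above (the particular x is my choice): compare the naive formula against the algebraically identical 2 sin²(x/2)/x², which involves no subtraction of nearly equal numbers.

#include <cmath>
#include <stdio.h>

int main() {
    // For x this small, cos x is within an ulp or two of 1, so 1 - cos x
    // retains almost no correct digits: the subtraction promotes the rounding
    // error of cos x into the leading digits of the result.
    double x = 2e-8;
    double naive  = (1.0 - cos(x)) / (x * x);
    // 1 - cos x == 2 sin^2(x/2) exactly, and this form has no cancellation.
    double stable = 2.0 * sin(x / 2) * sin(x / 2) / (x * x);
    printf("naive:  %.17g\n", naive);   // wildly off (0 or about 0.55 here)
    printf("stable: %.17g\n", stable);  // close to the true value, ~0.5
}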
Another example of how rounding quietly destroys information:
#include <cmath>
#include <stdio.h>

int main() {
    double x = 1000;
    // take 60 square roots...
    for (int i = 0; i < 60; ++i) { x = sqrt(x); }
    // ...then square the result 60 times, which should undo them exactly
    for (int i = 0; i < 60; ++i) { x = x * x; }
    printf("x: %10.20f\n", x);
}
This produces the output:
./sqrt-pow-1-12
...
x: 1.00000000000000000000
That is, even though the composed function is the identity, the answer collapses
to 1. What is happening? After enough square roots, x is within half an ulp of 1,
so sqrt rounds it to exactly 1.0; every trace of the original 1000 has been rounded
away, and squaring 1.0 sixty times just returns 1.0.
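To see where the information is lost, we can instrument the square-root loop (a small variation of the code above, mine): on a typical IEEE double implementation, x collapses to exactly 1.0 well before the 60 iterations are up.

#include <cmath>
#include <stdio.h>

int main() {
    double x = 1000;
    for (int i = 0; i < 60; ++i) {
        x = sqrt(x);
        if (x == 1.0) {
            // Once x is within half an ulp of 1, sqrt rounds to exactly 1.0;
            // all memory of the original 1000 is gone, and squaring cannot
            // bring it back.
            printf("x became exactly 1.0 after %d square roots\n", i + 1);
            break;
        }
    }
}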
Next, consider computing f(x) = (e^x − 1)/x for small x. A naive implementation:

double f(double x) { return x == 0 ? 1 : (pow(M_E, x) - 1) / x; }
This can suffer from catastrophic cancellation in the numerator: when
x is close to 0, e^x is close to 1, and forming e^x − 1 magnifies the
error in the computed e^x.
Now consider this alternative:

double f(double x) {
    const double y = pow(M_E, x);
    return y == 1 ? 1 : (y - 1) / log(y);
}
This algorithm seems crazy, but there is insight in it: we can show that
the errors cancel! Neither (y − 1) nor log y is particularly accurate on
its own, but the errors accumulated in them are strongly correlated and
almost completely cancel in the quotient, leaving a good value.
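A quick comparison (my own driver; it uses expm1 from <cmath> as an accurate reference for e^x − 1 near 0) of the naive formula against the y/log y trick at x = 1e-15. Typically the naive version is off by around 10%, while the "crazy" version agrees with the reference to nearly full precision.

#include <cmath>
#include <stdio.h>

double f_naive(double x) {
    return x == 0 ? 1 : (pow(M_E, x) - 1) / x;
}

double f_clever(double x) {
    const double y = pow(M_E, x);
    return y == 1 ? 1 : (y - 1) / log(y);
}

int main() {
    double x = 1e-15;
    printf("naive:     %.17g\n", f_naive(x));    // typically off by ~10%
    printf("clever:    %.17g\n", f_clever(x));   // close to the reference
    printf("reference: %.17g\n", expm1(x) / x);  // accurate value of (e^x-1)/x
}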
§ IEEE floating point fun: +0 and -0 for complex analysis
Rather than think of +0 and -0 as distinct numerical values, think of their sign bit as an auxiliary variable that conveys one bit of information (or misinformation) about any numerical variable that takes on 0 as its value.
IEEE-754 has two zeroes, +0 and -0. They compare equal, but several operations
distinguish them; the most famous is that 1/+0 = +∞, while 1/−0 = −∞. Here,
we proceed to discuss some complex-analytic considerations.
Therefore, implementers of compilers and run-time libraries bear a heavy burden of attention to detail if applications programmers are to realize the full benefit of the IEEE style of complex arithmetic. That benefit deserves some discussion here if only to reassure implementers that their assiduity will be appreciated.
√(−1 + 0i) = +0 + i
√(−1 − 0i) = +0 − i
These will ensure that √(z*) = (√z)*.
copysign(1, +0) = +1
copysign(1, −0) = −1
These will ensure that copysign(x, 1/x) = x even when x = ±∞.
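A small demonstration of these behaviors (my own program; it assumes IEEE-754 doubles, and the complex part assumes an implementation whose complex sqrt follows the C Annex G conventions, which is common but not universal):

#include <cmath>
#include <complex>
#include <stdio.h>

int main() {
    double pz = +0.0, nz = -0.0;
    // The two zeroes compare equal, but division tells them apart.
    printf("+0 == -0: %d\n", pz == nz);                    // 1
    printf("1/+0 = %g, 1/-0 = %g\n", 1.0 / pz, 1.0 / nz);  // inf, -inf
    // copysign reads the sign bit that the comparison ignores.
    printf("copysign(1,+0) = %g\n", copysign(1.0, pz));    //  1
    printf("copysign(1,-0) = %g\n", copysign(1.0, nz));    // -1
    // The sign of a zero imaginary part selects the side of sqrt's branch cut
    // (on implementations following the C Annex G conventions).
    std::complex<double> sa = std::sqrt(std::complex<double>(-1.0, pz));
    std::complex<double> sb = std::sqrt(std::complex<double>(-1.0, nz));
    printf("sqrt(-1+0i) = %g + %gi\n", sa.real(), sa.imag()); // expect 0 + 1i
    printf("sqrt(-1-0i) = %g + %gi\n", sb.real(), sb.imag()); // expect 0 - 1i
}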
An example is provided where the two limits, approaching a branch cut from above and from below, disagree; the sign of zero is what lets an implementation distinguish the two sides.
The principal branch of a complex function is a way of selecting one branch
of a function that is otherwise multi-valued. A classical example
is the argument function, where arg(r e^{iθ}) = θ.
However, this is ambiguous: we can map θ ↦ θ + 2π and still have the same complex number. So, we need to fix some standard.
We usually pick the "branch" where 0 ≤ θ < 2π.
In general, we need to carefully handle what happens to the function at
the discontinuity.
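One place where the signed zero interacts with the branch cut (a sketch of mine; note that atan2 returns the principal argument in (−π, π], the other common convention from the 0 ≤ θ < 2π one above): approaching the negative real axis from above versus from below gives ±π, and the sign of the zero imaginary part preserves that distinction.

#include <cmath>
#include <stdio.h>

int main() {
    // arg(z) computed as atan2(Im z, Re z). On the negative real axis the
    // function is discontinuous; the sign bit of the zero imaginary part
    // records which side of the branch cut we came from.
    printf("atan2(+0, -1) = %g\n", atan2(+0.0, -1.0)); //  pi  (from above)
    printf("atan2(-0, -1) = %g\n", atan2(-0.0, -1.0)); // -pi  (from below)
}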
What deserves to be undermined is blind faith in the power of Algebra. We should not believe that the equivalence class of expressions that all describe the same complex analytic function can be recognized by algebraic means alone, not even if relatively uncomplicated expressions are the only ones considered.