Floating-point arithmetic – Part 2

Floating-point addition is not associative

When dealing with real numbers such as \(1\), \(-1/2\), \(\sqrt{2}\), \(1.414\), \(\pi\), etc., there are many properties that we take for granted. For instance, if \(x\), \(y\) and \(z\) represent any combination of the aforementioned numbers, then it is well known that

\(\begin{alignat}{1}
x + y & = y + x \label{eq:comm} \\
x + (y + z) & = (x + y) + z \label{eq:assoc}
\end{alignat}\)

The above expressions state that addition is commutative and associative on real numbers.  These properties are so natural that most people would consider it highly unusual for addition not to behave in this manner. Unfortunately, floating-point addition is not associative as the following example will show.

A simplified model example

We will use a simplified system based on decimal numbers to illustrate what happens. Consider real numbers of the following form:
\[
x = (-1)^s10^E\sum_{i=0}^{3}d_i2^{-i} \equiv (-1)^s10^E(d_0\cdot d_1d_2d_3)
\]
where

\(\begin{align*}
\label{eq:sEd}
s & = 0 \mbox{ or } 1 \\
E & = j, \mbox{  } -9 \leq j \leq 10 \\
d_i & = 0, 1, .\ldots, 9 ,\mbox{  } i = 0, \ldots, 3
\end{align*}\)

with \(d_0\) satisfying the normalization condition \(d_0 > 0\). Next, define three numbers \(x\), \(y\) and \(z\) that can be represented exactly in our
chosen system:
\[
\begin{split}
x & = 111.1 = 10^2(1.111) \\
y & = -100.0 = -10^2(1.000) \\
z & = 1.110 = 10^0(1.110) \\
\end{split}
\]
In exact arithmetic the result is
\[
(x + y) + z = x + (y + z) = 12.21
\]
Denote the floating-point addition operator by \(\oplus\). Floating-point addition of two floating-point numbers \(x\) and \(y\) is only defined if the exponents \(E\) in the floating-point representation of \(x\) and \(y\) are identical. If not, the smaller floating-point number must be denormalized to adjust the exponent. For floating-point numbers with identical exponents the fractions are added using normal integer arithmetic. The resulting fraction may require normalization. Since the exponent of \(x\) and \(y\) equals two they can be added directly:
\[
\begin{split}
x \oplus y & = 10^2 \times 1.111 + (-10^2 \times 1.000) \\
& = 10^2 \times (1.111 + (-1.000)) \\
& = 10^2(0.111) \\
& = 10^1(1.110) \\
\end{split}
\]
where the last step is normalization. Thus,
\[
\label{eq:nonassoc1}
\begin{split}
(x \oplus y) \oplus z & = 10^1(1.110) \oplus 10^0(1.110) \\
& = 10^1(1.110) \oplus 10^1(0.111) \\
& = 10^1(1.221) \\
\end{split}
\]
where the second equality follows from (temporary) denormalization. In this case floating-point addition is exact, since normalization/denormalization does not cause any loss of accuracy.

Next, start by computing
\[
y \oplus z = -10^2(1.000) \oplus 10^0(1.110) = -10^2(1.000) \oplus 10^1(0.111)
\]
where the second step is denormalization. But in order to be able to add the fractions together, one more denormalization step is required. This time precision will be lost since the least significant digit will be truncated. Thus,
\[
y \oplus z = -10^2(1.000) \oplus 10^2(0.011) = -10^2(0.989) = -10^1(9.890)
\]
Note that the least significant zero digit after normalization has no numerical significance. Finally,
\[
\begin{split}
x \oplus (y \oplus z) & = 10^2(1.111) \oplus (-10^1(9.890)) \\
& = 10^2(1.111) \oplus (-10^2(0.989)) \\
& = 10^2(0.122) \\
& = 10^1(1.220) \\
\end{split}
\]
One clearly sees that
\[
(x \oplus y) \oplus z \neq x \oplus (y \oplus z)
\]
i.e., floating-point addition is non-associative. This implies that an expression like
\[
x \oplus y \oplus z
\]
is meaningless since it is ambiguous.

The example presented above is simplified. In reality floating-point addition is more involved, but the conclusion still holds true. For more details, see the article by D. Goldberg.  Floating-point addition is commutative, however. The lack of associativity has practical consequences, which will be discussed in the third part of this series.

Leave a Reply

Your email address will not be published. Required fields are marked *