This post is about the do_sin(double x, double dx) function from s_sin.c. The source code is here.

The purpose of do_sin(x, dx) is to compute an approximation to \(\sin(x + dx)\) (addition in the sense of real numbers) even when the floating point addition x + dx is not exact (e.g. because dx is very small relative to x). The source doesn’t specify how much smaller dx should be, but with one exception, whenever do_sin(x, dx) is called x and dx have come from a call to reduce_sincos, so, as explained in my post on that function, we can assume dx will be at most \(2^{-27}\) times x in magnitude.

x less than 0.126 in magnitude

do_sin checks whether \(\vert x\vert <0.126\) using fabs, and if so invokes the TAYLOR_SIN macro with arguments x*x, x, dx. TAYLOR_SIN in turn uses two other macros, defined on lines 49-65, and I’ve copied all three below.

#define POLYNOMIAL2(xx) ((((s5 * (xx) + s4) * (xx) + s3) * (xx) + s2) * (xx))

#define POLYNOMIAL(xx) (POLYNOMIAL2 (xx) + s1)

/* The computed polynomial is a variation of the Taylor series expansion for
   sin(x):

   x - x^3/3! + x^5/5! - x^7/7! + x^9/9! - dx*x^2/2 + dx

   The constants s1, s2, s3, etc. are pre-computed values of 1/3!, 1/5! and so
   on.  The result is returned to LHS.  */
#define TAYLOR_SIN(xx, x, dx)                                        \
({                                                                    \
  double t = ((POLYNOMIAL (xx) * (x) - 0.5 * (dx)) * (xx) + (dx));    \
  double res = (x) + t;                                               \
  res;                                                                \
})

The constants used, defined in usncs.h, are

double s1 = -0x1.5555555555555p-3;   /* -0.16666666666666666     */
double s2 =  0x1.1111111110ECEp-7;   /*  0.0083333333333323288   */
double s3 = -0x1.A01A019DB08B8p-13;  /* -0.00019841269834414642  */
double s4 =  0x1.71DE27B9A7ED9p-19;  /*  2.755729806860771e-06   */
double s5 = -0x1.ADDFFC2FCDF59p-26;  /* -2.5022014848318398e-08  */

and these numbers are close to, but not the nearest doubles to, the coefficients -1/3!, 1/5!, -1/7!, 1/9!, and -1/11! in the Taylor series for sin. (For example, the nearest double to 1/5! is 0x1.1111111111111p-7 and this isn’t equal to s2.)
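To see concretely what gets computed, here is a rough Python transcription of TAYLOR_SIN (Python floats are IEEE doubles, so the arithmetic matches; the taylor_sin name and the test values at the end are mine, and the comparison with math.sin is only a sanity check).

import math

# The coefficients from usncs.h, entered via their hex representations.
s1 = float.fromhex("-0x1.5555555555555p-3")
s2 = float.fromhex("0x1.1111111110ECEp-7")
s3 = float.fromhex("-0x1.A01A019DB08B8p-13")
s4 = float.fromhex("0x1.71DE27B9A7ED9p-19")
s5 = float.fromhex("-0x1.ADDFFC2FCDF59p-26")

def taylor_sin(xx, x, dx):
    """Mirror of TAYLOR_SIN: approximate sin(x + dx) for |x| < 0.126."""
    poly = ((((s5 * xx + s4) * xx + s3) * xx + s2) * xx) + s1  # POLYNOMIAL(xx)
    t = (poly * x - 0.5 * dx) * xx + dx
    return x + t

# s2 really is different from the nearest double to 1/5!:
print((1 / 120).hex())  # 0x1.1111111111111p-7
print(s2.hex())         # 0x1.1111111110ecep-7

x, dx = 0.1, 1e-14                # made-up inputs with |x| < 0.126
print(taylor_sin(x * x, x, dx))   # the approximation, computed as in do_sin
print(math.sin(x + dx))           # library sin, as a sanity check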

Unpacking the definitions, what gets evaluated is

\[\texttt{s5} x^{11} + \texttt{s4} x^9 + \texttt{s3} x^7 + \texttt{s2} x^5 + \texttt{s1} x^3 + x - 0.5 \texttt{dx} x^2 + \texttt{dx}\]

that is, a polynomial of degree 11 whose coefficients are close to those of the Taylor series for \(\sin(x)\), plus \(dx(1-x^2/2)\), which is dx times the degree 2 truncation of the Taylor series for \(\cos(x)\). The comment in the code about

x - x^3/3! + x^5/5! - x^7/7! + x^9/9! - dx*x^2/2 + dx

is misleading: the approximation used for sin(x) is degree 11, not 9, and as above, the constants s2 to s5 are not just double approximations to the true Taylor coefficients. (In older versions of s_sin.c the coefficient of dx was also wrong, but the maintainers fixed it a few years ago.)

The reason these coefficients s1, …, s5 differ from the true Taylor coefficients is that in general the “best” polynomial approximation to a function, even a nice function like sin on an interval symmetric around 0, is not just a truncation of its Taylor series (although its coefficients are often close to the Taylor ones). Since as double users we care about significant figures, these coefficients will have been chosen to minimise relative error on an interval.

There’s no discussion in the file of how small dx should be for this function to produce acceptable results, but some basic estimates show that \(\vert dx \vert \leqslant 2^{-40} \vert x \vert\) is good enough. Truncating Taylor expansions for sin and cos and using the Lagrange remainder, we get

\[\sin(x+dx) = \sin(x) + dx(1-x^2/2 + \cos(\zeta)x^4/4!) - \sin(\xi)(dx)^2/2!\]

for some \(\zeta, \xi\). The error in TAYLOR_SIN is therefore the sum of the error in approximating \(\sin(x)\) by the degree 11 polynomial above, the error in neglecting \(dx\cos(\zeta)x^4/4!\), and the error in neglecting \(\sin(\xi)(dx)^2/2!\). If \(\vert dx \vert \leqslant 2^{-k} \vert x\vert\) then \(\vert \cos(\zeta) x^4 dx / 4!\vert \leqslant 2^{-k-16.5}\vert x \vert\) (using \(\vert x \vert \leqslant 0.126\)) and \(\vert \sin(\xi)(dx)^2/2! \vert \leqslant 2^{-2k-3.9}\vert x \vert\) (using the same bound on \(\vert x \vert\)). For x of this magnitude an ULP for x is about the same as an ULP for \(\sin(x+dx)\), so with \(k=40\) the overall error will be dominated by the contribution from the polynomial approximation to \(\sin(x)\), which testing in Python suggests is not much more than half an ULP. On the other hand, the same test code shows \(\vert dx \vert \leqslant 2^{-27}\vert x \vert\) is nowhere near small enough, but \(2^{-40}\) is OK (in the sense of having errors little more than half an ULP). Since the actual implementation shows this level of accuracy, I think the conclusion is that my estimate of the magnitude of dx returned by reduce_sincos was way too pessimistic.
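The sort of test I have in mind is sketched below. It reuses the taylor_sin transcription above and compares it with mpmath’s arbitrary precision sin over randomly chosen x with \(\vert dx\vert\) about \(2^{-k}\vert x\vert\); the helper name and the sampling scheme are mine, not glibc’s.

import math
import random
from mpmath import mp, mpf, sin as mpsin

mp.prec = 200   # plenty of precision for the reference value

def error_in_ulps(k, trials=10000):
    """Worst observed error of taylor_sin (above) when |dx| is about 2^-k |x|."""
    worst = 0.0
    for _ in range(trials):
        x = random.uniform(-0.126, 0.126)
        dx = x * 2.0 ** -k * random.choice([1.0, -1.0])
        approx = taylor_sin(x * x, x, dx)
        exact = mpsin(mpf(x) + mpf(dx))
        err = abs(approx - exact) / math.ulp(float(exact))
        worst = max(worst, float(err))
    return worst

print(error_in_ulps(27))   # much more than half an ULP: 2^-27 is not small enough
print(error_in_ulps(40))   # a little over half an ULP at worst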

This range of x-values does provide examples of doubles for which the glibc sin function is not correctly rounded. For example, testing in Python against the crlibm library, which claims to be correctly rounded, and against mpmath’s arbitrary precision sin function, shows that glibc sin of -0x1.e6fbcae266c20p-4 has an error of 0.506 ULPs, and so is off by one in the last place.
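Here is roughly how that check can be done, assuming (as is typical on Linux, though not guaranteed) that Python’s math.sin calls straight into glibc’s sin.

import math
from mpmath import mp, mpf, sin as mpsin

mp.prec = 200
x = float.fromhex("-0x1.e6fbcae266c20p-4")

glibc_result = math.sin(x)   # on CPython/Linux this typically calls glibc's sin
exact = mpsin(mpf(x))
print(abs(glibc_result - exact) / math.ulp(float(exact)))  # about 0.506 ULPs
print(glibc_result == float(exact))   # expected to be False: off by one in the last place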

x between 0.126 and 0.855469 in magnitude

This starts around line 131. The code is rather annoying to read, so I have rewritten it below with some new variables introduced for clarity and with simplifications made by assuming x is positive.

mynumber u;
u.x = big + x;
double y = x - (u.x - big);

big is a static const double equal to 0x1.8000000000000p45. An ULP for big is \(2^{-7}\), while \(\vert x\vert < 1\). Therefore the last 7 bits of the mantissa of u.x record x rounded to the nearest multiple of 1/128, and y is the difference between x and that nearest multiple. Let’s say this multiple is m/128 where m is a whole number. We have \(x+dx = y + m/128 + dx\).
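The same trick can be reproduced in Python, since Python floats are doubles; the struct unpacking below plays the role of reading u.i[LOW_HALF] from the union, and the value of x is arbitrary.

import struct

big = float.fromhex("0x1.8p45")   # 0x1.8000000000000p45; an ULP here is 2^-7

x = 0.3                           # arbitrary value in the range of interest
u = big + x                       # x is rounded into the last 7 mantissa bits of u
y = x - (u - big)                 # y = x - m/128, so |y| <= 1/256

# The low 32 bits of u, read as an integer, are m, which is what the C code
# reads as u.i[LOW_HALF].
m = struct.unpack("<Q", struct.pack("<d", u))[0] & 0xFFFFFFFF
print(m, m / 128)                 # 38 0.296875, the nearest multiple of 1/128 to 0.3
print(y, abs(y) <= 1 / 256)       # the small remainder; True

Back in the C, the next step computes polynomial approximations in y: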

double yy = y * y;
double s = y + (dx + y * yy * (sn3 + yy * sn5));
double c = y * dx + yy * (cs2 + yy * (cs4 + yy * cs6));

The constants here are given in the following odd format.

static const double
sn3 = -1.66666666666664880952546298448555E-01,
sn5 =  8.33333214285722277379541354343671E-03,
cs2 =  4.99999999999999999999950396842453E-01,
cs4 = -4.16666666666664434524222570944589E-02,
cs6 =  1.38888874007937613028114285595617E-03;

There are 32 decimal places, about half of which don’t make any difference to the double value (cs2 is just 0.5, for example). Presumably they came from some higher precision optimisation calculation. The result is that c is an approximation to \(1-\cos(y+dx)\) and s is an approximation to \(\sin(y + dx)\). The code now uses a macro SINCOS_TABLE_LOOKUP which retrieves double-double approximations to sin and cos of m/128 from a lookup table __sincostab.

int k = u.i[LOW_HALF] << 2; // u.i[LOW_HALF] multiplied by 4
// Interpreted as an integer, u.i[LOW_HALF] is the numerator of the
// closest fraction m/128 to x.
// k is the index of the "row" of the table __sincostab corresponding to m/128
sn = __sincostab.d[k];
ssn = __sincostab.d[k + 1];
cs = __sincostab.d[k + 2];
ccs = __sincostab.d[k + 3];

sn and ssn are a double-double approximation to sin(m/128) with ssn being the smaller part. cs and ccs are a double-double approximation of cos(m/128) with ccs being the smaller part.

cor = (ssn + s * ccs - sn * c) + cs * s;
return sn + cor;

The trig identity being used here is the addition formula for sin: \(\sin(x+dx) = \sin(m/128 + (y+dx)) = \sin(m/128)\cos(y+dx) + \cos(m/128)\sin(y+dx)\). If we introduce error terms \(\mathtt{s} = \sin(y+dx) + \epsilon_s\), \(1-\mathtt{c} = \cos(y+dx)+\epsilon_c\), \(\mathtt{sn}+\mathtt{ssn} = \sin(m/128)+\epsilon_{ssn}\), \(\mathtt{cs}+\mathtt{ccs} = \cos(m/128) + \epsilon_{ccs}\) then we get

\[\sin(x+dx) = (\mathtt{sn} + \mathtt{ssn} -\epsilon_{ssn})(1-\mathtt{c} - \epsilon_c) + (\mathtt{cs} + \mathtt{ccs} - \epsilon_{ccs})(\mathtt{s}-\epsilon_s)\]

so the neglected terms in the return value sn + cor are \(-\mathtt{ssn} \cdot \mathtt{c} - \epsilon_c (\mathtt{sn} + \mathtt{ssn}) - \epsilon_{ccs} \mathtt{s}- \epsilon_{ssn} + \epsilon_{ssn} \mathtt{c} + \epsilon_{ssn} \epsilon_c - \epsilon_s \mathtt{cs} - \epsilon_s \mathtt{ccs} + \epsilon_{ccs} \epsilon_s\).
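To check this bookkeeping I find it helpful to replay the whole branch in Python for a single x, using mpmath both as a stand-in for the __sincostab entries (split into a double-double pair) and as the reference value; the function names, the splitting helper and the test values are mine.

from mpmath import mp, mpf, sin as mpsin, cos as mpcos

mp.prec = 200

# Constants from the C source, as above.
sn3 = -1.66666666666664880952546298448555e-01
sn5 = 8.33333214285722277379541354343671e-03
cs2 = 4.99999999999999999999950396842453e-01
cs4 = -4.16666666666664434524222570944589e-02
cs6 = 1.38888874007937613028114285595617e-03
big = float.fromhex("0x1.8p45")

def double_double(v):
    """Split a high-precision value into a rounded double plus a double correction."""
    hi = float(v)
    return hi, float(v - hi)

def sin_branch(x, dx):
    """Replay the 0.126 <= x < 0.855469 branch of do_sin (x positive)."""
    u = big + x
    y = x - (u - big)
    m = round((u - big) * 128)                        # numerator of the nearest m/128
    yy = y * y
    s = y + (dx + y * yy * (sn3 + yy * sn5))          # ~ sin(y + dx)
    c = y * dx + yy * (cs2 + yy * (cs4 + yy * cs6))   # ~ 1 - cos(y + dx)
    sn, ssn = double_double(mpsin(mpf(m) / 128))      # stand-in for __sincostab rows
    cs, ccs = double_double(mpcos(mpf(m) / 128))
    cor = (ssn + s * ccs - sn * c) + cs * s
    return sn + cor

x, dx = 0.4, 1e-14                                    # arbitrary test values
approx = sin_branch(x, dx)
exact = mpsin(mpf(x) + mpf(dx))
print(approx, float(exact), float(abs(approx - exact)))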

We can bound the sizes of the various terms appearing here as follows:

| expression | lower bound | upper bound |
| --- | --- | --- |
| x | 0.126 | 0.855469 |
| y | \(-1/256\) | \(1/256\) |
| dx | \(-2^{-40}x\) | \(2^{-40}x\) |
| sin(x+dx) | 0.125 | 0.76 |
| m | 16 | 110 |
| sn + ssn | 0.124 | 0.76 |
| sn | 0.124 | 0.76 |
| ssn | \(-2^{-54}\) | \(2^{-54}\) |
| cs | 0.658 | 0.993 |
| ccs | \(-2^{-54}\) | \(2^{-54}\) |
| c | 0 | \(2^{-17}\) |
| s | \(-2^{-8}\) | \(2^{-8}\) |
| \(\epsilon_c\) | 0 | \(2^{-66}\), est. |
| \(\epsilon_s\) | 0 | \(2^{-57}\), est. |
| \(\epsilon_{ssn}\) | 0 | \(2^{-100}\) |
| \(\epsilon_{ccs}\) | 0 | \(2^{-100}\) |

(the bounds for \(\epsilon_c\) and \(\epsilon_s\) are estimates derived by computing a large number of randomly chosen examples). The term with the smallest bound that is kept is s * ccs, which is at most \(2^{-62}\) (but can make a difference to the return value, in the last place). On the other hand the neglected term \(-\epsilon_s\cdot \mathtt{cs}\) has a larger upper bound than that for s * ccs. Given that cs is never smaller than 0.658, it’s surprising to me that this works out OK.