I've written quite a lot of code over the years. Most of the time I just bash out code. Obviously I always do my best to make my code clear and understandable (after all, I might be the next person to read it) but for the most part the code is functional rather than beautiful.

Sometimes however I find myself obsessing over small fragments of code trying to perfect them to meet the, sometimes arbitrary, criteria that makes life interesting. These functions are those that I crafted rather than those that I merely wrote. Sadly, I suspect few people will appreciate these functions as much as I do. Nevertheless, for those that are interested I have shown below a selection of my favorites complete with explanatory notes.

## Integer Division-less Quadratic Cosine Approximation

This function is nearly useless but I still think its cute! Like most sinusoid approximations, it is based on the idea that any (continuous) function can be modeled using a polynomial function of infinite complexity. This implies that such a function can be approximated by a polynomial of finite complexity. In this case we are using a quadratic function to approximate the first 90 degrees of the cosine function; the sine function and all other angles can be derived from this.

To make things slightly more tricky and meet an approximation of beauty I also decided not to allow division by anything other than a power of two (providing the input is within 0 to 360 degrees) nor the use of a lookup table. As an aside and assuming floating point computation is too slow this sort of function is, in fact, best implemented using table lookup and then using simple calculations (linear or quadratic interpolation) to derive values not present in the table.

With the above firmly set in my mind I then spent an hour or two in front of GNU Octave finding and testing the best quadratic function I could and tuning it to only use power of two division. The resulting function has an error margin or approximately 3% but has been tuned to that 0 and 90 degrees are straight lines. Given these results the function is, as I said in the introduction, nearly useless. That said the function does appear in TZones where it is used to draw the clock hands. Since the clock hands are only twenty five pixels long the error margin is, in this specific instance, acceptable.

/* return an approximation of 32767 * cos(x) where x is measured in degrees */
short IntCosine(short x)
{

long sign;

/* put x into the interval 0 <= x <= 90 */
x = (x > 0 ? x : 0 - x);
x = (x < 360 ? x : x % 360);
x = (x < 180 ? x : 360 - x);
x = (x < 90 ? (sign = 1, x) : (sign = -1, 180 - x));

return (short) (sign * (32767l - ((16570l * x * x) >> 12)));
}

short IntSine(short x)
{
return IntCosine(x - 90);
}

As you will, by this point, have noted I made one totally catastrophic mistake with this function (even ignoring the error margin). The input range (once filtered) ranges only between 0 and 90. This means that when used as, for example, a 16-bit PCM tone generator the rounding of the phase adds a significant quantity of additional error. At certain frequencies this can be heard aurally as a high pitched whine over the generated tone.

## RGB565 Flicker Filter

This function is a highly optimized FIR flicker filter that uses a few simple tricks to allow the ALU to perform calculations in parallel. Flicker filters are common when driving an interlaced display, there are basically a simply vertical blur that significantly reduces the flicker effects cause by interlacing. Basically a single pixel horizontal line is only refreshed 25 times per second (on a PAL TV) and at this refresh rate is very clearly visible. The flicker filter used here solves that by performing an FIR blur with coefficients of 0.25, 0.5 and 0.25 (e.g. this_pixel' = 0.25*pixel_above + 0.5*this_pixel + 0.25*pixel_below).

Addition has has the pleasing property that, providing carry overflow can be avoided a 32-bit add can actually perform 2 16-bit adds, or 4 8-bit adds, or 6 5-bit adds or many other combinations. This can be observed even with decimal numbers. 10,030 + 25,020 = 35,050 appears to be 5-digit addition however it could be viewed as two 2-digit additions with a gap to prevent overflow of the least significant digits from contaminating the most significant digits.

By working on 2 pixels at the same time we have a 32-bit value containing 6 values (R, G, B, R, G, B) and if we mask out every other value we ensure that carries within the add do not affect other values.

static uint32_t rgbrgb16_ff(uint32_t rgbrgb, uint32_t above, uint32_t below)
{

const uint32_t R = 0xf800;
const uint32_t G = 0x07e0;
const uint32_t B = 0x001f;

const uint32_t _G_R_B = (G << 16) | R | B;
const uint32_t R_B_G_ = (R << 16) | (B << 16) | G;

uint32_t _g_r_b, r_b_g_;

_g_r_b = (rgbrgb & _G_R_B) << 1;
_g_r_b += (above & _G_R_B);
_g_r_b += (below & _G_R_B);
_g_r_b >>= 2;

r_b_g_ = (rgbrgb & R_B_G_) >> 1;
r_b_g_ += (above & R_B_G_) >> 2;
r_b_g_ += (below & R_B_G_) >> 2;

return (_g_r_b & _G_R_B) | (r_b_g_ & R_B_G_);
}

The function is takes as arguments two pixels together with the two pixels above and the two pixels below the pixel to be calculated. It is another functions responsibility to located and provide the values (and to synthesise black pixels at the edge of the screen).