Finding Binary Gaps In C#

Variations of the binary gap problem are a classic at coding challenge sites nowadays.

This easy-to-solve, not-so-obvious-to-master problem can highlight how a programmer is aware of binary manipulation and how it affects performance.

This article describes two solutions for a variation of this problem and how their performance stacks against each other.

The Problem

Consider some integer, for example 14512.

Its binary representation is 11100010110000, with the most significant bit first.

A binary gap is a sequence of consecutive bits of the same value (ones or zeros) that are surrounded by bits of opposite values. For example, a zero binary gap is a sequence of unset bits (zeroes) that are surrounded by set bits (ones).

The number 14512, or 11100010110000 in binary, has two zero binary gaps, one of length 3 and one of length 1.

11100010110000
   ^^^ ^

The right-most sequence of 4 unset bits does not count, as it is not surrounded by set bits on both sides.

A maximum binary gap length is thus the length of the longest binary gap within the number’s binary representation.

This concept may look simple at first, but variations of it become interesting in the domain of compression algorithms that look for long chunks of similar data.

It is also a fine example of how not all O(N)-ish algorithms are born the same.

The precise way one writes code to achieve something can have a dramatic impact on how fast that code achieves that end.

Let’s see an example of that by attempting to find the length of the longest binary gap in any given number.

A Simple Approach

One simple approach can be to convert the integer into its binary representation as a string, and then iterate the characters one-by-one.

Here is a C# algorithm that performs just that.

public static int BinaryGapMaxLengthByIterating(int value, BinaryGapBit bit = BinaryGapBit.Unset)
{
    // trackers
    int max_gap = 0;
    int gap = 0;
    char mask = bit == BinaryGapBit.Unset ? '1' : '0';

    // convert the value to a binary string
    string binary = Convert.ToString(value, 2);

    // iterate until the first set bit
    int i = 0;
    while (i < binary.Length && binary[i] != mask)
    {
        ++i;
    }

    // we are either at the first set bit or have run out of bits
    for (; i < binary.Length; ++i)
    {
        // check if the current bit is set
        if (binary[i] == mask)
        {
            // check if the current gap is greater than the max candidate
            if (gap > max_gap)
            {
                // promote the current gap count to max candidate
                max_gap = gap;
            }

            // close the running gap count
            gap = 0;
        }
        else
        {
            // if zero then increase the running gap count
            ++gap;
        }
    }

    // at this point we have the greatest gap length or zero
    return max_gap;
}

public enum BinaryGapBit
{
    /// <summary>
    /// Look for gaps where the bits are unset.
    /// </summary>
    Unset = 0,

    /// <summary>
    /// Look for gaps where the bits are set.
    /// </summary>
    Set = 1
}

The algorithm above allows finding gaps of either zeroes or ones via the BinaryGapBit parameter, without affecting the iteration itself.

This is how it works for a zero-based gap:

Convert the integer number to its binary representation as a string.
Iterate through characters until the first '1'.
For each character in the string…
- Is it '1'? If so…
  - If the current gap length is longer than the maximum gap length, then promote it to the maximum gap length.
  - Reset the current gap length.
- Is it '0'? If so, increment the current gap length.
Return the maximum gap length.

It is a simple to understand algorithm, with apparent time-complexity O(N), if you consider an arbitrary number of bits to cover.

However, that apparent O(N) is hit by a curve ball from Convert.ToString(value, 2);, of which you can see the code here.

Looking at that source code, it becomes difficult to discern the actual time-complexity of this solution without empirical testing over a larger dataset than a single integer.

This is in addition to the string allocation by string binary = ... on the algorithm itself.

If one is writing performance-sensitive code, this can be one allocation too many.

An Efficient Approach

One efficient approach comes from realizing that one does not need to convert anything to binary at all.

Integer numbers are already binary values to start with, and many languages allow for some form of direct binary manipulation.

The algorithm below exploits this by creating a binary mask to check wheather a given bit is set or not, and then shifting it to simulate an iteration through all the bits.

public static int BinaryGapMaxLengthByShifting(this int value, BinaryGapBit bit = BinaryGapBit.Unset)
{
    // trackers
    int max_gap = 0;
    int gap = 0;
    int mask = 1;

    // if searching for gaps of ones just flip the bits in the search space
    if (bit == BinaryGapBit.Set)
    {
        value = ~value;
    }

    // shift until the first set bit
    while ((mask & value) == 0 && mask != 0)
    {
        mask <<= 1;
    }

    // shift one more time to skip it
    // this avoid a duplicate comparison on the next step
    mask <<= 1;

    // we are either at the first set bit or have run out of bits
    for (; mask != 0; mask <<= 1)
    {
        // check if the current bit is set
        if ((mask & value) == mask)
        {
            // check if the current gap is greater than the max candidate
            if (gap > max_gap)
            {
                // promote the current gap count to max candidate
                max_gap = gap;
            }

            // close the running gap count
            gap = 0;
        }
        else
        {
            // if zero then increase the running gap count
            ++gap;
        }
    }

    // at this point we have the greatest gap length or zero
    return max_gap;
}

While the algorithm above is a bit harder to read that the previous, its behaviour still follows the same train of thought.

Step 1 - Find First Set Bit

Create a mask of value 1 - this translates to binary 00000000000001 (plus other 18 other zeroes to left for a 32-bit integer).
While the mask is not zero…
- Apply a binary & (AND) operation between the mask and the value to see if the bit at the mask position is set.
- If yes, exit the loop.
- If not, shift the mask to the left and continue the loop.

The first iteration does not find a set bit in the last position because the & operation between both returns zero.

Value: 11100010110000
 Mask: 00000000000001
 --------------------
  AND: 00000000000000
                    ^

We therefore shift the mask one bit to the left with mask <<= 1 and try again…

Value: 11100010110000
 Mask: 00000000000010
 --------------------
  AND: 00000000000000
                   ^

Still no luck, so we shift and test a few more times. This time, the & operation returns a value that is equal to mask itself. This is our tell that we have found the first set bit.

Value: 11100010110000
 Mask: 00000000010000
 --------------------
  AND: 00000000010000
                ^--

We thus stop fast-scanning for a starting point and start counting gaps. We also shift the mask one bit further as there is no need to test the current bit again.

Value: 11100010110000
 Mask: 00000000100000
               ^

Step 2 - Count Gaps

This is also similiar to the simple approach but now we use binary shifting.

While the mask is not zero…
Test the mask against the value with the & (AND) operator.
- If the bit is set…
  - If the current gap length is longer than the maximum gap length, promote it to the maximum gap length.
  - Reset the current gap length.
- If the bit is not set, increment the current gap length.
Return the maximum gap length.

This is what it looks like…

The first test finds a set bit at the mask position. We therefore close the current gap count, which is still zero at this point.

Value: 11100010110000
 Mask: 00000000100000
 --------------------
  AND: 00000000100000
               ^

The next test returns zero. This means we can starting counting the current gap length.

Value: 11100010110000
 Mask: 00000001000000
 --------------------
  AND: 00000000000000
              ^

The next test returns the mask value. This means we found a set bit, so we take the current gap count of one, and promote it the maximum.

Value: 11100010110000
 Mask: 00000010000000
 --------------------
  AND: 00000010000000
             ^

We rinse and repeat a few more times, which will put the maximum gap length at 3 - the length of the gap between the fourth and sixth bit in the representation below.

Value: 11100010110000
 Mask: 10000000000000
 --------------------
  AND: 10000000000000
       ^-----

As we shift the mask to the left one last time, its set bit disappears, and the mask value itself becomes zero. This is our tell to stop the algortihm and return the current maximum length of 3.

Value: 11100010110000
 Mask: 00000000000000

Performance

What look like very similar algorithms can show dramatic performance differences.

This is a benchmark of the sample bit-shifting approach against the string-iteration approach as a baseline:

Method	Mean	Error	StdDev	Ratio
BinaryGapMaxLengthByIterating	658.9 ns	1.1043 ns	1.0330 ns	1.00
BinaryGapMaxLengthByShifting	144.2 ns	0.8616 ns	0.8060 ns	0.22

You can run the benchmark on your own machine from the Quicker repository.

The bit-shifting algorithm, by saving on the allocation, runs in less than a quarter of the time, or over four times faster… even for a single integer.

Takeaway

It pays off to obsess about performance, even in simple cases like this.

Through the compounding effect, a better algorithm can make a significant difference to our systems, our users and our business, if we just keep looking for things to improve.

And that is what problems such as this attempt to highlight.

On Distributed Computing