Bits, Bytes, and Binaries in LFE
Or: How I Learned to Stop Worrying and Love the Bit Stream
Introduction
In most programming languages, manipulating binary data at the bit level is rather like performing surgery with a pickaxe. It's technically possible, and if you squint hard enough while wielding your bit shifts and masking operations, you might even get something resembling the desired result. But it won't be pretty, and there's a non-zero chance you'll need therapy afterward.
LFE, inheriting Erlang's particularly sophisticated relationship with binary data, takes a different approach. Instead of making you grovel through arcane incantations of bitwise operations (though those are available if you're feeling nostalgic), it provides what can only be described as pattern matching on steroids. Or perhaps pattern matching that's been to finishing school and learned proper table manners. The result is that tasks which would normally require careful bit-shuffling and a comprehensive insurance policy can be accomplished with code that looks suspiciously like the specification of the data format you're working with.
This is not an accident. It's the kind of elegant design that makes you wonder what other programming languages have been doing with their time.
Why Binaries Matter
Let us consider, for a moment, the humble byte. Eight bits, representing values from 0 to 255 (or -128 to 127 if you're feeling signed about it). Unremarkable in isolation, perhaps, but stack enough of them together and you have:
- Network protocol packets pretending to be well-behaved structured data
- Image files storing millions of pixels that somehow form pictures of cats
- Audio streams carrying the complete works of humanity's musical genius (and also whatever's popular on the charts)
- Video codecs performing computational miracles that would have made our ancestors believe in magic
- Encrypted data that looks like someone spilled alphabet soup into a blender
In the world of telecommunications, distributed systems, and general data wrangling—which is to say, the world Erlang and LFE were designed for—the ability to efficiently pack, unpack, and pattern-match binary data isn't merely convenient. It's essential. It's the difference between "this will work" and "this will work while handling millions of messages per second without breaking a sweat."
The LFE Advantage
What makes LFE's approach to binaries particularly noteworthy is that it treats binary data as a first-class citizen with its own syntax and pattern-matching capabilities. You don't convert things to binaries as an afterthought; you work with them directly, naturally, and—dare we say it—elegantly.
Consider this: in many languages, if you wanted to parse an MPEG audio frame header (which consists of 11 bits of sync word, 2 bits of version ID, 2 bits of layer description, and so on), you'd be looking at code that resembles a particularly obtuse mathematical proof. In LFE, you write something that looks like:
(binary (11 (sync bits)) (2 (version bits)) (2 (layer bits)) ...)
And that's it. You've just pattern-matched against the bit-level structure of the data. The code reads like the specification. This is the kind of thing that makes programmers weep with joy, or at least nod approvingly while adjusting their glasses.
What You'll Learn
In the chapters that follow, we'll explore:
- The fundamentals: What binaries are, how they're represented in memory, and why the BEAM VM is particularly good at handling them
- Basic construction and manipulation: Creating binaries with the
#B(...)syntax and using built-in functions to dissect them - The bit syntax: The full glory of size specifications, type qualifiers, and pattern matching at arbitrary bit boundaries
- Pattern matching: Because if you can pattern-match lists and tuples, why not pattern-match the individual bits in a TCP packet?
- Binary comprehensions: List comprehensions for binaries, because sometimes you need to transform a stream of bytes and regular loops are just so... imperative
- Real-world applications: From parsing network protocols to implementing codec algorithms, because theory is nice but practice pays the bills
By the end of this exploration, you'll be able to look at a binary data format specification and translate it almost directly into working LFE code. You'll understand why binary pattern matching is one of the secret weapons that makes Erlang and LFE particularly well-suited for building telecommunications systems, distributed databases, and anything else that involves pushing large quantities of structured binary data through pipes at high velocity.
More importantly, you'll have in your toolkit an approach to binary data manipulation that doesn't feel like medieval torture. And in the grand scheme of programming, that's worth quite a lot.
A Note on Bitstrings
Before we dive in, a clarification: when the number of bits in your data is evenly divisible by 8, we call it a binary. When it's not—when you're dealing with, say, 13 bits or 47 bits—we call it a bitstring. Most operations work identically on both, but we'll use the more specific term when the distinction matters. Think of it as the difference between a byte-aligned data structure and one that wandered off the alignment grid to explore more interesting territory.
The convention is simple: binaries are the well-behaved, byte-aligned data structures your file I/O operations expect. Bitstrings are their more adventurous cousins that hang out at odd bit boundaries, usually because some protocol designer thought it would be clever to pack flags into 3 bits instead of wasting a whole byte.
Both are equally valid. Both are equally supported. And both, as we shall see, are equally elegant to work with in LFE.
Right, then. Shall we begin?