Add WAVE example file.

marhop · Sep 3, 2018 · 0ab8634 · 0ab8634
1 parent 0280e72
commit 0ab8634
Show file tree

Hide file tree

Showing 2 changed files with 204 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -44,7 +44,8 @@ programming](https://en.wikipedia.org/wiki/Literate_programming).
     with the `.nobin` class are ignored as well. This can be used to add code
     other than binary, or to "comment out" a whole code block.
 
-[Here is a complete example][example] describing a simple Bitmap image file.
+[Here is a complete example][example] describing a simple Bitmap image file; and
+[here is another one][example-wav] describing a WAVE audio file.
 
 # Usage
 
@@ -68,6 +69,7 @@ Windows, well, I don't know but surely somewhere sensible.
 [Pandoc]: https://pandoc.org
 [lb]: https://github.com/marhop/literate-binary
 [example]: examples/minimal.bmp.md
+[example-wav]: examples/minimal.wav.md
 [code blocks]: https://pandoc.org/MANUAL.html#verbatim-code-blocks
 [fenced code blocks]: https://pandoc.org/MANUAL.html#fenced-code-blocks
 [Stack]: https://docs.haskellstack.org/
diff --git a/examples/minimal.wav.md b/examples/minimal.wav.md
@@ -0,0 +1,201 @@
+# WAVE file example
+
+A WAVE file consists of a series of sections called chunks. The general
+structure of a chunk, the concept of building arbitrary files from chunks and
+the WAVE format itself as a specific instance of this concept are defined in the
+Resource Interchange File Format, or RIFF, specification. [Here is a PDF
+copy][RIFF]; the WAVE format description starts on page 56.
+
+Three chunks are necessary to build a WAVE file; the RIFF chunk, the format
+chunk and the data chunk. All chunks share the same basic structure:
+
+  1. The *chunk ID*, a four bytes ASCII string.
+  2. The *chunk size*, a four bytes number identifying the size of the chunk
+     data.
+  3. The *chunk data*, whose content is specific to the chunk type.
+
+Numeric fields are in little endian byte order.
+
+[RIFF]: http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/Docs/riffmci.pdf
+
+## RIFF chunk
+
+A chunk of this type is common to all RIFF based formats. Its chunk ID is
+"RIFF", and its chunk data consists of an ASCII string denoting the RIFF format
+instance ("WAVE" in this case), followed by the remaining chunks described
+below.
+
+    52494646 # chunk ID "RIFF"
+    24f40000 # chunk size
+    57415645 # chunk data "WAVE"
+
+Note that since the chunk data comprises all the rest of the file the chunk size
+equals file size minus 8.
+
+## Format chunk
+
+The format chunk is specific to WAVE files. Its chunk ID is "fmt " (note the
+trailing whitespace).
+
+    666d7420 # chunk ID "fmt "
+    10000000 # chunk size
+
+The chunk data is composed of several fields with a size of two or four bytes
+each that describe the format of the payload in the data chunk below.
+
+ 1. The WAVE format category that determines how the audio data in the data
+    chunk is encoded. Pulse code modulation (PCM) has format category 1.
+
+        0100
+
+ 2. The number of channels in the audio data. Two for stereo sound.
+
+        0200
+
+ 3. The sampling rate. 8000 (0x1f40) samples per second.
+
+        401f0000
+
+    Other common sampling rates include 44100 (CD quality), 48000, and 192000
+    samples per second.
+
+ 4. The average number of bytes per second. The RIFF specification defines this
+    as channels × bits per second × bytes per sample, but obviously it should be
+    samples per second (i.e., sampling rate) instead of bits per second. See
+    field 6 below for the number of bytes per sample.
+
+    2 × 8000 × 1 = 16000 = 0x3e80
+
+        803e0000
+
+ 5. The block alignment, that is, how many bytes have to be read for playback of
+    each sample: number of channels × bytes per sample.
+
+    2 × 1 = 2
+
+        0200
+
+ 6. The number of bits per sample, also known as bit depth. One byte for each
+    sample of audio data.
+
+        0800
+
+    Other bit depths like 16 bits (CD quality) are possible, but there is an
+    inconsistency in the RIFF specification: If one to eight bits are used this
+    value is interpreted as an unsigned integer, but with more than eight bits
+    it is interpreted as a signed integer ...
+
+The two most important pieces of information for interpretation of the data
+chunk are fields 3 and 6, sampling rate and bit depth. These are the central
+parameters of pulse code modulation.
+
+## Data chunk
+
+This chunk contains the actual audio data which is represented using pulse code
+modulation.
+
+Sound, from a physics perspective, is a vibration that can be visualized by a
+graph called waveform. (Josh Comeau has written a *really* great [tutorial about
+this][waveforms].) Consequently, digital representation of sound is a matter of
+encoding a waveform in binary. With [pulse code modulation][PCM], this requires
+two steps called [sampling] and [quantization]. First, the waveform is sampled
+at regular, uniform intervals, resulting in a list of values. The length of this
+list depends on the sampling rate; a sampling rate of 8000 samples per second
+obviously yields 8000 values per second of sound per channel. Second, the
+samples are mapped (or rounded, if you will) to the available range of numbers.
+This range is determined by the bit depth; with a bit depth of 8 bits per
+sample, the samples can be mapped to 2⁸ = 256 different numbers.
+
+The chunk data of a PCM WAVE data chunk consists of a simple series of quantized
+waveform samples. Samples for multiple channels are interleaved, so two lists of
+samples (c11, c12, c13, ...) and (c21, c22, c23, ...) representing two audio
+channels become (c11, c21, c12, c22, c13, c23, ...) in the chunk data. In this
+example, both stereo channels play the same sound so each pair of samples
+consists of the same two bytes. (Yes, that's like mono with doubled memory
+usage. It's just an example!)
+
+    64617461 # chunk ID "data"
+    00fa0000 # chunk size
+
+    # 80 stereo samples repeated 100 times = 8000 samples = 1 second at 100 Hz
+    (7f7f 8989 9393 9d9d a6a6 b0b0 b9b9 c1c1 caca d1d1 d9d9 e0e0 e6e6 ebeb f0f0
+    f4f4 f8f8 fafa fcfc fefe fefe fefe fcfc fafa f8f8 f4f4 f0f0 ebeb e6e6 e0e0
+    d9d9 d1d1 caca c1c1 b9b9 b0b0 a6a6 9d9d 9393 8989 7f7f 7575 6b6b 6161 5858
+    4e4e 4545 3d3d 3434 2d2d 2525 1e1e 1818 1313 0e0e 0a0a 0606 0404 0202 0000
+    0000 0000 0202 0404 0606 0a0a 0e0e 1313 1818 1e1e 2525 2d2d 3434 3d3d 4545
+    4e4e 5858 6161 6b6b 7575){100}
+
+    # 40 stereo samples repeated 200 times = 8000 samples = 1 second at 200 Hz
+    (7f7f 9393 a6a6 b9b9 caca d9d9 e6e6 f0f0 f8f8 fcfc fefe fcfc f8f8 f0f0 e6e6
+    d9d9 caca b9b9 a6a6 9393 7f7f 6b6b 5858 4545 3434 2525 1818 0e0e 0606 0202
+    0000 0202 0606 0e0e 1818 2525 3434 4545 5858 6b6b){200}
+
+    # 20 stereo samples repeated 400 times = 8000 samples = 1 second at 400 Hz
+    (7f7f a6a6 caca e6e6 f8f8 fefe f8f8 e6e6 caca a6a6 7f7f 5858 3434 1818 0606
+    0000 0606 1818 3434 5858){400}
+
+    # 10 stereo samples repeated 800 times = 8000 samples = 1 second at 800 Hz
+    (7f7f caca f8f8 f8f8 caca 7f7f 3434 0606 0606 3434){800}
+
+Of course, real-world waveform samples are not generated manually but by systems
+called [ADC]. But for the sake of example, this file contains four genuine
+handmade beeps. The remaining paragraphs explain how the samples in the
+preceding code block were obtained.
+
+In one of its simplest forms a waveform visualizing sound takes the shape of the
+sine function which is periodic, meaning it repeats its values in regular
+intervals and thus can be repeated over and over to produce a constant tone. The
+pitch of this tone is determined by the number of cycles of the sine function in
+a given unit of time; faster repetitions produce a higher pitch. The number of
+repetitions per second, the frequency, is measured in Hertz (Hz), so an 800 Hz
+tone has a higher pitch than a 200 Hz tone. (Do not confuse this frequency with
+the sampling rate which is also given in Hertz sometimes!)
+
+Consider the tone that corresponds to a 200 Hz sine waveform. The waveform has
+200 cycles per second, so using a sampling rate of 8000 samples per second one
+second of this tone would be represented by 200 repetitions of a pattern
+consisting of 8000/200 = 40 samples. (Contrast this with a 400 Hz sine waveform
+that needs 400 repetitions of 8000/400 = 20 samples.) Since the sine function is
+periodic with period 2π the 40 samples representing one cylce are sin(0),
+sin((1/40) × 2π), sin((2/40) × 2π), ..., sin((39/40) × 2π). Quantizing these
+values to a bit depth of 8 bits yields the bytes in the code block above.
+
+Here are some values for the 200 Hz (40 samples) example, where sample(x) =
+sin((x/40) × 2π) and quantized(y) = 127 + round(127 × y):
+
+x  | sample              | quantized | hex
+---|---------------------|-----------|----
+0  | 0.0                 | 127       | 7f
+1  | 0.15643446504023087 | 147       | 93
+2  | 0.3090169943749474  | 166       | a6
+   | ...                 | ...       | ...
+39 | -0.1564344650402311 | 107       | 6b
+
+And finally, here is the Haskell program that was used to generate the hex
+patterns in the code block above (something equivalent could be done in any
+other programming language, of course):
+
+~~~ {.nobin}
+import Text.Printf
+
+main = putStr $ patterns [80, 40, 20, 10]
+
+patterns = unlines . map (unwords . map fmt . samples)
+
+-- | Create x samples of one cycle of the sine function, quantized to integer
+-- values in the 8 bit range (0..256).
+samples x = map (quant . sample . (/ x)) [0 .. (x - 1)]
+  where
+    sample x = sin $ x * 2 * pi
+    quant x = 127 + round (127 * x)
+
+-- | Format number as hex string. Twice, because redundant stereo.
+fmt :: Integer -> String
+fmt x = printf "%02x%02x" x x
+~~~
+
+[waveforms]: https://pudding.cool/2018/02/waveforms/
+[PCM]: https://en.wikipedia.org/wiki/Pulse-code_modulation
+[sampling]: https://en.wikipedia.org/wiki/Sampling_(signal_processing)
+[quantization]: https://en.wikipedia.org/wiki/Quantization_(signal_processing)
+[ADC]: https://en.wikipedia.org/wiki/Analog-to-digital_converter