Coding for Magnetic Storage Mediums

Beneath a hard disk drive’s hermetically sealed housing, which protects the platters, heads and electronics, are some sophisticated components that we have all grown accustomed to relying upon. Beyond the constantly moving actuator arms and the drive surfaces spinning at speeds as high as 15,000 RPM lies a set of interfaces that must perform without interruption for thousands of hours on end.

The electro-magneto mechanics of a disk storage system continually resolve minutely fine bits of information that reside on a magnetic coating deposited onto a revolving platter. This thin coating is virtually organized as a concentric set of circles called data tracks spreading from the near center of the platter to almost the outer edge. Each set of these data tracks is further divided linearly into data sectors that in turn store a set of fixed-sized data blocks.


Floating above the surface of the spinning platter, at the end of a stepping actuator arm, is the read/write head assembly. It is generally composed of a circuit and an element, typically an inductor, that polarizes the magnetic coating by generating negative and positive fields. Tiny impulses of current that indirectly represent the binary ones and zeros of the data stream are sent from the storage medium’s cache memory to tiny coils that generate these fields.

The data sectors are constructed according to a set of defined mathematical constructs created by the algorithms of the disk drives’ core components. The data sectors are further grouped into data blocks that match operating system software parameters.

For the data read activity, the read/write head’s functions are reversed. The read element, typically a magneto-resistive element, senses the magnetic field stored in each data sector and “reads” the variations in that field. The recovered impulse changes are passed to a short-term cache, where they are transformed into data strings that ideally match the data originally stored to those sectors.

Data on a disk drive is represented only by the changing of a magnetic field, from positive (+) to negative (-) polarization and vice versa. A change in polarization is then converted into a change in a pulse’s direction, i.e., from a high to a low state or from a low to a high state. The disk drive data itself carries no information about what that pulse shape really looks like, only whether it changed state or didn’t change state.


As data is moved from a cache to the drive itself, it is coded to prevent errors that are typically found in digital transmission systems. In order to mitigate mathematically repetitive sequences that can cause problems when stored onto a magnetic medium, the data streams received from the cache memory are generally coded using Run Length Limited (RLL) methodologies. RLL coding eliminates data sequences that can cause problems with the timing systems and aids in the recovery of data in magnetic storage systems.

As in any digital sequence, the orderly placement of the data is controlled by a timing sequence that arranges square wave like transitions, i.e., low to high and high to low changes, into a series of formats that can be recovered by either an external timing clock or through an asynchronous signal generated by the data string counts themselves.

Familiar forms of line coding for communications signals are return-to-zero (RZ), non-return-to-zero (NRZ) and non-return-to-zero-inverted (NRZI). Line coding methodologies of this kind are employed in common broadcast system digital signal transports, such as SMPTE 259 (SDI), SMPTE 310 (a synchronous serial interface) and DVB-ASI (the asynchronous serial interface) as well as many other digital data transmission systems.

An example of a line code with a self-clocking signal is RZ coding, whereby the signal returns to zero (the falling side of the digital pulse stream) between each pulse. The return to zero takes place even if a consecutive set of zeros or ones occurs in the signal stream. A self-clocking signal therefore does not require an external reference clock for decoding purposes. However, it requires twice as much bandwidth for the signal to survive the encode/decode process, and it will contain a DC component, which can induce baseline wander during long sets of continuous ones and zeros. The finer details of each of these methods will be reserved for a future installment.
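As a rough illustration of why RZ needs about twice the bandwidth, the sketch below encodes each data bit as a pulse followed by a return to the zero level. It assumes a bipolar variant in which a 1 is sent as (+1, 0) and a 0 as (-1, 0); the article does not say which RZ variant is meant, and the function name `rz_encode` is hypothetical.

```python
def rz_encode(bits):
    """Bipolar return-to-zero sketch: each bit becomes a pulse, then a zero level.

    Assumption for illustration only: 1 -> (+1, 0) and 0 -> (-1, 0).
    """
    levels = []
    for bit in bits:
        if bit not in ("0", "1"):
            raise ValueError(f"not a bit: {bit!r}")
        levels.append(1 if bit == "1" else -1)  # the pulse itself
        levels.append(0)                        # the return to zero
    return levels


data = "1100"
signal = rz_encode(data)
print(signal)       # [1, 0, 1, 0, -1, 0, -1, 0]
print(len(signal))  # 8: two signal intervals for every data bit
```

Because every bit carries its own transition back to zero, a receiver can recover timing even from "1111" or "0000", at the cost of two signal intervals per bit.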


To combat these issues, RLL coding places constraints on the number of consecutive ones and/or zeros that can occur in the data stream. The code rate is the parameter used to measure the efficiency of the RLL code. The coding prescribes that for every m bits (or bytes) of data, an n-bit (or n-byte) encoded word is written to the string and added onto the storage media, giving a code rate of m/n. This principle effectively eliminates any undesirable bit patterns in the data. (See Fig. 1.)

Fig. 1: Run length encoding

In RLL, the run-length describes the number of like bits for which a signal doesn’t change. A run-length of four for bit-1 represents a sequence of “1111.” The run-lengths in the string “001110111000001” are of length 2, 3, 1, 3, 5 and 1. RLL sequences are characterized by two parameters, (d) and (k), which stipulate the minimum and maximum run-length that is allowed to occur in the sequence. RLL codes are as a rule specified as (d,k) RLL; for example (1,3) MFM RLL, where MFM means modified frequency modulation.
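The run-length count in the example string can be reproduced with a few lines of Python. This is a minimal sketch using the definition given in the paragraph above (runs of like bits); the helper names `run_lengths` and `within_limits` are hypothetical.

```python
from itertools import groupby


def run_lengths(bits):
    """Return the length of each run of identical symbols, in order."""
    return [len(list(group)) for _, group in groupby(bits)]


def within_limits(bits, d, k):
    """True if every run length lies between d and k inclusive."""
    return all(d <= n <= k for n in run_lengths(bits))


sample = "001110111000001"
print(run_lengths(sample))          # [2, 3, 1, 3, 5, 1]
print(within_limits(sample, 1, 3))  # False: the run of five zeros exceeds k = 3
print(within_limits(sample, 1, 5))  # True
```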

Still, nothing is for free. While RLL coding mitigates the problem of troublesome bit sequences, the extra channel bits it writes reduce storage capacity by increasing the channel bit density, and they also reduce the signal-to-noise ratio. These two effects impact the reliability of the data over the long run and must be balanced accordingly.


Recall that disk drives store information by changing the polarization of the magnetic particles on the drive’s surface. This makes for a somewhat different situation from the more familiar digital bit stream examples described in serial digital transport streams (e.g., SDI). Disk drive data sets differ because of the continuously changing fields themselves.

On a disk drive, changes in polarization are considered to be a “1-bit.” If there is no change for a time, the data is considered to be a “0-bit.” The actual signal on the disk might be “+----++---++++++” but this is usually read as “0100010100100000”—thus RLL coding for a disk is really about the upper and lower limits on how many zeros there must be between consecutive ones.
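The mapping from polarity to bits in the example above can be sketched as follows. The code assumes one character per bit cell and treats the first cell, which has no predecessor, as a 0; that assumption matches the example string but is not stated in the text. The function names are hypothetical.

```python
def transitions_to_bits(polarity):
    """Map a string of '+'/'-' cells to bits: 1 = polarity changed, 0 = no change.

    Assumption: the first cell has nothing before it and is read as 0.
    """
    if not polarity:
        return ""
    bits = ["0"]
    for prev, cur in zip(polarity, polarity[1:]):
        bits.append("1" if cur != prev else "0")
    return "".join(bits)


def zero_runs_between_ones(bits):
    """Count the zeros between each pair of consecutive ones."""
    gaps = bits.strip("0").split("1")[1:-1]
    return [len(gap) for gap in gaps]


bits = transitions_to_bits("+----++---++++++")
print(bits)                          # 0100010100100000
print(zero_runs_between_ones(bits))  # [3, 1, 2]
```

Under this view, a (d,k) constraint simply requires every number in the second list to fall between d and k.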

Beyond the deeper details of how RLL works, there are other coding structures, such as group code recording and modified frequency modulation. Related subjects include error correcting codes, line codes, self-synchronizing codes and bit synchronization, all of which are employed in both disk drive systems and digitally encoded transmission systems.
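To make the code rate and the (1,3) constraint concrete, here is a minimal sketch of the commonly described MFM rule: each data bit is written as a clock bit followed by the data bit, and the clock bit is 1 only when both the previous and the current data bits are 0. The function names and the choice of 0 as the bit assumed to precede the stream are illustrative assumptions, not details taken from any particular drive.

```python
def mfm_encode(data_bits, prev="0"):
    """Encode data bits as MFM channel bits (clock bit + data bit per input bit).

    Assumption: the bit preceding the stream is taken to be `prev` (default "0").
    """
    out = []
    for bit in data_bits:
        clock = "1" if (prev == "0" and bit == "0") else "0"
        out.append(clock + bit)
        prev = bit
    return "".join(out)


def zero_gaps(bits):
    """Count the zeros between each pair of consecutive ones."""
    return [len(gap) for gap in bits.strip("0").split("1")[1:-1]]


data = "10110001"
channel = mfm_encode(data)
print(channel)                    # 0100010100101001
print(zero_gaps(channel))         # [3, 1, 2, 1, 2]
print(len(data) / len(channel))   # 0.5, i.e. a code rate of 1/2
```

Every gap falls between one and three zeros, which is the (1,3) constraint, and the price is two channel bits on the platter for every data bit.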

Suffice it to say that each manufacturer and software writer may use some or all of these methods, or even their own proprietary means, for their products, creating an even more mysterious set of questions for inquiring minds.