Ken Shirriff's blog

In this article, I reverse-engineer the priority encoder in the ARM1 processor. By examining the chip layout provided by the Visual ARM1 project, I have determined how this circuit works and created a schematic.

The ARM1 chip is the ancestor of the extremely popular ARM processors used in most smart phones. The ARM1 is a good choice for reverse engineering since it was designed in 1985 and its simple RISC silicon circuits are easier to understand than modern processors. This article jumps into the chip details; if you want an overview of the ARM1 internals, start with my first article on reverse engineering the ARM1.

The priority encoder takes a 16-bit binary field, finds the bits that are set and outputs the 4-bit binary positions of these bits in sequence. For example, if the input field is 1000000000001011, successive outputs will be 0, 1, 3, and 15. (Bits are scanned starting with bit 0, the rightmost bit.) The priority encoder gets its name because it selects bits by priority (rightmost first) and encodes the result into binary.

The diagram below shows the layout of the priority encoder on the chip. It is implemented as 16 bit slices, one for each bit, arranged left to right ("backwards"). Slice 2 is highlighted in red; slices 5 through 13 have been cut out to make the image fit. The 16 input bits arrive through the data bus on the bottom and each bit enters a slice through one of the bit input lines (green). If the bit is currently the highest priority, the output encoder at the top of the slice generates the 4-bit binary value on the output bus. The pullups pull the output bus lines to the high state. Finally, the drivers amplify the output signals and send them to other parts of the chip.[1]

The priority encoder circuit in the ARM1 consists of 16 slices, one for each bit. One slice is highlighted in red. Slices 5 to 13 are omitted.

The priority encoder is a key part of the ARM processor's block data transfer instructions, which efficiently copy data between on-chip registers and memory storage.[2] These instructions can transfer any subset of ARM's 16 registers in a single instruction. The desired registers are specified by setting the corresponding bits in a 16-bit field in the instruction. The role of the priority encoder is to scan this field and determine which register to transfer during each step of the operation.

Implementation of the priority encoder

The schematic below shows one of the 16 slices in the priority encoder. The input bit from the bus, bus_bit enters at the bottom. The green bit select block determines if the bit is currently the high-priority bit. If so, bit_selected becomes 1. The output encoder (blue) puts the binary value associated with the selected bit onto the bus. Finally, the bit used latch (red) marks the bit as used, blocking it and allowing the next bit in sequence to be active in the next cycle. The two-phase clock signals Φ1 and Φ2 cause the priority encoder to move from bit to bit.

Schematic of the priority encoder in the ARM1 processor, showing one slice.

The bit selection logic (green) is fairly straightforward. The input clear_to_left is 1 if all the bits to the left are clear. If all the bits to the left are clear and the current bit is set, then this bit is selected by the priority encoder. This also blocks clear_to_left from being passed to the next slice. Otherwise, clear_to_left is passed along. Thus, as it passes through the circuit clear_to_left will be 1 until a bit is encountered, and then 0 from that point. If the final clear_to_left output is 1, then all bits are clear and encoding is done. The logic for clear_to_right is similar, allowing the highest-priority bit to be selected from the right instead. Normally the initial clear_to_left input is 0, and the initial clear_to_right bit is 1, enabling the left scan and disabling the right scan.

The bit used latch (red) keeps track of which bits have already been output. It is what allows the priority encoder to move from bit to bit each clock cycle. The two transmission gates (indicated with the four-triangle symbol) are clocked alternately so the bit_selected signal will move through the circuit after two half-clocks. Two NAND gates are connected as an SR latch to store this signal. Once a slice has selected a bit, the latch remembers that the bit has been used and blocks bus_bit from flowing into the bit select circuit. This allows the next bit in sequence to be selected. The bit used circuit also has a clear signal that resets the latch for a new instruction.

The bus pullup circuit (purple) and the output encoder (blue) work together to output the binary value corresponding to the selected bit. They use dynamic logic rather than standard gates to reduce the circuit size. This logic depends on the the clock and the capacitance of the output bus lines to generate the right values. In phase 2 of the clock, the bus pullup transistors pull the output bus lines high. Then, in phase 1, the output encoder in the active slice pulls the appropriate lines low so the bus will have the correct value. The schematic above shows the encoder for slice 6: the transistors attached to lines 8 and 1 pull them low, leaving 4 and 2 high; the resulting binary 0110 is 6. One set of pullup transistors supports the whole priority encoder, while each slice has its own output encoder transistors.

The output bus lines pass through drivers to boost the current; the signal on the output bus is relatively weak since it is generated by dynamic logic. The output flows to the register select circuit to select the appropriate register for the data transfer. See Dave Mugridge's article on ARM1 register selection for details on how registers are selected.

Discussion

The block data move transfer instructions in the ARM1 require two special functional units: the priority encoder and the bit counter (which I reverse-engineered earlier). These two circuits are highlighted in red in the ARM1 die photo below. Supporting block data transfers added significant complexity to the chip (about 3% by area), but the chip designers felt the performance gain from block transfers was worth it.

The ARM1 processor chip with major functional groups labeled. The bit counter and priority encoder used for the LDM/STM instructions are highlighted in red. These take up about 3% of the chip's area.ARM1 die photo courtesy of Computer History Museum.

One interesting thing about the priority encoder's design is alternating slices have inverted logic: NAND gates become NOR gates and vice versa. The reason is to avoid inverters between stages. You'll note on the schematic that the clear_to_right and clear_to_left outputs are inverted. The obvious design would add inverters to fix the polarity. However, this would add an extra gate delay in each stage, which is significant when the signal has to ripple through 16 stages. By "flipping" alternate stages, this delay is avoided. The trick of alternating stages to avoid inverters is used in other chips. For example, the 8085's incrementer and the 6502's ALU.

One surprise with the ARM1 priority encoder is it supports both low-to-high priority and high-to-low priority, but high-to-low priority is disabled and not used. That is, the rightmost clear_to_right is wired to 1, so the rightmost bit circuitry will never be active. The explanation for this unused circuitry is interesting.

When using the block data operations to push and pull registers on the stack, you'd expect to push R0, R1, R2, etc and then pop in the reverse order R2, R1, R0.[3] To handle this, the priority encoder needs to provide the registers in either order, and the address incrementer needs to increment or decrement addresses depending on whether you're pushing or popping, and the chip includes this circuitry. However, there's a flaw that wasn't discovered until midway through the design of the ARM1. Register 15 (the program counter) must always be updated last, or else you can't recover from a fault during the instruction because you've lost the address.[4]

The solution used in the ARM1 is to always read or write registers starting with the lowest register and the lowest address. In other words, to pop R2, R1, R0, the ARM1 jumps into the middle of the stack and pops R0, R1, R2 in the reverse order. It sounds crazy but it works. (The bit counter determines how many words to shift the starting position.) The consequence of this redesign was that the circuitry to decrement addresses and priority encode in reverse order is never used. This circuity was removed from the ARM2.

Conclusion

The priority encoder is a large functional unit in the ARM1 chip, used for the block data transfer instructions. By looking at one of the 16 slices in the encoder, the circuit can be reverse-engineered and understood. While largely built from standard logic gates, the circuit also uses transmission gates and dynamic logic for efficiency. One surprise is the priority encoder contains unused logic allowing it to work in either direction. This wasted circuitry is left over from a design change during the development of the ARM1.

Now that you've seen the internals of the priority encoder, you can use the Visual ARM1 simulator to see the circuit in action.[5]

Notes and references

[1] The drivers also invert and buffer clock signals that are used by the priority encoder.

[2] ARM's block data transfer instructions are called STM (Store Multiple) and LDM (Load Multiple), storing and loading multiple registers with one instruction. These instructions can be used for copying data or for stack push/pop, saving registers in a subroutine call or interrupt handler. Note that these instructions are not implemented in microcode, but in hardware that steps through the registers and memory. These instruction are explained in detail on the ARMwiki.

[3] The block data transfer instructions work for general register copying, not just pushing and popping to a stack. It's simpler to explain the instructions in terms of a stack, though.

[4] If an instruction encounters a memory fault (e.g. a virtual memory page is missing), you want to take an interrupt, fix the problem (e.g. load in the page), and then restart the instruction. However, if you update registers high-to-low, R15 (the program counter) will be updated first. If a fault happens during the instruction, the address of the instruction (R15) is lost, and restarting the instruction is a problem.

One solution would be to push registers high-to-low and pop low-to-high so R15 is always updated last. Apparently the ARM designers wanted the low register at the low address, even if the stack grows upwards, so popping R15 least wouldn't work. Another alternative is to have a "shadow" program counter that can restore the program counter during a fault. The ARM1 designers considered this alternative too complex. For details, see page 248 of "VLSI RISC Architecture and Organization", by Stephen Furber, one of the ARM1's designers.

[5] Thanks to the Visual 6502 team for providing the simulator and ARM1 chip layout data. If you're interested in ARM1 internals, also see Dave Mugridge's series of posts.

By carefully examining the layout of the ARM1 processor, it can be reverse engineered. This article describes the interesting circuit used for conditional instructions: this circuit is marked in red on the die photo below. Unlike most processors, the ARM executes every instruction conditionally. Each instruction specifies a condition and is only executed if the condition is satisfied. For every instruction, the condition circuit reads the condition from the instruction register (blue), evaluates the condition flags (purple), and informs the control logic (yellow) if the instruction should be executed or skipped.

The ARM1 processor chip showing the condition evaluation circuit (red) and the main components it interacts with. Original photo courtesy of Computer History Museum.

Why care about the ARM1 chip? It is the highly-influential ancestor of the extremely popular ARM processor. The ARM1 processor got off to a slow start in 1985 but now ARM processors are now sold by the tens of billions; your smart phone probably runs on ARM. This article is part of my series on reverse engineering the ARM1; start with my first article for an overview of the chip.

What are conditional instructions?

A key part of any computer is the ability of a program to change what it is doing based on various conditions. Most computers provide conditional branch instructions, which cause execution to jump to a different part of the program based on various condition flags. For example, consider the code if (x == 0) { do_something }. Compiled to assembly code, this first tests the value of variable x and sets the Zero flag if x is 0. Next, a conditional branch instruction jump over the do_something code if the Zero flag is not set.

The ARM processor takes conditionals much further than other processors: every instruction becomes a conditional instruction. Every instruction includes one of 16 conditions and the instruction is only executed if the condition is true; otherwise the instruction is skipped. (This is also known as predication.) The motivation is to avoid inefficient jumping around in the code.

The ARM manual excerpt below shows how four bits in each 32-bit instruction specify one of 16 conditions. Most of the conditions are straightforward, checking if values are equal, negative, higher, and so forth. Most instructions will use the "always" condition, which simply means the instruction always executes. The opposite "never" condition is not highly useful - an instruction with that condition never executes - but it can be used for a NOP, patching code, or adjusting timing of an instruction sequence.

Every instruction in the ARM processor has one of 16 conditions specified. The instruction is executed only if the condition is satisfied.

Studying the different conditions reveals much of how the condition circuit works. It is based on four condition flags. The zero (Z) flag is set if a value is zero. The negative (N) flag is set if a value is negative. The carry (C) flag is set if there is a carry or borrow from addition or subtraction. The overflow (V) flag is set if there is an overflow during signed arithmetic (details).

The top three bits of the instruction select one of eight conditions, as highlighted in yellow. The fourth bit selects the condition or its opposite (blue). If the fourth bit is 0, the condition must be true; if the fourth bit is 1, the condition must be false.

Implementation of the circuit

The implementation of the conditional logic circuit matches the above description. First, the eight conditions are generated from the four flags. One of the conditions is selected based on the three instruction bits. If the fourth instruction bit is set, the condition is flipped. The result is 1 if the condition is satisfied, and 0 if the condition is not satisfied. One unexpected part of the circuit is that an undefined instruction or and interrupt causes the condition to be cleared, preventing execution of the instruction. The resulting condition signal output is connected to a control part of the chip, where it causes the instruction to be executed or not, as desired.

The condition code evaluation circuit from the ARM1 processor.

The diagram above shows the condition code circuit of the chip as it appears in the simulator; this is a zoomed-in version of the red rectangle indicated on the die earlier. The chip consists of multiple layers, indicated by different colors. Transistors appear as red or blue regions. NMOS transistors are red; they turn on with a 1 input and can pull their output low. PMOS transistors (blue) are complementary; they turn on with a 0 input and can pull their output high. Physically above the transistors is the polysilicon wiring layer (green). When polysilicon crosses a transistor it forms the gate (yellow) that controls the transistor. Finally, two layers of metal wiring (gray) are above the polysilicon.

The circuit is arranged in columns. The first column of transistors forms the logic gates to generate the conditions from the flag values. The next column is the multiplexer, a circuit that takes the eight input conditions and selects one. The rightmost column contains 8 NAND gates that decode the three instruction bits into 8 control lines. Each line is fed into the multiplexer to select the corresponding condition. At the right is the wiring for the 3 instruction bits and their complements. A few miscellaneous gates are at the bottom of the multiplexer and decoder columns. These include inverters to complement the instruction bits.

The condition generation gates

The diagram below zooms in on the left third of the circuit above. This part of the circuit uses standard CMOS logic gates to computes the conditions from the flags. Each gate is built from NMOS (red) and PMOS (blue) transistors in a horizontal strip. Comparing the text description of conditions from the manual with the logic shows how the conditions are generated. For instance, the HI (unsigned higher) condition requires flags "C set and Z clear". The top three gates generate this condition. The GE (greater than or equal) condition is more complex, requiring flags "N set and V set, or N clear and V clear". The next two gates compute this value. (Due to the way CMOS gates are constructed, an OR-NAND gate is constructed as a single gate.) Likewise, the other conditions are generated. The AL (always) condition is simply a 1, and doesn't require any circuitry. The conditions are fed into the multiplexer, which will be discussed below.

The output coming back from the multiplexer is the selected condition, labeled "cond" below. The NAND and OR-NAND gates flip the condition if instruction register bit 28 (ireg28) is set. This implements the eight opposite conditions. The result is labeled "ok", indicating the overall condition is satisfied. The final three gates block instruction execution for an interrupt or undefined instruction.

Gates in the ARM1 processor generate the various conditionals from the flag values.

One thing I'd like to emphasize about the ARM1 is that its layout is very orderly and non-optimized. While it may appear chaotic, the gates are arranged by combining relatively fixed blocks ("standard cells") and wiring them together. Each gate forms a strip and the gates are stacked together in columns. The polysilicon and metal layers connect the gates as necessary.

The layout of the ARM1 chip is a consequence of the VLSI Technology chip design software used to create it. The resulting layout is simple, but doesn't use space very efficiently. Since the ARM1 uses very few transistors for its time, the designers weren't worried about optimizing the layout. In contrast, earlier chips such as the Z-80 were hand-drawn, with each transistor and wire carefully shaped to use the minimum space possible. The diagram below shows a small part of the Z-80 processor layout, showing the extremely irregular but dense arrangement of the chip. The transistors are not arranged in rows as in the ARM1 above, but fit together to use all the available space.

A detail of the Z-80 processor layout, showing the complex hand-drawn layout. Each transistor and wire is carefully shaped to minimize the chip's size.

The multiplexer and decoders

Selecting the desired condition out of the eight possibilities is the job of a circuit called the multiplexer. The multiplexer takes 8 inputs (the conditions) and 8 control signals (based on the instruction) and selects the desired condition. To the right of the multiplexer, 8 NAND gates generate the 8 control signals by decoding the three instruction bits. Each gate simply looks at three bit values and outputs a 0 if the bits select that condition. For instance, if the first two bits are 0 and the third is 1, the gate for condition 1 outputs a 0, selecting that condition in the multiplexer. The animation below shows the circuit as the instruction bits cycle through the eight conditions. You can see the activated condition moving downwards through the circuit.

Animation of the multiplexer in the ARM1 condition code evaluation circuit.

While a multiplexer can be built from standard logic gates, the ARM1 multiplexer is built from a different type of circuitry called transmission gates (which the ARM1 also uses in its bit counter). A multiplexer built from transmission gates is more compact and faster than one built from standard logic (NAND gates). One feature of CMOS is that by combining an NMOS transistor and a PMOS transistor in parallel, a transmission gate switch can be built. Feeding 1 into the NMOS gate and 0 into the PMOS gate turns on both transistors and they pass their input through. With the opposite gate values, both transistors turn off and the switch opens. The multiplexer is built from 8 of these CMOS switches. Each condition input feeds into one switch, and the switch outputs are connected together. One switch is turned on at a time, selecting the corresponding input as the output value.

The diagram below shows the schematic of the multiplexer as well as its physical layout on the chip. Only the first three segments of the eight are shown; the remainder are similar. Each input is connected to two transistors forming a CMOS switch. Because the NMOS and PMOS gates require opposite signals, the multiplexer has an inverter for each control signal. Each inverter also consists of two transistors, but wired differently from the switch.

Schematic and diagram of the multiplexer inside the ARM1 processor's condition code evaluation circuit.

Working together the decode circuit, inverters, and CMOS switches form the multiplexer that selects the desired condition from the eight choices. The logic described earlier allows this condition to be flipped, for a total of 16 possible conditions.

Conclusion

One unusual feature of the ARM instruction set is that every instruction has a condition associated with it and is only executed if the condition is true. The ARM1 chip is simple enough that the condition circuitry on the chip can be examined and understood at the transistor and gate level. Now that you've seen the internals of the condition logic, you can use the Visual ARM1 simulator to see the circuit in action. While the ARM1 may seem like a historical artifact of the 1980s, ARM processors power most smartphones, so there's probably a similar circuit controlling your phone right now.

Thanks to the Visual 6502 team for providing the simulator and ARM1 chip layout data. If you're interested in ARM1 internals, see my full set of ARM posts and Dave Mugridge's series of posts.

This article reverse-engineers the flag circuits in the ARM1 processor, explaining in detail how the flags are generate, controlled, and used. Condition flags are a key part of most computers, since they allow the computer to change what it does based on various conditions. The flags keep track of conditions such as a value being negative or zero or an overflow happening. Processors may also have status flags to control modes such as running in user mode versus protected (kernel) execution. The ARM1 processor stores these flags in a special register called the Processor Status Register (PSR).[1]

The ARM1 chip is interesting to examine not only because it is simple enough to understand but also because it was the first ARM processor. There are now tens of billions of ARM processors in use, probably powering your smartphone right now. This article is part of my series on reverse-engineering the ARM1. Processor flags seem like they should be trivial, but there's a lot more involved than you might expect. You might want to start with my first article for an overview of the chip.

The die photo below shows the ARM1 chip. This article concentrates on the flag logic, highlighted in red. As you can see, flags take up a significant part of the chip. The flags interact with many other parts of the chip: the trap control logic handles interrupts and exceptions; the register control logic handles access to the chip's registers including the program counter (PC); when the Arithmetic-Logic Unit (ALU) performs computations it stores status in the flags; the Barrel Shifter shifts or rotates values, sending shifted bits to the flags; and the Instruction Register holds instructions as they are read from memory and feeds them to the decode logic to be interpreted. In the upper left, the M0 and M1 pins indicate the mode bits stored in the flags. The article will describe how all these components interact with the flags.

The flag circuitry (red) in the ARM1 processor interacts with many other components of the chip. Original photo courtesy of Computer History Museum.

Some ARM1 background

This section summarizes a few features of the ARM1 processor that are important for understanding the flags. The ARM1 is a 32-bit processor with 16 32-bit registers called R0 through R15 (and some extra registers that will be described later). The processor has a 26-bit address space.

One unusual feature of the ARM1 processor is it combines the flag bits in the processor status register (PSR) and the program counter (PC) into a single register, R15, the PC/PSR. Because of the 26-bit address space, the top 6 bits of the 32-bit PC register are unused. In addition, instructions are always aligned on a 32-bit boundary, so the bottom two PC bits are always 0. These eight unused PC bits were instead used for flags, as shown in the diagram below.[2]

The Processor Status Register in the ARM1 processor is combined with the program counter. From page 2-26 of the ARM databook.

Four condition flags hold the status of arithmetic operations or comparisons. The negative (N) flag indicates a negative result. The zero (Z) flag indicates a zero result. The carry (C) flag indicates a carry from an unsigned value that doesn't fit in 32 bits. The overflow (V) flag indicates an overflow from a signed value that doesn't fit in 32 bits. The next two bits are used to enable or disable interrupts: the I flag controls regular interrupts, while the F flag controls the chip's special fast interrupts. The bottom two bits (M1 and M0) control the processor's execution mode: user, supervisor (kernel), interrupt handler, or fast interrupt handler. These modes will be discussed in more detail later.

Two instruction classes that are important to flags are the data processing instructions and the block data transfer instructions. Since the ARM has a simple, orthogonal instruction set, these operations can operate on the R15 with the flags as easily as any of the other registers.

The data processing instructions are the arithmetic-logic instructions. There are 16 types of data processing operations, such as addition, subtraction, Boolean operations such as AND, and comparison. Unlike most processors, the ARM makes updates of the condition flags optional. The instruction includes a bit called the "S" bit. If the S bit is set, the instruction updates the condition flags; otherwise the flags remain unchanged. The data processing instructions can also act on R15 directly, causing the flags to be read or modified.

The ARM also provides block data transfer instructions: LDM (load multiple) and STM (store multiple). These instructions load a selected set of registers from memory or store them to memory, for example popping registers from the stack or pushing them to the stack. These instructions can also use R15, accessing or modifying the flags.

Floorplan of the ARM1 chip, from ARM Evaluation System manual. (Bus labels are corrected from original.)

While the program counter (PC) and flags are architecturally part of the same register R15, they are physically separated on the chip, as you can see from the die photo and the floorplan diagram above. The flags are labeled PSR, above the ALU, while the PC is on the left of the register file. Interestingly, the original sketch for the ARM1 (below) show the PSR flags right next to the PC. While the final chip architecture largely matched the sketch, some components moved. In particular, several functional units were moved to the top of the chip, above the instruction bus (orange).

Original sketch of the ARM1 chip layout. Note the Processor Status Register (PSR) is on the left; the final chip put it above the ALU. Photo courtesy of Ed Spittles.

The flag circuitry

The diagram below shows the flag circuit of the chip as it appears in the simulator; this is a zoomed-in version of the red rectangle indicated on the die earlier.

The chip consists of multiple layers, indicated by different colors below. Transistors appear as red or blue regions. NMOS transistors are red; they turn on with a 1 input and can pull their output low. PMOS transistors (blue) are complementary; they turn on with a 0 input and can pull their output high. Physically above the transistors is the polysilicon wiring layer (green). When polysilicon crosses a transistor it forms the gate (yellow) that controls the transistor. Finally, two layers of metal wiring (gray) are above the polysilicon.

The flag circuit in the ARM1 processor. The eight flags are at the bottom, with control circuitry above.

The flag circuit above has been partitioned into several components. At the bottom are the circuits to store the eight flags. In the upper left, the flag control circuitry generates signals that control flag use and updates. The mode control circuit in the upper right generates the signals to update the mode bits M0 and M1. Finally, the register control circuit uses the mode bits to select a register bank. At the bottom is the wiring that connects the B bus, ALU bus, and flag inputs to the flag circuits.

The remainder of this article will start by discussing a single flag, the N flag at the bottom. Next it will describe the condition flags (V, C, Z and N) in more detail, along with how the flag control circuit (schematic) creates the control signals. This will be followed by an explanation of the mode flags (M0, M1) and the interrupt flags (F, I) and their control signals. The article ends with a discussion of the register bank select circuit.

The circuit to store a flag

This section discusses how the negative (N) flag works. The other flags operate similarly, but with some differences, and will be discussed in later section. The schematic below shows the circuit for the negative flag; this flag is at the bottom of the chip layout above. If you're expecting flags to be stored in a flip flop or regular latch, this circuit may seem unusual. Flags are stored in a dynamic two-phase flip-flop, which uses stray capacitance to store the value. The basic idea is the value goes around in a loop, amplified by the four inverters, and controlled by the clock. The trapezoids in the schematic are pass-transistor multiplexers[3] Each multiplexer has two inputs and two control lines; if a control line is active, the corresponding input is connected to the output.

Circuit for one flag (N) in the ARM1. The flag is stored in a two-phase dynamic latch. Two multiplexers (trapezoids) select values to store in the flag.

The storage loop consists of two parts, alternately connected by the clock. During the first clock phase, Φ1, the multiplexer on the left is inactivated by its inputs and generates no output. It holds its previous output due to stray capacitance at the point marked "hold during Φ1". The signal goes around the loop, through the Φ1 transistor on the right, and up to the input of the multiplexer. When the clock switches to Φ2, the multiplexer becomes active again, and the transistor on the right switches off. Now, the signal to the left of the transistor is held by the capacitance and flows around the loop until it reaches the transistor and is blocked. Thus, during each clock phase, half the loop is stable and half the loop can be updated. Alternatively, you can consider each half a simple latch and the two parts form a master-slave latch.

The main use of the condition flags is for conditional instructions — executing an instruction if the condition is satisfied. The flag out wire in the diagram goes to the conditional instruction logic which controls execution by checking the flag values to determine if the condition is satisfied (details),

The typical way the condition flags are updated is after performing a data processing operation, e.g. ADD. If the result is negative, the N flag is set; otherwise, the N flag is cleared. The multiplexer on the right allows the new flag value from the ALU to be selected instead of the recirculating value. This happens if the aluflag control signal is activated.

The second way to update the condition flags is to write to them directly, for instance to restore the flag values after handling an interrupt. The flags can be written from the ALU data bus (which is different from the flag value from the ALU described earlier). The multiplexer on the left selects this value instead of the recirculating value if the writeflags signal is active.

The condition flags can be read directly, for instance to save the flag values while handling an interrupt. The transistors on the left allow the flags to be written to the B bus when the psr_oen (PSR output enable) control signal is activated.

The diagram below zooms in on the chip layout of the N flag, which can be compared with the schematic. The wire that recirculates the flag from the right to the left is indicated. You can see the transistors that form the inverters and multiplexers. Details on how the red NMOS transistors and blue PMOS transistors work together to form inverters are here.

The circuitry for one flag (N/negative) in the ARM1 processor.

The conditions flags in detail

The flags all roughly follow the circuit described above, but there are differences since the flags have different behaviors. The schematic below shows the circuits for the four condition flags: V, C, Z and N. This section describes these flags in detail, along with how the control signals are generated. By comparing the chip logic with the documentation, we can see how the described behavior is implemented in the logic.

Generating the flags

Each flag is generated in a different way. The N (negative) flag is very simple. A signed number is negative if the top bit is set, so the N flag is simply loaded from the top bit of the ALU bus.

The Z (zero) flag is generated by the ALU. The ALU in effect does a NOR of all 32 output bits; if all bits are zero, the Z flag is 1. For efficiency, the ALU uses a chain of alternating NAND and NOR gates, but the effect is the same.

Generating the C (carry) flag is quite complicated. For arithmetic operations, the carry flag is the carry out from bit 31 of the ALU: this is the carry for addition and not-borrow for subtraction. The ARM1 supports a variety of shift operations, which affect the carry in different ways, so logic gates select different bits from the shifter depending on the instruction. It may be the bit shifted out on the left, the bit shifted out on the right, the carry flag, the left bit or the right bit.

The V (overflow) flag indicates overflow of a signed value. If two signed values are added or subtracted, the result may not fit in 32 bits, and this is indicated by setting the overflow flag. An overflow occurs if the carry out from bit 30 being different from the carry out from bit 31 and is computed by XOR of these two bits. I discuss signed overflow in detail here.

Schematic of the condition flags in the ARM1 processor: OVerflow, Carry, Zero, and Negative.

Updating the condition flags with results of an operation

One feature that distinguishes the ARM processor from most other processors is that condition flag updates are optional. If an arithmetic operation has the S bit (bit 20) set, the flags are updated, otherwise they are not. By looking at how the aluflag control signal is generated, we can see how this functionality is implemented.

The ARM manual explains how flags are updated by a data processing instruction (ADD, etc.)

If the aluflag control signal[4] is high, the multiplexer on the right will select the flag value generated by the ALU, rather than the recirculated value. The aluflag control signal is activated if pla1_aluproc from the instruction decoder is set (details) and if the S bit (bit 20) is set in the instruction register. The pla1_aluproc line is set when the ALU is doing a data processing operation, but not when the ALU is, for example, computing an address offset. This is why the condition flags are updated only for relevant operations. If an abort of the instruction occurs, aluflag is blocked, preventing the flags from being modified.

Arithmetic versus logic operations

The following text from the ARM databook explains the behavior of the condition flags during a data processing (ALU) operation. The part of interest is that the carry (C) and overflow (V) flags are treated differently for logical operations versus arithmetic operations.

The ARM manual explains how arithmetic and logic operations update the flags differently.

The schematic shows the circuits that explain this behavior. The control line pla1_aluarith is generated by the instruction decode logic (details); it is high if the ALU operation is an arithmetic operation (e.g. ADD), and low for a logic operation (e.g. AND). This control line selects the different C and V inputs for arithmetic or logical operations. For the C flag, this control line selects between the ALU's carry out and the shifter's carry out. (The shifter has a lot of logic because the carry out depends on the type and direction of shifting.) For the V flag, this control line selects between the ALU's overflow signal and the old V flag — this is why logic operations don't update the V flag.

Writing the flags directly

As described earlier, the flags and the Program Counter share register R15, so storing a value in R15 can update the flags. This is implemented through the multiplexer on the left. If control signal writeflags is activated, the multiplexer on the left will select the value from the ALU bus, rather than the recirculated value, updating the flags with the new value. Otherwise, nowriteflags is activated, selecting the recirculated value and leaving the flag unchanged. (Note that both writeflags and nowriteflags are inactive during clock phase Φ1, effectively disconnecting the multiplexer output.)

The generation of writeflags is relatively complicated. First, if pla_psrw this indicates a block copy instruction (LDM/STM) is writing to the PSR; if instruction register bit 22 (S) is set the flags will be updated. Second, aluflag (described above) indicates an ALU data processing operation should update the flags. In either of these cases, as long as abort is clear, and wpc (write PC) is set, then the nowriteflags1 signal is active. This signal is combined with the clock Φ2 to generate the writeflags and opposite nowriteflags signals sent to the multiplexer. This implements the logic described on page 2-34 for data processing instructions:

The ARM manual explains how flags are updated by the LDM block transfer instruction.

Reading the flags

Looking at the block diagram of the ARM1 process explains some of the behavior when reading the flags. A data processing instruction specifies three registers: the operation is performed on the first two registers and the result stored in the third. The first register (Rn) is read over the A bus. The second register (Rm) is read over the B bus and goes through the barrel shifter. The ALU generates the result of the operation, which is stored to a third register (Rd) via the ALU bus.

Block diagram of the ARM1 processor showing the flags. The flags are read via the B bus and written via the ALU bus. The flags also receive values directly from the ALU and shifter.

The block diagram above shows how the flags are connected to the chip's buses. The flags are separate from the register file; they are written via the ALU bus and read via the B bus. Thus, the flag value in R15 can only be accessed as the second register (Rm) via the B bus, and not as the first register (Rn) via the A bus. This explains the behavior described in the manual:

Depending on how it is accessed, register R15 in the ARM1 may or may not provide the flag values. From the ARM databook, page 2-35.

The process to write data to the B bus may seem backwards. The B bus is complemented, so a 1 on the bus indicates a 0 value. In more detail, the B bus is pulled high in clock phase Φ2 by transistors on the right of the register file (details). In clock phase Φ1, anyone writing to the bus sends a 1 by pulling the corresponding bus line low.[5] From the schematic, you can see that the control signal psr_oen (PSR output enable) controls putting the (complemented) flag values on the B bus. If psr_oen is active (only in phase Φ1) and the flag value is 1, the output transistors will pull the bus to 0.

The psr_oen signal is enabled to read the flags in two cases. The first happens when flags are being saved to R14 for a trap. The pla2_psren (PSR enable) signal controls this; it comes from instruction decoding at the start of a software interrupt (SWI), coprocessor instruction (i.e undefined instruction), or interrupt. The second case is when the R15 is being read via the B bus. This is indicated when pla2_ben (B Enable) and bpc (B bus PC) are active. The pla2_ben signal (PSR enable) comes from instruction decoding and is enabled at some point during most instructions. The register file generates the bpc signal when the B bus accesses the PC.

The mode and interrupt flags

This section discusses the M0 and M1 (processor mode) flags and the I and F (interrupt) flags. The behavior of these flags is different in several ways from the condition code flags, and their circuitry is significantly different.

The four modes of the ARM1 are:

M1	M0	Mode
0	0	User
0	1	Fast Interrupt (FIRQ)
1	0	Interrupt (IRQ)
1	1	Supervisor (SVC)

When an exception trap occurs, the trap logic directs the flag circuitry to switch the mode. An interrupt switches to Interrupt mode, a fast interrupt switches to Fast Interrupt mode, and any other exception (reset, undefined instruction, memory abort, etc) switches to Supervisor mode. The trap logic indicates the new mode through the signals psrbank1 and psrbank0:

Exception	psrbank1	psrbank0
Fast Interrupt	0	1
Interrupt	1	0
Reset	1	1
Other	0	0

Note that the psrbank values don't exactly match the M0/M1 values. The psrbank values pass through a few gates in the mode control logic to generate newM1 and newM0 which are stored into the flags.

As the schematic shows, control signal oldstatus causes the flags to keep their old value, while newstatus loads the new value when a fault occurs. The newstatus signal is generated from instruction decode signal pla2_banken, which is activated during a SWI (software interrupt) instruction, coprocessor instruction (causing an undefined instruction fault), or an interrupt. It is blocked by the abort signal. Otherwise oldstatus is activated. Both signals can only be active during clock phase Φ1.

Schematic of the status flags in the ARM1 processor: Mode 0 and 1, Interrupt, and Fast interrupt.

The other multiplexer signals are psr_t0, which loads the flags from the ALU bus, and psr_t1, which uses the value from the previous multiplexer. Both signals can be active only during clock phase Φ2, so the two multiplexers alternate. The psr_t0 signal is the same as writeflags used by the condition flags, except it is blocked if the mode flags indicate user mode. This is how the ARM1 prevents the mode and status flags from being updated in User mode (which is necessary for security). The psr_t1 signal is the opposite of psr_t0 (not exactly inverted since both are low during Φ1).

Moving on to the interrupt flags, any fault causes the I flag to be set (preventing an interrupt while the fault is being handled). This is accomplished by the 1 input to the I register multiplexer. The F flag is set (blocking fast interrupts) on reset and when a fast interrupt occurs. The schematic shows that F will be set if psrbank0 is high, and keeps its old value otherwise (via the OR gate). Since psrbank0 is high for fast interrupts and reset, the desired behavior is obtained.

One interesting thing about the M0 and M1 flags is they are connected directly to the M0 and M1 output pin driver circuits, shown below. This circuit supports tri-state output (electrically disconnecting the output so the signal can be controlled externally) as well as input, even though neither of these features is used for the M0 and M1 pins. The reason is the same pin driver circuit is reused for all the ARM1 output pins regardless of whether or not they need these features. This is another example of how the ARM1 was designed for simplicity, rather than optimizing the design. Note that large transistors to provide the output current to the pin.

Driver for the M0 mode output pin. Much of the circuit is unused, since the same circuit is used for most I/O pins.

Register control

One feature of the ARM1 processor is has multiple register banks, controlled by the mode flags. While there are 16 logical registers (R0 through R15), there are 25 physical registers. Each of the four modes has its own R13 and R14. The fast interrupt mode also has its own R10, R11 and R12.[6] These register banks improve performance by allowing interrupt handlers to use registers without needing to save the user registers.

The flag circuitry generates the signals that select the register bank. These signals go to the registers control circuitry next to the registers, where they are used to select particular registers details). The bank select signals are
bs0: general (non-fast-interrupt) registers.
bs1: fast interrupt registers.
bs2: regular interrupt registers.
bs3: supervisor registers.
bs4: user registers.

These (low-active) signals are generated from the M0 and M1 flags, which specify the mode. Registers R10-R12 use bs0 and bs1 to select the appropriate bank for fast interrupts or otherwise. Registers R13 and R14 use bs1, bs2, bs3 and bs4 to select between the four register banks.

One complication is for LDM/STM instruction, the S flag causes the user register bank to be used instead of the expected register bank. (This is a feature so interrupt handlers can access user registers if desired.) This happens if the pla2_psrw line is high, indicating a LDM/STM instruction; instruction register bit 22 is high (the S bit for LDM/STM); and pla2_nben is low, indicating bus B enabled. The pla2_psrw and pla2_nben signals are generated by the instruction decode circuits (details).

Conclusion

I expected to write a brief article on the ARM1 flags, but the topic turned out to be more complex than I expected. This article got a bit out of hand, so congratulations if you made it to the end! The flags are not the simple 8-bit register I expected, but are stored in dynamic latches with many control lines and inputs. With careful examination, it is possible to explain how the features and special cases described in the manual are implemented in the circuits. Studying the flags also explains the function of several of the control signals generated by the instruction decoder.

Now that you've seen the internals of the flag logic, you can use the Visual ARM1 simulator to see the circuit in action. Thanks to the Visual 6502 team for providing the simulator and ARM1 chip layout data. For more articles on ARM1 internals, see my full set of ARM posts and Dave Mugridge's series of posts.

For my latest articles, follow me on Twitter here.

Notes and references

[1] Flags do not need to be bits in a register. The IBM 1401 and Intel 8008, for instance, do not have status flags as part of a register. Flags in these computers were not assigned bit positions but exist more abstractly. The Z-80 on the other hand, stores flags both in discrete latches and in a flag register, copying the flags between the two. The MIPS architecture doesn't have condition flags at all, but does both the test and the branch in the conditional branch instructions.

[2] Was combining the flags and program counter into a single register in the ARM1 a clever idea or just bizarre? On the positive side, this allowed the flags and PC to be saved or restored in a single transfer, rather than two operations. It also allowed flags to be accessed without special flag instructions. On the negative side, restricting the address space to 26 bits was bad in the long term. This decision also prevented adding more flags in the future. Combining the flags and PC in register R15 also required special-case handling for R15 for many instructions.

The ARM architecture moved away from the combined PC/flags with the ARMv3 architecture. The flags were moved to separate registers: CPSR (Current processor status register) and SPSR (Saved Processor Status Register), allowing 32-bit addressing as well as additional flags and modes. New instructions (MSR, MRS) were added to access the CPSR and SPSR. (One ARMv3 processor of note is the ARM610, used in the Apple Newton.) Details on the historical and modern ARM status registers are here.

(The ARM numbering scheme is rather confusing. Architecture version numbers (e.g. ARMv3) don't match up with the CPU numbers (e.g. ARM6). More information on the ARM family numbering is here.)

[3] I discussed how the multiplexers in the ARM1 work earlier. In brief, each input has an NMOS and PMOS transistor working together as a switch, allowing the input to be connected to the output. The schematics show a single control line for each input; the implementation has two lines since the PMOS control signal must be inverted.

[4] Each signal in the simulator has a reference number that can be used to cross-reference the signals in other articles. Here are the key control signals used in the flags circuitry and their reference numbers:

abort	1591, 1655
aluflag	2021
bpc	8076
bs0	8077
bs1	8078
bs2	8079
bs3	8080
bs4	8081
instruction reg 22	8141
instruction reg 20	8139
newM0	2273
newM1	2272
newstatus	2244
nowriteflags	1654
nowriteflags1	1657
oldstatus	2177
pla_psrw	8273
pla1_aluarith	8059
pla1_aluproc	8064
pla2_banken	8075
pla2_ben	8275
pla2_nben	8186
pla2_psren	8272
pla2_psrw	8273
psr_oen	8281
psr_t0	8282
psr_t1	8283
psrbank0	8270
psrbank1	8271
wpc	8358
writeflags	1640

[5] You might wonder why the bus works in this way. This clocked dynamic logic is simpler than using logic gates to control the signal on the bus; only two transistors are needed to write a bit to the bus and they can be attached to the bus at any location. But why complement the bus? The reason is that it's easier with CMOS to pull a line low than to pull a line high. An NMOS transistor can provide more current than a similar PMOS transistor. And the reason for that is electrons (which carry the charge in NMOS) move faster than holes (which carry the charge in PMOS). Ultimately, the B bus is complemented due to semiconductor physics. (The Z-80 is another chip that has as complemented data bus.)

[6] Later versions of the ARM architecture introduced additional modes and more duplicated banks. Details are at ARMwiki.

When a computer executes a machine language instruction, it breaks down the instruction into smaller steps that are performed in sequence. For instance, a load instruction might first compute a memory address, then fetch a value from that address, and then store that value in a register. This article describes how the ARM1 processor implements instruction sequencing, performing the right steps in order. I also look briefly at the 6502 and Z-80 microprocessors and the different sequencing techniques they use.

The die photo below shows the ARM1 processor chip, with the relevant functional blocks highlighted. This article focuses on the instruction sequence controller which moves step-by-step through an instruction in sequence. The instruction decode section specifies the steps that need to be performed for each operation and communicates with the sequence controller. The priority encoder tells the sequence controller when to stop block transfer instructions.

The ARM1 processor, showing the instruction sequence controller and other parts of the chip that interact with the sequence controller.

You might wonder what relevance a processor from 1985 has today. The ARM1 processor is the ancestor to the immensely popular ARM processor architecture that is used in smartphones and many other systems. Billions of ARM processors have been sold and you probably have one in your pocket now executing the same instructions I discuss in this article. I've written multiple articles about reverse engineering different components of the ARM1; start with my first article for an overview of the chip.

ARM1 instructions and their sequencing

Instructions on the ARM1 range from simple instructions that take one cycle to more complex multi-cycle instructions.[1] Some instructions, such as adding the values in two registers, don't require sequencing because they complete in a single clock cycle. The ARM1 instruction to load a register from memory (LDR) is more complex, consisting of three steps and requiring three clock cycles.[2] In the first step, the memory address is computed. In the second step, the data word is fetched from memory. At the same time, the address register is updated. In the final step, the data is stored into a register. The instruction sequence controller is responsible for stepping through these three steps.

The most complex instructions on the ARM1 are the block data transfer instructions, which read or write multiple registers to memory. A 16-bit bitmask in the instruction specifies the registers to transfer. The number of steps used by the instruction is variable because the read/write step is repeated up to 16 times to copy the specified registers, To support this, the sequence controller implements conditional loops. The ARM1 contains a priority encoder circuit that scans through the register selection bits in order and signals the sequence controller when the transfers are done.

The sequence controller

The instruction sequence controller on the ARM1 is responsible for sequencing the steps of an instruction by providing a cycle number (0 to 3). It also must restart at the end of each instruction. It must repeat cycles as necessary for block transfers.[3]

To move between steps, the sequence controller has four sequence operations that it can perform each clock:

END: the instruction ends and an new instruction starts next step.
NEXT: the sequence controller moves to the next cycle number.
IF23: this conditional provides branching and looping for block data transfer —if not done, it stays on cycle 2; otherwise it goes to cycle 3.
IF1E: similar to IF23, if not done, it stays on cycle 1; otherwise it goes to the end cycle (0).

How does the sequence controller know which operation to perform? This information, along with the other control signals, is provided by the instruction decoder. The instruction decoder can be thought of as holding 42 microinstructions, each 36 bits wide.[4] The instruction decoder provides the appropriate microinstruction for each instruction type and cycle number. These microinstruction bits generate control signals for the chip.[5] Two bits in the microinstruction (seqs1 and seqs0) provide the operation to the sequence controller, indicating how to compute the next cycle number. Normally this will be NEXT, until the last microinstruction which specifies END. Thus, the instruction decoder and the sequence controller work together: the sequence controller's cycle number tells the instruction decoder which microinstruction to use, and the instruction decoder tells the sequence controller how to compute the next cycle number.

The sequence controller circuit

Schematic of the instruction sequencing circuit from the ARM1 processor.

The schematic above shows the circuitry for the instruction sequence controller. The overall idea is the instruction decoder indicates how to compute the next cycle through signals seqs1 and seqs0. The sequence controller produces outputs seq1 and seq0, which tell the instruction decoder the next cycle number. The next cycle values are selected by two multiplexers, which pick one of the four input values based on the control inputs, as shown in the following table. The loop values depend on the pencz signal from the priority encoder, which indicates no more registers to process.

Input	Seq1	Seq0
00 (END)	0	0
01 (NEXT)	seq1' xor seq0'	not seq0'
10 (IF23)	1	pencz
11 (IF1E)	0	not pencz

It's straightforward to verify that this logic implements the sequencing described earlier:

Input 00 (END) results in output cycle 0.
Input 01 (NEXT) increments the old cycle (seq1', seq0') by 1.
Input 10 (IF23) will output cycle 2 until pencz is triggered, and then output cycle 3.
Input 11 (IF1E) will output cycle 1 until pencz is triggered and then output cycle 0, the END cycle.

The schematic shows that the sequence controller circuit provides two other outputs. In cycle 0, the circuit outputs the newinst signal, indicating to the rest of the chip that a new instruction is starting. The abortinst signal indicates that the instruction should not be executed because its condition was not satisfied. It is based on the skip input, which comes from the conditional instruction circuit, and is set if the instruction should be skipped.[6] If the instruction should be skipped, abortinst is asserted in cycle 0, forcing the next cycle to be 0 and terminating the instruction after a single cycle. The abortinst signal is also used elsewhere to prevent the skipped instruction from having any effect. Thus, a skipped instruction is effectively a one-cycle NOP instruction.

The implementation of this circuit uses a two-phase clock and dynamic latches to move from step to step. The multi-triangle symbol in the schematic is a transmission gate, used frequently in the ARM1 to build dynamic latches. A transmission gate can be thought of as a switch that closes during the specified clock phase. When the switch opens, the charge stored on the capacitance of the wire holds the previous value, forming a dynamic latch. The clock itself is two phase: First the Φ1 signal is high and the Φ2 is low, and then they alternate. One transmission gate is open during Φ1 and the other is open during Φ2. You can think of it like people moving through double doors: when the first door is open, they can move through it, but must wait for the second door to open. In this manner, the signal progresses through the circuit under the control of the clock and the cycle count updates once per complete clock cycle.

Comparison with the 6502's control logic

This section briefly looks at the 6502 chip, which uses different techniques for sequencing instructions. The 6502 controls each instruction by stepping sequentially through a time step each clock cycle: T0 through T6. Some instructions are quick, ending after two cycles, while others can take all 7 cycles. Instead of a binary counter, the 6502 keeps track of the current T cycle with a shift register with a single active bit that indicates the current cycle (a ring counter). That is, a separate control line is activated during each T cycle, which makes the rest of the control logic easier to implement. When the last T cycle for a particular instruction is reached, the control logic generates a signal (inexplicably called METAL) to reset the shift register to T0 for the next instruction.[7]

Interesting 6502 fact: if you execute an illegal instruction known as KIL (kill), the T reset signal doesn't get generated and the timing bit falls off the end of the shift register. The 6502 ends up in no T state at all and stops generating control signals. This locks up the chip until a hardware reset is triggered.

Layout of the 6502 processor. Die photo courtesy of Visual 6502.

The die photo above shows the layout of the 6502 processor. Note that the control logic (Decode PLA and Random control logic[8] ) takes up about half the chip. At the top, the PLA (Programmable Logic Array) implements the first step of decoding. Below that, the gate logic generates the control signals, using the PLA outputs. The datapath in the bottom part of the chip contains the registers, ALU (arithmetic logic unit), and buses. It performs the operations as instructed by the control signals.

The PLA is a structured collection of NOR gates, which is visible in the die photo as a regular grid. It takes as inputs the instruction and the timing state, and outputs 130 different control signals, which indicate a combination of a timing state and instruction class, such as "T1.DEX" or "T4.X,IND". The PLA outputs are combined and processed by many gates to generate the final control signals, for instance S/SB (connecting the S and SB buses) or SUM (instructing the ALU to compute a sum).

To compare the ARM1 and 6502, they both use sequential timing states to control the instructions but the implementations are different. The 6502 uses a shift register to sequentially move through states. The more complex sequence controller in the ARM1 provides looping on a state. The 6502 has more states (7 vs 4), but looping in the ARM1 allows longer instructions. The ARM1's sequence controller is highly structured, with sequencer "commands" generated by the PLA; an END command reset the sequence controller to end each instruction. The 6502 uses a combination of a PLA and hard-wired logic to control the sequence; the METAL signal resets the shift register to end each instruction.

Comparison with the Z-80's control logic

The Z-80 uses a much more complicated system for instruction sequencing. An instruction is made up of multiple M (memory) cycles, one for each memory access during the instruction. Each M cycle consists of multiple T (time) states. For example an instruction could take 3 T states for the first M cycle and 4 for the second, going through the states: M1T1, M1T2, M1T3, M2T1, M2T2, M2T3, M2T4. More complex instructions can have 6 machine cycles and 23 T states.

Layout of the Z-80 processor. Data courtesy of Visual 6502.

The diagram above shows the layout of the Z-80. The control logic consists of the circuitry to generate the state timing signals, the PLA that decodes instructions,[9] and the random logic to generate control signals below. The chip's datapath (registers and ALU) are at the bottom of the chip. (You may be surprised that the Z-80 has a 4-bit ALU.)

The Z-80's control logic is implemented using a shift register ring counter for the M cycles and a second shift register ring counter for the T states. At the end of an M cycle, the M cycle counter advances to the next cycle and the T state counter resets. Like the 6502, the Z-80 uses a PLA and random logic for instruction decoding, but the details are different. The Z-80 has an AND/OR PLA that generates outputs for different instruction classes, from a single instruction like "LD SP, HL" to larger classes such as a conditional jump or a load. In comparison, the 6502's PLA has a single NOR plane that combines both instruction decoding and timing.

The Z-80 uses complex gates to combine the instruction signals with the timing signals to generate the control signal. A typical gate is structured as: "generate a signal to do something for instruction X in M1T1 or instruction Y in M2T3 or instruction Z in M2T6". The chip layout for these signals has an interesting structure, shown below: the signals T1 to T5 and M1 to M5 run horizontally in the metal layer (faint gray), while the instruction signals (A, B, C) run vertically in polysilicon wires (green). Transistors (yellow) are formed where polysilicon wires cross the silicon (red). This creates a complex, multi-input AND/NOR gate that generates a control signal for the right combination of M, T, and instruction signals. Due to the structure of MOS circuits, this complex gate is constructed as a single gate.

One gate from the Z-80 to generate a control signal at the right time by combining M cycle and T state signals. Neighboring gates have similar structures to generate other control signals.

The AND/NOR gate above computes not ((A and M4 and T3) or (B and M3 and T5) or (C and M1 and T2)). It has three vertical red paths from ground to the output (one uses a hard-to-see horizontal metal connection); since any of these paths can form a connection this creates a three-input NOR gate. Each path has three yellow transistors; all three transistors must be active to complete the path, so this forms a three-input AND gate.

In this gate, A, B, and C are instruction decode signals. Signal A, for instance, is triggered by an indexed load instruction. The output of this gate controls writes to the registers. Thus, an indexed load instruction will trigger the control signal at time M4 T3, causing a register write. To summarize, the Z-80 uses gates such as these to generate control signals when instructions are at a specific point in the M and T cycle.

The Z-80's instruction sequencing is much more complex than the ARM1 and 6502. The Z-80 sequences instructions through M cycles, each of which is composed of multiple T states. Like the 6502, the Z-80 uses a combination of a PLA and random logic to generate the control signals. The 6502 combines instruction decoding and timing signals in the PLA, while the Z-80 uses the PLA only for instruction decoding. The Z-80 uses complex multi-input gates to generate control signals by combining the decoded instructions with timing signals. Like the ARM1, the Z-80 can loop over states to support block data operations.

Conclusion

The ARM1, Z-80 and 6502 use very different techniques for sequencing instructions. The ARM1 can use a simple, highly structured sequence controller because of its simple RISC instruction set. The 6502 and Z-80 are implemented with a PLA in combination with hard-wired "random" logic. You can see these chips in action with the Visual 6502 team's simulators: Visual ARM1 simulator and Visual 6502 simulator.

For more articles on ARM1 internals, see my full set of ARM posts and Dave Mugridge's series of posts. This article builds on Dave's article on Instruction decoding and sequencing. Thanks to the Visual 6502 team for providing the die photos and chip layout used in this analysis.

Follow me on Twitter here for updates on my latest articles.

Notes and references

[1] The ARM1 is a reduced instruction set computer (RISC) with relatively simple instructions. The typical RISC chip performs at most one memory access per instruction, making instruction sequencing straightforward. However, the ARM1 processor has instructions that are more complex than typical for a RISC processor, such as block data transfer instructions that can access 16 memory words. Some people suggest the ARM is not really RISC, but the R in ARM does stand for RISC.

[2] The LDR (Load Register) instruction described is is similar to the C statement Rd = *Rn++;, but it can do more. I've simplified the explanation of the LDR instruction since it provides a variety of addressing mechanisms. Full details are here.

[3] The ARM1 uses state looping for the block data transfer instructions. The ARM2 also uses the same loop functionality for multiplication and coprocessor operations. (Multiplication uses Booth's multiplication algorithm, invented in 1950. The multiplier does shifts and adds until all the bits are handled.) The book VLSI Risc Architecture and Organization by ARM architect Furber briefly discusses the sequence controller on page 303.

[4] See the article Inside the ARMv1 —instruction decoding and sequencing for discussion of how instruction decoding works in the ARM1. The instruction decoder is implemented with a PLA (Programmable Logic Array). It may be controversial to call its rows microinstructions, but I think that's the best way to understand its operation. Unlike the PLA in the 6502 or Z-80, the ARM1's instruction decode PLA operates more like a ROM, with exactly one row active at a time, and it steps through these rows sequentially. These rows can be considered microinstructions that generate the control signals. I wouldn't call the ARM1 more than partially microcoded because the majority of the chip's control logic is hardwired.

[5] For an example of microinstructions, consider the load register instruction described earlier that takes three cycles. It has three microinstructions. The control signals specified by the first microinstruction tell the ALU to add the base register and offset, put this on the address bus, and perform a memory read. The second microinstruction tells the ALU to compute the new base register value and write it to the register. The third microinstruction stores the fetched value in the destination register and terminates the instruction. Thus, each cycle of an instruction has a microinstruction specifying what to do during that cycle.

[6] An instruction is skipped if the condition is false, the instruction is not an undefined instruction, and a fault is not in progress. As a consequence, an undefined instruction will cause an exception even if its condition is false and it wouldn't be executed.

[7] The first two timing states in the 6502 (T0 and T1) are more complex than a shift register in order to handle some special cases and to optimize two-cycle instructions.) For more information on 6502 instruction sequencing, see 6502 State Machine and How MOS 6502 Illegal Opcodes really work. The contents of the 6502 PLA are described here.

[8] "Random logic" describes unstructured logic that appears random; it isn't actually random, of course.

[9] The regular grid structure of the AND plane of the Z-80's decode PLA's is visible in the layout diagram. The structure of the OR plane is less visible, since the PLA has been optimized so multiple terms can fit in one row. For more than you ever wanted to know about PLA optimization in early microprocessors, see The Architecture of Microprocessors by F. Anceau, 1986. This book is a wealth of information on microprocessors of the early 1980s, but is dense and somewhat academic.

If you've played around with electronic circuits, you probably know[1] the 555 timer integrated circuit, said to be the world's best-selling integrated circuit with billions sold. Designed by analog IC wizard Hans Camenzind[2] in 1970, the 555 has been called one of the greatest chips of all time with whole books devoted to 555 timer circuits.

Given the popularity of the 555 timer, I thought it would be interesting to find out what's inside the 555 timer and how it works. While the 555 timer is usually sold as a black plastic IC, it is also available in a metal can, which can be cut open with a hacksaw[3] revealing the tiny die inside.

Inside the 555 timer. The tiny die in the package is connected to the 8 pins by wires.

A brief explanation of the 555 timer

The 555 timer has hundreds of applications, operating as anything from a timer or latch to a voltage-controlled oscillator or modulator. The diagram below illustrates how the 555 timer operates as a simple oscillator. Inside the 555 chip, three resistors form a divider generating references voltages of 1/3 and 2/3 of the supply voltage. The external capacitor will charge and discharge between these limits, producing an oscillation. In more detail, the capacitor will slowly charge (A) through the external resistors until its voltage hits the 2/3 reference. At that point (B), the upper (threshold) comparator switches the flip flop off and the output off. This turns on the discharge transistor, slowly discharging the capacitor (C). When the voltage on the capacitor hits the 1/3 reference (D), the lower (trigger) comparator turns on, setting the flip flop and the output, and the cycle repeats. The values of the resistors and capacitor control the timing, from microseconds to hours.[4]

Diagram showing how the 555 timer can operate as an oscillator.

To summarize, the key components of the 555 timer are the comparators to detect the upper and lower voltage limits, the three-resistor divider to set these limits, and the flip flop to keep track of whether the circuit is charging or discharging. The 555 timer has two other pins (reset and control voltage) that I haven't covered above; they can be used for more complex circuits.

The structure of the integrated circuit

The photo below shows the silicon die of the 555 through a microscope. On top of the silicon, a thin layer of metal connects different parts of the chip. This metal is clearly visible in the photo as yellowish-white traces and regions. Under the metal, a thin, glassy silicon dioxide layer provides insulation between the metal and the silicon, except where contact holes in the silicon dioxide allow the metal to connect to the silicon. At the edge of the chip, thin wires connect the metal pads to the chip's external pins.

Die photo of the 555 timer.

The different types of silicon on the chip are harder to see. Regions of the chip are treated (doped) with impurities to change the electrical properties of the silicon. N-type silicon has an excess of electrons (negative), while P-type silicon lacks electrons (positive). In the photo, these regions show up as a slightly different color surrounded by a thin black border. These regions are the building blocks of the chip, forming transistors and resistors.

NPN transistors inside the IC

Transistors are the key components in a chip. The 555 timer uses NPN and PNP bipolar transistors. If you've studied electronics, you've probably seen a diagram of an NPN transistor like the one below, showing the collector (C), base (B), and emitter (E) of the transistor, The transistor is illustrated as a sandwich of P silicon in between two symmetric layers of N silicon; the N-P-N layers make an NPN transistor. It turns out that transistors on a chip look nothing like this, and the base often isn't even in the middle!

Schematic symbol for an NPN transistor, along with an oversimplified diagram of its internal structure.

The photo below shows one of the transistors in the 555 as it appears on the chip. The slightly different tints in the silicon indicate regions that has been doped to form N and P regions. The whitish-yellow areas are the metal layer of the chip on top of the silicon - these form the wires connecting to the collector, emitter, and base. You can spot an emitter on the chip by its "bullseye" structure, while the base rectangle surrounds the emitter.

An NPN transistor in the 555 timer chip. The collector (C), emitter (E) and base (B) are labeled, along with N and P doped silicon.

Underneath the photo is a cross-section drawing illustrating how the transistor is constructed. There's a lot more than just the N-P-N sandwich you see in books, but if you look carefully at the vertical cross section below the 'E', you can find the N-P-N that forms the transistor. The emitter (E) wire is connected to N+ silicon. Below that is a P layer connected to the base contact (B). And below that is an N+ layer connected (indirectly) to the collector (C).[5] The transistor is surrounded by a P+ ring that isolates it from neighboring components.

PNP transistors inside the IC

You might expect PNP transistors to be similar to NPN transistors, just swapping the roles of N and P silicon. But for a variety of reasons, PNP transistors have an entirely different construction. They consist of a small circular emitter (P), surrounded by a ring shaped base (N), which is surrounded by the collector (P). This forms a P-N-P sandwich horizontally (laterally), unlike the vertical structure of the NPN transistors.

The diagram below shows one of the PNP transistors in the 555, along with a cross-section showing the silicon structure. Note that although the metal contact for the base is on the edge of the transistor, it is electrically connected through the N and N+ regions to its active ring in between the collector and emitter. A metal line is routed between the collector and base, but is not part of the transistor.

A PNP transistor in the 555 timer chip. Connections for the collector (C), emitter (E) and base (B) are labeled, along with N and P doped silicon. The base forms a ring around the emitter, and the collector forms a ring around the base.

A large, high-current NPN output transistor in the 555 timer chip. The collector (C), base (B) and emitter (E) are labeled.

How resistors are implemented in silicon

Resistors are a key component of analog chips. Unfortunately, resistors in ICs are large and inaccurate; the resistances can vary by 50% from chip to chip. Thus, analog ICs are designed so only the ratio of resistors matters, not the absolute values, since the ratios remain nearly constant.

A resistor inside the 555 timer. The resistor is a strip of P silicon between two metal contacts.

The photo above shows a 1K&ohm; resistor in the 555, formed from a strip of P silicon (visible as an outline). Note that the resistor connects two metal wires and another metal wire crosses it. The resistor below is an L-shaped 100K&ohm; pinch resistor. A layer of N silicon on top of the pinch resistor makes the conductive region much thinner (i.e. pinches it), forming a much higher but less accurate resistance.

A pinch resistor inside the 555 timer. The resistor is a strip of P silicon between two metal contacts. An N layer on top pinches the resistor and increases the resistance.

IC component: The current mirror

There are some subcircuits that are very common in analog ICs, but may seem mysterious at first. The current mirror is one of these. If you've looked at analog IC block diagrams, you may have seen the symbols below, indicating a current source, and wondered what a current source is and why you'd use one. The idea is you start with one known current and then you can "clone" multiple copies of the current with a simple transistor circuit, the current mirror.

Schematic symbols for a current source.

The following circuit shows how a current mirror is implemented with two identical transistors.[6] A reference current passes through the transistor on the left. (In this case, the current is set by the resistor.) Since both transistors have the same emitter voltage and base voltage, they source the same current, so the current on the right matches the reference current on the left.

Current mirror circuit. The current on the right copies the current on the left.

A common use of a current mirror is to replace resistors. As explained earlier, resistors inside ICs are both inconveniently large and inaccurate. It saves space to use a current mirror instead of a resistor whenever possible. Also, the currents produced by a current mirror are nearly identical, unlike the currents produced by two resistors.

Three transistors form a current mirror in the 555 timer chip. They all share the same base and two transistors share emitters.

The three transistors above form a current mirror with two outputs. Note the three transistors share the base connection, tied to the collector on the right, and the emitters on the right are tied together. The transistor on the left is a Widlar current source, a modified mirror that produces a smaller current. On the schematic, the two transistors on the right are drawn as a single two-collector transistor, Q19.

IC component: The differential pair

The second important circuit to understand is the differential pair, the most common two-transistor subcircuit used in analog ICs.[7] You may have wondered how a comparator compares two voltages, or an op amp subtracts two voltages. This is the job of the differential pair.

Schematic of a simple differential pair circuit. The current sink sends a fixed current I through the differential pair. If the two inputs are equal, the current is split equally between the two branches. Otherwise, the branch with the higher input voltage gets most of the current.

The schematic above shows a simple differential pair. The current sink at the bottom provides a fixed current I, which is split between the two input transistors. If the input voltages are equal, the current will be split equally into the two branches (I1 and I2). If one of the input voltages is a bit higher than the other, the corresponding transistor will conduct more current, so one branch gets more current and the other branch gets less. A small input difference is enough to direct most of the current into the "winning" branch, flipping the comparator on or off.

In the 555, the threshold comparator uses NPN transistors, while the trigger comparator uses PNP transistors. This allows the threshold comparator to work near the supply voltage and the trigger comparator to work near ground. The 555's comparators also use two transistors on each input (Darlington pair) to buffer the inputs.

The 555 schematic interactive explorer

The 555 die photo and schematic[8] below are interactive. Click on a component in the die or schematic, and a brief explanation of the component will be displayed. (For a thorough discussion of how the 555 timer works, see 555 Principles of Operation.)

For a quick overview, the large output transistors and discharge transistor are the most obvious features on the die. The threshold comparator consists of Q1 through Q8. The trigger comparator consists of Q10 through Q13, along with current mirror Q9. Q16 and Q17 form the flip flop. The three 5K&ohm; resistors forming the voltage divider are in the middle of the chip.[9] Urban legend says that the 555 is named after these three 5K resistors, but according to its designer 555 is just an arbitrary number in the 500 chip series

Click the die or schematic for details...

How I photographed the 555 die

Integrated circuit usually come in a black epoxy package which require inconveniently dangerous concentrated acid to open. Instead, I bought a 555 in a metal can (below). To examine the die, I used a metallurgical microscope. Unlike a standard microscope, the metallurgical microscope shines light down through the lens allowing it to work with opaque objects (such as chips). I stitched the photos together with Hugin (details).

The 555 timer in an eight-pin metal can package. (Banana for scale)

The failed improved 555

Given the popularity of the 555, it's surprising that it has several rookie design flaws; unbalanced comparators, large operating currents, an asymmetric output waveform, and temperature sensitivity.[10]

In 1997, Camenzind redesigned the 555 to create a much better chip that could run at much lower voltages. The improved chip was sold by Zetex as the ZSCT1555, but unfortunately was a flop. The continuing success of the original 555 and the failure of the improved successor can be viewed as an example of the worse is better principle.

Conclusion

I hope you've found this look inside the 555 timer chip interesting. Next time you're building a 555 project, you'll know exactly what's inside the chip. If you enjoyed this article, I've also reverse-engineered the 741 op amp and 7805 voltage regulator. Thanks to Eric Schlaepfer[11] for helpful comments.

Follow me on Twitter and you won't miss an article!

Notes and references

[1] The 555 timer is iconic enough to appear on mugs, bags, caps and t-shirts.

The 555 timer is popular enough to appear on t-shirts. Courtesy of EEVblog.

[2] The book Designing Analog Chips written by the 555's inventor Hans Camenzind is really interesting, and I recommend it if you want to know how analog chips work. Chapter 11 has an extensive discussion of the 555's history and operation. Page 11-3 claims the 555 has been the best-selling IC every year, although I don't know if that is still true. The free PDF is here or get the book.

[3] You can cut an IC can open with a plain hacksaw, but a jeweler's saw gives a much cleaner cut. I got a jeweler's saw on eBay for $14, and used the #2 blade. Make sure you cut near the top of the IC so you don't hit the die as I did.

[4] The brilliant part of the 555 timer is that the oscillation frequency depends only on the external resistors and capacitor and is insensitive to the supply voltage. If the supply voltage drops, the 1/3 and 2/3 references drop too, so you might expect the oscillations to be faster. But the lower voltage charges the capacitor more slowly, canceling this out and keeping the frequency constant.

This voltage insensitivity is so tricky that the chip's designer didn't figure it out until near the end of the 555's design, but it made a big difference. The original design was more complex and required nine pins, which is a terrible size for an IC since there are no packages between 8 and 14 pins. The final, simpler 555 design worked with 8 pins, making the chip's packaging much cheaper. (See page 11-3 of Designing Analog Chips for the full story.)

[5] You might have wondered why there is a distinction between the collector and emitter of a transistor, when the typical diagram of a transistor is symmetrical. As you can see from the die photo, the collector and emitter are very different in a real transistor. In addition to the very large size difference, the silicon doping is different. The result is a transistor will have poor gain if the collector and emitter are swapped.

[6] For more information about current mirrors, check wikipedia, any analog IC book, or chapter 3 of Designing Analog Chips.

[7] Differential pairs are also called long-tailed pairs. According to Analysis and Design of Analog Integrated Circuits the differential pair is "perhaps the most widely used two-transistor subcircuits in monolithic analog circuits." (p214) For more information about differential pairs, see wikipedia, any analog IC book, or chapter 4 of Designing Analog Chips.

[8] The 555 schematic used in this article is from the Philips datasheet.

[9] Note that the three resistors for the voltage divider are parallel and next to each other. This helps ensure they have the same resistance even if there are electrical variations across the silicon.

[10] I'm not criticizing the 555; Hans Camenzind points out the design flaws and attributes them to "the early period of IC design (and the inexperience of a rookie designer)"; see Designing Analog Chips, page 11-4. The design of a 555 replacement is discussed in detail in "Redesigning the old 555", IEEE Spectrum, September 1997. That article makes it clear how much much faster IC design is now than in 1970. It took months to create the layout of the 555 chip by hand and manually verify it for correctness. The new chip took two days to layout and 20 minutes to verify.

[11] Evil Mad Scientist sells a very cool discrete 555 timer kit, duplicating the 555 circuit on a larger scale with individual transistors and resistors — it actually works as a 555 replacement. Their 555 footstool is also worth a look.

Large-size 555 timer created by Evil Mad Scientist Lab.

This article looks at how the ARM1 processor executes instructions. Unexpectedly, the ARM1 uses microcode, executing multiple microinstructions for each instruction. This microcode is stored in the instruction decode PLA, shown below. RISC processors generally don't use microcode, so I was surprised to find microcode at the heart of the ARM1. Unlike most microcoded processors, the microcode in the ARM1 is only a small part of the control circuitry.

Die photo of the ARM1 processor. Courtesy of Computer History Museum.

I should warn the reader in advance that this article is more terse than my usual articles and intended for the small group of people interested in very low-level details of the ARM1. For the average reader I'd recommend my article Reverse engineering the ARM1 instead.

The microinstructions

Each instruction in the ARM1 is broken down into 1 to 4 microinstructions. These microinstructions are stored in the instruction decode PLA (which acts as a ROM).[1] The ARM1's microcode is stored as 42 rows of 36-bit microinstructions. The 42 rows are split into 18 classes of instructions, each consisting of 1 to 4 microinstructions. (The microcode sequencer supports looping, allowing it to handle the bulk data transfer instructions LDM and STM which can take up to 17 cycles.)

To explain the microinstruction format, I'll use the LDR instruction as an example. The LDR (Load Register) instruction accesses the memory address stored in a base register Rn plus a constant offset from the instruction and stores the result into a destination register Rd, also updating the base register. (This is similar to the C code: Rd = *Rn++;)[2] The ARM1 takes three cycles (i.e. three microinstructions) to perform this LDR operation. In the first cycle, the ALU adds the offset to the register to compute the address. The second cycle is used to fetch the word from memory. In the third cycle, the data is transferred to the destination register.

The diagram below shows the bit pattern for the LDR instruction. The PLA uses the highlighted bits (4, 20, 24-27) to determine the instruction class; the lighter bits are irrelevant for selecting the LDR instruction and are ignored. The cond bits specify a condition; if the condition is false, the instruction is skipped. The P, U, B, and W bits control different options for the LDR instruction. The Rn and Rd fields specify the base address register and the destination register. Finally, the 12-bit Offset field specifies the offset added to the base address.

Structure of the LDR (Load Register) instruction. Highlighted bits are used for instruction decoding; dark bits indicate LDR. Rn is the base register and Rd is the destination register.

Of the 32 instruction bits, only the 6 highlighted bits are used to select the microinstruction. As a result, microinstructions correspond to classes of instructions and the control outputs from the PLA are somewhat generic, e.g. "store to a register" rather than "store to register R12". Hardwired control logic looks at other bits in the instruction to pick a specific register, to pick a specific ALU operation, or to tweak exactly what the instruction does. For example, for LDR the microcode ignores the P, U, B and W bits and the hardwired control logic uses them. For registers, the microinstruction indicates which instruction bits specify the register and the hardwired register control logic uses those bits to select the register.

Contents of the microcode PLA

The raw data from the PLA for the LDR immediate instruction is given below, showing the 36 output bits forming a microinstruction for each cycle of the instruction.

Cycle number	PLA output
0	001010101001000000100001100010100001
1	101011010001000000001000111010100100
2	010101101001000001010010110010010000

Since the raw PLA output is fairly meaningless, I have broken it down into fields and done a small amount of decoding. The image below shows the decoded contents of the instruction decode PLA; click for full-size. Each row corresponds to one clock cycle in an instruction and each column is one of the 22 fields generated by the 36 bits of the PLA. The PLA handles 18 different instruction groups, indicated on the left.

Contents of the ARM1 microcode PLA (thumbnail).

The rows Initialization and Interrupt are not instructions per se, but triggered by other PLA inputs. The Initialization micro-instruction is an idle step used when the pipeline does not have a valid instruction (at startup or after R15 modification). It is triggered if the iregval signal (8156) from the Pipeline State circuit is 0. The Interrupt microinstructions handle an interrupt or fault and are triggered by the intseq signal (8118) from the Trap Control circuit. The Reserved rows correspond to undocumented instructions, probably load and store with register-specified shift. The first Reserved row is unique in that the microcode sequence forks; this is cycle number 0 for both of the next Reserved blocks. It is unclear why these instructions were implemented but not documented.

Example microinstructions

The diagram below illustrates the three microinstructions that make up the load register immediate (LDR) instruction, with explanations on some of the important fields. The first microinstruction computes the address: the indicated fields instruct the ALU to add or subtract the 12-bit offset value from the instruction, and put the value on the address bus. The ALU control logic uses the U (up/down) and P (pre/posts) bits in the instruction to determine if the offset should be added or subtracted or ignored. This illustrates that the microinstruction only partially defines the instruction; the hardcoded control logic also makes decisions based on the instruction. The microinstruction also specifies that the sequencer should move to the next microinstruction.

The instruction decode PLA contents for the LDR (Load Register) immediate instruction. Each row corresponds to a clock cycles and shows the activity during one cycle. Each column indicates a control signal.

The next microinstruction instructs the ALU to update the offset register. As before, the ALU control logic determines if the update requires an add or subtract. The register control logic determines if the register should be updated. The microinstruction also indicates that the fetched data should be read in.

The final microinstruction stores the fetched result in a register. It specifies Rd as the destination register and indicates a register write. The microinstruction tells the sequencer this is the end of the instruction.

Fields in the microinstruction

This section describes the fields that make up the microinstruction. I am still working out all the details, so this is not 100% accurate. Refer to the floorplan diagram below to see the components involved.

Floorplan of the ARM1 chip, from ARM Evaluation System manual. (Bus labels are corrected from original.)

seqs: sequencer control

This field specifies the cycle number for the next microinstruction. It is used by the Sequence Controller. It has the following values:

Field	Label	Meaning
0	END	End of the instruction
1	NEXT	Move to next cycle in sequence
2	IF23	If not pencz, next cycle is 2; if pencz, next cycle is 3.
3	IF1E	If not pencz, next cycle is 1; if pencz, ends the instruction.

The pencz signal from the priority encoder indicates all registers have been processed for a LDM/STM instruction.

For more information, see Reverse engineering ARM1 instruction sequencing.

Signal numbers: 8310, 8309. I've put this field first to make control flow clearer, but it is physically after rws in the PLA.

dinin: data in to B bus

This field indicates the value on the data pins should be read in to the B bus. It is used by the data bus controls.

Signal number: 8111

sctls: shifter controls

This field specifies the shifter action at a high level. The Shift Decode block uses this field in combination with other instruction bits and values to determine the specific shift direction and amount.

Field	Shifter action
0	Rs
1	DP instruction
2	ASL 2*instruction
3	byte to word
4	no shift
5	ASL 2 bits
6	nop (unused)
7	nop

For more details, see Decoding barrel-shifter commands.

Signal numbers: 8288, 8287, 8286. Note that bits 2 and 1 are reversed coming out of the PLA.

aluac: ALU latch A bus

This signal latches the A bus value as an ALU input. The ALU control logic generates latch controls 2370, 2371 from this signal. For more details, see The ALU control logic.

Signal number: 8058

aluctls: ALU mode controls

This field selects the ALU mode. The ALU decoder uses this field to generate the ALU control signals.

Field	Operation	Instructions
0	add/rsb for base register update / address	LDM/STM/Data processing
1	add for branch/fault destination	B/SWI
2	add/sub/nop for address computation	LDR/STR
3	mov for register update, nop for abort	LDM/LDR
4	add/rsb/mov for address computation	LDM/STM
5	add/sub for base register update	LDR/STR
6	rsb for link address update	BL / SWI
7	op specified by instruction	Data processing

For more details, see The ALU control logic.

Signal numbers: 8062, 8061, 8060

aluenb: ALU latch B bus

This signal latches the B bus value as an ALU input. The ALU control logic generates latch controls 7485, 7486 from this signal. For more details, see The ALU control logic.

Signal number: 8063

banken: update PSR mode

This signal causes the M0, M1, F and I flags in the PSR to be updated from the psrbank signals from the trap control circuit. This happens during fault handling. This signal is used by the flag circuitry. For more details, see The ARM1 processor's flags.

Signal number: 8075

psrw: PSR write

This signal indicates that the PSR is potentially being written by a LDM/STM block copy instruction. It controls writing the ALU bus to the flags, after some more logic. It also allows LDM/STM to access the user-mode registers via the S bit. This signal is used by the flag circuitry. For more details, see The ARM1 processor's flags.

Signal number: 8273

nben: data to B bus

This signal indicates that the register file should write to the B bus when nben is 0. This signal is used by the register control logic and the flag logic. For more details, see The ARM1 processor's flags and Inside the ARMv1 Register Bank.

Signal number: 8186; the signal is negative-active.

psren: PSR to B bus

When active, this signal enables writing the PSR to the B bus to save it during a trap. This signal is used by the flag logic. For more details, see The ARM1 processor's flags.

Signal number: 8272

abctls: register controls for A and B bus

This field controls which registers are read onto the A and B bus. This signal is used by the register control logic.

Field	A register selector	B register selector
0	Instruction bits 16-19 (Rn)	Instruction bits 0-3 (Rm)
1	Instruction bits 8-11 (Rs)	Instruction bits 12-15 (Rd)
2	R15	Instruction bits 16-19 (Rn)
3	R15	From priority encoder
4	Instruction bits 16-19 (Rn)	R14

For more details, see Inside the ARMv1 Register Bank — register selection.

Signal numbers: 8042, 8041, 8040

wctls: register write controls

This field selects which register gets written to, from the ALU bus. This signal is used by the register control logic.

Field	Register selector
0	Instruction bits 16-19 (Rn)
1	Instruction bits 12-15 (Rd)
2	From priority encoder
3	R14 (link)

For more details, see Inside the ARMv1 Register Bank — register selection.

Signal numbers: 8356, 8355

opc: OPC opcode fetch signal

This signal goes to the OPC pin and indicates a new instruction is being fetched. It is also used by the pipeline state circuitry.

Signal number: 8630

pipebl: pipeline control

This signal is used by the pipeline state circuitry. It apparently indicates the end of the instruction, except for STM. It is high throughout branches and faults, perhaps to clear the pipeline.

Signal number: 8261

skpwen: register write enable controls

This field controls whether a write to the register file happens or not. It is used by the Instruction Skip circuitry which can block the write if the instruction is aborted. The following table is a rough draft.

Field	Write condition
0	None
1	Not dataabort
2	Writeback
3	Instruction bit 24 (link)
4	Writeback / P bit
5	alureg
6	skpawen0

Signal numbers: 8324, 8323, 8322

skpw15: register 15 write controls

This signal controls writes to the R15 (PC). It is used by the Instruction Skip circuitry, perhaps to clear the pipeline when R15 is updated.

Signal number: 8321

skparegs: address bus controls

This field controls what is written to the address bus. It is used by the Instruction Skip circuitry to generate the address bus controls. The following table is a rough draft.

Field	Address source
0	Trap address
1	ALU bus
2	incrementer (normal) or ALU bus (for R15 write)
3	unincremented PC (normal) or ALU bus (for R15 write)
4	ALU bus or PC or incrementer, depending on R15 write and priority encoder
5	ALU bus or PC or incrementer, depending on R15 write and priority encoder
6	incrementer
7	unincremented PC (normal) or ALU bus (for R15 write)

For more details, see Inside the ARMv1 — the Read Bus B, ALU Output Bus, and Address Bus.

Signal numbers: 8320, 8319, 8318

undef: undefined instruction

This signal is generated for an undefined instruction (specifically a coprocessor instruction). It is used by the Trap Control circuitry to generate a fault.

Signal number: 8348

rws: read or write select

This signal controls the RW output; it is 1 for a read and 0 for a write. The Trap Control circuitry gates this (apparently to block writes on an address exception) and the signal then drives the RW pin.

Signal number: 8284

pencen: priority encoder A bus control

This field controls writing of the bit counter output (times 4) to the A bus. It can also set the two low bits, either for the constant 3, or to add 3 to the bit counter output. The constant 3 is used (with borrow) to subtract 4 from R14 during a branch with link, see page 233 of VLSI RISC Architecture and Organization. The modified bit counter output is used to compute the LDM/STM start address.

Field	Bit counter action on A bus
0	None
1	Low bits set (3)
2	Bit count
3	Bit count, low bits set

Signal numbers: 8202, 8201

bws: enable byte/word select

This signal indicates that byte/word should be selected by instruction bit 22, for LDR/STR. This signal is used by the Data Control (field extraction) circuitry.

For more details, see Inside the ARMv1 Read Bus.

Signal number: 8082

dctls: data bus field extraction controls

This field controls which bits of the data bus or instruction are passed to the B bus. This field is used by the Data Control (field extraction) circuitry.

Field	Selected data bus field
0	Select a byte or word depending on bw
1	24 bits (branch offset)
2	12 bits (LDR/STR offset)
3	byte (immediate instr)

For field 0, the byte is specified by controls 8195 and 8194.

For more details, see Inside the ARMv1 Read Bus or pages 296 and 301 of VLSI Risc Architecture and Organization.

Signal numbers: 8105, 8104

Microcode in RISC?

Everyone "knows" that RISC processors don't use microcode.[3] So does the ARM1 have "real microcode"?

One of the ARM1 architects explains microcode: "A microcode address is formed from some or all of the contents of the instruction register, together with some state values which are internal to the micro-control unit. This address is decoded to drive a unique row of a matrix, the columns of which are the control signals for the datapath."[4] This description is a perfect fit for how the ARM1's control works, so it seems reasonable to consider the ARM1 to have microcode.

I think it's easiest to understand the ARM1's control logic by viewing it as microcode. However, there are couple reasons to consider it not "real microcode". One reason is that the ARM1 microcode is only a small part of the chip's control, as you can see in the die photo and floorplan earlier. The control signals are heavily modified by the instruction skip component and conditionals are handled by the conditional unit. This goes beyond vertical microcode, where logic expands the microcode's control signals; in the ARM1, this other circuitry can entirely override the control signals. In addition, the ARM1 uses separate circuitry (the priority encoder) to control the block data transfer instructions; the microcode just sits in a loop. (The ARM2 is similar with multiplication — a separate circuit controls multiplication.)

The ARM1's microcode is an order of magnitude smaller than other microcoded processors. The ARM1's microcode has a 42×36 microcode, for 1512 bits in total. The 8086 used a 504×21 microcode (over 10,000 bits) while the 68000 has a 544×17 microcode and 366×68 nanocode (over 34,000 bits).

Probably the biggest objection to calling the ARM1 microcoded is that the designers of the ARM chip didn't consider it that way.[4] Furber mentions that some commercial RISC processors use microcode, but doesn't apply that term to the ARM1. He describes ARM1's instruction decode as two-level structure. In the first level, the instruction decoder PLA differentiates instructions into classes with similar characteristics. The secondary decoding uses the information from the first level along with hardware to cope with all the possible operations. The first level is described as providing "broad hints" about which functions to choose, and the second level fills in the details with bits from the instruction.

Conclusion

So is the ARM1 microcoded or not? The instruction decoder is clearly made up of microinstructions executed sequentially or with branching. It makes sense to look at this as microcode. But on the other hand, the microcode is fairly simple and forms a small part of the total control circuitry. A large amount of hardcoded logic interprets the microinstruction outputs to generate the control signals. My conclusion is the ARM1 should be called "partially microcoded" or maybe "hybrid microcode / hardwired control".

This article owes a lot to Dave Mugridge's analysis of the ARM1, especially Inside the ARMv1 — instruction decoding and sequencing. Thanks to the Visual 6502 team for the ARM1 simulator and data used in my analysis.

Notes and references

[1] While a typical PLA acts as structured logic gates generating signals (as in the Z-80 or 6502), the ARM1's PLA is different. Exactly one row is active at a time, so the PLA functions more like a ROM. There's a discussion of ROMs as PLAs in section 7.3.2.2 of The Architecture of Microprocessors.

[2] My explanation of the LDR instruction is simplified, since the instruction provides a variety of addressing mechanisms. It also provides byte access as well as 32-bit word access. Full details are here.

[3] IBM's ROMP microprocessor is generally considered RISC, but uses a 256×34 control ROM. Likewise, the Intel i960 is usually considered RISC but uses microcode.

[4] ARM1 designer Furber's book VLSI RISC Architecture and Organization discusses the ARM1 and other RISC chips. Section 1.3.1 has an extensive discussion of microcode. He describes how the ARM1's block move and ARM2's multiplication operations are under the control of a separate hardware unit inside the chip, unlike how a microcoded implementation would operate. Section 4.7 describes the ARM1's control logic.

What's inside a counterfeit Macbook charger? After my Macbook charger teardown, a reader sent me a charger he suspected was counterfeit. From the outside, this charger is almost a perfect match for an Apple charger, but disassembling the charger shows that it is very different on the inside. It has a much simpler design that lacks quality features of the genuine charger, and has major safety defects.

Inside a counterfeit MagSafe 45W charger.

The counterfeit Apple chargers I've seen in the past have usually had external flaws that give them away, but this charger could have fooled me. The exterior text on this charger was correct, no "Designed by Abble" or "Designed by California". It had a metal ground pin, which fakes often exclude. It had the embossed Apple logo on the case. The charger isn't suspiciously lightweight. Since I've written about these errors in fake chargers before, I half wonder if the builders learned from my previous articles. One minor flaw is the serial number sticker (to the right of the ground pin) was a bit crooked and not stuck on well.

This counterfeit MagSafe 45W charger has the same 'Designed by Apple in California' text as the genuine charger. Unlike many fakes, it has a metal ground pin (although it isn't connected internally). To the right of the ground pin, the serial number label is a bit crooked, which is a hint that something isn't right.

The photo below shows the safety certifications that the charger claims to have. Again, it looks genuine, with no typos or ugly fonts.

The counterfeit power supply has all the same safety indications as a real power supply.

One flaw that made the original purchaser suspicious was the quality of the case didn't seem up to Apple standards. It didn't feel quite like his old charger when tapped, and the joints appear slightly asymmetrical, as you can see in the picture below.

The seams in a counterfeit Magsafe power supply are a bit asymmetrical.

A problem showed up when I plugged in the charger and measured the output at the Magsafe connector. I measured 14.75 volts output and got a spark when I shorted the pins. Since the charger is rated at 14.85 volts, this may seem normal, but the behavior of a real charger is different. A Magsafe charger initially produces a low-current output of 3 to 6 volts, so shorting the output should not produce a spark. Only when a microcontroller inside the charger detects that the charger is connected to a laptop does the charger switch to the full output power. (Details are in my Magsafe connector teardown article.) This is a safety feature of the real charger that reduces the risk from a short circuit across the pins. The counterfeit charger, on the other hand, omits the microcontroller circuit and simply outputs the full voltage at all times. This raises the risk of burning out your laptop if you plug the connector in crooked or metallic debris sticks to the magnet.

Inside the charger

Cracking the charger open with a chisel reveals the internal circuitry. A real Apple charger is packed full of complex circuitry, while this charger had a fairly low density board that implemented a simple flyback switching power supply.

A view of the counterfeit MagSafe charger with the case and heat sink removed.

The circuit is a fairly standard flyback power supply. To understand how it works, look at the diagram below, going counterclockwise from the AC input on the right. After going through a fuse, the power is converted to DC by a bridge rectifier. The large filter capacitor smooths out the DC. Next, the switching transistor chops the DC into pulses, which are fed into the flyback transformer. The transformer's low-voltage output is converted back to DC by the output diode. The output filter capacitors smooth the DC output.

The counterfeit Magsafe power supply uses a standard flyback switching power supply circuit. AC enters at the right and is converted to DC. The switching transistor sends pulses into the flyback transformer (center), which produces the low voltage output (left).

A TL431A voltage reference generates a feedback signal from the output, which is fed to the control IC through the optoisolator. While this circuit may seem complex, it's pretty standard for a simple charger. A genuine Macbook charger on the other hand has a much more complex circuit, as I describe in my teardown.

The charger is controlled by a tiny 6-pin IC on the underside of the board. It switches the MOSFET on and off at the proper rate (about 60 kilohertz) to generate the desired output voltage. The control IC is labeled "63G01 415", but I couldn't find any chip that matches that description. (Update: a clever reader identified the chip as the OB2263.)

Closeup of the tiny control IC inside a counterfeit MagSafe 45W power supply.

What's wrong with this charger

The most important feature of a charger is the isolation between the potentially-dangerous AC input and the low-voltage output. High voltage and low voltage should be separated by a safety gap of at least 4mm (to simplify the UL's creepage and clearance rules). On the circuit board below, the high voltage input section is at the bottom and the low voltage output section is at the top. On the right half of the board, the two sections are separated by a large gap, which is good. On the left, there should be a gap (bridged by the optoisolator). Unfortunately, traces and components pass through this area making the gap dangerously small, under 1 mm. Any moisture or loose solder could bridge this gap sending high voltage to the output.

The counterfeit MagSafe charger has a dangerously small distance between the low voltage side (top) and the high voltage side (bottom). This is why you shouldn't buy counterfeit chargers.

I'm puzzled as to why counterfeit chargers never manage to have sufficient clearance distances. They use simple, low-complexity circuits so the circuit board layout should be straightforward. Except in the smallest cube phone chargers, they aren't fighting for every millimeter of space. It shouldn't take much additional effort to make the boards safer.

The second safety flaw is the heat sink that provides cooling for the input-side MOSFET and the output-side diode. The heat sink is basically a giant conductor between the two sides of the circuit, with only small gaps separating it from active parts of the circuit.

As well as having large creepage and clearance distances between high and low voltages, genuine chargers also make extensive use of insulating tape for separation. The counterfeit charger lacks extra insulation, except heat-shrink tubing around the fuse and fusible resistor. I didn't disassemble the transformer, but I expect it also lacks the necessary insulation.

The counterfeit charger has a metal ground pin (unlike other fakes I've seen that have a plastic pin). However, the pin is just for appearance and is not connected to anything.

The photo below compares the underside of the counterfeit 45W charger (left) with a genuine Apple 60W charger (right). As you can see, the counterfeit has a simple circuit board with just a few parts, while the genuine charger is crammed full of parts. The two boards are in totally different worlds of design complexity. The additional parts provide better power quality and improved safety in the real charger; this is part of the reason genuine chargers are significantly more expensive.

Comparison of a counterfeit MagSafe 45W charger (left) and a genuine 60W charger (right). The genuine charger is crammed full of components, while the counterfeit just has a few components.

Quality of the power

I measured the output power from the counterfeit charger with an oscilloscope, while drawing 15 watts. As you can see below, the output power is not smooth, but has pairs of large spikes when the switching transistor turns on and off. The charger operates at a frequency of about 60 kilohertz. More filtering inside the charger reduces these voltage spikes, but would cost more.

The switching power supply operates at about 60 kilohertz, producing large voltage spikes in the output. You can see a spike when the transistor switches on, followed by another spike when it switches off.

The oscilloscope trace below zooms in on one of the spikes. You can see that the spike measures 2.7 volts peak-to-peak, which is a lot of noise to be feeding into your laptop.

The output of the counterfeit charger has large 2.7V noise spikes when a transistor switches internally.

Conclusion

This counterfeit Magsafe charger is convincing from the outside, with more attention to detail than most. Until I opened it up, I wasn't completely sure that it was counterfeit. But on the inside, the difference between the counterfeit and real chargers is clear. The counterfeit has a much simpler circuit that provides poorer-quality power. It also ignores safety requirements with less than a millimeter separating you and your computer from a dangerous shock. While counterfeit chargers are much cheaper, they are also dangerous to you and your computer. Thanks to Richard S. for providing the charger.

I've written a bunch of articles before about chargers, so if this article seems familiar, you're probably thinking of an earlier article, such as: Magsafe charger teardown, iPhone charger teardown or iPad charger teardown.

You can follow me on Twitter and find out about my new articles.

Notes

For those who care about the component details, the MOSFET is a 600V, 7.5A transistor from Fairchild (FQPF8N60C datasheet). The optoisolator is a Kento JC817 (datasheet). The output diode is a NAMC MBRF10100CT 10A 100V Schottky barrier rectifier. I was unable to identify the control IC, which is marked with "63GO1 415". The Y capacitor (blue) is JNC JN472M 250V 4.7nF capacitor.

This article explains how the LMC555 timer chip works, from the tiny transistors and resistors on the silicon chip, to the functional units such as comparators and current mirrors that make it work. The popular 555 timer integrated circuit is said to be the world's best-selling integrated circuit with billions sold since it was designed in 1970 by analog IC wizard Hans Camenzind[1]. The LMC555 is a low-power CMOS version of the 555; instead of the bipolar transistors in the classic 555 (which I described earlier), the CMOS chip is built from low-power MOS transistors. The LMC555 chip can be understood by carefully examining the die photo.

The structure of the integrated circuit

The photo below shows the silicon die of the LMC555 as seen through a microscope, with the main function blocks labeled (photo from Zeptobars). The die is very small, just over 1mm square. The large black circles are connections between the chip and its external pins. A thin layer of metal connects different parts of the chip. This metal is clearly visible in the photo as white lines and regions. The different types of silicon on the chip appear as different colors. Regions of the chip are treated (doped) with impurities to change the electrical properties of the silicon. N-type silicon has an excess of electrons (making it Negative), while P-type silicon lacks electrons (making it Positive). On top of the silicon, polysilicon wiring shows up as other colors. The silicon regions and polysilicon are the building blocks of the chip, forming transistors and resistors, which are connected by the metal layer.

Functional blocks in the LMC555 chip.

A brief explanation of the 555 timer

The 555 chip is extremely versatile with hundreds of applications from a timer or latch to a voltage-controlled oscillator or modulator. To explain the chip, I will use one of the simplest circuits, an oscillator that cycles on and off at a fixed frequency.

The diagram below illustrates the internal operation of the 555 timer used as an oscillator. An external capacitor is repeatedly charged and discharged to produce the oscillation. Inside the 555 chip, three resistors form a divider generating reference voltages of 1/3 and 2/3 of the supply voltage. The external capacitor will charge and discharge between these limits, producing an oscillation, as shown on the left. In more detail, the capacitor will slowly charge (A) through the external resistors until its voltage hits the 2/3 reference. At that point (B), the threshold (upper) comparator switches the flip flop off turning the output off. This turns on the discharge transistor, slowly discharging the capacitor (C) through the resistor. When the voltage on the capacitor hits the 1/3 reference (D), the trigger (lower) comparator turns on, setting the flip flop and the output on, and the cycle repeats. The values of the resistors and capacitor control the timing, from microseconds to hours.

Diagram showing how the 555 timer can operate as an oscillator.

To summarize, the key components inside the 555 timer are the comparators to detect the upper and lower voltage limits, the three-resistor divider to set these limits, the flip flop to keep track of whether the circuit is charging or discharging, and the discharge transistor. The 555 timer has two other pins (reset and control voltage) that I haven't covered above; they are used in more complex circuits.

Transistors inside the IC

Like most integrated circuits, the CMOS 555 timer chip is built from two types of transistors, PMOS and NMOS. In contrast, the classic 555 timer uses the older technology of bipolar transistors (NPN and PNP). CMOS is popular because it uses much less power than bipolar. CMOS transistors be packed into a chip very densely without overheating, which is why CMOS has ruled the microprocessor market since the 1980s. Although the 555 doesn't require many transistors, low power consumption is still an advantage.

The diagram below shows an NMOS transistor in the chip, with a cross section below. Since the transistor is built from overlapping layers, the die photo is a bit tricky to understand, but the cross section should help clarify it. The different colors in the silicon indicate regions that has been doped to form N and P regions. The green rectangle is polysilicon, a layer above the silicon. The whitish rectangle is the metal layer on top. The vias are connections between the layers.

The structure of an NMOS transistor in the LMC5555 CMOS timer chip.

A MOS transistor can be thought of as a switch that connects or disconnects the source and drain, based on the voltage on the gate. The transistor consists of two rectangular strips of silicon that has been doped negative (N), embedded in the underlying P silicon. The gate consists of a layer of conductive polysilicon above and between the drain and source. The gate is separated from the underlying silicon by a very thin layer of insulating oxide. If voltage is applied to the gate, it produces an electric field that changes the properties of the silicon below the gate, allowing current to flow.[2] The photo also shows the metal connection to the source, along with the "vias" that connect the silicon layer to the metal layer through the insulating oxide.[3]

The second type of transistor is PMOS, shown below. PMOS transistors are opposite to NMOS in many ways; they are called complementary, which is the C in CMOS. PMOS uses a source and drain of P-doped silicon embedded in N-doped silicon. The transistor is turned on by a low voltage on the gate (opposite to NMOS), causing current to flow from the source to drain. The metal connections to the source, gate, and drain are visible below, with circular vias to the underlying layers. (Note that the diagram on the right is not a cross section, but a simplified "overhead" view.) In the die photo, NMOS transistors are blue with a green gate, while PMOS transistors are pink with orange gates. These colors are created by interference due to the thickness of the layers, and saturation is enhanced in the photo.

Die photo of a PMOS transistor in the LMC555 timer. A simplified diagram of the transistor is on the right.

The output transistors in the 555 are much larger than the other transistors and have a different structure in order to produce the high-current output. The photo below shows one of the output transistors. Note the zig-zag structure of the gate, between the source (outside) and drain (center). Also note that the metal layer for the drain is narrow on the right and widens as it exits the transistor in order to handle the increasing current.[4]

A large NMOS output transistor in the LMC555 CMOS timer chip.

A variety of symbols are used to represent MOS transistors in schematics; the diagram below shows some of them. In this article, I use the highlighted symbols.

Various symbols used for MOS transistors. Based on Wikipedia.

How resistors are implemented in silicon

Resistors are a key component of analog circuits. Unfortunately, resistors in ICs are large and inaccurate; the resistances can vary by 50% from chip to chip. Thus, analog ICs are designed so only the ratio of resistors matters, not the absolute values, since the ratios remain nearly constant even if the values vary depending on manufacturing conditions.

These resistors form the voltage divider in the CMOS 555 timer.

The photo above shows the resistors that form the voltage divider in the chip. There are six 50k&ohm; resistors, connected in series to form three 100k&ohm; resistors. The resistors are the pale vertical rectangles. At the end of each resistor, a via and P+ silicon well (pink square) connects the resistor to the metal layer, which wires them together. The resistors themselves are probably P-doped silicon.

To reduce current, the CMOS chip uses 100k&ohm; resistors, much larger than the 5k&ohm; resistors in the bipolar 555 timer. Urban legend says that the 555 is named after these three 5K resistors, but according to its designer 555 is just an arbitrary number in the 500 chip series

IC component: The current mirror

Schematic symbols for a current source.

The idea of the current mirror is you start with one known current and then you can "clone" multiple copies of the current with a simple transistor circuit, the current mirror. A common use of a current mirror is to replace resistors. As explained earlier, resistors inside ICs are both inconveniently large and inaccurate. It saves space to use a current mirror instead of a resistor whenever possible. Also, the currents produced by a current mirror are nearly identical, unlike the currents produced by two resistors.

The circuit below shows how a current mirror is implemented with three identical transistors.[5] A reference current passes through the transistor on the right. (In this case, the current is set by the resistor.) Since all the transistors have the same emitter voltage and base voltage, they source the same current, so the currents on the left match the reference current on the right. For more flexibility, you can modify the relative sizes of the transistors in the current mirror and make the copied current larger or smaller than the reference current.[4] The CMOS 555 chip uses a variety of transistor sizes to control the currents in the circuit.

A current mirror formed from PMOS transistors. The left two currents mirror the current on the right, which is controlled by the resistor.

The diagram below shows one of the current mirrors in the LMC555 chip, formed from two transistors. Each transistor is actually two transistors in parallel, which is a common trick in the chip, so there are physically two pairs of transistors. It's a bit tricky to see the transistors because the metal layer partially covers them, but hopefully the description will make sense. Starting at the top, the first transistor is formed from the wide rectangles for source, gate 1, and drain 1. Note the vias connecting the metal layer to the source. The next transistor shares drain 1, with the second gate 1 and source below. Since these two transistors share the drain, and the sources and gates are wired the same, the two transistors effectively form one larger transistor. Likewise, there are two transistors below in parallel: source, gate 2, drain 2, and then drain2, gate2, source.

Two pairs of PMOS transistors in the LMC555 chip form a current mirror.

The schematic on the right shows how the transistors are wired together as a current mirror. If you look at the photo carefully, you can see that a single polysilicon strip snakes back and forth to form all the gates, so the gates are connected together. On the right, the upper metal strip connects drain 1 and the gates to the rest of the circuit. The lower metal strip is connected to drain 2.

IC component: The differential pair

The second important circuit to understand is the differential pair, the most common two-transistor subcircuit used in analog ICs.[6] You may have wondered how a comparator compares two voltages, or an op amp subtracts two voltages. This is the job of the differential pair.

The schematic above shows a simple differential pair. The current source at the top provides a fixed current I, which is split between the two input transistors. If the input voltages are equal, the current will be split equally into the two branches (I1 and I2). If one of the input voltages is a bit higher than the other, the corresponding transistor will conduct more current, so one branch gets more current and the other branch gets less. A small input difference is enough to direct most of the current into the "winning" branch, flipping the comparator on or off. Rather than resistors, the chip uses a current mirror on the two branches. This acts as an active load and increases the amplification.

Inverters and the flip flop

Although the 555 is an analog circuit, it contains a digital flip flop to remember its state. The flip flop is built out of inverters, simple logic circuits that turn a 1 into a 0 and vice versa. The 555 uses standard CMOS inverters, as shown below.

Structure of a CMOS inverter: a PMOS transistor at top and a NMOS transistor at bottom.

The inverter is built from two transistors. If the input is 0 (i.e. low), the PMOS transistor on top turns on, connecting the positive supply to the output, producing a 1. If the input is 1 (high), the NMOS transistor on the bottom turns on, connecting ground to the output, producing a 0. The magical part of CMOS is that the circuit uses almost no power. Current doesn't flow through the gate (because of the insulating oxide layer), so the only power usage is a tiny pulse when the output changes state, to charge or discharge the wire's capacitance.[7]

The diagram below shows the circuit for the flip flop. Two inverters are connected in a loop to form a latch. If the top inverter outputs 1, the bottom outputs 0, forming a stable cycle. If the top inverter outputs 0, the bottom outputs 1, again forming a stable cycle.

Circuit diagram of the flip flop in the LMC555 CMOS timer chip.

To change the value stored in the flip flop, the new value is simply forced into the latch, overriding the existing value with brute force. To make this work, the bottom inverter is "weak", using low-current transistors. This allows the set or reset inputs to overpower the weak inverter and the latch will immediately flip into the proper state The R (reset) and S (set) inputs come from the comparators and pull the latch input high or low through the transistors. Reset comes from the input pin and pulls the latch input high through a diode; the Reset inverter's output current is controlled by a current mirror. Reset will pull S low, blocking the action of a contradictory S input.

The 555 schematic interactive explorer

The 555 die photo and schematic below are interactive. Click on a component in the die or schematic, and a brief explanation of the component will be displayed. (For a thorough discussion of how the 555 timer works, see 555 Principles of Operation.)

For a quick overview, the large output transistors and discharge transistor are distinguishable by their zig-zag gate pattern. The current mirror transistors are generally large. The threshold comparator consists of Q1 through Q5. The trigger comparator consists of Q13 through Q18. Q19 through Q29 form the flip flop circuit. The voltage divider resistors are in the upper center of the chip.[8]

Click the die or schematic for details...

I created the above schematic by reverse-engineering the chip, so I don't guarantee full correctness. A PDF of my schematic is here and a differently-formatted version is here. The schematic of a different CMOS 555 is here, and it's interesting to compare the differences. While the comparators are the same, the current mirrors are built differently, and the flip flop circuit is very different.

CMOS 555 compared with traditional bipolar 555

The regular 555 timer was designed in 1970, while a CMOS version (the ICM7555) wasn't released until 1978. The LMC555 described in this article came out around 1988, while the die itself has a date of 1996.

The image below compares the classic 555 timer (left) with the CMOS LMC555 (right), both to the same scale. While the bipolar chip is constructed from silicon connected by a metal layer, the CMOS chip has an additional interconnect layer of polysilicon, which makes the chip more complex to understand visually. The CMOS chip is smaller. In addition, the CMOS chip has a lot of wasted space in the bottom and upper right, so it could have been made even smaller. The CMOS transistors are much more complex than the bipolar transistors. Except for the output transistors, the bipolar transistors are all simple individual units. Most of the CMOS transistors in comparison are built from two or more transistors in parallel. The classic 555 uses many more resistors than the CMOS 555; 16 versus 4.

Die photos of the 555 timer (left) and CMOS 555 timer (right), to the same scale.

You can see from the photo that the features are smaller in the CMOS chip. The smallest lines in the regular 555 are 10-15µm, while the CMOS chip has 6µm features. Advanced chips in 1996 used the 350nm process (about 17 times smaller), so the LMC555 was nowhere near the cutting edge of CMOS technology.

Comparing these chips illustrates the power consumption benefits of CMOS. The standard 555 timer typically uses 3 mA of current, while this CMOS version only uses 100µA (and other versions use below 5µA). An input to the 555 can draw .5µA, while an input to the CMOS version uses an incredibly small 10pA, more than four orders of magnitude smaller. The smaller input "leakage" currents permit much longer delays with the CMOS chips.[9]

Conclusion

At first, a chip die photo seems too complex to understand. But a careful look at the die of the LMC555 CMOS timer chip reveals the components that make up the circuit. One can pick out the PMOS and NMOS transistors, see how they are combined into circuits, and understand how the chip operates. Because the CMOS chip has a layer of polysilicon that isn't present in the classic bipolar 555 chip, it takes more effort to understand the CMOS chip. But fundamentally, both chips use similar analog functional blocks: the current mirror and the differential pair.

If you've found this look at the CMOS version of the 555 chip interesting, you should also look at my teardown of the classic 555 chip. Thanks to Zeptobars for the die photo of the CMOS chip.

Get announcements of my new articles by following @kenshirriff on Twitter.

Notes and references

[1] The book Designing Analog Chips written by the 555's inventor Hans Camenzind is really interesting, and I recommend it if you want to know how analog chips work. Chapter 11 has an extensive discussion of the 555's history and operation. Page 11-3 claims the 555 has been the best-selling IC every year, although I don't know if that is still true — microcontrollers have replaced timers in many circuits. The free PDF is here or get the book.

[2] The structure of a MOSFET transistor explains several things about it. The transistor is called a "field-effect transistor" (FET) because it is controlled by the electric field on the gate. Because the gate is separated by an insulating oxide layer, there is essentially no current flow through the gate. This is why CMOS circuits have such low power consumption. The thin oxide layer, however, can easily be damaged or destroyed by static electricity, which is why MOS integrated circuits are sensitive to static electricity.

[3] For simplicity, the cross-section diagram doesn't show the highly-doped P region (pink) that provides a connection to the underlying P body silicon, keeping it at the right voltage. (A via between the metal layer and pink silicon region is visible at the top of the diagram.) MOS transistors typically connect the source and body silicon together; the source and drain are otherwise structurally the same. I should also mention that the cross-section is simplified; in a real chip, the layers are more irregular.

MOS transistors originally used metal for the gate so they were named MOS after the three layers: Metal, Oxide, and Semiconductor (silicon). Although polysilicon gates replaced metal gates since the 1970s, the name remains MOS even though POS would be more accurate. Federico Faggin (a developer of the 4004 and Z-80 processors) explains how silicon gate technology revolutionized chips here.

[4] The structure of the transistor controls how much current flows through it. In particular, the current is proportional to the ratio of the gate's width and length (W/L). It's straightforward to see that doubling the width of the gate is similar to putting two transistors side-by-side in parallel, allowing twice the current. Doubling the length of the gate (so the current needs to travel twice as far through the gate) cuts the current in half due to physics reasons.

Two NMOS transistors in the LMC555 chip's flip flop. The left transistor is typical. The right transistor is a weak transistor with current flowing top to bottom.

In the CMOS 555 chip, transistors have a wide variety of W/L ratios, especially to control the currents in different branches of the current mirrors. Some of the weak transistors are hard to spot, such as the above weak transistor from the flip flop. The transistor on the left has a W/L ratio of about 7. The transistor on the right looks almost identical but careful examination shows it is actually rotated 90 degrees with the source and drain arranged vertically rather than horizontally. The W/L ratio of the transistor on the right is only about 0.17, making the transistor about 40 times weaker than the one one the left. In other words, the transistor on the left has a wide, short gate while the transistor on the right has a narrow, long gate.

[5] For more information about current mirrors, check wikipedia, any analog IC book, or chapter 3 of Designing Analog Chips.

[6] Differential pairs are also called long-tailed pairs. According to Analysis and Design of Analog Integrated Circuits the differential pair is "perhaps the most widely used two-transistor subcircuits in monolithic analog circuits." (p214) For more information about differential pairs, see wikipedia, any analog IC book, or chapter 4 of Designing Analog Chips.

[7] Because CMOS only uses power when circuits change state, power consumption is roughly proportional to frequency. This is the main limitation for CPU clock frequency: the chip will overheat if it is clocked too fast.

[8] Note that the three resistors for the voltage divider are parallel and next to each other. This helps ensure they have the same resistance even if there are electrical variations across the silicon.

[9] If you want a 555 timer that provides a long delay up to days, the CSS555 is an unusual option. This chip is pin-compatible with the 555, but internally it includes a programmable counter that can divide the output up to 1 million. The chip contains a one-byte EEPROM to hold the configuration and is programmed serially via the trigger and reset pins. Once programmed, it acts just like a regular 555, except with a very long delay.

Punched card sorters were a key part of data processing from 1890 until the 1970s, used for accounting, inventory, payroll and many other tasks. This article looks inside sorters, showing the fascinating electromechanical and vacuum tube circuits used for data processing in the pre-computer era and beyond.

Herman Hollerith invented punch-card data processing for the 1890 US census.[1] Businesses soon took advantage of punched cards for data processing, using what was called unit record equipment. Each punched card held one data record, consisting of multiple data fields. A card sorter sorted the cards into the desired order. Then a machine called a tabulator read the cards, added up desired fields and printed a report.

For example, a company could have one card for each invoice it needs to pay, as shown below, with fields for the vendor number, date, amount to pay, and so forth. The card sorter ordered the cards by vendor number. Then the tabulator generated a report by reading each card and printing a line for each card. Mechanical counters in the tabulator summed up the amounts, computing the total amount payable. Many other business tasks such as payroll, inventory and billing used punched cards in a similar manner.

Example of a punched card holding a 'unit record', and a report generated from these cards. From Functional Wiring Principles.

The surprising thing about unit record equipment is that it originally was entirely electro-mechanical, not even using vacuum tubes. This equipment was built from components such as wire brushes to read the holes in punched cards, electro-mechanical relays to control the circuits, and mechanical wheels to add values. Even though these systems were technologically primitive, they revolutionized business data processing and paved the way for electronic business computers such as the IBM 1401.

How a sorter works

A card sorter takes punched cards and sorts them into order based on a field, for example employee number, date, or department. One application is putting records in the desired order when printing out a report.[2] Another application is grouping record by a field, for instance to generate a report of sales by department: the cards are first sorted based on the department field, and then a tabulator sums up the sales field, printing the subtotal for each department.

To sort punched cards, they are loaded into the card hopper and fed through the sorter. Cards are read and directed into one of the 13 card pockets: 0 through 9, two "zone" pockets, and a Reject pocket. This is very different from a typical sort algorithm — cards aren't compared with each other — so you may wonder how this machine sorts its input.

IBM Type 80 Card Sorter.

Card sorting uses a clever technique called radix sort. The sorter operates on one digit of the field at a time, so to sort on a 3-digit field, cards are run through the sorter three times. First, the sorter deposits the cards into ten bins (0-9) based on the lowest digit of the field. The operator gathers up the cards from the bins in order (0 bin first and 9 bin last) and they are sorted again on the second-lowest digit, again getting stacked in bins 0-9. The important thing is that the cards in each bin will still be ordered from the first pass: bin 0 will have cards ending in 00 first, and cards ending in 09 last. The operator gathers up the cards in order again, yielding a stack that is now sorted according to the last two digits. The cards are run through the sorter a third time, this time sorting on the third-lowest digit. After the last run through the sorter, the cards are in order, sorted on the entire field.

The radix sort process is fast and simple. You may be familiar with comparison-based sorting algorithms like quicksort that compare and shuffle entries, taking O(n log n) time. Radix sort can be implemented with a simple electric mechanism (along with an operator busily moving stacks of cards around), and takes linear time.[3] Although the sorter's hopper can hold 3600 cards, it can sort as many cards as desired, as long as the operator keeps loading and unloading them.

The sorting mechanism

You might expect a sorter to haves multiple sensors to read the holes from a card and 10 flippers to direct the card into the right bin. But the actual implementation of the early sorters is amazingly simple and clever, using a single sensor and a single electromagnet.

An IBM punched card, showing the encoding of digits and letters.

The photo above shows the layout of a standard IBM punched card, which stores 80 characters in 80 columns. The characters are printed along the top of the card and the corresponding holes are punched below. For a digit, each column has a single punch in row 0 through 9 to indicate the digit in that column. (I'll explain the two additional "zone" rows for alphabetic characters later.)

The diagram below shows how the card sorter works. Cards are fed through the sorter "sideways" starting with the bottom edge (called the "9-edge" because the bottom row is row 9). A small wire brush (red) detects the presence or absence of a hole; the brush will contact the rows in order from 9 to 0. An intact card blocks the wire brush from contacting the metal roller. But if there is a hole in the card, the brush makes contact with the roller through the hole, completing an electrical circuit.

Card sorting mechanism in the IBM Type 80 and Type 82 card sorter.

A stack of metal guides (called chute blades) are used to direct the card into the appropriate bin. As a card is fed through the sorter mechanism, it slides under the chute blades as shown in the top illustration. If the brush (red) makes contact through a hole, it trips an electromagnet (purple) that pulls down a metal armature plate (green), allowing the ends of the chute blades to drop down. This causes the card to go above the chute blade rather than underneath it. The key is the chute blades have the same spacing as the rows on the card so the hole is detected just before the card reaches the corresponding blade. (If no hole is detected, the card passes under all the chute blades and into the Reject bin.)

For example, in the diagram above the card has slid under chute blades 9 through 5. The brush makes contact through hole 4, energizing the electromagnet and causing the blades to drop just before the card reaches blade 4. Thus, the card is directed into chute 4.

The chute blades can be seen in the photo below; they are the metal strips running down the center of the sorter between the feed rollers. Each chute blade ends at the appropriate pocket, causing the card to drop into the right location.

IBM Type 82 Card Sorter. The feed rollers under the glass top send cards through the sorter. The pockets at bottom collect the cards. This is a German model, thus the 'Sorteirmaschine' label.

Alphabetic sorting

Numeric values have one hole in a column and are straightforward to sort, but how about alphabetic characters? In addition to the ten numeric rows 0-9, punched cards also have two additional "zone" rows (11 and 12). The diagram below shows the encoding; a letter combines a digit punch (1-9) with a zone punch (a hole in 0, 11 or 12). Confusingly, row 0 is used both as a zone and a digit.

The IBM punched card code, from IBM 82, 83, and 84 Sorters Reference Manual.

With this encoding, a sorter can perform an alphabetical sort in two passes. The first pass sorts on the numeric rows, putting cards into bins 1 through 9. These bins are gathered up in order and the cards are sorted a second time. For the second sort, the zone rows (0, 11 and 12) are read and the digit rows are ignored. The result is A through I sorted in bin 12, J through R in bin 11, and S through Z in bin 0. For multiple-character fields, the process is repeated for each column.

Control switches on the sorter select a numeric or zone sort. The photos below show these controls on the Type 80 (top) and 83 (bottom) sorters. The Type 80 sorter has a round commutator with tabs that are moved in or out to select which rows to use; the red tab selects a zone sort. The Type 83 sorter has pushbuttons to select rows, as well as a switch to select different types of sorting (Numeric, Zone, or Alpha).

Sorter controls on the Type 80 (top) and Type 83 (bottom) sorters.

A brief history of IBM's horizontal sorters

Type 80 sorter

In 1925, IBM introduced its first horizontal card sorter, the Type 80.[4] This sorter became very popular with 10,200 units in use by 1943. IBM continued to support this card sorter until 1980, a remarkable lifespan of 55 years.

IBM Type 80 punched card sorter.

The Type 80 sorter performed useful data processing with electromechanical technology without the benefits of transistors or even vacuum tubes. The Type 80 sorter used a relay to latch the electromagnet on for the duration of the card; this is the extent of its "intelligence".[5]

Even though it was electrically simple, the sorter was a piece of precision machinery. It sorted 450 cards per minute, so the chute blades must pop down and up more than 7 times per second. Any timing error could result in a mis-sorted card or could cause the blade to nick the edge of the card.

Type 82 sorter

IBM's next sorter model was the Type 82, able to sort 650 cards per minute, and renting for 55 dollars per month. At the faster speed, an electromechanical relay wasn't fast enough to control the magnet, so vacuum tubes were used.

IBM Type 82 punched card sorter.

Type 83 sorter

The next sorter model, the Type 83, was introduced in 1955. It could sort 1000 cards per minute and rented for 110 dollars per month. This sorter used a much more advanced technique for processing cards: instead of selecting the card chute at the instant a hole was detected, the 83 sorter read all the holes in the column before selecting a card chute. This allowed the Type 83 sorter to perform tasks that were impossible with the previous sorters, such as rejecting erroneous cards that had multiple holes in one column.

IBM Type 83 card sorter.

Type 84 sorter

IBM's most advanced sorter was the Type 84, introduced in 1959 and produced until 1978. This sorter replaced the wire brush with a photoelectric sensor and used solid state technology. A vacuum feed grabbed cards more effectively. With these improvements, it could process 2000 cards per minute, over 30 cards per second flying through the sorter.

IBM Type 84 card sorter. Photo courtesy of Computer History Museum.

Sorters and IBM's industrial design

As you may have noticed from the photos above, IBM's industrial design changed drastically from the early sorters.[6] The Type 80 sorter is an example of IBM's early hardware, built of cast iron in a "Queen Anne" style with curved cabriole legs. The mechanisms and motor of the Type 80 sorter are visible. By the time of the Type 82 sorter, IBM was using industrial design firms and had an "understated Art Deco aesthetic". Note the curved, sleek enclosure of the Type 82 sorter, and its shiny horizontal metal trim. The Type 83 and Type 84 sorters are more boxy, without the decorative trim, moving closer to the dramatic modernist style of IBM's computers of the 1960s.

The technology inside the sorter

This section looks inside the Type 83 sorter and describes how it was implemented using tube and relay technology. Unlike earlier sorters, the Type 83 sorter read the entire column before selecting the bin for the card. This permitted more complex processing, such as detecting erroneous cards with multiple punches. The sorter used 12 vacuum tubes to store the holes in the column as they were read. Electromechanical relays implemented the decision logic to select the bin, and then solenoids activated the chute blade for that bin.

Removing the panel from the end of the sorter shows most of the mechanism (below). At the top is the feed hopper where cards are fed into the sorter. On the right, a pulley connects the feed mechanism to the motor. Mechanical cams (behind clear plastic) are also driven by the motor. Below the power switch and fuses, the 12 vacuum tubes are barely visible. Two rows of rectangular relays provide the control logic for the sorter. Behind the relay panel is the power supply for the sorter.

Inside the IBM type 83 card sorter. At top is the card feed. The cams are behind clear plastic.

There is no clock for the sorter; all timing is relative to the position of the driveshaft, with one 360° rotation corresponding to one clock cycle. Sixteen cams (behind plastic near the top of the sorter) open and close switches at various points in the cycle to provide electrical signals at the right times.

The photo below shows the brush and the chute blade selection solenoids. On the right, you can see the pointer that indicates the selected column. The brush itself is below the pointer. In the middle are the 12 oblong coils that select the bin. These coils push the selected chute blades down (using the levers at the front), allowing the card to pass between the selected blades.

Brush and sort mechanism in the IBM type 83 card sorter.

The card is read by a brush that makes electrical contact through a hole in the card. The brush is positioned to the proper column by manually turning a knob that rotates the worm screw and moves the brush. As you can see in the photo below, the small brush contacts the metal contact roll.

Brush mechanism in IBM Type 83 card sorter.

The photo below shows the drive rollers that feed cards through the sorter, dropping them into the appropriate bins, as directed by the chute blades. The chute blades are barely visible; they are the inch-wide metal strip on the right. The chute blades are stacked together, with just enough room for a card to pass between them.

Feed rollers and bins for the IBM type 83 card sorter. Cards enter at the far end. The chute blades are the inch-wide strip of metal to the right of the feed rolls.

In order to read a column before selecting a chute, the sorter needed a storage mechanism to remember the 12 hole values. This mechanism is an interesting combination of mechanical switches, vacuum tubes and relays.

Type 2D21 thyratron tubes in the IBM Type 83 card sorter. Each tube stores the presence of one hole.

Each bit of storage used a 2D21 thyratron tube. This interesting tube is about 2 inches tall. Unlike a regular vacuum tube, it contains low-pressure xenon. If the tube is activated (via its two control grids), the xenon ionizes, causing the tube to remain on until current through it is interrupted. Thus, the tube can be used for storage. Each tube is in a pull-out module that has the necessary resistors at the bottom.

As each card row passes under the brush, the corresponding thyratron is selected. Rotating cams attached to the driveshaft mechanically activate switches at the right point in the cycle to select each thyratron.[7] It seems strange to combine high-speed tubes with mechanically operated switches, but cam-based timing was common in that era. Once the column has been read into the thyratron tubes, the hole pattern is transferred to relays for "processing".

Relay logic

Unlike the older sorters, the Type 83 sorter reads the entire column before selecting a bin. This lets it, for instance, reject erroneous cards with multiple punches in one column. How does it detect multiple punches? Instead of using logic gates built from tubes or transistors, it uses a network of relays. This section describes how relay logic works.

IBM relay (permissive make type).

A relay (shown above) contains an electromagnet coil that moves contacts, switching circuits on or off like a toggle switch. In a typical relay, the circuit connects to the "normally closed" pin when the relay is inactive, and connects to the "normally opened" pin when the relay is active. A relay may have multiple sets of these contacts. The diagram below shows how a relay appears on IBM schematics. On the left is the electromagnet coil, and on the right is one set of contacts. The diagram shows the inactive state, with the center wire touching the bottom contact. When the relay is energized, the center wire moves and touches the top contact, switching the circuit.[8]

Symbol for a relay: relay number 9 and contact set 2.

The diagram below shows the relay circuit in the sorter that counts the holes and determines if zero, one, or more holes are present. With no holes (top), current flows along the bottom path. A single hole (middle) energizes a relay (#7 in this case), transferring current to the middle path. The next hole (bottom) energizes a second relay (#5 in this case), transferring current to the top path. Thus, this chain of relays determines the number of holes present, and erroneous cards can be rejected.

Relay network in the IBM Type 83 card sorter. This circuit determines if the card has 0, 1, or more holes.

A more complex relay circuit was the optional faster alphabetic sorting feature available on the Type 83 sorter. For an additional $15 a month rental fee, customers could sort the most common letters in one pass, saving time while sorting. This circuit used several large relays, each with a dozen sets of contacts (an unusually large number). These relays decoded the hole pattern to determine the specific character and then selected the appropriate bin. The diagram below shows a small part of the circuit; click for the full diagram.

Detail from relay network for enhanced alphabetic sorting in the IBM Type 83 card sorter.

The photo below shows the wiring on the back of the relay panel. The wiring in the sorter is all point-to-point wiring, rather than printed circuit boards. Note that the wires are carefully laced into neat bundles.

Wiring inside the IBM type 83 card sorter. This is the back of the relay panel.

The power supply

When the Type 80 sorter was introduced, standard AC power hadn't fully taken over and parts of the United States used DC or 25 Hertz AC.[9] Thus, the sorter needed to handle fifteen different line inputs including unusual ones such as 115V DC or 230V 25 Hertz AC. Internally, the sorter circuits used 115V DC, a rather high voltage for "logic" circuits. If the line voltage was AC, the power supply used a transformer and selenium rectifiers (an early form of diode build from stacks of selenium disks) to produce DC. The Type 81 power supply was considerably more complicated since its vacuum tubes required -40V DC. To create this voltage, the power supply used a vacuum tube oscillator, another transformer and vacuum tube diodes.

Power supply for the IBM Type 83 card sorter. Filter capacitors are at top. The power transformer is on the left. Selenium rectifiers (left and right) are built from stacks of selenium disks.

By the time the Type 83 sorter was introduced, AC line power was almost universal, so a transformer could replace the oscillator power supply. The picture above shows the power supply in a Type 83 sorter, showing the large power transformer (left), capacitors (orange cylinders), and selenium rectifiers (gray finned objects at lower left and right). Needless to say, modern switching power supplies are much more compact and efficient than the early power supplies used in the sorters.

Conclusion

IBM Type 82 punched card sorter. From IBM Card Equipment Summary, 1957.

Before computers existed, businesses carried out data processing tasks by using punched cards and electromechanical equipment such as the card sorter. Card sorters remained useful in the computer era and were still used until punched cards finally died out. Sorters used a variety of interesting technologies from mechanical brushes and cams to relay logic and thyristor tubes. Even though punched cards are now obsolete, their influence is visible whenever you use 80-column text.[5]

The Computer History Museum in Mountain View demonstrates a working card sorter weekly, so stop by if you're in the area. Thanks to the IBM 1401 restoration team and the Computer History Museum for access to the sorters.

If you're interested in vintage computing, you should follow me on Twitter.

Notes and references

[1] Herman Hollerith is one of the key inventors of the data processing industry. He founded a company that, after various mergers, became IBM in 1924. Hollerith's 1889 patent 395,782 (Art of Compiling Statistics) describes how to record data on punched cards and then generate statistics from those cards. Hollerith also gave his name to the Hollerith constants used for character data in old FORTRAN programs.

[2] Using a sorter to order cards for a report is roughly analogous to a database ORDER BY operation. Sorting cards so subtotals can be computed is analogous to a GROUP BY operation.

[3] Strictly speaming, radix sort on n records tames O(m*n) time if the field is m characters wide. But since punched cards limit m to 80 columns, m can be considered a constant factor, maming radix sort linear.

[4] The Type 80 card sorter was invented by Eugene Ford in 1925 and received patent 1,684,389 (Card feeding and handling device). The card sorter has many interesting features so it's a bit surprising that the patent covers just the "picker" that feeds cards through the sorter one at a time. The drawing below is from the patent, and can be compared with the photo of the sorter.

IBM card sorter, from patent 1,684,389 (Card feeding and handling device), 1928.

You might wonder how the Type 80 card sorter was introduced in 1925 when the modern punched card was developed a few years later in 1928. The first Type 80 sorters worked with 45-column cards and were slightly modified in 1928 to support 80-column cards. The changes were minor since the cards remained the same size; the brush mechanism needed to have 80 stops instead of 45.

[5] For detailed information on the sorters (including wiring diagrams) see the Reference Manual and the IBM Customer Engineering Manual.

[6] The industrial design section is based on The Interface: IBM and the Transformation of Corporate Design. This book gives a detailed history and analysis of IBM's industrial design.

[7] A primitive but complex mechanism is used to select one thyratron tube as each row is read. Although the 12 thyratrons are physically installed in a line, they are electrically wired in a 3x4 grid. Four mechanical cams select a grid row; one cam is activated at a time. You'd expect three cams to select a grid column, but there are six. The problem is a single mechanical cam can't turn the switch on and off fast enough. The solution is to use two cams in series with staggered operation. The first cam closes the circuit to select the thyratron, while the second cam opens a short time later to de-select the thyratron. By using two cams and two switches, each switch has more time to open and close. As a card is read, the cams open and close, selecting each thyratron in sequence to hold the value (hole or no hole) for that card position. After the card column has been read into the thyratrons, the hole pattern is transferred to 12 relays and the thyratrons are reset for the next card.

[8] IBM's relays are discussed in detail in Commutation and Control, IBM Relays Reference Manual and IBM Relays Customer Engineering.

[9] The story of why parts of the US used 25 Hertz power instead of the standard 60 Hertz is interesting. Hydroelectric power was developed at Niagara Falls starting in 1886. To transmit power to Buffalo, Edison advocated DC, while Westinghouse pushed for polyphase AC. The plan in 1891 was to use DC for local distribution and (incredibly) compressed air to transmit power 20 miles to Buffalo, NY. By 1893, the power company decided to use AC, but used 25 Hertz due to the mechanical design of the turbines and various compromises. In 1919, more than two thirds of power generation in New York was 25 Hertz and it wasn't until as late as 1952 that Buffalo used more 60 Hertz power than 25 Hertz power. The last 25 Hertz generator at Niagara Falls was shut down in 2006. See 25-Hz at Niagara Falls, IEEE Power and Energy Magazine, Jan/Feb 2008 for details.

Alan Kay recently gave his 1970's Xerox Alto to Y Combinator and I'm helping with the restoration of this legendary system. The Alto was the first computer designed around a graphical user interface and introduced Ethernet and the laser printer[1] to the world. The Alto also was one of the first object-oriented systems, supporting the Mesa and Smalltalk languages. The Alto was truly revolutionary when it came out in 1973, designed by computer pioneer Chuck Thacker.

Xerox built about 2000 Altos for use in Xerox, universities and research labs, but the Alto was never sold as a product. Xerox used the ideas from the Alto in the Xerox Star, which was expensive and only moderately successful. The biggest impact of the Alto was in 1979 when Steve Jobs famously toured Xerox and saw the Alto and other machines. When Jobs saw the advanced graphics of the Alto and Star, he was inspired to base the user interfaces of the Lisa and Macintosh systems on Xerox's ideas, making the GUI available to the mass market.[2]

How did Y Combinator end up with a Xerox Alto? Sam Altman, president of Y Combinator has a strong interest in the Alto and its place in computer history. When he mentioned to Alan Kay that it would be fun to see one running, Alan gave him one.

This article gives an overview of the Alto, its impact, and how it was implemented. Later articles will discuss the restoration process as we fix components that have broken over the decades and get the system running.

The Xerox Alto II XM computer. Note the video screen is arranged in portrait mode. Next to the keyboard is a mouse. The Diablo disk drive is below the keyboard. The base contains the circuit boards and power supplies.

The photo above shows Y Combinator's Alto computer. The Alto has an unusual portrait-format display, intended to match an 8½" by 11" page of paper. Most displays of the time were character-oriented, but the Alto had a bitmapped display, with each of the 606x808 pixels controllable independently. This provided unprecedented flexibility for the display and allowed WYSIWYG (what-you-see-is-what-you-get) editing. The bitmapped display memory used almost half the memory of the original Alto, however

In front of the keyboard is the three-button mouse. Xerox made the mouse a fundamental input device for the Alto and designed the user interface around the mouse. The disk drive at the top of the cabinet takes a removable 2.5 megabyte disk cartridge. The small capacity of the disk was a problem for users, but files could also be accessed over the Ethernet from file servers. The lower part of the cabinet contains the computer's circuit boards and power supplies, which will be discussed below.

Dynabook and the vision of the Alto

The motivation for the Alto was Alan Kay's Dynabook project. In 1972, he wrote A Personal Computer for Children of All Ages, setting out his vision for a personal, portable computer for education (or business), with access to the world's knowledge. In effect, Alan Kay presented a detailed vision for the touchscreen tablet decades before it was practical.[3]

Alan Kay with a mockup of the Dynabook. Although the Dynabook was proposed years before the necessary hardware was available, the ideas could be tried out on the Alto, an "interim Dynabook". Photo by Marcin Wichary, CC BY 2.0.

The Dynabook (seen in mockup above) was to be a low-cost, battery-powered, portable computer with a touchscreen and graphics, able to access information over the network. The system would be highly interactive and programmable in an object-oriented language. As well as the keyboard, voice input could be used. Books could be downloaded and purchased.

Since the necessary hardware was science fiction when the Dynabook was proposed, the Alto was built as an "interim Dynabook" for research. Butler Lampson's 1972 memo entitled "Why Alto" proposed using the Alto for research in distributed computing, office computing, graphics, and personal computing. He stated, "If our theories about the utility of cheap, powerful personal computers are correct, we should be able to demonstrate them convincingly on Alto." Xerox used the Alto to research and develop the ideas of personal computing.[4]

Software

The Alto had a large collection of software, largely implemented in the BCPL (predecessor to C), Mesa and Smalltalk languages. The Bravo text editor (seen below) is considered the first WYSIWYG editor, with formatted text on the screen matching the laser printer output. Also below is the Draw illustration program which used the mouse and an icon menu to create drawings. Other significant programs included email, file transfer (FTP) and an integrated circuit editor. The Alto also ran some of the first networked multiplayer games such as Alto Trek and Maze War.

Bravo was the word processor for the Xerox Alto, providing WYSIWYG text editing. The Draw program for the Xerox Alto uses the mouse and icons for drawing. The Alto simulator Salto was used for these images.

Hardware

The Alto was introduced in 1973. To understand this time in computer hardware, the primitive 4004 microprocessor had been introduced a couple years earlier. Practical microprocessors such as the 6502 and Z-80 were still a couple years in the future and the Apple II wouldn't be released until 1977. At the time, minicomputers such as the Data General Nova and PDP-11 built processors out of hundreds of simple but fast TTL integrated circuits, rather than using slow, unreliable MOS chips. The Alto was built similarly, and is a minicomputer, not a microcomputer.[5]

The Alto has 13 circuit boards, crammed full of chips. Each board is a bit smaller than a page of paper, about 7-5/16" by 10", and holds roughly 100 chips (depending on the board). For the most part, the chips are bipolar TTL chips in the popular 7400 series. (The MOS memory chips are an exception.) The image below shows the Alto's card rack and some of the boards.

The Xerox Alto contains 21 slots for circuit boards. Each board is crammed with chips, mostly TTL.

The Alto's CPU consists of three boards. The Control board is the heart of the processor: it manages the 16 microcode tasks and contains microcode in PROM. The ALU board performs arithmetic and logic operations, and provides the main register storage. The Control RAM board provides additional microcode storage in RAM and additional processor registers. (Note that in a few years, a single chip microprocessor could replace these three boards.)

The photo below shows the ALU board. The 16-bit addition, subtraction and Boolean operations are performed by four of the popular 74181 ALU chip, used in many other processors of the era. Each 16-bit register requires multiple chips for storage. The 32x16 register file is historically interesting as it is built from i3101 64-bit bipolar memory chips, Intel's first-ever product.

The ALU board from the Xerox Alto.

The Alto came out at a time when memory was expensive and somewhat unreliable.[6] In 1970, Intel introduced the first commercially available DRAM memory, the 1103 chip, holding 1 kilobit of storage and making magnetic core memory obsolete. The original Alto used 16 boards crammed full of these chips to provide 128 kilobytes of memory. The Alto we have is a more modern Alto II XM (eXtended Memory) with 512 kilobytes of storage on four boards. Even so, the limited memory capacity was a difficulty for programmers and users. The photo below shows one of the memory board, packed with denser 16 kilobit chips.

A 128KB memory card from the Xerox Alto. It uses eighty 4116 memory chips, each with 16 kilobits of storage.

Microcode

The Alto hardware provides a simple micro-instruction set and uses microcode to implement a full instruction set on top of this,[7] including some very complex instructions. The Alto introduced the BITBLT graphics instruction, which draws an arbitrary bitmap to the display in a variety of ways (e.g. paint over, gray stipple or XOR). "Blitting" became a standard graphics operation, still be found in Windows.

The Alto takes microcode further than most computers, implementing many functions in microcode that most computers implement in hardware, making the Alto hardware simpler and more flexible.[8] Microcode tasks copy every pixel to the display 30 times a second, refresh dynamic memory, read the mouse, handle disk operations, drive Ethernet — operations performed in hardware on most computers. Much of the Alto microcode is stored in RAM, so languages or even user programs can run custom microcode.

Next steps

The first step to getting the system running will be to make sure the power supplies work and provide the proper voltages. The Alto uses four complex but highly efficient switching power supplies: +15V, -15V, +12V, and +/-5V.[9] (Most of the chips use +5V, but the memory chips and some interfaces require unusual voltages.)

The power supplies are mounted in the cabinet behind the card cage, as you can see in the photo below. The +/-5V supply is on the right, and the three other power supplies are on the left.

Looking into the back of the Alto, you can see the four switching power supplies (blue). The card cage is behind them. The disk drive has been removed from the top of the cabinet. On the back of the cabinet, connectors to the display, Ethernet, and other devices are visible.

Restoring this systems is a big effort but fortunately there's a strong team working on it, largely from the IBM 1401 restoration team. The main Alto restorers so far are Marc Verdiell, Luca Severini, Ron, and Carl Claunch. Major technical contributions have been provided by Al Kossow (who has done extensive Alto restoration work in the past and is at the Computer History Museum) and the two Keiths (who have restored Altos at the Living Computer Museum). For updates on the restoration, follow kenshirriff on Twitter.[10]

Notes and references

[1] The laser printer was invented at Xerox by Gary Starkweather and networked laser printers were soon in use with the Alto. Y Combinator's Alto is an "Orbit" model, with slots for the four boards that drive the laser printer, laboriously rendering 16 rows of pixels at a time.

[2] Malcolm Gladwell describes Steve Jobs' visit to Xerox in detail in Creation Myth. The article claims that Xerox licensed its technology to Apple, but strangely that license wasn't mentioned in earlier articles about Xerox's lawsuit against Apple. The facts here seem murky.

[3] Amazingly, Alan Kay even predicted ad blockers in his 1972 Dynabook paper: "One can imagine one of the first programs an owner will write is a filter to eliminate advertising!" I thought that Alan Kay missed WiFi in the Dynabook design, but as he points out in the comments, wireless networking was a Dynabook feature.

[4] Xerox called the Alto "a small personal computing system", saying, "By 'personal computer' we mean a non-shared system containing sufficient processing power, storage, and input-output capability to satisfy the computational needs of a single user." (See ALTO: A Personal Computer System Hardware Manual.) Xerox's vision of personal computing is described in the retrospective Alto: A personal computer.

Since the concept of "personal computer" is ill-defined, I won't argue whether the Alto is really a personal computer or not. Costing tens of thousands of dollars, the Alto was much more expensive than what people usually think of as a personal computer. The book Fumbling the Future: How Xerox Invented, then Ignored, the First Personal Computer calls Xerox the inventor of the personal computer. (Or you can read Datapoint: The Lost Story of the Texans Who Invented the Personal Computer Revolution, which makes a convincing case for the Datapoint 2200 as the first personal computer.) For a long list of candidates for the first personal computer, see blinkenlights.com.

[5] The Alto documentation refers to the microprocessor, but this term describes the microcode processor. The Alto does not use a microprocessor in the modern sense.

[6] Since memory chips of the era were somewhat unreliable (especially in large quantities), the Alto used parity plus 6 bits of error correcting code to improve reliability. As well as the four memory boards, the Alto has three memory control boards to decode addresses and implement error correction.

[7] The Alto's microcode instructions and "real" instructions (called Emulated Instructions) are described in the Hardware Reference. The Alto's instruction set is similar to the Data General Nova.

[8] The Alto has 16 separate microcode tasks, scheduled based on their priority. The Alto's microcode tasks are described in Appendix D of the Hardware Manual. The display task illustrates how low-level the microcode tasks are. In a "normal" computer, the display control hardware fetches pixels from memory. In the Alto, the display task in microcode copies pixels from memory to the display hardware's 16 word pixel buffer as each line is being written to the display. The point is that the hardware is simplified, but the microcode task is working very hard, copying every pixel to the display 30 times a second. Similarly the Ethernet hardware is simplified, since the microcode task does much of the work.

[9] Steve Jobs claimed that the Apple II's use of a switching power supply was a revolutionary idea ripped off by other computer manufacturers. However, the Alto is just one of many computers that used switching power supplies before Apple (details).

[10] Many sources have additional information on the Alto. Bitsavers has a large collection of Alto documentation. DigiBarn has photos and more information on the Alto. The Computer History Museum has a large collection of Alto source code online here. The Alto simulator Salto is available here if you want to try out the Alto experience. Wikipedia has a detailed article on the Alto.

A few days ago, I wrote about how I'm helping restore a Xerox Alto for Y Combinator. This new post describes the first day of restoration: how we disassembled the computer and disk drive and fixed a power supply problem, but ran into a showstopper problem with the disk interface.

The Xerox Alto was a revolutionary computer from 1973, designed by computer pioneer Chuck Thacker at Xerox PARC to investigate ideas for personal computing. The Alto was the first computer built around a mouse and GUI, as well as introducing Ethernet and laser printers to the world. The Alto famously inspired Steve Jobs, who used many of its ideas in the Lisa and Macintosh computer.

Alan Kay, whose vision for a personal computer guided the Alto, recently gave an Alto computer to Y Combinator. Getting this system running again is a big effort but fortunately I'm working with a strong team, largely from the IBM 1401 restoration team. Marc Verdiell, Luca Severini, Ron, Carl Claunch, and I started on restoration a few days ago, as shown in Marc's video below.

Disassembling the Alto

We started by disassembling the computer. The Xerox Alto has a metal cabinet about the size of a dorm mini-fridge, with a Diablo hard disk drive on top, and a chassis with power supplies and the circuit boards below. With some tugging, the chassis slides out of the cabinet on rails as you can see in the photo below. At the front are the four cooling fans, normally protected by a decorative panel. Note the unusual portrait layout of the display.

The Xerox Alto II XM 'personal computer'. The card cage below the disk drive has been partially removed. Four cooling fans are visible at the front of it.

With the chassis fully removed, you can see the four switching power supplies on the left, the blue metal boxes. The computer's circuit boards are on the right, not visible in this picture. The wiring for the backplane is visible at right front, with pins connected by wire-wrapped wire connections. This wiring connects the circuit boards together.[1]

The Alto's chassis has been removed. On the left are the four switching power supplies (blue boxes). On the right, the connections for the wire-wrapped backplane are visible. The circuit boards plug into this backplane.

The power supplies

Our first goal was to make sure the power supplies worked after decades of sitting idle. The Alto uses high-efficiency switching power supplies.[2] To explain the power supplies in brief, input power is chopped up thousands of times a second to produce a regulated voltage. Unlike modern computer power supplies, there's a second switching stage (the inverter), which drops the voltage to the desired 15 volts. This was more complexity than I expected, but fortunately the detailed power supply manual was available online, thanks to Al Kossow's bitsavers.[3] We tested each power supply with a resistor as a dummy load and checked that the output voltage was correct. We also used an oscilloscope to make sure the output was stable. All the power supplies worked fine, except for the +15V supply (top center), which had trouble getting up to 15 volts and staying there.

We disassembled the faulty power supply to track down the problem. The photo of the power supply below shows how densely components are crammed into the power supply. Two of the circuit boards have been removed and are at the back. Note the three large filter capacitors at the front.

Switching power supply from the Xerox Alto computer. Two of the control boards have been removed and are visible at back.

We noted signs of overheating on the AC connector, as well as a somewhat sketchy looking repair (a trace replaced by a wire) and some signs of corrosion. Apparently the power supply had problems in the past and had been serviced. We cleaned up the corrosion and it appeared to be superficial.

The power supply disassembled easily for repair, as you can see below. The main board is at the right. The tower of three inductors on the main board is an unusual way of mounting inductors. Three circuit boards (top) plug into the main board. Because the power supply uses discrete components instead of a modern SMPS control IC, it needs a lot of control circuitry. The switching transistors (lower center) are mounted onto metal heat sinks for cooling.

The Alto's switching power supply, disassembled. The main board is in the lower right. The three circuit boards are at top, below the large input capacitors.

The large capacitors were attached with screws, making it easy to remove them for testing. A capacitance meter showed that the three large capacitors had failed, explaining why the power supply had trouble outputting the desired voltage. Ron went off and found replacement capacitors, although they weren't an exact match for the originals. With the new capacitors mounted in place, the power supply worked properly.

Inside the Diablo disk drive

We also looked at the Diablo disk drive, which provides 2.5MB of storage for the Xerox Alto. The first step was removing the disk pack. In normal operation, the front of the drive is locked shut to keep the disk from being removed during use. To remove the disk without powering up the drive, we had to open the drive and manually trip the latch that locks it shut (see Diablo drive manual).

This picture shows the disk pack being reinserted into the drive. Unlike modern hard disk drives, the Alto's disk can be removed from the drive. Users typically used different disks for different tasks — a programming disk, a word processing disk, and so forth. The disk pack is a fairly large white package, resembling a cross between an overgrown Frisbee and a poorly-detailed Star Wars spaceship. The drive's multiple circuit boards are also visible in the photo.[4]

Inserting a 2.5 MB hard disk pack into the Diablo drive used by the Xerox Alto computer.

As the disk pack enters the drive, it opens up to provide access to the disk surface. The photo below shows the exposed surface of the disk, brownish from the magnetizable iron oxide layer over the aluminum platter. The read/write head is visible above the disk's surface, with another head below the disk. The disk stores data in 203 concentric pairs of tracks, with the heads moving in and out together to access each pair of tracks.

Closeup of the hard disk inside the Diablo drive. The read/write head (metal/yellow) is visible above the disk surface (brown).

Although the heads are widely separated during disk pack insertion, they move very close to the disk surface during operation, floating about one thousandth of a millimeter above the surface. The diagram below from the manual helps visualize this minute distance, and illustrates the danger of particles on the disk's surface.

The Diablo disk and why contaminants are bad, from the Alto disk manual.

The disk interface cliffhanger

The final activity of the day was making sure all the Alto's circuit boards were in the right slots and the cables were all hooked up properly.[5] Everything went smoothly until I tried to hook up the Diablo disk drive to the disk interface card: the disk drive cable didn't fit on the card's connector!

The cable to the Alto disk didn't fit onto the disk interface card!

After trying various combinations of cables and edge connectors, we discovered that the rainbow-colored ribbon cable you can see in the lower right above did fit the disk interface card. But instead of going to the Diablo disk drive, this cable went to a connector on the back of the Alto labeled "Tricon". Tricon is the controller for the Trident Disk, a high-capacity disk drive that could be used with the Alto, providing 80 MB instead of the just 2.5 MB that the standard Diablo drive provides. Looking at the disk interface card more closely, we saw it was labeled "Alto II Trident Disk Interface" (upper left corner of the photo below), confirming that it was for the Trident.

Trident Disk Interface card for Xerox Alto computer. (See label in upper left.)

It was a shock to discover the disk interface card was for the Trident drive, since our Alto has the standard Diablo drive, which is completely incompatible with the Trident.[6] We checked all the boards and verified that the system was missing the Diablo interface board. This was a showstopper problem; with the wrong board, the disk drive would be unusable and we wouldn't be able to boot up the system. What could we do? Network boot the Alto? Build a disk simulator? Find a Trident drive on eBay? (We actually found a Trident disk platter on eBay for $129, but no drive.)

Tune in next episode to find out what we did about the disk interface problem. (Spoiler: we found a solution thanks to Al Kossow.)

Notes and references

[1] The physical layout of the power supplies is specified on page 11 of the Alto documentation introduction. On the top are three Raytheon/Sorensen power supplies, +12V (15A), +15V (12A), and -15V (12A). At the bottom is a large LH Research Mighty Mite power supply providing +5V (60A) and -5V (12A).

Why the variety of voltages? Most of the circuitry in the Alto uses 5V, which is standard for TTL chips. The MOS memory chips use +5V, -5V and +12V. The Ethernet card uses +15V and -15V, with +15V powering the transceiver. The disk drive uses +/- 15V.

[2] Steve Jobs claimed that the Apple II's use of a switching power supply was a revolutionary idea ripped off by other computer manufacturers. However, the Alto is just one of many computers that used switching power supplies before Apple (details).

[3] For full details on the power supply operation, see the block diagram. First, the 115V AC line input is converted to 300V DC by a rectifier and voltage doubler. (The voltage doubler is a clever way of supporting both 115V and 230V inputs; using the doubler with 115V. This is why older PCs have a switch on the power supply to select 115V or 230V. Modern power supplies handle a wide input range, and don't require a switch.) Next, the power supply has a chopper, a PWM transistor circuit that chops up the 300V DC, producing a regulated 120V-200V DC, depending on the output load. This goes to the inverter, which drives a step-down/isolation transformer that produces the desired 15V output. A regulation circuit sends feedback to the chopper based on the output voltage. Meanwhile, an entirely separate switching power supply circuit generates voltages (including +150V) used by the power supply internally.

Modern power supplies use a single switching stage in place of the separate chopper and inverter. I believe the two stages were used to reduce the load on the bipolar switching transistors, which don't have the performance of modern MOSFET switching transistors. (As Ron pointed out, modern power supplies often have a PFC (power factor correction) stage for improved efficiency. Thus, the two-stage design has returned, although the stages are entirely different now.)

Modern power supplies use a power supply control IC. The Alto's power supply instead has control circuits built from simple components: transistors, op amps, 555 timers. This is one reason the power supply requires three circuit boards.

[4] The following table from the Alto disk manual gives the stats for the drive.

Statistics on the Diablo 31 disk used with the Xerox Alto computer.

[5] The Alto backplane has 21 slots, not all of which are used in our system. The list of which board goes into which slot is on page 8 of the Alto documentation.

[6] I suspect that the Y Combinator Alto originally had both a Trident drive and a Diablo drive (as well as four Orbit boards to drive a laser printer), and when it was taken out of service, the Trident drive, the Diablo interface board, and the Orbit boards went off somewhere else. This left the Alto with a drive that didn't match the interface card.

For reference, schematics and documentation on the Trident interface board are here. Despite all the chips on the disk interface board, it doesn't do very much, since each TTL chip is fairly simple. The interface board has some counters, one word data buffers, parallel/serial conversion, and a bit of control logic. The Alto was designed to offload many hardware tasks to microcode, so the hard work of disk I/O is performed in microcode (software).

How does a tiny chip time the runners in the Bay to Breakers race? In this article, I take die photos of the RFID chip used to track athletes during the race.

Bay to Breakers, 2016. Photo courtesy of David Yu, CC BY-NC-ND 2.0.

Bay to Breakers is the iconic San Francisco race, with tens of thousands of runners (many in costume and some in nothing) running 12km across the city. To determine their race time, each runner wears an identification bib. As you can see below, the back of the bib has a small foam rectangle with a metal foil antenna and a tiny chip underneath. The runners are tracked using a technology called RFID (Radio Frequency Identification).

The bib worn by runners in the Bay to Breakers race. At the top, behind the foam is an antenna and RFID chip used to detect the runner at the start and end of the race.

At the beginning and end of the race, the runners cross special mats that contain antennas and broadcast ultra high frequency radio signals. The runner's RFID chip detects this signal and sends back the athlete's ID number, which is programmed into the chip. By tracking these ID numbers, the system determines the time each runner took to run the race. The cool thing about these RFID chips is they are powered by the received radio signal; they don't need a battery.

Mylaps, whose name appears on the foam rectangle, is a company that supplies sports timing systems: the bibs with embedded RFID chips, the detection mats, and portable detection hardware. The detection system is designed to handle large numbers of runners, scanning more than 50 tags per second.

Removing the foam reveals an unusually-shaped metal antenna, the tiny RFID chip (the black dot above the word "DO", and the barely-visible word "Smartrac". Studying the Smartrac website reveals that this chip is the Impinj Monza 4 RFID chip, which operates in the 860-960 MHz frequency range and is recommended for sports timing.

The RFID circuit used to detect runners in the Bay to Breakers. The metal forms an antenna. The tiny black square in the center is the RFID chip.

Getting the chip off the bib was a bit tricky. I softened the bib material in Goof Off, dissolved the aluminum antenna metal with HCl and removed the adhesive with a mysterious BGA adhesive solvent I ordered from Shenzhen.

The chip itself is remarkably tiny, about the size of a grain of salt. The picture below shows the chip on a penny, as seen through a microscope: for scale, a grain of salt is by the R and the chip is on the U (in TRUST). This is regular salt, by the way, not coarse sea salt or kosher salt. I spent a lot of time trying to find the chip when it fell on my desk, since it is practically a speck.

The RFID chip used to identify runners is very small, about the size of one of the letters on a penny. A grain of salt (next to R) and the RFID chip (next to U).

In the picture above, you can see the four round contact points where the chip was connected to the antenna. There's still a blob of epoxy or something around the die, making it hard to see the details. The chip decapsulation gurus use use boiling nitric and sulfuric acids to remove epoxy, but I'm not that hardcore so I heated the chip over a stove flame. This burned off the epoxy much better than I expected, making the die clearly visible as you can see in the next photo.

I took 34 die photos using my metallurgical microscope and stitched them together to get a hi-res photo. (I described the stitching process in detail here). The result is the die photo below (click it for the large image). Surprisingly, there is no identifying name or number on the chip. However, comparing my die photo with the picture in the datasheet confirms that the chip is the Monza 4 RFID chip.

I can identify some of the chip's layout, but the chip is too dense and has too many layers for me to reverse engineer the exact details. Thus, the description that follows is slightly speculative.

Die photo of the Impinj Monza 4 RFID chip.

The four pins in the corners are where the antenna is connected. (The chip has four pins because two antennas can be used for improved detection.)

The left part of the chip is the analog logic, extracting power from the antenna, reading the transmitted signal, and modulating the return signal. The rectangles on the left are probably transistors and capacitors forming a charge pump to extract power from the radio signal (see patent 7,561,866).

The right third of the chip is so-called "random logic" that carries out the commands sent to the chip. According to the datasheet, the chip uses a digital logic finite state machine, so the chip probably doesn't have a full processor.

The 32 orderly stripes in the middle are the non-volatile memory (NVM). Above the stripes, the address decode circuitry is barely visible. The chip has 928 bits of storage (counting up the memory banks on the datasheet) so I suspect the memory is set up as a 32x29 array. Some NVM details are in patent 7307534.

Along the lower and right edges of the chip, red lines are visible; these connect chips together on the wafer for testing during manufacturing (patent 7307528).

The Impinj Monza 4 RFID chip on top of a 8751 microcontroller chip shows that the RFID chip is very small and dense.

To show how small the chip is, and how technology has changed, I put the RFID chip on top of an 8751 microcontroller die. The 8751 microcontroller is a chip in Intel's popular 8051 family dating from 1983. Note that the circuitry on the RFID chip is denser and the chip is much, much smaller. (The photo looks a bit photoshopped, but it genuinely is the RFID chip sitting on the surface of the 8751 die. I don't know why the RFID chip is pink.)

So, if you ran in the Bay to Breakers, that's the chip that tracked your time during the race. (There aren't a lot of other RFID die photos on the web, but a few are at Bunnie Studios, Zeptobars and ExtremeTech if you want to see more.)

The first programming language for the Xerox Alto was BCPL, the language that led to C. This article shows how to write a BCPL "Hello World" program using Bravo, the first WYSIWYG text editor, and run it on the Alto simulator.

The Xerox Alto is the legendary minicomputer from 1973 that helped set the direction for personal computing. Since I'm helping restore a Xerox Alto (details), I wanted to learn more about BCPL programming. (The influential Mesa and and Smalltalk languages were developed on the Alto, but those are a topic for another post.)

Using the simulator

Since the Alto I'm restoring isn't running yet, I ran my BCPL program on Salto, an Alto simulator for Linux written by Juergen Buchmueller. To build it, download the simulator source from github.com/brainsqueezer/salto_simulator, install the dependencies listed in the README, and run make. Then run the simulator with the appropriate Alto disk image:

bin/salto disks/tdisk4.dsk.Z

Here's what the simulator looks like when it's running:

The Salto simulator for the Xerox Alto.

(To keep this focused, I'm not going to describe everything you can run on the simulator, but I'll point out that pressing ? at the command line will show the directory contents. Anything ending in .run is a program you can run, e.g. "pinball".)

Type bravo to start the Bravo text editor. Press i (for insert). Enter the BCPL program:

// Hello world demo
get "streams.d"
external
[
Ws
]

let Main() be
[
Ws("Hello World!*N")
]

Here's a screenshot of the Bravo editor with the program entered:

A Xerox Alto 'Hello World' program for written in BCPL, in the Bravo editor.

Press ESC to exit insert mode.
Press p (put) to save the file.
Type hello.bcpl (the file name) and press ESC (not enter!).
Press q then ENTER to quit the editor.

Run the BCPL compiler, the linker, and the executable by entering the following commands at the prompt:

bcpl hello.bcpl
bldr/d/l/v hello
hello

If all goes well, the program will print "Hello World!" Congratulations, you've run a BCPL program.

Output of the Hello World program in BCPL on the Xerox Alto simulator.

The following figure explains the Hello World program. If you know C, the program should be comprehensible.

'Hello World' program in BCPL with explanation.

The BCPL language

The BCPL language is interesting because it was the grandparent of C. BCPL (Basic Combined Programming Language) was developed in 1966. The B language was developed in 1969 as a stripped down version of BCPL by Ken Thompson and Dennis Ritchie. With the introduction of the PDP-11, system software needed multiple datatypes, resulting in the development of the C language around 1972.

Overall, BCPL is like a primitive version of C with weirdly different syntax. The only type that BCPL supports is the 16-bit word, so it doesn't use type declarations. BCPL does support supports C-like structs and unions, including structs that can access bit fields from a word. (This is very useful for the low-level systems programming tasks that BCPL was designed for.) BCPL also has blocks and scoping rules like C, pointers along with lvalues and rvalues, and C-like looping constructs.

A BCPL program looks strange to a C programmer because many of the special characters are different and BCPL often uses words instead of special characters. Here are some differences:

Blocks are defined with [...] rather than {...}.
For array indexing, BCPL uses a!b instead of a[b].
BCPL uses resultis 42 instead of return 42.
Semicolons are optional, kind of like JavaScript.
For pointers, BCPL uses lv and rv (lvalue and rvalue) instead of & and *. rvalues.
The BCPL operator => (known as "heffalump"; I'm not making this up) is used for indirect structure references instead of C's ->.
selecton X into, instead of C's switch, but cases are very similar with fall-throughs and default.
lshift and rshift instead of << and >>.
eq, ne, ls, le, gr, ge in place of ==, !=, <, <=, >, >=.
test / ifso / ifnot instead of if / else.

A BCPL reference manual is here if you want all the details of BCPL.

More about the Bravo editor

The Bravo editor was the first editor with WYSIWYG (what you see is what you get) editing. You could format text on the screen and print the text on a laser printer. Bravo was written by Butler Lampson and Charles Simonyi in 1974. Simonyi later moved to Microsoft, where he wrote Word, based on the ideas in Bravo.

Steve Jobs saw the Alto when he famously toured Xerox Parc in 1979, and it inspired the GUI for the Lisa and Mac. However, Steve Jobs said in a commencement address, "[The Mac] was the first computer with beautiful typography. If I had never dropped in on that [calligraphy] course in college, the Mac would have never had multiple typefaces or proportionally spaced fonts. And since Windows just copied the Mac, it's likely that no personal computer would have them." This is absurd since the Alto had a variety of high-quality proportionally spaced fonts in 1974, before the Apple I was created, let alone the Macintosh.

The image below shows the Hello World program with multiple fonts and centering applied. Since the compiler ignores any formatting, the program runs as before. (Obviously styling is more useful for documents than code.)

The Bravo editor provides WYSIWYG formatting of text.

The manual for Bravo is here but I'll give a quick guide to Bravo if you want to try more editing. Bravo is a mouse-based editor, so use the mouse to select the text for editing. left click and right click under text to select it with an underline. The editor displays the current command at the top of the editing window. If you mistype a command, pressing DEL (delete) will usually get you out of it. Pressing u provides an undo.

To delete selected text, press d. To insert more text, press i, enter the text, then ESC to exit insert mode. To edit an existing file, start Bravo from the command line, press g (for get), then enter the filename and press ESC. To apply formatting, select characters, press l (look), and then enter a formatting code (0-9 to change font, b for bold, i for italics).

Troubleshooting

If your program has an error, compilation will fail with an error message. The messages don't make much sense, so try to avoid typos.

The simulator has a few bugs and tends to start failing after a few minutes with errors in the simulated disk. This will drop you into the Alto debugger, called Swat. At that point, you should restart the simulator. Unfortunately any files you created in the simulator will be lost when you exit the simulator.

If something goes wrong, you'll end up in Swat, the Xerox Alto's debugging system.

Conclusion

The BCPL language (which predates the Alto) had a huge influence on programming languages since it led to C (and thus C++, Java, C#, and so forth). The Bravo editor was the first WYSIWYG text editor and strongly influenced personal computer word processing. Using the Alto simulator, you can try both BCPL and Bravo for yourself by compiling a "Hello World" program, and experience a slice of 1970s computing history.

This post describes how we repaired the monitor from a Xerox Alto. The Alto was a revolutionary computer, designed in 1973 at Xerox PARC to investigate personal computing. Y Combinator received an Alto from computer visionary Alan Kay, and I'm helping restore this system, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen. You may have seen Marc's restoration video (below) on Hacker News; my article describes the monitor repairs in more detail. (See my first article on the restoration for more background.)

The Alto display

One of the innovative features of the Alto was its 606 by 808 pixel bitmapped display, which could display high-quality proportionally spaced fonts as well as graphics. This display led to breakthroughs in computer interaction, such as the first WYSIWYG text editor, which could print formatted text on a laser printer (which Xerox had recently invented). This editor, Bravo, was written for the Alto by Charles Simonyi, who later wrote Word at Microsoft. (I discussed the Bravo editor in more detail last week.)

As you can see from the photos, the Alto's display has a somewhat unusual portrait orientation, allowing it to simulate an 8½x11 sheet of paper. Custom monitor hardware was required to support the portrait orientation, which uses 875 scan lines instead of the standard 525 lines. The Alto's monitor was based on a standard Ball Brothers computer monitor with some component values changed for the higher scan rate. This was easier than turning a standard monitor sideways and rotating everything in software.

The Xerox Alto's monitor with the case removed.

How a CRT monitor works

Since many readers may not be familiar with how CRTs work, I'll give a brief overview. The cathode ray tube (CRT) ruled television and computer displays from the 1930s until a decade or two ago, when flat panel displays finally won out. In a CRT, an electron beam is shot at a phosphor-coated screen, causing a spot on the screen to light up. The beam scans across the screen, left to right and top to bottom in a raster pattern. The beam is turned on and off, generating a pattern of dots on the screen to form the image.

The Xerox Alto's display uses a 875-line raster scan. (For simplicity, I'm ignoring interlacing of the raster scan.)

The cathode ray tube is a large vacuum tube containing multiple components, as shown below. A small electrical heater, similar to a light bulb filament, heats the cathode to about 1000°C. The cathode is an electrode that emits electrons when hot due to physics magic. The anode is positively charged with a high voltage (17,000V). Since electrons are negatively charged, they are attracted to the anode, causing a beam of electrodes to fly down the tube and slam into the screen. The screen is coated with a phosphor, causing it to glow where hit by the electron beam. A control grid surrounds the cathode; putting a negative voltage on the grid repels the electrons, reducing the beam strength and thus the brightness. Two electromagnets are arranged on the neck of the tube to deflect the beam horizontally and vertically; these are the deflection coils.

Diagram of a Cathode Ray Tube (CRT). Based on drawings by Interiot and Theresa Knott (CC BY-SA 3.0).

The components of the tube work together to control the image. The cathode voltage turns the beam on or off, allowing a spot to be displayed or not. The control grid controls the brightness of the display. The horizontal deflection coil scans the beam from left to right across the display, and then rapidly returns it to the left for the next line. The vertical deflection coil more slowly scans the beam from top to bottom, and then rapidly returns the beam to the top for the next image.

Monitors were built with the same CRT technology as televisions, but a television includes a tuner circuit to select the desired channel from the antenna. In addition, televisions have circuitry to extract the horizontal sync, vertical sync and video signals from the combined broadcast signal. These three signals are supplied to the Alto monitor separately, simplifying the circuitry. Color television is more complicated than the Alto's black and white display.

Getting the monitor operational

We started by removing the heavy metal case from the monitor, as seen below. The screen is at the bottom; the neck of the tube is hidden behind the components. The printed circuit board with most of the components is visible at the top. Unlike more modern displays that use integrated circuits, this display's circuitry is built from transistors. On the right is the power supply for the monitor, with a large transformer, capacitor, and fuse. On the left is the vertical drive transformer.

Inside the Alto's monitor. The screen is at the bottom. The power supply transformer is on the right and the vertical deflection transformer is on the left. The circuit board is at top.

We started by checking out the 55 volt power supply that runs the monitor. This power supply is a linear power supply driven from the input AC. The input transformer produces about 68 volts, which is then dropped to 55 volts by a power transistor, controlled by a regulator circuit. (A few years later, more efficient switching power supplies were common in monitors.) It took some time to find the right place to measure the voltage, but eventually we determined that the power supply was working properly, as shown below. At the bottom of the photo below, you can also see the round, reddish connector that provides high voltage to the CRT tube; this is a separate circuit from the 55V power supply. (Grammar note: I consider CRT tube to be clearer, although technically redundant.)

Testing the 55V power supply in the Xerox Alto's monitor.

At the top of the photo above, you can see the power transistor for the vertical deflection circuit. This circuit generates the vertical sweep signal, scanning from top to bottom 60 times a second. This signal is a sawtooth wave fed into the vertical deflection coil, so the beam is slowly deflected from top to bottom and then rapidly returns to the top. The vertical deflection signal is synchronized to the video input by a vertical sync input from the Alto.

The monitor circuit board (below) contains circuitry for vertical deflection, horizontal deflection, 55V power supply regulation, and video amplification. The board isn't designed for easy repair; to access the components, many connectors (lower left) must be disconnected.

The circuit board for the Xerox Alto's monitor.

The horizontal deflection circuit generates the horizontal sweep signal, scanning from left to right about 26,000 times a second. Driving the deflection coils requires a current of about 2 amps, so this circuit must switch a lot of current very rapidly. I've heard that the horizontal circuitry on the Alto has a tendency to overheat. We noticed some darkened spots on the board, but it still works.

The horizontal deflection circuit also supplies 17,000 volts to the CRT tube. This voltage is generated by the flyback transformer. Each horizontal scanline sends a current pulse into the flyback transformer, which is a step-up transformer that produces 17,000 volts on its output. (Interestingly, phone chargers also use flyback transformers, but one that produces 5 volts rather than 17,000 volts.)

The photo below shows the flyback transformer (upper right), the UFO-like structure with a thick high-voltage wire leading to the CRT. The large white cylinder is the bleeder resistor that drains the high voltage away when the monitor is not in use. This unusually large resistor is 500 megaohms and 6 watts, and is connected to the tube by a thick, red high voltage wire. The deflection coils are visible at the left, coils of red wire with white plastic on either side. The deflection coils are mounted on the outside of the tube. At the bottom is the power transistor for the horizontal circuit.

The CRT tube in the Alto's monitor, showing the deflection coils around the tube. The flyback transformer is in the upper right. The bleeder resistor is on the right.

Needless to say, extreme caution is needed when working around the monitor, due to the high voltage that is present. The high voltage circuitry can be tested by holding an oscilloscope probe a couple inches away from the flyback transformer; even at that distance, a strong signal shows up on the oscilloscope. The CRT tube also poses a danger due to the vacuum inside; if the tube breaks, it can violently implode, sending shards of glass flying. Finally, X-rays are generated by using high voltage to accelerate electrons from a cathode to hit a target, just like a CRT operates, so there is a risk of X-ray production. To guard against this, the glass screen of a CRT contains pounds of lead to shield against radiation (which makes CRT disposal an environmental problem). In addition, the monitor's circuitry guards against overvoltages that would produce excessive X-rays. The photo below shows some of the safety warnings from the monitor.

Safety warnings on the Alto monitor. CRTs pose danger from implosion, X-rays, and high voltage.

Since we don't have the Alto computer running yet, we can't use it to generate a video signal. Fortunately we obtained a display test board from Keith Hayes at the Living Computer Museum in Seattle. This board, based on a PIC 24F microcontroller, generates video, horizontal, and vertical signals to produce a simple test bar pattern on the display. We used this board to drive the monitor during testing.

Test board from the Living Computer Museum to drive the Alto's monitor.

We got a bit confused about which signal from the test board was the horizontal sync and which signal was the video. When we switched the signals around, we got a buzzing noise out of the monitor (see the video). Since the horizontal sync signal drives the high voltage power supply, we were in effect turning the power supply on and off 60 times a second with an audible effect. We eventually determined that Keith's test board was wired correctly and undid our modifications.

Even with the test board hooked up properly, the display didn't seem to be operational. But by turning off the room lights, we could see very faint bars on the display. We discovered that the monitor's brightness adjustment was flaky; just touching caused the display to flicker. Removing the variable resistor (below) and cleaning it with alcohol improved the situation somewhat. We tested an electrolytic capacitor in the brightness circuit and found it was bad, but replacing it didn't make much difference.

The brightness control on the Alto monitor. This control was flaky and needed cleaning.

At the end of our session, we had dim bars on the display, showing that the display works but not at the desired brightness. We suspect that the CRT tube is weak due to its age, so we're searching for a replacement tube. Another alternative is rejuvenation– putting high-current pulses through the tube to get the "gunk" off the cathode, extending the tube's lifetime for a while. (If anyone has a working CRT rejuvenator in the Bay Area, let us know.)

The monitor successfully displays the test bars, but very faintly.

Our work on the monitor was greatly aided by the detailed service manual. If you want more information on how the monitor works, the manual has a detailed function description and schematics. An excerpt of the schematic is shown below.

Detail of the schematic diagram for the Alto's monitor, showing the CRT. The schematic is online in the manual.

The disk controller

Our previous episode of Alto restoration ended with the surprising discovery that the Alto had the wrong disk controller card which wouldn't work with our Alto's Diablo disk drive. Fortunately, we were rescued by Al Kossow, who happened to have an extra Alto disk interface card lying around (that's Silicon Valley for you) and gave it to us. Below is a photo of the Alto disk interface card we got from Al. At the left is the edge connector that plugs into the Alto's backplane. The disk cable attaches to the connector on the right.

The Alto II Disk Control card, the interface to the Diablo drive.

Conclusion

Our efforts to get the monitor working were moderately successful. Although the monitor is dim, it functions well enough to proceed with restoring the Alto. We'll see if we can improve the brightness or obtain a new CRT tube. We will probably work on the disk drive next, as the drive is necessary for booting up the Alto. Since the drive is a complex mechanical device with precise tolerances, I expect a challenge.

I'm helping restore a Xerox Alto — a legendary minicomputer from 1973 that helped set the direction for personal computing. This post describes how we cleaned and restored the disk drive and then powered up the system. Spoiler: the drive runs but the system doesn't boot yet.

While creating the Alto, Xerox PARC invented much of the modern personal computer: everything from Ethernet and the laser printer to WYSIWYG editors with high-quality fonts. Getting this revolutionary system running again is a big effort but fortunately I'm working with a strong team: Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen, along with special guest Tim Curley from PARC.

If this article gives you deja vu, you probably saw Marc's restoration video (above) on Hacker News last week or read the earlier restoration updates: introduction, day 1, day 2.

Hard disk technology of the 1970s

For mass storage, the Alto uses a Diablo disk drive, which stores 2.5 megabytes on a removable 14 inch disk cartridge. With 1970s technology, you don't get much storage even on an inconveniently large disk, so Alto users were constantly short of disk space. The photo below shows the Xerox Alto, with the computer chassis (bottom) partially removed. Above the chassis and below the keyboard is the Diablo disk drive, which is the focus of this article.

The Xerox Alto II XM 'personal computer'. The card cage below the disk drive has been partially removed. Four cooling fans are visible at the front of it.

To insert the disk cartridge into the drive, the front of the drive swings down and the cartridge slides into place. The cartridge is an IBM 2315 disk pack, which was used by many manufacturers of the era such as DEC and HP, and contains a single platter inside the hard white protective case. The disk drive has been partially pulled out of the cabinet and the top removed, revealing the internals of the drive. During normal use, the disk drive is inside the cabinet, of course.

Inserting a hard disk into the Diablo drive.

Unlike modern hard disks, the Alto's disk is not sealed; the disk pack opens during use to provide access to the heads. To protect against contamination and provide cooling, filtered air is blown through the disk pack during use. Air enters the disk through a metal panel on the bottom of the disk (as seen below) and exits through the head opening, blowing any dust away from the disk surface.

Hard disk for the Xerox Alto, showing the air intake vent.

Although the heads are widely separated during disk pack insertion, they move very close to the disk surface during operation, floating on a cushion of air about one thousandth of a millimeter above the surface. The diagram below from the manual illustrates the danger of particles on the disk's surface. Any contamination can cause the head to crash into the disk surface, gouging out the oxide layer and destroying the disk and the head.

The Diablo disk and why contaminants are bad, from the Alto disk manual.

The magnified photo below shows the read/write head. The two air bleed holes ensure that the head is flying at the correct height above the disk surface. The long part of the cross contains the read/write coil, while the short part of the cross contains the erase coils (which erase a band between tracks).

Read/write head for the Diablo drive.

The following diagram shows how data is stored on the disk in 203 tracks (actually 203 cylinders, since there are tracks on the top and bottom surfaces). The drive moves the tiny read/write heads to the desired track. Each track is divided into 12 sectors, with 256 words of data in each sector.

Diagram of how the Diablo disk drive's read/write head stores data in tracks on the disk surface. From the Maintenance Manual.

In the photo below, we have removed the top of the disk pack revealing the hard disk inside. Note the vertical metal ring along the inside of the disk; it has twelve narrow slots that physically indicate the twelve sectors of the disk. A double slot is the index mark, indicating the first sector. To make sure the disk surface was clean, we wiped the disk surfaces clean with isopropyl alcohol. This seemed a bit crazy to me, but apparently it's a normal thing to do with disks of that era.

Inside the disk pack used by the Xerox Alto.

The photo below shows the motor spindle that rotates the hard disk at 1500 RPM. In front of the spindle, you can see the sensor that detects the slots that indicate sectors. To the left is the air duct that provides filtered airflow into the disk pack. (The air intake on the bottom of the disk pack was shown in an earlier photo.) Around the edge of the air duct is foam to provide a seal between the duct and the disk cartridge, ensuring airflow through the cartridge.

The motor spindle (center) rotates the hard disk. In front of the spindle is the sensor to detect sectors. To the left is the ventilation air duct for the disk.

After 40 years, the foam had deteriorated into mush and needed to be replaced. The foam no longer provided an airtight seal. Even worse, particles could break off the foam. If a piece of foam got blown onto the disk surface, it would probably trigger a catastrophic disk crash. To replace the foam, we used weatherstripping, which isn't standard but seemed to get the job done.

As well as replacing the foam, we vacuumed any dust out of the drive and carefully cleaned the heads and other drive components.

How the Diablo drive works

The drive itself has fairly limited logic, with most of the functionality inside the Alto. The drive can seek to a particular track, indicate the current sector, and read or write a stream of raw bits. Since there's no buffering in the disk drive, the Alto must supply every bit at the precise time based on the disk's rotation. In the Alto, microcode performs many interfacing tasks that are usually done in hardware. Instead of using DMA, the Alto's microcode moves data words one at a time to the disk interface card in the Alto, which does the serial/parallel conversion.

The Diablo drive opened for servicing.

Modern disk drives use a dense disk controller integrated circuit. The Diablo drive, in contrast, implements its limited functionality with transistors and individual chips (mostly gates and flip flops), so it requires boards of components. The photo above shows the 6 main circuit boards of the Alto, plugged into the "mother board": three on the left side and three on the right side. For ease of maintenance, the electronics assembly pops up as seen above, allowing access to the boards. The leftmost board is the analog circuitry, generating the write signals for the heads and amplifying the signals read back from the disk. You can see a wire running from the board to the read/write heads. The next board detects sector and index marks and controls the motor speed. The third board has a counter to keep track of the current sector number.

The three boards on the right perform seeks, moving the disk head to the desired track. The first board computes the difference between the previous track number and the requested track number. The next board counts tracks as the head moves to determine the distance remaining. The rightmost board controls the servo that moves the head to the right track. The seek servo has a four-speed drive, so the head moves rapidly at first and slows down as it approaches the right track, more sophistication than I expected. The Diablo drive manual has detailed schematics.

The photo below shows some of the colorful resistors and diodes on the analog read/write board, along with some transistors. Modern circuit boards would be much denser, with tightly packed surface mounted components.

Circuitry inside the Diablo 31 drive.

The head positioning mechanism is shown below. The turquoise circles rotate as the drive moves to a new track and the yellow pointer indicates the track number on the dial. The heads themselves are on the arm below (lower center). In front of the heads (bottom of the picture) is the metal bar that opens the disk pack when it is inserted.

Inside the Diablo disk drive. The heads are visible in the center. In front of them is the metal bar that opens the disk pack.

As the disk pack enters the drive, it opens up to provide access to the disk surface. The photo below shows the same mechanism as the previous photo, but from the side and with a disk inserted. You can see the exposed surface of the disk, brownish from the magnetizable iron oxide layer over the aluminum platter. As described earlier, the airflow exits the cartridge here, preventing dust from entering through this opening. The read/write head is visible above the disk's surface, with another head below the disk.

Closeup of the hard disk inside the Diablo drive. The read/write head (metal/yellow) is visible above the disk surface (brown).

The drive largely uses primitive DTL chips—diode transistor logic, an early form of digital logic, as well as some slightly more modern TTL chips. The photo below shows some of the chips on the sector counting board. The chips labeled MC858P provide four NAND gates, so there's not much logic per chip. (7651 is the date code, indicating the chip was manufactured in week 51 of 1976.)

Chips on a control board for the Diablo drive.

Conclusion

After putting the disk drive back together, we carefully powered up the system. The disk drive spun up to high speed, the heads dropped to the surface, and the disk slowed to 1500 RPM as expected. (One surprising complexity of the drive is it runs at a faster speed for a while so the airflow will blow contaminants out of the disk pack before loading the heads; it has counters and logic to implement this.) We verified that the disk surface remained undamaged, so the drive works properly, at least mechanically.

This was the first time we had powered up the Alto circuitry. Happily, nothing emitted smoke. But not surprisingly, the Alto failed to boot from the disk. Unless the Alto can read boot code from the disk (or Ethernet), nothing happens, not even a prompt on the screen. The photo below shows the disk with the ready light illuminated, and the empty screen.

The Xerox Alto's drive powered up, along with monitor (showing a white screen).

We have a long debugging task ahead of us, to trace through the Alto's logic circuits and find out what's going wrong. We're also building a disk emulator using a FPGA, so we will be able to run the Alto with an emulated disk, rather than depending on the Diablo drive to keep running. The restoration is likely to keep us busy for a while, so expect more updates. One item we are missing is the Alignment Cartridge (or C.E. Pack), a disk cartridge with specially-recorded tracks used to align the drive; let us know if you happen to have one lying around!

For updates on the restoration, follow kenshirriff on Twitter. Thanks to Al Kossow and Keith Hayes for assistance with restoration.

This post describes our continuing efforts to restore a Xerox Alto. We checked that the low-level microcode tasks are running correctly and the processor is functioning. (The Alto uses an unusual architecture that runs multiple tasks in microcode.) Unfortunately the system still doesn't boot from disk, so the next step will be to get out the logic analyzer and see exactly what's happening. Here's Marc's video of the days's session:

The Alto was a revolutionary computer, designed at Xerox PARC to investigate personal computing, introducing the GUI, Ethernet and laser printers to the world. Y Combinator received an Alto from computer visionary Alan Kay. I'm helping restore the system, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen (from the IBM 1401 restoration team). For background, see my previous restoration articles: day 1, day 2, day 3.

Checking the clocks

We started by checked that all the clock signals were working properly by connecting an oscilloscope to the wirewrap pins on the computer's backplane. This took a lot of careful counting to make sure we connected to the right pins! The system clock signals are generated by an oscillator on the video display card, which isn't where I'd expect to find them. Since the clock signals control the timing of the entire system, nothing will happen if the clock is bad. Thus, checking the clock was an important first step.

At first, the clock signals all looked awful, but after finding a decent ground for the oscilloscope probes, the clock signals looked much better. We verified that the multiple clock outputs were all running nicely. We also tested the reset line to make sure it was being triggered properly - the Alto is reset by pushing a button at the back of the keyboard.

Connecting oscilloscope probes to the Xerox Alto backplane.

Microcode tasks

Next we looked at the running tasks. The Alto has 16 separate tasks running in microcode, doing everything from pushing pixels to the display to refreshing memory to moving disk words. Keep in mind that these are microcode tasks, not operating-system level tasks. The Alto was designed to reduce hardware by performing as many tasks in software as possible to reduce price and increase flexibility. The downside is the CPU can spend the majority of its time doing these tasks rather than "useful" work.

Alto task scheduling is fairly complex. Each task has a priority. When a task is ready to run, its request line is activated (by the associated hardware). The current task can offer to yield by executing the TASK function at convenient points. If there is a higher-priority task ready to run, it preempts the running task. If there's nothing better to run, task 0 runs - this task is what actually runs user code, emulating the Data General Nova instruction set.

The point of this explanation is that microcode instructions need to be running properly for task switching to happen. If the TASK function doesn't get called, the current task will run forever. And if all the task scheduling hardware isn't working right, task switching also won't happen.

Below is a picture of the microcode control board from the Alto. When you're using 1973-era chips, it takes a lot of chips to do anything. This board manages which task is running and the memory address of each task. It uses two special priority encoder chips to determine which waiting task has the highest priority. The board holds the microcode, 1024 micro-instructions of 32 bits each, using eight 1K x 4 bit PROM chips. (PROM, programmable read-only memory, is sort of like non-erasable flash memory.) The board has 8 open sockets allowing an upgrade of 1K of additional microcode to be installed. Note the tiny memory capacity of the time, just 512 bytes of storage per chip.

The microcode control board from a Xerox Alto

Since tasks can be interrupted, the board needs to store the current address of each task. It uses two i3101A RAM chips for this storage. The 3101 is historically interesting: it was the first solid state memory chip, introduced by Intel in 1969. This chip holds 64 bits as 16 words of 4 bits each. Just imagine a time when a memory chip held not gigabits but just 64 total bits.

Looking at the running tasks

The control board has a 4-bit task number available on the backplane, indicating which task is running. We hooked up the oscilloscope so we could see the running tasks. The good news is we saw the appropriate tasks running at the right intervals, with preemption working properly. The following traces show the four task number bits. Most of the time the low-priority task 0 runs (all active-low signals high). Task 12 is running in the middle. Task 8 (memory refresh) runs three times, 38.08 microseconds apart as expected. From the traces, everything seems to be functioning correctly with the task execution.

Trace of the 4-bit microcode task select lines on the Xerox Alto. Top (red) is 8, then 4, 2 and bottom (yellow) is 1 bit. Signals are active-low. Each time interval is 10 microseconds, so this shows a 100 microsecond time interval.

Seeing the running tasks is a big thing, since it shows a whole lot of the system is working properly. As explained earlier, since tasks are running and switching, the microcode processor must be fetching and executing micro-instructions correctly.

Display working better now

You may remember from the previous article that the Alto display was very, very dim and we suspected the CRT was failing. The good news is the display has steadily increased in brightness from its original very dim state, so we probably won't need to replace the CRT. We also managed to see some garbage on the screen along with a cursor, showing that RAM is storing something and the display interface is working.

The display of the Xerox Alto displaying random junk.

Boot still doesn't work

Lots of things are working at this point. The minor :-) remaining problem is the system doesn't boot. Last time, we got the disk drive working: we can put a 14-inch disk cartridge (below) in the drive, the drive spins up, and the heads load. But looking at the backplane signals, we found nothing is getting read from the disk (which explains the boot failure). The oscilloscope showed that the Alto isn't sending any commands to the disk - the Alto isn't even trying to read the disk. We checked for various hardware issues and couldn't find any problems. My suspicion is the boot code in microcode isn't running properly.

Inserting a hard disk into the Diablo drive.

A bit of explanation on the boot process: On reset, microcode task 0 handles the boot. If backspace is pressed on the keyboard, the Alto does a Ethernet boot. Otherwise it does a disk boot by setting up a disk command block in RAM. The microcode disk sector task gets triggered on each sector pulse (which we saw coming from the disk). It checks if there is a command block in RAM, and if so sends the command to the disk. When the read data comes from the disk, the disk word task copies the data into memory. At the end, the block read from disk will be executed, performing the disk boot. So three microcode tasks need to cooperate to boot from disk.

Since we're seeing no command sent to the disk, something must be going wrong between task 0 setting up the command block in RAM and the sector task executing the command block. There's a lot that needs to go right here. If anything is wrong in the ALU or RAM has problems, the command block will get corrupted. And then no disk operation will happen.

Conclusion

The next step is to use a logic analyzer to see exactly what is running, instruction by instruction. By looking at the microcode address lines, we will be able to see what code is executing and where things go wrong. Then we can probe the memory bus to see if RAM is the problem, and look at the ALU to see if it is causing the problem. This is where debugging will get more complex.

I've studied the microcode and it is very bizarre. (You can see the source here.) Instructions are in random order in the PROM, what an instruction does depends on what task is running, branches happen when a device board flips address bits on the bus, and some bits in the PROM are inverted for no good reason (probably to save an inverter chip somewhere). So looking at the microcode can be mind-bending. But hopefully with the logic analyzer we can narrow the problem down. We can also use the Living Computer Museum's simulator to cross-check against what microcode should be running.

For updates on the restoration, follow kenshirriff on Twitter.

The BeagleBone is a inexpensive, credit-card sized computer with many I/O pins. These pins can be easily controlled from software, but it can be very mysterious what is really happening. To control a general purpose input/output (GPIO) pin, you simply write a character to a special file and the pin turns on or off. But how do these files control the hardware pins? In this article, I dig into the device drivers and hardware and explain exactly what happens behind the scenes. (Various web pages describe the GPIO pins, but if you just want a practical guide of how to use the GPIO pins, I recommend the detailed and informative book Exploring BeagleBone.)

This article focuses on the BeagleBone Black, the popular new member of the BeagleBoard family. If you're familiar with the Arduino, the BeagleBone is much more complex; while the Arduino is a microcontroller, the BeagleBone is a full computer running Linux. If you need more than an Arduino can easily provide (more processing, Ethernet, WiFi), the BeagleBone may be a good choice.

Beaglebone Black single-board computer. Photo by Gareth Halfacree, CC BY-SA 2.0

The BeagleBone uses the Sitara AM3358 processor chip running at 1 GHz - this is the thumbnail-sized chip in the center of the board above. This chip is surprisingly complicated; it appears they threw every feature someone might want into the chip. The diagram below shows what is inside the chip: it includes a 32-bit ARM processor, 64KB of memory, a real-time clock, USB, Ethernet, an LCD controller, a 3D graphics engine, digital audio, SD card support, various networks, I/O pins, an analog-digital converter, security hardware, a touch screen controller, and much more.[1] To support real-time applications, the Sitara chip also includes two separate 32-bit microcontrollers (on the chip itself - processors within processors!).

Functional diagram of the complex processor powering the BeagleBone Black. The TI AM3358 Sitara processor contains many functional units. Diagram from Texas Instruments.

The main document that describes the Sitara chip is the Technical Reference Manual, which I will call the TRM for short. This is a 5041 page document describing all the feature of the Sitara architecture and how to control them. But for specifics on the AM3358 chip used in the BeagleBone, you also need to check the 253 page datasheet. I've gone through these documents, so you won't need to, but I reference the relevant sections in case you want more details.

Using a GPIO pin

The chip's pins can be accessed through two BeagleBone connectors, called P8 and P9. To motivate the discussion, I'll use an LED connected to GPIO 49, which is on pin 23 of header P9 (i.e. P9_23). (How do you know this pin is GPIO 49? I explain that below.) Note that the Sitara chip's outputs can provide much less current than the Arduino, so be careful not to overload the chip; I used a 1K resistor.

To output a signal to this GPIO pin, simply write some strings to some files:

# echo 49 > /sys/class/gpio/export
# echo out > /sys/class/gpio/gpio49/direction
# echo 1 > /sys/class/gpio/gpio49/value

The first line causes a directory for gpio49 to be created. The next line sets the pin mode as output. The third line turns the pin on; writing 0 will turn the pin off.

This may seem like a strange way to control the GPIO pins, using special file system entries. It's also strange that strings ("49", "out", and "1") are used to control the pin. However, the file-based approach fits with the Unix model, is straightforward to use from any language, avoids complex APIs, and hides the complexity of the chip.

The file-system approach is very slow, capable of toggling a GPIO pin at about 160 kHz, which is rather awful performance for a 1 GHz processor.[2] In addition, these oscillations may be interrupted for several milliseconds if the CPU is being used for other tasks, as you can see from the image below. This is because file system operations have a huge amount of overhead and context switches before anything gets done. In addition, Linux is not a real-time operating system. While the processor is doing something else, the CPU won't be toggling the pins.

An I/O pin can be toggled to form an oscillator. Unfortunately, oscillations can stop for several milliseconds if the processor is doing something else.

The moral here is that controlling a pin from Linux is fine if you can handle delays of a few milliseconds. If you want to produce an oscillation, don't manually toggle the pin, use a PWM (pulse width modulator) output instead. If you need more complex real-time outputs, use one of the microcontrollers (called a PRU) inside the processor chip.

What's happening internally with GPIOs?

Next, I'll jump to the low level, explaining what happens inside the chip to control the GPIO pins. A GPIO pin is turned on or off when a chip register at a particular address is updated. To access memory directly, you can use devmem2, a simple program that allows a memory register to be read or written. Using the devmem2 program, we can turn the GPIO pin first on and then off by writing a value to the appropriate register (assuming the pin has been initialized). The first number is the address, and the second number is value written to the address.

devmem2 0x4804C194 w 0x00020000
devmem2 0x4804C190 w 0x00020000

What are all these mystery numbers? GPIO 49 is also known as GPIO1_17, the 17th pin in the GPIO1 bank (as will be explained later). The value written to the register, 0x00020000, is a 1 bit shifted left 17 positions (1<<17) to control pin 17. The address 4804C194 is the address of the register used to turn on a GPIO pin. Specifically it is SETDATAOUT register for the 32 GPIO1 pins. By writing a bit to this register, the corresponding pin is turned on. Similarly, the address 4804C190 is the address of the CLEARDATAOUT register, which turns a pin off.

How do you determine these register addresses? It's all defined in the TRM document if you know where to look. The address 4804C000 is the starting address of GPIO1's registers (see Table 2-3 in the TRM). The offset 194h is the offset of the GPIO_SETDATAOUT register, and 190h is the GPIO_CLEARDATAOUT register (see TRM 25.4.1). Combining these, we get the address 4804C194 for GPIO1's SETDATAOUT and 4804C190 for CLEARDATAOUT. Writing a 1 bit to the SETDATAOUT register will set that GPIO, while leaving others unchanged. Likewise, writing a 1 bit to the CLEARDATOUT register will clear that GPIO. (Note that this design allows the selected register to be modified without risk of modifying other GPIO values.)

Putting this all together, writing the correct bit pattern to the address 0x4804C194 or 0x4804C190 turns the GPIO pin on or off. Even if you use the filesystem API to control the pin, these registers are what end up controlling the pin.

How does devmem2 work? It uses Linux's /dev/mem, which is a device file that is an image of physical memory. devmem2 maps the relevant 4K page of /dev/mem into the process's address space and then reads or writes the address corresponding to the desired physical address.

By using mmap and toggling the register directly from a C++ program, the GPIO can be toggled at about 2.8 MHz, much faster than the device driver approach.[3] If you think this approach will solve your problems, think again. As before, there are jitter and multi-millisecond dropouts if there is any other load on the system.

The names of pins

Confusingly, each pin has multiple names and numbers. For instance, pin 23 on header 9 is GPIO1_17, which is GPIO 49, which is pin entry 17 (unrelated to GPIO1_17) which is pin V14 on the chip itself. Meanwhile, the pin has 7 other functions including GMPC_A1, which is the name used for the pin in the documentation. This section explains the different names and where they come form. (If you just want to know the pin names, see the diagrams P8 header and P9 header from Exploring Beaglebone.)

The first complication is that each pin has eight different functions since the processor has many more internal functions than physical pins.[4] (The Sitara chip has about 639 I/O signals, but only 124 output pins available for these signals.) The solution is that each physical pin has eight different internal functions mapped to it. For each pin, you select a mode 0 through 7, which selects which of the eight possible functions you want.

Figuring out "from scratch" which functions are on each BeagleBone header pin is tricky and involves multiple documents. To start, you need to determine what functions are on each physical pin (actually ball) of the chip. Table 4-1 in the chip datasheet shows which signals are associated with each physical pin (see the "ZCZ ball number"). (Or see section 4.3 for the reverse mapping, from the signal to the pin.) Then look at the BeagleBone schematic to see the connection from each pin on the chip (page 3) to an external connector (page 11).

For instance, consider the GPIO pin used earlier. With the ZCZ chip package, chip ball V14 is called GPMC_A1 (in mode 0 this pin is used for the General Purpose Memory Controller). The Beaglebone documentation names the pin with the the more useful mode 7 name "GPIO1_17", indicating GPIO 17 in GPIO bank 1. (Each GPIO bank (0-3) has 32 pins. Bank 0 is numbered 0-31, bank 1 is 32-63, and so forth. So pin 17 in bank 1 is GPIO number 32+17=49.) The schematic shows this chip ball is connected to header P9 pin 23, earning it the name P9_23. This name is relevant when connecting wires to the board. It is also the name used in BeagleScript (discussed later).

Another pin identifier is the sequential pin number based on the the pin control registers. (This number is indicated under the column $PINS in the header diagrams.) Table 9-10 in the TRM lists all the pins in order along with their control address offsets. Inconveniently, the pin names are the mode 0 names (e.g. conf_gpmc_a1), not the more relevant names (e.g. gpio1_17). From this table, we can determine that the offset of con_gpmc_a1 is 844h. Counting the entries (or computing (844h-800h)/4) shows that this is pin 17 in the register list. (It's entirely coincidental that this number 17 matches GPIO1_17). This number is necessary when using the pin multiplexer (described later).

How writing to a file toggles a GPIO

When you write a string to /sys/class/gpio/gpio49/value, how does that end up modifying the GPIO? Now that some background has been presented, this can be explained. At the hardware level, toggling a GPIO pin simply involves setting a bit in a control register, as explained earlier. But it takes thousands of instructions in many layers to get from writing to the file to updating the register and updating the GPIO pin.

The write goes through the standard Linux file system code (user library, system call code, virtual file system layer) and ends up in the sysfs virtual file system. Sysfs is an in-memory file system that exposes kernel objects through virtual files. Sysfs will dispatch the write to the gpio device driver, which processes the request and updates the appropriate GPIO control register.

In more detail, the /sys/class/gpio filesystem entries are provided by the gpio device driver (documentation for sysfs and gpio). The main gpio driver code is gpiolib.c. There are separate drivers for many different types of GPIO chips; the Sitara chip (and the related OMAP family) uses gpio-omap.c. The Linux code is in C, but you can think of gpiolib.c as the superclass, with subclasses for each different chip. C function pointers are used to access the different "methods".

The gpiolib.c code informs sysfs of the various files to control GPIO pin attributes (/active_low, /direction/, /edge, /value), causing them to appear in the file system for each GPIO pin. Writes to the /value file are linked to the function gpio_value_store, which parses the string value, does error checking and calls gpio_set_value_cansleep, which calls chip->set(). This function pointer is where handling switches from the generic gpiolib.c code to the device-specific gpio-omap.c code. It calls gpio_set, which calls _set_gpio_dataout_reg, which determines the register and bit to set. This calls raw_writel, which is inline ARM assembler code to do a STR (store) instruction. This is the point at which the GPIO control register actually gets updated, changing the GPIO pin value.[5] The key point is that a lot of code runs to get from the file system operation to the actual register modification.

How does the code know the addresses of the registers to update? The file gpio-omah.h contains constants for the GPIO register offsets. Note that these are the same values we used earlier when manually updating the registers.

#define OMAP4_GPIO_CLEARDATAOUT  0x0190
#define OMAP4_GPIO_SETDATAOUT  0x0194

But how does the code know that the registers start at 0x4804C000? And how does the system know this is the right device driver to use? These things are specified, not in the code, but in a complex set of configuration files known as Device Trees, explained next.

Device Trees

How does the Linux kernel know what features are on the BeagleBone? How does it know what each pin does? How does it know what device drivers to use and where the registers are located? The BeagleBone uses a Linux feature called the Device Tree, where the hardware configuration is defined in a device tree file.

Linux used to define the hardware configuration in the kernel. But each new ARM chip variant required inconvenient kernel changes, which led to Linus Torvald's epic rant on the situation. The solution was to move this configuration out of kernel code and into files known as the Device Tree, which specifies the hardware associated with the device. This switch, in the 3.8 kernel, is described here.

As if device trees weren't complex enough, the next problem was that BeagleBone users wanted to change pin configuration dynamically. The solution to this was device tree overlays, which allow device tree files to modify other device tree files. With a device tree overlay, the base hardware configuration can be modified by configuration in the overlay file. Then the Capemgr was implemented to dynamically load device tree fragments, to manage the installation of BeagleBoard daughter cards (known as capes).

I won't go into the whole explanation of device trees, but just the parts relevant to our story. For more information see A Tutorial on the Device Tree, Device Tree for Dummies, or Introduction to the BeagleBone Black Device Tree.

The BeagleBone's device tree is defined starting with am335x-boneblack.dts, which includes am33xx.dtsi and am335x-bone-common.dtsi.

The relevant lines are in am33xx.dtsi:

gpio1: gpio@44e07000 {
  compatible = "ti,omap4-gpio";
  ti,hwmods = "gpio1";
  gpio-controller;
  interrupt-controller;
  reg = <0x44e07000 0x1000>;
  interrupts = <96>;
};

The "compatible" line is very important as it causes the correct device driver to be loaded. While processing a device tree file, the kernel checks all the device drivers to find one with a matching "compatible" line. In this case, the winning device driver is gpio-omap.c, which we looked at earlier.

The other lines of interest specify the register base address 44e07000. This is how the kernel knows where to find the necessary registers for the chip. Thus, the Device Tree is the "glue" that lets the device drivers in the kernel know the specific details of the registers on the processor chip.

BoneScript

One easy way to control the BeagleBone pins is by using JavaScript along with BoneScript, a Node.js library. The BoneScript API is based on the Arduino model. For instance, the following code will turn on the GPIO on pin 23 of header P9.

var b = require('bonescript');
b.pinMode('P9_23', b.OUTPUT);
b.digitalWrite('P9_23', b.HIGH);

Using BoneScript is very slow: you can toggle a pin at about 370 Hz, with a lot of jitter and the occasional multi-millisecond gap. But for programs that don't require high speed, BoneScript provides a convenient programming model, especially for applications with web interaction.

For the most part, the BoneScript library works by reading and writing the file system pseudo-devices described earlier. You might expect BoneScript to have some simple code to convert the method calls to the appropriate file system operations, but BoneScript is surprisingly complex. The first problem is BoneScript supports different kernel versions with different cape managers and pin multiplexers, so it implements everything four different ways (source).

A bit surprise is that BoneScript generates and installs new device tree files as a program is running. In particular, for the common 3.8.13 kernel, BoneScript creates a new device tree overlay (e.g. /lib/firmware/bspm_P9_23_2f-00A0.dts) on the fly from a template, runs it through the dtc compiler and installs the device tree overlay through the cape manager by writing to /sys/devices/bone_capemgr.N/slots (source). That's right, when you do a pinMode() operation in the code, BoneScript runs a compiler!

Conclusion

The BeagleBone's GPIO pins can be easily controlled through the file system, but a lot goes on behind the scenes, making it very mysterious what is actually happening. Examining the documentation and the device drivers reveals how these file system writes affect the pins by writing to various control registers.[6] Hopefully after reading this article, the internal operation of the Beaglebone will be less mysterious.

Notes and references

[1] Many of the modules of the Sitara chip have cryptic names. A brief explanation of some of them, along with where they are described in the TRM:

PRU-ICSS (Programmable Real-Time Unit / Industrial Communication Subsystem, chapter 4): this is the two real-time microcontrollers included in the chip. They contain their own modules.
Industrial Ethernet is an extension of Ethernet protocols for industrial environments that require real-time, predictable communication. The Industrial Ethernet module provides timers and I/O lines that can be used to implement these protocols.
SGX (chapter 5) is a 2D/3D graphics accelerator.
GPMC is the general-purpose memory controller. OCMC is the on-chip memory controller, providing access to 64K of on-chip RAM. EMIF is the external memory interface. It is used on the BeagleBone to access the DDR memory. ELM (error location module) is used with flash memory to detect errors. Memory is discussed in chapter 7 of the TRM.
L1, L2, L3, L4: The processor has L1 instruction and data caches, and a larger L2 cache. The L3 interconnect provides high-speed communication between many of the modules of the processor using a "NoC" (Network on Chip) infrastructure. The slower L4 interconnect is used for communication with low-bandwidth modules on the chip. (See chapter 10 of the TRM.)
EDMA (enhanced direct memory access, chapter 11) provides transfers between external memory and internal modules.
TS_ADC (touchscreen, analog/digital converter, chapter 12) is a general-purpose analog to digital converter subsystem that includes support for resistive touchscreens. This module is used on the BeagleBone for the analog inputs.
LCD controller (chapter 13) provides support for LCD screens (up to 2K by 2K pixels). This module provides many signals to the TDA19988 HDMI chip that generates the BeagleBone's HDMI output.
EMAC (Ethernet Media Access Controller, chapter 14) is the Ethernet subsystem. RMII (reduced media independent interface) is the Ethernet interface used by the BeagleBone. MDIO (management data I/O) provides control signals for the Ethernet interface. GMII and RGMII are similar for gigabit Ethernet (unused on the BeagleBone).
The PWMSS (Pulse-width modulation subsystem, chapter 15) contains three modules. The eHRPWM (enhanced high resolution pulse width modulator) generates PWM signals, digital pulse trains with selectable duty cycle. These are useful for LED brightness or motor speed, for instance. These are sometimes called analog outputs, although technically they are not. This subsystem also includes the eCAP (enhanced capture) module which measures the time of incoming pulses. eQEP (Enhanced quadrature encoder pulse) module is a surprisingly complex module to process opitcal shaft encoder inputs (e.g. an optical encoder disk in a mouse) to determine its rotation.
MMC (multimedia card, chapter 18) provides the SD card interface on the BeagleBone.
UART (Universal Asynhronous Receiver/Transmitter, chapter 19) handles serial communication.
I2C (chapter 21) is a serial bus used for communication with devices that support this protocol.
The McASP (Multichannel audio serial port, chapter 22) provides digital audio input and output.
CAN (controller area network, chapter 23) is a bus used for communication on vehicles.
McSPI (Multichannel serial port interface, chapter 24) is used for serial communication with devices supporting the SPI protocol.

[2] The following C++ code uses the file system to switch GPIO 49 on and off. Remove the usleeps for maximum speed. Note: this code omits pin initialization; you must manually do "echo 49 > /sys/class/gpio/export" and "echo out > /sys/class/gpio/gpio49/direction".

#include <unistd.h>
#include <fcntl.h>

int main() {
  int fd =  open("/sys/class/gpio/gpio49/value", O_WRONLY);
  while (1) {
    write(fd, "1", 1);
    usleep(100000);
    write(fd, "0", 1);
    usleep(100000);
  }
}

[3] Here's a program based on devmem2 to toggle the GPIO pin by accessing the register directly. The usleeps are delays to make the flashing visible; remove them for maximum speed. For simplicity, this program does not set the pin directions; you must do that manually.

#include <fcntl.h>
#include <sys/mman.h>

int main() {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    void *map_base = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
            fd, 0x4804C000);
    while (1) {
        *(unsigned long *)(map_base + 0x194) = 0x00020000;
        usleep(100000);
        *(unsigned long *)(map_base + 0x190) = 0x00020000;
        usleep(100000);
    }
}

For more on this approach, see BeagleBone Black GPIO through /dev/mem.

[4] Selecting one of the eight functions for each pin is done through the pin multiplexer (pinmux) which uses the pinctrl-single device driver. Pin usage is defined in the device tree. The details of this are complex and change from kernel to kernel. See GPIOs on the Beaglebone Black using the Device Tree Overlays or BeagleBone and the 3.8 Kernel for details.

[5] The function names tend to change in different kernel versions. I'm describing the Linux 3.8 kernel code.

[6] I've discussed just the GPIO pins, but other pins (LED, PWM, timer, etc) have similar file system entries, device drivers, and device tree entries.

The BeagleBone Black is an inexpensive, credit-card sized computer that has two built-in microcontrollers called PRUs. While the PRUs provide the real-time processing capability lacking in Linux, using these processors has a learning curve. In this article, I show how to run a simple program on the PRU, and then dive into the libraries and device drivers to show what is happening behind the scenes.

The BeagleBone uses the Sitara AM3358, an ARM processor chip running at 1 GHz—this is the thumbnail-sized chip in the center of the board below. If you want to perform real-time operations, the BeagleBone's ARM processor won't work well since Linux isn't a real-time operating system. However, the Sitara chip contains two 32-bit microcontrollers, called Programmable Realtime Units or PRUs. (It's almost fractal, having processors inside the processor.) By using a PRU, you can achieve fast, deterministic, real-time control of I/O pins and devices.

The BeagleBone computer is tiny. For some reason it was designed to fit inside an Altoids mint tin. The square thumbnail-sized chip in the center is the AM3358 Sitara processor chip. This chip contains an ARM processor as well as two 32-bit microcontrollers called PRUs.

The nice thing about using the PRU microcontrollers on the BeagleBone is that you get both real-time Arduino-style control, and "real computer" features such as a web server, WiFi, Ethernet, and multiple languages. The main processor on the BeagleBone can communicate with the PRUs, giving you the best of both worlds. The downside of the PRUs is there's a significant learning curve to use them since they have their own instruction set and run outside the familiar Linux world. Hopefully this article will help with the learning curve.

I wrote an article a couple weeks ago on The BeagleBone's I/O pins. That article discussed the pins controlled by the ARM processor, while this article focuses on the PRU microcontroller. If you think you've seen the present article before, they cover two different things, but I won't blame you for getting deja vu!

Running a "blink" program on the PRU

To motivate the discussion, I'll use a simple program that uses the PRU to flash an LED. This example is based on PRU GPIO example

Blinking an LED using the BeagleBone's PRU microcontroller.

The easiest way to compile and assemble PRU code is to do it on the BeagleBone, since the necessary tools are installed by default (at least if you get the Adafruit BeagleBone). Perform the following steps on the BeagleBone.

Connect an LED to BeagleBone header P8_11 through a 1K resistor.
Download the assembly code file blink.p:
Download the host file to load and run the PRU code, loader.c.
Download the device tree file, /lib/firmware/PRU-GPIO-EXAMPLE-00A0.dts.

Compile and install the device tree file to enable the PRU:

# dtc -O dtb -I dts -o /lib/firmware/PRU-GPIO-EXAMPLE-00A0.dtbo -b 0 -@ PRU-GPIO-EXAMPLE-00A0.dts
# echo PRU-GPIO-EXAMPLE > /sys/devices/bone_capemgr.?/slots
# cat /sys/devices/bone_capemgr.?/slots

Assemble blink.p and compile the loader:

# pasm -b blink.p
# gcc -o loader loader.c -lprussdrv

Run the loader to execute the PRU binary:
```
# ./loader blink.bin
```

If all goes well, the LED should blink 10 times.[1]

Documentation

The most important document that describes the Sitara chip is the 5041-page Technical Reference Manual (TRM for short). This article references the TRM where appropriate, if you want more information. Information on the PRU is inconveniently split between the TRM and the AM335x PRU-ICSS Reference Guide. For specifics on the AM3358 chip used in the BeagleBone, see the 253 page datasheet. Texas Instruments' has the PRU wiki with more information.

If you're looking to use the BeagleBone and/or PRU I highly recommend the detailed and informative book Exploring BeagleBone. Helpful web pages on the PRU include BeagleBone Black PRU: Hello World, Working with the PRU and BeagleBone PRU GPIO example.

The assembly code

If you're familiar with assembly code, the PRU's instruction set should be fairly straightforward. The instruction set is documented on the wiki and in the PRU reference manual (section 5).[2]

The demonstration blink code uses register R1 to count the number of blinks (10) and register R0 to provide the delay between flashes. The delay timing illustrates an important feature of the PRU: it is entirely deterministic. Most instructions takes 5 nanoseconds (at 200 MHz), although reads are slower. There is no pipelining, branch delays, memory paging, interrupts, scheduling, or anything else that interferes with instruction execution. This makes PRU execution predictable and suitable for real-time processing. (In contrast, the main ARM processor has all the scheduling and timing issues of Linux, making it unsuitable for real-time processing.)

Since the delay loop is two instructions long and executes 0xa00000 times (an arbitrary value that gave a visible delay), we can compute that each delay is 104.858 milliseconds, regardless of what else the processor is doing. (This is shown in the oscilloscope trace below.) A similar loop on the ARM processor would have variable timing, depending on system load.

LED output from the BeagleBone PRU demo, showing the 104ms oscillations.

The I/O pin

How do we know that pin P8_11 is controlled by bit 15 of R30? We're using one of the PRU output pins called "pr1_pru0_pru_r30_15", which is a PRU 0 output pin controlled by R30 bit 15. (Note that the PRU GPIO pins are separate from the "regular" GPIO pins.)[3] The BeagleBone header chart shows that this PRU I/O pin is connected to BeagleBone header pin P8_11.[4] To tell the BeagleBone we want to use this pin with the PRU, the device tree configuration is updated as discussed below.

Communication between the main processor and the PRU

The PRU microcontrollers are part of the processor chip, yet operate independently. In the blink demo, the main processor runs a loader program that downloads the PRU code to the microcontroller, signals the PRU to start executing, and then waits for the PRU to finish. But what happens behind the scenes to make this work? The PRU Linux Application Library (PRUSSDRV) provides the API between code running under Linux on the BeagleBone and the PRU.[5]

The prussdrv_exec_program() API call loads a binary program (in our case, the blink program) into the PRU so it can be executed. To understand how this works requires a bit of background on how memory is set up.

Each PRU microcontroller has 8K (yes, just 8K) of instruction memory, starting at address 0. Each PRU also has 8K of data memory, starting at address 0 (see TRM 4.3).[6] Since the main processor can't access all these memories at address 0, they are mapped into different parts of the main processor's memory as defined by the "global memory map".[7] For instance, PRU0's instruction memory is accessed by the host starting at physical address 0x4a334000. Thus, to load the binary code into the PRU, the API code copies the PRU binary file to that address.[8]

Then, to start execution, the API code starts the PRU running by setting the enable bit in the PRU's control register, which the host can access at a specific physical address (see TRM 4.5.1.1).

To summarize, the loader program running on the main processor loads the executable file into the PRU by writing it to special memory addresses. It then starts up the PRU by writing to another special memory address that is the PRU's control register.

The Userspace I/O (UIO) driver

At this point, you may wonder how the loader program can access to physical memory and low-level chip registers - usually is not possible from a user-level program. This is accomplished through UIO, the Linux Userspace I/O driver (details). The idea behind a UIO driver is that many devices (the PRU in our case) are controlled through a set of registers. Instead of writing a complex kernel device driver to control these registers, simply expose the registers to user code, and then user-level code can access the registers to control the device. The advantage of UIO is that user-level code is easier to write and debug than kernel code. The disadvantage is there's no control over device usage—buggy or malicious user-level code can easily mess up the device operation.[9]

For the PRU, the uio_pruss driver provides UIO access. This driver exposes the physical memory associated with the PRU as a device /dev/uio0. The user-level PRUSSDRV API library code does a mmap on this device to map the address space into its own memory. This gives the loader code access to the PRU memory through addresses in its own address space. The uio_pruss driver exposes information on the mapping to the user-level code through the directory /sys/class/uio/uio0/maps.

Another photo of the BeagleBone Black single-board computer inside an Altoids mint tin. At this point in the article, you may need a break from the text.

Device Tree

How does the uio_pruss driver know the address of the PRU? This is configured through the device tree, a complicated set of configuration files used to configure the BeagleBone hardware. I won't go into the whole explanation of device trees, but just the parts relevant to our story.[10] The pruss entry in the device tree is defined in am33xx.dtsi. Important excerpts are:

pruss: pruss@4a300000 {
 compatible = "ti,pruss-v2";
 reg = <0x4a300000 0x080000>;
 status = "disabled";
 interrupts = <20 21 22 23 24 25 26 27>;
};

The address 4a300000 in the device tree corresponds to the PRU's instruction/data/control space in the processor's address space. (See TRM Table 2-4 and section 4.3.2.) This address is provided to the uio_pruss driver so it knows what memory to access.

The "ti,pruss-v2" compatible line causes the matching uio_pruss driver to be loaded. Note that the status is "disabled", so the PRU is disabled by default until enabled in a device tree overlay.

The "interrupts" line specifies the eight ARM interrupts that are handled by the driver; these will be discussed in the Interrupts section below.

Device tree overlay

To enable the PRU and the GPIO output pin, we needed to load the PRU-GPIO-EXAMPLE-00A0.dts device tree overlay file.[11] This file is discussed in detail at credentiality, so I'll just give some highlights.

As described earlier, each header pin has eight different functions, so we need to tell the pin multiplexer driver which of the eight modes to use for the pin P8_11. Looking at the P8 header diagram, pin P8_11 has internal address 0x34 and has the PRU output function in mode 6. This is expressed in the overlay file:

exclusive-use = "P8.11", "pru0";
...
    target = <&am33xx_pinmux>;
...
       pinctrl-single,pins = <
         0x34 0x06

The above entry extends the am33xx_pinmux device tree description and is passed to the pinctrl-single driver, which configures the pins as desired by updating the appropriate chip control registers.

The device tree file also overrides the previous pruss entry, changing the status to "okay", which enables the uio_pruss device driver:

    target = <&pruss>;
    __overlay__ {
      status = "okay";

The cape manager

This device tree overlay was loaded with the command "echo PRU-GPIO-EXAMPLE > /sys/devices/bone_capemgr.?/slots". What does this actually do?

The idea of the cape manager is if you plug a board (called a cape) onto the BeagleBone, the cape manager reads an EEPROM on the board, figures out what resources the board needs, and automatically loads the appropriate device tree fragments (from /lib/firmware) to configure the board (source, details, documentation). The cape manager is configured in the device tree file am335x-bone-common.dtsi.

Examples of BeagleBone capes. Image from beaglebone.org, CC BY-SA 3.0.

In our case, we aren't installing a new board, but just want to manually install a new device tree file. The cape manager exposes a bunch of files in /sys/devices/bone_capemgr.*. Sending a name to the file "slots" causes the cape manager to load the corresponding device tree file. This enables the PRU and the desired output pin.

Interrupts

Interrupts can be used to communicate between the PRU code and the host code. At the end of the PRU blink code, it sends an interrupt to the host by writing 35 to register R31. This wakes up the loader program, which is waiting with the API call "prussdrv_pru_wait_event(PRU_EVTOUT_0)". This process seems like it should be straightforward, but it's surprisingly complex. In this section, I'll explain how PRU interrupts work.

I'll start with the host (loader program) side. The PRU can send eight different events to the host program. The prussdrv_pru_wait_event(N) call waits for event N (0 through 7) by reading from /dev/uioN. This read will block until an interrupt happens, and then return. (This seems like a strange way to receive interrupts, but using the file system fits the Unix philosophy.)[12]

If you look in the file system, there are 8 uio files: /dev/uio0 through /dev/uio7. You might expect /dev/uio1 provides access to PRU1, but that's not the case. Instead, there are 8 event channels between the PRUs and the main processor, allowing 8 different types of interrupts. Each one of these has a separate /dev/uioN file; reading from the file waits for the corresponding event, labeled PRU_EVTOUT0 through PRU_EVTOUT7.

Jumping now to the PRU side, the interrupt is triggered by a write to R31. Register R31 is a special register - writing to it sends an event to the interrupt system.[13] The blink code generates PRU internal event 3 (with the crazy name pr1_pru_mst_intr[3]_intr_req), which is mapped to system event 19.

The following diagram shows how an event (right) gets mapped to a channel, then a host, and finally generates an interrupt (left). In our case, system event 19 from the PRU is given the label PRU0_ARM_INTERRUPT. This interrupt is mapped to channel 2, then host 2 which goes to ARM interrupt PRU_EVTOUT0, which is ARM interrupt 20 (see TRM 6.3). The device tree configured the pruss_uio driver to handle events 20 through 27, so the driver receives the interrupt and unblocks /dev/uio0, informing the host process of the PRU event.

Interrupt handling on the BeagleBone between the PRU microcontrollers and the ARM processor. System events (right) are mapped to channels, then hosts, finally generating interrupts (left).

How does the above complex mapping get set up? You might guess the device tree, but it's done by the PRUSSDRV user-level library code. The interrupt configuration is defined in PRUSS_INTC_INITDATA, a struct that defines the various mappings shown above. This struct is passed to prussdrv_pruintc_init(), which writes to the appropriate PRU interrupt registers (mapped into memory).[14]

Next steps

A real application would use a host program much more powerful than the example loader. For instance, the host program could run a web server and actively communicate with the PRU. Or the host program could do computation on data received from the PRU. Such a system uses the ARM processor to get all the advantages of Linux, and the PRU microcontrollers to perform real-time activities impossible in Linux.

It's easier to program the PRU in C using an IDE. That's a topic for another post, but for now you can look here, here, here or here for information on using the C compiler.

Conclusion

The PRU microcontrollers give the BeagleBone real-time, deterministic processing. Combining the BeagleBone's full Linux system with the PRU microcontrollers yields a very powerful system. The Linux side can provide powerful processing, network connectivity, web servers, and so forth, while the PRUs can interface with the external world. If you're using an Arduino but want more (Ethernet, web connectivity, USB, WiFi), the BeagleBone is an alternative to consider.

There is a significant learning curve to using the PRU microcontrollers, however. Much of what happens may seem like magic, hidden behind APIs and device tree files. Hopefully this article has filled in the gaps, explaining what happens behind the scene.

Notes and references

[1] If you get the error "prussdrv_open() failed" the problem is the PRU didn't get enabled. Make sure you loaded the device tree file into the cape manager.

[2] One of the instructions in the PRUs instruction set is LBBO (load byte burst), which reads a block of memory into the register file. That sounded to me like an obscure instruction I'd never need, but it turns out that this instruction is used for any memory reads. So don't make my mistake of ignoring this instruction!

[3] To understand the pin name "pr1_pru0_pru_r30_15": pru0 indicates the pin is controlled by PRU 0 (rather than PRU 1), r30 indicates the pin is an output pin, controlled by the output register R30 (as opposed to the input register R31), and 15 indicates I/O pin 15, which is controlled by bit 15 of the register. A mnemonic to remember that R30 is the output register and R31 is the input register: 0 looks like O for output, 1 looks like I for input.

[4] Most PRU pins conflict with HDMI pins, so you need to disable HDMI to use them. To avoid this problem, I picked one of the few non-conflicting PRU pins. Disabling HDMI is described here and here.

[5] The PRU library is documented in PRU Linux Application Loader API Guide. The library is often installed on the BeagleBone by default but can also be downloaded in the am335x_pru_package along with PRU documentation and some coding examples. Library source code is here.

[6] The PRU, like many microcontrollers, uses a Harvard architecture, with independent memory for code and data.

[7] In the main processor's address space, PRU0's data RAM starts at address 0x4a300000, PRU1's data RAM starts at 0x4a302000, PRU0's instruction RAM starts at 0x4a334000, and PRU1's instruction RAM starts at 0x4a338000. (See TRM Table 2-4 and section 4.3.2.) The PRU control registers also have addresses in the main processor's physical address space; PRU0's control registers start at address 0x4a322000 and PRU1's at 0x4a324000.

[8] The API uses prussdrv_pru_write_memory() to copy the contents into the memory-mapped region for PRU's instruction memory. The prussdrv_load_datafile API call is used to load a file into the PRU's data memory.

[9] Note that using Userspace I/O to control the PRU is the exact opposite philosophy of how the regular GPIO pins on the BeagleBone are controlled. Most of the BeagleBone's I/O pins are controlled through complex device drivers that provide access through simple file system interfaces. This is easy to use: just write 1 to /sys/class/gpio/gpio60/value to turn on GPIO 60. On the other hand, the BeagleBone gives you raw access to the PRU control registers - you can do whatever you want, but you'll need to figure out how to do it yourself. It's a bit strange that two parts of the BeagleBone have such wildly different philosophies.

[10] Device trees are discussed in more detail in my previous BeagleBone article. Also see A Tutorial on the Device Tree, Device Tree for Dummies, or Introduction to the BeagleBone Black Device Tree.

[11] Why is the example device tree overlay (and most device tree files) labeled with version 00A0? I found out (thanks to Robert Nelson) that A0 is a common initial version for contract hardware. So 00A0 is the first version of something, followed by 00A1, etc.

[12] The blink example only sends events from the PRU to the host, but what about the other direction? The host can send an event (e.g. ARM_PRU0_INTERRUPT, 21) to a PRU using prussdrv_pru_send_event(). This writes the PRU's SRSR0 (System Event Status Raw/Set Register, TRM 4.5.3.13) to generate the event. The interrupt diagram shows this event flows to R31 bit 30. The PRUs don't receive interrupts as such, but must poll the top two bits of R31 to see if an interrupt signal has arrived.

[13] The PRU has 64 system events for all the different things that can trigger interrupts, such as timers or I/O (see TRM 4.4.2.2). When generating an event with R31, bit 5 triggers an event, while the lower 4 bits select the event number (see TRM 4.4.1.2). PRU system event numbers are offset by 16 from the internal event numbers, so internal event 3 is system event 19. See PRU interrupts for the full list of events.

[14] Multiple PRU registers are set up to initialize interrupt handling. The CMR (channel map registers, TRM 4.5.3.20) map from each system event to a channel. The HMR (host interrupt map registers, TRM 4.5.3.36) map from a channel to a host interrupt. The PRU interrupt registers (INTC) start at address 0x4a320000 (TRM Table 4-8).

In today's Xerox Alto restoration session we investigated why the system doesn't boot. We find a broken wire, hook up a logic analyzer, generate a cloud of smoke, and discover that memory problems are preventing the computer from booting. (In previous episodes, we fixed the power supply, got the CRT display working and cleaned up the disk drive: days 1, 2, 3. and 4.)

The Alto was a revolutionary computer, designed at Xerox PARC in 1973 to investigate personal computing. It introduced the GUI, Ethernet and laser printers to the world, among other things. Y Combinator received an Alto from computer visionary Alan Kay and I'm helping restore the system, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen (from the IBM 1401 restoration team).

The broken wire

The Xerox Alto is built from 13 circuit boards, crammed with TTL chips. In 1973, minicomputers such as the Alto were built from a whole bunch of simple ICs instead of a primitive microprocessor chip. (People still do this as a retro project.) The Alto's CPU is split across 3 boards: an ALU board, a control board, and a control RAM board. The control board is the focus of today's adventures.

If a circuit board has a design defect or needs changes, it can be modified by attaching new wires to create the necessary connections. The photo below shows the control board with several white modification wires. While examining the control board, we noticed one of the wires had come loose. Could the boot failures be simply due to a broken wire?

Control board from the Xerox Alto, showing a broken wire. The white wires were for a modification, but one wire came loose.

We carefully resoldered the wire and powered up the system. The disk drive slowly came up to speed and the heads lowered onto the disk surface. We pressed the reset button (under the keyboard) to boot. As before, nothing happened and the display remained blank. Fixing the wire had no effect.

After investigation, it appears the rework wires were to support the Trident/Tricon hard disk. In the photo above, note the small edge connector in the upper right, with the white wires connected. The Trident disk controller used this connector, but our (Diablo) disk controller does not. In other words, the broken wire might have caused problems with a different disk drive, but it was irrelevant to us.

Microcode on the Xerox Alto

Some background on the Xerox Alto's architecture will help motivate our day's investigation. The Alto, like most modern computers, is implemented using microcode. Computers are programmed in machine instructions, where each instruction may involve several steps. For instance, a "load" instruction may first compute a memory address by adding an offset to an index register. Then the address is sent to memory. Finally the contents of memory are stored into a register. Instead of hardcoding these steps (as done in the 6502 or Z-80 for instance), modern computers run a sequence of "micro-instructions", where each micro-instruction performs one step of the larger machine instructions. This technique, called microcode, is used by the Xerox Alto.

The Alto uses microcode much more heavily than most computers. The Alto not only uses microcode to implement the instruction set, but implements part of the software in microcode directly. Part of the Alto's design philosophy was to use software (i.e. microcode) instead of hardware where possible. For instance, most video displays pull pixels out of memory and display them on the screen. In the Alto, the processor itself fetches pixels out of memory and passes them to the video hardware. Similarly, most disk interfaces transfer data between memory and the disk drive. But in the Alto, the processor moves each data word to/from memory itself. The code to perform these tasks is written in microcode.

To perform all these low-level activities, the Alto hardware manages 16 different tasks, listed below. High-priority tasks (such as handling high-speed data from the disk) can take over from low-priority tasks, such as handling the display cursor. The lowest-level task is the "emulator", the task that executes program instructions. (In a normal computer, the emulator task is the only thing microcode is doing.) Remember, these tasks are not threads or processes handled by the operating system. These are microcode tasks, below the operating system and scheduled directly by the hardware.

Task	Name	Description
0	Emulator	Lowest priority.
1	-	unused
2	-	unused
3	-	unused
4	KSEC	Disk sector task
5	-	unused
6	-	unused
7	ETHER	Ethernet task
8	MRT	Memory refresh task. Wakeup every 38.08 microseconds.
9	DWT	Display word task
10	CURT	Cursor task
11	DHT	Display horizontal task
12	DVT	Display vertical task. Wakeup every 16.666 milliseconds.
13	PART	Parity task. Wakeup generated by parity error.
14	KWD	Disk word task
15	-	unused

Last episode, we found that processor was running the various tasks, but never tried to access the disk. System boot is started by the emulator task, which stores a disk command in memory. The disk sector task (KSEC) periodically checks if there are any disk commands to perform. Thus, it seemed like something was going wrong in either the emulator task (setting up the disk request), or the disk sector task (performing the disk request). To figure out exactly what was happening, we needed to hook up a logic analyzer.

The logic analyzer

A logic analyzer is a piece of test equipment a bit like an oscilloscope, except instead of measuring voltages, it just measures 0's or 1's. A logic analyzer also has dozens of inputs, allowing many signals to be analyzed at once. By using a logic analyzer, we can log every micro-instruction the processor runs, track each task, and even record every memory access.

Most of the signals of interest are available on the Alto's backplane, which connects all the circuit cards. Since the backplane is wire-wrapped, it consists of pins that conveniently fit the logic analyzer probes. For each signal, you need to find the right card, and then count the pins until you find the right pin to attach the probe. This setup is very tedious, but Marc patiently connected all the probes, while Carl entered the configuration into the logic analyzer.

The backplane of the Xerox Alto, with probes from the logic analyzer attached to trace microcode execution. Note the thick power wires on the left.

Unfortunately, a few important signals (the addresses of the micro-instructions) were not available on the backplane, and we needed to attach probes to one of the PROM chips that hold the microcode. Fortunately, the Living Computer Museum in Seattle gave us an extender card; by plugging the extender card into the backplane and the circuit board into the extender card, the board was accessible and we could connect the probes.

Probes from the logic analyzer hooked up to the Xerox Alto. By plugging the control board into an extension board, probes can be attached to it.

Hours later, with all the probes connected and the configuration programmed into the logic analyzer, we were ready to power up the system and collect data.

Running the logic analyzer

"Smoke! Stop! Shut it off!"

As soon as we flipped the power switch, smoke poured out of the backplane. Had we destroyed this rare computing artifact? What had gone wrong? When something starts smoking, it's usually pretty obvious where the problem is. In our case, one of the ground wires from the logic analyzer pod had melted, turning its insulation into smoke. A bit of discussion followed: "Pin 3 is ground, right?""No, pin 9 is ground, pin 3 is 5 volts.""Oops." It turns out that when you short +5 and ground, a probe wire is no match for a 60 amp power supply. Fortunately, this wire was the only casualty of the mishap.

This logic probe wire melted when we accidentally connected +5 volts and ground with it.

With this problem fixed, we were able to get a useful trace from the logic analyzer. The trace showed that the Alto started off with the emulator/boot task. After just four instructions, execution switched to the disk word task, which was rapidly interrupted by the parity error task. When that task finished, execution went back to the disk word task, which was interrupted a few instructions later by the display vertical task. The disk word task was able to run a few more instructions before the display horizontal task ran, followed by the cursor task.

The vintage Agilent 1670G logic analyzer that we connected to the Xerox Alto. The screen shows the start of the Alto's boot sequence.

It's rather amazing how much task switching is going on in the Alto, with low-priority tasks only getting a few instructions executed before being interrupted by a higher-priority task. Looking at the trace made me realize how much overhead these tasks have. In our case, the emulator task is running the boot code, so progress towards boot requires looking at hundreds of instructions in the logic analyzer.

The key thing we noticed in the traces is the parity error task ran right near the start, indicating an error in memory. This would explain why the system doesn't boot up. We ran a few more boot cycles through the logic analyzer. The specific order of tasks varied each time, as you'd expect since they are triggered asynchronously from hardware events. But we kept seeing the parity errors.

The Alto's memory system

The Alto was built in the early days of semiconductor memory, when RAM chips were expensive and unreliable. The original Alto module used Intel's 1103 memory chips, which were the first commercially available DRAM chip, holding just 1 kilobit. To provide 128 kilobytes of memory, the Alto I used 16 boards crammed full of chips. (If you're reading this on a computer with 4 gigabytes of memory, think about how much memory capacity has improved since the 1970s.)

We have the later Alto II XM (extended memory) system, which used more advanced 16 kilobit chips to fit 512 kilobytes of storage onto 4 boards. Each memory board stored a 10 bit chunk—why 10 bits? Because memory chips were unreliable, the Alto used error correction. To store a 32-bit word pair, 6 bits of Hamming error correction were added, along with a parity bit, and one unused bit. The extra bits allow single-bit errors to be corrected and double-bit errors to be detected. The four memory boards in parallel stored 40 bits at a time—the 32 bit word pair and the extra bits for error correction.

A 128KB memory card from the Xerox Alto. The board has eighty 4116 DRAM chips, each with 16 kilobits of storage.

In addition to the 4 memory boards, the Alto has three circuit boards to control memory. The "MEAT" (Memory Extension And Terminator) is a trivial board to support four memory banks (the extended memory in the Alto XM). The "AIM" board (Address Interface Module) is a complex board that maps addresses to memory control signals, as well as handling memory-mapped peripherals such as the keyboard, mouse, and printer. Finally, the "DIM" board (Data Interface Module) generates the Hamming error correcting code signals, and performs error detection and correction.

More probing showed that the DIM board was always expressing a parity error. At this point, we're not sure if some of the memory chips are bad or if the complex circuitry on the DIM board is malfunctioning and reporting errors. As you can tell from the above description, the memory system on the Alto is complex. It may be a challenge to debug the memory and find out why we're getting errors.

A look at the microcode

In this section, I'll give a brief view of what the microcode looks like and how it appears in the logic analyzer. Microcode is generally hard to understand because it is at a very low level in the system, below the instruction set and running on the bare hardware. The Alto's microcode seems especially confusing.

Each Alto micro-instruction specifies an ALU operation and two "functions". A function can be something like "read a register" or "send an address to memory". But a function can also change meaning depending on what task is running. For instance, when the Ethernet task is running, a function might mean "do a four-way branch depending on the Ethernet state". But during the display task, the same function could mean "display these pixels on the screen". As a result, you can't figure out what an instruction does unless you know which task it is a part of.

The image below shows a small part of the logic analyzer output (as printed on Marc's vintage HP line printer). Each line corresponds to one executed micro-instruction. The "address" column shows the address of the micro-instruction in the 1K PROM storage. The task field shows which task is running. You can see the task switch midway through execution; 0 is the emulator and 13 is the parity task. Finally, the 32-bit micro-instruction is broken into fields such as RSEL (register select), ALUF (ALU function) and F1 (function 1).

The start of the logic analyzer trace from booting the Xerox Alto. The trace shows us each micro-instruction that was executed.

Note that the addresses jump around a lot; this is because the microcode isn't stored linearly in the PROM. Every micro-instruction has a "next instruction address" field in the instruction, so you can think of it as a GOTO inside every instruction. To make it worse, this field can be modified by the interface hardware, turning a GOTO into a computed GOTO. To make this work, the assembler shuffles instructions around in memory, so it's hard to figure out what code goes with a particular address. The point of this is that the logic analyzer output shows us every micro-instruction as it executes, but the output is somewhat difficult and tedious to interpret.

Fortunately we have the source code for the microcode, but understanding it is a challenge. The image below shows a small section of the boot code. I won't attempt to explain the microcode in detail, but want to give you a feel for what it is like. Labels (along the left) and jumps to labels are highlighted in blue. Things such as IR, L, and T are registers, and they get assigned values as indicated by the arrows. MAR is the memory address register (giving an address to memory) and MD is memory data, reading or writing the memory value.

A short section of the Xerox Alto's microcode. Labels and jumps are colored blue. Comments are gray.

Figuring out the control flow of the microcode requires detailed understanding of what is happening in the hardware. For example, in the last line above, ":Q0" indicates a jump to label "Q0". However the previous line says "BUS", which means the contents of the data bus are ORed into the address, turning the jump into a conditional jump to Q0, Q1, Q2, etc. depending on the bus value. And "TASK" indicates that a task switch can happen after the next instruction. So matching up the instructions in the logic analyzer output with instructions in the source code is non-trivial.

I should mention that the authors of the Alto's microcode were really amazing programers. An important feature for graphics displays is BITBLT, bit block transfer. The idea is to take an arbitrary rectangle of pixels in memory (such as a character, image, or window) and copy it onto the screen. The tricky part is that the regions may not be byte-aligned, so you may need to extract part of a byte, shift it over, and combine it with part of the destination byte. In addition, BITBLT supports multiple writing modes (copy, XOR, merge) and other features. So BITBLT is a difficult function to implement, even in a high-level language. The incredible part is that the Xerox Alto has BITBLT implemented in hundreds of lines of complex microcode! Using microcode for BITBLT made the operation considerably faster than implementing it in assembly code. (It also meant that BITBLT was used as a single machine language instruction.)

Conclusion

Hooking up the logic analyzer was time consuming, but succeeded in showing us exactly what was happening inside the Alto processor. Although interpreting the logic analyzer output and mapping it to the microcode source is difficult, we were able to follow the execution and determined that the parity task was running. It appears that memory parity errors are preventing the system from booting. Next step will be to understand the memory system in detail to determine where these errors are coming from and how to fix them.

You might wonder if it's worth spending $79 for a genuine MacBook charger when you can get a charger on eBay for under $15. You shouldn't get a cheap charger because they are often dangerous and lack safety features. In addition, they produce poor-quality power that isn't good for your laptop and may charge more slowly. I've written before about the safety problems with cheap chargers, but they say a picture is worth a thousand words, so here is why you shouldn't buy a cheap knockoff charger:

A knockoff MacBook charger emits large sparks if short-circuited. Genuine Apple chargers have safety features to protect against this.

If the connector comes in contact with something metal (a paperclip in this instance), it shorts out, creating a big spark. (Don't try this at home.) The genuine Apple charger (below) has safety features that protect against a short circuit. Shorting the connector on a genuine charger has no effect.

A genuine Apple MacBook charger has safety features that protect it from short circuits.

It's really hard to tell a genuine charger from a knockoff from the outside, since the knockoffs look just like the real thing. If you carefully read the text on this charger, you'll notice that "Apple" is missing. However, many knockoff chargers duplicate the text from a real charger, so often you can't tell if it is genuine or not just by looking. Big sparks, however, are a clear sign.

A cheap MacBook charger from eBay. Unlike most cheap chargers, this one doesn't claim to be an Apple charger, but just a "Replacement AC Adapter".

Why does a fake charger produce sparks, while a genuine one doesn't? The fake charger constantly outputs 20 volts, so if any metal shorts the connector, it produces a big spark with all its 85 watts of power. On the other hand, the genuine charger doesn't power up until it has been securely connected to the laptop for a full second. Until it is properly connected (details), it outputs a tiny amount of power (0.6 volts at 100µA) that can't produce a spark. To manage this, the genuine charger includes a powerful microcontroller (more powerful than the microprocessor in the original Macintosh by some measures). Since this processor increases the cost of the charger, knockoff chargers omit it, even though this makes the charger more dangerous.

As the photos below show, the cheap charger (left) omits as much as possible. On the other hand, the genuine Apple charger (right) is crammed full of components. Many of these components filter the power to provide higher-quality power to your laptop. The Apple charger also includes power factor correction, making the charger more efficient.

The cheap MacBook charger (left) omits most of the components found in a genuine Apple charger (right). The genuine charger includes more filtering, power factor correction (left), and a powerful microcontroller (board in upper right).

I've written in detail before about how chargers work, but I'll give a quick explanation here. The AC power comes in the red wires at the top and is converted to high-voltage DC (170V or 340V, depending on if you're in the US or Europe). A transistor (black component on left) chops the power into high-frequency pulses. The pulses create a changing magnetic field in the flyback transformer (large blue box), generating a high-current, low-voltage output. The output is converted to DC by diodes (black component, upper right), and filtered by capacitors (cylinders), to produce the 20 volt output (wires at bottom). A control IC (see photo below) controls the system to regulate the voltage. This may seem like an excessively complicated way to generate 20 volts, but switching power supplies like this are very compact, lightweight and efficient compared to simpler power supplies.

Shorting a cheap charger with a paperclip creates impressive sparks.

Looking at the underside of the cheap charger board shows it has very few components, while the genuine Apple charger's board is covered with tiny components. The two chargers are worlds apart as far as complexity, and this complexity is what provides more efficiency, more safety, and better quality power in the Apple charger.

The cheap MacBook charger (left) uses very simple circuits compared to the genuine Apple charger (right), which is crammed full of components.

Conclusion

While buying a cheap charger saves a lot of money, these chargers omit many safety features and can be hazardous to you and your computer. Don't buy a cheap knockoff charger; if you don't want to pay for a genuine Apple charger, at least buy a charger from a name-brand manufacturer.

Maybe you think these safety issues don't matter because you don't poke your charger with a paperclip. But if you have any metal objects on your desk, a random contact could yield a surprisingly large spark.

I've written a bunch of articles before about chargers, so if this article seems familiar, you're probably thinking of an earlier article, such as: Counterfeit MacBook charger teardown, Magsafe charger teardown, iPhone charger teardown or iPad charger teardown.

Follow me on Twitter to find out about my new articles.

Notes

If you're interested in the components inside the cheap charger, I have some details. The PWM control IC is a SiFirst 1560, a basic control IC for a flyback converter. The IC datasheet has the approximate schematic for the charger. The switching transistor is a 2N601 2 amp, 600 volt N-channel MOSFET. The voltage reference is an AZ431, similar to the ubiquitous TL431. The optoisolator is an 817C. The output diode is a MBRF20100C 10 amp Schottky diode pair. The electrolytic capacitors are from HKLCON.

A cheap charger emits large sparks if you short the connector with a paperclip. Safety features in a genuine charger protect against shorts.