In today's Xerox Alto restoration session we investigated why the disk drive isn't working and found a failed chip. With this chip repaired, we were able to read a block from disk, although the system still doesn't boot. (In previous episodes, we fixed the power supply, got the CRT display working, cleaned up the disk drive and hooked up a logic analyzer: days 1, 2, 3, 4 and 5.)

Our test setup for the Xerox Alto. The Alto computer itself is the metal cabinet in the center with the visible circuit boards. On the left is a vintage HP line printer, with the logic analyzer behind it. The video display for the Alto is visible on the right, behind the oscilloscope.

The Alto was a revolutionary computer, designed at Xerox PARC in 1973 to investigate personal computing. It introduced the GUI, Ethernet and laser printers to the world, among other things. Y Combinator received an Alto from computer visionary Alan Kay and I'm helping restore the system, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen (from the IBM 1401 restoration team). Marc's video of this restoration session is below.

The missing disk sector task

In the Alto, like most modern computers, each machine instruction is implemented in an even more primitive form of code called microcode. But unlike most computers, the Alto also implements some of its low-level software in microcode. Part of the Alto's design philosophy was to use software (i.e. microcode) instead of hardware where possible. For instance, a microcode sector task processes each disk sector and a word task stores each word of data as it arrives from the disk drive; most computers do this with DMA hardware.

Last week we hooked a logic analyzer to the Alto to trace the executing microcode and found the disk sector task was failing to run. Each track on the Alto's hard disk is divided into 12 sectors, with 12 slots in the hub to indicate the sectors. We verified that the disk drive was detecting these slots and sending the sector pulses every 3.33 milliseconds. The disk sector task is supposed to run for each sector and perform any disk command, but the logic analyzer showed that this task was not running.

The hard disk pack for the Xerox Alto has 12 sectors. Slots cut into the disk hub trigger a signal for each sector. Four of the sector slots are labeled in the photo.

Why was the sector task not running? The disk interface board provides a signal to indicate when the sector task should run (WAKEST), but we found it was not being activated even though the disk drive was providing sector pulses to the disk interface board. Looking at the disk interface board schematic, the sector pulse circuit is fairly simple: just a few flip flops. (You don't need to understand the schematic below. The key point is the sector pulse comes in on the left, goes through a few chips, and the wakeup signal comes out on the right.) I've heard that old TTL flip flops fail regularly, so I figured one of the flip flop chips had failed. We decided to hook up an oscilloscope and see where things were going wrong, but one problem stood in our way.

Schematic from the Xerox Alto's disk controller card. This circuit processes sector pulses from the disk drive and generates signals to wake up the microcode sector task.

The extender card

The Alto consists of 13 circuit cards plugged into a wire-wrapped backplane, making them inaccessible to probing. Fortunately, the Living Computer Museum gave us an extender card, a board that goes between an Alto board and the backplane, physically extending the Alto board out of the cabinet where it can be diagnosed. Last week, we used the extender card to probe signals on the CPU control board. But no matter how hard I tried, I couldn't get the extender board to plug into the disk interface board's slot. Marc noticed out that the board was hitting something, and we realized that the disk interface board had a notch on the right, allowing the board to clear a bar that was in the way. The extender board, like most of the Alto boards, lacked this notch. A bit more investigation revealed that memory boards had a notch, but on the left.

Why did some boards have notches? Most of the boards are powered with 5 volts. The memory boards also require -5 volts and +12 volts for the 4116 DRAM chips. The I/O boards (Ethernet and disk) have +/- 15 volts as well as 5 volts. The Alto backplane was apparently designed so you couldn't plug a board into a slot with the wrong voltages (which would have been catastrophic). Boards with unusual power requirements had a notch that allowed them to fit into slots wired with unusual voltages. The consequence was that we couldn't use the extender board with the disk interface without cutting a notch in it, which we did (see photo below).

Milling a notch into the extender board.

We were worried that by cutting a notch in the extender board and using it in a slot where it wasn't intended we might destroy the computer in a spectacular show of sparks and smoke. The concern was that the extender board doesn't simply pass the 162 lines through, but wires all the ground lines to a ground plane and wires the +5 lines together. If the disk interface card had +15 volts where the extender board expected, say, +5 volts, the extender card would run +15 volts to all the chips and destroy them. We verified the wiring five times to make sure nothing would get shorted, plugged in the extender board, and turned the Alto on with some trepidation. Fortunately our calculations were correct and nothing blew up.

Debugging the disk interface

The photo below shows the disk interface card extended out of the Alto cabinet, with some oscilloscope probes attached to the flip flop chips. (The ribbon cable attached to the board connects to the disk drive, while the ribbon cable hanging above the board allows us to probe microcode signals with the logic analyzer.) Strangely, we didn't see any signals either going into the flip flops or coming out. We checked that the sector pulses were showing up in the logic analyzer, and on the connector from the disk drive, but the flip flops were getting nothing. Eventually we turned our attention to the inverter chip (see earlier schematic). We saw the sector signal going into the inverter, but not coming out. Could this simple chip be causing the problems?

Debugging the disk interface card in the Xerox Alto.

The 7414 TTL chip contains 6 inverters, which turn a 1 input into a 0 output and vice versa. We pulled the chip out of the disk interface board and tested it with a simple LED circuit (see photo below). Five of the six inverters worked fine, but one of the inverters had entirely failed. The chip is a bit unusual since it uses a Schmitt trigger—a circuit that cleans up noisy signals (such as the sector pulses that traveled over a long cable from the disk drive)—so we couldn't get a replacement at Fry's or Radio Shack. Were we stuck for the day?

Testing the 7414 inverter chip from the Xerox Alto's disk interface card. One inverter was burnt out, preventing the disk from working.

Fortunately we could work around the faulty chip. Carl studied the schematics and discovered that one of the good inverters on the chip was unused. We rewired the chip to replace the bad inverter with the unused good inverter by using an ugly but effective "dead bug" hack. We bent out the pins from the good inverter and attached wires. We cut off the pins from the bad inverter. Finally, we stuck the wires into the socket along with the IC, so the good inverter was wired in place of the bad inverter.

We re-wired a 7414 inverter chip. An unused inverter replaced the failed inverter.

We booted the Alto and found that our chip hack actually worked and the system worked much better than before: the sector pulses got through the inverter, were processed by the flip flops, and triggered the sector task as we hoped. The sector task read the disk command from memory and sent it to the disk drive. The disk drive read the desired sector and started sending bits back. For each word, the disk word task read the word from the disk interface and stored it in memory. In summary, we were now reading data from disk!

Reading data from disk was a big milestone, since most of the system needed to be working properly for this to happen. Unfortunately the Alto didn't boot up, and we'll need to figure out where things went wrong. Is the boot block not running correctly? Is the read data corrupted? Is the disk returning an error at some point? Is our disk not a boot disk? Strangely, there was no sign of the parity errors we kept seeing last week.

The timeline diagram below shows task switching in the Alto over an interval of 700 microseconds.. You can see that the microcode is constantly switching between tasks. Today's accomplishment can be seen in the periodic execution of disk word task (KWT) at the bottom of the image; this task runs about every 9 microseconds when each word comes from the disk drive. The disk sector task (KSEC) runs at the start of the next sector (at which time the word task stops). Other tasks are the memory refresh task (MRT) and cursor task (CURT) that run periodically. (You can see where the higher-priority MRT task interrupted the KSEC task.) The lowest priority task is the Nova emulator (NOVEM), which runs program code when nothing else is happening. The numbers at the bottom show the micro-instruction count since boot; at this point we are 14.8 milliseconds into the boot process. I generated the diagram below by processing the logic analyzer output to show each running task. An interactive version is here, allowing zoom and pan with the mouse.

Timeline showing task switching on the Xerox Alto. These are microcode tasks switched by hardware, not operating system level processes or threads.

Conclusion

In today's repair session, we found a failed 7414 inverter chip that was preventing disk operation. By working around that issue, we could finally read from disk, but boot is still failing for unknown reasons. Nonetheless, today's session got us much closer to a working system. We'll need to dig through the logic analyzer output to figure out where the boot process is breaking down.

In the previous Xerox Alto restoration session, we got the disk working, but the system didn't boot. After much investigation, I discovered the explanation for the boot failure: the disk has been overwritten with random data! This article describes my journey through the Alto microcode to determine what happened.

Inserting a disk into the Xerox Alto's disk drive. The Alto's video display is visible at the back.

For background, the Alto was a revolutionary computer designed at Xerox PARC in 1973 to investigate personal computing. It introduced the GUI, Ethernet and laser printers to the world, among other things. Y Combinator received an Alto from computer visionary Alan Kay and I'm helping restore it, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen (from the IBM 1401 restoration team). For posts on previous restoration days see 1, 2, 3, 4, 5 and 6.

Debugging the boot failure

Last session, after fixing a broken 7414 TTL chip on the disk interface board, we could fetch a block from disk but the Alto failed to boot. We used a logic analyzer to trace the microcode instructions and the ALU bus contents. Josh Dersch from the Living Computer Museum studied the traces and found that the boot program was executing a few instructions (jump, add, load), and then seemed to go off the rails. But it turns out things were more messed up than that.

I made a microcode trace browser to help figure out what was going on. With this program, I can step through an execution trace one micro-instruction at a time and see the corresponding source code line. (Click the image below for the live trace browser.) First, I examined the KWD (disk word task), which executes for each word from disk, and copies that word to memory. I verified that the disk read was working as expected. The second task of interest is the NOVEM (Nova emulator task), which runs a program. In our case, it runs the boot program as soon as it is loaded from disk. By examining this task, we can figure out what is going wrong with the boot process.

Xerox Alto microcode trace viewer. With the viewer, you can step through the execution trace collected by the logic analyzer and see each source code line as it is executed. The buttons on the right indicate which microcode task is running at each step.

By studying the disk read microcode (KWD) closely, I was able to extract each word in the disk sector from the logic analyzer trace. This was very difficult for many reasons. For example, we logged the ALU bus which doesn't have the words from disk. I had to figure out the disk contents by reversing the checksum computation, which was on the ALU bus. Another problem was the Alto stores sectors on disk backwards. But eventually I extracted the contents of the boot sector, as read into the Alto:

16a5 2d4a 5a94 b528 14db 29b6 536c a6d8
333b 6676 ccec e753 b02d 1ed1 3da2 7b44
...

I hand-disassembled these words into Data General Nova assembler code and discovered a few things. First, the first few instructions matched Josh's interpretation, so the CPU and the emulator task seemed to be working correctly. Second, the instructions didn't make any sense as code, and some words weren't even instructions, which explained why the boot rapidly fell apart. Third, and most puzzling, the instructions were nothing like what the Alto boot code was supposed to be.

Backplane of the Xerox Alto wired with logic analyzer probes. These probes monitor the executing micro-instructions and the contents of the ALU bus.

The boot block seemed to contain random junk. The problem wasn't flaky hardware generating bad data, because the block checksum validated correctly. This wasn't the drive returning the wrong sector, because the sector header was correct. The sector didn't contain instructions, it wasn't ASCII, and it didn't look like a sensible file format. As I studied the sector contents more, I wondered it the data was literally random. I made a histogram of how many times each byte value occurred, and it was pretty much uniform so (In comparison, archived Alto disk sectors showed very non-uniform distributions.) But why would the boot block have been overwritten with (pseudo-) random data?

Josh mentioned DiEx (Diablo disk exerciser), a utility program to diagnose problems with the Alto's Diablo disk drive, and suggested that it could have wiped the disk. I found the DiEx source code in the Computer History Museum's Alto archive, and sure enough, it has a feature to write random data to the disk (and then verify it).

Screenshot of the Diablo Disk Exerciser (DiEx) running on a Xerox Alto simulator. Note the early mouse-based GUI; clicking on an entry changes the value. Image courtesy of Nathan Lineback.

I could believe someone had inconveniently wiped our disk with the DiEx utility, but I still had nagging doubts that maybe we were seeing a hardware issue. Could I prove that DiEx was responsible? All I had to do was show that the disk data wasn't arbitrary, but came from DiEx.

Generating random numbers on the Alto

I found the source code for RANDOM.ASM, the Alto's random number code, in the Computer History Museum's Alto archive. This algorithm generates 16-bit random numbers with the recurrence formula: "x[n] = (x[n-33] + x[n-13]) mod 2^16". (Note that are very bad random numbers cryptographically since once you have 33 numbers in the sequence you can generate them all.) I wanted to see if the data we read from disk was generated from this function, so I coded up the algorithm. This was somewhat difficult as the original was written in Nova assembler code. The results didn't match the disk data, no matter what I tried. Finally, I realized that I could just use a brute force solution and ignore the details of the algorithm. I picked random pairs of values in the data and checked if their sum appeared in the data. If the data came from any sort of recurrence, I would get a bunch of matches, but I didn't. I concluded that the disk data wasn't generated from this random number algorithm.

However, on closer examination I noticed that the RANDOM.ASM function signature didn't match the DiEx code, so it probably wasn't the right function. After more searching I found TriexML.asm, another Alto random number function. To generate a random 16-bit word, this algorithm simply shifts the previous value one bit to the left. If there is an overflow, the result is xor'd with the number 077213. (It would be hard to come up with a cryptographically worse random number generator—from one number you can generate the whole sequence—but the algorithm is very fast.)

To check the disk contents against this algorithm, I skipped the careful implementation and went straight to brute force. To see if any shift-and-xor algorithm would explain our data, I shifted each word from the disk sector and xor'd it with the next one. In each case, I got either 0 or octal 077213, matching the algorithm. Starting the algorithm with 012345 (the seed value in the code) eventually generates the exact sector of data we read, proving this algorithm generated the random data we saw on the disk.

A few of the old Xerox Alto disks in Xerox PARC's collection. Hopefully they haven't been overwritten with junk.

Thus, someone had clobbered our disk (probably decades ago) while testing the drive with DiEx. Since we couldn't boot off this disk, we'd need a new boot disk. Xerox PARC has dozens of old Alto disks lying around and they offered some of them to us. But the Living Computer Museum offered to send us a working Alto disk, rather than risk damage to the potentially-interesting contents of an old PARC disk, so we'll use the LCM disk instead.

Conclusion

Last repair session, we fixed a failed 7414 inverter chip on the disk interface board. With that fixed, we could read the disk but boot still failed. After careful investigation of the microcode and traces, I discovered that our disk had been overwritten with random data making it impossible to boot from it. In one way this is a good result, since it means our boot wasn't failing because of a hardware problem.

When we get a new Alto disk, we'll try booting again. I'm moderately optimistic that the system will come up successfully, but there could be more hardware problems waiting for us. For updates on the restoration, follow kenshirriff on Twitter.

Thanks to Josh Dersch and the Living Computer Museum for their debugging help. Thanks to Tim Curley and Xerox PARC for supplying additional disks.

This article describes how to write C programs for the BeagleBone's microcontrollers. The BeagleBone Black is an inexpensive, credit-card sized computer that has two built-in microcontrollers called PRUs. By using the PRUs, you can implement real-time functionality that isn't possible in Linux. The PRU microcontrollers can be programmed in C using an IDE, which is much easier than low-level assembler programming. I recently wrote an article about the PRU microcontrollers, explaining how to program them in assembler and describing how they interact with the main ARM processor; so read it for more background.

A "blink" program in C

To motivate the discussion, I'll use a simple program that uses the PRU to flash an LED ten times. This example is based on PRU GPIO example but using C instead of assembly code.

Blinking an LED using the BeagleBone's PRU microcontroller.

The C code, below, flashes the LED ten times. The LED is controlled by setting or clearing a bit in register R30, which controls the GPIO pins. The code demonstrates two ways of performing delays. The first delay uses a for loop, leaving the LED on for 400 ms. The second delay uses the special compiler function __delay_cycles(), which delays for the specified number of cycles. Since the PRUs run at 200 MHz, each cycle is 5 nanoseconds, yielding an off time of 300 ms. At the end, the code sends an interrupt to the host code via register R31 to let it know the PRU has finished.[1]

How to compile C programs with Code Composer Studio

Although you can compile C programs directly on the BeagleBone,[2] it's more convenient to use an IDE. Texas Instruments provides Code Composer Studio (CCS), an integrated development environment on Windows and Linux that you can use to compile C programs for the PRU.[3] To install CCS, use the following steps:

Download CCS here. (You'll need to create a TI account and then fill out an export approval form before downloading, which seems so 1990s but isn't too difficult.)
Follow the instructions here to make sure you have the necessary dependencies or CCS installation will mysteriously fail.
In the installer, select Sitara 32-bit ARM Processors: GCC ARM Compiler and TI ARM Compiler.
In the add-ons dialog, selects PRU Compiler.
After installation, run CCS, select App Center, and install the additional add-ons (i.e. the PRU compiler).

To create a C program in CCS, use the following steps. The image highlights the fields to update in the dialog.

Start CCS.
Click New Project.
Change target to AM3358.
Change tab to PRU.
Enter a project name, e.g. "test".
Open "Project templates and examples" and select "Basic PRU Project".
Click Finish.
Enter the code.

How to set up Code Composer Studio to compile a PRU program for the BeagleBone.

To set up the BeagleBone for the example:

Download the device tree file: /lib/firmware/PRU-GPIO-EXAMPLE-00A0.dts.

Compile and install the device tree file to enable the PRU:

# dtc -O dtb -I dts -o /lib/firmware/PRU-GPIO-EXAMPLE-00A0.dtbo -b 0 -@ PRU-GPIO-EXAMPLE-00A0.dts
# echo PRU-GPIO-EXAMPLE > /sys/devices/bone_capemgr.?/slots
# cat /sys/devices/bone_capemgr.?/slots

Download the linker command file bin.cmd.
Download the host file that loads and runs the PRU code (loader.c) and compile it:
```
# gcc -o loader loader.c -lprussdrv
```

To compile and run the C program:

In CCS, select Project -> Build All (control-B) to compile the program.[4]
Copy the binary (test/Debug/test.out) to BeagleBone (e.g. with scp)

On the BeagleBone, link and run the program:[5]

# hexpru bin.cmd test.out
# ./loader text.bin data.bin

If everything went correctly, the LED should flash. (See my previous article for debugging help.)

In this example, loader simply loads and runs the executable on the PRU.[6] In a more advanced application, it would communicate with the PRU. For example, it could get commands from a web page, send them to the PRU, get results, and display them on the web. The point is that you can use the Linux-side code to do complex network or computational tasks, in combination with the PRU doing low-level, real-time hardware operations. It's kind of like having an Arduino together with a "real computer", in a tiny package.

The BeagleBone Black is a tiny computer that fits inside an Altoids mint tin. It is powered by the TI Sitara™ AM3358 processor, the large square chip in the center.

Documentation

The PRUs are very complex and don't have nice APIs, so you'll probably end up reading a lot of documentation to use them. The most important document that describes the Sitara chip is the 5041-page Technical Reference Manual (TRM for short). This article references the TRM where appropriate, if you want more information. Information on the PRU is inconveniently split between the TRM and the AM335x PRU-ICSS Reference Guide. For specifics on the AM3358 chip used in the BeagleBone, see the 253 page datasheet. Texas Instruments' has the PRU wiki with more information. More information on using CCS is here.

If you're looking to use the BeagleBone and/or PRU I highly recommend the detailed and informative book Exploring BeagleBone. Helpful web pages on the PRU include BeagleBone Black PRU: Hello World, Working with the PRU and BeagleBone PRU GPIO example. Some PRU example code is in the TI PRU training course.

The BeagleBone Black, with the AM3358 processor in the center. The 512MB DRAM chip is below, with the HDMI framer chip to the right of it. The 4GB flash chip is in the upper right.

Using a timer and interrupts

For a more complex example, I'll show how to use the PRU with a timer and interrupts.[7] The basic idea is the timer will trigger an interrupt at a set frequency. The PRU code in this example will toggle the GPIO pin when an interrupt occurs, generating a sequence of 5 pulses.[8]

It is important to understand that PRU interrupts are not "real" interrupts that interrupt execution, but are signaled through polling.[9] A PRU interrupt sets bit 30 or bit 31 in register R31.[10] The PRU code can busy-wait on this bit to determine if an interrupt has happened. This is fast and very low latency, compared to context-switching interrupt, but it puts more demands on the program structure.

The first step is to add the plumbing for the timer's interrupt, so the PRU will receive the interrupt. The PRUs can handle 64 different interrupt types from various subcomponents of the system. The timer interrupt is assigned system event number 15 and has the cryptic name pr1_ecap_intr_req. (See TRM table 4-22.) Interrupts are configured in the host side code (loader.c) using the PRUSSDRV library API call prussdrv_pruintc_init. To support the timer interrupt, The diagram below shows the complex PRU interrupt configuration on the BeagleBone (details). The new interrupt path, highlighted in red, connects the timer interrupt (15) to CHANNEL0 and in turn to register R31, the register for polling.

Interrupt handling on the BeagleBone for the PRU microcontrollers. The timer interrupt (15) is shown in red. The default interrupt configuration is extended so the timer interrupt will trigger bit 30 of R31.

To add interrupt 15 to the configuration as shown above, the configuration struct in loader.c must be modified. The following structure is passed to prussdrv_pruintc_init to set up the interrupt handling. The changes are highlighted in red. Without this change, timer interrupts will be ignored and the example code will not work.

#define PRUSS_INTC_CUSTOM {   \
 { PRU0_PRU1_INTERRUPT, PRU1_PRU0_INTERRUPT, PRU0_ARM_INTERRUPT, PRU1_ARM_INTERRUPT, \
   ARM_PRU0_INTERRUPT, ARM_PRU1_INTERRUPT,  15, (char)-1  },  \
 { {PRU0_PRU1_INTERRUPT,CHANNEL1}, {PRU1_PRU0_INTERRUPT, CHANNEL0}, {PRU0_ARM_INTERRUPT,CHANNEL2}, {PRU1_ARM_INTERRUPT, CHANNEL3}, \
   {ARM_PRU0_INTERRUPT, CHANNEL0}, {ARM_PRU1_INTERRUPT, CHANNEL1}, {15, CHANNEL0}, {-1,-1}},  \
 {  {CHANNEL0,PRU0}, {CHANNEL1, PRU1}, {CHANNEL2, PRU_EVTOUT0}, {CHANNEL3, PRU_EVTOUT1}, {-1,-1} },  \
 (PRU0_HOSTEN_MASK | PRU1_HOSTEN_MASK | PRU_EVTOUT0_HOSTEN_MASK | PRU_EVTOUT1_HOSTEN_MASK) \
}

The second step to using the timer is to initialize the timer to create interrupts at the desired frequency, as shown in the following code. Using PRU features is fairly difficult since you are controlling them through low-level registers, not a convenient API, so you'll probably need to study TRM section 15.3 to fully understand this. The basic idea is the timer counts up by 1 every cycle (PWM mode is enabled in ECCTL2). When the counter reaches the value in the APRD (period) register, it resets and triggers a "compare equal" interrupt (as controlled by ECEINT). Thus, interrupts will be generated with the period specified by DELAY_NS.

inline void init_pwm() {
  *PRU_INTC_GER = 1; // Enable global interrupts
  *ECAP_APRD = DELAY_NS / 5 - 1; // Set the period in cycles of 5 ns
  *ECAP_ECCTL2 = (1<<9) /* APWM */ | (1<<4) /* counting */;
  *ECAP_TSCTR = 0; // Clear counter
  *ECAP_ECEINT = 0x80; // Enable compare equal interrupt
  *ECAP_ECCLR = 0xff; // Clear interrupt flags
}

The final step is to wait for the interrupt to happen with a busy-wait. The while loop polls register R31 until the timer interrupt fires and sets bit 30. Then the interrupt is cleared in the PRU interrupt subsystem and in the timer subsystem.

inline void wait_for_pwm_timer() {
  while (!__R31 && (1 << 30)) {} // Wait for timer compare interrupt
  *PRU_INTC_SICR = 15; // Clear interrupt
  *ECAP_ECCLR = 0xff; // Clear interrupt flags
}

The oscilloscope trace below shows the result of the timer example program: five precision pulses with a width of 100 nanoseconds on and 100 nanoseconds off. The important advantage of using the PRU microcontroller rather than the regular ARM processor is the output is stable and free of jitter. You don't need to worry about nondeterminism such as context switches or cache misses. If your application won't be affected by milliseconds of random delay, the regular processor is much easier to program, but if you require precision timing, you should use the PRU.

Using the BeagleBone Black's PRU microcontroller to generate pulses with a width of 100 nanoseconds.

The full source code for the timer example is here.[11] To run the timer example, you'll also need to use the updated loader.c that enables interrupt 15 (or else nothing will happen).

Conclusion

The PRU microcontrollers give the BeagleBone real-time, deterministic processing, but with a substantial learning curve. Programming the PRUs in C using the IDE is much easier than programming in assembler. (And you can embed assembler code in C if necessary.)

Combining the BeagleBone's full Linux environment with the PRU microcontrollers yields a very powerful system since the microcontrollers provide low-level real-time control, while the main processor gives you network connectivity, web serving, and all the other power of a "real" computer. (My current project using the PRU is a 3 megabit/second Ethernet emulator/gateway to connect to a Xerox Alto.)

Notes and references

[1] Delivering the interrupt to the host code is more complex than you'd expect. I wrote a longer description here, explaining details such as how event 3 on the PRU turns into event 0 on the host.

[2] To compile a C program on the BeagleBone, use the clpru command. See this article for details on clpru.

[3] Code Composer Studio isn't available for Mac, but CCS works well if you run Linux on your Mac using Parallels. I also tried running Linux in VirtualBox, but ran into too many problems.

[4] If you want to see the assembly code generated by the C compiler, use the following steps:

Go to Project -> Properties
Select the configuration you're building (Debug or Release)
Check Advanced Options -> Assembler Options: Keep the generated assembly language file. This adds the --keep_asm flag to the compile.

The resulting assembly file will be in Debug/main.asm. Although the file is hundreds of lines long, the actual generated code is much shorter, starting a few dozen lines into the file. Comments indicate which source lines correspond to the assembly lines.

[5] The hexpru utility converts the ELF-format file generated by the compiler into a raw image file that can be loaded onto the PRU. The bin.cmd file holds the command-line options for hexpru. See the PRU Assembly Language Tools manual for details.

You can configure Code Composer Studio to run hexpru automatically as part of compilation, by doing a bit of configuration. Follow the steps at here to enable and configure PRU Hex Utility.

[6] The loader.c code uses the PRU Linux Application Loader API (PRUSSDRV) to interact with the PRU. I'm told that the cool new framework is remoteproc, but I'll stick with PRUSSDRV for now. (There seems to be a great deal of churn in the BeagleBone world, with huge API changes in every kernel.)

[7] For a timer, I'll use the PRU's ECAP module, which can be configured for PWM and then used as a 32-bit timer. (Yes, this is confusing; see TRM section 15.3 for details.)

[8] This code is intended to demonstrate the timer, not show the best way to generate pulses. If you just want to generate pulses, use the PWM or even a simple delay loop.

[9] You might wonder why you'd use the PRU polling interrupts rather than just polling a device register directly. The reason is you can test the R31 register in one cycle, but reading a device register takes about 3 or 4 cycles (read latency details).

[10] The library uses the convention that PRU0 polls on bit 30 and PRU1 polls on bit 31, but this is arbitrary. You could use both bits to signal one PRU, for instance.

[11] One complexity in the timer source code is the need to define all the register addresses. To figure out a register address, find the address of the register block in the PRU Local Data Memory Map (TRM 4.3.1.2). Then add the offset of the register (TRM 4.5). Note that you can also access these registers from the Linux host side, but the addresses are different. (The PRU is mapped into the host's address space starting at 0x4a300000, TRM table 2.4.)

In this Alto restoration session we controlled the Alto's disk drive with an FPGA disk emulator and attempted booting the Alto with a BeagleBone-based Ethernet emulator. The GIF below shows the drive performing seeks as commanded by the emulator. (With the cover off the Diablo drive, you can see the disk head floating above the spinning disk surface and moving back and forth for seeks.) However, both emulators encountered some bugs, which we will need to fix.

Looking inside the Diablo disk drive, you can see the head moving over the disk's surface as disk seeks take place. The green dial on the right rotates to indicate the current track.

The Alto was a revolutionary computer designed at Xerox PARC in 1973 to investigate personal computing. It introduced the GUI, Ethernet and laser printers to the world, among other things. Y Combinator received an Alto from computer visionary Alan Kay and I'm helping restore the system, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen. For posts on previous restoration days see 1, 2, 3, 4, 5, 6 and 6 update.) Marc's YouTube video on Day 7 is below:

In our previous session, we discovered a faulty 7414 inverter chip on the disk interface card was preventing the disk from working: one of the six inverters on the chip had failed, preventing the disk sector task from running. Since we didn't have a 7414 lying around the house, we used a "dead bug" hack (below) to replace the bad inverter on the chip with an unused one, allowing us to access the disk. This session, we replaced the bad 7414 with a new one since we didn't want our hack to be permanent.

We re-wired a 7414 inverter chip. An unused inverter replaced the failed inverter.

Last week, I discovered that our boot disk had been overwritten with random data decades ago to test the drive (details). This made it impossible to boot off our disk, blocking our progress. Tim Curley from Xerox PARC offered me some disks from PARC's collection of dozens of old Alto disks (below). Some people were concerned, though, that the disks could get damaged in a boot attempt, losing their historical data. To avoid damage, we decided not to boot these disks until we're sure the Alto is working properly and we have them archived. Instead, Josh Dersch at the Living Computer Museum in Seattle is sending us a fresh boot disk with no historical significance. Unfortunately we didn't get the disk in time for today's session, but we'll try it out next session.

Some old Xerox Alto hard disks at PARC. I borrowed a couple of them and we'll try reading them later.

The disk emulator

Our test setup to exercise the Diablo disk drive (center) with the FPGA board (front). The oscilloscope shows the sector pulses (top, blue), clock (middle, green), and data (bottom, yellow). Four sectors are visible on the bottom trace. The Xerox Alto is behind the oscilloscope. On the right are the power supply and the laptop controlling the FPGA board.

Carl built a Diablo disk emulator / exerciser from a FPGA board. The idea is we can hook this up to the Diablo drive to read and archive disks. Then we can connect the Emulator to the Alto and simulate multiple disk packs without physically handling disks. Building a disk emulator is complex because the drive itself implements very little functionality. It provides the raw bit stream as it is read off the disk, and the emulator needs to process this into bytes. In the photo above, the bottom oscilloscope trace shows several sectors as they are read from disk.

If you're not familiar with a FPGA (field-programmable gate array), it is a chip that can be programmed to generate custom hardware. The FPGA chip contain numerous logic blocks along with a switch matrix that allows them to be interconnected as desired. You describe the hardware configuration (gates, latches, and so forth) using a hardware description language such as Verilog and the chip is programmed to implement the desired circuitry.

The FPGA board for the emulator (below) is a Digilent Nexys 2 with a Xilinx Spartan-3E FPGA chip in the center of the board. This chip contains over ten thousand logic cells, allowing it to implement complex circuits. The FPGA board is connected to a prototyping board (right) with chips that shift the voltage levels to TTL as required by the Diablo drive. Carl's FPGA code generates the numerous signals required by the Diablo drive; in the photo below you can see the thick black cable going to the drive.

A Digilent FPGA board configured to control a Diablo disk drive.

We hooked up the FPGA board to the Diablo drive and tested it out. It communicated with the drive just fine and could read from different tracks. Unfortunately, the read data was zeros, which was surprising since the Alto successfully read from the disk last week. After some investigation, Carl found the problem was in the FPGA code that stored the data in RAM, not his code. (See his blog for details.) You'd think writing to RAM would be the easy part, but apparently not. The disk logic appears to work fine so hopefully next session we will be able to read and archive disks.

The Ethernet emulator

The Xerox Alto was the first system with Ethernet, introducing a lot of networking innovations. Unfortunately, it uses 3 Mb/second Ethernet over coaxial cable, which is incompatible with anything modern. I built an Ethernet emulator using a BeagleBone Black, allowing me to send Ethernet boot packets to the Alto. The photo below shows the BeagleBone, along with a chip (74AHCT125) to convert the BeagleBone's 3.3V signals to 5V TTL signals. (The Ethernet signals to and from the Alto are 5V TTL. These signals normally go to a transceiver, which converts these signals to signals over the network cable.) I'm using the BeagleBone's PRU microcontrollers to implement this code; I wrote a blog post with more about the PRUs.

A BeagleBone Black configured to emulate the 3Mb/s Ethernet on the Xerox Alto.

The emulator operates by converting a data block into the low-level signal required by Ethernet. A 0 bit is high-then-low and a 1 bit is low-then-high, with 170 nanosecond pulses. (Note that each data bit includes a transition (high-to-low or vice versa), which allows the receiver to detect bits and extract a clock signal.) My emulator almost worked; by using the logic analyzer, I saw the Ethernet microcode was running and the Alto was receiving data from my board. Unfortunately, there was about one bit error per word, making it unusable. The problem is probably interference due to the sketchy wiring I used; I'll try shielded wire next session.

Conclusion

This week we tried a Diablo disk emulator and an Ethernet emulator. They both partially worked, but still have some bugs. Next week we'll try booting the system with a new disk. I'm moderately optimistic that the system will come up successfully, but there could be more hardware problems waiting for us. For updates on the restoration, follow kenshirriff on Twitter.

Thanks to Josh Dersch and the Living Computer Museum for their debugging help and sending out a boot disk. Thanks to Tim Curley and Xerox PARC for supplying additional disks.

The discussion of this post on Hacker News is here.

My Sonicare electric toothbrush recently quit working, so I took it apart and examined the interesting circuitry inside. There's much more complexity than I expected inside a toothbrush, especially in the mechanism that drives the brush head at 31,000 strokes per minute. Internally, the brush appears to be designed for quality rather than ease of manufacturing. Unfortunately, moisture can get in, causing reliability problems.

The toothbrush is a Sonicare Flexcare Platinum with more features than you'd expect in a toothbrush: three brushing modes, three intensities and a couple timers, along with 10 LEDs to indicate its status. A pressure sensor in the toothbrush changes the vibration if you apply too much pressure while brushing. The toothbrush uses wireless inductive charging so it charges when set on the base. (This toothbrush may seem overly complicated, but it's nothing compared to the new model that includes Bluetooth.)[1]

Disassembling the Sonicare toothbrush. At the left is the induction coil used for charging.

The first step was to remove the toothbrush base, allowing the toothbrush mechanism to be removed from the case. The toothbrush head mounts on the right; it needed to be removed to disassemble the toothbrush. At the left is the charging coil used to wirelessly charge the toothbrush.

The photos below show the top and bottom of the toothbrush internals. I expected to find a simple, low-cost mechanism, so I was surprised at how much complexity there was inside. The vibration mechanism (right) is built from multiple metal and plastic parts screwed together, requiring more expensive assembly than I expected. The circuit board is literally gold-plated and has a lot of components, even if it doesn't quite reach Apple's level of complexity. Overall, the toothbrush's internal design is high quality (except, of course, for the fact that it quit working, as did an earlier one).

Inside the Sonicare toothbrush, top and bottom composite view. The charging coil is at the left. The battery (red) is in the lower left. The coil that vibrates the brush is in the center and the brushing mechanism is at the right.

The brush contains several key components, as can be seen above. In the center is the large red coil that causes the toothbrush to vibrate. On the right is the vibration mechanism, which has a powerful magnet that is moved by the coil. The brush head snaps on at the right. The battery (red, left) takes up about a third of the toothbrush. The long, thin circuit board (green) has the circuitry to operate the toothbrush. A white spacer sits on top of the circuit board, with holes for the LEDs and buttons.

The photo below shows the brush mechanism partially disassembled and separated from the electronics. The toothbrush still powers on in this state, as you can see from the illuminated LEDs. Note the flexible brown ribbon cable between the center of the brush mechanism and the electronics board. This connects the pressure sensor on the brush mechanism to the electronics board.

The brush mechanism (left) separated from the electronics (right). Note the illuminated LEDs. Alto note the flexible brown ribbon connecting the pressure sensor to the electronics board.

The diagram below shows the main components on the circuit board. The buttons are the most visible feature. The gold circles at the left are used to program the microcontroller. The MOSFET transistor switch the coil on and off to produce vibrations. Ten LEDs are scattered across the board. At the right, the diode bridge is part of the charging circuit.

The circuit board for the Sonicare toothbrush is crammed with tiny parts. The gold circles on the left are used to program the microcontroller chip. The tiny gold circles scattered across the board are test points for testing the board during manufacturing.

The circuit board is covered with tiny gold circles. These are test points, allowing test connections to most parts of the board. For instance, each LED and each button has a test point that can be used to test the component. During testing, spring loaded pogo pins on the test circuit make contact with these test points on the toothbrush board. The number of test points (about 56) looks like overkill to me.

The diagram below shows the components on the back of the circuit board. The toothbrush is controlled by a mid-range 8-bit microcontroller, the PIC16F1516.[2] This chip contains the code for all the toothbrush functions: reading the buttons, lighting the LEDs, controlling the coil, and managing charging. There are too many LEDs (10) for the chip to control individually, so eight of the LEDs are controlled by a separate LED driver chip.[3]

The back of the Sonicare circuit board contains the PIC16F1516 microcontroller chip. The sensor is probably a Hall-effect magnetic field sensor.

The microcontroller is an off-the-shelf part, not a custom chip, so it needs to be programmed with the right software. This is done during manufacturing through the large gold circles and triangle near the end of the toothbrush.[4] The resonator provides the clock signal for the microcontroller's timing.[5]

The driver mechanism and the H bridge circuit

The toothbrush head is driven by an electromagnetic coil that moves a magnet. The coil has two halves, wired in opposite directions, so the sides will have opposite magnetic fields. The coil is pulsed one way to rotate the magnet one direction, and then pulsed the opposite way to rotate the magnet the other direction. The result is the high-speed brushing vibration.

The diagram below shows the driver mechanism disassembled. The coil constantly switches polarity so the north pole will switch from the top to bottom (the yellow and blue poles of the coil). The magnet has poles on the front and back edges (perpendicular to the coils), so it will attempt to rotate back and forth to line up with the coil, along the long axis of the toothbrush. The mechanism limits the rotation to a few degrees, resulting in a rotational vibration back and forth rather than spinning like a motor. This rotational vibration is transmitted to the toothbrush head by the torsion bar causing the head and bristles to vibrate. More details on the driver mechanism are here.

Sonicare toothbrush driver mechanism. As the polarity of the coil switches, the magnet rotates back and forth slightly. The torsion bar transmits the rotation to the shaft, which causes the toothbrush head to vibrate around its axis.

The figure below shows the voltage across the coil. Every 2 milliseconds, there is a 4 volt pulse across the coil, followed by a negative 4 volt pulse. The pulses generate the reversing magnetic field that drives the magnet and causes the toothbrush to vibrate. If you count the positive and negative pulses as separate brush strokes, you get the advertised 31,000 brush strokes per minute. (Although counting an up-down cycle as a single stroke rather than two would make more sense to me.)

Voltage across the actuator coil in a Sonicare toothbrush. An H bridge drives the coil with +/- 4 volt pulse every 2 milliseconds.

You might think that driving a coil in two directions would use two switches, but instead it uses four, in a common circuit called an H bridge, as shown below. If switches 1 and 4 are closed, current flows in the forward direction. If switches 2 and 3 are closed, current flows in the reverse direction. In the toothbrush, transistors are used for the switches, and are turned on and off by the microcontroller.[6] An H bridge is often used to control motors that need to go forwards and reverse, for example in a hoverboard.

An H bridge circuit is used to drive the vibration coil. This allows the coil to be off or energized in either direction. Four switches (MOSFET transistors) are used in the H bridge.

Pressure sensor

One of the features of this toothbrush is a pressure sensor. If you press too hard while brushing, the vibrations start pulsing and the LEDs flash. The sensor itself is a tiny mystery chip (below) mounted on the drive assembly, and connected to the electronics board with a thin flexible cable. The cable is labeled with Vdd (1), Data (2), Clock (3), and Ground (4), so the sensor is probably sending a stream of bits using an I2C protocol. My suspicion is the sensor is a Hall effect magnetic field sensor that detects a change in the magnetic field if pressure is preventing the magnet from vibrating. The chip doesn't seem to be in a position to measure actual pressure, which is why I suspect it's measuring the magnetic field instead.

The pressure sensor on the toothbrush is connected to the electronics via a flexible cable. The sensor is probably a Hall effect magnetic sensor using the I2C protocol.

Charging

To charge the toothbrush, it is set on a stand and charges inductively without physically being plugged in. A coil in the stand is magnetically coupled to a coil in the toothbrush, transmitting the power wirelessly. You can see the coil at the bottom of the toothbrush. When set on the stand, the coil picks up about 12 volts, which is used to charge the battery. The power is transmitted at high frequency (80kHz) for efficiency.

The coil is connected to a diode bridge that converts the power to DC. It then goes through a transistor circuit that regulates the charging, as directed by the microcontroller. The battery in the toothbrush is a Sanyo Li-ion rechargeable battery, which is said to be 3.7V but I measured 4.0V.[7]

Voltage across the charging coil in a Sonicare toothbrush oscillates about about 80kHz.

The toothbrush is designed to conserve battery by using very little power when not in use. The microcontroller has a low power standby mode when it is waiting for a button press. When the toothbrush is activated, a transistor energizes the LEDs and the LED driver chip, while another circuit powers up the pressure sensor. This prevents these components from draining the battery while the toothbrush is not in use.

Conclusion

Overall, I was surprised by how much electronics was inside the toothbrush, as well as the complexity of the drive mechanism. It was designed with quality in mind, not low-cost production. Unfortunately, the brush has reliability issues—this was the second one to fail on me. The problem appears to be water seeping in around the shaft, eventually damaging the internals.

Some other Sonicare teardowns are here, here and here. I would have expected different models to be based on similar electronics that just changed the LEDs, buttons and software. Surprisingly the different teardowns show a variety of microcontrollers, circuitry, and drive coils. Some models even move the magnets from the toothbrush unit to the brush head.

Unfortunately after disassembling my toothbrush I was unable to fix its problem. But at least I got an interesting teardown out of it!

To find out about my latest teardowns, follow kenshirriff on Twitter.

Notes and references

[1] It's ironic for a toothbrush to include Bluetooth technology because Bluetooth is named after Harald Bluetooth, a tenth century Danish king who was called Bluetooth because he had a bad, discolored tooth. The Bluetooth logo itself is formed by combining two runes from the king's name.

[2] The PIC microcontroller runs at 16 megahertz. It has 8K of flash memory for the program, as well as 512 bytes of RAM (the RAM on microcontrollers is usually very small) and 128 bytes of flash memory for data. It includes analog-to-digital conversion, which I think is used to monitor the charging voltage. The toothbrush's 8-bit microcontroller is less powerful than the 16-bit microcontroller inside a Macbook power supply.

[3] The LEDs are controlled by a 75HC595A serial to 8-bit output chip. The benefit of this chip is that the microcontroller would use 8 pins to control 8 LEDs, while the microcontroller only uses 3 pins to communicate with the serial chip, freeing up 5 pins for other tasks.

[4] Programming of the chip is done using the ISCP protocol. This uses the programming contacts labeled Vdd, Vpp, Tx, and Ground, as well as the triangle contact, which provides the ISCP data. For some reason, the Tx and Rx circles are also connected to the chips's UART serial pins, allowing serial communication with the microcontroller. I'm not sure why one would want to communicate with the chip outside programming. Maybe there's serial communication with the microcontroller as part of testing. Or maybe the NSA can download information on your brushing habits :-)

[5] The resonator is a 3-pin unit with built-in load capacitors, similar to a quartz crystal oscillator. I suspect it's a CERALOCK®, or something similar.

[6] The H bridge uses a 6866S 20V dual N-channel MOSFET on the low side and a 6963SD 20V dual P-channel MOSFET on the high side.

[7] The charger circuit is puzzlingly simple. The voltage from the diode bridge goes through a microcontroller-controlled transistor (Q5) and then to the battery (through a tiny fuse), without the filtering, voltage regulator or battery voltage monitoring I'd expect. The microcontroller is connected to the AC side of the diode bridge, and presumably is monitoring the input voltage waveform.

We've been restoring a Xerox Alto from the 1970s for several months, and we finally got it to boot and run some programs! There's still some hardware debugging ahead of us, since the Alto drops into the debugger for many programs, but we're quite happy to see the system running. In this post, I describe our latest debugging session and show some programs running on the Alto.

The Xerox Alto, successfully booted and listing the files on the disk. The diagonal strips are an artifact of photographing the CRT and do not appear on the display.

For background, the Alto was a revolutionary computer designed at Xerox PARC in 1973 to investigate personal computing. It introduced the GUI, Ethernet and laser printers to the world, among other things. Y Combinator received an Alto from computer visionary Alan Kay and I'm helping restore the system, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen. For posts on previous restoration days see parts 1, 2, 3, 4, 5, 6, 6 update and 7.

The new boot disk

In an earlier session, we discovered that our boot disk had been used for drive testing decades earlier and was filled with random garbage, making it impossible to boot from the disk. Fortunately, the Living Computer Museum in Seattle sent us a new boot disk, loaded with diagnostic software. I received a vintage Digital RK05K-11 disk cartridge box:

Box for a vintage Digital RK05K-11 disk cartridge

Inside the box was the 14" disk. Despite its size, the disk cartridge only hold 2.5 megabytes, a tangible indication of the exponential improvements in disk density since the 1970s. We loaded the disk into the Alto's Diablo drive, waited a minute for the disk to spin up to speed and the heads to load, and Ed eagerly pressed the reset button. Would we be lucky and successfully boot the Alto? After all the anticipation, nothing happened.

An Alto diagnostic boot disk, sent to us by the Living Computer Museum in Seattle.

Why won't the system boot?

Since we had successfully loaded a disk sector (of random data) earlier, we knew that the system was working end-to-end, from the drive through the disk interface card and into the processor boards and memory. One possibility was that the alignment was different between our drive and the Living Computer Museum's drive, corrupting the data. Needing to hand-align our drive would be very difficult, so we hoped that wasn't the problem.

To see the words as they came off the disk, we added more logic analyzer probes to the Alto's backplane to trace the processor bus. At this point, the backplane is liberally decorated with probes, allowing us to monitor the buses and microcode execution in detail.

We added more probes to the Alto's backplane to monitor the processor bus. The probes are connected to a vintage Agilent logic analyzer.

Using the logic analyzer, we could step through the microcode to see each disk word getting loaded into memory, but the data didn't match the boot sector we expected. The Alto stores each sector on disk as a 2-word header (holding the disk address), an 8-word label (holding a next block pointer), and the 256-word data block. Although the data seemed wrong, more interesting was the octal value 000100 in the header coming from disk. (The Alto uses octal, causing us no end of confusion.) This header value corresponds to a disk address of cylinder 8, not the boot sector 0. Could we be reading the wrong sector?

By removing the cover from the Diablo drive, you can watch it seek. Unlike modern hard drives, the Alto's disk isn't sealed so you can see the disk surface and head when the disk is loaded in the drive.

Looking inside the Diablo disk drive, you can see the head moving over the disk's surface as disk seeks take place. The green dial on the right rotates to indicate the current track. These seeks are from an earlier test, not from boot.

Watching the drive as the Alto attempted to boot, we saw the disk arm seek, which it shouldn't have done to read from boot sector 0. The seek dial rotated to cylinder 8—as the logic analyzer suggested, the Alto was trying to boot from the wrong disk cylinder, which clearly wouldn't work.

Inside the Diablo disk drive, the turquoise sector indicator shows the drive has seeked to sector 8.

Since the drive seeked correctly last week, why was it trying to read from the wrong cylinder today? Were we suffering another chip failure on the disk interface card? Had something malfunctioned in the drive? We pored over the disk interface schematics and suspected a problem with the nine cylinder select lines between the Alto and the drive. In particular, a malfunction in the CYL(5) line could set the cylinder to 8, causing the seek we saw. (Bits on the Alto are inconveniently numbered backwards, so cylinder bit 5 corresponds to the value 8.)

We noticed a scratch in the 40-conductor ribbon cable between the Alto and the disk drive, exposing a wire. Could this be the cause of our problems? We carefully checked continuity and found no problems with the cable despite the scratch, so we hooked the cable back up along with an oscilloscope to monitor the offending signal, so we could debug the problem.

Running the Alto

We tried booting the Alto again, watching for the seek problem. This time the disk unexpectedly performed multiple seeks. And then the boot screen appeared on the Alto. We had a running system!

The Xerox Alto screen after booting, waiting for a command.

A few months ago, I had used the Salto simulator to see how the Alto worked. But now, facing a working system, I couldn't remember the commands. To see the files, was it LIST, or DIR? No. How about HELP? No good. After a minute or two, I remembered that a simple question mark was the command to list the disk, and I got a list of files. The system was working well enough to read a directory.

I tried running the WYSIWYG text editor Bravo and the mouse-based drawing program Draw, but they crashed, dropping the system into the debugger, Swat. Clearly some hardware problems remain and our debugging adventure is not over yet.

The Alto's debugger is called Swat, and runs if there is an error.

Some programs ran successfully. The CRT test program drew grids on the bitmapped screen. The CRT is a bit fuzzy in the upper left, but the quality is surprisingly good considering that this tube was almost too dim to see a few months ago. Apparently running the tube a while restored it by burning contaminants off the cathode (or something mysterious tube-era phenomenon like that).

The Xerox Alto running a CRT test program. Antique mechanical calculators are in the background.

The Ethernet diagnostic program ran and showed off the mouse-based GUI. I'm developing a BeagleBone-based Ethernet simulator for the Alto, so this program will be very helpful. We don't have a gridded optical mouse pad, so the mouse didn't work and we couldn't click anything.

The Alto's Ethernet Diagnostic Program uses a mouse-based GUI.

The keyboard test program graphically displays the keyboard and shows each key as it is pressed. We used this to verify the keys all work.

The Alto running the keyboard test program. Antique calculators are in the background.

A closeup of the Alto's keyboard test programming. It highlights keys when they are pressed.

Conclusion

It was an exciting day, with the Alto finally booting successfully. A disk seek problem blocked us for a while, but then the problem mysteriously disappeared. We ran a bunch of test programs from the disk. About half of them ran successfully, and half crashed into the debugger. There may be a malfunction in the processor that we need to track down. Or perhaps we're getting memory errors; the parity errors we saw earlier could have returned. In any case, we have some more debugging ahead of us, but it's exciting to see the system finally running. Hopefully we will soon be playing Alto Trek and Maze War.

For updates on the restoration, follow me on Twitter at kenshirriff.

Thanks to Josh Dersch and the Living Computer Museum for their debugging help and sending out the boot disk.

Edit: I just learned macOS Sierra and iOS 10 include the Bitcoin symbol I proposed (₿)!

Star characters (☆★) have long been part of the Unicode standard, which means they can appear as characters in web pages, text, and email. But half-stars were missing, so they required special images or custom fonts. I recently co-wrote a proposal to add half-star characters to Unicode, and it was just accepted. In an upcoming Unicode release, half-stars will be usable like any other text character. In this article, I discuss how I got the half-star characters and two others added to Unicode.

Usage of the four different half-stars to express 3.5 of 5.

Unicode is the computer standard that defines the characters that are used by almost every computer—this standard allows different computers to easily display text in almost every language, and with almost every symbol you might need. (Before Unicode, dealing with non-English text on computers was a mess.) But Unicode doesn't include everything. Last June, a comment on Hacker News complained that Unicode lacked the half-star character used in ratings and movie reviews:

Until Unicode has a half-star character, it won't even be able to encode the average newspaper.

I suggested that someone should propose the half-star to Unicode, but quickly realized that "someone" would be me. Since I had successfully proposed two symbols to Unicode earlier, I knew the process necessary to get the half-star added.

A few years ago, a detailed article described how a couple people got power symbols added to Unicode. Adding a new character to Unicode is easier than most people think. You don't need to pay money, be part of a major company or join a committee. All you need to do is write a proposal explaining why the character is needed. If the Unicode Committee agrees, they'll approve your character for addition to Unicode.

In 2015, I started programming the 1960s-era IBM 1401 mainframe at the Computer History Museum. But when I wrote about the IBM 1401 system, I ran into a problem. This computer uses a 6-bit character set (the precursor to EBCDIC) with some strange characters. All these characters appeared in Unicode, with the exception of one: the Group Mark. I was a bit shocked that Unicode, with its 128,172 characters, lacked a character I needed. Having read about the power symbol team's success in adding characters, I figured it would be interesting to see if I could get the group mark character added to Unicode. I wrote a proposal, submitted it to Unicode, and at the next meeting it was approved.

The group mark character, from an IBM 705 computer manual (1959). Since Unicode lacked this character, you couldn't write this text on a modern computer.

A few months later, I learned that the Bitcoin symbol was missing from Unicode. This was a surprising omission, since the Bitcoin symbol is widely used in the real world. The symbol had been rejected before, so I made a more thorough proposal in October 2015 with the enthusiastic support of /r/bitcoin and other Bitcoin groups. The Bitcoin symbol proposal was accepted by the Unicode Committee in November 2015.

The Bitcoin symbol on an IBM punched card. Mining Bitcoins on a punched card mainframe isn't practical, but was an interesting experiment.

So when I saw the comment about half-stars on Hacker News, I figured it would be straightforward to get it accepted to Unicode. I wrote a proposal after discussion on HN and on the Unicode mailing list. The Unicode committee considered the proposal in August 2016, but to my surprise they had also received another half star proposal, so they decided to wait on a single proposal. It turned out that Andrew West had also written a proposal for half-stars, and we had both submitted proposals, unaware of the other. So Andrew and I joined forces and made a combined proposal, which was accepted by the committee Sept 30, 2016.

Why did we propose four different half-stars? We included both the outline half-star and solid half-star because both forms are commonly used. (I wasn't sure if the committee would consider these characters distinct enough to include both, but they did.) Right-to-left languages such as Hebrew do their star ratings right-to-left too (which was a bit of a surprise to me), so we included mirrored versions for RTL languages. Thus, the four different half-stars cover the range of uses.

Half-stars in Hebrew are written right-to-left. From Haaretz 2 November 2012, provided by Simon Montagu.

If there's a character that you want to add to Unicode and it meets the requirements, you should submit a proposal, since its a very interesting process and not too difficult. Make sure your character meets the criteria. In particular, you'll need to find a bunch of examples of the character used in text. The Unicode Committee isn't going to add a character just because you think it's cool, so you need examples to prove the character is in use. Creating a font to demonstrate your new character is probably the most challenging part; I used FontForge. The power symbol team has lots of helpful advice on making a successful proposal. I'm also happy to offer advice if you're writing a proposal.

I should mention that emojis have a totally different process, so don't argue that "since the poop emoji exists, my character should too". (The poop emoji 💩 was added for backwards compatibility with Japanese mobile phones.) For emoji, expected popularity of the symbol is a major factor in acceptance. Regular Unicode, on the other hand, isn't concerned with popularity—historical scripts such as Tangut won't get a millionth the usage of a new emoji—but with existing usage in text. (Reading between the lines, I think a lot of the Unicode committee wishes they weren't in the emoji business at all.)

Once a character is accepted, there's still a long road for it to appear in fonts and be usable. A new version of Unicode is released typically every June, so the half stars will probably appear in Unicode 11.0 mid-2018. The Bitcoin community in particular has had to wait patiently since the Bitcoin symbol just missed the cutoff for Unicode 9.0, so it will probably appear in Unicode 10.0, mid-2017. So if you're patient, eventually you'll be able to use the group mark, Bitcoin symbol and half stars in web pages and text just like any other symbol.

Last week, after months of restoration, we finally got the vintage Xerox Alto computer to boot (details) and run programs. However, some programs (such as the mouse-based Draw program) crashed so we knew there must still be a hardware problem somewhere in the system. In today's session we traced through the software, microcode and hardware to track down the cause of these crashes.

For background, the Alto was a revolutionary computer designed at Xerox PARC in 1973 to investigate personal computing. It introduced the GUI, Ethernet and laser printers to the world, among other things. Y Combinator received an Alto from computer visionary Alan Kay and I'm helping restore the system, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen. The full collection of Alto restoration posts is here.

When the Xerox Alto encounters a problem, it drops into the Swat debugger.

To assist with debugging, the Alto includes a debugger called Swat. If a program malfunctions, it drops into the Swat debugger, as seen above. The debugger lets you examine memory and set breakpoints. It is more advanced than I'd expect for 1973, including a feature to disassemble machine instructions from memory and view them with names from the symbol table.

In our case, the debugger showed that when we ran the MADTEST test program, the Alto had jumped to address 2, which triggered the debugger. The first 8 memory locations in the Xerox Alto contain TRAP instructions to catch erroneous jumps to a zero address (or near-zero address) which can happen if a subroutine return address is clobbered. By examining the stack frames, I determined which subroutine had been called when the system crashed. The problem occurred when the program was jumping to microcode that had been loaded into microcode RAM, Since this is an unusual operation, it would explain why most programs ran successfully and only a few crashed.

Microcode

Microcode is a low-level feature of most processors, but I should explain what it means for a program to jump to microcode, since this is a strange feature of the Alto. Computers execute machine code, the simple, low-level instructions that the CPU can understand; on modern computers this may be the x86 instruction set, while the Alto used the Data General Nova instruction set. Most processors, however, don't run machine instructions directly, but have a microcode layer that is invisible to the programmer. While the processor appears to be running machine instructions, it internally executes microcode, a simpler, low-level instruction set that is a better match for the hardware. Each machine instruction may turn into many micro-instructions.

The Xerox Alto uses microcode much more extensively than most computers, with microcode performing tasks such as device control that most computers do in hardware, resulting in a cheaper and more flexible system. (As Alan Kay wrote, "Hardware is just software crystallized early.") On the Alto, programmers have access to the microcode—a user program can load new microcode into special control RAM. This microcode can implement new machine instructions, optimize particular operations (analogous to GPU programming), or obtain low-level control over the system.

The Xerox Alto's CRAM board (Control RAM) stores 1024 microcode instructions. The 32 memory chips in the lower left provide the 1024x32 storage. Foreshadowing: the connector at the lower left connects the CRAM board to the microcode control board.

Our Alto has 1024 words of microcode in ROM (for the standard microcode) and 1024 words in RAM (for software-controlled microcode). The photo above shows the CRAM (control RAM) board that holds the user-modifiable microcode. This board illustrates the incredible improvements in memory density since 1973—this board required 32 memory chips to hold the 1024 32-bit words (4 Kbytes) of microcode.

The Alto's microcode uses a 1K (10-bit) address space. Since Altos can support up to 2K of microcode in ROM and 3K in RAM, bank switching is used to switch between different 1K memory banks. Bank switching is triggered by a special micro-instruction called SWMODE (switch mode).

Getting back to our crash, the MADTEST test program loads special test microcode into the control RAM. Then it executes the JMPRAM machine instruction to switch execution from machine instructions to the microcode in RAM. The microcode that implements JMPRAM performs a SWMODE to switch execution to the RAM microcode bank and the microcode in RAM will execute. When the microcode is done, it is supposed to return execution to the machine code emulator, and execution of the user-level program (MADTEST) will continue. But somehow execution ended up at address 2, causing the program to crash.

To track down a problem with the Xerox Alto's bank switching circuit, we attached many probes to the CPU control board.

We used a logic analyzer to record every micro-instruction and memory access, so we could determine exactly what went wrong. After a few tries, we captured a trace showing what the Alto was executing until it crashed. Over the past week, I've been using the Living Computer Museum's ContrAlto simulator of the Xerox Alto to understand how the Alto's software and microcode work. With this background, I could interpret the logic analyzer output and map it to the MADTEST code and the microcode. Everything proceeded fine until the JMPRAM instruction was executed. Instead of switching to the microcode in RAM, it was still running microcode from the ROM. Since the micro-address was intended for the RAM code, the processor was running essentially random microcode. Through pure luck, this microcode routine completed and returned control to the regular machine code emulator rather than hanging the system. Unfortunately this code didn't load the return address register, resulting in a jump to address 2 and the Swat crash we saw.

To summarize, everything was working fine except instead of switching to the microcode RAM bank, execution stayed in the microcode ROM bank. This was pretty clearly a hardware problem, so we started looking at the bank switch circuit, which consists of multiple integrated circuits.

The bank switch hardware

The Alto was built at the dawn of the microprocessor age, so instead of using a microprocessor chip, it used three boards of TTL chips for the CPU. The control board interprets the microcode, including performing bank switching, so that's where we started our investigation.

Bank switching in the Alto happens when the SWMODE micro-instruction is executed. The destination bank is selected following complex rules that depend on the hardware configuration and the current bank. Rather than implement these rules with a complex hardware circuit, the Alto designers used the short cut of encoding all the logic into a 256x4 PROM chip. (This also has the advantage that a different hardware configuration can be supported simply by replacing the PROM.) The schematic below shows the PROM (left) generating the bank select signals (yellow), which pass through various chips to create the current bank select signals (right), which are fed back into the PROM for the next cycle.

This schematic shows the Xerox Alto's bank switching circuit, allowing microcode to run from ROM or RAM banks. (Click for larger image.)

We connected logic analyzer probes so we could trace each chip in the bank select circuit. The PROM correctly generated the RAM bank signals when the SWMODE micro-instruction executed, but in the next step its inputs had reverted to the ROM bank for some reason. This showed the PROM worked, so we continued probing through the circuit. Each chip had the proper output until we got to the multiplexer chip that feeds back to the PROM. (This chip, on the right, handles microcode task switching by selecting either the current task's bank, or the new tasks's bank, which is recorded in a RAM chip.) The input signal to the multiplexer pulsed high for the new bank, but the output stayed low, blocking the bank switch signal. The oscilloscope trace below shows the problem: the input signal (bottom trace) is not passed to the output (middle trace).

A multiplexer IC in the Xerox Alto was failing to pass the bank switch signal from its input (bottom trace) to its output (middle trace).

We found a bad chip on the disk interface board a few weeks ago, so had we located a second bad chip? We pulled out the suspicious chip (a 74S157 multiplexer) and tested it in a breadboard to prove that it was faulty. Surprisingly, it worked just fine. Perhaps the problem only showed up at high frequency? We swapped it with an identical chip on the board and the crash still happened. Clearly there was nothing wrong with the chip. But its output stayed low when it should go high. Why was this?

We thought this 74S157 multiplexer IC from the Xerox Alto was faulty. However, the chip worked fine when tested in a breadboard.

Our next theory was that something was grounding the chip's output signal, forcing the output to remain low. To test this, we disconnected the chip's output pin from the rest of the circuit by bending the pin so it didn't go into the socket, With the output not connected to the circuit, the output went high as expected. (See oscilloscope trace below.) This proved that the chip worked and something else was pulling the signal low. Since the chip's output was connected to the PROM chip, the obvious suspect was the PROM, which might have an input shorted low. We hoped the PROM chip wasn't at fault, since locating a 1970s-era D3601 PROM chip and reprogramming it would be inconvenient. We pulled the PROM chip out of the board and the short to ground remained, demonstrating the PROM chip was not the culprit.

With the multiplexer's output disconnected from the circuit, the input signal (bottom) appears on the output (top) as expected.

We removed the control board from the Alto to examine it for short circuits. On the back of the circuit board, we noticed that two white wires were connected to the multiplexer chip that was causing us problems. (Wires are added to printed circuit boards to fix manufacturing problems, support new features, or support new hardware.) These wires went to the connector that was cabled to the CRAM (control RAM) board shown earlier. With the CRAM board disconnected, the short to ground went away. Thus, the cause of our crashes was these two wires that someone had added to the board! Could we simply cut these wires and have the system work correctly? We figured we should understand why the wires were there, rather than randomly ripping them out. Maybe our control board and CRAM board were incompatible? Maybe these wires were to support the Trident disk drive we aren't using? It was the end of the day by this point, so further investigation will wait until next time.

This is the Xerox Alto control board, one of three boards that make up the CPU. The board has been modified with several white wires which trigger our crashes.

Conclusion

After a bunch of software, microcode and hardware debugging we found that the crashes are due to some wires added to one of the circuit boards. These wires messed up microcode bank switching, causing programs that used custom microcode to crash. Fixing this should be straightforward, but we want to understand the motivation behind these wires. On the whole, the processor is working reliably other than this one issue. Once it is fixed, we can run MADTEST (the microcode test program) to stress-test the processor. If there are no more processor issues, we'll move on to getting the mouse working.

For updates on the restoration, follow me on Twitter at kenshirriff. Thanks to the Living Computer Museum for the extender board and the ContrAlto simulator.

Last week our vintage Alto was crashing; we traced the problem to an incompatibility between two of the processor boards. Today we replaced these boards and now we can run all the programs on the disk without crashes. Unfortunately our mouse still doesn't work, which limits what we can do on this GUI-based computer. We discovered that the mouse has the wrong connector; even though it plugs in fine, it doesn't make any electrical connection.

If you're just joining, the Alto was a revolutionary computer designed at Xerox PARC in 1973 to investigate personal computing. It introduced the GUI, Ethernet and laser printers to the world, among other things. Y Combinator received an Alto from computer visionary Alan Kay and I'm helping restore the system, along with Marc Verdiell, Carl Claunch Luca Severini, Ed Thelen, and Ron Crane,

Incompatibility in the Alto's circuit boards

The Xerox Alto was designed before microprocessor chips, so its processor is built from three boards full of TTL chips: the ALU (Arithmetic-Logic Unit) board, the Control board, and the CRAM (Control RAM) board. The Control board and the CRAM board are the ones that we dealt with today. Last week we traced through software execution, microcode, and hardware circuitry to figure out why some programs on the Alto crashed. We discovered that the problem happened when a program attempted to execute microcode stored in the Control RAM. The microcode RAM select circuit malfunctioned due to some wiring added to the back of the Control board (the white wires in the photo below). Why had this wiring been added to the board? And why did it break things? At the end of last episode, we briefly considered ripping out this wiring, but figured we should understand what was going on first.

This is the Xerox Alto control board, one of three boards that make up the CPU. The board has been modified with several white wires which trigger our crashes.

A bit of explanation will help understand what's going on. Like most computers, the Xerox Alto's instruction set is implemented in microcode. The Alto is more flexible than most computers, though, and allows user programs to actually change the instruction set by modifying the microcode, actually changing the instruction set. To achieve this, the Alto stores microcode in a combination of RAM and ROM.[1] The default microcode (used for booting and for standard programs) is stored in 1K of ROM on the Control board. Some programs use custom microcode, which allows them to modify the computer's instruction set for better performance. This microcode is stored in high-speed RAM on the Control RAM (CRAM) board. Our Alto came with a 1K CRAM board, but some programs (such as Smalltalk) required the larger 3K CRAM board.[2] (This microcode RAM is entirely different from the 512 Kbyte RAM used by Alto programs; you didn't need to fit an Alto program into 3K.)

Inconveniently, the 1K and 3K RAM boards have incompatible connections, and the Control board needs to be wired to work with one or the other. We determined that the white-wire modifications on our Control board converted it from working with a 1K RAM board to working with a 3K RAM board.[3] Unfortunately, our Alto had a 1K RAM board so the two boards were incompatible and programs that attempted to use the microcode RAM crashed, as we saw last week. It's a mystery why our Alto had two incompatible board types, but at least we knew why the modifications are there. (Since our Alto also came with the wrong disk interface card and an unbootable hard disk, we wonder what happened to the Alto since it was last used. It clearly wasn't left in a usable state.)

Fortunately Al Kossow of bitsavers came to our rescue and supplied us with some 3K Control RAM boards and the Control boards to go with them. This saved us from needing to rewire the board we had. Al also provided the strange but necessary connector (visible below on the left) that goes between the Control board and the CRAM board. We swapped these boards with our boards and everything worked without crashing! We could now run all the programs on the disk without crashes.

The Alto's Control board is part of the CPU. This board contains 2K words of microcode ROM, as well as control circuitry. Our original board had 1K of ROM (and 8 empty sockets), while this board has the full 2K of ROM. The ROM chips are in the lower left, with labels. The chip labeled SW3K (upside down) is the PROM that selects the hardware configuration. The spare PROM (labeled SW1) is in the upper left. The edge connector on the right plugs into the backplane, while the two connectors on the left are cabled to the CRAM board.

Some Alto software

With the Alto running reliably, we could try out the various programs on the hard disk that had crashed before. Draw, the Alto's mouse-based drawing program, apparently uses microcode for optimizing performance, so last week it immediately dropped into the Swat debugger. With the compatible boards, Draw ran successfully. Unfortunately, since our mouse isn't working, we couldn't actually draw anything, but you can still see the icon-based GUI below. I've tried Draw on the Alto simulator, and despite the icons, it's not exactly intuitive to use.

'Draw' is the Alto's mouse-based drawing program. Clicking an icon on the left selects an operation.

We also tried Bravo, the first WYSIWYG (What you see is what you get) text editor. Again, functionality is limited without the mouse. But I could enter text and change the font to a larger, bold font with proportional spacing. Xerox PARC also invented the laser printer and Ethernet, so one could create documents in Bravo and then print them on a networked laser printer. Charles Simonyi, one of the co-authors of Bravo, later created Microsoft Word, so there's a direct connection between the Alto's editor and Word. I've written more about how to use Bravo here.

'Bravo' is the Alto's WYSIWYG text editor. It supports multiple fonts, among other features.

The Alto included a GUI file manager called Neptune, allowing files and directories to be manipulated with the mouse. Neptune has an invisible scroll bar to the left of the file list that appears when you mouse-over it; apparently the Alto also invented the scroll bar.

The Alto includes a graphical file system browser.

A rather complex GUI is in PrePress, a program that converted a spline-based font to a bitmapped font for display or printing. (You can think of this as a forerunner of TrueType fonts.) High-quality fonts were created for the Alto using the FRED font editor. As you would expect from a document company, these fonts included proportional spacing, ligatures such as "ffl" and "fl", and multiple styles and sizes.

'PrePress' is used to convert a spline-based font into a bitmap suitable for display or printing.

Most importantly, we were able to run MADTEST, the Microcoded Alto Diagnostic test, which runs a suite of low-level diagnostics using microcode. This test ran successfully, increasing our confidence that there are no obscure problems in the processor.

If you want to try these Alto programs for yourself, you can use the ContrAlto simulator, built by the Living Computer Museum. This simulator has been very useful to use for learning how the software is supposed to work.

The mouse

The biggest problem remaining at this point is the mouse doesn't work, so we investigated that next. Although the mouse was invented prior to the Alto, the Alto was the first computer to include the mouse as a standard input device. The Alto uses a three-button mouse that plugs into the back of the keyboard.

The three-button optical mouse.

Some Altos had a mechanical mouse, while others had an optical mouse. Our Alto came with an optical mouse, but we couldn't get it to work at all. The mouse uses a special mousepad with a specific dotted pattern (which we didn't have), so at first we suspected that was the problem. However, the mouse didn't respond at all when we used a printed copy of the pattern. We also didn't see any light (visible or IR) from the three illumination LEDs on the mouse, so we suspected bigger problems than a missing mousepad.

Underside of the mouse. The sensor (right) consists of three illumination LEDs surrounding the optical sensor.

Opening up the mouse shows the simple circuitry inside, with a single chip controlling it. The chip is rather unusual since it includes a 16-pixel optical sensor, with a light guide that goes from the bottom of the chip to the bottom of the mouse. The pixel-based optical mouse was invented at Xerox PARC in 1980 (and later patented), so this mouse is somewhat more modern than the original Alto (1973). The handwritten markings on the chip suggest this may have been a prototype of some sort.

Inside the optical mouse. In the middle are the three buttons. At top is the IC that contains the optical sensor.

When we closely looked at how the mouse was wired up, we discovered the problem. The mouse plugs into a 19-pin socket (DE-19), while the mouse used a 9-pin plug (usually called DB-9, but technically DE-9), so the connectors are entirely incompatible. The DE-19 has three rows of pins: 6/7/6 (with the middle row empty on the Alto), and the DB-9 has two rows of pins (7/6). The bizarre thing is that the mouse plugged into the socket just fine: the connector shells are physically the same size, and the mouse plug's pins went between the Alto socket's connections. So the mouse was plugged in, but not actually connected to anything! It's surprising the connectors could go together without bending any pins.

The Alto's mouse plugs into the 19-pin connector on the keyboard housing (above). Unfortunately our mouse has a 9-pin connector (below).

After some investigation, we learned that the Alto was missing the mouse when it came from Alan Kay. YCombinator picked up a replacement mouse on eBay, but it wasn't compatible with our Alto. We're still trying to figure out if the mouse is an Alto mouse with a nonstandard connector or if it is for a different machine. The Xerox Star used a 2-button mouse, so the mouse isn't from a Star. Tim Curley at Xerox PARC loaned us a compatible Alto mouse, so we'll give that one a try next episode. We're also looking into making an adapter cable, but DE-19 connectors appear to be obsolete and difficult to find.

Conclusion

Last week we discovered that the control board in our Alto was incompatible with the microcode RAM board. Al Kossow loaned us some compatible boards, and with those boards our Alto has been functioning without any crashes or malfunctions. We discovered that our mouse wasn't working because it had the wrong connector—although it plugged into the Alto, it didn't make any electrical connection. Since the mouse is necessary for many Alto programs, we hope to get the mouse working soon.

For updates on the restoration, follow me on Twitter at kenshirriff. Thanks to Al Kossow for helping us out again.

Notes and references

[1] The table below shows the three microcode configurations the Alto supported. Details are in section 8.4 of the Hardware Manual. The desired configuration is selected by inserting a particular PROM in the Control board: SW1, SW2, or SW3K. Each control board has one PROM in use and an unused PROM in the upper left corner; switching PROMs switches the configuration. The Control board has sockets for 2K of ROM; these sockets are left empty for systems with 1K of ROM.

Configuration Name	ROM	RAM
1K	1K	1K
2K	2K	1K
3K	1K	3K

[2] The Alto introduced the 3K RAM board to take advantage of the new 4 kilobit RAM chips, replacing the 1 kilobit chips on the 1K board. Both boards required 32 RAM chips for the 32-bit micro-instructions, showing that back then you needed a lot of chips for not much memory. The microcode required high-speed static memory, so density was worse with microcode than with regular RAM.

[3] The 3K RAM board requires a few additional signals from the Control board, such as the task id. The 1K RAM board grounds one of the lines used for these signals, so using the 3K Control board with the 1K RAM board (as our Alto did) shorts out one of the bank select lines. This causes bank switching to fail and explains the crashes we saw last week. Schematics for the boards are available at bitsavers.

The revolutionary Xerox Alto computer came out in 1973 and set the direction for personal computing. If you want to try out the Alto's software, the easiest approach is the ContrAlto simulator. This article describes how to set up the ContrAlto simulator, explains where to find Alto disk images, and shows some of the programs you can run on it.

If you're just joining this series, the Alto was designed at Xerox PARC in 1973 to investigate personal computing. It introduced the GUI, Ethernet, high-quality computer typography and laser printers to the world, among other things. The Alto was used to build Smalltalk, which was one of the first object-oriented programming languages and had a huge influence on modern programming languages. Y Combinator received an Alto from computer visionary Alan Kay, who led the Smalltalk research at PARC. I'm helping restore the system, along with Marc Verdiell, Carl Claunch Luca Severini, Ed Thelen, and Ron Crane. The ContrAlto simulator (developed by Josh Dersch at the Living Computer Museum in Seattle) has been very helpful in our restoration—we can debug by comparing our real system to the simulator. If you're in the Seattle area, you can see a couple running Altos at this museum.

Using ContrAlto

ContrAlto is written in C# and only works on Windows. To install ContrAlto, download the zip file. Open the zip file and click on ContraltoSetup.msi to run the standard installer.

Next, you'll need a Xerox Alto disk image. The real Alto uses 14" removable disk packs that hold 2.5 megabytes, as shown in the photo below. A disk pack is loaded into the Alto, the system boots off the disk, and then disks can be swapped as necessary. The simulator replaces this process by loading a disk image file into the simulator. There are several archived disk images that you can use. One source is bitsavers.org. Download an image file, such as allgames.dsk.

Inserting a disk into the Xerox Alto's disk drive. The Alto's video display is visible at the back.

Next, run ContrAlto and tell it to load the disk image: select "System -> Drive 0 -> Load". Select allgames.dsk. Then start the simulator with: "System -> Start" (or Reset if its already running). (Also see the README file.) After a few seconds, the ContrAlto simulator should start up and show you the Alto boot screen and command line prompt.

At the > prompt, hit ? to see the list of files on the disk. Files ending with .~ or .RUN are programs you can execute (by typing the file name). Note that the simulator takes control of your mouse. To release the mouse cursor, press Alt. Click the mouse inside the window to give focus back to the Alto simulation. The easiest way to stop a program is to hit Alt and then select System -> Reset.

Typing a question mark at the Xerox Alto executive prompt displays the contents of the disk. The "File / System / Help" menu items are for the ContrAlto simulator, not part of the Alto GUI.

One of the games on the disk is Space Invaders, which can be run by simply typing "invaders". The game is controlled by using the three mouse buttons for left, shoot, and right. This game illustrates the bitmapped graphics of the Alto, an unusual feature for that era due to the high price of memory.

The Xerox Alto has a space invaders game, controlled by the mouse buttons.

Star Trek was clearly popular with the Alto programmers. The allgames disk lets you battle Klingons with the Spacewar game (below). It also includes Trek, one of the first networked multiplayer games, which took advantage of the system's Ethernet.

The Spacewar game on the (simulated) Xerox Alto lets you battle Klingons.

Smalltalk

Smalltalk is a highly-influential programming language and environment that introduced the term "object-oriented programming" and was the ancestor of modern object-oriented languages. Smalltalk was developed on the Xerox Alto by Alan Kay, Dan Ingalls, Adele Goldberg and others. The Alto's Smalltalk environment is also notable for its creation of the graphical user interface with the desktop metaphor, icons, scrollbars, overlapping windows, popup menus and so forth. When Steve Jobs famously visited Xerox PARC, the Smalltalk GUI inspired him on how the Lisa and Macintosh should work; the details of the visit are highly controversial but the description in Dealers of Lightning seems most accurate.

To start up Smalltalk, download st80.dsk. Load the disk image into ContrAlto, reset, and then run "resume small.boot". (A second Smalltalk disk is xmsmall.zip; "resume xmsmall.boot" starts Smalltalk from this disk.). I should mention that the Smalltalk system is fairly slow, so be patient.

Smalltalk-80 running on the Xerox Alto simulator. The large window is the class browser.

The Smalltalk environment includes a class browser window (center) that lets you explore all the classes in Smalltalk. Reflection, being able to examine all the structures of a running system, was a key feature of Smalltalk.

Programming in Smalltalk is too complex to explain here, so I'll just give a simple example of executing an expression. In the upper-left window, type 355.0/113.0 followed by the cursor down key. (Cursor down simulates the Alto's line feed (LF) key, which ends a command. Return will not work.) While this looks like a simple expression, it is object-oriented. The message "/113.0" is sent to the Float object "355.0" (a subclass of Number), executing the division method. (See this document for more information on Alto Smalltalk.)

Smalltalk also includes an image editor called BitRect. To start it, enter "BitRect new fromuser; edit" (and then cursor down) into the upper-left window, and select the size of the editor window with rubber-banding. The editor includes various tools, brush sizes and gray levels. Note that the drawing window can overlap other windows.

Using the BitRect class in Smalltalk to draw an image. The workspace window in the upper right has useful Smalltalk commands to select and execute.

Writing BCPL code

You can use the simulator to write code in the BCPL language. (BCPL was the precursor to C and is essentially C with a different syntax and no data types.) For the BCPL environment, download and uncompress the tdisk4.dsk.Z image. Code is edited using Bravo, the first WYSIWYG document editor. One consequence of using this editor is that code on the Alto can have all the formatting and styling of a document: a variety of proportionally-spaced fonts, bolding, italics, and so forth. (Being able to style code is a nice feature that can make code easier to understand, so the modern approach of writing programs with plain unformatted text seems like a step backwards.) The image below shows some BCPL code with formatting applied; the styling has no effect on the code's operation. I wrote an article earlier explaining how to use the Bravo editor and the BCPL compiler, so see that article for details on writing BCPL programs.

The Bravo editor provides WYSIWYG formatting of text. This BCPL program prints "Hello World", using the Ws (Write String) function.

The diagnostics disk

If you want to follow along with our Alto restoration, we're currently using a copy of the diagnostic disk diags.zip. (You'll need to unzip the disk image before loading it.) This disk includes the keyboard test, MADTEST microcode test, CRT test and other diagnostics we have used in our blog posts and videos.

The ContrAlto debugger

The ContrAlto simulator includes an extensive debugger that lets you examine the running state of the (simulated) Alto. This feature was very helpful for us when debugging the real Alto, since we could compare what was supposed to happen with what was really happening. If you want to see what the Alto does internally, or how microcode interprets machine instructions, give the debugger a try. You'll find that it takes many micro-instructions to execute one machine instruction, and many machine instructions to do anything useful.

To start the debugger, select "System -> Show debugger". This opens up a debugger window, as seen below. The left panel shows the source code for Alto's microcode. The upper center panel shows disassembled machine instructions (using the Nova instruction set). Clicking "Step" will single-step through the microcode. Clicking "Nova Step" will step through Nova code. Click "Run" to return to normal execution

Debug window for the ContrAlto simulator. The left pane shows the microcode. The upper center pane shows the Nova program instructions that are running.

Conclusion

If you don't have a Xerox Alto available, the ContrAlto simulator is the best way to get the Alto experience. I should emphasize that I didn't write ContrAlto; I'm just a user. Josh Dersch at the Living Computer Museum wrote it. The museum also has a couple running Altos, so stop by the museum if you're in Seattle.

For updates on the Alto restoration, follow me on Twitter at kenshirriff.

I recently watched a cross-country running race that used a digital timing system, so I investigated how the RFID timing chip works. Each runner wears a race bib like the one below. The bib has two RFID tags, consisting of a metal foil antenna connected to a tiny RFID (radio-frequency identification) chip. At the finish line, runners pass over a pad that reads the chip and records the finish time. (I'm not sure why there are two RFID tags on each bib; perhaps for reliability of detection.)

This race bib has two RFID chips and antennas for timing the race. The RFID chips are tiny black specks in the small loop of each antenna.

The die photo below shows the RFID chip used on these tags. To create it, I took 22 photos of the chip with my metallurgical microscope and stitched them together to create a high resolution photo. (Click the image for a larger version.) To prepare the chip, I removed it from the plastic carrier with Goof Off, dissolved the antenna with pool acid (HCl), and burnt off the mounting adhesive over a stove. This process left the chip visible with just a bit of debris that wouldn't come off. I'd probably get better results with boiling sulfuric acid, but that's too hazardous for me. I described the image stitching process in this article.

Die photo of the Impinj Monza R6 RFID chip.

The chip turns out to be the Monza R6 RFID chip built by Impinj, a leading RFID chip company. The chip datasheet includes a die photo of the chip (below), but it is mostly obscured for some reason. Even so, it is clear that the chip in the datasheet matches the one I photographed.

Die photo of the Monza R6 RFID chip from the datasheet. Most of the chip is obscured; the antenna contacts are visible at the top.

How RFID timing works

The basic idea of RFID timing is the finish line has an RFID reader connected to a directional antenna. When the runner crosses the finish line, the reader communicates with the RFID chip on the runner's tag, determines the runner's ID number, and records the elapsed time.

You might expect that the RFID chip simply returns the runner's number as the runner crosses the finish line, but there's a complex two-way protocol at work here. The chip uses the industry-standard EPC standard, and supports about a dozen commands: Select, Query, Read, Write, Lock, and various others. (While some RFID chips include cryptography, the R6 does not.) The chip receives commands from the reader and responds according to the protocol. The chip's memory includes an Electronic Product Code (EPC), which is a 96-bit universal identifier. In this case, the EPC value is returned to identify the runner.

One interesting thing is the RFID chip has no power source other than the radio signal; it is passive and doesn't need a battery. The reader transmits a radio signal (between 860 and 960 MHz), which is picked up by the antenna on the tag. The tiny amount of power received by the tag is what powers the RFID chip.

To send data to the RFID chip, the reader pulses the radio signal it emits. The chip detects these pulses and converts them into bits, using various modulation schemes (details). You might expect the chip transmits a radio signal to send data back to the reader, but it doesn't have enough power to do that. Instead, the chip dynamically changes load on the antenna (basically a transistor shorts out the antenna) while the reader is sending the radio signal. This causes a tiny change in the signal strength at the reader (about 1 part in 1000), which the reader can detect. This process, called backscatter, allows the chip to send data back to the reader without using power.

You might wonder how the system works if two runners cross the finish line at the same time—how do both tags get read without interference? RFID tags are designed for inventory systems, so they are designed to handle environments where there are hundreds of tags in range of the reader. They use an efficient anti-collision protocol. The reader can minimize collisions by querying a subset of tags at a time ("I only want to hear from tags whose ids start with 123.") By rapidly iterating through subsets, a large number of tags can be queried with minimal conflicts. Collisions are handled using a protocol called the Q-algorithm. Essentially, each tag waits a random amount before responding, so hopefully there won't be a collision. If there is a collision, tags wait a longer random amount next time until they can respond without collisions. This is based on the ALOHA protocol, which was also used to handle collisions on Ethernet networks.

Inside the chip

Below is another die photo. I took this one earlier in the cleaning process, so there is more debris on the chip surface. The colors are slightly different in this photo due to different microscope camera settings.

Die photo of the Monza R6 RFID chip.

Because the chip is complex and has multiple layers of metal, I was unable to analyze its circuitry in detail. However, I can make some informed speculation about it. The top part of the chip is the analog circuitry, extracting power from the antenna, reading the transmitted signal, and modulating the return signal. The two antenna contacts are the large gold pads at the top (on the left and right). The rectangles between the antenna contacts are probably transistors and capacitors forming a charge pump to extract power from the radio signal (see patent). The voltage received from the radio signal is about 200 millivolts, while the chip's circuitry requires about 1 volt. The charge pump boosts the incoming voltage to the higher voltage required.

The middle of the chip has the logic circuitry. Given the complex protocol that the chip handles, you might expect a microprocessor to control the chip. But it uses a complex state machine instead, presumably to keep the chip small and reduce power consumption. This is probably build from CMOS standard digital logic cells, connected by the visible wiring (horizontal lines). This logic includes a pseudo-random number generator to support the anti-collision algorithm.

The chip has the label "IMPINJ KELSO". It's unclear why it is labeled "KELSO" and not "R6". I couldn't find any references to a "kelso" chip, so this could be an internal name. The Impinj company is in Seattle, Washington and Kelso is a small city in Washington, so perhaps that's the connection.

The bottom third of the chip likely contains the storage. The black rectangles above and below the IMPINJ KELSO text are probably the 12 bytes of non-volatile memory (NVM) that hold the tag identification. (This is essentially flash memory.) This Impinj patent discusses the NVM system. Writing the memory requires about 12 volts, which is provided by another charge pump. The gold squares scattered across the chip are probably test points used to program the chip and verify proper operation during manufacturing.

These chips are really small

Let me emphasize how tiny the RFID chips are—they are specks, about the size of a grain of salt. According to the datasheet, the chip is 464.1µm x 400µm (about 1/64 inch on a side). This is much smaller than a typical IC, and explains why the RFID chip doesn't break if the tag is flexed. The biggest problem I had with the die photos was making sure I didn't lose the chip. If I brushed against the chip it could stick to my finger and vanish until I carefully searched my hands to find it. The image below shows the Monza 4, a similar but slightly larger RFID chip, on top of a penny, next to a grain of salt. (A couple months ago, I wrote about the Monza 4 chip and how it is times the Bay to Breakers race.)

The RFID chip used to identify runners is very small, about the size of one of the letters on a penny. A grain of salt (next to R) and the RFID chip (on top of U). This chip has four antenna contacts.

The antenna

Closeup of the RFID antenna and chip. The chip is the black speck in the center above DogBone, below the + sign. The two halves of the antenna are seemingly shorted together, but that's an important part of the RF impedance matching.

The antenna seems straightforward at first—two metal strips connected to the chip. But if you look more closely, you'll notice that the strips are joined together above the chip, which would short out the signal. How does this work? The short answer is it's RF magic. The longer answer is the antenna is carefully designed to match impedance between the transmitter and the chip, so as much power as possible is received by the chip. The central part of the antenna is the "loop" and the two long parts of the antenna are the "dipole". The loop doesn't short out the dipole, but instead they work together. The area of the loop acts as an antenna, and the dipole provides the appropriate impedance so the system resonates at the right frequency. The "dogbone" antenna shape isn't artistic but helps fit a longer dipole into the available area. RFID antennas are explained in more detail in this application note. The tag (combining the antenna and chip) is produced by Smartrac (whose name is barely visible under the black text), and sells for about 13 cents.

The RFID system is similar in some ways to the NFC (near-field communication) that smart phones use for tap-and-pay. The following infographic summarizes the differences (full size).

This infographic is courtesy of atlasRFIDstore.

Conclusion

The RFID chips used for race timing are very small but include a lot more complexity than you'd expect. I took some die photos of the Monza R6 chip that show the analog and digital circuitry crammed into a piece of silicon the size of a grain of salt. The chip manages to power itself off the radio signal, allowing it to operate without a battery. This circuitry supports a complex communication protocol between the RFID tag and the reader. The foil antenna is carefully designed to maximize power transfer to the chip. All this is combined into a tag that sells for under 13 cents.

I announce my latest blog posts on Twitter, so follow me at kenshirriff.

The LM108 op amp is an interesting chip to examine under a microscope because it uses special superbeta transistors for high performance. Photos of the die reveal the tiny circuitry of the chip as well as unused components that make the chip more complex than necessary. Surprisingly, these extra components allow the same die to be reused for two totally different chips! In this article I examine the internals of the LM108 in detail, explain how it works and reveal how the die can take on two roles.

I've written about the famous 741 op amp, which came out in 1968. A year later, the improved LM108 op amp was invented by eccentric analog IC design genius Bob Widlar.[1] The main claim to fame of the LM108 is it uses a very small input current, orders of magnitude smaller than the 741.[2] To attain this low input current, the LM108 contains special transistors called "superbeta" transistors, with about 25 times the amplification of a regular transistor.[3] The downside is the superbeta transistors are delicate and require special circuitry to protect them from damage.

The LM308 op amp in an 8-pin metal can.

The photo above shows the LM108 op amp in a metal can. (The LM308 is the commercial-grade version of the LM108.[4] ) I opened up the can and photographed the die (below). The chip's metal layer is clearly visible, with thin metal traces connecting the different parts of the chip. The square bonding pads around the edge of the chip are connected by thin wires to the chip's external pins. Under the metal layer, you can see the silicon that forms the basis of the chip. To form transistors and resistors, a process called doping treats regions of the silicon with elements such as phosphorus or boron. In the die photo, these regions have a slightly different color, which makes the structure of the chip visible under the metal.

Die photo of the LM308 op amp. The LM308 is the commercial version of the LM108.

While the chip seems incomprehensible at first, close examination reveals the different components and their connections. By carefully studying the die photo, I reverse engineered the circuit for the op amp. Surprisingly, this chip has an unusual circuit design, more modern than National Semiconductor's "classic" LM108 design. Although the package has the National Semiconductor logo, the internal circuitry matches the Motorola LM308 datasheet.[5] You might expect that LM108's would all be the same internally, but as with many ICs, the part number doesn't indicate as much as you expect. Different manufacturers have widely differing implementations of the chip, so you can't expect two chips to behave the same just because they have the same name.[6] Even so, it's puzzling that a National Semiconductor chip doesn't match the National Semiconductor schematic.

Why op amps are important

The function of an op amp is to take two input voltages, subtract them, multiply the difference by a huge value (100,000 or more), and output the result as a voltage. If you've studied analog circuits, op amps will be familiar to you, but otherwise this may seem like a bizarre and pointless device. How often do you need to subtract two voltages? And why would you want to amplify by such a huge factor? Would amplifying a 1 volt input result in lightning shooting from the op amp?

It turns out that op amps are extremely useful and versatile, making them a key component in analog circuits. With simple feedback circuits, you can use an op amp as an amplifier, a filter, integrator, differentiator, or a variety of other circuits.[7] When an op amp is in use, the voltages on the two inputs will normally be almost identical, so multiplying by the huge amplification factor yields a reasonable output of a few volts. The point of the high amplification is it improves accuracy, even if the amplification of the overall circuit is small.

Transistors inside the IC

Transistors are the key components in a chip. The LM108 op amp uses NPN and PNP bipolar transistors, while many newer op amps use low-power CMOS transistors instead. If you've studied electronics, you've probably seen a diagram of an NPN transistor like the one below, showing the collector (C), base (B), and emitter (E) of the transistor. A transistor is usually illustrated as a sandwich of P silicon in between two symmetric layers of N silicon; the N-P-N layers make an NPN transistor. But it turns out that transistors on a chip look nothing like this, and the base often isn't even in the middle!

Symbol and oversimplified structure of an NPN transistor.

The photo below shows an NPN transistor on a 741 op amp die. The different brown and purple colors are regions of silicon that has been doped differently, forming N and P regions. The whitish-yellow areas are the metal layer of the chip on top of the silicon—these form the wires connecting to the collector, emitter, and base.

Underneath the photo is a cross-section drawing showing approximately how the transistor is constructed. There's a lot more than just the N-P-N sandwich, but if you look carefully at the vertical cross section below the 'E', you can find the N-P-N that forms the transistor. The emitter (E) wire is connected to N+ silicon. Below that is a P layer connected to the base contact (B). And below that is an N+ layer connected (indirectly) to the collector (C).

Structure of an NPN transistor in the 741 op amp

The innovative feature of the LM108 is the superbeta transistor, seen below. It has a much thinner base region below the emitter. This gives the superbeta transistor a much higher beta (i.e. amplification), but makes the transistor much more delicate: just 4 volts between the collector and emitter can "punch through" the thin base and destroy the transistor.

This image shows one of the superbeta transistors in the LM108 op amp. Note the large, round emitter. The green rectangle below the transistor is a resistor.

How the op amp works

In this section, I'll give a simplified overview of how the op amp works.[8] First I'll explain the differential pair, the important circuit that subtracts and amplifies the two input voltages. The next section explains the different parts of the LM108 op amp. The final section describes the current mirror that provides precise currents to the op amp's circuits.

The differential pair

The key component of an op amp is the differential pair, which is the most common two-transistor subcircuit used in analog ICs.[9] You may have wondered how the op amp subtracts two voltages since it's not obvious how to make a subtraction circuit. This is the job of the differential pair.

Schematic of a simple differential pair circuit. The current sink sends a fixed current I through the differential pair. If the two inputs are equal, the current is split equally between the two branches. Otherwise, the branch with the higher input voltage gets most of the current.

The schematic above shows a simple differential pair. The current sink at the bottom provides a fixed current I, which is split between the two input transistors. If the input voltages are equal, the current will be split equally into the two branches (I1 and I2). If one of the input voltages is a bit higher than the other, the corresponding transistor will conduct more current, so one branch gets more current and the other branch gets less. A small input difference is enough to direct most of the current into the "winning" branch, providing the amplification.

The LM108 op amp circuit

In this section, I'll give a brief explanation of the LM108 circuit, based on a detailed discussion by Bob Widlar, the chip's designer.[10] The schematic below is simplified to show the key features.

The superbeta transistors Q1 and Q2 are the heart of the chip. These form the input stage, and are connected as a differential pair. Resistors R1 and R2 provide the load for the two branches of the differential pair. By using superbeta transistors for the input, the LM108 achieves high performance with very low input currents.

The problem with superbeta transistors is they will break down and be destroyed by a small voltage difference, just 4 volts. The LM108 uses a couple interesting circuits to protect the superbeta transistors. The first protection mechanism is the two diodes across the inputs, ensuring the voltage difference is small. (On the chip, these diodes are implemented with transistors.) The second protection mechanism is transistors Q5 and Q6, which ensure that the collector-emitter voltage across the superbeta transistors is essentially zero, preventing an overload. Transistors Q3 and Q4 "bootstrap" the desired voltage from Q1/Q2's emitters to Q5 and Q6.[11]

Simplified schematic of the LM108 op amp from the chip's application notes.[10]

The second stage of amplification is provided by PNP transistors Q9 and Q10. They form a second differential amplifier that amplifies the output of the first stage. Instead of resistors, Q15 and Q16 form the load for the second stage differential amplifier. Transistors Q7 and Q8 bias the inputs of Q9 and Q10 to the right level.

The output of the op amp is driven by a high-current class AB amplifier with power transistors Q13 and Q14. That is, Q13 will pull the output high and Q14 will pull the output low. To ensure that the right output transistor turns on at the right time, Q11 and Q12 bias the output transistors (by two diode drops).

IC component: The current mirror

The schematic above uses a symbol that you may not be familiar with: the double circles that indicate a current source. A current source may seem like a strange concept, but it is very common in analog integrated circuits. The idea is that instead of controlling currents with resistors (which are inconveniently large and inaccurate on ICs), currents are generated from a current mirror.[12] Once you have one fixed current, you can use a current mirror to generate copies of this current. Simple modifications can scale the current or even invert it.

Detail of the LM108 op amp schematic, showing the current source symbol.

The diagram below shows how a current mirror is implemented.[12] A reference current passes through the transistor on the left. (In this case, the current is set by the resistor.) Since both transistors have the same emitter voltage and base voltage, they source the same current, so the current on the right matches the reference current on the left. Thus, the current mirror provides a mirror image on the right of the fixed current on the left.

Current mirror circuit. The current on the right copies the current on the left.

In the LM108, the initial current is generated not by a resistor, but by a patented four-transistor circuit that depends on one transistor having 10 times the emitter area of the others. The photo below shows the transistor that combines 10 square emitters into one large emitter, as well as an unusual transmitter with two separate emitters.[13]

The LM108's current source contains some interesting transistors. The transistor on the left has 10 emitters wired together, creating a transistor with an effective emitter size of 10 times normal. The transistor on the right has two separate emitters, providing two current outputs.

Interactive chip viewer

The image and schematic[5] below are an interactive exploration of the LM108. Click a component to see its location on the die and in the schematic highlighted. The box below will give an explanation of the component. The schematic below is the full schematic for the LM108; the component numbers don't match the earlier simplified schematic.

Click components in the image below for more information.

How I photographed the op amp die

Usually getting the die out of an IC requires concentrated acid to dissolve the epoxy package. But some ICs, such as op amps, are available in metal cans (for shielding) which can be easily opened with a hacksaw (or even better a jeweler's saw). I used a metallurgical microscope for my die photos, but you can use even a basic middle-school microscope to see many of the chip features. The photo below shows the LM108 op amp after removing the top. The tiny die is visible in the center, with thin wires connecting the die to the pins that surround it. The metal tab on the right indicates pin 8. To create the high-resolution die photo, I composited multiple photographs into one image (details).

The LM308 op amp has been cut open revealing the tiny die inside. Pads on the die are connected to the pins with thin bond wires.

Some strange things in the LM108 die

The LM108 die is more complex than I expected and has some strange circuitry. If you compare the 741 op amp (below) with the LM108, you'll notice that the 741 is much simpler. Part of this is the LM108 has 30 transistor versus 22 in the 741, but this small increase in components doesn't explain the large increase in the intricacy of the LM108. After examining the LM108 closely, I realized that it has many components on the die that aren't used. More investigation revealed that the LM108 die can be reused to create an entirely different op amp, the LM11![14] The manufacturer can use the same die (with a few changes to the metal wiring layer) to produce two different integrated circuits, which presumably saves them money.

Die photo of the 741 op amp. This chip is much simpler than the LM108.

The photo below shows some of the unused components on the LM108 die. On the left are two unused transistors, including one with two emitters. Next is a larger transistor. The small component is a resistor that is shorted out by the metal on top of it, making it nonfunctional. Seven diodes are connected in an undulating chain, but the chain isn't connected to anything. Finally, the chip has two large unused capacitors, one of which is shown below. I was puzzled by the amount of space wasted on the die for these unused components.

The LM108 integrated circuit contains an unusual number of unused components. This image shows some of them: two transistors, a larger transistor, a resistor that is shorted out, a chain of diodes, and a capacitor.

Another puzzle on the LM108 die is it has several resistors that are made up of multiple segments. By shorting out some of these segments with the metal layer, the resistance can be tuned to a desired value.[15] The mystery is why the LM108 die has so many resistances that need to be customized.

The LM108 op amp contains several resistors with resistance than can be modified by changing the metal layer. This image shows one resistor with about 20 segments. A few of the segments are shorted out with metal, reducing the resistance.

Another strange feature of the LM108 die is the protection transistors are unusually large and have extra unused structures (below). The transistor has two emitters: one is a regular emitter, and the second is a large, oval superbeta transistor emitter. The superbeta emitter is not wired to anything, which raises the question of why it exists. The die has other strange, unused transistors.[16]

This unusual transistor from the LM108 has two emitters: one has a regular base and one has a supertransistor base. The second emitter is not connected to anything.

By studying the schematics closely, I think I solved the puzzle of these strange and unused components. I determined that the LM108 I examined is a combination of the "classic" LM108 and the LM11 op amp introduced in 1980. The diagram below shows that my chip uses the input stage from the classic LM108 (blue). But the second stage (yellow) and the current source (red) matches the LM11. All chips use the same output stage (green).

The LM108 I examined (bottom) is a combination of the 'classic' LM108 and the more modern LM11 op amp. It takes the input stage from the classic (blue) and the second stage (yellow) and the current source (orange) from the LM11. All three op amps use the same output stage (green).

My conclusion is the die is designed so it can be used both as an LM11 and an LM108, by just making some changes to the metal layer. Thus, when configured as an LM11, the chip uses the components that are unused in the LM108. The LM11 schematic shows a zener diode for protection: this is the unused chain of diodes shown earlier. The LM11 makes use of the other unused resistors, transistors, and capacitors described above. Finally, the two protection transistors in my LM108 look like a combination of a regular transistor and an unused superbeta transistor. The LM11 schematic shows weird transistors that are half regular and half superbeta, an exact match for these puzzling transistors.

Conclusion

Every IC die has its own interesting puzzles and features, and the LM108 is no exception. The LM108 is an interesting chip since it uses superbeta transistors for high performance, but requires internal protective circuitry to keep the delicate transistors safe from damage. The most unusual thing about this LM108 is that the same chip die is used for both the LM108 and the LM11 op amps, just by tweaking the metal layer. While this makes the die more complex, presumably it saves money for the manufacturer.

Thanks to Bil Herd (famed designer of the Commodore 128) for suggesting the LM108 as an interesting chip to examine. /r/AskElectronics had an informative discussion of the LM108's strange transistors; thanks to crb3 for pointers to relevant documents.

I announce my latest blog posts on Twitter, so follow me at kenshirriff.

Notes and references

[1] By all reports, Robert Widlar was an amazing analog engineer, as well as an alcoholic crazy guy. Widlar invented key analog IC circuits such as the Widlar current source as well as groundbreaking ICs such as the µA702 and µA723. In 1970 he sold his stock options for a million dollars (about 6 million adjusted for inflation) and retired to Mexico at 33. Some entertaining stories about him are here, on Wikipedia, and with pictures of his sheep.

[2] The 741 has up to 1.5µA input bias current, while the LM108's maximum input bias current is orders of magnitude lower at 3nA.

[3] The base of a regular transistor is 0.5 to 1µm thick, while a superbeta transistor has a much thinner base of 0.1 to 0.2µm. According to an Application Note on the LM108, the superbeta transistor has a gain of 5000, compared to 200 for a regular NPN transistor. The downside is the superbeta transistors have a breakdown voltage of 4 volts, compared to 80 for regular transistors.

[4] The first digit of the part number specifies the temperature range. The LM108 is the military-grade chip that can handle the widest temperature range, the LM208 is the industrial-grade chip, and the LM308 is the commercial-grade chip with the narrowest temperature range. The "H" in LM308AH indicates the metal can package.

[5] The LM108 schematic that matches my die photo is from the Motorola LM108 datasheet. It is strange that my chip is labeled National Semiconductor but its circuit does not match the National Semiconductor datasheet. Instead it matches the Motorola datasheet.

Another variant of the LM108 is the Raytheon design (datasheet). This version has a totally different second stage, output stage, and current biasing. It uses many more transistors: 48 transistors including 6 superbeta devices.

[6] The LM108 is used in distortion pedals; people search out this op amp to get its specific sound quality when overdriven. Since different manufacturers have different internal designs for the LM108, this raises the question of whether people may unexpectedly end up with different op amps that produce different sound effects.

[7] To see the variety of circuits that can be built from an op amp, see this op amp circuit collection. The book Op Amp Applications Handbook has a ton of useful information about op amp types (including the LM108), applications, and history; it is also available as a PDF.

[8] IC Op-Amps Through the Ages. Op Amp History (Walt Jung) page H.52 (also here) discusses the LM108 in more detail.

[9] Differential pairs are also called long-tailed pairs. According to Analysis and Design of Analog Integrated Circuits the differential pair is "perhaps the most widely used two-transistor subcircuits in monolithic analog circuits." (p214) For more information about differential pairs, see wikipedia, any analog IC book, or chapter 4 of Designing Analog Chips. The latter is an excellent book written by Hans Camenzind, the inventor of the 555 timer, so definitely check out the PDF version.

[10] Widlar wrote a detailed explanation of the LM108 in IC Op Amp Beats FETs on Input Current (National Semiconductor Application Note 29, Dec 1969).

[11] To protect the input transistors, transistors Q3 and Q4 boost the input transistor emitter voltage by two diode drops to make the voltages match. Transistors Q5 and Q6 are in a cascode configuration. The result is the collector-base voltage of the input transistors is effectively zero, protecting them.

[12] A current mirror is a very useful way of connecting transistors so the current through the second transistor matches the current through the first transistor. For more information about current mirrors, you can check Wikipedia or any analog IC book such as chapter 3 of Designing Analog Chips.

[13] The LM108's four-transistor current source is configured so the four base-emitter junctions cancel out, except that one transistor has 10 times the base area of the remainder. The voltage generated across R20 is kT/q * ln(10), which is approximately 60mV at room temperature, but proportional to absolute temperature. (The principle of using emitters of different areas is similar to a bandgap voltage reference. However, the bandgap reference is configured so the temperature dependency cancels out and the voltage is stable.) The resistance of R20 controls the current into the current mirror. The temperature dependence is used to counteract temperature dependence of other parts of the circuit.

A field-effect transistor (Q30) generates the small current that powers the LM108's current source. The current source circuit turns the unpredictable current from Q30 into a stable output current. For a full explanation, see patent 3930172.

[14] The LM11 op amp is discussed in detail in Reducing DC Errors in Op Amps (National Semiconductor Technical Paper 15) by Widlar.

[15] In some precision chips, the resistance can be tuned on a per-chip basis, for instance by laser-trimming the resistor or using Zener zapping. This is not the case in the LM108; the resistances are controlled by the metal layer, which requires a new mask if it is changed.

[16] Two more strange transistors are shown below, with unconnected oval emitters.

Closeup of two strange transistors on the LM108 die.

The schematic symbol (below) is even more puzzling, showing transistors with two emitters and two bases. While transistors with two emitters are common in integrated circuits, I've never seen two bases in one transistor. After discussion on /r/AskElectronics, my theory is the second emitter is tied off on the LM108 but used on the LM11 for the balance inputs.

Symbol on the schematic for two strange transistors inside the LM108 op amp.

Intel's groundbreaking 8008 microprocessor was first produced 45 years ago.1 This chip, Intel's first 8-bit microprocessor, is the ancestor of the x86 processor family that you may be using right now. I couldn't find good die photos of the 8008, so I opened one up and took some detailed photographs. These new die photos are in this article, along with a discussion of the 8008's internal design.

Die photograph of the 8008 microprocessor

The photo above shows the tiny silicon die inside the 8008 package. (Click the image for a higher resolution photo.) You can barely see the wires and transistors that make up the chip. The squares around the outside are the 18 pads that are connected to the external pins by tiny bond wires. You can see the text "8008" on the right edge of the chip and "© Intel 1971" on the lower edge. The initials HF appear on the top right for Hal Feeney, who did the chip's logic design and physical layout. (Other key designers of the 8008 were Ted Hoff, Stan Mazor, and Federico Faggin.)

Inside the chip

The diagram below highlights some of the major functional blocks of the chip. On the left is the 8-bit Arithmetic/Logic Unit (ALU), which performs the actual data computations.3 The ALU uses two temporary registers to hold its input values. These registers take up significant area on the chip, not because they are complex, but because they need large transistors to drive signals through the ALU circuitry.

Die of the 8008 microprocessor showing major components.

Below the registers is the carry look ahead circuitry. For addition and subtraction, this circuit computes all eight carry values in parallel to improve performance.2 Since the low-order carry depends on just the low-order bits, while the higher-order carries depend on multiple bits, the circuit block has a triangular shape.

The triangular layout of the ALU is unusual. Most processors stack the circuitry for each bit into a regular rectangle (a bit-slice layout). The 8008, however, has eight blocks (one for each bit) arranged haphazardly to fit around the space left by the triangular carry generator. The ALU supports eight simple operations.3

In the center of the chip is the instruction register and the instruction decoding logic that determines the meaning of each 8-bit machine instruction. Decoding is done with a Programmable Logic Array (PLA), an arrangement of gates that matches bit patterns and generates the appropriate control signals for the rest of the chip. On the right are the storage blocks. The 8008's seven registers are in the upper right. In the lower right is the address stack, which consists of eight 14-bit address words. Unlike most processors, the 8008's call stack is stored on the chip instead of in memory. The program counter is just one of these addresses, making subroutine calls and returns very simple. The 8008 uses dynamic memory for this storage

The physical structure of the chip is very close to the block diagram in the 8008 User's Manual (below), with blocks located on the chip in nearly the same positions as in the block diagram.

Block diagram of the 8008 microprocessor, from the User's Manual.

The structure of the chip

What does the die photo show? For our purposes, the chip can be thought of as three layers. The diagram below shows a closeup of the chip, pointing out these layers. The topmost layer is the metal wiring. It is the most visible feature, and looks metallic (not surprisingly). In the detail below, these wires are mostly horizontal. The polysilicon layer is below the metal and appears orange under the microscope.

A closeup of the 8008 die, showing the metal layer, the polysilicon, and the doped silicon.

The foundation of the chip is the silicon wafer, which appears purplish-gray in the photo. Pure silicon is effectively an insulator. Regions of it are "doped" with impurities to create semiconducting silicon. Being on the bottom, the silicon layer is difficult to distinguish, but you can see black lines along the border between doped silicon and undoped silicon. A few vertical silicon "wires" are visible in the photo.4

Transistors are the key component of the chip, and a transistor is formed where a polysilicon wire crosses doped silicon. In the photo, the polysilicon appears as a brighter orange where it forms a transistor.

Why an 18 pin chip?

One inconvenient feature of the 8008 is it only has 18 pins, which makes the chip slower and much more difficult to use. The 8008 uses 14 address bits and 8 data bits so with 18 pins there aren't enough pins for each signal. Instead, the chip has 8 data pins that are reused in three cycles to transmit the low address bits, high address bits, and data bits. A computer using the 8008 requires many support chips to interact with this inconvenient bus architecture.5

There was no good reason to force the chip into 18 pins. Packages with 40 or 48 pins were common with other manufacturers, but 16 pins was "a religion at Intel".6 Only with great reluctance did they move to 18 pins. By the time the 8080 processor came out a few years later, Intel had come to terms with 40-pin chips. The 8080 was much more popular, in part because it had a simpler bus design permitted by the 40-pin package.

Power and data paths in the chip

The data bus provides data flow through the chip. The diagram below shows the 8-bit data bus of the 8008 with rainbow colors for the 8 data lines. The data bus connects to the 8 data pins along the outside of the upper half of the chip. The bus runs between the ALU on the left, the instruction register (upper center), and the registers and stack on the right. The bus is split on the left with half along each side of the ALU.

Die photo of the 8008 microprocessor. The power bus is shown in red and blue. The data bus is shown with 8 rainbow colors.

The red and blue lines show power routing. Power routing is an under-appreciated aspect of microprocessors. Power is routed in the metal layer due to its low resistance. But since there is only one metal layer in early microprocessors, power distribution must be carefully planned so the paths don't cross.7 The diagram above shows Vcc lines in blue and Vdd lines in red. Power is supplied through the Vcc pin on the left and the Vdd pin on the right, then branches out into thin, interlocking wires that supply all parts of the chip.

The register file

To show what the chip looks like in detail, I've zoomed in on the 8008's register file in the photo below. The register file consists of an 8 by 7 grid of dynamic RAM (DRAM) storage cells, each using three transistors to hold one bit.8 (You can see the transistors as the small rectangles where the orange polysilicon takes on a slightly more vivid color.) Each row is one of the 8008's seven 8-bit registers (A, B, C, D, E, H, L). On the left, you can see seven pairs of horizontal wires: the read select and write select lines for each register. At the top, you can see eight vertical wires to read or write the contents of each bit, along with 5 thicker wires to supply Vcc. Using DRAM for registers (rather than the more common static latches) is an interesting choice. Since Intel was primary a memory company at the time, I expect they chose DRAM due to their expertise in the area.

The register file in the 8008. The chip has seven 8-bit registers: A, B, C, D, E, H, L

How PMOS works

The 8008 uses PMOS transistors. To simplify slightly, you can think of a PMOS transistor as a switch between two silicon wires, controlled by a gate input (of polysilicon). The switch closes when its gate input is low and it can pull its output high. If you're familiar with the NMOS transistors used in microprocessors like the 6502, PMOS may be a bit confusing because everything is backwards.

A simple PMOS NAND gate can be constructed as shown below. When both inputs are high, the transistors are off and the resistor pulls the output low. When any input is high, the transistor will conduct, connecting the output to +5. Thus, the circuit implements a NAND gate. For compatibility with 5-volt TTL circuits, the PMOS gate (and thus the 8008) is powered with unusual voltages: -9V and +5V.

A NAND gate implemented with PMOS logic.

For technical reasons, the resistor is actually implemented with a transistor. The diagram below shows how the transistor is wired to act as a pull-down resistor. The detail on the right shows how this circuit appears on the chip. The -9V metal wire is at the top, the transistor is in the middle, and the output is the silicon wire at the bottom.

In PMOS, a pull-down resistor (left) is implemented with a transistor (center). The photo on the right shows an actual pull-down in the 8008 microprocessor.

History of the 8008

The 8008's complicated story starts with the Datapoint 2200, a popular computer introduced in 1970 as a programmable terminal. (Some people consider the Datapoint 2200 to be the first personal computer.) Rather than using a microprocessor, the Datapoint 2200 contained a board-sized CPU build from individual TTL chips. (This was the standard way to build a CPU in the minicomputer era.) Datapoint and Intel decided that it would be possible to replace this board with a single MOS chip, and Intel started the 8008 project to build this chip. A bit later, Texas Instruments also agreed to build a single-chip processor for Datapoint. Both chips were designed to be compatible with the Datapoint 2200's 8-bit instruction set and architecture.

The 8008 processor was first described publicly in "Electronic Design", Oct 25, 1970. Although Intel claimed the chip would be delivered in January 1971, actual delivery was more than a year later in April, 1972.

Around March 1971, Texas Instruments completed their processor chip, calling it the TMC 1795. After delaying the project, Intel finished the 8008 chip later, around the end of 1971. For a variety of reasons, Datapoint rejected both microprocessors and built a faster CPU based on newer TTL chips including the 74181 ALU chip. TI tried unsuccessfully to market the TMC 1795 processor to companies such as Ford, but ended up abandoning the processor, focusing on highly-profitable calculator chips instead. Intel, on the other hand, marketed the 8008 as a general-purpose microprocessor, which eventually led to the x86 architecture you're probably using right now. Although TI was first with the 8-bit processor, it was Intel who made their chip a success, creating the microprocessor industry.

A family tree of the 8008 and some related processors. Black arrows indicate backwards compatibility. Light arrows indicate significant architecture changes.

The diagram above summarizes the "family tree" of the 8008 and some related processors.10 The Datapoint 2200's architecture was used in the TMC 1795, the Intel 8008, and the next version Datapoint 220011. Thus, four entirely different processors were built using the Datapoint 2200's instruction set and architecture. The Intel 8080 processor was a much-improved version of the 8008. It significantly extended the 8008's instruction set and reordered the machine code instructions for efficiency. The 8080 was used in groundbreaking early microcomputers such as the Altair and the Imsai. After working on the 4004 and 8080, designers Federico Faggin and Masatoshi Shima left Intel to build the Zilog Z-80 microprocessor, which improved on the 8080 and became very popular.

The jump to the 16-bit 8086 processor was much less evolutionary. Most 8080 assembly code could be converted to run on the 8086, but not trivially, as the instruction set and architecture were radically changed. Nonetheless, some characteristics of the Datapoint 2200 still exist in today's x86 processors. For instance, the Datapoint 2200 had a serial processor, processing bytes one bit at a time. Since the lowest bit needs to be processed first, the Datapoint 2200 was little-endian. For compatibility, the 8008 was little-endian, and this is still the case in Intel's processors. Another feature of the Datapoint 2200 was the parity flag, since parity calculation was important for a terminal's communication. The parity flag has continued to the x86 architecture.

The 8008 is architecturally unrelated to Intel's 4-bit 4004 processor12. The 8008 is not an 8-bit version of the 4-bit 4004 in any way. The similar names are purely a marketing invention; during its design phase the 8008 had the unexciting name "1201".

If you want more early microprocessor history, I wrote a detailed article for the IEEE Spectrum. I also wrote a post about TI's TMC 1795.

How the 8008 fits into the history of semiconductor technology

The 4004 and 8008 both used silicon-gate enhancement-mode PMOS, a semiconductor technology that was only used briefly. This puts the chips at an interesting point in chip fabrication technology.

The 8008 (and modern processors) uses MOS transistors. These transistors had a long path to acceptance, being slower and less reliable than the bipolar transistors used in most computers of the 1960s. By the late 1960s, MOS integrated circuits were becoming more common; the standard technology was PMOS transistors with metal gates. The gates of the transistor consisted of metal, which was also used to connect components of the chip. Chips essentially had two layers of functionality: the silicon itself, and the metal wiring on top. This technology was used in many Texas Instruments calculator chips, as well as the TMC 1795 chip (the chip that had the same instruction set as the 8008).

A key innovation that made the 8008 practical was the self-aligned gate—a transistor using a gate of polysilicon rather than metal. Although this technology was invented by Fairchild and Bell Labs, it was Intel that pushed the technology ahead. Polysilicon gate transistors had much better performance than metal gate (for complex semiconductor reasons). In addition, adding a polysilicon layer made routing of signals in the chip much easier, making the chips denser. The diagram below shows the benefit of self-aligned gates: the metal-gate TMC 1795 is bigger than the 4004 and 8008 chips combined.

Intel's 4004 and 8008 processors are much denser than Texas Instruments' TMC 1795 chip, largely due to their use of self-aligned gates.

Shortly afterwards, semiconductor technology improved again with the use of NMOS transistors instead of PMOS transistors. Although PMOS transistors were easier to manufacture initially, NMOS transistors are faster, so once NMOS could be fabricated reliably, they were a clear win. NMOS led to more powerful chips such as the Intel 8080 and the Motorola 6800 (both 1974). Another technology improvement of this time was ion-implantation to change the characteristics of transistors. This allowed the creation of "depletion-mode" transistors for use as pull-up resistors. These transistors improved chip performance and reduced power consumption. They also allowed the creation of chips that ran on standard five-volt supplies.13 The combination of NMOS transistors and depletion-mode pull-ups was used for most of the microprocessors of the late 1970s and early 1980s, such as the 6502 (1975), Z-80 (1976), 68000 (1979), and Intel chips from the 8085 (1976) to the 80286 (1982).

In the mid 1980s, CMOS took over, using NMOS and PMOS transistors together to dramatically reduce power consumption, with chips such as the 80386 (1986), 68020 (1984) and ARM1 (1985). Now almost all chips are CMOS.14

As you can see, the 1970s were a time of large changes in semiconductor chip technology. The 4004 and 8008 were created when the technological capability intersected with the right market.

How to take die photos

In this section, I explain how I got the photos of the 8008 die. The first step is to open the chip package to expose the die. Most chips come in epoxy packages, which can be dissolved with dangerous acids.

The 8008 microprocessor in a ceramic package

Since I would rather avoid boiling nitric acid, I took a simpler approach. The 8008 is also available in a ceramic package (above), which I got on eBay. Tapping the chip along the seam with a chisel pops the two ceramic layers apart. The photo below shows the lower half of the ceramic package, with the die exposed. Most of the metal pins have been removed, but their positions in the package are visible. To the right of the die is a small square; this connects ground (Vcc) to the substrate. A couple of the tiny bond wires are still visible, connected to the die.

Inside the package of the 8008 microprocessor, the silicon die is visible.

Once the die is exposed, a microscope can be used to take photographs. A standard microscope shines the light from below, which doesn't work well for die photographs. Instead, I used a metallurgical microscope, which shines the light from above to illuminate the chip.

I took 48 photographs through the microscope and then used the Hugin stitching software to combine them into one high-resolution image (details). Finally, I adjusted the image contrast to make the chip's structures more visible. The original image (which is approximately what you see through the microscope) is below for comparison.

Die photograph of the 8008 microprocessor

Conclusion

I took detailed die photos of the 8008 that reveal the circuitry it used. While the 8008 wasn't the first microprocessor or even the first 8-bit microprocessor, it was truly revolutionary, triggering the microprocessor revolution and leading to the x86 architecture that dominates personal computers today. In future posts, I plan to explain the 8008's circuits in detail to provide a glimpse into the roots of todays computers.

I announce my latest blog posts on Twitter, so follow me at kenshirriff. Or you can use the RSS feed.

Notes and references

According to the oral history of the 8008, photos of the 8008 were obtained in October / November 1971 (page 6). Chip designer Federico Faggin mentions that toward the end of 1971, "everything was working except for a few errors." Faggin then debugged a problem with the dynamic memory losing data, making it ready for production (page 9). ↩
Using the carry look ahead circuit avoids the delay from a standard ripple-carry adder, where the carries propagate through the sum. ↩
The 8008's ALU supports eight operations: add, subtract, add with carry, subtract with carry, AND, OR, XOR, and compare. It also implements left and right shift and rotate operations. The 8008 also has increment and decrement instructions, extending the Datapoint 2200's instruction set. ↩
Because silicon has higher resistance than polysilicon, most chips use the polysilicon and metal layers for wiring, not the silicon layer. The 4004 and 8008 chips are unusual in that they prefer to use the silicon layer for wiring rather than polysilicon. I expect this was due to the recent introduction of polysilicon: before polysilicon, routing needed to be done in the silicon layer and perhaps the chip designers were sticking with the older layout techniques. ↩
The 8008 required 20 support chips according to chip architect Federico Faggin. In contrast, the 4004 and earlier MOS computers such as the Four Phase and CADC were designed with a small number of MOS chips that worked together without extra "glue chips". In this sense, the 8008 was a step backwards architecturally, saying "here's the CPU, you figure out how to make a computer out of it." ↩
For details on Intel's insistence on 16 pins, see Oral History of Federico Faggin, page 55-56. It was only when the 1103 memory chip required 18 pins that Intel reluctantly moved beyond 16 pins. And that was treated by Intel like "the sky had dropped from heaven," resulting in "so many long faces". ↩
If two metal lines need to cross, one of them can be routed under the other by using the polysilicon layer. To be low resistance, this cross-under must be relatively wide, so cross-unders are avoided if possible. ↩
The 8008 registers use the "3T1C" cell: three transistors and one capacitor (details). The circuit doesn't physically contain a separate capacitor, but uses the gate capacitance of the transistor. One unusual feature of the 8008 cell is it uses one wire for both reading and writing the bit, while the typical 3T cell has separate wires for reading and writing. The 4004 had separate wires, but the design changed slightly in the 8008. ↩
Pull-up resistors in later chips such as the 6502 were implemented using depletion-mode NMOS transistors. These yielded more faster, more efficient logic. They were also wired differently, with the gate connected to the output rather than the power rail. ↩
The 8008 architecture and the evolution of Intel's microprocessors are discussed in detail in Intel Microprocessors: 8008 to 8086. ↩
The second version of the Datapoint 2200 had a totally new implementation of the processor, still built from TTL chips. While the first version had a serial ALU (processing one bit at a time), the second version operated in parallel using 74181 ALU chips. As a result, the second version was much faster. ↩
The extensive 4004 Anniversary Project has reverse-engineered the 4004 processor. The 4004 schematic is here. ↩
The Motorola 6800 microprocessor originally used enhancement-mode transistors. To operate off a single +5V supply, it had a voltage-doubler circuit on the chip. ↩
Interestingly, in 2007 Intel started using metal gates again in order to scale transistors further (details). In a way, semiconductor technology has gone full circle, back to metal gates, although now unusual metals such as hafnium are used. ↩

What's inside a TTL chip? To find out, I opened up a 74181 ALU chip, took high-resolution die photos, and reverse-engineered the chip.1 Inside I found several types of gates, implemented with interesting circuitry and unusual transistors. The 74181 was a popular chip in the 1970s used to perform calculations in the arithmetic-logic unit (ALU) of minicomputers. It is a moderately complex chip containing about 67 gates and 170 transistors3, implemented using fast and popular TTL (transistor-transistor logic) circuitry.

The 74181 die photo is below. (Click the image for a high-resolution version.) The golden stripes are the metal layer that interconnects the circuitry of the chip. (It's not gold, just aluminum that looks golden from the lighting.) The white squares around the edge of the die are the pads that are connected by tiny bond wires to the external pins. Under the metal layer is the silicon that makes up the chip. Faint lines show the doped silicon regions that make up the transistors and resistors. While the chip may appear impossibly complex at first, with careful examination it is possible to understand how it works.

Die photo of the 74181 ALU chip.

The 74181 chip is important because of its key role in minicomputer history. Before the microprocessor era, minicomputers built their processors from boards of individual chips. The arithmetic operations (addition, subtraction) and logical operations (AND, OR, XOR) were performed by the arithmetic/logic unit (ALU) in the processor. Early minicomputers built ALUs out of a large number of simple gates. But in March 1970, Texas Instruments introduced the 74181 Arithmetic / Logic Unit (ALU) chip, which put a full 4-bit ALU on one fast TTL chip.4 This chip provided 32 arithmetic5 and logic functions2, as well as fast carry lookahead.7 Using the 74181 chip simplified the design of a minicomputer processor and made it more compact, so it was used in many minicomputers. Computers using the 74181 ranged from the popular PDP-11 and NOVA minicomputers to the powerful VAX-11/780 to the Datapoint 2200 desktop computer. The 74181 is still used today in retro hacker projects.6

A brief guide to NPN transistors

The 74181 is built from bipolar NPN transistors, a different technology from the MOS transistors in modern processors. The diagram below shows how an NPN transistor appears in an integrated circuit, along with a cross section. The transistor has three connections: the collector, base and emitter, with metal lines for each. The collector is connected to N-type silicon, the base to P silicon, and the emitter to N silicon (giving it the NPN structure). On the chip, you can recognize the emitter from its nested squares, the base because its silicon region surrounds the emitter, and the collector because it is the largest contact.

Structure of an NPN transistor appears in an IC.

The key idea of the NPN transistor is it acts as a switch between the collector and emitter, controlled by the base. Normally there is no current flow between the collector and the emitter, so it's like a switch in the "off" position. But if you pass a small current from base to emitter, the transistor allows a large current from collector to emitter, like a switch in the "on" position. (This is vastly oversimplified—bipolar transistors are much more "analog"—but should be enough to understand how the 74181 works.) At the right is the symbol for an NPN transistor with the collector, base and emitter labeled.

Inverter

The fundamental component of TTL logic is the inverter, and other gates are modifications of the inverter circuit. Thus, it's important to understand the basic construction of the inverter, even though it is a bit complicated. I'll explain how it works in an oversimplified way.9

The diagram below shows an inverter in the 74181 chip. The 5V and ground lines run vertically along the left, powering the inverter. The transistors are highlighted with boxes. The resistors are visible as long strips of doped silicon snaking around.11 An input pin (A0) is wired to the pad. On the right is the schematic for a TTL inverter10, with components highlighted to match the die photo.

An inverter in the 74181 ALU chip, along with a schematic showing the components of the inverter.

The input is connected to transistor Q1 (red). This transistor is used in an unusual way, acting as a "current-steering" transistor. If the input is low, R1's current is steered through Q1's emitter to the input, leaving Q2 off. If the input is high, R1's current flows "backwards" out Q1's collector to Q2's base, turning on Q2. Transistor Q2 (orange) can be considered a "phase splitter transistor", which makes sure that exactly one of the output transistors (Q3 and Q4) is activated. (That is, they turn on in opposite phases.) If Q2 is off, R2 provides current to turn on Q3 (yellow), which pulls the output high. Meanwhile, R3 turns off Q4. On the other hand, if Q2 turns on, it provides enough current to turn on Q4 (green), which pulls the output low. I'll explain the diodes in a footnote.12

The 74181 schematic

The schematic below13 shows the circuitry of the 74181. If you've taken a digital logic course, you've probably seen how to build a full adder circuit. But if you look at the schematic of the 74181, it's implemented in a very different way, to provide higher speed and more flexibility.8 The main reason for its complexity is it computes everything in parallel, rather than waiting for the carry to ripple from bit to bit, and this requires a lot more logic.

The different types of gates are highlighted. There are a few inverters (red) to invert input signals. Most of the logic consists of AND-OR-INVERT gates. The AND stages are shown in blue, and the OR-INVERT (NOR) stages in green. (Some of the OR-INVERT stages are not explicit on the schematic and are empty boxes.) The chip uses a few XOR gates (purple) to compute sums. Finally, there are a couple unique gates shown in yellow.

Schematic of the 74181 ALU.

The schematic can be matched up with the labeled die image below. Conveniently, the layout of the die largely matches the schematic. The AND-OR-INVERT gates make up the majority of the chip. Also notice the large chip real estate used for resistors. The chip pins are labeled with blue text. (The metal layer was removed for this photo, to make the underlying circuitry more visible.)

The 74181 ALU die, with main gate types outlined.

AND-OR-INVERT

Most of the 74181's logic is implemented with AND-OR-INVERT gates, which consist of AND gates connected to a NOR gate as shown below. After seeing the inverter, you may expect that an AND-OR-INVERT gate is very complex. But as the schematic below14 shows, the AND-OR-INVERT is not much more complex than an inverter, requiring just a few more transistors. An AND gate is implemented by adding more emitters to the current-steering input transistor (red). (This may seem very strange, but transistors with multiple emitters are common in TTL circuits.) If all inputs are high, the base current will be steered to the collector. Otherwise, the base current will flow out the emitter. Thus, the AND of the inputs is generated. The NOR gate is implemented by putting phase splitter transistors in parallel (orange). If any of the bases are high, the corresponding transistor (Q2A or Q2B) will conduct, pulling the output low. While the circuit below has two AND gates, it can easily be extended to as many gates and inputs as desired.

The AND-OR-INVERT circuit from the 7451 chip. The multiple-emitter transistors that implement AND are highlighted in red. The transistors that implement OR are highlighted in orange.

The diagram below shows how these multiple-emitter transistors are implemented on the chip. Three of these transistors are shown, each with four or five emitters (the dark squares), creating 4-input or 5-input AND gates. Each transistor's base is at the top and each collector is at the bottom. The signal lines run horizontally, with emitters connected as needed. With this structure, multiple AND gates can arranged efficiently on the chip (similar to a Programmable Logic Array or PLA). Note that the base resistors take up a significant amount of space.

Three AND gates in the 74181 ALU chip. Each one is a single transistor with multiple emitters.

The diagram below shows how the OR-INVERT part of the circuit appears on the chip. Note that Q2A and Q2B (orange) share a collector, so the two transistors don't take up much space on the die. Their inputs come from AND circuits such as the ones above. 3-input and 4-input OR gates are implemented similarly, by adding more transistors.

The OR-INVERT stage of the AND-OR-INVERT gate in the 74181, compared with the 7451 AND-OR-INVERT gate.

Exclusive-OR

The chip uses a clever, compact circuit to compute XOR with two transistors wired in an unusual way: the emitters and bases are tied together and there is no connection to ground. The way it works is if the first input is high and the second is low, the first transistor turns on due to the base-emitter current. This pulls the output low through the transistor, with the second input acting as ground. Likewise, if the first input is low and the second is high, the second transistor turns on and pulls the output low. If both inputs are the same, there is no base-emitter current, both transistors remain off, and the output is pulled high by the resistor. The output from the transistor pair goes to the standard inverter stage, so the resulting signal is the XOR of the two inputs. 15 As with OR-INVERT, the two transistors share a collector, making the layout more compact.

The circuit used in the 74181 to compute XOR. Layout inspired by userbinator.

A few things to note about the photo. The two transistors share a collector, which is equivalent to wiring their collectors together. The pull-up resistor doesn't appear in the photo; it is off to the right. The inputs to the XOR are from AND-OR-INVERT gates; their output transistors are at the top of the photo.

NOT-AND

The chip uses four AND gates that have one inverted input.17 On the die, it appears at first that the gates are implemented with the standard AND transistors, but an interesting trick is used to invert one of the inputs. Transistor Q1 is wired in the normal current-steering way, with R1 providing a base current. But transistor Q2 has its resistor connected to the collector, not the base.16 Normally R2 will pull the output high. But if input X is high and input Y is low, R1's current will go through Q1's collector and Q2's emitter, turning on Q2 and pulling the output low. Thus, the result after the inverter stage will be X AND NOT Y.

The 74181 uses an interesting circuit to generate NOT-AND. It uses the multi-emitter transistors but in a subtly different way from the AND gates.

Getting the die photos

To create die photos, the integrated circuit package must be opened to expose the silicon die inside. Most chips have an epoxy package, which can be dissolved in boiling sulfuric acid. Since I don't like boiling acid, I obtained the 74181 chip in a ceramic package, which is much easier to open.

The 74181 ALU chip in a ceramic package.

I tapped the chip along the seam with a chisel, splitting the two layers apart. Below, you can see how the metal pins are mounted between the layers, and are connected to the silicon die with tiny bond wires.

By tapping the 74181 chip with a chisel, the ceramic package can be popped open.

To photograph the die, I used a metallurgical microscope, a special type of microscope that shines light down through the lens to illuminate the chip from above. I took 22 photographs and then used the Hugin stitching software to combine them into a high-resolution image (details). Then, I removed the metal layer from the chip with hydrochloric acid and took more images, resulting in the image below. Removing the metal makes it easier to see the structure of the silicon layer and determine how the chip works. (Click for high-resolution version.)

Removing the metal layer of the 74181 chip with HCl reveals the silicon layer underneath.

Conclusion

The 74181 ALU chip is a complex, high-performance TTL chip that was a key component in the processor of many minicomputers. I took detailed die photos of the 74181 ALU that reveal how the chip works internally. It uses several different logic gates, primarily AND-OR-INVERT gates that have an efficient layout on the chip. These gates are implemented by extending an inverter circuit in different ways, but are more complex than their MOS equivalents. I plan to explain how the 74181 implements its 32 functions and fast carry in a future article, so keep watching.

I announce my latest blog posts on Twitter, so follow me at kenshirriff. I also have an RSS feed.

Notes and references

To understand what's inside a TTL chip, it might be more sensible to start with a simple chip such as a NAND gate. But why take the easy way when there's a complex chip to explore? ↩
Many of the 74181's 32 functions are strange, but there is actually a system behind it. Note that there are exactly 16 possible functions on two (one-bit) binary inputs A and B. (There are 4 lines in the truth table, and two choices for each output, so 2^4 possible functions.) The 74181's 16 logic functions are simply these 16 functions (extended to 4 bits). The 74181's 16 arithmetic functions are A PLUS (one of the 16 possible functions of A and B) PLUS carry-in. ↩
Various sources say the 74181 has 61 or 75 gates. The schematic shows 67 gates. If you omit the five 1-input AND gates, you get 62 gates, i. On the die, I counted 169 transistors, but it's quite possible I missed some. ↩
The history of the 74181 chip is described in detail on this site. The 74181 is apparently the first ALU chip created. In 1968, Fairchild introduced the 3800, an 8-bit accumulator chip, but it didn't have logical functions so technically it's just an AU (Arithmetic Unit) not an ALU like the 74181. Before the 74181 was the 7483 4-bit adder chip (1968); internally, the 7483 is similar to the lower half of the 74181. The 7483 was used in minicomputers such as the PDP 8/E. ↩
ALU chips of this era didn't perform multiplication or division, let alone floating point operations. Multiplication and division operations were common in computers of that era, but were typically performed with multiple cycles of addition or subtraction. The one operation that seems missing from the 74181 is "shift right"; it can do a shift left with "A PLUS A". ↩
Retro projects using the 74181 include the APOLLO181 CPU, Fourbit CPU, 4 Bit TTL CPU, Magic-1 (using the 74F381), TREX, Mark 1 FORTH and Big Mess o' Wires. ↩
When multiple 74181 chips are connected together for larger words, you can simply feed the carry-out of one chip into the carry-in of the next. For higher performance, the 74182 look-ahead carry generator could be used to compute the carries across multiple chips in parallel. Some minicomputers (such as the Xerox Alto) didn't use the 74182, while others (such as the Interdata 7/16) did. ↩
I'll give a brief overview of the chip's implementation here. The chip is build around the idea of carry lookahead. In particular, the upper and-or-invert gates create the carry P (propagate) and G (generate) signals for each bit of A PLUS f(A,B). The lower and-or-invert gates use these signals to compute the carry for each bit of the sum. Finally, the xor gates add the P, G and carry to compute each final sum. The point of this implementation is to compute the four bits in parallel and avoid a slow ripple carry. In a later post I'll explain this circuitry in full detail. ↩
I've simplified the discussion of the TTL logic circuit, since most people probably don't care about saturation, β, biasing, and so forth. If you want the full analysis, see Logic gates: the NOT gate or Transistor-Transistor Logic. This presentation shows schematics for the different gates and TTL logic families. ↩
The inverter schematic is from the datasheet of the common 7404 inverter chip. Interestingly, the basic circuit used in an inverter chip is almost identical to the circuit used inside the 74181. This turns out to be true for most of the 74181 circuits—they are similar to individual TTL parts. The 74181's transistors are a bit smaller because not as much current is required inside the chip, but there is much less scaling than you might expect. ↩
In the 74181's inverter, R4 is not used. R1 takes its place. This is probably because an inverter in the chip doesn't need to provide as much current as a 7404 inverter chip. ↩
The tricky part of the inverter circuit is that if Q2 turns on, there's enough voltage to turn on Q4 but not Q3, thanks to diode D2. The purpose of diode D2 isn't to conduct current in one direction, like you'd expect from a diode. Instead, its purpose is to raise Q3's emitter voltage by one diode drop (about 0.7V). As a consequence, Q3 requires 0.7V more at the base to turn on. Thus, when Q2 is active, there is enough voltage to turn Q4 on, but not Q3. And diode D1 simply protects the chip by shunting any negative input voltage to ground. ↩
The schematic is based on a diagram by Poil on WikiMedia, CC By-SA 3.0, with labeling changes and gates highlighted. ↩
The AND-OR-INVERT schematic is slightly modified from the 7451 AND-OR-INVERT chip to match the 74181's circuit. The 74181's AND-OR-INVERT circuits in the lower half of the chip omit the pull-up output transistor found in the 7451 since the 74181 doesn't require as much output current internally. (The output diode remains to drop the phase splitter transistor's collector voltage by one diode drop, or else the low output voltage will be too high.) ↩
The circuit used in the 74181 for exclusive-OR is similar to the 7486 TTL XOR chip. ↩
On the die, Q2 appears to have two collectors. But these are just two contacts to the same collector, to simplify routing of the wiring. This is unlike the multiple-emitter transistors, which genuinely have multiple emitters. ↩
Some datasheets (example) show an XOR gate instead of NOT-AND. You might wonder how this could possibly work since XOR and NOT-AND are different. The answer is that one of the four input combinations never happens in the 74181, and the gates are equivalent across the other three inputs. The physical implementation is NOT-AND rather than XOR. ↩

A computer's arithmetic-logic unit (ALU) is the heart of the processor, performing arithmetic and logic operations on data. If you've studied digital logic, you've probably learned how to combine simple binary adder circuits to build an ALU. However, the 8008's ALU uses clever logic circuits that can perform multiple operations efficiently. And unlike most 1970's microprocessors, the 8008 uses a complex carry-lookahead circuit to increase its performance.

The 8008 was Intel's first 8-bit microprocessor, introduced 45 years ago.1 While primitive by today's standards, the 8008 is historically important because it essentially started the microprocessor revolution and is the ancestor of the x86 processor family that you are probably using right now.2 I recently took some die photos of the 8008, which I described earlier. In this article, I reverse-engineer the 8008's ALU circuits from these die photos and explain how the ALU functions.

Inside the 8008 chip

The image below shows the 8008's tiny silicon die, highly magnified. Around the outside of the die, you can see the 18 wires connecting the die to the chip's external pins. The rest of the chip contains the chip's circuitry, built from about 3500 tiny transistors (yellow) connected by a metal wiring layer (white).

Die photo of the 8008 microprocessor, showing important functional blocks.

Many parts of the chip work together to perform an arithmetic operation. First, two values are copied from the registers (on the right side of the chip) to the ALU's temporary registers (left side of the chip) via the 8-bit data bus. The ALU computes the result, which is stored back into the accumulator register via the data bus. (Note that the data bus splits and goes around both sides of the ALU to simplify routing.) The carry lookahead circuit generates the carry bits for the sum in parallel for higher performance.3 This is all controlled by the instruction decode logic in the center of the chip that examines each machine instruction and generates signals that control the ALU (and other parts of the chip).

The Arithmetic-Logic Unit

The 8008's ALU implements four functions: Sum, AND, XOR and OR. The Sum operation adds two 8-bit numbers. The remaining three operations are standard Boolean logic operations. The AND operation sets an output bit if the bit is set in the first AND the second number. OR checks if a bit is set in the first OR the second number (or both). XOR (exclusive-or) checks if a bit is set in the first OR the second number (but not both).

The concept of carries during addition is a key part of the ALU. Binary addition in a processor is similar to grade-school long addition, except with binary numbers instead of decimal. Starting at the right, each column of two numbers is added and there can be a carry to the next column. Thus, in each column, the ALU adds two bits as well as a carry bit.

In most early microprocessors, addition of each column needs to wait until the column to the right has been added and the carry is available. The carry "ripples" through the bits, right to left, slowing the addition. The 8008, however, uses a fast carry-lookahead circuit3 to generate the carries for all 8 columns in parallel before the addition happens. Then all the columns can all be added in parallel without waiting for the carry to "ripple" through the sum. This carry-lookahead circuit is an unusual feature to see in an early microprocessor due to its complexity.

Since the 8008 is an 8-bit processor, the ALU operates on two eight-bit arguments. Most 8-bit processors (including the 8008) use a "bit-slice" construction for the ALU, with a one-bit ALU slice repeated eight times. Each one-bit ALU slice takes two input bits and the carry-in bit, and produces the output bit. In most 8-bit processors, the bit-slice ALU is arranged by stacking 8 rectangular ALU slices to form a compact, regular block. However, the 8008 has its eight ALU slices arranged in an irregular fashion—some blocks are even sideways—as shown in the diagram below. The motivation for this is that the carry lookahead circuit takes up a triangular space on the chip. To fit the remaining space better, the 8008's ALU is arranged into its unusual triangular layout.

Arrangement of the eight ALU slices on the 8008 microprocessor die. Unlike most processors, the 8008's ALU slices are arranged in a haphazard triangular arrangement. This fits better with the triangular carry-lookahead circuit above the ALU.

Zooming in on the die photo, we can look at one of the ALU slices and see how the circuitry is constructed. The chip is built from three layers (to simplify slightly). The topmost layer is the metal wiring. It is the most visible feature, and looks metallic (not surprisingly). In the detail below, you can see the horizontal and vertical metal traces. The polysilicon layer is underneath the metal layer and appears yellow/orange under the microscope. Polysilicon can act as wiring, but more importantly it forms the gates of the transistors, switching them on and off. The bottom layer is the grayish silicon die itself, but it is hard to see under the other layers.

Die photo of the 8008 processor, zoomed in on the circuit for one bit of the ALU.

In the diagram above, the carry c and the complemented a and b inputs enter through the metal wires at the top. The ALU output is at the bottom. The control signals are horizontal metal lines. The circuit is powered by the Vcc (+5 volts) and Vdd (-9 volts) metal lines. The brighter yellow polysilicon regions are transistors. Each gate in the circuit requires a "load resistor" connected to Vdd to pull its output low; for improved performance, these are implemented with transistors rather than resistors.

Removing the metal layer with acid makes the silicon and polysilicon layers more visible, as shown below.6 The chip is formed on a silicon wafer with regions of it "doped" with impurities to create regions of semiconducting silicon. You can see dark lines along the border between doped silicon and undoped silicon. A transistor is formed where a yellowish polysilicon wire crosses the doped silicon. The transistor forms a switch between the two silicon sides, controlled by the polysilicon gate. Each ALU slice contains 20 transistors; the diagram below points out two of them.5

With the metal layer removed from the 8008 processor die, the underlying silicon is visible. The photo shows bit 1 of the 8008's ALU.

Simulating one slice of the ALU

By examining the die photos carefully, you can map out the ALU slice's 20 transistors and their connections. From this, you can reverse-engineer the gates that make up the circuit. I explained in my previous article how PMOS gates are structured, so I won't go into the details here. The result is the schematic below, showing one bit of the ALU. Each ALU slice takes two inputs (a and b) and the input carry c, and outputs one result bit. There are three mode lines (m1, m2 and m3) that select one of the four ALU operations.7

The schematic below is interactive. First, select an operation and the table will update with results for the eight different inputs. Next, click a row in the table, and the schematic will update, showing how the ALU computes that row. (Note that the a and b inputs to the ALU are inverted, indicated by an overbar.)

Operation:

While this ALU slice looks like it is made of many gates, physically it is only three gates: two large, multilevel AND-OR-NAND gates and one NAND gate. The AND-OR-NAND logic is implemented on the chip as a single complex gate, rather than by combining simpler gates, since a single large gate provides better performance with less circuitry than multiple small gates. One feature of MOS logic is it's just as easy to form an AND-OR-NAND gate (for instance) as a plain NAND gate.

Understanding the ALU logic

The 8008's ALU circuit above looks like a mysterious collection of gates, but eventually I figured out the structure behind it. The starting point is a full adder that handles the Sum operation. (A full adder adds three input bits (a, b and c) and outputs the (low-order) sum bit and a carry bit.) The full adder is then heavily modified to support the logic operations, yielding the ALU from the previous section. The logic operations are implemented by using the mode lines to block parts of the circuit, yielding XOR, AND or OR, rather than the more complex Sum.

The diagram below strips down the 8008's ALU circuit to reveal the full adder "hidden" inside. The gate in red generates the carry-out from the three inverted inputs, using relatively straightforward logic. (Since the 8008 uses carry-lookahead, this carry-out signal isn't passed to the next ALU slice, but just used to generate the ALU output.) If you examine the possible sum cases, you will see that the sum bit is almost always just the carry-out inverted, except for the 0+0+0 and 1+1+1 cases. Thus, the sum bit can be generated by inverting the carry-out and handling the two exceptional cases.8 The two gates indicated below handle the exceptions by forcing the sum output to the correct value.

Simplified 8008 ALU slice, showing the full adder circuit.

Comparing the full adder with the full ALU circuit earlier shows how the mode lines support the logic operations. Once you have a full adder, generating XOR is simply a matter of setting the carry-in to 0, which is done by the m3 control line. For the OR and AND operations, mode lines m3 and m2 respectively disable all of the circuit except the gates labeled in green.9 Thus, if you start with a full-adder and extend it to support XOR, AND and OR, the 8008's ALU circuit is a logical result.

Intel's earlier 4004 microprocessor had a simple ALU that only supported addition and subtraction, not any logic operations.10 Interestingly, the 4004's ALU circuit is almost identical to the full adder circuit shown above. So it's very likely that Intel designed the 8008 ALU by extending the 4004 ALU as described above. This would explain why the 8008's ALU generates carries internally, even though the carry lookahead circuit made this redundant.11

The 8008's ALU logic is very similar to the Z80's ALU,12 although the Z80's ALU is (surprisingly) 4 bits (details). The 8085 uses a different complex gate arrangement. The 6502 on the other hand, uses an entirely different approach: straightforward circuits for addition, AND, OR, XOR and shift-right, using pass-transistor multiplexers to select the operation.

Instruction decoding: how the ALU knows what operation to do

The 8008 executes 8-bit instructions, which move data, perform I/O, branch, call subroutines, and so forth. The instruction decoding logic examines the instruction and determines what operation to perform, generating about 30 control signals.13 Over a quarter of the instructions perform ALU operations, and the instruction set is carefully designed so three bits of the instruction specify which of the eight operations to perform.14 By examining these bits, the instruction decoder generates the ALU's mode control lines m1, m2 and m3.

Looking at AND instructions illustrates how this works. All AND instructions have the bit pattern xx100xxx (where x is either 0 or 1). For instance, the instruction to AND with memory is 10100111 and the instruction to AND with a constant is 00100100. When the instruction decode circuit matches this pattern, it pulls the m1 control line low, which causes the ALU to perform an AND operation.7 Other bit patterns generate the other ALU control signals.15

Part of the 8008's instruction decode PLA. The three indicated transistors match opcode pattern XX100XXX, indicating an AND instruction.

The diagram above shows part of the instruction decode circuit. The instruction bits (and their complements) are on yellow polysilicon wires running vertically through the circuit. Each row matches a bit pattern, with a transistor connected to each instruction bit to be matched. (The doped silicon regions forming transistors are the black outlines. Circles are connections between a transistor and the row's metal line.) For example, the three transistors marked with arrows match bit 3 low, bit 4 low, and bit 5 high, detecting the AND instruction pattern. Thus, the processor uses the grid of transistors in the instruction decoder to determine the meaning of each instruction.

Loose ends: Subtraction and rotating

The ALU implements a Sum operation, so you might wonder how subtraction is implemented. By using two's complement arithmetic, the CPU can perform subtraction by simply flipping all the bits on a value and then adding it. The ALU uses two temporary registers to hold the two operands since the ALU can't read the operands from the register file and write the result back simultaneously. One of the temporary registers has the feature that its value can be fed to the ALU directly or inverted. The subtraction instructions generate a signal causing the temporary register to provide the inverted value to the ALU, causing the ALU to perform subtraction.

One important operation in most processors is rotating or shifting the bits in a value, to the left or to the right. In most of the microprocessors I've examined, shifting is performed by the ALU.16 The 8008, on the other hand, implements the rotate logic in the register access circuit, on the opposite side of the chip from the ALU. When reading a register, the bits can be shifted one position left or right by a simple circuit before going onto the data bus.

History of the 8008

The Intel 8008 is important historically since it is the ancestor of the dominant Intel x86 architecture that you're probably using right now.2 I wrote a detailed article for the IEEE Spectrum on early microprocessor history, so I'll just give the outline of the 8008's complicated history here.

The 8008 copies the instruction set and architecture of the Datapoint 2200, a popular minicomputer introduced in 1970 as a programmable terminal.17 As was typical for minicomputers, the Datapoint 2200 contained a CPU build from individual TTL chips, filling up a circuit board. Datapoint contracted with both Intel and Texas Instruments to build a single-chip CPU that would replace this processor board, but keeping the same architecture and instruction set.

The Datapoint 2200 computer. The 8008 microprocessor was built to implement the Datapoint 2200's architecture and instruction set. Photo courtesy of Austin Roche.

Texas Instruments was first to build a 2200-compatible microprocessor, creating the TMC 1795 chip. Intel got their version, the 8008, working a bit later, around the end of 1971. Datapoint rejected both processors, instead updating the Datapoint 2200 to use the 74181 TTL ALU chip. Texas Instruments couldn't find a new customer for the TMC 1795 and abandoned it. Intel, on the other hand, came up with the idea of selling the 8008 as a well-supported general-purpose processor. The 8008 led to the 8080, the 8085, 8086, and Intel's x86 line, which still retains some features of the 8008.

Conclusion

Although the 8008 was a very early microprocessor, its ALU was more advanced than you might expect. In particular, it used a complex carry-lookahead circuit for higher performance. Unfortunately, even with the carry-lookahead circuit, the 8008 was slower than the TTL-based Datapoint 2200 processor it was supposed to replace; addition took 20µs on the 8008, compared to 16µs on the original Datapoint 2200 and just 3.2µs on the upgraded Datapoint 2200. This illustrates the speed advantage that TTL had over MOS in the early 1970s. To us, a microprocessor may seem obviously better than a board of chips, but this wasn't always the case.

If you're interested in the 8008, my previous article has a detailed discussion of the architecture, more die photos and information on how to take them, and information on semiconductor history, so take a look.

I announce my latest blog posts on Twitter, so follow me at kenshirriff. I also have an RSS feed.

Notes and references

The 8008 chip was publicly announced in an article in Electronics on March 13, 1972, entitled "8-bit parallel processor offered on a single chip", offering the chips for $200 each. ↩
If you're not using an x86 processor right now, you're probably using an ARM processor. Don't feel neglected, though, since I've reverse-engineered the ARM-1 too. (Although there are many more ARM chips out there than x86, analytics show 71% of my readers are on x86.) ↩
Using a carry look ahead circuit avoids the delay from a standard ripple-carry adder, where the carries propagate through the sum. The 8008's carry-lookahead is based on the Manchester carry chain, but with a separate carry chain for each carry, yielding the triangular structure you see on the die. For performance, the carry chain is implemented with dynamic logic, depending on wire capacitance, rather than with standard Boolean gates. The 74181 ALU chip in comparison, uses a different carry lookahead scheme implemented with standard logic. I plan to write more about the 8008's carry lookahead later. ↩
The 8008 implements eight different arithmetic/logic functions: Add, Add with carry, Subtract, Subtract with borrow, AND, XOR, OR, and Compare.14 These are implemented in terms of the ALU's four basic operations. Subtraction is performed by inverting the second argument. The operations without carry/borrow clear the carry-in bit. Compare is simply a subtraction that doesn't store the result; it just sets the flags with the status. Thus, the four fundamental operations of the ALU are used to implement eight different arithmetic/logic operations. ↩
Note that the 8008 uses PMOS transistors, rather than the faster NMOS transistors in later microprocessors such as the 8080, 6502 and Z80. If you're familiar with NMOS circuits, PMOS can be confusing since everything is backwards. PMOS transistors turn on if the gate is low, and typically pull the output high. Vdd in PMOS is negative, and "ground" is positive. The "pull-up resistor" in a PMOS gate pulls the output down. A PMOS NAND gate has transistors in parallel (compared to serial for an NMOS NAND gate). A PMOS NOR gate has transistors in serial (compared to parallel for an NMOS NOR gate). ↩
The metal layer of the chip is protected by silicon dioxide passivation layer. The professional way to remove this layer is with dangerous hydrofluoric acid. Instead, I used Armour Etch glass etching cream, which is slightly safer and can be obtained at craft stores. I applied the etching cream to the die and wiped it for four minutes with a Q-tip. (Since the cream is designed for frosting glass, it only etches in spots. It must be moved around to obtain a uniform etch.) After this, I soaked the die in hydrochloric acid (pool acid from the hardware store) overnight to dissolve the metal. This was probably too long, since the edges of the polysilicon were eaten away in places. ↩
The following values are used for the three mode lines to select the ALU function:
Operation m1 m2 m3
Sum 1 1 1
And 0 1 0
Or 1 0 0
Xor 1 1 0
↩
A more straightforward way of generating the sum bit is by xoring the three inputs: a⊕b⊕c. Unfortunately, an XOR gate is relatively difficult to implement with Boolean logic, so designers will often try to avoid XOR. ↩
You might wonder why the OR operation is implemented with an AND gate, and vice versa. Since the inputs and the output of the OR gate are inverted, this is equivalent to an AND gate (by De Morgan's laws), and similarly for the AND gate. ↩
Strictly speaking, the 4004 microprocessor has an AU (arithmetic unit), not an ALU (arithmetic/logic unit), since it doesn't do logical operations. Since the 4004 was designed for a calculator, logical operations weren't required. ↩
The 8008's full adder generates the carry-out first, and generates the sum from that. In contrast, the typical full adder circuit combines two half adders to generate the sum and carry-out separately. If the typical full adder circuit had been used in the 8008, the carry-out logic could easily be omitted. ↩
To see the similarity between the Z80's ALU circuit and the 8008's, you need to swap AND and OR gates. (Apply De Morgan's laws since the 8008's ALU inputs are inverted.) In the Z80, the carry-out comes from the ALU rather than a carry-lookahead circuit, so the control lines are somewhat different. But the fundamental ALU circuit is otherwise the same between the 8008 and Z80, which is not surprising since Federico Faggin worked on both chips. ↩
Instruction decoding is based on a Programmable Logic Array (PLA), an arrangement of transistors that efficiently implements logic gates. These gates match bit patterns and generate the appropriate control signals for the rest of the chip. The 8008's PLA has 16 input lines flow vertically through the PLA. Each row in the PLA matches a bit pattern and generates a control signal output.
In more detail, each row output line is pulled low by a load resistor/transistor to Vdd. The transistors are connected between the row line and Vcc (+5V). The bit lines are connected to the transistor's gate. If any bit line is low (indicating a mismatch), the PMOS transistor turns on, pulling the row line high. Thus, if there is no mismatch, the control line is low, and if there is a mismatch, the control line is high. In other words, each row is a NAND gate with instruction bit inputs.
The input lines are ordered as follows: bit 3, bit 3 complement, 4, 4', 5, 5', 0, 0', 1, 1', 2, 2', 6, 6', 7, 7'. This order may seem strange, but there's a reason for it. In the 8008, the ALU operation is selected by bits 3, 4 and 5 of the instruction. By putting those bits on the left side of the PLA, they are closer to the ALU. Some rows of the PLA actually decode two instructions: bits 3, 4 and 5 are decoded on the left side, generating an ALU control signal, while the remaining bits are decoded on the right side generating a different control signal. This increases the PLA density and saves space on the chip. ↩
The 8008's instruction set is designed around octal. Among other things, there are 8 ALU operations, 8 registers and 8 conditionals. In octal, the ALU instructions have the value 2ar, where a is the ALU operation to perform (0 through 7) and r is the register to use (0 through 7, where 7 indicates memory). The octal structure originates with the Datapoint 2200, which decoded instructions with TTL 7442 BCD chips that decoded groups of three bits. This octal structure persisted in descendants of the 8008, including the Z80 and x86. Unfortunately, these instruction sets are almost always presented in hexadecimal, which hides the underlying structure. ↩
The instruction decoder generates all the signals required by the ALU. As described above, AND matches xx100xxx, pulling the m1 control signal low. An OR opcode has the bit pattern xx110xxx, which causes the instruction decode circuit to pull the m2 control line low. An XOR instruction has the bit pattern xx101xxx. The m3 control line is pulled low for patterns xx10xxxx or xx1x0xxx, matching AND, OR or XOR instructions. The subtract (with and without borrow) instructions match xx01xxxx, generating a signal that inverts the second argument. ↩
Different processors use a variety of techniques for shifting. In the Z80, shifting is performed as data enters the ALU. The 6502 performs a left shift with "A plus A", and has a path inside the ALU for right shifts; the 8085 is similar. The ARM-1 has a barrel shifter next to the ALU that performs arbitrary shifts. ↩
The instruction set of the Datapoint 2200 is described in the Reference Manual. The 8008 has a couple minor changes. For instance, the 8008 has increment and decrement instructions that are not present in the 2200. ↩

Operation	m1	m2	m3
Sum	1	1	1
And	0	1	0
Or	1	0	0
Xor	1	1	0

The revolutionary Intel 8008 microprocessor is 45 years old today (March 13, 2017), so I figured it's time for a blog post on reverse-engineering its internal circuits. One of the interesting things about old computers is how they implemented things in unexpected ways, and the 8008 is no exception. Compared to modern architectures, one unusual feature of the 8008 is it had an on-chip stack for subroutine calls, rather than storing the stack in RAM. And instead of using normal binary counters for the stack, the 8008 saved a few gates by using shift-register counters that generated pseudo-random values. In this article, I reverse-engineer these circuits from die photos and explain how they work.

The image below shows the 8008's tiny silicon die, highly magnified. Around the outside of the die, you can see the 18 wires connecting the die to the chip's external pins. The 8008's circuitry is built from about 3500 tiny transistors (yellow) connected by a metal wiring layer (white). This article will focus on the stack circuits on the right side of the chip and how they interact with the data bus (blue).

The die of the Intel 8008 microprocessor, showing the stack and other important subcomponents.

For the 8008 processor's birthday, I'm using the date of its first public announcement, an article in Electronics on March 13, 1972 entitled "8-bit parallel processor offered on a single chip." This article described the 8008 as a complete central processing unit for use in "intelligent terminals" and stated that chips were available at $200 each.1

You might think that an intelligent terminal is a curiously specific application for the 8008 processor. There's an interesting story behind that, going back to the roots of the chip: the Datapoint 2200 "programmable terminal", introduced in June 1970. The popular Datapoint 2200 was essentially a desktop minicomputer with its processor consisting of a board full of simple TTL chips. The photo below shows the CPU board from the Datapoint 2200. The chips are gates, flip flops, decoders, and so forth, combined to build a processor, since microprocessors didn't exist at the time.

The processor board from the Datapoint 2200. The 8008 microprocessor was created to replace this board, but was never used by Datapoint. Photo courtesy of unknown source.

Processors typically use a stack to store addresses for subroutine calls, so they can "pop" the return address off the stack. This stack is usually stored in main memory. However, the Datapoint 2200 used slow shift-register memory2 instead of expensive RAM for its main storage, so implementing a stack in main memory would be slow and inconvenient. Instead, the Datapoint 2200's stack was stored in four i3101 RAM chips, providing a small stack of 16 entries. 34 The i3101 was Intel's very first product, and held just 64 bits. In the photo above, you can see the chips in their distinctive white packaging each with a large "i" for Intel. 5

To keep track of the top of the stack, the Datapoint 2200 used a 4-bit up/down counter chip to hold the stack pointer. The clever thing about this design is there's no separate program counter (PC) and stack; the PC is simply the value at the top of the stack. You don't need to explicitly push and pop the PC onto the stack; for a subroutine call you just update the counter and write the subroutine address to the stack.

The story of the 8008's origin is that Datapoint went to Intel and asked if Intel could build a chip that combined the stack memory and the stack pointer onto a single chip. Intel said not only could they do that, they could put the whole processor board onto a single chip! This was the start of Intel's 8008 project to duplicate the Datapoint 2200's processor board onto a chip, keeping the Datapoint 2200 instruction set and architecture.6 After various delays, Intel completed the 8008 microprocessor, but Datapoint rejected it. Intel decided to sell the 8008 as a general-purpose processor chip, sparking the microprocessor revolution. Intel improved the 8008 with the 8080 and then the 16-bit 8086, leading to the x86 architecture that dominates desktop and server computers today.

The consequence of the 8008's history is that it inherited its architecture and instruction set from the Datapoint 2200 intelligent terminal. One of these features was the fixed, internal stack. But the 8008's implementation of that stack is unusual.

Shift-register counter

The most unexpected part of the 8008's stack is how it keeps track of the current position. The straightforward way to implement the stack would be with a binary up/down counter to keep track of the current stack position (which is what the Datapoint 2200 did). But to save a few transistors, the 8008 uses a nonlinear feedback shift register instead of a counter. The result is the stack entries are accessed in a pseudo-random order! But since they are read and written in the same order, everything works out fine.

The shift register outputs are based on a de Bruijn sequence, a cyclic sequence in which every possible output occurs as a subsequence exactly once. The 8008's de Bruijn sequence is shown below. The first value (000) is underlined in red. Shifting to the blue position yields the second value (001). Proceeding around the circle clockwise yields all eight values in the sequence: 000, 001, 010, 101, 011, 111, 110, 100 and finally back to 000. Note that each value appears exactly once, but they are not in standard binary order.

This de Bruijn sequence contains all eight 3-bit values as subsequences. 000 and 001 are underlined. The Intel 8008's internal counters are built form this sequence.

At each step in the sequence, the last two bits are shifted to the left and a new bit is placed on the right. Counting down is the converse: the first two bits are shifted to the right and a new bit is placed one the left. This process can be implemented with a shift register, a circuit that allows a bit sequence to be shifted and an additional bit inserted.7

The diagram below shows how the 8008 implements the nonlinear feedback shift register counter. While it make look complex, it's a straightforward implementation of the de Bruijn sequence. The three latches in the middle form a shift register, with each latch holding one bit. To count up, each bit is shifted to the left and a new bit is added on the right (green arrows). To count down, each bit is shifted to the right and a new bit is added on the left (purple arrows). The logic gate on the left generate the "new" bit for counting down and the gates on the right generate the new bit for counting up.

The 8008 uses the above circuit for its internal stack counter. The refresh counter is based on this, but counts up only.

The logic gates may appear complex. However, one feature of PMOS logic is it's as simple to build an AND-OR-NOR gate as a plain NOR gate, just by wiring transistors in parallel or series. Designing the logic is also straightforward: for each triple of current bits, the de Bruijn sequence specifies the next bit. If you've studied digital logic, Karnaugh maps can be used to create the logic circuits to generate the desired next bit.

Inside the stack storage

The 8008 uses dynamic RAM (DRAM) to for its stack storage and its registers. The other 1970s microprocessors that I've examined use static latches, so the 8008 is a bit unusual in this regard. Since Intel was primarily a RAM company at the time, I assume they wanted to leverage their RAM skills and save transistors by using DRAM.

Each bit of storage in the 8008 uses a cell with three transistors and one capacitors, called a 3T1C cell, similar to the cell in Intel's i1103 DRAM chip. The diagram below shows a closeup of the 8008's stack storage, with six DRAM cells visible. Each row is one 14-bit address in the stack. Each row has a read enable and write enable control line coming from the left. Each column stores one of the 14 bits; the column sense line is used to read and write the selected bit.

Detail of the Intel 8008 microprocessor's die, showing six storage cells for the stack registers. Each bit is stored with a DRAM cell consisting of three transistors and a capacitor.

The transistors for the first cell are labeled T1, T2 and T3. The value is stored on the capacitor labeled C. (There is no separate physical capacitor; the capacitance of the wiring is sufficient to store the bit.)

To write a bit, the write line for the desired row is pulled low, turning on T1. The desired voltage (low or high) is fed onto the sense line, passes through T1, and is stored by the capacitor. To read the value, the appropriate read line is pulled low, turning on T3. If C has a low voltage, T2 is turned on. This connects the sense line to ground through T3 and T2. On the other hand, if C has a high voltage, T2 is turned off and the sense line is not grounded. Thus, the circuitry connected to the sense line can tell what bit value is stored on C.

The inconvenience with dynamic RAM is that values can only be stored temporarily. After a few hundred microseconds, the charge stored on capacitor C will leak away and the value will be lost. The solution is a refresh circuit that periodically reads each value and writes it back, before the bit fades away. (A similar refresh process is used by your computer's RAM.) The 8008's internal RAM is refreshed at least every 240 microseconds, ensuring that bits are not lost. (Static RAM, on the other hand, uses a larger, more complex circuit for each bit, but will preserve the bit as long as the circuit is powered up.)

In the 8008, the stack storage (and the registers) are refreshed by continuously stepping through each entry: reading it and writing it back. To accomplish this, a second 3-bit shift-register counter is used as a refresh counter, tracking the current position that is being refreshed. The circuit for this is the same as the stack counter, except it omits the logic to count down, as it only needs to count in one direction.9

Understanding the die photo

I'll briefly explain what you're looking at in the die photo above. The chip itself is made from a silicon wafer. Plain silicon is essentially an insulator, but by doping it with impurities, it becomes a semiconductor. The dark lines indicate the boundary between doped and undoped regions; the doped silicon in the first cell is indicated in red.

On top of the silicon is the polysilicon layer, which is the yellowish stripes. Polysilicon acts as a conductor and is used as internal wiring of the chip. More importantly, a transistor is created when polysilicon crosses doped silicon. A thin oxide layer separates the polysilicon from the silicon, forming the transistor's gate. A low voltage on the polysilicon gate causes the transistor to conduct, connecting the two sides (called source and drain) of the transistor. A high voltage on the gate turns the transistor off, disconnecting the two sides. Thus, the transistor acts as a switch, controlled by the gate.

The top layer of the chip is the metal layer, which is also used as wiring. For the photo above, I removed the metal layer with hydrochloric acid to make the underlying silicon more visible. The green, blue and gray lines indicate where the metal wiring was before being removed. Transistors T1 and T3 are connected to the sense line (blue), while transistor T2 is connected to ground (green). The read and write lines enter the circuit on the left as metal wiring, connected to polysilicon lines.

The interface between the stack and the data bus

To access memory, the address in the stack must be provided to external memory via the 8 data/address pins on the chip. These pins are connected to the stack (and other parts of the 8008) via the data bus. The die photo below shows the circuitry that interfaces the 14-bit stack storage to the 8-bit data bus.11 At the top of the photo are the metal control lines and three of the data bus lines. At the bottom are the sense lines, discussed earlier, from the stack storage. In between are the transistors (orange) that connect the data bus and the stack.

The control lines select the low (L) or high (H) half of the address. These activate the appropriate read or write transistors, connecting the appropriate stack columns to the data bus.

The stack /bus driver circuit provides the "glue" between the data bus and the stack DRAM storage.

The transistors to write an address to the data bus are much larger than typical transistors, appearing as vertical yellow bars in the die photo. The reason for this is the data bus passes through the whole chip. Due to the length of the bus, it has relatively high capacitance and larger, high-current transistors are required to drive a signal on the data bus.

Near the bottom of the photo are the inverter amplifiers. Each sense line is attached to an inverter that boosts the signal from the stack storage. During refresh, this boosted signal is written back, strengthening the bit stored on the capacitor.10

Conclusion

By examining die photos, it is possible to reverse-engineer the 8008 microprocessor. One unusual feature of the 8008 is that instead of using standard binary counters internally, it saves a few gates by using shift-register counters. Although these count in a pseudo-random order rather than sequentially, the 8008 still functions correctly. One counter is used for the on-chip address stack. The 8008 also uses DRAM internally for stack storage and register storage, requiring a second counter to refresh the DRAM. Since every transistor was precious at the dawn of the microprocessor age, the 8008 has these interesting design decisions that produced compact circuitry.

If you're interested in the 8008, my previous article has a detailed discussion of the architecture, more die photos and information on how to take them. This article explains the 8008's ALU.

I announce my latest blog posts on Twitter, so follow me at kenshirriff. I also have an RSS feed.

Notes and references

The first announcement of the 8008 microprocessor in Electronics is shown below (click for a larger version). The announcement called the chip a "parallel processor", a term that had a different meaning back then, indicating that the processor operated on all 8 bits at the same time. This was in contrast to serial processors (such as the Datapoint 2200) that handled one bit of the word at a time.)
↩
The 8008 chip was announced in Electronics on March 13, 1972: "8-bit parallel processor offered on a single chip."
In 1970, RAM memory chips were extremely expensive: $99.50 for an i3101 chip with just 64 bits of storage. Shift-register memory was cheaper and denser, with 512 bits of storage in an Intel 1405 chip. The big disadvantage is the bits were circulated around and around inside the chip, with only one bit available at a time. Sequential access wasn't a problem, but if you wanted to read memory out of order, you might need to wait half a millisecond for the right bit to circle around. I wrote about shift-register memories in detail here, with detailed die photos. ↩
The i3101 memory was called the 3101 due to Intel's part numbering system at the time, described in Intel Technology Journal, Q1 2001. To summarize, the first digit indicate the product family: 1xxx is PMOS, 2xxx is NMOS, 3xxx is bipolar and so forth. The second digit indicates the product type: 1 is RAM, 2 is a controller, 3 is ROM, and so forth. The last two digits are sequence numbers typically starting with 01. Thus, the first bipolar RAM was the 3101.
During development, the 8008 chip was called the 1201, following Intel's naming scheme: the 1 indicated the chip was built from PMOS technology, the 2 indicated a custom chip and the 01 was a serial number. Fortunately, when it came time to market microprocessors, Intel decided that marketing was more important than systematic numbering: Intel's 4-bit microprocessor became the 4004 and their 8-bit microprocessor the 8008. ↩
Intel introduced the i3101 chip in April 1969. The i3101 RAM chip was a static memory chip, rather than the dynamic RAM chips common today. It was also built from Schottky TTL technology, rather than MOS used in modern RAM chips. Other companies, such as National Semiconductor, Signetics and Fairchild, made 64-bit memory chips compatible with the Intel i3101. However, they typically used the standard 74xx numbering scheme, calling the chip the 7489. ↩
Although the Datapoint's stack could hold 16-bit values, the Datapoint 2200 only used 13 address bits, supporting a maximum of 8K of memory. The 8008 expanded the address range to 14 bits, supporting 16K of memory, which was a huge amount at that time. However, the 8008's internal stack was only 8 values, rather than the 16 of the Datapoint 2200. ↩
Texas Instruments heard that Intel was designing a processor for Datapoint and asked Datapoint if they could build a processor for Datapoint too. TI beat Intel to the finish, creating the TMC 1795 processor before Intel completed the 8008, largely because Intel put the 8008 on the back burner. After Datapoint rejected TI's microprocessor, TI tried to find a new customer for the chip. TI was unsuccessful, and the TMC 1795 was abandoned and mostly forgotten. I've written about the TI chip in more detail here. ↩
You may be familiar with linear-feedback shift registers (LFSRs), which can be used as pseudo-random number generators or noise generators. With N stages, a LFSR can generate 2^N-1 output values. The de Bruijn sequence is generated from a nonlinear-feedback shift register. Nonlinear-feedback shift registers are a generalization of LFSRs; by using more complex feedback circuitry than just XOR, a nonlinear feedback shift register can generate sequences of arbitrary length. In particular, it can generate a sequence of 2^N values, while a LFSR is limited to 2^N-1. ↩
Nonlinear feedback shift registers seem pretty obscure. The only other use I've seen is the TMS 0100 calculator chip, which generates an internal sequence of length 11. For information on the theory, see The Synthesis of Nonlinear Feedback Shift Registers and Counting with Nonlinear Binary Feedback Shift Registers. The book Shift Register Sequences goes into great detail on linear and nonlinear sequences; Section VII:5 is probably most relevant, describing how to make a shift register cycle of any length.
The TMS 1000 microcontroller saves a few gates by using a LFSR for the program counter. Instead of incrementing, the PC goes through a pseudo-random sequence. The code is stored in the ROM in the same sequence; everything works out, but it seems like a strange way to implement a program counter. ↩
I was expecting the stack counter and refresh counter to have a regular layout on the chip, with a single shift register stage repeated three times. However, on the 8008 die, the transistors are arranged irregularly, scattered around where there was room. Presumably this made the layout more compact. ↩
Since the signal read from stack storage passes through an inverter before being written back, you might expect the bit to get flipped. The explanation is that transistor T2 in the storage cell inverts the value on C. Thus, the value read from a sense line is inverted compared to the value written on the sense line. The inverter amplifier provides a second inversion, restoring the original value. ↩
Each 8008 instruction takes multiple clock cycles to execute. An instruction is broken into one or more machine cycles; each machine cycle typically corresponds to one memory access for instruction or data. Each machine cycle consists of up to 5 states (T1 through T5). An address is transmitted to memory during state T1 and T2, and the memory location is read or written during T3. Each T state requires two clock cycles, so an 8008 instruction takes a minimum of 10 clock cycles. The Intel 8008 user's manual provides detailed timings. ↩

The 74181 ALU (arithmetic/logic unit) chip powered many of the minicomputers of the 1970s: it provided fast 4-bit arithmetic and logic functions, and could be combined to handle larger words, making it a key part of many CPUs. But if you look at the chip more closely, there are a few mysteries. It implements addition, subtraction, and the Boolean functions you'd expect, but why does it provide several bizarre functions such as "A plus (A and not B)"? And if you look at the circuit diagram (below), why does it look like a random pile of gates rather than being built from standard full adder circuits. In this article, I explain that the 74181's set of functions isn't arbitrary but has a logical explanation. And I show how the 74181 implements carry lookahead for high speed, resulting in its complex gate structure.

Schematic of the 74LS181 ALU chip, from the datasheet. The internal structure of the chip is surprisingly complex and difficult to understand at first.

The 74181 chip is important because of its key role in minicomputer history. Before the microprocessor era, minicomputers built their processors from boards of individual chips. A key part of the processor was the arithmetic/logic unit (ALU), which performed arithmetic operations (addition, subtraction) and logical operations (AND, OR, XOR). Early minicomputers built ALUs out of a large number of simple gates. But in March 1970, Texas Instruments introduced the 74181 Arithmetic / Logic Unit (ALU) chip, which put a full 4-bit ALU on one fast TTL chip. This chip provided 32 arithmetic and logic functions, as well as carry lookahead for high performance. Using the 74181 chip simplified the design of a minicomputer processor and made it more compact, so it was used in many minicomputers. Computers using the 74181 ranged from the popular PDP-11 and Xerox Alto minicomputers to the powerful VAX-11/780 "superminicomputer". The 74181 is still used today in retro hacker projects.1

The 74181 implements a 4-bit ALU providing 16 logic functions and 16 arithmetic functions, as the datasheet (below) shows. As well as the expected addition, subtraction, and Boolean operations, there are some bizarre functions such as "(A + B) PLUS AB".

The datasheet for the 74181 ALU chip shows a strange variety of operations.

So how is the 74181 implemented and why does it include such strange operations? Is there any reason behind the 74181's operations, or did they just randomly throw things in? And why are the logic functions and arithmetic functions in any particular row apparently unrelated? I investigated the chip to find out.

The 16 Boolean logic functions

There's actually a system behind the 74181's set of functions: the logic functions are the 16 possible Boolean functions f(A,B). Why are there 16 possible functions? If you have a Boolean function f(A,B) on one-bit inputs, there are 4 rows in the truth table. Each row can output 0 or 1. So there are 2^4 = 16 possible functions. Extend these to 4 bits, and these are exactly the 16 logic functions of the 74181, from trivial 0 and 1 to expected logic like A AND B to contrived operations like NOT A AND B. These 16 functions are selected by the S0-S3 select inputs.

Arithmetic functions

The 74181's arithmetic operations are a combination of addition, subtraction, logic operations, and strange combinations such as "A PLUS AB PLUS 1". It turns out that there is a rational system behind the operation set: they are simply the 16 logic functions added to A along with the carry-in.2 That is, the arithmetic functions are: A PLUS f(A,B) PLUS carry-in. For example, If f(A,B)=B, you get simple addition: A PLUS B PLUS carry-in. If f(A,B) = NOT B, you get A PLUS NOT B PLUS carry-in, which in two's-complement logic turns into subtraction: A MINUS B MINUS 1 PLUS carry-in.

Other arithmetic functions take a bit more analysis. Suppose f(A,B) = NOT (A OR B). Then each bit of A PLUS f(A,B) will always be 1 except in the case where A is 0 and B is 1, so the result of the sum is A OR NOT B. Even though you're doing addition, the result is a logical function since no carry can be generated. The other strange arithmetic functions can be understood similarly.3

Thus, the 16 arithmetic functions of the 74181 are a consequence of combining addition with one of the 16 Boolean functions. Even though many of the functions are strange and probably useless, there's a reason for them. (The Boolean logic functions for arithmetic are in a different order than for logical operations, explaining why there's no obvious connection between the arithmetic and logical functions.)

Carry lookahead: how to do fast binary addition

The straightforward but slow way to build an adder is to use a simple one-bit full adders for each bit, with the carry out of one adder going into the next adder. The result is kind of like doing long addition by hand: in decimal if you add 9999 + 1, you have to carry the 1 from each column to the next, which is slow. This "ripple carry" makes addition a serial operation instead of a parallel operation, harming the processor's performance. To avoid this, the 74181 computes the carries first and then adds all four bits in parallel, avoiding the delay of ripple carry. This may seem impossible: how can you determine if there's a carry before you do the addition? The answer is carry lookahead.

Carry lookahead uses "Generate" and "Propagate" signals to determine if each bit position will always generate a carry or can potentially generate a carry. For instance, if you're adding 0+0+C (where C is the carry-in), there's no way to get a carry out from that addition, regardless of what C is. On the other hand, if you're adding 1+1+C, there will always be a carry out generated, regardless of C. This is called the Generate case. Finally, for 0+1+C (or 1+0+C), there will be a carry out if there is a carry in. This is called the Propagate case since if there is a carry-in, it is propagated to the carry out.4 Putting this all together, for each bit position you create a G (generate) signal if both bits are 1, and a P (propagate) signal unless both bits are 0.

The carry from each bit position can be computed from the P and G signals by determining which combinations can produce a carry. For instance, there will be a carry from bit 0 to bit 1 if P₀ is set (i.e. a carry is generated or propagated) and there is either a carry-in or a generated carry. So C₁ = P₀ AND (C_in OR G₀).

Higher-order carries have more cases and are progressively more complicated. For example, consider the carry in to bit 2. First, P₁ must be set for a carry out from bit 1. In addition, a carry either was generated by bit 1 or propagated from bit 0. Finally, the first carry must have come from somewhere: either carry-in, generated from bit 0 or generated from bit 1. Putting this all together produces the function used by the 74181: C₂ = P₁ AND (G₁ OR P₀) AND (C₀ OR G₀ OR G₁).

As you can see, the carry logic gets more complicated for higher-order bits, but the point is that each carry can be computed from G and P terms and the carry-in. Thus, the carries can be computed in parallel, before the addition takes place.5

Creating P and G with an arbitrary Boolean function

The previous section showed how the P (propagate) and G (generate) signals can be used when adding two values. The next step is to examine how P and G are created when adding an arbitrary Boolean function f(A, B), as in the 74181. The table below shows P and G when computing "A PLUS f(A,B)". For instance, when A=0 there can't be a Generate, and Propagate depends on the value of f. And when A=1, there must be a Propagate, while Generate depends on the value of f.

A	B	A PLUS f(a,b)	P	G
0	0	0+f(0,0)	f(0,0)	0
0	1	0+f(0,1)	f(0,1)	0
1	0	1+f(1,0)	1	f(1,0)
1	1	1+f(1,1)	1	f(1,1)

In the 74181, the four f values are supplied directly by the four Select (S pin) values, resulting in the following table:6

A	B	A PLUS f	P	G
0	0	0	S1	0
0	1	1	S0	0
1	0	1	1	S2
1	1	10	1	S3

The chip uses the logic block below (repeated four times) to compute P and G for each bit. It is straightforward to verify that it implements the table above. For instance, G will be set if A is 1, B is 1 and S3 is 1, or if A is 1, B is 0 and S2 is set.

This circuit computes the G (generate) and P (propagate) signals for each bit of the 74181 ALU chip's sum. The S0-S3 selection lines select which function is added to A.

Creating the arithmetic outputs

The addition outputs are generated from the internal carries (C0 through C3), combined with the P and G signals. For each bit, A PLUS f is the same as P ⊕ G, so adding in the carry gives us the full 4-bit sum. Thus, F0 = C0 ⊕ P0 ⊕ G0, and similarly for the other F outputs.7 On the schematic, each output bit has two XOR gates for this computation.

Creating the logic outputs

For the logic operations, the carries are disabled by forcing them all to 1. To select a logic operation, the M input is set to 1. M is fed into all the carry computation's AND-NOR gates, forcing the carries to 1. The output bit sum as as above, producing A ⊕ f ⊕ 1 = A ⊕ f. This expression yields all 16 Boolean functions, but in a scrambled order relative to the arithmetic functions.8

Interactive 74181 viewer

To see how the circuits of the 74181 work together, try the interactive schematic below.9 The chip's inputs are along the top and right; click on any of them to change the value. The A and B signals are the two 4-bit arguments. The S bits on the right select the operation. C is the carry-in (which is inverted). M is the mode, 0 for logic operations and 1 for arithmetic operations. The dynamic chart under the schematic describes what operation is being performed.

The P and G signals are generated by the top part of the circuitry, as described above. Below this, the carry lookahead logic creates the carry (C) signals by combining the P and G signals with the carry-in (Cn). Finally, the sum for each bit is generated (Σ) from the P and G signals7, then combined with each carry to generate the F outputs in parallel.10

Result and truth table for inputs entered above
Select :

	0000
A	0000
B	0001
F	1001

AiBiPiGiFi
00XYF
01XYF
10XYF
11XYF

Die photo of the 74181 chip.

I opened up a 74181, took die photos, and reverse engineered its TTL circuitry. My earlier article discusses the circuitry in detail, but I'll include a die photo here since it's a pretty chip. (Click image for full size.) Around the edges you can see the thin bond wires that connect the pads on the die to the external pins. The shiny golden regions are the metal layer, providing the chip's internal wiring. Underneath the metal, the purplish silicon is doped to form the transistors and resistors of the TTL circuits. The die layout closely matches the simulator schematic above, with inputs at the top and outputs at the bottom.

Die photo of the 74181 ALU chip. The metal layer of the die is visible; the silicon (forming transistors and resistors) is hidden behind it.

Conclusion

While the 74181 appears at first to be a bunch of gates randomly thrown together to yield bizarre functions, studying it shows that there is a system to its function set: it provides all 16 Boolean logic functions, as well as addition to these functions. The circuitry is designed around carry lookahead, generating G and P signals, so the result can be produced in parallel without waiting for carry propagation. Modern processors continue to use carry lookahead, but in more complex forms optimized for long words and efficient chip layout.12

I announce my latest blog posts on Twitter, so follow me at kenshirriff.

Notes and references

Retro projects using the 74181 include the APOLLO181 CPU, Fourbit CPU, 4 Bit TTL CPU, Magic-1 (using the 74F381), TREX, Mark 1 FORTH and Big Mess o' Wires. ↩
The carry-in input and the carry-out output let you chain together multiple 74181 chips to add longer words. The simple solution is to ripple the carry from one chip to the next, and many minicomputers used this approach. A faster technique is to use a chip, the 74182 look-ahead carry generator, that performs carry lookahead across multiple 74181 chips, allowing them to all work in parallel. ↩
One thing to note is A PLUS A gives you left shift, but there's no way to do right shift on the 74181 without additional circuitry. ↩
To simplify the logic, the 74181 considers 1+1+C both a Propagate case and a Generate case. (Some carry lookahead systems consider 1+1+C to be a Propgate case but not a Generate case.) For the 74181's outputs, Propagate must be set for Generate to be meaningful. ↩
The carry-lookahead logic in the 74181 is almost identical to the earlier 74LS83 adder chip. The 74181's circuitry can be viewed as an extension of the 74LS83 to support 16 Boolean functions and to support logical functions by disabling the carry. ↩
The way the S0 and S1 values appear in the truth table seems backwards to me, but that's how the chip works. ↩
The bit sum Σ can be easily produced from the P and G signals. Some datasheets show each Σ signal generated as P XOR G, while other datasheets show Σ generated as NOT P AND G. Since the combination P=0, G=1 never arises, both generate the same results. I show XOR on the schematic as it is conceptually easier to understand, but examining the die shows the physical circuit uses the NOT/AND gates. ↩
The logic functions are defined in terms of Select inputs as follows:
A B F
0 0 S1
0 0 S0
0 0 S2
0 0 S3
Because the first two terms are inverted, the logic function for a particular select input doesn't match the arithmetic function. ↩
The schematic is based on a diagram by Poil on WikiMedia, CC By-SA 3.0, with circuitry and labeling changes. ↩
The 74181 chip has a few additional outputs. The A=B output is used with the subtraction operation to test the two inputs for equality. The Cn+4 output is the inverted carry out, supporting longer words. P and G are the carry propagate and generate outputs, used for carry lookahead with longer words.11 ↩
The P and G outputs in my schematic are reversed compared to the datasheet, for slightly complicated reasons. I'm describing the 74181 with active-high logic, where a high signal indicates 1, as you'd expect. However, the 74181 can also be used with active-low logic, where a low signal indicates a 1. The 74181 works fine with active-low logic except the meanings of some pins change, and the operations are shuffled around. The P and G labels on the datasheet are for active-low logic, so with active-high, they are reversed. ↩
One example of a modern carry lookahead adder is Kogge-Stone. See this presentation for more information on modern adders, or this thesis for extensive details. ↩

Long before computers existed, businesses used electromechanical accounting machines for data processing. These one-ton accounting machines were "programmed" through wiring on a plugboard control panel, allowing them to generate complex business reports from records stored on punched cards. Even though they lacked electronics and used spinning mechanical wheels to add up data, these machines could process more than two cards a second.

This plugboard for an IBM 403 implements tax deduction computation. Board courtesy of Carl Claunch.

In honor of April 151, I examine a plugboard that was used for tax preparation in the 1950s9 and explain the forgotten art of plugboard programming, showing how a tangle of wiring implemented a data processing algorithm. By mounting the plugboard on an accounting machine, a particular data processing task could be performed. Although the plugboard looks like spaghetti code made physical, tracing out the connections shows its function: it computed deductions by summing records across multiple fields, printed a report with subtotals and totals, and punched a smaller card deck with the subtotals.

Overview of punched card data processing

Punched cards were a key part of data processing from 1890 until the 1970s, used for accounting, inventory, payroll and many other tasks. Typically, each 80-column punched card held one record, with data stored in fixed fields on the card. The example below shows an example card with columns divided into fields such as date, vendor number, order number and amount. An accounting machine would process these cards: totaling the amounts, and generating a report with subtotals by account and department, as shown below.

Example of a punched card holding a 'unit record', and a report generated from these cards. The accounting machine can group records based on a field to produce subtotals, intermediate totals, and totals. From Manual of Operation.

Punched-card data processing was invented by Herman Hollerith for the 1890 US census, which used a simple tabulating machine that counting records indicated by holes in the cards.2 These machines steadily accumulated features, becoming complex "accounting machines" that could generate business reports.6 These machines became popular with businesses and by 1944, IBM had 10,000 tabulating and accounting machines in the field.3 In July 1948, IBM introduced the 402 Accounting Machine, which used the plugboard I'm examining. The 402 (and the similar 4035) were feature-rich machines that had 16 counters, multiple levels of subtotals, vertical spacing control to support forms, comparisons and conditional operations, and leading zero elimination.

IBM 403 accounting machine, with Type 82 card sorter at right.4 These machines are on display at the Computer History Museum.

The surprising thing about this history is that businesses were performing data processing with punched cards decades before the first computers, using machinery that was entirely electro-mechanical, not even using vacuum tubes. This equipment was built from components such as wire brushes to read holes in punch cards, relays to control the circuits, and mechanical counter wheels to add values. Even though these systems were technologically primitive, they revolutionized business data processing and paved the way for electronic business computers such as the popular IBM 1401.

Plugboard programming

The accounting machines were programmed by wiring up a plugboard for a specific task. Since each application used cards with fields in different positions, accounting machines needed a way to define each field. Different reports would be formatted with values in different locations on the page. Applications would need to total and subtotal different values. Before stored-program computing existed, a technique was needed to easily customize the system for a particular application. The result was wiring on control panel plugboards.

Closeup of the plugboard for an IBM 403. The accounting machine is "programmed" by plugging in wires to form connections.

The photo above shows a closeup of the plugboard. The plugboard has a grid of holes (which are called hubs), with their functions labeled. By inserting a wire into the board, two hubs are connected, causing the accounting machine to perform a particular operation. The collection of wires specifies the operations that are performed on each card.

The back of the plugboard for an IBM 403 accounting machine.

When a wire is inserted into the plugboard, the jack on the end of the wire sticks out the back of the plugboard, as shown above. When the plugboard is mounted in the accounting machine (below), these jacks make contact with a grid of connectors on the accounting machine, completing the desired circuits. (Note the "setup change" switches above the plugboard; these switches will be relevant later.)

A plugboard inserted into the side of an IBM 403 at the Computer History Museum. Note the control switches above the plugboard. These can be used to change what the plugboard does.

Since the plugboard is removable, companies could easily switch plugboards to perform different tasks. (Rewiring a plugboard for each function would be much too time-consuming.) As a consequence, companies might have shelves full of plugboards for all the operations they performed; with plugboards, the "software" takes up considerable physical space. The photo below shows one company's collection of plugboards to perform different tasks.

Shelves full of plugboards for the IBM 402, courtesy of IBM 1401 restoration team.

The tax program

I closely examined the wiring of the tax plugboard to determine what it does. The first step was to trace out each wire to draw a schematic wiring diagram (below) that shows all the connections on the plugboard. If you compare the diagram with the plugboard photo at the start of the article, you can see that it shows the same wiring, but in a much easier to follow format.

A wiring diagram for an IBM 403 plugboard to compute tax deductions. (Click for full size.)

I found that the program wired into the board reads cards and computes subtotals and totals from the cards. In more detail, each card has seven fields that are read. The first field is an identifier, and all cards with the same identifier are totaled together to give totals for each of five fields. My hypothesis is that this field is an employee id, and each card corresponds to one pay period.7 Summing the records for each employee id gives the employee's total deductions (or year-to-date deductions). The totaled five fields could be payroll deductions such as federal income tax, state tax, social security tax, Medicare tax and retirement contributions. After reading the cards for an employee, the accounting machine punches a new summary card with the employee's total deductions prints a line on the report. The per-employee totals are then summed together to give overall totals at the end.

Here's how the plugboard works, step by step. When an 80-column card is read, each digit is available in one of the reading hubs, labeled 1 through 80. By putting a wire in a hub, the digit is transmitted to another part of the machine. For instance, suppose there is a 6-digit number punched into columns 28 to 33 of the card and we want to total these numbers. This is done by connecting a wire from reading column 28 to the upper digit of the counter, a wire from column 29 to the second digit of the counter, and so forth, for 6 wires in total.

The wires transferring the field to counter 6C are the six red wires in the photo below. The 80 card columns are available in the two rows of hubs below the label "Third reading". The inputs to the counters are the four rows of hubs below the "Counter entry" labels. Other fields are wired to counters similarly.

The six red wires connect six columns read from the card (right) to the entry of counter 6C (left).

Trying to figure out the wiring from the photo is difficult, so plugboard wiring is typically indicated in a diagram. The diagram below shows the wiring between the columns read (right) and the counter 6C (left). The six wires are compressed into one line on the diagram, using IBM's style of representing plugboards. The horizontal bars connected by a line indicate six parallel wires.

A diagram representing the connection between the card read (right) and the counter (left).

To print a total, a counter "exit" is wired to the desired printer columns. On the plugboard, the printer columns are labeled print entries: 43 "alphamerical print entry" positions that can print alphabetical or numerical characters, followed by 45 "numerical print entry" positions that only print numbers. The diagram below shows four wires from counter 4C to print columns 1 through 4 (yellow), and six wires from counter 6C (red) to print columns 35 through 40.

Wiring a counter to a "print exit" causes the counter value to be printed.

The accounting machine contains 16 decimal counters in all. Four of them are 8-digit counters, named 8A, 8B, 8C and 8D. Four are 6-digit counters (6A to 6D), four are 4-digit counters (4A to 4D), and four are 2-digit counters (2A to 2D). In addition, two counters can be joined together to form a larger counter. There are also connections between counters for subtotals. For instance, counter 8A accumulates a per-employee subtotal. These subtotals are added to counter 8B to form the final total.

Another important operation is to compare two cards to see if they have the same id (and should be counted together) or if they have different ids (so a subtotal should be printed and the counters reset). A comparison is done by wiring two fields to the two "comparing entry" rows. If the fields are different, the "comparing exit" will trigger a signal. Since we want to compare each card with the next card, we get one field from the "second reading" and one from the "third reading"; the card we are processing will be at the third reading stage while the card behind it will be at the second reading stage. Finally, the comparison output is wired to the "program start (minor)" hub. This causes the accounting machine to start an additional cycle to print the subtotals (i.e. minor totals) and reset the counters. (There are also "intermediate" and "major" program start hubs, which provide two additional levels of totals.)

Columns 1-4 of the cards are compared to determine if subtotals should be printed.

On the diagram above, columns 1-4 from the second reading and from the third reading are wired to the comparing entry hubs. The four corresponding comparing exit hubs are wired together (gray) and connected to the minor (MI) program start hub (yellow wire to PRG START in upper right). The closeup of the plugboard below shows the wiring on the plugboard.

Columns 1-4 of the cards are compared to determine if subtotals should be printed.

Another interesting feature of the plugboard is conditional behavior, using "selectors". Connections can be switched based on a different signal, allowing behavior to change based on a comparison, or a panel switch. This plugboard changes behavior based on the "setup change 1" panel switch, one of the switches on the accounting machine above the panel. (You can think of this as the plugboard version of command-line options.) According to the label on the plugboard (below), this switch selects "year to date". On the board, this switch enables processing of one field, as well as switching between the constant 2 and 5 for addition to counter 2B. (The reason for this constant is a mystery to me.)

The label on the plugboard shows it computes tax deductions.9 "S/P" is presumably "summary punch". The setup 1 switch selects "year to date".

The wiring on the right side of the plugboard controls the counter behavior, such as accumulating subtotals versus final totals. It also wires some of the counters together to form larger counters. For instance, counters 2C and 4D are combined to form a single 6-digit counter. 8 I won't explain the counter control wiring here; the manuals15 explain how it works.

"Summary punching" is another interesting feature of the accounting machine. This lets you take a large file of cards and punch a smaller summary file. For the tax plugboard, one summary card is punched for each employee, with the totals for that employee. Thus, a card file with one record for each employee's pay period is reduced to a much smaller file with one card for the employee's yearly totals. This smaller card file can then be used for further processing.

IBM 403 accounting machine connected to a 519 summary punch. Courtesy Columbia University Computing History.

Summary punching is accomplished by connecting a summary punch machine (above right) to the accounting machine (left) through a thick cable. A hub on the plugboard is wired to enable summary punching, and another hub is wired to control when to punch a card. For the tax plugboard, a summary card is punched for each minor total with the wiring below. A separate plugboard on the summary punch machine controlled which columns were punched on the summary card.

Summary punch wiring on the IBM 403 plugboard. The summary punch control pickup (SP Control PU on the left) is wired to punch a summary card on a minor total. The summary punch switch (SP.SW) hubs are connected by the gray wire (lower left).

Inside the 403 Accounting Machine

Its amazing how much functionality these accounting machines could provide without the benefit of electronics, purely through clever electromechanical systems. Inside the accounting machine is a maze of motors, rotating shafts, cams and clutches, making it seem more like a car than a computer—it even contained an oil pump! With all these mechanical parts a 403 accounting machine weighed over a ton (2515 pounds / 1143 kg).

Inside an IBM 403 Accounting Machine, front view. From the 402/403 Field Engineering Manual, fig. 5.

On the plugboard, a wire is used to route a column of the card. How does a character on the card get sent across this wire? How does a counter perform addition? And how does the result get printed? The accounting machines use clever mechanisms, closely tied to the structure of a punched card, to perform these operations.

In modern terms, a character is encoded serially over a wire, by a single pulse whose timing depends on the position of the hole. These pulses start and stop the counters used to add values. These pulses also control the timing of the typebars that print the result. How these pulses are generated and how they electromechanically control the system will be described more below.

The 403's timing is based off the rotating shafts that drive the machine, rather than clock time. Each revolution of the shaft corresponds to a "card cycle", the reading and processing of one card. The fundamental timing unit is a rotation of 18°: this is the time between reading successive card holes, moving a typebar by one character, and rotation of a counter by one count. At 150 cards per minute, these values work out to approximately 400 milliseconds per card and 20 milliseconds per 18° step, remarkably fast for mechanical operations.

Reading cards

To understand the accounting machine, one must first understand how punched cards hold data. Punched cards hold 80 characters of data; each character is represented by the hole pattern in a column. The card below shows how numbers and the alphabet are punched; each character is printed at the top of the card with the corresponding punches in the column below. A digit is simply represented by a hole in the corresponding row, 0 through 9. (Note that numbers are stored in decimal, not binary.) To support alphanumeric data, two "zone" rows were added above the digit rows.10 A letter is represented by putting two holes in a column: a zone punch and a digit punch.11

An 80-column IBM punched card. Each column encodes a character (printed at top) by punching holes in the column. For a digit, a hole is punched in the row with the same number. A letter is encoded by adding a "zone punch" in one of the top three rows.

You might expect the accounting machine to read cards a column at a time, so one character gets processed at a time. But instead, cards are read "sideways", starting at the bottom. All 80 columns are read in parallel, one row at a time, starting with row 9 and ending with row 0 and then the zone rows. The accounting machine uses sets of 80 wire brushes to read a card, one for each column. If there is a hole, the brush makes contact with the energized metal roller underneath the card, completing a circuit and generating a pulse. Thus, each column will have a pulse corresponding to its hole, with the 9 pulse first, followed by 8 and so forth, ending with 0. Thus, each character is encoded serially, and each plugboard wire carries one of these serial signals, but all columns are processed in parallel.

Printing

Typebars in an IBM 402 accounting machine. Courtesy Columbia University Computing History.

The accounting machine's printing mechanism consists of 88 typebars;12 each vertical bar holds all the characters that can be printed. The typebars move vertically to line up the proper characters and then hammers13 hit the typebars into an inked ribbon to print the selected characters. Thus, the characters in a line of text are printed simultaneously.

The wires in the plugboard control what gets printed by stopping each rising typebar at the right time to select the desired character. The motion of the typebars is carefully timed to match the reading of a card, so the "3" row (for instance) of a card is read at the same time that the "3" on the typebar moves into position. If the brush's hub is wired to a column's print hub, this signal energizes a print magnet, releasing a "stop pawl" which meshes with a tooth on the typebar, stopping it with the "3" character in position to print. If a "2" is read instead, the brush reads the hole one time unit later; the typebar will have risen one more position, causing a "2" to be printed.

The printing mechanism consists of a complex arrangement of mechanical parts: cams, pawls, slides, springs and clutches, in combination with electromagnets to activate these parts at the right time. The mechanism can print 100 lines per minute, so the parts are flying around rapidly and require exact timing. The typebars move one position for every 18° rotation of the driveshaft, keeping them synchronized with card reading.

Counters

The heart of the accounting machine is the electromechanical counters that sum the values. Each digit in a counter is represented by a wheel that rotates to perform addition. The position of the wheel indicates the digit. For instance, to add 27 to a counter, the tens digit wheel is rotated two positions and the unit wheel is rotated seven positions. Thus, to add the value in a card field, the wheels must rotate an amount corresponding to the number punched in the card. The wheel starts rotating when a hole is read, rotates one position as each additional row is read, and stops reading at row 0. Since row 9 is read first and row 0 is last, the result is the counter rotates the number of positions indicated by the hole.

An electromechanical counter from the IBM 403 accounting machine performs addition on two digits by rotating the counter wheels.

The photo above shows a two-digit counter unit. The counter wheels are at the left. The start and stop coils cause the counter to start and stop rotating at the correct times by activating lever arms that control a clutch under the wheel. Carry is implemented by cams underneath the wheel that close electrical contacts. On the back of the board are the electrical contacts that read out the value stored in the counter; these are wired to the connector on the right.

Diagram of the electromechanical counter, indicating the key components. From the IBM 403 Field Manual.

The plugboard specifies which card columns are added to which counter digits. To add a field's value to a counter, a column's read brush is wired to the counter through the plugboard, so the card controls how much the counter rotates. This signal activates the counter's start coil, engaging the counter's clutch and starting the counter's rotation. At the 0 position, the stop coil disengages the clutch, stopping the counter. For instance, if the brush read a 7 from the card, the counter will rotate through seven positions before stopping, adding 7 to its value. If the brush read a 1, the counter will rotate by just one position. The reason this works is the synchronization between card movement and counter rotation; an 18° rotation corresponds to the card moving by one row as well as one count on the counter wheel. (A counter wheel has 20 positions spaced 18° apart. Counting by 10 rotates the wheel halfway.) Subtraction is performed by adding the complement.14

A carry from one position to the next is handled by a complex mechanism. You might expect that when one wheel rolls over from 9 to 0, it increments the higher wheel like an odometer, but that would be slow for multi-digit counters. (Keep in mind that the counters can add 150 numbers per minute, so they are spinning rapidly.) Instead, the counters use a mechanism similar to carry lookahead. If a wheel is at 9, an electrical contact closes, allowing a lower-order carry to be passed through to the higher wheel. If a wheel passes from 9 to 0, it closes a different electrical contact, generating a carry. After the "regular" addition, any necessary carries are generated in parallel and added in a single time step. Thus, something like 99999999+1 isn't delayed by a ripple carry; instead all digits get a carry in parallel.

Relays

The accounting machine is controlled by hundreds of relays, electromechanical switches that provide all the "control logic" for the system. The photo below shows the back of the accounting machine, filled with relays; more relays are on the end panel. To generate timing signals signals for the relays, switches were opened and closed by cams attached to the rotating shaft. Thus, everything in the system is timed from the rotating shaft.

The IBM 403 accounting machine is controlled by hundreds of relays, many of which are mounted in the back of the machine. Photo from the Field Engineering Manual, fig 81.

Conclusions

Punched card data processing is almost forgotten now, but it ruled data processing for almost a century. Even before computers existed, businesses used punched cards and tabulators for accounting. IBM's accounting machines were able to perform surprisingly complex tasks even though they were built from electromechanical components that seem primitive today. Accounting machines and plugboard programming remained popular into the 1960s, when businesses gradually switched to stored-program business computers such as the IBM 1401. Even so, IBM continued marketing accounting machines until 1976. Incredibly, one company in Texas still uses an IBM 402 accounting machine for their accounting today (details), illustrating the amazing longevity of punched card technology.

I announce my latest blog posts on Twitter, so follow me at kenshirriff. I also have an RSS feed.

Thanks to Carl Claunch (one of the Xerox Alto restoration co-conspirators) for providing the plugboard and documentation.

Notes and references

April 15 is traditionally tax day in the US, but if you don't have your taxes done yet, don't panic. In 2017, US tax day is April 18 due to the weekend and holiday. ↩
To support addition, tabulators used a module called an "accumulator" with rotating dials to hold decimal numbers. This accumulator gave its name to the accumulator register still used in microprocessors today. For example, Intel's x86 processors have a register called EAX, the EXtended Accumulator. ↩
The history of IBM's tabulating machines is described in IBM's Early Computers. Also see Columbia University's computing timeline. ↩
Another part of the unit record system is the card sorter, which rapidly sorts cards on a field, putting them in the proper order to be processed by an accounting machine. I discuss IBM card sorters in detail here. ↩
The 402 and 403 accounting machines were essentially the same except the 403 could print three-line addresses. In order to print three lines from one card, the 403 has three card reading stations instead of two. (That is, it read each card three times using three sets of 80 brushes). This feature is called MLP (multi-line printing) and is useful for printing addresses on invoices, for instance. An MLP card is indicated with a special punch: 8, 9 and (1, 2, 3 or 4) punched in a single column; the last digit controls the number of lines printed. ↩
I wouldn't be surprised if these accounting machines were technically Turing-complete due to their support for conditional operations, although it's unclear how to represent the tape. Perhaps storage could be implemented by punching a new deck of cards on each cycle through the machine. Of course this would be impractical for any real use. ↩
I suspect each card represents one employee pay stub and each field indicates a payroll deduction. However, there are alternative explanations for the plugboard. For instance, the id field could indicate a company division, and each card represents a subdivision. In this case, the accounting machine could be totaling the tax deductions for each division such as business expenses and depreciation. Or each card could represent one month. Since there are no variable names, it is speculation. ↩
The table below summarizes the program implemented by the plugboard, showing the mapping between input fields on the card and output fields on the printer.
Card columns Output columns Subtotal counter Total counter
1-4 1-4 4C
34-38 5-10 8D 4A/2D
44-45 11-18 8A 8B
61-66 19-26 6A 2C/4D
67-71 27-32 6B 2A/4B
28-33 35-40 6C 8C
14-17 4D
from switch 2B
Columns 14-17 are summed but not printed. Presumably they are punched on the summary punch card. Columns 34-38 are only processed if the "setup change 1" switch is active. Counter 2B is controlled by the panel switch, adding either 2 or 5 each step. I can't figure out a reason for this; I assume the plugboard on the summary punch (which I don't have) does something useful with this value. ↩
The tax plugboard I'm examining was labeled with embossing tape which dates the labeling of the board to post 1958. The board could originally be older, or it could have been used into the 1960s. ↩
The row above "0" is called 11 or X, while the row above that is 12. For alphabetical characters, the "0" row is used as a zone instead of a digit. (This causes some complications in the accounting machine, such as a special mechanism to print a "numeric zero" versus a "zone zero".) ↩
This punch card code evolved into EBCDIC (Extended Binary Coded Decimal Interchange Code), the encoding used by IBM computers in place of ASCII. Many of the strange characteristic of EBCDIC, such as the alphabet not being entirely sequential, are due to its roots in punched cards. ↩
The accounting machine has typebars on the left that print alphanumerics and typebars on the right that just print digits. For alphanumeric printing, the digit signal moves the typebar in steps of four, while the zone signal moves the typebar 0 through 3 steps. Thus, an alphanumeric character can be printed. Typebars with special characters could also be installed, to print $, @, - or %. ↩
You might expect that to print each line, all the hammers hit the typebars, but it's more complex than that. First, each hammer has a mechanical "hammerlock" control, which can enable the hammer, disable the hammer, or put the hammerlocked hammers under program control. Thus, part of the line may be printed or suppressed based on the data. In addition, the hammers also have mechanical "hammersplit" levers which when raised cause leading zeros in a field to be suppressed. This allows the value "000123" to be printed as " 123" for instance. ↩
Subtraction uses 9's complement addition. That is, subtracting a digit n is done by adding 9-n. This is accomplished mechanically by starting the counter's rotation at position 9 and stopping when a hole is read. For example, if the hole is at position 7, the counter will increment by two positions. There are a few complications with 9's-complement subtraction. The answer is off by one, but an "end-around carry" adds 1 to yield the correct result. Negative numbers require special handling to be printed properly using the "net balance method" or the "balance selection" method; see the 403 manual if you care about the details. The numeric typebars include a "CR" symbol, which indicates negative numbers as a "credit". On a punch card, negative numbers are typically indicated with an X-punch (i.e. a zone punch in row 11) over the value. ↩
IBM's accounting machine manuals are available on Bitsavers. The operation of the IBM accounting machines is discussed in detail in: IBM 402, 403 and 419 Accounting Machines: Manual of Operation. For a thorough discussion of how the machine works internally, see IBM 402, 403, 419 Field Engineering Manual of Instruction. For an overview of how plugboard wiring for IBM's works, see IBM Functional Wiring Principles. ↩

Card columns	Output columns	Subtotal counter	Total counter
1-4	1-4	4C
34-38	5-10	8D	4A/2D
44-45	11-18	8A	8B
61-66	19-26	6A	2C/4D
67-71	27-32	6B	2A/4B
28-33	35-40	6C	8C
14-17		4D
from switch		2B

Remember the old video game Space Invaders? Some of its sound effects were provided by a chip called the 76477 Complex Sound Generation chip. While the sound effects1 produced by this 1978 chip seem primitive today, it was used in many video games, pinball games. But what's inside this chip and how does it work internally? By reverse-engineering the chip from die photos, we can find out. (Photos courtesy of Sean Riddle.) In this article, I explain how the analog circuits of this chip works and show how the hundreds of transistors on the silicon die form the circuits of this complex chip.

The 76477 chip combines several functional blocks to produce a variety of sound effects. A voltage-controlled oscillator (VCO) produces a signal whose frequency depends on the control voltage. A "super low frequency" SLF oscillator generates a triangle wave. Feeding this into the VCO generates a varying pitch, useful for bird chirps, sirens, or the warbling sound of the UFO in Space Invaders. A "one-shot" produces a pulse of a fixed length to control the length of the sound. An envelope generator makes the sound more realistic by ramping its amplitude (volume) up at the start and down at the end. A digital white noise generator can be used for drums, gunshots, explosions and other similar sound effects. Finally a digital mixer combines these signals and feeds them to the output amplifier.

The diagram below indicates the functional blocks on the 76477 die. Looking under a microscope, you can see the circuitry that makes up the chip. The yellowish lines are metal traces that connect the circuits of the die. The reddish and greenish regions are the silicon of the chip, forming transistors and resistors. The black blobs around the edges of the chip show where tiny bond wires connected the die to the integrated circuit pins. Analog circuits are outlined in purple, while digital circuits are in cyan. The 76477 is primarily analog—most control signals are analog, the chip has no digital registers, and most sounds are generated from analog circuits—but about a third of the chip's area is digital logic.2

Functionality blocks inside the 76477 sound chip, indicated on the die. Die photo courtesy of Sean Riddle.

The block diagram below shows the chip's functional elements and can be compared to the die photo above. The chip is primarily controlled by resistors (red pins), capacitors (cyan pins) and voltages (violet pins). This made the chip difficult to control with a microprocessor, and more useful for hardwired sounds.

Block diagram of the 76477 sound chip, from the datasheet. Resistor inputs: red, capacitor inputs: cyan, voltage inputs: violet.

The remainder of this article will dive into the internals of the 76477 chip. First I'll show how transistors and resistors are built on an integrated circuit. Next I'll explain two key analog building blocks: the current mirror and the comparator. Finally, I'll show the reverse-engineered circuits for the 76477's analog functional modules and discuss how they operate. (I'll describe the chip's digital logic in a future article.)

Integrated circuit transistors and resistors

A bipolar integrated circuit such as the 76477 is built from two types of transistors: NPN and PNP. The diagram below shows two transistors on the 76477 die, with the emitter, base and collector labeled. The N-doped silicon appears reddish, while P-doped silicon appears green. Metal lines (yellowish) on top of the silicon connect the circuits, with outlines visible where metal is connected to the silicon layer. The transistor on the left is an NPN transistor. Internally, the transistor is built vertically, with the emitter on top, the base forming a thin layer beneath the emitter, and the collector region underneath. The transistor on the right is a PNP transistor. The collector forms a ring surrounding the central emitter. 3

Two transistors as they appear on the die of the 76477, showing the Emitter, Base, and Collector.

Resistors are an important component of analog circuits. On a silicon chip, they can be formed from a long, narrow region of doped silicon with higher resistance. On an IC, resistors take a lot of space and are inaccurate, so they are generally avoided where possible. The die image below shows three resistors, which appear red in the photo. They are connected to the metal wiring at the contact points marked with blue arrows.

Three resistors (red) on the die of the 76477 chip. The ends of the resistors are connected to the metal layer at contact points marked in blue.

If a metal wire needs to cross another metal wire, the signal can use the silicon layer to pass under the wire. Two of these cross-unders are shown below. The silicon (green) is doped to be lower resistance than in the case of a resistor. Cross-unders are higher resistance than metal wiring, so they are only used when necessary.

A relatively low resistance silicon "wire" (green) passes under two metal wires.

By carefully examining the die photo, you can pick out the transistors and resistors and determine how they are connected. From this, you can reverse-engineer the chip's circuits.

The current mirror

A key component of most analog circuits is the current mirror, and the 76477 is no exception, containing many current mirrors. A current mirror takes one reference current and "clones" it, generating a current that matches the reference current. Either of the symbols below can be used to indicate a current mirror or current source.

Schematic symbols for a current source.

The following circuit shows the circuit for a current mirror with two current source outputs. A reference current passes through the transistor on the right. (In this case, the current is set by the resistor.) Since all the transistors have the same emitter voltage and base voltage, they source the same current, so the currents on the left matches the reference current on the right.4

Current mirror circuit. The currents on the right copy the reference current on the left.

The die segment below shows four PNP current mirror transistors providing a dozen current outputs. Each of the three pinwheel-shaped transistors has four collectors surrounding the central emitter, allowing it to produce four matched current outputs. The lower left transistor is a standard PNP transistor with a ring-shaped collector. The large green rectangle in the center is the shared base connection for the transistors.

Four transistors from current mirrors in the 76477 sound chip. Three of them have four collectors surrounding the emitter, giving them a "pinwheel" appearance.

Current mirrors are commonly used to generate bias currents instead of pull-up resistors. Since resistors inside ICs are both inconveniently large and inaccurate, a current mirror is used when possible. If you look at the die image at the beginning of the article, note the large die area dedicated to current mirrors for bias current generation. 5

Comparators

Another key building block of the 76477 is the comparator, comparing two voltages and determining which one is higher. The heart of the comparator is a differential pair, a two-transistor circuit. If both inputs are equal, the transistors will conduct equally and the current will be split equally along both branches. But if one input is lower, that transistor will conduct more, switching most of the current into that branch.

The schematic below shows a typical comparator in the 76477. If the positive input is higher than the negative input, the comparator outputs a 1. Otherwise it outputs a 0. Transistors 3 and 4 form the differential pair. The current to them is supplied by a current mirror above them, and most of the current will be directed to the side with the lower input. Transistors 1 and 2 buffer the inputs (using emitter followers) and are biased by more current mirrors. Transistors 5 and 6 form another current mirror, used as an active load to double the circuit's amplification. Finally, transistors 7 and 8 form an inverter, generating a digital output from the comparator.

Schematic of comparator circuit in 76477 sound chip, slightly simplified.

The die image below shows one of the comparators used in the 76477 with the transistors labeled to match the schematic above. Note that transistors pairs 1 and 2, 3 and 4, and 5 and 6 have similar layouts to give them matched characteristics, improving the balance of the comparator. The emitter and collector of transistor 5 are connected together for the current mirror. The current sources and resistors are on another part of the die, not shown below.

Die image of the 76477 sound chip showing a comparator used in the one-shot circuit.

The one-shot

The one-shot is a simple circuit that generates one pulse of a set width , triggered when the chip's inhibit signal drops low. This pulse controls the duration of the sound. For instance, a short pulse of noise can be used for a gunshot sound, while a longer noise could be an explosion.

Schematic of one-shot circuit inside the 76477 sound chip.

The one-shot charges an external capacitor via an external resistor and current mirror. The resistor sets the reference current for the current mirror, and the mirror feeds this current into the capacitor. The advantage of using this charging circuit rather than a simple R-C circuit is that the charging current remains constant, rather than decreasing as the capacitor charges.7 This "charging trick" is used several times in the 76477.

When the capacitor's voltage reaches the comparator's limit level (2.6V), the comparator output goes low and the pulse ends. Thus, the faster the capacitor charges, the shorter the pulse. Digital logic circuitry (not shown) resets the one-shot by discharging the capacitor at the end of the pulse and holds it low until the next pulse is triggered via the inhibit pin.

The super-low frequency oscillator

The next functional block of the 76477 that I'll examine is the super-low frequency (SLF) oscillator. It generates a triangle wave that can control the voltage-controlled oscillator (VCO) to generate warbling sounds by ramping the pitch up and down. The frequency is controlled by an external resistor and an external capacitor. Like the one-shot, a current mirror provides a fixed charging current. However, the SLF oscillator uses a second current mirror to discharge the capacitor at the same rate, generating the triangle wave output.

Building a current mirror from NPN transistors creates a current mirror that sinks current instead of sourcing current, as in the lower current mirror. This current mirror also uses another trick: by using a transistor with two emitters, the current mirror doubles the output current; this is indicated on the schematic with two arrows in the current mirror circle. When the lower current mirror is disabled by the transistor, the capacitor is charged by the upper current mirror, with a current (I) set by the resistor, similar to the one-shot circuit. But with the lower current mirror enabled, the lower current mirror sinks current 2I. Since the upper current mirror is still supplying I to the capacitor, the net current (-I) discharges the capacitor. Thus, by combining two current mirrors, the capacitor can either be charged or discharged with the same current I. This trick will appear again in the VCO.

Schematic of SLF inside the 76477 sound chip.

The final piece of the SLF oscillator is the comparator. The + input is set to an upper limit of 2.46 volts, so the comparator will output 1 as the capacitor charges. When the capacitor reaches the limit voltage, the comparator output drops to 0. The first effect of this is to enable the lower mirror, so the capacitor starts discharging. The second effect is to pull the comparator input low (0.36V) through the hysteresis circuit.6 This keeps the comparator output low until the capacitor has discharged. Thus the circuit "remembers" if it is charging or discharging, without using a flip flop. The square wave output is used by the mixer and the triangle wave output is used by the VCO.

Zooming in on the die of the 76477 sound chip shows the circuitry for the SLF oscillator.

The diagram above shows how the SLF circuit looks on the die. Note the three transistors for the upper current mirror. Below the capacitor pin is a transistor for the lower current mirror with a doubled emitter; this causes the mirror's output current to be doubled. On the left are resistors forming the hysteresis circuit. The comparator circuit is underneath it. The Vcc power trace has been colored red and the ground trace has been colored blue.

The voltage-controlled oscillator (VCO)

The voltage-controlled oscillator generates a pitch that depends on its voltage input, as shown below. The circuit for the VCO has a lot in common with the SLF: it creates a triangle wave by charging and discharging an external capacitor. The main difference is that it charges until the capacitor reaches the control voltage (rather than a fixed voltage). Thus, the voltage input controls the pitch: with a higher control voltage, the capacitor takes longer to charge so the frequency is lower. This control voltage can be provided either from the SLF or an external pin. The "VCO select" pin selects which control voltage to use.

The triangle wave from the SLF oscillator can control the frequency of the Voltage Controlled Oscillator (VCO). From the 76477 datasheet.

The VCO's output is a digital square wave that is active during the charging half of the VCO's internal triangle wave. Another pin controls the output's duty cycle, the fraction of the time it is high.8 The triangle wave is compared to the duty cycle control voltage to determine when the output switches on and off. A lower control voltage results in a shorter duty cycle while a higher control voltage results in a longer duty cycle (up to 50%). The digital logic combines the two outputs to yield the final output. The VCO has two comparators in parallel; the VCO select input enables one of them.

Schematic of VCO inside the 76477 sound chip.

Envelope generation

The 76477 provides envelope generation, so the output can smoothly ramp up at the start of a sound and ramp down at the end, making it more realistic9 The diagram below (from the datasheet) shows the linear attack and decay applied to a sound waveform. (The sound below illustrates the random pulses produced by the white noise generator.)

A sound waveform with attack and decay applied.

The schematic below shows the circuit for envelope generation. As with the other circuits, a capacitor is charged and discharged using current mirrors, but two separate resistors are used so the charge (attack) and discharge (decay) rates can be different. The attack signal (from the digital logic) causes the envelope capacitor to charge through a current mirror at a rate controlled by an external attack resistor. The decay signal (simply the complement of the attack signal) causes the capacitor to discharge, controlled by the external decay resistor. Discharge uses a second current mirror operating as a current sink, but unlike earlier circuits it doesn't double the current. The inhibit signal rapidly discharges the capacitor, resetting the envelope.

Schematic of envelope generator inside the 76477 sound chip.

The output circuit

The 76477's output circuit uses four separate current mirrors. The varying reference current for the first current mirror is generated from the envelope voltage and the external amplitude resistor. Unlike the other control resistors, this resistor has a varying voltage applied so it produces a varying reference current. This current controls the output's overall amplitude.

Schematic of output circuit inside the 76477 sound chip.

The amplitude reference current goes into the second current mirror (lower left). The inhibit signal blocks this current mirror, which is how the inhibit signal blocks the chip's output. A third current mirror (upper right) generates two output currents referenced from the current sunk by the second current mirror. The final current mirror is enabled and disabled by the output signal from the mixer. Thus, the current to the op amp alternates between positive and negative, with magnitude depending on the envelope and the control resistor.

The output op amp drives triple-Darlington emitter-follower output transistors. These transistors are not particularly large, so the 76477 has limited output power. An external feedback resistor to the op amp controls the output's amplification. The die photo below shows part of the output circuit. A capacitor helps stabilize the output so it doesn't oscillate.

Part of the output circuit for the 76477 sound chip.

Conclusions

The 76477 is a complex integrated circuit with hundreds of transistors, but by examining the die the operation of the chip can be reverse engineered. The chip uses interesting techniques to generate sounds by combining oscillators, a noise generator and other functional blocks. Outside of analog chips, current mirrors are fairly obscure but the 76477 makes heavy use of current mirrors, with multiple current mirrors in almost every functional unit, driving comparators, generating bias currents, and providing uniform charge/discharge currents.

The chip had several disadvantages that led to its replacement by more advanced chips. The biggest inconvenience is that most of the 76477's parameters are controlled by resistors and capacitors, rather than digitally, making it hard to control the chip with a microprocessor. A second disadvantage of the chip is the sounds were largely digital square waves which gave the sounds a harsh quality rather than a "warm" analog sound. Finally, it was difficult to produce accurate pitches with the VCO, making the chip less useful for music synthesis. For these reasons, digitally-controlled chips such as the AY-3-8910 (1978) surpassed the 76477 in popularity.

I announce my latest blog posts on Twitter, so follow me at kenshirriff. I also have an RSS feed.

Thanks to Sean Riddle for the die photos. I'll end with his die photo of the 76477 after dissolving the metal layer in acid, making it easier to see the resistors and transistors.

Die photo of the 76477 sound chip. The metal layer has been dissolved with acid to reveal the silicon. Colors are enhanced. Photo courtesy of Sean Riddle.

Notes and references

The Space Invaders schematics show that the video game used seven different circuits to create its different sounds. The 76477 generated the "UFO" sound, while other sounds (saucer hit, explosion, missile, invader hit, etc.) were mostly generated by collections of op amps. ↩
The 76477's digital circuitry was built with Integrated Injection Logic (I2L), which was developed in the 1970s with the promise of building fast bipolar logic with the VLSI density of MOS logic. Spoiler: I2L lost out to CMOS, the technology used in microprocessors today. For the 76477, the most useful feature of I2L is that it used the same manufacturing process as analog bipolar transistors, allowing the analog and digital circuitry to be combined on one chip. In the 76477, digital logic is used for digitally selecting and combining audio signals, generating white noise with a nonlinear feedback shift register, and control functions. ↩
Some PNP transistors have a different structure; a PNP transistor with the collector grounded lacks the collector ring, using the grounded substrate as the collector. This makes it more similar to an NPN transistor both in appearance and construction. For details on the internal structure of bipolar IC transistors, see my earlier articles on the 555 and 741 chips. In brief, the NPN transistor is vertical in cross-section with the emitter on top and collector on the bottom. Most of the PNP transistors are lateral (i.e. horizontal) with the emitter on the inside and the collector on the outside, and the base in between (somewhat distant from the base connection). ↩
Many of the 76477's current mirrors are more complex, with a Darlington pair of transistors between the reference transistors's base and collector. This compensates for base currents and makes the mirror more accurate. See this paper for an explanation. ↩
For bias generation, you might wonder what provides the initial current to the current mirrors. The answer is an internal resistor, along with several transistors to keep the current stable. The circuit is similar to the self-biased current source described here. ↩
Note that the hysteresis circuit used by the SLF is not providing feedback like with an op amp. It provides a comparison voltage of 2.46V while charging and 0.36V while discharging. The SLF comparator's output is open collector, so it can only pull the circuit low. ↩
If you use a simple resistor-capacitor circuit instead of a current mirror, the capacitor charges more slowly as its voltage increases, resulting in an exponential charging curve. By using a current mirror, the current remains constant so the capacitor charges at a constant rate. The result is a linear triangle wave. ↩
The datasheet calls the VCO's duty cycle control a "pitch control". This is wrong, since the frequency is unaffected. ↩
Note that the envelope is the only analog part of the sound; until the envelope ramps are applied, the output is a digital square wave. ↩

I wrote a short program to generate the Mandelbrot set on the Xerox Alto, a groundbreaking minicomputer from the 1970s. The program, in the obsolete BCPL language, ran very slowly—taking almost exactly an hour—but the result below shows off the Alto's monochrome bitmapped display. (Bitmapped displays were a rarity at the time because memory was so expensive.)

The Xerox Alto took an hour to generate the Mandelbrot set.

The Alto was a revolutionary computer designed at Xerox PARC in 1973 to investigate personal computing. It introduced the GUI, Ethernet and laser printers to the world, among other things. In the photo above, the Alto computer itself is in the lower cabinet. The Diablo disk drive (with the 1970s orange stripe) uses a removable 14 inch disk pack that stores 2.5 megabytes of data. (A bunch of disk packs are visible behind the Alto.) The Alto's display is bitmapped with 606x808 pixels in an unusual portrait orientation, and the optical mouse is next to the display.

Last year Y Combinator received an Alto from computer visionary Alan Kay and I'm helping restore the system, along with Marc Verdiell, Luca Severini and Carl Claunch. My full set of Alto posts is here and Marc's videos are here. I haven't posted an update for a while, but now I can write new programs and download them to the Alto using the Living Computer Museum's Alto file system implementation and gateway to the Alto's 3Mb Ethernet. I decided to start with the Mandelbrot set to take advantage of the Alto's high resolution display.

Marc's latest video shows the Mandelbrot programming running on the Alto.

The Mandelbrot program

The Mandelbrot set algorithm is fairly simple. Each point on the plane represents a complex number c. You repeatedly iterate the complex function f(z) = z^2 + c. If the value diverges to infinity, the point is outside the set. Otherwise, the point is inside the set and the pixel is set to black. Setting the pixel is tricky because the Alto doesn't have a graphics API; you need to determine which bit in memory to set.4

Since the Xerox Alto doesn't support floating point1, I needed a way to represent the numbers with its 16-bit word. I use fixed point arithmetic: 4 bits to the left of the decimal point and 12 bits to the right.2 For instance, the number 1.25 is represented in 16 bits as 1.25*2^12 = 0x1400. These fixed point numbers can be added with standard integer addition. After multiplying two fixed point numbers with integer multiplication, the 32-bit result must be divided by 2^12 (i.e. shifted right by 12) to restore the decimal point location.3

The code (above) is written in BCPL, the main language used on the Alto. BCPL is a precursor to C and many features of C are clearly visible in BCPL: everything from lvalues and rvalues to the ternary operator. You can think of BCPL as C without types; the only BCPL types are 16-bit words along with C-like structs, unions and bitfields. BCPL may look unfamiliar at first, but the code above should be clear if you consider the following syntax differences with C:

Blocks are indicated with [ and ] instead of { and }.
Indexing is with a!1 instead of a[1].
And, Or, and Shift bit operations are &, %, and lshift/rshift.
Variable definitions use let.
Arrays are defined with vec.

More information on BCPL is in the BCPL Reference Manual and my earlier article on using BCPL with the Alto.

The Xerox Alto, a few minutes into generation of the Mandelbrot set.

Why is the Alto so slow?

Running the Mandelbrot set illustrates the amazing improvement in computer speed since the Alto was created in 1973 and the huge changes in computer architecture. On a modern computer, a Javascript program can generate the Mandelbrot set in a fraction of a second, while the Alto took an hour. The first factor is the Alto's slow clock speed of 5.88 MHz, hundreds of times slower than a modern processor. In addition, the Alto doesn't execute machine instructions directly, but uses a relatively inefficient microcode emulator that takes many cycles to perform one machine instruction.

The ALU board from the Xerox Alto. The Alto doesn't use a microprocessor chip, but a CPU built from three boards of integrated circuits.

Unlike modern computers, the Alto doesn't use a microprocessor chip, but instead has a CPU built from three boards full of simple TTL chips. The photo above shows the arithmetic-logic unit (ALU) board, which uses four 4-bit 74181 ALU chips to perform addition, subtraction and logic operations. You can also see the CPU's registers on this board. The Alto doesn't include a hardware multiplier, but must perform multiplication by repeated shifts and adds. Thus, the Alto performs especially poorly on the Mandelbrot set, which is essentially repeated multiplications.

Conclusion

The Mandelbrot set was a quick program to try out the Alto's graphics. Next I'll try some more complex projects on the Alto. If you want to run my code, it's on Github; you can run it on the ContrAlto simulator if you don't have an Alto available.

If you're interested in retrocomputing fractals, I also generated a Mandelbrot on a 50 year old IBM 1401 mainframe The 1401 generated the Mandelbrot set in 12 minutes—not because it's a faster machine than the Alto, but because the resolution on the line printer was very very low.

Mandelbrot generated on the IBM 1401 mainframe.

Notes and references

There is a floating point library (source) for the Alto. I decided not use use it since the integer Mandelbrot was already very slow. But using floating point would make sense if you wanted to zoom in on the Mandelbrot. ↩
Fixed-point arithmetic is a common trick for fast Mandelbrot calculation. ↩
To multiply two 16-bit numbers efficiently, I use the double precision MulFull function (written in Nova assembler) in PressML.asm, part of the Computer History Museum's archived Alto software. ↩
The hardest part of generating the Alto Mandelbrot was figuring out how to configure the display memory and update it correctly. The details on how the display works are in chapter 4 of the Xerox Alto Hardware Manual. To summarize, the display contents are defined by a linked list of display control blocks (DCBs), which define a rectangular region of pixels on the display. A microcode task reads 16 words of pixels from memory at a time and writes them to the display board, which shifts the pixels out to the monitor. Thus, as each scanline is being written to the CRT, the CPU is busily reading the pixels for that line from memory and feeding them to the display, another reason why the Alto is slow.
The Alto's Smalltalk environment has a simple graphics API, but we don't have Smalltalk running yet. ↩

A_i	B_i	P_i	G_i	F_i
0	0	X	Y	F
0	1	X	Y	F
1	0	X	Y	F
1	1	X	Y	F