The 6502 CPU's overflow flag explained at the silicon level

January 12, 2013, 12:10 am

In this article, I show how overflow is computed in the 6502 microprocessor at the transistor and silicon level. I've discussed the mathematics of the 6502 overflow flag earlier and thought it would be interesting to look at the actual chip-level implementation. Even though the overflow flag is a slightly obscure feature, its circuit is simple enough that it can be explained at the silicon level.

The 6502 microprocessor chip

The 6502 is an 8-bit microprocessor that was very popular in the 1970s and 1980s, powering popular home computers such as the Apple II, Commodore PET, and Atari 400/800. The following photograph shows the die of a 6502 processor. Looking at the photograph, it seems impossibly complex, but it turns out that it actually can be understood, using the Visual 6502 group's reverse-engineered 6502. The red box shows the part of the chip that will be explained in this article. The 6502 chip is made up of 4528 transistors (3510 enhancement transistors and 1018 depletion pullup transistors). (By comparison, a modern Xeon processor has over 2.5 billion transistors, which would be almost hopeless to try to understand.)

Photomicrograph of the 6502, from Visual 6502 (CC BY-NC-SA 3.0). The following diagrams zoom in on the red box, where the overflow circuit is located.

As a rough overview of the above photograph, the edge of the die shows the wires going to the pins. Approximately the top fifth of the chip (with the regular rectangular pattern) is the PLA that decodes instructions. The middle third is a bunch of logic, mostly to do additional decoding of instructions. The bottom half has the registers, ALU (arithmetic-logic unit), and main busses. They are all 8 bits, with each bit in a horizontal layer. The high-order bit is at the bottom of the photo, and this is where the overflow logic lies.

The overflow formula

In brief, if an unsigned addition doesn't fit in a byte, the carry flag is set. But if a signed addition doesn't fit in a byte, the overflow flag is set. The 6502 processor computes the overflow bit for addition from the top bits of the two operands (A₇ and B₇), and the carry out of bit 6 into bit 7 (C₆):

V = not (((A₇ NOR B₇) and C₆) NOR ((A₇ NAND B₇) NOR C₆))

For a more detailed explanation of what overflow means, see my previous article or The overflow flag explained.
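As a rough software check of the formula, the following Python sketch evaluates it for every pair of byte operands and compares it against the plain definition of signed overflow (the sum falls outside -128..127). The function names are made up for illustration, and computing C₆ by adding the low seven bits is a modeling shortcut, not a description of how the chip produces the carry.

```python
def nor(a, b):
    return 1 - (a | b)

def nand(a, b):
    return 1 - (a & b)

def overflow_6502(a, b, carry_in):
    """Overflow for a + b + carry_in, using the gate formula above."""
    a7, b7 = (a >> 7) & 1, (b >> 7) & 1
    # C6: carry out of bit 6 into bit 7, found by adding the low seven bits.
    c6 = ((a & 0x7F) + (b & 0x7F) + carry_in) >> 7
    return 1 - nor(nor(a7, b7) & c6, nor(nand(a7, b7), c6))

def to_signed(byte):
    return byte - 256 if byte & 0x80 else byte

def overflow_by_range(a, b, carry_in):
    """Reference definition: the signed sum does not fit in -128..127."""
    total = to_signed(a) + to_signed(b) + carry_in
    return int(not (-128 <= total <= 127))

# Exhaustive comparison over all operands and both carry-in values.
mismatches = [(a, b, c)
              for a in range(256) for b in range(256) for c in (0, 1)
              if overflow_6502(a, b, c) != overflow_by_range(a, b, c)]
assert mismatches == []
```

For example, `overflow_6502(0x7F, 0x01, 0)` is 1, since 127 + 1 does not fit in a signed byte.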

Gate-level implementation

The overflow computation circuit in the 6502 microprocessor.

Described as gates, the actual circuit to generate the overflow flag in the 6502 turns out to be surprisingly simple. It uses the carry out of bit 6, and the top bits of the two arguments A and B. Since the values of NAND(a7, b7) and NOR(a7, b7) are already available in the ALU (Arithmetic-Logic Unit) for other purposes, the actual overflow circuit is simply the three gates on the right. (The ALU is, of course, much more complex than the part shown above.) This circuit can be seen at the bottom of the 6502 schematic (where the inverted overflow value is called FLOW). You might wonder why the circuit uses NAND and NOR gates so heavily; it turns out that these are much easier to implement with transistors than AND and OR gates.

Transistor-level implementation

The transistors that implement the overflow circuit in the 6502 microprocessor. The circuits on the left compute the NAND and NOR of the top bits of A and B. The circuit on the right computes the overflow flag. Based on the remarkable transistor-level schematic of the full 6502 chip, reverse-engineered by Balazs.

The circuit above shows the actual implementation of the overflow circuit in the 6502 using NMOS transistors. The circuit to generate the overflow flag is very simple, requiring just a few transistors to implement the three gates. A, B, and carry are the inputs, and the output #overflow is the complement of the overflow signal.

MOS transistors are fairly easy to understand, since they operate like switches. Most of the transistors are NMOS enhancement mode transistors, which can be considered as switches that close if the gate has a positive input, and are open otherwise. The transistors with a black bar are NMOS depletion mode transistors, which can be considered as pull-up resistors, giving a positive output if nothing else pulls the output low.

The three transistors on the left implement a simple logic gate to compute NAND of A and B. If both inputs A and B are positive, the switches close and connect the output to ground (the horizontal line at the bottom). Otherwise, the pullup transistor connects the output to the positive voltage (circle at the top). Thus, the output is the NAND of A and B - 0 if both inputs are positive, and 1 otherwise.

The next three transistors compute NOR of A and B. If A, B, or both are positive, the associated transistor is switched on and connects the output to ground. Otherwise the output is positive.

The remaining transistors are the actual overflow circuit. The next group of three transistors is a NOR gate, which was described above. It computes the NOR of the carry and the NAND output from the ALU, feeding its output into the final group of four transistors. The four transistors on the right implement an AND gate and NOR gate in a single circuit. If the output from the previous circuit is 1, the rightmost transistor switches on, pulling the output (inverted V) to ground. If both NOR7 and CARRY6 are 1, the two associated transistors switch on, pulling the output to ground. Otherwise, the pullup transistor keeps the output high. The result is the complemented overflow value.
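The switch view of the transistors can be sketched in a few lines of Python. This is only a logical model under a simplifying assumption: each gate output is 0 if any chain of series transistors to ground is fully switched on, and 1 otherwise (the depletion pullup). The helper name `pulled_down` is hypothetical, and no electrical behavior is modeled.

```python
def pulled_down(*paths):
    """Output of an NMOS gate with a depletion pullup.

    Each path is a tuple of transistor gate inputs wired in series to ground.
    If every transistor in some path is on, the output is pulled to 0;
    otherwise the pullup holds it at 1.
    """
    return 0 if any(all(path) for path in paths) else 1

def overflow_circuit(a7, b7, carry6):
    """Return #overflow (the inverted V) for the top bits and carry."""
    nand7 = pulled_down((a7, b7))              # two transistors in series
    nor7 = pulled_down((a7,), (b7,))           # two transistors in parallel
    first = pulled_down((nand7,), (carry6,))   # NOR of NAND7 and CARRY6
    # AND-NOR: NOR7 and CARRY6 in series, in parallel with the first gate.
    return pulled_down((nor7, carry6), (first,))

for a7 in (0, 1):
    for b7 in (0, 1):
        for carry6 in (0, 1):
            v = 1 - overflow_circuit(a7, b7, carry6)
            print(a7, b7, carry6, "V =", v)
```

In this model V comes out as 1 only for inputs (0, 0, 1) and (1, 1, 0), the two cases where the carry into the top bit disagrees with two matching sign bits.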

Going to the silicon

Now that you've seen how the circuit works at the transistor level, the silicon level can be explained.

We'll begin with an (oversimplified) description of how the chip is constructed. The chip starts with the silicon wafer. Regions are diffused with an element such as phosphorus or arsenic, yielding conductive n⁺ diffusion regions. On top of the diffusion layer is a layer of polysilicon "wires". On top of the polysilicon layer is a layer of metal "wires" providing more connections. For our purposes, diffusion regions, polysilicon, and metal can all be considered conductors. In the 6502, the polysilicon connections run roughly vertical, and the metal wires run generally horizontal.

Structure of an NMOS transistor. The n⁺ diffusion regions (yellow) are separated by undiffused silicon (gray). The gate is formed by an insulating oxide layer (red) with a polysilicon line (purple) over it.

To build a transistor, two n⁺ regions are separated by an undiffused region. A thin insulating oxide layer on top forms the transistor gate, which is wired to a polysilicon line. When charge is applied to the gate via the polysilicon line, the two n⁺ regions can conduct.

The following picture zooms in on the base silicon layer in the 6502, showing the region in the red outline. The darker gray regions are n⁺ diffusion areas, which have been doped to be conducting. The white stripes that separate n⁺ regions are the transistor gates, showing the thin insulating oxide layer that switches on and off conduction between the neighboring n⁺ regions. The gray squares are vias, which connect to other layers.

The diffusion layer of the 6502, zoomed in on the overflow circuit. The shaded regions are diffusion regions, and the unshaded regions are undiffused silicon. The white strips show transistor gates. From Visual 6502 (CC BY-NC-SA 3.0).

The next picture shows the polysilicon and metal layers that lie on top of the base silicon. This picture is aligned with the previous one, and you may be able to pick out some of the diffusion layer underneath. The whitish vertical stripes are conductive polysilicon. The greenish metallic-looking horizontal stripes are in fact metal, forming conductors. The gray squares are vias, which connect different layers. Note that the chip is crammed full of conductors, making it hard at first glance to tell what is going on.

Closeup of the 6502 microprocessor die, showing the overflow circuit. From Visual 6502 (CC BY-NC-SA 3.0).

The following picture shows approximately how the transistor-level circuit maps onto the silicon. This circuit is the same as the transistor schematic earlier, just drawn to match the actual layout on the chip. The A, B, and CARRY inputs come from other parts of the chip, and the inverted #OVERFLOW output exits on the right to other destinations.

The final picture explains exactly what is happening at the silicon level. It labels the different layers that take part in the overflow circuit with different colors. The lowest layer is the diffusion layer in yellow. On top of this is the polysilicon layer in purple. The topmost layer of metal is in green. Power (Vcc) and ground are supplied through the metal layer. The crosshatches show transistor gates, formed by polysilicon over insulating oxide. The skinny crosshatched areas are the enhancement transistors used as switches. The blocky crosshatched areas connected to Vcc (positive voltage) are the depletion transistors used as pullups.

The circuit can be understood starting in the upper left. A and B are bit 7 of the A and B values going into the ALU. (A and B come from elsewhere in the processor.) If A and B are both positive, the two upper transistors (vertical crosshatches) will pull the NAND output low. If A or B is positive, one of the two transistors below will pull the NOR output low. The NAND and NOR outputs travel to multiple parts of the ALU through metal, polysilicon, and diffusion "wires", but only the relevant connections are shown.

In the lower left is the first gate of the overflow circuit, computing the NOR of the NAND output and carry (which comes from elsewhere in the chip). The polysilicon line (purple) on the bottom is the output from this gate. In the lower right is the second gate of the overflow circuit, combining the NOR, carry, and output of the first gate. The result is #overflow (i.e. inverted overflow).

You can see this circuit in action in the Visual 6502 simulator. The color scheme in the simulator is different - diffusion is green, yellow, orange, and red. The metal layer is shown in ghosted white, but Vcc and ground are omitted. Polysilicon is in purple, and the transistors are not explicitly shown.

Conclusions

By focusing on a simple circuit, the 6502 microprocessor chip can actually be understood at the silicon level. It's interesting to see how the complex patterns etched on the chip can be mapped onto gates, and their function understood.

More comments on this article are at Hacker News. Thanks for visiting!

↧

Notes on the PLA on the 8085 chip

January 13, 2013, 3:36 pm

The 8085 processor uses a PLA (programmable logic array) to control much of the activity within the processor, such as instruction decoding and controlling the data flow between components of the chip. Pavel Zima has reverse-engineered the transistor-level circuitry of the 8085 microprocessor. I've looked into this a bit more to figure out the architecture of the Programmable Logic Array, which takes up a large fraction of the chip. The PLA circuit is much more complex than the PLA on the 6502, for instance. It turns out that Pavel is ahead of me with information on the decode and timing PLAs, but the information below may still be of interest.

The following diagram shows the arrangement of the PLA on the chip (image from Visual 6502). The PLA has 7 planes, which I have labeled A through G.

The block diagram below shows approximately how the planes are connected. Plane A receives inputs from the instruction circuit. Its outputs are fed into the small plane B, producing outputs that go into the instruction circuit. The outputs from A also are fed into C (through pass transistors).

Planes D and E can be considered the same plane, split apart for better layout. They share 11 input lines, and the remaining inputs are different between D and E. These inputs come from the ALU/register circuits on the left, as well as other parts of the chip. They also receive inputs from G - these inputs are not handled via normal PLA input lines, but are wired through transistors directly to the associated output lines, which makes the layout more compact.

Planes F and G provide outputs through pass transistors to the ALU/register circuits. These outputs probably control the actions and bus activity, but more analysis is needed.

The following diagram shows how the PLA planes are wired to the rest of the chip. Planes D and E in particular receive inputs from many parts of the chip. The outputs from F and G are very short because the displayed wires end at the nearby pass transistors to the left.

The transistors in the PLA

I have diagrams showing where the transistors are in each PLA grid here.

↧

Inside the ALU of the 8085 microprocessor

January 24, 2013, 11:07 pm

The arithmetic-logic unit is a fundamental part of any computer, performing addition, subtraction, and logic operations, but how it works is a mystery to many people. I've reverse-engineered the ALU circuit from the 8085 microprocessor and explain how it works. The 8085's ALU is a surprisingly complex circuit that at first looks like a mysterious jumble of gates, but it can be understood if you don't mind diving into some Boolean logic.

The following diagram shows the location of the ALU in the 8085. The ALU is 8 bits wide, with the high-order bit on the left. The register file is the large block below the ALU. The registers are 16 bits wide, made up of pairs of 8-bit registers. Surprisingly, the register file has the high-order bit on the right, the opposite order from the ALU.

The ALU takes two 8-bit inputs, which I'll call A and X, and performs one of five basic operations: ADD, OR, XOR, AND, and SHIFT-RIGHT. As well, if the input X is inverted, the ALU can perform subtraction and complement operations. You might think SHIFT-LEFT is missing from this list. However, it is simply performed by adding the number to itself, which shifts it to the left one bit in binary. Note that the 8085 arithmetic operations are very basic. There is no multiplication or division operation - these were added in the 8086.
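The claim that adding a number to itself is a left shift is easy to check exhaustively for bytes. The following Python sketch does so; the function name is made up for illustration.

```python
def shift_left_by_add(a):
    """Shift an 8-bit value left by adding it to itself.

    Returns (8-bit result, carry out); the carry is the bit shifted out.
    """
    total = a + a
    return total & 0xFF, total >> 8

# Compare against an ordinary shift for every byte value.
for a in range(256):
    assert shift_left_by_add(a) == ((a << 1) & 0xFF, a >> 7)
```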

The ALU consists of 8 mostly-identical slices, one for each bit. For addition, each slice of the ALU adds the appropriate input bits, computing the sum A + X + carry-in, generating a sum bit and a carry-out bit. That is, each bit of the ALU implements a full adder. The logic operations simply operate on the two input bits: A AND X, A OR X, A XOR X. Shift-right simply outputs the A bit from the slice to the left.

ALU schematic

The following schematic shows one bit of the ALU. The schematic has roughly the same layout as the implementation on the chip, flowing from bottom to top. Eight of these circuits are stacked side-by-side, with the low-order bit on the right. Carries flow from right to left, and bits shifted right flow from left to right.

Negation

Starting at the bottom of the schematic is the complex gate labeled Negation. This gate optionally selects a negated second argument by selecting either XN or /XN. (XN is the Nth bit of the second argument, which I'll call X. The / indicates the complement.) For most of the discussion below I'll assume XN is uncomplemented to keep things simpler.

Operation

Above the complement selector are a few gates labeled Operation that perform the desired 2-input operation. The NAND gate on the left generates either A NAND X or 1 based on the select_op1 control line. The OR gate on the right generates either A OR X or 1, based on the select_op2 control line. Combining these in the NAND gate yields four different possibilities:

select_op1	select_op2	Result
0	0	A NOR X
0	1	0
1	0	A NXOR X
1	1	A AND X

Note that instead of OR and XOR, the complemented value is produced by this circuit. This will be fixed in the next step.
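The table can be checked with a small Python sketch of the three gates. One assumption here is how the control lines enter the gates: select_op1 is modeled as a third input to the NAND gate and select_op2 as a third input to the OR gate, which reproduces the "or 1" behavior described above. The function name is hypothetical.

```python
def operation_block(a, x, select_op1, select_op2):
    """One bit of the Operation gates, with inputs and output as 0 or 1."""
    left = 1 - (a & x & select_op1)   # A NAND X, or 1 when select_op1 is 0
    right = a | x | select_op2        # A OR X, or 1 when select_op2 is 1
    return 1 - (left & right)         # final NAND combines the two

# Expected results from the table, keyed by (select_op1, select_op2).
EXPECTED = {
    (0, 0): lambda a, x: 1 - (a | x),   # A NOR X
    (0, 1): lambda a, x: 0,             # constant 0
    (1, 0): lambda a, x: 1 - (a ^ x),   # A NXOR X
    (1, 1): lambda a, x: a & x,         # A AND X
}

for (op1, op2), expected in EXPECTED.items():
    for a in (0, 1):
        for x in (0, 1):
            assert operation_block(a, x, op1, op2) == expected(a, x)
```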

Combine with carry

Above the operation circuit is the next block of gates labeled Combine with carry that generates the ALU output by merging the carry-in with the operation value via XOR.

To understand this circuit, first consider the following simple XOR circuit, which is used a couple times in the ALU. It can be understood fairly simply: if both inputs are 0 (top) or both inputs are 1 (bottom) then the output is 0.

Ignoring the shift_right circuit for a moment, the block of gates is simply the XOR circuit above. Note that XOR with 0 is a no-op, while XOR with 1 complements the value. And A XOR X XOR CARRY is the low-order bit of adding A, X, and CARRY.

The key point of this circuit is that the incoming carry is generated with the proper value to convert the operation output into the desired final result. The incoming carry /carry(N-1) is either 0, 1, or the complemented carry from bit N-1 as appropriate.

Op	Operation output	Carry	Result
or	A NOR X	1	A OR X
add	A NXOR X	/carry	A XOR X XOR CARRY
xor	A NXOR X	1	A XOR X
and	A AND X	0	A AND X
shift right	0	0	A(N+1)
complement	A NOR /X	1	A OR /X
subtract	A NXOR /X	/carry	A XOR /X XOR CARRY

Note that the carry-in line must have the right value in order to generate the appropriate output. For addition it passes the inverted carry from one bit to the next. For OR and XOR, the line is set to 1. And for AND and SHIFT_RIGHT it is set to 0. As will be seen below, the carry circuitry generates the right value for the right operation.

The final aspect of this circuit is the shift-right circuit. With a 0 op input, 0 carry input, and shift_right set, the output is simply the bit from the left: A(N+1).

Generate carry

The circuit on the left, labeled Generate carry generates the carry out. It can generate three different outputs: 1, 0, or the (complemented) carry from the sum. If select_op2 is set, it will force the carry to 0. Otherwise if force_ncarry_1 is set, it will force the carry to 1. Otherwise, the carry is generated for the sum of A + X + carry-in through straightforward logic: If the carry-in is set, and one of the inputs is set, there will be a carry out. If both input bits are set, there will be a carry out.
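The priority of the three cases can be written out as a short Python sketch. It assumes, following the earlier discussion, that the line between slices carries the complemented carry during addition, so the function takes and returns the value of that line rather than the true carry. The function name is made up for illustration.

```python
def carry_line_out(a, x, ncarry_in, select_op2, force_ncarry_1):
    """Value placed on the carry line leaving one ALU slice (0 or 1)."""
    if select_op2:
        return 0                      # forced to 0 (AND, shift right)
    if force_ncarry_1:
        return 1                      # forced to 1 (OR, XOR, complement)
    carry_in = 1 - ncarry_in          # the line holds the inverted carry
    # Carry out if carry-in and either input is set, or both inputs are set.
    carry_out = (carry_in & (a | x)) | (a & x)
    return 1 - carry_out              # pass it on inverted

# For addition, the result matches the carry of a one-bit full adder.
for a in (0, 1):
    for x in (0, 1):
        for carry_in in (0, 1):
            line = carry_line_out(a, x, 1 - carry_in, 0, 0)
            assert 1 - line == (a + x + carry_in) >> 1
```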

Flags

The 8085 has a parity flag, which is 1 if the number of 1 bits is even, and 0 if the number of 1 bits is odd. The parity flag is generated by XORing all the result bits together (and complementing). Each bit is XORed with the lower-order parity value by the parity circuit near the top of the schematic. The XOR circuit is the same circuit described above.

The zero flag is computed by a simple circuit: each result bit drives a transistor that will pull the zero line low if the bit is set. This forms an 8-input NOR gate, spread across the ALU.
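Both flags are simple enough to sketch in Python. This is a behavioral model only: the loop stands in for the XOR chain running through the slices, and `any` stands in for the transistors that pull the shared zero line low. The function names are made up for illustration.

```python
def parity_flag(result):
    """1 if the 8-bit result has an even number of 1 bits, else 0."""
    parity = 0
    for n in range(8):
        parity ^= (result >> n) & 1   # XOR each bit with the running value
    return 1 - parity                 # complement of the XOR of all bits

def zero_flag(result):
    """8-input NOR: any set bit pulls the zero line low."""
    return 0 if any((result >> n) & 1 for n in range(8)) else 1

assert parity_flag(0b00000011) == 1   # two 1 bits: even
assert parity_flag(0b00000111) == 0   # three 1 bits: odd
assert zero_flag(0) == 1
assert zero_flag(0x10) == 0
```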

The control lines

As seen in the schematic, the 8085 uses multiple control lines to control the activity inside the ALU. In total, the ALU provides 7 different operations and the following table summarizes the control lines that are used for each operation. It also lists the opcodes that use each ALU operation.

Operation	select_neg	select_op1	select_op2	shift_right	force_ncarry_1	Opcodes
or	0	0	0	0	1	ORA
add	0	1	0	0	0	INR,DCR,RLC,DAD,RAL,DAA,ADD,ADC,ADI,ACI
xor	0	1	0	0	1	XRA,XRI
and	0	1	1	0	1	ANA,ANI
shift right	0	0	1	1	1	RRC, RAR
complement	1	0	0	0	1	CMA
subtract	1	1	0	0	0	SUB,SBB,SUI,SBI,CMP,CPI

The ALU control lines are generated from the opcode by the programmable logic array. Specifically, they are outputs from PLA F, which is to the right of the ALU. More details are in my article on the PLA. The ALU has additional control lines to set up the registers, initialize the carry bits, and set the flags. These control the differences between different op codes, beyond the categories above. I will explain those in a future article.
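Putting the pieces together, the following Python sketch models all eight slices driven by the control-line table. It is a behavioral approximation with several assumptions that are not from the reverse-engineered circuit: the forced carry value is applied to the line entering every slice (including bit 0), the caller supplies the initial carry (1 for a plain subtraction), and the hypothetical `shift_in` parameter provides the bit shifted into the top slice.

```python
# (select_neg, select_op1, select_op2, shift_right, force_ncarry_1),
# copied from the control-line table above.
CONTROL = {
    "or":          (0, 0, 0, 0, 1),
    "add":         (0, 1, 0, 0, 0),
    "xor":         (0, 1, 0, 0, 1),
    "and":         (0, 1, 1, 0, 1),
    "shift right": (0, 0, 1, 1, 1),
    "complement":  (1, 0, 0, 0, 1),
    "subtract":    (1, 1, 0, 0, 0),
}

def alu(op, a, x, carry_in=0, shift_in=0):
    """Return the 8-bit result of one ALU operation, slice by slice."""
    select_neg, select_op1, select_op2, shift_right, force_ncarry_1 = CONTROL[op]
    line = 1 - carry_in                    # complemented carry into bit 0
    result = 0
    for n in range(8):                     # low-order slice first
        an = (a >> n) & 1
        xn = ((x >> n) & 1) ^ select_neg   # Negation: optionally use /XN
        # Carry line seen by this slice: forced to 0, forced to 1, or /carry.
        if select_op2:
            line_in = 0
        elif force_ncarry_1:
            line_in = 1
        else:
            line_in = line
        # Operation: NAND of (A NAND X or 1) with (A OR X or 1).
        op_out = 1 - ((1 - (an & xn & select_op1)) & (an | xn | select_op2))
        # Shift right takes the A bit from the next higher slice.
        a_next = (a >> (n + 1)) & 1 if n < 7 else shift_in
        bit = (op_out ^ line_in) | (shift_right & a_next)
        result |= bit << n
        # Generate carry: pass the inverted carry to the next slice.
        carry = ((1 - line_in) & (an | xn)) | (an & xn)
        line = 1 - carry
    return result

print(hex(alu("add", 0x3C, 0x0F)))          # 0x3C + 0x0F
print(hex(alu("subtract", 0x3C, 0x0F, 1)))  # 0x3C - 0x0F, carry-in of 1
print(hex(alu("shift right", 0x81, 0)))     # 0x81 shifted right one bit
```

Only the result byte is modeled here; the flags and the per-opcode setup of the registers and carry are left out.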

Reverse-engineering the ALU

This information is based on the 8085 reverse-engineering done by the Visual 6502 team. This team dissolves chips in acid to remove the packaging and then takes many close-up photographs of the die inside. Pavel Zima converted these photographs into mask layer images, generated a transistor net from the layers, and wrote a transistor-level 8085 simulator.

I took the transistor net and used it to figure out how the ALU works. First, I converted the transistor net into gates. Next I figured out which gates are part of the ALU and put them into a schematic. Then I examined how the circuit worked for different operations and eventually figured out how it works.

Conclusion

The ALU of the 8085 is an interesting circuit. At first it seemed like an incomprehensible pile of gates with mysterious control lines, but after some investigation I figured it out. The 8085 ALU is implemented very differently from the 6502's ALU (which I'll write up later). The 6502's ALU uses fairly straightforward circuits to generate the SUM, AND, XOR, OR, and SHIFT values in parallel, and then uses a simple pass-transistor multiplexor to pick the desired operation. This is in contrast to the 8085 ALU, which generates only the desired value.

↧

Silicon reverse engineering: The 8085's undocumented flags

February 12, 2013, 10:16 pm

The 8085 microprocessor has two undocumented status flags: V and K. These flags can be reverse-engineered by looking at the silicon of the chip, and their function turns out to be different from previous explanations. In addition, the implementation of these flags shows that they were deliberately implemented, which raises the question of why they were not documented or supported by Intel. Finally, examining how these flag circuits were implemented in silicon provides an interesting look at how microprocessors are physically implemented.

Like most microprocessors, the 8085 has a flag register that holds status information on the results of an operation. The flag register is 8 bits: bit 0 holds the carry flag, bit 2 holds the parity, bit 3 is always 0, bit 4 holds the half-carry, bit 6 holds the zero status, and bit 7 holds the sign. But what about the missing bits: 1 and 5?

Back in 1979, users of the 8085 determined that these flag bits had real functions.[1] Bit 1 is a signed-number overflow flag, called V, indicating that the result of a signed add or subtract won't fit in a byte.[2] Bit 5 of the flag is poorly understood and has been given the names K, X5, or UI. For an increment/decrement operation it simply indicates 16-bit overflow or underflow. But it has a totally different value for arithmetic operations. The flag has been described[1][3] as:

K =  O1·O2 + O1·R + O2·R, where:
O1 = sign of operand 1
O2 = sign of operand 2
R = sign of result
For subtraction and comparisons, replace O2 with complement of O2.

As I will show, that published description is mistaken. The K flag actually is the V flag exclusive-ored with the sign of the result. And the purpose of the K flag is to compare signed numbers.
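For reference, the flag register layout described earlier can be written as a small lookup table. This Python sketch uses made-up key names and a hypothetical `decode_flags` helper; only the bit positions come from the text.

```python
# Bit position of each flag in the 8085 flag register (bit 3 is always 0).
FLAG_BITS = {
    "carry": 0,
    "V": 1,           # undocumented signed-overflow flag
    "parity": 2,
    "half_carry": 4,
    "K": 5,           # undocumented flag, also called X5 or UI
    "zero": 6,
    "sign": 7,
}

def decode_flags(flag_byte):
    """Split a flag register byte into a dict of named single-bit values."""
    return {name: (flag_byte >> bit) & 1 for name, bit in FLAG_BITS.items()}

print(decode_flags(0b10100010))   # sign, K and V set in this example byte
```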

The circuit for the K and V flags

The following schematic shows the reverse-engineered circuit for the K and V flags in the 8085. The V flag is simply the exclusive-or of the carry into the top bit and the carry out of the top bit. This is a standard formula for computing overflow[2] for signed addition and subtraction. (The 6502 computes the same overflow value through different logic.) The V flag has values for other arithmetic operations, but the values aren't useful.[4] A latch stores the value of the V flag. The computed V value is stored in the latch under the control of a store_v_flag control signal. Alternatively, the flag value can be read off the bus and stored in the latch under the control of the bus_to_flags control signal; this is how the POP PSW instruction, which pops the flags from the stack, is implemented. Finally, a tri-state superbuffer (the large triangle) writes the flag value to the bus when needed.

The K flag circuitry is on the right. The first function of the K flag is overflow/underflow for an INX/DCX instruction. This is implemented simply: the carry_to_k_flag control line sets the K flag according to the carry from the incrementer/decrementer. The next function of the K flag is reading from the databus for the POP PSW instruction, which is the same as for the V flag. The final function of the K flag is the result of a signed comparison. The K flag is the exclusive-or of the V flag and the sign bit of the result. For subtraction and comparison, the K flag is 1 if the second value is larger than the first.[5] The K flag is set for other arithmetic operations, but doesn't have a useful value except for signed comparison and subtraction.[4]

The circuit in the 8085 for the undocumented V and K flags. The flags are generated from the carries and results from the ALU. The K flag can also be set by the carry from the incrementer/decrementer.

One mystery was the purpose of the K flag: "It does not resemble any normal flag bit."[1] Its use for increment and decrement is clear, but for arithmetic operations why would you want the exclusive-or of the overflow and sign? It turns out that the K flag is useful for signed comparisons. If you're comparing two signed values, the first is smaller if the exclusive-or of the sign and overflow is 1.[6] This is exactly what the K flag computes.
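The signed-comparison property can be checked exhaustively with a short Python sketch. It models subtraction as A plus the complement of B plus 1, which is an assumption about the arithmetic rather than a trace of the chip's control signals, and the function names are made up for illustration.

```python
def to_signed(byte):
    return byte - 256 if byte & 0x80 else byte

def subtract_flags(a, b):
    """Return (result, sign, V, K) for the 8-bit subtraction a - b."""
    nb = ~b & 0xFF                       # complement of the second operand
    carry_into_top = ((a & 0x7F) + (nb & 0x7F) + 1) >> 7
    total = a + nb + 1
    carry_out_of_top = total >> 8
    result = total & 0xFF
    sign = result >> 7
    v = carry_into_top ^ carry_out_of_top   # overflow: the two carries differ
    k = v ^ sign                            # K: overflow XOR sign
    return result, sign, v, k

# K is 1 exactly when the first value is smaller as a signed number,
# and V is 1 exactly when the signed difference does not fit in a byte.
for a in range(256):
    for b in range(256):
        _, _, v, k = subtract_flags(a, b)
        difference = to_signed(a) - to_signed(b)
        assert k == int(to_signed(a) < to_signed(b))
        assert v == int(not (-128 <= difference <= 127))
```

A case that exercises the overflow path is 127 compared with -1: the result byte has its sign bit set, but V is also set, so K is 0 and the comparison correctly reports that 127 is not smaller.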

From the circuit above, it is clear that the V and K flags were deliberately added to the chip. (This is in contrast to the 6502, where undocumented opcodes have arbitrary results due to how the circuitry just happens to work for unexpected inputs.[7]) Why would Intel add the above circuitry to the chip and then not document or support it? My theory is that Intel decided they didn't want to support K or (8-bit) V flags in the 8086, so in order to make the 8086 source-compatible with the 8085, they dropped those flags from the 8085 documentation, but the circuitry remained in the chip.

The silicon

The 8085 microprocessor showing the data bus, ALU, flag logic, registers, and incrementer/decrementer.

The remainder of this article will show how the V and K flag circuits work, diving all the way down to the silicon circuits. The above image of the 8085 chip shows the layout of the chip and the components that are important to the discussion. In the upper left of the chip is the ALU (arithmetic-logic unit), where computations happen (details). The data bus is the main interconnect in the chip, connecting the data pins (upper left), the ALU, the data registers, the flag register, and the instruction decoding (upper right). In the lower left of the chip is the 16-bit register file. Underneath the register file is a 16-bit increment/decrement circuit which handles incrementing the program counter, as well as supporting 16-bit increment and decrement instructions. The increment/decrement circuit has a carry-out in the lower right corner - this will be important for the discussion of the K flag. For some reason, the ALU has the low-order bit on the right, while the registers have the low-order bit on the left.

The flag logic circuitry sits underneath the ALU, with high-current drivers right on top of the data bus. The flags are arranged in apparently-random order with bit 7 (sign) on the left and bit 6 (zero) on the right. Because the carry logic is much more complicated (handling not only arithmetic operations but shifts and rotates, carry complement, and decimal adjust), the carry logic is stuck off to the right of the ALU where there was enough room.

Zooming in

Next we will zoom in on the V flag circuitry, labeled V1 above. Looking at the die under a microscope shows the metal layer of the chip, consisting of mostly-horizontal metal interconnects, which are the white lines below. The bottom part of the chip has the 8-bit data bus. Other wires are the VCC power supply, ground, and a variety of signals. While modern processors can have ten or more metal layers, the 8085 only has a single layer. Some of the circuitry underneath the metal is visible.

The metal layer of the 8085 microprocessor, zoomed in on the V flag circuit.

If the metal is removed from the chip, the silicon layer becomes visible. The blotchy green/purple is plain silicon. The pink regions are N-type doped silicon. The grayish regions are polysilicon, which can be considered as simply conductive wires. When polysilicon crosses doped silicon, it forms a transistor, which appears light green in this image. Note that transistors form a fairly small portion of the chip; there is a lot more connection and wiring than actual transistors. The small squares are vias, connections to the metal layer.

The V flag circuit in the 8085 CPU. This is the silicon/polysilicon after the metal layer has been removed. The data bus is not visible as it is in the metal layer, but it is in the lower third of the image. The rectangles at the bottom connect the data bus to the registers.

MOSFET transistors

For this discussion, a MOSFET can be considered simply a switch that closes if the gate input is 1 and opens if the gate input is 0. A MOSFET transistor is implemented by separating two diffusion regions, and putting a polysilicon wire over the gate. An insulating layer prevents any current from flowing between the gate and the rest of the transistor. In the following diagram, the n+ diffusion regions are pink, the polysilicon gate conductor is dull green, and the insulating oxide layer is turquoise.

NOR gate

The NOR gate is a fundamental building block in the 8085, since it is a very simple gate that can form more complex logic. A NOR gate is implemented through two transistors and a pullup transistor. If either input (or both) is 1, the corresponding transistor connects the output to ground. Otherwise, the transistors are open, and the pullup pulls the output high. The pullup is shown as a resistor in the schematic, but it is actually a type of transistor called a depletion-mode transistor for better performance.

By zooming in to a single NOR gate in the 8085, we can see how the gate is actually implemented. One surprise is that the circuit is almost all wiring; the transistors form a very small part of the circuit. The two transistors are connected to ground on the left, and tied together on the right. The pullup transistor is much larger than the other transistors for technical reasons.[8]

To understand the circuit, trace the path from ground to each transistor, across the gate, and to the output. In this way you can see there are two paths from ground to the output, and if either input is 1 the output will be 0.

The layout of the gate is intended to be as efficient as possible, given the constraints of where the power (VCC), ground, and other connections are, yielding a layout that looks a bit unusual. The power, ground, and input signals are all in the metal layer above (not shown here), and are connected to this circuit through vias between the metal and the silicon below.

A NOR gate in the 8085 microprocessor, showing the components. If either input is high, the associated transistor will connect the output to ground. Otherwise the pullup transistor will pull the output high.

Exclusive-or gates

The exclusive-or circuit (which outputs a 1 if exactly one input is 1) is a key component of the flag circuitry, and illustrates how more complex logic can be formed out of simpler gates. The schematic below shows how the exclusive-or is built from a NOR gate and an AND-NOR gate; it is straightforward to verify that if both inputs are 0 or both inputs are 1, the output will be 0.

You may wonder why the 8085 uses so many "strange" gates such as a combined AND-NOR, instead of "normal" gates like AND. The transistor-level schematic shows that an AND-NOR gate can actually be implemented very simply with MOSFETs, in fact simpler than a plain AND gate. The two rightmost transistors form the "AND" - if they both have 1 inputs, they connect the output to ground. The transistor to the left forms the other part of the NOR - if it has a 1 input, it pulls the output to ground.

The following diagram shows an XOR circuit in the 8085 that matches the schematic above. (This is the XOR gate that generates the K flag.) On the left is the NOR gate discussed above, and on the right is the AND-NOR circuit, both outlined with a dotted line. As before, the circuit is mostly wiring, with the transistors forming a small part of the circuit (the green regions between pink diffusion regions).

An XOR gate in the 8085 microprocessor, formed from a NOR gate and an AND-NOR gate. If both inputs are 0, the NOR gate output will be 1, and the NOR transistor will pull the output to 0. If both inputs are 1, the AND transistors will pull the output to 0. Otherwise the pullup transistor will pull the output to 1.

The flag latch

Each flag bit is stored in a simple latch circuit made up of two inverters. To store a 1, the inverter on the right outputs a 0, which is fed into the inverter on the left, which outputs a 1, which is fed back to the inverter on the right. A zero is stored in a similar (but opposite) manner. When the clock input is low, the pass transistor opens, breaking the feedback loop, and new data can be written into the latch. The complemented output (/out) is taken from the inverter.

You might wonder why the latch doesn't lose its data whenever the clock goes low. There's an interesting trick here called dynamic logic. Because the gate of a MOSFET consists of an insulating layer it has very high resistance. Thus, any electrical charge on the gate will remain there for some time[9] when the pass transistor opens. When the pass transistor closes, the charge is refreshed.

The latch used in the 8085 to store a flag value. The latch uses two inverters to store the data. When the clock is low, a new value can be written to the latch.

The following part of the 8085 chip shows the implementation of the latch for the V flag. The circuit closely matches the schematic above. The two inverters are outlined with dotted lines. The red arrows show the flow of data through the circuit. As before, the wiring and pullup transistors take up most of the silicon real estate.

Each flag in the 8085 uses a two-inverter latch to store the flag. This shows the latch for the undocumented V flag. The red arrows show the flow of data.

Driving the data bus with a superbuffer

Another interesting feature of the flag circuit is the "superbuffer". Most transistors in the 8085 only send a signal a short distance. However, to send a signal on the data bus across the whole chip takes a lot more power, so a superbuffer is used. In the superbuffer, one transistor is driven to pull the output low, while a second transistor is driven to pull the output high. (This is in contrast to a regular gate, which uses a depletion-mode pullup transistor to pull the output high.) In addition, these transistors are considerably larger, to provide more current.[8] These two transistors are shown at the bottom of the schematic below.

The other feature of this superbuffer is that it is tri-state. In addition to a 0 or 1 output, it has a third state, which basically consists of providing no output. This way, the flags do not affect the data bus except when desired. In the schematic, it can be seen that if the control input is 1, both NOR gates will output 0, and both transistors will do nothing.

The superbuffer used in the 8085 to drive the data bus.
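
The tri-state behavior can be illustrated with a small Python model. This is a sketch under an assumption: one NOR gate is taken to receive the flag value and the other its complement, since the text only states that both NOR gates output 0 when the control input is 1. The names `superbuffer`, `pull_low` and `pull_high` are invented for the example, and `None` stands for the "no output" state.

```python
def nor(a, b):
    """NOR gate on 0/1 inputs."""
    return int(not (a or b))


def superbuffer(value, control):
    """Return 1 or 0 when driving the bus, or None when the output is disabled.

    Assumption: which NOR gate sees the true value and which sees the
    complement is a guess made for illustration.
    """
    pull_low = nor(control, value)        # gate of the transistor that pulls the bus low
    pull_high = nor(control, 1 - value)   # gate of the transistor that pulls the bus high
    if pull_high:
        return 1
    if pull_low:
        return 0
    return None  # neither transistor is on, so the bus is left alone


for control in (0, 1):
    for value in (0, 1):
        print(control, value, superbuffer(value, control))
```

With the control input at 1, neither drive transistor turns on regardless of the flag value, which is what lets many circuits share the same bus.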

The following diagram shows the two drive transistors, as well as the line used to read the flag from the data bus. (The NOR gates are not shown.) Note the size of these transistors compared to transistors seen earlier. Each flag bit requires a superbuffer such as this. Even flag bit 3, which is always 0, requires a large transistor to drive the 0 onto the bus - it's surprising that a do-nothing flag still takes up a fair bit of silicon.

Each flag in the 8085 uses a superbuffer to drive the value onto the data bus. This figure shows the two large transistors that drive the V flag onto bit 1 of the data bus.

Putting it all together

The above discussion has shown the details of the XOR gate that computes the K flag, and the latch and superbuffer for the V flag. The following diagram shows how these pieces fit into the overall circuitry. The latch and driver for the K flag are outside this image, to the right. The circuits below are tied together by the metal layer, which isn't shown. Compare this diagram with the schematic at the top of the article to see how the components are implemented. The two XOR circuits look totally different, since their layouts have been optimized to fit with the signals they need.

The 8085 circuits to implement the undocumented V and K flags. The ALU provides /carry6, /carry7, and result7. The XOR circuit on the left generates V, and the XOR circuit in the middle generates K. On the right are the latch for the V flag, and the superbuffer that outputs the flag to the data bus. The K flag latch and superbuffer are to the right, not shown.

By looking at the silicon chip carefully, the transistors, gates, and complex circuits start to make sense. It's amazing to think that the complex computers we use are built out of these simple components. Of course, processors now are way more complex than the 8085, with billions of transistors instead of thousands, but the basic principles are still the same.

If you found this discussion interesting, check out my earlier analysis of the 6502's overflow flag and the 8085's ALU. You may also be interested in the book The Elements of Computing Systems, which describes how to build a computer starting with Boolean logic.

Credits

The chip images are from visual6502.org. The visual6502 team did the hard work of dissolving chips in acid to remove the packaging and then taking many close-up photographs of the die inside. Pavel Zima converted these photographs into mask layer images, a transistor net, and an 8085 simulator.

Notes and references

[1] The undocumented instructions and flags of the 8085 were discovered by Wolfgang Dehnhardt and Villy M. Sorensen in the process of writing an 8085 assembler, and were written up in the article Unspecified 8085 op codes enhance programming, Engineer's Notebook, "Electronics" magazine, Jan 18, 1979 p 144-145.

[2] See my article The 6502 overflow flag explained mathematically for details on overflow. There are multiple ways of computing overflow, and the 6502 uses a different technique.

[3] Tundra Semiconductor sold the CA80C85B, a CMOS version of the 8085. Interestingly, the undocumented opcodes and flags are described in the datasheet for this part: CA80C85B datasheet, 8000-series components.

The interesting thing about the Tundra datasheet is the descriptions of the "new" flags and instructions are copied almost exactly from Dehnhardt's article except for the introduction of errors, missing parentheses, and renaming the K flag as UI. In addition, as I described earlier, the published K/UI flag formula doesn't always work. Thus, it appears that despite manufacturing the chip, Tundra didn't actually know how these circuits worked.

[4] The V flag makes sense for signed addition and subtraction, and the K flag makes sense for signed subtraction and comparison. Many other operations affect these flags, but the flags may not have any useful meaning.

The V flag is 0 for RRC, RAR, AND, OR, and XOR operations, since these operations have constant carry values inside the ALU (details). The RLC and RAL operations add the accumulator to itself, so they can be treated the same as addition: V is set if the signed result is too big for a byte. The V flag for DAA can also be understood in terms of the underlying addition: V will only be set if the top digit goes from 7 to 8. However, since BCD digits are unsigned, V has no useful meaning with DAA. DAD is an interesting case, since the V flag indicates 16-bit signed overflow; it is actually computed from the result of the high-order addition. For INR, the only overflow case is going from 0x7f to 0x80 (127 to -128); note that going from 0xff to 0x00 corresponds to -1 to 0, which is not signed overflow even though it is unsigned overflow. Likewise, DCR sets the V flag going from 0x80 to 0x7f (-128 to 127); going from 0x00 to 0xff (0 to -1) is not signed overflow.

The K flag has a few special cases. For AND, OR, and XOR, the K flag is the same as the sign, since the V flag is 0. Note that the K flag is computed entirely differently for INR/DCR compared to INX/DCX. For INR and DCR, the K flag is S^V, which almost always is S. The K flag is set for DAA if S^V is true, which doesn't have any useful meaning since BCD values are unsigned.

The published formula for the K flag gives the wrong value for XOR if both arguments are negative.

[5] The following table illustrates the 8 possible cases when comparing signed numbers A and B. The inputs are the top bit of A, the top bit of B, and the carry from bit 6 when subtracting B from A. The outputs are the carry, borrow (complement of carry), sign, overflow, and K flags. An example is given for each row. Note that the K flag is set if A is less than B when treated as signed numbers.

Inputs			Outputs					Example
A₇	B₇	C₆	C	B	S	V	K	Hex	Signed comparison
0	1	0	0	1	0	0	0	0x50 - 0xf0 = 0x60	80 - -16 = 96
0	1	1	0	1	1	1	0	0x50 - 0xb0 = 0xa0	80 - -80 = -96
0	0	0	0	1	1	0	1	0x50 - 0x70 = 0xe0	80 - 112 = -32
0	0	1	1	0	0	0	0	0x50 - 0x30 = 0x120	80 - 48 = 32
1	1	0	0	1	1	0	1	0xd0 - 0xf0 = 0xe0	-48 - -16 = -32
1	1	1	1	0	0	0	0	0xd0 - 0xb0 = 0x120	-48 - -80 = 32
1	0	0	1	0	0	1	1	0xd0 - 0x70 = 0x160	-48 - 112 = 96
1	0	1	1	0	1	0	1	0xd0 - 0x30 = 0x1a0	-48 - 48 = -96
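
The rows of this table can be reproduced with a short Python sketch. It assumes subtraction is performed by adding the complement of B plus 1, takes V as the XOR of the carries out of bits 6 and 7, and takes K as S XOR V; the function names are illustrative and not taken from any 8085 tool.

```python
def sub_flags(a, b):
    """Compute the table's outputs for the 8-bit subtraction a - b."""
    nb = ~b & 0xFF                              # one's complement of B
    c6 = ((a & 0x7F) + (nb & 0x7F) + 1) >> 7    # carry out of bit 6
    total = a + nb + 1                          # 9-bit sum, as in the Hex column
    c = total >> 8                              # carry out of bit 7
    s = (total >> 7) & 1                        # sign bit of the 8-bit result
    v = c6 ^ c                                  # overflow: the two carries differ
    k = s ^ v                                   # K flag
    return {"C6": c6, "C": c, "B": 1 - c, "S": s, "V": v, "K": k}


def signed(x):
    """Interpret an 8-bit value as a two's complement number."""
    return x - 256 if x & 0x80 else x


# The eight example rows from the table.
for a, b in [(0x50, 0xF0), (0x50, 0xB0), (0x50, 0x70), (0x50, 0x30),
             (0xD0, 0xF0), (0xD0, 0xB0), (0xD0, 0x70), (0xD0, 0x30)]:
    print(f"{a:#04x} - {b:#04x}:", sub_flags(a, b), signed(a) < signed(b))
```

Running the same function over all 65536 input pairs is a quick way to confirm that K matches the signed less-than comparison in every case, not just in the eight examples.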

[6] A detailed explanation of signed comparisons is given in Beyond 8-bit Unsigned Comparisons by Bruce Clark, section 5. While this article is in the context of the 6502, the discussion applies equally to the 8085.

[7] The illegal opcodes in the 6502 are discussed in detail in How MOS 6502 Illegal Opcodes really work. In the 6502, the operations performed by illegal opcodes are unintended, just chance based on what the chip logic happens to do with unexpected inputs. In contrast, the undocumented opcodes in the 8085, like the undocumented flags, are deliberately implemented.

[8] The key parameter in the performance of a MOSFET transistor is the width to length ratio of the gate. Oversimplifying slightly, the current provided by the transistors is proportional to this ratio. (Width is the width of the source or drain, and length is the length across the gate from source to drain.) For an inverter, the W/L ratio of the pullup should be approximately 1/4 the W/L ratio of the input transistor for best performance. (See Introduction to VLSI Systems, Mead, Conway, p 8.) The result is that pullup transistors are big and blocky compared to pulldown transistors. Another consequence is that high-current transistors in a superbuffer have a very wide gate. The 8085 register file has some transistors where the W/L ratios are carefully configured so one transistor will "win" over the other if both are on at the same time. (This is why the 8085 simulator is more complex than the 6502 simulator, needing to take transistor sizes into account.)

[9] One effect of using pass-transistor dynamic buffers is that if the clock speed is too low, the charge will eventually drain away, causing data loss. As a result the 8085 has a minimum clock speed of 500 kHz. Likewise, the 6502 has a minimum clock speed. The Z-80 in contrast is designed with static logic, so it has no minimum clock speed - the clock can be stepped as slowly as desired.

8085 instruction set: the octal table

February 23, 2013, 10:46 am

The instruction set of the 8085 microprocessor has an underlying structure that becomes much clearer if expressed in an octal-based table, rather than the usual hexadecimal-based table:

	\0_0	\0_1	\0_2	\0_3	\0_4	\0_5	\0_6	\0_7	\1_0	\1_1	\1_2	\1_3	\1_4	\1_5	\1_6	\1_7
\00_	NOP	LXI B,d16	STAX B	INX B	INR B	DCR B	MVI B,d8	RLC	MOV B,B	MOV B,C	MOV B,D	MOV B,E	MOV B,H	MOV B,L	MOV B,M	MOV B,A
\01_	dsub	DAD B	LDAX B	DCX B	INR C	DCR C	MVI C,d8	RRC	MOV C,B	MOV C,C	MOV C,D	MOV C,E	MOV C,H	MOV C,L	MOV C,M	MOV C,A
\02_	arhl	LXI D,d16	STAX D	INX D	INR D	DCR D	MVI D,d8	RAL	MOV D,B	MOV D,C	MOV D,D	MOV D,E	MOV D,H	MOV D,L	MOV D,M	MOV D,A
\03_	rdel	DAD D	LDAX D	DCX D	INR E	DCR E	MVI E,d8	RAR	MOV E,B	MOV E,C	MOV E,D	MOV E,E	MOV E,H	MOV E,L	MOV E,M	MOV E,A
\04_	RIM	LXI H,d16	SHLD a16	INX H	INR H	DCR H	MVI H,d8	DAA	MOV H,B	MOV H,C	MOV H,D	MOV H,E	MOV H,H	MOV H,L	MOV H,M	MOV H,A
\05_	ldhi r8	DAD H	LHLD a16	DCX H	INR L	DCR L	MVI L,d8	CMA	MOV L,B	MOV L,C	MOV L,D	MOV L,E	MOV L,H	MOV L,L	MOV L,M	MOV L,A
\06_	SIM	LXI SP,d16	STA a16	INX SP	INR M	DCR M	MVI M,d8	STC	MOV M,B	MOV M,C	MOV M,D	MOV M,E	MOV M,H	MOV M,L	HLT	MOV M,A
\07_	ldsi r8	DAD SP	LDA a16	DCX SP	INR A	DCR A	MVI A,d8	CMC	MOV A,B	MOV A,C	MOV A,D	MOV A,E	MOV A,H	MOV A,L	MOV A,M	MOV A,A
\20_	ADD B	ADD C	ADD D	ADD E	ADD H	ADD L	ADD M	ADD A	RNZ	POP B	JNZ a16	JMP a16	CNZ a16	PUSH B	ADI d8	RST 0
\21_	ADC B	ADC C	ADC D	ADC E	ADC H	ADC L	ADC M	ADC A	RZ	RET	JZ a16	rstv	CZ a16	CALL a16	ACI d8	RST 1
\22_	SUB B	SUB C	SUB D	SUB E	SUB H	SUB L	SUB M	SUB A	RNC	POP D	JNC a16	OUT d8	CNC a16	PUSH D	SUI d8	RST 2
\23_	SBB B	SBB C	SBB D	SBB E	SBB H	SBB L	SBB M	SBB A	RC	shlx	JC a16	IN d8	CC a16	jnk a16	SBI d8	RST 3
\24_	ANA B	ANA C	ANA D	ANA E	ANA H	ANA L	ANA M	ANA A	RPO	POP H	JPO a16	XTHL	CPO a16	PUSH H	ANI d8	RST 4
\25_	XRA B	XRA C	XRA D	XRA E	XRA H	XRA L	XRA M	XRA A	RPE	PCHL	JPE a16	XCHG	CPE a16	lhlx	XRI d8	RST 5
\26_	ORA B	ORA C	ORA D	ORA E	ORA H	ORA L	ORA M	ORA A	RP	POP PSW	JP a16	DI	CP a16	PUSH PSW	ORI d8	RST 6
\27_	CMP B	CMP C	CMP D	CMP E	CMP H	CMP L	CMP M	CMP A	RM	SPHL	JM a16	EI	CM a16	jk a16	CPI d8	RST 7

The large-scale structure of the instruction set is by quadrant (i.e. the top two bits): MOV instructions in the pink quadrant, arithmetic instructions in the cyan quadrant, increment, decrement, rotates in the yellow quadrant, and control flow (jump, call, return, push, pop, rst) in the purple quadrant. It's not totally regular, of course. Some instructions are wedged in where they can fit, for example the spot where memory-to-memory move (MOV M, M) would go is replaced by HLT.

Note how registers are controlled by an octal digit in the sequence B, C, D, E, H, L, M, and A. This is especially notable for the MOV instructions and arithmetic instructions. For instructions acting on register pairs, the structure is similar: BC, BC, DE, DE, HL, HL, SP, SP.

Although octal is unpopular now, early microprocessors were designed with octal in mind, using groups of three bits to select registers and operations. Now hexadecimal is popular, but when the opcodes are displayed in a hex-based table, the underlying structure of the instructions is obscured.
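
To make the octal structure concrete, here is a minimal Python sketch that splits an opcode into its three octal digits. The helper name `octal_digits` and the `QUADRANTS` labels are my own shorthand for the quadrants described above; the opcode values are standard 8085 encodings.

```python
# Short labels for the four quadrants, indexed by the top octal digit.
QUADRANTS = ["increment/decrement/rotate", "MOV", "arithmetic", "control flow"]


def octal_digits(opcode):
    """Split an 8-bit opcode into its octal digits (2 bits, 3 bits, 3 bits)."""
    return (opcode >> 6) & 0o3, (opcode >> 3) & 0o7, opcode & 0o7


# MOV B,C  MOV C,D  MOV D,E  MOV A,A  ADD B  JMP
for opcode in (0x41, 0x4A, 0x53, 0x7F, 0x80, 0xC3):
    top, mid, low = octal_digits(opcode)
    print(f"hex {opcode:02X} = octal {opcode:03o}: "
          f"quadrant {top} ({QUADRANTS[top]}), digits {mid} and {low}")
```

In hex, the MOV opcodes 41, 4A and 53 show no obvious pattern, while in octal they read 101, 112 and 123, with the destination and source registers each stepping by one.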

Note that the four blocks have been arranged for ease of display - strictly speaking they should be stacked vertically rather than a 2x2 grid. The table includes undocumented instructions, which are shown in lower case. Credits: original data from pastraiser.com 8085 instruction table.

How the 8085 decodes instructions internally

The 8085 uses a set of PLAs to decode and process instructions. In the first step of processing an instruction the instruction decode ROM (details) decodes the instruction into one of 48 different instruction groups. The grid below is colored according to the instruction group (0 through 47).

Each cell shows the instruction followed by its instruction group number in parentheses.

	\0_0	\0_1	\0_2	\0_3	\0_4	\0_5	\0_6	\0_7	\1_0	\1_1	\1_2	\1_3	\1_4	\1_5	\1_6	\1_7
\00_	NOP	LXI B,d16 (42)	STAX B (40)	INX B (36)	INR B (38)	DCR B (38)	MVI B,d8 (14)	RLC (25)	MOV B,B (45)	MOV B,C (45)	MOV B,D (45)	MOV B,E (45)	MOV B,H (45)	MOV B,L (45)	MOV B,M (44)	MOV B,A (45)
\01_	dsub (21)	DAD B (20)	LDAX B (41)	DCX B (37)	INR C (38)	DCR C (38)	MVI C,d8 (14)	RRC (25)	MOV C,B (45)	MOV C,C (45)	MOV C,D (45)	MOV C,E (45)	MOV C,H (45)	MOV C,L (45)	MOV C,M (44)	MOV C,A (45)
\02_	arhl (24)	LXI D,d16 (42)	STAX D (40)	INX D (36)	INR D (38)	DCR D (38)	MVI D,d8 (14)	RAL (25)	MOV D,B (45)	MOV D,C (45)	MOV D,D (45)	MOV D,E (45)	MOV D,H (45)	MOV D,L (45)	MOV D,M (44)	MOV D,A (45)
\03_	rdel (22)	DAD D (20)	LDAX D (41)	DCX D (37)	INR E (38)	DCR E (38)	MVI E,d8 (14)	RAR (25)	MOV E,B (45)	MOV E,C (45)	MOV E,D (45)	MOV E,E (45)	MOV E,H (45)	MOV E,L (45)	MOV E,M (44)	MOV E,A (45)
\04_	RIM (3)	LXI H,d16 (42)	SHLD a16 (12)	INX H (36)	INR H (38)	DCR H (38)	MVI H,d8 (14)	DAA (6)	MOV H,B (45)	MOV H,C (45)	MOV H,D (45)	MOV H,E (45)	MOV H,H (45)	MOV H,L (45)	MOV H,M (44)	MOV H,A (45)
\05_	ldhi r8 (23)	DAD H (20)	LHLD a16 (13)	DCX H (37)	INR L (38)	DCR L (38)	MVI L,d8 (14)	CMA (6)	MOV L,B (45)	MOV L,C (45)	MOV L,D (45)	MOV L,E (45)	MOV L,H (45)	MOV L,L (45)	MOV L,M (44)	MOV L,A (45)
\06_	SIM (3)	LXI SP,d16 (42)	STA a16 (8)	INX SP (36)	INR M (39)	DCR M (39)	MVI M,d8 (16)	STC (6)	MOV M,B (43)	MOV M,C (43)	MOV M,D (43)	MOV M,E (43)	MOV M,H (43)	MOV M,L (43)	HLT (47)	MOV M,A (43)
\07_	ldsi r8 (23)	DAD SP (20)	LDA a16 (9)	DCX SP (37)	INR A (38)	DCR A (38)	MVI A,d8 (14)	CMC (6)	MOV A,B (45)	MOV A,C (45)	MOV A,D (45)	MOV A,E (45)	MOV A,H (45)	MOV A,L (45)	MOV A,M (44)	MOV A,A (45)
\20_	ADD B (1)	ADD C (1)	ADD D (1)	ADD E (1)	ADD H (1)	ADD L (1)	ADD M (4)	ADD A (1)	RNZ (19)	POP B (27)	JNZ a16 (29)	JMP a16 (30)	CNZ a16 (33)	PUSH B (26)	ADI d8 (2)	RST 0 (5)
\21_	ADC B (1)	ADC C (1)	ADC D (1)	ADC E (1)	ADC H (1)	ADC L (1)	ADC M (4)	ADC A (1)	RZ (19)	RET (18)	JZ a16 (29)	rstv (7)	CZ a16 (33)	CALL a16 (34)	ACI d8 (2)	RST 1 (5)
\22_	SUB B (1)	SUB C (1)	SUB D (1)	SUB E (1)	SUB H (1)	SUB L (1)	SUB M (4)	SUB A (1)	RNC (19)	POP D (27)	JNC a16 (29)	OUT d8 (17)	CNC a16 (33)	PUSH D (26)	SUI d8 (2)	RST 2 (5)
\23_	SBB B (1)	SBB C (1)	SBB D (1)	SBB E (1)	SBB H (1)	SBB L (1)	SBB M (4)	SBB A (1)	RC (19)	shlx (10)	JC a16 (29)	IN d8 (15)	CC a16 (33)	jnk a16 (31)	SBI d8 (2)	RST 3 (5)
\24_	ANA B (1)	ANA C (1)	ANA D (1)	ANA E (1)	ANA H (1)	ANA L (1)	ANA M (4)	ANA A (1)	RPO (19)	POP H (27)	JPO a16 (29)	XTHL (35)	CPO a16 (33)	PUSH H (26)	ANI d8 (2)	RST 4 (5)
\25_	XRA B (1)	XRA C (1)	XRA D (1)	XRA E (1)	XRA H (1)	XRA L (1)	XRA M (4)	XRA A (1)	RPE (19)	PCHL (32)	JPE a16 (29)	XCHG (46)	CPE a16 (33)	lhlx (11)	XRI d8 (2)	RST 5 (5)
\26_	ORA B (1)	ORA C (1)	ORA D (1)	ORA E (1)	ORA H (1)	ORA L (1)	ORA M (4)	ORA A (1)	RP (19)	POP PSW (27)	JP a16 (29)	DI (0)	CP a16 (33)	PUSH PSW (26)	ORI d8 (2)	RST 6 (5)
\27_	CMP B (1)	CMP C (1)	CMP D (1)	CMP E (1)	CMP H (1)	CMP L (1)	CMP M (4)	CMP A (1)	RM (19)	SPHL (28)	JM a16 (29)	EI (0)	CM a16 (33)	jk a16 (31)	CPI d8 (2)	RST 7 (5)

Colors by iWantHue

The internal decoding shown above reveals a few interesting things. The NOP instruction is literally no operation - it doesn't get decoded into any instruction group. The MOV instructions are all decoded together, except for the memory operations. Similarly, the arithmetic instructions are all grouped together, except for the memory instructions. There are other smaller groups (e.g. INR/DCR, conditional jumps, conditional calls, returns), and 21 instructions that are handled uniquely (e.g. CALL, PCHL, XCHG, HLT, and 6 undocumented instructions). Surprisingly, DAA, CMA, STC, and CMC are handled together at this stage, despite having very different actions.

The 8085's register file reverse engineered

March 2, 2013, 3:35 pm

On the surface, a microprocessor's registers seem like simple storage, but not in the 8085 microprocessor. Reverse-engineering the 8085 reveals many interesting tricks that make the registers fast and compact. The picture below shows that the registers and associated control circuitry occupy a large fraction of the chip, so efficiency is important. Each bit is implemented with a surprisingly compact circuit. The instruction set is designed to make register accesses efficient. An indirection trick allows quick register exchanges. Many register operations use the unexpected but efficient data path of going through the ALU.

While the 8085's register complement is tiny compared to current processors, it has a solid register set by 1977 standards - about twice as many registers as the 6502. The 8085 has a 16-bit program counter, a 16-bit stack pointer, 16-bit BC, DE, and HL register pairs, and the 8-bit accumulator. The 8085 also has little-known hidden registers that are invisible to the programmer but used internally: the WZ register pair, and two 8-bit registers for the ALU: ACT and TMP.

Photograph of the 8085 chip showing components relevant to register operations.

The register file is in the lower left quadrant of the chip. It contains the 6 register pairs and associated circuitry. Underneath the registers is the 16-bit address latch and increment/decrement circuit. The register file is controlled by a set of control lines on the right, which are driven by register control logic circuits and the register control PLA. The current instruction is loaded into the instruction register (upper right) via the data bus. In the upper left is the 8-bit arithmetic-logic unit (ALU), with the accumulator and two temporary registers (ACT and TMP).

The 8085 has only 40 pins (visible around the edge of the image) to communicate with the outside world, a tiny number compared to current microprocessors with more than 1000 pins. For memory accesses, the 8085 reads or writes 8 bits of data using a 16-bit memory address (for a maximum of 64K of memory). In the image above, memory addresses flow through the 16-bit address bus (abus), while data flows through the chip over the 8-bit data bus (dbus). The 8 A pins handle half of the address, while the 8 AD pins are used both for the other half of the address and for data (at different times). This frees up pins for other uses, but makes computers using the 8085 slightly more complicated. In comparison, the 6502 is more straightforward, with separate pins for address and data.

Overall architecture of the register file

The diagram below shows the implementation of the 8085 register file in the same layout as on the actual chip. The 8-bit data bus is at the top, and the 16-bit address bus is at the bottom. The register control lines are on the right.

In the middle are the registers, arranged as pairs of 8-bit registers. Note that the registers are arranged "backwards" with the high-order bit on the right and the low-order bit on the left. The 16-bit program counter and stack pointer are first. Next is the WZ temporary register, and underneath it the BC register pair. The HL and DE register pairs are at the bottom - these registers do not have fixed locations, but can swap roles during execution. A 16-bit register bus (regbus) provides access to the registers.

Underneath the registers is the address latch, which holds a 16-bit value that is written to the address bus. This value is also the input to the 16-bit increment/decrement circuit. The output of the incrementer/decrementer can be written back to the registers.

The triangles indicate tri-state buffers, basically switches that control the flow of data. Buffers containing a + are amplifiers to boost the weak signals from the registers. Buffers containing an S are superbuffers, which provide extra current to send data across the long data bus.

Architecture diagram of the 8085 register file, as it is implemented on the chip. The register file is connected to the data bus at top, and address bus at bottom. The control lines are along the right.

The picture below zooms in on the chip image above, showing the register file in detail. The components in silicon exactly map onto the diagram above. Note the repeated patterns for the 16-bit circuits. The large transistors used as high-current drivers are clearly visible. The transistors in each bit of register storage are much smaller.

A closeup of the 8085 microprocessor, showing the details of the register file and the locations of the major components.

Storing bits in the register file

The implementation of the 8085 registers is unusual in several ways. The registers don't have explicit read and write modes; instead the register will be overwritten if there is a stronger signal on the bus. Instead of having a bus with one wire for each bit, the 8085 uses a sort of differential bus, with two wires for each bit: one wire transmits the value, and the other transmits the complement of the value.

Each bit consists of two inverters in a feedback loop, with pass transistors to connect the inverters to the bus. An unusual feature of this is the lack of any circuit to break the feedback loop when modifying the register (unlike the 6502). Instead, the 8085 uses a "might makes right" technique - if a stronger signal is written to the bus, it will overwrite a register connected to the bus. The transistors driving the register bus are about twice as large as the transistors in the inverters, so they can forcibly overwrite the inverter loop.

One consequence of this register implementation is that a register can't be copied directly to another register, since there's nothing to distinguish the source register from the destination register - each register could potentially damage the other's bits. To get around this, the 8085 uses an interesting trick - copies are actually done through the ALU, as will be explained later.

One bit of a register in the 8085 register file. Each bit is stored in two inverters in a feedback loop. The register bus uses two lines of opposite polarity for each bit. Access to the register is controlled by the reg_rw control line, which connects the inverters to the bus, allowing the value to be read or written.

The image below zooms in on the chip closer, showing the silicon for six individual register bits. The schematic for one bit is overlaid, as are some of the metal lines providing power, ground, and the register bus. Each bit consists of two transistors for the inverters, two depletion pullup transistors for the inverters (shown as resistors), and two pass transistors connecting the bit to the register bus. The pink regions are transistors, with the green strips the gates (details).

Detail of the 8085 chip showing six bits in the 8085's register file. Bit 2 of the stack pointer is shown with schematic. The two transistors form two inverters in a feedback loop. The light blue lines are the metal layer wires connected to bit 2. The program counter is in the upper half of the image.

To read a register, an amplifier circuit is used to boost the signal from the differential register bus to write it to the dbus or address latch. I assume this is a tradeoff to make the register file smaller. Each inverter pair can be made as small as possible, but then requires amplification to produce a signal strong enough for use elsewhere in the chip. The amplification circuit that drives the data bus is more complex than I'd expect, probably because of the extra power to drive the bus (details and schematic).

The incrementer/decrementer

The 16-bit incrementer/decrementer at the bottom of the register file is used for multiple purposes. It increments the program counter as instructions execute, increments and decrements the stack pointer as needed, and supports the 16-bit increment and decrement instructions.

An interesting feature of the incrementer is it also supports incrementing by 2, which is used to quickly skip over the two byte address in a call or jump not taken. This allows these operations to complete faster on the 8085 than the 8080.

Two bits of the 16-bit increment/decrement circuit in the 8085. Odd bits and even bits use a different circuit for efficiency. The carry out from even bits is complemented.

The incrementer/decrementer is implemented by a chain of adders with ripple carry - the carry from each bit flows into the adder for the next bit. (The above schematic shows two bits, and is repeated 8 times in the full circuit.) The DREG_INC and DREG_DEC control lines select increment or decrement. One performance trick is that alternating bits are implemented with different circuits and the carry out of even bits is inverted. This avoids the inverters that would otherwise be needed to flip the carry back to its regular state. This saves space, but even more importantly it speeds up carry propagation. Because the carry has to propagate bit-by-bit through all 16 bits to generate the final result, adding an inverter to each bit would slow it down significantly. The carry out is used to compute the undocumented K flag value (details).
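
The alternating-polarity idea can be sketched in Python. This is a behavioral model only: the choice of a NAND in even bits and a NOR in odd bits is an assumption made to illustrate how a complemented carry can be passed along without extra inverters, and it is not a transcription of the 8085 schematic. The function name `inc_dec` is invented for the example.

```python
def nand(a, b):
    return 1 - (a & b)


def nor(a, b):
    return 1 - (a | b)


def inc_dec(value, decrement=False, width=16):
    """Ripple increment or decrement; width must be even.

    Returns (result, carry_out). Even bits take a true carry in and emit a
    complemented carry; odd bits take the complemented carry and emit a true one.
    """
    dec = int(decrement)
    result = 0
    carry = 1  # adding (or subtracting) 1 is a carry (or borrow) into bit 0
    for i in range(0, width, 2):
        even = (value >> i) & 1
        odd = (value >> (i + 1)) & 1
        # A carry propagates through 1 bits; a borrow propagates through 0 bits.
        p_even = even ^ dec
        p_odd = odd ^ dec
        result |= (even ^ carry) << i
        ncarry = nand(p_even, carry)                  # complemented carry out of the even bit
        result |= (1 - (odd ^ ncarry)) << (i + 1)     # XNOR absorbs the inversion
        carry = nor(1 - p_odd, ncarry)                # true carry out of the odd bit
    return result, carry


print(inc_dec(0x00FF))                   # (256, 0)
print(inc_dec(0x0100, decrement=True))   # (255, 0)
```

Each carry still passes through one gate per bit, but no stage spends an additional gate delay restoring the polarity.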

In comparison, the 6502 has a 16-bit incrementer (no decrement) used exclusively by the program counter. To reduce the carry propagation time, this incrementer uses a carry-skip. That is, the carry out of the low-order byte is immediately generated and fed into the high-order byte. Thus the carries only need to propagate through 8 bits, the two bytes working in parallel. (The carry is easily generated by ANDing together the low-order bits. If they are all 1, there will be a carry into the high-order byte.)
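
As a rough illustration of the carry-skip idea, the following Python sketch increments the two bytes independently, with the carry into the high byte computed directly from the low byte. It models only the arithmetic, not the 6502 circuit, and `increment_pc` is a name chosen for the example.

```python
def increment_pc(pc):
    """Increment a 16-bit value using a carry-skip between the two bytes."""
    low = pc & 0xFF
    high = (pc >> 8) & 0xFF
    # ANDing the eight low-order bits together: the result is 1 only if all are 1.
    carry_into_high = int(low == 0xFF)
    new_low = (low + 1) & 0xFF
    new_high = (high + carry_into_high) & 0xFF   # does not wait for the low byte's ripple
    return (new_high << 8) | new_low


print(hex(increment_pc(0x12FF)))   # 0x1300
```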

The WZ Temporary registers

The WZ register pair in the 8085 is used for temporary storage, but is invisible to the programmer. Internally, the WZ register pair is implemented like the other register pairs.

The primary use of WZ is to hold operands from a two or three byte instruction until they can be used. The WZ registers are used to hold 16-bit addresses for LDA, STA, LHLD, JMP, CALL, and RST instructions. The registers hold the port for IN and OUT. The WZ register pair can also temporarily hold information read from memory. The registers hold the address popped off the stack for RET. For XTHL, the registers hold the value from the stack.

Register decoding and the instruction set

The instruction set of the 8085 is organized so an instruction can be quickly and easily decoded to determine the register to use. The underlying structure for most 8085 instructions is the octal bit pattern bbDDDSSS, where destination bits DDD and/or source bits SSS select the register usage. The move (MOV) instructions follow this structure, using both fields. Other instructions (e.g. INR) use just the DDD bits to select the register, while math instructions use just the three SSS bits. Some instructions operate on register pairs, so they don't use the lowest bit of the field. This instruction pattern is visible if the instructions are arranged in an instruction table according to their octal values.

The three bits select the register as follows:

D₂D₁D₀	Register
000	B
001	C
010	D
011	E
100	H
101	L
110	M
111	A

M indicates a memory operation and is treated as a pseudo-register in the instruction set. Some instructions (e.g. INX) use the top two bits to select a register pair: BC, DE, HL, or "special" (stack pointer or accumulator). Note that in the table above the low-order bit selects a register out of a register pair.
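
The field extraction described above can be sketched in a few lines of Python. The helper names are hypothetical, and the fourth register pair is simply labeled "special" as in the text, since whether it means the stack pointer or the accumulator depends on the instruction.

```python
REGISTERS = ["B", "C", "D", "E", "H", "L", "M", "A"]
PAIRS = ["BC", "DE", "HL", "special"]


def fields(opcode):
    """Return the (DDD, SSS) fields of an opcode of the form bbDDDSSS."""
    return (opcode >> 3) & 0o7, opcode & 0o7


def decode_mov(opcode):
    """Decode a MOV instruction (top bits 01) into assembly text."""
    if opcode >> 6 != 0b01 or opcode == 0o166:   # octal 166 is HLT, not MOV M,M
        raise ValueError("not a MOV opcode")
    ddd, sss = fields(opcode)
    return f"MOV {REGISTERS[ddd]},{REGISTERS[sss]}"


def register_pair(opcode):
    """Select a register pair from the top two bits of the DDD field."""
    ddd, _ = fields(opcode)
    return PAIRS[ddd >> 1]


print(decode_mov(0x78))       # MOV A,B
print(register_pair(0x23))    # HL  (0x23 is INX H)
```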

This instruction set structure allows simple logic to control the registers. A multiplexer pulls out the right group of three bits, depending on the instruction and the cycle in the instruction (link to schematic). These three bits are then used to pick the specific register control lines to activate at each step.

The registers are controlled by about 18 control lines that affect the movement of data and the operation of the incrementer/decrementer. The following table summarizes the control lines.

`/RREG_RD`	Reads the right-hand side register bus onto the data bus. This implements the multiplexing of 16-bit registers onto the 8-bit data bus.
`/LREG_RD`	Reads the left-hand side register bus onto the data bus.
`LREG_WR`	Writes the data bus to the left-hand side register bus. This implements the demultiplexing of the 8-bit data bus to the 16-bit registers.
`RREG_WR`	Writes the data bus to the right-hand side register bus.
`REG_PC_RW`	Connects the PC to the register bus.
`REG_SP_RW`	Connects the SP to the register bus.
`REG_WZ_RW`	Connects the WZ register pair to the register bus.
`REG_BC_RW`	Connects the BC register pair to the register bus.
`REG_HL_RW`	Connects the HL (DE) register pair to the register bus.
`REG_DE_RW`	Connects the DE (HL) register pair to the register bus.
`DREG_WR`	Writes the output of the incrementer/decrementer to the register bus.
`DREG_RD`	Reads the register bus into the address latch.
`/DREG_RD`	Inverted DREG_RD.
`DREG_DEC`	Incrementer/decrementer performs decrement.
`DREG_INC`	Incrementer/decrementer performs increment.
`CARRY_OUT`	The carry/borrow out from the incrementer/decrementer.
`DREG_CNT`	Increment/decrement by 1.
`DREG_CNT2`	Increment/decrement by 2.

The first step in register control is the register control PLA, which generates 19 control signals based on the instruction type and the cycle step. The register control logic (between the register file and the PLA) mixes in the register selection bits as appropriate (and a few other inputs) to generate the register control lines listed above.

For instance, the REG_BC_RW control line is activated if the PLA indicates a register access and the register bits are 00x. The RREG_RD control line is activated for a single-register read instruction if the register bits are xx1, and LREG_RD is activated if the bits are xx0. Both control lines are activated at the same time if the PLA indicates a register pair read.

The DE/HL exchange trick

The XCHG instruction exchanges the contents of the HL register pair with the contents of the DE register pair in a single M-cycle. You might wonder how the registers can be exchanged so quickly. It turns out that this instruction is implemented with a trick - an extra level of indirection.

Although most 8085 architecture diagrams label one register pair as DE and another as HL, this isn't exactly true. In fact, the 8085 has two register pairs and either one can be the DE or HL pair. A status flip flop keeps track of which pair is DE and which is HL. As Pavel Zima figured out, the XCHG instruction doesn't move any data; it simply toggles the flip flop. The data remains in the same place, but the DE register is now HL and vice versa. Thus, the XCHG instruction is completed quickly. The consequence is every use of DE or HL uses this flip flop to determine which register to access (link to schematic).
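The indirection can be modeled in a few lines of Python. This is only a software analogy of the idea, not a description of the actual circuit: the class name, the `physical` list, and the `swapped` flag are all invented for the illustration.

```python
class SwappablePairs:
    """Two physical 16-bit register pairs plus a flip-flop that says which one is DE."""

    def __init__(self):
        self.physical = [0, 0]   # the two physical register pairs
        self.swapped = False     # flip-flop: False means pair 0 is DE and pair 1 is HL

    def _index(self, name: str) -> int:
        base = {"DE": 0, "HL": 1}[name]
        return base ^ int(self.swapped)   # every access consults the flip-flop

    def read(self, name: str) -> int:
        return self.physical[self._index(name)]

    def write(self, name: str, value: int) -> None:
        self.physical[self._index(name)] = value & 0xFFFF

    def xchg(self) -> None:
        self.swapped = not self.swapped   # no data moves, only the flag toggles


regs = SwappablePairs()
regs.write("DE", 0x1234)
regs.write("HL", 0xABCD)
regs.xchg()
print(hex(regs.read("DE")), hex(regs.read("HL")))  # 0xabcd 0x1234
```

The cost of the trick is visible in `_index`: every DE or HL access has to go through the flag, just as every use of DE or HL in the chip goes through the flip-flop.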

Using the ALU to move registers

You wouldn't expect the ALU (arithmetic-logic unit) to take part in a register-to-register move, but it happens in the 8085. Many register operations take advantage of the ALU's temporary registers.

The ALU doesn't directly operate on the accumulator and input register. Instead, the accumulator is copied to the ACT (Accumulator Temporary) register and the other input is copied to the TMP register. This way, the result can be written to the accumulator without the race condition that would occur if the accumulator were an input and output at the same time.

For register moves, the source value is copied to the TMP register, the ACT register is set to 0, and the ALU performs an OR operation (ALU details), writing the result (i.e. the source value) to the dbus. This result can then be stored to the register file during a later cycle.
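The reason this works is simply that OR with zero leaves a value unchanged. A minimal Python sketch of the data path described above (the function names are hypothetical, and the real ALU is of course not implemented this way):

```python
def alu_or(act: int, tmp: int) -> int:
    """8-bit OR of the two ALU input registers."""
    return (act | tmp) & 0xFF


def move_via_alu(source_value: int) -> int:
    """Model a register move: source -> TMP, ACT = 0, ALU OR -> dbus."""
    tmp = source_value & 0xFF   # source register is copied to TMP
    act = 0                     # ACT is set to 0
    return alu_or(act, tmp)     # result placed on the dbus equals the source


print(hex(move_via_alu(0x5A)))  # 0x5a
```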

The register file in action

The step-by-step operation of the register file is surprisingly complex. One complication is that the register file and buses must handle stepping the program counter, fetching the instruction, and performing any register moves, without interference. A second complication is that register moves go through the ALU as described above.

Stepping through an operation in detail will show the complexity of the register operations. The following shows the data flow for a MOV B,E instruction, which copies the contents of the E register into the B register.

To understand this table, a bit of background on 8085 instruction timing is needed. An instruction cycle is broken down into one or more M (machine) cycles, where an 8-bit memory access can be done in one M cycle. Each M cycle is broken down into several T-states, where each T-state corresponds to one clock cycle. Each clock cycle has a low phase and a high phase.

The single-byte register-to-register MOV instruction takes one M cycle (M1), 4 T cycles, or 8 clock phases. Each clock phase is a separate line in the table. To make things more complicated, the activity for an instruction isn't entirely within its own instruction cycle. To improve performance, the 8085 uses simple pipelining, where the M1 opcode fetch of the next instruction overlaps with completion of the previous instruction.
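To make the "4 T cycles, or 8 clock phases" arithmetic concrete, this small Python sketch enumerates the phases of one M cycle using the same T-state/phase notation as the table below. The function name is made up for the example.

```python
def clock_phases(t_states: int = 4) -> list[str]:
    """Label each clock phase of one M cycle as 'T<n>/<phase>'."""
    return [f"T{t}/{phase}" for t in range(1, t_states + 1) for phase in (0, 1)]


phases = clock_phases()
print(len(phases))  # 8
print(phases)
# ['T1/0', 'T1/1', 'T2/0', 'T2/1', 'T3/0', 'T3/1', 'T4/0', 'T4/1']
```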

The MOV B, E instruction (which copies the E register to the B register) is illustrated in the table below. The PC is copied to the incrementer latch at the end of the previous operation, and then is written to the address pins during the T1 cycle. The PC is updated with the incremented value at the end of the T2 cycle.

The instruction opcode is fetched in the T3 cycle, and at this point execution can start on the instruction. It's not until the T1 cycle of the next instruction that the register file swings into action. The E register is written to the dbus at the end of the T1 cycle. Then the ALU's TMP register is loaded from the dbus. The ALU's other argument, the ACT register, is 0 at this point, and the ALU is configured to perform an OR operation. At the end of the (next instruction's) T3 cycle, the result of the ALU operation (i.e. the E register) is stored in the B register via the dbus. Meanwhile, the next instruction is getting fetched (grayed out).

Cycle	T/clock	PC action	Register action
	T4/0
	T4/1	PC → inc latch
M1 opcode fetch	T1/0	inc latch → address pins
	T1/1	inc latch → address pins
	T2/0
	T2/1	inc → PC
	T3/0		data pins → dbus → instruction reg
	T3/1
	T4/0
	T4/1	PC → inc latch
M1 opcode fetch	T1/0	inc latch → address pins
	T1/1	inc latch → address pins	E reg → dbus
	T2/0		dbus → TMP reg
	T2/1	inc → PC
	T3/0		data pins → dbus → instruction reg
	T3/1		ALU → dbus → B reg

Each step in the table above is activated by the appropriate register control lines. For instance, in T2/1, the PC is updated by triggering the `REG_PC_RW` and `DREG_WR` lines.

Conclusion

The 8085 has a complex register set, and it uses some interesting tricks to reduce the size of the chip and to optimize some operations. The register set is much harder to understand than I expected, but with careful examination it reveals its secrets.

Credits: The chip images are from visual6502.org. The visual6502 team did the hard work of dissolving chips in acid to remove the packaging and then taking many close-up photographs of the die inside. Pavel Zima converted these photographs into mask layer images, a transistor net, an 8085 simulator, and register file schematics (top, bottom).

See discussion at Hacker News. Thanks for visiting!

↧

Wealth distribution in the United States

March 5, 2013, 12:09 am

≫ Next: Tenma 72-7740 multimeter: review and teardown

≪ Previous: The 8085's register file reverse engineered

Today's Forbes billionaires list inspired me to visualize the wealth inequality in the United States. Using the Forbes list and other sources, I've created a graph that shows wealth distribution in the United States. It turns out that if you put Bill Gates on a linear graph of wealth, pretty much the entire US population is crammed into a one-pixel bar around 0.

This graph shows the wealth distribution in red. Note that the visible red line is one pixel wide and disappears everywhere else - this is the key point: essentially the entire US population is in that first bar. The graph is drawn with the scale of 1 pixel = $100 million in the X axis, and 1 pixel = 1 million people in the Y axis. Away from the origin, the red line is invisible - less than 1/1000 of a pixel tall since so few people have more than $100 million. It's striking just how much money Bill Gates has; even $100 million is negligible in comparison.

Since the median US household wealth is about $100,000, half the population is crammed into a microscopic red line 1/1000 of a pixel wide. (The line would be narrower than the wavelength of light so it would be literally invisible). And it turns out the 1-pixel-wide red line isn't just the "99%", but the 99.999%. I hypothesize this is why even many millionaires don't feel rich.
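The 1/1000-of-a-pixel figure is just the graph's X-axis scale applied to the median. A quick Python check, using only the two numbers given in the text (the constant and function names are made up):

```python
DOLLARS_PER_PIXEL = 100_000_000     # X-axis scale from the text: 1 pixel = $100 million
MEDIAN_HOUSEHOLD_WEALTH = 100_000   # approximate median from the text


def wealth_to_pixels(dollars: float) -> float:
    """Horizontal position on the graph, in pixels from the origin."""
    return dollars / DOLLARS_PER_PIXEL


print(wealth_to_pixels(MEDIAN_HOUSEHOLD_WEALTH))  # 0.001
print(wealth_to_pixels(1_000_000))                # 0.01, so a millionaire is still deep inside the first pixel
```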

Wealth inequality among billionaires

Much has been written about inequality in the US between the rich and the poor, but it turns out there is also huge inequality among the ranks of billionaires. Looking at the 1.9 trillion dollars held by US billionaires, it turns out that the top 20% of billionaires have 59% of this wealth, while the bottom 20% of billionaires have less than 6%. So even among billionaires, most of the money is skewed to the top. (I originally pointed this out in Forbes in 1998, and the billionaire inequality has grown slightly since then.)

Sources

The billionaire data is from Forbes billionaires list 2013. Median wealth is from Wikipedia. Also Measuring the Top 1% by Wealth, Not Income and More millionaires despite tough times. Wealth data has a lot of sources of error including people vs households, what gets counted, and changing time periods, but I've tried to make this graph as accurate as possible. I should also mention that wealth and income are two very different things; this post looks strictly at wealth.

↧

Tenma 72-7740 multimeter: review and teardown

April 15, 2013, 11:19 pm

≫ Next: Teardown and exploration of Apple's Magsafe connector

≪ Previous: Wealth distribution in the United States

The Tenma 72-7740 digital multimeter is a multimeter in the $70 price range. Overall, it's a nice, solidly built meter and it has performed well for me. I received this DMM from Newark element14 for review; in this article I describe its functionality followed by a teardown.

What you get

What comes in the box with the Tenma 72-7740 DMM: temperature probe, battery, alligator clips, and probes.

The DMM comes with a temperature probe, battery, alligator clips, and test probes. Note that the test probes have very short metal tips, unlike the long tips on most probes. The alligator clip probes are a nice addition. My biggest complaint with the DMM is that the temperature probe's connections are soldered with no strain relief, so I worry the wires will break off.

The DMM also comes with a pocket-sized 36 page operating manual - a real, physical manual on paper, not a PDF file like most products these days. The DMM doesn't really need a manual - functions work pretty much as you'd expect - but it's nice to have the manual.

Specifications

The DMM is autoranging with maximum reading of 3999. It is full-size (177mm × 85mm × 40mm), not a pocket DMM, and has a built-in stand. The LCD display is large and clear and has a backlight, which is nice if I ever end up using the meter in the dark. It has 10MΩ input impedance and maximum voltages of 1000V DC and 750V AC. The top current range is 10A.

The DMM also includes capacitance, diode, temperature, frequency, and duty cycle measurements. The top capacitance range (100µF) can take up to 15 seconds to get a measurement, so be patient with those big electrolytics. The lowest capacitance range is 40 nF with 10pF resolution claimed. The DMM also has a continuity buzzer, although I find the sound crackles a bit. The temperature readings are only in °C; I know using Fahrenheit makes me a bad person, but that's what I need to check my appliances. The temperature range is -40°C to 1000°C. The upper range is hotter than I need, but since I sometimes go outside below -40°C my multimeter should be able to handle it too.
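The claimed 10pF resolution is what you would expect from the display: a meter with a maximum reading of 3999 can show 4000 distinct values per range, so the smallest step is the range divided by 4000. A small Python sketch of that arithmetic (names are invented for the example, and this says nothing about real-world accuracy at that resolution):

```python
DISPLAY_COUNTS = 4000  # a 3999-maximum display distinguishes 4000 values (0..3999)


def resolution_pf(range_pf: float) -> float:
    """Smallest displayable step, in picofarads, for a range given in picofarads."""
    return range_pf / DISPLAY_COUNTS


print(resolution_pf(40_000))  # 40 nF range -> 10.0 pF per count
print(resolution_pf(4_000))   # a 4 nF range would give 1.0 pF per count
```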

Buttons provide hold and relative mode. The meter goes into sleep mode after 30 minutes.

I don't have the equipment to measure the accuracy of the DMM myself, so I'm going off the published values. The specification for DC voltage accuracy is a reasonable ±0.8%; the considerably more expensive Fluke 177 has ±0.09% accuracy, so you get what you pay for.

The function knob has 7 positions: V, resistance/capacitance/diode/continuity, Hertz, °C, µA, mA, and A. The blue function button switches between AC and DC or switches among resistance, capacitance, diode, and continuity.

There are a few functions found in more advanced multimeters that aren't found here: min/max measurement, RS-232 support, °F, a 4nF capacitance scale, and an analog bar graph.

For full specifications, see the specification chart.

Tenma 72-7740 digital multimeter measuring 60Hz line frequency

Teardown

Of course I was interested in what was inside the multimeter and opened it up. The instruction manual describes how to remove the screws under the feet. The force required to pry the case apart made me a little nervous, but it snapped open without breaking anything. Note that the case must also be opened in this way to replace the fuses - they are not accessible from the battery compartment. Unfortunately I tend to blow fuses a lot measuring charger performance, but this may motivate me to be more careful.

A foil shield covers most of the circuit board, with holes for some adjustments. Near the bottom is a thick wavy wire, which is the precision resistor used for the high-current measurements, a fraction of an ohm. There's nothing particularly interesting directly under the foil shield; almost all the components are on the other side.

Removing the circuit board and flipping it over shows the circuitry. The large LCD display is at the top, with the pushbuttons below. The most visually striking part of the board is the round circuitry for the function knob, which I will explain in more detail below. To the left are three precision (blue) resistors for mA and µA measurement. Below are 5 diodes which I believe are for input protection. The large black cylinder in the lower right appears to be a spark gap to protect the input from high voltages - the DMM is rated to 1000V DC overload protection. Below it is a large yellow PTC resistor to protect from input overloads. The 8-pin IC is an STMicroelectronics TL062C low power J-FET dual op amp.

The circuit board for the Tenma 72-7740 DMM.

Underneath the LCD is the 100-pin controller IC and a bunch of SMD components. The Semico CS7721CN chip powers the Tenma 72-7740 DMM. I wasn't expecting that a DMM chip would need 100 pins, but that seems to be common. I couldn't find a datasheet for this specific chip, but datasheets for other similar chips (such as the Fortune FS9721 and Cyrustek ES51982) give an idea of how digital multimeters work. The chip has signal inputs for the different functions (voltage, current, resistance, frequency, capacitance, etc.). The four blue precision resistors below the chip divide the input by powers of 10 as appropriate. Six mode pins are connected to the function selector switch to select the appropriate function, as will be described below. The function pushbuttons are also connected to the IC. About 17 pins from the IC drive the LCD segments. The crystal provides accurate timing, which is critical for the accuracy of the dual-slope analog-to-digital converter that measures the input.

The 100-pin Semico CS7721CN chip powers the Tenma 72-7740 DMM.

How the function selector switch works

Rotary switches have always been mysterious to me. The pattern on the circuit board seems to be made up of random lines rather than any obvious switch contacts, and looks as much like an Aztec symbol as a switch. So I figured it was time to dive in and figure out how it works.

The selector knob has 7 positions, rotating a bit under 180 degrees in total. Looking at the back of the selector knob, you can see six independent sliders for six separate switching circuits. Each slider has two peaks in the middle, which bridge two contacts on the circuit board. Note that the two outermost sliders are offset 90 degrees from the others. Since the knob turns a bit under 180°, the sliders n

The following diagram shows how the switches work. Each of the six colored semi-circular rings is associated with one of the sliders. The seven lines inside each semicircle indicate the seven possible positions of the associated slider. The most counterclockwise position in each ring is the V setting, followed by Ω, Hz, °C, µA, mA, and A. If there are two traces lined up with the slider, the slider will connect the two traces.

One surprise is that many of the traces don't actually form a circuit. The highlighted positions in the diagram are active positions that close two contacts, but the other positions don't form a connection. In particular, the red ring is only active in one position, and the blue ring in two positions. Many positions have the same circuit trace on both ends of the line, which means the switch does nothing and the trace is unnecessary. My guess is that the redundant metal is there because metal-on-metal is lower friction than metal-on-circuit-board.

The white, cyan, and blue rings ground various combinations of "mode" pins on the IC to select the function. The left half of the purple ring directs the µAmA°C input to the appropriate circuit based on the function. The right half of the purple ring directs the HzVΩ input appropriately. The red ring has a connection only for the °C setting. The orange ring makes connections for Ω, °C, and A.

Conclusion

The 72-7740 DMM is a solid meter that gets the job done and I have only minor complaints. It currently sells for about $70. Inexplicably, the next model up, the 7745, is cheaper despite having true RMS and a serial RS-232 output. The model down, the 7735, is a good deal at about half the price; it also has RS-232, although it lacks temperature measurement, the backlight, and sleep mode.

The Tenma 72-7740 DMM in the box.

Thanks to Newark element14 for giving me this digital multimeter free for review. (Newark element14 consists of the merger of the well-known Newark electronics distributor and the element14 online electronics community into a single global brand.)

↧

Teardown and exploration of Apple's Magsafe connector

June 2, 2013, 9:13 am

≫ Next: The Mili universal car/wall USB charger, tested in the lab

≪ Previous: Tenma 72-7740 multimeter: review and teardown

Have you ever wondered what's inside a Mac's Magsafe connector? What controls the light? How does the Mac know what kind of charger it is? This article looks inside the Magsafe connector and answers those questions.

The Magsafe connector (introduced by Apple in 2006) is very convenient. It snaps on magnetically and disconnects if you pull on it. In addition it is symmetrical so you don't need to worry about what side is up. A small LED on the connector changes color to indicate the charging status.

The picture below shows the newer Magsafe 2 connector, which is slimmer. Note how the pins are arranged symmetrically; this allows the connector to be plugged in with either side on top. The charger and computer communicate through the adapter sense pin (also called the charge control pin), which this article will explain in detail below.

The pins of a Magsafe 2 connector. The pins are arranged symmetrically, so the connector can be plugged in either way.

Magsafe connector teardown

I had a Magsafe cable that malfunctioned, burning the power pins as you can see in the photo below, so I figured I'd tear it down and see what's inside. The connector below is an older Magsafe; notice the slightly different shape compared to the Magsafe 2 above. Also note that the middle adapter sense pin is much smaller than the other pins, unlike the Magsafe 2.

Removing the outer plastic shell reveals a block of soft waxy plastic, maybe polyethylene, that helps diffuse the light from the LEDs and protects the circuit underneath.

Cutting through the soft plastic block reveals a circuit board, protected by a thin clear plastic coating. The charger wires are soldered onto the back of this board. Only two wires - power and ground - go to the charger unit. There is no data communication via the adapter sense pin with the charger unit itself.

Disassembling the connector shows the spring-loaded "Pogo pins" that form the physical connection to the Mac. The plastic pieces hold the pins in place. The block of metal on the left is not magnetized, but is attracted by the strong magnet in the Mac's connector.

The circuit board inside the Magsafe connector is very small, as you can see below. In the middle are two LEDs, orange/red and green. Two identical LEDs are on the other side. The tiny chip on the left is a DS2413 1-Wire Dual Channel Addressable Switch. This chip has two functions. It switches the status LEDs on and off (that's the "dual channel switch" part). It also provides the ID value to the Mac indicating the charger specifications and serial number.

The chip uses the 1-Wire protocol, which is a clever system for connecting low-speed devices through a single wire (plus ground). The 1-Wire system is convenient here since the Mac can communicate with the Magsafe through the single adapter sense pin.

Understanding the charger's ID code

You can easily pull up the charger information on a Mac (Go to "About this Mac", "More Info...", "System Report...", "Power"), but much of the information is puzzling. The wattage and serial number make sense, but what about the ID, Revision, and Family? It turns out that these are part of the 1-Wire protocol used by the chip inside the connector.

Every chip in the 1-Wire family has a unique 64-bit ID that is individually laser-programmed into the chip. In the 1-Wire standard, the 64-bit ID consists of an 8-bit family code identifying the type of 1-Wire device, a 48-bit unique serial number, and an 8-bit non-cryptographic CRC checksum that verifies the ID number is correct. Companies (such as Apple) can customize the ID numbers: the top 12 bits of the serial number are used as a customer ID, the next 12 bits are data specified by the customer, and the remaining 24 bits are the serial number.
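For the curious, here is a minimal Python sketch of the checksum step. It uses the standard Dallas/Maxim 1-Wire CRC-8 (polynomial x^8 + x^5 + x^4 + 1, processed least-significant bit first), computed over the family code and serial number bytes in the order they are transmitted. The function names and the sample bytes in the usage lines are made up for illustration and are not taken from a real charger.

```python
def crc8_1wire(data: bytes) -> int:
    """Dallas/Maxim 1-Wire CRC-8, bits processed LSB first, initial value 0."""
    crc = 0
    for byte in data:
        for _ in range(8):
            mix = (crc ^ byte) & 0x01
            crc >>= 1
            if mix:
                crc ^= 0x8C   # reflected form of x^8 + x^5 + x^4 + 1
            byte >>= 1
    return crc


def id_is_valid(rom: bytes) -> bool:
    """rom holds the 8 ID bytes in transmission order: family code first, CRC last."""
    return len(rom) == 8 and crc8_1wire(rom[:7]) == rom[7]


# Made-up example: family code, then six serial-number bytes, then the computed CRC.
body = bytes([0xBA, 0x56, 0x34, 0x12, 0xC0, 0x03, 0x10])
rom = body + bytes([crc8_1wire(body)])
print(id_is_valid(rom))  # True
```

A handy property of this CRC is that running it over all eight bytes, including the CRC byte itself, yields zero for a valid ID.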

With this information, the Mac's AC charger information now makes sense and the diagram below shows how the 64-bit ID maps onto the charger information. The ID field 100 is the customer ID indicating Apple. The wattage and revision are in the 12 bits of customer data (hex 3C is 60 decimal, indicating 60 watts). The Family code BA is the 1-Wire family code for the DS2413 chip. Thus, much of the AC charger information presented by the Mac is actually low-level information about the 1-Wire chip.

The 1-Wire chip inside a Magsafe connector has a 64-bit ID code. This ID maps directly onto the charger properties displayed under 'About this Mac'.
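The mapping can be sketched as a few shifts and masks. In this Python example the ID is written as 16 hex digits with the CRC first and the family code last, and the 12 bits of customer data are split into an 8-bit wattage followed by a 4-bit revision; that split is an inference from the field widths described in this article, not something taken from a datasheet. The sample ID is constructed by hand for illustration (its CRC digits are a placeholder, not a valid checksum), and the names `MagsafeId` and `parse_id` are invented.

```python
from typing import NamedTuple


class MagsafeId(NamedTuple):
    crc: int
    customer_id: int
    wattage: int
    revision: int
    serial: int
    family: int


def parse_id(hex_id: str) -> MagsafeId:
    """Split a 64-bit ID (16 hex digits, CRC first, family code last) into fields."""
    if len(hex_id) != 16:
        raise ValueError("expected 16 hex digits (64 bits)")
    value = int(hex_id, 16)
    return MagsafeId(
        crc=(value >> 56) & 0xFF,           # 8-bit CRC
        customer_id=(value >> 44) & 0xFFF,  # top 12 bits of the 48-bit serial field
        wattage=(value >> 36) & 0xFF,       # customer data, assumed upper 8 bits
        revision=(value >> 32) & 0xF,       # customer data, assumed lower 4 bits
        serial=(value >> 8) & 0xFFFFFF,     # remaining 24-bit serial number
        family=value & 0xFF,                # 8-bit family code
    )


# Hand-built example: customer ID 100, wattage 3C (60), revision 0, made-up serial, family BA.
fields = parse_id("001003C0123456BA")
print(hex(fields.customer_id), fields.wattage, hex(fields.family))  # 0x100 60 0xba
```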

There are a few complications as the diagram below shows. Later chargers use the family code 85 for some reason. This doesn't indicate an 85 watt charger. It also doesn't indicate the family of the 1-Wire device, so it may be an arbitrary number. For Magsafe 2 chargers, the customer ID is 7A1 for a 45 watt charger, 921 for a 60 watt charger, and AA1 for an 85 watt charger. It's strange to use separate customer IDs for the different models. Even stranger, for an 85 watt charger the wattage field in the ID contains 60 (3C hex) not 85, even though 85 watts shows up on the info screen. The Revision is also dropped from the info screen for later chargers.

In a Magsafe 2 connector, the 64-bit ID maps onto the charger properties displayed under 'About this Mac'. For some reason, the 'Customer data' gives a lower wattage.

How to read the ID number

It's very easy to read the ID number from a Magsafe connector using an Arduino board and a single 2K pullup resistor, along with Paul Stoffregen's Arduino 1-Wire library and a simple Arduino program.

The circuit to access a 1-Wire chip from an Arduino is trivial - just a 2K pullup resistor.

Touching the ground wire to an outer ground pin of the Magsafe connector and the data wire to the inner adapter sense pin will let the Arduino immediately read and display the 64-bit ID number. The charger does not need to be plugged in to the wall - and in fact I recommend not plugging it in - since one interesting feature of the 1-Wire protocol is the device can power itself parasitically off the data wire, without a separate power source.

The 64-bit ID can be read out of a Magsafe connector by probing the outer pin with ground, and the middle pin with the 1-Wire data line.

To make things more convenient, the serial number can be displayed on an LCD display. The circuit looks complicated, but it's just a tangle of wires connecting the LCD display. Using a simple program, the 64-bit ID number is displayed on the bottom line of the display. The top line is a legend indicating the components of the code: "cc" CRC check, "id." customer id, "ww" wattage, "r" revision, "serial" serial number, and "ff" family. The number below corresponds to an 85 watt charger (55 hex = 85 decimal).

A 1-Wire ID reader with LCD display. Touching the wires to the contacts of the Magsafe connector displays the ID code on the bottom line of the display. The top line indicates the components of the code: CRC check, customer id, wattage, revision, serial number, and family.

Controlling the Magsafe status light

The Mac controls the status light in the Magsafe connector by sending commands through the adapter sense pin to the 1-Wire DS2413 switch IC to turn the two pairs of LEDs on or off. By sending the appropriate commands to the IC through the adapter sense pin, an Arduino can control the LEDs as desired.

The picture below demonstrates the setup. The same simple resistor circuit as before is used to communicate with the chip, along with a simple Arduino program that sends commands via the 1-Wire protocol. These commands are described in the DS2413 datasheet but should be obvious from the program code.

I used a cable removed from a dead charger for simplicity. The LEDs are normally powered by the charger's voltage, which I simulated with two 9-volt batteries. To hook the Arduino to the connector, this time I used a Mac DC input board that I got on eBay; this is the board in a Mac that the Magsafe connector plugs into. The only purpose of the board here is to give me a safer way to attach the wires than poking at the pins.

The connector contains a pair of orange/red LEDs and a pair of green LEDs, which can be switched on and off independently. When both pairs are lit, the resulting color is yellow. Thus, the connector can display three colors. The Arduino program cycles through the three colors and off, as you can see from the pictures above.
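A rough Python sketch of the two ideas here: the color that results from the two LED pairs, and the three bytes of a DS2413 "PIO Access Write" (command 5A hex, an output byte whose low two bits set the two channels and whose upper six bits are sent as 1s, then the same byte inverted as an error check), as I read the datasheet. Which channel drives which LED pair, and whether a 0 or a 1 lights it, is not stated in this article, so the sketch deliberately leaves that mapping out. All function names are hypothetical.

```python
PIO_ACCESS_WRITE = 0x5A  # DS2413 "PIO Access Write" command code


def led_color(orange_on: bool, green_on: bool) -> str:
    """Color seen at the connector for a given combination of the two LED pairs."""
    if orange_on and green_on:
        return "yellow"
    if orange_on:
        return "orange/red"
    if green_on:
        return "green"
    return "off"


def pio_write_frame(pio_a: int, pio_b: int) -> bytes:
    """Bytes sent once the chip has been addressed: command, data, inverted data."""
    data = 0xFC | ((pio_b & 1) << 1) | (pio_a & 1)   # upper six bits are sent as 1s
    return bytes([PIO_ACCESS_WRITE, data, data ^ 0xFF])


print(led_color(True, True))          # yellow
print(pio_write_frame(0, 1).hex())    # 5afe01
```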

The charger startup process

When the Magsafe connector is plugged into a Mac, a lot more happens than you might expect. I believe the following steps take place:

The charger provides a very low current (about 100 µA) 6 volt signal on the power pins (3 volts for Magsafe 2).
When the Magsafe connector is plugged into the Mac, the Mac applies a resistive load (e.g. 39.41KΩ), pulling the power input low to about 1.7 volts.
The charger detects the power input has been pulled low, but not too low. (A short or a significant load will not enable the charger.) After about one second, the charger switches to full voltage (14.85 to 20 volts depending on model and wattage). There's a 16-bit microprocessor inside the charger to control this and other charger functions.
The Mac detects the full voltage on the power input and reads the charger ID using the 1-Wire protocol.
If the Mac is happy with the charger ID, it switches the power input to the internal power conversion circuit and starts using the input power. The Mac switches on the appropriate LED on the connector using the 1-Wire protocol.
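The steps above can be sketched as a pair of simple decisions, one on the charger side and one on the Mac side. This Python model is only an illustration of the sequence as I understand it: the voltage window limits are assumptions chosen to bracket the roughly 1.7 volt loaded value for the original Magsafe, and the function and constant names are invented.

```python
FULL_VOLTAGE_DELAY_S = 1.0    # the charger waits about a second before powering up
SENSE_WINDOW_V = (1.0, 3.0)   # ASSUMED limits around the ~1.7 V loaded sense voltage


def charger_output(loaded_voltage: float, seconds_in_window: float) -> str:
    """Charger side: enable full power only for a plausible load held long enough."""
    low, high = SENSE_WINDOW_V
    in_window = low <= loaded_voltage <= high   # pulled low, but not too low
    if in_window and seconds_in_window >= FULL_VOLTAGE_DELAY_S:
        return "full voltage"
    return "low-current sense voltage"


def mac_accepts(full_voltage_present: bool, charger_id_ok: bool) -> bool:
    """Mac side: start drawing power only after full voltage and a good 1-Wire ID."""
    return full_voltage_present and charger_id_ok


print(charger_output(1.7, 1.0))   # full voltage
print(charger_output(0.0, 5.0))   # low-current sense voltage (a short never enables the charger)
print(charger_output(6.0, 5.0))   # low-current sense voltage (nothing connected)
```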

This process explains why there is a delay of a second after you connect the charger before the light turns on and the computer indicates the battery is charging. It also explains why if you measure the charger output with a voltmeter, you don't find much voltage.

The complex sequence of steps provides more safety than a typical charger. Because the charger is providing extremely low current at first, there is less risk of shorting something out while attaching the connector. Since the charger waits a full second before powering up, the Magsafe connector is likely to be firmly attached by the time full power is applied. The safety features are not foolproof, though, as the burnt-up connector I tore apart shows.

Don't try this at home

Warning: I recommend you don't try any of these experiments. 85 watts is enough to do lots of damage: blow out your Mac's DC input board, send flames out of a component, blow fuses, or vaporize PC traces, and that's just the things I've had happen to me. The Mac and charger both have various protection mechanisms, but they won't take care of everything. Poking at your charger while it's plugged in is a high-risk activity.

Reading your charger's ID by probing the pins while it's not plugged in is considerably safer, but I can't guarantee it. If you mess up your charger, computer or Arduino you're on your own.

Conclusions

There's more to the Magsafe charger connector than you might expect. The center pin of the connector - the adapter sense pin - controls a tiny chip that both identifies the charger and controls the status LED. It is part of a complex interaction between the charger and the Mac. Using an Arduino microcontroller, this chip can be accessed and controlled using the 1-Wire protocol. Is this useful? Not really, but hopefully you found it interesting.

↧

The Mili universal car/wall USB charger, tested in the lab

June 20, 2013, 10:54 pm

≫ Next: Twelve tips for using the Rigol DS1052E Oscilloscope

≪ Previous: Teardown and exploration of Apple's Magsafe connector

I received a Mili universal USB charger for review from Mobile Fun. This interesting charger has some features that make it my current favorite travel charger. It runs off both wall power and car accessory power. It comes with swappable plugs for Europe, UK, US, or Australia, and runs on 120 or 240 volts. It has two USB outputs - I thought this was pointless until I discovered how useful it is in car trips if two people can charge at the same time. In addition, one of the ports provides 10 watts for charging tablets (when plugged into AC). The charger also lights up - red indicates charging, and green indicates the devices are charged.

The charger has a few disadvantages. It is a bit expensive with a list price of $49. Measuring about 2 3/4 inches by 2 1/4 inches, it's much larger than Apple's super-compact inch-cube charger - although it has much more functionality. Finally, due to the design, it ends up blocking both outlets when you plug it into the wall.

In the remainder of this article, I test the performance of the charger both in the car and with AC power. To summarize, the power quality is excellent in the car, but has more noise than the average charger when plugged into the wall.

The Mili charger with adapters for different countries.

The label shows that when connected to AC, the charger is rated as 2.1A for output 1 and 1A for output 2; that is, it is designed to power an iPad from output 1 and a phone from output 2. When plugged in to a car accessory outlet, it is only rated to provide 1 amp, so charging a tablet will be slower. In the measurements below, I find that the charger's power exceeds these ratings when plugged into the wall, which is good, but provides a bit less than the expected one amp when plugged into a car output, which may make charging slower.

Label from the Mili charger.

Apple devices can reject "wrong" chargers with the error "Charging is not supported with this accessory"; Apple uses special proprietary voltages on the USB data pins to distinguish different types of chargers (details). I measured these voltages on the Mili charger and verified that it is configured to appear as an Apple 2A charger on output 1, and an Apple 1A charger on output 2.

Cars: a hostile electrical environment

You might expect to find 12 volts at your car's accessory outlet, but what comes out can be surprisingly noisy and variable. This voltage will have spikes from the ignition system as well as very large transients due to starting, malfunctions, or jump starting. A car charger must handle this hostile voltage input, and make sure the output to your device is smooth.

Test setup to measure charger performance in a car.

I measured the voltage in my car to see what happens in a real-world environment using the setup illustrated above. The Mili charger is plugged in just to the left of the gear shift. Above it is the USB interface board, which is connected to the oscilloscope on the dash.

Car voltage drops and rises when the car is started (left). Car voltage at idle showing ignition spikes (right).

The oscilloscope trace (yellow) on the left shows the large voltage fluctuations when I started the car. At the very left, with the ignition off, the battery provides about 12.5 volts. The starter pulls the voltage down to 8.88 volts until the engine starts. The voltage gradually rises over 6 seconds, settling around 14 volts.

On the right, zooming in shows that while the car is idling, the accessory output has 1/2 volt spikes every 28 milliseconds, due to the ignition firing. Note that the voltage is much noisier with the car running than on battery - the line on the left is thin, and the line on the right is thick.

Performance of the Mili charger in a car

The Mili charger has a plug that folds out from the side for use in a car. While this makes the charger larger than a dedicated wall charger, having a charger that works both in the car and with AC is more convenient than I expected, especially when traveling.

The Mili USB charger with car adapter.

I looked at the output of the Mili charger while starting the car, to see if the large voltage fluctuations shown above affected the charger's output. The Mili output remained steady, which is good. I also didn't see any of the ignition spikes in the output from the Mili charger. This indicates that the Mili charger does a good job of filtering out noise from the automotive environment.

I tested the Mili charger with inputs from 0 to 30 volts. 30 volts may seem excessive, but jump-starts often use 24 volts, and car electrical failures can result in a 120 volt "load dump". Fortunately, the Mili survived 30 volts just fine (unlike some other chargers I'm testing). The image below shows that the Mili generates a stable output voltage (horizontal line) for inputs from 7 volts to 30 volts. This is a good thing, showing that the Mili won't overload your phone even if your car is providing too much voltage. As expected, the Mili can't produce the full output voltage if the input voltage is too low (left side of the graph).

Output voltage (Y axis) of the Mili charger as the input ranges from 0 to 30 volts (X axis).

The oscilloscope displays below show the output and frequency spectrum with 12V DC input and a 5W load. The power quality is very good - the yellow line is thin and has very few spikes. The high frequency spectrum (orange) shows a spike at the switching frequency, but overall the power quality is among the best of chargers I've looked at.

High frequency spectrum (left) and Low frequency spectrum (right) of the Mili charger on 12V input.

Next, I measured the voltage the charger can provide under increasing load (details). The horizontal line shows the voltage drops from about 5 volts to 4.5 volts as the load increases. The vertical line shows the charger maxes out around 0.9 amps with less than the expected 5 volts. This is slightly less than the rated 1 amp the charger is supposed to provide. Both USB outputs provide the same current when plugged into a car outlet.

Voltage vs Current for the Mili charger with 12V input.

Charger performance with wall input

I also examined the performance of the Mili charger when plugged into the wall (120V AC). One minor annoyance with using the Mili as a wall charger is that due to the position of the USB ports, both wall outlets are blocked either by the charger or USB cables.

The Mili charger.

The images below show the voltage the charger can provide under increasing load (details). When plugged into the wall, the two USB outputs provide different maximum currents, unlike when plugged into a car outlet. Output 1 (the high current output) is on the left, and output 2 (the low current output) is on the right. Output 1 reaches about 2.45A before the voltage starts dropping, well above the 2.1A rating. The line for output 1 gets fairly wide above 1A, showing the voltage is not too stable. The line also slopes downwards to the right, indicating the voltage drops somewhat as the load increases. Output 2 reaches about 1.1A before the light starts flashing and the power drops and climbs (the curved lines). This graph shows strange behavior under overload that I haven't seen in other chargers. The lines are all fairly wide, showing the voltage is not very stable.

Voltage vs current for the Mili charger (output 1 left, output 2 right) with 120V AC input.

I looked at the voltage output along with the high frequency and low frequency spectrums (below), to examine the quality of the power outputs. The yellow line is much wider than when plugged into the car outlet, showing a lot more noise in the output. The large orange spike in the middle of the high frequency spectrum shows that a lot of the charger's switching noise is appearing on the output. Compared to other chargers, the power quality is lower than average. On the positive side, the flat low-frequency spectrum shows the charger is very good at eliminating ripple due to the 60 Hz power lines.

High frequency (left) and low frequency (right) spectrum of the Mili charger with 120V AC input.

Conclusions

The Mili charger is convenient for travel because it has plugs for multiple countries, works as an auto charger, and has dual outputs. The power quality is very good in the car, but not so good with AC power. This charger is my favorite charger now - while I'd like to tear it apart and examine the circuit inside, I like it too much to destroy it. Hopefully if you get one you'll like it too. And if you found this interesting, check out my detailed analysis of a dozen chargers in the lab.

Thanks to Mili, Mobile Fun, and Mihnea for providing me with the charger and patiently waiting for the review.

↧

Twelve tips for using the Rigol DS1052E Oscilloscope

July 5, 2013, 10:30 am

In this article I share a few tips I've learned about using the Rigol DS1052E oscilloscope.

The Rigol DS1052E digital oscilloscope.

Push the knobs

The knobs all have convenient actions if you push them: pushing Vertical Position or Horizontal Position centers the trace vertically or horizontally. Pushing Trigger Level sets it to zero. Pushing Scale sets it to fine adjust mode.

Long Memory

If you don't use Long Memory, you're wasting most of the capacity of the oscilloscope. Long Memory stores 64 times as much data, so you can really zoom in on the waveform. To enable Long Memory, push the Acquire menu button, then select MemDepth to set Long Mem. There's additional documentation here.

The Long Memory depth option of the Rigol DS1052E oscilloscope.

Use zoom

Once you've recorded a waveform, you can pan across it using the horizontal position knob - the waveform window indicator at the very top of the screen shows where you are. In mid-range settings, however, the pan range is fairly limited (about a factor of 5) compared to how deep you can zoom with the horizontal scale knob (about a factor of 1000 with Long Memory). Note: zoom works best with Single triggering; if you use Auto or Norm triggering and hit Run/Stop, sometimes the detailed data isn't in memory and zoom doesn't show more than is on the display.

Pushing the Scale knob turns on the cool zoom mode, which lets you see the trace and a zoomed-in version at the same time, letting you zoom and pan.

The zoom feature of the Rigol DS1052E oscilloscope.

Using the menus

Most of the menu buttons are in the group of 6 at the top. However, there's also a trigger menu button under the trigger knob and a time base menu under the horizontal position knob. This is in addition to the four vertical menu buttons: CH1, CH2, MATH, and REF.

The menus hide about 1/6 of the display, so close the menu when you're done: push the round Menu On/Off button or push a menu button a second time.

Don't press Auto

The Auto button is right next to the Run/Stop button, so you might think it will set the trigger to Auto Sweep. Instead this button sets the controls to seemingly-random values to automatically display your traces. This is good if you're totally lost, but it is more likely to wipe out the settings you want.

Screenshots

Some oscilloscopes make screenshots easy, but the Rigol is more complicated. To take a screenshot on the Rigol, plug a USB drive into the front panel, then hit the Storage menu button, select Bit map under Storage, select External, New File, and Save. This will save NewFile0.bmp to your flash drive. (It's much easier to rename the file on your computer than on the oscilloscope.)

An alternative is to run the slightly clunky UltraScope software on your computer, which gives you access to the oscilloscope via USB. You can download "UltraScope for DS1000E" from the Rigol Software Applications page; although it has a PDF icon, it's actually a Zip file with the software.

Built-in help

If you hold down a button or knob for three seconds, the oscilloscope displays a help screen explaining its action. (I was surprised when I discovered this by accident.)

The built-in help feature of the Rigol DS1052E oscilloscope is triggered by holding down a button or knob.

Triggering

The three trigger sweep modes are Auto, Normal, and Single. Auto will keep displaying traces until you hit Run/Stop. Normal will display a trace every time the trigger condition is satisfied. Single will display a single trace when triggered and then stop. Auto is the way to see a waveform without worrying about triggering. But if you want a nice, stable waveform, set up the trigger and use Normal. Also make sure you're triggering from the right channel - the oscilloscope likes to default to using Channel 2 as the trigger.

Controlling the channels

If you've used an oscilloscope with separate controls for each channel, you may expect the knobs near CH1 to control channel 1, and the knobs near CH2 to control channel 2. Instead, if you hit CH1, the knobs control channel 1's scale and position, while if you hit CH2, Math, or Ref, the same knobs control that channel's scale and position. Make sure you're controlling the trace you think you're controlling.

Use the colored probe rings

Maybe this is too obvious to mention, but putting matching colored rings on both ends of the oscilloscope probes lets you easily tell which probe goes with which channel.

Label oscilloscope probes with colored rings that match the trace colors.

Cursors

The cursors are very handy to measure voltages, times between two points, frequency, etc. (The Measure mode provides lots of automated measurements, but often doesn't measure what you want.) Manual mode lets you position two cursors (either vertical or horizontal), and the positions and difference are displayed. Track lets you position a cursor along the waveform, and a voltage cursor automatically tracks the waveform. Both time and voltage values are displayed. Auto mode is the mode you should use with Measure, in order to see what the measurements mean.

A tracking cursor puts X-Y lines on the waveform and gives measurements.

Finding the manual

Search for DS1000E (not DS1052E) to find the user's guide and other documentation.

Conclusions

I'm glad I bought the Rigol DS1052E - it performs very well for a low-price ($329) oscilloscope. (If money is no object, there's Agilent's $439,000 Infiniium oscilloscope. :-) I hope you find these tips useful. If you have any additional oscilloscope tips, please leave a comment.

↧

Reverse-engineering the flag circuits in the 8085 processor

July 16, 2013, 10:13 pm

Processors all have status flags to keep track of conditions such as a zero value, a carry, or a negative value. Whenever you write a loop or conditional, these flags ultimately are in control. But how are these flags implemented in the chip's silicon? I've reverse-engineered the flag circuits in the 8085 microprocessor and explain what is really going on.

The photograph below is a highly magnified image of the 8085's silicon, showing the relevant parts of the chip. In the upper-left, the arithmetic logic unit (ALU) performs 8-bit arithmetic operations. The status flag circuitry is below the ALU and the flags are connected to the data bus (indicated in blue). To the right of the ALU, the control PLA decodes the instructions into control lines that control the operations of the ALU and flag circuits.

The 8085 has seven status flags.

Bit 7 is the sign flag, indicating a negative two's-complement value, which is simply a byte with the top bit set.
Bit 6 is the zero flag, indicating a value that is all zeros.
Bit 5 is the undocumented K (or X5) flag, indicating either a carry from the 16-bit incrementer/decrementer or the result of a signed comparison. See my article on the undocumented K and V flags.
Bit 4 is the auxiliary carry, indicating a carry out of the 4 low-order bits. This is typically used for BCD (binary-coded decimal) arithmetic.
Bit 3 is unused and set to 0. Interestingly, a fairly large transistor drives the data bus line to 0 when reading the flags, so this unused flag bit doesn't come for free.
Bit 2 is the parity flag, which is set if the result has an even number of 1 bits.
Bit 1 is the undocumented signed overflow flag V (details).
Bit 0 is the carry flag.
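
To make the bit layout above concrete, here is a minimal Python sketch that packs the seven flags into a byte and derives the sign, zero, and parity flags from an 8-bit result. The function and dictionary names are made up for illustration and don't correspond to anything on the chip; the sketch models only the register layout, not the circuitry.

```python
# Bit positions of the 8085 flag byte, as listed above (bit 3 is unused and reads 0).
FLAG_BITS = {"S": 7, "Z": 6, "K": 5, "AC": 4, "P": 2, "V": 1, "CY": 0}


def pack_flags(**flags):
    """Pack named flag values (0 or 1) into the flag byte; omitted flags are 0."""
    byte = 0
    for name, value in flags.items():
        byte |= (value & 1) << FLAG_BITS[name]
    return byte


def unpack_flags(byte):
    """Split a flag byte into a dict of named flags (bit 3 is dropped)."""
    return {name: (byte >> bit) & 1 for name, bit in FLAG_BITS.items()}


def result_flags(result):
    """Sign, zero, and parity flags for an 8-bit result."""
    result &= 0xFF
    ones = bin(result).count("1")
    return {
        "S": result >> 7,            # top bit set means negative
        "Z": int(result == 0),       # all bits zero
        "P": int(ones % 2 == 0),     # even number of 1 bits
    }


print(hex(pack_flags(**result_flags(0x00))))  # prints 0x44 (Z and P set)
```

The auxiliary carry, carry, K, and V flags depend on the operation and not just the result byte, so they are left as plain inputs to pack_flags here.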

The image below zooms in on the flag silicon, showing individual transistors. The large transistors labeled with the flag name drive the flag value onto the data bus. From the data bus, the flag values control the results of conditional jumps, calls, and returns. The complex circuits above these transistors compute and store the flag values.

The schematic below shows the flag circuit that is implemented in the silicon above.

Schematic of the flag storage in the 8085 microprocessor.

Each flag bit has a latch and control lines to write a value to the latch. Most flags are updated by the same arithmetic instructions and controlled by the arith_to_flags control line. The carry flag is affected by additional instructions and has its own control line. The undocumented K and V flags are updated in different circumstances and have their own control lines.

The bus_to_flags control loads the flags from the data bus for the POP PSW instruction, while the flags_to_bus control sends the flag values over the data bus for the PUSH PSW instruction or for conditional branches.

The circuitry to compute most flag values is straightforward. The sign flag is set based on bit 7 of the result. The auxiliary carry flag is set on the carry out of bit 3. The K and V flags are set based on the top two bits (details). The zero flag is normally set from the alu_zero signal that indicates all bits are zero.

The zero flag has support for multi-byte zero: at each step it can AND the existing zero flag with the current ALU zero value, so the zero flag will be set if both bytes are zero. This is only used for the (undocumented) DSUB 16-bit subtract instruction. Strangely, this circuit is also activated for the 16-bit DAD instructions, but the result is not stored in the flag.
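
The multi-byte zero behavior can be sketched in a few lines of Python. This is a behavioral model only; the parameter name and_with_existing is hypothetical, and the assumption that the low byte uses the normal update while the high byte uses the AND step is mine.

```python
def update_zero_flag(z_flag, alu_result, and_with_existing):
    """Return the new zero flag after one 8-bit ALU step.

    and_with_existing is a hypothetical stand-in for the control that
    ANDs the stored zero flag with the current ALU zero value.
    """
    alu_zero = int((alu_result & 0xFF) == 0)
    return (z_flag & alu_zero) if and_with_existing else alu_zero


def zero_flag_16(result16):
    """Zero flag for a 16-bit result computed as two 8-bit steps (low byte first)."""
    z = update_zero_flag(0, result16 & 0xFF, and_with_existing=False)
    z = update_zero_flag(z, result16 >> 8, and_with_existing=True)
    return z
```

With this model, zero_flag_16(0x0100) is 0 even though the low byte is zero, which is the point of combining the two steps.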

If you look at the chip photograph at the top of the article, the flags are arranged in apparently-random order, not in their bit order as you might expect. Presumably the layout used is more efficient. Also notice that the carry flag C is off to the right of the ALU. Because of the complexity of the carry logic, which will be discussed next, the circuitry wouldn't fit under the ALU with the rest of the flag logic.

The carry logic

The schematic below shows the circuit for the carry flag. The logic for carry is more complex than for the other flags because carry is used in a variety of ways.

Schematic of the carry circuitry in the 8085 microprocessor.

The value stored in the carry flag

The top part of the circuit computes carry_result, the value stored in the carry flag. This value has several different meanings depending on the instruction:

For arithmetic operations, the carry flag is loaded with the value generated by the ALU. That is, alu_carry_7 (the high-order carry from bit 7 of the ALU) is used. (See Inside the ALU of the 8085 microprocessor for details on how this is computed.)
For DAA (decimal adjust accumulator), the carry flag is set if the high-order digit is >= 10. This value is alu_hi_ge_10, which is selected by the daa control line.
For CMC (complement carry), the carry flag value is complemented. To compute this, the previous carry flag value c_flag is selected by use_carry_flag and complemented by the xor_carry_result control line.
For ARHL/RAR/RRC (rotate right operations), bit 0 of the rotated value goes into the carry. In the circuit, reg_act_0 (the low-order bit in the undocumented ACT (accumulator temp) register) is selected by the alu_shift_right control line.

The xor_carry_result control inverts the carry value in a few cases. For subtraction and comparison, it flips the carry bit to be the borrow bit. For STC (set carry), the xor_carry_result control forces the carry to 1. For AND operations, it forces the carry to 0.
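
As a rough behavioral model of the selection described above, the following Python sketch picks one source for the carry and optionally inverts it. The signal names come from the text, but the source argument is a hypothetical simplification of the individual select lines. In particular, modeling STC and AND as "no source selected" is a guess; the text does not spell out how the control lines combine in those cases.

```python
def carry_result(source, xor_carry_result=False, *,
                 alu_carry_7=0, alu_hi_ge_10=0, c_flag=0, reg_act_0=0):
    """Value to be stored in the carry flag.

    source is a hypothetical selector:
      "alu"   - arithmetic: carry out of bit 7 of the ALU
      "daa"   - decimal adjust: high digit >= 10
      "flag"  - current carry flag (use_carry_flag)
      "shift" - bit 0 of ACT (alu_shift_right)
      "none"  - nothing selected (assumed to read as 0)
    """
    selected = {
        "alu": alu_carry_7,
        "daa": alu_hi_ge_10,
        "flag": c_flag,
        "shift": reg_act_0,
        "none": 0,
    }[source]
    return selected ^ int(xor_carry_result)


# CMC: complement the existing carry flag.
print(carry_result("flag", True, c_flag=1))       # prints 0
# Subtraction: the ALU carry is flipped to become a borrow.
print(carry_result("alu", True, alu_carry_7=0))   # prints 1
```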

Generating the carry input signal

The middle part of the circuit selects the appropriate carry_in value that is supplied to the ALU.

The first option is to set the carry in to either 0 or 1, by using carry_in_0 and optionally xor_carry_in. This is used for most instructions.

The next option is to use the current carry flag value as an input for additions or subtractions (allowing multi-byte arithmetic). For subtraction, this is inverted to convert borrow to carry; the xor_carry_in control does this.

The final option uses the carry latch to temporarily hold the carry for the undocumented LDHI and LDSI instructions. These instructions add a constant to a 16-bit register pair, so they need to add the carry from the low-order sum to the high-order byte. The carry latch temporarily holds the carry, and this value is selected by the use_latched_carry control line. You might wonder why not just use the normal carry flag; the LDHI and LDSI instructions are designed to leave the carry flag unchanged, so they need somewhere else to temporarily store the carry. The surprising conclusion is that Intel deliberately included circuitry in the 8085 specifically to support these undocumented instructions, and then decided not to support these instructions. (In contrast, the 6502's unsupported instructions are just random consequences of unsupported opcodes.)
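
The role of the carry latch can be illustrated with a small Python model of adding an 8-bit constant to a 16-bit register pair through an 8-bit ALU. This is a sketch of the arithmetic only, with a made-up function name; it says nothing about the timing or the actual control sequence.

```python
def add_imm8_to_pair(pair, imm8, c_flag):
    """Return (16-bit sum, carry flag) for pair + imm8, done as two 8-bit adds."""
    low = (pair & 0xFF) + (imm8 & 0xFF)
    carry_latch = low >> 8                       # carry out of the low-order add
    high = ((pair >> 8) & 0xFF) + carry_latch    # high byte + 0 + latched carry
    result = ((high & 0xFF) << 8) | (low & 0xFF)
    return result, c_flag                        # the carry flag is left unchanged


result, carry = add_imm8_to_pair(0x12F0, 0x20, c_flag=1)
print(hex(result), carry)  # prints 0x1310 1
```

Because the intermediate carry lives in carry_latch rather than in c_flag, the caller's carry flag comes back exactly as it went in.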

Generating the shift_right input signal

Each bit of the ALU has a shift right input. For most of the bits, the input comes from the bit to the left, but the high-order bit uses different inputs depending on the instruction. The bottom circuit in the schematic above generates the shift right input for the ALU. This circuit has two simple options.

Normally the carry flag is fed into shift_right_in. For the ARHL and RAR instructions, this causes the carry flag to go into the high-order bit.
For the RRC and RLC instructions (rotate A right/left), the rotate_carry control selects bit 0 as the shift right input.
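
The two options for the shift right input can be modeled in Python as follows. The function is a behavioral sketch of the 8-bit right rotates only (RAR and RRC); the name rotate_right is made up, and the rotate_carry argument stands in for the control line of the same name.

```python
def rotate_right(a, c_flag, rotate_carry):
    """Return (new accumulator, new carry) for an 8-bit right rotate.

    rotate_carry=False models RAR: the carry flag enters the high-order bit.
    rotate_carry=True models RRC: bit 0 wraps around into the high-order bit.
    """
    a &= 0xFF
    bit0 = a & 1
    shift_right_in = bit0 if rotate_carry else (c_flag & 1)
    result = (shift_right_in << 7) | (a >> 1)
    return result, bit0   # bit 0 always ends up in the carry


print(rotate_right(0x01, 0, rotate_carry=False))  # RAR: prints (0, 1)
print(rotate_right(0x01, 0, rotate_carry=True))   # RRC: prints (128, 1)
```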

Conclusions

By reverse-engineering the 8085, we can see how the flag circuits in the 8085 actually work at the gate and silicon level. One interesting feature is the circuitry to implement undocumented instructions and flags. Another interesting feature is the complexity of the carry flag compared to the other flags.

Footnotes on rotate

I recommend you skip this section, but there are a few confusing things about the rotate logic that I wanted to write down.

For some reason the rotate operations are named very strangely in the 8080 and 8085. RRC is the "rotate accumulator right" instruction and RAR is the "rotate accumulator right through carry" instruction. Based on the abbreviations, the names seem reversed. The left rotates RLC and RAL are similar. The Z-80 processor has a similar RRC instruction, but calls it "rotate right circular", making the abbreviation slightly less nonsensical.

Bit 0 of ACT is fed into shift_right_in for both RRC and RLC. However, this input is just ignored for RLC since the rotation is in the other direction, so I assume this is just a result of the control logic treating RRC and RLC the same.

To reduce the control circuitry, the rotate_carry and use_latched_carry control lines are actually the same control line since the instructions that use them don't conflict. In other words, there is just one control line, but it has two distinct functions.

↧

Four Rigol oscilloscope hacks with Python

July 19, 2013, 12:31 am

A Rigol oscilloscope has a USB output, allowing you to control it with a computer and perform additional processing externally. I was inspired by Cibo Mahto's article Controlling a Rigol oscilloscope using Linux and Python, and came up with some new Python oscilloscope hacks: super-zoomable graphs, generating a spectrogram, analyzing an IR signal, and dumping an oscilloscope trace as a WAV file. The key techniques I illustrate are connecting to the oscilloscope with Windows, accessing a megabyte of data with Long Memory, and performing analysis on the data.

Analyzing the IR signal from a TV remote using an IR sensor and a Rigol DS1052E oscilloscope.

Super-zoomable graphs

One of the nice features of the Rigol is "Long Memory" - instead of downloading the 600-point trace that appears on the screen, you can record and access a high-resolution trace of 1 million points. In this hack, I show how you can display this data with Python, giving you a picture that you can easily zoom into with the mouse.

The following screenshot shows the data collected by hooking the oscilloscope up to an IR sensor. In the above picture, the sensor is the three-pin device below the screen. Since I've developed an IR library for Arduino, my examples focus on IR, but any sort of signal could be used. By enabling Long Memory, we can download not just the data on the screen, but 1 million data points, allowing us to zoom way, way in. The graph below shows what is sent when you press a button on the TV remote - the selected button transmits a code, followed by a periodic repeat signal as long as the button is held down.

The IR signal from a TV remote. The first block is the code, followed by periodic repeat signals while the button is held down.

But with Long Memory, we can interactively zoom way in on the waveform and see the actual structure of the code - long header pulses followed by a sequence of wide and narrow pulses that indicate the particular button. That's not the end of the zooming - we can zoom way in on an edge of a pulse and see the actual rise time of the signal over a few microseconds. You can do some pretty nice zooming when you have a million datapoints to plot.

To use this script, first enable Long Memory by going to Acquire: MemDepth. Next, set the trigger sweep to Single. Capture the desired waveform on the oscilloscope. Then run the Python script to upload the data to your computer, which will display a plot using matplotlib. To zoom, click the "+" icon at the bottom of the screen. This lets you pan back and forth through the data by holding down the left mouse button. You can zoom in and out by holding the right mouse button down and moving the mouse right or left. The magnifying glass icon lets you select a zoom rectangle with the mouse. You can zoom on your oscilloscope too, of course, but using a mouse and having labeled axes can be much more convenient.

A few things to notice about the code. The first few lines get the list of instruments connected to VISA and open the USB instrument (i.e. your oscilloscope). The timeout and chunk size need to be increased from the defaults to download the large amount of data without timeouts.

Next, ask_for_values gets various scale values from the oscilloscope so the axes can be labeled properly. By setting the mode to RAW we download the full dataset, not just what is visible on the screen. We get the raw data from channel 1 with :WAV:DATA? CHAN1. The first 10 bytes are a header and should be discarded. Next, the raw bytes are converted to numeric values with Mahto's formulas. Finally, matplotlib plots the data.
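
The conversion step can be sketched without any hardware attached. The function below is not the script from this article; it is a minimal illustration that drops the 10-byte header and maps each raw byte to a voltage. The constants (inverting the byte, a midpoint of 130, and 25 counts per division) are my recollection of Mahto's conversion and should be checked against his article. The names raw_to_volts, volt_scale, and volt_offset are made up here; the scale and offset would come from the oscilloscope's channel settings.

```python
HEADER_LEN = 10  # the first 10 bytes of the response are a header


def raw_to_volts(response, volt_scale, volt_offset):
    """Convert a raw waveform response (bytes) into a list of voltages.

    volt_scale is the channel's volts per division and volt_offset is the
    channel's vertical offset in volts.
    """
    volts = []
    for raw in response[HEADER_LEN:]:      # iterating over bytes yields ints
        inverted = 255 - raw               # larger raw values are lower voltages
        # Assumed constants: 130 is the screen midpoint, 25 counts per division.
        counts = inverted - 130.0 - volt_offset / volt_scale * 25
        volts.append(counts / 25 * volt_scale)
    return volts
```

With a scale of 1 volt per division and no offset, a raw byte of 125 maps to 0 volts and a raw byte of 100 maps to 1 volt under these constants. For a million samples, an array library such as Numpy would do the same arithmetic far faster than this loop.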

There are a couple "gotchas" with Long Memory. First, it only works reliably if you capture a single trace by setting the trigger sweep to "single". Second, downloading all this data over USB takes 10 seconds or so, which can be inconveniently slow.

Analyze an IR signal

Once we can download a signal from the oscilloscope, we can do more than just plot it - we can process and analyze it. In this hack, I decode the IR signal and print the corresponding hex value. Since it takes 10 seconds to download the signal, this isn't a practical way of using an IR remote for control. The point is to illustrate how you can perform logic analysis on the oscilloscope trace by using Python.
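
As an illustration of this kind of logic analysis, here is a minimal Python sketch that turns a list of voltage samples into bits. It is not the decoder from this article. It assumes a pulse-distance style code in which the sensor output idles high, goes low during each IR burst, and encodes each bit in the length of the high gap between bursts; the threshold, split, and header values are placeholders that would need to be tuned for a particular remote.

```python
def run_lengths(samples, threshold):
    """Collapse samples into (is_high, sample_count) runs."""
    runs = []
    for value in samples:
        level = value > threshold
        if runs and runs[-1][0] == level:
            runs[-1][1] += 1
        else:
            runs.append([level, 1])
    return [(level, count) for level, count in runs]


def decode_gaps(samples, dt, threshold=2.5, split=1.1e-3, header=3e-3):
    """Decode bits from the high gaps between bursts.

    dt is the time between samples in seconds. Gaps of at least `header`
    seconds are treated as header gaps and skipped; shorter gaps are a 1
    if longer than `split` seconds, otherwise a 0. The first and last
    runs are assumed to be idle time and are ignored.
    """
    bits = []
    for is_high, count in run_lengths(samples, threshold)[1:-1]:
        if not is_high:
            continue                      # low runs are the IR bursts
        gap = count * dt
        if gap >= header:
            continue                      # header gap, carries no data
        bits.append(1 if gap > split else 0)
    return bits


def bits_to_hex(bits):
    """Combine bits (most significant first) into a hex string."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return hex(value)
```

Working on run lengths rather than individual samples keeps the decoder independent of the sample rate, as long as dt is supplied correctly.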

The script for this hack also shows how Python can wait for the oscilloscope to be triggered and enter the STOP state, and how it can initialize the oscilloscope to a desired configuration. The oscilloscope gets confused if you send too many commands at once, so I put a short delay between the commands.
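
A minimal sketch of these two ideas follows. It assumes a scope object with the write() and ask() methods described in this article (newer PyVISA releases name the second one differently, so adjust as needed). The helper names, the delay values, and the example command strings are my own choices; the commands are as I understand them from the DS1000D/E Programming Guide and should be checked there.

```python
import time


def send_commands(scope, commands, delay=0.1):
    """Send commands one at a time, pausing between them."""
    for command in commands:
        scope.write(command)
        time.sleep(delay)     # give the oscilloscope time to digest each command


def wait_for_stop(scope, poll_interval=0.1, timeout=30.0):
    """Poll the trigger status until it reads STOP; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if scope.ask(":TRIG:STAT?").strip() == "STOP":
            return True
        time.sleep(poll_interval)
    return False


# Example configuration (assumed commands): Long Memory, single sweep, then run.
SETUP_COMMANDS = [":ACQ:MEMD LONG", ":TRIG:EDGE:SWE SING", ":RUN"]
```

Typical use would be send_commands(scope, SETUP_COMMANDS) followed by wait_for_stop(scope) before downloading the waveform. The timeout keeps the script from hanging forever if the trigger never fires.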

Generate a spectrogram

Another experiment I did was using Python libraries to generate a spectrogram of a signal recorded by the oscilloscope. I simply hooked a microphone to the oscilloscope, spoke a few words, and used the script below to analyze the signal. The spectrogram shows low frequencies at the bottom, high frequencies at the top, and time progresses left to right. This is basically a FFT swept through time.
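
To show what "an FFT swept through time" means, here is a small pure-Python sketch that slides a window along the samples and computes the magnitude spectrum of each frame. It is an illustration only, not the script used for the figure, and it uses a slow direct DFT so that it has no dependencies; in practice a library routine (for example matplotlib's specgram) would be used on a million samples. The function names and the choice of a Hann window are mine.

```python
import cmath
import math


def hann(n):
    """Hann window of length n, used to reduce leakage at the frame edges."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..n/2 for one frame (direct O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_len, hop):
    """Return one magnitude spectrum per frame; frames start every `hop` samples."""
    window = hann(frame_len)
    columns = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_len], window)]
        columns.append(dft_magnitudes(frame))
    return columns
```

Each column is one vertical slice of the spectrogram: bin k corresponds to a frequency of k times the sample rate divided by frame_len, and successive columns step forward in time by hop samples.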

A spectrogram generated by matplotlib using data from a Rigol DS1052E oscilloscope.

To use this script, set up the oscilloscope for Long Memory as before, record the signal, and then run the script.

Dump data to a .wav file

You might want to analyze the oscilloscope trace with other tools, such as Audacity. By dumping the oscilloscope data into a WAV file, it can easily be read into other software. Or you can play the data and hear how it sounds.

To use this script, enable Long Memory as described above, capture the signal, and run the script. A file channel1.wav will be created.
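
Writing the WAV file itself needs only Python's standard wave module. The sketch below is not the article's script; it assumes the samples are already available as a bytes object of unsigned 8-bit values (which is also how 8-bit WAV data is stored) and that the caller supplies a sample rate. The function name write_wav is made up.

```python
import wave


def write_wav(path, samples, sample_rate):
    """Write unsigned 8-bit samples (a bytes object) as a mono WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)             # one oscilloscope channel
        wav.setsampwidth(1)             # 1 byte per sample; 8-bit WAV data is unsigned
        wav.setframerate(sample_rate)   # samples per second stored in the header
        wav.writeframes(samples)
```

For example, write_wav("channel1.wav", data, 44100) stores the samples with a 44.1 kHz header. Choosing the actual capture rate preserves the timing, while a lower rate slows the signal down on playback; whether other software accepts very high rates is something to check. If polarity matters, bytes(255 - b for b in data) flips the samples first.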

How to install the necessary libraries

Before connecting your oscilloscope to your Windows computer, there are several software packages you'll need.

I assume you have Python already installed - I'm using 2.7.3.
Install NI-VISA Run-Time Engine 5.2. This is National Instruments Virtual Instrument Software Architecture, providing an interface to hardware test equipment.
Install PyVISA, the Python interface to VISA.
If you want to run the graphical programs, install Numpy and matplotlib.

You can also use Rigol's UltraScope for DS1000E software, but the included NI-VISA 4.3 software doesn't work with PyVISA - I ended up with VI_WARN_CONFIG_NLOADED errors. If you've already installed UltraScope, you'll probably need to uninstall and reinstall NI-VISA.

If you're using Linux instead of Windows, see Mahto's article.

How to control and program the oscilloscope

Once the software is installed (see above), connect the oscilloscope to the computer's USB port. Use the USB port on the back of the oscilloscope, not the flash drive port on the front panel.

Hopefully the code examples above are clear. First, the Python program must get the list of connected instruments from pyVisa and open the USB instrument, which will have a name like USB0::0x1AB1::0x0588::DS1ED141904883. Once the oscilloscope connection is open, you can use scope.write() to send a command to the oscilloscope, scope.ask() to send a command and read a result string, and scope.ask_for_values() to send a command and read a float back from the oscilloscope.
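
The first step, choosing the oscilloscope from the list of instrument names, can be sketched as a plain function that works on whatever list the VISA library returns. The helper name is made up, and the call that produces the list depends on the PyVISA version: current releases use ResourceManager().list_resources(), while the release used for this article had a different interface.

```python
def pick_usb_instrument(names):
    """Return the single USB instrument name from a list of VISA resource names."""
    usb = [name for name in names if name.startswith("USB")]
    if len(usb) != 1:
        raise RuntimeError("expected exactly one USB instrument, found: %r" % (usb,))
    return usb[0]


# Example with a made-up resource list (a serial port plus the oscilloscope):
names = ["ASRL1::INSTR", "USB0::0x1AB1::0x0588::DS1ED141904883"]
print(pick_usb_instrument(names))  # prints USB0::0x1AB1::0x0588::DS1ED141904883
```

Failing loudly when zero or several USB instruments are present avoids silently talking to the wrong device.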

When the oscilloscope is under computer control, the screen shows Rmt and the front panel is non-responsive. The "Force" button will restore local control. Software can release the oscilloscope by sending the corresponding ":KEY:FORCE" command.

Error handling in pyVisa is minimal. If you send a bad command, it will hang and eventually time out with VisaIOError: VI_ERROR_TMO: Timeout expired before operation completed.

The API to the oscilloscope is specified in the DS1000D/E Programming Guide. If you do any Rigol hacking, you'll definitely want to read this. Make sure you use the right programming guide for your oscilloscope model - other models have slightly different commands that seem plausible, but they will time out if you try them.

Conclusions

Connecting an oscilloscope to a computer opens up many opportunities for processing the measurement data, and Python is a convenient language to do this. The Long Memory mode is especially useful, since it provides extremely detailed data samples.

↧

Reverse-engineering the 8085's ALU and its hidden registers

July 19, 2013, 9:07 am

This article describes how the ALU of the 8085 microprocessor works and how it interacts with the rest of the chip, based on reverse-engineering of the silicon. (This is part 2 of my ALU reverse-engineering; part 1 described the circuit for a single ALU bit.) Along with the accumulator, the ALU uses two undocumented registers - ACT and TMP - and this article describes how they work in detail, as well as how the ALU is controlled.

The arithmetic-logic unit is a key part of the microprocessor, performing operations and comparisons on data. In the 8085, the ALU is also a key part of the data path for moving data. The ALU and associated registers take up a fairly large part of the chip, the upper left of the photomicrograph image below. The control circuitry for the ALU is in the top center of the image. The data bus (dbus) is indicated in blue.

Photograph of the 8085 chip showing the location of the ALU, flags, and registers.

The real architecture of the 8085 ALU

The following architecture diagram shows how the ALU interacts with the rest of the 8085 at the block-diagram level. The data bus (dbus) connects the ALU and associated registers with the rest of the 8085 microprocessor. There are also numerous control lines, which are not shown.

The ALU uses two temporary registers that are not directly visible to the programmer. The Accumulator Temporary register (ACT) holds the accumulator value while an ALU operation is performed. This allows the accumulator to be updated with the new value without causing a race condition. The second temporary register (TMP) holds the other argument for the ALU operation. The TMP register typically holds a value from memory or another register.

Architecture of the 8085 ALU as determined from reverse-engineering.

The 8085 datasheet has an architecture diagram that is simplified and not quite correct. In particular, the ACT register is omitted and a data path from the data bus to the accumulator is shown, even though that path doesn't exist.

The accumulator and ACT registers

To the programmer, the accumulator is the key register for arithmetic operations. Reverse-engineering, however, shows the accumulator is not connected directly to the ALU, but works closely with the ACT (accumulator temporary) register.

The ACT register has several important functions. First, it holds the input to the ALU. This allows the results from the ALU to be written back to the accumulator without disturbing the input, which would cause instability. Second, the ACT can hold constant values (e.g. for incrementing or decrementing, or decimal adjustment) without affecting the accumulator. Finally, the ACT allows ALU operations that don't use the accumulator.

The accumulator and ACT (Accumulator Temporary) registers and their control lines in the 8085 microprocessor.

The diagram above shows how the accumulator and ACT registers are connected, and the control lines that affect them. One surprise is that the only way to put a value into the accumulator is through the ALU. This is controlled by the alu_to_a control line. You might expect that if you load a value into the accumulator, it would go directly from the data bus to the accumulator. Instead, the value is OR'd with 0 in the ALU and the result is stored in the accumulator.

The accumulator has two status outputs: a_hi_ge_10, if the four high-order bits are ≥ 10, and a_lo_ge_10, if the four low-order bits are ≥ 10. These outputs are used for decimal arithmetic, and will be explained in another article.

The accumulator value or the ALU result can be written to the databus through the sel_alu_a control (which selects between the ALU result and the accumulator), and the alu/a_to_dbus control line, which enables the superbuffer to write the value to the data bus. (Because the data bus is large and connects many parts of the chip, it requires high-current signals to overcome its capacitance. A "superbuffer" provides this high-current output.)

The ACT register can hold a variety of different values. In a typical arithmetic operation, the accumulator value is loaded into the ACT via the a_to_act control. The ACT can also load a value from the data bus via dbus_to_act. This is used for the ARHL/DAD/DSUB/LDHI/LDSI/RDEL instructions (all of which are undocumented except DAD). These instructions perform arithmetic operations without involving the accumulator, so they require a path into the ALU that bypasses the accumulator.

The control lines allow the ACT register to be loaded with a variety of constants. The 0/fe_to_act control line loads either 0 or 0xfe into the ACT; the value is selected by the sel_0_fe control line. The value 0 has a variety of uses. ORing a value with 0 allows the value to pass through the ALU unchanged. If the carry is set, ADDing to 0 performs an increment. The value 0xfe (signed -2) is used only for the DCR (decrement by 1) instruction. You might think the value 0xff (signed -1) would be more appropriate, but if the carry is set, ADDing 0xfe decrements by 1. I think the motivation is so both increments and decrements have the carry set, and thus can use the same logic to control the carry.
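
The arithmetic behind these constants can be checked with a short sketch. This is a behavioral model in Python, not the 8085's circuit, and the helper names `alu_or` and `alu_add` are made up for illustration.

```python
def alu_or(act, tmp):
    """Bitwise OR of the two 8-bit ALU inputs."""
    return (act | tmp) & 0xFF


def alu_add(act, tmp, carry_in):
    """8-bit add of the two ALU inputs plus carry in; returns (result, carry_out)."""
    total = act + tmp + carry_in
    return total & 0xFF, total >> 8


value = 0x42

# Loading a value: ORing with the constant 0 passes it through unchanged.
loaded = alu_or(0x00, value)

# Increment: constant 0 in ACT, carry in set.
incremented, _ = alu_add(0x00, value, 1)

# Decrement: constant 0xfe in ACT, carry in set (0xfe + 1 = 0xff = -1 mod 256).
decremented, _ = alu_add(0xFE, value, 1)

print(hex(loaded), hex(incremented), hex(decremented))  # 0x42 0x43 0x41
```

With the carry set in both cases, the only difference between increment and decrement is the constant loaded into ACT, which fits the suggestion that the two operations can share their carry control logic.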

Since the 8085 has a 16-bit increment/decrement circuit, you might wonder why the ALU is also used for increment/decrement. The main reason is that using the ALU allows the condition flags to be set by INR and DCR. In contrast, the 16-bit increment and decrement instructions (INX and DCX) use the incrementer/decrementer, and as a consequence the flags are not updated.

To support BCD, the ACT can be loaded with decimal adjustment values 0x00, 0x06, 0x60, or 0x66. The top and bottom four bits of ACT are loaded with the value 6 with the 6x_to_act and x6_to_act control lines respectively.

It turns out that the decimal adjustment values are easily visible in the silicon. The following image shows the silicon that implements the ACT register. Each of the large pink structures is one bit. The eight bits are arranged with bit 7 on the left and bit 0 on the right. Note that half of the bits have pink loops at the top, in the pattern 0110 0110. These loops pull the associated bit high, and are used to set the high and/or low four bits to 6 (binary 0110).

The ACT register in the 8085. This image shows the silicon that implements the 8-bit register.

Building the 8-bit ALU from single-bit slices

In my previous article on the 8085 ALU I described how each bit of the ALU is implemented. Each bit slice of the ALU takes two inputs and performs a simple operation: or, add, xor, and, shift right, complement, or subtract. The ALU has a shift right input and a carry input, and generates a carry output. In addition, each slice of the ALU contributes to the parity and zero calculations. The ALU has five control lines to select the operation.

One bit of the ALU in the 8085 microprocessor

The ALU has seven basic operations: or, add, xor, and, shift right, complement, and subtract. The following table shows the five control lines that select the operation, and the meaning of the carry line for the operation. Note that the meaning of carry in and carry out is different for each operation. For bit operations, the implementation of the ALU circuitry depends on a particular carry in value, even though carry is meaningless for these operations.

Operation	select_neg_in2	select_op1	select_op2	select_shift_right	select_ncarry_1	Carry in/out
or	0	0	0	0	1	1
add	0	1	0	0	0	/carry
xor	0	1	0	0	1	1
and	0	1	1	0	1	0
shift right	0	0	1	1	1	0
complement	1	0	0	0	1	1
subtract	1	1	0	0	0	borrow

The eight-bit ALU is formed by linking eight single-bit ALU slices as shown below. The high-order bit is on the left and the low-order bit on the right, matching the layout in silicon. The control lines that select the operation are fed into all eight slices. The carry, parity, and zero values propagate through each slice to form the final values on the left. The right shift input of each slice is simply the bit from the next higher-order slice (to its left), with the exception of the topmost bit, which uses a special shift right input. The values from the top bit are used to control the parity, zero, carry, and sign flags (as well as the undocumented K and V flags). The auxiliary carry (half carry) flag is simply the carry out of bit three.

The 8-bit ALU in the 8085 is formed by combining eight 1-bit slices.
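
To make the bit-slice idea concrete, here is a minimal Python sketch of the add operation only, built by chaining eight one-bit full adders. It is a behavioral illustration rather than a model of the 8085's transistors: the names `add_slice` and `add_8bit` are hypothetical, and the real slices use inverted carry signals and support six other operations.

```python
def add_slice(a, b, carry_in):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def add_8bit(x, y, carry_in=0):
    """Chain eight slices from bit 0 up to bit 7, as in a ripple-carry adder."""
    result = 0
    carry = carry_in
    parity = 0      # running XOR of the result bits
    any_set = 0     # running OR of the result bits
    aux_carry = 0
    for bit in range(8):
        s, carry = add_slice((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= s << bit
        parity ^= s
        any_set |= s
        if bit == 3:
            aux_carry = carry   # carry out of bit 3 is the auxiliary carry
    return {
        "result": result,
        "carry": carry,                 # carry out of the top slice
        "aux_carry": aux_carry,
        "sign": (result >> 7) & 1,
        "zero": 1 - any_set,
        "parity_even": 1 - parity,      # 1 when the result has an even number of 1 bits
    }


print(add_8bit(0x0F, 0x01))
print(add_8bit(0xFF, 0x01))
```

The first call produces 0x10 with the auxiliary carry set; the second wraps to 0 with both the carry and zero values set.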

The control lines

The ALU uses 29 control lines that are generated by a PLA that activates the right control lines based on the opcode and the position in the instruction cycle. For reference, the following table lists the 29 ALU control lines and the instructions that affect them.

Control line	Relevant instructions
`ad_latch_dbus, write_dbus_to_alu_tmp, /ad_dbus`	`IN/LDA/LHLD`
`/ad_dbus`	`ARHL/DAD/DSUB/LDHI/LDSI/RDEL`
`/alu/a_to_dbus`	all
`/dbus_to_act`	`ARHL/DAD/DSUB/LDHI/LDSI/RDEL`
`a_to_act`	`ACI/ADC/ADD/ADI/ANA/ANI/CMP/CPI/ORA/ORI/RAL/RAR/RLC/RRC/SBB/SBI/SUB/SUI/XRA/XRI`
`0/fe_to_act`	all
`sel_alu_a`	all
`alu_to_a`	`ACI/ADC/ADD/ADI/ANA/ANI/CMA/CMC/DAA/DCR/IN/INR/LDA/LDAX/MOV/MVI/ORA/ORI/POP/RAL/RAR/RIM/RLC/RRC/SBB/SBI/SIM/STC/SUB/SUI/XRA/XRI`
`/daa`	`DAA`
`sel_0_fe`	`DCR`
`store_v_flag`	`ACI/ADC/ADD/ADI/ANA/ANI/ARHL/CMP/CPI/DAA/DCR/INR/ORA/ORI/RAL/RAR/RLC/RRC/SBB/SBI/SUB/SUI/XRA/XRI`
`select_shift_right`	`ARHL/RAR/RRC`
`arith_to_flags`	`ACI/ADC/ADD/ADI/ANA/ANI/CMP/CPI/DAA/DCR/DSUB/INR/ORA/ORI/SBB/SBI/SUB/SUI/XRA/XRI`
`bus_to_flags`	`POP PSW`
`/zero_flag_combine`	`DAD/DSUB`
`/flags_to_bus`	`ACI/ADC/ADD/ADI/ANA/ANI/ARHL/CALL/CC/CM/CMA/CMC/CMP/CNC/CNZ/CP/CPE/CPI/CPO/CZ/DAA/DAD/DCR/DCX/DI/DSUB/EI/HLT/IN/INR/INX/JC/JK/JM/JMP/JNC/JNK/JNZ/JP/JPE/JPO/JZ/LDA/LDAX/LDHI/LDSI/LHLD/LHLX/LXI/MOV/MVI/NOP/ORA/ORI/OUT/PCHL/POP/PUSH/RAL/RAR/RC/RDEL/RET/RIM/RLC/RM/RNC/RNZ/RP/RPE/RPO/RRC/RST/RSTV/RZ/SBB/SBI/SHLD/SHLX/SIM/SPHL/STA/STAX/STC/SUB/SUI/XCHG/XRA/XRI/XTHL`
`shift_right_in_select`	`ARHL`
`xor_carry_in`	`ANA/ANI/ARHL/CMP/CPI/DCR/DSUB/INR/RAR/RRC/SBB/SBI/SUB/SUI`
`select_op2`	`ANA/ANI/ARHL/RAR/RRC`
`/use_latched_carry /rotate_carry`	`LDHI/LDSI/RLC/RRC`
`/carry_in_0`	0 except for `ACI/ADC/DAD/DSUB/LDHI/LDSI/RAL/RDEL/RLC/SBB/SBI`
`select_op1`	`ACI/ADC/ADD/ADI/ANA/ANI/CMP/CPI/DAA/DAD/DCR/DSUB/INR/LDHI/LDSI/RAL/RDEL/RLC/SBB/SBI/SUB/SUI/XRA/XRI`
`select_ncarry_1`	`ACI/ADC/ADD/ADI/CMP/CPI/DAA/DAD/DCR/DSUB/INR/LDHI/LDSI/RAL/RDEL/RLC/SBB/SBI/SUB/SUI`
In combination with first control line, `write_dbus_to_alu_tmp`	`ADC/ADD/ANA/CMA/CMC/CMP/DAA/DCR/INR/MOV/ORA/RAL/RAR/RIM/RLC/RRC/SBB/SIM/STC/SUB/XRA`
`select_neg_in2`	`CMA/CMP/CPI/DSUB/SBB/SBI/SUB/SUI`
`carry_to_k_flag`	`DCX/INX`
`store_carry_flag`	`ACI/ADC/ADD/ADI/ANA/ANI/ARHL/CMC/CMP/CPI/DAA/DAD/DSUB/ORA/ORI/RAL/RAR/RDEL/RLC/RRC/SBB/SBI/STC/SUB/SUI/XRA/XRI`
`xor_carry_result`	xor for `ANA/ANI/CMC/CMP/CPI/DSUB/SBB/SBI/STC/SUB/SUI`
`/latch_carry use_carry_flag`	`CMC/LDHI/LDSI`

Conclusions

By reverse-engineering the 8085, we can see how the ALU actually works at the gate and silicon level. The ALU uses many standard techniques, but there are also some surprises and tricks. There are two registers (ACT and TMP) that are invisible to the programmer. You'd expect a direct path from the data bus to the accumulator, but instead the data passes through the ALU. The increment/decrement logic uses the unexpected constant 0xfe, and there are two totally different ways of performing increment/decrement. Several undocumented instructions perform ALU operations without involving the accumulator at all.

This information builds on the 8085 reverse-engineering done by the Visual 6502 team. This team dissolves chips in acid to remove the packaging and then takes many close-up photographs of the die inside. Pavel Zima converted these photographs into mask layer images, generated a transistor net from the layers, and wrote a transistor-level 8085 simulator.

↧

Simulating a TI calculator with crazy 11-bit opcodes

August 11, 2013, 10:50 am

I've built a register-level simulator of a 1974 TI calculator chip that shows what actually happens inside a calculator when you perform operations and shows the calculator source code as it executes. The architecture of the calculator chip is pretty interesting, with 11-bit opcodes, a 9-bit address bus, and 44-bit BCD registers. The chip doesn't support multiplication or division, so these are performed with repeated addition or subtraction.

The simulator is at righto.com/ti.

↧

Hippies, clever hardware and Steve Jobs' body odor: Visiting Apple in 1976

August 24, 2013, 10:21 am

A guest posting from William Fine:

I saw the "Jobs" movie yesterday and it revived some ancient memories of my dealings with Jobs and Holt in the "old days"! When I returned home, I researched Rod Holt on the Internet and ran across your Power Supply Blog, which I found most interesting. Perhaps you can add my ensuing comments to your blog as you see fit.

In 1973 I started a company in my garage in Cupertino to design and manufacture custom Magnetic Products. It was called Mini-Magnetics Co. Inc. After a few months I was forced out of the garage into a small office complex on Sunnyvale-Saratoga Road, and had about 5 to 10 employees.

I believe it was around 1975/76 or so, I had a visit from an insulation and wire salesman named Mike Felix. He informed me that I may soon be getting a call from a new start-up company called Apple Computer located just a few blocks away in Cupertino. He gave them my name when he was asked to recommend a Magnetics manufacturing house.

I promptly forgot about it, as I was already quite busy and I never had to solicit business or even advertise. A week or two went by, and I received a call from a female at Apple who set up an appointment for the next day with a guy named Steve Jobs. She gave me the address and it turned out to be in an office complex located just behind the "Good Earth" restaurant.

The next day I went over to the location and knocked on the door, and it was opened by Jobs, with Wozniak in the background and a young hippie-looking girl at a desk in the corner talking on the phone while eating. That was Apple Computer. They had just moved out of their garage into this new location. It appeared to be a large room with "stuff" scattered haphazardly all over, on benches and on the floor. From Jobs' appearance, I was a bit afraid to even shake hands with him, especially after getting a whiff of his body odor!

He immediately took me over to a bench that had a few cardboard boxes on it and showed me some transformer cores, bobbins and spools of wire, and unfolded a handwritten diagram of the various magnetic components that he wanted me to wind and assemble for Apple. I took a quick look, and while it was all quite sketchy, it looked doable. He said that he needed them within 10 days and I said ok, since he was furnishing the materials.

I told him that I would call him with a quote after I got back to my office and he said ok and as we were parting he mentioned that if I had any technical questions to get a hold of a guy named Rod Holt and wrote down a phone number where he could be reached.

As I recall, there were about 5-6 magnetic components from simple toroids to a complex switching main power transformer. I believe that the price came to about $10.00 per set, and they wanted 35 sets, so the entire matter would be about $350.00. I called it into Apple the next day and they gave me a Purchase Order number over the phone. When I asked if they would be mailing me a hard copy confirming the order, they had no idea of what I was talking about!

I figured, what the hell, worst case, I would be out $350 bucks if they didn't pay the bill. No big deal.

After I got into examining the sketches I discovered something quite interesting about the power transformer. In all previous designs that I had seen, there was a primary, a base feedback winding and several output windings. What Holt had contrived was an interesting method of assuring excellent coupling of the base winding by using a single strand of wire from a multi-filar bundle that was custom ordered from the wire factory. For example, I think that there was a bundle of 30 strands twisted together, which were all coated in red insulation and one strand of green insulation also twisted together in the bundle, which gave a precise turns ratio together with excellent coupling between the windings.

I am uncertain if that contributed much to improving the efficiency of the switcher, but it seemed clever at the time I discovered it. That transformer is the one that is shown with the copper foil external shield pictured in your blog. I did speak with Holt once or twice but never met him in person.

The 35 sets of parts were delivered on time and much to my surprise, we were paid within 10 days. I attributed that to the arrival of Mike Markkula onto the scene who had provided some money and organization to Apple.

At the time, after seeing the Apple operation, I wouldn't have given a nickel for a share of their stock if it had been offered! Ha!

I had been involved with power supplies for many years prior to this Apple issue, and can say that switchers were known for a long time, but only became practical with the advent of low loss ferrite core materials and faster transistors as your blog implies.

So, that's the Apple Power Supply story! Be happy to answer any questions that you may come up with. Regards, wpf

↧

Reverse-engineering and simulating Sinclair's amazing 1974 calculator with half the ROM of the HP-35

August 30, 2013, 2:57 pm

I've reverse-engineered the Sinclair Scientific calculator. The remarkable thing about this calculator is they took a simple 4-function calculator chip and reprogrammed its 320-instruction ROM to be a full scientific calculator. By looking at the chip, I've extracted the original code, reverse-engineered how it works, and written a JavaScript simulator that runs the original code and shows what the calculator is doing internally.

The simulator is at righto.com/sinclair. My earlier TI calculator simulator is at righto.com/ti. (The image above is courtesy of Hackaday.)

↧

Reverse-engineering the 8085's decimal adjust circuitry

August 31, 2013, 2:04 pm

In this post I reverse-engineer and describe the simple decimal adjust circuit in the 8085 microprocessor. Binary-coded decimal arithmetic was an important performance feature on early microprocessors. The idea behind BCD is to store two 4-bit decimal digits in a byte. For instance, the number 42 is represented in BCD as 0100 0010 (0x42) instead of binary 00101010 (0x2a). This continues my reverse engineering series on the 8085's ALU, flag logic, undocumented flags, register file, and instruction set.

The motivation behind BCD is to make working with decimal numbers easier. Programs usually need to input and output numbers in decimal, so if the number is stored in binary it must be converted to decimal for output. Since early microprocessors didn't have division instructions, converting a number from binary to decimal is moderately complex and slow. On the other hand, if a number is stored in BCD, outputting decimal digits is trivial. (Nowadays, the DAA operation is hardly ever used.)

One problem with BCD is the 8085's ALU operates on binary numbers, not BCD. To support BCD operations, the 8085 provides a DAA (decimal adjust accumulator) operation that adjusts the result of an addition to correct any overflowing BCD values. For instance, adding 5 + 6 = binary 0000 1011 (hex 0x0b). The value needs to be corrected by adding 6 to yield hex 0x11. Adding 9 + 9 = binary 0001 0010 (hex 0x12) which is a valid BCD number, but the wrong one. Again, adding 6 fixes the value. In general, if the result is ≥ 10 or has a carry, it needs to be decimal adjusted by adding 6. Likewise, the upper 4 BCD bits get corrected by adding 0x60 as necessary. The DAA operation performs this adjustment by adding the appropriate value. (Note that the correction value 6 is the difference between a binary carry at 16 and a decimal carry at 10.)
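
The adjustment rule can be tried out with a small Python sketch. This is a behavioral illustration of "add in binary, then correct each digit by 6", not a model of the 8085's circuit; the function name `bcd_add` is hypothetical.

```python
def bcd_add(x, y):
    """Add two packed-BCD bytes; returns (packed-BCD result, decimal carry)."""
    total = x + y                                    # plain binary addition
    half_carry = ((x & 0x0F) + (y & 0x0F)) > 0x0F    # carry out of bit 3
    carry = total > 0xFF                             # carry out of bit 7
    total &= 0xFF

    # Low digit: adjust if it is >= 10 or if it produced a half carry.
    if (total & 0x0F) >= 10 or half_carry:
        total += 0x06
    # High digit: adjust if it is >= 10 or if the addition carried out.
    if (total >> 4) >= 10 or carry:
        total += 0x60

    return total & 0xFF, int(carry or total > 0xFF)


print([hex(v) for v in bcd_add(0x05, 0x06)])  # ['0x11', '0x0']
print([hex(v) for v in bcd_add(0x09, 0x09)])  # ['0x18', '0x0']
print([hex(v) for v in bcd_add(0x99, 0x01)])  # ['0x0', '0x1']
```

The three calls reproduce the cases above: 5 + 6 gives 0x11, 9 + 9 gives 0x18 by way of the half carry, and 99 + 1 wraps to 00 with a decimal carry.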

The DAA operation in the 8085 is implemented by several components: a signal if the lower bits of the accumulator are ≥ 10, a signal if the upper bits are ≥ 10 (including any half carry from the lower bits), and circuits to load the ACT register with the proper correction constant 0x00, 0x06, 0x60, or 0x66. The DAA operation then simply uses the ALU to add the proper correction constant.

The block diagram below shows the relevant parts of the 8085: the ALU, the ACT (accumulator temp) register, the connection to the data bus (dbus), and the various control lines.

The circuit below implements this logic. If the low-order 4 bits of the ALU are 10 or more, alu_lo_ge_10 is set. The logic to compute this is fairly simple: the 8's place must be set, and either the 4's or 2's. If DAA is active, the low-order bits must be adjusted by 6 if either the low-order bits are ≥ 10 or there was a half-carry (A flag).

Similarly, alu_hi_ge_10 is set if the high-order 4 bits are 10 or more. However, a base-10 overflow from the low order bits will add 1 to the high-order value so a value of 9 will also set alu_hi_ge_10 if there's an overflow from the low-order bits. A decimal adjust is performed by loading 6 into the high-order bits of the ACT register and adding it. A carry out also triggers this decimal adjust.

Schematic of the decimal adjust circuitry in the 8085 microprocessor.
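
As a rough cross-check of the description above, the following Python sketch computes the correction constant from the individual bits. It follows my reading of the text rather than the schematic itself: in particular, "overflow from the low-order bits" is assumed to mean that the low digit is 10 or more, and the helper names `nibble_ge_10` and `decimal_adjust_constant` are invented for this example.

```python
def nibble_ge_10(nibble):
    """Gate-level test for a 4-bit value >= 10: the 8's bit AND (4's bit OR 2's bit)."""
    b3 = (nibble >> 3) & 1
    b2 = (nibble >> 2) & 1
    b1 = (nibble >> 1) & 1
    return b3 & (b2 | b1)


def decimal_adjust_constant(alu, carry, half_carry):
    """Return the constant (0x00, 0x06, 0x60 or 0x66) that DAA would add."""
    lo = alu & 0x0F
    hi = (alu >> 4) & 0x0F

    lo_ge_10 = nibble_ge_10(lo)
    # A high digit of 9 also counts if the low digit is about to overflow into it
    # (assumption: "overflow" here means the low digit is >= 10).
    hi_ge_10 = nibble_ge_10(hi) | (int(hi == 9) & lo_ge_10)

    adjust_lo = lo_ge_10 | half_carry    # load 6 into the low half of ACT
    adjust_hi = hi_ge_10 | carry         # load 6 into the high half of ACT
    return (0x60 if adjust_hi else 0x00) | (0x06 if adjust_lo else 0x00)


print(hex(decimal_adjust_constant(0x0B, carry=0, half_carry=0)))  # 0x6
print(hex(decimal_adjust_constant(0x12, carry=0, half_carry=1)))  # 0x6
print(hex(decimal_adjust_constant(0x9A, carry=0, half_carry=0)))  # 0x66
```

Because both digits are examined at once, the constant can be chosen in a single step and added with one pass through the ALU.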

The circuits to load the correction value into ACT are controlled by the load_act_x6 signal for the low digit and load_act_6x for the high digit. These circuits are shown in my earlier article Reverse-engineering the 8085's ALU and its hidden registers.

Comparison to the 6502

By reverse-engineering the 8085, we see how its simple decimal adjust circuit works. In comparison, the 6502 handles BCD in a much more efficient but complex way. The 6502 has a decimal mode flag that causes addition and subtraction to automatically do decimal correction, rather than using a separate instruction. This patented technique avoids the performance penalty of using a separate DAA instruction. To correct the result of a subtraction, the 6502 needs to subtract 6 (or equivalently add 10). The 6502 uses a fast adder circuit that does the necessary correction factor addition or subtraction without using the ALU. Finally, the 6502 determines if correction is needed before the original addition/subtraction completes, rather than examining the result of the addition/subtraction, providing an additional speedup.

↧

9 Hacker News comments I'm tired of seeing

September 1, 2013, 10:24 am

As a long-time reader of Hacker News, I keep seeing some comments that don't really contribute to the conversation. Since the discussions are one of the most interesting parts of the site, I offer my suggestions for improving quality.

Correlation is not causation: the few readers who don't know this already won't benefit from mentioning it. If there's some specific reason you think a study is wrong, describe it.
"If you're not paying for it, you're the product" - That was insightful the first time, but doesn't need to be posted about every free website.
Explaining a company's actions by "the legal duty to maximize shareholder value" - Since this can be used to explain any action by a company, it explains nothing. Not to mention the validity of the statement is controversial.
[citation needed] - This isn't Wikipedia, so skip the passive-aggressive comments. If you think something's wrong, explain why.
Premature optimization - labeling every optimization with this vaguely Freudian phrase doesn't make you the next Knuth. Calling every abstraction a leaky abstraction isn't useful either.
Dunning-Kruger effect - an overused explanation and criticism.
Betteridge's law of headlines - this comment doesn't need to appear every time a title ends in a question mark.
A link to a logical fallacy, such as ad hominem or more pretentiously tu quoque - this isn't a debate team and you don't score points for this.
"Cue the ...", "FTFY", "This.", and other generic internet comments are just annoying.

In general if a comment could fit on a bumper sticker or is simply a link to a Wikipedia page or is almost a Hacker News meme, it's probably not useful.

What comments bother you the most?

↧

Reverse-engineering the Z-80: the silicon for two interesting gates explained

September 2, 2013, 9:48 am

I've been reverse-engineering the Z-80 processor, using images from the Visual 6502 team. One interesting thing about the Z-80's silicon is it uses complex gates with multiple inputs and multiple levels of logic. It also implements an XOR gate with an unusual pass-transistor circuit. I thought it would be interesting to examine these gates at the silicon level and show how they work.

The image above shows the overall organization of the Z-80 chip. I'm going to zoom way in on the ALU and look at the silicon that implements one of the complex gates there: a 5-input, three-level gate. I'll walk through this gate and show how it works at the silicon level. While the silicon looks like a jumble of lines, its operation is actually straightforward if you step through it.

Let's begin with an (oversimplified) description of how the chip is constructed. The chip starts with the silicon wafer. Regions are diffused with an element such as boron, yielding conductive diffusion regions. A layer of polysilicon strips is put on top. Finally, a layer of metal "wires" above the polysilicon provides more connections. For our purposes, diffusion regions, polysilicon, and metal can all be considered conductors.

In the image below, the bright vertical bands are metal wires. The slightly darker horizontal bands are polysilicon; the borders are more visible than the regions themselves. In this part of the Z-80, the polysilicon connections run mostly horizontally, and the metal wires run vertically. The large irregular regions outlined in black are doped silicon diffusion regions. The circles are vias between different layers.

Transistors are formed where a polysilicon line crosses a diffusion region. You might expect transistors to be very visible in the image, but a polysilicon line looks the same whether it's a conductor or a transistor. So transistors just appear as long skinny regions in the image. The diagram below shows the physical structure of a transistor: the source and drain are connected if the gate is positive.

Let's dive in and see how this circuit works. There's a lot going on, but the image below has been colored to make it clearer. Only three of the vertical metal lines are relevant. On the left, the yellow metal line ties together parts of the gate. In the middle is the blue ground line, which is critical to the operation of the gate. At the right, the red positive voltage line is used to pull the output high through a resistor. The large diffusion region has been tinted cyan. This region can be thought of as big conductive areas interrupted by transistors. There are 5 pinkish polysilicon input wires, labeled A, B, C, D, E. When they cross the diffusion region they still act as wires, but also form a transistor below in the diffusion region. For instance, input A is connected to two transistors.

With all the pieces labeled, we can figure out the operation of the circuit. If input A is high, the first transistor will conduct and connect the yellow strip to ground (dotted line 1). Likewise, if input B is high, the second transistor will conduct and ground the yellow strip (dotted line 2). C will ground the yellow strip via 3. So the yellow strip will be grounded for A or B or C. This forms a three-input OR gate.

If input D is high, transistor 4 will connect the yellow strip to the output. Likewise, if input E is high, transistor 5 will connect the yellow strip to the output. Thus, the output will be grounded if (A or B or C) and (D or E).

In the upper right, arrow 6/7/8 will ground the output if A and B and C are high and the three associated transistors (6, 7, 8) conduct. This computes A and B and C.

Putting this all together, the output will be grounded if [(A or B or C) and (D or E)] or [A and B and C]. If the output is not grounded, the resistor (actually a depletion transistor) will pull the output high. Thus, the final output is the inverse of that whole expression: not ([(A or B or C) and (D or E)] or [A and B and C]).

The diagram below shows the gate logic implemented by this circuit. This rather complex gate is created from just nine transistors. Note that the final AND and NOR gates are "for free" - they are formed by wiring together previous outputs and don't require additional transistors. Another point of interest is that with NMOS, the output will be high unless something pulls it low, which explains why circuits are based on NAND and NOR gates rather than AND and OR gates.

If you want to see more low-level silicon analysis, see my article on the overflow circuit in the 6502 at the silicon level.

What does this gate do?

This gate is a key part of one bit of the Z-80's ALU. The gate generates the (inverted) sum, AND, OR, or XOR of B and C depending on the inputs. Specifically, B and C are the two operand inputs, and A is the carry in. D is a control input and E is an inverted intermediate carry from B plus C plus carry_in. By controlling D and overriding A and E, the operation is selected.
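
The gate's logic function is easy to check in a few lines of Python. The sketch below evaluates the expression derived earlier and then tests the "inverted sum" case. The setup for that case is my assumption, not something stated explicitly above: D is held low and E is driven with the inverted carry of A, B, and C. The function name `complex_gate` is hypothetical.

```python
from itertools import product


def complex_gate(a, b, c, d, e):
    """NMOS complex gate: the output is high unless a path to ground conducts."""
    pulled_low = ((a or b or c) and (d or e)) or (a and b and c)
    return int(not pulled_low)


# Inverted-sum case (assumed setup): D = 0, E = inverted carry of A + B + C.
for b, c, a in product((0, 1), repeat=3):
    carry = (a & b) | (a & c) | (b & c)
    out = complex_gate(a, b, c, d=0, e=1 - carry)
    print(f"B={b} C={c} carry_in={a} -> output={out} sum={a ^ b ^ c}")
```

In every row the output is the complement of the sum bit, which matches the description of the gate producing the inverted sum.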

The Z-80's interesting XOR gate

The Z-80 uses an unusual circuit for its XOR gate. XOR is an inconvenient function to implement since it has a worst-case Karnaugh map, making it expensive to implement from simple gates. Instead, the Z-80 uses a combination of inverters and pass transistors, different from regular NMOS logic.

As before, the diagram below shows the power and ground metal lines, a connecting metal line in yellow, the polysilicon in pink, the polysilicon transistor gates in green, and diffusion in cyan. The two inputs are A and B.

Starting with input A: if it is high, transistor 1 will connect A' to ground. Otherwise the pullup resistor (way on the left), will pull A' high. (Note that A' is the whole diffusion region between transistor 1 and transistor 3 up to the resistor.) Thus transistor 1 forms a simple inverter with inverted output A'. Likewise, transistor 2 inverts input B to give inverted B' (in the whole diffusion region between transistors 2 and 4).

Now comes the tricky part. If A' is high, pass transistor 4 will connect B' to the yellow metal. If B' is high, pass transistor 3 will connect A' to the yellow metal. The third pullup resistor will pull the yellow metal high unless something ties it to ground. Working through the combinations, if A' and B' are both high, both A' and B' are connected to the yellow metal, which gets pulled high. If A' is high and B' is low, B' is connected to the yellow metal, pulling it low. Likewise, if A' is low and B' is high, A' pulls the yellow metal low. Finally if A' and B' are low, nothing gets connected to the yellow metal, so the resistor pulls it high.

To summarize, the yellow metal is pulled high if A' and B' are both high or both low. That is, it is the exclusive-nor of A' and B', which is also the exclusive-nor of A and B, since inverting both inputs leaves the result unchanged.

Finally, the xnor value controls transistors 5a and 5b which form an inverter. If xnor is high, transistors 5a and 5b conduct and the xor output is connected to ground, and if xnor is low, the pullup resistors pull the xor output high. One unusual feature here is the parallel transistors 5a and 5b with separate pullup resistors. I haven't seen this in the 8085 or 6502; they use a single larger transistor instead of parallel transistors.
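
The walkthrough above can be replayed as a small switch-level sketch in Python. It is a simplification that treats each pass transistor as an ideal switch and each pullup as "high unless tied to a low node"; the function name `z80_xor` is made up for the example.

```python
def z80_xor(a, b):
    """Switch-level sketch of the pass-transistor XOR described above."""
    a_inv = 1 - a   # transistor 1 plus pullup: inverter for A
    b_inv = 1 - b   # transistor 2 plus pullup: inverter for B

    # Collect the nodes that the pass transistors connect to the yellow metal.
    connected = []
    if a_inv:                     # pass transistor 4 conducts: B' reaches the metal
        connected.append(b_inv)
    if b_inv:                     # pass transistor 3 conducts: A' reaches the metal
        connected.append(a_inv)

    # The pullup keeps the metal high unless a connected node is low.
    xnor = 0 if 0 in connected else 1

    # Transistors 5a/5b invert the xnor value to give the final output.
    return 1 - xnor


for a in (0, 1):
    for b in (0, 1):
        print(a, b, z80_xor(a, b))
```

The printed truth table is 0, 1, 1, 0 for the four input combinations, the same as `a ^ b`.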

The schematic below summarizes the circuit. In case you're wondering, this XOR gate is used to compute the parity flag. All the bits are XORed together to generate the parity flag.

Comparison to other processors

From what I've seen so far, the Z-80 uses considerably more complex gates than the 8085 and the 6502. The 6502 uses mostly simple NAND/NOR gates and only a few two-level gates, not as complex as on the Z-80. The 8085 uses more complex gates, but still less than the Z-80. I don't know if the difference is due to technical limits on the number of gate levels, or the preferences of the designers.

The XOR circuit in the Z-80 is different from the 8085 and 6502. I'm not sure it saves any transistors, but it is unusual. I've seen other pass-transistor implementations of XOR, but none like the Z-80.

Credits: The Visual 6502 team, especially Chris Smith, Ed Spittles, Pavel Zima, Phil Mainwaring, and Julien Oster.

↧