The following article was printed in issue 55 (May 1981) of the magazine "Dr. Dobb's Journal of Computer Calisthenics & Orthodontia".
The author demonstrates the limits of the Z80 processor when its instructions are used naively.
Using (and Misusing) the Z-80 Microprocessor
by Ray Duncan
Ray Duncan, Laboratory Microsystems, 4147 Beethoven Street, Los Angeles, CA 90066.
Ever since its introduction, the Zilog Z-80 microprocessor has had an aura of glamor and power compared to the workhorse Intel 8080 or 8085.
This reputation is justified to some extent by the additional instructions, registers, addressing modes, and vectored interrupt capabilities of the Z-80.
However, unless the additional Z-80 instructions are employed very carefully, they may cause a definite decrease in performance compared to using the 8080 instruction set alone (assuming the same machine speed, of course).
In particular, indiscriminate use of the index registers can lead to severe speed and memory requirement penalties.
In this brief communication I will try to point out some "good" and "bad" examples of Z-80 coding.
Zilog mnemonics will be used throughout for clarity.
All execution times assume a 4 MHz system clock (0.25 microseconds per clock cycle), and are taken from the specifications published by Zilog, Inc.
Observe the following common example of an "indirect" 16-bit fetch from memory to register BC, using the contents of a named variable for the target address:
8080 Version | bytes | microseconds |
LD HL,(label) | 3 | 4.00 |
LD C,(HL) | 1 | 1.75 |
INC HL | 1 | 1.50 |
LD B,(HL) | 1 | 1.75 |
| 6 | 9.00 |
|
Z-80 Version | bytes | microseconds |
LD IX,(label) | 4 | 5.00 |
LD C,(IX) | 3 | 4.75 |
LD B,(IX+1) | 3 | 4.75 |
| 10 | 14.50 |
The Z-80 oriented code looks much neater, but costs about 60% more execution time (14.50 versus 9.00 microseconds) and about 67% more memory (10 versus 6 bytes).
How about an indirect increment of a counter in memory?
8080 Version | bytes | microseconds |
INC (HL) | 1 | 2.75 |
|
Z-80 Version | bytes | microseconds |
INC (IX) or (IY) | 3 | 5.75 |
Using the index register in this case more than doubles the execution time and triples the memory requirement!
Your use of the index registers also causes penalties when subroutines must save and restore them:
8080 Version | bytes | microseconds |
PUSH BC or DE or HL | 1 | 2.75 |
POP BC or DE or HL | 1 | 2.50 |
|
Z-80 Version | bytes | microseconds |
PUSH IX or IY | 2 | 3.75 |
POP IX or IY | 2 | 3.50 |
One would imagine that the index registers would be well suited to table translations by incrementing the index to compare bytes through one table until a match was found, then exploiting the indexed offset addressing mode to extract a corresponding byte from another table.
Yet in the simpler cases at least, the overhead of the index registers again outweighs the cleanness of the assembler code.
For example:
8080 Version | bytes | microseconds |
CP A,(HL) | 1 | 1.75 |
JP NZ,nomatch | 3 | 2.50 |
LD DE,offset | 3 | 2.50 |
ADD HL,DE | 1 | 2.75 |
LD A,(HL) | 1 | 1.75 |
| 9 | 11.25 |
|
Z-80 Version | bytes | microseconds |
CP A,(IX) | 3 | 4.75 |
JP NZ,nomatch | 3 | 2.50 |
LD A,(IX+offset) | 3 | 4.75 |
| 9 | 12.00 |
What about the relative jump (JR)?
Many programmers new to the Z-80 observe that the unconditional JR instruction occupies only two bytes compared to the three required by the absolute jump (JP), and quickly set about editing all their tight little time-dependent loops to incorporate this neat-looking mnemonic.
Surprise - the JR instruction takes 3 microseconds to execute, compared to 2.50 microseconds for JP, an increase of about 20% in execution time.
Now that we have excellent relocating assemblers available, use of unconditional JR should probably be avoided.
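On the other hand, the conditional relative jump gives a slight advantage in execution time (1.75 microseconds false case, 3.00 microseconds true case) over the conditional absolute jump (always 2.50 microseconds), as long as the branch condition is not met over 50% of the time. (If the branch is taken a fraction p of the time, the relative jump averages 1.75 + 1.25p microseconds, so the exact break-even point is p = 0.6.)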
Naturally, the Z-80 would never have become so popular unless there were significant hardware and software advantages to its use.
Careful study of the code set reveals a number of instances where use of the Z-80 instructions and registers can yield striking improvements in execution time and memory requirements.
Consider an entry to a subroutine where all registers will be needed for working storage yet the main program requires that the contents of all registers be unchanged at the end of the call.
The 8080 requires that everything be pushed onto the stack and later restored, but the Z-80 allows you to swap with the alternate register set for a dramatic increase in speed:
8080 Version | bytes | microseconds |
PUSH BC | 1 | 2.75 |
PUSH DE | 1 | 2.75 |
PUSH HL | 1 | 2.75 |
(subroutine body) |
POP HL | 1 | 2.50 |
POP DE | 1 | 2.50 |
POP BC | 1 | 2.50 |
| 6 | 15.75 |
|
Z-80 Version | bytes | microseconds |
EXX | 1 | 1.00 |
(subroutine body) |
EXX | 1 | 1.00 |
| 2 | 2.00 |
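One caveat applies to the exchange shown above: EXX swaps only BC, DE, and HL. If the accumulator and flags must also survive the call, the companion instruction EX AF,AF' is needed as well. The following is a minimal sketch rather than a recommended calling convention. It assumes that nothing else (an interrupt handler, for instance) relies on the alternate register set and that the subroutine returns no results in registers. The label SUBR is hypothetical.

```asm
SUBR:   EX   AF,AF'      ; swap A and flags with the alternate pair (1 byte, 1.00 microseconds)
        EXX              ; swap BC, DE, HL with the alternate set (1 byte, 1.00 microseconds)
        ; ... subroutine body: A, BC, DE, HL are free for working storage ...
        EXX              ; bring back the caller's BC, DE, HL
        EX   AF,AF'      ; bring back the caller's A and flags
        RET
```

Whatever the alternate registers held before the call is overwritten by the subroutine's working values, so this trick suits programs that reserve the alternate set for exactly this purpose.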
The augmented sixteen-bit arithmetic capabilities of the Z-80 are a godsend both for the sake of performance and clarity of program code.
For example, subtracting DE from HL, leaving the result in HL:
8080 Version | bytes | microseconds |
LD A,L | 1 | 1.00 |
SUB A,E | 1 | 1.00 |
LD L,A | 1 | 1.00 |
LD A,H | 1 | 1.00 |
SBC A,D | 1 | 1.00 |
LD H,A | 1 | 1.00 |
| 6 | 6.00 |
|
Z-80 Version | bytes | microseconds |
SBC HL,DE | 2 | 3.75 |
(This assumes that the status of the carry flag is known; otherwise it must be turned off with a preceding OR A,A instruction, which wastes some time.)
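As a sketch of the safe form, with the carry cleared first (the instruction is written OR A in standard Zilog syntax; the timings are the usual Zilog figures at 4 MHz):

```asm
        OR   A           ; clear carry, A unchanged (1 byte, 1.00 microseconds)
        SBC  HL,DE       ; HL = HL - DE - carry = HL - DE (2 bytes, 3.75 microseconds)
```

Even with the extra instruction the pair occupies 3 bytes and takes 4.75 microseconds, still ahead of the 6 bytes and 6.00 microseconds of the 8080 sequence.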
The additional direct register load and store instructions of the Z-80 are both handy and speedy.
Contrast the task of getting a 16-bit quantity from memory into the BC register pair:
8080 Version | bytes | microseconds |
PUSH HL | 1 | 2.75 |
LD HL,(label) | 3 | 4.00 |
LD B,H | 1 | 1.00 |
LD C,L | 1 | 1.00 |
POP HL | 1 | 2.50 |
| 7 | 11.25 |
|
Z-80 Version | bytes | microseconds |
LD BC,(label) | 4 | 5.00 |
The Z-80 has a string move instruction, as everyone knows, which is practically worth the price of the chip itself.
If one sets up HL with the source address, DE with the destination address, and BC with the number of bytes to transfer:
| 8080 Version | bytes | microseconds |
LOOP: | LD A,(HL) | 1 | 1.75 |
| LD (DE),A | 1 | 1.75 |
| INC HL | 1 | 1.50 |
| INC DE | 1 | 1.50 |
| DEC BC | 1 | 1.50 |
| LD A,B | 1 | 1.00 |
| OR A,C | 1 | 1.00 |
| JP NZ,LOOP | 3 | 2.50 |
| | 10 | 12.50 per loop |
|
Z-80 Version | bytes | microseconds |
LDIR | 2 | 5.25 per byte transferred |
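For completeness, here is a sketch of the whole block move including the register setup described above. The names SRC, DST, and COUNT are hypothetical and stand for whatever addresses and length the program actually uses.

```asm
        LD   HL,SRC      ; source address
        LD   DE,DST      ; destination address
        LD   BC,COUNT    ; number of bytes, 1 to 65535 (BC = 0 would move 65536 bytes)
        LDIR             ; (DE) <- (HL), then HL+1, DE+1, BC-1; repeats until BC = 0
```

By the Zilog figures the final byte costs only 4.00 microseconds, because the instruction does not repeat. A 256-byte move therefore takes 255 x 5.25 + 4.00 = 1342.75 microseconds, against 256 x 12.50 = 3200 microseconds for the 8080 loop. Since LDIR copies upward, a destination that overlaps the source from above calls for LDDR instead, with HL and DE pointing at the ends of the blocks.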
The corresponding compare-with-auto-increment instruction (CPIR) is more difficult to use effectively, but can be exploited with good results in table lookup routines.
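One way such a lookup might be arranged is sketched below, reusing the two-table translation from the earlier index register example. It assumes the byte to be translated is already in A. The names KEYS, VALUES, NKEYS, and NOMATCH are hypothetical, and the two tables are assumed to have the same number of one-byte entries.

```asm
        LD   HL,KEYS           ; start of the table of key bytes
        LD   BC,NKEYS          ; number of entries to search (must not be zero)
        CPIR                   ; compare A with (HL), then HL+1, BC-1; stops on match or BC = 0
        JP   NZ,NOMATCH        ; Z is set only if a match was found
        LD   DE,VALUES-KEYS-1  ; HL points one past the match, so back up by one
        ADD  HL,DE             ; HL now points at the corresponding entry in VALUES
        LD   A,(HL)            ; fetch the translated byte
```

By the Zilog figures CPIR spends 5.25 microseconds on each entry that fails to match and 4.00 microseconds on the entry that ends the search, with no jump or explicit increment needed inside the scan.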
The point of this little demonstration was that the Z-80 is indeed a powerful processor, but the proper use of its many added capabilities in a time-dependent application requires an intimate knowledge of the instruction set at the machine level.
The very elegant Zilog mnemonics, which tremendously improve the understandability of an assembly listing compared to the Intel mnemonics, blur the machine-level code distinctions for a new programmer, since they make all register-memory and register-register operations appear very similar.
I hope that other readers will be provoked into contributing other examples of "bad" and "good" uses of the Z-80.
Scanned by
Werner Cirsovius
September 2002
© Dr. Dobb's Journal