The following article was published in the January 1980 issue of BYTE magazine.
It is an interesting article about how to relocate machine code.
However, the problems involved can be solved more elegantly with the PRL (Page ReLocatable) files used by CP/M Plus.
Relocating 8080 System Software
John G Lipham
Dept of Physics
University of North Carolina
at Charlotte
Charlotte NC 28223
Owners of both large and small computer systems often experience software problems when the time comes to upgrade the system.
All old applications programs will have to be modified to run under the new system.
However, the real problem occurs when you want to use some or all of the old system software.
This was recently the situation at the University of North Carolina at Charlotte (UNCC) Physics Department.
The original hardware consisted of an IMSAI mainframe with 20 K bytes of memory, interfaced to a Teletype and an audio cassette unit.
We added a floppy disk and Tektronix 4006-2 graphics terminal.
To operate the disk, we acquired the CP/M operating system written by Digital Research and distributed by IMSAI.
The CP/M system has a disk-based version of BASIC called BASIC-E, which was written by Gordon Eubanks.
This is an excellent version that allows up to 31 characters for variable names and nearly free-form entry of statements. Line numbers are required only on statements that are the targets of program transfers (eg: GOTO and GOSUB). It also offers numerous built-in functions and file handling capabilities.
However, it is unusual for a BASIC in that programs are first created with an editor, then compiled into an intermediate file (using BASIC-E), and finally run (using RUN-E).
Our system is used primarily for instructional purposes and some of our students have had no previous programming experience.
Hence, we felt that it was desirable to have an interactive version of BASIC for their use.
We already had an interactive BASIC with our old system.
However, there was a catch.
To run under the CP/M system, it was necessary to shift the origin of BASIC from its original starting address of 0000 to the address hexadecimal 0100.
(The CP/M monitor uses the addresses hexadecimal 0000 through 00FF.)
In principle, if you have an assembly-language listing and an assembler program, it is always possible to reassemble the assembly-language code to machine code with a new starting address.
However, with our old version of BASIC, this listing consisted of 113 typed pages!
Ignoring the difficulty of just entering this amount of code, a moment's reflection will show that the assembler and the code would never fit in 20 K bytes.
(The machine code itself occupies about 9 K bytes.)
Assembling the code in pieces that fit is a possibility.
But, even with a cross-reference table of variable names, this would be an excruciating process.
Hence, we were left with the only practicable alternative: relocating the machine code directly.
Thus it was with great interest that I read Leor Zolman's article in the July 1977 BYTE entitled "A Machine Code Relocator for the 8080."
I have used the program written by Zolman and have found that it works as advertised.
However, I have oversimplified my initial statement of the problems faced in modifying our old BASIC to run under CP/M.
There were segments of the old software that had to be removed to be compatible with CP/M.
Thus, to avoid a lot of NOPs, various relocations to lower-memory addresses had to be made.
(Various additions and replacements also had to be made.)
As pointed out by Zolman, his program works by moving blocks of code tail-to-tail.
Hence, "relocating backward into lower memory fails if the difference between the source and destination address is not greater than the block length."
His suggested workaround, performing two relocations, was also impracticable because of our memory limitations.
I found that by making some modifications I could remove the limitation in Zolman's original program at the cost of 36 additional bytes of program code.
This modified relocator program is presented in listing 1.
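Zolman's limitation, and one way to remove it, can be illustrated with a small simulation. The sketch below is in Python and models 8080 memory as a `bytearray`. The function names `move_tail_first` and `move_block` are hypothetical. It assumes "tail-to-tail" means the last byte is copied first. It then chooses the copy direction from the relative positions of source and destination. That is one common technique, not necessarily the modification used in listing 1.

```python
# Hypothetical helper names; a sketch of the idea, not a transcription of listing 1.

def move_tail_first(mem: bytearray, a: int, b: int, c: int) -> None:
    """Copy mem[a..b] to c starting with the last byte (tail-to-tail)."""
    for i in reversed(range(b - a + 1)):
        mem[c + i] = mem[a + i]


def move_block(mem: bytearray, a: int, b: int, c: int) -> None:
    """Copy mem[a..b] to c, choosing the direction so that overlap is harmless."""
    n = b - a + 1
    if c < a:
        # Moving into lower memory: copy head first, so each source byte
        # is read before the destination pointer overwrites it.
        for i in range(n):
            mem[c + i] = mem[a + i]
    else:
        # Moving into higher memory: copy tail first, as Zolman's program does.
        for i in reversed(range(n)):
            mem[c + i] = mem[a + i]


if __name__ == "__main__":
    # Move a 6-byte block down by only 2 bytes (an overlapping move).
    mem = bytearray(range(10))
    move_tail_first(mem, 2, 7, 0)
    print(list(mem[0:6]))  # [6, 7, 6, 7, 6, 7]: the block is corrupted

    mem = bytearray(range(10))
    move_block(mem, 2, 7, 0)
    print(list(mem[0:6]))  # [2, 3, 4, 5, 6, 7]
```

The failure occurs because the displacement (2) is smaller than the block length (6), which is exactly the condition quoted from Zolman.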
As written, the program is designed to run with the CP/M system's Dynamic Debugging Tool (DDT), which is a type of monitor program for machine-code programs.
I found this to be a useful procedure, since the Dynamic Debugging Tool allows the machine's memory to be reviewed and modified via a terminal keyboard.
I also found the disassembler routine of this program to be invaluable.
(The program can be modified to run without a monitor, or with another monitor, by changing memory location hexadecimal 2DCC.)
Although the modified relocator differs from Zolman's original in some details of operation, it is run in the same manner.
For ease of reference I have retained Zolman's nomenclature.
(nb: Using this nomenclature, you view the memory as though you were looking down into a barrel.
Numerically smaller addresses are at the top and numerically larger ones are at the bottom.)
The same pieces of information are required for a relocation and reference fix, as in Zolman's program.
This required information is outlined in table 1, which, except for memory addresses, is the same as Zolman's.
Table 1: These six pieces of information must be entered into the locations shown (beginning at hexadecimal location 2E01 for listing 1) before a relocation can be performed with the modified relocator program.
Label | Number of Bytes | Address | Comments
a | 2 | 2E01, 2E02 | First address of block to be relocated
b | 2 | 2E03, 2E04 | Last address of block to be relocated
c | 2 | 2E05, 2E06 | Destination address
d | 2 | 2E07, 2E08 | First address to have references fixed
e | 2 | 2E09, 2E0A | Last address to have references fixed
f | 1 | 2E0B | Function select: 00 = fix references only; 01 = move block and fix references; 02 = move block only
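The 8080 stores 16-bit quantities low byte first, as LHLD and SHLD do. Each 2-byte entry in Table 1 is therefore presumably entered with its low-order byte at the lower address. That ordering is an assumption, since the table itself does not say. The sketch below packs the six entries into the 11 bytes starting at 2E01. The function name and the example values are hypothetical.

```python
import struct

PARAM_BASE = 0x2E01  # first byte of the Table 1 parameter area


def pack_parameters(a: int, b: int, c: int, d: int, e: int, f: int) -> bytes:
    """Hypothetical helper: return the 11 bytes for locations 2E01-2E0B.

    '<HHHHHB' means five little-endian 16-bit words (a..e, low byte first,
    assumed to match the 8080 convention) followed by the 1-byte function code f.
    """
    return struct.pack("<HHHHHB", a, b, c, d, e, f)


if __name__ == "__main__":
    # Hypothetical job: move 0000-23FF up to 0100 and fix references in
    # the moved copy (function 01 = move block and fix references).
    block = pack_parameters(0x0000, 0x23FF, 0x0100, 0x0100, 0x24FF, 0x01)
    for offset, byte in enumerate(block):
        print(f"{PARAM_BASE + offset:04X}: {byte:02X}")
```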
The LXI Problem
As pointed out by Zolman, the load register pair immediate (LXI) instruction is a potential source of problems in relocating machine-code programs.
The main difficulty is that this instruction is frequently used for two different jobs:
to load a constant into a register pair, and to load an address into a register pair.
The relocator program cannot distinguish between these two uses.
Hence, if a program constant happens to be equal to an address within the program block being moved, an erroneous reference fix will be made.
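The reference-fixing step, and the way it trips over LXI constants, can be modeled in a few lines. Every 3-byte instruction in the range d..e whose operand falls inside the block a..b has the displacement c - a added to it. This happens whether that operand is an address or a constant. The sketch assumes only the documented 8080 instruction lengths (other opcodes are treated as 1 byte long). The function names are hypothetical. It models the technique described by Zolman and is not a transcription of listing 1.

```python
# Hypothetical model of a reference fix; documented 8080 opcodes only.
# 3-byte instructions: LXI, SHLD, LHLD, STA, LDA, JMP, CALL, conditional jumps/calls.
THREE_BYTE = ({0x01, 0x11, 0x21, 0x31, 0x22, 0x2A, 0x32, 0x3A, 0xC3, 0xCD}
              | {0xC2 + 8 * i for i in range(8)}    # Jcc
              | {0xC4 + 8 * i for i in range(8)})   # Ccc
# 2-byte instructions: MVI, immediate arithmetic/logic, OUT, IN.
TWO_BYTE = ({0x06 + 8 * i for i in range(8)}        # MVI r,data
            | {0xC6 + 8 * i for i in range(8)}      # ADI ACI SUI SBI ANI XRI ORI CPI
            | {0xD3, 0xDB})                         # OUT, IN


def instr_len(op: int) -> int:
    return 3 if op in THREE_BYTE else 2 if op in TWO_BYTE else 1


def fix_references(mem: bytearray, d: int, e: int, a: int, b: int, c: int) -> None:
    """Scan instructions from d to e; relocate 3-byte operands lying in a..b."""
    displacement = c - a
    pc = d
    while pc <= e:
        op = mem[pc]
        if instr_len(op) == 3:
            operand = mem[pc + 1] | (mem[pc + 2] << 8)   # low byte first
            if a <= operand <= b:
                operand = (operand + displacement) & 0xFFFF
                mem[pc + 1] = operand & 0xFF
                mem[pc + 2] = operand >> 8
        pc += instr_len(op)


if __name__ == "__main__":
    mem = bytearray(0x10000)
    mem[0x0000:0x0007] = bytes([
        0xC3, 0x06, 0x00,   # 0000 JMP 0006     (an address)
        0x21, 0x03, 0x00,   # 0003 LXI H,0003   (meant as the constant 3)
        0x76,               # 0006 HLT
    ])
    # Fix references only, as if the code were being moved from 0000 to 0100.
    fix_references(mem, 0x0000, 0x0006, 0x0000, 0x0006, 0x0100)
    print(mem[0:7].hex(" "))   # c3 06 01 21 03 01 76
```

The JMP is correctly changed to 0106, but the constant is also changed to 0103, which is the erroneous reference fix described above.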
Unfortunately there seem to be no widely accepted conventions for the use of this instruction that produce easily relocatable machine code.
Adoption of the following conventions is suggested for all those desiring to write relocatable code:
- The LXI instruction shall be used only to load addresses into a register pair (eg: LXI H,3101H).
- All program constants shall be loaded into a register pair using two move immediate instructions (eg: MVI H,31H; MVI L,01H).
The cost of adopting these conventions is relatively modest in that it will take 4 bytes to load a 2-byte constant into a register pair, instead of the 3 bytes required using the LXI instruction.
Furthermore, if you only want to zero out a register pair, the following sequence of instructions achieves the same result at no additional cost (3 bytes), without using the LXI instruction. Note, however, that unlike LXI it also clears the accumulator and alters the flags:
Hexadecimal
Code Operator Operand
AF XRA A
67 MOV H,A
6F MOV L,A
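The byte counts behind these conventions can be checked with a short encoder. The sketch below, with hypothetical helper names, emits the 8080 machine code for the three ways of setting HL discussed above. LXI stores its operand low byte first. In the two-MVI form the byte order simply follows the order in which H and L are loaded. Only the LXI form leaves a 2-byte operand for the relocator to modify.

```python
# Hypothetical encoders for the three HL-loading sequences discussed above.

def lxi_h(value: int) -> bytes:
    """LXI H,value: opcode 21 followed by the operand, low byte first."""
    return bytes([0x21, value & 0xFF, value >> 8])


def mvi_pair_h(value: int) -> bytes:
    """MVI H,hi then MVI L,lo: two 2-byte instructions (opcodes 26 and 2E)."""
    return bytes([0x26, value >> 8, 0x2E, value & 0xFF])


# XRA A / MOV H,A / MOV L,A: clears HL; also zeroes A and alters the flags.
ZERO_HL = bytes([0xAF, 0x67, 0x6F])

if __name__ == "__main__":
    print(lxi_h(0x3101).hex(" "))       # 21 01 31     (3 bytes, relocatable operand)
    print(mvi_pair_h(0x3101).hex(" "))  # 26 31 2e 01  (4 bytes, never relocated)
    print(len(ZERO_HL), len(lxi_h(0)))  # 3 3
```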
Meanwhile, back in the real world, the LXI problem will usually be encountered by anyone relocating software.
Going through a massive assembly listing and manually fixing references would be a tedious and time-consuming chore.
Fortunately, the computer can be used to do the "grit" work.
A program that enables the computer to look through the machine code for LXI operation codes is presented in listing 2.
I have called it FIXLXI, though corrections must still be made manually.
However, the computer does the tedious job of finding LXI operation codes.
Upon finding an LXI operation code, the computer outputs the address where the instruction is located, followed by the operation code (eg: 01 for LXI B), and finally the 2-byte hexadecimal constant that is loaded into the register pair.
For compatibility with the terminal, all output is in the form of ASCII code. (The conversion from binary to ASCII is done by a simple table look-up procedure.)
For example, upon finding the machine code equivalent of LXI D,21AEH at hexadecimal address 24AB, the program will output the following:
24AB 11 21AE
Afterward, control passes to the monitor and the operator consults the listing to verify that the code is correct.
If not, a manual fix must be performed.
(Using CP/M's Dynamic Debugging Tool program as a monitor makes this an easy task. Simply typing S24AB at the terminal invokes a routine that displays both the memory address (24AB) and the memory contents (11) of the designated location.
It then waits for a change to be entered, a command to look at the next memory location, or a command to quit.)
When the program is reentered, the search for LXI operation codes resumes at the next operation code following the previously found LXI operation code.
To operate FIXLXI you need only specify the starting address (SSTAR) and the ending address (SSTP) of the code to be examined.
In listing 2 this information is entered at hexadecimal addresses 2E42 and 2E44.
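The search that FIXLXI performs can be sketched as a Python generator. A generator also mirrors the way the program resumes at the instruction following each LXI it reports. The binary-to-ASCII conversion uses a 16-character look-up table, in the spirit of the table look-up described above. The parameter names `sstar` and `sstp` follow the article. The function `fixlxi` and its helpers are hypothetical. The instruction-length table is an assumption limited to documented 8080 opcodes, and the code is not a transcription of listing 2.

```python
# Hypothetical sketch of the FIXLXI search; documented 8080 opcodes only.
HEX_DIGITS = "0123456789ABCDEF"          # ASCII look-up table, indexed by nibble
LXI_OPCODES = {0x01, 0x11, 0x21, 0x31}   # LXI B, LXI D, LXI H, LXI SP

THREE_BYTE = (LXI_OPCODES | {0x22, 0x2A, 0x32, 0x3A, 0xC3, 0xCD}
              | {0xC2 + 8 * i for i in range(8)}
              | {0xC4 + 8 * i for i in range(8)})
TWO_BYTE = ({0x06 + 8 * i for i in range(8)}
            | {0xC6 + 8 * i for i in range(8)} | {0xD3, 0xDB})


def instr_len(op: int) -> int:
    return 3 if op in THREE_BYTE else 2 if op in TWO_BYTE else 1


def hex_byte(value: int) -> str:
    """Convert one byte to two ASCII hex digits by table look-up."""
    return HEX_DIGITS[value >> 4] + HEX_DIGITS[value & 0x0F]


def fixlxi(mem: bytes, sstar: int, sstp: int):
    """Yield 'address opcode constant' for each LXI found from sstar to sstp."""
    pc = sstar
    while pc <= sstp:
        op = mem[pc]
        if op in LXI_OPCODES:
            # The operand is stored low byte first; print it high byte first.
            yield (hex_byte(pc >> 8) + hex_byte(pc & 0xFF) + " " + hex_byte(op)
                   + " " + hex_byte(mem[pc + 2]) + hex_byte(mem[pc + 1]))
        pc += instr_len(op)   # resume at the next operation code


if __name__ == "__main__":
    mem = bytearray(0x10000)
    mem[0x24AB:0x24AE] = bytes([0x11, 0xAE, 0x21])   # LXI D,21AEH at 24AB
    for line in fixlxi(mem, 0x24AB, 0x24AD):
        print(line)                                   # 24AB 11 21AE
```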
Employing the FIXLXI program with a terminal operating at a data rate of 1200 bits per second (bps), I found that I could get through our BASIC listing in less than two hours.
Similar results were obtained when I relocated another old assembler program.
Data Block Problems
It is not good programming practice to place program constants in the midst of executable code.
Unfortunately, this and other kludges are frequently found.
However, you will find in most cases that the program constants are at least huddled together in a contiguous block.
If this is true, the data block can be moved, but no fixing of references should be performed within the data block.
As indicated by Zolman, the procedure in this case is to perform the fixing of references in two stages.
First, program references are fixed in the program block up to, but not including, the data block.
Then, skipping over the data block, program references after the data block are fixed for the remaining portion of the program block.
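In terms of Table 1, the two-stage procedure amounts to running the reference fix twice with different values of d and e. The minimal sketch below derives the two ranges from the fix range and the data block. The function name and example addresses are hypothetical.

```python
# Hypothetical helper: split the reference-fix range around a data block.

def fix_ranges(d: int, e: int, data_start: int, data_end: int) -> list:
    """Return the (d, e) pairs to enter for the two reference-fixing passes."""
    ranges = []
    if data_start > d:
        ranges.append((d, data_start - 1))   # up to, but not including, the data
    if data_end < e:
        ranges.append((data_end + 1, e))     # resume just after the data
    return ranges


if __name__ == "__main__":
    for first, last in fix_ranges(0x0100, 0x24FF, 0x1A00, 0x1AFF):
        print(f"d = {first:04X}, e = {last:04X}")
```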
In addition to the usual data block problems that have been mentioned, there is another difficulty encountered when systems software is relocated.
The data blocks in an applications program will normally contain constants that are independent of the location of the program.
In a systems software program like BASIC this is not true for all constants.
This is so because of the design logic of an interpreter program.
Essentially the interpreter works by comparing an input command or function to a table of legal commands or functions.
If a match is found, control is passed to that routine within the BASIC code.
This procedure is frequently implemented by storing the address of the desired routine immediately adjacent to the command (function).
(Because commands and functions are not all the same length (RESTORE is longer than FOR), it is common practice to place a delimiter, such as 0, immediately after the command or function name.
The address of the proper routine then follows.)
Thus, after the system software has been relocated and program references fixed, the command and function table addresses must also be fixed.
These areas will usually be clearly indicated in the program listing.
Also, since the data that must be changed is reasonably small, a manual fix can be readily performed.
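The table layout just described (a keyword, a 0 delimiter, then the 2-byte routine address) lends itself to a simple fix-up loop. The sketch below rests on two hypothetical assumptions: addresses are stored low byte first, and a 0 byte where a keyword would begin marks the end of the table. Real interpreters differ, so the program listing must be checked before anything like this is applied. The function name is hypothetical.

```python
# Hypothetical command-table fix-up; the table layout is an assumption.

def fix_command_table(table: bytearray, a: int, b: int, displacement: int) -> None:
    """Add displacement to each routine address in the table that lies in a..b."""
    i = 0
    while table[i] != 0:            # a 0 at the start of an entry ends the table
        while table[i] != 0:        # skip the keyword characters
            i += 1
        i += 1                      # skip the 0 delimiter
        address = table[i] | (table[i + 1] << 8)
        if a <= address <= b:
            address = (address + displacement) & 0xFFFF
            table[i] = address & 0xFF
            table[i + 1] = address >> 8
        i += 2                      # move on to the next entry


if __name__ == "__main__":
    table = bytearray(b"FOR\x00" + bytes([0x34, 0x12])
                      + b"RESTORE\x00" + bytes([0x00, 0x15])
                      + b"\x00")
    fix_command_table(table, 0x0000, 0x23FF, 0x0100)
    print(table.hex(" "))
```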
The success of this process is dependent upon your knowing the new addresses of the command (function) routines.
Consequently, if a number of shifts and/or additions must be made, I would strongly suggest that changes and fixes be made one at a time.
While this procedure requires more work, it is preferable to making all changes at once, since it is easy to lose track of where everything is located.
Caveat Emptor
After carefully implementing the programs and following the procedures that have been outlined, you may still find that your relocated software has glitches.
Excluding pilot error, the source of any problems can logically be only an improper reference fix.
While there may be many ways for this to happen, I have found only two species of software bugs that create this problem.
The first, and potentially less troublesome, bug occurs when an isolated byte or two of data is buried in the middle of executable code.
With this particular gem I also found a call to a subroutine whose sole function was to implement a jump over the data!
(I'm not making this up.
I really did find this super kludge.)
If you are extremely lucky, the isolated byte(s) will not happen to match any of the 2- or 3-byte operation codes.
In this case the relocator program will incorrectly treat the data as a 1-byte instruction, but it will correctly continue its search for 3-byte operation codes.
In the more usual case, the isolated byte will be identical to a 2- or 3-byte operation code.
Then the possibility exists not only for an improper reference fix, but also for a mangling of the operation code(s).
Fortunately, this mangling process is generally not self-propagating, so the damage is usually localized.
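The effect of an isolated data byte on the relocator's scan can be seen by listing the addresses at which a length-based scan believes instructions begin. In the hypothetical fragment below, a stray data byte 21 (the LXI H operation code) sits between an MVI and a JMP. The scan treats the first two bytes of the JMP as an LXI operand. As a result the genuine JMP address is never fixed, and a bogus operand (10C3) might be. The scan then falls back into step at the HLT. The length table is the same assumption as before (documented opcodes only), and `scan_boundaries` is a hypothetical name.

```python
# Hypothetical demonstration of scan misalignment; documented 8080 opcodes only.
THREE_BYTE = ({0x01, 0x11, 0x21, 0x31, 0x22, 0x2A, 0x32, 0x3A, 0xC3, 0xCD}
              | {0xC2 + 8 * i for i in range(8)}
              | {0xC4 + 8 * i for i in range(8)})
TWO_BYTE = ({0x06 + 8 * i for i in range(8)}
            | {0xC6 + 8 * i for i in range(8)} | {0xD3, 0xDB})


def instr_len(op: int) -> int:
    return 3 if op in THREE_BYTE else 2 if op in TWO_BYTE else 1


def scan_boundaries(code: bytes) -> list:
    """Return the offsets at which a length-based scan thinks instructions start."""
    starts, pc = [], 0
    while pc < len(code):
        starts.append(pc)
        pc += instr_len(code[pc])
    return starts


if __name__ == "__main__":
    code = bytes([
        0x3E, 0x05,          # 0000 MVI A,05
        0x21,                # 0002 isolated data byte (looks like LXI H)
        0xC3, 0x10, 0x00,    # 0003 JMP 0010
        0x76,                # 0006 HLT
    ])
    print(scan_boundaries(code))   # [0, 2, 5, 6]; the true boundaries are 0, 2, 3, 6
```

If the stray byte were instead a 1-byte operation code such as 00, the scan would return the true boundaries, which is the lucky case described above.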
The second, and potentially more troublesome, bug concerns the relocatability of the code itself.
It may come as a surprise, but there is such a thing as nonrelocatable code.
To see that this is so, recall that the relocator program fixes references by operating only on the 2-byte hexadecimal constant following 3-byte operation codes.
Implicit in this procedure is the logical assumption that all references to program addresses will be made via 3-byte operation codes.
Certainly this is the easiest and most natural way to handle addresses.
However, it is possible to use the 1- and 2-byte operation codes to manipulate addresses.
As a case study of this particular "buggy" (and bugging) practice, I submit in listing 3 a verbatim example taken from the listing of an assembler program.
Hexadecimal Instruction
Address Code Mnemonic Operand Commentary
BDC4 3E F0 MVI A,ABUFF and 0FFH ;LOAD LOW BUFFER ADDRESS
BDC6 80 ADD B ;ADD LENGTH OF OP CODE
BDC7 5F MOV E,A
BDC8 3E D4 MVI A,(ABUFF and 0FF00H)/256
BDCA CE 00 ACI 0 ;GET HIGH ORDER ADDRESS
BDCC 57 MOV D,A
BDCD 1A LDAX D ;FETCH CHARACTER AFTER OP CODE
Listing 3: An example of poor programming practice.
In this example, the programmer has loaded the DE register pair without disturbing the HL register pair.
However, because the reference to the address hexadecimal D4F0 is made via 1- and 2-byte op codes, this code is not machine relocatable.
In this example the programmer needed to load the address of the character following an operation code into the DE register pair without disturbing the HL register pair.
Without a detailed knowledge of other program constraints, it is difficult to specify a foolproof fix for this code.
Assuming no stack problems, appropriate substitutions are suggested in listing 4.
Hexadecimal Instruction
Address Code Mnemonic Operand Commentary
BDC4 11 F0 D4 LXI D,D4F0 ;LOAD BUFFER ADDRESS
BDC7 E5 PUSH H ;SAVE H,L PAIR
BDC8 68 MOV L,B ;GET LENGTH OP CODE
BDC9 26 00 MVI H,00 ;PAD WITH ZEROS
BDCB 19 DAD D ;ADD LENGTH TO BUFFER
BDCC EB XCHG ;PUT RESULTS IN D,E
BDCD E1 POP H ;GET H,L BACK
BDCE 1A LDAX D ;FETCH CHARACTER AFTER OP CODE
Listing 4: Another method of performing the operation shown in listing 3.
Here the reference to the address hexadecimal D4F0 is made using the LXI op code.
This code is machine relocatable.
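The difference between listings 3 and 4 can be demonstrated by applying a relocator-style fix to both fragments and computing the address each one would then form in DE. The fix adds the displacement to 2-byte operands only. The sketch hard-codes the two encodings from the listings. The program block (B000-DFFF), the displacement of 0100, and B = 3 are arbitrary, hypothetical choices. The address arithmetic is modeled by hand rather than by an 8080 emulator.

```python
# Hypothetical helpers modeling listings 3 and 4; not an 8080 emulator.
THREE_BYTE = ({0x01, 0x11, 0x21, 0x31, 0x22, 0x2A, 0x32, 0x3A, 0xC3, 0xCD}
              | {0xC2 + 8 * i for i in range(8)}
              | {0xC4 + 8 * i for i in range(8)})
TWO_BYTE = ({0x06 + 8 * i for i in range(8)}
            | {0xC6 + 8 * i for i in range(8)} | {0xD3, 0xDB})

LISTING3 = bytes([0x3E, 0xF0, 0x80, 0x5F, 0x3E, 0xD4, 0xCE, 0x00, 0x57, 0x1A])
LISTING4 = bytes([0x11, 0xF0, 0xD4, 0xE5, 0x68, 0x26, 0x00, 0x19, 0xEB, 0xE1, 0x1A])


def instr_len(op: int) -> int:
    return 3 if op in THREE_BYTE else 2 if op in TWO_BYTE else 1


def relocate(code: bytes, a: int, b: int, displacement: int) -> bytes:
    """Fix 2-byte operands that lie in a..b, as a machine-code relocator does."""
    out, pc = bytearray(code), 0
    while pc < len(out):
        if instr_len(out[pc]) == 3:
            operand = out[pc + 1] | (out[pc + 2] << 8)
            if a <= operand <= b:
                operand = (operand + displacement) & 0xFFFF
                out[pc + 1], out[pc + 2] = operand & 0xFF, operand >> 8
        pc += instr_len(out[pc])
    return bytes(out)


def de_listing3(code: bytes, b_reg: int) -> int:
    """Address left in DE by listing 3 (MVI A/ADD B, then MVI A/ACI 0)."""
    low = code[1] + b_reg                   # low buffer byte + op code length
    high = (code[5] + (low >> 8)) & 0xFF    # high buffer byte + carry
    return (high << 8) | (low & 0xFF)


def de_listing4(code: bytes, b_reg: int) -> int:
    """Address left in DE by listing 4 (LXI D, then DAD D with HL = B)."""
    buffer = code[1] | (code[2] << 8)
    return (buffer + b_reg) & 0xFFFF


if __name__ == "__main__":
    # Hypothetical relocation: block B000-DFFF moved up by 0100; B = 3.
    for name, code, de in (("listing 3", LISTING3, de_listing3),
                           ("listing 4", LISTING4, de_listing4)):
        moved = relocate(code, 0xB000, 0xDFFF, 0x0100)
        print(f"{name}: DE = {de(moved, 3):04X}")
    # listing 3: D4F3 (still the old buffer); listing 4: D5F3 (buffer now at D5F0)
```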
Those wishing to write relocatable code will avoid use of the programming practice illustrated in listing 3.
(This is not an onerous requirement, since code that violates this convention tends to be tortured and unnatural.)
Those who, for proprietary or other reasons, wish to write nonrelocatable code will liberally sprinkle their code with such examples.
What To Do
At this point it is reasonable to ask what can be done if you encounter one of the exotic bugs I have discussed.
Unfortunately, there is no quick fix that is generally applicable.
However, the following guidelines and suggestions may be helpful.
First, the source of the bug needs to be isolated to an area smaller than the whole program.
To do this, study the actual operation of the program.
For which commands or functions does the program fail?
After this bit of detective work, examination of the command or function table of your listing will tell you where to begin looking for the bug(s).
If nothing turns up at this point, the bug may be in a subroutine called by the command (function) routine.
Even worse, it may be in a subroutine called by the subroutine, etc.
Finally, if all else fails, it will be necessary to perform a step-by-step trace of the operation of the program.
At best this is a tedious process.
If, however, you have isolated the bug, it is possible to set up a breakpoint that is activated only upon entry to the program segment that is suspect.
(A breakpoint works by causing program control to pass to the monitor when the breakpoint is encountered.
Before the breakpoint is activated, program execution is performed at normal machine speed.)
With the Dynamic Debugging Tool program of the CP/M system distributed by IMSAI, a single breakpoint can be set by temporarily replacing a byte of the suspect software with the RST 07 instruction (FF in machine code).
After the monitor has control, you can use it to generate a detailed trace of the program's operation for the suspect area.
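The breakpoint mechanism amounts to saving one byte, overwriting it with FF, and restoring it once the monitor has control. The replaced byte must be the first byte of an instruction, or the processor will never execute it as an opcode. The sketch below models only that bookkeeping on a memory image. `set_breakpoint` and `clear_breakpoint` are hypothetical names. The note that RST 7 transfers control to address 0038 (hexadecimal) reflects the standard 8080 restart vectors (RST n calls address 8*n), not anything stated in the article.

```python
# Hypothetical model of a single RST 7 breakpoint on a memory image.
RST7 = 0xFF            # RST 7 opcode; on the 8080 it calls address 0038H
RST7_VECTOR = 7 * 8    # restart vectors are at 8*n


def set_breakpoint(mem: bytearray, address: int) -> int:
    """Replace the opcode at address with RST 7 and return the original byte."""
    saved = mem[address]
    mem[address] = RST7
    return saved


def clear_breakpoint(mem: bytearray, address: int, saved: int) -> None:
    """Put the original byte back once the monitor has regained control."""
    mem[address] = saved


if __name__ == "__main__":
    mem = bytearray(0x10000)
    mem[0x24AB] = 0x11                        # first byte of the suspect code
    saved = set_breakpoint(mem, 0x24AB)
    print(f"{mem[0x24AB]:02X}, vector {RST7_VECTOR:04X}")   # FF, vector 0038
    clear_breakpoint(mem, 0x24AB, saved)
    print(f"{mem[0x24AB]:02X}")                              # 11
```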
After this recounting of the perils of relocating systems software, I hope that the reader is not totally discouraged.
For well-designed software, relocation can be easily managed using the relocator and FIXLXI programs.
About the Author
John Lipham's first contact with programming was as a graduate student in physics at the University of North Carolina at Chapel Hill.
There he discovered PL/1 and assembly language for the IBM 360.
Recently he has been working with a colleague on a project using an IMSAI microcomputer system.
The goal of the project is to develop software to aid in teaching physics, and to interface the system to scientific instruments for research purposes.
Scanned by Werner Cirsovius, September 2002
© BYTE Publications Inc.