The following article was printed in the May 1979 issue of BYTE magazine.

A very good article on handling asynchronous events within a "polled" system such as CP/M. The technique can also be used in modem-based programs.

W D Maurer
University Library Bldg
Room 634
George Washington University
Washington DC 20052

Simultaneous Input and Output for Your 8080

The process of I/O (input/output) in assembly language on a typical microcomputer system is rather crude. You input the status register and perform a logical AND with a mask consisting of one bit. If the result is not zero, you know the bit was on and the I/O device was therefore ready. In that case, you either input or output the data register, as appropriate. Otherwise, you loop back to input the status register again. On the 8080, it goes like this:

           Input

ILOOP:  IN      ISTAT
        ANI     IREADY
        JZ      ILOOP
        IN      IDATA

           Output

OLOOP:  IN      OSTAT
        ANI     OREADY
        JZ      OLOOP
        OUT     ODATA

where the quantities ISTAT, IDATA, OSTAT, ODATA, IREADY, and OREADY are what is called, in the world of big computers, "installation-dependent" (that is, they differ from one person's 8080 to another). The first four of these might be given by:

        ISTAT   EQU     3
        IDATA   EQU     2
        OSTAT   EQU     3
        ODATA   EQU     2

describing a single channel for both input and output involving two ports, with port numbers 3 and 2. The other two might be given as:

        IREADY  EQU     1
        OREADY  EQU     2

to denote that the rightmost bit of the status register is the input-ready flag and the second bit from the right in this register is the output-ready flag. (Your dealer must supply you with these values, or show you how to find what they are, when you buy your system.) You can also make these into subroutines by adding a return as follows:

INPUT:  IN      ISTAT
        ANI     IREADY
        JZ      INPUT
        IN      IDATA
        RET

OUTPUT: IN      OSTAT
        ANI     OREADY
        JZ      OUTPUT
        OUT     ODATA
        RET

This allows you to CALL INPUT to bring a newly input character into register A, or to CALL OUTPUT whenever you have a new character in register A that you want to put out.
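To make the cost concrete, here is a rough Python model of the ILOOP pattern above. It is a sketch, not the author's code: SimulatedPort, its delay parameter, and blocking_input are hypothetical names, and IREADY uses the example mask value given earlier. The spin counter counts the trips around the loop in which the processor does nothing useful.

```python
IREADY = 0b01   # example mask from the text: rightmost status bit = input ready

class SimulatedPort:
    """Hypothetical input device: the ready bit stays off for `delay` status
    reads, then comes on with `data` waiting in the data register."""
    def __init__(self, delay: int, data: int):
        self.delay, self.data = delay, data

    def status(self) -> int:               # IN ISTAT
        if self.delay > 0:
            self.delay -= 1
            return 0b00                    # ready bit off
        return IREADY                      # ready bit on

def blocking_input(port: SimulatedPort) -> tuple[int, int]:
    """Model of the ILOOP pattern. Returns (character, wasted status reads)."""
    spins = 0
    while (port.status() & IREADY) == 0:   # ANI IREADY / JZ ILOOP
        spins += 1
    return port.data, spins                # IN IDATA

ch, wasted = blocking_input(SimulatedPort(delay=5000, data=ord("A")))
print(chr(ch), wasted)                     # A 5000
```

Every one of those 5000 status reads is time the program could have spent on something else.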

The trouble with this kind of I/O is that it is not simultaneous. When you are doing input, that is all you are doing; when you are doing output, that is all you are doing. Meanwhile, your system is sitting uselessly in a loop, which it may go around several thousand times, or sometimes (particularly in the case of input) several million times. What you need in order to increase the efficiency of your system, if you have 190 bytes of read only memory and 65 bytes of programmable memory to spare, is a simultaneous I/O package which allows you to do input, processing, and output, all at the same time.

The basic idea of simultaneous I/O is that of the queue. Any queue can be considered by analogy to a waiting line for a bus. (The story, told to this author second or third hand, is that in England people line up for buses in lines that look like spirals or, more informally, like the tail of a pig - a shape that is in turn called "queue" in French, presumably because it looks vaguely like the letter Q.) Consider the characters waiting for the bus as ASCII characters, rather than as local town characters, and consider the bus not as a bus in the technical sense, but (for output) as the actual output device - the teletypewriter, video display terminal, Selectric terminal, or whatever. When your routine wants to output a character, this character goes on the end of the queue. It then has to wait for a while until the characters in front of it, which were entered earlier, get on the bus - that is, until they are actually output - before it can be output.

The analogy with the bus is not a perfect one, because a real bus, when it comes along, takes everybody waiting for it all at once. A waiting line in a supermarket at the checkout counter would be a better analogy, because characters, like shoppers, leave the queue one at a time, as well as entering it one at a time.

For input, there is another queue, but this time the input device feeds new characters onto the end of the waiting line, and they come off the front - that is, board the bus - when they are actually used by the program which is asking for input. Several characters might be typed before they are actually used by the program, presumably because it is doing something else, such as a long computation. For output, the use of the queue is more common, because programs typically produce output characters much faster than they can actually be put out; these characters enter the queue and are then output from it, one at a time, while the computer goes on to whatever it has to do next.

Before we discuss how a queue like this is actually implemented, let us digress a bit and answer one fundamental question: how are we to handle three programs going simultaneously - an input program, an output program, and something else which is reading input and writing output? There are two ways, one being the use of interrupts, the other making use of a technique called polling. We shall use polling, mainly because it does not require any special hardware (not all 8080 systems have a priority interrupt control unit) and also allows the user who might not have written his own monitor to use simultaneous I/O without interfering with any interrupt conventions which his monitor might have established.

Polling, in this case, assumes that the functions of watching the input device and the output device to see if they are ready, and taking appropriate action when they are ready, are subroutines of the user's program. We shall call them IPOLL and OPOLL. They are not to be confused with the ordinary I/O subroutines which supply input to the user's program and accept output from it; we shall call these IP and OP. To summarize the functions of our four routines:

IP is called when the user's program wants an input character, and IP returns with that character in register A.

IP:	PUSH	H	; SAVE HL REGISTER
	LHLD	FIQ	; FRONT OF INPUT Q TO HL
	LDA	EIQ	; END OF INPUT Q (LO) TO A
	CMP	L	; COMPARE FIQ(LO):EIQ(LO)
	JNZ	IP3	; IF UNEQUAL, Q NONEMPTY
IP2:	CALL	OPOLL	; Q EMPTY. TIGHT LOOP
	CALL	IPOLL	; (KEEP POLLING I AND O)
	JNC	IP2	; (UNTIL IN CHAR. RECEIVED)
IP3:	MOV	A,M	; FIRST IN Q CHAR. TO A
	PUSH	PSW	; SAVE THIS CHARACTER
	INX	H	; UPDATE FRONT OF INPUT Q
	MVI	A,TIQ	; WRAPAROUND TEST (COMPARE
	CMP	L	;  FIQ(LO) AND TOP OF IN Q
	JNZ	IP4	;  (LO) --  IF =, RESET TO
	MVI	L,BIQ	;  BOTTOM OF IN Q (LO)
IP4:	SHLD	FIQ	;  PUT FIQ BACK IN MEMORY
	POP	PSW	; RESTORE INPUT CHARACTER
	POP	H	; RESTORE HL REGISTER
	RET		; OUT OF THIS ROUTINE

Listing 1: Subroutine IP, written in 8080 assembler language and called when the user's program wants an input character. IP returns that character in the A register.
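For readers more at home in a high-level language, the control flow of listing 1 can be modeled roughly in Python. This is a sketch, not the author's code: the Queue class, the callables passed in for IPOLL and OPOLL, and the fake keyboard are assumptions made for illustration. Only the low byte of each pointer is modeled, on the assumption (made later in the article) that each queue lies within one 256-byte page, so register H never changes.

```python
class Queue:
    """One wraparound queue inside a single page; only low bytes are modeled
    (assumed: bottom + length <= 255, so H never changes)."""
    def __init__(self, bottom: int, length: int):
        self.mem = bytearray(256)            # the page holding the queue
        self.length = length                 # LIQ
        self.bottom = bottom                 # BIQ (low byte)
        self.top = bottom + length           # TIQ: one past the last slot
        self.front = self.end = bottom       # listing 6: front = end = bottom

    def advance(self, p: int) -> int:
        p += 1                               # INX H
        return self.bottom if p == self.top else p   # CMP L / MVI L,BIQ

def ip(q: Queue, ipoll, opoll) -> int:
    """Model of listing 1. ipoll() returns True when it queued a character
    (the carry flag); IP ignores what opoll() returns."""
    if q.front == q.end:                     # LDA EIQ / CMP L: queue empty
        while True:                          # IP2: tight loop
            opoll()
            if ipoll():                      # JNC IP2
                break
    ch = q.mem[q.front]                      # IP3: MOV A,M
    q.front = q.advance(q.front)             # INX H, wraparound, SHLD FIQ
    return ch

# Usage with a hypothetical keyboard that delivers one key per IPOLL call.
q = Queue(bottom=0x10, length=4)
pending = list(b"OK")

def fake_ipoll() -> bool:
    if not pending:
        return False
    q.mem[q.end] = pending.pop(0)            # place the key on the end of the queue
    q.end = q.advance(q.end)
    return True

print(chr(ip(q, fake_ipoll, lambda: None)))  # O
```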

OP is called when the user's program has a character to be output, and this character must be in register A when OP is called.

OP:	PUSH	PSW	; SAVE A-REGISTER
	PUSH	H	; SAVE  HL-REGISTER
	LHLD	EOQ	; END OF OUTPUT Q
	MOV	M,A	; PUT CHAR. ON END OF Q
	INX	H	; UPDATE END OF OUTPUT Q
	MVI	A,TOQ	; WRAPAROUND TEST (COMPARE
	CMP	L	;  EOQ(LO) AND TOP OF OUT Q
	JNZ	OP2	;  (LO) -- IF =, RESET TO
	MVI	L,BOQ	;  BOTTOM OF OUT Q (LO)
OP2:	LDA	FOQ	; FRONT OF OUTPUT Q (LO)
	CMP	L	;  TO A -- IF = EOQ (LO)
	JNZ	OP4	;  AFTER INCR., Q FULL
OP3:	CALL	IPOLL	; Q FULL. TIGHT LOOP
	CALL	OPOLL	; (KEEP POLLING I AND O)
	JNC	OP3	; (UNTIL SMALLER OUT Q)
OP4:	SHLD	EOQ	; PUT EOQ BACK IN MEMORY
	CALL	OPOLL	; MAKE SURE OPOLL AND IPOLL
	CALL	IPOLL	;  ARE CALLED AT LEAST ONCE
	POP	H	; RESTORE HL-REGISTER
	POP	PSW	; RESTORE A-REGISTER
	RET		; OUT OF THIS ROUTINE

Listing 2: Subroutine OP, called when the user's program has a character to be output. This character must be in the A register when OP is called.
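A similar rough Python model of listing 2 follows, under the same assumptions (a hypothetical Queue class, low bytes only, IPOLL and OPOLL passed in as callables). The point to notice is that the advanced end pointer is compared with the front before it is stored, so when the queue would become full, OP waits for OPOLL to send a character first. The fake printer, ready only on every third poll, is made up for the demonstration.

```python
class Queue:
    """One wraparound queue inside a single page; only low bytes are modeled."""
    def __init__(self, bottom: int, length: int):
        self.mem, self.length = bytearray(256), length
        self.bottom, self.top = bottom, bottom + length   # BOQ, TOQ
        self.front = self.end = bottom                    # listing 6

    def advance(self, p: int) -> int:
        p += 1                                            # INX H
        return self.bottom if p == self.top else p        # wraparound test

def op(q: Queue, ch: int, ipoll, opoll) -> None:
    """Model of listing 2. opoll() returns True when it sent a queued
    character (the carry flag)."""
    q.mem[q.end] = ch                       # MOV M,A: the slot at EOQ is free
    new_end = q.advance(q.end)              # INX H + wraparound test
    if new_end == q.front:                  # OP2: full (storing now looks empty)
        while True:                         # OP3: wait for the printer
            ipoll()
            if opoll():                     # JNC OP3
                break
    q.end = new_end                         # OP4: SHLD EOQ
    opoll()                                 # poll both at least once
    ipoll()

# Usage with a hypothetical printer that is ready on every third poll.
q = Queue(bottom=0x40, length=3)            # holds at most 2 characters
printed, calls = [], [0]

def fake_opoll() -> bool:
    calls[0] += 1
    if calls[0] % 3 or q.front == q.end:    # busy, or nothing to send
        return False
    printed.append(q.mem[q.front])
    q.front = q.advance(q.front)
    return True

for c in b"HEY":
    op(q, c, lambda: None, fake_opoll)
print(bytes(printed))                       # b'H' (E and Y still queued)
while q.front != q.end:                     # the main program keeps polling
    fake_opoll()
print(bytes(printed))                       # b'HEY'
```

The third call to op is the one that finds the queue full; it returns only after the printer has taken the H.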

IPOLL is called every so often (in a sense to be described more precisely below) to check whether the user has keyed in a new character that has to be placed on the end of the input queue.

IPOLL:	IN	ISTAT	; GET STATUS BITS (IN)
	ANI	IREADY	; READY BIT ZERO MEANS
	RZ		;  NOTHING TYPED - OUT
	PUSH	H	; SOMETHING TYPED - SAVE
	IN	IDATA	;  HL REG. AND INPUT IT
	LHLD	EIQ	; END OF INPUT Q TO HL
	MOV	M,A	; PUT CHAR. ON END OF Q
	INX	H	; UPDATE END OF INPUT Q
	MVI	A,TIQ	; WRAPAROUND TEST (COMPARE
	CMP	L	;  EIQ(LO) AND TOP OF IN Q
	JNZ	IPOLL2	;  (LO) -- IF  =, RESET TO
	MVI	L,BIQ	;  BOTTOM OF IN Q (LO)
IPOLL2:	LDA	FIQ	; FRONT OF INPUT Q (LO)
	SUB	L	;  TO A -- IF = EIQ (LO)
	JZ	IPOLL3	;  AFTER INCR., Q FULL
	SHLD	EIQ	; NOT FULL. RESTORE EIQ
IPOLL3:	JNC	IPOLL4	; IF FIQ-EIQ IS NEGATIVE,
	ADI	LIQ	;  ADD SIZE OF INPUT Q
IPOLL4:	CPI	IFUDGE	; TEST IN Q WITHIN FUDGE
	JNC	IPOLL7	;  FACTOR (7) OF BEING
	LXI	H,IAC	;  FULL. IF SO, BUMP INPUT
	INR	M	;  ALARM COUNTER BY 1
IPOLL7:	POP	H	; RESTORE HL REGISTER
	STC		; SET CARRY (CHAR. THERE)
	RET		; OUT OF THIS ROUTINE

Listing 3: Subroutine IPOLL, called periodically to check whether the user has keyed in a new character that has to be placed at the end of the input queue.
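Listing 3 can be modeled the same way. In this hypothetical sketch, key() stands for reading the status and data ports, state["iac"] stands for IAC, and the SUB L / ADI LIQ pair becomes an ordinary subtraction with a correction for wraparound. The demonstration types ten characters into a ten-byte queue without reading any of them.

```python
IFUDGE = 7                                  # listing 5's fudge factor

class Queue:
    """One wraparound queue inside a single page; only low bytes are modeled."""
    def __init__(self, bottom: int, length: int):
        self.mem, self.length = bytearray(256), length    # LIQ
        self.bottom, self.top = bottom, bottom + length   # BIQ, TIQ
        self.front = self.end = bottom                    # listing 6

    def advance(self, p: int) -> int:
        p += 1                                            # INX H
        return self.bottom if p == self.top else p        # wraparound test

def ipoll(q: Queue, key, state: dict) -> bool:
    """Model of listing 3. key() returns a typed byte, or None when the ready
    bit is off. Returns True (carry set) whenever a byte was read, even if the
    queue was full and the byte had to be dropped."""
    ch = key()
    if ch is None:
        return False                        # RZ: nothing typed
    q.mem[q.end] = ch                       # MOV M,A
    new_end = q.advance(q.end)              # INX H + wraparound test
    gap = q.front - new_end                 # LDA FIQ / SUB L
    if gap != 0:
        q.end = new_end                     # not full: SHLD EIQ
    if gap < 0:
        gap += q.length                     # ADI LIQ
    if gap < IFUDGE:                        # CPI IFUDGE / JNC IPOLL7
        state["iac"] += 1                   # INR M: bump input alarm counter
    return True                             # STC

# Usage: ten keystrokes arrive while the program reads none of them.
q = Queue(bottom=0x80, length=10)           # holds at most 9 characters
state = {"iac": 0}
typed = iter(b"ABCDEFGHIJ")
for _ in range(10):
    ipoll(q, lambda: next(typed), state)
print(state["iac"], q.end - q.front)        # 7 9
```

The tenth character is dropped because the queue holds at most nine, and the alarm counter was bumped for each of the last seven keystrokes.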

OPOLL is called every so often to check whether the output device has completed its processing of the previous character to be output; if it has, the next one is sent out.

OPOLL:	IN	OSTAT	; GET STATUS BITS (OUT)
	ANI	OREADY	; READY BIT ZERO MEANS
	RZ		;  PORT STILL BUSY - OUT
	LDA	IAC	; GET INPUT ALARM COUNTER
	DCR	A	; AND DECREASE IT BY 1
	JM	OPOLL1	; IF WAS ZERO, NO ALARM
	STA	IAC	; STORE DECREASED VALUE
	MVI	A,CTRLG	; CONTROL-G (BELL) TO A
	OUT	ODATA	; OUTPUT (TYPING TOO FAST)
	RET		;  ALARM AND EXIT
OPOLL1:	PUSH	H	; SAVE HL REGISTER
	LHLD	FOQ	; FRONT OF OUTPUT Q TO HL
	LDA	EOQ	; END OF OUT Q (LO) TO A
	CMP	L	; COMPARE FOQ (LO):EOQ (LO)
	JZ	OPOLL7	; IF EQUAL, NOTHING IN Q
	MOV	A,M	; GET FIRST THING IN Q
	OUT	ODATA	;  AND PUT IT OUT
	INX	H	; UPDATE FRONT OF OUTPUT Q
	MVI	A,TOQ	; WRAPAROUND TEST (COMPARE
	CMP	L	;  FOQ (LO) AND TOP OF OUT
	JNZ	OPOLL5	;  Q(LO) -- IF  =, RESET TO
	MVI	L,BOQ	;  BOTTOM OF OUT Q (LO)
OPOLL5:	SHLD	FOQ	; PUT FOQ BACK IN MEMORY
	POP	H	; RESTORE HL REGISTER
	STC		; SET CARRY (WORK DONE)
	RET		; OUT OF THIS ROUTINE
OPOLL7:	POP	H	; Q EMPTY: RESTORE HL REG.
	RET		;  (CARRY CLEAR FROM CMP)

Listing 4: Subroutine OPOLL, called periodically to check whether the output device has completed its processing of the previous character to be output. If it has, the next character is sent out.
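Here is a rough Python model of listing 4 in the same hypothetical style: port_ready stands for the OREADY test, out collects what would be written to ODATA, and state["iac"] stands for IAC. The sketch makes two simplifying assumptions. The signed JM test on IAC is replaced by a plain "greater than zero" check, and an empty queue returns False (carry clear), as the text describes OPOLL's carry.

```python
CTRLG = 0x07                                # listing 5 (sometimes 87H)

class Queue:
    """One wraparound queue inside a single page; only low bytes are modeled."""
    def __init__(self, bottom: int, length: int):
        self.mem, self.length = bytearray(256), length
        self.bottom, self.top = bottom, bottom + length   # BOQ, TOQ
        self.front = self.end = bottom                    # listing 6

    def advance(self, p: int) -> int:
        p += 1                                            # INX H
        return self.bottom if p == self.top else p        # wraparound test

def opoll(q: Queue, port_ready: bool, out: list, state: dict) -> bool:
    """Model of listing 4. Returns True (carry set) only when a queued
    character was sent."""
    if not port_ready:
        return False                        # RZ: port still busy
    if state["iac"] > 0:                    # a pending alarm goes out first
        state["iac"] -= 1                   # DCR A / STA IAC
        out.append(CTRLG)                   # ring the bell immediately
        return False                        # carry clear
    if q.front == q.end:
        return False                        # output queue empty
    out.append(q.mem[q.front])              # MOV A,M / OUT ODATA
    q.front = q.advance(q.front)            # INX H, wraparound, SHLD FOQ
    return True                             # STC: work done

# Usage: two characters queued and one alarm pending.
q = Queue(bottom=0x20, length=4)
for c in b"ok":
    q.mem[q.end] = c
    q.end = q.advance(q.end)
state, out = {"iac": 1}, []
results = [opoll(q, True, out, state) for _ in range(4)]
print(results, bytes(out))                  # [False, True, True, False] b'\x07ok'
```

The bell is sent before anything already waiting in the output queue, which is why IAC is checked first.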

IPOLL and OPOLL are called both from IP and OP and from the user's program. When they are called from IP and OP, they employ an additional feature, not discussed above. IPOLL returns with the carry set if a new character was read from the input device (it is placed on the end of the input queue unless that queue is full), and clear otherwise. OPOLL returns with the carry set if a character was removed from the output queue and put out, and clear otherwise. This information is used by IP and OP, but it is not needed by the user program. In fact, for the user program, there is no need to distinguish between the functions of calling IPOLL and calling OPOLL. It is enough to have a single subroutine, POLL, whose only function is to call IPOLL and OPOLL and then return; the subroutine POLL can then be called by the user program.

How often must the user program call the subroutine POLL? The answer is that the user program must be so organized that there is never a significant amount of real time during which POLL is not called. (How to ensure this will be described below.) The reason, of course, is that if this is not so, we could have the bad luck to push an input key during such a period of real time, and then, since POLL was not called, that input character will never be placed on the input queue and will therefore never be seen by the user's program. (Remember Murphy's law: if anything can go wrong, it will.)

On output, the situation is not that bad, but if there were a significant amount of time during which POLL was not called, the output device would effectively be stopped during that period of time. If this were a recurrent phenomenon, you would see the output device starting and stopping in jerks, like a car that loses power.

The easiest way to call POLL often enough from the user's program is to call POLL once in every loop and at least once in every subroutine. (If there is a subroutine call instruction in a loop, we do not need to call POLL explicitly in that loop, since POLL will be called by the called subroutine.) Or, for a more explicitly stated method, call POLL just before every return instruction and at every labeled instruction to which there is a backward jump. (That is, if the label is ALPHA, then somewhere later in the program there must be a jump to ALPHA.) This ensures that POLL will be called often enough. [In a system with a real-time clock, calling POLL from the interrupt handler for the clock every few milliseconds will accomplish the same end.... CH]

We now discuss the way in which we implement a queue in memory, namely as a "wraparound array." We start with an array IQ (input queue) of characters, together with two 16-bit pointers, or variables whose values are addresses, called FIQ (front of input queue) and EIQ (end of input queue). Figure 1 shows a typical configuration of the input queue.

Figure 1: The "wrap-around" queue. The queue is a method for storing data in the form of a list: the first item into the list becomes the first out of the list, in the same manner as a waiting line of people at a supermarket checkout counter. Figure 1a shows the data for an input queue in memory with two pointers, FIQ (front of input queue) and EIQ (end of input queue). When an item is added to the end of the queue, EIQ is incremented by one. When an item is removed from the queue, FIQ is incremented by one. Note that the queue is "upside-down" here; that is, the end of the queue is on top. When the top of the array in memory is reached, EIQ is altered so it points to the bottom of the array, thus "wrapping" the queue around the array as in figure 1b. Notice also that pointer EIQ points to the location that is one beyond the end of the queue. This enables the program to detect an empty or full queue when EIQ = FIQ.

The shaded area shows the characters that are actually in the queue; the unshaded area shows the rest of the array in memory. To take a character off the front of the queue, assuming that FIQ is in register pair HL (which we can bring about by doing LHLD FIQ), we get the character to which FIQ points (by doing MOV A,M) and then increase FIQ by one (by doing INX H). To put a character on the end of the queue, assuming that EIQ is in the HL register pair (by means of LHLD EIQ), we move it to memory at the place where EIQ points (by doing MOV M,A - assuming that the new character is in the A register) and then increase EIQ by one (by doing INX H). Note that, in a sense, the queue is "upside-down" - the end of the queue is on top. If it were "right-side-up" we would have to decrease FIQ and EIQ by one in the above processes (by doing DCX H), rather than increasing them by one. Of course, after either decreasing or increasing, we must put FIQ (or, respectively, EIQ) back in memory (by doing SHLD FIQ or SHLD EIQ).

Of course, we cannot keep increasing FIQ and EIQ forever. Eventually, in figure 1a, EIQ will get to the top of the array in memory. When this happens, we alter it to point to the bottom of this array (this is the "wraparound" feature). After a while, the situation looks like figure 1b. Here again, the shaded area represents the characters actually in the queue. The first one is where FIQ points, the next one is right above that and so on up to the top of the array; then we start at the bottom of the array, and so on up to where EIQ points. We are treating the array as if it were cyclical, and, in fact, on big computers, this setup is often known as a "circular array" or a "ring buffer."
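Stripped of the 8080 details, the wraparound array is the familiar ring buffer. The minimal Python sketch below (RingBuffer is a hypothetical name) uses list indexes instead of addresses, with front pointing at the first item and end pointing one past the last, as in figure 1. Full and empty checks are deliberately left out.

```python
class RingBuffer:
    """Wraparound array in the style of figure 1: `front` indexes the first
    item and `end` indexes one past the last. No full/empty checks here."""
    def __init__(self, size: int):
        self.slots = [None] * size
        self.front = self.end = 0

    def _next(self, i: int) -> int:
        i += 1
        return 0 if i == len(self.slots) else i   # top of array -> bottom

    def put(self, item) -> None:
        self.slots[self.end] = item                # store where end points
        self.end = self._next(self.end)            # then advance end

    def take(self):
        item = self.slots[self.front]              # item where front points
        self.front = self._next(self.front)        # then advance front
        return item

rb = RingBuffer(4)
for ch in "abc":
    rb.put(ch)
print(rb.take(), rb.take())                        # a b
for ch in "de":
    rb.put(ch)                                     # "e" wraps to slot 0
print(rb.take(), rb.take(), rb.take(), rb.slots)   # c d e ['e', 'b', 'c', 'd']
```

After the wrap, "e" sits in the bottom slot of the array, yet take() still returns the characters in the order they were put in.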

We note that FIQ points to the first character in the queue, but EIQ does not point to the last character in the queue - it points to the position one beyond the last character. To see why this is so, suppose the queue has exactly one character in it. We do not want FIQ and EIQ to be the same, because we want that to happen only when the queue is empty - when there are no characters in it - or else when it is entirely full (since these are the two cases in which special action has to be taken). By adopting the convention illustrated in figure 1, both of these conditions can be sensed by testing for FIQ = EIQ. In the listings, the "full" test is made on the incremented end pointer before it is stored back in memory (OP then waits, while IPOLL drops the character), so FIQ = EIQ in memory always means an empty queue, and an array of n bytes holds at most n - 1 characters. Of course, the entire setup of figure 1 has to be duplicated for the output queue OQ and its two associated pointers FOQ and EOQ.
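The effect of this convention can be checked with a small Python simulation. It is a sketch assuming, as in listings 2 and 3, that the advanced end pointer is compared with the front before it is stored; the simulate function is hypothetical and counts how many of a run of insertions an n-slot array accepts when nothing is read.

```python
def simulate(size: int, pushes: int) -> tuple[int, int]:
    """Offer `pushes` items to an empty `size`-slot wraparound queue that is
    never read. As in OP and IPOLL, the advanced end pointer is compared with
    the front before it is stored. Returns (accepted, rejected)."""
    front = end = 0
    accepted = rejected = 0
    for _ in range(pushes):
        new_end = (end + 1) % size
        if new_end == front:       # storing it would make end == front
            rejected += 1          # (IPOLL drops the item; OP would wait)
        else:
            end = new_end          # store the advanced pointer
            accepted += 1
    return accepted, rejected

print(simulate(size=8, pushes=10))   # (7, 3)
```

An 8-slot array accepts only 7 characters: one slot is always left unused, so a full queue never looks like an empty one.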

Let us make the simplifying assumption that each queue is entirely within one 256-byte page (from hexadecimal addresses xx00 through xxFF for some hexadecimal value of xx). This means that we can compare register pair HL with the address of the top of a queue by simply comparing register L with the low-order eight bits of this address. On equality, we set register L only (register H does not change) to the low-order eight bits of the address of the bottom of the queue. Here the top and the bottom refer to the array in memory, and are distinct from the front and the end as discussed above.
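The low-byte comparison can be modeled with explicit 16-bit addresses. This is a hypothetical sketch (advance and fits_in_page are illustrative names). One detail is worth checking: since the pointer is incremented (INX H) before the comparison, the address one past the top of the queue must also lie in the same page, so the bottom's low byte plus the length must not exceed FFh.

```python
def advance(ptr: int, top_lo: int, bottom_lo: int) -> int:
    """Advance a 16-bit queue pointer as the listings do: INX H, compare only
    the low byte (register L) with the top, and on a match reset L alone."""
    ptr = (ptr + 1) & 0xFFFF                  # INX H (16-bit increment)
    if (ptr & 0xFF) == top_lo:                # MVI A,top / CMP L
        ptr = (ptr & 0xFF00) | bottom_lo      # MVI L,bottom (H untouched)
    return ptr

def fits_in_page(base: int, length: int) -> bool:
    """True if the queue and the address one past its top share base's page."""
    return (base & 0xFF) + length <= 0xFF

# Hypothetical queue at 2010h..2017h: bottom low byte 10h, top low byte 18h.
p = 0x2016
for _ in range(2):
    p = advance(p, top_lo=0x18, bottom_lo=0x10)
print(hex(p))                                  # 0x2010 (wrapped, H unchanged)
print(fits_in_page(0x2010, 8), fits_in_page(0x20F8, 8))   # True False
```

For example, a queue occupying 20F8h through 20FFh fails the check: its top low byte would be 100h, and even if an assembler truncated that to 00h, advance(0x20FF, 0x00, 0xF8) returns 0x21F8, because INX H carried into H and resetting L alone leaves the pointer in the wrong page.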

What happens when our queues get full? First of all, let us discuss how big we want the queues to be. The two queues and the four addresses FIQ, EIQ, FOQ, and EOQ must of course be in programmable memory, while the four routines IP, OP, IPOLL, and OPOLL can be in read only memory. So to a certain extent it depends on how much programmable memory is available in your system. An input queue of n characters allows you to type n characters ahead of where the program is at any given moment; an output queue of n characters allows your program to put out n more characters than have actually been output yet by the output device at any given moment. While the device is outputting these n characters, your system can be doing something else simultaneously. There is no reason for the input and the output queues to be the same size, and in a typical application you might be using 10 characters in the input queue and 55 characters in the output queue. A bit of experimentation here will satisfy you as to what is comfortable for your application.

When the output queue gets full, it means that the capacity of the queue for temporarily saving output characters has been used up. In that case we simply go back to what we used to do before we had simultaneous I/O - that is, wait for a character to be actually put out before we do anything else. Whenever the user's program puts a new character into the output queue, we perform our incrementation, as discussed above, and then check to see if the output queue is full (FOQ = EOQ). In that case, we go into a loop, calling IPOLL and OPOLL until OPOLL returns with the carry set. This indicates that OPOLL sensed output ready and put out a character - an operation that reduces the size of the output queue. The result is that, when we enter the output routine OP, the output queue will never be full, and, if FOQ = EOQ, we know that the output queue is not full but empty.

When the input queue becomes full, we are typing too fast. Any further characters which we type will not be read by the user's program. The only thing we can do in this case is to give the user a warning that this has happened, so that he will retype the characters involved. Fortunately we can do this easily, with most output devices, by putting out a control-G (hexadecimal 07, or on some output devices 87) which will either ring a bell or put out a high-pitched beep. A variation on this system, which we use, involves putting out the control-G when the input queue is almost full (let us say seven or fewer spaces remaining) so that the last few characters do not have to be retyped; the user simply stops typing for a while and waits for a decent interval.

A minor technical point: We cannot sound the bell simply by calling OP. Recall that calling OP simply puts a character on the output queue; it may be a second or longer before that character is actually put out. When we type a character that has to be retyped, however, we need an immediate indication of this fact. We therefore use a single-byte input alarm counter IAC which is normally zero. To specify a bell, as above, we simply increment IAC by one, and then OPOLL checks IAC before it does anything else (if the output device is ready) and outputs a bell if IAC does not equal 0, decrementing IAC by one as it does so.

The complete code for IP, OP, IPOLL, and OPOLL is given in listings 1 through 4, with the data definitions given in listing 5 and the initialization given in listing 6.

FIQ:	DS	2		; FRONT OF INPUT Q (2 BYTES)
EIQ:	DS	2		; END OF INPUT Q (2 BYTES)
FOQ:	DS	2		; FRONT OF OUTPUT Q (2 BYTES)
EOQ:	DS	2		; END OF OUTPUT Q (2 BYTES)
IAC:	DS	1		; INPUT ALARM COUNTER (1 BYTE)
LIQ	EQU	36		; LENGTH OF INPUT Q
LOQ	EQU	36		; LENGTH OF OUTPUT Q
IQ:	DS	LIQ		; INPUT Q (SINGLE PAGE)
OQ:	DS	LOQ		; OUTPUT Q (SINGLE PAGE)
BIQ	EQU	IQ MOD 256	; BOTTOM OF INPUT Q (LO)
BOQ	EQU	OQ MOD 256	; BOTTOM OF OUTPUT Q (LO)
TIQ	EQU	BIQ+LIQ		; TOP OF INPUT Q (LO)
TOQ	EQU	BOQ+LOQ		; TOP OF OUTPUT Q (LO)
ISTAT	EQU	3		; INPUT STATUS PORT
OSTAT	EQU	3		; OUTPUT STATUS PORT
IDATA	EQU	2		; INPUT DATA PORT
ODATA	EQU	2		; OUTPUT DATA PORT
IREADY	EQU	2		; MASK FOR INPUT READY
OREADY	EQU	1		; MASK FOR OUTPUT READY
CTRLG	EQU	7		; CONTROL-G (SOMETIMES 87H)
IFUDGE	EQU	7		; INPUT FUDGE FACTOR

Listing 5: Suggested data definitions.

INIT:	LXI	H,IQ	; BOTTOM OF INPUT Q IS
	SHLD	FIQ	;  INITIAL VALUE OF FRONT
	SHLD	EIQ	;  AND END OF INPUT Q
	LXI	H,OQ	; BOTTOM OF OUTPUT Q IS
	SHLD	FOQ	;  INITIAL VALUE OF FRONT
	SHLD	EOQ	;  AND END OF OUTPUT Q
	XRA	A	; ZERO IS INITIAL VALUE
	STA	IAC	;  OF INPUT ALARM COUNTER

Listing 6: Initialization of the system.

To summarize the steps needed in order to use the system:

1. Include in your program (kept in either read only memory or programmable memory) the subroutines given in listings 1, 2, 3, and 4.
2. Include as part of the initialization of your main program the initialization steps given in listing 6.
3. Include as part of your data (kept in programmable memory) the data definitions of listing 5.
4. In your program, whenever you need an input character, write CALL IP to put a new character into the A register; whenever you have a character to put out, put it in the A register and then CALL OP.
5. Have a subroutine POLL in your program, as follows:
```
POLL:	PUSH	PSW
	CALL	IPOLL
	CALL	OPOLL
	POP	PSW
	RET
```
and have your program call POLL once in each loop and just before each subroutine return.
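To see all the pieces working together, here is a rough end-to-end Python simulation. It is a sketch under stated assumptions, not the article's code: the device behavior (a key every 50 status reads, a printer busy for 30 reads after each byte), the queue sizes of 16, and the names SimIO and Queue are all hypothetical. The tight loops in IP and OP are written as while loops on the queue pointers, which has the same effect as testing the carry. The user program reads a line, does a "long computation" that calls POLL, and echoes each character in upper case.

```python
CTRLG, IFUDGE = 0x07, 7                     # values from listing 5

class Queue:                                # one wraparound queue, low bytes only
    def __init__(self, bottom: int, length: int):
        self.mem, self.length = bytearray(256), length
        self.bottom, self.top = bottom, bottom + length
        self.front = self.end = bottom      # listing 6

    def advance(self, p: int) -> int:       # INX H + wraparound test
        p += 1
        return self.bottom if p == self.top else p

class SimIO:
    """Listings 1-4 plus POLL. Hypothetical devices: a key arrives every `every`
    input status reads; the printer is busy for `busy` reads after each byte."""
    def __init__(self, keys: bytes, every: int, busy: int):
        self.keys, self.every, self.busy = list(keys), every, busy
        self.reads, self.wait, self.iac, self.printed = 0, 0, 0, bytearray()
        self.iq, self.oq = Queue(0x00, 16), Queue(0x40, 16)

    def key(self):                          # IN ISTAT / ANI IREADY / IN IDATA
        self.reads += 1
        if self.keys and self.reads % self.every == 0:
            return self.keys.pop(0)
        return None

    def printer_ready(self) -> bool:        # IN OSTAT / ANI OREADY
        if self.wait:
            self.wait -= 1
            return False
        return True

    def emit(self, ch: int) -> None:        # OUT ODATA
        self.printed.append(ch)
        self.wait = self.busy

    def ipoll(self) -> bool:
        ch = self.key()
        if ch is None:
            return False
        q = self.iq
        q.mem[q.end] = ch
        new_end = q.advance(q.end)
        gap = (q.front - new_end) % q.length    # SUB L / ADI LIQ
        if gap:
            q.end = new_end                 # not full: keep the character
        if gap < IFUDGE:
            self.iac += 1                   # (almost) full: request a bell
        return True

    def opoll(self) -> bool:
        if not self.printer_ready():
            return False
        if self.iac:                        # a pending bell goes out first
            self.iac -= 1
            self.emit(CTRLG)
            return False
        q = self.oq
        if q.front == q.end:
            return False
        self.emit(q.mem[q.front])
        q.front = q.advance(q.front)
        return True

    def poll(self) -> None:
        self.ipoll()
        self.opoll()

    def ip(self) -> int:
        q = self.iq
        while q.front == q.end:             # same effect as IP's tight loop
            self.opoll()
            self.ipoll()
        ch = q.mem[q.front]
        q.front = q.advance(q.front)
        return ch

    def op(self, ch: int) -> None:
        q = self.oq
        q.mem[q.end] = ch
        new_end = q.advance(q.end)
        while new_end == q.front:           # same effect as OP's tight loop
            self.ipoll()
            self.opoll()
        q.end = new_end
        self.opoll()
        self.ipoll()

# User program: read a character, "compute" while calling POLL, echo it.
sim = SimIO(b"hello\r", every=50, busy=30)
ch = 0
while ch != ord("\r"):
    ch = sim.ip()
    for _ in range(200):
        sim.poll()                          # POLL inside the long loop
    sim.op(ord(chr(ch).upper()))
while sim.oq.front != sim.oq.end:           # let the printer catch up
    sim.poll()
print(bytes(sim.printed))                   # b'HELLO\r'
```

Typing faster than the program reads (for instance, a key on every status read with nothing consumed) fills the input queue, bumps the alarm counter, and sends control-G characters ahead of any queued output.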

As soon as you have gotten this much working, it will be possible for you to tinker with this system a bit further. Some suggested ways of doing this are as follows:

The sizes of the input and output queues can be altered. Make sure to change the data definitions of listing 5 as a whole (each queue must still lie within one page), to ensure that all the routines of listings 1, 2, 3, and 4 operate on the same version of the data structure.
There is a section of code in IP that almost duplicates a similar section of code in IPOLL. With a little ingenuity, this can be made into a subroutine called by both IP and IPOLL. (Hint: the first instruction is INX H, and JNZ can be replaced by RNZ.) The same thing happens with OP and OPOLL.
The input alarm logic can be further changed. For example, two kinds of alarms could be given: a single bell when the input queue is almost full, and a long string of bells (say, ten of them) when the queue is actually full.
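The two-level alarm could be expressed as a small policy function. This is only a hypothetical sketch of one way to do it: alarm_bells would be consulted where IPOLL now does INR M, using the gap computed in listing 3 (zero means the queue was full and the character was dropped).

```python
def alarm_bells(gap: int, ifudge: int = 7, full_bells: int = 10) -> int:
    """Hypothetical policy: how many bells to add to IAC for one keystroke,
    given listing 3's gap (0 means the queue was full)."""
    if gap == 0:
        return full_bells        # actually full: a long string of bells
    if gap < ifudge:
        return 1                 # almost full: a single bell
    return 0                     # plenty of room: no alarm

print([alarm_bells(g) for g in (9, 6, 1, 0)])   # [0, 1, 1, 10]
```

One caution for a real version: IAC is a single byte, and OPOLL's JM test treats any value from 129 upward as "no alarm" (and then never decrements it), so adding ten bells at a time makes it worth capping IAC.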