Reconfigurable Architecture Computation Engine
RACE (tm.)

John R. Hart, Testra Corporation


    A Forth machine is almost the ideal processor for a PLD because Forth performance isn't compromised by the limited number of registers available in current PLDs. The performance of Forth words coded in the primitive set was used as a key benchmark while refining the design. When the execution speed of Forth words reached 4 MIPS, firmware optimization was stopped and work on the application began. The finished design fit into the PLD and left room for the logic needed by the target application.

Forth was used as the simulator in the search for the design.
Forth was used as the hardware definition language for the design.
Forth was used to convert the design into logic equations.
Forth was used to fit the logic equations into the PLD.
Forth was used to route the PLD's internal connections.
Forth was used to verify the logic equations.
Forth was used to assemble the application code.
Forth was used as the meta-compiler.


    Even the largest PLD has a limited number of registers, so our effort was focused on making an efficient stack machine. The end product was a simplified Forth model that fit in the PLD and left room for the timing sequencer and the current control state machines needed by the application.

    The design process began by making a software simulation of a very simple Forth processor, called the miniForth. Getting the miniForth up and running was one of the easiest parts of the job. The simulation code for the 27 primitives needed to build the Forth kernel took only a few days to write and debug.

    The miniForth was just the starting point in an evolutionary process that involved running the application on the simulator, finding bugs, and correcting shortcomings. The viability of this method was clearly evident when the prototype hardware booted up and said OK without a glitch.


    The RACE is a 16-bit RISC processor that executes code at 25 MIPS. Most Forth primitives take from 4 to 8 cycles, so Forth runs at about 4 MIPS. Code operators were devised so they could be combined to build efficient Forth primitives and make the best use of the PLD's limited resources. This meant some things were done in unconventional ways. Functions like ANDing, counting, and shifting are easily done in one cycle, but arithmetic functions had to be broken into two parts. In the first part, the operands are half added using an XOR instruction that takes one cycle. In the second part, a special instruction is executed four times to propagate the carry through all 16 bits.
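
    The two-part addition described above can be sketched in software. This is only an illustrative model: the paper does not say how the carry instruction works internally, so the sketch assumes that each of the four executions ripples the carry through one 4-bit slice. The names half_add, carry_step, and add16 are hypothetical.

```python
MASK16 = 0xFFFF

def half_add(a, b):
    """First part: XOR gives the carry-less sum; AND gives the carry-generate bits."""
    return (a ^ b) & MASK16, (a & b) & MASK16

def carry_step(p, g, carry_in, nibble):
    """Assumed model of the carry instruction: resolve one 4-bit slice."""
    result = 0
    c = carry_in
    for bit in range(4 * nibble, 4 * nibble + 4):
        pb = (p >> bit) & 1
        gb = (g >> bit) & 1
        result |= (pb ^ c) << bit   # final sum bit
        c = gb | (pb & c)           # carry into the next bit
    return result, c

def add16(a, b):
    """Half add once, then run the carry step four times to cover 16 bits."""
    p, g = half_add(a, b)
    total, carry = 0, 0
    for nibble in range(4):
        part, carry = carry_step(p, g, carry, nibble)
        total |= part
    return total, carry
```

    Under these assumptions, add16 returns the 16-bit sum together with the final carry, which would correspond to the CRY flag listed in Appendix C.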

    Several different conditions can be selected to control branching. PC branching takes one cycle, but because the instruction following a branch executes, a null instruction may have to be placed there, making the branch take two cycles. The 0BRANCH primitive is built using an instruction that copies the jump address into the IP only if the top of the stack is equal to zero.
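
    The behavior of 0BRANCH can be modeled in a few lines. This sketch follows the conventional Forth arrangement, which is an assumption here: the jump address is stored inline in the word after 0BRANCH, the flag is popped from the stack, and memory is word addressed. The function name zero_branch is hypothetical.

```python
def zero_branch(ip, memory, stack):
    """Return the new IP after 0BRANCH, with the jump address at memory[ip]."""
    target = memory[ip]      # inline jump address (assumed layout)
    flag = stack.pop()       # top of stack is consumed (assumed, as in standard Forth)
    if flag == 0:
        return target        # jump address is copied into the IP
    return ip + 1            # otherwise step past the inline address
```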

    The RACE has two interrupts, one for the timer and one for external events. The branch on interrupt is part of the NEXT instruction. To maintain an interrupt latency of less than 2 microseconds, there can be no more than 128 clocks between NEXT instructions.

    Multiplication is done by conditionally adding and shifting. Because the multiply process takes longer than the maximum interrupt latency, a special case of NEXT is used as an interruptible loop instruction in the multiply and divide code. Subroutine nesting is activated by the least significant bit in the instruction. If the bit is zero, the instruction is a call, and the PC is loaded with the address of the nesting code. If the bit is one, the higher bits are loaded into the PC.
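
    The conditional add-and-shift multiplication mentioned above can be illustrated as follows. This is a generic unsigned 16 x 16 shift-and-add sketch, not the RACE microcode; the function name multiply16 and the choice to shift the multiplicand left are assumptions made for clarity.

```python
def multiply16(multiplicand, multiplier):
    """Unsigned 16 x 16 -> 32 bit multiply by conditional add and shift."""
    multiplicand &= 0xFFFF
    multiplier &= 0xFFFF
    product = 0
    for _ in range(16):              # one pass per multiplier bit
        if multiplier & 1:           # conditional add
            product += multiplicand
        multiplicand <<= 1           # shift for the next bit position
        multiplier >>= 1
    return product & 0xFFFFFFFF
```

    Each pass of the loop is short and independent, which is the kind of boundary at which an interruptible loop instruction could let a pending interrupt be serviced.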

    The majority of the instructions were made for building Forth primitives, but there are also several application-specific instructions, as well as instructions for booting, accessing DRAM, loading the timer, doing I/O, loading CODE memory, and addressing local variables.

    Code instructions are divided into five fields: the control field, the accumulator field, the memory address field, the stack pointer field, and the register address field.

    The control field (CSu) is five bits wide and it controls system timing, memory access, and instruction modes. Execution of the CSu is deferred until the next instruction.

    The accumulator field (ACu) is two bits wide and it controls the operation of the AC register. There are four modes for AC instructions that are set by the previous CSu.

    The memory address field (MAu) is three bits wide and it selects one of seven different memory addresses.

    The stack pointer field (PTu) is three bits wide and it controls incrementing, decrementing and loading of the stack pointers.

    The register address field (RAu) is four bits wide and it is used for addressing local variables and loading small constants.
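
    The field layout can be illustrated with a small decoder. The bit positions below are assumptions read from the brackets in Appendix B. Because the stated widths add up to 17 bits in a 16-bit word, the sketch further assumes that the four RAu bits (3-0) share bit 3 with PTu (5-3). The names FIELDS and decode_code_word are hypothetical.

```python
# Assumed (high bit, low bit) positions for each field of a 16-bit code word.
FIELDS = {
    "CSu": (15, 11),  # control, five bits
    "ACu": (10, 9),   # accumulator, two bits
    "MAu": (8, 6),    # memory address, three bits
    "PTu": (5, 3),    # stack pointers, three bits
    "RAu": (3, 0),    # register address, four bits (assumed to share bit 3 with PTu)
}

def decode_code_word(word):
    """Split a 16-bit code word into its fields under the assumed layout."""
    fields = {}
    for name, (hi, lo) in FIELDS.items():
        width = hi - lo + 1
        fields[name] = (word >> lo) & ((1 << width) - 1)
    return fields
```

    Keeping the positions in one table makes the sketch easy to adjust if the actual boundaries differ from the ones assumed here.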


    Reconfigurable processors are computational devices with reprogrammable logic and data paths that can be adapted to the needs of an application. For many applications, reconfigurable hardware has a computation potential orders of magnitude faster than fixed hardware.

    This project demonstrates that a practical reconfigurable micro-controller can be built using a PLD. The main disadvantage of current PLDs for this application is the large number of pins used to connect them to the memory. If as little as 2K of internal memory were available, 28 fewer I/O pins would be needed, and the economic advantage of using a PLD as a reconfigurable processor would be much higher.


Appendix A

Instruction Word Format: F E D C B A 9 8 7 6 5 4 3 2 1 0

                         `-------------.-------------' |
                                       |               |
    ADDS ------------------------------'               |
    NEST ----------------------------------------------'

If NEST is one, the word points to code; if it is zero, the word points to another list.
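
The instruction word format above can be decoded as in the following sketch. It takes ADDS as bits F through 1 and NEST as bit 0, as drawn in the diagram. The function name decode_instruction_word and the right-justified return of ADDS are assumptions for illustration.

```python
def decode_instruction_word(word):
    """Split a 16-bit instruction word into (kind, adds)."""
    adds = (word >> 1) & 0x7FFF    # bits F..1
    nest = word & 1                # bit 0
    kind = "code" if nest == 1 else "list"
    return kind, adds
```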

Appendix B

Code Word Format: F E D C B A 9 8 7 6 5 4 3 2 1 0

                  `---.---' `.' `-.-' `-.-'--.--'
                      |      |    |     |    |
    CSu control ------'      |    |     |    |
    ACu accumulator ---------'    |     |    |
    MAu memory address -----------'     |    |
    PTu stack pointers -----------------'    |
    RAu register address --------------------'

Appendix C

Internal Registers:

    AC Accumulator             16 bits    (top of stack)
    TR Temporary Register      16 bits
    IP Instruction Pointer     15 bits
    SP Stack Pointer            6 bits
    RP Return Pointer           6 bits
    MA Memory Address          15 bits
    PC Program Counter         12 bits


    CRY Carry Flag (not borrow flag when subtracting)
    FLG Shift Flag (memory address bit 0)
    ACZ AC is zero (top of stack)

If you have comments or suggestions, email us at
Testra Corporation   1201 N. Stadem Drive  Tempe, AZ 85281  Ph. 480-895-8439  Fax: 480-895-3589