Technical topic: Support of FPGAs in TASTE
Assume that a TASTE system must be built that tests input integers and reports whether they are prime. This system cannot be implemented exclusively in SW, since...
- the LEON processor of the target platform runs at a relatively low frequency, so the necessary calculations would take a long time on every call, and...
- LEON may be busy with other tasks, and performing a long blocking calculation is undesirable.
The prime-number checking functionality will therefore be hosted inside a HW component, the VIRTEX-4 100K, with which LEON can communicate at runtime.
The interface view
The interface is very simple: the integer value is passed to the circuit, which works on it and reports back the first factor of the number (that is, its smallest factor greater than 1). If this output number is the same as the input number, then the number is prime. A simple ASN.1 data type is therefore created, describing the input and output:
Large 64-bit integers will be used, so the range is specified with the proper ASN.1 constraint declarations. This is an important step, because the TASTE tools take the constraints into account when automatically creating the equivalent declarations and code (shown further below).
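As a software reference for the contract just described (return the first factor; the input is prime exactly when that factor equals the input), here is a minimal C sketch. It is not TASTE output: the names first_factor and is_prime are hypothetical, and plain trial division is assumed only because it is the simplest algorithm that satisfies the contract.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Smallest factor of n that is greater than 1, found by trial division.
 * Precondition: n >= 2 (0 and 1 have no such factor). */
uint64_t first_factor(uint64_t n)
{
    assert(n >= 2);
    if (n % 2 == 0)
        return 2;
    /* "d <= n / d" avoids the overflow that "d * d <= n" could hit near 2^64. */
    for (uint64_t d = 3; d <= n / d; d += 2) {
        if (n % d == 0)
            return d;
    }
    return n; /* no smaller factor exists, so n is prime */
}

/* The interface contract: prime exactly when the factor equals the input. */
bool is_prime(uint64_t n)
{
    return n >= 2 && first_factor(n) == n;
}
```

A reference like this is useful mainly for checking the HW implementation against known inputs; it is exactly the kind of slow, blocking calculation that the text wants to move off the processor.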
The compute SUBPROGRAM is then specified in the interface view:
The contents are simple: the function takes one input parameter (in_tocheck) of type T_INTEGER and returns one output (out_factor). Notice that the encodings used when communicating with the function are also specified: Unaligned PER encoding (UPER) is used for the input, and Native encoding (memory dumps) for the output.
This information is then used by the TASTE tools, and the generated output is described in the next paragraphs.
The VHDL skeleton/glue
The skeletons generated for TASTE subsystems have complete input/output specifications, including the parameter type information. For example, this is a section from the generated VHDL spec of the compute subsystem, TASTE.vhd:
The interface parameters (the two integers) have been mapped to corresponding VHDL entities. The skeleton also includes signals start, finish and clock:
- clock is the chip’s clock signal
- start is the signal raised by the circuit’s user, as soon as the in_tocheck parameter has been written – it tells the circuit: "go on, your input data are there"
- finish is the signal raised by the circuit, as soon as the computation is completed – it tells its user: "I am done, go read out_factor".
This is just the core "declaration" of the circuit.
One might argue that this could be written manually without much effort. If, however, the parameter type is more complex, the input mapping becomes quite a daunting task.
For example, if a more complex type were used instead of a simple integer, such as T_POS from the grammar shown in Figure 4, then writing the equivalent declarations by hand would be much more time consuming:
With the TASTE mapper, mapping the T_POS type to VHDL is just as easy as mapping a simple INTEGER.
And that’s only one part of what the TASTE tools automatically create.
The TASTE hardware mapper (vhdl_B_mapper.py) knows:
- what the target FPGA architecture is
- what bus the FPGA is operating over (PCI? USB? etc.)
- what FPGA type this is (Spartan? Virtex?), etc.
It can therefore generate ALL THE CODE necessary for "speaking" to the chip at runtime: intercepting write accesses over the bus and responding to read accesses over the bus.
If, for example, an 8-bit communications bus is being used, the automatically generated VHDL skeleton includes code like this:
By inspecting this VHDL code, it is easy to see that write accesses over the bus are intercepted, and the VHDL mapper *knew* how to map them to the appropriate input registers on the HW side. TASTE, in effect, automatically handles the allocation and mapping of "interface" registers between the FPGA side and the incoming message parameters.
This is an important part of designing chips, and with TASTE, it is done completely automatically.
Since the code generator knew that this is an 8-bit bus, it mapped the input parameter (the 64-bit input integer value that is passed over the bus) to 8 consecutive bus addresses, from 0x2001 to 0x2008.
0x2000 is reserved for the "kick-off" signal – when all input parameters are written to the input registers, and the circuit is therefore ready to process them, the device driver writes to this address, and thus raises the chip’s "start" signal.
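The register layout described above can be sketched in C as a plan of single-byte bus writes. This is an illustration only, not the generated code: the struct bus_write type and the plan_input_writes function are hypothetical, and the byte order (least significant byte at the lowest address) is an assumption, since the text does not state which order the generator uses.

```c
#include <assert.h>
#include <stdint.h>

#define BASE_ADDR    0x2000u /* kick-off register, as stated in the text */
#define INPUT_OFFSET 1u      /* first data byte lands at BASE_ADDR + 1 */
#define INPUT_BYTES  8u      /* one 64-bit parameter on an 8-bit bus */

/* One bus write: a target address and the byte to store there. */
struct bus_write {
    uint16_t addr;
    uint8_t  value;
};

/* Split a 64-bit input into INPUT_BYTES single-byte writes.
 * ASSUMPTION: the least significant byte goes to the lowest address. */
void plan_input_writes(uint64_t in_tocheck, struct bus_write out[INPUT_BYTES])
{
    assert(out != 0);
    for (unsigned i = 0; i < INPUT_BYTES; i++) {
        out[i].addr  = (uint16_t)(BASE_ADDR + INPUT_OFFSET + i);
        out[i].value = (uint8_t)(in_tocheck >> (8u * i));
    }
}
```

After these eight writes, a single write to BASE_ADDR itself would play the role of the kick-off.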
There is corresponding code for the reverse direction, reading the response over the bus:
Again, this code – regardless of the complexity of the input message, and whether it has one or one hundred or one thousand fields – is written automatically.
Notice that the read accesses were automatically mapped to different offsets than the write ones. This depends, of course, on many things, including whether the FPGA board accepts bidirectional register access or not. The point is that the user does not have to be involved with these parts, since they are written automatically.
The device driver
So far, a ready-to-use VHDL skeleton/glue has been generated, and all the user has to do is write the implementation of the "compute" component. The VHDL parts that relate to communication over the bus (the realtime "talking" to the chip) are already written.
But what about communicating with the SW world? There will be other SW components (TASTE subsystems) that will be speaking to this component. These others will most probably run inside CPUs (LEON, or x86 Linux, so far). How will they "speak" to the chip?
Since the VHDL "bridge" was written automatically by the TASTE code generators, the same code generators, which know the register offsets they allocated to each parameter, can also write a complete device driver!
Here’s a part of the automatically generated driver code, for our 8-bit example:
The code uses a simple API to "speak" to the chip over the bus, which allows it to write to FPGA registers. It decodes the incoming parameter (since it is encoded in UPER, see Figure 3), and uses the API to place the incoming information (the integer input) into the appropriate input registers. Notice the last two parameters that the generated code passes to the WriteRegister function (above): a register offset and a value. BASE_ADDR is, in fact, 0x2000, so this function:
- Obtains an incoming INTEGER value – sent, presumably, from other TASTE subsystems that are curious whether this number is a prime or not
- Writes the incoming value over the bus, one byte at a time, in the appropriate target offsets.
So, since the TASTE code generator created the "receiving" code of the VHDL side, it knows how to write the corresponding "sending" side, in the driver code - and it knows to "kick-off" the chip, as soon as all input parameters are in...
...and wait for the result to be calculated (that is, for the chip to raise the "finish" flag).
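The write, kick-off, wait, read sequence described above can be tried out against a small simulated chip. The sketch below is hypothetical throughout and is not the generated driver: bus_write8 and bus_read8 stand in for the real bus API (the signature of the generated WriteRegister call is not shown here), FINISH_OFF and OUT_OFFSET are made-up read offsets, and the byte order is assumed to be least significant byte first.

```c
#include <assert.h>
#include <stdint.h>

#define BASE_ADDR  0x2000u /* kick-off register (from the text) */
#define IN_OFFSET  1u      /* input bytes at BASE_ADDR + 1 .. BASE_ADDR + 8 */
#define FINISH_OFF 0x10u   /* ASSUMPTION: hypothetical finish-flag offset */
#define OUT_OFFSET 0x11u   /* ASSUMPTION: hypothetical result offset */

/* --- Simulated chip, standing in for the real bus and FPGA --- */
static uint8_t sim_in[8], sim_out[8], sim_finish;

static uint64_t sim_first_factor(uint64_t n)
{
    if (n < 2) return n;
    for (uint64_t d = 2; d <= n / d; d++)
        if (n % d == 0) return d;
    return n;
}

static void bus_write8(uint32_t addr, uint8_t value)
{
    uint32_t off = addr - BASE_ADDR;
    if (off == 0) {                          /* kick-off: run the "circuit" */
        uint64_t n = 0, f;
        for (int i = 7; i >= 0; i--) n = (n << 8) | sim_in[i];
        f = sim_first_factor(n);
        for (int i = 0; i < 8; i++) sim_out[i] = (uint8_t)(f >> (8 * i));
        sim_finish = 1;
    } else if (off >= IN_OFFSET && off < IN_OFFSET + 8) {
        sim_in[off - IN_OFFSET] = value;     /* latch one input byte */
        sim_finish = 0;                      /* new input invalidates result */
    }
}

static uint8_t bus_read8(uint32_t addr)
{
    uint32_t off = addr - BASE_ADDR;
    if (off == FINISH_OFF) return sim_finish;
    if (off >= OUT_OFFSET && off < OUT_OFFSET + 8) return sim_out[off - OUT_OFFSET];
    return 0;
}

/* --- Driver side: write inputs, kick off, wait, read the result --- */
uint64_t compute_on_chip(uint64_t in_tocheck)
{
    uint64_t out_factor = 0;
    for (unsigned i = 0; i < 8; i++)
        bus_write8(BASE_ADDR + IN_OFFSET + i, (uint8_t)(in_tocheck >> (8 * i)));
    bus_write8(BASE_ADDR, 1);                      /* raise "start" */
    while (bus_read8(BASE_ADDR + FINISH_OFF) == 0) /* poll "finish" */
        ;
    for (int i = 7; i >= 0; i--)
        out_factor = (out_factor << 8) | bus_read8(BASE_ADDR + OUT_OFFSET + (unsigned)i);
    return out_factor;
}
```

In this simulation the result is ready as soon as the kick-off write returns, so the polling loop exits on its first check; on real hardware the loop is where the driver would actually wait.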
Writing all this code is a very tedious and very error-prone process. It becomes all the more problematic when it is done not for the simple case of an INTEGER, but for a complex type like the T_POS shown before (in Figure 4).
Equally important, the order of parameters might change during development, for example if a reason comes up for adding yet another field to a compound type definition. The new field will shift the register addresses of all the fields that follow it. Having to update all the necessary code in the driver and on the VHDL side is not just annoying; it is also very error-prone, because the two sides can easily get out of step.
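The address shift is easy to see when the offsets are computed from a field table instead of being written by hand. The following C sketch assumes that fields are packed into consecutive byte registers right after the kick-off register; the struct field type, the allocate_offsets function and the example field names are all hypothetical and are not taken from the T_POS definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One parameter field: a name and its width in bytes on an 8-bit bus. */
struct field {
    const char *name;
    uint32_t    size_bytes;
};

/* Assign consecutive register offsets, starting right after the kick-off
 * register at offset 0. Returns the first free offset after the last field. */
uint32_t allocate_offsets(const struct field *fields, size_t count,
                          uint32_t *offsets)
{
    assert(count == 0 || (fields != NULL && offsets != NULL));
    uint32_t next = 1; /* offset 0 is reserved for kick-off */
    for (size_t i = 0; i < count; i++) {
        offsets[i] = next;
        next += fields[i].size_bytes;
    }
    return next;
}
```

With the two 8-byte fields {"x", "y"}, this places "y" at offset 9; inserting a 1-byte field "flag" between them moves "y" to offset 10. Both the VHDL side and the driver must follow that move, which is why deriving both from one description is attractive.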
TASTE solves this problem automatically.
The TASTE mapper doesn’t just generate a set of VHDL (for the HW) and C (for the SW) files. It also generates all the required components of a full Xilinx project:
In fact, a Windows batch file is also generated which allows the user to build the FPGA bitfile by a simple invocation of "build.bat".
The parts described so far offer quite an incentive to use TASTE for HW components. TASTE, however, goes further: the mapper that handles hardware subsystems also generates the necessary "boilerplate" for a SystemC implementation of the subsystem.
Note that the final SystemC design does NOT execute on a CPU. With SystemC, development goes as follows:
- the user writes the subsystem’s code in C++
- the user then compiles it with a normal C++ compiler
- the generated binary accurately simulates the chip, so the user can verify that the design works correctly
- finally, the user compiles the design to VHDL with a SystemC compiler, so that it can be downloaded to the target FPGA.
To that end, TASTE automatically generates the SystemC header of the subsystem:
And this is the automatically generated SystemC skeleton:
Inspecting this code reveals the corresponding side of the HW implementation. The chip...
- waits until someone (the device driver) raises the "start" signal (the first "wait" loop)
- executes the user-written code (notice that automatically generated comments tell the user what to fill in: "read data from in_tocheck", "write result to out_factor")
- raises the "finish" flag when the user code is finished
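The three steps above can be modelled in plain C as a small clocked state machine. This is neither SystemC nor the generated skeleton: struct circuit, clock_tick and run_until_finish are hypothetical names, one trial division per clock tick is an arbitrary modelling choice, and "start" is assumed to be a one-tick pulse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum state { WAIT_START, BUSY, DONE };

struct circuit {
    enum state state;
    bool       start;      /* driven by the user of the circuit */
    bool       finish;     /* driven by the circuit */
    uint64_t   in_tocheck;
    uint64_t   out_factor;
    uint64_t   candidate;  /* internal: divisor currently being tried */
};

/* Advance the model by one clock tick. */
void clock_tick(struct circuit *c)
{
    assert(c != 0);
    switch (c->state) {
    case WAIT_START:                 /* step 1: wait for "start" */
        if (c->start) {
            c->finish = false;
            c->candidate = 2;
            c->state = BUSY;
        }
        break;
    case BUSY:                       /* step 2: the "user code" */
        if (c->candidate > c->in_tocheck / c->candidate) {
            c->out_factor = c->in_tocheck;   /* no factor found */
            c->state = DONE;
        } else if (c->in_tocheck % c->candidate == 0) {
            c->out_factor = c->candidate;
            c->state = DONE;
        } else {
            c->candidate++;
        }
        break;
    case DONE:                       /* step 3: raise "finish" */
        c->finish = true;
        c->state = WAIT_START;
        break;
    }
}

/* Drive the model the way a device driver would: write the input,
 * pulse "start" for one tick, then clock until "finish" is raised.
 * Returns the number of ticks that elapsed. */
unsigned run_until_finish(struct circuit *c, uint64_t value)
{
    unsigned ticks = 0;
    c->in_tocheck = value;
    c->start = true;
    clock_tick(c); ticks++;
    c->start = false;
    while (!c->finish) { clock_tick(c); ticks++; }
    return ticks;
}
```

The point of the model is the handshake rather than the arithmetic: the user of the circuit only writes the input and pulses "start", and reads out_factor only after "finish" has been raised.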
Easy implementation and communication with HW subsystems
To conclude, the executive summary of what was shown above is this:
By using the TASTE support for HW subsystems:
- The work is significantly reduced, since major parts of the development are written automatically. More specifically:
  - The device driver (the SW side of the component) is written 100% automatically; no part of it has to be written by hand.
  - The Xilinx project files are written 100% automatically; no part of them has to be written by hand.
  - The VHDL register interfaces that map input and output parameters are written completely automatically; no part of them has to be written by hand.
  - The VHDL and SystemC skeletons are written 100% automatically, and include specific comments to the user, indicating where to add the processing logic.
- The designs are therefore far more adaptable, since the user can add extra inputs/outputs without difficulty: the mapping to input/output VHDL registers and the corresponding adaptations to the device driver are regenerated automatically, so there is no associated cost.