When you use 3L Diamond to create a multiprocessor application, you follow three simple steps:
Design Your Application Code
Divide your problem into a number of independent tasks that can communicate with each other, then specify the channels that will be used to carry data between your tasks. This step is independent of any hardware you may subsequently decide to use.

Select the Hardware to Use
Choose the actual DSP and FPGA processors you want in your system from the list of supported modules, then describe the hardware links they will use to intercommunicate.
Map Your Code to the Hardware
Finally, let Diamond know on which processor each of your tasks must run.
Once you have completed these steps, Diamond can do everything else needed to construct your complete application.
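The separation between the three steps can be sketched as a small data model in C. This is an illustration only: every type and function name below is hypothetical and is not Diamond's configuration syntax or API. The sketch also assumes, for simplicity, that two tasks can share a channel only when they sit on the same processor or on two processors joined by a direct link.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model for illustration only; this is not Diamond's
 * configuration syntax or API. */

/* Step 1: the application is a set of tasks plus channels between them. */
typedef struct { const char *name; } task_spec;
typedef struct { int from_task; int to_task; } channel_spec;

/* Step 2: the hardware is a set of processors plus physical links. */
typedef struct { const char *name; } processor_spec;
typedef struct { int proc_a; int proc_b; } link_spec;

typedef struct {
    const task_spec *tasks;       size_t n_tasks;
    const channel_spec *channels; size_t n_channels;
} application;

typedef struct {
    const processor_spec *procs; size_t n_procs;
    const link_spec *links;      size_t n_links;
} hardware;

static bool index_ok(int i, size_t n)
{
    return i >= 0 && (size_t)i < n;
}

/* Assumption: processors can exchange data if they are the same
 * processor or are joined by a direct link (in either direction). */
static bool can_communicate(const hardware *hw, int a, int b)
{
    if (a == b)
        return true;
    for (size_t i = 0; i < hw->n_links; i++) {
        const link_spec *l = &hw->links[i];
        if ((l->proc_a == a && l->proc_b == b) ||
            (l->proc_a == b && l->proc_b == a))
            return true;
    }
    return false;
}

/* Step 3: placement[t] is the index of the processor that runs task t.
 * Returns true if every task is placed on an existing processor and
 * every channel connects tasks whose processors can communicate. */
bool mapping_is_valid(const application *app, const hardware *hw,
                      const int *placement)
{
    for (size_t t = 0; t < app->n_tasks; t++)
        if (!index_ok(placement[t], hw->n_procs))
            return false;

    for (size_t c = 0; c < app->n_channels; c++) {
        const channel_spec *ch = &app->channels[c];
        if (!index_ok(ch->from_task, app->n_tasks) ||
            !index_ok(ch->to_task, app->n_tasks))
            return false;
        if (!can_communicate(hw, placement[ch->from_task],
                             placement[ch->to_task]))
            return false;
    }
    return true;
}
```

Note that the application structure never mentions a processor. The same tasks and channels can be checked against different hardware descriptions and different placements, which is the independence the first step describes.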
Today’s high-performance applications often contain a combination of DSPs and FPGAs as well as high-speed data acquisition devices. Hardware vendors provide solutions that make it easy for customers to tailor complex, high-performance hardware systems from DSP, FPGA, and I/O components. The vast majority of DSP development tools, however, target a single device. For example, in the case of C6000 program development, Code Composer Studio from Texas Instruments has been designed to handle applications running on one DSP. Similarly, the Xilinx System Generator tool also targets a single device.
It is possible to use these tools for multiprocessor development, but their lack of any form of true multiprocessor support adds non-trivial complications to both the development process and the final deployment of applications. Perhaps the worst of these complications is the way in which the absence of a coherent multiprocessor model inevitably leads to knowledge of the hardware structure being hard-coded into the design from an early stage of development. This makes it difficult to change the system once the different parts have been completed.
3L Diamond enhances these single-processor tools by providing a proven and simple multiprocessor model with a level of abstraction that leads to efficient, coherent, reliable, and flexible systems. Based on the Communicating Sequential Processes (CSP) model, Diamond gives an extremely simple but powerful way to develop applications that make use of one or more processors. You build your application from tasks, self-contained blocks of code that communicate with other tasks using an abstraction called a channel. Diamond automatically implements references to channels using the target hardware's most effective physical data-transfer mechanisms.
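The task-and-channel idea can be approximated on an ordinary POSIX system to show its shape. The sketch below is not Diamond code: it stands in a POSIX pipe for a channel and a forked process for a task, and the names `pipe_chan`, `chan_send`, `chan_recv`, and `run_two_tasks` are hypothetical. A pipe is buffered, so it is looser than a strict CSP rendezvous, but the key property holds: the two tasks share no memory and interact only through the channel.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical channel handle: one end of a POSIX pipe.
 * This is a stand-in for illustration, not Diamond's channel type. */
typedef struct { int fd; } pipe_chan;

/* Send exactly len bytes. Returns 0 on success, -1 on error. */
static int chan_send(pipe_chan ch, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(ch.fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Receive exactly len bytes. Returns 0 on success, -1 on error or EOF. */
static int chan_recv(pipe_chan ch, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(ch.fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Producer task: sends the integers 1..count down its output channel. */
static void producer_task(pipe_chan out, int32_t count)
{
    for (int32_t i = 1; i <= count; i++)
        if (chan_send(out, &i, sizeof i) != 0)
            return;
}

/* Consumer task: receives count integers and returns the sum of their
 * squares, or -1 if the channel fails. */
static int64_t consumer_task(pipe_chan in, int32_t count)
{
    int64_t sum = 0;
    for (int32_t i = 0; i < count; i++) {
        int32_t v;
        if (chan_recv(in, &v, sizeof v) != 0)
            return -1;
        sum += (int64_t)v * v;
    }
    return sum;
}

/* Connect the two tasks with one channel and run them concurrently.
 * Returns the consumer's result, or -1 on failure. */
int64_t run_two_tasks(int32_t count)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                 /* child process: the producer task */
        close(fds[0]);
        producer_task((pipe_chan){ fds[1] }, count);
        close(fds[1]);
        _exit(0);
    }

    close(fds[1]);                  /* parent process: the consumer task */
    int64_t result = consumer_task((pipe_chan){ fds[0] }, count);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return result;
}
```

Neither task function knows what carries its data. Swapping the pipe for another transport would change only `chan_send` and `chan_recv`, which mirrors the way a channel hides the physical data-transfer mechanism from the task.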
A task can be built for processor-based technologies (such as DSPs) or logic-based technologies (such as FPGAs), and gives you a simple but efficient abstraction from the actual hardware. This abstraction hides many of the low-level details, letting you enjoy greater independence from the hardware. The structure of an application is independent of the actual processors on which the tasks are placed. You can choose the most appropriate hardware, and even move to newer hardware generations with very little effort as they become available.
A processor-based element, like a DSP or PowerPC, executes a sequential program. It typically executes a small number of discrete instructions (often only one) for each clock cycle. Such processing elements are well-suited to sequential operations and calculations: high-level control algorithms, for example.
A task on such a processor is a complete C program that has been compiled and linked against the Diamond run-time library and any user libraries it may require. Each task has its own main function, which is invoked as a thread when the task starts to execute. You may place as many tasks on a processor as you wish; you are limited only by the amount of private memory the processor can access.
Tasks have full access to the capabilities of the processor on which they are running, and Diamond provides controlled access to features such as DMA channels and other peripherals. The configurer will place a microkernel on each DSP in the system. This kernel provides the primitives needed by Diamond (semaphores, events, timers, threads) and controls the basic operation of the processor by handling interrupts, devices, threads, and context switching.
Logic-based processing elements such as FPGAs are well suited to highly parallel processing, where, during any particular clock cycle, the FPGA may be performing a great many operations in parallel.
This kind of processing is very well suited to calculations like correlations and filters. Using Diamond, you can incorporate these processing elements into your system and populate them with VHDL tasks following the same model you would use with processor-based tasks. Diamond takes care of adding any communication resources that you may require to connect these processing elements to the rest of your system. Diamond also takes care of issues such as clock domains and clock domain crossings.
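To see why a filter suits parallel logic, consider a direct-form FIR filter written sequentially in C. This is a generic textbook sketch, not a Diamond or VHDL task, and the function name `fir_filter` is chosen here for illustration. On a processor, the inner loop performs one multiply-accumulate after another; the individual products do not depend on each other, so logic-based hardware can compute all of them in the same clock cycle and then add the results.

```c
#include <stddef.h>

/* Direct-form FIR filter:
 *   y[i] = sum over k of h[k] * x[i - k],  for k = 0 .. taps - 1,
 * treating samples before the start of x as zero.
 * The caller provides y with room for n results. */
void fir_filter(const int *x, size_t n, const int *h, size_t taps, long *y)
{
    for (size_t i = 0; i < n; i++) {
        long acc = 0;
        /* Each product h[k] * x[i - k] is independent of the others;
         * only the final summation combines them. */
        for (size_t k = 0; k < taps && k <= i; k++)
            acc += (long)h[k] * x[i - k];
        y[i] = acc;
    }
}
```

For example, filtering the input 1, 2, 3, 4 with the two coefficients 1, 1 yields 1, 3, 5, 7: each output is the current sample plus the previous one.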