High-performance, low-power, and small-footprint requirements imposed by the embedded market are driving the processor industry in a definite move away from single-core processors to multicore processors. Multicore processors have been deemed the future of Size, Weight, and Power (SWaP) constrained applications like military and avionics. They provide higher performance (MHz/W) at lower power, and they allow consolidation of multiple functions/applications onto a single platform.
IMA and Certification
Modern avionics systems are moving from federated systems to Integrated Modular Avionics (IMA), where multiple applications with mixed criticality reside on the same computing platform. The IMA concept is detailed in a set of standards like DO-297 and implementation guidelines like ARINC 653 and ARINC 651. A safety-critical avionics system has to be certified by the Federal Aviation Administration (FAA) in the United States or the European Aviation Safety Agency (EASA) in Europe, covering both hardware and software. The standard RTCA/DO-254 (ED-80) provides guidance for the development of airborne electronic hardware, and the standard RTCA/DO-178C (ED-12C) provides guidance for the development of airborne software.
The advantages of multicore platforms in terms of performance, power, and size make them ideal for IMA applications. One application area is the dual-redundant, two-channel command/monitor lane architecture of Automatic Flight Control Systems (AFCS). A typical IMA backplane is divided into Channel A and Channel B, powered by independent power supply modules for redundancy. Four processing elements are required to host the Channel A command, Channel A monitor, Channel B command, and Channel B monitor processes for the primary AFCS, and similarly four processing elements for the backup AFCS. Both IMA cabinets have to be connected with the interconnecting digital bus. So, four IMA cabinets and eight processing elements are needed to implement this architecture, as shown in Figure 1.
With the introduction of multicore processing elements, the Channel A command and Channel B monitor applications can be hosted on one computing platform, P-1, and the Channel B command and Channel A monitor on a second computing platform, P-2. It is then possible to implement the Dual-Dual architecture with two IMA cabinets and four processing elements, as shown in Figure 2, reducing the overall weight by 50%. Also, with the increased processing power available, custom I/O processing modules can be integrated with the processing elements to handle memory-intensive and signal-processing workloads, thereby reducing external interfacing and cabling requirements. However, the isolation requirements of the hosted functions must still be met, and robust partitioning needs to be ensured in the integrated environment.
Fault containment has traditionally been achieved in federated architectures with dedicated hardware per application or function. With the introduction of IMA and multicore, the equivalent property, robust partitioning, needs to be explicitly addressed and ensured.
Robust partitioning, or separation, is the central concept for avoiding influences between different applications in space and time. Space relates to access to memory regions or I/O interfaces. An example of space-partitioning support is a memory management unit (MMU): it maps partitions to memory regions and enforces access patterns according to a defined configuration. By means of time partitioning, it can be guaranteed that one function's changing demand for hardware resources will never prevent another function from obtaining a specific minimum level of service. Furthermore, it can be ensured that the timing of a function's access to these resources will not be affected by the variable demand or failure of another function.
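The MMU-based space-partitioning check described above can be sketched in a few lines. This is a minimal illustrative model, not any particular RTOS's implementation; the partition names, region layout, and permissions are assumptions invented for the example.

```python
# Minimal model of MMU-style space partitioning: each partition is granted
# explicit (base, size, permissions) regions; any access outside them faults.
# Partition names and the region layout here are illustrative assumptions.

REGIONS = {
    "P1_CMD": [(0x1000_0000, 0x10_0000, "rw")],      # private RAM window
    "P2_MON": [(0x2000_0000, 0x10_0000, "rw"),
               (0x3000_0000, 0x1000, "r")],          # read-only shared page
}

def check_access(partition, addr, mode):
    """Return True if the configured regions allow this access."""
    for base, size, perm in REGIONS.get(partition, []):
        if base <= addr < base + size and mode in perm:
            return True
    return False  # would raise an MMU fault in hardware

assert check_access("P1_CMD", 0x1000_0040, "w")
assert not check_access("P1_CMD", 0x2000_0000, "r")  # other partition's memory
assert not check_access("P2_MON", 0x3000_0000, "w")  # read-only region
```

The essential property is that the check depends only on the static configuration, so an access violation by one partition can be detected and contained without any cooperation from the faulting application.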
It should be noted that the guarantee of partitioning in an IMA (e.g., enforced and supported by the operating system and hardware) generally needs to be assured to the level required by the highest (most demanding) application running on that hardware and operating system. Partitioning isolates faults by means of access control and usage-quota enforcement for resources in software.
For safety-critical embedded systems, quality requirements usually refer to timing or safety attributes. In some cases, security attributes play an important role as well. Timing attributes often consist of deadlines for specific tasks, which must be met under all circumstances. This leads to the requirement of fully deterministic system behavior.
Adopting multicore platforms in IMA systems where multiple partitions are executing on different cores in parallel will bring in significant challenges as described in the following sections.
Robust Partitioning — Time
Temporal separation is fundamentally violated on multicore, as multiple applications execute in parallel, and the chip interconnect may violate temporal separation at the microscopic level. The scheduling policy should ensure that interference between parallel executions is controlled, known, and hence bounded for deterministic behavior. The specific challenge is defining a scheduling strategy and configuration across cores such that interference patterns are controlled and bounded.
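The per-core scheduling configuration this implies can be sketched as a static, ARINC 653-style time-partition table: fixed windows inside a repeating major frame, checked offline for overlap. The frame length and window layout below are illustrative assumptions, not taken from any specific system.

```python
# Sketch of an ARINC 653-style static time-partition table for one core:
# fixed windows inside a repeating major frame. Layout is illustrative.

MAJOR_FRAME_MS = 50
# (partition, start_ms, duration_ms) -- must tile the frame exactly
WINDOWS = [("P1_CMD", 0, 20), ("P2_MON", 20, 15), ("SPARE", 35, 15)]

def partition_at(t_ms):
    """Which partition owns the CPU at absolute time t_ms."""
    t = t_ms % MAJOR_FRAME_MS
    for name, start, dur in WINDOWS:
        if start <= t < start + dur:
            return name

def windows_are_consistent():
    """Static check: windows tile the major frame with no overlap or gap."""
    end = 0
    for _, start, dur in sorted(WINDOWS, key=lambda w: w[1]):
        if start != end:
            return False
        end = start + dur
    return end == MAJOR_FRAME_MS

assert windows_are_consistent()
assert partition_at(10) == "P1_CMD"
assert partition_at(75) == "P2_MON"   # 75 % 50 = 25 -> P2_MON window
```

On a multicore platform, one such table exists per core, and the harder problem noted above remains: choosing which windows may run concurrently on different cores so their interference through shared resources stays bounded.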
Robust Partitioning — Space (Memory and I/O)
Because multiple applications execute in parallel, there is a fundamental violation of space partitioning, as processor resources like the chip interconnect are shared. To achieve spatial separation, memory and I/O contention should be controlled and monitored by the scheduling policy or other mechanisms. The impact of many implicit resources like the chip interconnect needs to be studied at the hardware-architecture level to achieve deterministic behavior.
Specific challenges include:
- Scheduling policy and configuration to control resource (memory and I/O) contention;
- Software mechanisms to manage concurrent access to shared resources like real-time locking protocols.
Inter-Partition / Inter-Core Communication
Communication and synchronization are no longer restricted to partitions that execute serially; synchronization is required between cores executing in parallel. Inter-core communication is facilitated by hardware features like doorbell interrupts, and it can also be accomplished in software. This communication needs to be deterministic and synchronized.
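A common software mechanism for deterministic core-to-core communication is a bounded single-producer/single-consumer queue, sketched below. This is an illustrative model (the capacity and usage are assumptions): the key property is that a full queue fails the send rather than blocking, so the sender's timing never depends on the receiver's progress.

```python
# Sketch of a bounded single-producer/single-consumer message queue of the
# kind used for deterministic core-to-core communication (a software
# alternative to doorbell interrupts). Capacity and names are illustrative.

class SPSCQueue:
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0   # advanced only by the consumer core
        self.tail = 0   # advanced only by the producer core

    def send(self, msg):
        """Producer side: fails (rather than blocks) when full, so the
        sender's worst-case timing is bounded and known."""
        nxt = (self.tail + 1) % len(self.buf)
        if nxt == self.head:
            return False          # queue full -> bounded, known behavior
        self.buf[self.tail] = msg
        self.tail = nxt
        return True

    def recv(self):
        if self.head == self.tail:
            return None           # empty
        msg = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        return msg

q = SPSCQueue(4)                  # holds capacity - 1 = 3 messages
assert all(q.send(i) for i in range(3))
assert not q.send(99)             # full: sender is not stalled
assert q.recv() == 0
```

Because each index is written by exactly one side, no lock is needed; on real hardware the equivalent structure additionally requires memory barriers appropriate to the processor's memory-ordering model.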
Worst Case Execution Time
In addition to the complexities of WCET estimation for single-core processors, analysis for multicore processors needs to model and account for the following:
- Interference patterns between parallel tasks on different cores due to memory and I/O contention. Details of the hardware arbitration of shared resources are generally not available for COTS hardware.
- Impact of implicit shared resources like interconnects. This information is generally not available for COTS processors and is a source of error.
- Impact of cache management policy.
- Interleaving, by the chip interconnect, of concurrent transaction flows in order to maximize the global average bandwidth.
Cache Memory Handling
Cache memory handling was already complicated for single-processor designs. It is further complicated by multiple cores executing in parallel and sharing cache. Analysis of cache usage is extremely complex and almost infeasible. Specific issues that need to be addressed are:
- Cache management policy to isolate applications, running on the same core or on different cores, that access the cache, since contention for the cache results in loss of determinism.
- Ensuring cache content integrity.
- Reducing the gap between the WCET and the average-case execution time (ACET) by means of cache content prediction, cache partitioning, or other methods.
A multi-level cache hierarchy maintained in hardware makes the coherency model quite complex. Coherency transactions usually occur through implicit access to shared resources like the chip interconnect, and the determinism of this function is usually not known for COTS processors.
Because WCET estimates are conservative, a lot of unutilized processor time is available during normal execution, and performance-optimization techniques utilize this spare capacity. The key challenge is to reduce the gap between the WCET and the ACET so that the performance advantage of multicore is not overshadowed by inflated worst-case execution times that leave processor execution time underutilized.
In non-real-time systems, which focus on average-case performance, these challenges are not significant, and the multicore platforms used are tuned to maximize average-case performance. In safety-critical systems, by contrast, it is mandatory to know the worst-case behavior up front. So, identifying the key aspects during evaluation of the platform will provide firsthand information on potential challenges for system engineers and platform designers.
Certification Aspects for Evaluation
Employing multicore COTS processors in safety critical applications will be cost effective and enhance the time to market. Platforms will have to be designed around COTS processors, carefully choosing the features which can be used in safety critical environments and disabling the features which may not be consistent with the safety and certifiability aspects. Table 1 summarizes the various aspects of multicore which are critical for ensuring the safe and deterministic operation of the applications that need to be evaluated.
Symmetric Multi-Processing (SMP) is an architecture that provides fast performance by making multiple CPUs available to complete individual processes simultaneously (multiprocessing). Any idle CPU can be assigned any task, and additional CPUs can be added to improve performance and handle increased loads. SMP uses a single operating system and shares common memory, I/O, and interrupt resources. Processes and threads are distributed among the CPUs.
In Asymmetric Multi-Processing (AMP), each CPU group runs its own OS, which may be the same as or different from the others. Each CPU group can be given a specific application to run. All CPU groups must cooperate to share resources, meaning no single OS owns the whole system. I/O and interrupts are divided up among the CPU groups.
Virtualization addresses the need to run multiple OSes on a single system. Using virtualization, one can define a logical partition to represent a collection of actual or emulated hardware resources; an OS then runs on a software implementation of a machine, i.e., a virtual machine (VM), with a single VM running on a single logical partition. The VMs are managed by a low-level software program, the virtual machine manager (also called a hypervisor), which provides abstraction between the underlying physical hardware and the VMs. It can also provide communication between VMs if required, as well as security and reliability (e.g., one VM can crash without affecting the rest of the system). It manages globally shared resources and virtualizes some of them as required.
AMP offers easy portability of legacy applications from single-core to multicore designs, compatibility with existing debugging tools, and better control over determinism and performance.
An AMP or microkernel configuration can be used for hosting a mix of safety-critical and non-safety-critical applications, maximizing utilization while ensuring that the safety-critical partitions are not impacted by the non-critical partitions and applications running in parallel. The following sections discuss generic steps for evaluating specific aspects in a chosen software configuration (AMP or microkernel). A typical test setup is shown in Figure 6.
I. Determinism and Performance
The introduction of multicore platforms poses significant challenges in arriving at deterministic behavior. Difficulties in estimating worst-case behavior can inflate the execution-time budgets of the various partitions needed to ensure deterministic behavior, which in turn can easily nullify any performance advantage. So, there is a need to evaluate platforms with respect to performance while meeting the deterministic behavior required by safety-critical hard real-time systems.
- Establish the partitions on various cores.
- Establish the Worst Case Execution Time. The references give comprehensive studies of WCET estimation for multicore platforms. Supplier-provided tools can also be used for the estimation, as can analytical tools that identify and analyze worst-case paths.
- Derive test-case scenarios that exercise worst-case paths leading to maximum contention delays, including concurrency and coherence setup.
- Arrive at the Average Case Execution time.
- Compare the ratio of ACET/WCET.
- Study the various mechanisms for dynamically using the available slack time for running non-critical or slower rate tasks.
- Study the overall processor utilization across the cores.
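The arithmetic behind the ACET/WCET comparison in the steps above can be sketched as follows. The sample values and the WCET bound are synthetic numbers invented for illustration; in a real evaluation they would come from instrumented runs and the analysis tools mentioned earlier.

```python
# Sketch of the determinism/performance evaluation arithmetic: from measured
# execution-time samples (synthetic numbers here), derive the ACET, compare
# it to a WCET bound, and size the slack usable for non-critical work.

samples_us = [112, 118, 121, 115, 130, 119, 117, 124]   # illustrative data
wcet_us = 200                                           # analytical bound

acet_us = sum(samples_us) / len(samples_us)
ratio = acet_us / wcet_us
slack_us = wcet_us - max(samples_us)   # guaranteed spare time per activation

assert max(samples_us) <= wcet_us      # no observed overrun of the bound
print(f"ACET={acet_us:.1f}us  ACET/WCET={ratio:.2f}  min slack={slack_us}us")
```

A low ACET/WCET ratio signals that the WCET bound is pessimistic; the slack figure quantifies how much capacity the mechanisms for running non-critical or slower-rate tasks could reclaim.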
II. Resource Partitioning and Allocation
Introducing multicore in IMA systems, with multiple applications executing in parallel, causes a fundamental violation of space partitioning, as processor resources like the chip interconnect are shared. The impact of many implicit resources like the chip interconnect needs to be studied at the hardware-architecture level to achieve deterministic behavior.
- Study the various mechanisms available for partitioning of the shared resources.
- Study the mechanisms available for allocation and de-allocation of resources to various partitions.
- Perform static analysis to ensure that resource partitioning is not violated, or that any violations are detected and annunciated.
III. Functional Safety Analysis
In IMA it is critical to maintain robust separation between the various hosted functions. Mechanisms for error detection, fault containment, and mitigation are to be analyzed in detail.
- Identify various probable errors that can occur in software or hardware or in combination.
- Study the fault identification and annunciation mechanisms built into the platform. Analyze if there is a possibility of un-annunciated failure occurring.
- Provide analysis showing that, on detection of a fault in a partition, the erroneous partition can be isolated, halted, and aborted, preventing it from affecting the healthy partitions.
- Study whether provisions are available to recover the faulted partition and induct it back into the execution schedule.
IV. Device Configurability
In practice, many configurable features of the platform (such as individual cores, register settings, pins, and debug functions) are deactivated.
- Check the provisions to create, store, and load the device configurations.
- Establish the working configuration of the device used.
- Provide analysis of the safety mechanisms employed.
Make sure that the device functions as configured and that inadvertent activation of disabled functions is either prevented or will not lead to any unintended effects on operation.
V. Cache Memory Handling
Cache memory management is one of the key aspects to be studied.
- Identify the cache organization on a given platform.
- Identify the cache management policy employed, like cache partitioning or cache flushing.
- Derive test case scenarios that can lead to worst case paths leading to maximum contention delays including concurrency and coherence setup.
- Identify the impact of concurrent access to the shared cache.
- Establish the ratio of ACET to WCET.
- A WCET within a factor of two of the ACET can be deemed acceptable.
Experimental results suggest that cache partitioning can significantly reduce the gap between the average-case and worst-case execution times, thereby helping keep application schedules tight yet safe for safety-critical applications.
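One widely used software realization of cache partitioning is page coloring, sketched below. The cache geometry (a shared 8-way 512 KB L2 with 4 KB pages) is an illustrative assumption: the point is that by restricting each partition's physical pages to a disjoint set of "colors", partitions map to disjoint cache sets and cannot evict each other's lines.

```python
# Sketch of software cache partitioning by page coloring: the cache-set bits
# of a physical page number determine its "color", and each partition is only
# allocated pages of its assigned colors, so partitions occupy disjoint cache
# sets. Cache geometry below is an illustrative assumption.

PAGE_SIZE = 4096
CACHE_SIZE = 512 * 1024       # assumed shared L2
WAYS = 8
NUM_COLORS = CACHE_SIZE // (WAYS * PAGE_SIZE)   # 16 colors here

def color_of(phys_addr):
    return (phys_addr // PAGE_SIZE) % NUM_COLORS

# Give each partition a disjoint slice of the colors.
COLOR_MAP = {"P1_CMD": set(range(0, 8)), "P2_MON": set(range(8, 16))}

def pages_for(partition, pages):
    """Filter a free-page list down to the partition's colors."""
    return [p for p in pages if color_of(p) in COLOR_MAP[partition]]

free_pages = [n * PAGE_SIZE for n in range(64)]
p1 = pages_for("P1_CMD", free_pages)
p2 = pages_for("P2_MON", free_pages)
assert not set(p1) & set(p2)    # disjoint -> no cross-partition eviction
```

The trade-off is capacity: each partition sees only its share of the cache, which raises its ACET somewhat while tightening its WCET, shrinking the gap between the two.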
VI. Interconnect Architecture
COTS hardware has to be deeply analyzed to understand all the interference channels, including the interconnect or coherency fabric. The behavior of the interconnect needs to be analyzed, as the interconnect manages the transactions from the cores and shared resources; its behavior depends on the interconnect architecture, arbitration policy, and network topology. With limited availability of data from the hardware designers, evidence of determinism in transactions happening in parallel needs to be established with sufficient experimental data and analysis.
VII. Inter-Partition and Core Communication
An inter-process communication (IPC) mechanism is employed in ARINC 653-compliant real-time operating systems to ensure deterministic communication between partitions.
- Establish partitions on various cores.
- Study the communication mechanisms available for partitions running on different cores.
- Characterize the delays caused by interference from partitions running on the other cores.
- Evaluate any missed IPC deadlines or periods.
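One concrete way to make the last two checks observable is the freshness test a reader performs on an ARINC 653-style sampling port, sketched below. The refresh period and timestamps are illustrative assumptions; the point is that interference-induced delays surface as stale reads rather than passing silently.

```python
# Sketch of characterizing inter-partition communication on an ARINC 653-
# style sampling port: the reader checks message age against a configured
# refresh period, so interference-induced delays are detected rather than
# silent. Port parameters and timestamps are illustrative assumptions.

REFRESH_PERIOD_MS = 25    # configured maximum acceptable message age

def read_sampling_port(message, written_at_ms, now_ms):
    """Return (message, valid), mirroring how an ARINC 653 sampling read
    reports validity; a stale flag indicates a missed producer window."""
    age = now_ms - written_at_ms
    return message, age <= REFRESH_PERIOD_MS

msg, fresh = read_sampling_port({"cmd": 1}, written_at_ms=100, now_ms=110)
assert fresh                      # age 10 ms: within the refresh period
msg, fresh = read_sampling_port({"cmd": 1}, written_at_ms=100, now_ms=140)
assert not fresh                  # age 40 ms: producer window was missed
```

Logging the observed ages across a long run, with interfering partitions active on the other cores, directly yields the delay characterization and missed-period counts called for above.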
VIII. Interrupt Handling
Interrupts are unsynchronized tasks that originate randomly, introducing nondeterminism. The problem is amplified in multicore processors because interrupts can originate from multiple cores simultaneously. Methods such as disabling interrupts or making interrupts periodic are used to handle them. Evaluate the interrupt handling mechanism:
- Study mechanisms available to partition hardware resources available to each operating system.
- Study the impact of two operating systems trying to access the programmable interrupt controller (PIC) simultaneously and the effect on task execution.
- Characterize the interrupt latency with multiple processes running on different cores simultaneously.
- Worst-case interrupt latency should be defined per core, service, and driver.
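The "make interrupts periodic" technique mentioned above trades latency for determinism, and the resulting worst-case bound is easy to state. The polling period and handler time below are illustrative numbers, not from any particular platform.

```python
# Sketch of the "make interrupts periodic" technique: instead of reacting to
# a randomly timed device interrupt, a periodic task polls a status flag, so
# the worst-case response is one polling period plus the handler time,
# independent of when the event fires. Numbers are illustrative.

POLL_PERIOD_MS = 5
HANDLER_MS = 1

def worst_case_response_ms():
    """Event may land just after a poll: full period + handler time."""
    return POLL_PERIOD_MS + HANDLER_MS

def response_time_ms(event_t, poll_offsets):
    """Time from event to completed handling under a given poll schedule."""
    next_poll = min(t for t in poll_offsets if t >= event_t)
    return (next_poll - event_t) + HANDLER_MS

polls = [0, 5, 10, 15, 20]
assert response_time_ms(6, polls) == 5      # event at 6, served at poll 10
assert max(response_time_ms(t, polls) for t in range(0, 15)) \
       <= worst_case_response_ms()
```

The bound holds no matter how many cores are active, which is exactly why polling is attractive when simultaneous interrupts from multiple cores would otherwise make the latency analysis intractable.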
IX. Development and Analysis Tools
For the development and integration of time- and space-partitioned systems, and to ease the certification effort, the availability of tools is very important. Evaluating a platform based on the toolset it provides or supports will be one of the key criteria. The following is a list of desirable tools and features:
- Schedule Generation tools: Availability and effectiveness of the tools to define scheduling components like partitions, processes, threads, system objects, and associated parameters such as partition cycle, WCET, memory size, etc. Generate platform registries that define the scheduling, resource partitioning, access permissions, inter-partition, and intra-partition communication schedules, etc.
- Schedule Analysis tools: It is required to provide evidence that the generated schedule is actually schedulable in real time. Static analysis tools that evaluate the schedulability of the system need to be evaluated.
- Debug tools: Real time debugging tools for development and verification will help simplify the system development.
- Performance and Status monitoring tools: The tools monitor the status of the creation of processes, threads and other system objects; execution times with respect to the allocated budgets; and utilization of memory and IO resources. Exceptions occurring in system and user domains need to be evaluated.
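As a taste of what the schedule-analysis tools above automate, the classic rate-monotonic utilization bound gives a sufficient single-core schedulability test. The task set is an invented example, and this simple bound deliberately ignores the multicore interference effects that real analysis tools for these platforms must also model.

```python
# Sketch of a static schedulability check: the classic Liu & Layland
# utilization bound for rate-monotonic scheduling on one core. The task set
# is an illustrative assumption; real tools for multicore platforms also
# model cross-core interference, which this simple bound does not.

def rm_utilization_bound(n):
    """Sufficient (not necessary) bound: U <= n * (2^(1/n) - 1)."""
    return n * (2 ** (1 / n) - 1)

def rm_schedulable(tasks):
    """tasks: list of (wcet, period) pairs on a single core."""
    u = sum(c / t for c, t in tasks)
    return u <= rm_utilization_bound(len(tasks))

tasks = [(10, 50), (15, 100), (5, 200)]   # (WCET ms, period ms)
assert rm_schedulable(tasks)              # U = 0.375, under bound ~0.78
```

Because the test consumes WCET values directly, any pessimism in the multicore WCET estimates discussed earlier propagates straight into rejected schedules, which is why the tooling and the WCET analysis have to be evaluated together.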
X. Manufacturer Public and Private Data Availability
To evaluate the platform, accurate data is required to model its behavior, such as the interconnect bus, the cache, and the impact of shared resources on performance. The availability of this data needs to be demonstrated to the certification authorities.
To summarize, the application of multicore processors (MCPs) in avionics and other safety-critical systems is inevitable because of their performance advantages and the future obsolescence of single-core processors. However, they present significant challenges for the developers of certifiable, safety-critical applications. If one wishes to use an MCP in such applications, bounding and controlling interference patterns on shared resources, and effectively managing CPU utilization, are essential. Without the first capability, certification of safety-critical software is impossible. Without the second, much of an MCP's increased computing power is wasted.
Because of the limited in-service experience with multicore platforms in the safety-critical industry, there is not enough evidence available to readily adopt the technology. It is very beneficial to evaluate commercial platforms that can be certified to DO-178B Level A / DO-254 Level A against the criteria identified here, to gain firsthand information on performance, safety features, the availability of tools to ease the development effort, and the availability of the data required for certification evidence.
This article is based on SAE Technical Paper 2015-01-2524 by Srikanth Gampa of UTC Aerospace Systems (Charlotte, NC). http://saemobilus.sae.org.
- RTCA, Inc, “RTCA/DO-254, design assurance guidelines for airborne electronic hardware,” 2000.
- RTCA, Inc, “RTCA/DO-178C, software considerations in airborne systems and equipment certification,” 2012.
- RTCA, “DO-297: Integrated Modular Avionics (IMA) Development, Guidance and Certification Considerations” 2005.
- Society of Automotive Engineers (SAE), “ARP 4754: Certification Considerations for Highly-Integrated or Complex Aircraft Systems”, 1996.
- Aeronautical Radio Inc (ARINC), “ARINC 653: Avionics Application Software Standard Interface Part 1 - Required Services,” 2010.
- Aeronautical Radio Inc (ARINC), “ARINC 651: Design guidance for Integrated Modular Avionics”, 1997.
- Jan Nowotsch, Michael Paulitsch, “Leveraging Multi-Core Computing Architectures in Avionics,” EADS Innovation Works.
- Xavier Jean, David Faura, Marc Gatti, Laurent Pautet, Thomas Robert, "Ensuring Robust Partitioning in Multicore Platforms for IMA Systems," 31st Digital Avionics Systems Conference, October 2012.
- Petar Radojkovi’, Sylvain Girbal, Arnaud Grasset, Eduardo Quiñones, Sami Yehia, Francisco J. Cazorla, “On the Evaluation of the Impact of Shared Resources in Multithreaded COTS Processors in Time-Critical Environments,” ACM Transactions on Architecture and Code Optimization, Vol. 8, No. 4, Article 34, Publication date: January 2012.
- João Craveiro, José Rufino, Frank Singhoff, "Architecture, Mechanisms and Scheduling Analysis Tool for Multicore Time- and Space-Partitioned Systems."
- Nan Guan, Martin Stigge, Wang Yi, Ge Yu, "Cache-Aware Scheduling and Analysis for Multicores," EMSOFT '09, October 12-16, 2009, Grenoble, France.
- Bach D. Bui, Marco Caccamo, Lui Sha, J. Martinez, "Impact of Cache Partitioning on Multi-Tasking Real Time Embedded Systems," Embedded and Real-Time Computing Systems and Applications (RTCSA '08), 14th IEEE International Conference, 2008.
- Reddy Rakesh, Petrov Peter, “Eliminating Inter-Process Cache Interference through Cache Reconfigurability for Real-Time and Low-Power Embedded Multi-Tasking Systems,” CASES’07, September 30-October 3, 2007, Salzburg, Austria.
- Chattopadhyay Sudipta, Chong Lee Kee, “A Unified WCET Analysis Framework for Multi-core Platforms,” ACM Transactions on Embedded Computing Systems.
- Marco Paolieri, Eduardo Quiñones “Hardware Support for WCET Analysis of Hard Real-Time Multicore Systems,” ISCA’09, June 20-24, 2009, Austin, Texas, USA.