This thesis was submitted in August and accepted on 28th November 2003.
The communications bus plays a fundamental role in distributed real-time control systems. Such systems are increasingly used for critical functions in avionics, automotive and factory control applications, placing greater dependability and real-time constraints on the bus. Environmental influences such as electromagnetic interference are hard to avoid, so a flexible bus may be able to provide active fault tolerance. However, the effects of this flexibility on reliability and timeliness are difficult to predict.
This thesis contends that guaranteeing to meet all communication deadlines is not only impractical but often impossible, owing to the unpredictability of environmental interference, no matter which type of electrical bus is used. However, many applications can operate safely even if a small number of communication deadlines are missed. For such applications, an analysable and reliable system can be achieved through the use of flexible fault tolerance.
Using CAN (a widely used bus protocol) as a basis, this thesis first shows how weakly-hard analysis (which considers timing behaviour over a number of invocations) can be applied to flexible bus scheduling. This allows consideration of more than just the worst-case scenario, leading to analysable and predictable behaviour under severe environmental conditions. A second form of analysis, based on a probabilistic fault model, is used to provide accurate probabilities of failure, making it possible to explore system behaviour analytically for fault scenarios which exceed normal behaviour. Finally, a simple extension to the CAN protocol, TCAN (Timely-CAN), is proposed which enforces timely recovery from faults by using CAN message retransmission only where it is useful to do so, without imposing further delays on the bus. Hence the flexibility of CAN is exploited to provide fault tolerance, and both timeliness and predictability are achieved.
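To illustrate what a weakly-hard requirement looks like in practice, the following minimal sketch checks a record of message invocations against a constraint of the form "at most m missed deadlines in any k consecutive invocations". It is not the analysis developed in the thesis: the function name `meets_weakly_hard`, the parameters `m` and `k`, and the sample history are assumptions made purely for illustration.

```python
from typing import Sequence


def meets_weakly_hard(missed: Sequence[bool], m: int, k: int) -> bool:
    """Check an illustrative weakly-hard constraint on a deadline record.

    missed[i] is True if invocation i missed its deadline.
    Returns True if every window of up to k consecutive invocations
    contains at most m misses. (Hypothetical helper, not from the thesis.)
    """
    if k <= 0 or m < 0:
        raise ValueError("k must be positive and m non-negative")

    window_misses = 0
    for i, miss in enumerate(missed):
        # Add the newest invocation to the sliding window.
        window_misses += int(miss)
        # Drop the invocation that has just left the window of size k.
        if i >= k:
            window_misses -= int(missed[i - k])
        if window_misses > m:
            return False
    return True


# Assumed sample record: two misses, three invocations apart.
history = [False, True, False, False, True, False, False, False]
print(meets_weakly_hard(history, m=1, k=3))
print(meets_weakly_hard(history, m=1, k=4))
```

With this sample record, the two misses never fall within the same window of three invocations but do fall within one window of four, so the tighter window length passes and the longer one fails.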