Robustness
Robustness = f(Null pointers, Memory leaks, CPU load, Exceptions, Coding errors):
The exact transfer function could have been determined by running a series of designed experiments (DOE). However, this would have required considerable effort and would have turned out to be largely an academic exercise.
The purpose of finding the transfer function is to identify which of the inputs (Xs) correlate most strongly with the output Y, here robustness, so that optimizing those inputs yields the maximum benefit. For robustness, however, each of these Xs is equally important and all of them need to be optimized. We therefore decided to act on each of these input parameters, each in a different way.
Some of these actions are as follows:
A. A small tool was developed to find any null pointers in the code stack. Most of the null pointers it found were then eliminated.
B. Stringent memory-allocation limits were set for each subsystem. These limits were tracked at every release to ensure that all subsystems stayed within budget and that there was no overlap of memory space. (Subsystems come from different project teams and external suppliers.)
C. Programming experience has shown that a CPU load above 65% makes the embedded system unstable and unpredictable. Hence, different combinations of concurrent scenarios (stressed conditions) were chosen, and the CPU load was tracked for every release using a proprietary tool, as shown in Figure 15. Any increase in the load led to code optimization.
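The per-release memory check in item B could be automated along the following lines. This is a minimal sketch, assuming each subsystem's reserved region is described by a start address, a size that doubles as its budget, and a measured usage figure; the `Region` type, the `check_memory_budgets` function, and the subsystem names and numbers are hypothetical, not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Memory region reserved for one subsystem (hypothetical layout data)."""
    subsystem: str
    start: int   # first address of the reserved region
    budget: int  # size of the region in bytes, i.e. the subsystem's budget
    used: int    # bytes actually allocated by the subsystem in this release

    @property
    def end(self) -> int:
        # One past the last address of the reserved region.
        return self.start + self.budget


def check_memory_budgets(regions: list[Region]) -> list[str]:
    """Report subsystems that exceed their budget and regions that overlap."""
    problems = []
    for r in regions:
        if r.used > r.budget:
            problems.append(f"{r.subsystem}: uses {r.used} B, budget {r.budget} B")
    # Compare every pair once; two half-open ranges overlap iff each one
    # starts before the other ends.
    for i, a in enumerate(regions):
        for b in regions[i + 1:]:
            if a.start < b.end and b.start < a.end:
                problems.append(f"{a.subsystem} overlaps {b.subsystem}")
    return problems


# Example release data (invented numbers for illustration).
release = [
    Region("audio", start=0x0000, budget=0x4000, used=0x3800),
    Region("usb", start=0x4000, budget=0x2000, used=0x2100),
    Region("ui", start=0x5000, budget=0x2000, used=0x1000),
]
for problem in check_memory_budgets(release):
    print(problem)
```

The all-pairs comparison is quadratic, which is harmless for the handful of subsystems in a product like this and reports every overlapping pair rather than just the first.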
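The release-to-release CPU-load tracking in item C can be sketched in the same spirit. Only the 65% limit comes from the text; everything else is an assumption: the load is derived from idle-task ticks over a measurement window, and the scenario names, figures, and the functions `cpu_load_percent` and `review_release` are hypothetical. The proprietary tool itself is not reproduced.

```python
CPU_LOAD_LIMIT = 65.0  # percent; the stability threshold quoted in the text


def cpu_load_percent(idle_ticks: int, total_ticks: int) -> float:
    """CPU load over one measurement window, derived from idle-task ticks (assumed method)."""
    if total_ticks <= 0:
        raise ValueError("total_ticks must be positive")
    return 100.0 * (1.0 - idle_ticks / total_ticks)


def review_release(current: dict[str, float], previous: dict[str, float]) -> list[str]:
    """Flag stress scenarios that exceed the limit or got heavier since the last release."""
    findings = []
    for scenario, load in sorted(current.items()):
        if load > CPU_LOAD_LIMIT:
            findings.append(f"{scenario}: {load:.1f}% exceeds {CPU_LOAD_LIMIT:.0f}% limit")
        prev = previous.get(scenario)
        if prev is not None and load > prev:
            # Any increase is flagged as a candidate for code optimization.
            findings.append(f"{scenario}: load rose from {prev:.1f}% to {load:.1f}%")
    return findings


# Hypothetical concurrent stress scenarios and measurements.
previous = {"record+usb_copy": 58.0, "playback+ui": 40.0}
current = {
    "record+usb_copy": cpu_load_percent(idle_ticks=330, total_ticks=1000),
    "playback+ui": cpu_load_percent(idle_ticks=620, total_ticks=1000),
}
for finding in review_release(current, previous):
    print(finding)
```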
An FMEA (failure mode and effects analysis) was performed for new features to identify failure modes leading to exceptional conditions. Graceful exits and error-recovery mechanisms were then implemented.
For example, when a nonstandard USB device is connected to the recorder, the software exits with an error message instead of remaining in a continuous loop.
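The USB example above amounts to replacing an unbounded retry loop with a bounded one that reports an error. A minimal sketch follows, written in Python for readability rather than as recorder firmware; the driver hook `try_enumerate`, the retry budget `MAX_ENUM_ATTEMPTS`, and the `UsbStatus` values are hypothetical.

```python
from enum import Enum
from typing import Callable


class UsbStatus(Enum):
    OK = "device ready"
    UNSUPPORTED_DEVICE = "unsupported USB device"


MAX_ENUM_ATTEMPTS = 3  # hypothetical retry budget


def attach_usb_device(try_enumerate: Callable[[], bool],
                      max_attempts: int = MAX_ENUM_ATTEMPTS) -> UsbStatus:
    """Try to bring up a newly connected USB device a bounded number of times.

    `try_enumerate` is a hypothetical driver hook that returns True on success.
    Rather than retrying forever (the continuous loop described in the text),
    give up after `max_attempts` and return an error status the UI can report.
    """
    for _ in range(max_attempts):
        if try_enumerate():
            return UsbStatus.OK
    return UsbStatus.UNSUPPORTED_DEVICE


# A nonstandard device that never enumerates: the call returns instead of hanging.
status = attach_usb_device(lambda: False)
if status is not UsbStatus.OK:
    print(f"Error: {status.value}")
```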
Static analysis tools such as QAC were run, with a target of 85% code coverage, and the reported errors and warnings were closed. For each code module, the QAC coverage was measured and improved before "build generation".
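A pre-build gate enforcing the 85% QAC target per module could look like the sketch below. It assumes the per-module coverage figures and open error/warning counts have already been extracted from the QAC reports (that parsing step is not shown, since it depends on the report format); the function `qac_build_gate` and the module names are hypothetical.

```python
QAC_TARGET = 85.0  # percent; the per-module target quoted in the text


def qac_build_gate(coverage: dict[str, float], open_issues: dict[str, int]) -> list[str]:
    """Return reasons that block build generation; an empty list means the gate passes."""
    blockers = []
    for module in sorted(coverage):
        if coverage[module] < QAC_TARGET:
            blockers.append(f"{module}: QAC coverage {coverage[module]:.1f}% below {QAC_TARGET:.0f}%")
        if open_issues.get(module, 0) > 0:
            blockers.append(f"{module}: open errors/warnings = {open_issues[module]}")
    return blockers


# Hypothetical figures, assumed to be extracted from the QAC reports beforehand.
coverage = {"player": 91.0, "recorder": 82.5, "usb_host": 88.0}
open_issues = {"player": 0, "recorder": 2, "usb_host": 1}

blockers = qac_build_gate(coverage, open_issues)
for reason in blockers:
    print(reason)
print("build generation allowed" if not blockers else "build generation blocked")
```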
The robustness of the product as a whole was then verified as follows:
A. An operational profile of typical user scenarios was created, and the software was run on the product with different profiles continuously for 4 days at elevated temperature (duration test). The results were verified every alternate week.
B. Finally, the overall CTQ of robustness, namely hangs and crashes, was measured on weekly builds to verify the results under both normal and stressed conditions.
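The duration test and the hang/crash CTQ can be tied together in a small test driver: scenarios are drawn from the operational profile in proportion to their expected frequency of use, and the outcome of every run is tallied. This is a sketch under assumptions: the profile weights, the scenario names, and the `run_scenario` hook (which would drive the actual product and report "ok", "hang", or "crash") are hypothetical, and seeding the generator is a design choice so that a failing sequence can be replayed.

```python
import random
from collections import Counter
from typing import Callable

# Hypothetical operational profile: scenario -> relative frequency in real use.
PROFILE = {"record": 0.50, "playback": 0.30, "usb_copy": 0.15, "settings_menu": 0.05}


def schedule(profile: dict[str, float], steps: int, seed: int = 0) -> list[str]:
    """Draw a reproducible scenario sequence whose frequencies follow the profile."""
    rng = random.Random(seed)  # fixed seed, so a failing sequence can be replayed
    names = list(profile)
    return rng.choices(names, weights=[profile[n] for n in names], k=steps)


def run_duration_test(profile: dict[str, float],
                      run_scenario: Callable[[str], str],
                      steps: int, seed: int = 0) -> Counter:
    """Run scenarios drawn from the profile and tally outcomes ("ok", "hang", "crash")."""
    outcomes = Counter()
    for scenario in schedule(profile, steps, seed):
        outcomes[run_scenario(scenario)] += 1
    return outcomes


# Stand-in for driving the real product: pretend usb_copy always crashes.
result = run_duration_test(PROFILE, lambda s: "crash" if s == "usb_copy" else "ok", steps=200)
print(f"hangs={result['hang']} crashes={result['crash']}")
```

In an actual duration test the loop would presumably run until the 4-day deadline rather than for a fixed number of steps, with the hang and crash counts recorded per weekly build.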