The Quest for Bugs:
(Avoiding security bug strikes with hardware security verification)
10 minute read
Bryan Dickman, Valytic Consulting Ltd.,
Joe Convey, Acuerdo Ltd.
New Breeds of Security Threats in Hardware
In previous Quest for Bugs articles and whitepapers we characterised various categories of hardware bug that IP core and ASIC developers need to be cognisant of when considering their overall verification strategy. There is a further category of bugs that is better described as security vulnerabilities. We first introduced this categorisation in The Origin of Bugs.
With security being at the forefront of concerns for modern information technology, and it being architected and built into most systems today, developers always need to be concerned about leaving unintended vulnerabilities in the design that could be exploited by malicious code. For example, leaking or corrupting secure data could be achieved by running non-secure software.
There are three classes of security vulnerability, referred to by the infosec community as the “CIA triad”: Confidentiality, Integrity, and Availability.
The infamous Spectre and Meltdown attacks are examples of confidentiality and integrity attacks as they are mechanisms where confidential assets can be leaked or corrupted. Denial-of-service or starvation attacks are availability attacks, where malicious code has the potential to put the device into a live-lock, dead-lock or run-slow state where the device is prevented from making forward progress, or responding to inputs.
Security vulnerabilities are often discovered accidentally through explorative soak testing using constrained-random generation methods. It’s often the case that regular software, such as the OS or applications, will never observe the vulnerability in normal operation. This is usually because it does not contain any attack code, or the vulnerable code sequences are not present. However, an inherent vulnerability presents the possibility that a malicious agent might exploit it with malicious intent, and because of this,
security vulnerabilities are critical bugs.
Finding security bugs using the usual verification environments can be challenging, and effort must be made to model threats and proactively verify against them. Security verification is a major consideration for most hardware and software developers.
The Spectre of Security
In 2017, the aforementioned and much publicised Spectre and Meltdown security vulnerabilities came to light. Spectre is so named because the vulnerability arises from processors that implement speculative execution and because the problem was considered hard to fix, so it would “haunt us for a long time”. Meltdown is so named because it effectively melts the security boundaries that are normally enforced by the hardware. These vulnerabilities had not previously been discovered in real systems but were exposed by researchers, including the Google Project Zero team, through deliberate efforts in security penetration testing or white-hat hacking. The vulnerabilities were shown to exist for multiple processor vendors, affecting application processors that were architected to exploit out-of-order processing and speculative execution optimisations.
There has been much media commentary on the seriousness of Spectre and Meltdown. Many commentators have remarked how these vulnerabilities form a whole new class of attack with far-reaching implications for the security of processor-based systems everywhere. Here are a few examples.
“Meltdown is probably one of the worst CPU bugs ever found”, said Daniel Gruss, one of the researchers at Graz University of Technology who discovered the flaw.
“Spectre is not really so much an individual attack, as a whole new class of attacks. The revelation of Spectre has shifted our understanding of modern chip security as it affects almost all chips produced in the last two decades.”
“These are vulnerabilities in computer hardware, not software. They affect virtually all high-end microprocessors produced in the last 20 years. Patching them requires large-scale coordination across the industry, and in some cases drastically affects the performance of the computers. And sometimes patching isn’t possible; the vulnerability will remain until the computer is discarded.”
“It's likely to be years before researchers have determined all the various ways in which the speculative execution hardware can be used to leak information this way, and it will be longer still before robust, universal defences are available to stop the attacks.”
Researchers in the industry have been actively searching for further related cases since Spectre and Meltdown became known, and at the time of writing there are 27 CVEs (Common Vulnerabilities and Exposures) related to speculative execution, affecting a number of different processor cores. Note that CVE is the cybersecurity industry catalogue of publicly disclosed cybersecurity vulnerabilities. It is hosted by mitre.org. Every published vulnerability is assigned a unique ID.
Security Back Doors
The key that unlocks the above category of security vulnerability is the side-channel attack mechanism. In computer security, a side-channel attack is any attack based on information gained from the implementation of a computer system, rather than weaknesses in the implemented algorithm itself (e.g., cryptanalysis and software bugs). Timing information, power consumption, electromagnetic leaks or even sound can provide an extra source of information, which can be exploited.
Specifically in the case of Spectre, Meltdown and other vulnerabilities related to speculative execution, the Cache Timing Side Channel Attack (CTSCA) is the principal security back door of concern. In a nutshell, speculative executions that are cancelled (sometimes called ‘squashed’) can leave traces of secure assets residing in the processor’s caches, branch target buffers, and other micro-architectural structures which are shared between contexts. At the architecture level, contexts are normally isolated from one another, e.g., code executing at a lower privilege level cannot access assets from a higher privilege level, so assets cannot be leaked between contexts under normal operating conditions. A CTSCA can nevertheless leak the assets via a process of inference, based on the different and measurable timing properties of elements that reside in the cache versus those that do not.
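To make the inference step concrete, here is a minimal sketch of a flush-and-reload style probe against a purely simulated cache. It is a toy model, not an exploit: the class ToyCache, the latency figures, the line size and the probe address are all assumptions we have made up for illustration, and no real speculation or timing is involved.

```python
# Toy model of a cache timing side channel (flush-and-reload style).
# Everything is simulated; latencies and addresses are assumed values.

HIT_CYCLES = 4      # assumed latency when the line is already cached
MISS_CYCLES = 200   # assumed latency when the line must be fetched
LINE_SIZE = 64      # assumed cache line size in bytes


class ToyCache:
    """A shared cache modelled as a set of resident line numbers."""

    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def access(self, address):
        """Touch an address and return the simulated latency in cycles."""
        line = address // LINE_SIZE
        if line in self.lines:
            return HIT_CYCLES
        self.lines.add(line)
        return MISS_CYCLES


def victim_squashed_speculation(cache, secret_byte, probe_base):
    """The speculative load is cancelled, so no architectural result
    survives, but the secret-dependent cache fill is left behind."""
    cache.access(probe_base + secret_byte * LINE_SIZE)


def attacker_recover_byte(cache, probe_base):
    """Time one access per candidate value; the fast one reveals the secret."""
    timings = [cache.access(probe_base + value * LINE_SIZE)
               for value in range(256)]
    return min(range(256), key=lambda value: timings[value])


def run_attack(secret_byte):
    cache = ToyCache()
    probe_base = 0x10000
    cache.flush()                                       # attacker empties the cache
    victim_squashed_speculation(cache, secret_byte, probe_base)
    return attacker_recover_byte(cache, probe_base)     # inference from timing only
```

The attacker never reads the secret directly. It only observes which one of 256 probe lines responds quickly, which is why architectural isolation alone does not close this back door.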
It’s an elaborate attack, but the point is that it is achievable and practical. Once it is known about, someone is likely to exploit it.
So, the security verification engineer needs to think differently about security verification and be aware of obscure attack mechanisms that can leak micro-architectural state even when the architecture prevents access. The challenge is that these types of security vulnerability arise from the combination of hardware and software – they cannot be detected by only looking at the hardware.
A Hardware Security Development Lifecycle
We’ve talked about a very specific class of security vulnerability that affects some classes of processors, but there are many more that the system developer needs to be aware of at the system level e.g., cryptographic keys that are stored in secure memory, debug mechanisms that can bypass security privileges, and hardware root-of-trust subsystems that ensure separation integrity from the rest of the system.
A methodical approach is required.
This approach must encompass all stages of the hardware development lifecycle from security requirements capture to security verification and security sign-off. In many ways the rigour and scrutiny required is similar to that which is required for functional safety (FuSa), and of course security has direct implications for functional safety also, especially in the modern automotive sector where this is a hot topic of discussion.
To start with, you might want to train or hire some cybersecurity domain experts into your team. System architects, hardware designers, software developers and hardware verification engineers all need to skill up on cybersecurity. They need to be aware of security threats and attack mechanisms and they need to understand how to implement secure systems. Hardware development lifecycles and workflows need to be augmented with security reviews and checklists. These security SMEs (subject matter experts) need to ensure that all security requirements are captured unambiguously, security mechanisms are specified and implemented, security verification test plans are complete and reviewed, and security verification is measured and signed-off.
An engineering team or organisation needs to accumulate and curate security knowledge and learning, for the benefit of all product developments with security requirements. A good starting point for this is the CWE (Common Weakness Enumeration) database, also hosted by mitre.org, which catalogues common weaknesses around concepts that are frequently used or encountered in both software and hardware design. As an example, and referring back to our speculative execution examples, CWE-1303 states that “Hardware structures shared across execution contexts (e.g., caches and branch predictors) can violate the expected architecture isolation between contexts.”
Start with Requirements, always!
These days we are familiar with the practice of requirements capture and requirements tracking, especially if already working in the domain of functional safety. Established standards, such as ISO 26262 for automotive, state clearly what is required in terms of safety requirements. Standards are similarly emerging for security. ISO/SAE 21434 for road vehicles is still under development at the time of writing.
There is also a move on the part of IP vendors to encourage security-first thinking around design use case, system architecture and implementation, particularly in the IoT space. For example, Arm’s PSA (Platform Security Architecture) offers this methodological approach and also encourages design certification to drive up security standards in the marketplace. Regardless of these methodologies and standards,
the need for best practice in security requirements capture is clear.
Security requirements should be captured using natural language, so that they are understandable and easy to scrutinise. Ideally they should also be unambiguously expressed using an appropriate specification language as either properties, assertions or rules, to facilitate mechanised and automated security verification.
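As a minimal sketch of what this pairing could look like, the example below keeps each requirement's natural-language text next to a machine-checkable property evaluated over a trace of bus accesses. The record types, the requirement IDs, the key store address range and the trace format are all hypothetical, chosen only for illustration. In a real flow the properties would more likely be written in an assertion or property language and checked by simulation or formal tools.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class BusAccess:
    """One observed bus transaction (a hypothetical trace record)."""
    master_secure: bool   # True if the initiator is in the secure world
    address: int
    is_write: bool


@dataclass(frozen=True)
class SecurityRequirement:
    req_id: str
    text: str                             # natural-language statement
    holds: Callable[[BusAccess], bool]    # machine-checkable property


# Assumed address range of a secure key store, for illustration only.
KEY_STORE = range(0x4000_0000, 0x4000_1000)

REQUIREMENTS = [
    SecurityRequirement(
        "SEC-001",
        "Non-secure masters shall never access the key store.",
        lambda a: a.master_secure or a.address not in KEY_STORE,
    ),
    SecurityRequirement(
        "SEC-002",
        "The key store shall be read-only after provisioning.",
        lambda a: not (a.is_write and a.address in KEY_STORE),
    ),
]


def check_trace(trace: List[BusAccess]) -> List[str]:
    """Return the IDs of requirements violated anywhere in the trace."""
    return [r.req_id for r in REQUIREMENTS
            if not all(r.holds(a) for a in trace)]
```

Keeping the text and the property in one record means a reviewer can scrutinise the wording while a tool checks the behaviour, and each violation is reported against a requirement ID rather than against a testbench detail.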
The process is much the same as for any other class of requirements capture, i.e., hierarchical, starting with the product’s high-level security objectives and decomposing down to specific functional security requirements for which specific security mitigations will be specified and implemented. At every stage of this, you need to engage your security SMEs to ensure that this essentially iterative process is rigorous and scrutinised.
This process of reasoning about security requirements is often referred to as “threat modelling”. Threat modelling is a process of identifying security requirements, reasoning about potential threats and vulnerabilities, quantifying by risk and severity, and prioritising mitigations or countermeasures.
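The quantifying and prioritising steps can be sketched very simply. The example below ranks threats by a likelihood times severity product. The 1 to 5 scales, the scoring formula and the threat entries are all assumptions for illustration; the scores are placeholders, not assessments, and a real threat model may weight things quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    cia_class: str      # "confidentiality", "integrity" or "availability"
    likelihood: int     # 1 (rare) to 5 (expected), an assumed scale
    severity: int       # 1 (minor) to 5 (critical), an assumed scale

    @property
    def risk(self) -> int:
        # Simple likelihood x severity product; other schemes are possible.
        return self.likelihood * self.severity


def prioritise(threats):
    """Order threats from highest to lowest risk score."""
    return sorted(threats, key=lambda t: t.risk, reverse=True)


# Hypothetical entries; the scores are placeholders, not assessments.
THREATS = [
    Threat("Bus master starves the interconnect", "availability", 4, 2),
    Threat("Debug port bypasses privilege checks", "confidentiality", 3, 5),
    Threat("Cache timing side channel leaks key bits", "confidentiality", 2, 5),
]

for threat in prioritise(THREATS):
    print(f"{threat.risk:>2}  {threat.cia_class:<15}  {threat.name}")
```

The value of even a crude ranking like this is that it forces the team to record a likelihood and a severity for every threat, which gives the security SMEs something concrete to challenge at review.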
As we said earlier, security verification needs some different approaches. Security requirements can lead to hardware security functionality that must be functionally verified. This functionality can be verified using traditional verification tools and methods, be it dynamically using stimulus, coverage and checking, or statically with properties or analysis. However, further strategies are needed for security exploration. How do you track the passage of security assets through the system? If security requirements can be expressed precisely, can you detect events where security integrity is violated? Tainting of secure data assets can be used to track the flow of assets through the design and automate the checking that assets are not leaked directly, or that traces of assets left behind in micro-architectural hardware structures cannot be leaked by some cunning side-channel attack mechanism.
This is quite different from regular functional verification.
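The tainting idea can be illustrated with a small sketch. The model below marks named storage elements as tainted when they receive data derived from a secure asset, and records a leak whenever taint reaches an element that a non-secure agent can observe. The TaintTracker class and the element names are hypothetical. Commercial information-flow tools work on the real design and are far more precise than this.

```python
class TaintTracker:
    """Tracks which named storage elements hold data derived from a secure asset."""

    def __init__(self, secure_sources, observable_sinks):
        self.tainted = set(secure_sources)    # elements holding secure data
        self.sinks = set(observable_sinks)    # elements a non-secure agent can read
        self.leaks = []

    def move(self, dest, *sources):
        """Model 'dest := f(sources)': dest is tainted if any source is."""
        if any(src in self.tainted for src in sources):
            self.tainted.add(dest)
            if dest in self.sinks:
                self.leaks.append((dest, sources))
        else:
            self.tainted.discard(dest)        # overwritten with clean data


# Hypothetical element names, for illustration only.
tracker = TaintTracker(
    secure_sources={"key_reg"},
    observable_sinks={"ns_read_data", "shared_cache_tag"},
)
tracker.move("alu_tmp", "key_reg", "plain_reg")   # secure data enters the datapath
tracker.move("shared_cache_tag", "alu_tmp")       # trace left in a shared structure
tracker.move("ns_read_data", "plain_reg")         # clean data: no leak

for sink, sources in tracker.leaks:
    print(f"taint reached {sink} via {', '.join(sources)}")
```

Note that the flagged leak is indirect. The key never appears on the non-secure read path, but a trace of it reaches a shared structure, which is exactly the kind of residue a side-channel attack relies on.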
You are still concerned about the verification of specified security behaviours, but you are also concerned with how security behaves under irregular or unexpected operating conditions. This is sometimes referred to as negative testing or fuzz testing. The aim is to check that bad stimulus is handled correctly and that, under these conditions, secure assets cannot be leaked or corrupted, nor the availability of the system impaired. Regular programmatic stimulus might not achieve this level of coverage. You may need to think about different strategies to inject security stress or disruption into existing verification payloads. You may need to develop new test sequences that act like attacks, such as the cache timing side channel attack. You may need to engage professional security penetration testers who will apply their deep knowledge of cybersecurity to explore more exotic attack scenarios and understand which attacks are practical and achievable.
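As a rough sketch of negative testing, the example below fuzzes a toy memory guard with random, often malformed, non-secure read requests and checks a single invariant: the secret must never appear in the returned data. The guard, the address map, the secret value and the deliberately planted bug (the buggy variant checks only the first word of a burst) are all our own inventions for illustration.

```python
import random

SECRET = 0xC0FFEE
SECURE_BASE, SECURE_SIZE = 0x1000, 0x100   # assumed secure region


def overlaps_secure(address, length):
    """True if [address, address + length) touches the secure region."""
    return address < SECURE_BASE + SECURE_SIZE and address + length > SECURE_BASE


def guarded_read(address, length, secure, buggy=False):
    """Toy memory guard: returns the words a request is allowed to see."""
    if length <= 0:
        return []                                    # malformed request: reject
    if not secure:
        # Planted bug: the buggy guard checks only the first word of the burst.
        checked_length = 1 if buggy else length
        if overlaps_secure(address, checked_length):
            return []                                # access denied
    return [SECRET if overlaps_secure(a, 1) else 0
            for a in range(address, address + length)]


def fuzz(buggy, iterations=2000, seed=1):
    """Fire random non-secure requests at the guard.

    Returns the first (address, length) that leaks the secret, or None.
    """
    rng = random.Random(seed)
    for _ in range(iterations):
        address = rng.randint(0x0F00, 0x1200)
        length = rng.randint(-4, 64)                 # includes invalid lengths
        if SECRET in guarded_read(address, length, secure=False, buggy=buggy):
            return address, length
    return None


print("correct guard:", fuzz(buggy=False))
print("buggy guard:  ", fuzz(buggy=True))
```

A directed test that only reads inside or outside the secure region would pass on both guards. It takes a burst that starts outside and runs into the region to expose the planted bug, which is the kind of irregular stimulus that random fuzzing tends to stumble upon.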
As we have often said in earlier Quest for Bugs articles,
you need a plan!
Test planning is the starting place to capture the high-level strategy for security verification and the low-level details of features, testcases and testbenches that need to be written. Also, as we have said previously, security test planning is an iterative process of capture, review, refine and repeat. In this instance you do this through the lens of cybersecurity, reviewing the vulnerabilities, attack threats and security mitigation features with the support of your security SMEs.
Finally, Security Sign-off
At the end of the day, you need to sign off your security verification against your security requirements. How do you achieve this? This is no different from the methodical approach taken for other aspects of functional verification. You start with requirements, develop design specs and test plans, execute verification and sign off the results against the original requirements and the test plans. When it comes to security verification, sign-off requires that you can demonstrate that all security requirements have been adequately verified, but also that a process of threat modelling has identified all conceivable threats and that there is evidence of analysis and verification to mitigate each threat. Whether you are developing an IP core that will be integrated into a larger ASIC system or developing a full ASIC, there is always a downstream integrator or user of the hardware that needs to be informed about the security integrity of the delivered product.
Some security risks will remain for the downstream integrator. Let’s call them residual risks.
These are the risks that hardware integrators or software developers need to be aware of and take appropriate mitigations for at the system level to avoid inadvertently introducing a system level security vulnerability. Security sign-off needs to also identify and document these residual risks.
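The sign-off criteria described above amount to a traceability check, which can be sketched as follows. The record layout, field names and example entries are hypothetical, and a real project would draw this data from its requirements management and regression systems rather than from hand-written lists.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    req_id: str
    tests_passed: list = field(default_factory=list)   # passing tests or proofs


@dataclass
class Threat:
    name: str
    mitigated_by: list = field(default_factory=list)   # requirement IDs
    residual_note: str = ""                            # guidance for the integrator


def signoff_report(requirements, threats):
    """Summarise what blocks sign-off and what must be passed downstream."""
    verified = {r.req_id for r in requirements if r.tests_passed}
    unverified = sorted(r.req_id for r in requirements if not r.tests_passed)
    residual, unaddressed = [], []
    for t in threats:
        if t.mitigated_by and set(t.mitigated_by) <= verified:
            continue                       # mitigated, with verification evidence
        if t.residual_note:
            residual.append(t.name)        # documented for the integrator
        else:
            unaddressed.append(t.name)     # neither mitigated nor documented
    return {
        "unverified_requirements": unverified,
        "unaddressed_threats": unaddressed,
        "residual_risks": residual,
        "ready": not unverified and not unaddressed,
    }


# Hypothetical project data, for illustration only.
report = signoff_report(
    [Requirement("SEC-001", ["fw_block_ns_read"]), Requirement("SEC-002")],
    [
        Threat("Non-secure key read", ["SEC-001"]),
        Threat("Key overwrite", ["SEC-002"]),
        Threat("Physical probing", [], "Integrator must add tamper protection"),
    ],
)
print(report)
```

In this sketch a documented residual risk does not block sign-off, but an unverified requirement or a threat with neither mitigation evidence nor a residual note does. That is one possible policy, not the only one.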
Security verification is another important dimension of the multi-dimensional verification problem and the everlasting quest to find bugs. Although all the normal rules of verification apply, and all of the aforementioned Dilemmas of Hardware Verification apply, it is an area that requires some special attention and some lateral thinking when it comes to threat modelling and security verification. You may be very confident that your design is functionally robust and performant, but is it secure? Check carefully. Security matters!