Every six months or so, I find myself having the same discussion with someone in the InfoSec community:
Someone: _____ is crazy, they want to run a sensitive security app on a virtual machine. We have to have bare metal for our security apps!
Me: Why's that?
Someone: Duh! It's theoretically possible to jump between guests. If someone has a guest adjacent to ours, they might get into our security app.
Me: so... I need to go to the CIO and start getting all of our critical client data onto bare metal, right?
Someone: well, no, I wouldn't go that far. We just shouldn't put our critical security apps on VMs, that's all.
This scene plays out all too regularly, and it always ends in a discussion of the number of non-theoretical attacks against virtualization, the size of the rest of the attack surface (OS, middleware, apps, etc.), and the type of adversary who would spend a 0-day VM exploit on our organization - inevitably leading to a more rational conversation about server sizing, support options, and the other non-religious factors that typically drive an operational deployment decision in a large enterprise.
All of this is what made the recent research by Dr. Ari Juels and a team of co-conspirators from academia so interesting to me. Finally, the headlines screamed, scientists were proving all those hypotheticals, and we would have a real opportunity to discuss the risks of virtualization on a level playing field, based on research from RSA's Chief Scientist. Alas, the details have turned out to be less exciting. From Dr. Juels' post:
the attack results in complete compromise of one form of encryption in GnuPG. As demonstrated, the attack is fairly narrow: It targets one vulnerable application in a particular class of virtualized environment. (GnuPG relies on a cryptographic package called libgcrypt that lacks well-established side-channel countermeasures.) It’s also fairly involved, requiring heavyweight use of machine learning, among other things.
Reading the original paper is illuminating as well. In it, the authors give credit to the crypto community for anticipating these attacks:
Recent versions of some cryptographic libraries attempt to prevent the most egregious side-channels; e.g., one can use the Montgomery ladder algorithm for exponentiation or even a branchless algorithm. But these algorithms are slower than leakier ones, legacy code is still in wide use (as exhibited by the case of libgcrypt), and proving that implementations are side-channel free remains beyond the scope of modern techniques.
[...] Future Xen releases already have plans to modify the way interrupts are handled, allowing a VCPU to preempt another VCPU only when the latter has been running for a certain amount of time (default being 1ms). This will reduce our side-channel’s measurement granularity, but not eliminate the side-channel.
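The leak the authors allude to comes from key-dependent branching in naive square-and-multiply exponentiation: the multiply step runs only when a key bit is 1, so the observable sequence of operations traces out the secret exponent. A minimal Python sketch (illustrative only, not the libgcrypt code) contrasting it with the branch-balanced Montgomery ladder they mention:

```python
def square_and_multiply(base, exp, mod):
    # Naive left-to-right exponentiation. The multiply inside the
    # branch executes only when the current key bit is 1, so the
    # square/multiply sequence leaks the exponent bits - the kind of
    # side channel the paper exploits.
    result = 1
    for bit in bin(exp)[2:]:
        result = (result * result) % mod
        if bit == '1':
            result = (result * base) % mod
    return result

def montgomery_ladder(base, exp, mod):
    # Montgomery ladder: both branches perform exactly one multiply
    # and one square, so the operation sequence is the same no matter
    # what the key bits are.
    r0, r1 = 1, base
    for bit in bin(exp)[2:]:
        if bit == '0':
            r1 = (r0 * r1) % mod
            r0 = (r0 * r0) % mod
        else:
            r0 = (r0 * r1) % mod
            r1 = (r1 * r1) % mod
    return r0
```

Both return the same value; the difference is purely in what an adversary watching the instruction or cache footprint can infer - which is why the ladder is slower but safer, exactly the trade-off the paper describes.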
It's worth noting that there is an important nuance at play in these discussions. Most often, we debate hypothetical attacks when it comes time to apply a patch: vendor X releases a fix along with a note that it has seen no attacks in the wild. That is a very different brand of hypothetical from the one discussed above. For years now, the information security community has watched attackers reverse-engineer exploits from vendor patches themselves - a risk that remains relevant even if you believe the vendor hasn't seen evidence of attacks in the wild.
In the case of any security vulnerability, the questions to ask are: who is able to exploit this (the threat), and what remediation options do I have (the countermeasure)? In most cases, if reconfiguration or other low-impact hardening (disabling unnecessary services, enabling built-in security features of available software, etc.) is available, it is preferable to patching, which is high-risk and high-impact. My immediate reaction after reading through the details of Dr. Juels' & Co.'s research: breathe a sigh of relief that I'm not running an old version of libgcrypt, update my lockdown playbook for Xen to include checking the configurable values for interrupt handling, and go back to worrying about the thing I was worried about before reading these articles.
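For the Xen checklist item, one place the minimum-runtime behavior surfaces is the credit scheduler's ratelimit parameter (in microseconds, so 1000 corresponds to the 1ms default the paper mentions). A sketch of the check with the `xl` tool, assuming a Xen 4.2+ host on the credit scheduler - flag support varies by release, so verify against your version's man page:

```shell
# Show current credit-scheduler parameters, including the ratelimit
# (minimum time a VCPU runs before another VCPU can preempt it).
xl sched-credit

# Set the ratelimit to 1000us (1ms). Treat this as a sketch: older
# Xen releases may not expose this knob at all.
xl sched-credit -s -r 1000
```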
In this case, as in the many others in which the attacks-on-hypervisors argument is evaluated, we come back to our basic threat model: are we going to spend considerable capital protecting against an attacker with an exploit that is still entirely theoretical? Or can we invest those security dollars in locking down our systems (i.e., selecting available hypervisors and algorithms that have been designed and proven attack-resistant) and more effectively defending against or detecting attacks from threats with a far greater opportunity to exploit them: authenticated users, network neighbors, privileged administrators?
P.S. Private to BW: thank you for the good back-and-forth on this topic last week.