Digital Signatures, Certificates, TLS, and Side Channel Attacks
We often physically sign documents. Digital signatures are the electronic analog of that. They are a way for someone to verify a document. They are done using public key cryptography.
Let's say Alice wants to digitally sign a document so that everyone knows that it was she that signed it and not someone else. She creates a public/private key pair. She signs the document by encrypting a hash of the document with her private key. People can then verify the signature by decrypting it with the public key and comparing it with the hash of the document. Because the public and private keys are tied together, only Alice's private key could have produced something that would decrypt back into the actual hash. So that's how we know the document was signed by Alice.
For instance, RSA is often used for this. Suppose Alice uses the public key (n, e) and private key d. She hashes the document to create a hash H. She then does Hd mod n, which is the digital signature D. People then verify this by doing De mod n and making sure they get H.
In theory, we could skip the hash and just encrypt the document itself, but public-key cryptographic methods are slow and are best used with small amounts of info. RSA was for years the main digital signature algorithm, and its still widely used for certificates, but another algorithm, ECDSA (elliptic curve digital signature algorithm) is taking its place in newer applications.
Certificates
The last piece of modern cryptography concerns authentication. When we see a public key belonging to Alice, how do we know it belongs to Alice and not to someone impersonating her? This is the weak point of modern cryptography. Everything up to this point involves some pretty solid mathematics, but this part relies on humans, who are sometimes weak and lazy.
When Alice first generates a public/private key pair, she goes to an entity called a certificate authority (CA). The CA is charged with verifying that Alice is who she claims to be, possibly by checking her documentation. They then certify that Alice's public key is valid by digitally signing it with their own private key. This is called a certificate. From there, we would then have to wonder if that CA really is who they say they are. So that CA itself has to be verified, possibly by another higher-level CA. This process could go on and on, but it stops eventually at root certificates. These are built into browsers and operating systems.
The weak point of the system is the document verification process. Not all CAs are careful about this, and even if they are, a smart attacker could trick them. In fact, certificates are the main weak point of modern cryptography. Everything else we have looked at is build on some pretty solid mathematics, but certificates rely on humans to do the verification.
Further, the certificate system ends at the root certificates built into browsers and operating systems, so you also have to trust your browser and OS, but you also have to trust them on a lot of other things anyway. An interesting instance of this was the Lenovo Superfish scandal. Lenovo wanted a way to inject ads into HTTPs traffic. Because of certificates, they couldn't do that, so they had a special root certificate installed on some of the computers they made in 2014-5 which allowed them to man-in-the-middle HTTPs traffic to inject adds. However, the private key, which was the same on all the computers, was protected by a weak password. Once attackers had that password, they could man-in-the-middle connections as well.
Most browsers allow you to see a site's certificate. Often this works by clicking the icon next to the URL bar. Your browser does the job of verifying the certificates for the sites you visit. Occasionally, you'll get a certificate warning if something is off. A lot of times, it just comes from a site that misconfigured something with their certificate, but it could also come from someone trying to hijack a connection. Most users, unfortunately, will blindly click through the warning message. You may occasionally see self-signed certificates where the site's private key was not signed by a CA. Often this is okay, but it's something to be aware of.
TLS
The protocol for downloading web pages, HTTP, is unencrypted. User names, passwords, credit numbers, and whatever else are all visible to any network traffic. In the mid 1990s, a technology called SSL (Secure Sockets Layer) was developed to add security to HTTP. SSL went through a few versions and was eventually renamed TLS (Transport Layer Security). People still use terms SSL and TLS interchangeably. The current version of TLS is 1.3, though both versions 1.2 and 1.3 are in common use.
TLS uses much of the cryptography we've covered. It starts with a handshake. The client sends a Hello message that contains, among other things, the ciphersuites it supports. This includes what key exchange, encryption, and message authentication types it can handle. The server will look at that list and choose the strongest one that it also supports. Early on in the handshake, the server also presents its certificate, which the client verifies. Then a key exchange happens, usually using Diffie-Hellman or the elliptic curve variation on it. After that, data transfer can begin. The command-line utility curl
can be used to show the handshake when used with the -v
option. Here is an example:
* TLSv1.2 (OUT), TLS handshake, Client hello (1): * TLSv1.2 (IN), TLS handshake, Server hello (2): * TLSv1.2 (IN), TLS handshake, Certificate (11): * TLSv1.2 (IN), TLS handshake, Server key exchange (12): * TLSv1.2 (IN), TLS handshake, Server finished (14): * TLSv1.2 (OUT), TLS handshake, Client key exchange (16): * TLSv1.2 (OUT), TLS change cipher, Client hello (1): * TLSv1.2 (OUT), TLS handshake, Finished (20): * TLSv1.2 (IN), TLS handshake, Finished (20): * SSL connection using TLSv1.2 / ECDHE-RSA-CHACHA20-POLY1305
Below are a few of the couple dozen ciphersuites that my web browser supports. The first two are the strongest ones, and the last one is the weakest one.
TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 TLS_RSA_WITH_3DES_EDE_CBC_SHA
One type of attack on TLS is called a downgrade attack. This is where an attacker man-in-the-middles a TLS handshake and convinces both sides to use the weakest ciphersuite they both support. The attacker then uses some known attacks to break the encryption. For instance, 3DES in CBC mode is subject to something called a padding oracle attack.
There have been several other attacks on SSL/TLS over the years. The worst was called Heartbleed. It involved a buffer overflow exploit in code from the openSSL library that allowed attackers to read a server's memory to extract key information. It's estimated that around 17% of the HTTPs servers were vulnerable to the attack by the time it was found in 2014.
TLS is used to secure HTTP traffic. When HTTP is run over TLS, it is known as HTTPs (the “s” is for “secure”). TLS is also used to secure email protocols like SMTP. People are also starting to use HTTPs to secure DNS traffic.
Side-channel attacks
The encryption algorithms we have covered (DES, AES, RSA) are all mathematically pretty sound. The weakness is the programmers who implement the algorithms in code. If a programmer is not careful, they can introduce ways for an attacker to figure out a secret key without breaking the algorithm itself. Attacks of this sort, that rely on information obtained from observing the algorithm running, are called side-channel attacks. Some of them are especially devious.
To give a sense for how these work, consider a simple algorithm that tests if an input string matches a secret string. Here is pseudocode for what that string comparison actually does.
if length(input) != length(secret): return false else: Loop over the characters of the input: if current character of input != current character of secret: return false return true
If the two strings have a different length, then it will immediately return false. If they are the same length, then it goes through comparing character-by-character. So for strings of different length, the string comparison will be quicker than if they are of the same length. An attacker could try inputs of varying length until they find one for which the comparison takes noticeably longer. That will tell them the length. Then they start guessing starting characters A, B, C, or whatever until they get a comparison that takes longer. That indicates that they have the first character right (because then the program has to take time to check the second character, whereas it returns false right away if the first characters don't match). They continue this way, guessing the second, third, etc. letters, until they have the whole string. The side channel here is the time it takes the string comparison algorithm to perform its job. To fix the problem, some code has to be added to make sure that all comparisons take the same amount of time.
This type of timing side channel can be used to break strong algorithms, like AES and RSA. For instance, in RSA, when we compute Cd mod n, the algorithm that does the raising to a power does its work based on the binary structure of the secret key d, doing more work for 1s and less work for 0s. Timing analysis can be used to figure out the key from there.
Also, since the CPU is working harder on those 1s than on the 0s, its power output varies. People can attach some sensors to measure the power output of the CPU and use that to figure out the key. An especially wild variation of this is acoustic cryptanalysis, which uses the high-pitched sounds the CPU puts out to figure out when it is working harder than other times. These attacks are not theoretical; they have actually been implemented.
Another interesting attack is AES cache timing. Here the term cache refers to an area of fast memory on the CPU. Memory accesses to things in the cache take less time than accesses to things stored in RAM. If one process does an AES encryption, after it's done, the cache will contain some values related to the encryption. Programs can't read the cache directly, but they can use the fact that cache accesses take less time than regular ones. This is called, probing, where the program will try out certain values and note how long it takes to access them. With enough of these probes, it can make a reasonable guess as to some things that are in the cache and figure out the key from there. One place this can happen is on a virtual private server (VPS). Often a single cloud computer hosts multiple different websites. A process running on behalf of one of those sites can use cache probing to figure out AES keys based on information left in the cache by other sites.
This is just a small sampling of side-channel attacks. There are many others based on things such as keystroke timing and electromagnetic radiation leaks from monitors.