A summary of material covered during the lectures, following the organisation of the textbook (Introduction to Modern Cryptography, by Jonathan Katz and Yehuda Lindell). Since the exam is open book (all documentation is permitted), this page is intended to serve as a supplement to the textbook. $\DeclareMathOperator{\K}{\mathcal K} \DeclareMathOperator{\C}{\mathcal C} \DeclareMathOperator{\M}{\mathcal M}$
1Introduction and classical ciphers¶
1.1Cryptography and modern cryptography¶
Nothing noteworthy
1.2The setting of private-key encryption¶
There is one shared key, which is used for encryption and decryption, and which must be kept secret by its users. Also, it's assumed that the users have a way of sharing the key securely (though sometimes there's just one user, at different points in time).
- Gen
- Probabilistic algorithm that outputs a key
- Enc
- Given a key and message (plaintext), outputs ciphertext; can be probabilistic
- Dec
- Given a message and key (ciphertext), outputs corresponding plaintext; must be deterministic
(The above 3 are enough to fully specify an encryption scheme.)
- $\K$
- The set of all possible keys (finite)
- $\M$
- The set of all plaintext messages for which Enc is defined
- $\C$
- The set of all possible ciphertexts
- Correctness
- Encrypting a message then decrypting it gives you the same message back, for any message
- Kerckhoff's principle
- Only the key should be kept secret, not the algorithm (harder to keep an algorithm secret and harder to change it if it's compromised, etc)
- Ciphertext-only attack
- You only have the ciphertext and want to figure out the plaintext. The standard type of attack.
- Known-plaintext attack
- You have the plaintext for some ciphertext, and you want to figure out the plaintext for a different ciphertext.
- Chosen-plaintext attack
- You can get the ciphertext corresponding to any plaintext you want (encryption oracle)
- Chosen-ciphertext attack
- You can get the plaintext corresponding to any ciphertext you want (decryption oracle)
1.3Historical ciphers and their cryptanalysis¶
All these ciphers are broken fairly easily by known-ciphertext attacks, and are completely worthless against known-plaintext attacks.
- Caesar cipher
- Too easy. Shift 3. No key. What was Caesar thinking?
- Shift cipher
- The key is the shift. Keyspace of size 26. Can attack by trying every key, computing a value (true frequency * expected frequency for each letter and summing over all letters), and outputting the key that gives the value that's closest to the expected value (0.065).
- Brute-force/exhaustive search attack
- Try every key until you find one that gives a message that makes sense. Any encryption scheme that is vulnerable to this is not very secure at all. If the key space is smaller than the message space, then you will never achieve true security.
- Monoalphabetic substitution
- The key is a one-to-one mapping from the letters in the alphabet for the plaintext to the corresponding letters in the ciphertext. Huge key space ($26!$), but easily broken via frequency analysis
- Vigenere cipher
- A form of polyalphabetic substitution. The key is a word of length $t$. Equivalent to using $t$ different monoalphabetic shift ciphers (periodic). The key needs to be pretty long (ideally as long as the message) to attain any semblance of security.
- Kasiski's method
- Attack for Vigenere. Look for repeated patterns of 2-3 letters, find the distances between repeats, the gcd of the distances is probably $t$ or a multiple of it
- Index of coincidence method
- Another attack for Vigenere. Easier to automate. For each possible value of $t$, look at every $t$ letter and calculate the frequency value thing (same as the one used for shift cipher). If it's right, it'll be close to 0.065; otherwise, it'll be closer to 0.038 (randomness)
1.4The basic principles of modern cryptography¶
- Need to define security in a rigourous way
- Assumptions needed to prove security should be minimal, and must be stated
- Security for a system must be proved rigourously
Definition of security: "An encryption scheme is secure if no adversary can compute any function of the plaintext from the ciphertext." (Katz and Lindell, p. 22) The adversary here is generally assumed to be probabilistic and working in polynomial-time, but the power of the adversary depends on the context (sometimes the adversary has access to an encryption or decryption oracle, for instance).
The reductionist approach: To prove the security of some scheme given an assumption, we can simply reduce the task of breaking the scheme to violating the assumption. Kind of like polytime reduction to prove NP-hardness.
1.5Exercises¶
(Interesting results encountered in the exercises.)
- 1.1
- Homework 2, part B
- 1.2
- QH homework 1
- 1.3
- If Vigenere used a monoalphabetic substitution instead of a shift cipher for its letters, then it's still possible to break, you just need a longer message (and you can't be sure). Just do frequency analysis on every $t$ letters. If the resulting letter distribution is similar to the expected one, then assign probable letter mappings. Otherwise, try another value of $t$, etc.
- 1.4
- Homework 2, part B
- If Vigenere is modified so that you break the plaintext up into blocks of size $t$ and add the block number to the ciphertext, then you can't use Kasiski's anymore. Solution: for each $t$, look at $c_{t}, c_{2t} - 1, c_{3t}-2$, etc; since these have been encrypted using the same shift, you can just sum over all the frequencies and see if it's close to 0.065. The one that's closest is probably the right value of $t$.
- 1.5
- Homework 2, part B
- With the shift cipher, CPA needs 1 plaintext character to get the entire key. For monoalphabetic: 25 different characters. With Vigenere: $t$.
- 1.6
- Homework 2, part B
2Perfectly-secret encryption¶
This stuff is still kind of "classical" as opposed to modern.
2.1Definitions and basic properties¶
- $Pr[K=k]$
- the probability that a particular key is used
- $Pr[M=m]$
- the probability that a particular plaintext message is sent (independent of the previous)
- $Pr[C=c]$
- prob that a particular ciphertext is produced (fixed by the above distributions)
- The definition of perfect secrecry
- An encryption scheme is perfectly secret if $Pr[\M=m|\C=c] = Pr[\M=m]$ for every $m \in M$, for any distribution of $\M$ and any possible $c$, limited to $m$ and $c$ with non-zero probability. That is, distributions over plaintext and ciphertext are independent. This is equivalent to $Pr[\C=c|\M=c] = Pr[\C=c]$ (by Bayes' theorem).
- Perfect indistinguishability
- The ciphertext probability distribution for any message is the same. Thus $Pr[\C=c|\M=m_0] = Pr[\C=c|\M=m_1]$
- Adversarial indistinguishability
- Experiment: adversary tries to guess which message generated a given ciphertext; perfect secrecy implies that the probability of success is 1/2
2.2The one-time pad (Vernam's cipher)¶
To encrypt a message of length $l$, create a randomly-generated bitstring of length $l$ and XOR all the bits. Perfectly secret (assuming that all messages are possible). $Pr[\C=c|\M=m] = 2^{-l}$ for any plaintext and ciphertext.
Drawbacks:
- Key must be as long as message
- Can only be used once (otherwise, if you take the XOR of the ciphertexts, you get the XOR of the messages which can be interesting)
- Only secure against ciphertext-only attack; KPA or CPA can recover the key, rendering it useless in the future
2.3Limitations of perfect secrecy¶
The keyspace must be as large as the message space to guarantee perfect secrecy. Otherwise, for any given ciphertext, there's a message that could not have produced it. (p. 33)
2.4Shannon's theorem¶
If $|\K| = |\M| = |\C|$, then there must be a single key that produces a particular ciphertext from a given message. Also, the key needs to be chosen uniformly at random. Proof: p. 38-40.
(We know that $|\C| \geq |\M|$ because otherwise multiple messages would map to a single ciphertext and decryption would not be ambiguous. So assuming that the sizes are equal is the most "efficient" assumption we can make.)
2.5Summary¶
Conclusion: perfect secrecy is possible, but the key must be as long as the message
2.6Exercises¶
- 2.1
- Bayes' theorem
- 2.2
- Not necessarily ... recall that $Pr[\M =m|\C=c] = Pr[M=m]$ is what is needed. If $Pr[\M =m] \neq Pr[M=m']$ then the probabilities given the ciphertext shouldn't be the same.
- QH homework 1
- 2.3
- Bad idea.
- QH homework 1
- 2.4
- (a) if it's just a single character, we can't use assumptions about character frequencies in English text. So the probability of any character is 1/26.
- (b) Each letter is used the same number of times?
- (c) Make the key length $n$. Then it's equivalent to the one-time pad.
- 2.5
- False ... what if the key isn't even used at all, etc
- 2.6
- Take the subset of $\C$ that corresponds to $\M'$, etc
- 2.7
- QH homework 1
- 2.8
- QH homework 2
- 2.9
- Homework 2, part B
- 2.10
- Homework 2, part B
- 2.11
- QH homework 1
- 2.12
- QH homework 1
3Private-key encryption and pseudorandomness¶
If we use a short key (relative to the length of the message), we can't achieve perfect secrecy, but we can achieve computational secrecy. Perfect secrecy, aka informational-theoretic secrecy, means that the adversary just doesn't have enough information to break it, even with an infinite amount of time. Computational secrecy means that the adversary does have enough information, but it would take nearly an infinite amount of time to use it properly.
3.1A computational approach to cryptography¶
- The concrete approach
- "A scheme is $(t, \epsilon)$-secure if every adversary running for time at most $t$ succeeds in breaking the scheme with probability at most $\epsilon$" (p. 49)
- Drawbacks to this: what algorithms, computing power, etc does it assume? Not known! Also might not take into account things like Moore's law, or optimised versions of algorithms.
- The asymptotic approach
- We'll use this instead of the concrete approach
- Running time of adversary, and the adversary's success probability, are functions of a security parameter $n$ (integer), which is chosen by the honest parties
- Adversary assumed to run in probabilistic polynomial time
- Negligible probability
- When the probability function grows slower than any inverse polynomial $n^{-c}$ for some $c$. Full definition on p. 56.
- Examples: $2^{-n}$, $2^{-\sqrt{n}}$, $n^{-\log n}$
- The sum of two negl functions is still negl
- Multiplication by a polynomial preserves negligibility
The asymptotic approach's definition of security:
A scheme is secure if every PPT adversary succeeds in breaking the scheme with only negligible probability. (p. 51)
Using this approach, if faster computers are available, a larger $n$ can be chosen (by choosing a larger key size) and the adversary's running time will be even higher (without compromising the speed of the scheme for honest parties).
If the scheme is not perfectly secret (with $|\K| < |\M|$), there are several possible attacks. For a given ciphertext, try every key and see what the decrypted result is (the plaintext must be one of these results). Or, KPA with a bunch of messages and ciphertexts, can most likely figure out the key (time: linear in $|\K|$). Or, KPA, guess a key at random and check if it works for all the pairs (constant-time, probability of success is $1/|\K|$. So we need our definition of computational secrecy to basically ignore these possibilities.
Proof by reduction: $X$ is the problem that can only be solved with negligible probability in PPT. $\Pi$ is the encryption scheme, $\epsilon(n)$ is the adversary's probability of success. Show that if $\epsilon$ is not negligible, then we can solve $X$ with non-negligible probability $\epsilon(n)/p(n)$ (where the adversary for $\Pi$ is run as a kind of subroutine), contradicting the assumption. Thus $\Pi$ is computationally secure. (p. 59)
3.2Defining computationally-secure encryption¶
Defining a private-key encryption scheme: (p. 60)
- Gen
- Input: $1^n$; output, $k$, such that $|k| \geq n$ (randomly chosen)
- Enc
- Runs in time polynomial to $|k|+|m|$. Takes in a key, a message (bitstring) of some length, outputs ciphertext
- Dec
- Deterministic. Takes in key, ciphertext; outputs plaintext
Indistinguishability in the presence of an eavesdropper: almost the same as in perfect secrecy, but messages must be the same length, and we allow the probability to be 1/2 + negl. As usual, the input is a single ciphertext.
3.3Pseudorandomness¶
- No PT algorithm can tell if a string was chosen according to some fixed distribution or randomly
- Pseudorandom generator: takes a short random seed of length $n$ as input, outputs a long pseudorandom string (length $l(n)$, where $l$ is called the expansion factor)
- Full definition on p. 70
- Distinguisher: Given a string of length $l$, go through all possible seeds. If there is a seed that could generate this string, assume that it's pseudorandom (you'd only be wrong with probability $2^{n-l}$); otherwise, it's random. However, this cannot be done in probabilistic polynomial time.
- We can't prove the existence of pseudorandom functions; we believe they do, under the assumption that one-way functions exist
3.4Constructing secure encryption schemes¶
Like the one-time pad, but with a pseudorandom string and not a random one. Proof of security: if it's the same as the random string, then the proof from chapter 2 holds; otherwise, the adversary is basically distinguishing a pseudorandom string from a random one which is assumed to be impossible.
We can convert a standard PRG into a variable-length PRG.
Security for multiple encryptions: the eavesdropper experiment, but distinguishing between two sets of messages, all encrypted with a particular key. For example, the pseudorandom one-time pad is not secure under this definition. Also, any deterministic scheme is insecure under this definition - for one set, make the two messages identical, and for the other set, make the two messages different; then it's trivial to distinguish between them (p. 79).
Using stream ciphers for secure multiple encryptions:
- Synchronised mode, to prevent re-use of the same part of the stream (must maintain state between encryptions). Alice uses the first few bits, Bob uses the next few, etc. (p. 80)
- Unsynchronised: State is not maintained. The PRG (which must be an augmented pseudorandom generator) must take another input - an IV (initialisation vector) of length $n$, which basically randomly sets the initial state. It is sent in the clear. (p. 81)
3.5Security against CPA¶
CPA is a very realistic threat (see: gardening, Midway, sending a message to a server, etc).
The adversary has access to an encryption oracle, and can encrypt as many messages as desired (in PT). Obviously any CPA-secure scheme must be probabilistic, otherwise the adversary can just encrypt both messages and see which resulting ciphertext is the same as the challenge ciphertext. (p. 83)
Extending this to multiple encryption is easy. Anything that is CPA-secure is automatically secure against multiple encryptions (p. 85).
3.6Constructing CPA-secure encryption schemes¶
Pseudorandom function: maps an $n$-bit string to another $n$-bit string. Represented as a keyed function, $F_k$, which needs a key of length $k$ to select the function. The domain of such a function has size $2^n$, and there are $2^{2^n}$ functions that can be generated (p. 87). The distinguisher does not have access to the key $k$, of course, otherwise it would be trivial.
To use such functions to make a CPA-secure encryption scheme, we can't just take the output of the function with the given input, since that would be deterministic and thus clearly not CPA-secure. Instead, we generate a random value $r$, apply $F$ to it, and XOR the result with the plaintext. It's CPA-secure because the probability that $A$ has seen something with $r$ before is negligible, and if it hasn't been seen, then you can only guess. Full proof: p. 90-93.
Keyed permutation: bijective keyed function. Pseudorandom if we can't distinguish it from a random permutation. If we can't even if $D$ is given access to the inverse, then it's a strong pseudorandom permutation.
3.6.1Modes of operations for block ciphers¶
- ECB (electronic code book)
- Split the message up into blocks of length $n$, compute $F_k(m_i)$ for each block
- The simplest way probably; you just encrypt each block directly using the permutation
- Easy to parallelise
- Since this is deterministic, it's not CPA-secure
- In fact, it's not even indistinguishable in the presence of an eavesdropper (we could repeat blocks in one message, and not in the other)
- CBC (cipher-block chaining)
- Choose a random IV, set $c_0 = IV$, and let $c_i = F_k(c_{i-1} \oplus m_i)$
- Send the IV in clear, and send all the other $c_i$s too
- Cannot be parallelised
- Must be padded to a multiple of the block size
- CPA-secure
- OFB (output feedback)
- Set $r$ to a random IV, and use $F(r)$ to generate a pseudorandom stream ($F$ just has to be a function, not a permutation)
- Considered block-wise, we let $r_0 = IV$, $r_i = F(r_{i-1})$, then $c_i = r_i \oplus m_i$
- Output: the IV, plus all the $c$s
- To decrypt, compute the $r$s, then XOR with the $c$'s
- Can be quick if you compute the $r$'s and whatever beforehand
- CTF (counter, the randomised variant)
- $r_0 = IV$, $r_i = F_k(IV + i)$
- Other than that, it's the same as OFB (again, $F$ doesn't need to be a permutation)
- Supports $O(1)$ random access for blocks
- Can be parallelised
The purpose of the IV is to ensure that $F$ is always (with high probabiity) evaluate on a new input.
The choice of block length matters! The probability of success for an adversary is $\displaystyle \frac{1}{2} + \frac{q^2}{2^{n-1}}$ where $n$ is the block length.
3.7Security against CCA¶
In CCA, we assume the adversary has access to encryption and decryption oracles (except on the challenge ciphertext itself, obviously).
Everything we've learned so far is vulnerable to this type of attack. To break the one-time pad variant that uses an IV, set $m_0 = 0^n$ and $m_1 = 1^n$, get the ciphertext, flip the first bit, send that to be decrypted. The result will be either $0111\ldots$ (so $m_1$) or $1000\ldots$ (so $m_0$).
3.8Exercises¶
- 3.1
- Working with negligible functions. Not very interesting.
- 3.2
- It takes $2^{n^{1/3} (\log n)^{2/3}}$ clock cycles to factor an $n$-bit integer. In 100 years. at $4\times 10^9$ clock cycles per second, we have $4*10^9 * 60 * 60 * 24 * 365 * 100 = 12614400000000000000$ clock cycles. The log of this (base 2) is approximately 63.45. Then, $n^{1/3} (\log n)^{2/3} \geq 63.45$. Cubing both sides, we get that $n (\log n)^2 \geq 255464$ and if we try a bunch of values for $n$ we get that $n = 355$ is the size of numbers that can't be factored for 100 years.
- 3.3
- QH homework 2
- 3.4
- QH homework 2
- 3.5
- Not worth it
- 3.6
- QH homework 2
- 3.7
- What about the one-time-pad with the random IV thing? Mentioned in a previous section.
- 3.8
- Use the random function ...
- 3.9
- To make it variable-length, what about taking $n$ bits from the output and using them as the next input, concatenating the result, etc
- 2010 Homework 3
- 3.10
- Homework 3
- 3.11
- You would have to check a substantial proportion of outputs, which can't be done in polynomial time?
- 3.12
- Remove the negligible part. This is impossible because there is a certain number of messages, let's say $k$, which can correspond to the ciphertext, and we can just try encrypting them all. Now, if $k$ is too large (larger than half of $|\M|$, for instance) we can just try the other messages instead. In any case, this can probably be done in polynomial time.
- 3.13
- QH homework 4 (called "Additional problem 1")
- 3.14
- To decrypt, compute $F^{-1}_k(c)$ and ignore the first $n/2$ bits. To prove that it's CPA-secure, show that if it weren't, it could be used as a distinguisher or something. Advantage: less stuff to send? Not sure.
- QH homework 5 (called "Additional problem 2")
- 3.15
- Homework 3
- 3.16
- Homework 3
- 3.17
- 3.18
- 3.19
- 3.20
- QH homework 4 (called "Additional problem 3")
- 3.21
- QH homework 4 (called 3.18)
- Homework 3
- 3.22
- QH homework 4 (called 3.19)
4MACs and collision-resistant hash functions¶
4.1Secure communication and message integrity¶
We want to be able to ensure the integrity of messages. Secrecy is not important.
4.2Encryption vs message authentication¶
Stream ciphers provide no integrity - just flip a bit in the ciphertext and you flip the same bit in the plaintext.
For block ciphers, it depends on the mode. CTR and OFB are XOR, just like stream ciphers. With ECB, you can switch orders, or change blocks unpredictably. With CBC, flip a bit in the IV to flip the same bit in the first block (only).
Also, with all of these schemes that we've learned so far, any possible ciphertext corresponds to some message, so it's easy to spoof.
4.3Message authentication codes - definitions¶
A message authentication code is meant to allow tampering detection.
- Gen
- Input $1^n$, output $k$ where $|k| \geq n$
- Mac
- Input $k$ and $m$, output tag $t$ (probabilistic)
- Vrfy
- Input $k$, $m$ and $t$; output 1 if $t$ is valid and 0 otherwise
- Correctness
- Vrfy always works on a tag that Mac generated
The notion of security here is existential unforgeability under an adaptive chosen message attack. $A$ can request tags for any message. A break occurs when $A$ can produce a message, along with its tag, which she has not submitted to the oracle; the probability of this happening must be negligible.
Replay attacks: Mac can't take care of this. Can be dealt with using timestamps or sequence numbers or whatever, but this should be dealt with by the application. More on p. 117.
4.4Constructing secure MACs¶
Construction using pseudorandom functions: set the tag to be $F_k(m)$, for fixed-length messages. This is existentially unforgeable since if $F$ were random it would be, and if it weren't, we would have a distinguisher etc. However, if we wanted to extend this to a variable-length scheme, it might not be secure, depending on how we do it. If we XOR all the blocks and authenticate that, then it's obviously not secure because someone can just flip an even number of the same bits for each block. Alternatively, if we authenticate each block separately and output each of the tags, then this is also insecure because we can just move blocks around and shit. Finally, we could authenticate each block along with some sort of sequence number, but then we could just drop blocks at the end or mix and match blocks from different messages.
The correct way to do it is to split a message up into blocks of length $n/4$, compute random IV $r$, and compute $Mac(r\|l\|i\|m_i)$ for each block $i$ (so the random IV, the length of the message, the index, the message block). Then, output $(r, t_1, \ldots, t_d)$ (the IV plus the tags for all of the blocks). Note that the message can't be longer than $2^{n/4}$, and can be padded to be exactly $n/4$. This is exist. unforge.; proof on p. 122-124. However, this is very inefficient; a more efficient scheme is CBC-MAC.
4.5CBC-MAC¶
Only works for fixed-length message. Split the message up into $l$ blocks of length $n$ each, Let $t_0 = 0^n$, and let $t_i = F_k(t_{i-1} \oplus m_i)$. Return $t_l$ (so only the last tag). This is better than the previous fixed-length MAC because this one can handle longer message.
Note that the IV is fixed, unlike with CBC encryption. This is important! If the IV were random instead of $0^n$, it would not be secure.
To extend this to variable-length messages, there are 3 techniques:
- Apply $F$ to the length to get a key of length $n$, then do regular CBC-MAC (this ensures that different keys are used to authenticate messages of different length ... otherwise you could just add like one bit)
- Prepend the message's length to the message (appending doesn't work)
- Generate 2 different keys, $k$ and $k'$. Let $t = Mac_k(m)$; output $F_{k'}(t)$.
4.6Collision-resistant hash functions¶
A hash function has infinite domain and finite range (in our study). Collisions are of course present, but in a collision-resistant hash function, they should be rare. Characterised by two functions:
- Gen
- Input $1^n$, output $s$ (not necessarily chosen uniformly at random)
- H
- The hash function. Input: $x$ of variable length; output: $H^s(x)$ of length $l(n)$ (so it depends on the security parameter), where $H^s$ is kind of like a keyed function (though $H$ might not be defined for all $s$).
For the fixed-length case, we just need that the input length is greater than the output length. Otherwise, it doesn't really compress anything so what's the point?
Collision-finding experiment: to define collision-resistance. $A$ creates 2 strings, and wins if their hashes are identical. Note that $A$ is allowed to use the hash function $H^s$ as many times as $A$ likes. The probability of $A$ winning should be negl. (p. 129)
Some weaker definitions (in order of increasing weakness)
- Second pre-image resistance: given $s$ and $x$, find $x'$ that collides with $x$
- Pre-image resistance: hard to invert a hash (so it's a one-way function)
4.6.1Collision-finding attacks¶
Birthday attack: compute $q$ messages of length $2l$, where $l$ is the length of a hash. If $q > 2^l$ then obviously we have a collision, by the pigeonhole principle. If computing $H$ is $O(1)$, then to resist this kind of attack for time $T$ , the length must be $2\log T$ bits. A drawback to this method is that a lot of memory is needed. Like, if the hashes are 64 bits, then we need billions of gigabytes just to store $2^l$ hashes.
There is another approach which uses a constant amount of memory. Pick a random $x_0$, compute $x_i = H(x_{i-1})$ and $H_{2i} = H(H(x_{2(i-1)}))$. If $x_i$ and $x_{2i}$ are ever the same, then we most likely have a collision since then $x_{i-1}$ and $H_{x_{2(i-1)}}$ will be the same (unless $x_i$ and $x_{2i}$ are the same but that is unlikely). We only need to store $x_i$ and $x_{2i}$, so the memory footprint is small; furthermore, this gives a collision with probability $1/2$ in time polynomial to $2^{l/2}$.
However, both approaches have the problem that we don't have control over the colliding message we find. If we want to create a certain type of message (say, a generic recommendation letter) then we can generate a bunch of possibilities that would work, and look for a collision with a message whose tag we have.
4.6.2The Merkle-Damgard transform¶
Used for converting a fixed-length hash function to a variable length one. If it was collision resistant before, it will still be.
- Gen
- Same as for the fixed-length hash
- H
- Given $s$, $x$ of length $L < 2^{l(n)}$, split up $x$ into blocks (pad with 0's if necessary) and make a new block, containing $L$ (encoded using $l$ bits). Use an IV $z_0$. Compute $z_i = h^2(z_{i-1}\|x_i)$ where $h$ is the fixed-length hash function. At the end, output the last $z$.
The IV can be replaced by any constant. The proof of the collision-resistance is on p. 135.
4.7NMAC and HMAC¶
Not covered
4.8Constructing CCA-secure encryption schemes¶
Two keys: one for encryption, one for MAC. You encrypt and tag, recipient verifies tag before decrypting. Full def on p. 144.
Now, if the encryption scheme is CPA-secure and the MAC is secure with unique tags, then the hybrid is CCA-secure. The proof is long (p. 145-147).
Why did we need unique tags? Otherwise, A could modify $t$ to $t'$ so that we still get a valid tag, then just send $(c, t')$ to the oracle (which is allowed). Note that this makes MAC functionally deterministic.
4.9Obtaining privacy and message authentication¶
- Encrypt-and-authenticate
- send both the tag and the ciphertext in the clear
- Can be insecure (see p. 152)
- Authenticate-then-encrypt
- send the tag encrypted
- Also can be insecure (see p. 152)
- Encrypt-then-authenticate
- Authenticate the encrypted message
- Secure!
4.9.1Message transmission scheme¶
- Gen
- Outputs 2 keys, $(k_1, k_2)$
- EncMac
- Input: keys, $m$; output: $c$, derived by calling some combination of Enc and MAC.
- Decryption
- Result is either $m$ or "INVALID"
Correctness holds. Note that this is also a private-key encryption scheme, but it gives us integrity in addition to privacy so we call it a MTS.
Authenticated communication: in the experiment, A is given oracle access to EncMac, and has to try to output a ciphertext that has some valid decryption. We need the probability of this happening to be negl.
Security is achieved if it's CCA-secure and achieves authenticated communication.
Gotta use different keys for encryption and tagging.
4.10Exercises¶
- 4.1
- 4.2
- QH homework 5
- 4.3
- QH homework 5
- 4.4
- Homework 4
- 4.5
- 4.5
- 4.7
- Homework 4
- 4.8
- Homework 4
- 4.9
- Homework 4
Maybe look at some of the rest later
5Block ciphers¶
Recall that a block cipher is a keyed permutation (takes in a key of length $n$ and an input of length $l$; returns a bitstring of length $l$). $n$ = key length, $l$ = block length. These are set as fixed constants (and can be different). Block ciphers are strong pseudorandom permutations, NOT encryption schemes in themselves. With block ciphers, since we don't use a security parameter $n$ (though the $n$ and $l$ can sort of be thought to be security parameters), we deal with asymptotic and not concrete security.'
Any pseudorandom permutation is secure against CPA. A strong pseudorandom permutation is secure against CCA.
5.1Substitution-permutation networks¶
- Goal
- Construct a concise function (can be described with few bits) that behaves like a random one.
- Confusion-diffusion
- confusion is when you take a block, and permute it to make it look completely different. Diffusion is when you somehow reorder the bits in the output. Also, confusion/diffusion steps - together called a round - are repeated several times. You can use different functions in each round (like, making them keyed), or fix them, or whatever.
- Substitution-permutation network
- basically confusion/diffusion but round functions are fixed. We assume all the S-boxes and mixing permutations are known. Only the key is secret.
- Master key
- the key to each block cipher (the same)
- Key schedule
- sub-keys that are XORed with intermediate results from each round (derived from master key)
- Invertibility of S-boxes
- if they are not invertible (not bijective), you can't decrypt
- Avalanche effect
- small changes in input have large changes in output.
- Guaranteed as long as two properties hold for S-boxes: changing 1 bit of input leads to changing 2 bits of output; also, output bits of any given S-box are spread into different S-boxes
- Randomly-chosen S-boxes
- These are actually a bad idea (top of p. 168)
These are pretty secure, but it depends on the number of rounds. Attacking a single-round network is easy; somewhat easy with 2-round (p. 169). 3-round is hard (p. 170)
5.2Feistel networks¶
Similar to sub-perm, except the high-level design is different, and S-boxes don't have to be invertible. Obviously the function as a whole is invertible, though.
For each round, the input is split up into the left half and the right half. The round function (not necessarily invertible) acts on the right half, taking a bitstring of length $n/2$ and outputting one of the same length. Then, let the new right input (for the next round) be the previous left input XORed with the output of the round function, and let the new left input be the previous right input.
Mangler function: public, takes a subkey as input (or the master key if that's fixed)
To decrypt, see p. 172.
5.3DES¶
Secure except for its short bit length.
16-round Feistel network, block length of 64, key length of 56. Each round function takes/outputs 32 bits. The key schedule is fixed and public, only the key is private. See p. 173 for more details.
Mangler function: it, along with the $i$th subkey, determine the $i$th round function. This is basically a 1-round sub/perm network. Given a 32-bit string, it's somehow expanded to 48 (duplicating half the bits), then we XOR that with the key and split this up into 8 blocks.
8 S-boxes: well-studied. All are permutations. Each maps a 6-bit string to a 4-bit string (4-1 function). Changing one input changes at least two bits of output.
To attack single-round DES, see p. 177. Same for two-round, three-round.
5.4The security of DES¶
Best known attack: brute force search. Can be done in mere days, or less. So the only real flaws are the short key length and, somewhat, its short block length.
5.5Increasing the key length of a block cipher¶
Instead of modifying the internal structure of DES (we don't want to break it etc), we treat it as a black box. One solution to increase the key length to 112 is double encryption. Basically you choose two keys, encrypt message with the first key, and encrypt the ciphertext with the second key. Then concat the keys and that's your 112-bit key. But this can be broken in $2^n$ time (where $n=56$), so this is no better (p. 183).
Triple encryption: actually works. Either 3 diff keys, or 2 keys and make the third key depend on some formula (p. 184).
5.6AES¶
Block length of 128 bits, can use keys of length 128, 192, 256. In each round, we derive a subkey of 128 bits from the master key, and keep it in a 4x4 array of bytes, XORed with the state. See p. 186 for more.
5.7Differential and linear cryptanalysis¶
Skipped
5.8Exercises¶
- 5.10
- Homework 3
- 5.14
- Homework 3
6Theoretical constructions of pseudorandom objects¶
SKIPPED
7Number theory¶
- Discrete logarithms
- hard. hardest for prime order groups
- Diffie-Hellman
- Differentiate between $(g^a, g^b, g^{ab})$ and $(g^a, g^b, g^c)$ where $a, b, c$ are randomly chosen
- also hard, and hardest for prime-order groups (p. 279)
7.1Exercises¶
- 7.1
- Suppose $e$ and $e'$ are both identity elements. Then $e = ee' = e'$ so lol yeah. To show that inverses are unique, let $g$ be some random element with inverses $h$ and $h'$. Then $gh = 1$ and $gh' = 1$. Thus $gh = gh'$. Multiply by $h$ and you get $(gh)h = (gh)h'$ so $h = h'$.
- 7.5
- See HTSEFP: Winter 2010, question 3
- 7.6
- See HTSEFP: Winter 2009, question 3
8Factoring and computing discrete logarithms¶
SKIPPED
may be interesting
9Private-key management and the public-key revolution¶
9.1Limitations of private-key cryptography¶
Have to store a lot of keys, have to deal with their secure distribution, etc
9.2A partial solution¶
Key distribution centers (p. 318), for session keys. Problematic.
9.3The public-key revolution¶
Asymmetry: an encryption key, and a decryption key. Uses one-way functions.
9.4Diffie-Hellman key exchange¶
We define a rather weak notion of security (stronger definitions are outside the scope of the book). The key-exchange experiment goes like this:
- Alice and Bob execute $\Pi$, both get a copy of the key $k$.
- Eve is gets to see a transcript of all the messages sent between Alice and Bob.
- Eve is then given two possible keys: one that is $k$, and one that is chosen at random.
- Eve tries to guess which key is the real key.
The key exchange scheme is secure in the presence of an eavesdropper if the probability of Eve winning is 1/2 + negl(n).
The actual key exchange scheme that Diffie and Hellman proposed goes like this:
- Alice and Bob both get $1^n$ (security parameter).
- Alice runs $\mathcal G$ (PPT algo that outputs a cyclic group, its order of size $n$, and a generator), gets $(\mathbb G, q, g)$.
- Alice chooses $x$ randomly from $\mathbb Z_q$, computes $h_1 = g^x$, sends all this shit to Bob
- Bob receives $(\mathbb G, q, g, h_1)$, chooses $y$ randomly from $\mathbb Z_q$, computes $h_2 = g^y$, sends $h_2$ to Alice, and saves the key $k = h_1^y$
- Alice saves the key $k = h_2^x$ (the same as Bob's)
Obviously the discrete logarithm problem must be hard. But the decisional Diffie-Hellman (DDH) problem must be hard too, otherwise it would be possible to distinguish between $g^{xy}$ and some random element, meaning that Eve would win. We can in fact prove that this key exchange protocol is secure in the presence of an eavesdropper (using the slightly modified assumption that the other possible key is a random group element, not just a random string) - see p.328-329.
Unfortunately, DH is completely insecure against man-in-the-middle attacks.
9.5Exercises¶
- 9.1 - interactive protocol for key exchange
- Eve chooses two messages, $m_0$ and $m_1$. Alice and Bob generate their key, and Eve can see any messages sent between them. Then Alice sends either the encryption of $m_1$ or of $m_0$, and Eve has to guess which it was.
- 9.2 - explain vulnerability to a man-in-the-middle attack
- Eve intercepts Alice's shit, chooses some random $a$ and substitutes $g^{a}$ for $g^{x}$ in what Alice has sent and sends that to Bob. Bob computes $g^y$ and sends that to Alice, then saves the key $k_1 = g^{ay}$. Eve intercepts Bob's shit, and sends $g^{a}$ to Alice (pretending to be Bob). Then Alice saves the key $k_2 = g^{ax}$. Now, since Eve has both $g^{x}$ and $g^{y}$ (and chose $a$), she can easily compute $k_1$ and $k_2$. So she can intercept all of Alice's messages, decrypt them, encrypt a new message (if she likes) using Bob's keys, and send that instead, etc.
- 9.3
- Homework 4
10Public-key encryption¶
10.1Overview¶
Advantages over private-key encryption:
- Solves the key distribution problem
- Single key for multiple users
Disadvantages: slower. Also, we still need to ensure that Alice can securely send her public key to Bob without it getting altered by Eve along the way, but we will ignore that in this chapter.
10.2Definitions¶
- Gen
- Input: $1^n$. Output: $(pk, sk)$ (public, private); both have length at least $n$ ($n$ can be determined from the keys)
- Enc
- Given $pk$ and $m$, outputs some ciphertext probabilistically
- Dec
- Given $sk$ and $c$, outputs either the plaintext or "INVALID"
Note that correctness is only required with 1 minus some negligible probability. Which is weird.
CPA-security: We create the standard eavesdropping experiment, except A has the public key which means access to an encryption oracle. So indistinguishable encryptions in the presence of an eavesdropper implies CPA security.
Can never be perfectly secret - with enough time, we could try all possible values of $m$, etc.
For multiple encryptions, see p. 341. Indistinguishable implies, secure under multiple.
For arbitrary-length messages, see p. 346.
10.3Hybrid encryption¶
Use public key and private key methods together. Choose a key, encrypt it using Alice's public key, Alice can decrypt so now you have the same key, which you can use for private key encryption schemes. Useful when $|m| > n$. Usually more efficient. Security analysis is done from p. 350-355.
10.4RSA encryption¶
Textbook RSA is deterministic, thus not CPA secure. See p. 356. NO TIME.
10.5El Gamal¶
Relies on hardness of DDH.
- Gen
- Input $1^n$, public key is $(\mathbb G, q, g, g^x)$, private is $(\mathbb G, q, g, x)$.
- Enc
- Choose random $y$, return $(g^y, g^{xy} \cdot m)$
- Dec
- Compute $m = c_2 / c_1^x$.
The proof of CPA-security is found on p. 366-367.
10.6Security against CCA¶
- Malleable
- For example, given $m$, create $2m$. CCA-secure schemes are never malleable.
Definition: A has access to decryption oracle, outputs 2 messages (in the plaintext space for $pk$), is given a ciphertext $c$, and can continue to decrypt any message except $c$ itself, then tries to guess which message was used. Probability should be 1/2 + negl.
10.7Exercises¶
- 10.1 - public-key encryption is not perfectly secret
- Well, obviously, an unbounded adversary can just encrypt every single message using the public key and see which one gives the ciphertext
- 10.2 - determining the message in a small message space
- Homework 4
- 10.3 - size of ciphertext, superlogarithmic in the security parameter
- Homework 4
- 10.8 - popular choices of $e$ and why
- Homework 5
- 10.11 - CPA-security of El Gamal based on hardness of DDH
- Homework 5
- 10.12 - hybrid encryption for El Gamal
- Homework 5
- 10.17
- Homework 5
11Additional public-key encryption schemes¶
SKIPPED
12Digital signature schemes¶
12.1Overview¶
The public-key counterpart to MACs for preserving message integrity. Advantage over MACs: publicly verifiable, don't need a secret key (and multiple secret keys for multiple recipients). Also, public verifiability implies transferability, so if $A$ verifies $B$, and $B$ verifies $C$, then anyone who trusts A can theoretically trust C too. Also, non-repudiability: if it looks like A signed something, then we can trust that A did indeed sign it
12.2Definitions¶
- Gen
- Input: $1^n$; output: $(pk, sk)$
- Sign
- Input: private key $sk$, message $m$ of variable length. outputs signature $\sigma$ (probabilistically)
- Vrfy
- Input: public key $pk$, message $m$, signature $\sigma$. outputs 1 if valid, 0 otherwise.
True correctness must be satisfied (obviously). Messages can't be modified without losing the validity of the signature, though replay attacks are still possible.
Definition of security: there is an experiment Sig-forge. A is given the public key and oracle access to Sign, then outputs a message (never sent to oracle) and proposed signature. The scheme is existentially unforgeable under an adaptive chosen-message attack if the probability of A outputting a valid sig is negl. (p. 425)
12.3RSA signature¶
"Textbook" RSA:
- Gen
- Run GenRSA to get $(N, e, d)$. Public key: $(N, e)$; private key: $(N, d)$
- Sign
- Given a private key $sk = (N, d)$ and message $m$ (represented in the field of units, $\mathbb Z^*_{N}$), compute $m^d \pmod N$.
- Vrfy
- Given a public key $pk = (N, e)$, message $m$, and signature $\sigma$, compute $\sigma^d \pmod N$ and return 1 if that's equal to $m$.
Correctness holds - recall that $(m^d)^n \mod N = m^{ed \mod \varphi(N)} = m^1 = m \pmod N$
12.3.1Attacks on textbook RSA¶
Highly insecure ...
- No-message attack
- Choose an arbitrary $\sigma$, compute $m = \sigma^e \pmod N$.
- Arbitrary message forging
- Requires two signatures from the signer
- Want to sign $m$. Choose $m_1$, set $m_2 = m / m_1$, get signatures $\sigma^1$ and $\sigma^2$. Then $\sigma = \sigma_1 \cdot \sigma^2$ is a signature for $m$. Proof on p. 427
12.3.2Hashed RSA¶
Modification of textbook RSA to make it harder to attack. The message is hashed (with a good one-way hash function) before it is signed. To verify, compute $H(m)$, check if that is equal to $\sigma^e$.
Obviously $H$ must be collision-resistant. Sadly we don't know of any $H$ that is proven to be secure.
However, if we assume that $H$ is not easily invertible, then the no-message attack is thwarted. Same for arbitrary message forging.
12.4The hash-and-sign paradigm¶
Hashed RSA is useful in that we can now sign arbitrary-length bitstrings rather than just elements of $\mathbb Z^*_N$, by hashing them first.
If we have some signature scheme that is existentially unforgeable under adaptive chosen-message attack, then if we add hashing into the mix, the result is also existentially unforgeable etc. Proof: p. 430.
12.5Lamport's one-time signature scheme¶
Like the one-time pad. Secure only if it signs a single message.
Security definition: in the experiment, A is given the pk and can ask the Sign oracle for a single signature. A then must output the signature for a different message. Probability of this happening must be negligible for the scheme to be existentially unforgeable under a single-message attack.
It works like this: for messages of length $l$, we choose $2l$ random values, feed them into some non-invertible function $f$, and put the inputs in one matrix $X$ and the outputs in another, $Y$ (both $2 \times l$). Then, given a message, for each bit $i$, if that bit is 0, choose the entry in $X$ in the first row, second row is bit is 1 (in the $i$th column). To verify, use the matrix $Y$ (which is the public key) and check that it all works out (p. 433)
12.6Signatures from collision-resistant hashing¶
SKIPPED
12.7The digital signature standard¶
Based on the hardness of discrete log, but no proof.
- Gen
- Input $1^n$, generates $(p, q, g)$ where $p$ and $q$ are primes with $\|q\| = n$, $q\mid(p-1)$, $q^2 \not \mid (p-1)$, and $g$ is a generator of $\mathbb Z^*_p$ with order $q$. Find some hash function with output length $n$. Choose $x$ from $\mathbb Z_q$ randomly, let $y = g^x \pmod p$. Public key: $(H, p, q, g, y)$ and private key is $(H, p, q, g, x)$.
- Sign
- Choose some random $k \in \mathbb Z_q^*$, let $r = (g^k \pmod p) \pmod q)$. Then compute $s = (H(m) + xr)\cdot k^{-1} \pmod q$, output the signature $(r, s)$.
- Vrfy
- Compute $u_1 = H(m)s^{-1} \pmod q$, and $u_2 = r\cdot s^{-1} \pmod q$. Check if $(g^{u_1}g^{u_2} \pmod p) \pmod q$ is equal to $r$.
See p. 446 for proof of correctness.
12.8Certificates and public-key infrastructures¶
Bootstrapping. Single CA, versus multiple CAs, certificate chains. web of trust,
Invalidating certificates: expiration (date on certificate), or revocation (serial number on certificate, users can check some website or something if a certificate is still valid)
12.9Exercises¶
- 12.4 - encoded RSA
- Homework 5
- 12.5 - variable-length MAC vs hash-and-sign
- Homework 5
13Identification¶
Not in textbook. Almost exactly the same as signatures though. Ident:
• An identification scheme is used in the following way. One party Id , who acts as the identifier, runs Gen(1n
) to obtain keys pk and share it with the other party, the verifier.
• When Id wants to identify, a random challenge c is computed by the verifier Vr using pk, Id computes and sends the response ρ ← Identpk(c).
•
Upon receipt of ρ, the Vr, who knows pk, can check whether Checkpk(c,ρ) ? = 1.
Security of Private-Key identification schemes
The identification experiment Ident-fakeA,Π(n):
1. Gen(1n)is run to obtain keys pk.
2. Adversary A is given pk and oracle access to Identpk(·).
Let Q denote the set of challenges whose identification were
requested by A during its execution.
3. A challenge c←{0,1}n is randomly selected and given to A.
4. The adversary A then outputs ρ.
5. The output of the experiment is defined to be 1 iff
(1) Checkpk(c,ρ) = 1, and (2) c ∉ Q
Security of identification schemes.
DEFINITION 12.2 An identification scheme Π = (Gen,Ident,Check) is existentially unforgeable under an adaptive chosen-challenge attack if for all probabilistic polynomial-time adversaries A, there exists a negligible function negl such that:
Pr[Ident-fakeA,Π(n) = 1] ≤ negl(n).