Real World Crypto 2021 - Session 12: Crypto for the CloudRWC2021 · Real World Crypto
There’s a github repo
Searchable Encryption: how do you permit keyword searching over a corpus of encrypted documents?
Symmetric: they’re only allowing use of symmetric techniques, for efficiency.
Server might have a search index, a keyvalue store, where PRF(search keyword)->Enc(matching doc ids). All value entries have same size to avoid attacks like freq analysis, so all doc id lists are padded to max size. Client makes search, gets back enc doc IDs, then goes and requests docs by enc id.
The system security is good at hiding keywords used in search in part one (hitting seach index) but breaks down if someone can analyse traffic in requesting docs by enc docid, as frequency analysis is possible there.
SWiSSSE is designed to prevent that kind of leakage. It addresses the second freq analysis attack by padding stores out with fake keywords, fake documents, and mixing up of server’s data returned in various reads. Clients maintain a stash that is used to mix things up over time and write back to the server (didn’t quite get this part).
The more you want to reduce leakage, the more padding must approach worst-case, and that consumes many resources
Their reference impl server uses Redis. PRF = HMAC-SHA256 and Enc = AES-GCM
(Unfortunately the audio in this talk had a really bad echo).
In JWT, the ‘alg’ field, can be (legitimately) set to ‘none’. The problem isn’t this so much as that there is an ‘alg’ field at all!
This allows someone to switch ‘alg’ to a symmetric scheme, after which if a client provides their public key it’d be misinterpreted as symmetric key material. Then anyone could mint a valid token without a problem. The key ID ‘kid’ field can be similarly attacked - switch out a real key for one the attacker controls.
These sorts of issues are seen elsewhere, not just JWT. It’s a result of attaching to the ciphertext metadata that dictates what’s needed to authenticate and decrypt.
Similar vuln was found in AWS S3 Client-Side Encryption library (CVE-2020-9812)
Tink Keys: In Google’s Tink API they’ve ensured these sorts of vulnerabilities are avoided. The metadata is embedded into the key itself. Key rotation is supported via keysets. You can tell a Google product that uses Tink as keys are expressed in B64 and it’s always start with ‘A’ (this is 01, from the key format’s leading version field). There follows the key ID to identify the key to be used in the keyset.
- never trust the ciphertext!
- ensure key metadata is part of the key itself - i.e. algo type, algo’s required params
- key IDs can’t be authenticated as they must be attached to a ciphertext, so key IDs should only be meaningful within a keyset, and should be equally trusted.
Many apps are transitioning their data storage requirements to cloud-hosted data stores. Those data stores (e.g. AWS S3, ElastiCache) should not necessarily be trusted.
Apps may encrypt their key-value pairs for security
- e.g. (K, V) -> (PRF(K), AEAD(V))
but this doesn’t prevent a malicious cloud provider performing access-pattern attacks.
e.g. on medical data, indexed by patient condition - the cloud provider doesn’t know which patient or specifically which condition, but could obtain independently from medical literature the prevalence of conditions. Then with frequency analysis of the data accesses made, work out which PRF(K) is which medical condition. This has been seen in practice.
- Oblivious RAM (ORAM) and Private Information Retrieval (PIR). Threat is an active adversary which might inject it’s own accesses. Security is strong but performance is poor; 8x storage requirements and 1600x bandwidth!
- Searchable Symmetric Encryption. Threat is passive adversary. SSE doesn’t do well at preventing access pattern attacks but performance is high.
Pancake introduces a trusted proxy on client side, which smoothes the accesses being made by the client, so that access patterns to the server appear to be uniformly random. Trhoughput is 100x higher than ORAM.
Possible approaches to smoothing:
- increasing size of the encrypted corpus by including fake or replicated data items. Runs risk of blowing up server-side storage requirements
- injecting fake accesses to the unpopular items, to achieve uniform distribution. This wastes bandwidth and increases datastore load
Pancake is a mix of the two above.
- replicate just enough of the keys to partially smooth distribution, and fill in the rest with injected fake accesses.
Challenge is to ensure adversary can’t distinguish real and fake accesses by e.g. timing analysis.
They’ve achieved max 3x bandwidth and 2x storage overhead.