Why point compression shouldn't scare you
Article info
Posted on by Matthias Valvekens
Summary and motivation
The most widely used method to represent ECC public keys represents the key by directly encoding the coordinates of the public key point as-is. However, there’s an alternative method that not only uses less data, but also comes with a number of security benefits (almost) for free.
The “compressed point” convention was encumbered by (dubious) patents for the better part of the past two decades, which has unfortunately severely limited widespread adoption of these ideas. Having said that, the last of these patents expired several years ago, so now we’re definitely free to use point compression as we please.
Below, I’ll attempt to argue why you should be using point compression whenever you can. Spoiler alert: it has very little to do with bandwidth optimisation or performance gains.
Background
An ECC public/private key pair consists of the following data:
- a previously agreed-upon elliptic curve \(E\) over some (finite) field \(\mathbb{K}\);
- a base point \(G\in E\) that generates a large subgroup of \(E(\mathbb{K})\) of prime order;
- a public point (often denoted by \(Q\));
- a private number (often denoted by \(d\)).
The public and private parts are related by \(Q=dG\). As such, the security of ECC hinges on the difficulty of solving the discrete logarithm problem in the subgroup generated by \(G\).
For the purposes of this post, we’ll assume that \(\mathbb{K}=\mathbb{F}_p\) for some prime \(p>3\). The techniques we discuss apply to more general fields as well, but that comes at the cost of some additional complexity. Besides, prime fields cover the vast majority of ECC applications in common use. If you want to know how the scheme works in characteristic 2, have a look at SEC 1, secs. 2.3.3, 2.3.4. Most of the time, we’ll also assume that \(G\) generates all of \(E(\mathbb{K})\) (or, in other words, that the cofactor is one). I’ll point out when that is relevant. Throughout, we identify \(\mathbb{F}_p\) with the ring of residue classes \(\mathbb{Z}/p\mathbb{Z}\).
Since elliptic curves live in the plane \(\mathbb{F}_p^2\), the public point \(Q\) is given by two coordinates \((x,y)\), each of which is a number in \(\mathbb{F}_p\). The most straightforward (and still most common) way to represent the public key \(Q\) is by serialising both of these. However, this representation actually has a lot of redundancy. We’ll explore why that is the case in the next section.
Point compression over prime fields in a nutshell
The encoding procedure
Let’s assume that our elliptic curve \(E\) is given by the equation \(y^2 = x^3 + ax + b\). Here, \(a\) and \(b\) are elements in \(\mathbb{F}_p\). Note that such a rewriting is always possible if \(p\) is a prime greater than \(3\).
Suppose now that \(Q=(x_1, y_1)\) is a point on the curve \(E\) that is not the point at infinity. Then \(x_1\) and \(y_1\) are related as follows: \(y_1^2 = x_1^3 + a x_1 + b\). In other words, \(y_1\) must be a square root of \(x_1^3 + a x_1 + b\) in \(\mathbb{F}_p\). Recall that square roots in fields are unique up to sign 1. In particular, it follows that to specify \(Q\), we only need \(x_1\) and the parity2 of \(y_1\).
In practice curve points are encoded in compressed form as follows.
- Set \(L\) to the bit length of \(p\), and allocate an output buffer of length \(L+1\).
- Let \(Q=(x_1,y_1)\) be a point on \(E\) that is not the point at infinity. We identify \(x_1\) and \(y_1\) with their integer representatives in \([0, p-1]\).
- If \(y_1\) is even, set the first byte of the result to
0x02
. If it is odd, set it to0x03
. - Encode \(x_1\) into \(L\) bytes in big-endian form, and fill the rest of the output buffer with the resulting bytes.
That’s all very straightforward. The decompression procedure is slightly more interesting.
Decoding compressed points
As we noted in the previous paragraph, recovering the \(y\)-coordinate of a curve point given the \(x\)-coordinate and parity bit amounts to taking a square root in \(\mathbb{F}_p\). However, in real-world data, it may happen that the supplied coordinate \(x_1\) is such that \(x_1^3+x_1a+b\) is not a quadratic residue in \(\mathbb{F}_p\)! That means that \(x_1\) cannot possibly be the \(x\)-coordinate of any curve point, and hence the input must be rejected.
After taking that into account, the decoding procedure is not hard to describe.
- Verify that the input starts with
0x02
or0x03
. If not, reject the input 3. - Introduce an auxiliary variable \(s\), the value of which is \(0\) if the first byte of the input is
0x02
, and \(1\) otherwise. - Decode the remaining \(L\) bytes of the input into an integer \(x\), using big-endian encoding. If \(x\notin [0, p-1]\), reject the input.
- Attempt to compute a square root of \(x^3+xa+b\) modulo \(p\), and denote the result by \(y'\). If no square root is found, reject the input.
- If \(s=y'\pmod{2}\), then output \((x,y')\). Otherwise output \((x,p-y')\).
Observe that, by construction, \((x,y')\) and \((x,p-y')\) are always valid curve points.
Security implications
The decoding procedure already illustrates one of the fundamental advantages of using compressed points: provided that the modular arithmetic is implemented correctly, it’s impossible to accidentally admit points that don’t lie on the curve!
Many real-world implementations of ECC-based algorithms don’t bother to verify whether the coordinates they ingest actually correspond to valid curve points. This is very problematic, and a source of dangerous vulnerabilities; see e.g. BMM00. Using point compression gets around this issue by forcing the implementation to compute one of the coordinates using the curve parameters, ensuring that the output is always part of the relevant curve by construction.
Of course, in order for a point to be a valid public key, it must additionally lie in the subgroup of \(E(\mathbb{F}_p)\) generated by the chosen base point \(G\). As it happens, this is one of the reasons why many ECC standards choose their curve parameters such that \(G\) generates the entire group: it absolves implementations of the duty to validate whether or not a point lands in the correct subgroup of \(E(\mathbb{F}_p)\) 4. The security impact of this kind of checks can be subtle; see e.g. BCJZ20, secs. 4.1.3, 5.4 for an analysis specific to Ed25519 5.
Wrapping up
Let’s summarise the main points.
- Any potentially relevant patents on point compression expired years ago.
- Point compression and decompression is not hard to implement.
- Using point compression gets you some extra security, basically for free.
So, if you’re in a position where you don’t have to worry about legacy systems that don’t understand compressed points: please use them!
While RFC 5480 allows both compressed and uncompressed points in certificate public key info data, support for compressed points isn’t a requirement of the specification—presumably because of perceived patent issues. As such, if you’re dealing with legacy clients or public workflows, you might not be able to get away with using compressed points everywhere…
Bibliography
In case you don’t remember why this is the case, here’s a quick proof. Let \(\mathbb{K}\) be a field, \(a\in\mathbb{K}\) an arbitrary element and suppose that \(a^2=a'^2\). Then \(0=a^2-a'^2=(a-a')(a+a')\). Since fields do not have zero divisors, it follows that either \(a=a'\) or \(a=-a'\).↩︎
There’s some abuse of language here: what we’re really asking for is the parity of the unique integer representative \(z\) in the residue class of \(y_1\) such that \(0\leq z<p\). This makes sense, because \(-z = p - z \pmod{p}\), so \(p-z\) is the unique such representative of \(-y_1\), and has opposite parity.↩︎
If you’re also interested in parsing uncompressed points (e.g. for compatibility with systems that don’t use point compression), then you should check for the marker
0x04
; see SEC 1, sec. 2.3.4.↩︎For some algorithms that work with curve parameters where the cofactor is nontrivial, there exist more sophisticated point compression procedures that are also capable of guaranteeing that the output still lands in the correct subgroup. Ristretto255 falls into this category.↩︎
Strictly speaking, the point compression scheme used in (standard) Ed25519 implementations is slightly different from the one discussed here, but the broader point stands.↩︎