PDF As She Is Wrote #1: How are signers identified?

Series introduction

This is the first installment in PDF as She Is Wrote 1. With this post, I intend to start a series on the way the PDF standards (mainly ISO 32000-x) are implemented in the “real world”. The focus is on implementation behaviour, not the minutiae of what the standard says (or should say), although I’ll occasionally comment on that as well. Given my area of interest, a significant portion of the content will relate to digital signing in some way.

There’s no set publication schedule, I’ll write things down as I think of them.

Signer identification in CMS

People who’ve worked with me will undoubtedly know that I’m a zealous advocate for the Cryptographic Message Syntax (CMS), defined in RFC 5652. CMS is the spiritual successor to PKCS #7, and is sometimes still referred to as such. The SignedData type in CMS is particularly popular in all sorts of digital signing schemes, where it’s used as a vehicle to encode a “raw” signature together with the relevant metadata necessary to understand it. PDF is part of the CMS club too: virtually everyone uses CMS SignedData objects to encode signatures in PDF documents nowadays2.

But let’s not get ahead of ourselves too much: we’re still talking about CMS. The design philosophy of CMS (as interpreted by yours truly—don’t put too much stock in that) is one of maximal extensibility: the CMS specification itself makes relatively few assumptions about what people put in. As far as SignedData is concerned, the specification only prescribes how to correctly encode SignedData objects, and sets out some basic validation rules. The idea is that other standards can then profile CMS to obtain a subset that’s more appropriate for their use cases.

One of the things that CMS does not do (by default) is to pin the signature to a specific certificate. This is actually a fairly common misconception. From a mathematical point of view, the signature validation procedure doesn’t really care about certificates, it only needs to know the signer’s public key. There can be more than one certificate made out to the same public key, for any number of reasons. The certificate is only relevant to verify the binding between a key and its owner, which is a different problem entirely. However, it’s important to note that some profiles of CMS do pin the signature to a specific certificate3, but that mechanism is beyond the scope of this post.

So, how does CMS identify signers? To answer that question, we need to take a look at some definitions from RFC 5652.

SignedData ::= SEQUENCE {
       version CMSVersion,
       digestAlgorithms DigestAlgorithmIdentifiers,
       encapContentInfo EncapsulatedContentInfo,
       certificates [0] IMPLICIT CertificateSet OPTIONAL,
       crls [1] IMPLICIT RevocationInfoChoices OPTIONAL,
       signerInfos SignerInfos }

SignerInfo ::= SEQUENCE {
       version CMSVersion,
       sid SignerIdentifier,
       digestAlgorithm DigestAlgorithmIdentifier,
       signedAttrs [0] IMPLICIT SignedAttributes OPTIONAL,
       signatureAlgorithm SignatureAlgorithmIdentifier,
       signature SignatureValue,
       unsignedAttrs [1] IMPLICIT UnsignedAttributes OPTIONAL }

SignerIdentifier ::= CHOICE {
       issuerAndSerialNumber IssuerAndSerialNumber,
       subjectKeyIdentifier [0] SubjectKeyIdentifier }

The relevant definition is the one for the SignerIdentifier type. This is a union type between IssuerAndSerialNumber and SubjectKeyIdentifier. Essentially, this is saying that we can either identify the signer by means of an IssuerAndSerialNumber value, or a SubjectKeyIdentifier value. We’ll talk about what each of these mean in a minute, but the way they’re used by the validator is more or less the same in either case: the validator searches its certificate store until it finds a certificate matching the identifier, and then tries to validate the signature against the public key found in that certificate4. The certificates entry would typically contain all signer’s certificates (among other potentially relevant ones).

It’s important to observe that the sid field is not actually part of the portion of the payload that’s cryptographically signed. This allows for some flexibility, but is also potentially problematic in many cases. This is a large part of the raison d’être for attributes like ESS-signing-certificate.

The IssuerAndSerialNumber way

IssuerAndSerialNumber is defined like this:

IssuerAndSerialNumber ::= SEQUENCE {
       issuer Name,
       serialNumber CertificateSerialNumber }

CertificateSerialNumber ::= INTEGER

The meaning of IssuerAndSerialNumber as an identifier is very straightforward: it tells the validator who the issuer of the relevant certificate is, and also mentions the certificate’s serial number. Since best practices dictate that a given CA should never issue more than one certificate with a given serial, this should uniquely identify the certificate.

The SubjectKeyIdentifier way

If you take a look at RFC 5652, you’ll notice that the definition of SubjectKeyIdentifier is simply this:

SubjectKeyIdentifier ::= OCTET STRING

This doesn’t say all that much. In an X.509 context, this value is intended to be compared against the value of a certificate’s subject key identifier (SKI) extension5. It’s wrong to expect this value to be generated by any particular algorithm, but they’re generally derived from the public key by a hashing procedure. See RFC 5280, sec. 4.2.1.2 for examples.

The advantages of this approach are twofold:

  • There’s no expectation of global uniqueness for subject key identifiers, so it becomes possible to have a signature validate against more than one signer’s certificate. Imagine a scenario where a signer doesn’t know ahead of time what certificate will be used to verify their identity.

  • It (theoretically) enables the use of CMS-based signatures together with non-X.509 certificates, in particular in a context where there’s no Name notion around.

What about PDF?

The reality

The CMS specification requires validators to implement support for both alternatives (see RFC 5652, sec. 5.3). This requirement has been part of CMS since 2002, and since both ISO 32000-1 and ISO 32000-2 normatively cite CMS for signature generation, it would seem logical for PDF signature validators to support both alternatives.

However, that’s not what we see in the wild: the vast majority of implementations in major PDF processors only support identifying the signer by issuer and serial number. If interoperability is a concern, you’re therefore better off generating your signatures with an IssuerAndSerialNumber in the sid field.

Some historical speculation

PDF had support for digital signatures long before it became an ISO standard, and in those times, PKCS #7 (the predecessor to CMS) was more widely known. In PKCS #7, the approach based on IssuerAndSerialNumber was the only available choice.

The current CMS definition still shows some hints of this history, as indicated by the fact that the subjectKeyIdentifier in the definition below has a context-specific tag of 0, while the issuerAndSerialNumber field is universally tagged.

SignerIdentifier ::= CHOICE {
       issuerAndSerialNumber IssuerAndSerialNumber,
       subjectKeyIdentifier [0] SubjectKeyIdentifier }

This tagging choice ensures compatibility with PKCS #7 in both directions, as long as the signer makes sure to identify themselves using the IssuerAndSerialNumber option.

If you care about interoperability with other PDF processors as a signer, stick to IssuerAndSerialNumber in your PDF signatures. If you’re implementing a validator, support both IssuerAndSerialNumber and SubjectKeyIdentifier.

Bibliography

ETSI TS 101 733
ETSI. ETSI TS 101 733: Electronic Signatures and Infrastructures (ESI); CMS Advanced Electronic Signatures (CAdES). V2.2.1. 2013.
ISO 32000-1
ISO 32000-1, Document management — Portable document format — Part 1: PDF 1.7
ISO 32000-2
ISO 32000-2, Document management — Portable document format — Part 2: PDF 2.0
PKCS #7
INTERNET ENGINEERING TASK FORCE (IETF). RFC 2315: PKCS #7: Cryptographic Message Syntax [online]. Version 1.5. 1998. Available from: https://tools.ietf.org/html/rfc2315.html
RFC 5126
INTERNET ENGINEERING TASK FORCE (IETF). RFC 5126: CMS Advanced Electronic Signatures (CAdES) [online]. 2008. Available from: https://tools.ietf.org/html/rfc5126.html
RFC 5280
INTERNET ENGINEERING TASK FORCE (IETF). RFC 5280: Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile [online]. 2008. Available from: https://tools.ietf.org/html/rfc5280.html
RFC 5652
INTERNET ENGINEERING TASK FORCE (IETF). RFC 5652: Cryptographic Message Syntax (CMS) [online]. 2009. Available from: https://tools.ietf.org/html/rfc5652.html

  1. For the uninitiated: The series title PDF as She Is Wrote is a nod to this magnificent piece of history.↩︎

  2. This wasn’t always the case, but all non-CMS signature encodings were deprecated in ISO 32000-2.↩︎

  3. CAdES (see RFC 5126, ETSI TS 101 733) requires either the ESS-signing-certificate or the ESS-signing-certificate-v2 attribute to be part of the signature’s signed attributes, thus binding it to one particular signer’s certificate.↩︎

  4. Obviously, there’s more to it than that, but that’s a story for another time.↩︎

  5. In the PKIX profile defined in RFC 5280, the SKI extension is optional for end-entity (i.e. non-CA) certificates. Hence, this way of identifying signers isn’t universally applicable.↩︎