The evolution of the PDF

How was PDF created?

The Portable Document Format (PDF) was developed by Adobe Inc. and released in 1993. Since then, it has evolved from the original idea of a simple, platform-independent digital sheet of paper into a format with a wide range of capabilities that meets the needs of many different industries.

How many PDFs are there?

Because the format is so versatile and was so successful in the market, countless documents were created that no longer matched the original idea of PDF. In addition, the many software products that produce PDF differ in quality and in the way they build the files, which resulted in a large, heterogeneous body of PDF documents. As of 2022, the total number of PDF documents was estimated at 3 to 5 trillion.

What are the problems caused by PDF?

Many of these documents were no longer suitable for further processing in standardized business processes. Problems became increasingly common, especially in archiving. The need for higher quality was recognized, and the idea of PDF/A was born. This created the basis for a new ISO standard.

What you should know about PDF/A and long-term archiving

What is PDF/A?

PDF/A builds on PDF. More precisely, PDF/A-1, defined in ISO 19005-1, is based on PDF 1.4, while PDF/A-2, defined in ISO 19005-2, is based on PDF 1.7, which in turn is defined in ISO 32000-1. PDF/A specifies which features of the underlying PDF standard are mandatory and which are not allowed. For example, PDF/A excludes dynamic or potentially harmful content and dependencies on external resources.

What is good PDF archiving based on?

PDF/A (Portable Document Format for Archiving) is the recognized standard for archiving electronic documents. In both industry and the public sector, PDF/A is a globally established and proven format. The PDF/A standard fulfills all important requirements and is constantly evolving.

What many process and document managers realize too late is the difference in quality between products. This applies to both the creation and the verification of PDF/A documents. Most products in the PDF/A environment are content to cover only ISO 19005, but not the underlying ISO 32000. As a result, many products appear to serve the desired purpose at first glance and are free or inexpensive. Only a serious evaluation reveals the true differences.

What are the advantages of PDF/A?

The format follows the principle of the self-contained document. This means that the visual appearance of a document is preserved over time, regardless of the tools and systems used to create, store and reproduce it.

This standard does not specify the method or purpose of archiving. It merely defines a standard for electronic documents that is intended to guarantee that a document can be represented faithfully in the future.

Therefore, the document must not refer directly or indirectly to an external source. Examples of such references are an externally stored image or a font that is not embedded in the document itself.

The PDF/A standard consists of a set of rules that specify under what conditions a document is PDF/A compliant. This rule set is leaner than the PDF standard itself, because it builds on the PDF standard rather than repeating it.

PDF/A Compliance

What are the versions and conformance levels of PDF/A?

PDF/A is designed as a multi-part series of standards: PDF/A-1, PDF/A-2, PDF/A-3 and PDF/A-4. A part issued later does not replace or supersede earlier ones in any way. For example, previously created PDF/A-1 compliant documents remain valid for long-term archiving. They do not need to be modified, i.e. an "upgrade" to PDF/A-2 is not necessary.

PDF/A versions 1 to 3 are additionally subdivided into two or three conformance levels. These indicate whether a document, in addition to unambiguous visual reproducibility (Basic = b), also supports Unicode text (Unicode = u) or barrier-free use (Accessibility = a). PDF/A-4 defines only two levels, which depend on the content or intended use.

Which PDF/A best suits my requirements?

First of all, ask yourself what type of content your documents need to cover. More specifically: do you need to archive other formats in addition to pure PDF? If so, there are two obvious solutions.

PDF/A-3: This standard allows non-PDF documents to be embedded in a PDF, similar to an e-mail with an attachment. A good example is e-invoice formats such as ZUGFeRD, which use PDF/A-3 as a carrier format and embed an XML file in it.

PDF/A-2 and other permitted document formats: There are cases where conversion to PDF/A does not make sense. Converting Excel spreadsheets to PDF/A, for example, is feasible and useful in some cases, but information such as formulas is lost in the process. If the archive needs to preserve this information, the file should not be converted to PDF/A in such cases.

Why should a PDF/A document be validated?

PDF/A validation can be compared to the TÜV, the German technical inspection. The inspection must first be passed before the certificate is issued. The same approach should be taken with PDF/A, because not every document that is labeled PDF/A actually complies with PDF/A.

A PDF/A validator checks the conformance of a PDF document against a PDF/A version and conformance level, such as PDF/A-2u.
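To illustrate why a label alone proves nothing: a PDF/A document declares its version and conformance level in its XMP metadata, using the pdfaid:part and pdfaid:conformance properties. The following Python sketch reads only that declared claim from an XMP packet that has already been extracted from the file. The function name claimed_pdfa_flavor and the sample packet are hypothetical and serve only as illustration; extracting the packet from a PDF and checking the actual rules of ISO 19005 are what a real validator does and are not shown here.

```python
import xml.etree.ElementTree as ET

# XML namespaces used in XMP packets; pdfaid carries the PDF/A claim.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"

# Hypothetical sample packet, assumed to be already extracted from a PDF.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>2</pdfaid:part>
      <pdfaid:conformance>U</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def claimed_pdfa_flavor(xmp_xml):
    """Return the claimed flavor, e.g. 'PDF/A-2u', or None if there is no claim.

    This reads the declaration only. It does not check conformance.
    """
    root = ET.fromstring(xmp_xml)
    part_key = f"{{{PDFAID_NS}}}part"
    level_key = f"{{{PDFAID_NS}}}conformance"
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # XMP allows a simple property as an attribute or as a child element.
        part = desc.get(part_key) or desc.findtext(part_key)
        level = desc.get(level_key) or desc.findtext(level_key)
        if part:
            return f"PDF/A-{part.strip()}{(level or '').strip().lower()}"
    return None


print(claimed_pdfa_flavor(SAMPLE_XMP))  # PDF/A-2u
```

A document for which this returns PDF/A-2u merely claims to be PDF/A-2u. Whether the claim holds is exactly what the validator has to establish.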

What are the PDF/A versions and conformance levels?

PDF/A versions:

PDF/A-1: 2005, based on PDF 1.4.

PDF/A-2: 2011, based on PDF 1.7 (ISO 32000-1).

PDF/A-3: 2012, based on PDF 1.7 (ISO 32000-1).

PDF/A-4: 2020, based on PDF 2.0 (ISO 32000-2).

Conformance levels:

b: The lowest level, which describes all basic requirements.

u: Same requirements as b, and additionally requires that text can be mapped to Unicode. Not available for PDF/A-1.

a: The highest level, additionally requires Accessibility.
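The combinations listed above can be written down as a small lookup table, which is handy for rejecting an impossible target such as PDF/A-1u before a document is sent to a validator. This is a minimal Python sketch: the names LEVELS_BY_PART, BASE_PDF and parse_flavor are hypothetical, and PDF/A-4 is deliberately left out because its two levels are not named in this article.

```python
import re

# Conformance levels per PDF/A part, as listed above.
# PDF/A-4 is omitted: its levels are not named in this article.
LEVELS_BY_PART = {
    1: {"a", "b"},
    2: {"a", "b", "u"},
    3: {"a", "b", "u"},
}

# PDF version each part is based on, as listed above.
BASE_PDF = {
    1: "PDF 1.4",
    2: "PDF 1.7 (ISO 32000-1)",
    3: "PDF 1.7 (ISO 32000-1)",
}

_FLAVOR = re.compile(r"PDF/A-(\d)([a-z])")


def parse_flavor(flavor):
    """Split a string such as 'PDF/A-2u' into (part, level).

    Raises ValueError if the string is malformed or the combination
    does not appear in the table above.
    """
    match = _FLAVOR.fullmatch(flavor.strip())
    if match is None:
        raise ValueError(f"not a recognized PDF/A flavor: {flavor!r}")
    part, level = int(match.group(1)), match.group(2)
    if level not in LEVELS_BY_PART.get(part, set()):
        raise ValueError(f"level {level!r} is not listed for PDF/A-{part}")
    return part, level


part, level = parse_flavor("PDF/A-2u")
print(part, level, BASE_PDF[part])  # 2 u PDF 1.7 (ISO 32000-1)
```

Calling parse_flavor("PDF/A-1u") raises ValueError, because level u is not available for PDF/A-1.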

