PDF Tags & Reading Order, Explained for Non-Developers

Tags are the invisible structure that makes a PDF accessible. Learn what they are, why reading order matters, and how screen readers use them.

PDF Compliance Team | February 27, 2026 | 8 min read
Open a PDF and you see pages, columns, and pictures. A screen reader does not see any of that — it reads a hidden layer underneath called the tag tree. Tags tell assistive technology what each piece of content is (a heading, a paragraph, a list, a table) and the order in which it should be read. When the tags are right, a blind or low-vision reader experiences the document the same way you do. When they are missing or scrambled, the same PDF becomes confusing or unusable.

This guide explains, in plain language, what PDF tags are, how reading order works, why the order a person sees can differ from the order a screen reader hears, and how to fix it.

What PDF tags actually are

A PDF stores two things: the visual layer (where ink lands on the page) and, if the file is "tagged," a separate structure layer describing what that ink means. That structure layer is the tag tree — sometimes called the logical structure.

Think of it like the outline behind a well-formatted web page or Word document. The visual layer makes a line of text look big and bold; the tag says "this is a Heading 1." Sighted readers infer structure from visual cues. Assistive technology cannot — it relies entirely on the tags.

A PDF with no tags at all gives a screen reader little more than a raw stream of text: there are no headings, lists, or tables to announce, and the reading order has to be guessed. This is why "tagged PDF" is the foundation of every standard we cover in the WCAG 2.2 and PDF/UA pillar guide: the tags are how an accessible PDF is built.

How assistive technology uses tags

When someone navigates a tagged PDF with a screen reader, the tags drive almost everything:

  • Navigation by structure. Users jump heading to heading, or pull up a list of all headings, to skim a document — just like a sighted reader scans a page. That only works if your headings are tagged as headings.
  • Announcing content type. The screen reader says "heading level 2," "list, 5 items," "table with 3 columns," so the listener understands the shape of the content, not just the words.
  • Reading in the right sequence. The tag tree's order is the order content is spoken.
  • Tables read meaningfully. Properly tagged table headers let the screen reader announce "Row 2, Region: West, Sales: 1,200" instead of reading a meaningless stream of numbers. (More on this in our guide to accessible tables in PDFs.)
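For readers who work alongside developers, the idea above can be sketched in a few lines of Python. This is a simplified model for illustration only, not how any real PDF library stores tags: the `Tag` class and the `spoken_order` function are made-up names, and a real tag tree carries far more detail.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """One node in a simplified, hypothetical tag tree."""
    kind: str                                    # tag name, e.g. "H1", "P", "L", "LI"
    text: str = ""                               # words carried by this node, if any
    children: list["Tag"] = field(default_factory=list)


def spoken_order(tag: Tag) -> list[str]:
    """Walk the tree top to bottom, parents before children.

    The order of the returned lines stands in for the order
    a screen reader would speak the content.
    """
    lines = [f"{tag.kind}: {tag.text}"] if tag.text else []
    for child in tag.children:
        lines.extend(spoken_order(child))
    return lines


document = Tag("Document", children=[
    Tag("H1", "Annual Report"),
    Tag("P", "Sales grew this year."),
    Tag("L", children=[
        Tag("LI", "West region"),
        Tag("LI", "East region"),
    ]),
])

for line in spoken_order(document):
    print(line)
```

The printed lines come out in exactly the order the nodes sit in the tree: heading, paragraph, then the two list items. Rearranging the `children` list changes what is spoken first, even though nothing about the page's appearance is modeled here at all.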

Common tag types and what they mean

You do not need to memorize the full tag set, but recognizing the common ones helps you spot problems. These are the standard structure tags:

Tag                     Meaning
H1 to H6                Headings, in nested levels (don't skip levels)
P                       A paragraph of body text
Figure                  An image or graphic (needs alt text)
Table / TR / TH / TD    A table, its rows, header cells, and data cells
L / LI / LBody          A list, each list item, and the item's content
Link                    A clickable link (with the link text as its label)
Artifact                Decorative or repeating content to be ignored

Two things matter as much as the tag names themselves: tags should be nested logically (list items inside a list, header cells inside a table) and heading levels should not skip (an H1 followed by an H3 with no H2 confuses navigation). Figures need meaningful alt text so the image isn't a silent gap in the reading.
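Both rules are mechanical enough that a checker can test them. The Python sketch below shows roughly what such a check might look like. It is an illustration under simplifying assumptions, not the logic of any particular accessibility checker: the function names are invented, the input is a plain list of tag names rather than a real PDF, and the parent rules are deliberately narrower than what real PDFs allow.

```python
# Simplified parent rules; real PDFs allow more (for example, rows inside THead or TBody).
EXPECTED_PARENT = {"LI": "L", "LBody": "LI", "TR": "Table", "TH": "TR", "TD": "TR"}
HEADINGS = ("H1", "H2", "H3", "H4", "H5", "H6")


def find_skipped_headings(tags: list[str]) -> list[str]:
    """Report headings that jump more than one level deeper than the previous heading."""
    problems = []
    previous_level = 0  # 0 means "no heading seen yet"
    for tag in tags:
        if tag not in HEADINGS:
            continue  # paragraphs, lists, and so on do not affect heading levels
        level = int(tag[1])
        if level > previous_level + 1:
            before = f"H{previous_level}" if previous_level else "the start of the document"
            problems.append(f"{tag} appears after {before}")
        previous_level = level
    return problems


def find_bad_nesting(pairs: list[tuple[str, str]]) -> list[str]:
    """Report (tag, parent) pairs where the tag sits inside the wrong kind of parent."""
    problems = []
    for tag, parent in pairs:
        expected = EXPECTED_PARENT.get(tag)
        if expected is not None and parent != expected:
            problems.append(f"{tag} is inside {parent}, expected {expected}")
    return problems


print(find_skipped_headings(["H1", "P", "H3", "H2", "H3"]))
print(find_bad_nesting([("LI", "L"), ("TD", "Table"), ("LI", "P")]))
```

In the first example only the jump from H1 to H3 is flagged; stepping back up from H3 to H2 is fine, because only skipping downward leaves a gap in the outline.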

Reading order vs visual order

This is the concept that trips up the most people. Visual order is how content is arranged on the page for the eye. Reading order is the sequence stored in the tag tree — the order a screen reader speaks the content. Ideally they match. Often they don't.

The two diverge whenever a layout is more sophisticated than a single top-to-bottom column:

  • Multi-column layouts. Your eye reads down column one, then jumps to column two — but the tags may run left-to-right across both columns, interleaving the two columns line by line into nonsense.
  • Sidebars and pull quotes. A boxed callout sitting beside the main text might be read in the middle of a sentence if its tag falls there in the tree.
  • Footnotes and captions. A footnote at the bottom of the page may be announced right after the word it references — or far away from it — depending on where it sits in the order.
  • Headers, footers, and page numbers. Repeating page furniture can interrupt the flow if it is woven into the reading order instead of marked as decorative.

The key insight: moving something visually does not change its place in the tag tree, and vice versa. They are two separate layers, and accessibility depends on the tag order, not what your eyes do.
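The multi-column case is easy to see with a toy example. The Python sketch below treats each column as a list of lines and compares two ways of ordering them. The function names and the sample text are invented for illustration, and it assumes both columns have the same number of lines.

```python
from itertools import chain


def read_across(left: list[str], right: list[str]) -> list[str]:
    """Wrong for columns: take one line from each column in turn, straight across the page.

    Assumes both columns have the same number of lines.
    """
    return list(chain.from_iterable(zip(left, right)))


def read_by_column(left: list[str], right: list[str]) -> list[str]:
    """Right for columns: finish the left column, then read the right one."""
    return left + right


left_column = ["Tags describe", "what content is."]
right_column = ["Order decides", "when it is read."]

print(" ".join(read_across(left_column, right_column)))
print(" ".join(read_by_column(left_column, right_column)))
```

The first line printed is the interleaved jumble ("Tags describe Order decides what content is. when it is read."), and the second is the text as the author meant it. The words on the page are identical in both cases; only the stored sequence differs.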

How reading order breaks

Reading order problems usually come from how a PDF was created:

  • Exported from a tool that ignored structure. Many "Save as PDF" or "Print to PDF" paths produce untagged files or tag content in the order it was drawn, not the order it should be read.
  • Layouts built in a design app, where text frames were placed and rearranged on the canvas; the export can follow the order the frames were created in rather than the visual flow.
  • Scanned documents that were run through OCR but never given a proper structure.
  • Manual edits that added a text box or image late, dropping it at the end of the tree even though it appears mid-page.

The result is the same: the file can look perfect and still read like a jumble.
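A small Python sketch shows how a late edit produces that jumble. It is a toy model, not the behavior of any specific export tool: the `Frame` class and its `created` and `intended` fields are hypothetical, standing in for "when the frame was added" and "where it belongs in the reading sequence."

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """A hypothetical text frame in a layout."""
    text: str
    created: int   # order in which the frame was added to the layout
    intended: int  # position the frame should have in the reading sequence


def order_by_creation(frames: list[Frame]) -> list[str]:
    """What an export produces if it follows creation order."""
    return [f.text for f in sorted(frames, key=lambda f: f.created)]


def order_by_intent(frames: list[Frame]) -> list[str]:
    """What the reading order should be."""
    return [f.text for f in sorted(frames, key=lambda f: f.intended)]


frames = [
    Frame("Introduction", created=1, intended=1),
    Frame("Conclusion", created=2, intended=3),
    Frame("Callout added late", created=3, intended=2),
]

print(order_by_creation(frames))
print(order_by_intent(frames))
```

The callout sits mid-page visually, but because it was created last, a creation-order export would speak it after the conclusion.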

How to fix tags and reading order

The single most effective fix is upstream, before the PDF exists:

  1. Tag from a well-structured source. Build the original document with real structure: use the heading styles in Word or Google Docs, real list and table tools, and the built-in accessibility export. A clean source usually produces clean tags and a correct reading order with little extra work. This is far cheaper and more reliable than fixing tags after the fact.
  2. If you only have the PDF, fix it in a PDF editor. When you can't regenerate from source, a PDF editor with accessibility tools lets you inspect the tag tree, correct tag types, re-nest items, and reorder the reading sequence by hand. Validate with an accessibility checker as you go.
  3. Test with the experience, not just the checker. Read the document with a screen reader (or have someone who relies on one review it). Automated tools catch missing tags; only listening catches a reading order that is technically tagged but still wrong.

For a step-by-step walkthrough of cleaning up an existing file, see how to remediate an inaccessible PDF.

Artifacts: telling the reader what to ignore

Not everything on a page should be read aloud. Decorative lines, background images, watermarks, repeating page numbers, and running headers add nothing for a screen reader user — and reading them on every page is exhausting.

The fix is to mark this content as an artifact. An artifact is part of the visual layer but is deliberately left out of the reading order, so assistive technology skips it. Used well, artifacts keep the spoken document clean and focused on actual content. The same principle applies to purely decorative images: rather than writing alt text for them, mark them as artifacts so they're ignored entirely (our alt text guide covers when an image is decorative versus informative).
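In code terms, an artifact is simply content that carries an "ignore me" flag. The Python sketch below models that, along with one rough way to spot artifact candidates: text that repeats on every page. Both are illustrative assumptions, not a real PDF API. `PageItem`, `spoken_text`, and `repeated_on_every_page` are invented names, and the repetition check is a heuristic that would miss page numbers, since those change from page to page.

```python
from dataclasses import dataclass


@dataclass
class PageItem:
    """A hypothetical piece of page content."""
    text: str
    is_artifact: bool = False  # True means "visible on the page, but skip when reading"


def spoken_text(items: list[PageItem]) -> list[str]:
    """Keep only the content that should be read aloud."""
    return [item.text for item in items if not item.is_artifact]


def repeated_on_every_page(pages: list[list[str]]) -> set[str]:
    """Heuristic: text found on every page is a candidate for artifact marking.

    Needs at least two pages to say anything useful.
    """
    if len(pages) < 2:
        return set()
    common = set(pages[0])
    for page in pages[1:]:
        common &= set(page)  # keep only text also present on this page
    return common


page = [
    PageItem("Acme Annual Report", is_artifact=True),  # running header
    PageItem("Sales grew this year."),
    PageItem("Page 2", is_artifact=True),              # page number
]
print(spoken_text(page))

pages = [
    ["Acme Annual Report", "Sales grew this year."],
    ["Acme Annual Report", "Costs fell."],
]
print(repeated_on_every_page(pages))
```

A person still needs to confirm each candidate, because a phrase can legitimately repeat on every page and be real content.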

Key takeaways

  • Tags are the hidden structure layer of a PDF; screen readers rely on them entirely, because they cannot see the visual layout.
  • Common tags include headings (H1 to H6), paragraphs, figures, tables, lists, and links, and they must be nested logically with no skipped heading levels.
  • Visual order and reading order are different layers. They diverge in multi-column layouts, sidebars, footnotes, and page furniture, which is how reading order "breaks."
  • The best fix is to tag from a well-structured source; otherwise repair the tag tree in a PDF editor and verify with a real screen reader, not just an automated checker.
  • Mark decorative and repeating content as artifacts so it's skipped instead of cluttering the reading experience.
