eDoc Gorilla · Article · 7 min read

How PDF Conversion Works

Converting to and from PDF sounds simple, but under the hood several different techniques are at work depending on the source and target format.

Converting other formats into PDF

When you convert a Word document to PDF, the converter walks through the document's structure (paragraphs, headings, tables, images) and re-emits each element as PDF drawing instructions on a fixed page.
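
As a rough illustration of that walk, the sketch below turns a list of document elements into PDF text-drawing operators (`BT`, `Tf`, `Td`, `Tj`, `ET`). The `Block` type, the font sizes, and the `/F1` font resource name are assumptions made for this example. A real converter also wraps long lines, embeds fonts, and handles tables and images.

```python
from dataclasses import dataclass

@dataclass
class Block:
    kind: str   # "heading" or "paragraph" (hypothetical element kinds)
    text: str

def escape_pdf_text(s: str) -> str:
    # Backslash, "(" and ")" must be escaped inside a PDF literal string.
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

def emit_page(blocks, page_height=792.0, margin=72.0):
    """Turn blocks into PDF content-stream operators for a single page."""
    ops = []
    y = page_height - margin
    for block in blocks:
        size = 18 if block.kind == "heading" else 12  # assumed font sizes
        y -= size * 1.2  # move down by one line of leading
        # /F1 is assumed to be a font declared in the page's resources.
        ops.append(
            f"BT /F1 {size} Tf {margin:g} {y:g} Td "
            f"({escape_pdf_text(block.text)}) Tj ET"
        )
    return ops

doc = [Block("heading", "Report"), Block("paragraph", "Sales (Q1) rose.")]
page_ops = emit_page(doc)
```

Each operator string places one run of text at an absolute position, and the output no longer records which run was a heading and which was a paragraph.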

Image conversions are simpler: each image is placed on its own PDF page, either at its native dimensions or scaled to fit a chosen paper size.
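
The scaling step is mostly arithmetic. Here is a minimal sketch, assuming an A4 page of roughly 595 x 842 points, a 36-point margin, and a known image resolution. The function name and its defaults are illustrative and not taken from any particular converter.

```python
def fit_image(img_w_px, img_h_px, dpi=96.0,
              page_w=595.0, page_h=842.0, margin=36.0):
    """Return (width, height, x, y) in points for an image centered on the page."""
    # Convert pixels to points: there are 72 points per inch.
    w = img_w_px * 72.0 / dpi
    h = img_h_px * 72.0 / dpi
    max_w = page_w - 2 * margin
    max_h = page_h - 2 * margin
    # Scale down only, keeping the aspect ratio; small images are not enlarged.
    scale = min(1.0, max_w / w, max_h / h)
    w, h = w * scale, h * scale
    # Center the scaled image on the page.
    x = (page_w - w) / 2
    y = (page_h - h) / 2
    return w, h, x, y

placement = fit_image(1920, 1080)
```

Using a single scale factor for both axes is what keeps the image from being stretched.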

Converting PDF back into editable formats

Going the other direction is harder. Unless it carries optional structure tags, a PDF does not store paragraphs or tables, only positioned text fragments. Converters have to group nearby fragments into lines, lines into paragraphs, and detect tables and lists.
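
The grouping step can be sketched as follows. The `Fragment` type, the baseline tolerance, and the paragraph-gap threshold are assumptions chosen for illustration. Production converters also weigh font size, indentation, and column layout.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    x: float   # left edge, in points
    y: float   # baseline, in points (PDF origin is at the bottom-left)
    text: str

def group_into_lines(fragments, y_tol=2.0):
    """Cluster fragments whose baselines lie within y_tol of each other."""
    lines = []
    # Visit fragments top-to-bottom (descending y), then left-to-right.
    for frag in sorted(fragments, key=lambda f: (-f.y, f.x)):
        if lines and abs(lines[-1][0].y - frag.y) <= y_tol:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    for line in lines:
        line.sort(key=lambda f: f.x)  # restore reading order within the line
    return lines

def group_into_paragraphs(lines, line_height=14.0, gap_factor=1.5):
    """Start a new paragraph when the vertical gap exceeds the threshold."""
    paragraphs = []
    prev_y = None
    for line in lines:
        y = line[0].y
        text = " ".join(f.text for f in line)
        if prev_y is not None and (prev_y - y) <= line_height * gap_factor:
            paragraphs[-1] += " " + text
        else:
            paragraphs.append(text)
        prev_y = y
    return paragraphs

frags = [
    Fragment(120, 700.0, "world"),
    Fragment(72, 700.5, "Hello"),
    Fragment(72, 686.0, "again."),
    Fragment(72, 650.0, "New paragraph."),
]
paragraphs = group_into_paragraphs(group_into_lines(frags))
```

Both thresholds are guesses about the author's intent. That is why a document with unusual spacing can come back with merged or split paragraphs.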

For scanned PDFs the text exists only as pixels in a page image, not as text, so optical character recognition (OCR) must run before the file can be turned into editable Word content.
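
A converter first has to decide which pages need OCR at all. One simple heuristic, sketched here under the assumption that an earlier step has already counted the text characters and images on each page, is to flag pages that contain images but no text. `PageSummary` is a hypothetical type for this example.

```python
from dataclasses import dataclass

@dataclass
class PageSummary:
    text_chars: int    # characters found in the page's text layer
    image_count: int   # raster images drawn on the page

def pages_needing_ocr(pages, min_chars=1):
    """Return 0-based indexes of pages that look scanned: images, no text."""
    return [
        i for i, p in enumerate(pages)
        if p.text_chars < min_chars and p.image_count > 0
    ]

summaries = [PageSummary(1200, 0), PageSummary(0, 1), PageSummary(0, 0)]
scanned = pages_needing_ocr(summaries)
```

A blank page (no text and no images) is deliberately left out, since there is nothing on it to recognize.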

Why output is not always pixel perfect

Word and PDF describe pages differently. Word reflows content based on margins and styles, while PDF places every glyph at a fixed coordinate. Round-tripping between the two requires educated guesses, which is why small layout shifts are normal.
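
The difference is easy to see in a toy example. The sketch below uses Python's `textwrap` module, with character counts as a stand-in for real page widths (an assumption, since actual layout depends on font metrics). It reflows one sentence at two widths, then freezes one result at fixed coordinates.

```python
import textwrap

text = "Word reflows content based on margins and styles."

# A reflowing format recomputes line breaks whenever the width changes.
wide = textwrap.wrap(text, width=30)
narrow = textwrap.wrap(text, width=20)

def freeze(lines, left=72.0, top=720.0, leading=14.0):
    """Record each line at a fixed coordinate, as a PDF-style layout would."""
    return [(left, top - i * leading, line) for i, line in enumerate(lines)]

# Once frozen, the line breaks are baked in and the sentence is two
# separate positioned strings with no record that they belong together.
fixed = freeze(wide)
```

Here `wide` has two lines and `narrow` has three, while `fixed` keeps whichever breaks it was given. A converter going back to Word has to infer that the frozen lines were once a single paragraph.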

Client-side vs server-side conversion

Modern browsers can run conversion entirely on your device using JavaScript libraries such as pdf-lib, mammoth, and PDF.js. That keeps your files private because nothing is uploaded.

Server-side conversion can offer higher fidelity for complex documents, but at the cost of sending your file to a third party.

Frequently asked questions

Converting to PDF generally preserves the visual layout, because every element is fixed in place on the page. Converting out of PDF is lossy because most PDFs do not store paragraph or table structure directly.