Question 1

Does this work on scanned PDFs?

Accepted Answer

No. Scanned PDFs are essentially images of text, so the file contains no actual character codes. Extracting text from them requires OCR (optical character recognition), which we do not run in the browser. If you need OCR, Google Drive (open the PDF with Google Docs) or Tesseract.js are options.

Question 2

Why is some text missing or out of order?

Accepted Answer

PDF was designed for printing, not text extraction. Words can be drawn in any order on the page, and pdfjs-dist reconstructs them based on x/y coordinates. Most documents come out cleanly, but tables, multi-column layouts, and decorative typography sometimes come out scrambled. For the cleanest extraction, use a PDF that was exported from Word, Pages, or Markdown rather than a scan.
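
To illustrate what reconstructing from x/y coordinates can look like, here is a minimal sketch. It is not the code this tool or pdfjs-dist actually runs: the `PositionedText` shape, the `toReadingOrder` name, and the `lineTolerance` default are all assumptions made for the example. It groups fragments whose baselines are close together into a line, then reads each line left to right.

```typescript
// Hypothetical minimal shape for one positioned text fragment.
interface PositionedText {
  str: string;
  x: number; // left edge in PDF units
  y: number; // baseline in PDF units (origin at bottom-left, so larger y is higher on the page)
}

// Assumed heuristic: fragments whose baselines differ by at most
// `lineTolerance` units belong to the same line of text.
function toReadingOrder(items: PositionedText[], lineTolerance = 2): string {
  // Top of the page first (descending y), then left to right (ascending x).
  const sorted = [...items].sort((a, b) => b.y - a.y || a.x - b.x);

  const lines: PositionedText[][] = [];
  for (const item of sorted) {
    const current = lines[lines.length - 1];
    if (current && Math.abs(current[0].y - item.y) <= lineTolerance) {
      current.push(item); // same baseline, same line
    } else {
      lines.push([item]); // new line further down the page
    }
  }

  return lines
    .map((line) => line.sort((a, b) => a.x - b.x).map((i) => i.str).join(" "))
    .join("\n");
}

// Fragments arrive in drawing order, which is not reading order.
const text = toReadingOrder([
  { str: "world", x: 60, y: 700 },
  { str: "Second", x: 10, y: 680 },
  { str: "Hello", x: 10, y: 700.5 },
]);
console.log(text);
```

A heuristic like this also shows why columns scramble: two fragments at the same height in different columns share a baseline, so they end up joined on one line.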

Question 3

Is the PDF uploaded to your server?

Accepted Answer

No. Everything runs in your browser tab using pdfjs-dist (the same library Firefox uses for its built-in PDF viewer). You can verify with DevTools → Network — there are zero upload requests when you click Extract.

Question 4

What is the maximum PDF size?

Accepted Answer

100 MB per file (the same limit as the rest of OhMyPDF). In practice, pdfjs-dist handles most PDFs under that size without trouble. Very large, image-heavy PDFs may slow down the page during the rendering step.
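
If you want to check a file against the limit before dropping it in, a minimal sketch follows. It assumes "100 MB" means 100 × 1024 × 1024 bytes; the tool may count differently, and the names `MAX_PDF_BYTES` and `isWithinSizeLimit` are made up for the example.

```typescript
// Assumption: 100 MB is taken as 100 * 1024 * 1024 bytes.
// The real limit check in the tool may use a different definition.
const MAX_PDF_BYTES = 100 * 1024 * 1024;

// Returns true when a file of `sizeInBytes` is non-empty and within the limit.
function isWithinSizeLimit(sizeInBytes: number, limit: number = MAX_PDF_BYTES): boolean {
  return sizeInBytes > 0 && sizeInBytes <= limit;
}

console.log(isWithinSizeLimit(5 * 1024 * 1024));   // a 5 MB file
console.log(isWithinSizeLimit(150 * 1024 * 1024)); // a 150 MB file
```

In a browser, the size would typically come from the `size` property of the `File` object selected by the user.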

Question 5

Can I extract the text per-page?

Accepted Answer

Not as separate downloads yet. The output places a "--- Page N ---" separator between pages, so you can search for that pattern and split the text yourself. We may add per-page download in a future version.
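
Splitting on that pattern could look like the sketch below. It assumes each separator sits on its own line and that every page's text follows its own "--- Page N ---" line; any text before the first separator is ignored. The `PageText` type and `splitPages` function are names invented for the example, not part of the tool.

```typescript
// Hypothetical result shape: one entry per page found in the output.
interface PageText {
  page: number;
  text: string;
}

// Assumes every page's text follows a "--- Page N ---" line of its own.
function splitPages(output: string): PageText[] {
  // split() with a capture group keeps the page numbers in the result:
  // [textBeforeFirstSeparator, "1", page1Text, "2", page2Text, ...]
  const parts = output.split(/^--- Page (\d+) ---[ \t]*$/m);

  const pages: PageText[] = [];
  for (let i = 1; i + 1 < parts.length; i += 2) {
    pages.push({ page: Number(parts[i]), text: parts[i + 1].trim() });
  }
  return pages;
}

const pages = splitPages("--- Page 1 ---\nHello\n\n--- Page 2 ---\nWorld\n");
for (const p of pages) {
  console.log(p.page, JSON.stringify(p.text));
}
```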

Question 6

Why not give me a Word document?

Accepted Answer

A separate "PDF to Word" feature is in the pipeline. The honest version: text-only Word output is easy, but preserving the original layout (columns, tables, images, fonts) cannot be done well in the browser. We would rather ship a clean text dump than a misleading "Word file."
