Why I built this
I needed to copy a few quotes from a research paper PDF for a blog post in May 2026. The PDF's built-in selection was scrambled by the two-column layout — pasting into a text editor produced a mess of half-sentences. Most "PDF to Text" online tools wanted me to upload the paper to their server, then displayed banner ads during a 30-second wait.
OhMyPDF Text uses pdfjs-dist, the same engine that powers Firefox's built-in PDF viewer. Text extraction happens locally in your browser tab. Because it is the same engine, the output should closely match the text Firefox's own viewer exposes, with page separators added so you can navigate large documents.
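For the curious, here is a minimal sketch of what page-by-page extraction with pdfjs-dist can look like. It is not OhMyPDF's actual source. The small interfaces are stand-ins for the library's document, page, and text item types so the snippet is self-contained, and `extractText` is a hypothetical helper name. In a real page you would pass it the document returned by `getDocument({ data }).promise`.

```typescript
// Structural stand-ins for the parts of pdfjs-dist used below (assumption:
// only numPages, getPage, getTextContent, str and hasEOL are needed).
interface TextItemLike {
  str?: string;     // marked-content entries carry no `str`
  hasEOL?: boolean; // true when the run ends a line
}
interface PageLike {
  getTextContent(): Promise<{ items: TextItemLike[] }>;
}
interface DocumentLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>; // page numbers are 1-based
}

// Hypothetical helper: concatenates each page's text runs and prefixes the
// "--- Page N ---" separator described in the FAQ.
async function extractText(
  doc: DocumentLike,
  onProgress?: (done: number, total: number) => void,
): Promise<string> {
  const parts: string[] = [];
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const { items } = await page.getTextContent();
    let text = "";
    for (const item of items) {
      if (typeof item.str !== "string") continue; // skip non-text entries
      text += item.str + (item.hasEOL ? "\n" : "");
    }
    parts.push(`--- Page ${n} ---\n${text.trim()}`);
    if (onProgress) onProgress(n, doc.numPages); // drives a progress indicator
  }
  return parts.join("\n\n");
}
```

Nothing in this loop touches the network: once the file's bytes are in memory, every call runs inside the tab.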
How to extract text from a PDF
- Drop a PDF into the upload area, or click to browse.
- Click Extract text. A progress indicator shows page-by-page extraction.
- Copy the result, or click Download .txt to save as a plain text file.
What you can do with the output
- Search: paste into your text editor and use regex search, or run grep on the downloaded .txt.
- Translate: feed to DeepL, Google Translate, or a local translator.
- Summarize: feed to ChatGPT or our PDF Chat.
- Analyze: count word frequency, build keyword lists, generate citations.
- Archive: store as a tiny .txt next to your big PDF for full-text search.
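As an example of the "Analyze" idea, the sketch below counts word frequency in the extracted text. `wordFrequency` and `topWords` are hypothetical helpers, not part of OhMyPDF. The word pattern is deliberately simple and ASCII-only, so accented or non-Latin text would need a broader pattern.

```typescript
// Counts how often each word appears, ignoring case and the page separators.
function wordFrequency(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  // Drop "--- Page N ---" lines so "page" is not over-counted.
  const body = text.replace(/^--- Page \d+ ---$/gm, " ");
  const words = body.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  for (const w of words) {
    counts.set(w, (counts.get(w) ?? 0) + 1);
  }
  return counts;
}

// Returns the `limit` most frequent words, ties broken alphabetically.
function topWords(text: string, limit = 10): [string, number][] {
  return Array.from(wordFrequency(text).entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}
```

Calling `topWords(extractedText, 20)` gives a quick starting point for a keyword list.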
Limitations
- Scanned PDFs: image-only documents return no text. Use OCR software first.
- Complex layouts: multi-column papers, tables, and footnotes may extract out of order.
- Encrypted PDFs: locked files need to be unlocked first via the Unlock PDF tool.
- Embedded fonts: PDFs whose subset fonts reference glyphs by index without a Unicode mapping (rare, but it happens) extract as gibberish.
FAQ
Does this work on scanned PDFs? No. Scanned PDFs are essentially images of text: the file contains no character codes to extract. Getting text out of them requires OCR (optical character recognition), which we do not run in the browser. If you need OCR, Google Drive (open the PDF in Docs) or the Tesseract.js library are options.
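If you script around the output, a rough way to spot a scanned file is to check whether almost no characters came back. The sketch below is only a heuristic. `looksScanned` is a hypothetical helper and the `minCharsPerPage` threshold is an arbitrary illustrative value, not something the tool uses.

```typescript
// Heuristic: treat a document as scanned when the average page has
// fewer than `minCharsPerPage` non-whitespace characters.
function looksScanned(output: string, minCharsPerPage = 20): boolean {
  // Everything before the first separator is discarded by slice(1).
  const pages = output.split(/^--- Page \d+ ---$/m).slice(1);
  if (pages.length === 0) return output.trim().length === 0;
  const chars = pages.reduce((sum, p) => sum + p.replace(/\s/g, "").length, 0);
  return chars / pages.length < minCharsPerPage;
}
```

A file that trips this check is a candidate for OCR rather than text extraction.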
Why is some text missing or out of order? PDF was designed for printing, not text extraction. Words can be drawn in any order on the page, and pdfjs-dist reconstructs them based on x/y coordinates. Most documents come out cleanly, but tables, columns, and decorative typography sometimes scramble. For the cleanest extraction, use a PDF that was exported from Word/Pages/Markdown rather than scanned.
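To make the coordinate idea concrete, here is a naive sketch of position-based ordering. It is not pdfjs-dist's internal algorithm. It relies on the fact that each pdfjs-dist text item carries a `transform` array whose last two entries are the run's x and y position, with y growing upward from the bottom of the page. `orderRuns` and `lineTolerance` are hypothetical names chosen for illustration.

```typescript
interface PositionedRun {
  str: string;
  transform: number[]; // [a, b, c, d, x, y]
}

// Groups runs into lines by similar y, then sorts each line left to right.
function orderRuns(runs: PositionedRun[], lineTolerance = 2): string {
  // Top of the page first: larger y means higher on the page.
  const sorted = runs.slice().sort((a, b) => b.transform[5] - a.transform[5]);
  const lines: PositionedRun[][] = [];
  for (const run of sorted) {
    const last = lines[lines.length - 1];
    if (last && Math.abs(last[0].transform[5] - run.transform[5]) <= lineTolerance) {
      last.push(run); // same baseline, same line
    } else {
      lines.push([run]); // start a new line
    }
  }
  return lines
    .map((line) =>
      line
        .sort((a, b) => a.transform[4] - b.transform[4])
        .map((r) => r.str)
        .join(" "),
    )
    .join("\n");
}
```

The same sketch shows why columns scramble: runs from the left and right column often share a y value, so a purely positional sort places them on one line.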
Is the PDF uploaded to your server? No. Everything runs in your browser tab using pdfjs-dist (the same library Firefox uses for its built-in PDF viewer). You can verify with DevTools → Network — there are zero upload requests when you click Extract.
What is the maximum PDF size? 100 MB per file (the same limit as the rest of OhMyPDF). In practice, pdfjs-dist handles most PDFs under that limit without trouble, though very large, image-heavy PDFs may slow the page down during the rendering step.
Can I extract the text per-page? The output uses "--- Page N ---" separators between pages, so you can search for that pattern and split. We may add per-page download in a future version.
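A minimal sketch of that split, assuming the separator always sits on its own line exactly as `--- Page N ---`. `splitPages` is a hypothetical helper name.

```typescript
// Splits extractor output into a map from page number to page text.
function splitPages(output: string): Map<number, string> {
  const pages = new Map<number, string>();
  // With a capture group, split() yields [before, "1", text1, "2", text2, ...].
  const parts = output.split(/^--- Page (\d+) ---$/m);
  for (let i = 1; i + 1 < parts.length; i += 2) {
    pages.set(Number(parts[i]), parts[i + 1].trim());
  }
  return pages;
}
```

For example, `splitPages(text).get(3)` returns the text of page 3. One caveat: a PDF whose own text contains a line in this exact format would be split in the wrong place.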
Why not give me a Word document? A separate "PDF to Word" feature is in the pipeline. The honest version: text-only Word output is easy, but preserving the original layout (columns, tables, images, fonts) cannot be done well in the browser. We would rather ship a clean text dump than a misleading "Word file."