Repair PDF
Best-effort repair of a damaged PDF: re-save it in normalised form, or extract its text.
How to use Repair PDF
What this tool does, and what it doesn’t
“Repair PDF” is a best-effort operation. PDFs go wrong in a wide variety of ways and no single tool fixes all of them. This page is honest about which kinds of damage it does fix, which it doesn’t, and what you get back in either case.
The strategy is two-tier:
- Tier 1: pdf-lib re-save. The damaged file is opened with pdf-lib’s permissive parser, parsed in full, and re-serialised to a clean PDF. This rebuilds the cross-reference table, drops orphaned objects, normalises object numbering, and writes a fresh %%EOF. For the most common kinds of damage, this alone produces a viewer-friendly file.
- Tier 2: pdf.js text fallback. If Tier 1 throws, the tool loads the file in pdf.js (which uses a different parser with different tolerances) and extracts whatever text and page structure it can. The output of this tier is a text-only PDF: you get the words back, but images, vector graphics, fonts and layout are lost. It’s the file equivalent of “this contract is shredded; here’s a transcript”.
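The control flow of the two tiers can be sketched as follows. This is a minimal sketch, not the tool’s source: `repairPdf`, `TierFn` and `RepairResult` are hypothetical names, and the two tier functions are passed in so the fallback logic stands on its own. In a real build, Tier 1 might wrap pdf-lib’s `PDFDocument.load()` followed by `save()`, and Tier 2 might wrap pdf.js text extraction; how this tool actually wires them is an assumption.

```typescript
// Hypothetical sketch of the two-tier repair flow. The tier implementations
// are injected; in a browser build they might wrap pdf-lib (Tier 1) and
// pdf.js (Tier 2), but that wiring is assumed, not taken from this tool.
type Tier = "resave" | "text-only";

interface RepairResult {
  tier: Tier;        // which tier produced the output, so the UI can label it
  bytes: Uint8Array; // the repaired PDF (full re-save or text-only rebuild)
}

type TierFn = (input: Uint8Array) => Promise<Uint8Array>;

async function repairPdf(
  input: Uint8Array,
  resave: TierFn,   // Tier 1: parse in full and re-serialise
  textOnly: TierFn, // Tier 2: extract text and emit a text-only PDF
): Promise<RepairResult> {
  try {
    // `await` inside the try block so a rejected Tier 1 is caught here.
    return { tier: "resave", bytes: await resave(input) };
  } catch (tier1Error) {
    try {
      return { tier: "text-only", bytes: await textOnly(input) };
    } catch (tier2Error) {
      // Neither tier could read the file: report both failures.
      throw new Error(
        `Unrepairable in the browser. Tier 1: ${String(tier1Error)}; Tier 2: ${String(tier2Error)}`,
      );
    }
  }
}
```

Returning the tier alongside the bytes is what lets the page label a text-only download differently from a full re-save.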
You get to see which tier produced the result so you know what you’re holding.
What counts as repairable
PDF files end with a cross-reference (xref) table, a trailer and a %%EOF marker. Most “PDF won’t open” errors come from one of:
- A truncated or corrupt xref table. The index at the end of the file got partially overwritten, lost, or appended after another %%EOF. pdf-lib scans the body for object headers (N M obj), rebuilds the xref from the objects it finds, and writes the file out. Highly recoverable.
- Dangling indirect references. An object points to another object that doesn’t exist, or the xref records the wrong byte offset for it. The re-serialisation simply omits the dead pointer.
- A missing %%EOF. Common when a transfer was interrupted, an email client truncated trailing bytes, or a generating program crashed. The parser scans for the last good object and writes a fresh trailer.
- Multiple appended revisions where the last one is broken. PDFs allow incremental updates appended to the end of the file. If the last revision is corrupt, the parser can often roll back to the previous good revision and re-save.
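To make the “scan for object headers” idea concrete, here is a minimal sketch of rebuilding an xref section from raw bytes and checking the end of the file for %%EOF markers. It is not pdf-lib’s implementation: pdf-lib tokenises the whole file properly, whereas the regex here can be fooled by the text `N M obj` appearing inside uncompressed stream data. All function names are hypothetical.

```typescript
// Hypothetical sketch: rebuild a classic xref section by scanning for
// "N G obj" headers, and inspect the %%EOF markers at the end of the file.

// Decode one byte per character (Latin-1) so string indices equal byte offsets.
function latin1(bytes: Uint8Array): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

interface XrefEntry {
  offset: number; // byte offset of the "N G obj" header
  generation: number;
}

function scanObjectHeaders(bytes: Uint8Array): Map<number, XrefEntry> {
  const text = latin1(bytes);
  // Object number, generation number, keyword "obj", preceded by whitespace
  // or the start of the file. A regex is a simplification of real tokenising.
  const header = /(?:^|\s)(\d+)\s+(\d+)\s+obj\b/g;
  const entries = new Map<number, XrefEntry>();
  let m: RegExpExecArray | null;
  while ((m = header.exec(text)) !== null) {
    const offset = m.index + m[0].search(/\d/); // skip the leading whitespace
    // Later copies win: incremental updates append newer versions of objects.
    entries.set(Number(m[1]), { offset, generation: Number(m[2]) });
  }
  return entries;
}

// Serialise one xref subsection starting at object 0. Every entry is exactly
// 20 bytes: 10-digit field, space, 5-digit generation, space, n/f, CR LF.
function buildXrefSection(entries: Map<number, XrefEntry>): string {
  const size = Math.max(0, ...Array.from(entries.keys())) + 1;
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  // Object numbers with no header found become free entries, chained as a
  // linked list that starts at object 0 and ends by pointing back to 0.
  const free = [0];
  for (let n = 1; n < size; n++) if (!entries.has(n)) free.push(n);
  const nextFree = new Map<number, number>();
  free.forEach((n, i) => nextFree.set(n, free[i + 1] ?? 0));

  let out = `xref\n0 ${size}\n`;
  for (let n = 0; n < size; n++) {
    const e = entries.get(n);
    if (n !== 0 && e) {
      out += `${pad(e.offset, 10)} ${pad(e.generation, 5)} n\r\n`;
    } else {
      // Object 0 is always free with generation 65535; other gaps use 0 here.
      out += `${pad(nextFree.get(n) ?? 0, 10)} ${pad(n === 0 ? 65535 : 0, 5)} f\r\n`;
    }
  }
  return out;
}

// Count %%EOF markers and check whether the file still ends with one.
// Roughly one marker per revision; linearised files carry an extra one.
function eofReport(bytes: Uint8Array): { markers: number; endsWithEof: boolean } {
  const text = latin1(bytes);
  return {
    markers: text.split("%%EOF").length - 1,
    endsWithEof: /%%EOF\s*$/.test(text),
  };
}
```

On a file whose tail was cut off, `eofReport` reports `endsWithEof: false`, which is the “missing %%EOF” case above. A real rebuild would also write a trailer with `/Size` and `/Root`, which needs the catalog’s object number and is omitted here.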
These are the bulk of real-world PDF damage and they’re what this tool is good at.
What this tool cannot repair
- Severely truncated files. If most of the body is gone (e.g. 95% of the file was lost in transit), there’s nothing to rebuild. pdf-lib will fail; the fallback may extract some text from whatever fragment remains, but a 200-page document reconstructed from a 4 KB tail is going to be mostly empty.
- Destroyed content streams. Page content lives in compressed streams (/FlateDecode, sometimes others). If the streams themselves are corrupt, rather than just the xref entries that point to them, the parser can read the page object but can’t render its contents. pdf.js may recover partial text from the streams it can decode; the rest is lost.
- Encrypted files with a lost password. Encryption is not damage. No tool (local, cloud, or commercial) can decrypt a modern PDF without its password.
- Password recovery by brute force. Out of scope. This is a parser-level repair tool, not a cryptanalysis tool.
- Files corrupted by being re-saved as something else. A Word document or JPEG that has simply been renamed to .pdf isn’t a PDF at all. Use the relevant convert tool, not repair.
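Two of the cases above can usually be recognised before any repair is attempted: a file that isn’t a PDF at all, and a PDF that is encrypted rather than damaged. Here is a minimal sketch of that triage, assuming a simple magic-byte check and a search for the `/Encrypt` key. `classifyUpload` and the `FileKind` labels are hypothetical, and this tool may not perform such a check.

```typescript
// Hypothetical pre-repair triage: is this a PDF, an encrypted PDF, or some
// other file format that has merely been renamed to .pdf?
type FileKind = "pdf" | "pdf-encrypted" | "zip" | "ole2" | "jpeg" | "png" | "unknown";

// Leading "magic bytes" of formats commonly mis-saved as .pdf.
const MAGIC: Array<[FileKind, number[]]> = [
  ["zip", [0x50, 0x4b, 0x03, 0x04]], // .docx/.xlsx/.pptx are ZIP containers
  ["ole2", [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]], // legacy .doc/.xls
  ["jpeg", [0xff, 0xd8, 0xff]],
  ["png", [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]],
];

// Find an ASCII needle in bytes[from, to); returns -1 if absent.
function indexOfAscii(bytes: Uint8Array, needle: string, from = 0, to = bytes.length): number {
  outer: for (let i = from; i + needle.length <= to; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (bytes[i + j] !== needle.charCodeAt(j)) continue outer;
    }
    return i;
  }
  return -1;
}

function classifyUpload(bytes: Uint8Array): FileKind {
  for (const [kind, magic] of MAGIC) {
    if (magic.every((b, i) => bytes[i] === b)) return kind;
  }
  // Some readers accept junk bytes before the header, so look for "%PDF-"
  // anywhere in the first 1024 bytes rather than only at offset 0.
  if (indexOfAscii(bytes, "%PDF-", 0, Math.min(bytes.length, 1024)) === -1) return "unknown";
  // Heuristic: an /Encrypt entry in the trailer (or xref stream dictionary)
  // means the file is encrypted, which is not something repair can undo.
  return indexOfAscii(bytes, "/Encrypt") !== -1 ? "pdf-encrypted" : "pdf";
}
```

A `zip` or `ole2` result is the “renamed Word document” case, and `pdf-encrypted` is the lost-password case; neither benefits from a Tier 1 re-save.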
Common use cases
- A PDF that “opens but is blank” or “fails to load” in Chrome, Acrobat, or Preview — usually a Tier 1 repair fixes it in one pass.
- An email attachment that downloaded with the wrong content length — Tier 1 rebuilds the trailer.
- A file from an old archive that worked years ago and now doesn’t — modern parsers have got stricter; the lax re-serialisation here often makes it readable again.
- Recovering text from a “mostly broken” PDF — Tier 2 gives you what’s left as searchable text, even if the layout is gone.
How to use this Repair PDF tool
- Drop the damaged PDF onto the dropzone.
- Click Repair. The tool tries Tier 1 first.
- If Tier 1 succeeds, a Download button appears with the re-saved file. Open it in your usual reader to confirm.
- If Tier 1 fails, the tool falls back to Tier 2 and offers you the text-only reconstruction. The download is labelled clearly so you know it isn’t a full restoration.
- If both tiers fail, the file’s damage is beyond what’s repairable from the browser. Adobe Acrobat or a specialised recovery service is the next step.
Limits and caveats
A “successful” Tier 1 repair means the file is a structurally valid PDF that opens in any reader. It does not mean every page renders identically to the original: if a content stream was silently corrupt and the parser quietly skipped a damaged region, you can end up with a missing image or a blank patch on a page. Compare the output to whatever you remember of the original before relying on it.
A “successful” Tier 2 repair is text only by design. Treat it as a transcript, not as the document itself. Fonts, page layout, images, signatures, form fields and annotations are gone.
Privacy
Both repair tiers run entirely in this browser tab. The damaged file is read into memory, parsed locally, re-serialised locally, and offered back to you as a Blob. There is no upload, no temporary cloud storage, and no telemetry on file contents. The only network requests this page makes are for its initial JavaScript bundle.
Compatibility notes
The repaired file is a standard PDF 1.7 document. Every modern reader opens it: Adobe Acrobat, Apple Preview, the browser viewers in Chrome / Edge / Firefox / Safari, and the system viewers on iOS and Android. The text-only fallback is also a standard PDF, just with no images or fonts beyond the default sans-serif.
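For a sense of how small a valid text-only PDF can be, here is a hedged sketch that writes one page of text by hand, with a correct xref table and trailer. It assumes Helvetica (one of the PDF standard fonts) as the sans-serif, printable ASCII input, and a single A4 page. This is not necessarily how the tool builds its Tier 2 output; a pdf-lib build would more plausibly use `embedFont(StandardFonts.Helvetica)` and `page.drawText()`. `buildTextPdf` and `toPdfLiteral` are hypothetical names.

```typescript
// Hypothetical sketch: write a one-page, text-only PDF 1.7 file by hand.
// Assumes printable ASCII output, so string indices equal byte offsets.

// Replace non-ASCII characters, then escape \ ( ) for a PDF literal string.
function toPdfLiteral(s: string): string {
  return s.replace(/[^\x20-\x7e]/g, "?").replace(/[\\()]/g, (c) => "\\" + c);
}

function buildTextPdf(lines: string[]): string {
  // Content stream: 11 pt Helvetica, 14 pt leading, starting near the top
  // left of an A4 page. Text past the bottom edge is not paginated here.
  const ops = ["BT", "/F1 11 Tf", "14 TL", "72 770 Td"];
  for (const line of lines) ops.push(`(${toPdfLiteral(line)}) Tj T*`);
  ops.push("ET");
  const content = ops.join("\n");

  // Objects 1..5: catalog, page tree, page, font, content stream.
  const objects = [
    "<< /Type /Catalog /Pages 2 0 R >>",
    "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] " +
      "/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
    // Helvetica is a standard 14 font, so no font file needs embedding.
    "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    `<< /Length ${content.length} >>\nstream\n${content}\nendstream`,
  ];

  let out = "%PDF-1.7\n";
  const offsets: number[] = [];
  objects.forEach((body, i) => {
    offsets.push(out.length); // byte offset of "N 0 obj"
    out += `${i + 1} 0 obj\n${body}\nendobj\n`;
  });

  // xref: the free head entry for object 0, then one 20-byte entry per object.
  const xrefAt = out.length;
  out += `xref\n0 ${objects.length + 1}\n0000000000 65535 f\r\n`;
  for (const off of offsets) out += `${String(off).padStart(10, "0")} 00000 n\r\n`;
  out += `trailer\n<< /Size ${objects.length + 1} /Root 1 0 R >>\n`;
  out += `startxref\n${xrefAt}\n%%EOF\n`;
  return out;
}
```

Set beside a Tier 1 re-save, the trade-off is plain: nothing from the original file survives except the extracted strings.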
Frequently asked questions
What kinds of damage can this tool actually repair?
Three classes of damage: a truncated or corrupt cross-reference (xref) table, dangling object references, and a missing %%EOF marker. pdf-lib's parser is tolerant of all three: it scans the body for objects, rebuilds the index, and writes a clean file. Damage outside those classes is increasingly likely to fail; see the limits section.
Can it repair a password-protected file I've lost the password to?
How does this compare to Adobe Acrobat's repair or Adobe Document Cloud?
Will the repaired file be smaller than the original?
Does the broken PDF get uploaded anywhere?
Related tools
PDF Compressor
Reduce PDF file size for easier sharing.
PDF Flatten
Flatten PDF layers and form fields into static content.
PDF Text Extractor
Pull the plain text content out of a PDF.
PDF to TXT
Extract a PDF's text and download it as a .txt file.
PDF Merger
Combine multiple PDF files into one document.
PDF Metadata Editor
Edit the title, author, and subject of a PDF.