PDFs are the most common source type in serious research workflows, and they are also the source type most likely to cause problems. A 400-page dissertation, a scanned legal document, or a dense technical manual will hit one of NotebookLM's limits if you do not prepare it first.
This guide covers the specific limits, how to diagnose which problem you have, and the workarounds that resolve each one.
For the complete NotebookLM overview, see the Complete NotebookLM Guide.
Four distinct limits apply, and knowing which one you have hit prevents confusion:
| Limit | Value | What Happens When Exceeded |
|---|---|---|
| File size per source | 200MB | Upload rejected with error |
| Words per source | 500,000 | Upload may fail or content truncated silently |
| Sources per notebook | 50 | Cannot add more until existing sources are removed |
| Total notebook capacity | ~25M words (approx.) | Degraded performance |
Most academic papers (5,000-30,000 words) are well within limits. The problems arise with:

- Full books or dissertations
- Multi-year annual reports bundled into one PDF
- Court documents, transcripts, or legal filings
- Technical documentation and specification sets
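The limits listed above can be turned into a quick pre-upload check. This is a minimal sketch: the function name is hypothetical, and treating 200MB as 200 × 1024 × 1024 bytes is an assumption, since the limit is only stated as "200MB".

```python
# Limits as stated in this guide. The byte value for "200MB" is an assumption.
MAX_FILE_BYTES = 200 * 1024 * 1024
MAX_WORDS_PER_SOURCE = 500_000
MAX_SOURCES_PER_NOTEBOOK = 50


def exceeded_limits(file_bytes: int, word_count: int, sources_in_notebook: int) -> list[str]:
    """Return the names of the limits a new source would exceed."""
    problems = []
    if file_bytes > MAX_FILE_BYTES:
        problems.append("file size")
    if word_count > MAX_WORDS_PER_SOURCE:
        problems.append("words per source")
    # Adding one more source must not push the notebook past the cap.
    if sources_in_notebook + 1 > MAX_SOURCES_PER_NOTEBOOK:
        problems.append("sources per notebook")
    return problems
```

An empty list means the source should fit. Anything else names the limit to address first.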
Before trying workarounds, identify which specific problem you have:
Symptoms: Upload fails with a file size error.
Cause: The PDF contains large images, embedded fonts, or high-resolution scans.
Diagnosis: Check the file size in your operating system. On Windows: right-click → Properties. On Mac: right-click → Get Info.
Fix: Compress the PDF first (see below).
Symptoms: PDF uploads without error, but the AI cannot answer questions about its content. Queries return "I don't see this in the provided sources" for content you know is in the document.
Diagnosis: Open the PDF, try to select text with your cursor. If you cannot select any text, it is image-only.
Fix: Run OCR first (see below).
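For a batch of files, the "can I select text?" test can be approximated in code. The sketch below assumes you already have the text extracted from each page by whatever PDF tool you use. The function name, the 20-character threshold, and the 10% cutoff are illustrative assumptions, not documented values.

```python
def looks_image_only(page_texts: list[str], min_chars_per_page: int = 20) -> bool:
    """Guess whether a PDF lacks a text layer, given the text extracted per page."""
    if not page_texts:
        return True  # nothing extractable at all
    pages_with_text = sum(
        1 for text in page_texts if len(text.strip()) >= min_chars_per_page
    )
    # Treat the file as image-only when fewer than 10% of pages carry real text.
    return pages_with_text / len(page_texts) < 0.10
```

The percentage cutoff is there because scanned documents sometimes carry a stray text fragment on a cover page, so requiring zero text everywhere would miss them.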
Symptoms: Upload fails with a word count error, or upload succeeds but AI queries only reference content from the beginning of the document.
Diagnosis: Word count can be checked in Word or by converting to text and using a word counter. A 500,000-word document is approximately 1,000 pages of dense academic text.
Fix: Split the document (see below).
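Once the document is converted to text, the word count and the page estimate above are simple arithmetic. The sketch below is one way to do it. The 500 words per page figure is derived from the 500,000 words ≈ 1,000 pages estimate, and the helper names are hypothetical.

```python
import math

MAX_WORDS_PER_SOURCE = 500_000  # per-source limit quoted in this guide
WORDS_PER_DENSE_PAGE = 500      # implied by 500,000 words ~ 1,000 pages


def count_words(text: str) -> int:
    """Count whitespace-separated tokens, roughly what a word counter reports."""
    return len(text.split())


def parts_needed(word_count: int, limit: int = MAX_WORDS_PER_SOURCE) -> int:
    """Minimum number of sources needed to keep each part within the limit."""
    return max(1, math.ceil(word_count / limit))


def estimated_pages(word_count: int, words_per_page: int = WORDS_PER_DENSE_PAGE) -> int:
    """Convert a word count to an approximate page count."""
    return math.ceil(word_count / words_per_page)
```

Word counters disagree slightly on hyphenated terms and numbers, so leave a margin rather than splitting at exactly 500,000.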
Symptoms: Upload succeeds, but the AI gives garbled or incorrect information. Common with heavily formatted documents (textbooks with sidebars, columns, tables).
Diagnosis: Ask the AI to quote a specific paragraph you know exists. If the quote is incorrect or mangled, extraction failed.
Fix: Try a text-only export (see below).
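The quote test can be made less subjective by scoring the AI's quote against the paragraph you know is in the document. This sketch uses Python's standard `difflib`. The function names and the 0.9 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line wrapping does not count as a difference."""
    return " ".join(text.lower().split())


def quote_similarity(expected: str, quoted: str) -> float:
    """Return a 0.0-1.0 similarity ratio between the known paragraph and the AI's quote."""
    return SequenceMatcher(None, _normalize(expected), _normalize(quoted)).ratio()


def extraction_looks_ok(expected: str, quoted: str, threshold: float = 0.9) -> bool:
    """Flag the extraction as healthy when the quote is close to the original."""
    return quote_similarity(expected, quoted) >= threshold
```

A low score on several paragraphs from different parts of the document is a stronger signal than a single mismatch.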
For PDFs that exceed 200MB due to images or embedded assets:
Free tools:

- Smallpdf (smallpdf.com): drag, compress, download. Works for most cases.
- ilovepdf (ilovepdf.com): similar, slightly more compression options
- Adobe Acrobat Reader (File → Print → Save as PDF, then adjust quality): reduces size by removing some embedded data
Settings to use:

- Target "Screen" or "Web" quality (72 DPI) if images are not critical
- "Print" quality (150 DPI) if you need to preserve image clarity
- Remove embedded fonts if the tool provides this option
After compression, verify the PDF is still readable before uploading.
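If you prefer a command-line route, Ghostscript is one option. It is not among the tools listed above, so treat this as an alternative. The sketch builds the command with Ghostscript's `pdfwrite` device and its `-dPDFSETTINGS` presets, where `/screen` targets roughly 72 DPI and `/ebook` roughly 150 DPI. The wrapper function names are hypothetical, and the executable is assumed to be installed as `gs` (on Windows it is typically `gswin64c`).

```python
import subprocess

# Ghostscript quality presets: /screen is about 72 DPI, /ebook about 150 DPI.
PRESETS = {"screen": "/screen", "ebook": "/ebook"}


def ghostscript_compress_cmd(src: str, dst: str, quality: str = "screen") -> list[str]:
    """Build a Ghostscript command line that rewrites src as a smaller PDF at dst."""
    if quality not in PRESETS:
        raise ValueError(f"unknown quality preset: {quality!r}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS={PRESETS[quality]}",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={dst}",
        src,
    ]


def compress_pdf(src: str, dst: str, quality: str = "screen") -> None:
    """Run the command; requires the gs executable to be installed and on PATH."""
    subprocess.run(ghostscript_compress_cmd(src, dst, quality), check=True)
```

Write to a new file rather than overwriting the original, so you can compare the two before uploading.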
For image-only PDFs with no text layer:
Google Drive (free, best quality for most documents):

1. Upload the PDF to Google Drive
2. Right-click → Open with Google Docs
3. Google automatically OCRs the document when opening as Docs
4. File → Download → PDF Document to get a text-layer PDF
5. Upload the downloaded PDF to NotebookLM
Adobe Acrobat (best quality, subscription required):

1. Open PDF in Acrobat
2. Tools → Scan & OCR → Recognize Text
3. Save the file
4. Upload to NotebookLM
Free online tools:

- OCR.space: good for short documents
- PDF24 OCR: handles longer documents, free
- Adobe Acrobat online (limited free OCR): acrobat.adobe.com
For documents exceeding 500,000 words or 200MB:
Split by chapter or section (recommended):
Name each part so its place in the document is obvious, for example: Dissertation-Ch1-Introduction.pdf, Dissertation-Ch2-Literature.pdf

Split by content type (for textbooks):

- Part 1: Theoretical chapters
- Part 2: Case studies and examples
- Part 3: Appendices and references (often can be omitted)
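When a document has many short chapters, uploading each one separately can use up the 50-source limit. One way to balance the two limits is to pack consecutive chapters into parts that each stay under the word limit. The sketch below is a simple greedy grouping. The function name and the `(title, word_count)` input format are assumptions.

```python
MAX_WORDS_PER_SOURCE = 500_000  # per-source limit quoted in this guide


def group_chapters(
    chapters: list[tuple[str, int]], limit: int = MAX_WORDS_PER_SOURCE
) -> list[list[str]]:
    """Greedily pack consecutive chapters into parts that stay within the word limit.

    chapters is a list of (title, word_count) pairs in reading order.
    A single chapter longer than the limit gets a part of its own and
    still needs to be split further by hand.
    """
    parts: list[list[str]] = []
    current: list[str] = []
    current_words = 0
    for title, words in chapters:
        # Start a new part when adding this chapter would exceed the limit.
        if current and current_words + words > limit:
            parts.append(current)
            current, current_words = [], 0
        current.append(title)
        current_words += words
    if current:
        parts.append(current)
    return parts
```

Keeping chapters in reading order means each part remains a contiguous section of the original, which makes it easier to describe in a structure note.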
For documents with heavy formatting (multi-column layouts, textbooks with sidebars):
From Word/Google Docs:

1. Open the source document in Word or Docs
2. File → Download/Save As → Plain text (.txt)
3. Upload the .txt file to NotebookLM instead of the PDF
From PDF (when you have the source file):

1. Use the original Word/InDesign file if available
2. Export as plain text or simple PDF (single-column, no sidebars)
Last resort — copy-paste into a text note: For short documents (under 20,000 words), select all text in the PDF (Ctrl+A), copy, and paste into a NotebookLM text note. This strips all formatting but preserves all content.
When you have split a long document into multiple parts, help the AI understand the structure:
Add a "navigator" text note:

```
DOCUMENT STRUCTURE NOTE:
This notebook contains a split version of: [Document title]

When answering questions about conclusions or implications, prioritize Part 4.
For methodology questions, prioritize Part 3.
```
This context note improves the AI's ability to give coherent answers about a split document.
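If you split documents often, the opening of the note can be generated from your list of parts. This is a sketch only: the function name and the "Part N: file - description" line format are assumptions made for illustration, so adapt the wording to your own note.

```python
def build_navigator_note(title: str, parts: list[tuple[str, str]]) -> str:
    """Render a structure note for a split document.

    parts is a list of (source file name, short description) pairs in order.
    The per-part line format is illustrative, not a required layout.
    """
    lines = [
        "DOCUMENT STRUCTURE NOTE:",
        f"This notebook contains a split version of: {title}",
        "",
    ]
    for number, (filename, description) in enumerate(parts, start=1):
        lines.append(f"Part {number}: {filename} - {description}")
    return "\n".join(lines)
```

Paste the result into a text note, then add your own guidance about which part to prioritize for which kind of question.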
Effective prompts for split documents:
Most academic papers are well within limits. The common issue is multi-paper collections (literature compilations, edited volumes). Split these into individual papers and label clearly.
Legal documents often contain extensive headers, footers, page numbers, and formatting that confuse text extraction. The plain text export approach works well here — paste text into a NotebookLM note, stripping the formatting noise.
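Before pasting, the formatting noise can be stripped from the exported text. The sketch below assumes you have the plain text of each page as a separate string. It drops blank lines, bare page numbers, and any line that repeats across most pages. The function name and the 60% repetition threshold are assumptions, and a genuine line consisting only of a number would also be removed, so check the output.

```python
import re
from collections import Counter


def strip_page_noise(pages: list[str], min_repeat_fraction: float = 0.6) -> list[str]:
    """Remove blank lines, bare page numbers, and lines repeated across most pages."""
    page_number = re.compile(r"^\s*(page\s+)?\d+(\s+of\s+\d+)?\s*$", re.IGNORECASE)

    # Count each distinct line once per page so repeats within a page do not inflate it.
    line_pages: Counter = Counter()
    for page in pages:
        line_pages.update({line.strip() for line in page.splitlines() if line.strip()})

    # A running header or footer must appear on at least two pages.
    threshold = max(2, min_repeat_fraction * len(pages))
    repeated = {line for line, count in line_pages.items() if count >= threshold}

    cleaned = []
    for page in pages:
        kept = [
            line
            for line in page.splitlines()
            if line.strip()
            and line.strip() not in repeated
            and not page_number.match(line)
        ]
        cleaned.append("\n".join(kept))
    return cleaned
```

Joining the cleaned pages with blank lines gives text that is ready to paste into a note.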
Documentation sets (e.g., an entire API reference, a multi-chapter technical manual) benefit from topical splitting. Rather than splitting by page count, split by what you are trying to learn: "Authentication docs", "Data models", "Error handling" as separate sources.
Textbooks vary widely. Recent textbooks are often text-searchable PDFs within size limits. Older textbooks from scanned sources need OCR. Highly illustrated textbooks may need compression. The most useful split for textbooks is by chapter or major unit.
The troubleshooting flow: file size problem → compress first; scanned PDF → OCR first; too long → split by chapter; bad formatting → use text export. Most large-document problems are solvable with these four fixes. The key is diagnosing which problem you have before trying workarounds.
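The flow can be written down as a small decision function. This is a sketch: the function and parameter names are hypothetical, and the inputs are the answers to the four diagnostic checks described in this guide.

```python
def recommend_fixes(
    file_mb: float,
    has_text_layer: bool,
    word_count: int,
    extraction_garbled: bool,
) -> list[str]:
    """Return the fixes to try, in the order the troubleshooting flow lists them."""
    fixes = []
    if file_mb > 200:
        fixes.append("compress")
    if not has_text_layer:
        fixes.append("ocr")
    if word_count > 500_000:
        fixes.append("split by chapter")
    if extraction_garbled:
        fixes.append("text export")
    return fixes
```

A document can trigger more than one fix. For example, a long scanned report needs OCR before its word count can be measured and the split planned.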