Three years of invoices, every one of them named statement.pdf or invoice.pdf or scan_20240412_001.pdf. None of them findable by name. The vendor, date, and amount are all inside the PDF, but the filename tells you nothing. This is the most consistent complaint from self-employed people, bookkeepers, and anyone who has ever had to assemble a year of receipts before a tax deadline.
Renaming PDFs by hand is not realistic past about thirty files. The fix is to read the PDF's content, extract the fields you care about (vendor, date, amount, account number, category), and build a filename from a template. This post is the full walkthrough: what the right filename pattern looks like, why Hazel and Renamer cannot do it reliably, and how Sortio handles it with one prompt.
The short version
A good filename pattern is {YYYY-MM-DD}_{Vendor}_{Amount}.pdf. Hazel cannot reliably extract those fields from a real-world mix of vendor PDFs, because a regex written against one layout stops matching as soon as the OCR text or the layout drifts. Sortio reads PDF content semantically (LLM, not regex), extracts the fields, applies the template, and routes the file to the right folder. One prompt covers invoices, receipts, statements, and contracts.
What a good rename pattern looks like
A useful PDF filename is one you can search for, sort by, and skim. After helping users set this up for a couple of years, the pattern that survives the longest is:
2026-04-12_Comcast_$142.83.pdf
2026-04-15_Apple_$1284.00.pdf
2026-04-22_Whole_Foods_$87.14.pdf
2026-04-28_Office_Depot_$45.99.pdf
Four parts. ISO date first (YYYY-MM-DD) so files sort chronologically in Finder without any custom view. Vendor next so the folder is scannable by eye. Amount last so when a vendor sends two invoices in the same month you can tell them apart at a glance. Extension as expected.
For receipts you want a category in there as well, because Schedule C and most bookkeeping software want it: 2026-04-12_Office_Depot_Office_$23.45.pdf. For bank statements you usually do not want the day, just the year and month, and you want the last four digits of the account number to disambiguate multi-account households: 2026-04_Chase_3829.pdf. For contracts the pattern is different again, 2026-04-12_VendorName_ContractType_v1.pdf, because the date is the signing date and there can be multiple versions.
All four of these are unstructured enough that you cannot generate them with a regex unless every vendor uses an identical layout, and structured enough that you cannot live with "let the LLM pick a name" without templates. The combination (template plus content extraction) is exactly what Sortio does.
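The template half of that combination is simple enough to sketch. The Python below is an illustration only, not Sortio's implementation: the `render_filename` and `clean` helpers are hypothetical names, and the sketch assumes the date, vendor, and amount have already been extracted from the PDF by some other step.

```python
import re
from datetime import date
from decimal import Decimal
from typing import Optional


def clean(text: str) -> str:
    """Turn 'Whole Foods' into 'Whole_Foods' so the name is safe in a path."""
    return re.sub(r"\W+", "_", text.strip()).strip("_")


def render_filename(template: str, doc_date: date,
                    amount: Optional[Decimal] = None, **names: str) -> str:
    """Fill {Placeholders} in a template from already-extracted fields."""
    values = {
        "YYYY-MM-DD": doc_date.isoformat(),
        "YYYY-MM": doc_date.strftime("%Y-%m"),
    }
    if amount is not None:
        values["Amount"] = f"${amount:.2f}"  # always two decimal places
    values.update({key: clean(value) for key, value in names.items()})

    def fill(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            # Fail loudly instead of writing a half-filled filename.
            raise KeyError(f"no value extracted for {{{key}}}")
        return values[key]

    return re.sub(r"\{([^{}]+)\}", fill, template)


print(render_filename("{YYYY-MM-DD}_{Vendor}_{Amount}.pdf",
                      date(2026, 4, 22), Decimal("87.14"), Vendor="Whole Foods"))
print(render_filename("{YYYY-MM}_{Bank}_{AccountLast4}.pdf",
                      date(2026, 4, 30), Bank="Chase", AccountLast4="3829"))
```

The template is the easy part. Getting a trustworthy date, vendor, and amount out of an arbitrary PDF is where the real work lies.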
Why Hazel and traditional renamers cannot do this reliably
Hazel is the obvious tool to reach for. It has rules. It has content matching. It has filename templates with captured groups. On paper, this is exactly the workflow it was built for, and for clean digitally generated PDFs from a single vendor it works well.
In practice it falls apart fast. The vendor changes its statement template once a year and your regex stops matching. You add a second vendor and now you have two rules, each with its own regex. Add a third and you have three rules. Add ten and you are maintaining a regex library, debugging which rule matched which file, and quietly losing the ones that fell through the cracks. We wrote a separate post on why Hazel content matching breaks on real-world PDFs that goes into the OCR drift and layout-order problems in detail.
Pure batch renamers (NameChanger, A Better Finder Rename, Transnomino) do not read file content at all. They are excellent for renaming photos by EXIF date or stripping prefixes, but they cannot pull "Comcast" and "$142.83" out of the PDF body. They are not the right tool for this job.
Receipt-specialized SaaS (Klippa, Renamer.ai, Wisfile) does read content and can extract vendor and amount well, but it is receipt-only, cloud-only, per-file priced, and not a file organizer. It is the right tool if you only need to convert receipts into a structured data export and nothing else. It is the wrong tool if you also want to organize invoices, statements, contracts, and the rest of your filing.
How Sortio reads a PDF and builds a filename
The flow is the same whether you run it on one PDF or a thousand. Sortio opens the file, extracts the text layer (image-only PDFs need an OCRmyPDF pass first to add one), and sends the text to an LLM with a structured extraction prompt. The model returns the fields you asked for: vendor, date, amount, type, account. Sortio then assembles the filename from your template, runs a preview, and (after you confirm) renames and moves the file.
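As a rough sketch of that flow (not Sortio's actual code), the extract, assemble, and preview steps can be separated as below. Everything here is an assumption made for illustration: `ask_model` is a hypothetical callable standing in for the LLM request, the JSON field names are invented, and `canned_model` returns a fixed answer so the example runs without any model.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class PlannedMove:
    source: Path
    target: Path


def plan_move(source: Path, text: str,
              ask_model: Callable[[str], str], dest_dir: Path) -> PlannedMove:
    """Build a proposed rename without touching the file system."""
    # Assumed response shape: {"vendor": ..., "date": "YYYY-MM-DD", "amount": "142.83"}
    fields = json.loads(ask_model(text))
    missing = {"vendor", "date", "amount"} - fields.keys()
    if missing:
        raise ValueError(f"model response is missing fields: {sorted(missing)}")
    vendor = fields["vendor"].replace(" ", "_")
    name = f"{fields['date']}_{vendor}_${fields['amount']}.pdf"
    return PlannedMove(source, dest_dir / name)


def canned_model(text: str) -> str:
    """Canned response standing in for real inference."""
    return json.dumps({"vendor": "Comcast", "date": "2026-04-12", "amount": "142.83"})


move = plan_move(Path("scan_20240412_001.pdf"),
                 "Comcast ... Total due $142.83", canned_model, Path("Invoices"))
print(f"{move.source} -> {move.target}")  # preview only, nothing is renamed yet
```

Keeping the plan as plain data is what makes a preview step possible: the proposed name can be shown, edited, or rejected before any file moves.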
The reason this works where regex does not is that an LLM is robust to OCR noise. "Account Number: 0123-4567" and "Acct # 0123 4567" and "A/C No 01234567" all read as the same thing. "Comcast Cable" and "Comcast Communications LLC" both resolve to a canonical "Comcast" if you ask for the short vendor name. The model also understands the document structure, so it pulls the total amount from the totals row rather than from a partial subtotal in the middle of the page.
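To make the brittleness concrete, here is a small Python illustration using the three account-number spellings above. The `STRICT` pattern is a hypothetical rule written against the first layout only.

```python
import re

# A rule written for one vendor's layout.
STRICT = re.compile(r"Account Number: (\d{4})-(\d{4})")

samples = [
    "Account Number: 0123-4567",
    "Acct # 0123 4567",
    "A/C No 01234567",
]

matches = [bool(STRICT.search(s)) for s in samples]
print(matches)  # [True, False, False]

# All three lines carry the same account number once the punctuation is ignored.
normalized = {re.sub(r"\D", "", s) for s in samples}
print(normalized)  # {'01234567'}
```

Stripping non-digits only works here because each sample is already the right line. Finding that line across arbitrary layouts is the part that needs either a pattern per vendor or a model that reads the page.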
The trade-off is cost and speed. Every AI sort draws from your AI allowance, and inference takes longer than regex. For renaming 30 invoices a month, this is invisible: a single sort takes seconds and barely registers against your allowance. For renaming 30,000 files where the pattern is stable enough that a regex would work, you would build the rule once in AI Rule Builder and let it run for free thereafter.
A working Sortio prompt for invoices and receipts
Drop this into the Sortio prompt box, point it at your Downloads folder or an "Inbox" folder where invoices accumulate, and run Preview before applying.
Read each PDF and decide which of these categories it
belongs to: invoice, receipt, bank statement, contract,
or other.
Invoices (a bill addressed to me, with an amount due) go
to ~/Documents/Finance/2026/Invoices/, renamed to
{YYYY-MM-DD}_{Vendor}_{Amount}.pdf.
Receipts (proof of a completed purchase, with a total
paid) go to ~/Documents/Finance/2026/Receipts/, renamed
to {YYYY-MM-DD}_{Vendor}_{Category}_{Amount}.pdf.
Category is one of: Office, Software, Meals, Travel,
Hardware, Other.
Bank statements (a multi-page statement from a bank or
credit card with a statement period and account number)
go to ~/Documents/Finance/Statements/{year}/, renamed
to {YYYY-MM}_{Bank}_{AccountLast4}.pdf.
Contracts (a signed or unsigned agreement, usually with
"Agreement" or "Contract" in the title) go to
~/Documents/Contracts/{Counterparty}/, renamed to
{YYYY-MM-DD}_{Counterparty}_{ContractType}.pdf.
Anything else goes to ~/Documents/Inbox/ untouched.
Use short canonical vendor names (Comcast not Comcast
Cable Communications LLC). Use the total/due amount, not
subtotals. Dates in YYYY-MM-DD. Amounts as $X.XX.
Click Preview. Sortio shows you the proposed name for every file, the target folder, and the field values it extracted. Override any individual decision before applying. If a vendor name comes out wrong (a generic invoice template that does not include the vendor cleanly), you can fix that one and Sortio will remember the correction.
Apply commits the moves. A 200-file backlog of invoices typically completes in two to three minutes on the managed AI tier. The preview-before-apply step is the safety net: nothing is destructive, and the Sortio backup folder keeps the original copies of any renamed files for 30 days in case you want to revert.
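The copy-then-move idea behind a non-destructive apply can be sketched in a few lines of Python. This is an assumption about the general technique, not a description of how Sortio implements its backup folder: `apply_move` is a hypothetical helper, and the 30-day retention is left out.

```python
import shutil
from pathlib import Path


def apply_move(source: Path, target: Path, backup_dir: Path) -> Path:
    """Rename and move a file, keeping a copy under its original name."""
    if target.exists():
        # Never silently overwrite an existing file with the same name.
        raise FileExistsError(f"refusing to overwrite {target}")
    backup_dir.mkdir(parents=True, exist_ok=True)
    target.parent.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / source.name
    shutil.copy2(source, backup)           # copy first, preserving timestamps
    shutil.move(str(source), str(target))  # then rename and move
    return backup
```

Copying before moving means an interrupted run leaves either the original or the backup intact, and reverting is a matter of copying the backup back to where it came from.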
Setting up a watch folder for new invoices
Once the backlog is clean, the natural next step is to keep it clean. On Sortio Pro ($14.99/month or $99/year) you can turn the same prompt into a watch folder. The workflow is:
- Pick the folder where invoices land. For most people this is ~/Downloads (because that is where browsers and email clients write) or a dedicated ~/Documents/Inbox.
- In Sortio, open Watch Folders, add the folder, and paste the prompt above.
- Set the trigger. "On new file" runs the prompt the moment a file arrives. "Hourly" or "Daily" batches make more sense for high-volume folders.
- For the first week, leave the watch in Preview mode. Sortio queues proposed moves and notifies you instead of applying them, so you can sanity-check that the AI is not misclassifying anything. After a week of clean previews, switch to Apply.
From that point forward, the Downloads folder approaches zero by default. Every invoice that arrives is renamed and filed within seconds. Tax-season prep stops being a four-hour archaeology project and becomes "open ~/Documents/Finance/2026/Receipts and export."
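For readers curious what a scheduled watch amounts to, here is a minimal polling sketch in Python. It is an illustration under stated assumptions rather than Sortio's mechanism: `scan_once` and `watch` are hypothetical names, and `handle` stands for whatever should happen to a new file (queueing a preview, or applying the rename).

```python
import time
from pathlib import Path
from typing import Callable, List, Set


def scan_once(folder: Path, seen: Set[Path],
              handle: Callable[[Path], None]) -> List[Path]:
    """Hand every PDF not seen before to the handler; return the new ones."""
    new_files = sorted(p for p in folder.glob("*.pdf") if p not in seen)
    for path in new_files:
        handle(path)
        seen.add(path)
    return new_files


def watch(folder: Path, handle: Callable[[Path], None],
          interval_seconds: float = 3600.0) -> None:
    """Poll the folder forever; 3600 seconds corresponds to an hourly batch."""
    seen: Set[Path] = set()
    while True:
        scan_once(folder, seen, handle)
        time.sleep(interval_seconds)
```

In a first-week Preview mode, `handle` would only record the proposed move. Switching to Apply means swapping in a handler that performs it.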
How to organize scanned documents by what is inside them
To organize scanned documents by what is inside them, you have to read the content, because the filename (scan_0421.pdf, IMG_4471.jpg) tells you nothing and there is no metadata to sort on. Give each scan a text layer (searchable PDFs have one already; OCRmyPDF adds one to image-only scans), then Sortio reads that text with an LLM and files the document by what it actually is, a lease, a 1099, a signed contract, an invoice, rather than by a meaningless filename.
That is the difference from a batch renamer or a folder-watcher rule: those act on the name or the extension, while Sortio acts on the meaning of the page. Point it at a folder of mixed scans, run Preview to confirm the proposed names and destinations, and apply. A drawer of scanned paper becomes a searchable, correctly filed archive in one pass instead of a manual sort that never gets started.
How to batch organize incoming documents per client automatically
To batch organize incoming documents per client automatically, point Sortio at the folder where they arrive and give it one rule: read each document, identify the client it belongs to from the content, and file it under that client's folder. Sortio processes the entire batch in a single run, and as a watch folder it keeps doing it on every new arrival without anyone opening the app.
This is the workflow behind a clean per-client archive: intake lands in one place, Sortio reads the client name, matter, or account off each file and routes it, and the folder for each client stays current on its own. For the firm version of this, with a per-client, per-year structure, see how to organize client tax documents for an accounting firm.
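The routing half of that workflow is just a path decision once the client is known. In the Python sketch below, the identification step is stubbed with a plain alias lookup purely to keep the example runnable. In the workflow described above, the model reads the client off the document instead. The client names, `CLIENT_ALIASES`, and the `_Unassigned` folder are all hypothetical.

```python
from pathlib import Path
from typing import Optional

# Hypothetical client list: folder name -> strings that identify the client.
CLIENT_ALIASES = {
    "Acme_Corp": ("acme corp", "acme corporation"),
    "Birch_Dental": ("birch dental",),
}


def identify_client(text: str) -> Optional[str]:
    """Stand-in for content-based identification; returns a folder name or None."""
    lowered = text.lower()
    for folder, aliases in CLIENT_ALIASES.items():
        if any(alias in lowered for alias in aliases):
            return folder
    return None


def destination(root: Path, text: str, filename: str) -> Path:
    """File under the client's folder, or a holding folder when unsure."""
    client = identify_client(text)
    return root / (client or "_Unassigned") / filename


print(destination(Path("Clients"), "Engagement letter for ACME Corporation", "letter.pdf"))
```

The holding folder matters more than it looks: a document that cannot be matched to a client should stay visible for a human rather than land in a guessed folder.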
Privacy and local processing
Some PDFs are sensitive: medical bills, tax returns, legal contracts, brokerage statements. Sortio supports local-only processing through Ollama: the LLM runs on your Mac (Llama 3, Mistral) and no file content leaves the machine. Setup takes a few minutes, and our piece on local AI vs cloud AI for file organization covers the trade-off candidly. The short version: local is slower and slightly less accurate, but it is fully functional and the right pick for anything you would not want a third-party AI provider to see.
When to skip AI and use a rule
Not every rename needs an LLM. If you have one vendor that always sends the same template (your accountant's software, for example), the routing decision is deterministic and you can promote it to an AI Rule Builder rule. The Rule Builder takes a plain-English description of the rule and generates a deterministic config that does not use your AI allowance at all. For high-volume single-vendor flows this is the right tool. See our piece on AI Sort vs Rule Builder for the full picture of when to use which.
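A deterministic rule of this kind is essentially a guard, a pattern, and a template. The Python sketch below illustrates the idea and is not the Rule Builder's actual config format: the `Rule` class, the vendor "LedgerCo", and the invoice wording are all invented for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    must_contain: str    # cheap guard: only fire on this vendor's documents
    pattern: re.Pattern  # named groups feed the filename template
    template: str

    def apply(self, text: str) -> Optional[str]:
        """Return the new filename, or None if this rule does not apply."""
        if self.must_contain not in text:
            return None
        match = self.pattern.search(text)
        if match is None:
            return None
        return self.template.format(**match.groupdict())


ledgerco = Rule(
    must_contain="LedgerCo Invoice",
    pattern=re.compile(
        r"Date: (?P<date>\d{4}-\d{2}-\d{2}).*?Total due: \$(?P<amount>\d+\.\d{2})",
        re.DOTALL,
    ),
    template="{date}_LedgerCo_${amount}.pdf",
)

text = "LedgerCo Invoice\nDate: 2026-04-12\nBookkeeping, April\nTotal due: $310.00\n"
print(ledgerco.apply(text))  # 2026-04-12_LedgerCo_$310.00.pdf
```

Returning `None` when the rule does not match, rather than guessing, leaves room for the file to fall through to an AI sort.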
FAQ
Can Hazel rename PDFs based on their content?
Hazel can match a regex against the PDF text layer and use captured groups in the new filename. This works on digitally generated PDFs with a clean text layer. It breaks on scanned PDFs, where the OCR text drifts between documents from the same source and the regex stops matching. It also requires you to write the regex yourself for every vendor format, which gets unmaintainable past a handful of patterns. Sortio reads the PDF semantically with an LLM and is robust to OCR noise.
What is the right filename pattern for invoices and receipts?
The pattern that survives the longest is {YYYY-MM-DD}_{Vendor}_{Amount}.pdf, for example 2026-04-12_Comcast_$142.83.pdf. Date first means files sort chronologically in Finder. Vendor next means you can scan the folder by eye and group by company. Amount last is the disambiguator for vendors that send multiple invoices a month. For receipts add a category: 2026-04-12_Office_Depot_Office_$23.45.pdf. Both work with Schedule C and the standard small-business tax workflow.
Can Sortio handle scanned PDFs where the OCR is bad?
Yes, with a caveat. Sortio reads the PDF's existing text layer; it does not run its own OCR. If the layer is missing or unusable, pre-process the file with OCRmyPDF (free) and Sortio reads the layer it adds. The LLM is robust to OCR noise that breaks regex-based tools, which is the entire reason this approach holds up where Hazel falls down.
Does Sortio process PDFs locally or in the cloud?
Both. Sortio supports Ollama for local-only inference (Llama 3, Mistral) so sensitive PDFs (medical, legal, financial) never leave the machine. The managed AI option (Sortio-hosted, or BYOK, meaning bring your own key) is faster and more accurate. For the full trade-off see our piece on local AI vs cloud AI for file organization.
What about bank statements and brokerage statements?
Same workflow. Sortio reads the statement, identifies the institution and the account number, and routes the PDF to the right account subfolder. Filename pattern: {YYYY-MM}_{Bank}_{AccountLast4}.pdf, for example 2026-04_Chase_3829.pdf. Multi-account households end up with a clean per-account archive instead of the typical "statement (3).pdf" pile.
How is this different from Klippa or Renamer.ai?
Klippa and Renamer.ai are receipt-specific OCR services. They do one thing well (read a receipt, extract vendor and amount) but they are not file organizers. Sortio is a general AI file organizer that handles this workflow as one of its use cases, alongside screenshot renaming, photo sorting, document filing, and watch-folder automation. If you only need receipt extraction and nothing else, Klippa might be a better fit; if you want the rename plus the routing plus the long tail of other organizing tasks, Sortio is the right tool.
Do I have to write a different prompt for each document type?
No. One Sortio prompt can route invoices, receipts, statements, and contracts to different folders with different filename templates. The model handles the routing decision per file. For very high volume (thousands of receipts a month) you can promote each rule to the AI Rule Builder, which generates a deterministic rule that does not use your AI allowance at all.
How do you organize scanned documents by what is inside them?
Read the content, because a scanned file has no useful name and no metadata to sort on. Make sure each scan has a text layer (searchable PDFs already do; OCRmyPDF adds one to image-only scans), then Sortio reads that text with an LLM and files the document by what it actually is (a lease, a 1099, a signed contract) instead of by a filename like scan_0421.pdf. Point it at a folder of mixed scans, run Preview to confirm the proposed names and destinations, then apply, and a drawer of scanned paper becomes a searchable, correctly filed archive in one pass.
How do you batch organize incoming documents per client automatically?
Point Sortio at the folder where documents arrive and give it one rule: read each file, identify the client it belongs to from the content, and file it under that client folder. Sortio processes the whole batch in a single run, and as a watch folder it keeps doing it on every new arrival without anyone opening the app, so each client folder stays current on its own.
