Three years of invoices, every one of them named statement.pdf or invoice.pdf or scan_20240412_001.pdf. None of them findable by name. The vendor, date, and amount are all inside the PDF, but the filename tells you nothing. This is the most consistent complaint from self-employed people, bookkeepers, and anyone who has ever had to assemble a year of receipts before a tax deadline.
Renaming PDFs by hand is not realistic past about thirty files. The fix is to read the PDF's content, extract the fields you care about (vendor, date, amount, account number, category), and build a filename from a template. This post is the full walkthrough: what the right filename pattern looks like, why Hazel and traditional renamers cannot do it reliably, and how Sortio handles it with one prompt.
The short version
A good filename pattern is {YYYY-MM-DD}_{Vendor}_{Amount}.pdf. Hazel cannot reliably extract those fields from a real-world mix of vendor PDFs because OCR text drifts from one document to the next and the regex stops matching. Sortio reads PDF content semantically (LLM, not regex), extracts the fields, applies the template, and routes the file to the right folder. One prompt covers invoices, receipts, statements, and contracts.
What a good rename pattern looks like
A useful PDF filename is one you can search for, sort by, and skim. After helping users set this up for a couple of years, the pattern that survives the longest is:
2026-04-12_Comcast_$142.83.pdf
2026-04-15_Apple_$1284.00.pdf
2026-04-22_Whole_Foods_$87.14.pdf
2026-04-28_Office_Depot_$45.99.pdf
Four parts. ISO date first (YYYY-MM-DD) so files sort chronologically in Finder without any custom view. Vendor next so the folder is scannable by eye. Amount last so when a vendor sends two invoices in the same month you can tell them apart at a glance. Extension as expected.
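A quick way to check the sorting claim: the short Python sketch below shuffles the four example names and sorts them as plain strings, with no date parsing. It only illustrates why a zero-padded ISO prefix sorts chronologically; it is not part of Sortio.

```python
# The four example filenames, deliberately out of order.
names = [
    "2026-04-22_Whole_Foods_$87.14.pdf",
    "2026-04-12_Comcast_$142.83.pdf",
    "2026-04-28_Office_Depot_$45.99.pdf",
    "2026-04-15_Apple_$1284.00.pdf",
]

# Plain string sort. Because the prefix is zero-padded and ordered
# year, month, day, alphabetical order equals chronological order.
chronological = sorted(names)

for name in chronological:
    print(name)
```

The same property is what lets Finder's default name sort double as a date sort.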
For receipts you want a category in there as well, because Schedule C and most bookkeeping software want it: 2026-04-12_Office_Depot_Office_$23.45.pdf. For bank statements you usually do not want the day, just the year and month, and you want the account number suffix to disambiguate multi-account households: 2026-04_Chase_3829.pdf. For contracts the pattern is different again, 2026-04-12_VendorName_ContractType_v1.pdf, because the date is the signing date and there can be multiple versions.
All four of these are unstructured enough that you cannot generate them with a regex unless every vendor uses an identical layout, and structured enough that you cannot live with "let the LLM pick a name" without templates. The combination (template plus content extraction) is exactly what Sortio does.
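To make the "template" half of that combination concrete, here is a minimal Python sketch that assembles the four patterns above from already-extracted fields. The function and field names (build_filename, account, contract_type, and so on) are assumptions made for illustration; they are not Sortio's internal names.

```python
from datetime import date
from decimal import Decimal

# One template per document type, mirroring the four patterns above.
TEMPLATES = {
    "invoice": "{date}_{vendor}_{amount}.pdf",
    "receipt": "{date}_{vendor}_{category}_{amount}.pdf",
    "statement": "{month}_{vendor}_{account_last4}.pdf",
    "contract": "{date}_{vendor}_{contract_type}_v{version}.pdf",
}


def slug(text: str) -> str:
    """Replace runs of whitespace with underscores."""
    return "_".join(text.split())


def build_filename(doc_type, *, when, vendor, amount=None,
                   category="", account="", contract_type="", version=1):
    """Fill the template for doc_type; unused fields are ignored."""
    digits = "".join(ch for ch in account if ch.isdigit())
    return TEMPLATES[doc_type].format(
        date=when.isoformat(),            # YYYY-MM-DD
        month=when.strftime("%Y-%m"),     # YYYY-MM for statements
        vendor=slug(vendor),
        amount=f"${amount:.2f}" if amount is not None else "",
        category=slug(category),
        account_last4=digits[-4:],
        contract_type=slug(contract_type),
        version=version,
    )


print(build_filename("invoice", when=date(2026, 4, 12),
                     vendor="Comcast", amount=Decimal("142.83")))
print(build_filename("statement", when=date(2026, 4, 30),
                     vendor="Chase", account="0000 0000 0000 3829"))
```

One practical note on the pattern itself: a literal $ in a filename is fine in Finder, but it needs quoting if you ever pass these names through a shell.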
Why Hazel and traditional renamers cannot do this reliably
Hazel is the obvious tool to reach for. It has rules. It has content matching. It has filename templates with captured groups. On paper, this is exactly the workflow it was built for, and for clean digitally generated PDFs from a single vendor it works well.
In practice it falls apart fast. The vendor changes its statement template once a year and your regex stops matching. You add a second vendor and now you have two rules, each with its own regex. Add a third and you have three rules. Add ten and you are maintaining a regex library, debugging which rule matched which file, and quietly losing the ones that fell through the cracks. We wrote a separate post on why Hazel content matching breaks on real-world PDFs that goes into the OCR drift and layout-order problems in detail.
Pure batch renamers (NameChanger, A Better Finder Rename, Transnomino) do not read file content at all. They are excellent for renaming photos by EXIF date or stripping prefixes, but they cannot pull "Comcast" and "$142.83" out of the PDF body. They are not the right tool for this job.
Receipt-specialized SaaS (Klippa, Renamer.ai, Wisfile) does read content and can extract vendor and amount well, but it is receipt-only, cloud-only, per-file priced, and not a file organizer. It is the right tool if you only need to convert receipts into a structured data export and nothing else. It is the wrong tool if you also want to organize invoices, statements, contracts, and the rest of your filing.
How Sortio reads a PDF and builds a filename
The flow is the same whether you run it on one PDF or a thousand. Sortio opens the file, extracts the text layer (or runs OCR on the spot if the layer is missing), and sends the text to an LLM with a structured extraction prompt. The model returns the fields you asked for: vendor, date, amount, type, account. Sortio then assembles the filename from your template, runs a preview, and (after you confirm) renames and moves the file.
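The shape of that flow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Sortio's implementation: the extractor is passed in as a plain function (here a stub that returns fixed values in place of a real LLM call), and the names Proposal, propose, and template_fields are hypothetical.

```python
from dataclasses import dataclass
from string import Formatter
from typing import Callable, Optional


@dataclass
class Proposal:
    source: str
    new_name: Optional[str]  # None means "leave the file untouched"
    fields: dict


def template_fields(template: str) -> list:
    """Names of the {placeholders} used in a template."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def propose(source: str, text: str,
            extract: Callable[[str], dict], template: str) -> Proposal:
    """Extract fields from text and build a proposed name for preview."""
    fields = extract(text)
    missing = [n for n in template_fields(template) if not fields.get(n)]
    if missing:
        # Do not guess: a file with missing fields keeps its name.
        return Proposal(source, None, fields)
    return Proposal(source, template.format(**fields), fields)


def stub_extract(text: str) -> dict:
    # Stand-in for the LLM call; a real extractor would read `text`.
    return {"date": "2026-04-12", "vendor": "Comcast", "amount": "$142.83"}


proposal = propose("scan_20240412_001.pdf", "text layer goes here",
                   stub_extract, "{date}_{vendor}_{amount}.pdf")
print(f"{proposal.source} -> {proposal.new_name}")
```

Returning a proposal instead of renaming immediately is what makes a preview step possible: nothing touches disk until the proposals are confirmed.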
The reason this works where regex does not is that an LLM is robust to OCR noise. "Account Number: 0123-4567" and "Acct # 0123 4567" and "A/C No 01234567" all read as the same thing. "Comcast Cable" and "Comcast Communications LLC" both resolve to a canonical "Comcast" if you ask for the short vendor name. The model also understands the document structure, so it pulls the total amount from the totals row rather than from a partial subtotal in the middle of the page.
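The brittleness is easy to reproduce. The Python sketch below runs one strict regex, written for the first layout, against the three account-number variants quoted above. The regex is a made-up example of the kind of pattern a content-matching rule might use.

```python
import re

samples = [
    "Account Number: 0123-4567",
    "Acct # 0123 4567",
    "A/C No 01234567",
]

# A pattern written against the first layout only.
strict = re.compile(r"Account Number: (\d{4}-\d{4})")
matched = [s for s in samples if strict.search(s)]
print(f"{len(matched)} of {len(samples)} matched")


def account_digits(line: str) -> str:
    """Keep only the digits, dropping labels, spaces, and hyphens."""
    return "".join(ch for ch in line if ch.isdigit())


# All three lines carry the same account number once punctuation is ignored.
normalized = {account_digits(s) for s in samples}
print(normalized)
```

Stripping to digits only works here because each sample is already known to be an account-number line. Finding that line in a full page of noisy text is the part that needs something more tolerant than a fixed pattern.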
The trade-off is cost and speed. Every AI sort consumes credits, and inference takes longer than regex. For renaming 30 invoices a month, this is invisible: a single sort takes seconds and consumes a handful of credits. For renaming 30,000 files where the pattern is stable enough that a regex would work, you would build the rule once in AI Rule Builder and let it run for free thereafter.
A working Sortio prompt for invoices and receipts
Drop this into the Sortio prompt box, point it at your Downloads folder or an "Inbox" folder where invoices accumulate, and run Preview before applying.
Read each PDF and decide which of these categories it
belongs to: invoice, receipt, bank statement, contract,
or other.
Invoices (a bill addressed to me, with an amount due) go
to ~/Documents/Finance/2026/Invoices/, renamed to
{YYYY-MM-DD}_{Vendor}_{Amount}.pdf.
Receipts (proof of a completed purchase, with a total
paid) go to ~/Documents/Finance/2026/Receipts/, renamed
to {YYYY-MM-DD}_{Vendor}_{Category}_{Amount}.pdf.
Category is one of: Office, Software, Meals, Travel,
Hardware, Other.
Bank statements (a multi-page statement from a bank or
credit card with a statement period and account number)
go to ~/Documents/Finance/Statements/{year}/, renamed
to {YYYY-MM}_{Bank}_{AccountLast4}.pdf.
Contracts (a signed or unsigned agreement, usually with
"Agreement" or "Contract" in the title) go to
~/Documents/Contracts/{Counterparty}/, renamed to
{YYYY-MM-DD}_{Counterparty}_{ContractType}.pdf.
Anything else goes to ~/Documents/Inbox/ untouched.
Use short canonical vendor names (Comcast not Comcast
Cable Communications LLC). Use the total/due amount, not
subtotals. Dates in YYYY-MM-DD. Amounts as $X.XX.
Click Preview. Sortio shows you the proposed name for every file, the target folder, and the field values it extracted. Override any individual decision before applying. If a vendor name comes out wrong (a generic invoice template that does not include the vendor cleanly), you can fix that one and Sortio will remember the correction.
Apply commits the moves. A 200-file backlog of invoices typically completes in two to three minutes on the managed AI tier. The preview-before-apply step is the safety net: nothing is destructive, and the Sortio backup folder keeps the original copies of any renamed files for 30 days in case you want to revert.
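If you want to see the preview-then-apply idea outside of Sortio, the Python sketch below shows one way to structure it: a preview function that only formats the plan, and an apply function that copies each original into a backup folder before moving it. The function names and the backup location are assumptions, and the 30-day expiry is not modeled.

```python
import shutil
from pathlib import Path


def preview(moves: list) -> list:
    """Describe each (source, destination) pair without touching disk."""
    return [f"{src.name} -> {dst}" for src, dst in moves]


def apply_moves(moves: list, backup_dir: Path) -> None:
    """Back up each original, then rename and move it."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    for src, dst in moves:
        if dst.exists():
            # Never overwrite an existing file silently.
            raise FileExistsError(dst)
        shutil.copy2(src, backup_dir / src.name)      # keep the original
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))


# Example plan (paths are illustrative):
# moves = [(Path("~/Downloads/invoice.pdf").expanduser(),
#           Path("~/Documents/Finance/2026/Invoices/"
#                "2026-04-12_Comcast_$142.83.pdf").expanduser())]
# print("\n".join(preview(moves)))
# apply_moves(moves, Path("~/SortBackup").expanduser())
```

Keeping preview free of side effects means you can run it as often as you like while tuning the prompt.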
Setting up a watch folder for new invoices
Once the backlog is clean, the natural next step is to keep it clean. On Sortio Pro ($14.99/month or $99/year) you can turn the same prompt into a watch folder. The workflow is:
- Pick the folder where invoices land. For most people this is ~/Downloads (because that is where browsers and email clients write) or a dedicated ~/Documents/Inbox.
- In Sortio, open Watch Folders, add the folder, and paste the prompt above.
- Set the trigger. "On new file" runs the prompt the moment a file arrives. "Hourly" or "Daily" batches make more sense for high-volume folders.
- For the first week, leave the watch in Preview mode. Sortio queues proposed moves and notifies you instead of applying them, so you can sanity-check that the AI is not misclassifying anything. After a week of clean previews, switch to Apply.
From that point forward, the Downloads folder approaches zero by default. Every invoice that arrives is renamed and filed within seconds. Tax-season prep stops being a four-hour archaeology project and becomes "open ~/Documents/Finance/2026/Receipts and export."
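For readers curious what a batch-style watch does under the hood, here is a minimal polling sketch in Python. It is an assumption-laden illustration, not how Sortio implements watch folders: it simply lists the folder on an interval and hands any PDF it has not seen before to a handler you supply.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def new_pdfs(folder: Path, seen: set) -> list:
    """Return PDFs in folder not seen before, and remember them."""
    found = sorted(p for p in folder.glob("*.pdf") if p.name not in seen)
    seen.update(p.name for p in found)
    return found


def watch(folder: Path, handle: Callable[[Path], None],
          interval_seconds: float = 3600.0,
          max_polls: Optional[int] = None) -> None:
    """Poll folder and call handle(pdf) for each newly arrived PDF."""
    seen: set = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for pdf in new_pdfs(folder, seen):
            handle(pdf)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_seconds)


# Example: an hourly batch that only prints what it would process,
# which mirrors leaving the watch in Preview mode.
# watch(Path("~/Downloads").expanduser(), print, interval_seconds=3600)
```

Note that the first poll treats every PDF already in the folder as new, so it is worth clearing the backlog before turning a watch like this on.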
Privacy and local processing
Some PDFs are sensitive: medical bills, tax returns, legal contracts, brokerage statements. Sortio supports local-only processing through Ollama: the LLM runs on your Mac (Llama 3, Mistral) and no file content leaves the machine. Setup takes a few minutes, and our piece on local AI vs cloud AI for file organization covers the trade-off in detail. The short version: local is slower and slightly less accurate, but it is fully functional and the right pick for anything you would not want a third-party AI provider to see.
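To show what "no file content leaves the machine" looks like in practice, the Python sketch below talks to a local Ollama server directly over its HTTP API on the default port 11434. The prompt wording, the llama3 model tag, and the function names are assumptions for illustration; Sortio's own integration may differ.

```python
import json
import urllib.request

# Ollama's default local endpoint; requests never leave localhost.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(pdf_text: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming JSON-mode request for field extraction."""
    prompt = (
        "Extract the vendor, the date (YYYY-MM-DD) and the total amount "
        "from this document. Reply with a JSON object with the keys "
        "vendor, date, amount.\n\n" + pdf_text
    )
    payload = {"model": model, "prompt": prompt,
               "stream": False, "format": "json"}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_reply(body: bytes) -> dict:
    """With stream=False the reply is one JSON object; the model's
    text (itself JSON here) is in its "response" field."""
    return json.loads(json.loads(body)["response"])


def extract_locally(pdf_text: str) -> dict:
    """Send the text to the local model. Requires Ollama to be running
    and the model to have been pulled beforehand."""
    with urllib.request.urlopen(build_request(pdf_text), timeout=120) as resp:
        return parse_reply(resp.read())
```

A smaller local model may return malformed or incomplete JSON more often than a hosted one, so treat the parsed result as a proposal to preview rather than a fact.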
When to skip AI and use a rule
Not every rename needs an LLM. If you have one vendor that always sends the same template (your accountant's software, for example), the routing decision is deterministic and you can promote it to an AI Rule Builder rule. The Rule Builder takes a plain-English description of the rule and generates a deterministic config that runs without consuming AI credits. For high-volume single-vendor flows this is the right tool. See our piece on AI Sort vs Rule Builder for the full picture of when to use which.
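The decision can be pictured as "try the cheap deterministic rule first, fall back to AI only when it does not match." The Python sketch below is plain illustrative code, not the config format Rule Builder generates; the invoice layout, the regex, and the vendor label are all hypothetical.

```python
import re
from typing import Callable, Optional

# Hypothetical fixed layout used by one sender's invoicing software.
RULE = re.compile(
    r"Invoice date: (?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"Total due: \$(?P<amount>\d+\.\d{2})"
)


def by_rule(text: str) -> Optional[dict]:
    """Deterministic extraction; returns None if the layout differs."""
    match = RULE.search(text)
    if match is None:
        return None
    return {
        "vendor": "Accountant",  # fixed, because the rule is per-sender
        "date": match["date"],
        "amount": "$" + match["amount"],
    }


def extract(text: str, fallback: Callable[[str], dict]) -> dict:
    """Use the rule when it matches, otherwise defer to the fallback
    (for example an LLM-based extractor)."""
    return by_rule(text) or fallback(text)


sample = "Invoice date: 2026-04-12\nTotal due: $250.00"
print(extract(sample, fallback=lambda t: {}))
```

The rule costs nothing to run, and the fallback catches the day the sender changes their template.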
FAQ
Can Hazel rename PDFs based on their content?
Hazel can match a regex against the PDF text layer and use captured groups in the new filename. This works on digitally generated PDFs with a clean text layer. It breaks on scanned PDFs, where the OCR text drifts between documents from the same source and the regex stops matching. It also requires you to write the regex yourself for every vendor format, which becomes unmaintainable past a handful of patterns. Sortio reads the PDF semantically with an LLM and is robust to OCR noise.
What is the right filename pattern for invoices and receipts?
The pattern that survives the longest is {YYYY-MM-DD}_{Vendor}_{Amount}.pdf, for example 2026-04-12_Comcast_$142.83.pdf. Date first means files sort chronologically in Finder. Vendor next means you can scan the folder by eye and group by company. Amount last is the disambiguator for vendors that send multiple invoices a month. For receipts add a category: 2026-04-12_Office_Depot_Office_$23.45.pdf. Both work with Schedule C and the standard small-business tax workflow.
Can Sortio handle scanned PDFs where the OCR is bad?
Yes, with a caveat. Sortio reads the PDF's text layer first; if the OCR is missing or unusable, Sortio can re-OCR the file as part of the sort run (managed AI tier) or you can pre-process with OCRmyPDF locally. The LLM is robust to OCR noise that breaks regex-based tools, which is the entire reason this approach holds up where Hazel falls down.
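For the local pre-processing route, a small Python wrapper around the OCRmyPDF command line might look like the sketch below. It assumes ocrmypdf is installed and on your PATH; the function names are made up for this example. The --skip-text flag tells OCRmyPDF to leave pages that already have a text layer alone.

```python
import shutil
import subprocess
from pathlib import Path


def ocr_command(src: Path, dst: Path) -> list:
    """Command line that adds a text layer to pages lacking one."""
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def add_text_layer(src: Path, dst: Path) -> bool:
    """Run OCRmyPDF if it is installed; return False if it is not."""
    if shutil.which("ocrmypdf") is None:
        return False
    # check=True raises CalledProcessError if OCR fails.
    subprocess.run(ocr_command(src, dst), check=True)
    return True


# Example (paths are illustrative):
# add_text_layer(Path("scan_20240412_001.pdf"),
#                Path("scan_20240412_001_ocr.pdf"))
```

Writing to a separate output file keeps the original scan intact in case the OCR result is worse than what you started with.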
Does Sortio process PDFs locally or in the cloud?
Both. Sortio supports Ollama for local-only inference (Llama 3, Mistral) so sensitive PDFs (medical, legal, financial) never leave the machine. The managed AI option (Sortio-hosted or BYOK) is faster and more accurate. For the full trade-off see our piece on local AI vs cloud AI for file organization.
What about bank statements and brokerage statements?
Same workflow. Sortio reads the statement, identifies the institution and the account number, and routes the PDF to the right account subfolder. Filename pattern: {YYYY-MM}_{Bank}_{AccountLast4}.pdf, for example 2026-04_Chase_3829.pdf. Multi-account households end up with a clean per-account archive instead of the typical "statement (3).pdf" pile.
How is this different from Klippa or Renamer.ai?
Klippa and Renamer.ai are receipt-specific OCR services. They do one thing well (read a receipt, extract vendor and amount) but they are not file organizers. Sortio is a general AI file organizer that happens to handle this workflow as one of its use cases, alongside screenshot renaming, photo sorting, document filing, and watch-folder automation. If you only need receipt extraction and nothing else, Klippa might be a better fit; if you want the rename plus the routing plus the long-tail of other organizing tasks, Sortio is the right tool.
Do I have to write a different prompt for each document type?
No. One Sortio prompt can route invoices, receipts, statements, and contracts to different folders with different filename templates. The model handles the routing decision per file. For very high volume (thousands of receipts a month) you can promote each rule to the AI Rule Builder, which generates a deterministic rule that runs without consuming AI credits.
