How to Batch Organize Incoming Documents Per Client Automatically

To batch organize incoming documents per client automatically, point Sortio at the folder where documents arrive and give it one rule: read each file, identify the client it belongs to from the content, and file it under that client folder. Sortio processes the whole batch in a single run. Set up as a watch folder, it repeats that run on every new arrival without anyone opening the app, so each client folder stays current with no manual filing.

That is the whole loop. The rest of this post is the detail that makes it hold up at volume: the folder structure that scales to hundreds of clients, how the content match works so it does not put the wrong file in the wrong client folder, how to set the watch folder for ongoing intake, and when to drop the AI entirely and run a deterministic rule instead.

The short version

One intake folder, one rule, one preview. Sortio reads each document, matches it to a client from the content (name, account number, matter reference, address), and files it under that client folder using your template. Run it once to clear a backlog, then set a watch folder so every new arrival routes itself. Promote the stable flows to deterministic Rule Builder rules that do not use your AI allowance at all.

The structure: one intake folder, one folder per client

The mistake that makes per-client filing painful is letting documents pile up in Downloads, in email attachments, and in a scanner output folder all at once. The fix is one intake folder that everything lands in, and one stable destination tree of client folders that the intake feeds. The intake stays near empty. The client tree grows.

For most teams the destination looks like this. Pick the depth that matches how you actually work: client then year for an accounting firm, client then matter for a law firm, client then project for an agency.

~/Documents/Intake/                  <- everything lands here first
~/Documents/Clients/
  Acme_Corp/
    2026/
      2026-04-12_Acme_Corp_Invoice.pdf
      2026-04-18_Acme_Corp_Signed_Contract.pdf
    2025/
  Bryce_Holdings/
    Matter_1042_Lease/
      2026-03-02_Bryce_Holdings_Lease.pdf
    Matter_1088_Dispute/
  Carver_Design/
    Project_Rebrand/
      2026-04-09_Carver_Design_Brief.pdf
  _Needs_Review/                     <- low-confidence matches land here

The naming convention inside each client folder is worth standardizing too, so files sort chronologically and stay readable when there are hundreds of them. A pattern that survives is date first, client next, then a short document type:

{YYYY-MM-DD}_{Client}_{DocType}.pdf

2026-04-12_Acme_Corp_Invoice.pdf
2026-04-18_Acme_Corp_Signed_Contract.pdf
2026-03-02_Bryce_Holdings_Lease.pdf

The two pieces that matter most are the _Needs_Review folder and the ISO date prefix. The review folder is the safety valve: anything the model is not confident about goes there instead of into a wrong client folder, so a misfile never happens silently. The date prefix means Finder sorts every client folder chronologically by name, with no custom view. Stanford University Libraries makes the same point about consistent, descriptive naming: it should be obvious where to find a file and what it contains. That principle is exactly what a per-client structure encodes.
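To make the naming pattern concrete, here is a minimal Python sketch of how a name in that shape could be assembled. It is an illustration only, not how Sortio builds names internally; the function name build_filename and the slug helper are hypothetical.

```python
import re
from datetime import date


def build_filename(doc_date: date, client_folder: str, doc_type: str) -> str:
    """Return '{YYYY-MM-DD}_{Client}_{DocType}.pdf' for one document."""

    def slug(text: str) -> str:
        # Collapse spaces and punctuation into single underscores.
        return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")

    return f"{doc_date.isoformat()}_{slug(client_folder)}_{slug(doc_type)}.pdf"


names = [
    build_filename(date(2026, 4, 18), "Acme_Corp", "Signed Contract"),
    build_filename(date(2026, 4, 12), "Acme_Corp", "Invoice"),
    build_filename(date(2026, 3, 2), "Bryce_Holdings", "Lease"),
]

# ISO dates sort chronologically as plain strings, so sorted() is enough.
for name in sorted(names):
    print(name)
```

Because the date comes first and is zero-padded, a plain alphabetical sort is also a chronological sort, which is why no custom view is needed.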

How Sortio decides which client a document belongs to

The decision is made on content, not on the filename. An incoming document is almost never helpfully named. It is scan_0421.pdf, or document(3).pdf, or a vendor invoice number. But the client identity is inside the file: a company name on the header, an account number, a matter or case reference, a billing address, a contact name. Sortio extracts the text layer (image-only scans need an OCRmyPDF pass first), sends it to an LLM, and asks the model which of your known clients it matches.

Because the match is semantic rather than a literal string compare, it is robust to the variation that breaks rigid rules. "Acme Corp", "Acme Corporation LLC", and an Acme statement that only prints the account number all resolve to the same client. You give Sortio the client roster in the prompt, so it routes to existing folders rather than inventing a new one for a name it has not seen. When the document does not clearly belong to anyone on the roster, the rule sends it to _Needs_Review instead of guessing.
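The roster-plus-fallback logic can be sketched in a few lines of Python. This is a deliberately simplified stand-in: it uses normalized alias lookup where Sortio uses an LLM, so it shows the routing contract (known folder or _Needs_Review, never a new folder) rather than the semantic matching itself. The ROSTER contents and the function names are hypothetical.

```python
import re

NEEDS_REVIEW = "_Needs_Review"

# Hypothetical roster: alias as it may appear in a document -> folder name.
ROSTER = {
    "Acme Corporation": "Acme_Corp",
    "Acme Corp": "Acme_Corp",
    "ACME LLC": "Acme_Corp",
    "Bryce Holdings": "Bryce_Holdings",
    "Bryce LP": "Bryce_Holdings",
    "Carver Design Studio": "Carver_Design",
}


def normalize(text: str) -> str:
    """Lowercase and reduce punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def resolve_client(document_text: str) -> str:
    """Return the folder for the one roster client named in the text."""
    haystack = f" {normalize(document_text)} "
    folders = {
        folder
        for alias, folder in ROSTER.items()
        if f" {normalize(alias)} " in haystack
    }
    # Exactly one client must match; zero or several means do not guess.
    return folders.pop() if len(folders) == 1 else NEEDS_REVIEW
```

A document that names two clients, or none, lands in _Needs_Review. That is the behavior the prompt asks the model for as well.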

This is the difference between content-aware routing and the older approaches. A regex rule needs the client name to appear in a predictable place in predictable text. The moment a vendor reformats a statement, or a scan introduces OCR noise, the regex stops matching and the file falls through. Reading the meaning of the page holds up where the literal text shifts.
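A small Python example shows how quickly a literal rule stops matching. The pattern and the three sample texts are made up for illustration.

```python
import re

# A rigid rule: the client name must follow the literal label "Bill To:".
RULE = re.compile(r"^Bill To:\s*(Acme Corp)\b", re.MULTILINE)

original = "INVOICE 0421\nBill To: Acme Corp\nTotal: $1,200.00"
# The vendor renames the label and spells out the legal name.
reformatted = "INVOICE 0421\nCustomer: Acme Corporation LLC\nTotal: $1,200.00"
# A scan where OCR misreads "l" as "1" and "m" as "rn".
ocr_noise = "INVOICE 0421\nBi1l To: Acrne Corp\nTotal: $1,200.00"

for label, text in [
    ("original", original),
    ("reformatted", reformatted),
    ("ocr_noise", ocr_noise),
]:
    print(label, bool(RULE.search(text)))
```

Only the first sample matches. The other two are the same client to a human reader, which is the gap a content-based match is meant to close.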

Manual vs rule-based vs AI-by-content

There are three honest ways to file documents per client. None is strictly better or worse; each fits a different volume and a different level of variability. Here is the fair comparison.

Approach	How it routes	Best for	Breaks down when
Manual filing	A person opens each file and drags it to a client folder.	A handful of files a week, or one-off oddities.	Volume rises. It is the first thing to fall behind under deadline.
Rule-based (Hazel, File Juggler)	Filename or text-layer patterns map to folders via rules you write.	Stable, predictable sources where the same format arrives every time.	Formats vary across clients, OCR drifts, or you would need one rule per client.
AI-by-content (Sortio)	An LLM reads each document and matches it to a client from meaning.	Mixed formats, many clients, scanned intake, variable layouts.	Cost and speed: each AI sort draws from your AI allowance and takes longer than regex.

Hazel and File Juggler are genuinely good tools. Hazel is the long-standing macOS automation app and File Juggler is its closest Windows equivalent. Both can watch a folder and route by filename or by text matched against the document. If your intake is one or two sources that always look the same, a deterministic rule is the right tool and it runs essentially for free. The wall you hit with rule-based tools is variability: once you have dozens of clients sending dozens of formats, you are maintaining a rule library, and scanned documents with noisy OCR slip past the patterns. That is the gap content-aware AI fills. For the deeper version of this trade-off, see our piece on File Juggler vs Sortio.

A working Sortio prompt for per-client routing

Drop this into the Sortio prompt box, point it at your intake folder, and run Preview before applying. Replace the client roster with your own. The roster is what keeps the model from inventing new folders for clients you already have.

Read each document and identify which client it belongs
to. Match against this client roster (use the exact
folder name on the right):

  Acme Corporation, Acme / ACME LLC   -> Acme_Corp
  Bryce Holdings, Bryce LP            -> Bryce_Holdings
  Carver Design Studio, Carver        -> Carver_Design

Identify the client from the company name, account
number, matter or case reference, billing address, or
contact named in the document. Treat name variations as
the same client (Acme Corporation = Acme LLC).

File the document at:
  ~/Documents/Clients/{ClientFolder}/{YYYY}/
renamed to:
  {YYYY-MM-DD}_{ClientFolder}_{DocType}.pdf

DocType is a short label for what the document is:
Invoice, Statement, Contract, Lease, Letter, Form,
Receipt, or Other.

If you cannot confidently match the document to a client
on the roster, do NOT guess. Move it unchanged to
~/Documents/Clients/_Needs_Review/.

Use the date printed on the document (invoice date,
statement date, signing date). Dates in YYYY-MM-DD.

Click Preview. Sortio shows the proposed client, folder, and new name for every file, along with the identifier it matched on. Reassign any borderline case before applying. Apply commits the moves, and the backup folder keeps the originals of anything renamed or moved for 30 days in case you want to revert. For a per-matter or per-project structure, change the destination template to ~/Documents/Clients/{ClientFolder}/{Matter}/ and add a line telling the model to read the matter or project reference too.
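The preview-then-apply pattern is easy to picture in code. The Python sketch below is an assumption about the general shape of such a workflow, not Sortio's implementation; the Move class, the function names, and the backup directory layout are all hypothetical.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Move:
    source: Path
    destination: Path


def preview(moves: list[Move]) -> list[str]:
    """Describe each proposed move without touching the filesystem."""
    return [f"{m.source.name} -> {m.destination}" for m in moves]


def apply(moves: list[Move], backup_dir: Path) -> None:
    """Copy each original into backup_dir, then move it into place."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    for m in moves:
        # Keep an untouched copy so the move can be reverted later.
        shutil.copy2(m.source, backup_dir / m.source.name)
        m.destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(m.source), str(m.destination))
```

Keeping preview free of side effects is the design point: the list of proposed moves can be inspected and edited as often as needed, and nothing changes on disk until apply runs.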

Clearing a backlog with a single batch run

The first run is usually a backlog. Months of intake sitting in a shared drive, a scanner output folder, or an email-export dump, none of it filed. Point Sortio at that folder and run the prompt once. It reads every file in the batch, builds the full set of proposed moves, and shows you one preview covering all of them.

The review pass is fast because you are checking only the exceptions, not every file. Anything that matched a client cleanly can be confirmed at a glance. The work is the handful that landed in _Needs_Review or matched with low confidence. Reassign those, then apply. A backlog of several hundred documents typically clears in a few minutes of processing plus a few minutes of review, instead of an afternoon of dragging files. For the per-document side of this (reading scanned paper that has no useful name), see how to organize scanned documents by content.

Keeping it current with a watch folder

Clearing the backlog once is satisfying but it does not stay clean by itself. The ongoing version is a watch folder. On Sortio Pro ($14.99/month or $99/year) you attach the same prompt to the intake folder and Sortio runs it automatically as new files arrive. The setup is:

1. Designate the single intake folder everything lands in. Train the team (or your scanner, or an email rule) to drop new documents there and nowhere else.
2. In Sortio, open Watch Folders, add the intake folder, and paste the per-client routing prompt with your roster.
3. Set the trigger. "On new file" routes each document the moment it lands. For high-volume intake, an hourly or daily batch is calmer and easier to review.
4. Run in Preview mode for the first week. Sortio queues the proposed moves and notifies you instead of applying them, so you can confirm the client matching is right before trusting it. After a clean week, switch to Apply.
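For readers curious what a scheduled watch folder does under the hood, here is a minimal polling sketch in Python. It is a generic illustration using only the standard library, not a description of Sortio's watcher; the function names and the one-hour default interval are assumptions.

```python
import time
from pathlib import Path
from typing import Callable


def snapshot(folder: Path) -> set[Path]:
    """Return the set of regular files currently in the folder."""
    return {p for p in folder.iterdir() if p.is_file()}


def new_arrivals(before: set[Path], after: set[Path]) -> list[Path]:
    """Files present now that were not present at the previous poll."""
    return sorted(after - before)


def watch(
    folder: Path,
    handle: Callable[[list[Path]], None],
    interval_seconds: float = 3600.0,
) -> None:
    """Poll forever; pass each batch of new files to handle()."""
    seen = snapshot(folder)
    while True:
        time.sleep(interval_seconds)
        current = snapshot(folder)
        batch = new_arrivals(seen, current)
        if batch:
            handle(batch)
        seen = current
```

A short interval approximates the "on new file" trigger, while an hourly or daily interval gives the calmer batch behavior described in step three.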

From there the intake folder approaches empty by default and every client folder stays current without anyone filing. The same idea scales to filing per client in batches across a whole team. For the team-level workflow, see how to organize client tax documents for an accounting firm, which covers the per-client, per-year structure and shared rules in more depth.

Promoting stable flows to deterministic rules

Not every routing decision needs an LLM forever. Once a flow is stable (one client always sends the same statement format from the same address, say), the routing is deterministic and you no longer need the model to make a judgment call. That is exactly the case to promote to the AI Rule Builder. You describe the rule in plain English and Sortio generates a deterministic config that does not use your AI allowance at all.

The practical pattern at scale is a hybrid. Let AI-by-content handle the messy, high-variability part of your intake (new clients, scanned paper, formats you have never seen) while a set of Rule Builder rules handles the predictable, high-volume flows for free. You get content-aware coverage where you need it and unmetered throughput where you do not. For a fuller treatment of when each one fits, see AI Sort vs Rule Builder.
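The hybrid pattern amounts to a simple dispatch order: try the deterministic rules first, and call the model only when none applies. The Python sketch below illustrates that order under stated assumptions. The sender addresses, SENDER_RULES, and the ai_match callback are hypothetical and do not reflect Rule Builder's actual config format.

```python
from typing import Callable, Optional

NEEDS_REVIEW = "_Needs_Review"

# Hypothetical deterministic rules: a fixed sender address implies a client.
SENDER_RULES = {
    "billing@acme.example": "Acme_Corp",
    "statements@bryce.example": "Bryce_Holdings",
}


def rule_match(text: str) -> Optional[str]:
    """Return a folder when a known sender address appears, else None."""
    lowered = text.lower()
    for sender, folder in SENDER_RULES.items():
        if sender in lowered:
            return folder
    return None


def route(
    text: str, ai_match: Callable[[str], Optional[str]]
) -> tuple[str, str]:
    """Try the free deterministic rules first; call the model only if needed."""
    folder = rule_match(text)
    if folder is not None:
        return folder, "rule"
    folder = ai_match(text)
    if folder is not None:
        return folder, "ai"
    return NEEDS_REVIEW, "review"
```

The second element of the returned tuple records which path made the decision, which makes it easy to see how much of the intake never touched the model.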

Confidentiality and reverting mistakes

Client documents are sensitive by definition, so two things matter. First, nothing is destructive: Sortio previews before it applies, and it keeps backups of renamed and moved files for 30 days, so a misroute is recoverable within that window. Second, for documents you do not want a third-party AI provider to see, Sortio supports local-only inference through Ollama (Llama 3, Mistral). The model runs on your machine and no file content leaves it. Managed AI and bring-your-own-key are available if you prefer hosted inference.

For teams, Sortio for Teams ($29/seat/month) adds shared rules and an admin console so the same per-client routing applies consistently across everyone. That consistency is the same discipline auditors expect: PCAOB AS 1215 frames audit documentation as something an experienced auditor with no prior connection to the engagement can understand, and a consistent per-client structure is how filing meets that bar.

FAQ

How do you batch organize incoming documents per client automatically?

Point Sortio at the folder where documents arrive and give it one rule: read each file, identify the client it belongs to from the content, and file it under that client folder. Sortio processes the whole batch in a single run. Set up as a watch folder, it repeats that run on every new arrival without anyone opening the app, so each client folder stays current on its own.

How does the tool know which client a document belongs to?

It reads the content, not the filename. A scanned invoice named scan_0421.pdf still has the client name, an account number, a matter reference, or an address inside it. Sortio sends the extracted text to an LLM, which matches it against your list of known clients and returns the best match. Because the match is semantic, "Acme Corp", "Acme Corporation LLC", and an Acme invoice that only shows the account number all resolve to the same client folder. You can give Sortio your client roster in the prompt so it never invents a new folder for an existing client.

Can it handle a high-volume intake folder with hundreds of files a day?

Yes. A batch run reads every file in the intake folder in one pass and shows you a single preview of where each one will go. For ongoing high volume, set a watch folder on an hourly or daily trigger so the batch runs on a schedule instead of file-by-file. Once a flow is stable and the routing is deterministic enough that you no longer need the AI to make a judgment call, promote it to an AI Rule Builder rule, which does not use your AI allowance at all.

What if a document could belong to more than one client, or to none?

Tell the rule what to do with ambiguity. The recommended pattern is to route anything the model is not confident about to a _Needs_Review folder rather than guessing. Sortio surfaces low-confidence matches in the preview, so during a batch run you simply reassign those few files before applying. For watch folders, run in Preview mode for the first week so you can confirm the matches before anything moves.

Can I file by matter or project under each client, not just by client?

Yes. A per-client, per-matter structure is just a deeper template: Clients/{Client}/{Matter}/{YYYY}/. The rule reads both the client and the matter or project reference from the document and builds the path. Law firms use client/matter, accounting firms use client/year, agencies use client/project. Sortio reads whatever identifier is in the document and routes to the matching subfolder.

Does this work for scanned paper, or only digital PDFs?

Both, as long as the scan has a text layer. Scans saved as searchable PDFs (the default on most office scanners) route by client the same way born-digital files do. Image-only scans need a one-time OCR pass first; OCRmyPDF is a free tool that adds the text layer in bulk, and Sortio reads the layer it produces. For a deeper walkthrough see our post on organizing scanned documents by content.
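A bulk OCR pass can be scripted in a few lines. The Python sketch below assumes the ocrmypdf command-line tool is installed and on the PATH. The folder layout and function names are hypothetical. The --skip-text option tells OCRmyPDF to leave pages that already have a text layer alone, so mixed batches are safe to run through it.

```python
import subprocess
from pathlib import Path


def ocr_command(source: Path, output: Path) -> list[str]:
    """Build the OCRmyPDF call for one file."""
    # --skip-text: do not re-OCR pages that already contain text.
    return ["ocrmypdf", "--skip-text", str(source), str(output)]


def ocr_folder(scans: Path, searchable: Path) -> list[Path]:
    """Run OCRmyPDF over every PDF in scans, writing results to searchable."""
    searchable.mkdir(parents=True, exist_ok=True)
    done = []
    for pdf in sorted(scans.glob("*.pdf")):
        target = searchable / pdf.name
        # check=True raises if OCRmyPDF exits with an error for this file.
        subprocess.run(ocr_command(pdf, target), check=True)
        done.append(target)
    return done
```

Writing the results to a separate folder keeps the original scans untouched, and that output folder can then serve as the intake folder.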

Is it safe to run this on confidential client files?

Sortio is preview-before-apply, so nothing moves until you approve it, and it keeps backups of renamed and moved files for 30 days in case you need to revert. For confidentiality, Sortio supports local-only inference through Ollama (Llama 3, Mistral) so client documents never leave the machine. Managed AI or bring-your-own-key are also available if you prefer hosted inference.