In early March, a founder we know forwarded an email. "My CFO is quitting, my month-end takes nine days, and I think we have an AP problem. Can you look?" That was the entire pitch. Two weeks later we knew what was wrong. Two months later it was fixed.
This is a lightly anonymized walkthrough of what we found, what we changed, and (importantly) what we left alone. There is a pattern in AI consulting right now where every engagement turns into "and now everything is an agent." That is not how operations work, it is not how this engagement worked, and the result is better for it.
The intake call
We do every diagnostic the same way. A 30-minute call to figure out if we should even be on the engagement, then two weeks of paid diagnostic if we should. The intake call has three questions:
- What is the function that absorbs every new ops hire? Whatever that is, that is your bottleneck.
- What does the CFO complain about on Friday afternoons? That is your near-term cost center.
- If you could automate one thing tomorrow with no risk, what would it be? That is the founder's gut talking. Usually right.
For this client all three answers were the same. AP. They were running a five-person AP function on $8M in revenue. The benchmark for a CPG company at that scale is one to two people. Five was a tell.
What was broken
The team had bought four pieces of software in three years. Each one promised to fix AP. Each one fixed a slice of it. None of them spoke to each other. The result was that the AP team's actual job had become reconciling the AP tools against each other.
Three of the five AP people were not paying invoices. They were copying data between systems so the books would close. This is the most common operational failure mode we see, by a mile.
The job people are paid to do is rarely the job they are actually doing. The diagnostic is finding the gap.
What we kept
Almost everything. The accounting system was fine. The bill-pay rails were fine. The expense tool was fine. We kept all three. The fix was a single integration layer that read from all three, deduplicated, and wrote back to the system of record.
No new SaaS subscriptions. No re-implementing the GL. No "let's move you to NetSuite." The least-glamorous answer is almost always the right one when the bottleneck is connective tissue.
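The core of that connective tissue is unglamorous: read invoice records from each tool, collapse duplicates of the same invoice, and write one clean record back. A minimal sketch of the dedupe step, with illustrative field names and data (the real systems and identity rules are not ours to share):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Invoice:
    vendor: str
    number: str
    amount_cents: int
    source: str  # which tool the record came from

def dedupe(records):
    """Collapse the same invoice seen in multiple tools into one record.

    Identity key here is (normalized vendor, invoice number, amount);
    the first source encountered wins. A real layer would also handle
    partial matches and currency, which we omit.
    """
    seen = {}
    for r in records:
        key = (r.vendor.strip().lower(), r.number.strip(), r.amount_cents)
        seen.setdefault(key, r)
    return list(seen.values())

# The same invoice arriving from two of three tools (names are made up)
records = [
    Invoice("Acme Corp", "INV-1001", 125000, "bill_pay"),
    Invoice("acme corp ", "INV-1001", 125000, "expense_tool"),
    Invoice("Beta LLC", "INV-2002", 48000, "accounting"),
]
merged = dedupe(records)  # two unique invoices remain
```

The point of the sketch is how small the fix is relative to the problem: a keyed dictionary, not a platform migration.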
Where AI showed up
In two places. One: classifying invoices into the right GL accounts. The team had been doing this manually because the existing tool's auto-categorization was 64% accurate and they did not trust it. We fine-tuned a small model on three years of their own historical data and got to 96%. The remaining 4% routes to a human for review with the model's confidence and reasoning attached.
Two: drafting vendor follow-up emails when an invoice was missing PO context. The model knew the tone, the standard ask, the escalation history. The AP person reviewed and sent. This was a 70% time reduction on a task that was eating four hours a day.
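The review routing in the first use case is simple enough to show. A hedged sketch, assuming the classifier returns a dict with a GL account, a confidence score, and a reasoning string (the threshold and field names are illustrative, not the client's actual values):

```python
CONFIDENCE_FLOOR = 0.90  # illustrative cutoff, tuned on held-out data in practice

def route(prediction):
    """Auto-post high-confidence classifications; queue the rest for a human.

    The reviewer sees the model's suggestion, confidence, and reasoning,
    so the low-confidence 4% is a quick check, not a from-scratch decision.
    """
    if prediction["confidence"] >= CONFIDENCE_FLOOR:
        return ("auto_post", prediction["gl_account"])
    return ("human_review", {
        "suggested": prediction["gl_account"],
        "confidence": prediction["confidence"],
        "reasoning": prediction["reasoning"],
    })
```

The design choice worth copying is that the model never posts silently below the floor; trust came back because the team could see why it was unsure.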
We measure every AI step against a deterministic baseline. If the model is not measurably better than a rule, we ship the rule and move on.
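That policy fits in a few lines. A minimal sketch, assuming both candidates are callables scored on the same labeled holdout set (the margin is a placeholder, not our actual bar):

```python
def accuracy(predict, labeled):
    """Fraction of (input, label) pairs the predictor gets right."""
    hits = sum(1 for x, y in labeled if predict(x) == y)
    return hits / len(labeled)

def choose(model_predict, rule_predict, holdout, margin=0.02):
    """Ship the model only if it beats the deterministic rule by a clear margin."""
    if accuracy(model_predict, holdout) >= accuracy(rule_predict, holdout) + margin:
        return "model"
    return "rule"

# Toy holdout: the model gets 2 of 3 right, the rule gets 1 of 3
holdout = [(1, 1), (2, 0), (3, 1)]
winner = choose(lambda x: 1, lambda x: 0, holdout)
```

The margin matters: a model that ties a rule loses, because the rule is cheaper to run and easier to audit.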
Where AI did not
Approval routing. Fraud checks. Anything that touched a payment going out the door. Those stay deterministic, audited, and human-owned. We do not build AI into the path of money leaving your bank account. Anyone who tells you they do is either lying or about to.
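What "stays deterministic" looks like in practice: plain rules you can read, test, and show an auditor. A hypothetical sketch, with made-up thresholds and role names:

```python
def approval_chain(amount_cents, vendor_is_new):
    """Pure rules, no model: the path of outgoing money is fully deterministic.

    Every branch is auditable, and changing a threshold is a code review,
    not a retraining run.
    """
    chain = ["ap_lead"]           # every payment gets at least one reviewer
    if vendor_is_new:
        chain.append("controller")  # new vendors get a second set of eyes
    if amount_cents >= 10_000_00:   # $10,000+: illustrative CFO threshold
        chain.append("cfo")
    return chain
```

The same structure extends to fraud checks: explicit predicates, not learned scores, anywhere a failure means money leaves the account.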
The handoff
Six weeks after kickoff, the system was live. Eight weeks in, the second AP hire transitioned to AR (where the same diagnostic now applies, a story for another note). Twelve weeks in, we ran the final review with the founder, the CFO, and the new lead operator. The KPIs we had signed off on at the start were hit. We documented the system, ran two days of training, and stepped out.
That is the engagement. AI is in there. It is not the point.
Names and a few specifics have been blurred at the client's request. Numbers are unchanged. If this matches your shape of pain, book a diagnostic.