A heuristic that’s helped me decide whether a given problem should be handed entirely to AI or run with a human-in-the-loop model is “transaction symmetry”:
If there is a human on the other side of the transaction, there must be a human on your side of the transaction.
For example, suppose you’re filing a form with a government agency and you know a person processes it (even if the submission is done programmatically!). What happens if they call you to verify some information? (California EDD does this at random.) What happens if the person assigned was out of office because of a flood? (I tried for weeks to reach a county clerk, only to find out they were unreachable because they were displaced.) What if the form literally falls behind the printer? (Yes, this too has happened with the IRS, and it delayed EIN issuance for hundreds if not thousands of newly formed companies.)
This also generalizes to service providers that use thin API wrappers on top of people-powered services. Getting the work “in” is no problem, but inside the black box of human processes there is all sorts of variation. For example, a white-labeled provider starts emailing your customers because suppressing those communications relied on a human manually unticking a box.
There are all sorts of examples like this, because the world operates more like humans than computers.
More generally, the number of failure modes grows with the number of manual processes involved in the transaction. Having a human in the loop is a mitigation strategy: a person can handle more errors, especially previously unknown ones, than AI and software engineering can directly handle (today).
What do I mean by human-in-the-loop?
I mean a person is responsible for the completion of a transaction. They may have many transactions to look after, but it’s ultimately their job to make sure each one gets through. That can take the shape of reviews (inspect the form that came back for mistakes), answering emails (respond to a human agent that went off script), calling the counterparty (“hey, we haven’t gotten EINs back in weeks”), etc.
Why not try to do it all with AI?
I think it’s probably possible given the multi-modal capabilities of LLMs, but you end up multiplying the probability of errors at every stage of the transaction. I can imagine how a TTS/STT AI could call a government agent asking why a company’s UI tax ID wasn’t assigned yet, but it introduces all sorts of other potential issues. What if the person responds negatively and thinks it’s a scam? What if the AI hallucinates an answer? It might not even be worth it due to the potential reputational risk. (Transactions might seem 1:1, but they happen in a global context that could put all future transactions at risk.)
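The multiplying-errors point can be made concrete with a back-of-the-envelope calculation. Here’s a minimal sketch in Python; the stage names and per-stage success rates are made up for illustration, not measurements of any real pipeline:

```python
# Hypothetical per-stage success probabilities for a fully
# automated multi-stage transaction (illustrative numbers only).
stages = {
    "parse the form":        0.99,
    "submit the filing":     0.98,
    "field a follow-up call": 0.95,
    "verify the response":   0.97,
}

# Assuming stage failures are independent, end-to-end success
# is the product of the per-stage probabilities.
end_to_end = 1.0
for name, p in stages.items():
    end_to_end *= p

print(f"end-to-end success: {end_to_end:.3f}")  # prints "end-to-end success: 0.894"
```

Even when every individual stage looks reliable (95%+ here), the chain as a whole succeeds only about 89% of the time, which is why adding autonomous stages quietly erodes reliability.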