There was recently an attempt by an independent journalist to expose fraud in a Minnesota social program. It was deeply frustrating; the journalist had notably poor epistemic standards, which secondary media seized upon to dismiss their result.
The class-based sniffing almost invariably noted that prestige media had already reported stories which rhymed with the core allegation, while sometimes implying that makes the allegations less likely to be true, through a logical pathway which is mysterious to me.
The journalism went quite viral anyway, in part because of sensationalized framing, in part because of signal boosting by an aligned media ecosystem and aligned politicians, and in part because the journalism develops one bit of evidence that has a viscerality that paperwork dives often lack: these purported childcare operations routinely have no children in them.
Fraud has become quite politicized in the United States over the last few years. We had a poorly-calibrated federal initiative, led by a charismatic tech entrepreneur, which believed it would unearth trillions of dollars of fraud and which focused substantial effort on large programs that are comparatively fraud-resistant. Across the aisle, we have reflexive dismissal of the idea that fraud happens in social programs, which functions as air cover for scaled criminal operations which loot many varied social programs [0] and are sometimes run out of geopolitical adversaries of the U.S., including by ambiguously-retired members of their clandestine services.
I worked in the financial industry for a few years. We do not have the luxury of pretending that fraud is something invented by our rivals to besmirch our good name. It hits the P&L every quarter and will eat you alive if you’re not at least minimally competent in dealing with it. Conversely, it is well-understood in industry that the optimal amount of fraud is not zero.
The financial industry has paid at least tens of billions of dollars in tuition here. Overwhelmingly, one learns about fraud in it through an apprenticeship model, with different firms having different internal levels of understanding of the shape of the elephant. The industrial organization presumes small numbers of people architecting anti-fraud systems and relatively larger numbers of investigators and analysts operating those systems on a day-to-day basis.
There does exist some informal knowledge sharing between firms. If you work in payments, try getting invited to the Chatham House rule sessions held by… oh yeah, can’t say. Despite that social technology being originally developed for the benefit of government and press actors, it is my general impression that U.S. benefits programs don’t yet see themselves as sufficiently yoked by adversarial attention to benefit from their own Chatham House series. Perhaps that should change.
And so, for the benefit of fraud investigators with badges, press cards, or GoPros, some observations from a community of practice with an extensive (and mostly nonpublic) body of work. But first a tiny bit of throat clearing.
In which we briefly return to Minnesota
Minnesota has suffered a decade-long campaign of industrial-scale fraud against several social programs. This is beyond intellectually serious dispute. The 2019 report from the Office of the Legislative Auditor (a non-partisan government body) makes for gripping reading. The scale of fraud documented and separately alleged in it staggers the imagination: the state’s own investigators believed that, over the past several years, greater than fifty percent of all reimbursements to daycare centers were fraudulent. (Separate officials took the… novel position that they were only required to recognize fraud had happened after securing a criminal conviction for it. Since they had only secured a few criminal convictions, there was no way that fraud was that high. Asked to put a number on it, repeatedly, they declined.)
The investigators allege repeatedly visiting daycare centers which did not, factually, have children physically present at the facility despite reimbursement paperwork identifying specific children being present at that specific time. The investigators demonstrated these lies on timestamped video, and perhaps in another life would have been YouTube stars.
Our social class is intensely averse to straightforwardly recounting these facts, partly due to political valence and partly due to this particular fraud being dominantly conducted within a community which codes as disadvantaged in the U.S. sociopolitical context.
Fraudsters are liars and will cheerfully mouth any words they believe will absolve them of their crimes. If an accusation of racism gets one a free pass to steal hundreds of millions of dollars, they will speciously sue you alleging racial discrimination. That empirically worked in Minnesota. The OLA takes explicit notice of this multiple times, a coordinator for the fraud operation is on record explicitly explaining the strategic logic of accusations of racism, and a judge was even moved to make an extraordinary statement to clarify that the bad-faith lawsuit alleging racism did not achieve success through the formal judicial process but rather through the voluntary compliance of governmental actors shamed by its allegations.
(As a sidenote: one has to be able to hold two thoughts simultaneously about fraudulent operations. They can be sophisticated with respect to exploiting sociopolitical cleavages in their targets while also being comically inept at faking evidence elsewhere, such as having a single person write dozens of adjacent rows in a sign-in sheet. This routinely surprises observers and it should not surprise them. The financial industry also has a division of labor in it. The person architecting the fraud department’s standard processes is well-paid, well-educated, and routinely brings crossdisciplinary expertise to bear. A Fraud Analyst I, on the other hand, bears a lot of similarity to a call center employee in terms of compensation, education, and permitted amounts of agency.)
In the immediate wake of the independent journalist’s report, the great and the good rallied around the organizations he accused. Of course it was natural that journalists wouldn’t get immediate access to children if they asked. Of course there was a certain amount of informality in the sector. Of course, as the New York Times very carefully wordsmithed recently:
Minnesota officials said in early January that the state conducted compliance checks at nine child-care centers after Mr. Shirley posted his video and found them “operating as expected,” although it had “ongoing investigations” at four of them. One of the centers, which Mr. Shirley singled out because it misspelled the word “Learning” on its sign, has since voluntarily closed.
An inattentive reader might conclude from this paragraph that the Times disputes Shirley’s reporting.
To the extent that Bits about Money has an editorial line on that controversy, it is this: if you fish in a pond known to have 50% blue fish, and pull out nine fish, you will appear to be a savant-like catcher of blue fish, and people claiming that it is unlikely you have identified a blue fish will swiftly be made to look like fools. But the interesting bit of the observation is, almost entirely, the base rate of the pond. And I think journalism and civil society should do some genuine soul-searching on how we knew—knew—the state of that pond, but didn’t consider it particularly important or newsworthy until someone started fishing on camera.
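The base-rate point can be made concrete with a short calculation. This is a sketch under loud assumptions: it treats the 50% figure as the share of "blue fish" in the pond and treats the nine draws as independent, neither of which the source claims (the OLA figure concerned reimbursements, not a count of centers). The function name `prob_at_least` is hypothetical. The threshold of four is chosen only because the quoted paragraph mentions four ongoing investigations, and an ongoing investigation is not a finding of fraud.

```python
from math import comb


def prob_at_least(k: int, n: int, base_rate: float) -> float:
    """Probability of at least k 'blue fish' in n independent draws."""
    return sum(
        comb(n, i) * base_rate**i * (1 - base_rate) ** (n - i)
        for i in range(k, n + 1)
    )


# ASSUMPTION: half the fish in the pond are blue; nine fish pulled at random.
p_any = prob_at_least(1, 9, 0.5)
p_four_plus = prob_at_least(4, 9, 0.5)

print(f"at least one blue fish in nine:  {p_any:.3f}")
print(f"at least four blue fish in nine: {p_four_plus:.3f}")
```

Under those assumptions, pulling at least one blue fish is close to certain and pulling four or more happens roughly three times in four. Almost all of that comes from the pond, not from the skill of the person fishing.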
But this is not a publication about particular ponds. It is a publication about getting better at fishing.
Common signals, methods, and epiphenomena of fraud
Fraudsters are playing an iterated game
The best non-fiction work on fraud is Dan Davies’ Lying for Money. It is replete with examples of something well-known to fraud investigators: the dominant next adventure for a former fraudster is… opening up a new fraud. And therefore, if you want to identify a ridiculously-high-hit-rate list of frauds in round N+1 of a game, a so-easy-it’s-practically-cheating way to do so is to look at what known fraudsters from round N are doing today.
There is a genuine difference in the culture and epistemology of the financial industry versus the government of the United States here. In the financial industry, we keep blacklists and getting a second chance after obvious misbehavior is intentionally non-trivial. This runs against deeply felt values of civil servants. An accusation is not a conviction, and absent clear authority to impose consequences in a new program, an actor convicted at enormous societal cost emerges to a new program officer as tabula rasa, equal in moral worth to any randomly chosen citizen.
I will not argue that Mastercard has better moral intuitions than the Founding Fathers. I would, however, happily suggest that the government not assume that the Constitution contains emanating penumbras obligating it to be repeatedly taken advantage of by the same people in the same fashion. We are not forbidden object permanence.
Minnesota raided the Sunshine Child Care Center in 2022 on suspicion of overbilling. No charges were brought, in what investigators imply was less an exoneration and more an inter-departmental fumble. That operation was owned by one Fowsiya Hassan. A separate childcare center owned by Fowsiya Hassan was featured on YouTube recently. This follows on $1.5 million of funds received through Feeding Our Future, a scaled fraud operation which has generated over 70 indictments, 5 criminal convictions, and 50 guilty pleas. What a set of coincidences. Perhaps Hassan has, as she has alleged in a lawsuit, been a frequent target of racially-motivated government investigations into a successful serial entrepreneur in the childcare field.
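The "look at what round N actors are doing in round N+1" check is mechanically simple. A minimal sketch follows, assuming a program keeps a list of principals named in earlier adverse findings and collects principals on new applications. Every name, field, and data value here (`Application`, `flag_repeat_actors`, the placeholder people and entities) is hypothetical, and the name normalization is deliberately crude; real matching would need identifiers sturdier than names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    entity: str
    principals: tuple[str, ...]


def normalize(name: str) -> str:
    """Crude normalization: lowercase and collapse whitespace."""
    return " ".join(name.lower().split())


def flag_repeat_actors(applications, prior_flags):
    """Map each entity to the reasons its principals were previously flagged.

    prior_flags maps a person's name to a short note about the earlier matter.
    A hit is a reason to look more closely, not a finding of wrongdoing.
    """
    lookup = {normalize(name): note for name, note in prior_flags.items()}
    hits = {}
    for app in applications:
        reasons = [
            f"{person}: {lookup[normalize(person)]}"
            for person in app.principals
            if normalize(person) in lookup
        ]
        if reasons:
            hits[app.entity] = reasons
    return hits


# Hypothetical data for illustration only.
prior_flags = {"Pat Placeholder": "principal of a provider terminated for overbilling"}
applications = [
    Application("New Center One", ("pat  placeholder", "Sam Sample")),
    Application("New Center Two", ("Lee Example",)),
]

hits = flag_repeat_actors(applications, prior_flags)
print(hits)
```

The output is a queue for a human to review. It encodes object permanence, not a verdict.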
The fraud supply chain is detectable
Much of the intellectual energy in policy circles about fraud is aimed at retail-level fraud by individual beneficiaries. Most fraud, like most scaled property crime, is actually the result of a business process.
This is an elementary fact of capitalism. It is deeply disconcerting to find every benefits program independently rediscovers it a decade too late to do anything about it. Most bread is not baked by amateurs in their kitchens. It comes from a bakery which exists to bake bread and hires specialists in baking bread and then supports them with capital-intensive built infrastructure.
Fraud develops a supply chain. Some elements in the supply chain are dual-use; the bad guys use Excel for the same reason every business uses Excel. Some elements in the supply chain, though, are specialized infrastructure with no or de minimis legitimate purpose. Those elements can be profiled.
I worked at Stripe for several years and am currently an advisor there. Stripe does not endorse what I write in my personal spaces. In its own spaces, Stripe has discussed being able to follow fraudulent operations in sufficient detail to determine when the operators went to lunch.
Fraudsters share specialists quite frequently. They use the same incorporation agents, the same mail services, the same CPAs, the same lawyers, etc.
You can make the same observation about many communities of practice. It is a non-coincidence that many tech startups are at 548 Market Street in San Francisco. 548 Market Street is not the world’s hippest coworking space. It is the address for EarthClassMail in SF. There are many P.O. box providers in the world; many geeks with taste reach for ECM. (Bits about Money is legally required to maintain a postal address and, if you were ever to send it a physical letter, that would also end up in the hands of an EarthClassMail employee.)
Elsewhere in the world, there exist P.O. box providers whose customers statistically include fewer AI labs and more frauds. One imagines the specialist-in-fraud at the storefront, picking up the day’s take from fifteen separate boxes.
Elementary work graphing supporting infrastructure, even on something as unsophisticated as butcher paper, frequently unravels fraud networks. Data science has any number of more sophisticated approaches. Jetson Leder-Luis, an academic who now routinely works with the government, has previously discussed some approaches which work based on widely commercially available data sources.
There is an emerging defender’s advantage here in the age of LLMs, since exploratory work in visualizing and walking network graphs is getting much cheaper. You no longer need to buy Palantir and engage a “forward-deployed engineer” to cluster IP addresses. A non-technical fraud investigator could get an LLM to do that while eating at Chipotle, and the lunch would cost more.
This democratization of capabilities is relevant to journalists, formal and otherwise, and also to governments. RFPs and software contracting once de facto mandated a multi-year lead time to do an automated network analysis if an analyst thought perhaps their program might need one. Now that is an afternoon’s work, if we allow ourselves to do it. We should.
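The butcher-paper exercise of graphing shared infrastructure reduces to finding connected components. The sketch below assumes each entity comes with a few infrastructure attributes (mailbox, registered agent, and so on) and groups entities that share any value. The function name, attribute kinds, and sample records are all hypothetical, and this is one simple approach rather than what any particular institution runs.

```python
from collections import defaultdict


def cluster_by_shared_infrastructure(records):
    """Group entities that share any infrastructure value (connected components).

    records maps entity name -> {attribute kind: value}.
    Returns clusters of two or more entities, largest first.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every entity to each infrastructure value it uses.
    for entity, attributes in records.items():
        for kind, value in attributes.items():
            union(("entity", entity), (kind, value))

    clusters = defaultdict(set)
    for entity in records:
        clusters[find(("entity", entity))].add(entity)

    multi = [sorted(members) for members in clusters.values() if len(members) > 1]
    return sorted(multi, key=len, reverse=True)


# Hypothetical records for illustration only.
records = {
    "Entity A": {"mailbox": "Box 12, 100 Example St", "agent": "Agent X"},
    "Entity B": {"mailbox": "Box 40, 200 Sample Ave", "agent": "Agent X"},
    "Entity C": {"mailbox": "Box 40, 200 Sample Ave", "agent": "Agent Y"},
    "Entity D": {"mailbox": "Box 7, 9 Elsewhere Rd", "agent": "Agent Z"},
}

clusters = cluster_by_shared_infrastructure(records)
print(clusters)
```

Entity A and Entity C share nothing directly but land in one cluster through Entity B. One design caution follows from the 548 Market Street observation: a popular, legitimate provider will link thousands of unrelated customers, so in practice one would likely exclude or down-weight very common values before clustering.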
Investigators should expect to find ethnically-clustered fraud
As mentioned, there is enormous visceral distaste for the conclusion that a particular fraud ring operates within a particular community. This is quite common. You should expect to find circumstances which rhyme with it when conducting effective fraud investigations. You should not abandon fraud investigation when you chance upon this.
People assume a level of ethical fraughtness here which is not warranted. You would, if doing ethnographic work on perfectly legitimate businesses across industries, routinely discover ethnic concentration rather than population-level representation everywhere you looked. The Patels run the motels. One doesn’t need to adopt grand theories about how certain groups are predisposed to becoming pharmacists or startup employees or line cooks; simple microeconomic reasoning explains reality easily. Firms hire the people they already know, like, and trust. That will routinely include friends and family, who are going to be much more like the founding team than they are like randomly drawn members of the population. This is the default outcome.
Fraudsters do have one structural factor here. Everyone wants to trust their coworkers. Fraudsters need to trust their coworkers will be loyal even upon threat of prison time. That necessarily selects for tighter bonds than the typical workplace. Madoff was a family affair, SBF was in an on-again off-again romantic relationship with a chief lieutenant, and neither of those facts is accidental or incidental.
That’s the other ethical dimension of being other-than-blind to concentration: so-called affinity frauds do not merely recruit fraudsters from affinity groups. They recruit victims from affinity groups. Madoff mobilized the social infrastructure of the Jewish community in New York and Palm Beach to find his marks. Community members certainly did not intend their charitable foundations to be looted by a fraudster. It was an emergent consequence of trust networks.
This also happens to “chosen” communities. FTX was, in material part, an affinity fraud against effective altruists, who are not a religion or ethnic group as traditionally construed.
And so when the great and the good turn a blind eye towards abuses because the perpetrators share an uncomfortable common factor, they are often simultaneously turning a blind eye towards abuses of a community whose interests they purport to champion.
High growth rate opportunities attract frauds
As covered extensively in Lying for Money, the necessary fundamental conceit of a fraud is growth in a business that doesn’t happen in the real world. “Every lie told incurs a debt to the truth, and one day, that debt will be paid”, to quote the excellent drama mini-series Chernobyl. Fraudsters forestall that day of reckoning by telling a bigger lie, increasing the debt, which (mostly as a side effect) alleges that they’re growing much faster than most of your legitimate portfolio. Happily, many businesses have figured out how to keep track of fast-growing customers. Tracking rocketships doesn’t require rocket science.
Sort-by-growth-rate-descending on new accounts will turn up a lot of interesting observations about the world. One is that Fortune 500 companies sometimes open new accounts, and you probably don’t need to open a fraud investigation file in that case. Another is that some people claim to be feeding millions of meals to a community of tens of thousands of people, beginning from a standing start, and growing local social services at a rate which an Uber Eats city manager would not expect to achieve in the wildest dreams of their go-to-market plan.
Feeding Our Future had a CAGR of 578% sustained for 2 years. Uber, during their meteoric growth period in core rideshare services, had an average CAGR of 226%. Their best year was 369%. But, if you asked in Minneapolis in 2021, you’d quickly find someone who had been in an Uber, but fail to find anyone who ate courtesy of Feeding Our Future. So curious, given that they were drubbing one of the fastest growing companies in history on growth rate.
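For readers who want the arithmetic behind these figures, compound annual growth rate is (end / start) ** (1 / years) - 1, and sort-by-growth-rate-descending is a one-liner on top of it. The sketch below uses made-up account names and dollar amounts; only the 578% rate over 2 years comes from the text above.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate, as a fraction (1.0 means 100%)."""
    return (end / start) ** (1 / years) - 1


def total_multiple(rate: float, years: float) -> float:
    """How many times larger something is after compounding at `rate`."""
    return (1 + rate) ** years


# A 578% CAGR sustained for 2 years compounds to roughly a 46x increase.
print(f"578% for 2 years: {total_multiple(5.78, 2):.1f}x")

# Hypothetical accounts: (name, first-year amount, latest amount, years elapsed).
accounts = [
    ("established-retailer", 1_000_000, 1_210_000, 2),
    ("new-nonprofit", 100_000, 4_596_840, 2),
    ("software-startup", 200_000, 800_000, 2),
]

# Sort by growth rate, descending, so the outliers surface first.
ranked = sorted(accounts, key=lambda a: cagr(a[1], a[2], a[3]), reverse=True)
for name, start, end, years in ranked:
    print(f"{name:22s} {cagr(start, end, years):7.0%}")
```

The ranking does not say who is a fraud. It says whose growth claims deserve a sanity check against the physical world first.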
Investigators in Minnesota were ringing the alarm bells for years about implausibly fast growth in Feeding Our Future’s reimbursement requests, including at new facilities. Feeding Our Future felt it was maxed out on the fraud it could conduct at existing sites, and expanded voraciously, including (most prominently) enrolling numerous restaurants as “feeding sites.” They then copy/pasted the usual playbook and requested reimbursement for implausible volumes at those sites, paying kickbacks to many participants. This then required growing the fraud, which… you get the general idea. We could have gotten off the bus at many points, and I suppose that is at some level a question of political will.
The highest growth rates in the economy generally are newer fields (you basically can’t sustain the alternative). This doesn’t imply that those fields are fraudulent, but they will tend to disproportionately attract frauds. The defenders in those fields have not yet paid their tuition to the School of Hard Knocks, and so attackers target the weaker systems. The higher growth rates of legitimate businesses function as protective cover for high stated growth rates of illegitimate businesses; a CAGR of 1,000% looks implausible for a restaurant but barely-meets-expectations for an AI software shop.
And, not to put too fine a point on it, many people are invested, literally and metaphorically, in whatever today’s new hotness is. People who could not secure an allocation in the more legitimate ends of it will sometimes find themselves adversarially selected by less salubrious actors. This will read to those people as a justly earned success. They might even have their marketing department write up their victimization as an indisputable success.
And so, if you’re a defender who has many different lines of business and has limited resources (or political will), where should you deploy those resources? Should you place your bets on e.g. Social Security, a multi-trillion dollar program whose primary source of growth is fun to conjure but then requires 70 years of seasoning? Or should you place them on the Paycheck Protection Program, or pandemic-era unemployment insurance, or genetic testing, or non-emergency medical transportation? Despite those being smaller line items, they probably have more juice worth squeezing, and the fraud is more easily detectable. Just look.
Fraudsters find the weakest links in the financial system
Bits about Money has extensively covered anti-moneylaundering and Know Your Customer regulations and I won’t rehash those regimes here. A bit of tacit knowledge in the financial industry: some actors in the set “broadly considered trustworthy” are more worthy of trust than others… and some are less.
We are generally discreet about writing this down in as many words. But, as an analogy, cross-national regulatory bodies require that financial institutions maintain a list of high-risk jurisdictions to do business in. You are generally required to do enhanced due diligence on customers/activities/etc touching the high-risk list.
If you are particularly competent, and there are plusses and minuses to being competent in detecting fraud (you will not be the most popular person in the firm at bonus time; that goes to the folks who sold the high-growth accounts), you might have the analogous list of U.S. financial institutions which are not entirely fronts for the bad guys.
If one hypothetically has that list, that’s one more signal you can use in evaluating any particular account, and a one-stop shop for developing a list of accounts to look into. It would be uncouth of me to name an extant bank that has poor controls, but for a general example of the flavor, see my (scathing) commentary on Silvergate’s AML and KYC program. Without using any proprietary information, I predict confidently that Silvergate banked many more multi-billion dollar frauds as a percentage of its customer base than almost any of the U.S.’s 4,500 banks. (Trivial substantiation: divide FTXes-banked by total-count-of-customers.)
One might, if one has never seen the list, wonder whether it is simply proxying for something the financial industry is definitely not allowed to proxy for. One of the first things you learn as a data analyst is zip codes are extremely probative and you are absolutely not allowed to use them. The American system remembers the experience of redlining and has forbidden the financial industry from ever doing it again; the industry mostly respects that. But good news: institutions with weak controls environments are not, in fact, simply a proxy for “Who banks socially disadvantaged people?” There are many financial institutions that have that as an explicit business model. Some of them are good at their jobs. Some, less so, and the fraudsters know it.
This sometimes happens with the knowing connivance of the financial institution and/or their staff. For much more on that, see histories of the savings and loan crisis, or the Lying for Money chapter on control frauds. But more commonly it is simply a community of practice developing organic knowledge about who is just very easy to get an account with. You need accounts, as a business. As a fraudulent business, which intends to cycle through accounts and identities at a much higher rate than baseline, you would prefer to do business with a bank which will not detect that malfeasance.
And so you will disproportionately end up banked, with many of your buddies, at the least attentive place still capable of getting a license. And so an agency, trying to find a fraudulent network, might want to look at fraud-cases-by-routing-number and then start making some judgment calls.
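A fraud-cases-by-routing-number table is a small aggregation. The sketch below assumes the agency holds, for each payee, the routing number payments go to and whether the payee was later confirmed as fraudulent. The function name, the `min_accounts` threshold, and the placeholder routing labels are all hypothetical.

```python
from collections import Counter


def fraud_rate_by_routing_number(payees, min_accounts=20):
    """Rank routing numbers by share of payees later confirmed as fraudulent.

    payees is an iterable of (routing_number, is_confirmed_fraud) pairs.
    Routing numbers with fewer than min_accounts payees are skipped, since
    a rate computed on a handful of accounts is mostly noise.
    """
    totals, frauds = Counter(), Counter()
    for routing_number, is_confirmed_fraud in payees:
        totals[routing_number] += 1
        frauds[routing_number] += int(is_confirmed_fraud)

    rows = [
        (rn, frauds[rn], totals[rn], frauds[rn] / totals[rn])
        for rn in totals
        if totals[rn] >= min_accounts
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Hypothetical data: placeholder labels stand in for real routing numbers.
payees = (
    [("ROUTING-A", True)] * 2 + [("ROUTING-A", False)] * 38
    + [("ROUTING-B", True)] * 10 + [("ROUTING-B", False)] * 15
    + [("ROUTING-C", True)] * 3
)

table = fraud_rate_by_routing_number(payees)
for rn, fraud_count, total, rate in table:
    print(f"{rn}: {fraud_count}/{total} = {rate:.0%}")
```

The table is where the judgment calls start. A high rate at one institution is a lead about its controls, and the sample-size floor keeps a three-account bank from topping the list by accident.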
One of the reasons the government has deputized the financial industry is it is good at keeping spreadsheets and quickly responds to requests for them. Perhaps the government should call up a few of their deputies and say “So, not alleging anything here, but we think you might have a list, carefully maintained by your fraud department for your own purposes. We want to see the list. It would be pro-social of you to give us a copy of it.”
Frauds openly suborn identities
There is a thriving market in identities to be used in fraud. This is because bad actors prefer not putting their own names on paper trails certain to become evidence, because they frequently “burn” themselves early in their careers, and because institutions have cottoned onto the wisdom of collecting lists of ultimate beneficiaries.
Sometimes this is a social process, conducted at e.g. the dinner table. Sometimes the market is explicitly a market. Jetson recounted that, having exhausted the supply of patients needing dialysis who could plausibly need ambulance services, frauds began bribing potential patients, first with donuts and then with cash. This is extremely common. In Minnesota, parents were recruited to childcare providers with the promise of cash kickbacks or (a detail we’ll return to in a moment) fictitious paperworked no-show jobs, sometimes at substantially fictitious companies.
Fraudsters sometimes exercise some level of operational discipline in their communications. The bad guys have also seen The Wire; they know Stringer Bell’s dictum on the wisdom of keeping notes on a criminal conspiracy. However, the population of people willing to be named in a federal indictment over $200 necessarily selects preferentially for individuals who are not experts at operational security. They will sometimes organize recruitment very openly, using the same channels you use for recruiting at any other time: open Facebook groups, Reddit threads, and similar. They will film TikTok videos flashing their ill-gotten gains and explaining, step by step, how you, too, can get paid.
As a fraud investigator, you are allowed and encouraged to read Facebook at work.
Now, knowing that there exists the frequent epiphenomenon where fraudsters recruit strawmen to use their identities to qualify for payments: suppose that you have an entirely new enterprise whose first customers are individuals A, B, C, and D. You know, from past records, that A, B, C, and D have all been customers of an organization which you now know, positively, was a fraudulent actor. You might infer from this that A, B, C, and D might have sold their identities once, but you probably don’t have sufficient information to convict them in a court of law of that. (It is of course possible that they are simply unsophisticated, or that bad actors obtained their information without their knowledge, for example by misappropriating a client list from a previous corporate entity they happened to own/work for/etc.)
But do you have enough information to take a more-detailed-than-usual look at this totally new enterprise? I think you do.
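The A, B, C, D inference can be sketched as a roster-overlap check: for each new enterprise, what fraction of its first customers also appeared on the roster of an organization already known to be fraudulent? The names, the 0.5 threshold, and the data below are hypothetical, and a hit is only a reason for a closer look, for exactly the reasons given in the parenthetical above.

```python
def roster_overlap(new_enterprise_customers, known_fraud_rosters, threshold=0.5):
    """Flag new enterprises whose early customers overlap a known-fraud roster.

    Returns {enterprise: (overlap_fraction, matching_org)} for enterprises at
    or above the threshold. overlap_fraction is the share of the enterprise's
    own customers who appear on the best-matching roster.
    """
    flagged = {}
    for enterprise, customers in new_enterprise_customers.items():
        customers = set(customers)
        if not customers:
            continue
        best = max(
            (
                (len(customers & set(roster)) / len(customers), org)
                for org, roster in known_fraud_rosters.items()
            ),
            default=(0.0, None),
        )
        if best[0] >= threshold:
            flagged[enterprise] = best
    return flagged


# Hypothetical identifiers for illustration only.
known_fraud_rosters = {"Closed Org X": {"A", "B", "C", "D", "Q"}}
new_enterprise_customers = {
    "New Enterprise 1": ["A", "B", "C", "D"],
    "New Enterprise 2": ["A", "E", "F", "G"],
}

flagged = roster_overlap(new_enterprise_customers, known_fraud_rosters)
print(flagged)
```

A single shared customer, as with New Enterprise 2, falls below the threshold here. The threshold is a knob, not a law, and the right setting depends on how many customers in the relevant population touched the known-bad organization innocently.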
Asymmetry in attacker and defender burdens of proof
We have choices, as the defender, in what levels of evidence we require to enter the circle of trust, what our epistemological standards are, and how much evidence we require to forcibly exit someone from the circle of trust.
A detail from the Minnesota cases is that these burdens are asymmetric, in a way which disadvantages the defender (all of us). That decision is a choice and we should make better choices.
For example, the primary evidence of a child attending a day-care was a handwritten sign-in sheet of minimal probative value. Prosecutors referred to them as “almost comical” and “useless.” They were routinely fraudulently filled out by a 17 year old “signing” for dozens of parents sequentially in the same handwriting, excepting cases where they were simply empty.
To refute this “evidence”, the state forced itself to do weeks of stakeouts, producing hundreds of hours of video recording, after which it laboriously reconstructed exact counts of children seen entering/exiting a facility, compared it with the billing records, and then invoiced the centers only for proven overbilling.
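The reconciliation described here, comparing observed counts against billed counts and invoicing only the proven excess, looks roughly like the sketch below. The day labels, counts, and per-child rate are hypothetical, as is the function name. The sketch is here to show how little this method recovers relative to its cost, not to recommend it.

```python
def proven_overbilling(billed, observed, rate_per_child_day):
    """Return (excess children per day, total dollars) for observed days only.

    billed and observed map a day label to a count of children. Days without
    surveillance contribute nothing, because nothing was proven on them.
    """
    excess_by_day = {}
    for day, billed_count in billed.items():
        if day not in observed:
            continue  # no stakeout that day, so no proven discrepancy
        excess = billed_count - observed[day]
        if excess > 0:
            excess_by_day[day] = excess
    total = sum(excess_by_day.values()) * rate_per_child_day
    return excess_by_day, total


# Hypothetical numbers for illustration only.
billed = {"day-1": 42, "day-2": 40, "day-3": 45}
observed = {"day-1": 3, "day-2": 0}  # day-3 was not under surveillance

excess, dollars = proven_overbilling(billed, observed, rate_per_child_day=50.0)
print(excess, dollars)
```

Note what happens to day-3: the center billed for 45 children, nobody was watching, and the recoverable amount is zero. Each recovered dollar requires a filmed day, which is the asymmetry in miniature.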
On general industry knowledge, if you are selected for examination in e.g. your credit card processing account, and your submission of evidence is “Oh yeah, those transactions are ones we customarily paperwork with a 17 year old committing obvious fraud”, your account will be swiftly closed. The financial institution doesn’t have to reach a conclusion about every dollar which has ever flowed through your account. What actual purpose would there be in shutting the barn door after the horse has left? The only interesting question is what you’ll be doing tomorrow, and clearly what you intend to do tomorrow is fraud.
We can architect the asymmetry in the other fashion: legitimate businesses will customarily, as a fact of their operations, put enormous effort into creating visible effects in the world which are trivial to check. In technologist circles this is sometimes called a “proof of work” function.
Once upon a time, a team of fraud analysts asked how they could possibly determine frauds from non-frauds without having extensive industry knowledge about every possible commercializable human activity. I suggested that a good first pass was “Just ask the correspondent for a quick video, shot on their cell phone, of their workspace.”
That is minimally invasive for the business owner, generates a huge amount of signal (including that which can be correlated across accounts), and can be usefully adjudicated by non-specialists in a minute. No multi-month stakeout of their storefront is required. Of course you can convincingly fake a video of working in, say, a machine shop, but fraudsters maintaining spreadsheet row 87 about the machine shop will find that difficult to juggle with all the other required lies in their backlog. Actual machine shops, meanwhile, include people, which means they include functional cell phone cameras at no additional cost to anyone.
You can also get some signal from who can trivially produce a video and who needs a week of advance notice to find a cell phone to record those machines that were absolutely milling aluminum last week.
Fundamentally, we have a choice about where we put our investments in defanging fraud, and we should stop choosing to lose.
So-called “pay-and-chase”, where we put the burden on the government to disallow payments for violations retrospectively, has been enormously expensive and ineffective. Civil liability bounces off of exists-only-to-defraud LLC. Criminal prosecutions, among the most expensive kinds of intervention the government is capable of doing short of kinetic war, result in only a ~20% reduction in fraudulent behavior. Rearchitecting the process to require prior authorization resulted in an “immediate and permanent” 68% reduction. (I commend to you this research on Medicare fraud regarding dialysis transport. And yes, the team did some interesting work to distinguish fraudulent from legitimate usage of the program. Non-emergency transport for dialysis specifically had exploded in reimbursements, as the paper’s Figure 1 shows, not because American kidneys suddenly got worse but because fraudsters adversarially targeted an identified weakness in Medicare.)
Attackers carefully respond to signals they think they are being sent from defenders. A lawyer for some of the Minnesota defendants, Ryan Pacyga, was quoted by the New York Times as saying that his clients understood Minnesota to tacitly allow their actions.
> No one was doing anything about the red flags. … It was like someone was stealing money from the cookie jar and they kept refilling it.
Don’t be the defender who sends that message. It will not work out well for you or your program.
Fraudsters under-paperwork their epiphenomena
Most frauds have rich external lives, with a soaring narrative of how deserving people are getting valuable services (and/or getting rich for being right and early regarding e.g. crypto asset cross-margining). They tend to be distinctly underpaperworked internally, partly because a synonym for “paperwork” is “evidence” and partly because… most frauds aren’t really that sophisticated, when it comes down to it. There is a true number; lie about it; done.
Like many time-pressed entrepreneurs busy talking to potential customers, fraudsters put the minimal amount of time necessary into bookkeeping and even less than that into paperworking epiphenomena of their frauds. One example of an epiphenomenon: sometimes the beneficiaries need their own paperwork. A legitimate mortgage company employs sales reps and a backoffice to help unsophisticated customers successfully get several hundred pages of paperwork together to sell a mortgage. Frauds… mostly don’t do that.
And so, if you have e.g. a statutory requirement that a beneficiary be employed to access services, a fraudster might say “Don’t worry about it!” They’ll just assert that you are an employee at a cleaning company. Perhaps they might even go as far as payrolling you as an employee of a cleaning company. This kills two birds with one stone, paying you your kickback while also generating the paystub they need you to have to qualify for the government reimbursement. (This happened, per the OLA’s reports summarizing the results of many investigations, in Minnesota.)
But fraudsters don’t actually operate cleaning companies even in those cases where they do operate daycares.
Cleaning companies are legitimate businesses, in the main, and working for one is an honest occupation. And so a fraud investigator should feel no chagrin at calling a cleaning company in the phone book and asking for a quote. A cleaning company which expresses complete befuddlement that someone could ask for a quote is providing, ahem, evidence in a direction.
(I have to note, as someone who pays to send children to a private school, that there is replete evidence that the school is accepting new children, knocking on the door and asking will quickly result in being given a brochure, and there are scheduled open houses and similar. I can imagine a gratuitously mismanaged educational establishment which does none of these things, and I can imagine an educational establishment which makes a lot of money, but I have trouble holding both thoughts in my head at the same time.)
The core frauds are sometimes hardened, to an attenuated degree. The peripheral frauds collapse under even a glance. Architect processes to require more signals regarding the periphery, then architect a system which takes at least a cursory look at the periphery. You will trivially catch frauds.
If you’re worried that exposing the exact signal you are using will cost you its utility in the future, you can use this as a “parallel construction” engine. Develop leads for investigation using the non-public signal, pull the core records as a matter of routine, find the discrepancies that all frauds leave in their core records, and then put those in the indictment. Ask your friendly neighborhood lawyer if that passes muster or if you need to add a sentence rhyming with “was selected for a routine audit on the basis of information available to the department.”
Machine learning can adaptively identify fraud
We have discussed some heuristics [1] for identifying fraud. The financial industry still makes material use of heuristics, but a heuristic is a compression of the real world. It will sometimes lose fidelity to the world. It will frequently, by design, be legible to the adversary.
The defender has one advantage the attacker cannot ever replicate: data at scale. It knows what legitimate use looks like because it has all the messy, contradictory, varying quality, typos-and-all data which legitimate businesses in the real world constantly throw off. You cannot duplicate all of the shadows on the wall of Plato’s cave without first duplicating the entire world. Fraudsters, even quite talented ones, can’t do that.
There are any number of techniques for machine learning in anti-fraud; Emily Sands has previously discussed some with me. An important subset of the field can adapt in real-time or close to it to changes in adversary (or legitimate!) behavior. For example, covid surprised the fraudsters at the same time as it surprised every supermarket in the country, but the ex-post actions of the fraudsters and the supermarkets were very different. Revenue went up for both, but only one group actually runs a supermarket. And so by ingesting and constantly analyzing data from all users, including retrospective annotation of which users you’ve identified to be frauds, you get better and earlier signals on which users are likely fraudulent and which are likely not.
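To make the contrast with a hard-coded heuristic concrete, here is a minimal sketch, not a description of any production system. It assumes a demand shock in which everyone's revenue jumps, and it invents a second signal (share of revenue spent on restocking) purely for illustration. The names, numbers, and the cutoff of 5 are all hypothetical. The fixed rule flags the whole population; the peer-relative score, which recalibrates to whatever legitimate users are doing right now, flags only the user who does not behave like a supermarket.

```python
from statistics import median

def fixed_rule(growth, threshold=0.5):
    """Hard-coded heuristic: flag anyone whose revenue grew more than 50%."""
    return {user for user, g in growth.items() if g > threshold}

def peer_relative_scores(values):
    """Distance of each user from the current peer median, in units of
    median absolute deviation (MAD). Recomputed from live data, so it
    moves when legitimate behavior moves."""
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    scale = mad if mad > 0 else 1.0  # guard against identical peers
    return {user: (v - med) / scale for user, v in values.items()}

# Hypothetical data: year-over-year revenue growth during the shock.
revenue_growth = {
    "store_a": 0.62, "store_b": 0.58, "store_c": 0.66,
    "store_d": 0.60, "shell_e": 0.64,
}
# Hypothetical signal: share of revenue spent restocking inventory.
restock_share = {
    "store_a": 0.71, "store_b": 0.68, "store_c": 0.73,
    "store_d": 0.70, "shell_e": 0.05,
}

flagged_by_rule = fixed_rule(revenue_growth)
scores = peer_relative_scores(restock_share)
# The cutoff of 5 MADs is an arbitrary illustrative choice.
flagged_by_peers = {user for user, s in scores.items() if abs(s) > 5}

print(sorted(flagged_by_rule))   # every user trips the fixed rule
print(sorted(flagged_by_peers))  # only the outlier relative to peers
```

A real system would learn from many signals at once and from retrospective fraud labels; the point of the sketch is only that the baseline comes from the population's data rather than from a number someone wrote down in advance.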
This can inform outright interdiction or the investigate-then-punish loop that we ordinarily expect from government. It can also inform less consequential, easier-to-reverse interventions. For example, rather than putting all users immediately through the highest-possible-ceremony process for application, you can let most users do a lower-burden process, saving the higher levels of scrutiny for those which signal greater likelihood of being fraudulent. Or you can default to approving more applicants and reserve more of your investigatory budget for post-approval review, keeping the total cost equivalent by tasking those reviews better than random allocation would. Pay-and-chase becomes more palatable if it is less pay-and-pay-and-pay-and-pay-and-chase and more pay-until-we-decide-to-chase-but-stop-payments-at-that-decision-not-after-the-catching.
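The tasking point can be sketched in a few lines. This is a toy under stated assumptions: the provider IDs, risk scores, and ground-truth fraud labels are invented, and the scores are presumed to come from some upstream model. Given the same fixed review budget, ranking by score spends it where the signal points, while random allocation spreads it evenly across a mostly legitimate population.

```python
import random

def targeted_reviews(risk, budget):
    """Spend the review budget on the highest-scoring cases first."""
    ranked = sorted(risk, key=risk.get, reverse=True)
    return ranked[:budget]

def random_reviews(risk, budget, seed=0):
    """Baseline: pick cases to review uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(sorted(risk), budget)

def frauds_caught(reviewed, truly_fraudulent):
    """Count reviewed cases that are actually fraudulent."""
    return len(set(reviewed) & truly_fraudulent)

# Hypothetical risk scores from an upstream model, and hypothetical truth.
risk = {
    "p01": 0.03, "p02": 0.91, "p03": 0.10, "p04": 0.05, "p05": 0.87,
    "p06": 0.12, "p07": 0.02, "p08": 0.20, "p09": 0.07, "p10": 0.15,
}
truly_fraudulent = {"p02", "p05"}
budget = 2  # same number of reviews either way

targeted = targeted_reviews(risk, budget)
baseline = random_reviews(risk, budget)

print("targeted:", targeted, frauds_caught(targeted, truly_fraudulent))
print("random:  ", baseline, frauds_caught(baseline, truly_fraudulent))
```

The same ranking could gate the lighter interventions described above: a payment hold placed when a case enters the review queue, rather than after the investigation concludes.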
Machine learning isn’t simply useful from a perspective of decreasing fraud. The history of regulation of benefits programs is the history of too-late, too-harsh overcorrection to notorious abuses. Much of what advocates find most maddening and Kafkaesque about eligibility criteria and application processes was voted on by a legislature but bears the signature of a fraudster with a novel idea.
With a good machine learning practice, you can increase data ingested but decrease the burdensome formal application/etc requirements. This is in no small part because those data points are less probative (they are under the direct control of the attacker and announce that they will be scrutinized). But it bears a dividend: if you better control fraud, and can successfully demonstrate that to the public and legislators, you can decrease application burden and perhaps even widen eligibility criteria. Those are both in the direct interests of potential marginal beneficiaries.
A political commentator might focus more on the optics here than on the substance, because that is so frequently where the point of actual leverage is in politics. But the substantive reality of fraud losses matters. It is much easier to tell the story of fraud in benefits programs being rare, opposed by all right-thinking people, and swiftly sanctioned when that story is not an obvious lie.
Frauds have a lifecycle
You can read Lying for Money or other histories of frauds for more detail on the texture, but in the main, a dedicated fraudulent enterprise is created, is seasoned for a while before crossing the Rubicon, has a period of increasing brazenness, is detected, is closed, and then is resurrected when the fraudster gets the band back together for round N+1.
We can intervene against the lifecycle model if we understand it. This begins with not defaulting to the understanding of investigators that frauds are isolated incidents by disparate individual actors. Those have been known to happen, but frauds are, by total damage, dominated by repeatable business models perpetrated by professional specialized bad actors. We should study them like we study other successful entrepreneurs, and then not invest in them.
One actionable insight from the lifecycle model: because the fraudster intends to be in business multiple times in their life, we should track the person-to-business mapping much more closely than we have historically. As Lying for Money says, if you’re an accountant and willing to go to prison, and you do not get rich via fraud… well, you are very bad at your job. That’s on you. When we give you repeated chances to do it, that’s on us.
One might think that the simplest imaginable reform is passing some sort of beneficial ownership regulation to unroll complex corporate structures designed to obscure who is actually puppeting Totally Not A Fraud, LLC. But the simplest imaginable reform is probably just actually reading corporate filings that already exist and are public. Again, most fraudsters are not the hypersophisticated Moriarties of the popular imagination. The Minnesota fraudsters frequently did not even bother with fig leaves. While they did find some nominee directors in some cases, many of the convicted operated their companies in their own names, with no complicated structuring at all. Sometimes multiple times, consecutively, after the previous entities had worn out their welcome with Minnesota.
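"Actually reading corporate filings" is, mechanically, a small join. The sketch below assumes filings have already been pulled into rows with an entity name, a named officer, and a program status; the field names, the status values, and every entity and person in it are made up for illustration. It groups entities by a crudely normalized officer name and surfaces anyone who has both a terminated entity and a live one.

```python
from collections import defaultdict

# Hypothetical rows, shaped loosely like public corporate filings
# joined against a program's own enrollment status.
FILINGS = [
    {"entity": "Example Childcare One LLC", "officer": "Jane Q. Example",
     "status": "terminated_for_cause"},
    {"entity": "Example Childcare Two LLC", "officer": "JANE Q EXAMPLE",
     "status": "active"},
    {"entity": "Example Cleaning Inc", "officer": "John Sample",
     "status": "active"},
    {"entity": "Example Daycare Three LLC", "officer": "Ann Placeholder",
     "status": "terminated_for_cause"},
]

def normalize(name):
    """Crude name normalization: lowercase, drop periods, collapse spaces."""
    return " ".join(name.lower().replace(".", "").split())

def entities_by_officer(filings):
    """Build the person-to-business mapping."""
    index = defaultdict(list)
    for row in filings:
        index[normalize(row["officer"])].append(row)
    return index

def repeat_operators(filings):
    """Officers with at least one terminated entity and one active entity."""
    flagged = {}
    for officer, rows in entities_by_officer(filings).items():
        burned = [r["entity"] for r in rows if r["status"] == "terminated_for_cause"]
        live = [r["entity"] for r in rows if r["status"] == "active"]
        if burned and live:
            flagged[officer] = {"previously_terminated": burned,
                                "currently_active": live}
    return flagged

leads = repeat_operators(FILINGS)
print(leads)
```

Exact-name matching like this only catches the operators who sign in their own names, which, per the above, is more of them than one would hope. It produces leads for a human to look at, not conclusions; common names will collide.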
The Fed should not be surprised when the bad guys buy a bank when buying a bank requires an extended permission-seeking process and the bad guy’s corporate records, dutifully recorded by Maryland (entity D20033544), are signed by a notorious bagman. In the Fed’s defense, the bagman lied to them about his intentions, which was outside of their world model. (Pip pip to the New York Times for figuring that out before the Fed did. That is, sadly, not the usual way it works in financial journalism.)
Should we care about fraud investigation, anyway?
Responsible actors in civil society have a mandate to aggressively detect and interdict fraud. If they do not, they cede the field to irresponsible demagogues. They will not be careful in their conclusions. They will not be gentle in their proposals. They will not carefully weigh consequences upon the innocent. But they will be telling a truth that the great and the good are not.
The public will believe them, because the public believes its lying eyes.
[0] In a thing you will see frequently in fraud investigations, early detection of anomalies does not necessarily imply successful identification of the underlying fraudulent enterprise. A teacher was scandalized that a third of their students are using AI to write papers. Those “students” are identities puppeted by a criminal organization to siphon federal funding out of community colleges towards accounts controlled by the criminals. (I award myself one cookie for correctly predicting this.)
[1] A heuristic, in industry parlance, is a hard-coded rule or set of rules as opposed to a system which automatically adapts to changes in the underlying data. Compare the difference between “You are less likely to default on loans if you own versus renting”, which is absolutely demonstrable in aggregate data, versus “You are less likely to default on loans at 780 FICO versus 540 FICO.” For a variety of reasons, the culture that is legislators sees the problem with having one heuristic, which will obviously not come to the correct conclusion all of the time. It corrects for this issue by having several hundred pages of heuristics. Just one more heuristic, man, and we’ll have completely anticipated all the complexity of the world.
Heuristics are wonderful things! They’re cheap to adjudicate, easy to explain, and can be understood by lawyers, even the kind who have ascended from the practice of law to the writing of it. Happily, machine learning systems can have all of these properties if you make them priorities.