Improving how credit cards work under the covers

Patrick McKenzie (patio11) • Feb 24th, 2023

We’ve previously covered how credit cards are a legacy system, beholden to some decisions made decades ago. Those were motivated by the then-prevailing relationship between participants in the credit card ecosystem, and by the behavior which the card networks and banks believed would dominate the use of the system over time. Then, history happened as history is wont to do. We now use very old systems in a very different world. They’re battle-tested, but sometimes creaky.

But in addition to being legacy, credit cards are a living system. They are actively being worked on.

Previously, I worked for six years at Stripe, which has over the last decade gone from being the payments processor of choice for developers to powering an increasing portion of the Internet economy. (Obligatory disclaimer: I have left full-time employment at Stripe and, while I am still an advisor there, now find myself commenting on them as just “part of my beat.”)

A few ships (industry jargon for new products and other engineering changes) have been publicly announced recently and I thought I’d break them down for you. They help address a bug which costs more than $10 billion a year and which has almost certainly frustrated you personally: transactions which fail for no observable reason.

Card numbers are toxic waste, and other infelicities

“Can a number ever be dangerous to someone?” is not a question frequently asked in math class, but it turns out that for 15 and 16 digit numbers the answer is “depressingly frequently.” This is downstream of a reasonable-at-the-time-but-in-hindsight-expensive decision to treat knowledge of card numbers as being evidence of permission to authorize transactions.

The credit card industry was not oblivious to the fact that the card by its nature has the number printed on it and must get handed to e.g. a waiter to run a charge, exposing this “secret” to someone not actually authorized to use the card in perpetuity. This was just a risk the industry was willing to take, backstopped by a contractualized fraud waterfall to insulate the customer from losses caused by bad actors.

Then came scaled disclosure and abuse of card credentials over the Internet. E-commerce and network-connected IRL card processing systems aggregate thousands and millions of financial credentials. Early in the e-commerce boom, shopping was becoming more convenient than at any time in history. For the first time, you could order anything you wanted while dressed in a bathrobe. Even if what you wanted were, say, ten thousand purloined credit cards.

And the payments industry realized things had to change. That change happened like all infrastructure improvements do: slowly, over time, as the result of lots of people at different organizations putting in unglamorous engineering work in service of a better world.

One change was moving away from persisting PANs promiscuously. PANs are “primary account numbers”, an industry term of art for the long number usually printed on your card. (In casual usage, sometimes we also use PAN to refer to other information which will often imply authorization, such as the card’s security code, parts of one’s address, etc.)

PANs are a type of data that engineers sometimes call “radioactive.” You would prefer to not deal with radioactive materials. Sometimes you have to, regardless of your preferences. Given that you have to, you want to be extremely aware of the fact that you’re working with radioactive materials, limit staff’s exposure to them, have an extremely defined workflow for them, know exactly where the radioactive materials are at all times, and log absolutely everything.

So one of the first changes to how the world uses PANs was called “tokenization.” Businesses would prefer at the margin to not store PANs. However, some business decisions were so critical for revenue that they're worth doing even if they require storing PANs to implement. Two in particular were “save your card to use next time” and recurring billing for subscriptions.

Tokenization lets customers have their convenience without businesses needing to keep the PANs around anywhere they can be stolen.

The way this worked when I first implemented Stripe Payments in my own business, back in 2011, was that Stripe gave you a bit of code to put on your website. It would intercept the PAN that your user provided and communicate it to Stripe, and not to your system. Stripe would give you a “token”, a short code that substitutes for the billing relationship, which you could use to e.g. attempt to charge the card (in a few seconds or in the future, depending on your circumstances). This way if e.g. your site later got hacked, you would (hopefully) not have a bunch of PANs around to leak to the attacker. You’d have the tokens, but tokens are bound to your own account. An attacker could not use the tokens to charge the user and exfiltrate money; the worst they could do would be to charge the user and cause you to receive money, which (not being a criminal) you’d return.
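To make the "tokens are bound to your own account" property concrete, here is a minimal sketch of a processor-side vault. Everything in it (the TokenVault class, the merchant IDs, the tok_ prefix) is a hypothetical illustration, not Stripe's actual API or data model.

```python
import secrets


class TokenVault:
    """Toy stand-in for a processor's vault: it alone stores PANs."""

    def __init__(self):
        # token -> (merchant_id, pan); merchants never see this mapping
        self._records = {}

    def tokenize(self, merchant_id, pan):
        # The token is random, so it reveals nothing about the PAN.
        token = "tok_" + secrets.token_hex(8)
        self._records[token] = (merchant_id, pan)
        return token

    def charge(self, merchant_id, token, amount_cents):
        record = self._records.get(token)
        if record is None or record[0] != merchant_id:
            # A token lifted from one merchant is useless under another account.
            raise PermissionError("token is not valid for this merchant")
        _, pan = record
        # Funds can only settle to the merchant the token is bound to.
        return {
            "paid_to": merchant_id,
            "card_last4": pan[-4:],
            "amount_cents": amount_cents,
        }


vault = TokenVault()
token = vault.tokenize("merchant_bakery", "4242424242424242")
receipt = vault.charge("merchant_bakery", token, 1500)
```

In this sketch, an attacker who exfiltrates the token from the bakery's database and tries to charge it under a different merchant ID gets a PermissionError, and the PAN itself never appears in anything the bakery stores.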

Stripe pioneered this sort of use of tokens, and almost everyone in the industry now uses something intellectually downstream of this approach. But there exists another type of token, called “issuer tokens.” Issuer tokens are quite similar, except instead of being created by Stripe so that a business can substitute for a customer’s PAN, they’re created by banks and other card issuers so that a business or processor can substitute for a customer’s account information.

That "customer's account information" is richer than a PAN is.

Credit card numbers change more often than many commercial relationships

To limit the possibility of accidental misuse following disclosure, credit card companies re-issue cards every few years. That didn’t pose a problem for the core motivating use case historically, a business traveler paying a restaurant for dinner in a town they didn’t live in.

Well, actually, it did, and credit card systems have substantial under-the-hood work to avoid causing cards to break for someone when they expire. These include both grace periods (honoring cards that are past their printed expiry date) and out-of-band solutions. For example, many issuers are willing to FedEx you a card overnight, anywhere in the world, if you didn’t realize yours was expiring during e.g. a business trip. Some even do this at their own expense. (I sometimes feel like those of us on the technical side of finance don’t appreciate how well these systems actually work, which one would expect from core economic infrastructure that was built by very smart people over decades. Implementation infelicities aside, credit cards are a quiet triumph of people banding together to solve problems.)

But cards are still typically on a 3 to 5 year replacement cycle, while utility contracts are routinely on a (not synchronized!) 5 to 7 year cycle, cell phones are on renewable 2 year cycles that routinely stretch into multiple decades, life insurance premiums can get paid for 30 years in the hoped-for case, etc etc. And breaking all of these billing relationships each time a card expires causes substantial friction for customers and businesses.

Sometimes, this results in e.g. the power getting turned off or a life insurance policy getting canceled, because either the business or consumer bobbled handling the changeover. Those are real and concrete harms. They were largely accepted as an unfortunate tradeoff of keeping the card ecosystem operating at risk and cost levels acceptable to society.

When we talk about this problem in industry, we don’t say “the power was turned off” or “a life insurance policy was canceled.” We usually say “spurious declines.” But that’s a very bloodless way to communicate that a human wanted to do something, where they were unambiguously allowed to do it, and they were told No by a computer, for no real reason. Some things people want to pay for are very important to them, and it is very important for society that they get those things predictably by paying for them!

Issuer tokenization lets businesses keep secrets more securely, for the same reason that Stripe’s tokenization did back in the day. A hacker can’t steal the PAN you don’t have in your custody, and gets no value from stealing the token itself. And tokens introduce a virtualization layer.

Remember, we don’t care about PANs because they are account numbers. We care about them because they are presumptive evidence of an agreement by a customer and a business to establish a billing relationship. A token is much stronger evidence, because a) it can only be created by a real business actually operating the network machinery to talk to an issuer and b) it can be repudiated at will by a business, issuer, or (de facto) by a consumer, without consequence to other uses of the underlying account. You may have had the experience of having a card stolen and needing to get the bank to give you a new account number, and then spending hours informing companies about the new number. If a token gets repudiated, it is a non-event for the customer, the bank, and every other organization in the world.

So do we need to expire issuer tokens every few years, like we expire credit card numbers? Reader, we do not. Banks will let you keep using an issuer token until one of the four-ish parties to the transaction (customer, business, card processor, and bank) asks to stop.

And this means that, even if the credit card number changes due to expiry, loss or theft of a card, or similar, the issuer token continues functioning. Customers and businesses don’t experience the same friction, the same power outages and insurance policies being forcibly canceled, that they did previously.
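The virtualization layer can be sketched in a few lines. This is a toy model under stated assumptions: the Issuer class, its method names, and the itok_ token format are all invented for illustration and do not describe any real network token specification.

```python
class Issuer:
    """Toy issuer: tokens point at an account, not at a card number."""

    def __init__(self):
        self._pan_by_account = {}    # account_id -> currently valid PAN
        self._account_by_token = {}  # token -> account_id

    def open_account(self, account_id, pan):
        self._pan_by_account[account_id] = pan

    def issue_token(self, account_id, merchant_id):
        # One token per billing relationship (hypothetical format).
        token = f"itok_{merchant_id}_{account_id}"
        self._account_by_token[token] = account_id
        return token

    def reissue_card(self, account_id, new_pan):
        # Expiry, loss, or theft: the old PAN simply stops being valid.
        self._pan_by_account[account_id] = new_pan

    def revoke(self, token):
        # Repudiating one token leaves every other token untouched.
        self._account_by_token.pop(token, None)

    def authorize_pan(self, pan):
        return pan in self._pan_by_account.values()

    def authorize_token(self, token):
        return token in self._account_by_token


issuer = Issuer()
issuer.open_account("acct_1", "4000000000000001")
power_token = issuer.issue_token("acct_1", "power_co")
insurer_token = issuer.issue_token("acct_1", "insurer")

issuer.reissue_card("acct_1", "4000000000000002")
old_pan_works = issuer.authorize_pan("4000000000000001")
power_still_works = issuer.authorize_token(power_token)

issuer.revoke(insurer_token)
insurer_works = issuer.authorize_token(insurer_token)
```

After the reissue, the old PAN fails while the power company's token keeps working; revoking the insurer's token affects that one relationship and nothing else.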

Businesses should probably use issuer tokens in preference to PANs, but this is a hard under-the-hood implementation detail. If your business has a Head of Payments, you probably know this and were careful to ask your credit card processor “So do you support issuer tokens?” If you’re the typical mom-and-pop business in the economy, you should never have to think about this at all, just like you never think about how electricity transformers work.

And if you’re a typical consumer of credit cards, you’ll never need to understand that tokens are working in the background. You’ll just fail to notice that you get asked a lot less frequently to update your credit card information, and should you lose a card or have one stolen, the cleanup process will be much easier than you expected it to be.

It will appear to you that some businesses you work with learned your new number before you did.

An aside about canceling gym memberships

One conversation I fairly frequently have with people who are not in the payments industry is how gyms make it hard to cancel their services, which is true and lamentable. And, they follow, they are glad they can simply cancel the card instead.

“You probably need to cancel with the gym anyway”, I’d tell them. Because simply causing a payment credential to not work anymore is actually not the way to terminate a contract. “Yeah, but what are they going to do?”, they would ask.

Inside the payments system, gyms could e.g. use issuer tokens, or other technological measures cooked up by the card networks, to avoid having the cancellation of a card actually cancel the gym’s ability to charge your account.

My friends often express shock and outrage at this. And, how to put this gently, they had previously depended on a bug in the credit card system. The system failed to work, and failed to work in a very particular way, which happened to be in their interests vis-à-vis a particular relationship. But this bug was not in the interests of all people, and it was not positive in the context of all relationships.

Most people who find long-term contracts canceled for non-payment when their card changes have just suffered surprise and harm, sometimes serious harm. This even includes some people with gym memberships. Believe it or not, some people actually have gym memberships intentionally! And goodness knows the gym has the gym membership intentionally!

Credit card networks do not know which people in the world are relying on the old bugged behavior and so cannot selectively maintain it. They are disinclined to make value judgements like “Hmm, gyms are evil, we don’t want them to automatically get updated credentials, but health insurers walk the path of righteousness, and we do want them to automatically get updated credentials, unless either of these would cause the wrong result in which case we should intuit that on a case-by-case basis using magic.” Instead, they publish rules and then enforce them, and bind banks and businesses to operate under them, in a mostly deterministic manner.

And so you should cancel by going through the cancellation procedure with whatever business you want to break up with, and not by simply ghosting them. Which, yes, some businesses are not upstanding citizens on this score.

Good news: the financial industry has got your back. Just complain to your bank; they will overwhelmingly back you and return your funds. If the first try doesn’t work say ‘Reg E’ when you try again. (I should probably eventually write the user’s guide to Regulation E, which in addition to being an important regulation in the U.S. has an absolutely magical effect as a shibboleth when said to a financial institution. I previously ghostwrote letters which cited it dozens of times effectively.)

But you actually do need to cancel contracts you’ve agreed to, and not assume that simply defaulting on your contract has the same effect as cancellation. Note that your gym probably has had more lawyers read their contract than you have. Even if you entirely abandoned the financial system and retreated to a hermitage, they’d still attempt to enforce their contractual rights against you. In the U.S., this will frequently see your debt getting sent to a debt collector, which will be an extremely unpleasant process to resolve.

Collaborating to better calibrate risk scoring

A salient feature under-the-hood of credit card networks is that, every time you make a payment, multiple entities need to make a decision on whether that payment should go through or not. It needs to be approved by the business, the credit card processor, the network, and the issuing bank, very quickly. The “performance budget” is a few hundred milliseconds for many of these actors.

Historically, each of them made this decision independently and redundantly, without sharing notes between each other. Which sounds a little silly when you write that out in words, right? And in addition to this being an obvious duplication of effort, it causes a stupefying number of spurious declines.

A disconcerting number of spurious declines are caused by… gremlins, man. None of us know. Not the payment processors, not the credit card networks, and certainly not front line banking staff if you were to call them. The global economic symphony that processes transactions twanged out a discordant note. Everyone is too busy furiously playing to track whose finger slipped.

But the more typical case for a spurious decline is that the bank actually had a reason for it. They were trying, in good faith, to block a transaction they suspected of being potentially fraudulent. Some banks are very good at this; many are… less good.

One study found the false decline rate was 1.16% on average in e-commerce, which implies (some fraction of) $11 billion (and increasing rapidly!) in economic damage per year. As anyone who has worked in e-commerce knows, customers bounce over surprisingly small amounts of friction introduced into a process. Telling them that their card was declined and to restart the transaction is very high-friction relative to things we routinely sandblast out of payment flows, like e.g. a single redundant field.

Have you ever wondered how much information a bank has vis-à-vis a particular credit card transaction? Surprisingly, it is both “a galactically huge amount” and “almost nothing, actually.”

They have KYCed you, one would hope, and so believe they know what typical usage looks like for you. They also have a portfolio of hundreds of thousands or millions of other users, and can make inferences across the portfolio on which transactions are more likely to be fraud versus which are not.

But they only see their own users’ transactions, not all transactions on the credit card network. And they get almost no information about the context of the transaction from the business. There are three so-called levels of transaction-specific data, and to save you a scintillating dive into the difference between Level 2 and Level 3, just round it to “banks get tweet-length transaction requests and virtually no other context from businesses.”

Many people assume this is probably for privacy reasons, but a bigger reason is “Well, in the 1960s, we didn’t expect anyone would want to have to type up a memorandum every time to get a credit card transaction authorized, and didn’t expect that computers would do almost all the transaction processing in the future. We still used machines which made an audible cachink-cachink noise when taking a physical imprint of the embossed portion of the card. That was an important labor-saving device!”

So here is a bit of context that many, many businesses would want to pass along to the bank: “I am really, really sure that this transaction is good.” Businesses could have many reasons for thinking that! Maybe they are e.g. an airline and this transaction is by someone who has passed government ID screening already, has flown 400,000 miles, and is trying to charge the airline’s co-branded credit card. Maybe they are a software company and this is for the 47th consecutive month of a B2B service agreement. Maybe this transaction is ancillary to a larger commercial transaction which was extensively derisked over months, like e.g. the filing fee on certain mortgage-related documents. Maybe they run their own extremely sophisticated fraud department, like e.g. Amazon does, and they have terabytes of data and very smart people which can calculate the likelihood of fraud down to a basis point in the context of their own business.

You can’t, under the traditional protocols for credit cards, communicate any of this with a credit card charge. And so many banks, which understandably want to protect their customers and which have commercial obligations to the credit card companies to protect the viability of their networks, will use sometimes blunt instruments to detect and block fraud.

Some common heuristics historically include “Hmm, is this charge happening in the city we expect you to be in?” or “Hmm, is this the second transaction for the same amount in a short while?” or “Hmm, have we derisked this particular business by seeing them successfully issue millions of charges?”

People move around. People routinely get a second cup of coffee. People start new businesses. People run small boutiques. And so these blunt instruments routinely impose costs on real people, in the service of decreasing fraud.

A better tradeoff: let the business tell the bank where it thinks a transaction is on the risk spectrum. The bank can use this on an advisory basis, as one of several signals it uses to underwrite the transaction, in addition to its other advantages (like KYC data, comparison against transactions happening over its customer portfolio, etc).

Automatically adjudicating user intent in real time rounds to impossible in theory, but in practice, we’ll have the same working system we have today but better. And because businesses, processors, and banks have structurally different views on every transaction, the result of merging diverse signals is better than any individual actor could construct.
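As a rough sketch of what "advisory" could mean in practice, the bank might blend its own score with the business's. The function names, the 0-to-1 score scale, the 40% weight, and the decline threshold are all assumptions made up for this example; real issuers use far richer models.

```python
def blended_risk(issuer_score, merchant_score=None, merchant_weight=0.4):
    """Blend two risk scores in [0, 1]; higher means riskier.

    merchant_score is optional and advisory: without it, the issuer
    falls back to its own view of the transaction.
    """
    if merchant_score is None:
        return issuer_score
    return (1 - merchant_weight) * issuer_score + merchant_weight * merchant_score


def decide(issuer_score, merchant_score=None, decline_above=0.7):
    """Approve unless the blended score exceeds the decline threshold."""
    risk = blended_risk(issuer_score, merchant_score)
    return "decline" if risk > decline_above else "approve"


# The issuer's blunt heuristic fires (say, an unexpected city),
# but the airline is very confident in its frequent flyer.
issuer_only = decide(issuer_score=0.8)
with_airline_signal = decide(issuer_score=0.8, merchant_score=0.02)
both_worried = decide(issuer_score=0.8, merchant_score=0.9)
```

Here the issuer alone would decline, the airline's confidence tips the same transaction to an approval, and a transaction both parties dislike is still declined. The bank keeps the final decision throughout.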

Now back in 1960, getting this information from the business didn’t make sense, at all. What additional insight is a bakery going to have as to whether a credit card transaction is good or not? Zilch. And, moreover, who at a bank has time to individually debate this with every bakery for every purchase of bread? So the standard network protocols in the card industry do not support this negotiation.

Enter Stripe’s direct partnerships with card issuers. Stripe runs a very large network with many, many businesses in various industries in dozens of countries; they have publicly stated their volume for 2021 was about $640 billion. That implies no small amount of transactional data. And Stripe operationalizes that data to do automated fraud scoring, via Stripe Radar. Radar boils down several hundred signals Stripe can observe into a number between 0 and 100 predicting transaction riskiness.

Radar protects businesses that accept payment instruments from fraudulent transactions. Businesses care about that keenly because they bear the primary economic burden for fraud, under prevailing regulations and commercial practice for the card networks and issuers. So the typical use case is businesses will, depending on their risk tolerance and margin characteristics of transactions, bounce (or flag to a fraud department for secondary screening) transactions above some riskiness threshold, and automatically approve low-risk transactions.
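The threshold logic a business applies might look like the sketch below. The function name and both cutoffs are illustrative assumptions, not Radar's actual rule syntax or default values; each business would tune them to its own margins and risk tolerance.

```python
def route_transaction(risk_score, review_at=60, block_at=80):
    """Map a 0-100 risk score to an action.

    The cutoffs are hypothetical knobs: a low-margin business might
    lower them, a high-margin one might raise them.
    """
    if not 0 <= risk_score <= 100:
        raise ValueError("risk_score must be between 0 and 100")
    if risk_score >= block_at:
        return "block"
    if risk_score >= review_at:
        return "review"  # hand off to the fraud team for secondary screening
    return "approve"


actions = [route_transaction(score) for score in (5, 60, 79, 80, 100)]
```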

But the card networks have many participants who aren’t themselves taking payments, like e.g. issuing banks. Wouldn’t it be great if they could also benefit from Radar?

A card network and issuer can’t typically tell the difference between higher-risk SaaS (like e.g. file hosting) and lower-risk SaaS (like accounting software), but Radar certainly can. Mark those transactions accordingly.

A bank can’t see fraud happening on another bank’s customers in real time, but Stripe can. Mark those customers' transactions accordingly. If e.g. a bad actor makes poor life decisions and attempts to run stolen cards through Stripe, flow that knowledge over the networked graph of the economy very quickly at scale, rather than having it only be realized by some financial institutions piecemeal on individual accounts weeks later.

If you’re not immediately grokking this, imagine a bad actor is discovered because they charged A, B, C, and D and A, B, and C report fraud. This implies a high likelihood that D’s card is also compromised, right? OK, so what would you conclude about likely fraudiness of D’s new transactions today: baseline or higher than baseline? What would you conclude about a new account for a landscaping service that happens to charge A, C, D, and E on the first day? What would you conclude about E’s new transactions?
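That intuition can be written down as a deliberately crude graph heuristic. The function names, the 50% threshold, and the merchant names are assumptions for illustration; a production system would use many more signals than shared cards.

```python
def suspicious_merchants(charges, reported, min_fraction=0.5):
    """Merchants where at least min_fraction of charged cards reported fraud.

    charges: dict of merchant -> set of cards it charged
    reported: set of cards with confirmed fraud reports
    """
    return {
        merchant
        for merchant, cards in charges.items()
        if cards and len(cards & reported) / len(cards) >= min_fraction
    }


def exposed_cards(charges, reported, min_fraction=0.5):
    """Cards not yet reported, but charged by a suspicious merchant."""
    exposed = set()
    for merchant in suspicious_merchants(charges, reported, min_fraction):
        exposed |= charges[merchant]
    return exposed - reported


charges = {
    "bad_actor": {"A", "B", "C", "D"},
    "bakery": {"A", "F", "G", "H"},
}
reported = {"A", "B", "C"}
exposed_before = exposed_cards(charges, reported)

# A brand-new landscaping account charges A, C, D, and E on day one.
charges["landscaping"] = {"A", "C", "D", "E"}
flagged_after = suspicious_merchants(charges, reported)
exposed_after = exposed_cards(charges, reported)
```

In this toy data D is flagged as exposed before ever reporting anything, the landscaping account is flagged on its first day, and E inherits elevated scrutiny from it. The bakery, which merely happens to share one customer with the bad actor, is left alone.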

There are many, many, many intuitions like those, and many that are less intuitive but which statistically work out extremely well. Given a sufficient amount of data and machine learning you’ll start doing things like accidentally fingerprinting the adversary’s working hours. Much of credit card fraud is done by a professional adversary, a real person with real fingers on the keyboard, who is quite like other workers in the payments industry. They have an HR department, a boss, and a quarterly performance review that they’re sweating. But, you know, evil.

You don’t want to use blunt blocks to interrupt the adversary. You don’t want to block the customer’s pre-existing monthly insurance payment! But if your card was compromised earlier today a novel payment to a novel business might be worth additional scrutiny! You can math out exactly how much with a team of smart people and transactional data covering a non-trivial percent of the Internet economy.

There is nowhere in the card network protocols to include a proprietary risk score. But that’s just an implementation detail. Banks are large institutions with smart people working at them. You can email them and eventually talk to the right smart people, and convince them that your risk scores have some merits to them.

And then you can negotiate a “side channel” for card transactions, and pass them e.g. risk scores in effectively real time with transactional data going over the usual rails. This avoids compromising user privacy. You don’t pass over e.g. the user’s entire history with the business, just like the bank doesn’t tell every bakery one’s bank balance each time they run a card. But you can give them signals to weight among their other signals.

Why not just change the card networks themselves? Well, one reason legacy systems take forever and a day to improve is that many people need to come to a collective decision on how to improve them. Consensus takes effort to achieve and is extremely costly. Some banks, such as many community banks, are very important to the ecosystem but would be extremely disadvantaged by the need to update their systems to accommodate even optional changes. And the value proposition for those changes starts as speculative. Why upend how the entire world does business just because a few geeks have a bright idea?

So here’s a mode of operation: instead of convincing literally the entire world to adopt a new model for risk scoring transactions, convince a few important firms to accept that model as an optional overlay for the existing networks. Run it for a while and gather evidence of effectiveness.

And lo, Stripe did this with several leading card issuers, including Capital One and Discover.

What did that negotiation entail? Well, mum’s the word, but you could reasonably assume that the banks had questions about how Radar works, whether this would be worth their time to implement, and whether it could be done in a way which respected their users’ privacy and didn’t harm other commercial interests of the bank. And you can reasonably assume they came to “Oh this is obviously a screamingly good idea.”

Why is it a screamingly good idea? Let me quote the marketing material:

Stripe users automatically benefit from the Enhanced Issuer Network, seeing an average 8% reduction in fraud and an improvement of 1%–2% authorization rate uplift on volume processed from Capital One and Discover.

This is absolutely bonkers. We’ve had credit card networks for more than half a century and it turns out that all you need to do is be able to pass one more number over the wire and it results in an 8% fraud reduction. That implies that the burden on legitimate businesses, who bear the ultimate economic risk for fraudulent credit card transactions, can be reduced by billions of dollars, without them having to take any action themselves. It is, from the perspective of businesses and consumers, a free upgrade to the financial industry.

(Radar, of course, required substantially more effort than “one more number.” But this is just a knock-on effect from telling Radar’s bottom line score to one more party than usual.)

Look at that authorization uplift, which is in many ways more interesting to businesses than the fraud improvement. (The base rate of fraud for most businesses is low, so 8% of all fraud is a lot less than some fraction of 1-2% of all revenue.)
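A back-of-the-envelope version of that comparison follows. The 8% and 1% figures come from the quoted marketing material; the revenue, the 0.5% fraud rate, and the 10% share of volume on participating issuers are made-up inputs chosen only to show the shape of the arithmetic.

```python
def annual_impact(revenue, fraud_rate, partner_share,
                  fraud_reduction=0.08, auth_uplift=0.01):
    """Compare fraud savings with recovered revenue, in dollars per year.

    fraud_rate: fraud losses as a fraction of revenue (assumed input)
    partner_share: fraction of volume on participating issuers (assumed input)
    The fraud reduction is generously applied to all fraud, while the
    uplift is applied only to the participating issuers' volume.
    """
    fraud_savings = revenue * fraud_rate * fraud_reduction
    extra_revenue = revenue * partner_share * auth_uplift
    return fraud_savings, extra_revenue


fraud_savings, extra_revenue = annual_impact(
    revenue=1_000_000, fraud_rate=0.005, partner_share=0.10
)
```

With these inputs the fraud savings come to roughly $400 and the recovered revenue to roughly $1,000, and that is at the low end of the quoted uplift range.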

Banks have a few different revenue streams from card issuing. Many of them are sensitive to payment volume; banks, like businesses and consumers, would prefer that good transactions go through. It is bad news for the bank if a customer, blocked from a transaction by an overly-protective system at a bank, uses someone else’s card to make it or abandons the transaction entirely.

And so Stripe’s pitch to the bank is “Hey guys, want to make 1%-2% more from your card issuing businesses with basically no downside? And we did all the hard work for you? Including years of hard science and experimentation to prove this actually works, with sufficient rigor that you’ll be able to convince internal stakeholders and your regulators that this is every bit the free lunch we represent it as being?”

This is enormously incentive compatible for every party to every transaction. Nobody’s favorite feature of cards was “Sometimes they randomly don’t work. This keeps you on your toes.” Customers want their transactions to go through. Businesses like revenue. Banks earn in rough proportion to their legitimate volume and so work aggressively to maximize it.

Stripe, of course, also earns fees on transactions. The impact to the business is even larger than the headline numbers, though, because you can use the fact of it in sales conversations. What are you buying with your fees paid to Stripe? Thousands of people working on novel, nowhere-else-in-the-industry features that routinely spit out results like “8% less fraud” and “1%-2% acceptance uplift” on major U.S. issuers.

Where do we go from here?

You might reasonably assume that Stripe is extremely willing to help other issuers also get that side channel to enhance the information they receive on their own transactions. This is straightforwardly incentive compatible for all parties and requires far less work for the marginal institution onboarded than the first partnership did, both for the participating issuer and for Stripe. And you can look at the list of 47 countries that Stripe does business in, and reasonably assume that few of them have banks which are indifferent to fraud and authorization rates.

This is quite similar to Stripe’s core product model of making APIs for developers to consume. New capabilities are hard to build right but easy to adopt. Then the adopters build things on top of them with speed and diversity the company could never have produced itself. Many businesses which benefit from Stripe's improvements here do not directly integrate with Stripe at all. They are customers of, e.g., Shopify or other platforms which use Stripe under the hood.

This is also a mechanism by which Stripe improves the financial ecosystem for people who are not directly Stripe users. Capital One and Discover cards are better today, across all possible transactions at all possible businesses, as a result of those banks being able to rely on Radar scores to improve their own internal fraud systems. Their customers passively suffer less fraud and complete more intended transactions, even if they never transact with a business powered by Stripe.

This is also, much like tokens were, a new pattern that can be reused and improved upon, throughout the industry. Improvements in computing, particularly the surprisingly novel fact of large U.S. banks having modern software engineering approaches (which I discussed on Odd Lots recently), have made side channels to widely consumed networks much more technically and organizationally viable than they used to be. Plausibly we should use them in many more places than just card payments. You could imagine bank transfers or securities transaction settlement as having similar network topologies.

We depend on many institutions which look, in broad strokes, like the credit card networks: critically systemically important and responsive to thousands of stakeholders with de facto veto capabilities over change. This approach of creating an overlay while maintaining the existing system is an interesting way to quickly iterate experimentally, with the goal of finding the sort of evidence it will take to convince central decisionmakers and motivate late adopting stakeholders.
And sometimes, those experiments really do discover improvements that create hundreds of millions or billions of dollars in free lunch.
