Mar 11, 2026

Governing the Generative AI Supernova – A Risk Framework for Financial Institutions

A clear guide to generative AI risk in financial institutions, from privacy and pipeline security to human oversight, fraud defense, and governance built for constant change.

By Mike Freiling, PhD


Introduction – A Whole New Dimension

I've always been fascinated by stories in which a perfectly ordinary object is opened up, only to reveal an entire new world we weren't previously aware of. In the science fiction comedy Men in Black, for instance, a small ornament dangling from a cat's collar is revealed to contain an entire galaxy. The children's classic Chronicles of Narnia opens as four children discover the far-off land of Narnia by walking into and through an apparently ordinary wardrobe in the spare bedroom of Professor Kirke's house.

What's fascinating about these newly discovered worlds is that they represent far more than simple objects or forces to contend with – they represent whole new constellations (whole new galaxies sometimes) of activity that must now be accounted for and factored into the surrounding reality.

The exploding world of generative AI (or genAI, as some call it) is like that. Continuing the astronomical metaphor, we might be tempted to liken its extremely rapid development to the explosion of a supernova, scattering new cosmic elements everywhere, from black holes to nascent stars to entire planetary systems. How is one to confront such a swirling mass – much of it still pure potential – which is throwing off new capabilities and new risks almost every day? In this article, we will try to get a sense of what those risks are, and how they impact the banking world.

First, let's survey how the world has changed with the explosion of genAI, and then discuss the new risk factors that have emerged as a result of this change. We will see how these new risk factors impact every one of the traditional functions and activities of a bank. Of course, we will also need to outline the steps that any financial institution should begin taking to prepare itself for the world ahead.

Where It All Began – Symbolic AI

The disruptive power of technology has been with us for a long time, ever since banking evolved out of paper-based recordkeeping and began using computers. Possibly even before that. Technology has offered many benefits to the banking business: new and more sophisticated banking products, more information about customers and prospects, and a larger virtual footprint, even for small banks. Classical AI, in the form of "machine learning" algorithms, expanded these horizons. New and more sophisticated models could be rigorously tested and deployed in applications like credit scoring, loan underwriting, and payment fraud detection.

But technology has always been a two-edged sword. Increased complexity allows for more points of failure, not to mention more avenues for compromise or other adversarial actions. The recent explosion in genAI and large language models (LLMs) has added a whole new dimension to that complexity, and to the risks that come along with it.

Up until recently, most AI work in the banking world was based on an approach called symbolic AI, where the required knowledge was explicitly constructed out of building blocks that were recognizable in terms of the vocabulary of the banking world. An early credit decisioning system, for example, might have a rule like:

IF the applicant has ever declared bankruptcy
THEN decline the application.

Rules of this sort could be used in many ways to classify the cases presented to them and facilitate decisions that needed to be made — from granting a loan, to retrieving the appropriate legal document, to translating legal contracts from one language to another.

This simple rule-based approach evolved into what came to be called "machine learning" — algorithms that statistically evaluated the impact of many different characteristics on the success or failure of a decision, and organized them in sophisticated ways to provide numeric scores to clarify decision thresholds. Simple yes/no conclusions evolved into scores, for example:

IF the applicant has ever declared bankruptcy
THEN assign 100 points to the risk score.

The scores themselves were determined by statistical algorithms that estimated the impact each characteristic should have on a successful or unsuccessful decision. Scoring rules allowed many more characteristics to be incorporated into flexible decision models that could be "re-tuned" as needed to adapt to changes in the decision-making environment.

Symbolic AI — whether rule-based or statistical — had two key advantages from the perspective of a closely regulated industry like banking. First, it was deterministic: given the same inputs, it would always produce the same outputs. This regularity helped ensure that decisions were made based only on specific selected characteristics, and that other characteristics could be clearly excluded, as many banking regulations require. Second, it was explainable: every factor that went into a specific decision could be extracted and analyzed. Each business rule or statistical characteristic had a clear and explicit definition, and its impact on the final decision could be measured and communicated. A banking customer could be notified of the principal reasons for a decision, as mandated by the Equal Credit Opportunity Act (1974) and its implementing regulations.
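To make these two properties concrete, here is a minimal sketch of a deterministic, explainable scorecard in Python. The rule names, point values, and decline threshold are illustrative assumptions, not an actual underwriting model:

```python
# Illustrative scorecard — deterministic and explainable.  The rule
# names, point values, and decline threshold are hypothetical, not a
# real underwriting model.

RULES = [
    ("prior bankruptcy",   lambda a: a["bankruptcies"] > 0,      100),
    ("recent delinquency", lambda a: a["delinquencies_12m"] > 1,  60),
    ("high utilization",   lambda a: a["utilization"] > 0.8,      40),
]
DECLINE_THRESHOLD = 120

def score_applicant(applicant):
    """Return (risk_score, reasons); the same inputs always give the
    same outputs, and every point of the score traces to a named rule."""
    contributions = [(name, points)
                     for name, test, points in RULES if test(applicant)]
    score = sum(points for _, points in contributions)
    reasons = sorted(contributions, key=lambda c: -c[1])  # top reasons first
    return score, reasons

applicant = {"bankruptcies": 1, "delinquencies_12m": 2, "utilization": 0.5}
score, reasons = score_applicant(applicant)
decision = "decline" if score >= DECLINE_THRESHOLD else "refer"
```

Because the score is a plain sum over named, auditable rules, excluded characteristics demonstrably play no role in the outcome, and the top contributing rules can be reported back as the principal reasons for the decision.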

The Evolution of AI – From Symbolic AI to Generative AI

From the founding of the AI field, the symbolic approach has been — at least implicitly — in competition with a completely different approach, often termed the "neural network" or "connectionist" model.

The neural approach was initially dismissed as inadequate. Knowledge in this model is not represented explicitly, but encoded in the weights of a network of small calculating elements, whose accumulated calculations produce a final answer. Early investigations found this approach lacking in power, primarily because early networks were limited to a single layer of calculation.

Multi-layer networks resolved the initial problems. In the mid-1980s, Geoffrey Hinton and his colleagues showed that by adding additional layers connected by "hidden nodes," the computational power of these networks explodes. It became possible to formulate training algorithms that incrementally adjust the calculation parameters based on a long sequence of examples. This development ushered in the era of "deep learning" systems that could classify cases according to very complex real-world criteria, without requiring complex up-front engineering.

As massive computational power and correspondingly massive data storage capabilities came on-stream in the 21st century, enhanced neural networks evolved from simple classification engines into machines that could synthesize a surprising variety of digital artifacts. Large language models (LLMs) proved capable of constructing coherent text — from fictional stories, to summarizations of current knowledge, to legal contracts. Static visual images could be created, from pictures of imagined scenes to faces of entirely new, non-existent humans. Formal artifacts could be made to seem legitimate, from legal documents to government IDs. Computer code could be constructed. Convincing conversations could be generated. Interacting humans could be emulated in dynamic videos to an amazing degree of realism.

All of this new power comes at a cost, however, in terms of greater risks and the loss of some critical capabilities enjoyed by traditional AI. The computational networks involved are massive — as of this writing, the current generation of generative AI platforms contains from 7 billion to around 1.7 trillion calculation parameters that are "learned" during training. Intermediate calculations based on so many parameters cannot be broken down into components comprehensible in human terms, and such systems therefore forfeit any ability to explain the rationale behind their decisions.

Generative AI capabilities are also open to just about anybody. The rule systems and statistical models of the symbolic AI era required meaningful technical skill to build. But the complex outputs of genAI can be created by anyone with a computer (or smartphone) and access to a genAI platform, many of which are still available at no cost. Potential bad actors who lacked the skill to hand-craft sophisticated fraud schemes can now create them with a few keystrokes. This "democratization of AI" will lead to a proliferation of AI capabilities in the hands of many more actors with adversarial intent. Fraud attempts are likely to multiply exponentially. Distinguishing the real from the fake will become an even greater challenge.

There is also a significant loss of control. In the symbolic era, dependence on third parties could be minimized — rule sets could be crafted in standard system languages like C or C++, models trained with in-house data, and then deployed as stand-alone algorithms with no external dependencies. Such an arrangement is simply impossible with genAI. No single financial institution, however large, has the resources to build a genAI platform from scratch. The capital cost to develop each of the GPT-4 generation of models has been estimated at between $100M and $200M — and those models are already a generation behind. Financial institutions wishing to use genAI are thus lashed to the mast of whatever platform they have chosen to utilize.

An FI also has no control over the training data. GenAI requires large platforms trained with enormous volumes of data, from the Greek classics to the latest corporate memos. Once this data has been baked into an LLM, it is impossible to remove selected portions. The FI is stuck with whatever training material has been used, which can be a source of bias or misinformation that bleeds into the bank's work products. In extreme cases, bias and misinformation can render those work products invalid, or problematic from a regulatory perspective. Yet due to the way in which genAI produces its results, there is no systematic method to test for bias or misinformation, or to guarantee its elimination.

Deploying Generative AI

Viewed from the perspective of its systems, banking activities can be seen as a collection of personnel and applications that process data in various ways — from recording a check deposit, to assisting a new customer. Supporting these tasks requires a complex network of pipelines that collect and move data, as well as humans to review work products and intervene if necessary. Generative AI opens up new possibilities for risk anywhere along these pipelines.

Some applications are simply not suitable for genAI. Tasks requiring decisions that are binary (yes/no) or quantitative in nature — such as transaction approval or loan pricing — tend to be better suited to symbolic AI, which has the characteristics required for effective management and regulatory compliance: repeatability and explainability. GenAI results are not designed to be repeatable. A certain amount of randomness is often built directly into the architecture in order to create an impression of natural spontaneity. Moreover, the fast-paced nature of genAI development means that platforms are constantly changing, so results are subject to change outside the control of any financial institution.

It is a practical impossibility to "freeze" the potentially trillions of an LLM's calculation parameters for each decision task just to guarantee repeatability. And as already noted, those parameters are not subject to human explanation, as effective management and regulatory compliance require. Traditional symbolic techniques — such as statistical machine learning — remain a better choice for tasks where calculations must be systematically repeatable or explanation of the results is required, such as credit decisioning, loan underwriting, and funds transfer approval.
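The built-in randomness can be illustrated with a toy model of LLM decoding. Real platforms sample from distributions over tens of thousands of tokens; here a hypothetical three-token vocabulary and a `sample_token` helper stand in for the idea:

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Toy model of LLM decoding: greedy when temperature == 0,
    otherwise a random draw from the softmax distribution."""
    if temperature == 0:
        return max(logits, key=logits.get)  # deterministic "argmax" decode
    weights = {tok: math.exp(v / temperature) for tok, v in logits.items()}
    total = sum(weights.values())
    probs = [w / total for w in weights.values()]
    return rng.choices(list(weights), probs)[0]

# A three-token vocabulary stands in for a real model's output layer.
logits = {"approved": 2.0, "declined": 1.5, "pending": 0.5}

# Greedy decoding is repeatable across runs ...
greedy = {sample_token(logits, 0, random.Random(i)) for i in range(20)}
# ... but sampled decoding, which platforms typically use to produce
# natural-sounding output, can return different tokens for identical input.
sampled = {sample_token(logits, 1.0, random.Random(i)) for i in range(20)}
```

Only the greedy setting guarantees that the identical request produces the identical answer; any nonzero sampling temperature trades that repeatability away for spontaneity.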

This still leaves a wide range of possible uses for genAI, including:

- Marketing — creative collateral, illustrations, artwork
- Research and analysis — knowledge retrieval, document summarization, historical data collection, scenario generation
- Help and assistance — knowledge retrieval, instruction sequences
- Communications — legal contracts, document editing
- Quantitative — synthetic data sets, analytic models, computer code

The determination of what is suitable for genAI does not rest solely on the difference between quantitative and qualitative. It has at least as much to do with characteristics like the finality of the output and the presence of humans in the workflow who can review and shape the work products, possibly through several iterations.

The data must be protected. The data used for any decision task must be kept confidential, even as some of it needs to be presented to an LLM in order to complete the task itself. It is especially important to distinguish personally identifiable information (PII) — data that can be used to identify an individual customer — from customer characteristics that are not PII. PII must always be redacted before any data is passed to any genAI platform. Guarantees of confidentiality by third-party vendors cannot be relied upon.

Deployment of a genAI application therefore requires both pre-processing and post-processing steps. PII data is stripped out in the pre-processing step and replaced if necessary with internally generated tokens. Post-processing reverses the redactions, backing out the tokens and substituting the original PII. Data that is not PII must also be protected, as it can yield valuable information about a bank's internal policies. Techniques such as retrieval-augmented generation (RAG) incorporate customer data into the genAI request itself, restricting ingestion by the LLM, which helps minimize the possibility of data capture.
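A minimal sketch of this redaction round trip might look like the following. The PII patterns, the token format, and the commented-out `call_llm` stand-in are all illustrative assumptions, not a production design:

```python
import re

# Hypothetical redaction layer: the PII patterns and token format are
# illustrative; a production system would use a vetted PII detector.
PII_PATTERNS = {
    "SSN":  re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ACCT": re.compile(r"\bACCT-\d{8}\b"),
}

def redact(text):
    """Pre-processing: replace PII with internal tokens before the LLM call."""
    vault, counter = {}, 0
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.findall(text):
            token = f"<<{label}_{counter}>>"
            vault[token] = match          # mapping never leaves the bank
            text = text.replace(match, token, 1)
            counter += 1
    return text, vault

def restore(text, vault):
    """Post-processing: back out the tokens, substituting the original PII."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text

prompt = "Summarize the dispute for SSN 123-45-6789 on ACCT-00112233."
safe_prompt, vault = redact(prompt)
# Only safe_prompt is sent to the platform:
# response = call_llm(safe_prompt)       # hypothetical platform call
final = restore(safe_prompt, vault)      # round trip shown for illustration
```

The vault mapping tokens back to original values stays entirely inside the institution; the genAI platform only ever sees the tokenized text.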

The pipelines must be protected too. A primary objective of any financial institution is to keep its pipelines flowing smoothly, without interruption or intrusion. Incorporating genAI anywhere in this network makes things more complicated. For example, when large volumes of unstructured data are passed through a legacy network, there is a constant risk of "buffer overflow," in which excess data bleeds into areas where it does not belong. Such vulnerabilities are hard to detect because they are intermittent, and they can play havoc with system operations, potentially bringing the entire system to a halt.

The bank has no control over the internal structure of its chosen genAI platform, and excuses like "It's the vendor's problem" are unlikely to carry weight with top management or with regulators. This is particularly true with respect to the risk of "injection" attacks, where bad actors attempt to insert spurious information — or worse, executable code — into complex data streams. The relatively small number of genAI platforms practically guarantees that they will be carefully studied by those who contemplate such attacks, because the economic payoff is potentially enormous. It is absolutely necessary to ensure that genAI artifacts are properly isolated and validated.
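One common defensive pattern is to treat every genAI artifact as untrusted input and validate it against a strict allowlist before it can reach any downstream system. The payment-instruction schema below is an assumed example, not a real interface:

```python
import json

# Assumed schema for illustration: a genAI-drafted payment instruction.
# Only these fields, with these types, are allowed through.
ALLOWED_FIELDS = {"payee": str, "amount_cents": int, "memo": str}
MAX_MEMO_LEN = 200

def validate_artifact(raw):
    """Parse untrusted genAI output; raise ValueError on anything unexpected."""
    data = json.loads(raw)                     # malformed JSON fails here
    if set(data) != set(ALLOWED_FIELDS):
        raise ValueError("unexpected or missing fields")
    for field, expected in ALLOWED_FIELDS.items():
        if not isinstance(data[field], expected):
            raise ValueError(f"bad type for {field}")
    if data["amount_cents"] <= 0 or len(data["memo"]) > MAX_MEMO_LEN:
        raise ValueError("value out of bounds")
    return data                                 # safe, typed, bounded

good = '{"payee": "Acme Corp", "amount_cents": 125000, "memo": "Invoice 42"}'
bad = '{"payee": "Acme Corp", "amount_cents": 125000, "memo": "x", "exec": "rm -rf /"}'
```

Anything outside the declared shape — extra fields, wrong types, out-of-range values — is rejected before it can be interpreted or executed downstream.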

Significant resources are required to verify the safety, reliability, and integrity of any pipeline that carries genAI data. Steps must be taken to contain, isolate, and where necessary, bypass genAI applications whenever a problem occurs. Contingency plans must be in place for continuing operations during a bypass. Accommodating a genAI application into the bank's pipeline is not a simple task.

Even the content produced by generative AI must be carefully curated. When a legal document is created, it must be verified to ensure that all the details are correct. Marketing collateral needs to be reviewed for exaggeration and excessive claims. Visual content needs to be checked to make sure it does not convey a biased or inappropriate image of the institution. A script for new account openings, for example, must include all relevant restrictions — such as minimum balance requirements — whether it is delivered via chatbot or as instructions to call center employees.

Chatbot interactions also carry their own specific risks of alienating customers. Even the linguistic style of the chatbot must walk a razor's edge to avoid being overly obsequious on one hand, or excessively terse and unhelpfully rude on the other. The final interpretation rests with the customer and may be influenced by factors such as age, background, and education level. Dissatisfaction with poor customer service alternatives is already widespread, and the very complexity of genAI solutions makes it essentially impossible to guarantee up front that a trained LLM will have all the knowledge necessary to make appropriate recommendations at the right level of politeness.

Constant oversight of generative AI interaction is required. When up-front correctness of genAI responses cannot be guaranteed, financial institutions are likely to find it necessary to retain a "human in the middle" to monitor, review, and if necessary revise them. The degree of human intervention needed will depend on both the impact of the decision and the importance of the customer. For low-impact interactions, it may prove sufficient for the human to periodically review selected interactions against established guidelines. For high-impact cases — such as approving a large loan — or for high-value customers, the human may be required to review every case, or even interact with the customer directly.
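A triage rule of this kind might be sketched as follows; the dollar thresholds and customer-tier labels are purely illustrative assumptions, not policy:

```python
# Illustrative triage for "human in the middle" review.  The impact
# thresholds and customer tiers are assumptions, not actual policy.

def review_level(impact_usd, customer_tier):
    """Map an interaction to a review level: 'sample', 'full', or 'direct'."""
    if impact_usd >= 100_000 or customer_tier == "private-banking":
        return "direct"   # expert reviews every case, may contact customer
    if impact_usd >= 5_000:
        return "full"     # every such case is reviewed before release
    return "sample"       # periodic spot checks against guidelines
```

For example, a routine balance inquiry would fall into periodic sampling, while a large loan approval or any private-banking interaction would route to direct expert review.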

The best people to review and revise genAI outputs are subject matter experts themselves. Contracts generated by genAI should be reviewed by legal counsel or a contract administrator. Suspicious Activity Reports and other artifacts related to financial crime should be reviewed by a BSA officer. Even routine chatbot interactions should be periodically reviewed by experienced customer service representatives. Ultimately, these potential "humans in the middle" need to be involved earlier in the development cycle — as genAI applications are trained and prepared, even before actual deployment.

Defending Against Generative AI

Adversarial attacks are probably as old as banking itself. It's not difficult to imagine some shady Renaissance character trying to convince a Medici banker that he is someone else, in order to gain access to that someone else's funds. Modern banks are no strangers to such attacks, which range from identity theft to forged checks to stolen credit card numbers and many more. Impersonation, in one form or another, lies at the heart of nearly all of them.

GenAI has opened the floodgates for new forms of impersonation. Most prominent is the ability to create a realistic dossier for an entirely fictitious individual or organization — simulacra of ID cards, certificates, government documents, real estate documents, and even realistic photographs of people who never existed. Identity synthesis, as differentiated from identity theft, will evolve to become a major fraud vector.

GenAI capabilities have also evolved to the point where they can be enlisted to create live-action sequences that appear to be initiated by a real person. Perhaps the most notorious incident of this sort was a fraud perpetrated on the giant UK engineering company Arup. The attack started when a junior staffer in Hong Kong received an email that appeared to be from the head office in the UK, requesting a secret transfer of 200 million Hong Kong dollars (over US $25 million). The staffer rightly became suspicious, but was then invited into a video conference to discuss the matter. Unbeknownst to the staffer, the video images on the call were deepfake imitations of the UK-based CFO and his staff. Satisfied by the call, the staffer proceeded to transfer the funds.

The mischief that can be engendered by these capabilities is almost unlimited. Financial institutions can be fooled into creating accounts for spurious individuals or organizations. Once an account is created, loans can be applied for and other transactions initiated. An account with one FI can be leveraged into accounts or lines of credit with others.

These same capabilities can be used to create convincing emulations of existing persons and put to work separating those individuals — or their organizations — from their funds. Even before genAI, one of the most difficult-to-detect fraud schemes, often called "business email compromise," made use of impersonation techniques to send email instructions from one putative member of an organization (e.g., the CFO) to another (e.g., a junior staffer in Accounts Payable), instructing them to issue an emergency wire or funds transfer. Attacks of this sort are essentially impossible for traditional fraud platforms to detect, since they originate from a legitimate employee, transmitted through proper channels.

Now consider the possibility that instead of an email, the junior staffer has a face-to-face video conversation with the CFO — who appears to be traveling outside the US and needs the transfer done quickly. Furthermore, suppose that the transfer requested is not a wire, but an internal transfer to a spurious account that was set up earlier for a fictitious organization and allowed to mature as it built a history of apparently innocent transactions. The difficulties of detection multiply considerably.

In March of 2019, a similar fraud was perpetrated against an unnamed UK energy subsidiary of a German firm. The target, surprisingly, was the subsidiary's own CEO. He was fooled by an AI-generated voice recording that seemed to come from his boss, the CEO of the German parent — precise enough to even mimic the German accent. The energy CEO was duped into sending EUR 220,000 to an account in Hungary. After arriving in Hungary, the funds were immediately transferred to Mexico and then distributed to various other locations. It wasn't until the UK CEO received a third call requesting another transfer that he realized his mistake. Note that this happened seven years ago. The sophistication of these techniques has undoubtedly escalated since then.

These capabilities will no longer be limited to criminal organizations with a high degree of technical skill — they will be available to smaller and smaller fraud rings, even single individuals, since genAI platforms can be operated by almost anybody. We don't yet know all the uses to which such techniques will be put, but what we do know is that there will be thousands — perhaps millions — of potentially bad actors out there, ranging from high school pranksters to malevolent nation states, spending time, energy, and creativity crafting new ways of using genAI to penetrate banking systems. Under these circumstances, the number of fraud attacks will certainly increase, perhaps exponentially. Financial institutions will need to ramp up their detection capabilities dramatically, and develop whole new methods to identify and eliminate adversarial attacks at scale.

Governance and Regulation of Generative AI

In its AI Update of April 22, 2024, the UK's Financial Conduct Authority, citing the need for regulatory resilience, issued a call for more collaboration with its regulated financial institutions in order to build an "empirical understanding" of AI as it develops, and a "consensus on best practice." This appeal pretty much sets the tone for how governance and regulatory issues will need to be handled in a world where new genAI capabilities come on-stream almost daily.

Managing AI risks will necessarily become ad hoc, incremental, and ultimately experimental. Many risks that accrue from the use of genAI are already known, but perhaps not well understood. There are undoubtedly other risks we presently know little or nothing about. The situation is too complex to predict these risks in advance or develop "once and for all" fixes. Risks will need to be addressed immediately as soon as they are discovered. Serious risk incidents will require all hands on deck to diagnose the problem and devise both short-term and long-term remedies.

Internal risk management will need to descend to "street level." Constant monitoring will be the order of the day, as internal risk management functions learn to work hand-in-hand with lines of business to spot emerging vulnerabilities early and take steps to remedy them. This will likely involve pushing risk management personnel down into the lines of business to work alongside the operational staff who will likely be the first to spot anomalies.

Regulatory relationships will follow a similar path — likewise becoming ad hoc, incremental, and iterative, at least insofar as possible. Even as regulators try to articulate new policies and guidelines, AI will introduce major gaps in the traditional risk management frameworks. How is the traditional model risk notion of "explainability" to be interpreted in light of the myriad hidden nodes in an LLM model, which almost by design are not explainable? Who is responsible for the risks posed by agentic AI? Just as internal risk managers will need to work hand-in-hand with their lines of business, regulators will need to work hand-in-hand with their FIs just to stay abreast of the exploding capabilities.

Looking to the Future

If one thing is certain, it is that the genAI supernova will continue to explode. For financial institutions, this represents both good news and bad news. The bad news, of course, is that entirely new and undreamed-of risks lie ahead.

The good news, on the other hand, is that some AI capabilities may emerge that enable FIs to fight fire with fire. For example, it might eventually prove possible to train "deep learning" AI systems to distinguish fraudulent genAI artifacts from real ones, or to recognize sophisticated fraud strategies such as cross-market manipulation.

The continuing development of genAI will also require changes to a bank's organizational structure. AI risk management will need to operate at "street level," but the need for centralized capabilities will also grow. Financial institutions will invest in highly skilled technical staff who understand the nuances of training, calibrating, and re-training genAI applications. These experts will likely cluster into "centers of excellence" that function as central clearing houses for new information, while distributing that information back to the lines of business as consultants and architects.

In summary, all risk management functions within a financial institution will need to become ad hoc, incremental, and experimental — in line with their dependence on front-line staff for discovery and mitigation ideas. Communications up, down, and across the organization will become more frequent, informal, and transparent, as the early disclosure of each new problem becomes an opportunity to learn and adapt quickly. Benjamin Franklin's words were never more true: we must all hang together, or we will all hang separately.

About the Author

Mike Freiling received his PhD from the MIT Artificial Intelligence Lab in 1977. That same year, he was named a Henry Luce Scholar at Kyoto University in Japan. In 1994, he was awarded the Chartered Financial Analyst (CFA) designation. Over the years, his work has focused primarily on AI and data science for the banking and financial services industries, including pension valuation, credit card underwriting, credit liability estimation, fraud detection, and stock market regulation. Mike serves as an Advisor to StandardC. These days Mike spends half the year in Kyoto, Japan, and has published two translations of Japanese poetry.