Financial modelling with AI: Part 3

Generative AI tools such as ChatGPT and Copilot can assist with the test and implement phases of financial model development.

By Liam Bastick, FCMA, CGMA

June 11, 2025

Editor’s note: This article is Part 3 of a three-part series on financial modelling with AI. Part 1 looks at the limitations of using AI when modelling, and Part 2 looks at using AI in the scope, plan, and design stages before building a model.

Over the past two articles, I have discussed how artificial intelligence (AI) may not yet be able to automate model building, but how it can be of assistance in the preliminary stages of model development, ie, the scope, plan, and design stages.

This time, I want to consider how AI can help in the stages after you have actually built a model. There are, overall, six stages of model development: scope, plan, design, build, test, and implement.

Two of these come after the build stage:

Test: Has the model been checked, and does it do what it should? Here, the model can be reviewed by checking and validating the results and/or reviewing the formulae.
Implement: Do users know how to use it? Is it well understood? Documentation can be developed that explains how the model works and how it should be operated, and that provides troubleshooting tips. Ongoing support or training may also be offered.

AI can assist with a variety of topics that fall under these two stages. In particular, I will discuss:

Scenario analysis.
What-if analysis (including goal seeking).
Metrics and comparisons to industry benchmarks.
Providing explanations (documentation).
Visualisations (charting).

As before, I will use the scenario I created for Part 1 and Part 2 in this series:

Since I am assuming the model is being developed in Excel, the two most appropriate AI tools here are ChatGPT and Copilot (in Excel). Let’s see how they fared in a common analytical test in which inputs are varied to assess the results, known as scenario analysis.

Scenario analysis

For both of the large language models (LLMs), only one prompt was needed to produce accurate scenario results:

This produced nine results (three indirect cost scenarios for each of the three tax rates). To check whether the AI tools calculated the results correctly, I first created the results manually (you could use an Excel Data Table, for example):

ChatGPT and Copilot both mirrored these precisely, albeit as values only and without the underlying formulas:

Both tools seem able to reproduce simple tasks, even if the results are hard-coded with little explanation of methodology.
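Because the AI outputs arrive as hard-coded values, it helps to have an independent calculation whose formulas you can see. The sketch below is a minimal Python stand-in for a two-way Excel Data Table. It assumes a hypothetical one-period model (revenue less cost of goods sold and indirect costs, taxed at a flat rate on positive profit only); the function names and figures are illustrative and are not the model used in this article.

```python
from itertools import product


def npat(revenue: float, cogs_pct: float, indirect_costs: float, tax_rate: float) -> float:
    """Net profit after tax for a simple one-period model (hypothetical structure)."""
    gross_profit = revenue * (1 - cogs_pct)
    ebt = gross_profit - indirect_costs        # earnings before tax
    tax = max(ebt, 0.0) * tax_rate             # assumption: no tax credit on losses
    return ebt - tax


def scenario_grid(revenue, cogs_pct, indirect_scenarios, tax_rates):
    """Return {(indirect_costs, tax_rate): NPAT} for every combination, like a two-way Data Table."""
    return {
        (ic, tr): round(npat(revenue, cogs_pct, ic, tr), 2)
        for ic, tr in product(indirect_scenarios, tax_rates)
    }


# Illustrative inputs only: three indirect cost scenarios x three tax rates = nine results.
grid = scenario_grid(
    revenue=1_000_000,
    cogs_pct=0.40,
    indirect_scenarios=[200_000, 250_000, 300_000],
    tax_rates=[0.25, 0.30, 0.35],
)
for (ic, tr), value in grid.items():
    print(f"Indirect costs {ic:>9,.0f}  tax {tr:.0%}  NPAT {value:>12,.2f}")
```

Unlike a list of pasted values, every result here can be traced back to the `npat` formula, which is the kind of transparency the AI responses lacked.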

What-if? analysis and goal seeking

ChatGPT and Copilot produced accurate values when asked to alter an assumption to arrive at a specified value of NPAT (net profit after tax). This is known as goal seeking and is often used to assess risk as well as model veracity. Both AI engines received the same prompt:

When I calculated the revenue growth figure, I found it needed to be -52.01% (ie, a decline). This took into account dependent calculations such as COGS (cost of goods sold) and operating expenditure, too:

ChatGPT and Copilot came up with similar — and correct — responses, easily understanding what “YOY” meant:

When I change the value manually, I can see why it takes the value it does. Sometimes, the solution can be generated using a formula that readily identifies the key drivers of change. However, when I use ChatGPT or Copilot, I simply get numbers written on a page. If I am not a modeller, I may have no idea how to confirm the value cited: there is no feel for why the value is what it is or how to corroborate it.
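One way to corroborate a goal-seek answer, rather than taking a quoted number on trust, is to solve for it transparently. The sketch below mimics Excel’s Goal Seek with a simple bisection search on a hypothetical one-year model in which COGS and part of operating expenditure scale with revenue. The model structure, figures, and function names are assumptions for illustration, so it will not reproduce the -52.01% found for the article’s model.

```python
def npat_for_growth(growth: float) -> float:
    """NPAT one year ahead for a given YoY revenue growth rate (hypothetical model)."""
    base_revenue = 1_000_000
    revenue = base_revenue * (1 + growth)
    cogs = 0.40 * revenue                      # COGS depends on revenue
    opex = 150_000 + 0.10 * revenue            # fixed plus variable operating expenditure
    ebt = revenue - cogs - opex
    return ebt - max(ebt, 0.0) * 0.30          # 30% tax on positive profit only


def goal_seek(f, target, low, high, tol=1e-9, max_iter=200):
    """Find x in [low, high] with f(x) close to target by bisection.

    Assumes f is continuous and the target lies between f(low) and f(high).
    """
    f_low = f(low) - target
    if f_low * (f(high) - target) > 0:
        raise ValueError("Target is not bracketed by [low, high]")
    for _ in range(max_iter):
        mid = (low + high) / 2
        f_mid = f(mid) - target
        if abs(f_mid) < tol or (high - low) / 2 < tol:
            return mid
        if f_low * f_mid < 0:                  # sign change in [low, mid]
            high = mid
        else:                                  # sign change in [mid, high]
            low, f_low = mid, f_mid
    return (low + high) / 2


growth = goal_seek(npat_for_growth, target=100_000, low=-0.99, high=1.0)
print(f"Required YoY revenue growth: {growth:.2%}")
```

For this simple model, the answer can also be checked by hand, which shows the key drivers: earnings before tax must be 100,000 / 0.7, so half of revenue must equal 150,000 plus that amount, giving revenue of about 585,714 and growth of about -41.43%.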

At the time of writing, you can use AI tools such as Copilot and ChatGPT with Python, as well as Copilot in Excel, to produce more computational results using code. However, producing a spreadsheet in which Excel formulas have been created, outside of Excel Tables, appears difficult, if not yet impossible. But have no doubt: AI will continue to improve, and simple layouts with formulas should be possible soon (do note, though, that I am speculating; I have no inside information).

Sometimes, goal seeking becomes more complex. In this extension, both AI engines were asked to perform iterative calculations over multiple periods:

These were our findings:

ChatGPT and Copilot both arrived at the same solution, with little explanation of how it was derived. If you are testing, this may be sufficient, whereas if you are seeking modelling assistance, it would not be so helpful.
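Multi-period goal seeking is harder to verify by eye because each period builds on the one before. The sketch below assumes a hypothetical five-year model and solves for the constant YoY revenue growth rate that makes cumulative NPAT hit a target, again by bisection. The model, the target, and the function names are illustrative assumptions, not the prompt or model used in this article.

```python
def cumulative_npat(growth: float, years: int = 5) -> float:
    """Sum of NPAT over `years` periods with constant YoY revenue growth (hypothetical model)."""
    revenue, total = 1_000_000.0, 0.0
    for _ in range(years):
        revenue *= 1 + growth                  # each period compounds on the previous one
        ebt = 0.5 * revenue - 150_000          # 40% COGS, 10% variable opex, 150k fixed opex
        total += ebt - max(ebt, 0.0) * 0.30    # 30% tax on positive profit only
    return total


def solve_growth(target: float, low: float = -0.5, high: float = 0.5, tol: float = 1e-10) -> float:
    """Bisection on the growth rate; assumes cumulative NPAT rises with growth on [low, high]."""
    while high - low > tol:
        mid = (low + high) / 2
        if cumulative_npat(mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


rate = solve_growth(target=2_500_000)
print(f"Constant YoY growth needed: {rate:.4%}")
print(f"Check, cumulative NPAT: {cumulative_npat(rate):,.2f}")
```

Printing the cumulative NPAT at the solved rate is the point: the answer is re-fed through the model, so the result is corroborated rather than merely quoted.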

The AI tools coped with a multi-dimensional analysis, too (similar to the what-if? variant above), where both indirect costs and tax rates were flexed simultaneously:

Using a Data Table (an Excel feature, not explained here, that handles this sort of request), I obtained these results:

ChatGPT again passed with flying colours, but something went wrong with Copilot:

Notice that some of the numbers are slightly different. The AI tools do appear to introduce errors as more and more computations are required. With no insight into the calculations, it is difficult to determine what caused these minor errors. And therein lies the problem: AI tools can perform checks, but who checks the checks?
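Spotting small discrepancies across a grid by eye is error-prone. A cell-by-cell comparison with an explicit tolerance can flag where an AI-supplied table departs from your own Data Table. The sketch below assumes both grids have been transcribed into dictionaries keyed by (indirect costs, tax rate); the `compare_grids` name, the tolerance, and all figures are hypothetical and are not the article’s results.

```python
def compare_grids(reference, candidate, abs_tol=0.5):
    """List cells where candidate differs from reference by more than abs_tol, or is missing."""
    mismatches = []
    for key, expected in reference.items():
        actual = candidate.get(key)
        if actual is None:
            mismatches.append((key, expected, None))
        elif abs(actual - expected) > abs_tol:
            mismatches.append((key, expected, actual))
    return mismatches


# Hypothetical figures: reference from your own Data Table, candidate pasted from an AI response.
reference = {(200_000, 0.25): 300_000.00, (200_000, 0.30): 280_000.00, (250_000, 0.25): 262_500.00}
candidate = {(200_000, 0.25): 300_000.00, (200_000, 0.30): 279_850.00, (250_000, 0.25): 262_500.00}

for (ic, tr), expected, actual in compare_grids(reference, candidate):
    print(f"Indirect costs {ic:,}, tax {tr:.0%}: expected {expected:,.2f}, got {actual}")
```

This only tells you where the checks disagree, not why; the reference grid still has to come from a calculation you trust.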

Metrics and comparisons to industry benchmarks

Another common post-build process is to undertake benchmarking analyses and perform ratio tests. Here, AI tools can truly assist. The images below are of our income statement, cash flow statement, and balance sheet:

ChatGPT listed a variety of commonly used metrics in its initial response. It chose to focus on year 5, the final period (extracts only):

Copilot was given the same task. However, at the time of writing, Copilot does not support uploaded images, so the data had to be pasted into the prompt bar. Nonetheless, once this had been done, it still provided plenty of test results for contemplation, albeit fewer than ChatGPT. In either case, I would recommend checking the results before accepting them.
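Checking AI-generated metrics is easier if you recompute the common ones yourself from the statements. The sketch below calculates a handful of standard profitability, liquidity, and leverage ratios for a single year. The `YearFigures` structure and every figure in it are hypothetical, and return on equity uses closing equity for simplicity (many analysts use average equity instead).

```python
from dataclasses import dataclass


@dataclass
class YearFigures:
    """Selected figures from the three financial statements for one year (hypothetical values)."""
    revenue: float
    gross_profit: float
    ebitda: float
    npat: float
    operating_cash_flow: float
    current_assets: float
    current_liabilities: float
    total_debt: float
    total_equity: float


def key_ratios(y: YearFigures) -> dict:
    """Common ratios; ROE uses closing equity as a simplifying assumption."""
    return {
        "gross_margin": y.gross_profit / y.revenue,
        "ebitda_margin": y.ebitda / y.revenue,
        "net_margin": y.npat / y.revenue,
        "cash_conversion": y.operating_cash_flow / y.npat,   # OCF relative to NPAT
        "current_ratio": y.current_assets / y.current_liabilities,
        "debt_to_equity": y.total_debt / y.total_equity,
        "return_on_equity": y.npat / y.total_equity,
    }


year5 = YearFigures(revenue=2_000_000, gross_profit=1_200_000, ebitda=500_000, npat=280_000,
                    operating_cash_flow=330_000, current_assets=900_000,
                    current_liabilities=450_000, total_debt=600_000, total_equity=1_500_000)
for name, value in key_ratios(year5).items():
    print(f"{name:<18}{value:8.2f}")
```

If an AI tool’s figure for any of these ratios does not match your own calculation, ask it to state the definition it used before accepting either number.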

Benchmarking is possible if you “lead the witness”. For example, ChatGPT was asked:

It was then asked to compare the metrics calculated earlier with those of other (real) companies:

Its approach was sound. However, it had trouble with accurate referencing. In most cases, it fabricated values and used sources as placeholders:

The Wikipedia article shown does not mention gross profit margins at all. ChatGPT also used this source for its next point on net profit margin:

ChatGPT had further issues: it altered the revenue growth figure so that the context would fit the article provided:

I found further examples of false equivalence, one being a comparison between the model’s operating cash flow (OCF) and an external company’s OCF that was 425 times larger. ChatGPT justified the comparison on the grounds that the two were of similar size, a conclusion I could not support.
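One simple guard against this kind of false equivalence is to check relative scale before comparing absolute figures, and to compare normalised measures (such as OCF as a share of revenue) instead. The sketch below uses hypothetical figures chosen so the peer’s OCF is 425 times the model’s; the function names and the tenfold threshold are my assumptions, not an accepted standard.

```python
def scale_ratio(a: float, b: float) -> float:
    """How many times larger the bigger figure is than the smaller one."""
    small, large = sorted((abs(a), abs(b)))
    return float("inf") if small == 0 else large / small


def compare_ocf(own_ocf, own_revenue, peer_ocf, peer_revenue, max_scale=10.0):
    """Compare OCF margins, and flag whether absolute OCF figures are of comparable size."""
    ratio = scale_ratio(own_ocf, peer_ocf)
    return {
        "scale_ratio": ratio,
        "comparable_in_absolute_terms": ratio <= max_scale,
        "own_ocf_margin": own_ocf / own_revenue,
        "peer_ocf_margin": peer_ocf / peer_revenue,
    }


# Hypothetical figures: a peer whose OCF is 425 times the model's.
result = compare_ocf(own_ocf=330_000, own_revenue=2_000_000,
                     peer_ocf=140_250_000, peer_revenue=900_000_000)
print(result)
```

Here the absolute OCF figures are flagged as not comparable, while the OCF margins (about 16.5% against about 15.6%) remain a meaningful, like-for-like comparison.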

Copilot succumbed to similar issues, with inappropriate logic and fictional referencing.

Providing explanations (documentation)

As previously mentioned, AI tools can be good for documentation and training, explaining concepts, formulae, and model flows. For example, imagine you had the following valuation query:

Interactive help (even if informal) is popular with end users, as they do not have to wade through pages and pages of irrelevant documentation. Simply asking a question is far preferable, as long as the answer is accurate.

These AI tools are built on LLMs, so, with some guidance, producing documentation is relatively straightforward compared with other, more complex tasks.

Visualisations (charting)

Finally, turning to output presentation: ChatGPT, unlike Copilot, can create accurate visualisations (charts) of results once the model has been built. It was given this open-ended prompt along with all three completed financial statements:

It responded with a list of suggestions that brought multiple pieces of information together:

It also asked if it should generate charts based on its suggestions:

At first, its charts (static images rather than a spreadsheet’s interactive charts) seemed acceptable, but closer inspection revealed incorrect y-axis values and units, and decimalised periods on the x-axis:

This prompt was used to fix those:

The end output was better than expected. ChatGPT included an insightful multi-line graph depicting items like gross profit, EBITDA, and NPAT together:

It also included a visualisation of the makeup of expenditure and a cash flow snapshot depicting increases/decreases of cash flow components:

Indeed, more complex charts could be readily requested, eg:

Copilot had trouble with this task. Given the same prompt, it repeatedly prepared erroneous charts with incorrect values, labels, and descriptions.

In a post-build world, ChatGPT again appeared to outperform Copilot.

Word to the wise

It is entirely possible that you will get vastly different responses using the same queries. However, this does not detract from the fact that both Copilot (in Excel) and ChatGPT, the two AI tools you are most likely to use when building a model in Excel, are helpful in the testing and implementation phases, too.

Once more, responses need to be verified independently, but they can certainly assist where subject matter knowledge is limited. ChatGPT repeatedly provided better qualitative and quantitative answers and, reflecting on all three articles, is probably the better all-rounder for AI analysis of all your modelling work.

— Liam Bastick, FCMA, CGMA, FCA, is director of SumProduct, a global consultancy specialising in Excel training. He is also an Excel MVP (as appointed by Microsoft) and author of Introduction to Financial Modelling and Continuing Financial Modelling. Send ideas for future Excel-related articles to him at liam.bastick@sumproduct.com. To comment on this article or to suggest an idea for another article, contact Oliver Rowe at Oliver.Rowe@aicpa-cima.com.


MEMBER RESOURCES

Articles

“Financial Modelling With AI: Part 2”, FM magazine, 2 June 2025

“Financial Modelling With AI: Part 1”, FM magazine, 23 May 2025

“Using the AI in Power BI to Do Root Cause Analyses”, FM magazine, 27 March 2025

“Excel Modelling: How to Implement 3 Types of Checks”, FM magazine, 25 March 2025