Informal LLM comparison (round 1): Which LLM model gives the best answer to a typical FinEd Black-Scholes question?

GPT-4 easily wins this round, Perplexity comes in 2nd. Both excel at improvement suggestions and expertly handle the follow-up. Claude-3-O is getting better at math. Gemini and Llama failed this test.

Apr 23, 2024

My prompt primes with a general persona (finance professor) per my habit. Here is the prompt I used across each of the five models:

“Act as a finance professor at a prestigious university who teaches derivatives. To follow is a draft practice question for students learning stock option valuation. After solving the question, make suggestions to improve the question. Here is the question:

While the risk-free rate is 3.0% per annum, a European call option with a strike price of $50.00 has one year to maturity. If the underlying non-dividend-paying stock price is $50.00 (i.e., the option is exactly at the money) with a volatility of 20.0% per annum, what is the value of the option?”

Three quick notes about the text of the question itself (the second paragraph of the prompt).¹ First, it’s a hard question because no formulas are provided; it would be easier if the outer BSM and inner d1-d2 formulas were provided.² Second, there’s a calculation hurdle in the N(.) function, but only for tests that still rely on outdated calculator-based approaches. Finally, the parenthetical (“the option is exactly at the money”) is a bit of deliberate, redundant help. It is possible to write a more efficient question,³ but we rarely want to serve the most efficient question when the goal is clarity and not cleverness in serving the assumptions.
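For reference, here is a sketch of the "outer" BSM formula and the "inner" d1-d2 formulas mentioned above, for a European call on a non-dividend-paying stock. The symbols ($S_0$ for stock price, $K$ for strike, $r$ for the risk-free rate, $\sigma$ for volatility, $T$ for time to maturity) are my notational assumption in the usual textbook style; the question itself names none of them.

```latex
\begin{align*}
c   &= S_0\,N(d_1) - K e^{-rT} N(d_2) \\
d_1 &= \frac{\ln(S_0/K) + \left(r + \sigma^2/2\right)T}{\sigma\sqrt{T}} \\
d_2 &= d_1 - \sigma\sqrt{T}
\end{align*}
% With the question's inputs (S_0 = K = 50, r = 0.03, sigma = 0.20, T = 1):
% d_1 = (0 + (0.03 + 0.02)(1)) / 0.20 = 0.25 and d_2 = 0.25 - 0.20 = 0.05
```

Because the option is exactly at the money, the log term vanishes, so the only remaining hurdle is looking up N(0.25) and N(0.05).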

My results (with links to shareable LLM conversations):

GPT4 nails the answer ($4.71) with precision and its entire solution is terrific
Perplexity is approximately correct due to tolerable rounding
Claude-3-Opus is approximately incorrect (it returns $4.42, which is off by merely -6%) due to strangely flubbing the final and easiest arithmetic step. I should mention that this is a big improvement for Claude, which previously flubbed most of my math.
Gemini 1.5 Pro and Llama-3-70B-T are both incorrect
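For readers who want to check the $4.71 answer themselves, here is a minimal Python sketch using only the standard library. The function and variable names are my own for illustration (they are not taken from any of the model conversations), and N(.) is computed from the error function rather than a lookup table.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal CDF N(x), computed from the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bsm_call(spot: float, strike: float, rate: float, vol: float, years: float) -> float:
    """Black-Scholes-Merton value of a European call on a non-dividend-paying stock."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol**2) * years) / (vol * sqrt(years))
    d2 = d1 - vol * sqrt(years)
    return spot * norm_cdf(d1) - strike * exp(-rate * years) * norm_cdf(d2)


# Inputs from the practice question: S = K = $50, r = 3%, sigma = 20%, T = 1 year
price = bsm_call(spot=50.0, strike=50.0, rate=0.03, vol=0.20, years=1.0)
print(f"Model value: ${price:.2f}")  # Model value: $4.71

# Relative error of the $4.42 answer reported above (roughly -6%)
relative_error = 4.42 / price - 1.0
print(f"Relative error of $4.42: {relative_error:.1%}")
```

The intermediate values are d1 = 0.25 and d2 = 0.05, which makes this question easy to verify by hand.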

All of the models additionally offered suggestions to improve the question (per the above prompt). To be honest, I’m dazzled by the improvement suggestions given by both GPT4 and Perplexity. Here’s my reason: off the top of my head (without preparation), I could probably think of a few, but the models instantly give me a set of six actionable, great ideas. They range from technically advisable (provide the BSM formula) to pedagogically sharp (interpretations and variants) to practical (GPT4’s suggestion to “challenge with real-world data” might be my favorite). If I had to vote, I’d give the nod to GPT4’s suggestions: all five are terrific, including the idea to ask for graphs!

My follow-up prompt is based on experience with learners; in finance, we so often observe market prices that vary from our model prices:

Let's assume we observe a trading price that is significantly lower than the model value computed by the Black-Scholes, in which case we could say the option's market price is "trading cheap" relative to its model price. What are the three most likely explanations for this price difference?

I only gave this to the two models that answered correctly. Again, I’m dazzled. I give GPT4 and Perplexity a tie (both winners) on their response to the follow-up. Two of their three factors are sufficiently similar that I would equate them: volatility and technical factors (my bucket for supply/demand, liquidity, etc.). In regard to the third factor, GPT4 leans into model limitations while Perplexity leans into market limitations. There’s enough here for a really instructive conversation on option valuation. The LLMs have basically synthesized most of the taught material into the chat. As instructors or learners, it’s a great place from which to start a deeper discussion.
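To make the volatility explanation concrete, here is a rough Python sketch that backs out the volatility implied by a "cheap" market price. The $4.00 market price is purely hypothetical (I did not observe it and neither model supplied it), and the bisection solver with its bounds is just one simple assumption about how to invert the formula.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal CDF N(x), computed from the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bsm_call(spot: float, strike: float, rate: float, vol: float, years: float) -> float:
    """Black-Scholes-Merton value of a European call on a non-dividend-paying stock."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol**2) * years) / (vol * sqrt(years))
    d2 = d1 - vol * sqrt(years)
    return spot * norm_cdf(d1) - strike * exp(-rate * years) * norm_cdf(d2)


def implied_vol(market_price: float, spot: float, strike: float, rate: float,
                years: float, low: float = 1e-4, high: float = 3.0,
                tol: float = 1e-8) -> float:
    """Bisection search for the volatility that reproduces market_price.

    Works because the call value is increasing in volatility; the search
    bounds (low, high) are illustrative assumptions.
    """
    for _ in range(200):
        mid = 0.5 * (low + high)
        if bsm_call(spot, strike, rate, mid, years) > market_price:
            high = mid  # model price too high, so volatility must be lower
        else:
            low = mid   # model price too low, so volatility must be higher
        if high - low < tol:
            break
    return 0.5 * (low + high)


hypothetical_market_price = 4.00  # assumed "cheap" price, below the $4.71 model value
iv = implied_vol(hypothetical_market_price, spot=50.0, strike=50.0, rate=0.03, years=1.0)
print(f"Implied volatility: {iv:.1%}")  # lower than the 20.0% model input
```

In other words, one reading of "trading cheap" is that the market is pricing in less volatility than the 20.0% we assumed in the model.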

As usual, I want to noodle on these findings. For the moment, I’m just experiencing the feeling that the LLMs can (and will; let’s plot into the future and not get weighed down by current drawbacks) perform actions at the fundamental expertise layer better than a human teacher. It’s a bit dizzying (dazzling?) but there is a sense that the instructor can immediately shift into higher gears. In the case of Black-Scholes, the traditional approach is to spend time on the formula and its implementation. But much of that activity can be delegated. I do have some thoughts on the process of learning the math, but I’ll write about that later. Personally, as a student, I spent a lot of my own learning time on the BSM’s math and its derivation. The preparation gave me a robust understanding of the formula, but I’m ambivalent about its ROI. Would I start today where I started then (e.g., Ito’s lemma)? No, I would not (unless my study focus was stochastic processes). Thanks for reading!

As with all of my writing, all text is my own unless I disclose otherwise, including this prompt. The actual question (“While the risk-free rate is 3.0% per annum, a European call option…”) is fresh and mine, although it’s in the spirit and style of John Hull with subtle differences due to my style informed by learner experiences. And I do have much experience on this topic. I’ve written thousands of questions like this and, more importantly, I’ve engaged with learners (~20,000 conversations) who have supplied feedback such that I have quite a bit of experience drafting viable practice questions, calibrating difficulty, and avoiding little traps (mostly by committing errors). For more details of my approach to writing PQs, see my post:

The art of writing a great practice question (fintechie by David Harper, CFA, FRM)

You’ll notice that the first improvement suggestion by Claude-3-Opus is “1. Consider providing the Black-Scholes formula or a hint to guide students towards using the appropriate pricing model.” That’s a terrific suggestion that neither GPT4 nor Perplexity offered.

Here is the same question written in the most efficient format that I can find (without LLM help):

“What is the value of a one-year, at-the-money European call option on a non-dividend-paying stock with a current price of $50.00 and volatility of 20.0% per annum while the risk-free rate is 3.0% per annum?”

ai bucks by David Harper, CFA, FRM
