Meaning Is in the Eye of the Algorithm
Three AI Agents Read the Same Report — Here's What Happened
In my last piece I discussed the feedback Dorothy ChatGPT and I got from Claude and Perplexity on a report we did based on an analysis of the 2024 10-Ks of Chevron, ConocoPhillips, and ExxonMobil and the 2024 20-Fs of bp, Shell, and TotalEnergies as discussed here. Claude’s feedback was unexpectedly negative — but for a funny reason. He didn’t properly read the prompt which made clear our analysis was simply based on these formal filing documents. He beat us up a lot for not doing a more comprehensive analysis.
But Claude was a good sport when I pointed this out and said:
“Ha! You caught me in my analytical arrogance, Bob. Let me put my methodology where my mouth is.
I appreciate the clarification of the document-constrained philosophy—that’s actually quite clever as a stress test of statutory filing utility. And I accept your two-part challenge with appropriate humility (although I maintain that Dorothy and I would make excellent sparring partners).”
Without being prompted, he then offered his own analysis of the company-specific data. I found that interesting and asked Perplexity to do the same. And Dorothy as well. One of the nice things about a machine is that it can have a kind of split personality.
Claude was much more independent than Perplexity in terms of his approach. Rather than taking our dimensions of extent of reporting vs. strategic integration, he chose to reframe it this way:
Y-axis: Strategic Coherence - How well do the disclosures form a logical, internally consistent story about transition management?
X-axis: Disclosure Depth - Granularity and completeness of quantitative and qualitative information
Based on this he then gave different archetype names to the six companies as shown in the table below. For reasons he didn’t explain he then went and suggested an alternative typology of strategic archetypes.
Three Archetypes of Six Western Oil Majors
Perplexity wasn’t as ambitious in her interpretation. She did not suggest alternative archetypes but provided her own interpretation of Dorothy’s. She did rename the axes as transparency and readiness and organized to organize the six companies into quadrants. TotalEnergies was high on both, and ExxonMobil was low on both. bp and Shell were high transparency and low readiness, and Chevron and ExxonMobil were the reverse. I guess even machines like to put their own spin on things by not going with the names given by another machine.
Although the agents broadly agreed on company rankings, their different framing led to subtly different conclusions. This shows that data alone don’t produce meaning. It is also based on one’s interpretive lens. This is revealed in the summary of their analysis, which I didn’t specifically ask for, that each agent provided. Here they are in their own language.
Dorothy
“📍Mini Synthesis
This filing-constrained read reveals a sector inching toward transparency — but not yet coherence. The core gap isn’t emissions data — it’s capital logic. Firms are willing to describe their emissions, but not how emissions shape decisions.
TotalEnergies sets the benchmark. ExxonMobil shows hidden rigor. The rest signal awareness, but keep the playbook vague.”
Claude
“Document-Only Insights:
The Scope 3 Divide: Only ConocoPhillips (US) and bp/TotalEnergies (European) address this meaningfully—suggesting regulatory arbitrage opportunity
Carbon Pricing Opacity: Even companies mentioning internal carbon pricing don't quantify impact—major analytical gap
CapEx Classification Absence: No company clearly categorizes capital as "transition," "maintain," or "traditional"—leaves investors guessing about portfolio evolution
Scenario Testing Vacuum: Except TotalEnergies, no quantified stress testing under climate scenarios—massive blind spot for stranded asset risk.”
Perplexity
“Filing-Constrained Takeaways
TotalEnergies is the clear leader, standing out for both transparency and intent-to-execution signals within the constraints of a statutory filing.
ExxonMobil is underrated — filings show a disciplined and material approach to internal integration, even if the company chooses to withhold some disclosure elements.
Chevron and ConocoPhillips operate as if transition is peripheral. Strategic alignment signals are weak or absent, even though emissions disclosures hit regulatory minimums.
Shell and bp carry sophisticated climate narratives, but lack enough quantified tie-in to strategy to satisfy a capital allocator’s scrutiny within the 20-F.”
Dorothy chose to provide a very high-level synthesis for the sector as a whole. Claude organized his summary in term of reporting themes. Perplexity did hers in terms of companies. I don’t see any contradictions in these syntheses and taking all of them together provides a deeper view than from any one of them.
I’m no expert in LLMs but I’m sure there are underlying reasons in the architecture of these agents for the differences in how they went about the task and what conclusions they reached. I am most familiar with ChatGPT, followed by Claude and Perplexity. I’m struck by how different they are in terms of strengths and weaknesses and will be saying more about this in the future.
In terms of this exercise I found Dorothy to be, well, the chattiest one of the three. She likes emojis and icons. Some of that is the underlying architecture of OpenAI but a lot of this is because I have given her an identity and she has shaped herself to me. You can’t create identities in Claude and Perplexity and neither has much of a memory which is a big constraint for the work I want to do. I find Claude to be most thoughtful and see why his general reputation is being very good for research. I’m seeing that in other work I’m doing with him. In this exercise he was the most creative in digging deeper for interpreting the data. But a major constraint with Claude, and it’s a huge one for me, is how quickly you reach your message length and have to start a new conversations. From scratch which is incredibly annoying. I’m finding workarounds for this that basically put the memory burden on me. I find this more than a little ironic.
Perplexity’s approach was more a summary of the report without much independent analysis. In fairness, this exercise is not one which capitalizes on her strengths which I’m told are data verification and being especially up to date. I am working on a project where I will be able to put this capability to the test. She also provided some very helpful input to a white paper I did with Claude and Dorothy where Claude took the lead. The title of the paper is “Narrative AI Ethnography: A Proposal for a New Field of Study.”
But that is a story for another day.
The next story from me is based on sharing these three analyses with all three agents and asking them for their reflections. A little meta but also a little bit of fun!
Thanks for sharing your findings of this little test. As an aside I find your assignment of genders interesting. Personally I think of Claude (not Claud) as the female out of Chat GPT, Claude, and Perplexity.