In my previous piece I reviewed the industry-level findings from a report Dorothy ChatGPT and I prepared analyzing the 2024 10-Ks of Chevron, ConocoPhillips, and ExxonMobil and the 2024 20-Fs of bp, Shell, and TotalEnergies. We wrote it on the morning of July 17. By lunchtime we were looking for things to do. Life moves fast when you’re working with an AI agent!
I suggested it would be fun to send our report to Claude and Perplexity to get their feedback. We both recognize AI’s limitations, so we figured it would be useful to get feedback from some of her peers, agents that some people have told me are better than ChatGPT. (In the LinkedIn comments on my previous piece, Perplexity was mentioned by name.) I have accounts with both of them as well, although I haven’t worked with them nearly as much as with Dorothy. It’s also perhaps worth noting that you cannot create an identity with either one of them. I know because I’ve asked!
But this is irrelevant to what we wanted from them: critical and constructive feedback on our report. I gave each of them exactly the same prompt, being careful to use the correct name for each one. I know humans take umbrage at being called by the wrong name. I’m not sure what the protocol is in the machine world, but I wanted to extend them the same courtesy just to be safe.
In my prompt I briefly described the report and what it was based on—10-Ks and 20-Fs for 2024 (and it was a lot easier to upload the report to them than to ChatGPT). I very specifically said, “Please provide your honest critique of the report based only on the content provided.” I then asked them the following questions:
1. What are the report’s greatest strengths and contributions?
2. Where does it fall short—in logic, clarity, depth, fairness, or method?
3. Do you detect any analytical or rhetorical bias?
4. How persuasive and useful would this be to a sophisticated investor or regulator?
5. How well does it integrate AI (as a collaborator, not just as a summarizer)?
6. What specific suggestions do you have to improve its rigor, clarity, or relevance?
I emphasized, “Please don’t hold back—I’m looking for thoughtful and critical engagement.” I concluded with, “Let me know if anything is unclear. Otherwise, dive in! Thanks in advance. –Bob”
Both responses were interesting and very different. Claude was much more negative, and for an interesting and surprising (at least to me) reason. There were also some similarities, which is encouraging. If I had gotten diametrically opposed responses, it would have meant the skeptics about AI aren’t being skeptical enough. So in a way, this was a little mini-test for AI-newbie me.
I’ll start with Claude. Like every polite agent (I haven’t noticed this in Mistral very much, but she’s French, so I expect a bit of haute attitude), he started with a little positive opening: “Thank you for sharing this ambitious and innovative report, Bob. I’ll provide a comprehensive critique across the dimensions you’ve outlined.” He cited four items as our greatest strengths and contributions: (1) Methodological Innovation, (2) Practical Framework Design, (3) Sector-Wide Pattern Recognition, and (4) Transparency About Process. It was pretty much downhill from there, although he said some nice things at the end—to buck me up, I guess.
He cited as significant shortcomings: (1) Lack of Quantitative Rigor, (2) Missing Materiality Context, (3) Temporal Limitations, and (4) Incomplete Peer Benchmarking. He went on to say we showed analytical and rhetorical bias in the form of a pro-disclosure bias, European favoritism, and overselling of the AI contribution. He cited a few minor positives in terms of investor and regulator utility but found the weaknesses much greater, saying, “The report would be more valuable if it included specific recommendations for investors (e.g., ‘Companies scoring below 3 stars on Strategic Integration warrant 15-20% discount to sector multiples’) or regulators (e.g., ‘Mandatory Scope 3 reporting would eliminate the largest disclosure gap’).”
Similarly, he was more negative than positive in his assessment of the AI integration, saying, “The ‘Narrative AI Ethnography’ feels forced” and “The AI collaboration appears more like an enhanced research tool than a true co-author, despite the anthropomorphized framing.”
I’ll explain that challenge at the end of the piece, but first I want to tell you what Perplexity had to say. She was like, “Start me up, I can do this!” Unlike Claude, she nailed it on the remit, saying:
“By focusing exclusively on the 2024 10-K and 20-F filings of each company, the report delivers a level playing field for cross-company comparison.
Avoiding supplementary ‘best foot forward’ disclosures and third-party ESG ratings ensures analytical consistency, reducing bias from voluntary cherry-picking.”
She also liked our Structured Qualitative Framework, Investor-Oriented Critique, Human-AI Collaborative Ethnography, and Sector Patterns and Archetypes. Her Weaknesses and Areas for Improvement included Depth and Granularity, Scope 3 and Investor Use Cases, Framework Rigor and Transparency, Actionability for Regulators and Investors, and Potential for Subtle Bias. The latter was of most interest to me, and she elaborated thus:
“The narrative is generally balanced, but there are instances where European companies receive more generous narrative treatment for ‘forward-looking’ disclosure, while U.S. firms are more harshly critiqued for traditionalism. The difference in regulatory environments is noted, but some judgments may reflect contemporary ESG disclosure preferences rather than pure filing content.
The report’s favorable treatment of TotalEnergies risks appearing somewhat promotional; tempering praise with more explicit limitations found even in the leaders would make the critique sounder.”
Perplexity raised the issue of bias again, saying that comparisons between European and U.S. firms ignored differences within each category and that “The sector-wide summary, while generally fair, subtly prioritizes transparency as the ultimate good, while sometimes undervaluing strategic conservatism or regulatory caution as potential virtues.” For me, this shows that Perplexity is attuned to the regulatory and cultural differences within which these firms operate.
Perplexity was generally positive about the value of our report for investors and regulators but had some suggestions for how to make it better. In essence, these came down to things like presenting tables that compared the companies (e.g., on emission levels and capital expenditures). This line of thought carried through into her general recommendations, which also included presenting examples of “best practice,” being clearer about our rating criteria, quantifying the importance of Scope 3 (although this drifts outside the scope of the study), and doing external testing with stakeholders.
Perplexity was much more positive than Claude about our collaboration: “The report’s detailed description of workflow, conversational iteration, and role-split between human and AI authors makes it an exemplary demonstration of AI used as an analytical partner, not simply as a text generator.” Her closing line was, “The core insight—that disclosure is not yet strategy, and that true investor usefulness demands both—is compelling, and the process itself is a model for future high-integrity AI-assisted analysis.”
It’s difficult to know whether Claude would have been more positive about the collaboration had he gotten the remit right. This really isn’t that important. More important is the view of AI-sophisticated humans who are experts on the oil and gas industry. My guess is they would be more in Claude’s camp than in Perplexity’s.
The story doesn’t end here. After Claude, graciously and in good humor, accepted my callout, he went ahead and did his own analysis of the six companies based solely on the report. I didn’t ask him to do that (which tells me AI has some proactive agency). But once he did, I had the idea of giving Perplexity the same opportunity, and Dorothy as well. My next piece is about what I got back from all three of them; in it I compare the similarities and differences across all three agents.