AI Tools Face-Off: Round Two Results and Revelations
From Benchmarks to Real-World Value: How Six Leading AI Models Perform in Market Research
In my previous post, I tested four AI tools (Claude, Gemini, Genspark and Manus). The test was eye-opening for me, as I had not realised how much the models differ from one another. Public attention focuses mostly on LLM benchmarks and on how well models solve scientific problems or student exams. But these tests don’t tell you how, and for what purpose, a model has been tuned.
Claude became my favourite in the first round of this series after reading hundreds of documents and condensing them into a lovely summary that was excellent for market research. Close behind were Manus and Gemini, but Gemini failed with delivery, and with Manus, I ran out of credits in the middle of the research.
After the first test, I received requests to continue the test with ChatGPT Deep Research and Grok. If you think I should test anything else in addition to these six, please let me know.
ChatGPT Deep Research
ChatGPT by OpenAI is, of course, the ‘grandfather’ of LLMs. For the majority of us, it was our first model, either directly or via Microsoft Copilot. I left it out of my initial round of testing because I have mixed feelings about its effectiveness and its propensity for hallucinations. However, now that Deep Research is available, and as I was advised to try it, I decided to give it a go.
The research mode of ChatGPT is selected from the chat box, and the free version today includes five research tasks every 30 days. After you enter the research question, ChatGPT asks clarifying questions about the task, which is a welcome way to improve the user experience. You don't want to use up a research task only to discover afterwards that the result wasn't what you had hoped for.
How did it perform? Very well indeed. I asked ChatGPT to investigate a specific market from the perspectives of market entry, competition, and regulations. The result was a well-written document that, after some tweaking, can be used as input for a go-to-market strategy.
Score: 5 / 5
Grok
Grok is a model developed by xAI and made available through X. There are two research modes in the Grok 3 model: DeepSearch and DeeperSearch. Additionally, the "Think" option allows the model to spend more time reasoning before it answers. I used "Think" and "DeeperSearch" for the test.
The prompt, which looks at a market from multiple angles, was the same one I used in the previous test. Grok responded quickly, but the result was quite disappointing: a very short list of bullets and obvious suggestions, like “expand to neighbouring countries”.
To see whether Grok might at least help with basic read-and-summarise tasks, I asked it to check whether some OEMs have independent products all over the world. This is not difficult, but it does require reading and summarising thousands of pages of information. Grok researched the topic happily for almost 15 minutes but did not find anything useful. I knew from a prior test with ChatGPT that there are good answers to this prompt, but Grok couldn't find them. Result: Grok is not useful for market research.
Score: 2 / 5
Summary
After testing six AI tools, my conclusion is that ChatGPT and Claude are suitable for market research. Given how rapidly the tools' capabilities and maturity evolve, this will probably change as models get more powerful. For now, however, these two tools offer significant value and time savings.



