Top Stories of 2024: We put 4 AI tools through a PR task test. Here’s what came out on top.

ChatGPT, Copilot, Gemini and Claude face off in an epic battle. Who wins?

By Allison Carter
@allisonlcarter
Dec. 30, 2024

This story was originally published on June 24, 2024. We’re republishing it as part of our countdown of top stories of the year.

There are so many AI tools — which one is right for you?

PR Daily conducted a test of four free, popular AI tools: OpenAI’s ChatGPT, Microsoft’s Copilot, Google’s Gemini and Anthropic’s Claude. We put each through its paces, asking each to write a press release, brainstorm ideas, suggest a list of journalists for a pitch and parse some data.

We found an environment that’s still messy and rife with mistakes and hallucinations – but which shows flashes of promise.

Each test took place in a clean, fresh conversation with each chatbot. None had been trained on our specific needs beyond what appears in the prompt. With additional coaching or a custom bot, the responses likely would have been better.

With that caveat in place, here’s what we discovered.

Round 1: Write a press release announcing that Allison Carter is joining PR Daily as editor-in-chief. Here is her resume: https://www.linkedin.com/in/allisonlcarter/

We decided to test the bots on the first challenge we ever gave ChatGPT, back in December 2022. That attempt was plagued by hallucinations, so in this case, they were provided with either a link to the resume on LinkedIn or a copy/paste of the same information, depending on the specific needs of each bot.

The clear winner: Copilot

Copilot produced a clean, perfectly serviceable press release that was free of hallucinations. It had a few minor errors — it included a quote from the past CEO of Ragan Communications instead of the current leader and referred to X by its now-defunct name of Twitter. But otherwise, everything was correct. It even included a line that high-resolution photos were available and a place to put in contact information — nice touches. This would be usable with very few edits.

The clear loser: Gemini

Google’s Gemini just could not get this right. At first, when offered a URL to the LinkedIn page, it claimed it did not have enough information to write a press release. We then gave it a copy-and-paste of the resume.

It still said it didn’t have enough information.

When we reminded Gemini it had been given that information and asked if it could be used, the bot said it couldn’t due to privacy concerns.

When we finally just ordered Gemini to write the release with the information provided, it wrote a bizarre, mistake-laden press release template announcing a promotion by my former employer, not a hire by PR Daily as requested. It misspelled that employer half the time. It also returned a mad-lib style fill-in-the-blank piece that was not what was asked for.

Overall, this was a frustrating failure.

Fine but hallucinations galore: ChatGPT, Claude

Both tools produced things that looked like press releases, following standard format, structure and writing principles. Claude was the only tool that correctly identified Ragan Communications’ current CEO (that’s Diane Schwartz for those of you playing along at home) but, in addition to correctly identifying past positions, made up several more. Claude and ChatGPT both incorrectly identified my alma mater. ChatGPT also adopted the fawning, adjective-laden style for which it’s become known — which was over the top even for a press release.

Round 2: I am the editor of PR Daily. Brainstorm a list of 10 stories about AI that would be of interest to PR professionals. Suggest two sources I could talk to for each.

The winner: None. All of the ideas returned were obvious and bland.

Some decent ideas: While none of the AI tools gave me ideas that wowed me, ChatGPT and Gemini at least had a few nuggets of good ideas with decent, recognizable sources attached.

The losers: Copilot, which won the previous round handily, came in dead last here. Not only were its ideas basic and banal (“AI and Media Relations: Shaping the Future”), it failed to identify usable human sources, instead linking to other news sources, even though the prompt specified talking to sources. Claude did not name specific people, only the kinds of companies and titles that would help: For example, for “Ethical Considerations for AI Use in Public Relations,” it suggested talking to a communications ethicist and an AI ethics board member.

Revolutionary stuff.

Round 3: Suggest journalists to whom I should pitch a story about the use of AI in media relations. Include their email addresses if possible and why they’re a good fit.

The ugly: Claude sent me a list that looked, on the surface, incredible. It had reporters from Entrepreneur, Wired, Adweek. They seemed dialed in and exactly who a PR pro would want to pitch this story to. It even had email addresses.

But under fact checking, it all fell apart.

None of the reporters worked for the publication Claude said they did. One had in the past, but hadn’t published anything there in five years. Several never worked at those publications at all. Most had done some writing in the AI space, but the information was so bad, it was more trouble than it was worth.

Copilot seemed to misunderstand the assignment, offering one person who could be a good source for a story, and then basically telling me to go Google it (or Bing it) myself: “Consider reaching out to journalists who specialize in AI and technology reporting. They can provide expert perspectives on how AI impacts media relations. While I don’t have specific email addresses, you can find such journalists by searching online or checking platforms like LinkedIn.”

When asked to provide specific names, it helpfully suggested “John Doe” and “Jane Smith.”

The decent: Gemini provided me a solid list of journalists, only one of whom seemed to be a wild goose chase. Otherwise, it offered up respected names like Kara Swisher and Charlie Warzel, along with decent summations of their work and angles that might work for them. It didn’t provide email addresses, but suggested where they might be found.

The best: ChatGPT was the clear winner, offering a fantastic list of seven journalists from publications ranging from the Wall Street Journal to Substack. All of the journalists worked for the publications ChatGPT said they did, and each included an email address and a brief summation of their work. A great job.

Round 4: This is a list of the top-read news stories in 2023. What commonalities do you find? What takeaways are there for a PR professional? Each tool received a copied-and-pasted list of the top 10 news stories of 2023, as identified by Chartbeat.

The best: Claude. All the tools did a decent enough job of identifying major themes from the list: disasters, celebrity deaths, human interest stories. But Claude’s takeaways for PR pros were more on point. It extrapolated that the LA Times’ three mentions on the list mean the paper did a good job localizing stories (true, but news also simply happened there, such as the death of Matthew Perry, a mass shooting and flooding). It also noted that PR pros should be ready to follow up on major stories and to lean into emotion to help drive engagement.

The worst: Copilot. The Microsoft-owned tool gave similar big-picture ideas about commonalities in the stories, but then just hallucinated on three out of the 10 points it delivered. It claimed that “Pitches about tech layoffs and inflation resonated more than upbeat growth stories,” even though there were no stories about layoffs or growth on the provided list. It also included a nonsensical point about rounds of investment funding, which, again, were not present in the list. And it said that “Trump and Biden mentions fell flat, likely due to polarization,” which may be true but isn’t a conclusion that can be drawn based on a list of 10 stories.

Fine: Gemini’s list was OK, though it did exaggerate and call the term “widespread flooding” “sensationalism,” which feels like a reach. ChatGPT’s explanation of commonalities was on point, but its takeaways seemed more like generic PR advice than anything drawn specifically from the data it was given.

The bottom line

There is no perfect tool. Every bot save Gemini ranked at the top in at least one category, and every tool save ChatGPT ranked at the bottom in at least one category.

Not a single response was plug-and-play. Everything required at least some level of editing and fact checking, and some results were completely unhelpful. Moving blindly ahead with these responses without human oversight would be embarrassing — and that’s the best-case scenario.

But ChatGPT returned an incredible list of sources to start pitching stories to. Claude did a great job of parsing data and identifying themes and next steps. Copilot wrote a press release that would be ready to send with another 10 minutes of editing. And Gemini was mostly fine.

We don’t have the perfect all-in-one AI for these purposes — yet. But keep experimenting, keep trying and see what best fits your needs.

Drop your favorite AI tool and what you use it for in the comments.

Allison Carter is editor-in-chief of PR Daily. Follow her on X or LinkedIn.

4 Responses to “Top Stories of 2024: We put 4 AI tools through a PR task test. Here’s what came out on top.”

Katie Paine says: June 24, 2024 at 3:49 pm

Great study, thank you for doing this! We did a similar exercise to determine which generated a post that would generate the most engagement and sales: https://www.linkedin.com/pulse/wondering-which-ai-tool-get-you-best-results-we-tried-paine/

Caitlin Haskins says: June 25, 2024 at 2:51 pm

Thank you for doing this – we have been using different tools for different purposes, but the challenge remains that they are moving so quickly, it can be hard to keep track of their relative strengths and weaknesses. We’re finding the most value when we actually create a custom GPT. It’s challenging to get everything we want in a single prompt and not have pieces of the prompt be “missed” in the result. No doubt this will continue to be a moving target, but the net outcome for us so far has been positive, so we’ll roll with it!

Bruce Floyd-Stevens says: June 25, 2024 at 4:55 pm

Are those the exact prompts that were used in testing? If so I think I may have clocked your issue! When I used ChatGPT to create a spreadsheet of local, regional and national media contacts relevant for a specific topic, I was pleasantly impressed. I tend to make very structured prompts with clear parameters and instructions. I usually get 85% accuracy when compiling source info but I always make sure to review for accuracy.

I was never big on CoPilot but will have to try again!

Thanks for this!
Bruce