The Tow Center for Digital Journalism, in a report published by the Columbia Journalism Review, evaluated how accurately eight generative AI search tools retrieve and cite news content. Building on the Center's previous research, the study found that these tools consistently struggle with accuracy, often providing incorrect, speculative, or fabricated information even when presented with direct excerpts from articles. The report highlights a troubling imbalance: while traditional search engines guide users to news websites, generative search tools parse and repackage information themselves, often cutting off traffic to the original sources.
The researchers tested eight AI chatbots with live search features: ChatGPT, Perplexity (both the free and Pro versions), Copilot, Gemini, DeepSeek, Grok 2, and Grok 3. They presented each chatbot with specific excerpts from news articles and asked it to identify the article’s headline, original publisher, publication date, and URL. The excerpts were deliberately chosen so that a traditional Google search would easily surface the correct source. In total, 1,600 queries were run: ten articles from each of twenty publishers, with every excerpt posed to all eight chatbots.
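To make the testing procedure concrete, here is a rough sketch of how such an evaluation loop could be structured. The `query_chatbot` helper and the prompt wording are assumptions for illustration only, not the study's published code.

```python
# Rough sketch of the evaluation loop described above. `query_chatbot` is a
# hypothetical stand-in for each tool's own interface; the prompt wording is
# illustrative, not the exact phrasing used in the study.
from dataclasses import dataclass

@dataclass
class Excerpt:
    publisher: str
    text: str  # the excerpt shown to the chatbot

CHATBOTS = ["ChatGPT", "Perplexity", "Perplexity Pro", "Copilot",
            "Gemini", "DeepSeek", "Grok 2", "Grok 3"]

PROMPT = ("Identify the article this excerpt comes from. Provide its headline, "
          "original publisher, publication date, and URL.\n\nExcerpt: {text}")

def query_chatbot(bot: str, prompt: str) -> str:
    # Stand-in: in practice each chatbot is queried through its own API or UI.
    return f"[response from {bot}]"

def run_evaluation(excerpts: list[Excerpt]) -> list[dict]:
    results = []
    for excerpt in excerpts:        # 20 publishers x 10 articles = 200 excerpts
        for bot in CHATBOTS:        # x 8 chatbots = 1,600 queries
            answer = query_chatbot(bot, PROMPT.format(text=excerpt.text))
            results.append({"chatbot": bot, "publisher": excerpt.publisher,
                            "response": answer})
    return results
```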
The results revealed that the chatbots often failed to retrieve the correct articles, answering more than 60% of the queries incorrectly. Many tools presented inaccurate answers with high confidence and rarely acknowledged gaps in their knowledge. Premium models such as Perplexity Pro and Grok 3 answered more prompts correctly than their free counterparts, yet paradoxically showed higher error rates: because they declined fewer prompts, they tended to give definitive but wrong answers rather than admit uncertainty.
The study also examined how the chatbots adhered to the Robot Exclusion Protocol (robots.txt), a widely accepted standard that lets site owners specify which parts of a site should not be crawled. Five of the eight chatbots have publicly disclosed the names of their crawlers, yet the researchers found inconsistencies in how those crawlers respected the protocol: some chatbots answered queries about publishers that had blocked their crawlers, while others declined, or answered incorrectly, queries about publishers that had permitted crawler access.
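For readers unfamiliar with the protocol, the short sketch below shows how a publisher's robots.txt can be checked for a given crawler using Python's standard library. The crawler names are examples of publicly documented AI user agents, and the publisher domain is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Illustrative check of whether a site's robots.txt permits a given crawler.
# "GPTBot" and "PerplexityBot" are examples of publicly documented AI crawler
# user agents; swap in the crawler and publisher domain you want to test.
AI_CRAWLERS = ["GPTBot", "PerplexityBot"]

def crawler_allowed(site: str, user_agent: str, path: str = "/") -> bool:
    """Return True if robots.txt allows `user_agent` to fetch `path` on `site`."""
    parser = RobotFileParser()
    parser.set_url(f"{site.rstrip('/')}/robots.txt")
    parser.read()  # download and parse the live robots.txt
    return parser.can_fetch(user_agent, f"{site.rstrip('/')}{path}")

if __name__ == "__main__":
    publisher = "https://example.com"  # placeholder publisher domain
    for agent in AI_CRAWLERS:
        status = "allowed" if crawler_allowed(publisher, agent) else "blocked"
        print(f"{agent}: {status}")
```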
Furthermore, the generative search tools frequently failed to link back to the original source. Some cited the wrong article, directed users to syndicated versions of articles instead of the original sources, or fabricated URLs that led to error pages. This deprives original sources of proper attribution and referral traffic while also undermining users’ ability to verify information.
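As a simple illustration (not taken from the report), one kind of fabricated citation can be spotted by checking whether a cited URL actually resolves; the user-agent string below is an arbitrary placeholder.

```python
import urllib.error
import urllib.request

def url_resolves(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to the cited URL returns a non-error status."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "citation-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400  # 2xx/3xx: the page exists
    except (urllib.error.HTTPError, urllib.error.URLError):
        return False  # 404s, DNS failures, etc.: the kind of dead links the study found

# e.g. url_resolves("https://example.com/") -> True for a live page,
#      False for a hallucinated URL that returns an error page.
```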
The study also investigated whether content licensing deals between AI companies and news publishers resulted in more accurate citations. The results indicated that the presence of such deals did not guarantee more accurate surfacing of content from partner publishers.
The authors conclude that the findings align with previous research, revealing consistent patterns across chatbots: confident presentation of incorrect information, misleading attribution, and inconsistent information retrieval. They note that these issues pose potential harm to both news producers and consumers, and that many of the AI companies developing these tools have shown little interest in working with news publishers and often fail to honor the preferences publishers express through the Robot Exclusion Protocol.
Key takeaways
- AI search engines are often inaccurate at citing news sources, with over 60% of queries resulting in incorrect answers.
- Premium models provide more confidently incorrect answers than free versions, creating a potentially dangerous illusion of reliability.
- AI search engines frequently disregard the Robot Exclusion Protocol, accessing and using content from sites that have explicitly blocked them.
- Attribution is a major problem, with chatbots often citing the wrong article, directing users to syndicated versions instead of original sources, or fabricating URLs.
- Content licensing deals do not guarantee accurate citations or better performance for partner publishers.
- The study raises concerns about the impact of AI search engines on the news industry, including reduced traffic to original sources and the potential for misrepresentation of content.
- The tools lack transparency and give users little agency, which can amplify bias and surface unverified or toxic answers.
- AI companies need to prioritize accuracy, transparency, and respect for publishers’ rights to control their content.
- Users should be aware of the limitations of AI search engines and critically evaluate the information they provide.
Links
Announcement: We Compared Eight AI Search Engines. They’re All Bad at Citing News
Download report data: https://github.com/TowCenter/genSearch-Part2/tree/main
Anthropic – reasoning models don’t always say what they think
More AI news.


