Diffbot, a small Silicon Valley company best known for maintaining one of the world’s largest indexes of web knowledge, today announced the release of a new AI model that promises to address one of the biggest challenges in the field: factual accuracy.
The new model, a fine-tuned version of Meta’s LLama 3.3, is the first open-source implementation of a system known as graph retrieval-augmented generation, or GraphRAG.
Unlike conventional AI models, which rely solely on massive amounts of preloaded training data, Diffbot’s LLM draws on real-time information from the company’s Knowledge Graph, a constantly updated database of more than a trillion interconnected facts.
“We have a thesis: that general-purpose reasoning will ultimately be distilled into about 1 billion parameters,” said Mike Tung, founder and CEO of Diffbot, in an interview with VentureBeat. “You actually don’t want the knowledge in the model. You want the model to be good at using tools, so that it can request knowledge externally.”
How it works
Diffbot’s Knowledge Graph is a comprehensive, automated database that has been crawling the public web since 2016. It categorizes web pages into entities such as people, companies, products and articles, extracting structured information using a combination of computer vision and natural language processing.
The Knowledge Graph is refreshed every four to five days with millions of new facts, so that it remains up to date. Diffbot’s AI model takes advantage of this resource by querying the graph in real time to retrieve information, rather than relying on static knowledge encoded in its training data.
For example, when asked about a recent news event, the model can search the Internet for the latest updates, extract relevant facts, and cite the original sources. This process is designed to make the system more accurate and transparent than traditional LLMs.
“Imagine asking an AI about the weather,” Tung said. “Rather than generating an answer based on outdated training data, our model interrogates a live weather service and provides an answer based on real-time information.”
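The retrieve-then-cite pattern described above can be illustrated with a minimal sketch. It is not Diffbot’s code or API: the `Fact` type, the `TOY_GRAPH` list, the `query_graph` and `answer` functions, and the example.com URLs are all hypothetical stand-ins, and a small in-memory list plays the role of the live Knowledge Graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    source: str  # hypothetical URL the fact was extracted from


# Toy stand-in for a knowledge graph. A real system would query a live,
# regularly refreshed service instead of a hard-coded list.
TOY_GRAPH = [
    Fact("Diffbot", "ceo", "Mike Tung", "https://example.com/diffbot-profile"),
    Fact("Diffbot", "crawling_since", "2016", "https://example.com/diffbot-history"),
]


def query_graph(subject: str, predicate: str) -> list[Fact]:
    """Look up matching facts at answer time rather than from model weights."""
    return [f for f in TOY_GRAPH if f.subject == subject and f.predicate == predicate]


def answer(subject: str, predicate: str) -> str:
    """Build a reply only from retrieved facts, citing each fact's source."""
    facts = query_graph(subject, predicate)
    if not facts:
        # Admit the gap instead of guessing an answer.
        return "No supporting fact found."
    return "; ".join(f"{f.obj} [source: {f.source}]" for f in facts)


print(answer("Diffbot", "ceo"))
```

In this sketch, updating what the system "knows" means editing the fact list, not retraining a model, and every answer carries the source it came from.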
How Diffbot’s Knowledge Graph beats traditional AI at finding facts
Benchmark tests show that Diffbot’s approach is paying off. The company reports that its model achieves an accuracy score of 81% on FreshQA, a Google-created benchmark for testing real-time factual knowledge, surpassing both ChatGPT and Gemini. It also scored 70.36% on MMLU-Pro, a more difficult version of a standard test of academic knowledge.
Perhaps most importantly, Diffbot is making its model fully open source, allowing companies to run it on their own hardware and customize it to their needs. This addresses growing concerns about data privacy and vendor lock-in with major AI providers.
“You can run it locally on your computer,” Tung noted. “There is no way you can use Google Gemini without sending your data to Google and sending it outside of your location.”
Open-source AI can transform the way companies handle sensitive data
The release comes at a crucial time in AI development. In recent months there has been increasing criticism of the tendency of large language models to “hallucinate,” or generate false information, even as companies continue to scale up model size. Diffbot’s approach suggests an alternative path forward, one that focuses on grounding AI systems in verifiable facts rather than trying to encode all human knowledge into neural networks.
“Not everyone is just going to look for bigger and bigger models,” Tung said. “You can have a model that has more capabilities than a large model with a non-intuitive approach like ours.”
Industry experts note that Diffbot’s Knowledge Graph-based approach could be particularly valuable for enterprise applications where accuracy and auditability are critical. The company already provides data services to major companies, including Cisco, DuckDuckGo and Snapchat.
The model is available immediately via an open-source release on GitHub and can be tested via a public demo at diffy.chat. For organizations looking to deploy it internally, Diffbot says the smaller version, with 8 billion parameters, can run on a single Nvidia A100 GPU, while the full version, with 70 billion parameters, needs two H100 GPUs.
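A back-of-the-envelope calculation shows why those hardware figures are plausible. This is a rough sketch and not Diffbot’s stated reasoning: it assumes 16-bit weights (2 bytes per parameter) and counts only the weights, ignoring activations and other runtime memory. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Assumes 2 bytes per parameter (16-bit weights) unless told otherwise.
    One billion parameters at N bytes each occupies N GB.
    """
    return params_billion * bytes_per_param


small = weight_memory_gb(8)    # 8-billion-parameter version
large = weight_memory_gb(70)   # 70-billion-parameter version

print(f"8B weights:  ~{small:.0f} GB")
print(f"70B weights: ~{large:.0f} GB")
```

Under these assumptions the smaller model needs roughly 16 GB for its weights, which fits on one A100 (sold in 40 GB and 80 GB variants), while the larger model needs roughly 140 GB, more than a single 80 GB H100 holds but within the 160 GB that two provide.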
Looking ahead, Tung believes that the future of AI lies not in ever-larger models, but in better ways to organize human knowledge and make it accessible: “Facts are getting old. Many of these facts will be moved to explicit places where you can actually change the knowledge and where you can trace the origin of the data.”
As the AI industry grapples with challenges around factual accuracy and transparency, Diffbot’s release offers a compelling alternative to the dominant “bigger is better” paradigm. Whether it succeeds in changing the direction of the field remains to be seen, but it has certainly shown that when it comes to AI, size isn’t everything.