A new report from AI data provider Appen shows that companies are struggling to obtain and manage the high-quality data needed to power AI systems as artificial intelligence expands into business operations.
Appen’s 2024 State of AI Report, which surveyed more than 500 US IT decision makers, found that generative AI adoption increased 17% in the past year. However, organizations now face significant hurdles in data preparation and quality assurance. The report shows a 10% year-over-year increase in bottlenecks related to data acquisition, cleansing and labeling, underscoring the complexity of building and maintaining effective AI models.
“As AI models tackle more complex and specialized problems, data requirements are also changing,” Si Chen, head of strategy at Appen, said in an interview with VentureBeat. “Companies are discovering that just having a lot of data is no longer enough. To refine a model, the data must be of extremely high quality, meaning it is accurate, diverse, properly labeled and tailored to the specific AI use case.”
As the potential of AI continues to grow, the report identifies several key areas where companies face obstacles. Below are the five key takeaways from Appen’s 2024 State of AI report:
1. Adoption of generative AI is exploding, but so are data challenges
Generative AI (GenAI) adoption grew 17% over the past year, thanks to advances in large language models (LLMs) that allow companies to automate tasks across a wide range of use cases. From IT operations to R&D, companies are using GenAI to streamline internal processes and increase productivity. However, the rapid increase in GenAI usage has also brought new hurdles, especially in data management.
“Generative AI outputs are more diverse, unpredictable and subjective, making it harder to define and measure success,” Chen told VentureBeat. “To achieve enterprise-ready AI, models must be customized with high-quality data tailored to specific use cases.”
Custom data collection has emerged as the primary method for sourcing GenAI training data, reflecting a broader shift away from generic, web-scraped data toward tailored, reliable datasets.
2. Enterprise AI deployments and ROI are declining
Despite the excitement around AI, the report finds a worrying trend: fewer AI projects are reaching deployment, and those that do are delivering lower ROI. Since 2021, the average percentage of AI projects reaching deployment has fallen by 8.1%, while the average percentage of deployed AI projects showing meaningful ROI has decreased by 9.4%.
This decline is largely due to the increasing complexity of AI models. Simple use cases such as image recognition and voice automation are now considered mature technologies, and companies are shifting to more ambitious initiatives, such as generative AI, which require high-quality, customized data and are much more difficult to implement successfully.
“Generative AI has more advanced capabilities in understanding, reasoning and content generation, but these technologies are inherently more challenging to implement,” Chen explained.
3. Data quality is essential, but it is deteriorating
The report highlights a critical issue for AI development: data accuracy has fallen by nearly 9% since 2021. As AI models become more sophisticated, the data they require has also become more complex, often requiring specialized, high-quality annotations.
As many as 86% of companies now retrain or update their models at least once a quarter, underscoring the need for new, relevant data. But as the frequency of updates increases, it becomes more difficult to ensure this data is accurate and diverse. Companies are turning to third-party data providers to meet these demands, with nearly 90% of companies relying on external sources to train and evaluate their models.
“While we can’t predict the future, our research shows that managing data quality will remain a major challenge for companies,” said Chen. “With more complex generative AI models, data collection, cleaning and labeling have already become major bottlenecks.”
4. Data bottlenecks are increasing
Appen’s report shows that reported bottlenecks in data acquisition, cleansing and labeling have increased 10% year over year. These bottlenecks directly affect companies’ ability to deploy AI projects successfully. As AI use cases become more specialized, preparing the right data becomes more acute a challenge.
“Data preparation problems have increased,” said Chen. “The specialized nature of these models requires new, tailor-made datasets.”
To address these issues, companies are adopting long-term strategies that emphasize data accuracy, consistency and diversity. Many are also pursuing strategic partnerships with data providers to help navigate the complexities of the AI data lifecycle.
5. Human-in-the-loop is more important than ever
As AI technology continues to evolve, human involvement remains indispensable. The report shows that 80% of respondents emphasize the importance of human-in-the-loop machine learning, a process that uses human expertise to guide and improve AI models.
“Human involvement remains essential to the development of high-performing, ethical and contextually relevant AI systems,” said Chen.
Human experts are particularly important for limiting biases and ensuring ethical AI development. By providing domain-specific knowledge and identifying potential biases in AI outputs, they help refine and align models with real-world behavior and values. This is especially critical for generative AI, where outputs can be unpredictable and require careful monitoring to avoid harmful or biased results.
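In practice, human-in-the-loop review is often wired into a pipeline as a confidence-based triage step: high-confidence model outputs pass through automatically, while uncertain ones are escalated to a person whose corrections feed the next retraining cycle. The sketch below illustrates that pattern in Python. It is not code from Appen’s report; the `Prediction` and `ReviewQueue` names, the 0.8 confidence threshold, and the console-based reviewer are hypothetical stand-ins for a real annotation tool or labeling service.

```python
# Minimal human-in-the-loop sketch (illustrative only; not from Appen's report).
# Outputs below a confidence threshold are routed to a human reviewer, and the
# corrected labels are collected for the next retraining cycle.

from dataclasses import dataclass, field

@dataclass
class Prediction:
    text: str          # the input the model classified
    label: str         # the model's predicted label
    confidence: float  # the model's confidence in that label

@dataclass
class ReviewQueue:
    threshold: float = 0.8                 # hypothetical cutoff for auto-accept
    reviewed: list = field(default_factory=list)

    def triage(self, pred: Prediction) -> str:
        """Auto-accept confident predictions; escalate the rest to a human."""
        if pred.confidence >= self.threshold:
            return pred.label
        corrected = self.ask_human(pred)
        self.reviewed.append((pred.text, corrected))  # saved for retraining
        return corrected

    def ask_human(self, pred: Prediction) -> str:
        # Placeholder for a real annotation UI or labeling service.
        print(f"Review needed: {pred.text!r} (model said {pred.label!r})")
        answer = input("Correct label (blank to keep): ").strip()
        return answer or pred.label

# Example: a low-confidence prediction gets routed to the reviewer.
queue = ReviewQueue(threshold=0.8)
final_label = queue.triage(Prediction("Refund my order", "complaint", 0.55))
```

The design point is the threshold: it controls how much human attention the system demands, and the corrections accumulated in `reviewed` become exactly the kind of accurate, use-case-specific training data the report says models now require.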
Check out Appen’s full 2024 State of AI Report here.