The ROI of High-Quality AI Training Data 2024

Introduction

Over the past few years, LXT has conducted research to understand the evolution of AI maturity in enterprise companies in the United States. In that time we’ve seen a large shift in the number of organizations that rate themselves in the higher levels of AI maturity, where AI is in production and delivering a return on investment. Undoubtedly this push to production is being fueled by the launch of ChatGPT and the frenzy around generative AI. Our recent research shows that close to 70% of companies say that generative AI is more important than other AI initiatives.

In this follow-on report to the Path to AI Maturity 2024, we dive deeper into how enterprises source the training data that fuels their AI programs and how they evaluate the return on that investment.

AI Maturity of US Organizations Today

In our survey fielded in late 2023, respondents were asked to indicate the level of AI maturity of their organization from their point of view. The results show that 72% of US organizations consider themselves to have reached the three highest levels of AI maturity as defined by Gartner’s AI maturity model. This means that they have moved from the awareness and experimentation phases of their AI deployments to achieving demonstrable ROI from AI in production. This is a significant jump from last year’s research, when just 48% of companies rated themselves in the highest levels of AI maturity.

Gartner AI Maturity Model

Survey Q3. Looking at the Gartner AI Maturity Model diagram below, at what level of AI maturity is your organization today? Weighted to NAICS US industry split.

This year’s study shows a 24 percentage point increase in the share of organizations at the three highest levels of AI maturity, from 48% to 72%. The most significant shift is at the Operational stage, where AI is in production and delivering value: 32% of enterprises are now in this stage. The results show that a growing number of organizations are succeeding with their AI deployments and have moved past experimentation into production. The frenzy over generative AI has most likely been a top driver of this shift, as companies have deployed resources to accelerate their AI projects.
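For readers comparing the two survey years, the short Python sketch below works through the arithmetic behind the year-over-year figure. It uses only the 48% and 72% shares reported in this study; the function names are illustrative and not part of the survey methodology.

```python
def percentage_point_change(old_pct, new_pct):
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - old_pct


def relative_change_pct(old_pct, new_pct):
    """Change expressed as a percentage of the starting value."""
    return (new_pct - old_pct) / old_pct * 100


# Shares of organizations at the three highest maturity levels, as reported.
share_last_year = 48
share_this_year = 72

pp_change = percentage_point_change(share_last_year, share_this_year)
rel_change = relative_change_pct(share_last_year, share_this_year)

print(f"Change: {pp_change} percentage points")
print(f"Relative change: {rel_change:.0f}%")
```

The distinction matters when quoting the result: the share rose by 24 percentage points, which is a 50% relative increase over last year’s level.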

AI maturity level

[2024] Q3. Looking at the Gartner AI Maturity Model diagram below, at what level of AI maturity is your organization today? Weighted to NAICS US industry split.

[2023] Q3. Looking at the Gartner AI Maturity Model diagram below, at what level of AI maturity is your organization today? Weighted to NAICS US industry split.

AI data investment

This year’s research reveals that half of all organizations invest between $1 million and $50 million to support their AI initiatives. A further 13% of respondents reported an AI budget of $50 million to $500 million, and 1% have AI budgets above $500 million.

Total budget for AI

Survey Q4. In which range approximately is your organization’s total budget for AI? Weighted to NAICS US industry split.

AI budget distribution

When respondents were asked how their organization’s AI budget is allocated across a range of categories, AI strategy and training data ranked the highest. This makes sense, as strategy and training data are both critical to AI success and should be considered before embarking on AI projects. The other components of AI budgets, including product development, talent, controls and compliance, software and hardware, are distributed fairly evenly, similar to the results we saw in 2023.

AI investment allocation by category

Survey Q7. What percentage of your investment in AI is dedicated to each of the following? Please make your best estimate between 0 – 100% of AI investment.

The total should not add up to more than 100%. Weighted to NAICS US industry split.

How training data for AI is sourced

It is critical for organizations that are deploying AI to have a thorough training data strategy. According to our research findings, organizations are primarily using publicly available data sets and working with commercial data providers to obtain the training data they need for their AI systems. Other methods include building data sets internally, using internal data and using customer data.

How enterprises are sourcing training data

Survey Q11. How does your organization source training data for AI? Weighted to NAICS US industry split.

Measuring training data ROI

Companies investing in AI evaluate the ROI of the training data that is used to support their AI projects in several ways. This year, time-to-market acceleration and higher success rates of AI programs were the leading measures of ROI, closely followed by increased customer satisfaction. Additional ROI metrics including operational efficiency and increased revenue were cited fairly evenly amongst respondents.

Quality training data evaluated for ROI

Survey Q14. What is the ROI for high-quality training data for AI? Weighted to NAICS US industry split.

When we look at this in more detail, comparing experimenter companies with maturing companies, we see that experimenters most often cite operational efficiency as the value of high-quality training data, followed by cost reduction. Maturing companies see high-quality training data as a means to improve their regulatory compliance and boost the success rate of their AI programs.

ROI of training data by maturity level

Q14. What is the ROI for high-quality training data for AI? Weighted to NAICS US industry split.

Enterprise training data needs

The majority of respondents stated that their needs for training data will increase or greatly increase in the next two to five years. Only 3% of organizations said that their need for training data will decrease. Organizations at the Active and Operational stages of their AI maturity journey indicate the strongest need to increase their training data volumes over this time period.

Expected training data needs over five years

Q13. Do you expect your organization’s needs for training data to increase, decrease, or remain the same in the next two to five years? Weighted to NAICS US industry split.

Expected need for training data over five years

Q13. Do you expect your organization’s needs for training data to increase, decrease, or remain the same in the next two to five years? Weighted to NAICS US industry split.

Importance of data quality

When asked what was more important to the success of AI projects—data quality or data volume—over 60% of respondents stated that data quality is more important. This is consistent with the data-centric AI trend that has gained popularity in recent years and is “the discipline of systematically engineering the data used to build an AI system” according to datacentric.org.

Importance of data volume vs data quality

Survey Q23. Which is more important for the success of your AI projects, data volume or data quality? Weighted to NAICS US industry split.

The role of data service providers

Respondents were asked about how they use external data service providers to support their AI programs. Improving data quality was the leading answer, followed by fine-tuning models for specific domains, accelerating time-to-market and data collection for machine learning models.

How enterprises use data service providers

Survey Q22. How, if in any way, does your organization use external data service providers? Weighted to NAICS US industry split.

Conclusion

LXT’s research findings illustrate the important role that high-quality training data plays for organizations deploying AI, as seen in the percentage of AI budget that is allocated to data. The ROI of high-quality training data is evaluated by the enterprise in multiple ways including time-to-market acceleration, higher success rates for AI programs and increased customer satisfaction. As companies mature in their AI journey, they value high-quality data as a means to improve their regulatory compliance and boost the success rate of their AI programs.

Data quality is of high importance to enterprise companies that are deploying AI, and over a third of organizations surveyed see working with external data providers as a way to improve AI data quality. Companies looking to drive successful AI projects should thoroughly evaluate their training data investments to make sure they are building the healthy data pipeline their AI projects need to reach their goals.

Behind the research

LXT commissioned a survey of 322 senior decision-makers working for US organizations. More than half of respondents were from the C-Suite and all those who took part had verified AI experience.

Respondents were engaged using online surveys, answering on behalf of a range of business sizes, revenues and industries. Each participant represents a US organization with at least $100 million in annual revenue and over 500 employees.

The results were weighted to match the North American Industry Classification System split of companies with 500+ employees in an effort to reflect an accurate representation of the mix of US companies by industry.
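As an illustration of what weighting to an industry split involves, the Python sketch below applies simple post-stratification weights to a toy sample. It is a minimal sketch under stated assumptions, not the procedure Censuswide used: the industries, answers and population shares are hypothetical, and the function names are invented for this example.

```python
from collections import Counter


def poststratification_weights(sample_industries, population_share):
    """One weight per respondent so weighted industry shares match the target split."""
    n = len(sample_industries)
    counts = Counter(sample_industries)
    weights = []
    for industry in sample_industries:
        sample_share = counts[industry] / n
        # Over-represented industries get weights below 1, under-represented above 1.
        weights.append(population_share[industry] / sample_share)
    return weights


def weighted_share(answers, weights, target):
    """Weighted fraction of respondents who gave the target answer."""
    total = sum(weights)
    matched = sum(w for a, w in zip(answers, weights) if a == target)
    return matched / total


# Hypothetical toy data: four respondents, two industries (not survey data).
industries = ["Manufacturing", "Manufacturing", "Manufacturing", "Finance"]
answers = ["quality", "quality", "volume", "volume"]
# Hypothetical target split standing in for the NAICS industry mix.
population = {"Manufacturing": 0.5, "Finance": 0.5}

weights = poststratification_weights(industries, population)
share = weighted_share(answers, weights, "quality")
print(f"Weighted share answering 'quality': {share:.1%}")
```

In this toy sample, Manufacturing is over-represented, so its respondents are weighted down and the lone Finance respondent is weighted up. The weighted share therefore differs from the raw 50% seen in the unweighted answers.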

The research was conducted from November 14, 2023, to January 2, 2024, by Censuswide, an independent research organization.

For questions about the research findings, please reach out to us at info@lxt.ai.

Annual revenue worldwide

Survey respondents by industry

About LXT

LXT is an emerging leader in AI training data to power intelligent technology for global organizations. In partnership with an international network of contributors, LXT collects and annotates data across multiple modalities with the speed, scale and agility required by the enterprise. Our global expertise spans more than 145 countries and covers more than 1,000 language locales.

Founded in 2010, LXT is headquartered in Toronto, Canada, with a presence in the United States, UK, Egypt, India, Turkey and Australia. The company serves customers in North America, Europe, Asia Pacific and the Middle East.

To learn more about LXT, visit lxt.ai.