The ROI of high-quality AI training data


Earlier this year, LXT published its first annual report, “The Path to AI Maturity.” The report summarizes the findings of its survey of 200 senior executives with artificial intelligence (AI) experience at mid-to-large US organizations. Two-thirds of respondents were C-Suite executives.

The report reveals the current status of AI maturity in US organizations overall, as well as the investment levels and business drivers behind AI initiatives at all phases of maturity. The findings show that organizations at the highest stages of AI maturity claim that quality training data is a key contributor to their success.

This follow-on report from the same research study provides a view into the specific benefits that organizations are realizing by using quality training data as part of their AI strategies, as well as the role of third-party vendors in sourcing quality data.

AI maturity of US organizations today

In the survey that LXT fielded in late 2021, respondents were asked to indicate the level of AI maturity currently achieved by their organizations. According to the results, 40% of US organizations consider themselves to have reached the three highest levels of AI maturity. This means that they have moved from the awareness and experimentation phases of their AI deployments to achieving demonstrable ROI in production. It’s encouraging to see the progress in AI maturity within US enterprises, but the data shows that the majority of organizations are still experimenting with AI, attempting to move from pilot to production.
Survey Q7: Looking at the Gartner AI Maturity Model diagram below, at what level of AI maturity is your organization today?

Success factors for AI

Through this research study LXT also sought to uncover the factors that are important to the success of AI strategies. Respondents were given the opportunity to select from a wide range of possible success factors, including model quality, skilled staff, and quality training data.

Results show that for companies at the highest levels of AI maturity – the Systemic and Transformational stages – quality training data is the most important contributor to AI success. For organizations just starting out, prioritizing data quality early could help accelerate their progress towards AI maturity.

Survey Q5: What are the biggest contributors to the success of your AI strategies? (select up to 3).

AI training data investment

The survey findings reveal that organizations across all levels of maturity are dedicating significant percentages of their overall AI budgets to training data. Four in ten companies are allocating 70% or more of their total AI budget to training data. This can include data collection, data processing, and data annotation either internally or by third parties.

On average, businesses are investing 59% of their AI budget in training data. This suggests that even outside of the businesses with larger budgets, training data still makes up the majority of AI spend.

Survey Q11: What percentage of your investment in AI is dedicated to training data for AI? (grouped) n=200

Organizations across industries are accelerating their digital transformation initiatives to stay competitive and are using AI to achieve this goal. Coupled with this is the expected growth in demand for training data; two-thirds of those surveyed indicated that their need for training data will increase over the next five years.

A deeper look into the results shows that organizations that have reached the highest level of maturity indicate the strongest need to increase their training data volumes over this time period. As companies deploy more AI models across an increasing number of functions and processes, more training data is needed to support periodic model updates.

Survey Q12. Do you expect your organization’s needs for training data to increase, decrease, or remain the same in the next two to five years?
Expected need for training data over five years, by AI maturity level

Return on investment for quality training data

To better understand how organizations view their AI training data investments and what is driving the need to increase investment in this area, the survey asked respondents about the ROI of quality training data. They indicated that ROI is measured across four key factors: 1) operational efficiency, 2) cost reduction, 3) reduction in error rates, and 4) improved reputation. Note that most respondents selected more than one of these factors when responding to this question.
ROI of quality training data for AI
Q19. What is the ROI for quality training data for AI? n=200

01 Operational efficiency (65%)

Investing in high-quality training data creates more efficient, productive, and powerful AI systems. AI models can accomplish more in a shorter amount of time as a result of higher accuracy and reliability. Creating a reliable pipeline of high-quality training data means that companies can get to production more quickly and use AI to streamline their operations through automation.

02 Cost reduction for our AI program (64%)

Going hand-in-hand with operational efficiency is cost reduction. When organizations invest in high-quality training data from the start, they are able to avoid rework and costly delays.
"AI training data has added a lot more productivity and efficiency — and considerably reduced costs."
— IT Manager/Director, Technology and IT.

03 Reduced error rates (59%)

59% of enterprises reported that a reduced error rate was one of the top benefits they saw when improving the quality of their training data. Using quality training data to train machine learning algorithms leads to more accurate models and lower overall error rates.
"We have seen the AI beginning to make more reasonable and rational decisions as we progress. It is incredibly interesting to see it learn as more data comes in for it to process. At the beginning there was too much risk of it misinterpreting."
— CFO, Professional Services, Scientific, and Technical Services.

04 Improved reputation (55%)

Businesses that invest in high-quality training data see greater accuracy in their AI models, which leads to better customer experiences and in turn builds the company’s reputation and brand in the marketplace.

Training data ROI by maturity phase

Companies that are starting out in their AI journey need to create a strong foundation for AI, which leads them to focus on cost and error rate reduction. These companies need to prove to senior management that AI is a worthy investment. As they mature, the ROI shifts to operational efficiency and business transformation.

This illustrates that as a business grows more familiar with AI and has AI systems in place, the cost-saving benefits of AI become secondary to (or perhaps are a guaranteed result of) more efficient, applicable, and scalable AI models. While cost reduction is an important ROI measurement as companies reach the highest levels of AI maturity, the other ROI measurements of operational efficiency, reduction in error rates, and improved reputation increase in importance.

Q19. What is the ROI for quality training data for AI? n=200
Q7. Looking at the Gartner AI Maturity Model diagram below, at what level of AI maturity is your organization today? n=200


Sourcing high-quality training data

Overwhelmingly, the research uncovered that businesses partner with third parties to supply their training data - in fact 99% of respondents indicated that they work with third parties for AI training data. Key reasons for this approach include:


Enterprises want to confidently train their AI models, and look for a third-party vendor with the experience to provide a reliable data pipeline. They need a data partner who will deliver on time, and on budget.


Enterprises also want to expand globally to capture more market share. This requires a partner that can provide them access to new markets. They rely on their third-party data partners to provide the proper scope of training data for their AI models in terms of language coverage and in-country experience in a wide range of locales.

This point is important: it shows that businesses recognize that training data is not something they can readily engineer on their own.

Longer strategic partnerships

Lastly, these businesses are looking for a long-term partnership with a third-party vendor. The vendor is not just a supplier of training data, but a key collaborator in the development of their AI systems.

This paints a picture of a third-party vendor that is experienced, trustworthy, able to meet the required scope of the project, and one that can support the long-term requirements of a more robust program.

The survey results show that cost is a less important factor than the motivations described above. This further underscores the finding that training data is a highly valuable asset, one that’s worth investing in to get the highest quality.



When the survey responses were reviewed by maturity level, the results showed that mature organizations particularly value execution speed. Examining the chart below shows that while a data partner’s speed is of much lesser importance in the early stages of the AI journey, this changes dramatically once companies reach the higher stages of maturity. Companies at this stage are deploying AI across an entire business process - or across their entire company - and are looking for reliable partners who can deliver training data quickly to help them accomplish their goals.

Finally, when reviewing the data by maturity level, we see that companies that have reached the highest levels of maturity all rely on a data partner to support their AI initiatives.



LXT’s research findings demonstrate the importance of high-quality training data in an organization’s AI maturity journey. The research also reveals that investments in training data for AI have multiple benefits, including greater operational efficiency, lower costs, reduced error rates, and an improved reputation. Training data partners play a key role in sourcing training data, and organizations that have reached maturity value reliability, trust, global reach, and speed when choosing a third party for training data.

Behind the research

LXT commissioned a survey of 200 senior decision-makers within US organizations. Two-thirds of respondents were from the C-Suite and all those who took part had verified AI experience; only 25% of those who applied met the criteria required for participation, which included their level of AI knowledge and experience. Contributors were engaged using online surveys, answering on behalf of a range of business sizes, revenues and industries. Each participant represents a US organization with at least $100 million in annual revenue and over 500 employees. The research was conducted from November 29 to December 10, 2021, by Reputation Leaders, an independent research organization.

About LXT

LXT is an emerging leader in AI training data to power intelligent technology for global organizations. In partnership with an international network of contributors, LXT collects and annotates data across multiple modalities with the speed, scale, and agility required by the enterprise. Our global expertise spans more than 145 countries and over 1000 languages. Founded in 2010, LXT is headquartered in Toronto, Canada with presence in the United States, UK, Egypt, India, Turkey and Australia. The company serves customers in North America, Europe, Asia Pacific and the Middle East.

To learn more about LXT, visit