Expanding language coverage for a global technology leader’s AI assistant through high-quality data collection and transcription

A top 10 global technology company sought to extend the user base for its AI assistant by adding support for 20 languages. To correctly understand user prompts in the company’s target markets, the assistant needed to be trained on high-quality audio data in the languages spoken in those markets. The company chose LXT for its reputation for delivering high-quality AI training data in multiple languages.


LXT designed a data collection and transcription program to create the training data required by the client. For the audio data collection portion of the program, freelancers were recruited to read sentences and record their voices through the client’s platform. LXT managed recruitment to ensure that freelancers were sourced for each target language. Audio data collection began in one language with thousands of freelancers and quickly expanded to three times that number when additional languages were added to the program.

For the transcription portion of the program, LXT was responsible for recruiting hundreds of freelancers for each language, creating the project guidelines, and performing quality audits to ensure that the data provided for the client’s training data pipeline was accurate in each language.


The program was originally scheduled to deliver the required volumes over 12 months. However, LXT delivered the required volumes in just six months while maintaining high quality standards. To accelerate data collection and transcription, LXT developed a freelancer referral program, which allowed it to cut the delivery time in half. LXT’s ability to quickly scale the program meant that its client could expand its AI assistant to multiple markets much faster than anticipated.