In 2009, an industry-leading technology company wanted to expand its portfolio of online speech and text recognition capabilities across the globe. The company had already established a market-leading position in English-language locales by enabling end users to transcribe recorded or live speech to text and to convert text to speech, whether joining an online conference call, transcribing an uploaded file, or initiating an online voice search. As part of its growth strategy, the company had identified 20 languages into which it wanted to expand its services. Gathering text and speech data from the relevant language locales would require the expertise and geographic reach of an experienced third-party vendor.
Stakeholders for the initiative gave LXT 12 months to complete the project, which involved collecting audio and text to help train the three components of the client's speech recognition system: the acoustic statistical model, the pronunciation statistical model, and the language statistical model.
The initial stage of the project focused on collecting data for Arabic and then expanded to MENA, English, and Asian languages. Eventually the program covered over 750 language locales, including languages that are advancing toward vulnerability or endangerment. Each month, the client sent LXT a list of the words and aspects of speech it needed collected.
For the audio data collection portion of the program, LXT recruited freelancers who were native speakers of their respective languages. Freelancers recorded their voices through the client's platform, reading the sentences and speech components that were part of that month's assignment.
In addition, LXT managed recruitment to ensure that freelancers were sourced in the target languages. Audio data collection began in one language with thousands of freelancers and quickly expanded to three times the original number of freelancers when additional languages were added to the program.
For the transcription portion of the program, LXT recruited hundreds of additional freelancers for each language, created project guidelines, and performed quality audits to ensure the accuracy of data provided for the client’s training data pipeline.
Freelancers uploaded audio and transcription data to one of three platforms, which the client periodically updated with new features or improved capabilities, such as keyboard shortcuts or the ability to assign custom color codes to different speakers. LXT team leads and freelancers provided feedback about the performance and ease of use of these upgrades, and were instrumental in combining the three separate platforms into one.
Results and partnership expansion
The initial data collection and transcription program has evolved over more than a decade into a strong partnership between LXT and the client whereby LXT is a critical part of the value chain for improving the client’s speech recognition capabilities. During the first stage of the program, LXT proved itself by consistently delivering high-quality data at scale across multiple languages and by enhancing language-specific project guidelines to improve overall data quality.
As a result of LXT's strong performance year after year, in 2016 the client requested a 10x increase in volume. At the same time, the client shifted from short-form to long-form transcription and deployed a new platform. The LXT team adapted quickly to delivering the new data modality and working in the updated platform, providing feedback on its capabilities and suggestions for automation.
LXT succeeded in delivering the 10x volume increase, and in 2019 delivered 20,000 hours of speech data across 28 languages. In late 2021, the client approached LXT with a need for secure transcription across multiple languages. LXT quickly developed a solution to meet the client's needs, which included expanding into four ISO 27001-certified secure facilities across Canada in 2022. During 2022, LXT grew its language coverage from 230 to 780 language locales as a result of this partnership, and it continues to deliver secure transcription services to this day.