Case study

Delivering high-quality control data for leading speech industry initiative



Exclusive data partner

High-quality data

Ahead of schedule


AI data partnership


The Speech processing Universal PERformance Benchmark initiative – otherwise known as SUPERB – is a collaborative effort between leading academic institutions, including National Taiwan University, Carnegie Mellon, Johns Hopkins and MIT, and researchers from Facebook AI to fuel research in representation learning and general speech processing. By creating benchmark models that enable the detection of emotion, intent, content and other semantic information, the SUPERB initiative aims to help the global speech community accelerate the development of innovative language-based AI applications.

To evaluate its models effectively, the SUPERB team needed a trusted AI data provider that could supply gold-standard control datasets both immediately and on an ongoing basis to support its mission.


The team decided to partner with LXT as its exclusive AI training data partner after a highly successful pilot in which it evaluated the quality of translations and utterance collections across a group of male and female speakers. Once LXT was selected as the partner of choice, the program expanded to include additional translations from English to German, and utterance collections in English that expressed specific emotions such as anger and surprise. Using LXT’s technology platform, thousands of utterances were collected from both male and female speakers. Each utterance was individually quality checked to ensure that the recording was clear and captured the intended emotion. Where needed, utterances were rerecorded to meet the quality requirements of the SUPERB team. The LXT team used the Translation, Editing and Proofreading (TEP) framework to meet the SUPERB team’s quality needs.


The LXT team delivered all of the translations and utterances ahead of schedule while meeting the SUPERB team’s needs for high-quality datasets. According to Hung-yi Lee, an associate professor in the Department of Computer Science & Information Engineering at National Taiwan University, “High-quality data is key to the success of our efforts, and LXT was chosen as the exclusive partner based on its flexibility, reliability, and collaborative culture.”

The SUPERB team has entered into an exclusive multi-year partnership with LXT based on the company’s flexibility, scalability and superior delivery. LXT’s high-quality datasets will provide the foundation for innovations in the speech industry for years to come.