Welcome to our AI data expert spotlight where I interview LXT teammates about their background and experience in helping companies of all sizes build reliable data pipelines. Today I am sitting down with Martha Hakvoort – LXT’s Director of Business Development – to learn more about her experience and approach to helping clients succeed with AI.

Tell me a bit about your background. How did you end up in the field of AI data?

About 13 years ago I was living in Sydney, Australia and I found a job as a Dutch language transcriber. Soon after I started, I took on additional projects such as supervising data collections and annotation projects for proofing tools. Later, I had the opportunity to move into Sales and eventually took on a Solutions Architect role, helping to develop data solutions aligned with customer needs. After several years in that role I was promoted to an Operational role where I led a team of 70 people consisting of AI experts, linguists and project managers who delivered speech and NLP projects for one of the largest tech companies in the world.

Over this 12-year period I witnessed massive changes in the AI industry, both in the technology itself and in the types of applications that have been launched. By working in both Sales and Operational roles, I gained experience spanning solution development through delivery, which gave me unique insight into customers’ data challenges and how to solve them.

I joined LXT just over a year ago for the opportunity to once again work with a portfolio of clients and help them in their AI journey. It’s exciting to apply more than a decade of industry experience to help a new set of customers reach their goals.

Over the past 12 years you’ve seen quite a range of AI use cases. What are some notable examples that come to mind?

The AI industry has grown by leaps and bounds, moving from niche applications back in 2010 to today, when a large majority of people have a smartphone that puts multiple AI capabilities at their fingertips: text-to-speech, machine translation and conversational AI, just to name a few.

I’ve definitely seen my share of interesting projects, including a data collection of gunshot and glass-breaking sounds to help train security cameras, a collection of speech data in cars driving at various speeds and in various road conditions, and even a data collection of sounds of babies crying! That’s the thing about AI – the possibilities for new applications are endless and the data requests I continue to see never cease to surprise me.

What is one of the most challenging AI data projects you’ve worked on?

One project that comes to mind is the four-year IARPA Babel program, which covered data collection, transcription and lexicons in 26 languages, including some very challenging locales. The Babel project was focused on building robust speech recognition for major language families at large scale so future models could be built for those languages very quickly.

My role involved finding and managing vendors to help us source language data in a wide range of locations around the world, including Papua New Guinea, Mongolia and Kazakhstan. In many cases we didn’t have any existing relationships, so I needed to do extensive due diligence and screening to find the right partners. Many of these countries had poor infrastructure, political instability and other challenges that created a lot of complexity. When issues arose, such as fraud or the need to find a new partner halfway through an engagement, I had to act quickly and work with my team to find creative solutions. During this time, I learned valuable skills such as relationship building and commercial negotiation within many different cultural settings.

In your role you help companies determine the type and amount of data they need to improve their AI solutions. How would you describe your approach?

It really starts with getting a clear understanding of the AI product or solution they are building and the current data challenge they are facing. The client might be seeing performance issues with their machine learning models due to data quality or data gaps, or they might have data bias concerns. I work with each client to develop a bespoke solution based on these factors.

In some cases, the client might be a startup that is trying to land a large deal and needs to deliver a proof of concept. They may not have a large budget for data but they have an immediate need so they can get their business off the ground. It’s exciting to help these clients meet important milestones, and hopefully build a trusting long-term relationship as a result.

What advice do you have for companies working in AI when it comes to their data strategy?

Over the years that I’ve worked in the AI data space, I’ve seen firsthand how important it is to have a data strategy, and how at times this gets overlooked. But when companies invest in their data strategy upfront, they reduce the time and effort needed to get their machine learning models performing accurately, and they also reduce the risk of releasing a biased product.

Another lesson I’ve learned is that a bigger dataset is not always better. For example, Large Language Models are hard to change because of their sheer size. It’s better to start small with a custom dataset that is tailored to a product and domain, and to determine how the dataset will be maintained over time.

If someone wants to connect to discuss their data needs with you, how should they get in contact?

Send us a message via our website, find me on LinkedIn, or come and meet us at our booth at one of the many events we will be attending this year. You can also follow us on LinkedIn to hear the latest news.