Artificial intelligence (AI) has seen some amazing breakthroughs in the past few months, most recently in computer vision, which trains computers to analyze visual data and make decisions based on that analysis, and in natural language processing (NLP), which teaches computers to understand human language in spoken and written form. Each of these disciplines is powerful in its own right. Together, they have the potential to help companies solve problems in new ways, saving time, improving customer service, and developing stronger connections between what people see, hear and say. In this blog post, we’ll highlight real-world breakthroughs and identify the potential business benefits of integrating natural language processing and computer vision.

Real-world examples of integrating computer vision and NLP

A few companies have already started integrating computer vision and NLP, specifically with the goal of providing greater accessibility. One example is the OrCam MyEye, a device that clips onto a pair of glasses. The MyEye’s optical sensor combines computer vision and natural language processing to take in the wearer’s surroundings and then audibly describe what it sees, whether physically around them or on a page in front of them.

Aside from this, perhaps the most notable breakthroughs we’ve seen are the limited release of OpenAI’s Dall-E 2 algorithm, which transforms written scene descriptions into “original, realistic” images, and Make-A-Scene, a research project offering much the same capabilities, announced by Meta just a few days ago.

It’s easy to dismiss experiments like these as nothing more than cutting-edge parlor tricks, but both demonstrate the potential of integrating NLP and computer vision, and hint at what may soon be possible as systems like Dall-E 2 and Make-A-Scene become the foundation for other applications.

Future possibilities to solve long-standing problems

Computer vision gives a system the ability to sense its surroundings and process the information it takes in. Likewise, NLP enables the understanding of spoken or written language, along with knowing which words to string together to communicate a prescribed message, much the same way humans do. And because computers can perform these tasks consistently and without fatigue, they’re well suited to repeatedly detecting objects, recognizing patterns and communicating back what they see.

By combining these two abilities, companies can begin to create accurate descriptions, either visually or verbally, exemplifying the old adage that “a picture is worth a thousand words.”

With this in mind, SciForce, a Ukrainian AI development company, identified a few areas in which computer vision and NLP can help solve some “long-standing problems”:

Describing medical images

Patient care teams rely heavily on CT, PET, MRI, and X-ray imagery to diagnose patients and determine the best treatment options. Computer vision has already proven a worthy tool in analyzing such imagery, and in helping radiation oncologists to accurately diagnose cancer patients and develop personalized treatment plans. Given the shortage of qualified medical personnel, computer vision and NLP together could be a useful tool not only for performing an initial analysis of imagery and helping doctors diagnose a patient, but also for preparing an initial report of the findings, saving valuable time in the treatment process.

Visualizing creative briefs and customer requirements

In any field involving the design and fabrication of a physical product or space, creation of the end product can be fraught with miscues and unmet expectations. Rather than relying solely on examples of other work that a client likes, converting their project requirements into a visual could help shorten the initial phases of a project, saving time and money.

Bridging the communications divide

Perhaps the most potent example of integrating computer vision and NLP is the ability to empower people who are deaf, hard of hearing, blind or visually impaired. According to the CDC, approximately “12 million people 40 years or older in the United States have vision impairment” and “15% of American adults (37.5 million) aged 18 and over reported some trouble hearing without a hearing aid.” 

Integrating NLP and computer vision enables the design of assistive technology solutions for people who are deaf, for example by converting their sign language into visuals or text. Conversely, someone else’s speech could be transformed into an image, thus bridging the communications gap. And for people who are blind or visually impaired, a solution could use computer vision to take in their physical surroundings and then verbally describe them.
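
To make the last idea a little more concrete, here is a minimal Python sketch of the "describe the surroundings" step. It is only an illustration under stated assumptions: the Detection record, the region and describe_scene helpers, and the sample values are all hypothetical, and a simple sentence template stands in for a real language model. In practice the detections would come from a trained vision model and the resulting sentence would be handed to a text-to-speech engine.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    """One object reported by a (hypothetical) vision model."""
    label: str         # e.g. "chair"
    confidence: float  # model score between 0 and 1
    x_center: float    # horizontal position: 0.0 (far left) to 1.0 (far right)


def region(x_center: float) -> str:
    """Map a horizontal position to a phrase a listener can act on."""
    if x_center < 1 / 3:
        return "on your left"
    if x_center > 2 / 3:
        return "on your right"
    return "ahead of you"


def describe_scene(detections, min_confidence=0.5):
    """Turn raw detections into one sentence suitable for reading aloud."""
    # Drop low-confidence detections so the description stays trustworthy.
    kept = [d for d in detections if d.confidence >= min_confidence]
    if not kept:
        return "I cannot identify anything nearby."

    # Group identical objects that fall in the same region of the view.
    counts = Counter((d.label, region(d.x_center)) for d in kept)

    phrases = []
    for (label, where), n in counts.items():
        # Naive article and plural handling; a real system would need more care.
        noun = f"a {label}" if n == 1 else f"{n} {label}s"
        phrases.append(f"{noun} {where}")

    if len(phrases) == 1:
        body = phrases[0]
    else:
        body = ", ".join(phrases[:-1]) + " and " + phrases[-1]
    return f"I can see {body}."


# Sample input, invented purely for illustration.
sample = [
    Detection("chair", 0.90, 0.10),
    Detection("chair", 0.80, 0.20),
    Detection("door", 0.95, 0.50),
    Detection("dog", 0.30, 0.90),  # below the threshold, so it is left out
]
print(describe_scene(sample))
```

The confidence threshold is the main design choice here: for an assistive device, leaving out an uncertain object is usually safer than describing something that is not there, though the right cutoff would depend on the model and the setting.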

Together, computer vision and NLP have vast potential to benefit society and solve problems in completely new ways. The examples above show how companies can combine the best of both technologies to create something truly unique.