Clearing the Way
Using AI to help blind and low vision users ‘see’
For many blind people, entering a new environment can be like stepping into terra incognita. Unfamiliar settings can become obstacle courses that require a sighted guide's help to navigate, complicating daily life and impeding independent movement.
Brian A. Smith, assistant professor of computer science, thought there must be a better way. Smith, who leads Columbia’s Computer-Enabled Abilities Laboratory, set out to develop an unobtrusive, wearable system that blind users can deploy anywhere to instantly convert visual information into audio cues. Pairing a computer-vision-enabled device with a novel interface, the system registers object position in a setting and then records a personal, detailed audio guide—one users can freely share in what he hopes could ultimately become a sort of crowdsourced aural “Google Maps.”
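The article does not detail the system's internals, but its core idea, turning detected object positions into spatialized audio, can be sketched. The following is a minimal, hypothetical illustration in Python; the DetectedObject fields and the pan and volume mappings are assumptions made for clarity, not the lab's actual design.

```python
import math
from dataclasses import dataclass

@dataclass
class DetectedObject:
    """A single object reported by a computer-vision pipeline (assumed schema)."""
    label: str   # e.g. "door", "chair"
    x: float     # horizontal offset from the user in meters (+ = right)
    z: float     # forward distance from the user in meters

def audio_cue(obj: DetectedObject) -> dict:
    """Map an object's position to stereo audio parameters.

    Pan places the sound in the object's direction; volume falls off
    with distance so nearer obstacles sound louder.
    """
    angle = math.atan2(obj.x, obj.z)                   # bearing in radians; 0 = straight ahead
    pan = max(-1.0, min(1.0, angle / (math.pi / 2)))   # -1 = hard left, +1 = hard right
    distance = math.hypot(obj.x, obj.z)
    volume = 1.0 / (1.0 + distance)                    # simple distance attenuation
    return {"label": obj.label, "pan": pan, "volume": volume}

# Example: a chair two meters ahead and slightly to the left
print(audio_cue(DetectedObject("chair", x=-0.5, z=2.0)))
```

A real system would render these parameters through spatial audio hardware and update them continuously as the user moves; the sketch only shows the shape of the vision-to-audio translation.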
It’s part of Smith’s larger project to combine AI, gaming, augmented reality, sensing, and computer vision to help people better experience their world.
“Much of my research focuses on a central theme: how can machines represent our very complex visual world through audio?” says Smith. “I want to give blind users the information they need to navigate independently and the confidence to make decisions for themselves.”
To do that, Smith homes in on the fundamental ways sighted people use visual cues to navigate their environments, an approach well illustrated by a gaming interface he created.
Rather than just adding a voice-over that tells players when to turn left or right, for instance—reducing the experience to a simple test of reaction speed—Smith sought to replicate the immersive world of video games without overloading players with excessive detail.
To create an actionable sound design, dubbed the RAD (racing auditory display) in honor of an early ’80s racing game called Rad Racer, Smith zeroed in on the player’s cognitive process. “I thought about the decisions that sighted players make from moment to moment and the information that they base those decisions on,” he says. “I then tried to ensure audio cues convey that same information so blind gamers get to make the same decisions from the same key pieces of information.” He designed the RAD’s two audio cues—its sound slider and turn indicator—to replicate that in real time.
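To make the cue design concrete, here is a minimal sketch of how the two cues might map game state to audio in real time. The function names, thresholds, and state fields are hypothetical stand-ins; the actual RAD uses carefully tuned spatial audio rather than these toy mappings.

```python
from typing import Optional

def sound_slider(car_x: float, track_center_x: float, track_half_width: float) -> float:
    """Pan a continuous tone to convey the car's lateral position:
    centered audio means the car is centered on the track; a hard
    left or right pan means it is drifting toward that edge."""
    offset = (car_x - track_center_x) / track_half_width
    return max(-1.0, min(1.0, offset))  # -1 = far left, +1 = far right

def turn_indicator(distance_to_turn: float,
                   turn_direction: str,
                   sharpness: float) -> Optional[str]:
    """Announce an upcoming turn early enough to react, encoding its
    direction and sharpness rather than a bare 'turn now' prompt."""
    TRIGGER_DISTANCE = 60.0  # meters of lead time; an assumed value
    if distance_to_turn <= TRIGGER_DISTANCE:
        strength = "sharp" if sharpness > 0.5 else "gentle"
        return f"{strength} {turn_direction} turn ahead"
    return None

# Example: car drifting right while a sharp left turn approaches
pan = sound_slider(car_x=2.4, track_center_x=0.0, track_half_width=4.0)
cue = turn_indicator(distance_to_turn=45.0, turn_direction="left", sharpness=0.8)
print(pan, cue)
```

The design point the sketch preserves is Smith's: each cue carries the same information a sighted player would extract visually, so the blind player faces the same moment-to-moment decisions, not a reaction-time test.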
Next up, Smith’s team is using these principles to render a wide variety of games accessible to blind people. In the meantime, his insights are making the real world that much easier to navigate.