FReAD is a smart belt for the visually impaired that assists in identifying everyday objects. It aims to imbue the wearer with an augmented sense of perception that seems to emanate from their own hands: when the user picks up an object, the belt, paired with a Bluetooth headset, speaks an audio description of that object using computer vision. We have described this result as akin to having a computational sense of touch.

Overview of the operation of FReAD: when the user's hands are visible, light audio tones are emitted; when the user picks up an object and explores it with their tactile senses, audio feedback is given for detected features such as text and logos.

Motivation

Most accessibility work seems to follow a ‘deficit’ model, replacing a visual task with a point solution. Our motivation was instead to provide visually impaired persons with new kinds of sensations that allow them to explore, imagine, play, and learn seamlessly.

During our initial user studies, we found that although several AI apps for object identification already exist, they suffer from severe usability issues: users had difficulty framing photos and orienting objects correctly. More interestingly, participants had an overwhelming need to touch the object and build a model of its tactile features. We saw this as an aspect of their lived experience worth augmenting, rather than simply substituting for their lost vision.

FReAD camera view of a user holding a bottle of mango drink

Implementation

We used a 3D-printed belt buckle to hold our RGB camera together with the LeapMotion sensor. For the purposes of the study, both were connected to a laptop carried in a backpack, but the setup could easily be miniaturized. Using the LeapMotion data and a set of custom heuristics, we detected common hand gestures robustly. Depending on the gesture, trigger code would invoke OCR routines, a crowd-worker API (CloudSight), or cloud-based object detection APIs such as Microsoft Cognitive Services. When responses arrive, they are voiced to the user through a text-to-speech engine; a minimal sketch of this dispatch logic is given below.
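The sketch below illustrates the gesture-to-response routing in Python. The gesture names, the handle_gesture entry point, and the use of pytesseract and pyttsx3 as stand-ins for the OCR and text-to-speech engines are illustrative assumptions, not the exact components used in the study.

```python
# Sketch of FReAD's gesture-triggered dispatch loop (illustrative only).
# pytesseract and pyttsx3 stand in for the OCR and TTS engines; the
# gesture names are hypothetical placeholders for the LeapMotion
# heuristics described above.
import pytesseract
import pyttsx3
from PIL import Image

tts = pyttsx3.init()

def speak(text: str) -> None:
    """Voice a response to the user over the Bluetooth headset."""
    tts.say(text)
    tts.runAndWait()

def describe_object(frame: Image.Image) -> str:
    """Hypothetical wrapper around a cloud object-detection or
    crowd-worker API (e.g. Microsoft Cognitive Services, CloudSight)."""
    return "a bottle of mango drink"  # placeholder response

def handle_gesture(gesture: str, frame: Image.Image) -> None:
    """Route a detected hand gesture to the matching recognition backend."""
    if gesture == "trace":    # finger tracing text on the object
        text = pytesseract.image_to_string(frame).strip()
        if text:
            speak(text)
    elif gesture == "hold":   # object held up in front of the camera
        speak(describe_object(frame))
```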

The choice of form factor enabled easy framing of objects, while the deliberately shallow camera focus blurred background details, allowing image-captioning software to correctly identify the object of interest.
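For reference, a hedged sketch of what such a captioning request might look like against the Microsoft Cognitive Services describe endpoint. The region, API version, and environment-variable key handling here are assumptions for illustration, not the exact configuration we used.

```python
# Hypothetical example of requesting a caption for a camera frame from
# the Microsoft Cognitive Services (Azure Computer Vision) "describe"
# endpoint. Region, API version, and key handling are assumptions.
import os
import requests

ENDPOINT = "https://westus.api.cognitive.microsoft.com/vision/v3.2/describe"

def caption_image(jpeg_bytes: bytes) -> str:
    resp = requests.post(
        ENDPOINT,
        headers={
            "Ocp-Apim-Subscription-Key": os.environ["VISION_KEY"],
            "Content-Type": "application/octet-stream",
        },
        data=jpeg_bytes,
    )
    resp.raise_for_status()
    captions = resp.json()["description"]["captions"]
    # Take the highest-confidence caption, if any were produced.
    return captions[0]["text"] if captions else "unknown object"
```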

Closeup of device on user

User Studies

You can find complete information about our user trials and evaluations in the submitted paper.