Research News
Efficient object manipulation planning with Monte Carlo Tree Search
This paper was a finalist for the Best Paper Award on Mobile Manipulation at IROS 2023, one of five finalists out of 2,760 papers submitted to one of the largest robotics conferences in the world.
In robotics, planning how machines touch and maneuver objects is central to giving them the autonomy to carry out complex tasks. Yet it remains a hard problem: finding dynamically feasible sequences of contacts between robotic manipulators and objects quickly runs into formidable combinatorial and nonlinear difficulties.
Consider the seemingly simple act of reorienting an object resting on a table with a two-fingered robotic hand. Planning the contacts requires reasoning about interaction forces: a cube can be rotated by pushing on its sides, but a thin plate requires the fingers to press downward and exploit friction to achieve the same result.
The crux of the challenge is coordinating interaction forces with well-timed contact switches. As the robotic fingers approach their kinematic limits, contacts must be broken and re-established to rotate the object further. Together, these two ingredients, interaction forces and contact switches, are what make object manipulation planning difficult.
Over the past decade, trajectory optimization has become the favored approach to multi-contact motion planning because it offers efficient formulations for handling interaction forces. One problem persists, however: integrating the planning of contact modes, which are discrete by nature and introduce discontinuities into the dynamics at each contact switch.
Now, a team of NYU researchers led by Ludovic Righetti, Associate Professor of Electrical and Computer Engineering and of Mechanical and Aerospace Engineering, as well as a member of the Center for Urban Science + Progress, has developed a strategy for planning object manipulation that leverages Monte Carlo Tree Search (MCTS) to discover good contact sequences. Complementing the search, a trajectory optimization algorithm based on the Alternating Direction Method of Multipliers (ADMM), previously developed by Righetti’s group, the Machines in Motion Laboratory, evaluates the dynamic feasibility of candidate contact sequences.
The team, including Ph.D. students Huaijiang Zhu and Avadesh Meduri, made a key innovation to speed up MCTS: a goal-conditioned policy-value network that guides the search toward promising nodes. Manipulation-specific heuristics further shrink the search space markedly.
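The guidance idea can be sketched generically: in MCTS, a learned policy-value network supplies a prior over actions, and a PUCT-style score biases node selection toward high-prior, high-value children. This is an illustrative sketch under assumed names and constants (e.g. `c_puct`), not the authors' implementation.

```python
import math

class Node:
    """One node of a Monte Carlo search tree over contact sequences (toy)."""
    def __init__(self, prior):
        self.prior = prior        # network's prior probability for this action
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}        # action -> Node

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    """PUCT selection: exploit high-value children, explore high-prior ones."""
    total_visits = sum(child.visits for child in node.children.values())
    def score(child):
        exploration = c_puct * child.prior * math.sqrt(total_visits + 1) / (1 + child.visits)
        return child.value() + exploration
    return max(node.children.items(), key=lambda kv: score(kv[1]))
```

With all children unvisited, the network prior alone decides where the search expands first, which is how a policy-value network steers MCTS toward promising contact sequences.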
The approach was validated through object manipulation experiments both in a physics simulator and on real hardware. Thanks to the learned policy-value network, the method scales well to long manipulation sequences and substantially raises the planning success rate, a significant stride in object manipulation planning.
Zhu, H., Meduri, A., & Righetti, L. (2023, March 19). Efficient object manipulation planning with Monte Carlo Tree Search. arXiv.org. https://arxiv.org/abs/2206.09023
A large-scale analytical residential parcel delivery model evaluating greenhouse gas emissions, COVID-19 impact, and cargo bikes
The e-commerce industry, which has seen remarkable growth over the past decade, surged even faster in the wake of the COVID-19 pandemic. The rise in online shopping has triggered a corresponding boom in the parcel delivery sector, yet the broad social and environmental repercussions of this industry remain poorly understood.
To bridge this gap, researchers at NYU Tandon led by Joseph Chow, Institute Associate Professor of Civil and Urban Engineering and Deputy Director of the C2SMARTER University Transportation Center, have proposed a comprehensive model to quantify the impacts of the parcel delivery surge. The model incorporates a parcel generation process that converts publicly available data into estimates of parcel volumes and delivery destinations, along with a continuous approximation model calibrated to estimate the lengths of parcel service routes.
The model was validated in a real-world case study using data from New York City. The parcel generation process closely matched the actual data, with R² values consistently at 98% or higher, and the model’s output was further validated against real UPS truck journeys.
Applied to the year 2021, the model showed that residential parcel deliveries in NYC accounted for 0.05% of total daily vehicle-kilometers traveled (VKT), equivalent to 14.4 metric tons of carbon emissions per day. The COVID-19 pandemic’s contribution to the surge in parcel deliveries amounted to annual greenhouse gas (GHG) emissions of 1,064.3 metric tons of carbon equivalent (MTCE) within the city, enough to power 532 typical US households for an entire year.
NYC’s existing bike lane infrastructure could already shift 17% of parcel deliveries to eco-friendly cargo bikes, yielding an 11% reduction in VKT. Strategically adding 3 kilometers of bike lanes connecting Amazon facilities would raise the cargo bike substitution benefit from a 5% to a 30% reduction in VKT, and building an additional 28 kilometers of bike lanes citywide could push cargo bike substitution from 17% to 34% of parcel deliveries, saving a further 2.3 MTCE per day.
Notably, prioritizing cargo bike deployments could disproportionately benefit lower-income neighborhoods such as Harlem, Sunset Park, and Bushwick by substantially curtailing GHG emissions in those communities.
Hai Yang, Hector Landes, Joseph Y.J. Chow, "A large-scale analytical residential parcel delivery model evaluating greenhouse gas emissions, COVID-19 impact, and cargo bikes," International Journal of Transportation Science and Technology, 2023, ISSN 2046-0430.
Microscopy image segmentation via point and shape regularized data synthesis
In contemporary deep learning-based methods for segmenting microscopic images, there's a heavy reliance on extensive training data that requires detailed annotations. This process is both expensive and labor-intensive. An alternative approach involves using simpler annotations, such as marking the center points of objects. While not as detailed, these point annotations still provide valuable information for image analysis.
In this study, researchers from NYU Tandon and University Hospital Bonn in Germany assume that only point annotations are available for training and present a novel method for segmenting microscopic images using artificially generated training data. Their framework consists of three main stages:
1. Pseudo Dense Mask Generation: This step takes the point annotations and creates synthetic, detailed masks that are constrained by shape information.
2. Realistic Image Generation: An advanced generative model, trained in a unique way, transforms these synthetic masks into highly realistic microscopic images while maintaining consistency in object appearance.
3. Specialized Model Training: The synthetic masks and generated images are combined to create a dataset used to train a specialized model for image segmentation.
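The three stages above can be caricatured as a toy pipeline. In this sketch, disks stand in for the shape-constrained masks and additive noise stands in for the generative model; every function here is an illustrative assumption, not the authors' actual framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def pseudo_masks(points, size=64, radius=5):
    """Stage 1 (toy): turn point annotations into shape-constrained masks
    (simple disks stand in for the real shape prior)."""
    yy, xx = np.mgrid[:size, :size]
    mask = np.zeros((size, size), dtype=np.uint8)
    for (r, c) in points:
        mask[(yy - r) ** 2 + (xx - c) ** 2 <= radius ** 2] = 1
    return mask

def synthesize_image(mask):
    """Stage 2 (toy): a trained generative model would render a realistic
    microscopy image conditioned on the mask; here we fake it with noise."""
    return mask * 0.8 + rng.normal(0.0, 0.05, mask.shape)

def make_training_pair(points):
    """Stage 3: pair the synthetic image with its mask to train a segmenter."""
    mask = pseudo_masks(points)
    return synthesize_image(mask), mask
```

The point is the data flow: cheap point annotations become dense (image, mask) pairs, so the segmentation model never needs hand-drawn masks.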
The research was led by Guido Gerig, Institute Professor of Computer Science and Engineering and Biomedical Engineering, alongside PhD students Shijie Li and Mengwei Ren, as well as Thomas Ach at University Hospital Bonn. The three NYU Tandon researchers are also members of the Visualization and Data Analytics (VIDA) Research Center.
The researchers tested their method on a publicly available dataset and found that it produced more diverse and realistic images than conventional methods while maintaining a strong correspondence between the input annotations and the generated images. Importantly, segmentation models trained on their synthetic data significantly outperformed models trained using other methods, and performed on par with models trained on labor-intensive, highly detailed annotations.
This research highlights the potential of simplified annotations and synthetic data to streamline microscopy image segmentation, potentially reducing the need for extensive manual annotation. The work, carried out with the Ophthalmology department at University Hospital Bonn, is a first step in a collaboration to process three-dimensional retinal cell images of the human eye from subjects diagnosed with age-related macular degeneration (AMD), a leading cause of vision loss in older adults.
The code for this method is publicly available for further exploration and implementation.
“Microscopy Image Segmentation via Point and Shape Regularized Data Synthesis.” S Li, M Ren, T Ach, G Gerig. arXiv preprint, arXiv:2308.09835, 2023.
NYU Tandon researchers unveil tool to help developers create augmented reality task assistants
Augmented reality (AR) technology has long fascinated both the scientific community and the general public, remaining a staple of modern science fiction for decades.
In the pursuit of advanced AR assistants – ones that can guide people through intricate surgeries or everyday food preparation, for example – a research team from NYU Tandon School of Engineering has introduced Augmented Reality Guidance and User-Modeling System, or ARGUS.
An interactive visual analytics tool, ARGUS is engineered to support the development of intelligent AR assistants that can run on devices like Microsoft HoloLens 2 or MagicLeap. It enables developers to collect and analyze data, model how people perform tasks, and find and fix problems in the AR assistants they are building.
Claudio Silva, NYU Tandon Institute Professor of Computer Science and Engineering and Professor of Data Science at the NYU Center for Data Science, leads the research team that will present its paper on ARGUS at IEEE VIS 2023 on October 26, 2023, in Melbourne, Australia. The paper received an Honorable Mention in the event’s Best Paper Awards.
“Imagine you’re developing an AR AI assistant to help home cooks prepare meals,” said Silva. “Using ARGUS, a developer can monitor a cook working with the ingredients, so they can assess how well the AI is performing in understanding the environment and user actions. Also, how the system is providing relevant instructions and feedback to the user. It is meant to be used by developers of such AR systems.”
ARGUS works in two modes: online and offline.
The online mode is for real-time monitoring and debugging while an AR system is in use. It lets developers see what the AR system sees and how it's interpreting the environment and user actions. They can also adjust settings and record data for later analysis.
The offline mode is for analyzing historical data generated by the AR system. It provides tools to explore and visualize this data, helping developers understand how the system behaved in the past.
ARGUS’ offline mode comprises three key components: the Data Manager, which helps users organize and filter AR session data; the Spatial View, providing a 3D visualization of spatial interactions in the AR environment; and the Temporal View, which focuses on the temporal progression of actions and objects during AR sessions. These components collectively facilitate comprehensive data analysis and debugging.
“ARGUS is unique in its ability to provide comprehensive real-time monitoring and retrospective analysis of complex multimodal data in the development of systems,” said Silva. “Its integration of spatial and temporal visualization tools sets it apart as a solution for improving intelligent assistive AR systems, offering capabilities not found together in other tools.”
ARGUS is open source and available on GitHub under VIDA-NYU. The work is supported by the DARPA Perceptually-enabled Task Guidance (PTG) program.
ARGUS: Visualization of AI-Assisted Task Guidance in AR
Postintervention monitoring of peripheral arterial disease wound healing using dynamic vascular optical spectroscopy
Peripheral arterial disease (PAD) is a vascular disease that is caused by clogging of the arteries due to plaque. The lower legs and feet are often impacted, and symptoms include pain, numbness, difficulty walking, and non-healing wounds.
For many patients, wounds continue not to heal even after they have undergone a surgical revascularization procedure designed to unclog the affected arteries, necessitating another intervention. Determining whether wounds will heal must be done as soon as possible after the first intervention to reduce the duration of a patient’s pain and increase the likelihood of a good outcome.
Currently, the most common methods for monitoring PAD progression (and related wound healing) are the ankle-brachial index (ABI) and ultrasound imaging. The ABI uses the ratio of systolic blood pressure measurements from arteries in the lower extremities to the systolic blood pressure measurement from the brachial artery in the arm. PAD patients generally have pressure ratios that are below a certain threshold (often 0.9). The ABI has low accuracy when monitoring vasculature in diabetic patients and ultrasound imaging has low accuracy when monitoring smaller arteries such as those in the feet. Unfortunately, PAD is often comorbid with diabetes and the affected arteries are commonly in the feet.
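The ABI calculation described above is a simple ratio check. A minimal sketch (pressures in mmHg, using the 0.9 threshold mentioned above; function names are illustrative):

```python
def ankle_brachial_index(ankle_systolic_mmHg, brachial_systolic_mmHg):
    """ABI = ankle systolic pressure / brachial systolic pressure."""
    return ankle_systolic_mmHg / brachial_systolic_mmHg

def suggests_pad(abi, threshold=0.9):
    """Ratios below the threshold (often 0.9) are consistent with PAD."""
    return abi < threshold
```

For example, an ankle pressure of 90 mmHg against a brachial pressure of 120 mmHg gives an ABI of 0.75, below the 0.9 threshold.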
To address the limitations of current technology, a team from the NYU Tandon School of Engineering’s Department of Biomedical Engineering, including lead author Nisha Maheshwari from Andreas Hielscher’s Clinical Biophotonics Laboratory, developed optical imaging technology to help physicians monitor the healing of lower limb ulcers after a surgical intervention.
Dynamic vascular optical spectroscopy (DVOS) is an optical imaging technology that uses red and near-infrared light to characterize blood flow through arteries. In the paper “Postintervention monitoring of peripheral arterial disease wound healing using dynamic vascular optical spectroscopy,” published in the Journal of Biomedical Optics, the team used their DVOS system to monitor 14 patients with PAD who underwent a surgical revascularization procedure. Five of these patients needed a second intervention because of persistent non-healing wounds.
The team was able to correctly categorize the long-term healing and non-healing of wounds in 93% of this patient population within only one month after each patient’s initial intervention. The method outperformed the gold standard methods of ultrasound and ABI. These findings suggest that the DVOS may be able to assist physicians in improving patient outcomes and reducing long-term pain by determining wound outcome earlier than existing technology can.
The authors would like to thank the patients who volunteered for this study for their time and participation. This work was supported in part by the National Heart, Lung, and Blood Institute (Grant No. NHLBI-1R01-HL115336); Wallace H. Coulter Foundation; Society of Vascular Surgery; Columbia University Fu Foundation School of Engineering and Applied Science; and New York University Tandon School of Engineering.
Maheshwari N, Marone A, Altoé M, Kim SHK, Bajakian DR, Hielscher AH. Postintervention monitoring of peripheral arterial disease wound healing using dynamic vascular optical spectroscopy. J Biomed Opt. 2022 Dec;27(12):125002. doi: 10.1117/1.JBO.27.12.125002. Epub 2022 Dec 24. PMID: 36582192; PMCID: PMC9789744.
NYU Tandon School of Engineering researchers develop algorithm for safer self-driving cars
In a promising development for self-driving car technology, a research team at NYU Tandon School of Engineering has unveiled an algorithm — known as Neurosymbolic Meta-Reinforcement Lookahead Learning (NUMERLA) — that could address the long-standing challenge of adapting to unpredictable real-world scenarios while maintaining safety.
The research was conducted by Quanyan Zhu, NYU Tandon associate professor of electrical and computer engineering, and his Ph.D. candidate Haozhe Lei.
Artificial intelligence and machine learning have helped self-driving cars operate in increasingly intricate scenarios, allowing them to process vast amounts of data from sensors, make sense of complex environments, and navigate city streets while adhering to traffic rules.
As they venture beyond controlled environments into the chaos of real-world traffic, however, such vehicles’ performance can falter, potentially leading to accidents.
NUMERLA aims to bridge the gap between safety and adaptability. The algorithm achieves this by continuously updating safety constraints in real-time, ensuring that self-driving cars can navigate unfamiliar scenarios while maintaining safety as the top priority.
The NUMERLA framework operates as follows: When a self-driving car encounters an evolving environment, it uses observations to adjust its “belief” about the current situation. Based on this belief, it makes predictions about its future performance within a specified timeframe. It then searches for appropriate safety constraints and updates its knowledge base accordingly.
The car's policy is adjusted using lookahead optimization with safety constraints, resulting in a suboptimal but empirically safe online control strategy.
One of the key innovations of NUMERLA lies in its lookahead symbolic constraints. By making conjectures about its future mode and incorporating symbolic safety constraints, the self-driving car can adapt to new situations on the fly while still prioritizing safety.
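The loop described above can be caricatured in a few lines. This is a deliberately simplified, hypothetical illustration of belief updating and a belief-dependent safety constraint, not the NUMERLA algorithm; every function name and constant is an assumption.

```python
# Toy 1-D scenario: a car adjusts speed; the "belief" is an estimated
# probability that a pedestrian will cross, updated from observations.

def update_belief(belief, observed_crossing, lr=0.3):
    """Blend the prior belief with the latest observation (0 or 1)."""
    return (1 - lr) * belief + lr * observed_crossing

def safety_gap(belief, base_gap=5.0, scale=20.0):
    """Belief-dependent safety constraint: demand a larger gap (meters)
    when a crossing seems more likely."""
    return base_gap + scale * belief

def choose_speed(gap_to_pedestrian, required_gap, cruise=10.0):
    """Lookahead step (toy): keep cruise speed only if the constraint holds,
    otherwise stop -- safe but suboptimal, as in the article's description."""
    return cruise if gap_to_pedestrian > required_gap else 0.0
```

The structure mirrors the article's description: observations update a belief, the belief updates the safety constraint, and the control policy is re-optimized against that constraint.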
The researchers tested NUMERLA in a computer platform that simulates urban environments – specifically to ascertain its ability to accommodate jaywalkers — and it outperformed other algorithms in those scenarios.
Lei, Haozhe & Zhu, Quanyan. (2023). Neurosymbolic Meta-Reinforcement Lookahead Learning Achieves Safe Self-Driving in Non-Stationary Environments.
New model predicts likely power outages from hurricanes more accurately than conventional predictive techniques
Utility companies are generally well-equipped to handle routine blackouts, but often struggle with extreme weather events like hurricanes.
Conventional hurricane power-outage prediction models often produce incomplete or incorrect results, hampering companies’ abilities to prepare to restore power as quickly as possible, especially in cities that are susceptible to prolonged hurricane-induced power outages.
New research from NYU Tandon School of Engineering may help solve that problem.
By combining wind speed and precipitation data with data about an area’s land use patterns — which reflect variations in power infrastructure between rural and urban areas — and population density — as an indicator of the number of transformers present — researchers are moving towards a more accurate physics-driven hurricane-induced power outage predictive model than techniques currently in widespread use.
Luis Ceferino, a civil and urban engineering (CUE) assistant professor, and Prateek Arora, a CUE Ph.D. candidate, presented the research at the 14th International Conference on Applications of Statistics and Probability in Civil Engineering (ICASP 14), held July 9–13, 2023, in Dublin, Ireland.
In May 2023, Natural Hazards and Earth System Sciences published the duo’s paper evaluating the limits of existing power-outage prediction models. The paper discussed those models’ restricted applicability due to reliance on data from specific regions and utility companies; unbounded predictions; difficulties in extrapolating to high wind conditions; and inadequate handling of uncertainties and variance in outage data during extreme weather events.
Compensating for those constraints, the research team is training its model with historical outage data from Hurricanes Harvey (2017), Michael (2018), and Isaias (2020). The model accounts for the nonlinear relationships between input parameters — meaning changes in one variable that do not result in proportional or consistent changes in another variable — and the likelihood of power outages.
In its ICASP 14 paper, the team focused on two key performance indices: the System Average Interruption Frequency Index (SAIFI) and the System Average Interruption Duration Index (SAIDI). SAIFI measures how often customers experience power outages and SAIDI reflects the total time customers spend without power in a year. These indices are pivotal in determining the efficiency and resilience of power systems during extreme weather events.
The research team used probabilistic modeling to compute SAIFI and SAIDI for a 10-year return period in New Jersey. The analysis revealed that rural areas face a greater likelihood of outages than urban areas when wind speed is the only damaging factor. The team is continuing to build the model, and upcoming research will incorporate storm surge effects, which are especially relevant for coastal blackout predictions.
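Both indices have standard definitions (IEEE Std 1366): each sums over outage events and divides by the total number of customers served. A minimal sketch:

```python
def saifi(customers_interrupted_per_event, total_customers):
    """SAIFI: average number of interruptions a customer experiences."""
    return sum(customers_interrupted_per_event) / total_customers

def saidi(customer_minutes_per_event, total_customers):
    """SAIDI: average interruption duration (here, minutes) per customer served."""
    return sum(customer_minutes_per_event) / total_customers
```

For a utility serving 10,000 customers, two events interrupting 1,000 and 500 customers give a SAIFI of 0.15 interruptions per customer; 60,000 and 30,000 customer-minutes of interruption give a SAIDI of 9 minutes per customer.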
By mapping out the potential scenarios and probabilities of power disruptions, this research project can equip stakeholders including utility companies and regulatory bodies with insights for strategic decision-making. This could include targeted resource allocation, infrastructure upgrades, and even the development of emergency response plans that mitigate the adverse impact of hurricanes on power systems.
New AI model developed at NYU Tandon can alter apparent ages of facial images while retaining identifying features, a breakthrough in the field
NYU Tandon School of Engineering researchers developed a new artificial intelligence technique to change a person’s apparent age in images while maintaining their unique identifying features, a significant step forward from standard AI models that can make people look younger or older but fail to retain their individual biometric identifiers.
In a paper published in the proceedings of the IEEE International Joint Conference on Biometrics (IJCB), Sudipta Banerjee, the paper’s first author and a research assistant professor in the Computer Science and Engineering (CSE) Department, and colleagues trained a type of generative AI model – a latent diffusion model – to “know” how to perform identity-retaining age transformation.
To do this, Banerjee – working with CSE PhD candidate Govind Mittal and PhD graduate Ameya Joshi, under the guidance of Chinmay Hegde, CSE associate professor and Nasir Memon, CSE professor – overcame a typical challenge in this type of work, namely assembling a large set of training data consisting of images that show individual people over many years.
Instead, the team trained the model with a small set of images of an individual, along with a separate set of images with captions indicating the age category of the person represented: child, teenager, young adult, middle-aged, elderly, or old. This set included images of celebrities captured throughout their lives.
The model learned the biometric characteristics that identified individuals from the first set. The age-captioned images taught the model the relationship between images and age. The trained model could then be used to simulate aging or de-aging by specifying a target age using a text prompt.
Researchers employed a method called "DreamBooth" for editing human face images by gradually modifying them using a combination of neural network components. The method involves adding and removing noise – random variations or disturbances – to images while considering the underlying data distribution.
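The "adding noise" half of that process has a standard closed form in diffusion models: a clean image x0 is corrupted to x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise, and the network is trained to undo the corruption. A generic sketch of the forward process (not the paper's code):

```python
import numpy as np

def forward_noise(x0, alpha_bar_t, rng=None):
    """Standard diffusion forward process:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise,
    where alpha_bar_t in [0, 1] controls how corrupted the image is."""
    if rng is None:
        rng = np.random.default_rng(0)
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise
```

At alpha_bar_t = 1 the image is untouched; as alpha_bar_t approaches 0 it becomes pure noise, which is the corruption the denoising network learns to reverse.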
The approach utilizes text prompts and class labels to guide the image generation process, focusing on maintaining identity-specific details and overall image quality. Various loss functions are employed to fine-tune the neural network model, and the method's effectiveness is demonstrated through experiments on generating human face images with age-related changes and contextual variations.
The researchers tested their method against existing age-modification methods by having 26 volunteers match the generated image with an actual image of the person, and with ArcFace, a facial recognition algorithm. Their method outperformed the others, with a decrease of up to 44% in the rate of incorrect rejections.
People’s everyday pleasures may improve cognitive arousal and performance
UPDATE March 4, 2024: The data set that Faghih’s lab collected for this research is now available to the global research community on the PhysioNet platform. This dataset is unique, offering real-world insights into how common pleasures affect our physiological responses and cognitive performance.
The potential of this dataset is vast. It opens new avenues for research into the influence of everyday experiences on cognitive performance, potentially leading to smarter work environments or personalized life-enhancing strategies. Imagine tailoring your work environment with specific sounds or scents to boost productivity and creativity. By analyzing this dataset, researchers can discover patterns and connections previously unseen. This could lead to breakthroughs in understanding how to harness everyday experiences to enhance cognitive abilities. Ultimately, this research could pave the way for innovative applications in workplace productivity enhancement and educational method improvement.
“This dataset is more than a collection of data points; it is a window into the intricate relationship between daily pleasures and our brain's performance,” says Fekri Azgomi, Faghih’s former PhD student who collected this data. “As our lab, the Computational Medicine Laboratory, shares this dataset with the world, we are excited about the endless possibilities it holds for advancing our understanding of the human mind and enhancing everyday life.”
Original story below.
Listening to music and drinking coffee are the sorts of everyday pleasures that can impact a person’s brain activity in ways that improve cognitive performance, including in tasks requiring concentration and memory.
That’s a finding of a new NYU Tandon School of Engineering study involving MINDWATCH, a groundbreaking brain-monitoring technology.
Developed over the past six years by NYU Tandon's Biomedical Engineering Associate Professor Rose Faghih, MINDWATCH is an algorithm that analyzes a person's brain activity from data collected via any wearable device that can monitor electrodermal activity (EDA). This activity reflects changes in electrical conductance triggered by emotional stress, linked to sweat responses.
In this recent MINDWATCH study, published in Nature Scientific Reports, subjects wearing skin-monitoring wristbands and brain-monitoring headbands completed cognitive tests while listening to music, drinking coffee, and sniffing perfumes reflecting their individual preferences. They also completed the same tests without any of those stimuli.
The MINDWATCH algorithm revealed that music and coffee measurably altered subjects’ brain arousal, essentially putting them in a physiological “state of mind” that could modulate their performance in the working memory tasks they were performing.
Specifically, MINDWATCH determined the stimulants triggered increased “beta band” brain wave activity, a state associated with peak cognitive performance. Perfume had a modest positive effect as well, suggesting the need for further study.
“The pandemic has impacted the mental well-being of many people across the globe and now more than ever, there is a need to seamlessly monitor the negative impact of everyday stressors on one's cognitive function,” said Faghih. “Right now MINDWATCH is still under development, but our eventual goal is that it will contribute to technology that could allow any person to monitor his or her own brain cognitive arousal in real time, detecting moments of acute stress or cognitive disengagement, for example. At those times, MINDWATCH could ‘nudge’ a person towards simple and safe interventions — perhaps listening to music — so they could get themselves into a brain state in which they feel better and perform job or school tasks more successfully.”
The specific cognitive test used in this study — a working memory task, called the n-back test — involves presenting a sequence of stimuli (in this case, images or sounds) one by one and asking the subject to indicate whether the current stimulus matches the one presented "n" items back in the sequence. This study employed a 1-back test — the participant responded "yes" when the current stimulus is the same as the one presented one item back — and a more challenging 3-back test, asking the same for three items back.
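The matching rule behind the n-back test is easy to state in code. A minimal sketch of computing the target answers and scoring a subject's yes/no responses (names are illustrative):

```python
def n_back_targets(stimuli, n):
    """True at position i when stimuli[i] matches the stimulus n items back."""
    return [i >= n and stimuli[i] == stimuli[i - n] for i in range(len(stimuli))]

def score_responses(stimuli, n, responses):
    """Fraction of trials where the subject's yes/no response was correct."""
    targets = n_back_targets(stimuli, n)
    correct = sum(t == r for t, r in zip(targets, responses))
    return correct / len(stimuli)
```

For the sequence A, B, A, B with n = 2, positions three and four are targets because each matches the stimulus two items earlier; the 3-back variant simply sets n = 3.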
Researchers tested three types of music: energetic and relaxing music familiar to the subject, as well as novel AI-generated music reflecting the subject’s tastes. Consistent with prior MINDWATCH research, familiar energetic music delivered bigger performance gains, as measured by reaction times and correct answers, than relaxing music. While AI-generated music produced the biggest gains of the three, further research is needed to confirm those results.
Drinking coffee led to notable but less-pronounced performance gains than music, and perfume had the most modest gains.
Performance gains under all stimulations tended to be higher on the 3-back tests, suggesting interventions may have the most profound effect when “cognitive load” is higher.
Ongoing experimentation by the MINDWATCH team will further test the technology’s ability to monitor brain activity consistently, as well as the general effectiveness of various interventions in modulating that activity. Even an intervention that works well in general will not necessarily work for every individual.
The research was performed as a part of Faghih’s National Science Foundation CAREER award on the Multimodal Intelligent Noninvasive brain state Decoder for Wearable AdapTive Closed-loop arcHitectures (MINDWATCH) project. The study's diverse dataset is available to researchers, allowing additional research on the use of the safe interventions in this study to modulate brain cognitive states.
Faghih served as the senior author for this paper. Its first author is Hamid Fekri Azgomi, who earned his Ph.D. under Faghih and is now a postdoctoral scholar of neurological surgery at the University of California San Francisco School of Medicine.
Fekri Azgomi, H., F. Branco, L.R., Amin, M.R. et al. Regulation of brain cognitive states through auditory, gustatory, and olfactory stimulation with wearable monitoring. Sci Rep 13, 12399 (2023). https://doi.org/10.1038/s41598-023-37829-z
New app developed at NYU Tandon promises to make navigating subway stations easier for people with blindness and low vision
A new trip-planning app has shown encouraging results in improving navigation inside subway stations, according to a study published in IEEE Journal of Translational Engineering in Health and Medicine, pointing toward easier commutes for people who are blind or have low vision.
Designed by researchers at NYU Tandon School of Engineering and NYU Grossman School of Medicine, Commute Booster routes public-transportation users through the “middle mile” — the part of a journey inside subway stations or other similar transit hubs — in addition to the “first” and “last” miles that bring travelers to and from those hubs.
“The ‘middle mile’ often involves negotiating a complex network of underground corridors, ticket booths and subway platforms. It can be treacherous for people who cannot rely on sight,” said John-Ross Rizzo, MD, who led the research team that includes advisors from New York City’s Metropolitan Transit Authority (MTA). Rizzo is an associate professor in NYU Tandon’s Biomedical Engineering department and is on the faculty of NYU Grossman. “Most GPS-enabled navigation apps address ‘first’ and ‘last’ miles only, so they fall short of meeting the needs of blind or low-vision commuters. Commute Booster is meant to fill that gap.”
Subway signs are typically graphical or text-based, making them difficult for people with visual impairments to recognize from a distance and limiting their autonomy in unfamiliar environments.
Commute Booster automatically figures out what signs a traveler will encounter along the way to a specific subway platform. Then, it uses a smartphone’s camera to recognize and interpret signs posted inside transit hubs, ignoring irrelevant ones and prompting users to follow relevant ones only.
In the recent study, researchers tested Commute Booster’s interpretation of signage from three New York City subway stations — Jay Street-Metrotech, Dekalb Avenue and Canal Street — that a traveler would encounter on a specific journey. The app proved 97 percent accurate in identifying signs relevant to reach the intended destination.
Testing inside those three subway stations also revealed that Commute Booster could “read” signs from distances and at angles that reflect expected physical positioning of travelers.
The Commute Booster system relies on two technological components. The first, general transit feed specification (GTFS), is a standardized way for public transportation agencies to share their transit data with developers and third-party applications. The second, optical character recognition (OCR), is technology that can translate images of text into actual editable text.
The GTFS dataset contains descriptions of the locations and pathways within each subway station. Commute Booster’s algorithm uses this information to generate a comprehensive list of the wayfinding signage users would encounter during their intended journey. The OCR functionality reads the text visible in the user's immediate surroundings, and the algorithm identifies relevant navigation signs and locates their position in the environment. By integrating these two components, Commute Booster provides real-time feedback on the presence or absence of relevant navigation signs within the field of view of the user's phone camera during the journey.
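The interplay between the two components can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the data structures and function names (`Sign`, `expected_signs`, `match_relevant`) are hypothetical, and it assumes the GTFS pathways data has already been parsed into a mapping from station nodes to the signs posted there, and that an OCR engine has already returned the text strings in the current camera frame.

```python
from dataclasses import dataclass

@dataclass
class Sign:
    text: str      # wayfinding text, as listed in the GTFS pathways data
    node_id: str   # station node (corridor, mezzanine, platform) it belongs to

def expected_signs(route_nodes, pathway_signs):
    """Collect, in order, the wayfinding signs a traveler should
    encounter along a planned route through the station."""
    return [sign
            for node in route_nodes
            for sign in pathway_signs.get(node, [])]

def match_relevant(ocr_texts, expected):
    """Return the expected signs whose text appears in the OCR output
    of the current camera frame (case-insensitive substring match),
    so irrelevant signage can be ignored."""
    lowered = [t.lower() for t in ocr_texts]
    return [sign for sign in expected
            if any(sign.text.lower() in t for t in lowered)]

# Example: a two-node route and one camera frame of OCR output.
pathway_signs = {
    "mezzanine": [Sign("Uptown & The Bronx", "mezzanine")],
    "platform":  [Sign("Exit Canal St", "platform")],
}
route = expected_signs(["mezzanine", "platform"], pathway_signs)
visible = match_relevant(["UPTOWN & THE BRONX  1 2 3", "Restrooms"], route)
# 'visible' now holds only the route-relevant sign seen in this frame.
```

In the real app this matching would run continuously on camera frames, prompting the user when an expected sign enters or leaves the field of view; a production version would also need fuzzier matching to tolerate OCR errors.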
Researchers plan to conduct a human subject study of Commute Booster soon, and the app could be available for public use in the near term.
Rizzo, who was named to MTA’s board in June 2023, has a long track record of research that applies engineering solutions to challenges faced by people with disabilities, particularly those with visual disabilities.
In addition to Rizzo, the team involved in the Commute Booster study included NYU Tandon PhD candidate Junchi Feng; Physician-Scientist at NYU Langone’s Rusk Rehabilitation Mahya Beheshti; MTA Senior Innovation Strategist Mira Philipson; MTA Senior Accessibility Officer Yuvraj Ramsaywack; and NYU Tandon Institute Professor Maurizio Porfiri.
This research was supported by the National Science Foundation, the National Eye Institute and Fogarty International Center, as well as by the U.S. Department of Defense.