Julia Stoyanovich
Institute Associate Professor
Director of the Center for Responsible AI
Dr. Julia Stoyanovich is Institute Associate Professor of Computer Science and Engineering, Associate Professor of Data Science, Director of the Center for Responsible AI, and a member of the Visualization and Data Analytics Research Center at New York University. She is a recipient of the Presidential Early Career Award for Scientists and Engineers (PECASE) and a Senior Member of the Association for Computing Machinery (ACM). Julia’s goal is to make “Responsible AI” synonymous with “AI”. She works towards this goal by engaging in academic research, education, and technology policy, and by speaking about the benefits and harms of AI to practitioners and members of the public. Julia’s research interests include AI ethics and legal compliance, and data management and AI systems. Julia is engaged in technology policy and regulation in the US and internationally, having served on the New York City Automated Decision Systems Task Force, by mayoral appointment, among other roles. She received her M.S. and Ph.D. degrees in Computer Science from Columbia University, and a B.S. in Computer Science and in Mathematics & Statistics from the University of Massachusetts at Amherst.
NYU Affiliations:
- NYU Center for Data Science
- Computer Science & Engineering at NYU Tandon School of Engineering
Awards and Recognition
- NSF CAREER: Querying Evolving Graphs (2018)
- Member of the NYC Automated Decision Systems Task Force, appointed by Mayor de Blasio (2018)
- Co-PI on an NSF-BSF grant: Databases Meet Computational Social Choice (collaborative with UC Santa Cruz and the Technion) (2018)
- Lead PI on an NSF BIGDATA grant: Foundations of Responsible Data Management (collaborative with UW, UMich, and UMass Amherst) (2017)
Research News
War's educational toll: NYU Tandon research reveals 78,000 Ukrainian students directly impacted by Russian war
Russia's invasion of Ukraine has displaced approximately 36,500 graduating high school students — 16% of the country's 2022 senior class — while causing an additional 41,500 students to abandon the traditional pathway to higher education entirely, according to a new study published in Nature's Humanities and Social Sciences Communications.
The research, conducted by a multi-disciplinary team based in the United States and Ukraine and led by Julia Stoyanovich — Director of NYU's Center for Responsible AI, Institute Associate Professor of Computer Science and Engineering at NYU Tandon School of Engineering, and Associate Professor of Data Science at the NYU Center for Data Science — shows that at least 78,000 students (34% of all graduating high school seniors) were directly impacted by the war in 2022.
The team completed the study as part of the RAI for Ukraine Research Program, which Stoyanovich founded at NYU Tandon with partners from Ukrainian Catholic University in Lviv, in response to the war's disruption of Ukrainian higher education. The remote program is open to undergraduate and graduate students who live in Ukraine and are enrolled in degree programs in computer science, information systems, and related fields at accredited Ukrainian universities.
These students — RAI Research Fellows — are mentored by academic researchers from U.S. and European universities, and conduct cutting-edge collaborative research on a range of responsible AI topics. Students receive academic credit and competitive stipends.
The Nature study represents the first systematic analysis of student displacement and educational disruption following Russia's 2022 invasion, providing data for policymakers and humanitarian organizations.
"To the best of our knowledge, no information is available about the impact of the war on the internal and external displacement of high school students," said Stoyanovich. "Our analysis has important implications for governmental organizations and human rights organizations working to address the crisis."
Of the 36,500 displaced students identified, 64% migrated abroad, with most heading to Poland (30.7%), Germany (26.9%), and the Czech Republic (8.3%). The remaining 36% were internally displaced within Ukraine, typically moving from front-line regions toward the central and western parts of the country.
The regions most affected were those along the war's front lines: Kherson, Donetsk, Luhansk, Kharkiv, and Mykolaiv oblasts, where between 41% and 100% of students registered in those regions took their exams elsewhere.
The analysis also uncovered disparities in how different demographic groups experienced the war's educational impacts. Among displaced students, 84% came from urban areas despite rural students making up 31% of all test-takers.
The most severely affected group was rural male students, who experienced the greatest decrease in exam participation. "The impact of the war on drop-off for rural-males was greater than for either test-takers living in rural areas or males, indicating an intersectional disadvantage," said Stoyanovich.
Beyond displacement, the study documented a 21% decline in students taking Ukraine's standardized higher education entrance exam in 2022 compared to 2021—representing 41,500 fewer students.
Ukraine's response included rapidly digitizing its paper-based exam system into a computer-based National Multi-subject Test. This transition required developing new software and delivering it to hundreds of thousands of students, making the exam available in 32 countries worldwide for the first time.
The study's methodology relied on comparing students' official registration locations with where they physically completed their standardized exams, a novel approach that revealed displacement patterns invisible to traditional surveys. The researchers overcame significant technical challenges to create their analysis, curating "a uniquely comprehensive dataset of standardized exam outcomes used for admissions to higher education institutions in Ukraine—analogous to the Standardized Aptitude Test (SAT) in the United States," according to the researchers. The dataset encompasses approximately 1.5 million graduating students across eight years.
Ukraine's period of decommunization and decentralization between 2016 and 2023 created substantial data consistency challenges. To solve this problem, researchers assigned unique identifiers to each physical location and educational institution, allowing them to track entities consistently despite name changes and territorial redistricting.
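As a rough illustration of this approach, the sketch below flags displacement by comparing each student's registration region with the region where they sat the exam, and maps region names to stable identifiers so that renamed or redistricted places remain comparable. The column names, region codes, and toy records are assumptions made for this example, not fields from the study's actual dataset.

```python
# Illustrative sketch only: column names, codes, and records below are
# hypothetical and are not taken from the study's dataset.
import pandas as pd

# Toy records: where each student was registered vs. where they sat the exam.
exams = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "home_region": ["Kherson", "Kharkiv", "Lviv", "Donetsk"],
    "exam_region": ["Lviv", "Poland", "Lviv", "Germany"],
})

ukrainian_regions = {"Kherson", "Kharkiv", "Lviv", "Donetsk", "Mykolaiv"}

# A student is flagged as displaced when the exam location differs from the
# registration location; displacement is "external" when the exam was taken
# outside Ukraine and "internal" otherwise.
exams["displaced"] = exams["home_region"] != exams["exam_region"]
exams["displacement_type"] = exams.apply(
    lambda r: "none" if not r["displaced"]
    else ("internal" if r["exam_region"] in ukrainian_regions else "external"),
    axis=1,
)

# To keep locations comparable across renamings and redistricting, each place
# is mapped to a stable identifier that does not change with its current name.
region_ids = {"Kherson": "UA-65", "Kharkiv": "UA-63", "Lviv": "UA-46",
              "Donetsk": "UA-14", "Mykolaiv": "UA-48"}
exams["home_region_id"] = exams["home_region"].map(region_ids)

print(exams[["student_id", "home_region_id", "displaced", "displacement_type"]])
```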
The researchers warn that "reversing 'brain drain'—to the extent it is even possible—is no easy feat for any country" and note that "the issue may be time-sensitive: as the war continues, some families become more deeply rooted in their lives abroad."
In addition to Stoyanovich, the paper's authors are Tetiana Zakharchenko and Nazarii Drushchak of Ukrainian Catholic University; Oleksandra Konopatska of both Ukrainian Catholic University and the Kyiv School of Economics; Andrew Bell, a Ph.D. candidate at NYU Tandon; and Falaah Arif Khan, a Ph.D. student at the NYU Center for Data Science.
The research was supported in part by a grant from the Simons Foundation.
Better transparency: Introducing contextual transparency for automated decision systems
LinkedIn Recruiter — a search tool used by professional job recruiters to find candidates for open positions — would work better if recruiters knew exactly how LinkedIn generates its search query responses, something a framework called “contextual transparency” could make possible.
That is the argument advanced in a provocative new study published in Nature Machine Intelligence by a team of researchers led by NYU Tandon’s Mona Sloane, a Senior Research Scientist at the NYU Center for Responsible AI and a Research Assistant Professor in the Technology, Culture and Society Department.
The study is a collaboration with Julia Stoyanovich, Institute Associate Professor of Computer Science and Engineering, Associate Professor of Data Science, and Director of the Center for Responsible AI at New York University, as well as Ian René Solano-Kamaiko, Ph.D. student at Cornell Tech; Aritra Dasgupta, Assistant Professor of Data Science at New Jersey Institute of Technology; and Jun Yuan, Ph.D. Candidate at New Jersey Institute of Technology.
It introduces the concept of contextual transparency, essentially a “nutritional label” that would accompany results delivered by any Automated Decision System (ADS), a computer system or machine that uses algorithms, data, and rules to make decisions without human intervention. The label would lay bare the explicit and hidden criteria — the ingredients and the recipe — within the algorithms or other technological processes the ADS uses in specific situations.
LinkedIn Recruiter is a real-world ADS example — it “decides” which candidates best fit the criteria the recruiter specifies — but different professions use ADS tools in different ways. The researchers therefore propose building contextual transparency — the nutritional label — in a flexible way, so that it is highly specific to the context in which the ADS is used. To do this, they recommend three “contextual transparency principles” (CTPs), each of which draws on an approach from a related academic discipline.
- CTP 1: Social Science for Stakeholder Specificity: This aims to identify the professionals who rely on a particular ADS system, how exactly they use it, and what information they need to know about the system to do their jobs better. This can be accomplished through surveys or interviews.
- CTP 2: Engineering for ADS Specificity: This aims to understand the technical context of the ADS used by the relevant stakeholders. Different types of ADS operate under different assumptions, mechanisms, and technical constraints. This principle requires an understanding of both the input (the data used in decision-making) and the output (how the decision is delivered back to the user).
- CTP 3: Design for Transparency and Outcome Specificity: This aims to understand the link between process transparency and the specific outcomes the ADS would ideally deliver. In recruiting, for example, the outcome could be a more diverse pool of candidates facilitated by an explainable ranking model.
Researchers looked at how contextual transparency would work with LinkedIn Recruiter, in which recruiters use Boolean searches — AND, OR, NOT written queries — to receive ranked results. Researchers found that recruiters do not blindly trust ADS-derived rankings and typically double-check ranking outputs for accuracy, oftentimes going back and tweaking keywords. Recruiters told researchers that the lack of ADS transparency challenges efforts to recruit for diversity.
To address the transparency needs of recruiters, the researchers suggest that the nutritional label of contextual transparency include passive and active factors. Passive factors comprise information relevant to the general functioning of the ADS and to the professional practice of recruiting in general, while active factors comprise information that is specific to the Boolean search string and therefore changes with each search.
The nutritional label would be inserted into the typical workflow of LinkedIn Recruiter users, providing them information that would allow them to both assess the degree to which the ranked results satisfy the intent of their original search, and to refine the Boolean search string accordingly to generate better results.
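A minimal sketch of what such a label might look like in code appears below; the field names and example values are assumptions made for illustration, not drawn from LinkedIn Recruiter or from the published study.

```python
# Hypothetical sketch of a contextual-transparency "nutritional label" with
# passive and active factors; all fields and values are invented examples.
from dataclasses import dataclass, field

@dataclass
class ContextualTransparencyLabel:
    # Passive factors: general information about how the ADS works and about
    # the professional practice of recruiting; stable across searches.
    passive_factors: dict = field(default_factory=dict)
    # Active factors: information tied to one specific Boolean search string,
    # so they change with every query.
    active_factors: dict = field(default_factory=dict)

    def render(self) -> str:
        lines = ["=== Contextual transparency label ==="]
        for name, factors in (("Passive", self.passive_factors),
                              ("Active", self.active_factors)):
            lines.append(f"{name} factors:")
            lines.extend(f"  - {k}: {v}" for k, v in factors.items())
        return "\n".join(lines)

label = ContextualTransparencyLabel(
    passive_factors={
        "ranking model": "learned ranking over profile features",
        "data sources": "member profiles and recruiter activity",
    },
    active_factors={
        "query": '("data engineer" OR "ML engineer") AND NOT intern',
        "results returned": 250,
        "share of results matching all required terms": "87%",
    },
)
print(label.render())
```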
To evaluate whether such a transparency intervention actually achieves the change that can reasonably be expected of it, the researchers suggest stakeholder interviews about changes in the use and perception of the ADS, alongside participant diaries documenting professional practice and, where possible, A/B testing.
Contextual transparency is an approach that can be used to meet the AI transparency requirements mandated by new and forthcoming regulation in the US and Europe, such as New York City Local Law 144 of 2021 and the EU AI Act.
Teaching Responsible Data Science: Charting New Pedagogical Territory
Julia Stoyanovich, director of the Center for Responsible AI (R/AI) at NYU Tandon, and assistant professor of computer science and engineering and of data science, co-authored this paper with Armanda Lewis, a graduate student pursuing her master’s at the NYU Center for Data Science.
The authors detail their development of and pedagogy for a technical course focused on responsible data science, which tackles the issues of ethics in AI, legal compliance, data quality, algorithmic fairness and diversity, transparency of data and algorithms, privacy, and data protection.
The ability to interpret machine-assisted decision-making is an important component of responsible data science and offers a useful lens through which to view other responsible data science topics, including privacy and fairness. The researchers’ study includes best practices for teaching technical data science and AI courses that focus on interpretability, and for tying responsible data science to current learning science and learning analytics research.
The work also explores the use of “nutritional labels” — a family of interpretability tools that are gaining popularity in responsible data science research and practice — for interpreting machine learning models.
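To make the idea concrete, here is a minimal sketch of a nutritional-label-style summary for a trained model; the dataset, model choice, and label fields are assumptions made for illustration and are not the labeling tool described in the paper.

```python
# Illustrative sketch: summarize a trained model as a simple "nutritional
# label" listing what went in and which features drive its predictions.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# Rank features by the magnitude of their standardized coefficients as a
# simple, model-specific notion of importance.
coefs = model.named_steps["logisticregression"].coef_[0]
top = sorted(zip(data.feature_names, coefs),
             key=lambda t: abs(t[1]), reverse=True)[:5]

label = {
    "task": "binary classification",
    "training examples": X.shape[0],
    "features": X.shape[1],
    "training accuracy": round(model.score(X, y), 3),
    "most influential features": [name for name, _ in top],
}
for k, v in label.items():
    print(f"{k}: {v}")
```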
- In the paper, the investigators describe a unique course on responsible data science that is geared toward technical students and incorporates topics from social science, ethics, and law.
- The work connects theories and advances within the learning sciences to the teaching of responsible data science, specifically, interpretability — allowing humans to understand, trust and, if necessary, contest the computational process and its outcomes. The study asserts that interpretability is central to the critical study of the underlying computational elements of machine learning platforms.
- The collaborators assert that they are among the first to consider the pedagogical implications of responsible data science, creating parallels between cutting-edge data science research and cutting-edge educational research within the fields of learning sciences, artificial intelligence in education, and learning analytics and knowledge.
Additionally, the authors propose a set of pedagogical techniques for teaching the interpretability of data and models, positioning interpretability as a central integrative component of responsible data science.