Selected Work

Projects

CompactConnect

Open-source data system for sharing professional licensure data across state lines. Built for the Council of State Governments (CSG), CompactConnect enables interstate licensing compacts to share credentials, verify licenses, and track participation, saving an estimated $1.1M per compact versus building from scratch. Currently supports Audiology & Speech-Language Pathology, Counseling, and Occupational Therapy compacts, with more on the way.

gov-tech open-source python vue.js typescript

SeqHub

Backend engineer at Tatta Bio, building SeqHub, a platform for exploring, annotating, and sharing biological sequences at scale.

biotech python genomics

OutReach

Open-source campaign tools for relational voter mobilization. Built OutReach in under a week to contact 17k+ voters in the final two days of a U.S. Senate race, along with a relational organizing dashboard supporting 3,000 voter mobilizers reaching over 160k voters. Published analysis showing the program improved turnout by an estimated 3.8 percentage points.

politics data open-source

Publications

The Emotional Toll of Inflammatory Bowel Disease: Using Machine Learning to Analyze Online Community Forum Discourse

2019

Crohn's & Colitis 360

Robert Lerrigo, Johnny TR Coffey, Joshua L Kravitz, Priyanka Jadhav, Azadeh Nikfarjam, Nigam H Shah, Dan Jurafsky, Sidhartha R Sinha

Patients with inflammatory bowel disease are using online community forums (OCFs) to seek emotional support. The impact of OCFs on well-being and their emotional content are unknown. We used an unsupervised machine learning algorithm to identify the thematic content of 51,591 public, online posts from the Crohn's & Colitis Foundation Community Forum. We identified 10,702 (20.8%) posts expressing: gratitude (40%), anxiety/fear (20.8%), empathy (18.2%), anger/frustration (13.4%), hope (13.2%), happiness (10.0%), sadness/depression (5.8%), shame/guilt (2.5%), and/or loneliness (2.5%). A common subtheme was the importance of fostering social support. High-throughput, machine learning-directed analysis of OCFs may help identify psychosocial impacts of inflammatory bowel disease on patients and their caregivers.

machine-learning nlp healthcare

Visual Genome: Connecting language and vision using crowdsourced dense image annotations

2017

International Journal of Computer Vision

Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A Shamma, et al.

Despite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering. Cognition is core to tasks that involve not just recognizing, but reasoning about our visual world. However, models used to tackle the rich content in images for cognitive tasks are still being trained using the same datasets designed for perceptual tasks. To achieve success at cognitive tasks, models need to understand the interactions and relationships between objects in an image. In this paper, we present the Visual Genome dataset to enable the modeling of such relationships. We collect dense annotations of objects, attributes, and relationships within each image to learn these models. Specifically, our dataset contains over 108K images where each image has an average of 35 objects, 26 attributes, and 21 pairwise relationships between objects.

computer-vision ai crowdsourcing

Embracing error to enable rapid crowdsourcing

2016

Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems

Ranjay A Krishna, Kenji Hata, Stephanie Chen, Joshua Kravitz, David A Shamma, Li Fei-Fei, Michael S Bernstein

Microtask crowdsourcing has enabled dataset advances in social science and machine learning, but existing crowdsourcing schemes are too expensive to scale up with the expanding volume of data. To scale and widen the applicability of crowdsourcing, we present a technique that produces extremely rapid judgments for binary and categorical labels. Rather than punishing all errors, which causes workers to proceed slowly and deliberately, our technique speeds up workers' judgments to the point where errors are acceptable and even expected. We demonstrate that it is possible to rectify these errors by randomizing task order and modeling response latency. Where prior work typically achieves a 0.25x to 1x speedup over fixed majority vote, our approach often achieves an order of magnitude (10x) speedup.

crowdsourcing hci