UXR Case Study – Mozilla Data Collective

Community-Centered AI Data Governance

Designing How AI Training Data Is Collected, Governed, and Shared Across 286 Languages

Design Framing

How might we redesign the systems through which AI training data is collected, governed, and shared, so that they reflect the needs, languages, and values of the communities whose voices shape AI?

As Lead User Experience Researcher for the Mozilla Data Collective, a platform incubated by the Mozilla Foundation, I led user intelligence research spanning 286 languages and 300+ datasets. The core challenge was designing consent workflows, data acquisition mechanisms, and community feedback loops that were trustworthy, equitable, and usable in practice by diverse contributor communities. This work sat at the intersection of AI infrastructure, community governance, and continuous product discovery.

Methods

Connected Study Design — planned and conducted a linked series of studies to surface community needs across contributor, researcher, and governance stakeholder groups
Consent Workflow Evaluation — usability testing and qualitative interviews to assess how contributors understood and navigated data sharing agreements
Behavioral Analytics & Quantitative Signal Analysis — mapped drop-off points, engagement patterns, and contributor retention across the platform
Data Triangulation & Synthesis — integrated behavioral, quantitative, and qualitative streams into unified insights for product and roadmap decisions
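
To make the drop-off mapping in the behavioral analytics method concrete, here is a minimal sketch of how loss between consecutive steps of a contributor funnel could be computed. It is an illustration under stated assumptions, not the platform's actual analytics pipeline: the function names, the step names, and the counts are all hypothetical and invented for this example.

```python
from typing import Dict, List, Tuple


def funnel_drop_off(step_counts: List[Tuple[str, int]]) -> List[Dict[str, object]]:
    """Return the share of contributors lost between consecutive funnel steps.

    `step_counts` is an ordered list of (step_name, contributors_reaching_step).
    """
    rows = []
    for (prev_name, prev_n), (name, n) in zip(step_counts, step_counts[1:]):
        lost = prev_n - n
        # Guard against an empty previous step to avoid division by zero.
        rate = lost / prev_n if prev_n else 0.0
        rows.append(
            {"from": prev_name, "to": name, "lost": lost, "drop_off_rate": rate}
        )
    return rows


def largest_drop_off(step_counts: List[Tuple[str, int]]) -> Dict[str, object]:
    """Return the transition with the highest drop-off rate."""
    return max(funnel_drop_off(step_counts), key=lambda row: row["drop_off_rate"])


# Hypothetical step names and counts, invented for illustration only.
example_steps = [
    ("landed_on_consent_page", 1000),
    ("opened_agreement", 800),
    ("accepted_agreement", 400),
    ("submitted_first_contribution", 300),
]

worst = largest_drop_off(example_steps)
print(f"{worst['from']} -> {worst['to']}: {worst['drop_off_rate']:.0%}")
```

In a triangulated workflow, the transition flagged here would be a candidate for follow-up in usability sessions and interviews, so that the quantitative signal is explained by qualitative evidence rather than interpreted on its own.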

Research was designed to feed continuous discovery for product teams. Rather than producing one-time deliverables, it served as an ongoing intelligence infrastructure that kept community voice present in roadmap prioritization and feature development.

Key Challenges

Consent flows that were technically compliant but poorly understood by contributors — especially across language and literacy contexts
Fragmented data signals across behavioral analytics, contributor surveys, and qualitative sessions, with no shared synthesis layer
Community needs that varied significantly across linguistic and geographic communities, resisting a one-size-fits-all platform model
Product teams operating without reliable mechanisms for continuous community input — defaulting to assumption-driven roadmaps

What We Built Together

Through connected research cycles, we developed an evidence base that shaped both the platform's consent architecture and its contributor experience. Key outputs included a revised consent workflow grounded in plain-language principles across language communities, a continuous discovery framework for product teams, and a triangulation model for synthesizing behavioral and qualitative signals into roadmap-ready recommendations.

Community Listening → Needs Synthesis → Consent Redesign → Contributor Journey Mapping → Roadmap Integration → Continuous Discovery Loop
Attributes: Multilingual, Governance-Centered, Continuously Validated

Real-World Impact

The research-informed consent workflow redesign was implemented across the platform, improving contributor comprehension and trust signals
Triangulated insights directly influenced roadmap priorities across multiple product cycles
Established data acquisition mechanisms and continuous discovery infrastructure that reduced assumption-driven decision-making for product teams