UXR Case Study – Mozilla Data Collective

Community-Centered AI Data Governance

Designing How AI Training Data Is Collected, Governed, and Shared Across 286 Languages


Design Framing

How might we redesign the systems through which AI training data is collected, governed, and shared — so that they reflect the needs, languages, and values of the communities whose voices shape AI?

As Lead User Experience Researcher for the Mozilla Data Collective, a platform incubated by the Mozilla Foundation, I led user intelligence research across a platform spanning 286 languages and more than 300 datasets. The core challenge was designing consent workflows, data acquisition mechanisms, and community feedback loops that diverse contributor communities could trust, access equitably, and use in practice. This work sat at the intersection of AI infrastructure, community governance, and continuous product discovery.


Methods

  • Connected Study Design — planned and conducted a linked series of studies to surface community needs across contributor, researcher, and governance stakeholder groups
  • Consent Workflow Evaluation — usability testing and qualitative interviews to assess how contributors understood and navigated data sharing agreements
  • Behavioral Analytics & Quantitative Signal Analysis — mapped drop-off points, engagement patterns, and contributor retention across the platform
  • Data Triangulation & Synthesis — integrated behavioral, quantitative, and qualitative streams into unified insights for product and roadmap decisions
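
As an illustration of the drop-off mapping described above, here is a minimal sketch of a step-by-step funnel calculation. The step names, the event format, and the function name are hypothetical and were chosen for this example; they are not taken from the platform's actual analytics pipeline.

```python
# Hypothetical consent-flow steps, in the order a contributor would meet them.
# These names are assumptions for illustration, not the platform's real events.
STEPS = ["view_agreement", "open_details", "accept_terms", "submit_contribution"]


def funnel_drop_off(events):
    """Summarize how many unique users reach each step.

    `events` is an iterable of (user_id, step) pairs. Returns a list of
    (step, users_reached, drop_off_rate) tuples, where the rate is the share
    of users lost relative to the previous step. Assumes users move through
    the steps in order.
    """
    reached = {step: set() for step in STEPS}
    for user_id, step in events:
        if step in reached:
            reached[step].add(user_id)

    rows = []
    previous = None
    for step in STEPS:
        count = len(reached[step])
        if previous is None or previous == 0:
            rate = 0.0  # first step, or nobody reached the previous one
        else:
            rate = 1 - count / previous
        rows.append((step, count, round(rate, 2)))
        previous = count
    return rows


if __name__ == "__main__":
    sample = [
        (1, "view_agreement"), (2, "view_agreement"),
        (3, "view_agreement"), (4, "view_agreement"),
        (1, "open_details"), (2, "open_details"),
        (1, "accept_terms"),
        (1, "submit_contribution"),
    ]
    for row in funnel_drop_off(sample):
        print(row)
```

A table like this only shows where contributors leave the flow. In the triangulation approach described above, the steps with the steepest losses would then be paired with interview and usability findings to explain why.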

Research was designed to feed continuous discovery for product teams — not one-time deliverables, but an ongoing intelligence infrastructure that kept community voice present in roadmap prioritization and feature development.

Key Challenges

  • Consent flows that were technically compliant but poorly understood by contributors — especially across language and literacy contexts
  • Fragmented data signals across behavioral analytics, contributor surveys, and qualitative sessions, with no shared synthesis layer
  • Community needs that varied significantly across linguistic and geographic communities, resisting a one-size-fits-all platform model
  • Product teams operating without reliable mechanisms for continuous community input — defaulting to assumption-driven roadmaps

What We Built Together

Through connected research cycles, we developed an evidence base that shaped both the platform's consent architecture and its contributor experience. Key outputs included a revised consent workflow grounded in plain-language principles and applied across language communities, a continuous discovery framework for product teams, and a triangulation model for synthesizing behavioral and qualitative signals into roadmap-ready recommendations.

  • Community Listening → Needs Synthesis → Consent Redesign → Contributor Journey Mapping → Roadmap Integration → Continuous Discovery Loop
  • Attributes: Multilingual, Governance-Centered, Continuously Validated

Real World Impacts

  • The research-informed consent workflow redesign was implemented across the platform, improving contributor comprehension and trust signals
  • Triangulated insights directly influenced roadmap priorities across multiple product cycles
  • Established data acquisition mechanisms and continuous discovery infrastructure that reduced assumption-driven decision-making for product teams