When AI Regulation is Written on GitHub: Build Your Own

Create a collaborative AI regulatory framework using GitHub and your existing software development methodologies.

In March 2026, three European digital health startups decided it was time to create their own shared standard for medical models. This came after their boards rejected the idea of implementing overly abstract frameworks. They published their proposal on GitHub, and within six weeks, 47 organizations had adopted and adapted it. They didn't wait for Brussels or Washington; they simply built what they needed.

This model is gaining traction: collaborative regulatory frameworks that function like open source. It’s not about replacing official regulation, but rather establishing concrete operational standards while governments continue to debate. I will show you how to build one from scratch using GitHub, AI tools, and methodologies you’re already familiar with from software development.

Why Traditional Standards Fail in AI Environments

Government regulation of AI has a timing problem. The European AI Act, for example, took four years to pass. In that time, GPT evolved from version 3 to 5, multimodal models became a commodity, and three complete paradigms of agent architecture emerged (and disappeared).

Traditional standards committees—like ISO, IEEE, NIST—operate on cycles of 18 to 36 months. They publish 200-page documents that become obsolete by the time they hit the presses. Notably, in April 2026, an internal audit at a Spanish fintech revealed that 73% of its AI implementations did not strictly comply with the AI Act. This wasn’t due to negligence, but because the requirements were impossible to operationalize without interpretation.

The problem isn’t the intent; it’s the medium. PDFs don’t evolve well. Closed committees don’t scale. Regulations written in legal jargon don’t translate into YAML or model configurations.

What really works: treating regulatory standards as living repositories. They should be versioned and forkable, with changes made through pull requests (PRs) and public discussion. This is exactly how you already manage your code.

Architecture of a Functional Collaborative Regulatory Framework

For a framework to be effective, it needs four interconnected layers:

1. Policy Definition as Code

Your policies should exist in a structured, processable format. Forget Word or PDFs. You should use YAML, JSON, or structured Markdown with frontmatter.

# policies/model-evaluation.yaml
version: 1.2.0
domain: healthcare
effective_date: 2026-05-01

requirements:
  - id: REQ-001
    category: bias-testing
    description: "Test demographic parity across protected groups"
    implementation:
      - method: disparate_impact_ratio
        threshold: 0.8
        frequency: pre-deployment
    evidence_required:
      - statistical_tests
      - demographic_breakdown
    
  - id: REQ-002
    category: explainability
    description: "Provide SHAP values for high-stakes predictions"
    scope:
      - predictions_affecting: ["diagnosis", "treatment_plan"]
    implementation:
      - library: shap>=0.42
        output_format: json

This structure allows for automation. You can write tests to validate compliance, generate compliance dashboards, and crucially, fork and adapt the standard without breaking compatibility.
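As a rough illustration of the "write tests to validate compliance" idea, here is a minimal sketch of a structural check for a policy file. It assumes the YAML has already been parsed into a plain dict; the set of required keys and the REQ-NNN id pattern are assumptions inferred from the sample above, not part of any published schema.

```python
import re

# Assumed schema, inferred from policies/model-evaluation.yaml above.
REQUIRED_POLICY_KEYS = {"version", "domain", "effective_date", "requirements"}
REQUIRED_REQUIREMENT_KEYS = {"id", "category", "description", "implementation"}
REQ_ID_PATTERN = re.compile(r"^REQ-\d{3}$")


def validate_policy(policy: dict) -> list[str]:
    """Return human-readable problems; an empty list means the policy is well-formed."""
    problems = []
    missing = REQUIRED_POLICY_KEYS - set(policy)
    if missing:
        problems.append(f"missing top-level keys: {sorted(missing)}")

    seen_ids = set()
    for index, req in enumerate(policy.get("requirements", [])):
        req_id = str(req.get("id", f"<requirement #{index}>"))
        missing = REQUIRED_REQUIREMENT_KEYS - set(req)
        if missing:
            problems.append(f"{req_id}: missing keys {sorted(missing)}")
        if "id" in req and not REQ_ID_PATTERN.match(req_id):
            problems.append(f"{req_id}: id does not match REQ-NNN")
        if req_id in seen_ids:
            problems.append(f"{req_id}: duplicate id")
        seen_ids.add(req_id)
    return problems
```

Returning a list of problems rather than raising on the first one means a contributor sees everything wrong with their file in a single CI run.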

2. Issue Tracking and Public Discussion

GitHub Issues becomes your governance mechanism. Every proposed change—new requirement, threshold modification, domain extension—starts as an issue.

Useful labels:

proposal: new policy proposed
implementation-challenge: requirement difficult to operationalize
domain-specific: affects only certain sectors
breaking-change: breaks compatibility with previous versions
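Labels like these can also drive versioning. The sketch below is one possible convention, not something the labels imply on their own: it assumes semantic versioning, that a breaking-change label forces a major bump, that an accepted proposal is a minor bump, and that everything else is a patch. The function name is hypothetical.

```python
def next_version(current: str, labels: set[str]) -> str:
    """Suggest the next semantic version from the labels on a merged change."""
    major, minor, patch = (int(part) for part in current.split("."))
    if "breaking-change" in labels:
        # Adopters on the old version may stop being compliant: major bump.
        return f"{major + 1}.0.0"
    if "proposal" in labels:
        # A new policy that leaves existing requirements untouched: minor bump.
        return f"{major}.{minor + 1}.0"
    # Clarifications and fixes only: patch bump.
    return f"{major}.{minor}.{patch + 1}"
```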

The fictional "AI Accountability Framework" project (based on real patterns) uses an issue template that requires:

Specific problem it solves
Estimated impact (how many organizations it affects)
Implementation cost (average person-hours)
Alternatives considered

This makes the regulatory debate tangible. Instead of saying "we need more transparency," you can propose something specific like "add REQ-015 that requires logging of all rejection decisions with latency <50ms because an analysis of 12 implementations shows it's technically feasible."
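If you want a bot to enforce the template, a minimal sketch could look like the following. It assumes each template field shows up in the issue body as a Markdown heading, and the section titles are shortened versions of the four fields listed above; both are illustrative choices.

```python
# Assumed section titles, derived from the template fields listed above.
REQUIRED_SECTIONS = (
    "Specific problem",
    "Estimated impact",
    "Implementation cost",
    "Alternatives considered",
)


def missing_sections(issue_body: str) -> list[str]:
    """Return the required sections that have no matching heading in the issue body."""
    headings = [
        line.lstrip("#").strip().lower()
        for line in issue_body.splitlines()
        if line.startswith("#")
    ]
    return [
        section
        for section in REQUIRED_SECTIONS
        if not any(heading.startswith(section.lower()) for heading in headings)
    ]
```

An empty result means the proposal is complete enough to discuss; anything else can be posted back to the author as a checklist.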

3. Automating Compliance with GitHub Actions

The real magic happens when you connect your policies to automated validation pipelines.

# .github/workflows/compliance-check.yml
name: Compliance Validation

on:
  pull_request:
    paths:
      - 'models/**'
      - 'policies/**'

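jobs:
  validate:
    runs-on: ubuntu-latest
    # The final step posts a PR comment, so the token needs write access to pull requests.
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4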
      
      - name: Load policies
        run: |
          python scripts/load_policies.py \
            --policy-dir policies/ \
            --output policy_bundle.json
      
      - name: Run compliance tests
        run: |
          python scripts/compliance_validator.py \
            --policies policy_bundle.json \
            --model-metadata models/metadata.json \
            --test-results tests/results/
      
      # Illustrative action reference; replace it with the reporter your project actually uses.
      - name: Generate compliance report
        uses: ai-governance/compliance-reporter@v2
        with:
          format: markdown
          output: compliance_report.md
      
      - name: Comment on PR
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const report = fs.readFileSync('compliance_report.md', 'utf8');
            // Await the call so a failed comment fails the step instead of being dropped.
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Compliance Check Results\n\n${report}`
            });

Every time someone proposes a change to the model or policies, they get automatic feedback on compliance. This way, the framework becomes executable.
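What might the core of scripts/compliance_validator.py look like? The following is only a sketch under stated assumptions: measured metrics arrive as a dict keyed by the policy's method name, every threshold is treated as a minimum, and requirements without a numeric threshold are routed to manual review. The function name and status strings are hypothetical.

```python
def check_requirements(requirements: list[dict], measured: dict) -> dict:
    """Map each requirement id to 'pass', 'fail', 'missing-evidence', or 'manual-review'."""
    outcome = {}
    for req in requirements:
        # Only rules that carry a numeric threshold can be checked automatically.
        rules = [r for r in req.get("implementation", []) if "threshold" in r]
        if not rules:
            outcome[req["id"]] = "manual-review"
            continue
        status = "pass"
        for rule in rules:
            value = measured.get(rule["method"])
            if value is None:
                status = "missing-evidence"
                break
            if value < rule["threshold"]:  # assumption: thresholds are minimums
                status = "fail"
        outcome[req["id"]] = status
    return outcome
```

Separating missing-evidence from fail matters in practice: the first means nobody ran the test, the second means the model did not meet the bar.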

4. Integration with AI Tools for Analysis and Review

This is where 2026 makes a difference: you can use LLMs to speed up the slowest part of the process, proposal analysis and inconsistency detection.

A simple script with Claude or GPT-4 that reviews each policy PR could look like this:

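# scripts/policy_review_assistant.py
import json
import os

import anthropic


def review_policy_change(diff_text, existing_policies):
    """Ask the model to review a proposed policy diff against the current policies."""
    # Read the API key from the environment so it never lands in the repository.
    client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])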
    
    prompt = f"""You are an AI compliance assistant. Analyze this proposed change:

{diff_text}

Existing policies:
{json.dumps(existing_policies, indent=2)}

Evaluate:
1. Conflicts with existing policies
2. Ambiguities that would hinder implementation
3. Impact on small vs large organizations
4. Realistic technical burden (0-10)

Respond in JSON."""

    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}]
    )
    
    # Raises json.JSONDecodeError if the reply contains anything other than JSON.
    return json.loads(message.content[0].text)

It doesn’t replace human judgment, but it automatically detects 80% of obvious issues. This reduces review times from weeks to days.
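One practical wrinkle: models asked to "respond in JSON" sometimes wrap the answer in a Markdown code fence or add a sentence before it. A small, defensive parser helps here. This is a sketch with a hypothetical function name; it only handles the fenced and unfenced cases and rejects everything else loudly.

```python
import json
import re

# Build the fence marker programmatically so this snippet can itself sit inside a fenced block.
_TICKS = "`" * 3
_FENCED = re.compile(_TICKS + r"(?:json)?\s*(.*?)" + _TICKS, re.DOTALL)


def parse_review(text: str) -> dict:
    """Extract a JSON object from a model reply, tolerating a surrounding code fence."""
    match = _FENCED.search(text)
    candidate = match.group(1) if match else text
    try:
        review = json.loads(candidate.strip())
    except json.JSONDecodeError as exc:
        raise ValueError("model reply did not contain valid JSON") from exc
    if not isinstance(review, dict):
        raise ValueError("expected a JSON object at the top level")
    return review
```

Failing with a clear error is deliberate: a review bot that silently posts an empty analysis is worse than one that reports it could not parse the reply.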

Real Case: How Three Healthtechs Built "MedAI Commons"

In February 2026, three medical diagnostic startups (two Spanish, one French) faced the same problem: costly compliance audits in which each auditor flagged different issues, because each one interpreted the same AI Act differently.

What they did:

They created a private (and later public) GitHub repository with three components:

Health-specific policies based on the AI Act, but operationalized. Instead of “must be accurate and non-discriminatory,” they defined “the model must achieve AUC ≥0.85 in a validation population stratified by age/gender.”
Automated testing suite in pytest that anyone could run locally: fairness, explainability, robustness, and data drift tests.
Documentation templates that pre-filled 60% of what was required in audits: model architecture, design decisions, and risk assessments.
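To make the first component concrete, here is a dependency-free sketch of a stratified AUC check. It reads "AUC ≥0.85 in a stratified population" as "every stratum must reach 0.85", which is an interpretation rather than something the framework is known to specify, and it computes AUC by direct pairwise comparison, which is fine for validation sets of modest size. All function names are hypothetical.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(labels, scores, groups):
    """Compute AUC separately for each stratum (for example an age band)."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: auc(ys, ss) for g, (ys, ss) in buckets.items()}


def failing_groups(labels, scores, groups, minimum=0.85):
    """Return the strata whose AUC falls below the policy minimum."""
    per_group = auc_by_group(labels, scores, groups)
    return sorted(g for g, value in per_group.items() if value < minimum)
```

A model can clear 0.85 overall while failing badly for one age band, which is exactly the case an aggregate metric hides and a stratified check exposes.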

Results in 3 months:

14 additional organizations adopted the framework.
Average audit preparation time dropped from 6 weeks to 1.5 weeks.
Two compliance consultants began using their tests as part of official audits.
The Spanish regulator invited them to present at a roundtable, a sign of legitimacy.

They didn’t wait for permission. They built what they needed, and others found value in it.

Step-by-Step Implementation: Your First Framework in 5 Days

Day 1: Basic Structure

Create a GitHub repository (private if sensitive, public if seeking broad adoption).
Folders: /policies, /tests, /documentation, /scripts.
Define your first set of policies (5-10 requirements max) in YAML.
Write a README explaining the purpose, scope, and how to contribute.

Day 2: Basic Compliance Tests

Implement validators for your most critical policies.
Use existing libraries: Fairlearn for fairness, SHAP for explainability, Great Expectations for data quality.
Get the tests to pass with a sample model (even if it’s a dummy one).
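A Day 2 validator can be very small. The sketch below computes a disparate impact ratio in plain Python so it runs without extra dependencies; in a real project you would more likely lean on Fairlearn. It assumes binary predictions where 1 is the favorable outcome, an explicitly chosen reference group, and the 0.8 to 1.2 band used elsewhere in this article. Function names are hypothetical.

```python
from collections import Counter


def selection_rates(predictions, groups):
    """Share of favorable (1) predictions within each group."""
    totals, positives = Counter(), Counter()
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact_ratios(predictions, groups, reference):
    """Each group's selection rate divided by the reference group's rate."""
    rates = selection_rates(predictions, groups)
    if rates[reference] == 0:
        raise ValueError("reference group has no favorable predictions")
    return {g: rate / rates[reference] for g, rate in rates.items() if g != reference}


def within_bounds(ratios, low=0.8, high=1.2):
    """True when every group's ratio sits inside the allowed band."""
    return all(low <= ratio <= high for ratio in ratios.values())
```

Wrapped in a pytest function that asserts within_bounds(...) on your dummy model's predictions, this becomes the first executable requirement in the repository.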

Day 3: Automation with GitHub Actions

Set up a workflow that runs tests on every PR.
Add a linter for your policy YAML files.
Generate a compliance badge for the README.
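For the badge, one option is to have CI publish a small JSON file in the format Shields.io accepts for its endpoint badges (schemaVersion, label, message, color). The sketch below builds that payload from a mapping of requirement id to status; the status strings, the 80% cutoff for yellow, and the function name are all assumptions.

```python
import json


def badge_payload(results: dict) -> dict:
    """Summarize requirement statuses as a Shields.io endpoint-badge payload."""
    total = len(results)
    passed = sum(1 for status in results.values() if status == "pass")
    if total == 0:
        color = "lightgrey"
    elif passed == total:
        color = "brightgreen"
    elif passed / total >= 0.8:  # assumed cutoff for "mostly compliant"
        color = "yellow"
    else:
        color = "red"
    return {
        "schemaVersion": 1,
        "label": "compliance",
        "message": f"{passed}/{total} requirements",
        "color": color,
    }


if __name__ == "__main__":
    # Example: print the payload so a CI step can redirect it into a published file.
    print(json.dumps(badge_payload({"REQ-001": "pass", "REQ-002": "fail"})))
```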

Day 4: AI as a Governance Co-Pilot

Implement a script that uses Claude/GPT to review proposals.
Set up a bot that comments on issues with initial analysis.
Optional: generate automatic documentation from policies.
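The optional documentation step does not need an LLM at all. As a sketch, assuming a policy parsed into a dict with the same keys as policies/model-evaluation.yaml, a few lines of Python can render a readable Markdown page. The layout and function name are illustrative.

```python
def render_policy_markdown(policy: dict) -> str:
    """Render a parsed policy dict as a short Markdown document."""
    lines = [
        f"# {policy['domain'].title()} policy v{policy['version']}",
        "",
        f"Effective from {policy['effective_date']}.",
        "",
    ]
    for req in policy["requirements"]:
        lines.append(f"## {req['id']}: {req['category']}")
        lines.append("")
        lines.append(req["description"])
        evidence = req.get("evidence_required", [])
        if evidence:
            lines.append("")
            lines.append("Evidence required:")
            lines.extend(f"- {item}" for item in evidence)
        lines.append("")
    return "\n".join(lines)
```

Because the page is generated from the same file the tests read, the documentation cannot drift away from what is actually enforced.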

Day 5: Documentation and First Contributors

Write a contribution guide (CONTRIBUTING.md).
Document design decisions.
Invite 2-3 aligned organizations as early adopters.

After: iterate based on real feedback. The best policies emerge when confronting concrete cases, not by designing in the abstract.

Common Mistakes You’ll See (and How to Avoid Them)

Mistake #1: Over-specifying from the Start
Don’t try to cover all possible cases in v1.0. Start with 5-7 requirements that address your most pressing problem. The framework will grow organically.

Mistake #2: Language Too Abstract
"The model must be fair" doesn’t work. By contrast, "The disparate impact ratio must be between 0.8 and 1.2 for protected groups according to categories X, Y, Z" is operationalizable and testable.

Mistake #3: Ignoring the Cost of Compliance
Every requirement has a development time cost. If your framework requires 40 hours of work to implement, no one will adopt it. Aim for less than 8 hours for basic cases.

Mistake #4: Forking Without Merging Back
If many organizations fork your framework but never contribute improvements back, you’ll create fragmentation. Make contributing easy and publicly recognize contributors.

Mistake #5: Confusing Tool with Legal Legitimacy
Your framework does not replace official compliance. It’s a tool to facilitate compliance and demonstrate a good-faith effort. Make this limitation clear in the documentation.

The Future of Collaborative Regulation

We are witnessing the emergence of a pattern: regulation at two levels. Governments set principles and red lines, while communities of practice create the real operational standards.

Frameworks that survive have these characteristics:

Aggressively versioned (monthly or bi-monthly releases).
Legitimized domain-specific forks (the healthtech fork is different from the fintech one, and that’s okay).
Tests before documentation (show, don’t tell).
Light governance (2-3 maintainers, not committees of 40 people).

The correct analogy isn’t ISO 27001, but Kubernetes: it started as a solution to a specific problem at Google, was opened to the public, and is now the de facto standard because it works and you can try it.

Final Reflection: Regulation as Software

When you treat regulatory standards like code—versioned, testable, collaborative—something interesting happens: abstract debates about “AI ethics” transform into specific PRs about fairness thresholds. Philosophical disagreements become discussions about measurable trade-offs.

It won’t be perfect. Biases can be encoded, and metrics can be gamed. However, it is far better than static documents that don’t reflect the ever-changing reality of AI.
