When Setting Up an Internal GitHub for AI Models Costs Less Than $200 a Month: Complete Architecture with Google Cloud and TensorFlow

Two weeks ago, a startup from Barcelona that I work with shared their issue: they had six data scientists training variations of the same text classification model. Unaware of what the others had tried, they duplicated experiments and wasted days on configurations that had already been discarded. The CTO wanted to pay for MLflow Enterprise. Instead, I set up a complete collaborative platform on Google Cloud that costs far less per month than a junior developer.

This isn’t a guide on enterprise tools or adopting closed platforms. It’s the exact architecture I implemented: from shared dataset storage to model versioning, collaborative notebooks, and CI/CD pipelines for TensorFlow. All of this, with a seed-stage budget.

Why Google Cloud (and not AWS or Azure) for AI Collaboration

The choice of platform isn't a religious one; it's a practical one. I've tried all three major clouds while building similar systems, and in my experience, Google Cloud has three specific advantages for small teams working with TensorFlow:

Vertex AI Workbench is free up to a point. You can have shared notebooks with integrated version control, and you only pay for compute resources. In AWS, you need SageMaker Studio, which adds $50-100 a month before even touching a GPU. In Azure, you need ML Studio, with similar costs.

Cloud Storage has native integrations with TensorFlow. You can read datasets directly from buckets without downloading locally using tf.data.Dataset with gs:// prefixes. It may sound trivial until you have 50GB datasets and three people trying to access them simultaneously.
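As a rough sketch of what that looks like in practice: the snippet below streams shards from a bucket without copying them locally. It assumes the processed data is stored as TFRecord shards (an assumption for illustration; the raw files in this setup are Parquet), and the helper names `shard_pattern` and `build_dataset` are hypothetical.

```python
def shard_pattern(bucket: str, version: str) -> str:
    """Build the gs:// glob for one processed dataset version (hypothetical layout)."""
    return f"gs://{bucket}/processed/sentiment_train_{version}/*.tfrecord"


def build_dataset(pattern: str, batch_size: int = 32):
    """Stream serialized TFRecord examples straight from Cloud Storage."""
    # Imported lazily so the path helper above works without TensorFlow installed.
    import tensorflow as tf

    # list_files understands gs:// globs, so nothing is downloaded up front.
    files = tf.data.Dataset.list_files(pattern, shuffle=True)
    # Read several shards in parallel; each element is still a serialized example
    # that a real pipeline would parse with its own feature spec.
    records = files.interleave(
        tf.data.TFRecordDataset,
        cycle_length=4,
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    return records.batch(batch_size).prefetch(tf.data.AUTOTUNE)


pattern = shard_pattern("ml-platform-datasets", "v2")
```

Because every reader streams from the same bucket, three people can train against the same 50GB dataset without three local copies.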

Google's IAM is the most granular. You can give read access to datasets, write access only to trained models, and execute access to notebooks without granting root access to the entire project. In AWS, this requires configuring multiple roles and policies that become unmanageable.
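To make the bucket side of that split concrete, here is a minimal sketch that turns a list of grants into `gcloud storage buckets add-iam-policy-binding` commands. The group address is a placeholder, the choice of `roles/storage.objectViewer` and `roles/storage.objectCreator` is my assumption about how to map "read datasets, write models" onto predefined roles, and notebook permissions are left out.

```python
from typing import NamedTuple


class Grant(NamedTuple):
    bucket: str
    member: str
    role: str


# Placeholder group; replace with your own principal.
TEAM = "group:data-science@example.com"

GRANTS = [
    # Read-only access to datasets.
    Grant("ml-platform-datasets", TEAM, "roles/storage.objectViewer"),
    # Create-only access to models: new objects can be written, none deleted.
    Grant("ml-platform-models", TEAM, "roles/storage.objectCreator"),
]


def to_gcloud(grant: Grant) -> str:
    """Render one grant as a bucket-level IAM binding command."""
    return (
        f"gcloud storage buckets add-iam-policy-binding gs://{grant.bucket} "
        f"--member={grant.member} --role={grant.role}"
    )


for grant in GRANTS:
    print(to_gcloud(grant))
```

Keeping the grants in a reviewed file like this means access changes go through a PR, the same as code.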

The complete stack we are going to set up includes:

Cloud Storage for datasets and versioned models
Vertex AI Workbench for collaborative notebooks
Cloud Build for CI/CD of training processes
Artifact Registry for custom Docker images
Cloud Run to serve models as APIs (optional but recommended)

The actual monthly cost with moderate use (3-4 people, daily training) is between $150 and $200. And with AWS/Azure? Oof, more like $400 to $600 in my experience.

Storage Architecture: How to Version Datasets and Models Without Going Crazy

The number one mistake I see in teams starting with collaborative ML is treating models like code and datasets like static files. They don't work the same way, and this distinction is crucial.

Bucket structure that truly scales:

ml-platform-datasets/
├── raw/
│   ├── 2026-01-15_customer_reviews_v1.parquet
│   └── 2026-01-22_customer_reviews_v2.parquet
├── processed/
│   ├── sentiment_train_v1/
│   └── sentiment_train_v2/
└── metadata/
    └── dataset_versions.json

ml-platform-models/
├── experiments/
│   ├── user_john/
│   │   └── sentiment_lstm_2026_01_20/
│   └── user_maria/
│       └── sentiment_transformer_2026_01_21/
├── staging/
│   └── sentiment_model_v1.2_candidate/
└── production/
    └── sentiment_model_v1.1/

Rules we follow religiously:

Raw is never modified. Each version of the raw dataset is append-only with a timestamp.
Processed has explicit versioning. If you change the preprocessing, it's a new version.
Experiments are a free-for-all. Each user has their own folder and can do whatever they want.
Staging and production require mandatory metadata: what dataset was used, hyperparameters, metrics, who uploaded it.
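The last rule is easy to enforce in code. A minimal sketch, assuming uploads go through a shared helper that can call a check like this first; the field names are hypothetical stand-ins for "dataset, hyperparameters, metrics, who uploaded it".

```python
# Hypothetical field names for the four pieces of mandatory metadata.
REQUIRED_FIELDS = ("dataset_version", "hyperparameters", "metrics", "uploaded_by")
# Only these prefixes are gated; experiments/ stays a free-for-all.
GATED_PREFIXES = ("staging/", "production/")


def missing_metadata(destination: str, metadata: dict) -> list:
    """Return the required fields that are absent or empty for a gated upload."""
    if not destination.startswith(GATED_PREFIXES):
        return []
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


problems = missing_metadata(
    "staging/sentiment_model_v1.2_candidate/",
    {"dataset_version": "v2", "metrics": {"accuracy": 0.91}},
)
if problems:
    print(f"Refusing upload, missing: {problems}")
```

An upload helper can raise on a non-empty result, so nothing reaches staging/ or production/ without its paper trail.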

Implementation of metadata tracking:

# dataset_versioning.py
import json
from datetime import datetime, timezone

from google.api_core.exceptions import NotFound
from google.cloud import storage


class DatasetVersioner:
    """Tracks dataset versions in a JSON file stored in the datasets bucket."""

    def __init__(self, bucket_name):
        self.client = storage.Client()
        self.bucket = self.client.bucket(bucket_name)
        self.metadata_blob = self.bucket.blob('metadata/dataset_versions.json')

    def register_dataset(self, path, description, author, schema_changes=None):
        """Registers a new version of the dataset with metadata"""
        metadata = self._load_metadata()

        # IDs are sequential, which assumes entries are never deleted
        version_id = f"v{len(metadata) + 1}"
        metadata[version_id] = {
            'path': path,
            'description': description,
            'author': author,
            # UTC timestamps stay comparable across machines and time zones
            'timestamp': datetime.now(timezone.utc).isoformat(),
            'schema_changes': schema_changes or [],
            'used_by_models': []
        }

        self._save_metadata(metadata)
        return version_id

    def link_model_to_dataset(self, model_path, dataset_version):
        """Links a trained model with its dataset"""
        metadata = self._load_metadata()
        if dataset_version in metadata:
            metadata[dataset_version]['used_by_models'].append({
                'model_path': model_path,
                'timestamp': datetime.now(timezone.utc).isoformat()
            })
            self._save_metadata(metadata)

    def _load_metadata(self):
        try:
            content = self.metadata_blob.download_as_text()
            return json.loads(content)
        except NotFound:
            # First run: the metadata file does not exist yet.
            # Any other error (permissions, bad JSON) should surface, not be hidden.
            return {}

    def _save_metadata(self, metadata):
        self.metadata_blob.upload_from_string(
            json.dumps(metadata, indent=2),
            content_type='application/json'
        )

# Usage in your notebook or script
versioner = DatasetVersioner('ml-platform-datasets')
dataset_v = versioner.register_dataset(
    path='processed/sentiment_train_v3/',
    description='Added spam filtering, class balancing',
    author='maria@startup.com',
    schema_changes=['removed column: ip_address', 'added: sentiment_score_normalized']
)

This simple system saved us when a production model started failing, and we needed to track exactly what data was used to train it. Without it, we would have spent days trying to rebuild the pipeline.
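The lookup we ran that day is essentially a reverse search over the metadata file. A minimal sketch, assuming the JSON layout written by `DatasetVersioner` (version IDs mapping to entries with a `used_by_models` list); the function name and the sample values are illustrative.

```python
def find_dataset_for_model(metadata: dict, model_path: str):
    """Return (version_id, dataset_path) for the dataset a model was trained on."""
    for version_id, entry in metadata.items():
        for link in entry.get("used_by_models", []):
            if link.get("model_path") == model_path:
                return version_id, entry["path"]
    return None  # the model was never linked to a dataset


# Illustrative metadata in the same shape as metadata/dataset_versions.json
sample = {
    "v1": {"path": "processed/sentiment_train_v1/", "used_by_models": [
        {"model_path": "production/sentiment_model_v1.1/"},
    ]},
    "v2": {"path": "processed/sentiment_train_v2/", "used_by_models": []},
}

print(find_dataset_for_model(sample, "production/sentiment_model_v1.1/"))
```

This only works if `link_model_to_dataset` is called on every promotion, which is one more reason to make it part of the upload helper rather than a manual step.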

Collaborative Notebooks That Don't End in Merge Hell

Vertex AI Workbench gives you Jupyter notebooks with integrated Git, but be careful: the default integration is basic. You need clear conventions or you'll end up with impossible-to-resolve conflicts.

Initial Workbench Setup:

Create an instance in Vertex AI Workbench (do not use Managed Notebooks; use User-Managed to have full control).
Minimum size: n1-standard-4 (4 vCPUs, 15GB RAM) - $120/month if running 24/7, but turn it off when not in use.
Add GPU only when you really need to train large models. For exploration and experimentation, CPU is sufficient.
Connect a Git repository from the launcher.
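To see how much "turn it off when not in use" is worth, here is a back-of-the-envelope calculation. It starts from the $120/month always-on figure above and assumes billing is proportional to running hours; the 8 hours a day and 22 working days are illustrative, and disk storage (still billed while the instance is stopped) is ignored.

```python
HOURS_PER_MONTH = 730  # average hours in a month (24 * 365 / 12)


def monthly_cost(always_on_monthly: float, hours_per_day: float, days_per_month: int) -> float:
    """Estimate the compute bill when the instance only runs part of the time."""
    hourly_rate = always_on_monthly / HOURS_PER_MONTH
    return round(hourly_rate * hours_per_day * days_per_month, 2)


# An n1-standard-4 used 8 hours a day, 22 working days a month
print(monthly_cost(120, 8, 22))  # 28.93
```

Under those assumptions the instance costs roughly a quarter of the always-on price, which is a big part of how the whole stack stays under $200.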

Repository structure that works:

ml-experiments/
├── notebooks/
│   ├── exploration/
│   │   ├── 01_data_analysis_john.ipynb
│   │   └── 02_feature_engineering_maria.ipynb
│   ├── training/
│   │   └── sentiment_model_v1.ipynb  # This one is versioned and reviewed
│   └── evaluation/
│       └── model_comparison.ipynb
├── src/
│   ├── preprocessing/
│   ├── models/
│   └── utils/
├── tests/
└── configs/
    └── training_config.yaml

Golden rule: production code lives in /src/, while exploration is in /notebooks/.

Notebooks in exploration/ can be personal and chaotic. Those in training/ and evaluation/ should be reviewed in PRs like normal code.
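One convention that keeps notebook PRs reviewable is committing them without outputs, since cell outputs and execution counts are the main source of noisy diffs. Below is a minimal sketch that does this with the standard library, assuming nbformat 4 notebooks; dedicated tools such as nbstripout cover the same ground more thoroughly.

```python
import json


def strip_outputs(notebook_json: str) -> str:
    """Return the notebook JSON with code-cell outputs and execution counts cleared."""
    notebook = json.loads(notebook_json)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # Stable key order and indentation keep future diffs small.
    return json.dumps(notebook, indent=1, sort_keys=True, ensure_ascii=False) + "\n"


raw = json.dumps({
    "cells": [
        {"cell_type": "code", "source": ["print(1)"], "metadata": {},
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}],
         "execution_count": 7},
        {"cell_type": "markdown", "source": ["# Notes"], "metadata": {}},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
})
cleaned = strip_outputs(raw)
```

Running something like this as a pre-commit step on training/ and evaluation/ means reviewers only see changes to the code cells.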

Example of a collaborative training-friendly notebook:

# notebooks/training/sentiment_classifier_v2.ipynb

# Cell 1: Setup (always the same structure)
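import sys
from datetime import datetime  # used for the dated staging path in Cell 5

sys.path.append('../../src')  # make the shared code in /src/ importable

from preprocessing import load_and_preprocess
from models import SentimentClassifier
from utils import upload_model_to_gcs
import yaml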

with open('../../configs/training_config.yaml') as f:
    config = yaml.safe_load(f)

# Cell 2: Data loading (reference to a specific version)
dataset_version = 'v3'  # Explicit, never 'latest'
train_data, val_data = load_and_preprocess(
    bucket='ml-platform-datasets',
    path=f'processed/sentiment_train_{dataset_version}/',
    config=config['preprocessing']
)

# Cell 3: Model construction
model = SentimentClassifier(
    vocab_size=config['model']['vocab_size'],
    embedding_dim=config['model']['embedding_dim'],
    lstm_units=config['model']['lstm_units']
)

# Cell 4: Training
history = model.fit(
    train_data,
    validation_data=val_data,
    epochs=config['training']['epochs'],
    callbacks=[...]
)

# Cell 5: Evaluation and upload
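# return_dict=True gives {'loss': ..., 'accuracy': ...} instead of a plain list
# (assumes the model was compiled with metrics=['accuracy'])
metrics = model.evaluate(val_data, return_dict=True)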
print(f"Validation accuracy: {metrics['accuracy']:.4f}")

if metrics['accuracy'] > config['promotion_threshold']:
    model_path = upload_model_to_gcs(
        model=model,
        bucket='ml-platform-models',
        path=f'staging/sentiment_v2_{datetime.now().strftime("%Y%m%d")}/',
        metadata={
            'dataset_version': dataset_version,
            'config': config,
            'metrics': metrics
        }
    )
    print(f"Model uploaded to: {model_path}")

The key here is that everything configurable is in YAML, everything reusable is in /src/, and the notebook is merely the "glue" that orchestrates it all. So when someone makes a PR on the notebook, you review the training logic, not the implementation of LSTM layers.
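Since the notebook trusts the YAML completely, a small check at load time catches a broken config before any compute is spent. This is a sketch: the required keys are the ones the notebook above reads, the function name is hypothetical, and the values in the example are made up.

```python
# Keys the training notebook reads from configs/training_config.yaml
REQUIRED_KEYS = (
    ("preprocessing",),
    ("model", "vocab_size"),
    ("model", "embedding_dim"),
    ("model", "lstm_units"),
    ("training", "epochs"),
    ("promotion_threshold",),
)


def missing_keys(config: dict) -> list:
    """Return the dotted paths of required keys that the config does not define."""
    missing = []
    for path in REQUIRED_KEYS:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


# Illustrative values only
example_config = {
    "preprocessing": {"lowercase": True},
    "model": {"vocab_size": 20000, "embedding_dim": 128},
    "training": {"epochs": 10},
    "promotion_threshold": 0.9,
}
print(missing_keys(example_config))  # ['model.lstm_units']
```

The same check can run in CI, so a PR that drops a key fails before it triggers a training run.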

CI/CD for Models: Automating Trainings Without Losing Your Mind

This is where 80% of small teams give up and revert to manual training. Cloud Build and Cloud Run make the setup worthwhile.

Dockerfile for reproducible training:

# training.Dockerfile
FROM tensorflow/tensorflow:2.15.0-gpu

WORKDIR /app

# Dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Source code
COPY src/ ./src/
COPY configs/ ./configs/

# Training script
COPY scripts/train.py .

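# GCP credentials: a key file must be mounted at this path at runtime.
# If the job runs as its attached service account instead, remove this line,
# because pointing at a missing file makes the credential lookup fail.
ENV GOOGLE_APPLICATION_CREDENTIALS=/secrets/gcp-key.json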

ENTRYPOINT ["python", "train.py"]

Cloud Build config that triggers trainings:

# cloudbuild.yaml
steps:
  # Build the training image
  - name: 'gcr.io/cloud-builders/docker'
    args: [
      'build',
      '-t', 'gcr.io/$PROJECT_ID/sentiment-trainer:$SHORT_SHA',
      '-f', 'training.Dockerfile',
      '.'
    ]
  
  # Push to the Artifact Registry
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/sentiment-trainer:$SHORT_SHA']
  
  # Run the training
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: 'gcloud'
    args:
      - 'run'
      - 'jobs'
      - 'create'
      - 'sentiment-training-$SHORT_SHA'
      - '--image=gcr.io/$PROJECT_ID/sentiment-trainer:$SHORT_SHA'
      - '--region=us-central1'
      - '--task-timeout=3h'
      - '--memory=8Gi'
      - '--cpu=4'
      - '--execute-now'

timeout: 14400s  # 4 hours; Cloud Build expects the duration in seconds
options:
  machineType: 'N1_HIGHCPU_8'

Automatic trigger on Git push:

In Cloud Build, you can set up a trigger that listens for changes on main of the repository. When someone merges a PR that modifies configs/training_config.yaml or code in src/models/, it automatically triggers a new training.
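Cloud Build triggers can filter on changed files, so the rule itself lives in the trigger configuration. The sketch below only illustrates the matching logic for the two paths mentioned above, which is handy for reasoning about (or unit-testing) which merges will start a training run; the function name is hypothetical.

```python
# Paths that should start a training run: one exact file, one directory prefix.
TRIGGER_PATHS = ("configs/training_config.yaml", "src/models/")


def should_trigger_training(changed_files) -> bool:
    """True if any changed file is the training config or lives under src/models/."""
    for path in changed_files:
        for watched in TRIGGER_PATHS:
            is_dir_match = watched.endswith("/") and path.startswith(watched)
            if path == watched or is_dir_match:
                return True
    return False


print(should_trigger_training(["README.md", "src/models/lstm.py"]))
print(should_trigger_training(["notebooks/exploration/01_data_analysis_john.ipynb"]))
```

Keeping the watched paths this narrow matters for cost: a typo fix in a notebook should not launch a three-hour job.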

The critical part is that the train.py script must be idempotent and must write its checkpoints under a unique, timestamped path, so that a run that fails partway through can resume from its last checkpoint instead of starting over.
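The resume decision can be sketched without any TensorFlow at all. This assumes each run writes checkpoints whose names contain `epoch-NN` (a hypothetical naming scheme) and that the script can list the checkpoints of the run it is retrying; the returned epoch is what you would pass to Keras as `initial_epoch`.

```python
import re

# Hypothetical naming scheme: .../epoch-03.ckpt means three epochs are complete.
EPOCH_RE = re.compile(r"epoch-(\d+)")


def resume_point(checkpoint_names):
    """Return (latest_checkpoint, completed_epochs), or (None, 0) for a fresh run."""
    latest, completed = None, 0
    for name in checkpoint_names:
        match = EPOCH_RE.search(name)
        if match and int(match.group(1)) > completed:
            latest, completed = name, int(match.group(1))
    return latest, completed


checkpoint, initial_epoch = resume_point([
    "run-20260120/epoch-01.ckpt",
    "run-20260120/epoch-03.ckpt",
    "run-20260120/epoch-02.ckpt",
])
print(checkpoint, initial_epoch)
```

With a result like this, the script loads the weights from `checkpoint` when it is not None and calls `model.fit(..., initial_epoch=initial_epoch)`, so rerunning the same job never repeats finished epochs.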

Production-ready training script:

# scripts/train.py
import argparse
import tensorflow as tf
from google.cloud import storage
import yaml
import json
from datetime import datetime

def train_model(config_path, dataset_version, output_bucket):
    # Load config

Editorial note: This article was generated with AI assistance and reviewed by the NewsTide editorial team to ensure accuracy and relevance.