The SMB Infrastructure Maturity Model: Level 2 — Centralized Infrastructure

Recap: Where We Left Off

In Level 1: Surviving Chaos, we tackled the fundamentals: version control for infrastructure, automated deployments, basic monitoring, and backup & disaster recovery. If you’ve implemented those steps, you’ve moved from panic-driven operations to a repeatable foundation.

Now it’s time for Level 2: Centralized Infrastructure Management.

At Level 1, each team or service likely manages its own infrastructure independently. The DevOps team has their Terraform. The backend team has their Docker Compose files. The data team has their own scripts. This works for a while — until you realize:

  • No one knows what’s actually running in production
  • There are three different monitoring dashboards, all showing different things
  • Each team has its own CI/CD setup with different standards
  • Onboarding a new service requires weeks of tribal knowledge transfer
  • Cost allocation is impossible — you can’t tell which service costs what

Level 2 addresses these problems by centralizing your infrastructure tooling, observability, and governance, without creating a bottleneck that slows teams down.

What “Centralized” Means (and Doesn’t Mean)

Let’s clear up a common misconception: centralization doesn’t mean one team controls everything and everyone else submits tickets. That’s the opposite of DevOps.

Centralized infrastructure means:

  • Shared tooling and platforms that every team uses
  • Standardized patterns for deploying, monitoring, and scaling services
  • Single source of truth for infrastructure state and costs
  • Self-service capabilities so teams can deploy independently

It does NOT mean:

  • A single ops team as a bottleneck
  • A one-size-fits-all setup that fits no one
  • Removing team autonomy and ownership

The Centralization Stack for SMBs

1. Centralized Observability (Single Pane of Glass)

Every team should see the same dashboards, logging, and alerting. This is the highest-impact first step because a shared view tends to shorten mean time to recovery (MTTR) and eliminates the “whose dashboard is right?” problem.

# docker-compose.observability.yml — Centralized observability stack
version: '3.8'
# Images are left untagged for brevity; pin explicit version tags in production.
services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    ports: ["9090:9090"]

  grafana:
    image: grafana/grafana
    depends_on: [prometheus]
    ports: ["3000:3000"]
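    environment:
      # Anonymous access lets anyone who can reach port 3000 view dashboards
      # (read-only Viewer role by default); keep this behind a VPN or disable it.
      - GF_AUTH_ANONYMOUS_ENABLED=true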
    volumes:
      - grafana_data:/var/lib/grafana

  loki:
    image: grafana/loki
    ports: ["3100:3100"]
    volumes:
      - ./loki-config.yml:/etc/loki/local-config.yaml
      - loki_data:/loki

volumes:
  prometheus_data:
  grafana_data:
  loki_data:
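Once the stack is up, a quick smoke check confirms that all three services answer. The following is a minimal Python sketch, not part of the original setup: it assumes the stack runs on localhost with the ports from the compose file above, and the function names are illustrative. The paths are the standard health endpoints of Prometheus (/-/healthy), Grafana (/api/health), and Loki (/ready).

```python
"""Smoke-check the centralized observability stack after `docker compose up`."""
import urllib.error
import urllib.request

# Ports match the compose file above; the hostname is an assumption.
HEALTH_PATHS = {
    "prometheus": (9090, "/-/healthy"),
    "grafana": (3000, "/api/health"),
    "loki": (3100, "/ready"),
}


def health_urls(host="localhost"):
    """Build the health-check URL for each service in the stack."""
    return {
        name: f"http://{host}:{port}{path}"
        for name, (port, path) in HEALTH_PATHS.items()
    }


def http_ok(url, timeout=3.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx responses all count as down.
        return False


def check_stack(host="localhost", probe=http_ok):
    """Probe every service and return {service_name: is_healthy}."""
    return {name: probe(url) for name, url in health_urls(host).items()}
```

Calling check_stack() returns a dictionary such as {"prometheus": True, ...}, which a deploy script could use to fail fast. The probe parameter exists so the check can be exercised without a running stack.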

2. Centralized Infrastructure Registry

Maintain a single inventory of every service, its dependencies, owners, and cost allocation. For SMBs, this doesn’t need to be fancy:

# infrastructure.yml — Centralized service registry
services:
  api-gateway:
    owner: "platform-team"
    repository: "github.com/org/api-gateway"
    infrastructure: "terraform/environments/prod/api-gateway"
    monitoring: "grafana/dashboards/api-gateway.json"
    alerts: "pagerduty/api-gateway"
    cloud_resources: ["ecs:api-gateway-prod", "rds:api-gateway-db"]
    cost_center: "platform"
    criticality: "tier-1"

  user-service:
    owner: "backend-team"
    repository: "github.com/org/user-service"
    infrastructure: "terraform/environments/prod/user-service"
    monitoring: "grafana/dashboards/user-service.json"
    alerts: "pagerduty/user-service"
    cloud_resources: ["ecs:user-service-prod", "rds:user-service-db", "elasticache:user-sessions"]
    cost_center: "product"
    criticality: "tier-1"
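A registry only stays trustworthy if incomplete entries are caught early. The sketch below shows one way to validate entries shaped like the file above. It is an illustration under assumptions: the registry is inlined as a Python dict so the example runs with the standard library alone (in practice it would be parsed from infrastructure.yml, for example with PyYAML's yaml.safe_load), the "legacy-cron" entry is hypothetical, and the tier-2 and tier-3 criticality values are assumed, since only tier-1 appears above.

```python
"""Validate registry entries shaped like infrastructure.yml."""

REQUIRED_FIELDS = (
    "owner", "repository", "infrastructure", "monitoring",
    "alerts", "cloud_resources", "cost_center", "criticality",
)
# Only "tier-1" appears in the registry above; the other tiers are assumed.
ALLOWED_CRITICALITY = {"tier-1", "tier-2", "tier-3"}


def validate_registry(registry):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for name, entry in registry.get("services", {}).items():
        for field in REQUIRED_FIELDS:
            if not entry.get(field):
                problems.append(f"{name}: missing '{field}'")
        criticality = entry.get("criticality")
        if criticality and criticality not in ALLOWED_CRITICALITY:
            problems.append(f"{name}: unknown criticality '{criticality}'")
    return problems


registry = {
    "services": {
        "api-gateway": {
            "owner": "platform-team",
            "repository": "github.com/org/api-gateway",
            "infrastructure": "terraform/environments/prod/api-gateway",
            "monitoring": "grafana/dashboards/api-gateway.json",
            "alerts": "pagerduty/api-gateway",
            "cloud_resources": ["ecs:api-gateway-prod", "rds:api-gateway-db"],
            "cost_center": "platform",
            "criticality": "tier-1",
        },
        # Hypothetical incomplete entry, included to show the failure output.
        "legacy-cron": {
            "owner": "data-team",
            "criticality": "critical",
        },
    }
}

for problem in validate_registry(registry):
    print(problem)
```

Running a check like this in CI on every change to the registry file keeps the inventory from drifting into a partial list.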

3. Centralized CI/CD Platform

Instead of each team reinventing their CI/CD, create a shared set of reusable workflows. In GitHub Actions, this means composite actions and reusable workflows:

# .github/workflows/deploy-template.yml — Reusable deploy workflow
on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string
      dockerfile:
        default: Dockerfile
        type: string
    secrets:
      deploy_key:
        required: true

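jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}
    steps:
      - uses: actions/checkout@v4
      - name: Build and test
        env:
          # Pass inputs through env instead of inlining them, to avoid script injection
          DOCKERFILE: ${{ inputs.dockerfile }}
        run: |
          docker build -f "$DOCKERFILE" -t app:latest .
          docker run --rm app:latest npm test
      - name: Deploy
        env:
          DEPLOY_KEY: ${{ secrets.deploy_key }}
        run: |
          # Write the key to a private temp file (mktemp creates it with mode 600)
          key_file="$(mktemp)"
          printf '%s\n' "$DEPLOY_KEY" > "$key_file"
          # "deploy@host" is a placeholder; the server's host key must already be in known_hosts
          ssh -i "$key_file" deploy@host "docker compose pull && docker compose up -d"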

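Then teams consume it in a few lines. The ./ path below works when the calling workflow lives in the same repository as the template; from another repository, reference it as {owner}/{repo}/.github/workflows/deploy-template.yml@{ref}: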

# .github/workflows/user-service.yml — Team-specific config
name: Deploy User Service
on:
  push:
    branches: [main]
jobs:
  deploy:
    uses: ./.github/workflows/deploy-template.yml
    with:
      environment: production
    secrets:
      deploy_key: ${{ secrets.DEPLOY_KEY }}

4. Centralized Cost Management

You can’t optimize what you can’t measure. Set up cost allocation tagging across all cloud resources:

# Tagging standard for all cloud resources
Required Tags:
  - service: (name from registry)
  - environment: (prod/staging/dev)
  - team: (owning team name)
  - cost-center: (product/platform/data/infra)
  - terraform: (true/false)
  - created-by: (tool/username)
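A tagging standard is easier to enforce when a script can flag violations. The following Python sketch checks one resource's tags against the standard above. The function name and the sample resources are illustrative assumptions, and the tags are passed in as a plain dict rather than fetched from a cloud API.

```python
"""Check a resource's tags against the tagging standard."""

REQUIRED_TAGS = (
    "service", "environment", "team", "cost-center", "terraform", "created-by",
)
# Allowed values taken from the standard above; free-form tags are not listed.
ALLOWED_VALUES = {
    "environment": {"prod", "staging", "dev"},
    "cost-center": {"product", "platform", "data", "infra"},
    "terraform": {"true", "false"},
}


def tag_violations(tags):
    """Return a list of violations for one resource; empty means compliant."""
    violations = []
    for key in REQUIRED_TAGS:
        value = tags.get(key)
        if value is None or value == "":
            violations.append(f"missing tag '{key}'")
        elif key in ALLOWED_VALUES and value not in ALLOWED_VALUES[key]:
            violations.append(f"tag '{key}' has unexpected value '{value}'")
    return violations


# Hypothetical resources for illustration.
compliant = {
    "service": "user-service", "environment": "prod", "team": "backend-team",
    "cost-center": "product", "terraform": "true", "created-by": "terraform",
}
non_compliant = {
    "service": "user-service", "environment": "production", "team": "backend-team",
}

print(tag_violations(compliant))
for violation in tag_violations(non_compliant):
    print(violation)
```

The same function can run against tags pulled from a Terraform plan or a cloud inventory export, whichever source is most convenient.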

Implementation Roadmap

Week 1–2: Centralize Observability

Deploy Prometheus + Grafana + Loki. Migrate all teams to the same stack. Create standard dashboard templates for services.

Week 3–4: Build the Service Registry

Create an infrastructure YAML file (or use Backstage if you have more resources). Map every service and its dependencies.

Week 5–6: Standardize CI/CD

Extract your most common pipeline into a reusable template. Migrate teams one at a time rather than all at once.
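To track which pipelines have already moved over, a small script can scan workflow files for a call to the shared template. This is a rough sketch under assumptions: it uses a regular expression rather than a YAML parser, the template file name matches the deploy-template.yml example earlier, and the "org/platform" repository in the sample is hypothetical.

```python
"""Detect whether a workflow file calls the shared deploy template."""
import re

# Matches "uses: target" lines, with or without a leading list dash.
USES_LINE = re.compile(r"^\s*(?:-\s*)?uses:\s*(\S+)", re.MULTILINE)
TEMPLATE_SUFFIX = "/.github/workflows/deploy-template.yml"


def uses_shared_template(workflow_text):
    """Return True if any `uses:` entry points at the shared template."""
    for target in USES_LINE.findall(workflow_text):
        # Drop optional quotes and the @ref suffix before comparing paths.
        path = target.strip("'\"").split("@", 1)[0]
        if path.endswith(TEMPLATE_SUFFIX):
            return True
    return False


migrated = """
jobs:
  deploy:
    uses: org/platform/.github/workflows/deploy-template.yml@main
"""
not_migrated = """
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh
"""

print(uses_shared_template(migrated))
print(uses_shared_template(not_migrated))
```

Running this over every repository's .github/workflows directory gives a simple migrated versus not-migrated list to work through.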

Week 7–8: Implement Cost Allocation

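Apply the tagging standard to existing resources as well as new ones. Set up AWS Cost Explorer or GCP Cost Management dashboards by team and service. On AWS, tags only show up in Cost Explorer after you activate them as cost allocation tags in the Billing console.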
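Once resources carry tags, per-service cost becomes a simple aggregation. The Python sketch below rolls up billing line items by any tag. The line-item shape (a "cost" string plus a "tags" dict) and the amounts are hypothetical; a real billing export has different columns and would need to be mapped into this form first.

```python
"""Roll up billing line items by a tag such as service or team."""
from collections import defaultdict
from decimal import Decimal


def cost_by_tag(line_items, tag):
    """Sum costs per tag value; items without the tag land in 'untagged'."""
    totals = defaultdict(Decimal)  # Decimal() starts each bucket at 0
    for item in line_items:
        key = item.get("tags", {}).get(tag, "untagged")
        totals[key] += Decimal(item["cost"])
    return dict(totals)


# Hypothetical line items for illustration.
line_items = [
    {"cost": "412.50", "tags": {"service": "api-gateway", "team": "platform-team"}},
    {"cost": "100.25", "tags": {"service": "api-gateway", "team": "platform-team"}},
    {"cost": "300.00", "tags": {"service": "user-service", "team": "backend-team"}},
    {"cost": "42.10", "tags": {}},
]

totals = cost_by_tag(line_items, "service")
for name, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<15} {total:>10}")
```

The size of the "untagged" bucket is a useful progress metric in its own right: it shows how much spend the tagging rollout has not yet reached.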

Measuring Level 2 Success

You’ve graduated from Level 2 when:

  • Any engineer can look at one dashboard to understand the health of all services
  • Onboarding a new service takes less than a day (not weeks)
  • You can tell exactly how much each service costs per month
  • Teams deploy independently using shared, battle-tested pipelines
  • The CEO can ask “how’s production?” and get a one-click answer

What’s Next: Level 3 — Measured

Once your infrastructure is centralized and standardized, you can start measuring everything that matters: SLIs, SLOs, error budgets, and business impact metrics. That’s what Level 3 covers — and it’s where you transform from “keeping the lights on” to proactive reliability engineering.

Stay tuned for the next installment, or get a head start with our infrastructure assessment — we’ll tell you exactly which level you’re at and what to prioritize next.

