Senior DevOps / Infrastructure Engineer @ AlphaX

About the Company

We are building distributed observability and multi-cloud intelligence software for modern AI and data-intensive systems. Our platform enables engineering teams to monitor, debug, and optimize workloads operating across regions, cloud providers, and heterogeneous compute environments.

Reliability, performance, and operational clarity are foundational to our product. Infrastructure is not a support layer; it is core to what we deliver.

We are a small, senior engineering team operating in a high-ownership environment where architectural decisions have direct product impact.

The Role

We are hiring a Senior DevOps / Infrastructure Engineer to design, scale, and operate the systems that power our observability platform.

You will own multi-region cloud deployments, high-ingest telemetry pipelines, system reliability, and production observability. This role suits engineers who are comfortable working deep in distributed systems and who want to shape long-term infrastructure strategy.

What You’ll Own

Multi-region, multi-cloud infrastructure supporting real-time observability
High-throughput ingestion pipelines and event-driven architectures
Autoscaling, failover, and fault-tolerant system design
CI/CD pipelines and deployment automation
Production SLOs, incident response, and reliability engineering
Metrics, logs, tracing, and alerting systems
Secure, scalable API and service infrastructure
Infrastructure-as-code and environment lifecycle management

Key Responsibilities

Architect and operate multi-region deployments across AWS, GCP, or Azure
Build and maintain high-throughput telemetry ingestion pipelines
Design autoscaling and failover strategies for mission-critical services
Own observability systems, including Prometheus, Grafana, and distributed tracing
Reduce MTTR and improve operational readiness processes
Manage CI/CD pipelines, GitOps workflows, and automated deployments
Collaborate with backend teams on API performance and infrastructure reliability
Harden infrastructure for security, compliance, and tenant isolation
Drive the long-term infrastructure roadmap and architectural direction

Requirements

Required Qualifications

Deep experience with Kubernetes, Docker, and container orchestration
Strong background in distributed systems and multi-region architectures
Experience with high-ingest, streaming, or event-driven systems
Hands-on experience with Prometheus, Grafana, and tracing/alerting frameworks
Proficiency with Terraform or similar infrastructure-as-code tools
Experience building and maintaining CI/CD pipelines
Strong understanding of AWS, GCP, or Azure
Proficiency in Python or Go for automation and tooling
Experience operating high-availability, production-critical systems

Preferred Experience

Cloudflare (DNS, CDN, WAF, SSL)
Helm, Kustomize, or similar Kubernetes tooling
Experience with time-series databases, vector databases, or high-throughput storage systems
Background in SRE, platform engineering, or observability tooling
Experience supporting AI/ML workloads or GPU-based systems
Familiarity with OpenTelemetry, Jaeger, or similar distributed tracing frameworks

Benefits

What We Offer

Significant ownership over core infrastructure decisions
A senior engineering team with low overhead and direct collaboration
Fast-paced environment with measurable impact
Competitive compensation and meaningful equity
Opportunity to architect infrastructure for a category-defining observability platform