Build and Scale Observability-as-Code
Design and maintain Python tooling and Terraform modules that standardize Datadog configuration across services.
Eliminate manual setup by codifying monitors, dashboards, SLOs, and alerting patterns.
Improve consistency, repeatability, and reliability of observability across the organization.
Establish Reliable & Standardised Instrumentation
Define and implement observability blueprints that integrate high‑fidelity metrics, logs, and traces into the development lifecycle.
Codify best practices so teams get out-of-the-box visibility without needing deep observability expertise.
Raise the baseline for service health, debuggability, and operational readiness
Optimize Datadog Usage and Cost
Own critical parts of the Datadog platform configuration.
Improve data quality, signal-to-noise ratio, and alert reliability.
Partner with teams to adopt telemetry effectively while managing ingestion and alerting costs.
Maintain and Evolve Platform Components
Upgrade and maintain tracers, agents, and shared observability libraries.
Ensure upgrades are automated, backwards-compatible, and minimally disruptive to product teams.
Reduce operational risk by improving rollout and validation processes.
Integrate Observability Across Infrastructure
Collaborate with Platform and Infrastructure teams to embed monitoring into systems such as Kafka, gRPC services, Kubernetes, and AWS-managed services.
Improve production visibility and reduce mean time to detect (MTTD) and resolve (MTTR) incidents
Deliver High-Quality, Production-Ready Code
Write clean, well-tested, and maintainable Python code and Terraform modules.
Participate in architecture and design reviews; provide thoughtful feedback in code reviews.
Take ownership of projects end-to-end, from design and implementation through production rollout and support.
Mentorship & Collaboration
Assist team members to solve problems and develop their own skills.
Foster a collaborative mindset within the team.
Degree in Computer Science, or related.
7+ years of experience in application development, platform engineering, or developer tooling.
High proficiency in Python; solid experience with Terraform.
Hands-on experience using Datadog for metrics, logging, tracing, dashboards, monitors, and alerts.
Experience with containerized and cloud-native environments (e.g., Kubernetes, Kafka, AWS, gRPC, Lambda).
Proven ability to independently drive medium-to-large initiatives from design to delivery.
Comfortable making pragmatic tradeoffs to deliver reliable, scalable solutions.
A strong product mindset for internal tools.
Passion for reducing cognitive load, eliminating toil, and making observability easy to adopt by default.
Solid understanding of modern web applications and distributed systems.
Knowledge of how observability applies to high-throughput, highly available systems
Clear written and verbal communication skills.
Ability to influence technical direction through design discussions, documentation, and hands-on implementation.
Comfortable partnering with product, platform, and infrastructure teams.
Founded
2009 (about 17 years ago)
People
201-500 employees
Industry
Software Development
Type
Privately Held
Locations