Site Reliability Engineer
Remote / San Francisco | Full-time
Ensure system reliability and performance
Who We Are
nirvahatech is a leading DevOps and cloud infrastructure consulting firm. Our SRE team ensures that client systems maintain 99.99% uptime while continuously improving automation, monitoring, and incident response capabilities.
Tech Stack
Kubernetes, Prometheus, Grafana, ELK Stack, Datadog, PagerDuty, Python, Go, Terraform, Ansible, Jenkins, GitLab CI
What You'll Do
- ✓ Build and maintain highly available production systems
- ✓ Implement comprehensive monitoring, alerting, and observability solutions
- ✓ Design and execute disaster recovery and business continuity plans
- ✓ Automate operational tasks and eliminate toil
- ✓ Participate in on-call rotation and incident response
- ✓ Conduct post-incident reviews and implement preventive measures
- ✓ Define and track SLIs, SLOs, and error budgets
What You Bring
- ✓ 4+ years in SRE, DevOps, or infrastructure engineering
- ✓ Strong programming skills in Python, Go, or similar languages
- ✓ Deep understanding of Linux systems administration
- ✓ Experience with container orchestration (Kubernetes preferred)
- ✓ Expertise in monitoring tools such as Prometheus, Grafana, and Datadog
- ✓ Proven ability to troubleshoot complex distributed systems
- ✓ On-call experience with incident management
Why Join Us
- ★ High Impact: Keep critical systems running for major clients
- ★ Automation First: Eliminate repetitive tasks through smart automation
- ★ Learning Culture: Share knowledge through runbooks and documentation
- ★ Work-Life Balance: Fair on-call rotation with compensation
- ★ Modern Stack: Work with the latest SRE tools and practices
- ★ Problem Solving: Complex technical challenges every day
- ★ Team Collaboration: Supportive team that values reliability over speed