Site Reliability Engineer

Keep our multi-chain infrastructure reliable, fast, and scalable as we grow to 500k+ developers.

About the role

You'll ensure Tokra's infrastructure stays reliable, fast, and scalable as we grow from 50k to 500k+ developers. At a blockchain API company, uptime is everything: you'll own the observability, incident response, and infrastructure automation that keep our systems running 24/7.

What you’ll be doing

  • Build monitoring and alerting systems across our multi-chain node infrastructure

  • Optimize blockchain node performance and reliability (Geth, Erigon, archive nodes)

  • Design and implement disaster recovery and high availability systems

  • Automate infrastructure provisioning and deployment using Terraform and Kubernetes

  • Participate in the on-call rotation and lead incident response when things break

  • Improve system observability with metrics, logs, and distributed tracing

  • Work with the backend team to identify and resolve performance bottlenecks

About you

  • 5+ years in SRE, DevOps, or infrastructure engineering roles

  • Expert with Kubernetes, Docker, and container orchestration at scale

  • Strong experience with AWS (EC2, RDS, S3, CloudWatch) or GCP

  • Proficient in Infrastructure as Code (Terraform, Ansible)

  • Comfortable with Golang or Python for automation and tooling

  • Experience with observability tools (Prometheus, Grafana, Datadog)

  • You thrive during incidents and can debug production issues under pressure

Benefits

  • $170k-$230k + equity

  • Fully remote with flexible hours

  • $3k/year learning and certification budget

  • Premium health insurance with dental and vision

  • 4 weeks PTO + paid sick leave

  • Latest MacBook Pro + home office budget

  • On-call compensation + quarterly team offsites

Infrastructure

San Francisco

Remote

Full-time

$170k–$230k
