SR TechOps (Solana)
Description
We at Pearster are looking for a Senior TechOps Engineer with experience in Solana to join an American prominent Blockchain company through our team. This company is a cloud-based infrastructure provider powering the global blockchain ecosystem. Its mission is to be the indispensable utility that enables companies and innovators worldwide to build next-generation, Web3-enabled businesses and applications using blockchain technology.
In this role, you’ll help keep Solana fast and reliable in production, operating validators, RPC, and indexing at scale. You’ll tune hardware, optimize Agave/Jito, develop tools in Go or Python, and lead incident responses—treating keys, latency, and SLOs with production-level precision. Experience patching clients or contributing upstream fixes is a strong plus.
Responsibilities
- Run validators: Deploy, upgrade, and tune Agave/Jito; minimize missed slots; maintain healthy voting and high leader performance.
- Operate high-throughput RPC: Set smart connection and queue limits, optimize PubSub fan-out and backpressure, and ensure indexers are efficiently served without starving nodes
- Extract performance from hardware: Select optimal servers, tune BIOS, kernel, NIC, and NVMe configurations, and validate performance gains through profiling and metrics.
- Automate everything: Implement reproducible images, manage fleet changes with Terraform and Ansible, create snapshot pipelines, verify state-sync and replay processes, and build automated release systems.
- Lead incidents (SEV0–2): Quickly isolate issues, execute safe roll-forwards or roll-backs, publish clear root cause analyses, and implement preventive measures to avoid recurrences.
- Collaborate with the ecosystem: Reproduce complex bugs, share performance traces, and contribute targeted patches upstream when beneficial.
- Code where it counts: Develop and extend tools for snapshots, replay/load, and state-sync verification; patch client bugs impacting production and upstream relevant fixes when valuable.
Requirements
- Location: Europe (remote-based).
- Linux systems + kernel tuning: NUMA, IRQ affinity, hugepages, cpusets, I/O schedulers, sysctl; filesystem/NVMe layout; BIOS/firmware setup (C-states, power governors); NIC queues/offloads (RSS/RPS/XPS, GRO/LRO/TSO).
- Hardware performance engineering: Choose and tune CPU/RAM/NVMe/NIC; measure replay throughput, p95/p99 RPC latency, IOPS/egress—and push them lower/faster.
- Agave/Jito operations: Build from source; manage feature gates and config flags; snapshots (create/consume), ledger compaction/repair/replay health; accounts-DB tuning; version management.
- Read protocols & surfaces: Operate and tune JSON-RPC (HTTP/WS), gRPC, and PubSub; design connection pools, concurrency limits, caching, timeouts, and backpressure that hold under peak.
- Transaction sending logic: Understand direct-to-TPU (QUIC) vs RPC sendTransaction; preflight/simulation trade-offs; priority fees and compute budget tuning; leader-schedule awareness.
- Go or Python (plus Bash): Build small, sharp tools/CLIs (snapshot/restore pipelines, state-sync verification, health checks, replay/load harnesses).
- Observability that matters: SLOs/error budgets; Prometheus/Grafana; alerts that page only when users hurt (RPC latency, PubSub backlog, missed leader slots, replay stalls).
- Key management & safety: KMS/HSM/Vault; authority rotations; secure backups; tested DR paths; controlled, auditable change windows.
Benefits
Fully remote work arrangement as a contractor.
Competitive salary in USD.
PTO days per year.
100% company-covered international certifications.
Access to coworking spaces.
English classes.
Engaging team-building activities.
Personalized gifts.
Welcome kit.
Referral programs.
- Locations
- United States
- Remote status
- Fully Remote