Fully Remote
SR TechOps (Solana)
Description
We’re seeking a dedicated Technical Operations Engineer specializing in Solana to work for a prominent American company in the blockchain sector. In this role, you’ll apply specialized knowledge of Solana to manage, optimize, and enhance our validator nodes, RPC deployments, and related infrastructure.
Responsibilities
- Solana Network Leadership: Lead end-to-end deployment and optimization projects for Solana infrastructure, including validator nodes, RPC endpoints, and indexing services. Drive design reviews, canary rollouts, and continuous improvements to performance and reliability.
- High-Severity Incident Command: Own SEV 0/1 response, coordinating mitigation across teams, running postmortems, and ensuring root-cause resolution with follow-through on corrective actions.
- SLO Ownership & Capacity Strategy: Define and manage service-level objectives (SLOs) and service-level agreements (SLAs). Build and maintain cost models and capacity planning tools to forecast infrastructure needs and control spend.
- Monitoring & Proactive Issue Detection: Develop dashboards and alerting solutions using tools such as Grafana and Datadog. Identify anomalies and trends to prevent outages before they occur.
- Automation & Infrastructure as Code: Implement and maintain automation with Ansible, Terraform, and Kubernetes. Reduce toil, accelerate deployment timelines, and ensure consistent environments across staging and production.
- Mentorship & Peer Review: Provide mentorship to engineers on deployment, observability, and Solana-specific ops. Review infrastructure code and monitoring configs. Raise the bar through shared knowledge.
- Solana Ecosystem Engagement: Act as a technical representative in Solana forums and community calls. Collaborate directly with the Solana Foundation and ecosystem contributors to troubleshoot and evolve protocol-level operations.
- Cross-Team Collaboration: Partner with internal infrastructure, platform, and support teams to solve customer-impacting issues. Contribute insights to architectural and product-level discussions.
- 24/7 On-Call Participation: Participate in an on-call rotation, ensuring 24/7 availability for critical systems and supporting rapid incident resolution.
Requirements
- 5+ years of experience in Technical Operations, Site Reliability Engineering (SRE), or related roles, with proven Linux/Unix system administration and advanced troubleshooting skills. An RHCE-level Linux certification or equivalent is a plus.
- Hands-on experience operating and optimizing Solana validator nodes, RPC endpoints, and associated infrastructure at scale. Familiarity with the Solana protocol at a high level and with its core components. Proficiency in analyzing validator logs, debugging RPC issues, and resolving Solana-specific operational problems. Contributions to open-source Solana projects are an asset.
- Solid hands-on experience with configuration management and infrastructure automation tools (Helm, Terraform, Ansible, Consul), including containerization expertise (Docker, Kubernetes) and experience managing and scaling services in cloud environments.
- Competency in scripting/programming languages (Rust, Go, JavaScript).
- Advanced proficiency in monitoring and analytics platforms (Grafana, Datadog), enabling proactive, data-driven operational decisions.
- Demonstrated ability to identify performance patterns, forecast potential issues, and implement preventive solutions.
- Strong track record defining, measuring, and maintaining SLAs/SLOs, and experience with incident response tooling and processes (e.g., PagerDuty) to ensure quick resolution and systematic root-cause analysis.
- Exceptional interpersonal and communication skills, with a proven ability to collaborate effectively across multiple teams and stakeholders.
- Self-motivated, solution-oriented, and consistently striving for operational improvements, quality enhancements, and reduced technical debt.
- Solid professional attributes, committed to transparency, accountability, and ethical behavior. Capable of managing complexity and staying adaptable under pressure, with a commitment to continuous learning and comfort evolving within a rapidly changing technical landscape.
- Self-starter driven by curiosity and initiative, proactively identifying opportunities, addressing gaps, and implementing solutions autonomously.
- Thrives in dynamic environments and committed to maintaining industry leadership through close collaboration with the most innovative and talented minds in Web3.
Benefits
- Fully remote work arrangement as a contractor.
- Competitive salary in USD.
- PTO days per year.
- 100% company-covered international certifications.
- Access to coworking spaces.
- English classes.
- Engaging team-building activities.
- Personalized gifts.
- Welcome kit.
- Referral programs.