Job Description

Job Title: Site Reliability Engineer

Location: Fully Remote

Job Brief: We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team. The ideal candidate will have extensive experience in service reliability and operations, automation scripting, and application performance management. You will be responsible for ensuring the reliability, performance, and availability of our large-scale, high-performance applications in a hybrid environment.

Responsibilities:

Manage and maintain large-scale, high-performance applications in both on-prem and cloud environments.
Write automation scripts and build dashboards for application performance management to manage transaction journeys.
Develop and maintain containerized applications in GKE/RKE/AKS environments.
Implement cloud observability using OpenTelemetry (OTel) for real-time monitoring, distributed tracing, and incident resolution.
Transition platforms to the cloud and to containers using GCP, AWS, Rancher, CloudFormation, Azure, and OpenShift.
Work with programming languages such as Go, Python, Java, and Rust.
Use databases such as Oracle (PL/SQL), SQL Server, Redis, ClickHouse, Postgres, MongoDB, or any time-series database.
Implement and manage GraphQL frameworks (Apollo, Prisma, Hasura, etc.).
Troubleshoot issues using knowledge of networking protocols such as TCP/IP, DNS, load balancing, and service mesh.
Monitor and troubleshoot HashiCorp Vault environments to ensure minimal downtime and rapid recovery from incidents.
Manage application availability and build creative solutions to automate repetitive activities, improve gating, and detect issues on a 24x7 high-availability platform.
Use monitoring tools like Splunk, AppDynamics, Grafana/Prometheus, and Dynatrace.
Implement in-memory caching solutions (experience with Redis is a plus).
Debug issues across a variety of integrated technical platforms on the API gateway.
Work with GCS, Cloud SQL, PL/SQL, and Spanner.
Utilize Vertex AI, Gen AI, and BigQuery for advanced data analysis and machine learning tasks.

Requirements:

3-5 years of service reliability and operations experience running large-scale, high-performance applications in a hybrid environment.
3-5 years of experience writing automation scripts and building dashboards for application performance management.
2-4 years of experience working with programming languages such as Go, Python, Java, and Rust.
Working knowledge of one or more databases: Oracle (PL/SQL), SQL Server, Redis, ClickHouse, Postgres, MongoDB, or any time-series database.
At least 2 years of experience transitioning platforms to the cloud and to containers (GCP, AWS, Rancher, CloudFormation, Azure, OpenShift).
Experience maintaining containerized applications in GKE/RKE/AKS environments.
Experience implementing cloud observability using OpenTelemetry (OTel).
Experience working with specific GraphQL frameworks (Apollo, Prisma, Hasura, etc.).
Knowledge of networking protocols such as TCP/IP, DNS, load balancing, and service mesh.
Proven experience managing application availability and building solutions for a 24x7 high-availability platform.
Working knowledge of monitoring tools (Splunk, AppDynamics, Grafana/Prometheus, Dynatrace).
Experience with tools like Rally, Confluence, and other CI/CD extenders.
Hands-on experience implementing in-memory caching solutions (Redis is a plus).
Excellent debugging skills across various integrated technical platforms on the API gateway.
Hands-on experience with GCS, Cloud SQL, PL/SQL, and Spanner.
Experience monitoring and troubleshooting HashiCorp Vault environments.
Working knowledge of Vertex AI, Gen AI, and BigQuery.

Job Tags

Remote job

Site Reliability Engineer Job at Expert Technology Services, Arizona