Employer: Schlumberger Technology Corporation
Full-time or part-time: Full-time
Job title: Site Reliability Engineer
Job Location: 1430 Enclave Parkway, Houston, TX 77077
Job Description:
Create ultra-scalable and highly reliable software systems through system design consulting, capacity planning, system health monitoring, and sustainable incident response. Engage in and improve the entire lifecycle of services, from inception and design through deployment, operation, and refinement. Ensure that Cloud solutions and services meet the reliability and uptime appropriate to users' needs. Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews. Maintain and improve services once they are live by measuring and monitoring availability, latency, and overall system health. Scale systems sustainably through mechanisms such as automation, and evolve systems by pushing for changes that improve reliability and velocity. Gauge the effectiveness and efficiency of existing systems and infrastructure; implement strategies for improving or further leveraging these systems within a geoscience workflow. Collaborate with network and security staff to ensure smooth, secure, and reliable operation of application software and systems. Develop, implement, and document best-practice policies and procedures for new projects or initiatives. Use the service management systems, ensuring that best practices and lessons learned are made available to the wider technical community. Engage in incident response and blameless postmortems.
Minimum Education & Experience Requirements:
Bachelor's degree in Computer Science, Electronics and Instrumentation Engineering, Electronics Engineering, or a related field, or a foreign equivalent degree plus 5 years of progressively responsible post-baccalaureate experience in the job offered or in any engineering-related job title. Applicants must possess 5 years of experience in the following: (1) Azure (Microsoft Cloud Platform), GCP (Google Cloud Platform), and ECE (Elastic Cloud Enterprise) for Cloud Infrastructure setup and maintenance; (2) ADO (Azure DevOps) for an end-to-end DevOps toolchain; (3) PagerDuty for Incident Management; (4) Stackdriver for alerting and monitoring; (5) Python, Java, and Bash for scripting; (6) Docker for packaging applications with their libraries and other dependencies; (7) Kubernetes for container management; (8) Postman and Swagger UI to troubleshoot the functionality of a REST API using JSON documents; and (9) running pipelines using Git, IntelliJ IDEA, or Eclipse for deployment and debugging.