Wikimedia Foundation Staff Site Reliability Engineer Job

The Wikimedia Foundation is looking for a Staff Site Reliability Engineer to join our team, reporting to the Sr. Engineering Manager. As a Staff Site Reliability Engineer, you will play a key role in designing, developing, and maintaining reliable, scalable, and highly available infrastructure for our API services. You will contribute heavily to the high-impact work of building, evolving, and maintaining Wikipedia’s data feeds for high-volume reusers.

You are responsible for:

  • Design, implementation, and maintenance of public-facing infrastructure and services
  • Use of configuration management and deployment tools
  • Architectural design and operation at scale
  • Monitoring of systems and services, and optimization of performance and resource utilization
  • Common operating system-level tasks such as logging and backup / restore
  • Cookbook/runbook implementation for common maintenance actions
  • Incident response, diagnosis, and follow-up on system outages or alerts
  • Automation and streamlining of tasks as well as identifying process gaps
  • Collaborating with a global and asynchronously communicating team (don’t worry if you have never worked remotely, we’ll help you get used to it)
  • Mentoring peers in your areas of technical and operational strength

Skills and Experience:

  • Strong experience with automation and configuration management tools such as Terraform and Ansible. Proficiency in at least one programming language (Python, Go, or similar).
  • Strong understanding of CI/CD pipelines and deployment strategies.
  • Experience managing cloud services (AWS, Azure, GCP) and identifying cost savings.
  • Experience with monitoring, logging, and alerting tools such as Prometheus, Grafana, or ELK stack.
  • Strong troubleshooting and problem-solving skills, and the ability to work effectively under pressure.
  • Excellent communication skills, with a strong emphasis on documenting processes and runbooks, and the ability to collaborate effectively with cross-functional teams.
  • Incident Management: Experience with incident management and on-call rotation practices, as well as tools like PagerDuty or Opsgenie.
  • SRE Best Practices: Understanding of SRE principles, such as Service Level Objectives (SLOs), error budgets, and blameless postmortems.
  • Familiarity with Wikimedia or other open-source projects is a plus.

If you are passionate about building and maintaining reliable, scalable, and highly available infrastructure on AWS, and you thrive in a dynamic and collaborative environment, we encourage you to apply for this opportunity to join our team at Wikimedia Enterprise.

Qualities that are important to us:

  • Experience with operating highly available infrastructure
  • Experience with running applications and services at scale
  • Proficiency with shell and a programming language used in an SRE/Operations engineering context (Python, Go, etc.)
  • Comfortable with open-source configuration management and orchestration tools (Ansible, Terraform, etc.)
  • Communicative technical English

Additionally, we’d love it if you have:

  • Experience implementing containerization solutions (Docker, ECS, EC2, Kubernetes)
  • Experience administering code collaboration software (GitLab, Gerrit)
  • Experience managing and troubleshooting event streaming services at scale (Kafka, Kinesis, etc.)
  • Experience with AWS tools is an advantage. Key services include EC2, ECS, EKS, Lambda, S3, RDS, VPC, CloudFront, Route 53, IAM, and KMS
  • Infrastructure as Code (IaC): Familiarity with IaC tools like Terraform, Helm, or AWS CDK
  • Security: Knowledge of security best practices in AWS, including the shared responsibility model, IAM policies, encryption, and data protection. Familiarity with AWS WAF, AWS Shield, or AWS Security Hub is a bonus
  • Experience with package management for operating systems (Debian, etc.)
  • A history of contributing to open-source projects; we are avid supporters (and users) of open-source software
  • Prior participation in the Wikimedia movement

