Employers often ask why you’d be a good fit to work for them. At BigGeo, we prefer to start by showing why we’re a good fit for you.
Why You’d Want to Work at BigGeo:
- Be part of a pioneering team driving the future of geospatial intelligence.
- Work in an innovative, data-driven environment that values creativity and rapid problem-solving.
- Experience firsthand how your contributions shape cutting-edge technologies and serve critical industries globally.
- Embrace a modern “self-care” work schedule that emphasizes balance and well-being.
- Shape products that solve major global challenges, from urban planning to environmental conservation.
About BigGeo:
BigGeo is at the forefront of geospatial data intelligence, creating transformative solutions that turn location-based data into actionable insights across industries. Our advanced platform brings geospatial analysis, real-time data processing, and 3D visualization to life, empowering industries to unlock deeper insights and make informed decisions.
Our company has assembled a dynamic, forward-thinking team across our commercial and technology functions, united by the mission to redefine how people access and interpret geospatial data. We make it possible for individuals and businesses alike to unlock the full potential of their data—enabling them to extract valuable insights from massive datasets. With a work environment that thrives on cutting-edge innovation, BigGeo isn’t just a tech company; we’re revolutionizing how the world understands and interacts with data.
Role Overview:
We are seeking a skilled Site Reliability Engineer (SRE) to join our team, focused on ensuring the stability, scalability, and efficiency of our platform. The ideal candidate will have a deep understanding of CI/CD pipelines, containerization, infrastructure as code, and cloud technologies. This role is essential in automating our infrastructure, maintaining high availability, and enabling fast, safe, and reliable software delivery.
Key Responsibilities:
- Implementing and managing CI/CD pipelines for automating the build, test, and deployment processes, ensuring safe and efficient release of software updates.
- Collaborating with development teams to ensure smooth integration of code changes into the pipeline.
- Implementing and maintaining infrastructure as code (IaC) practices using tools such as Helm, Terraform, or Ansible to manage infrastructure changes.
- Automating infrastructure provisioning and configuration to support scalability and reliability.
- Building and maintaining Docker containers for applications and services.
- Orchestrating container deployments using Kubernetes, Docker Compose, and other relevant technologies.
- Ensuring the reliability and availability of services by proactively identifying and mitigating potential issues and responding to incidents.
- Participating in on-call rotations to respond to critical incidents and minimize downtime.
- Conducting post-incident reviews to identify root causes and prevent future occurrences.
- Developing and testing disaster recovery plans and procedures to minimize data loss and downtime in case of failures.
- Maintaining documentation for infrastructure, processes, and procedures to facilitate knowledge sharing and team collaboration.
- Taking ownership of complex technical issues and coordinating resolutions across teams.
- Defining and enforcing best practices in areas such as performance and reliability.
Key Requirements:
- Bachelor's or technical degree in computer science or a related field.
- Extensive experience (5+ years) in a DevOps/Site Reliability role, demonstrating a track record of successfully leading and implementing complex projects.
- In-depth knowledge of advanced DevOps/SRE concepts, methodologies, and best practices.
- Proven experience in designing and implementing CI/CD pipelines for large-scale, distributed systems.
- Strong leadership skills with the ability to influence and guide team members towards achieving common goals.
- Experience with architectural design and planning for highly available and scalable systems.
- Proficient in conducting root cause analysis and implementing preventive measures.
- Advanced knowledge of cloud computing platforms (e.g. Azure, GCP, AWS) and containerization and orchestration technologies (e.g. Docker, Docker Compose, Kubernetes).
- Strong programming and scripting skills (e.g. TypeScript, Rust, Python, Bash).
- Experience with database administration (e.g. MySQL, PostgreSQL).
- Proficient in infrastructure as code (IaC) tools and practices (e.g. Helm, Terraform, Ansible).
- Expertise with observability and alerting tools (e.g. Jaeger, Loki, OpenTelemetry, Prometheus, Grafana).
- Excellent problem-solving and troubleshooting skills.