Cloud Reliability Engineer

We are currently supporting a successful global company in the Finance / Investment Technology space that is looking to hire a Cloud Reliability Engineer. You will work in a greenfield environment and have the chance to help shape its overall cloud security strategy.

The successful candidate will be an engineer with development skills and deep technical expertise in public cloud platforms. The role is part of a cloud engineering team that is developing frameworks and tooling for automating and managing the deployment of applications in the cloud. The candidate will focus on ensuring the reliability and resiliency of our cloud offerings. This position offers the opportunity to deliver significant value and drive technical innovation.

The right candidate will have deep experience managing and automating the lifecycle and operations of cloud infrastructure on AWS or Google using native tools, open-source tools, and third-party products.

The candidate will have experience developing production-ready code in one or more languages, including Python. They should also be familiar with developing unit and functional tests, and have experience with continuous integration as it applies to infrastructure as code.

The candidate should have experience architecting infrastructure to ensure the availability and resiliency of services and data. The candidate must be experienced in managing, persisting, and replicating data in different formats in the cloud, including databases, file systems, block stores, object stores, machine images, and containers. The ideal candidate will have experience with key management and encrypted data across multiple regions and accounts.

The candidate should be comfortable with Linux systems and containers as well as automating configuration management.  The candidate should have a full understanding of systems management concepts such as statelessness, immutability, and idempotence.

Experience with log management and monitoring tools is required, as is the ability to aggregate, correlate, and report on both logs and metrics, and to use them for capacity planning, performance tuning, and triggering automated alerts or actions.

The candidate must be able to work closely with application developers and owners to design and automate meaningful tests that validate functionality, performance, availability, and failover capabilities. Additionally, the candidate must be able to perform load testing and capacity planning on applications and cloud infrastructure.

Any experience building platforms that reliably support large-scale data ingestion and analytics in a cloud-based environment is a strong plus.

Principal Responsibilities

  • Designing and building resiliency as a default into our cloud-based architecture
  • Designing and automating tests that ensure the reliability of cloud-deployed applications
  • Designing and automating deployment mechanisms such as Blue/Green and Canary
  • Designing CI/CD pipelines that include infrastructure, application, and security testing and gates
  • Implementing availability, security, and performance monitoring and alerting
  • Implementing load testing and capacity planning
  • Automating data resiliency and replication based on policies
  • Automating systems configuration and orchestration using tools such as Chef, Ansible, or Salt
  • Automating creation of machine images and containers

Qualifications/Skills Required

  • Significant experience designing and supporting production cloud environments
  • Strong coding skills in one or more languages, including Python
  • Experience monitoring applications using cloud-native, open-source, and third-party tools
  • Experience developing collaboratively, including infrastructure as code
  • Experience developing automated tests, preferably in Python, to validate application and infrastructure functionality, security, and performance as part of an SDLC process
  • Experience with cloud templating and automation tools for deploying and managing infrastructure
  • Experience building CI/CD pipelines, including the use of cloud-native tools
  • Experience with data management and protection strategies in the cloud
  • Experience with key management as it pertains to data in cloud environments
  • Deep knowledge of cloud platform APIs and automation
  • Degree in a STEM or related field preferred



    Location: New York City, NY
    Apply at: https://simplex.hiringthing.com/job/55812/cloud-reliability-engineer