Site Reliability Engineer | US | Remote
Spartan Technologies, Inc.
- New York, NY, United States
APPLICANTS NEED TO KNOW
- US Residents Only
- No Sponsorship is provided
- No 3rd Party Companies
- No Corp-to-Corp
- 6+ month Contract-to-Hire role
We seek an experienced Site Reliability Engineer located in the United States for a 100% remote work role.
As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our software products. You will design and implement highly available, fault-tolerant systems, working closely with the development team to deliver high-quality products. In this role, you will tackle complex, challenging problems, develop innovative solutions, and contribute to a dynamic, collaborative team environment. If you are passionate about solving complex technical issues and keeping systems performing at their best, we want to hear from you.
Responsibilities
- Collaborate with development and product teams to ensure that applications and systems are designed and implemented with reliability, scalability, and performance in mind.
- Automate and streamline operational processes, from deployment to monitoring and alerting, to improve efficiency and reduce manual errors.
- Design, implement, and maintain complex infrastructure for high-availability production environments using tools such as Terraform and AWS CloudFormation.
- Monitor systems and applications for performance, availability, and security, and respond to issues quickly and efficiently.
- Continuously improve the reliability, scalability, and performance of systems and applications through root cause analysis, code and architecture reviews, and proactive monitoring.
- Participate in on-call rotation and respond to critical incidents promptly and efficiently, performing troubleshooting and incident management as needed.
- Develop and maintain disaster recovery and business continuity plans so the business can keep operating during service outages or disasters.
- Provide technical guidance and mentorship to other engineers on reliability and scalability best practices, tools, and methodologies.
Requirements
- At least 3 years of experience as a Site Reliability Engineer or DevOps Engineer.
- A Bachelor’s Degree in Computer Science, Computer Engineering, a related field, or equivalent experience.
- Proven hands-on experience with programming and scripting languages and platforms such as Python, Ruby, Go, C++, .NET, and Bash.
- Working knowledge of cloud computing services; experience with Amazon Web Services (AWS) is preferred.
- Proficiency with infrastructure tooling such as Terraform, AWS CloudFormation, and PowerShell.
- Familiarity with configuration management and deployment tools such as Chef, Puppet, and Octopus Deploy.
- Experience in rolling out redundant, mission-critical applications in a highly available production environment.
- Familiarity with version control systems like Git or SVN.
- Experience with continuous integration tools such as Jenkins or CircleCI, and with artifact repositories such as Artifactory or Nexus.
- Excellent written and verbal communication and process management skills.
- Proven ability to work with team members to deliver projects on time.
- Strong analytical and problem-solving skills.
Nice to Have
- Experience with big data and distributed systems
- Familiarity with container orchestration platforms such as Kubernetes or Mesos
- Familiarity with machine learning frameworks such as TensorFlow, PyTorch, or Keras
- Experience in implementing security and compliance policies in a production environment
- Ability to write automation scripts in a language such as Python, Ruby, or Go.
#SiteReliabilityEngineer #DevOps #CloudComputing #AWS #InfrastructureAsCode #Terraform #Puppet #Chef #Git #ContinuousIntegration #Jenkins #CircleCI #AgileDevelopment #AutomatedTesting #Monitoring #Linux #Python #Ruby #Go #C++ #NET #BASH #SQL #MySQL #Postgres #Networking #Hashicorp #Virtualization #Containerization #JobOpening #Hiring #TechJobs #SoftwareEngineering #ITJobs