Serverless Reliability Engineer

Datacoral - San Francisco, CA

  • Get in early on building the first truly cloud-native data infrastructure stack
  • Opportunity to make a large impact on product, company, and customers
  • Work closely with a founding team that’s driven massive scale at Facebook, Groupon, and more

Datacoral was founded in 2016 with one overarching goal: to eliminate the complexity and cognitive load of building and managing data infrastructure. We strive every day to reduce the friction that data practitioners face when trying to extract value from their data. Our founding team has built and run distributed systems at scale: data infrastructure at Facebook as it scaled from 50TB to over 100PB of managed storage, and at Groupon as it scaled to sending millions of emails a day. At Datacoral, we are building the first serverless-native data infrastructure stack in the cloud. We’re growing our team of self-starters with folks who want to fundamentally change how companies work with their data.

The Role:

Datacoral is completely serverless. We offer a SaaS product, but instead of building a centralized multi-tenant installation, our architecture has security built in via multiple isolated installations. We do not deploy clusters or bring up nodes, and neither do our SREs. We do not want our SREs to do any capacity management (oh yes, we are serverless!) or to work on efficiency or performance improvements (our product takes care of that!). At Datacoral, we are looking at an SRE’s role in a whole new light!

Reliable data flows and data delivery are key tenets of Datacoral, and managing reliability will be your primary responsibility. You will be a key driver of the Engineering team's direction and have a rare ability to inspire engineering teams to up their reliability game. You are both a generalist, capable of picking up and working with multiple, disparate systems, and an expert, able to dive deep into specific topics and quickly master them. You move comfortably between system, service, and instance level views.

Success in this role requires a passion for helping others and making their lives better. You do this by simplifying complex systems to make them understandable and operable. You are able to communicate decisions, ideas, designs, and the operation of systems and services clearly and concisely.

You should expect to collaborate with all other engineering teams to develop solutions that meet reliability, security, and business requirements. Lastly, you will diagnose, triage, and build solutions for complex technical issues at scale.

As the first SRE on the team, you will have a foundational impact on the systems we create.

You'd Be a Great Fit If You

  • Are passionate about distributed systems, database technologies, and highly scalable services
  • Have a steady hand under pressure
  • Build and maintain services, automation, and tooling that will positively impact key areas
  • Drive continuous improvement by measuring and reducing the amount of manual operational work
  • Are a self-starter who thrives in a fast-paced environment
  • Are willing to learn new skills and technologies
  • Are attentive to details and comfortable with ambiguity
  • Can work both independently and collaboratively as part of a specialized team
  • Can slow down and communicate clearly and effectively across language barriers

What we’re looking for

  • Bachelor's or Master's degree in Computer Science or a related field, or relevant work experience
  • A minimum of 5 years of experience as an SRE
  • Experience building and operating public-facing 24x7 distributed systems at scale
  • Strong programming skills in a scripting language (Node.js, Python, Bash)
  • Experience working with AWS
  • Knowledge and experience in Systems Engineering, Administration, and Operations
  • Articulate and personable with strong spoken and written English language abilities

Posted On: Tuesday, May 7, 2019
