Data Engineer

Vangst - San Jose, CA

Our client, a leading vertically integrated cannabis company based in California, is looking for their first Data Engineer to join their team.

Job Purpose

The Data Engineer designs and maintains data pipelines supporting a production-grade data warehouse spanning multiple operational and functional departments and hundreds of users. A meticulous programmer who takes pride in clean, efficient, maintainable code, the Data Engineer develops new ETL pipelines and maintains existing ones. They extract company data from a variety of structured and unstructured sources and normalize differing encoding methods and schemas into datasets for warehousing and subsequent data modeling. In addition, they recommend and develop changes to source data structures and systems, and assist with the implementation of new systems and updates to existing ones from a data-integrity and usability perspective by developing appropriate data schemas and structures for use in downstream models and reports.

Duties and Responsibilities

  • Embraces and demonstrates alignment with Company Values – Integrity, Positive Energy, Bias to Action, Connectedness, Truth Seeking
  • Primary owner of our ETL pipelines spanning dozens of source systems across all departments to support our data warehouse initiative
  • Develops programmatic tests of existing ETL pipelines to assert the quality of warehoused data
  • Becomes an evangelist of programming best practices within the Analytics team and leads by example on topics such as code clarity, judicious use of whitespace, and unit tests
  • Builds new ETL pipelines to warehouse new data sources as they are identified
  • Assembles large, complex data models to meet the needs of operational and strategic stakeholders
  • Works closely with our in-house analysts to integrate SQL data models to a dependency tree
  • Maintains user permissions for warehouse data sources and assists in user access training
  • Other duties and responsibilities as assigned by management
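The data-quality testing duty above can be sketched in a few lines. This is a minimal illustration only; the table and column names (`order_id`, `amount`) are hypothetical examples, not the company's actual schema:

```python
# Minimal sketch of programmatic data-quality checks for a warehoused table.
# Field names below are hypothetical, not the company's schema.

def check_no_null_keys(rows, key):
    """Assert every row carries a non-null value for the given key column."""
    return all(row.get(key) is not None for row in rows)

def check_row_count(rows, minimum):
    """Assert the load produced at least the expected number of rows."""
    return len(rows) >= minimum

# Example rows, as they might come back from a warehouse query
rows = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": 5.00},
]

assert check_no_null_keys(rows, "order_id")
assert check_row_count(rows, minimum=1)
```

Checks like these are typically wired into the pipeline (e.g. as a post-load task) so a failed assertion blocks bad data from reaching downstream models.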


Skills and Qualifications

  • Previous experience developing ETL pipelines using technologies such as Airflow (preferred), Luigi, Oozie, or Azkaban
  • Experience manipulating and de-normalizing data in JSON format for storage in relational databases
  • Experience with Google Cloud Platform or AWS cloud services
  • Knowledge and experience with Kubernetes and/or Docker (Preferred)
  • Advanced knowledge of SQL and experience working with relational databases (Preferred)
  • Previous experience developing data models to support a data warehouse (Preferred)
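The JSON de-normalization skill listed above can be illustrated with a short sketch. The payload and field names here are hypothetical, chosen only to show the flattening pattern:

```python
import json

# Hypothetical nested JSON payload, e.g. from a source system's REST API.
payload = json.loads("""
{
  "order_id": 42,
  "customer": {"id": 7, "name": "Acme"},
  "items": [
    {"sku": "A1", "qty": 2},
    {"sku": "B2", "qty": 1}
  ]
}
""")

# Flatten one nested order into relational-style rows: one row per line item,
# with parent fields repeated, ready for insertion into a relational table.
rows = [
    {
        "order_id": payload["order_id"],
        "customer_id": payload["customer"]["id"],
        "sku": item["sku"],
        "qty": item["qty"],
    }
    for item in payload["items"]
]

print(rows[0])  # {'order_id': 42, 'customer_id': 7, 'sku': 'A1', 'qty': 2}
```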

Education and Experience

  • Bachelor’s degree or higher in an engineering or technical field such as Computer Science, Physics, Mathematics, Statistics, Engineering, or Business Administration, or an equivalent combination of education and experience
  • 1-5 years of experience manipulating data using Python (experience with Pandas is a plus), extracting data from REST APIs, and managing a codebase in GitHub
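REST-API extraction of the kind described above often comes down to paginating an endpoint until it runs dry. Below is a minimal sketch in which `fetch_page` is a stub standing in for a real HTTP call (in practice you would use an HTTP client such as `requests`); the data it returns is invented for illustration:

```python
# Sketch of paginated REST extraction. fetch_page is a stub standing in for
# a real HTTP call, e.g. requests.get(url, params={"page": n}).json().

def fetch_page(page):
    """Stub: pretend the API serves two pages of records, then nothing."""
    data = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
    return data.get(page, [])

def extract_all(fetch):
    """Pull successive pages until the endpoint returns no more records."""
    records, page = [], 1
    while True:
        batch = fetch(page)
        if not batch:
            break
        records.extend(batch)
        page += 1
    return records

print(len(extract_all(fetch_page)))  # prints 3
```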

Working Environment

This job operates in a professional office environment. This role routinely uses standard office equipment such as computers, phones, copiers and filing cabinets.

Physical Requirements

Must be able to sit for extended periods, use hands and fingers for data entry for 8 hours or more, and perform some lifting, squatting, bending, pushing, and pulling.

Additional Requirements

  • Must be 21 years of age or older
  • Must comply with all legal and company regulations for working in the industry
  • Must pass a background check with the San Jose Police Department


Posted On: Wednesday, November 6, 2019
