We are looking for a mid-level to senior-level engineer with experience in real-time Big Data pipelines with Spark and Kafka running MapReduce.
This full-time position with RecruiterDNA's client is classified at the Software Engineer IV level. The role has no direct reports but will act as a lead and mentor.
Industry: Pharma Tech, Healthcare technology
Location: Mountain View, CA, or a suburb outside of Pittsburgh, PA. No C2C, remote, or virtual work arrangements.
Is relocation assistance available: Yes
Develop custom batch-oriented and real-time streaming data pipelines working within the MapReduce ecosystem, migrating flows from ELT to ETL
Ensure proper data governance policies are followed by implementing or validating data lineage, quality checks, classification, etc.
Act in a technical leadership capacity: mentor junior engineers and new team members, and apply technical expertise to challenging programming and design problems
Resolve defects/bugs during QA testing, pre-production, production, and post-release patches
Possess a quality mindset, squash bugs with a passion, and work hard to prevent them in the first place through unit testing, test-driven development, version control, continuous integration, and deployment
Conduct design and code reviews
Analyze and improve efficiency, scalability, and stability of various system resources
Contribute to the design and architecture of the project
Operate within an Agile development environment and apply its methodologies

Required Knowledge and Skills:
Proficient understanding of distributed computing principles
Good knowledge of Big Data querying tools, such as Pig or Hive
Good understanding of Lambda Architecture, along with its advantages and drawbacks
Proficiency with MapReduce and HDFS
Ability to solve any ongoing issues with operating the cluster
Ability to lead change, be bold, innovate, and challenge the status quo
Passionate about solving customer problems and developing solutions that result in a passionate customer/community following

Basic:
Bachelor’s degree in Engineering/IT/Computer Science
6+ years’ experience in software engineering
2+ years’ experience:
    developing ETL processing flows using MapReduce technologies like Spark and Hadoop
    in software design
1+ years’ experience:
    developing with ingestion and clustering frameworks such as Kafka, ZooKeeper, and YARN
    building streams using Spark Streaming with various messaging systems, such as Kafka or RabbitMQ
Experience with integration of data from multiple data sources
Experience leading projects or teams
Master’s degree in Engineering/IT/Computer Science
8+ years’ experience in software engineering
1+ years’ experience with:
    Databricks and Spark
    NoSQL databases, such as HBase, Cassandra, MongoDB
    Big Data ML toolkits, such as Mahout, SparkML, or H2O
2+ years’ experience with Scala or Java as it relates to product development
Management of Spark or Hadoop clusters, with all included services
Experience with Service-Oriented Architecture (SOA)/microservices
Demonstrable advanced knowledge of data architectures, data pipelines, real-time processing, streaming, networking, and security