Data Engineer, Spark/Flink/Scala

Mountain View or Irvine CA

Applications have closed

Samsung Research America

For more than 70 years, Samsung has been at the forefront of innovation. Our discoveries, inventions and breakthrough products have helped shape the history of the digital revolution. We continue to expand our global reach and open new...

Title: Senior Data Engineer - Spark/Flink/Scala

Location: Mountain View, CA or Irvine, CA

Lab Summary:

Samsung is the world’s largest consumer electronics company and the leading provider of smartphones and smart TVs. Samsung smart TVs connect homes to the Internet, providing a full range of intelligent capabilities such as speech recognition, gesture recognition, advanced video processing, and personalized recommendations.

The VD intelligence lab at Samsung Research America is building a next-generation data platform to support Smart TV products and services. We have two office locations in California: Irvine and Mountain View. Our research and development include TV analytics, ads targeting, and personalized services. We are looking for a DevOps Engineer who will focus on designing and developing automation to support continuous delivery and continuous integration processes. Our ideal candidate will have worked in Amazon Web Services (AWS) environments, leveraging services beyond basic IaaS provisioning.

General Description

We are looking for Scala Engineers with experience building batch and/or streaming jobs. We use Spark for batch jobs and Flink for real-time streaming jobs. Experience with Hadoop, Hive, and AWS S3 is also an asset.

Responsibilities

  • Create new, and maintain existing, Spark jobs written in Scala
  • Create new, and maintain existing, Flink jobs written in Scala
  • Produce unit and system tests for all code
  • Participate in design discussions to improve our existing frameworks
  • Define scalable calculation logic for interactive and batch use cases
  • Interact with infrastructure and data teams to produce complex analysis across data

Required Qualifications:

  • A minimum of 2 years of experience with Scala and/or Java
  • A minimum of 5 years of programming experience
  • Experience with Hadoop and Spark
  • Knowledge and experience with cloud-based technologies
  • Experience in batch or real-time data streaming
  • Ability to adapt quickly to conventional big-data frameworks and open source tools as projects demand
  • Knowledge of design strategies for developing scalable, resilient, always-on data lakes
  • Strong development/automation skills
  • Must be very comfortable with reading and writing Scala code
  • An aptitude for analytical problem solving
  • Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance
  • Good understanding/knowledge of HDFS architecture and various components such as JobTracker, TaskTracker, NameNode, DataNode, HDFS high availability (HA), and the MapReduce programming paradigm
  • Experience working with various Hadoop distributions (Cloudera, Hortonworks, MapR, Amazon EMR) to fully implement and leverage new Hadoop features
  • Experience in developing Spark applications using Spark RDD, Spark SQL, Spark on YARN, Spark MLlib, and DataFrame APIs
  • Experience with real-time data processing and streaming techniques using Spark Streaming and Kafka, moving data into and out of HDFS and RDBMS
  • Familiarity with open source configuration management and development tools

Preferred Qualifications:

  • Hands-on experience with, and production use of, Hadoop/Cassandra, Spark, Flink, and other distributed technologies
  • Other Technologies
    • ScalaTest
    • Gradle/Maven
    • Airflow
    • SQL
    • AWS

Samsung is committed to encouraging a diverse workplace and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) based on race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

If you have a disability or special need that requires accommodation, please let us know.

Tags: Airflow APIs AWS Cassandra DevOps Flink Hadoop HDFS Kafka Map Reduce Maven Open Source RDBMS Research Scala Spark SQL Streaming

Region: North America
Country: United States
Category: Engineering Jobs