Data Engineer, Spark/Flink/Scala
Mountain View or Irvine, CA
Applications have closed
Samsung Research America
For more than 70 years, Samsung has been at the forefront of innovation. Our discoveries, inventions and breakthrough products have helped shape the history of the digital revolution. We continue to expand our global reach and open new...
Title: Senior Data Engineer - Spark/Flink/Scala
Location: Mountain View, CA or Irvine, CA
Lab Summary:
Samsung is the world’s largest consumer electronics company and the leading provider of smartphones and smart TVs. Samsung smart TVs connect homes to the Internet, providing a full range of intelligent capabilities such as speech recognition, gesture recognition, advanced video processing and personalized recommendations.
The VD intelligence lab at Samsung Research America is building a next-generation data platform to support Smart TV products and services. We have two office locations in California: Irvine and Mountain View. Our research and development includes TV analytics, ad targeting, and personalized services. We are looking for a DevOps Engineer who will focus on designing and developing automation to support continuous delivery and continuous integration processes. Our ideal candidate will have worked in Amazon Web Services (AWS) environments, leveraging services beyond basic IaaS provisioning.
General Description
We are looking for Scala Engineers with experience in batch and/or streaming jobs. We use Spark for batch jobs and Flink for real-time streaming jobs. Experience with Hadoop, Hive, and AWS S3 is also an asset.
Responsibilities
- Create new, and maintain existing, Spark jobs written in Scala
- Create new, and maintain existing, Flink jobs written in Scala
- Produce unit and system tests for all code
- Participate in design discussions to improve our existing frameworks
- Define scalable calculation logic for interactive and batch use cases
- Interact with infrastructure and data teams to produce complex analysis across data
Required Qualifications:
- A minimum of 2 years of experience with Scala and/or Java
- A minimum of 5 years of programming experience
- Experience with Hadoop and Spark
- Knowledge of and experience with cloud-based technologies
- Experience in batch or real-time data streaming
- Ability to adapt quickly to conventional big-data frameworks and open-source tools as project needs demand
- Knowledge of design strategies for developing scalable, resilient, always-on data lakes
- Strong development/automation skills
- Must be very comfortable with reading and writing Scala code
- An aptitude for analytical problem solving
- Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance
- Good understanding of Hadoop and HDFS architecture and components such as JobTracker, TaskTracker, NameNode, DataNode, and HDFS high availability (HA), and of the MapReduce programming paradigm
- Experience working with various Hadoop distributions (Cloudera, Hortonworks, MapR, Amazon EMR) to fully implement and leverage new Hadoop features
- Experience in developing Spark applications using the Spark RDD, Spark SQL, Spark on YARN, Spark MLlib, and DataFrame APIs
- Experience with real-time data processing and streaming techniques using Spark Streaming and Kafka, and with moving data into and out of HDFS and RDBMSs
- Familiarity with open-source configuration management and development tools
Preferred Qualifications:
- Hands-on experience with, and production use of, Hadoop/Cassandra, Spark, Flink and other distributed technologies would be a plus
- Other technologies: ScalaTest, Gradle/Maven, Airflow, SQL, AWS
Samsung is committed to encouraging a diverse workplace and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) based on race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
If you have a disability or special need that requires accommodation, please let us know.