Sr Data Engineer (Spark + SQL) – Health Data Analytics Platform

Remote, anywhere in LATAM

Full Time
Truelogic Software

Project Description

Your primary responsibility will be developing the transformation logic that converts disparate datasets into our proprietary format. You will work closely with our Science team on this logic, which is written in SQL and Scala UDFs and executed on a Spark cluster. In addition, you will be integral to developing and enhancing our ELT process, which uses Apache Airflow (Python), Spark/Databricks, and a combination of Java/Scala for the data ingestion infrastructure.
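
To make the pattern concrete, here is a minimal sketch of a Scala UDF registered with Spark and invoked from Spark SQL. Every name in it (paths, table, columns, and the normalization rule itself) is hypothetical, not taken from this project:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.udf

    object TransformSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("transform-sketch").getOrCreate()

        // Hypothetical normalization rule expressed as a Scala UDF.
        val normalizeSex = udf { raw: String =>
          Option(raw).map(_.trim.toUpperCase) match {
            case Some("M") | Some("MALE")   => "M"
            case Some("F") | Some("FEMALE") => "F"
            case _                          => "U" // unknown
          }
        }
        spark.udf.register("normalize_sex", normalizeSex)

        // Expose the raw dataset to SQL, then apply the UDF from a query.
        spark.read.parquet("s3://example-bucket/raw/patients")
          .createOrReplaceTempView("raw_patients")
        val curated = spark.sql("SELECT patient_id, normalize_sex(sex) AS sex FROM raw_patients")
        curated.write.mode("overwrite").parquet("s3://example-bucket/curated/patients")

        spark.stop()
      }
    }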

Responsibilities

  • Develop transformation logic to convert disparate datasets into our proprietary format.
  • Work with the Science team to develop this logic in SQL executed over a Spark cluster.
  • Assess, develop, troubleshoot, and enhance the ELT process, which utilizes Apache Airflow (Python) and a combination of Java/Scala.
  • Modify JSON (JavaScript Object Notation) files that describe the schemas of the datasets, ensuring system functionality through routine maintenance and testing (see the sketch after this list).
  • Work on a full-stack rapid-cycle analytic application.
  • Develop highly effective, performant, and scalable components capable of handling large amounts of data for over 10 million patients.
  • Work with the Science and Product teams to understand and assess client needs, and to ensure optimal system efficiency.
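
The schema-maintenance responsibility above works with JSON files that describe dataset schemas. The sketch below shows one way such a descriptor can drive ingestion, assuming the file holds Spark's own JSON schema representation (as produced by StructType.json); the paths and dataset are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{DataType, StructType}

    import scala.io.Source

    object SchemaFromJson {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("schema-sketch").getOrCreate()

        // Load the JSON schema descriptor (hypothetical path) and parse it
        // with Spark's built-in DataType.fromJson.
        val schemaJson = Source.fromFile("schemas/claims.json").mkString
        val schema = DataType.fromJson(schemaJson).asInstanceOf[StructType]

        // Apply the described schema when ingesting the raw dataset,
        // rather than relying on schema inference.
        val claims = spark.read
          .schema(schema)
          .option("header", "true")
          .csv("s3://example-bucket/raw/claims/")

        claims.printSchema()
        spark.stop()
      }
    }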

Requirements

  • Bachelor’s degree or equivalent in Computer Science, Computer Engineering, Information Systems, or a related field.
  • 4 years of experience in the position offered or a related position, including experience designing, developing, and maintaining large-scale ETL data pipelines using Java/Scala on AWS with Hadoop, Spark, and Databricks to manage Apache Spark infrastructure.
  • Experience building backend modules and low-latency REST APIs in a distributed environment using Java, Docker, SQL, Maven, Spring, and Jenkins.
  • Experience writing complex SQL queries and UDFs to process large amounts of data across relational and non-relational databases, JSON, and Spark SQL.
  • Experience translating requirements from Product and DevOps teams into technology solutions using the SDLC.

Project Stack

  • Languages - Java / Scala
  • Databases - SQL
  • Frameworks and Libraries - Spark, Hadoop, Databricks
  • Infrastructure - AWS
  • Build Tools - Maven (MVN), Jenkins

Rewards

  • Payment in USD
  • Free credentials for e-learning platforms
  • Remote workshops & activities