Senior Data Engineer - Databricks

United Kingdom - Remote

Applications have closed

Heni

HENI is an international art services business working with leading artists and estates across publishing and print-making.


We now have a huge number of disparate data sources across the business and the data currently sits on a variety of platforms. We are looking to build a data lake (AWS) to pool all the data and then provide structured warehouses that feed on the data lake.

The data lake will feed into Delta Lake, with PySpark/Spark used for processing. Databricks will sit on top, enabling structured cloud warehousing.

Databricks cloud services will be used for data ingestion, data transformation and processing in Delta Lake, and data serving.
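As a purely illustrative sketch of that ingest-then-curate flow (the file layout, field names and dedup rule here are hypothetical, and a local temp directory stands in for S3/Delta Lake):

```python
import json
import tempfile
from pathlib import Path

def ingest_raw(records, lake: Path) -> Path:
    """Land source records untouched in the raw ('bronze') zone of the lake."""
    bronze = lake / "bronze"
    bronze.mkdir(parents=True, exist_ok=True)
    target = bronze / "sales.jsonl"
    with target.open("w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return target

def transform(bronze_file: Path, lake: Path) -> Path:
    """Deduplicate raw records into a curated ('silver') zone for serving."""
    seen, rows = set(), []
    for line in bronze_file.read_text().splitlines():
        rec = json.loads(line)
        if rec["order_id"] not in seen:   # hypothetical business key
            seen.add(rec["order_id"])
            rows.append(rec)
    silver = lake / "silver"
    silver.mkdir(parents=True, exist_ok=True)
    target = silver / "sales.json"
    target.write_text(json.dumps(rows))
    return target

lake = Path(tempfile.mkdtemp())
raw = ingest_raw(
    [{"order_id": 1, "amount": 100}, {"order_id": 1, "amount": 100}],
    lake,
)
curated = transform(raw, lake)
print(len(json.loads(curated.read_text())))  # duplicates collapsed to 1 row
```

In the real stack, `ingest_raw` would land files in S3 and `transform` would be a PySpark job writing Delta tables; the raw/curated separation is the part that carries over.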

All candidates will need experience with Databricks.

We are looking for someone who has ideally done something similar, i.e. worked on a data lake project and built pipelines to fill the lake with raw data. You will be responsible for architecture, design and development.

Requirements

Data Engineering

  • Experience in data transformation solution design and development using batch and streaming data sources
  • Experience in the development of ETL pipelines using Python and SQL
  • Experience in the development of CI/CD pipelines using GitHub Actions and JavaScript
  • Experience in embedding data quality and validation into the release and execution of data pipelines
  • Experience in using traditional ETL tools
  • Experience delivering solutions using distributed processing technologies, principally Spark and MapReduce
  • Cloud-native tooling for AWS, including AWS Glue, Lambda, SNS, Kinesis, RDS, Redshift, S3, Athena, etc.
  • Apache Hadoop, with knowledge of multiple distributions (Cloudera, Hortonworks, HDInsight, etc.) and associated Apache big data products (Hive, Impala, Oozie, etc.)
  • Data ingestion design, including batch and real-time architectures using tools such as Kafka, Storm, Kinesis or equivalents
  • Data governance and metadata management using tools like Apache Atlas.
  • Data transformation technologies including, but not limited to, Spark, Python or NiFi
  • Data deployment experience on cloud-native and hybrid cloud solutions
  • Microservice / SOA / stateless approaches to data ingestion & consumption
  • Expertise and experience in producing solution and information architectures using some or all of the technologies above
  • Information glossary tooling, e.g. IIGC, Informatica Enterprise Data Governance or Collibra
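One of the requirements above is embedding data quality and validation into the execution of data pipelines. A minimal, hypothetical sketch of that idea (the rules and field names are invented for illustration, not taken from the job description):

```python
def validate(rows, required=("order_id", "amount")):
    """Split a batch into (valid, rejected) rows before they are loaded.

    A row passes only if every required field is present and the amount
    is non-negative -- stand-in rules for real data-quality checks.
    """
    valid, rejected = [], []
    for row in rows:
        ok = (all(row.get(f) is not None for f in required)
              and row.get("amount", 0) >= 0)
        (valid if ok else rejected).append(row)
    return valid, rejected

batch = [
    {"order_id": 1, "amount": 250},      # passes both rules
    {"order_id": 2, "amount": -5},       # fails the non-negative rule
    {"order_id": None, "amount": 10},    # fails the required-field rule
]
good, bad = validate(batch)
print(len(good), len(bad))  # 1 2
```

In a production pipeline the rejected rows would typically be routed to a quarantine table and surfaced as pipeline metrics rather than silently dropped.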

Data Management

  • Experience in data modelling and optimisation for row- and column-based environments, using tools such as InfoSphere Data Architect, Erwin, etc.
  • Data Governance approaches and technologies which cover Business Glossary, Metadata Management and Data Lineage
  • Security governance and access management at infrastructure, server, and application levels including role and attribute-based access
  • Consulting on recent data-related regulations, including the Data Protection Act (DPA) and the General Data Protection Regulation (GDPR)

General

  • Cross-sector consulting and delivery using the above technologies and capabilities
  • Experience in delivering solutions and capabilities using the above in both agile and waterfall delivery methodologies
  • Ability to translate requirements/problem statements into a big data and/or analytics solution using the above technologies and capabilities

Preferred Technical and Professional Expertise

  • Expertise and experience in developing data science solutions using tools such as Python, R, TensorFlow or spaCy
  • Expertise and experience in developing solutions across multiple cloud applications
  • Knowledge of hybrid cloud data development and containerisation techniques including Kubernetes, Docker or Cloud Foundry
  • Evidence of contributing to the data engineering community, within or beyond a single department or organisation

Benefits

  • Competitive salary + bonus
  • Work in a dynamic and fast-paced environment with new challenges
  • Work with a modern tech stack and the latest frameworks
  • Have your say – a real chance to influence the tech stack
  • Work with a company that uses the most up-to-date blockchain technology
  • Get involved in a variety of projects, see how they develop into polished products and services
  • Flexible & fully remote working options – you choose where you work
  • Competitive holiday allowance
  • Full private healthcare
  • Visa + relocation support
  • Learning budget for personal development
