Data Engineer - Content Intelligence
New York City
Applications have closed
Spotify
We grow and develop and make wonderful things happen together every day. It doesn't matter who you are, where you come from, what you look like, or what music you love. Join the band!
Have you ever had a debate at a party about who originally wrote a song versus who covered it? Tried to find a sample you loved more than the song you heard it in? Wondered who was actually in the recording booth for your favorite song (on both sides of the glass)? Wonder what Britney Spears, Nsync, Pink, Katy Perry, Taylor Swift, and The Weeknd all have in common? (The same songwriter wrote Billboard number-one singles for all of them). Who gets paid every time we sing “Happy Birthday”?
Music attribution at scale is one of the great unsolved technical problems of the music industry, and we’re building powerful technology to solve it. Our goal is to solve this problem for the tens of millions of music tracks playable on Spotify, building a knowledge graph through innovative machine learning models, deep domain expertise, and close integration with human-in-the-loop processes across Spotify and the industry. Content Platform’s catalog data powers Spotify experiences including Artist pages in the app, search and recommendations, human playlist curation, Spotify for Artists, and our music industry-facing strategy.
Our teams are composed of product, machine learning, data and backend engineers, and subject matter experts who average 11 years behind the scenes in the music industry.
Come join our team of talented engineers who share a common interest in distributed systems, scalability, and continued development. You will build the data pipelines that power our application, scale highly distributed systems, and continuously improve our engineering practices. Above all, your work will impact the way the world experiences music.
What You'll Do:
- Build large-scale batch and real-time data pipelines with data processing frameworks like Scio, Beam, Spark, and Flink, deployed and scaled via Google Cloud Platform.
- Construct architectures to synthesize signals from disparate sources (including catalog metadata, audio vectors, and human-generated annotations) and populate scalable data solutions delivering insights about the music industry to Spotify product teams.
- Use standard methodologies in continuous integration and delivery.
- Help drive optimization, testing, and tooling to improve data quality.
- Collaborate with product managers, software engineers, ML experts, and stakeholders, taking the learning and leadership opportunities that arise every single day.
- Work in multi-functional agile teams to continuously experiment, iterate and deliver on new product objectives.
Who You Are:
- You have professional experience working in a product-driven environment.
- You know how to work with high-volume heterogeneous data, preferably with distributed systems and data stores such as Hadoop, Spark, HBase, and Cassandra.
- Experience with graph databases (such as Neo4j), graph algorithms, and/or ontological modeling is a plus.
- You write distributed, high-volume services in Java or Scala.
- You have a deep understanding of system design, data structures, and algorithms.
- You are knowledgeable about data modeling, data access, and data storage techniques, and you can apply these skills to make architectural decisions based on product opportunities.
- You care about agile software processes and iterative delivery, data-driven development, reliability, and responsible experimentation.
- You understand the value of collaboration within teams.
Where You'll Be:
- We are a distributed workforce enabling our band members to find a work mode that is best for them.
- Where in the world? For this role, you will be working US East Coast hours.
- Prefer an office to work from home instead? Not a problem! We have plenty of options for your working preferences. Find more information about our Work From Anywhere options here.
Spotify transformed music listening forever when we launched in 2008. Our mission is to unlock the potential of human creativity by giving a million creative artists the opportunity to live off their art and billions of fans the chance to enjoy and be passionate about these creators. Everything we do is driven by our love for music and podcasting. Today, we are the world’s most popular audio streaming subscription service.
Global COVID and Vaccination Disclosure
Spotify is committed to the safety and well-being of our employees, vendors and clients. We are following regional guidelines mandating vaccination and testing requirements, including those requiring vaccinations and testing for in-person roles and event attendance. For the US, we have mandated that all employees and contractors be fully vaccinated in order to work in our offices and externally with any third parties. For all other locations, we strongly encourage our employees to get vaccinated and also follow local COVID and safety protocols.
Tags: Agile Architecture Cassandra Data pipelines Data quality Distributed Systems Engineering Flink GCP Google Cloud Hadoop HBase Machine Learning ML models Neo4j Pipelines Scala Spark Streaming Testing
Perks/benefits: Career development Equity Team events