Backend / Python Data Engineer

Remote

Applications have closed

Automattic Inc.

We are passionate about making the web a better place. WordPress.com: your blog or website has a (free!) home on the web. Your story, your way. Tumblr: where your interests connect you to your people, one post at a time. Day One: your thoughts...


Parse.ly is a real-time content measurement layer for the entire web.

Our analytics platform helps digital storytellers at some of the web's best sites, including Ars Technica, The New Yorker, TechCrunch, Wired, The Intercept, Slate, and many more. In total, our analytics system handles over 65 billion monthly events from over 1 billion monthly unique visitors.

Our entire stack is in Python and JavaScript, and our team has innovated in areas related to real-time analytics, building some of the best open source tools for working with modern stream processing technologies.

On the open source front, we maintain streamparse, the most widely used Python binding for the Apache Storm streaming data system. We have also contributed to the development of high-performance Kafka client libraries for Python.

Our colleagues are talented: our UX/design team has built one of the best-looking dashboards on the planet, using Vue.js and D3.js, and our infrastructure engineers have built a scalable, devops-friendly cloud environment, using tools like Ansible and Terraform.

As a Python Data Engineer, you will help us expand our reach into the area of petabyte-scale data analysis — while ensuring consistent uptime, provable reliability, and top-rated performance of our backend streaming data systems.

Parse.ly’s data engineering team already makes use of modern technologies like Python 3, Storm, Spark, Kafka, and Elasticsearch to analyze large datasets. As a Python Data Engineer at Parse.ly, you will be expected to master these technologies, while also being able to write code against them in Python, and debug issues down to the native C code and native JVM code layers, as necessary.

This team owns a real-time analytics infrastructure that processes over 2 million pageviews per minute from over 5,000 high-traffic sites. It operates a fleet of cloud servers that includes thousands of cores of live data processing. We have written publicly about mage, our time series analytics engine, which should give you an idea of the kinds of systems we work on.

WHAT YOU'LL DO

For this role, you should already be a proficient senior Python programmer who wants to work with data at scale.

In the role, you’ll...

  • Write Python code using best practices. See The Elements of Python Style, written by our founding CTO, for an example of our approach to code readability and design.
  • Analyze data at massive scale. You need to be comfortable with the idea of your code running across 3,000 Python cores, thanks to process-level parallelization.
  • Brainstorm new product ideas and directions with the team and customers. You need to be a good communicator, especially in written form.
  • Master cloud technologies and systems. You should love UNIX and be able to reason about distributed systems. A favorite book of yours might be Designing Data-Intensive Applications (DDIA).

PARSE.LY TECH

  • Python 3 for both backend and frontend.
  • Amazon Web Services (AWS) used for production and testing.
  • Self-hosted modern databases like Elasticsearch, Redis, Cassandra, and Postgres.
  • Frameworks like Django, Tornado and the PyData stack (e.g. Pandas).
  • Kafka, Storm, Spark in production atop massive data sets (terabytes per day).
  • Easy system management with Fabric, Ansible, and Terraform.

OUR FULLY DISTRIBUTED TEAM

We are a fully distributed team, which means 100% of our engineers, designers, and product managers (including our founders & management team) work out of home offices. This has been true for years, long before the pandemic switched people's work styles -- so we have a thoughtful approach to fully distributed collaboration that has been refined over time.

Most of the product team is located near the US/Eastern timezone. Candidates should be in GMT-7 through GMT-3, because even though we operate on a distributed/async model, we like to have timezone overlap for f2f (video) collaboration and pairing. If you can regularly make meetings within an 11am-3pm US/Eastern scheduling window, you're in a workable timezone for us. Much of our team is US-based, but we'll also consider other timezone-aligned locations in North America and South America (e.g. Canada, Mexico) for this role.

A NOTE ABOUT AUTOMATTIC & BENEFITS

In February 2021, Parse.ly was acquired by Automattic's enterprise software division, WPVIP. Automattic is one of the biggest champions of open source and the open web, and also one of the top fully distributed employers in the world. Though you will interview for the Parse.ly team within WPVIP and Automattic, if you receive a job offer, it will be to become an Automattician, which means having colleagues who observe the Automattic creed and getting access to Automattic's excellent employee benefits. Your benefits will include:

  • Time off: Our open vacation policy (no set number of days per year) is designed to help you to be at your best! There is no minimum or maximum, but we encourage you to take at least 25 days of time off per year.
  • Health care: Automattic pays 100% of plan premiums for you, your spouse/domestic partner, and eligible dependents.
  • 401K: 100% match on your contributions up to 6% of annual earnings. You’re eligible to participate in Automattic’s 401(k) plan on your first day of employment. Match is fully vested from day one.
  • Parental leave: Open parental leave (including maternity, paternity, LGBTQ+, and adoption for all parents). If you’ve been with Automattic for 12 months, up to 6 months of your leave is fully paid.

Automattic's benefits can vary by location. The above describes US-based staff. See our benefits based on where you are in the world.

Tags: Ansible AWS Cassandra D3 Data analysis DevOps Distributed Systems Django Elasticsearch Engineering JavaScript Kafka Open Source Pandas PostgreSQL Python Spark Streaming Terraform Testing UX

Perks/benefits: 401(k) matching Flex vacation Health care Parental leave Team events

Region: Remote/Anywhere
Category: Engineering Jobs
