Intern, Cloud Data Management

Emeryville, CA

Applications have closed

Berkeley Lights

Here at Berkeley Lights, we think cells are awesome! Cells are capable of manufacturing cures for diseases, fibers for clothing, energy in the form of biofuels, and food proteins for nutrition. So the question is, if nature is capable of manufacturing the products we need in a scalable way, why aren’t we doing more of this? Well, the answer is that with the solutions available today, it is hard. Berkeley Lights is here to change all of that! Our proprietary technology and our Beacon® and Lightning™ systems let researchers discover and develop cell-based products in a fraction of the time and at a fraction of the cost of conventional research methods. Using our tools and solutions, scientists can find the best cells the first time they look. Our goal is to continue to collaborate with customers to drive the adoption of our technologies, making cell-based products and therapeutics more easily accessible the world over! You will play a major role in the creation and development of these technologies, and our success will depend on you! We have been changing how the world develops cell-based products since 2011, and now our family of around 250 employees welcomes you to consider joining us on this incredible journey.
The cloud data management intern will play a pivotal data engineering role in the evaluation, development, and deployment of the Berkeley Lights platform data analytics pipeline. Berkeley Lights is looking for an ambitious intern to help manage the data generated by the software that drives our opto-nanofluidic systems, which are capable of single-cell manipulation and annotation! Currently, we can interrogate and manipulate tens of thousands of single cells in a completely automated fashion, and we are developing multiple platforms with unique capabilities to enable our customers to discover new drugs and practice biology in novel ways.
As a cloud data intern, you will be responsible for designing the extraction, ingestion, transformation, and loading of data into cloud data stores. You will help define the necessary schemas and table structures and interface the data with analytics platforms.
In addition to developing the data pipeline, you will assist with data migrations as we implement new platforms, retire legacy systems, or onboard new processes and user groups into our enterprise systems.

Responsibilities:

  • Develop a data integration pipeline, including the design of middleware, the use of staging tables, and the combination of disparate data sources and formats.
  • Extract, transform, and load (ETL) data from flat files generated by our platform instrumentation software into cloud datastores.
  • Automate data preparation for use with data warehouses, data lakes, analytics, and machine learning.
  • Write serverless functions (e.g. AWS Lambda functions) as part of our data analysis pipeline.
  • Provide guidance on analysis and interpretation of high-dimensional data, including genomic, transcriptomic, and proteomic datasets.
  • Enable a team of world-class scientists and experienced data analysts to turn data into actionable insights. These insights can lead to the next vaccine or cell therapy that changes the world!

Qualifications:

  • Programming experience (C#, Python, R).
  • Familiarity with database design, preferably both relational and non-relational (NoSQL).
  • Comfortable with on-premises databases (PostgreSQL, MySQL, SQLite, MS SQL Server), with some familiarity with cloud database services (DBaaS, e.g. Amazon RDS, DynamoDB, MongoDB, MS Azure).
  • Experience with cloud services (preferably AWS).

Desired:

  • Experience querying big data with Amazon Athena.
  • Familiarity with desktop data analytics software (e.g. Tableau, Spotfire) as well as data visualization using scripting languages and tools (e.g. Plotly, Seaborn, GGplot, Esquisse).
  • A passion for big data and the visual display of quantitative information.
  • Python, R: grammar of graphics.
  • Familiarity with software development in C#/.NET, SQL, REST APIs, JavaScript, React.

The California Consumer Privacy Act (CCPA) is effective January 1, 2020. Please read our California Consumer Privacy Act Disclosure Form regarding the CCPA and Berkeley Lights’ required disclosures about the collection of personal information.

Tags: APIs Athena AWS Azure Big Data Biology Data analysis Data Analytics Data management Data visualization DynamoDB Engineering ETL JavaScript Machine Learning MongoDB MS SQL MySQL NoSQL Plotly PostgreSQL Python R React Research Seaborn Spotfire SQL Tableau

Region: North America