Data Engineering Internship

San Jose, CA

Applications have closed

Vectra

Vectra AI's Threat Detection and Response Platform protects your business from cyberattacks by detecting attackers in real time and taking immediate action.


Vectra® is the leader in AI-driven threat detection and response for hybrid and multi-cloud enterprises.

The Vectra Platform captures packets and logs across network, public cloud, SaaS, and identity, applying patented security-led AI to surface and prioritize threats for rapid response. Vectra's threat detections are powered by a deep understanding of attacker methods and problem-optimized AI algorithms. Alerts uncover attacker methods in action and are correlated across customer environments to expose real attacks. Organizations around the world rely on Vectra to see and stop threats before a breach occurs. For more information, visit www.vectra.ai.

Position Overview

 

Detecting attackers in real time requires robust data pipelines that enable machine learning and statistical techniques. As an intern on the Data Engineering team, you will help transform rich network traffic and cloud log data into meaningful features and develop data systems for collecting algorithm telemetry. You will build pipelines and tools for both on-prem and cloud deployments, collaborating with Data Scientists and Software Engineers along the way.

Responsibilities  

  • Work with the Data Engineers on the team to improve existing features and develop new ones that give Data Scientists access to data in ways previously unavailable
  • Possible projects include:
    • Building out a converter to Parquet format and a catalog using AWS Glue
    • Performing ETL on existing data to restructure time series into a more accessible format
    • Automating the piping of network captures through a process that converts them into metadata and loads them into Spark

Qualifications

  • Required
    • Working towards a BS or MS in Computer Science or related field
    • Strong programming skills with experience in Python, C++, or Java
    • Linux proficiency and shell scripting
  • Desirable
    • Experience with Docker, Kubernetes, or other container orchestration tools
    • Experience working with AWS or GCP offerings
    • Experience with a source control system, preferably Git
    • Familiarity with Hadoop, Map/Reduce, Spark, and distributed computing
    • Understanding of data pipeline architectures (e.g. Lambda, Kappa)
    • Hands-on database experience (MySQL, MongoDB, CouchDB, Elasticsearch, etc.)
    • Knowledge of real-time data pipelines (e.g. Kafka and Spark Streaming)
    • Experience with continuous integration and deployment workflows

A two-minute video describing what we do at Vectra, and an article about Vectra's latest funding round:

https://vimeo.com/89579264

https://tcrn.ch/3gVAXNw



Region: North America
Country: United States
Category: Engineering Jobs
