You’re the kind of engineer who thrives on solving complex data challenges—designing and optimizing data pipelines that fuel AI-driven insights. Whether it’s ingesting massive streams of data, wrangling unstructured sources, or optimizing embeddings for fast retrieval, you love making data work at scale. Your curiosity drives you to challenge conventional thinking, and you care deeply about the user experience and building products that empower users.
Now, imagine applying that expertise to a game-changing AI platform that’s transforming how public affairs professionals track issues, anticipate risks, and make smarter decisions.
At Delve Deep Learning (DDL), we’re not just building another AI product—we’re changing the way public affairs professionals work and navigate the world. We need a Data Engineer who can architect robust data ingestion systems, streamline pipelines, and enrich data to power state-of-the-art AI-driven knowledge discovery.
If you’re excited about large-scale scraping, ETL, data enrichment, chunking strategies for LLM embeddings, and optimizing AI-driven data retrieval, this is your opportunity to make a major impact.
A Quick Note: If job descriptions were checklists, most of us would never get hired. If you’re excited about this role but don’t meet every single requirement, that’s okay. If you think you’d be great at this job, we’d love to hear from you.
Build and Optimize Data Ingestion & Scraping
- Design and maintain large-scale web scraping and data ingestion pipelines.
- Implement robust scraping frameworks to collect structured and unstructured data from diverse sources.
- Ensure reliable data extraction, deduplication, and normalization.
Build and Scale Data Pipelines
- Develop scalable ETL workflows for processing and transforming large datasets.
- Automate data ingestion, storage, and retrieval processes.
- Optimize pipeline performance for speed, cost efficiency, and reliability.
Enrich and Structure Data for AI Models
- Develop data cleaning and enrichment techniques to improve data usability.
- Design entity resolution, linking, and metadata augmentation workflows.
- Implement data normalization.
Optimize Data Chunking and AI Retrieval
- Collaborate with the team on experiments with chunking strategies to optimize embeddings for vector search.
- Help implement best practices for tokenization, windowing, and text segmentation.
- Ensure high-accuracy retrieval performance for AI-powered search and recommendations.
Scale Data Infrastructure and APIs
- Design and manage data warehouses and vector databases for fast retrieval.
- Implement robust APIs to serve data efficiently to AI models and applications.
- Monitor and optimize data pipelines for high availability and performance.
- Strong Engineering Background – You have at least 3 years of experience engineering solutions to complex data challenges.
- Fluent in Python – You can write Python in your sleep.
- An Expert in Data Ingestion & Scraping – You have experience with large-scale web scraping frameworks (Scrapy, Selenium, Playwright) and data ingestion pipelines.
- A Strong ETL & Data Pipeline Engineer – You’ve built scalable data workflows.
- Fluent in SQL & NoSQL – You’re comfortable with Postgres or similar databases.
- Experienced in Data Processing & Enrichment – You’ve worked with NLP techniques, entity resolution, or metadata augmentation.
- Knowledgeable in Vector Databases & Chunking – You understand embedding chunking strategies and have worked with tools such as FAISS, Pinecone, or PostgreSQL with the pgvector extension.
- Skilled in Cloud & Infrastructure – You’re experienced with AWS for data engineering workloads.
- Have experience with Django or similar Python web frameworks.
- Have built AI-powered SaaS products or large-scale knowledge retrieval platforms.
- You’ll Help Build the Data Backbone of AI – Your work will power cutting-edge AI-driven insights for public affairs professionals.
- Work With A Sharp And Innovative Team – We’re assembling a team of top-notch engineering talent.
- We Move Fast And Nimbly – You’ll work in a high-velocity startup where your ideas and execution matter.
- Competitive Pay & Strong Benefits – Salary range of $100,000 to $180,000 (based on experience), stock options, health insurance, 401(k) matching, and more.
- Hybrid Flexibility – Work a hybrid schedule from our Washington, D.C. office with a fitness center.
If you’re excited about building cutting-edge AI systems for real-world impact, we want to hear from you.
Apply today and help shape the future of AI-driven intelligence.