Hire me

Senior AI/ML & Data Engineering leader available for hire.

10+ years architecting data platforms and shipping scalable systems. 4+ years putting GenAI into production at Fortune 50 scale, from streaming pipelines and warehouses to RAG and LLM evaluation. Available for full-time, fractional, and project work starting this quarter.

Available

Get a senior IC who ships, evaluates, and scales.

I build the unglamorous parts that make AI products work in production: retrieval, evals, streaming pipelines, and the CI that keeps quality from regressing.

Download résumé (PDF) · Book an intro call · Email me

San Francisco, USA · US-remote OK · Replies within 24h · LinkedIn

10+

years engineering

4+

years GenAI in prod

30%

forecast accuracy lift

F50

production scale

Ways to work together

Pick the shape that fits your team.

Full-time

Senior / Staff IC or Tech Lead

Senior AI/ML, data engineering, LLM platform, or applied research roles. US-remote or Bay Area on-site.

W-2 or contract-to-hire
Senior, Staff, or Lead level
AI/ML, data, or platform

Fractional Tech Lead

10–20 hrs / week

Embed with your team to own RAG, evals, or the ML platform roadmap end-to-end.

Architecture + hands-on code
Hiring & mentorship
Monthly retainer

Project / Consulting

4–12 week scopes

Ship a production RAG system or eval harness, or migrate batch pipelines to streaming.

Fixed scope & deliverables
Code you own
Knowledge transfer included

Advisory

Monthly retainer

Architecture reviews, hiring panels, and a direct line for your engineering leaders.

2 calls / month
Async Slack/email
Design & PR reviews

Available · Q2 2026

Open to new roles this quarter

Currently accepting full-time, part-time, and consulting engagements starting this quarter.

Next opening: Available now
Capacity: 1–2 slots open
Open to: Full-time · Part-time · Consulting
Last updated: 2026-05-18

Case studies

Proof of impact, not just titles.

A few representative engagements: the problem, what I built, and the measurable outcome.

Fortune 50 · Retail

RAG-powered supply-chain forecasting

Problem · Planners relied on stale dashboards and intuition; forecasts drifted weekly.

What I built · An end-to-end RAG forecasting engine that grounds LLM responses in fresh telemetry, with golden-set evals gating every release.

+30%

forecast accuracy

<5 min

data freshness

100%

release gating

Python · FastAPI · LangChain · PyTorch · Kafka · Spark · Kubernetes · GCP

Under NDA · details on request

Fortune 50 · AI Platform

Production LLM evaluation harness

Problem · LLM features regressed silently between releases; no one trusted the metrics.

What I built · Designed an offline + online eval pipeline with golden sets, faithfulness and contradiction scoring, LLM-as-judge, and CI release gates.

silent regressions

20+

release gates

eval turnaround

Python · LLM evals · CI/CD · Kubernetes

Under NDA · details on request

Big Data · Enhance IT

Batch → streaming data platform

Problem · Multi-hour batch jobs blocked analytics and ML feature freshness.

What I built · Re-architected pipelines on Spark Structured Streaming + Kafka with Airflow orchestration across multiple RDBMSes, HDFS, and Hive.

Hours → min

latency

92%

ML model accuracy

Multi-source

RDBMS + HDFS + Hive

PySpark · Kafka · Airflow · Hive · Power BI

Under NDA · details on request

iLED Collections

Connected LED wearables + storefront

Problem · No off-the-shelf way to push live pixel art and messages to wearable LEDs.

What I built · Designed the firmware/app BLE protocol, built the React Native companion app, and shipped the e-commerce storefront and fulfillment.

Live

DTC brand

BLE

device ↔ app

End-to-end

hardware + software + ops

React Native · BLE · Node · WooCommerce

Visit project

What you get

Hands-on across the AI/data stack.

Production RAG

Hybrid retrieval, reranking, grounding, and streaming wired into real systems, not demos.

LLM evaluation

Golden sets, faithfulness/contradiction scoring, and release-gating CI you can trust.

Streaming data

Spark Structured Streaming + Kafka pipelines that move minutes-fresh data to ML.

0→1 product

I run an AI product studio, so I ship features, not just slides.

Tech leadership

Architecture reviews, mentorship, hiring panels, and stakeholder comms.

Research depth

Ph.D. candidate researching long-term LLM memory; peer-reviewed publications.

Ideal engagements

Teams putting their first LLM feature into production
Existing RAG systems that hallucinate, drift, or can't be evaluated
Batch data platforms that need to go real-time
Engineering orgs hiring senior AI/ML or data engineering talent that need an interim lead in the meantime
Teams shipping AI products who want a senior partner on the build

Probably not a fit

Pure prompt-engineering gigs with no engineering depth
"Add AI" projects with no defined problem or success metric
Crypto / surveillance / adversarial use cases

Track record

Where I've shipped.

Full work history

Senior Software Engineer · Walmart

Jul 2021 – Present

Own end-to-end ingestion and processing for petabyte-scale data pipelines (streaming, batch, and backfill) feeding analytics and ML for hundreds of internal consumers.
Led the migration from batch to near-real-time ingestion with Spark Structured Streaming and Kafka, cutting data latency from hours to under 5 minutes and lifting freshness across dashboards and models.
Built ETL and backfill frameworks in Spark, PySpark, Hive, and Airflow with idempotency, deduplication, and replay, cutting reprocessing effort by 30% and improving correctness under late-arriving data.
Re-architected critical pipelines via partitioning, bucketing, broadcast joins, and resource tuning, for +40% throughput and −15% compute cost.
Built governed data lake foundations on GCP and Azure with standardized schemas and access controls; tuned complex OLAP SQL / HiveQL queries, cutting heavy report runtimes by 50%.
On top of that platform, shipped a RAG-powered supply-chain forecasting engine that grounds LLM responses in fresh telemetry (+30% predictive accuracy), and an LLM evaluation pipeline that gates every release on faithfulness and contradiction scores.

Big Data Engineer · Enhance IT

Sep 2020 – Jun 2021

Designed and built real-time and batch data pipelines with PySpark, Spark Structured Streaming, Flume, NiFi, Kafka, Airflow, and Hive, supporting analytics and ML at scale.
Installed, configured, and operated end-to-end Apache Spark pipelines (Python and Scala) integrated with multiple RDBMSes, HDFS, and Hive, tuned for throughput and reliability.
Ingested structured, semi-structured, and unstructured data into HDFS across AVRO, ORC, Parquet, CSV, and JSON; synchronized data from MySQL, PostgreSQL, and SQL Server into HDFS and back out to downstream RDBMSes.
Orchestrated and scheduled the entire Spark pipeline in Airflow with modular, retryable DAGs; layered Spark SQL and Hive for unified analytics across internal and external tables.
Trained supervised ML models reaching 92% accuracy on curated pipeline output and shipped executive analytics dashboards in Power BI, Grafana, and Tableau.

Graduate Research Assistant · University of Kentucky

Jan 2019 – Aug 2020

Built end-to-end ETL data pipelines for research data processing (ingestion, cleaning, feature extraction, and dataset versioning) to support reproducible ML experiments.
Designed experiments, collected and curated datasets, and ran exploratory analysis on FTIR spectroscopy and related sensor data.
Prototyped ML models for classification, regression, and prediction; performed rigorous evaluation, error analysis, and result visualization.
Deployed and integrated ML models with Pickle, Joblib, and MLOps workflows on AWS and Azure, exposing inference endpoints for downstream lab tooling.
Shipped Glutini (glutini-res.com), a TensorFlow Lite mobile app for on-device gluten detection from spectral scans, as the M.Sc. research deliverable.

Let's talk about your role or project.

Send the JD, the problem, or the codebase. I reply within a day with a candid take on fit, scope, and what I'd do first.

Download résumé · Book a 15-min intro · okekeag@gmail.com