Ge Gao

Staff AI/ML Engineer & Multi-Agent Intelligence & MLOps·Databricks

United States

Master's degree, Computer Science

Work experience

Total years of experience: 12 years, 7 months

Staff AI/ML Engineer & Multi-Agent Intelligence & MLOps

April 2022 - Present

Databricks

South San Francisco, United States •Hybrid

•Architected Databricks’ internal multi-agent orchestration framework using
LangGraph and LangChain, automating dataset validation, code generation, and
metadata reasoning within workspace environments.
•Designed retrieval-augmented generation (RAG) pipelines for intelligent
documentation and code understanding using custom embeddings, Pinecone, and
asynchronous retrieval agents.
•Built RLHF fine-tuning infrastructure using PPO, LoRA/QLoRA, MLflow, Ray, and
DeepSpeed ZeRO3, enabling distributed GPU optimization for enterprise-grade
conversational models.
•Architected and deployed a real-time personalized recommendation system to
power contextual content and feature discovery within Databricks’ internal AI
workspace.
•Implemented a four-stage recommendation pipeline (candidate generation, scoring,
ranking, re-ranking) achieving sub-100ms inference latency for adaptive
personalization.
Core Distributed Systems & Data Infrastructure & Data Science
•Re-architected Databricks’ unified data storage layer, enabling petabyte-scale, low-latency access across AWS, Azure, and GCP.
• Built metadata caching and transaction log compaction services, reducing query
latency by 35% and S3 costs by 20%.
• Developed Raft-based coordination subsystems for cluster-wide synchronization of
pipelines and notebook execution.
• Migrated Spark orchestration into a Kubernetes-native control plane, improving job
resiliency and reducing startup times by 50%.
• Built a Rust + Go file indexing service integrated into Unity Catalog, enabling
millisecond-level lookups for billions of files.
• Standardized observability practices with Prometheus, OpenTelemetry, and Grafana,
defining SLO frameworks for global reliability metrics.

Company industry: Computer Hardware & High-Tech Manufacture

Staff/Machine Learning Engineer & Data Scientist & Software Engineer

April 2021 - April 2022

Google

Sunnyvale, California, United States •Hybrid

•Core developer of the Google Cloud Storage backend, owning key components of object
lifecycle management and namespace consistency.
•Engineered the Cross-Region Replication (CRR) pipeline to synchronize object
mutations across clusters with deterministic replay ordering.
•Spearheaded optimization of metadata lookup and I/O scheduling, reducing RPC
latency by 40% and increasing throughput under concurrent workloads.
•Designed distributed sharding and placement algorithms for metadata catalogs,
mitigating hot partitions and improving load balance across storage nodes.
•Delivered data path optimizations with the networking team, improving hybrid
transfer performance between GCS and Compute Engine.
•Created developer tooling for live debugging, replay simulation, and partial rollback,
dramatically improving postmortem analysis capabilities.
•Advocated for and led internal adoption of observability and automation tooling,
improving reliability KPIs and reducing mean-time-to-detect.

Company industry: IT Services

Senior AI Software Engineer

March 2014 - April 2021

Google

Sunnyvale, California, United States •Hybrid

•Improved the Cross-Region transfer by 7,200 reports in-memory replicas mutation
across resources content to determine the vector ordering.
•Spun resources in transactional metadata impact and SQL schema, reducing R/W
latency by 42% and increasing production under attachment workflows.
•Designed data redundancy and placement algorithms for metadata catalogs,
mitigating hot partitions and improving backbone and corner case nodes.
•Delivered data in merge in transactions with the networking team, improving hybrid
transfer performance between GCS and Compute Engine.
•Created diverse down-scaling for fine debugging, representation, and partition
lattices, dramatically improving partition memory deployment lines.
•Achieved bonus over well beyond adoption of observability metadata determining
trending, improving repeatability, IP-based memory mean time to deliver lines.

Company industry: IT Services

Software Engineer Intern

May 2013 - August 2013

Google

California, United States •Hybrid

•Contributed to Spanner SQL transaction commit latency and replication
subsystems, improving metadata election stability and performance under latency
priority.
•Implemented Python frameworks for consensus service adrift strengthen top line
rates and true line guarantees for global transaction ordering.

Company industry: IT Services

Education

Carnegie Mellon University

May 2013

Master's degree, Computer Science

United States

Peking University

June 2012

Bachelor's degree, Computer Science

China
