Edge Inference Lead
On-Device Model Deployment
Specializes in deploying quantized LLMs onto NPUs, mobile SoCs, and microcontrollers. Deep expertise in TensorRT, ONNX, and custom CUDA kernels for sub-10ms inference.
TensorRT, CUDA, NPU, Edge AI
Agentic Systems Architect
Multi-Agent Orchestration
Designs autonomous agent frameworks with tool-use, persistent memory, and multi-step planning. Built agent platforms deployed across financial, legal, and enterprise verticals.
LangGraph, AutoGen, MCP, RAG
Token Economics Architect
Cost Optimization & Efficiency
Reduces LLM serving costs by 40–75% through prompt compression, KV-cache strategies, speculative decoding, and adaptive batching. Saved $4M+ annually across client deployments.
vLLM, KV-Cache, Spec. Decode
MLOps Engineer
ML Pipelines & Infrastructure
Architects end-to-end ML platform infrastructure: feature stores, model registries, training pipelines, and serving stacks. Expert in Kubeflow, MLflow, and Ray on Kubernetes.
Kubeflow, MLflow, Ray, K8s
AI CI/CD Specialist
Model Lifecycle & GitOps
Implements GitOps-native model promotion workflows with automated evaluation gates, canary deployments, shadow-mode testing, and zero-downtime rollbacks for production LLMs.
ArgoCD, GitHub Actions, DVC, Helm
Model Compression Specialist
Quantization & Distillation
PhD-level expertise in post-training quantization, knowledge distillation, and structured pruning. Achieves GPT-4-class accuracy in models 10× smaller for specialized domains.
QLoRA, GPTQ, AWQ, Distillation
Observability Engineer
LLM Monitoring & Tracing
Builds production observability stacks for AI systems: token-level tracing, latency heatmaps, cost dashboards, and ML-based anomaly detection integrated with existing DevOps tooling.
OpenTelemetry, Grafana, Prometheus
RAG & Knowledge Systems
Retrieval & Memory Architecture
Architects production RAG systems with hybrid dense-sparse search, GraphRAG, and long-term agent memory. Built knowledge pipelines ingesting 100TB+ corpora for pharma and legal clients.
GraphRAG, Weaviate, Pinecone