Job Summary

Join Cloudera’s Anywhere Cloud team as a Staff Software Engineer to lead the architecture and delivery of our cloud‑native AI platform. You will bridge cutting‑edge AI research and production‑grade Kubernetes environments, design and implement scalable AI services, orchestrate inference servers, build internal tooling, and develop RAG pipelines.

Key Responsibilities

Design and implement scalable application services (Go/Node.js) that wrap AI capabilities for enterprise use.
Lead the deployment of inference servers (vLLM, Triton) using KServe, KubeRay, or Knative to ensure serverless‑style scaling for AI workloads.
Build internal tooling, SDKs, and AI gateways to enhance team agility and simplify integration of foundation models.
Architect robust Retrieval‑Augmented Generation (RAG) pipelines and prompt management services that integrate with vector databases and enterprise data sources.
Collaborate with UI, UX, and product management to ensure the AI platform is powerful and highly usable for internal developers.
Ensure AI workloads are secure, multi‑tenant, and optimized for GPU resource scheduling (MIG, fractional GPUs) within Kubernetes.

Qualifications

Bachelor’s degree with 6+ years of software engineering experience (or equivalent Master’s/PhD tenure), including at least 2 years focused on AI/ML systems.
Expert proficiency in Python for the AI ecosystem and strong competence in a systems language such as Go, Rust, or C++ for high‑performance serving layers.
Deep understanding of LLM deployment challenges and runtimes (vLLM, ONNX, TorchServe, Triton), and familiarity with quantization techniques (AWQ, GPTQ).
Experience building complex workflows using tools like LangChain or LlamaIndex, and deploying them on Docker/Kubernetes.
Ability to navigate the rapidly changing AI landscape, filter hype from practical engineering solutions, and drive technical alignment across teams.

Desired Additional Experience

Model fine‑tuning techniques (PEFT, LoRA/QLoRA) on custom datasets.
GPU optimization: familiarity with CUDA programming or GPU performance profiling (Nsight Systems).
Open‑source contributions to AI projects (Hugging Face Transformers, vLLM, etc.).

Benefits

Generous PTO policy
Unplugged days to support work‑life balance
Flexible WFH policy
Mental & physical wellness programs
Phone and Internet reimbursement
Access to career development and professional growth
Competitive compensation and comprehensive benefits
Paid volunteer time
Employee resource groups

This role is not eligible for immigration sponsorship.

EEO/VEVRAA

Staff Software Engineer, Anywhere Cloud - AI Systems & Runtimes