Job Summary
Join Cloudera’s Anywhere Cloud team as a Staff Software Engineer to lead the architecture and delivery of our cloud‑native AI platform. You will bridge cutting‑edge AI research and production‑grade Kubernetes environments, design and implement scalable AI services, orchestrate inference servers, build internal tooling, and develop RAG pipelines.
Key Responsibilities
- Design and implement scalable application services (Go/Node.js) that wrap AI capabilities for enterprise use.
- Lead the deployment of inference servers (vLLM, Triton) using KServe, KubeRay, or Knative to ensure serverless‑style scaling for AI workloads.
- Build internal tooling, SDKs, and AI gateways to enhance team agility and simplify integration of foundation models.
- Architect robust Retrieval‑Augmented Generation (RAG) pipelines and prompt management services that integrate with vector databases and enterprise data sources.
- Collaborate with UI, UX, and product management to ensure the AI platform is powerful and highly usable for internal developers.
- Ensure AI workloads are secure, multi‑tenant, and optimized for GPU resource scheduling (MIG, fractional GPUs) within Kubernetes.
Requirements
- Bachelor’s degree and 6+ years of software engineering experience (or equivalent experience with a Master’s or PhD), including at least 2 years focused on AI/ML systems.
- Expert proficiency in Python for the AI ecosystem, and strong competence in a systems language such as Go, Rust, or C++ for high‑performance serving layers.
- Deep understanding of LLM deployment challenges and runtimes (vLLM, ONNX, TorchServe, Triton), and familiarity with quantization techniques (AWQ, GPTQ).
- Experience building complex workflows using tools like LangChain or LlamaIndex, and deploying them on Docker/Kubernetes.
- Ability to navigate the rapidly changing AI landscape, separate hype from practical engineering solutions, and drive technical alignment across teams.
Preferred Qualifications
- Experience with model fine‑tuning techniques (PEFT, LoRA/QLoRA) on custom datasets.
- GPU optimization: familiarity with CUDA programming or GPU performance profiling (e.g., Nsight Systems).
- Open‑source contributions to AI projects (e.g., Hugging Face Transformers, vLLM).
Benefits
- Generous PTO policy
- Unplugged days to support work‑life balance
- Flexible WFH policy
- Mental & physical wellness programs
- Phone and Internet reimbursement
- Access to career development and professional growth
- Competitive compensation and comprehensive benefits
- Paid volunteer time
- Employee resource groups
This role is not eligible for immigration sponsorship.
EEO/VEVRAA