US citizens or Green Card holders only
Work on cutting-edge AI infrastructure, contributing to next-generation, high-performance compute systems that support enterprise-scale machine learning and scientific workloads.
Overview
We are seeking a Runtime Engineer to build the high-performance system software that powers large-scale AI and ML workloads. The role focuses on developing low-level infrastructure that maximizes hardware efficiency and enables scalable, distributed compute across enterprise AI platforms.
Key Responsibilities
Core Engineering
- Design and develop runtime stack features for high-performance ML training and inference
- Build system-level software including drivers, kernel integrations, and OS interfaces
- Develop high-performance user-space libraries for optimal hardware utilization
Distributed Systems & Performance
- Enable scalable data processing across distributed environments
- Optimize networking, communication, and workload orchestration across nodes
- Improve system performance, reliability, and observability
Tooling & Ecosystem
- Build user-facing tools for profiling, debugging, and system management
- Support orchestration, monitoring, and error handling across compute systems
- Collaborate cross-functionally with hardware, ML, compiler, and DevOps teams
Profile
- 3–5 years of experience in systems or infrastructure engineering
- Strong programming skills in C/C++ and Python
- Experience with operating systems, kernel development, and user-space libraries
- Background in distributed systems, with a focus on scalability and performance
Preferred Strengths
- Familiarity with high-speed interconnects (e.g., PCIe, InfiniBand, RoCE)
- Experience with low-latency networking (e.g., RDMA)
- Strong debugging, optimization, and collaboration skills
