Senior GPU Systems Developer
Role:
We’re looking for a Senior GPU Systems Developer to work on high-performance compute infrastructure for advanced AI workloads. This role focuses on extracting maximum efficiency from modern GPU systems, improving communication and execution performance across distributed environments, and contributing to low-level optimization efforts within a fast-paced engineering team.
Responsibilities:
- Write and optimize custom CUDA kernels from scratch for performance-critical workloads
- Build and tune GPU-accelerated compute pipelines for performance and scalability
- Improve inter-device communication efficiency and distributed execution workflows
- Analyze and resolve bottlenecks related to memory usage, latency, and throughput
- Collaborate closely with systems, compiler, and infrastructure engineers on performance-critical components
Experience:
- 3+ years of experience working with CUDA and GPU performance optimization
- Strong understanding of GPU architecture, memory systems, and parallel execution models
- Experience with distributed or multi-GPU systems and the synchronization concepts they rely on
- Familiarity with Triton or similar GPU programming frameworks is a plus
