Lead AI Inference Systems Engineer
Join an early-stage start-up building the future of AI hardware. Working alongside world-class engineers and leading AI labs, you'll help design workload-specialised accelerators that push the boundaries of AI inference performance.
This is a rare opportunity to work across the full hardware-software stack, shaping how frontier AI models execute on next-generation silicon.
About the Role
You'll work at the intersection of AI models, compiler and runtime systems, kernel optimisation and computer architecture. From analysing real-world AI workloads to influencing silicon design, you'll play a key role in building accelerators designed around the demands of modern foundation models.
What You'll Do
- Analyse AI inference workloads to identify performance bottlenecks across compute, memory and data movement
- Optimise models through quantisation, kernel fusion, graph optimisation and low-precision execution
- Build performance models and workload analysis tools to guide hardware design
- Collaborate with architecture, compiler and RTL teams to define next-generation accelerator features
- Develop kernels, profiling tools and reference implementations to evaluate performance
- Use AI agents and automation to accelerate architecture exploration and optimisation
What You'll Bring
- Experience optimising AI inference systems or ML infrastructure
- Strong understanding of AI models, compiler/runtime systems and computer architecture
- Experience with performance analysis, memory optimisation and workload profiling
- Strong Python and C++ development skills
- Experience with PyTorch or other modern ML frameworks
- Comfort working across software and hardware in a fast-moving start-up
Nice to Have
- AI accelerator architecture or hardware-software co-design experience
- Compiler, runtime or kernel optimisation expertise
- Quantisation, graph lowering or memory planning experience
- Experience building performance models or simulation tools
