# Member of Technical Staff - ML Training Systems
## What this role requires
The specifics that matter for software engineering roles, at a glance.
| Field | Details |
| --- | --- |
| Salary | $150k–$350k |
| Seniority | Staff |
| Focus | ML |
| Experience | 5+ years |
| Languages | PyTorch (torch) |
| Frameworks | Hugging Face, verl, slime |
| Stack | GPU, containers, Linux kernel, file systems |
| Remote | Onsite |
## Description
About Us:
Modal provides the infrastructure foundation for AI teams. With instant GPU access, sub-second container startups, and native storage, Modal makes it simple to train models, run batch jobs, and serve low-latency inference. We have thousands of customers who rely on us for production AI workloads, including Lovable, Scale AI, Substack, and Suno.
We're a fast-growing team based out of NYC, SF, and Stockholm. We've hit 9-figure ARR and recently raised a Series B at a $1.1B valuation. Our investors include Lux Capital, Redpoint Ventures, Amplify Partners, and Elad Gil.
Working at Modal means joining one of the fastest-growing AI infrastructure organizations at an early stage, with many opportunities to grow within the company. Our team includes creators of popular open-source projects (e.g. Seaborn, Luigi), academic researchers, international olympiad medalists, and engineering and product leaders with decades of experience.
The Role:
We are looking for strong engineers with experience training production machine learning models. If you are interested in contributing to open-source projects and evolving Modal's infrastructure to train the next generation of language models, we'd love to hear from you!
Requirements:
- 5+ years of experience writing high-quality, high-performance code.
- Experience with torch and high-level training frameworks (Hugging Face, verl, slime).
- Experience with ML training optimization (tell us a story about eliminating data-loading bottlenecks, overlapping communication with compute, rewriting a trainer to handle off-policy rollouts, etc.).
- Nice-to-have: familiarity with low-level operating system foundations (Linux kernel, file systems, containers, etc.).
- Ability to work in person in our NYC or San Francisco office.
- Ability to participate in an on-call rotation and respond to production incidents.
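As a toy illustration of the kind of data-loading optimization mentioned above (this is a generic sketch in plain stdlib Python, not Modal's code and not tied to torch), a background prefetcher can overlap batch loading with compute so the trainer never stalls on I/O:

```python
import threading
import queue
import time

def prefetch(load_batch, num_batches, depth=2):
    """Load batches on a background thread so slow I/O
    overlaps with compute running on the main thread."""
    q = queue.Queue(maxsize=depth)
    SENTINEL = object()

    def worker():
        for i in range(num_batches):
            q.put(load_batch(i))  # blocks when the buffer is full
        q.put(SENTINEL)           # signal end of the batch stream

    threading.Thread(target=worker, daemon=True).start()
    while True:
        batch = q.get()
        if batch is SENTINEL:
            break
        yield batch

# Toy usage: "loading" sleeps to simulate I/O; while the main thread
# consumes one batch, the worker is already loading the next.
def load_batch(i):
    time.sleep(0.01)  # simulated I/O latency
    return list(range(i, i + 4))

total = 0
for batch in prefetch(load_batch, num_batches=5):
    total += sum(batch)  # "compute" step, overlapped with the next load
print(total)  # → 70
```

Production trainers get the same effect from a `DataLoader` with worker processes plus pinned-memory transfers, but the bounded-queue producer/consumer pattern above is the underlying idea.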