We're hiring a senior ML engineer (in person, in San Francisco) to join our small team of A+ people and build the foundations of Weights.
Why join
As a small team, we work in a highly collaborative environment, and you'll have the opportunity to participate in every part of the business, from idea to production.
Impact: Build the foundation and shape engineering practices, team, and company culture.
Excellence: Practice your craft with other ICs in a well-organized, fast-paced environment.
Ownership: Influence the direction of product and strategy — we care about your opinions.
What you'll do (responsibilities)
We're looking for an experienced individual contributor who enjoys working alongside other experienced engineers to quickly build and iterate on our ML serving infrastructure.
- Deploy at scale. We own model training and inference end to end: we deploy the majority of our models in-house and maintain multiple clusters across AWS, GCP, and Azure, encompassing hundreds of GPUs. As we build out our stack, you get to make critical choices and lay the foundations.
- Self-direct your work and co-own the product. You're a technical founder type and will have autonomy and responsibility. You'll be involved in shaping the roadmap and will own Weights' exploding backend needs.
- Optimize performance. You're adept at optimizing the performance of ML models. GPUs are expensive, and you understand we need to squeeze as much out of them as possible. You have a passion for eking out gains and know where the low-hanging fruit is.
- Create a robust and scalable backend. Build PostgreSQL database models, performant REST APIs with a Redis cache, and optimized ML serving infra, and deploy on Kubernetes.
- Solve interesting technical problems. Bring your full creativity to solve super-challenging technical problems: from complex backend architecture with 3rd party integrations, syncing app state, to real-time collaboration.
What we're looking for (qualifications)
You're a senior IC who has built such systems before; this is not an area you'll have to ramp up on. We don't require any formal qualifications but value learning new skills — especially from one another. We are looking for someone who feels a sense of duty to the users of their work.
- Experienced ML engineer. You have experience writing and optimizing ML pipelines from scratch. PyTorch is second nature to you - you may not be writing raw CUDA kernels every day, but you know your way around a clean nn.Module.
- Highly productive while producing quality code. You enjoy pushing out features in a pragmatic and maintainable way. You know when to use duct tape and when to lay a foundation.
- Curious and a quick learner. We don't expect you to have experience with every technology we use, but to learn and become productive quickly. Owning several repos and jumping into all of them doesn't scare you.
- Attention to detail while pragmatic. We strive for clean code, Git hygiene, and clear written communication — all while remaining low-ego and focused on solutions.
- Autonomous. You can be dropped into our existing infrastructure and contribute on day one.