Meta Careers Jobs

Job Information

Meta Software Engineer, Infrastructure in Singapore

Summary:

The MRS ML Infra team focuses on ML infra performance and efficiency for large-scale AI training and inference workflows in the recommendation domain. In this role, the engineer optimizes the e2e stack for training and inference of large-scale recommendation models. Opportunities range from distributed systems to model/system co-design to GPU system optimizations. We are looking for someone with previous experience in high-performance infrastructure and performance optimization. The candidate should not only identify and lead the execution of short- and mid-term perf/efficiency optimization opportunities, but also drive long-term strategies in areas such as model/system co-design and performance automation.

Software Engineer, Infrastructure Responsibilities:

  1. Drive performance and efficiency optimizations hands-on by identifying and delivering large optimizations across MRS models and systems.

  2. Drive XFN collaboration and alignment with multiple partner or product ML teams.

  3. Lead the technical direction and roadmap for the SGP perf and efficiency team.

  4. Provide mentorship and guidance to grow junior engineers on the team.

Minimum Qualifications:

  1. BS/MS in Electrical Engineering, Computer Science, or a related field, or equivalent experience.

  2. 7+ years of experience in AI infra or system performance.

  3. Hands-on experience in deep system performance optimization, for example distributed systems, high-performance GPU systems, or memory/cache optimizations.

  4. Strong written and verbal communication skills to drive XFN alignment and team execution.

  5. Previous experience mentoring and growing junior engineers as either a tech lead or a manager.

  6. Strong debugging skills in complex systems that span multiple components or sub-systems.

Preferred Qualifications:

  1. Hands-on experience with large-scale AI infra systems (for example, GPU training systems).

  2. Experience with training and inference of large models such as LLMs or recommendation models.

  3. Experience in high-performance computing, in areas such as communication optimization, CUDA kernel optimization, and distributed training and inference.

Industry: Internet
