
AI Infrastructure

HPC Infrastructure Engineer


Bengaluru, India

About Soket AI

Soket is an AI research firm headquartered in Bengaluru with a mission to build efficient and generalized intelligence for humanity. We are focused on advancing frontier AI research through the development of large-scale foundation models in math, code, and reasoning that are open, energy-efficient, multilingual, and responsible by design. We are funded and supported by the IndiaAI Mission, Government of India. Our work places a strong emphasis on India and the Global South, where access to high-quality AI systems remains limited despite immense linguistic and cultural diversity.


At Soket, we believe the future of AI should be accessible, scalable, and aligned with real-world societal needs. Our teams work across large language models, multimodal systems, speech technologies, reasoning systems, and large-scale AI infrastructure, with a strong focus on open research and practical deployment. We are deeply passionate about pushing the boundaries of AI research while building systems that are useful, trustworthy, and globally impactful.


Annual Salary Range

Rs 10,00,000 - Rs 20,00,000


Workloads you would be involved in

  • Manage and maintain large-scale GPU clusters across on-premise and cloud environments.
  • Administer and troubleshoot Slurm and Kubernetes-based infrastructure and scheduling systems.
  • Monitor cluster health, performance, utilization, and reliability across compute, storage, and networking systems.
  • Diagnose and resolve hardware, networking, storage, and distributed infrastructure issues.
  • Manage distributed storage, high-speed networking, and AI infrastructure systems.
  • Implement and maintain infrastructure security, access control, and cyber resilience measures.
  • Build automation, deployment, and infrastructure management tooling.
  • Support large-scale AI training and inference workloads and resolve distributed training issues.
  • Manage users, quotas, permissions, and multi-tenant resource allocation policies.
  • Maintain operational reliability through incident management, disaster recovery, and capacity planning.
  • Collaborate with research and engineering teams to support scalable AI infrastructure operations.

You are a good fit if you:

  • Have 2-5+ years of experience in infrastructure engineering, systems administration, or related fields.
  • Have a knack for troubleshooting and debugging complex infrastructure issues.
  • Love working with large-scale GPU clusters and distributed systems.
  • Have a strong understanding of distributed AI/ML workloads and infrastructure.
  • Love writing long and complex shell scripts (and showing them off to your colleagues).

You are a strong candidate if you have experience with:

  • Cluster and network monitoring tools such as nvidia-smi, dcgm, cmsh, and ibstat.
  • Containerization and virtualization tools such as Enroot and Kubernetes.
  • Slurm management with experience in configuration, tuning, and optimization.
  • Distributed storage systems and parallel file systems such as WEKA.

Why work with Soket?

At Soket, you will get the chance to work on problems that only a handful of teams in the world are solving today: building frontier foundation models at scale. You will see first-hand how intelligence is baked into large models and work across the entire stack that powers modern AI systems. You will work with supercomputing-scale GPU clusters and tackle challenging problems in petabyte-scale data aggregation and processing, distributed training, model architectures, infrastructure, inference optimization, and large-scale AI deployment.

One day you might be debugging CUDA kernels or NCCL issues, another day optimizing throughput for multi-GPU training runs, building new infrastructure tooling, or experimenting with ideas that make training faster and more efficient. We are a deeply research-driven and engineering-focused team that loves nerding out about systems, scaling laws, training stacks, and AI research. If you enjoy going deep into technical problems and learning from highly talented researchers and engineers, you will feel right at home here. Most importantly, we are building efficient, open, and accessible AI systems for India, the Global South, and ultimately for humanity as a whole.

If this sounds exciting to you, come build the future with us.


Soket AI Labs is a research-first AI company headquartered in Bengaluru. We are an equal opportunity employer and strongly encourage applications from people of all genders, backgrounds, and ethnicities. We offer competitive compensation, equity participation opportunities, flexible work arrangements across office and remote settings, comprehensive leave policies including parental and wellness leave, and regular team offsites designed to foster collaboration and innovation.

As an AI-native organization, we use AI systems as part of our candidate assessment and interview processes. Please make sure your resume aligns with the job description. More details about how candidate data is processed and used will be available on the application page.