HPC Systems Engineer Job at Applied Digital

Applied Digital Dallas, TX 75219

Job Summary

The HPC Systems Engineer will be a subject matter expert in High Performance Computing (HPC) infrastructure and storage, working with a team responsible for engineering, deploying, and supporting HPC-based clusters and data centers. You will provide technical guidance to the team and perform the activities required to design, build, support, and automate large, complex HPC data center server systems.

The HPC Systems Engineer will also own Applied Digital's data center IT systems for existing cryptocurrency mining, including design and engineering decisions. Additionally, this role will lead a team of systems engineers at data centers across North America, with technical architecture and management responsibilities.

Primary Job Duties

  • Tend and observe equipment and machinery to verify efficient and safe operation.
  • Architect systems based on customer requirements, budgets, timelines, and parts availability
  • Design and implement scalable systems, software, and architectures
  • Support existing teams and operations
  • Enhance efficiency, robustness, and scalability
  • Lead capacity planning to help determine compute and storage requirements
  • Own the job scheduler (such as Slurm), including configuration, optimization, and advanced features
  • Plan customer dataset storage and systems to support their requirements
  • Optimize and troubleshoot complex ML/AI jobs and pipelines
  • Apply in-depth HPC and Linux expertise to collaborate with stakeholders across IT and domain disciplines to expand HPC use cases
  • Evaluate, analyze, and integrate HPC technologies such as job schedulers, high performance interconnects, networked filesystems, cybersecurity, cluster management, virtualization, networking, performance tuning, and data center planning
  • Act as the senior engineer assessing innovative technologies and integrating existing commercial and open-source software solutions
  • Work closely with the Network team to define and design network requirements for systems environments


Education and Experience

Minimum Bachelor of Science degree in Computer Science, Engineering, or a related field. Advanced degree preferred.

Authorized to work in the U.S.

  • Architecting, developing, deploying, and operating large-scale distributed systems
  • System, datacenter, or DevOps engineer in a complex HPC datacenter environment
  • Experience with Job Schedulers for High Performance Computing (HPC) systems, including consideration of resilience, memory, scalability, and central processing unit (CPU) footprint
  • Experience doing performance analysis studies of software and applications on HPC system architectures
  • Working with cloud technologies: Kubernetes (K8s), Docker, microservices, etc.
  • Implementing and supporting High-Performance Compute (HPC) Clusters
  • Experience with Virtualization, Windows, and Linux-based operating systems
  • Experience with various Processor architectures (e.g., CPU, GPU, FPGA)
  • Experience with assorted Memory architectures (e.g., DRAM, DDR, HBM, persistent memories)
  • Experience with large-scale storage and filesystems (e.g., Flash, NVMe, HDD)
  • Enterprise software development skills and processes (Python, shell scripting, etc.) needed to communicate with data scientists and ML engineers and to effectively orchestrate large-scale server clusters
  • Experience with Open Cloud Platform (OCP) a plus
  • Knowledge of systems management, logging, and monitoring systems
  • Demonstrated networking knowledge across all OSI layers


PI199996202



