Job#: 2050704
Job Description:
Our client is looking for an HPC Linux Systems Administrator to join their team to support the National Oceanic and Atmospheric Administration (NOAA) Research and Development High Performance Computing Systems (RDHPCS) customer at the NOAA Global Systems Laboratory in Boulder, Colorado. The qualified candidate will bring hands-on technical and system administration expertise on-site to maintain the operational readiness and availability of NOAA’s high performance computing systems, manage and support new technology insertions, and provide remote technical support to, and collaborate with, our other supported NOAA sites in Fairmont, West Virginia, and Princeton, New Jersey.
We are looking for an individual to join our client’s team to deploy, operate, and support leading-edge technology for NOAA RDHPCS. Specific technology training will be provided.
We think. We act. We deliver. There is no challenge we can’t turn into an opportunity.
HOW A SYSTEMS ENGINEER ADVISOR WILL MAKE AN IMPACT
- Apply current systems administrative skills.
- Learn and deploy new technologies.
- Develop and deploy monitoring capabilities.
- Develop and implement tools for cluster administration.
- Provide technical support with a team of HPC System & Storage Administrators to resolve operational issues.
- Solve problems and troubleshoot independently to advance quickly toward viable resolutions.
- Perform hardware break/fix support, which may include node, blade, or board-level replacements, replacement of backplanes, failed DIMMs, hard drives, controller boards, failed cables, network switches, and other failed components.
- Manage and maintain spare part inventories.
- Perform tracking, shipping, and receiving of vendor RMAs.
- Develop, improve, and enhance user and system administration online documentation repositories.
- Support HPC system users by leveraging the helpdesk ticketing system.
WHAT YOU’LL NEED TO SUCCEED:
Education:
- Bachelor’s degree or 8+ years of experience.
- Experience with Systems Administration or IT support with diverse responsibilities.
Required Technical Skills:
- Hands-on experience with computer hardware maintenance and troubleshooting, such as identifying and replacing failed processors, DIMMs, disk drives, PCIe cards, and other field-replaceable components.
- Programming or scripting knowledge in at least one language (e.g., Bash, Perl, Python).
Required Skills and Abilities:
- Demonstrated experience deploying and managing large-scale HPC systems using OS provisioning tools (e.g., xCAT, Warewulf).
- Demonstrated experience using configuration management tools (e.g., Ansible, Puppet).
- Linux system administration experience (e.g., RedHat or Rocky Linux).
- Batch management/scheduling experience, Slurm preferred.
- Network interconnect configuration and monitoring experience (e.g., InfiniBand, Ethernet).
- Strong writing skills for technical documents, system procedures, user wikis, and FAQs.
- Team player with the ability to work with a diverse team in both local and remote technical support environments.
- Resourceful with initiative to perform independent technical troubleshooting and identify/recommend solutions and improvements.
- Willingness and motivation to learn, grow, and apply acquired knowledge to future projects.
- Disciplined troubleshooting skills balanced with creative problem-solving skills to tackle highly complex large-scale technical problems.
- Attention to detail in areas such as time management, pre-planning, analytical thinking, observation, and active listening.
Location: Remote, Hybrid, On Customer Site (Boulder, CO)
US Citizenship or Green Card Required
EEO Employer
Apex Systems is an equal opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law. Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law. If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation in using our website for a search or application, please contact our Employee Services Department at [email protected] or 844-463-6178.
Apex Systems is a world-class IT services company that serves thousands of clients across the globe. When you join Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated's Best of Staffing® in Talent Satisfaction in the United States and Great Place to Work® in the United Kingdom and Mexico.