Join us and make YOUR mark on the World!
Come join Lawrence Livermore National Laboratory (LLNL), where we apply science and technology to make the world a safer place. LLNL was named one of Glassdoor's 2020 Best Places to Work!
We have an opening for a High Performance Computing (HPC) System Engineer to support HPC clusters, including numerous high-speed, multi-petabyte Lustre file systems composed of Linux servers and high-performance RAID arrays, all connected via Ethernet and InfiniBand SANs. You will independently contribute to technical projects using creativity and imagination. This position is in the Livermore Computing (LC) Division within the Computing Directorate, supporting the LC Supercomputing Center.
This position will be filled at either the SES.2 or SES.3 level depending on your qualifications. Additional job responsibilities (outlined below) will be assigned if you are selected at the higher level.
Job Responsibilities
- Provide system administration support for Linux-based HPC, Network Attached Storage (NAS), infrastructure, and parallel file system servers and clusters.
- Participate in the design and implementation of multiple Linux-based HPC, infrastructure, and parallel file system servers and clusters.
- Build, configure, and maintain multiple RAID controllers and disk enclosure systems.
- Deploy and maintain InfiniBand fabrics for compute and storage networks.
- Monitor the installation of software releases, operating system patches, and third-party utilities, with emphasis on overall system security.
- Collaborate with other system engineers, Hotline, and Operations staff to improve the quality of service for end users.
- Troubleshoot and determine root cause of moderately complex system issues.
- Respond to system problems and user questions in person, via email, and via a trouble ticket system.
- Perform other duties as assigned.
In Addition at the SES.3 Level
- Analyze and tune performance of complex computer, network, file system and disk sub-systems.
- Investigate, evaluate, test and recommend technical solutions for future systems.
- Develop tools and procedures to monitor and automate system tasks on servers and clusters.
Qualifications
- Bachelor’s degree in Computer Science or related field, or the equivalent combination of education and related experience.
- Broad experience with Linux/Unix systems including installation, configuration, networking, backups, updates and patching, and system security.
- Experience with or knowledge of HPC environments and technologies such as InfiniBand, Slurm, Lustre, and GPFS.
- Comprehensive knowledge of scripting and programming languages, such as Python, Perl, and bash/csh/ksh.
- Broad experience with disk and storage systems, such as host-based RAID controllers, software RAID, and vendor RAID systems (e.g., Network Appliance, RAID Inc., DDN).
- Experience with version control and configuration management systems, such as Subversion, Git, Ansible, and CFEngine.
- Ability to work off-hours and on-call (intermittently either as needed or as part of a rotation).
- Proficient communication and interpersonal skills, and the ability to work and communicate with other technical staff and end users.
In Addition at the SES.3 Level
- Significant Linux/UNIX system administration experience supporting a number of independent but interrelated systems and software packages, including containers, Kubernetes, and virtualization environments and tools such as VMware and KVM.
- Advanced knowledge of and significant experience providing innovative solutions to broadly defined tasks and problems.
- Advanced communication and interpersonal skills, and the ability to interact effectively with system developers and vendors with minimal direction.
Desired Qualifications
- Master’s degree in Computer Science or related field.
- Experience with local, parallel, and distributed file systems such as XFS, ZFS, GPFS, and Lustre, and with NAS platforms such as Network Appliance cDOT.
- Experience with Hadoop MRv2 (YARN), Docker containers, and Kubernetes ecosystems, as well as current Red Hat certifications.
Pre-Employment Drug Test: External applicants selected for this position will be required to pass a post-offer, pre-employment drug test. This includes testing for use of marijuana, because federal law applies to us as a federal contractor.
Security Clearance: This position requires a Department of Energy (DOE) Q-level clearance.
If you are selected, we will initiate a federal background investigation to determine whether you meet eligibility requirements for access to classified information or matter. In addition, all L- or Q-cleared employees are subject to random drug testing. Q-level clearance requires U.S. citizenship. If you hold multiple citizenships (U.S. and another country), you may be required to renounce your non-U.S. citizenship before a DOE L or Q clearance will be processed or granted.
Note: This is a Career Indefinite position. Lab employees and external candidates may be considered for this position.
Lawrence Livermore National Laboratory (LLNL), located in the San Francisco Bay Area (East Bay), is a premier applied science laboratory that is part of the National Nuclear Security Administration (NNSA) within the Department of Energy (DOE). LLNL's mission is strengthening national security by developing and applying cutting-edge science, technology, and engineering that respond with vision, quality, integrity, and technical excellence to scientific issues of national importance. The Laboratory has a current annual budget of about $2.3 billion and approximately 6,900 employees.
LLNL is an affirmative action/equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, marital status, national origin, ancestry, sex, sexual orientation, gender identity, disability, medical condition, protected veteran status, age, citizenship, or any other characteristic protected by law.