Your Roles and Responsibilities:
Consults with supervisor, systems analysts, other programmers, and end users to gather information about program intent, functions, features, data requirements, input requirements, output requirements, internal and external checks and controls, hardware and operating system environment, and interfaces with other systems.
Designs and writes program specifications based on consultations with supervisor, systems analysts, other programmers, and end users.
Converts designs and specifications into computer code.
Compiles code into programs and corrects errors detected in the compile process.
Creates test transactions and runs tests to find errors and confirm the program meets specifications.
Analyzes code to find causes of errors and revises programs.
Writes and maintains documentation of changes to computer code, programs, and specifications.
Designs and codes layouts for onscreen user interfaces, printed outputs, and interfaces with other systems.
Reviews user and technical documentation written by others to confirm consistency with program operations.
Provides technical assistance by responding to inquiries regarding errors, problems, or questions with programs.
Revises programs for corrections, enhancements, or system environment changes.
Trains end users or technical support staff to use and support programs.
Builds applications from source and deploys them to a cluster environment in a reproducible manner.
Performs Linux system administration and develops automation tools for system monitoring and deployment using Bash or Python.
Develops applications for highly interconnected clustered environments, preferably HPC.
Creates VMs, containers (Docker and Singularity) and OS images for network booting.
Develops CLI tools and wrappers for efficiently debugging HPC cluster deployments at the cluster and node level using Python, Bash, or other scripting languages.
Develops automated test suites for validating cluster and node health.
Creates scripts for gathering cluster component logs and parsing data for troubleshooting deployment issues.
Builds internal tools that have robust error handling, are resilient, and scale across multiple cluster deployment types.
Builds tools capable of debugging both CPU and GPU components.
Uses build system tools including Make, CMake, autoconf, and autotools.
Interfaces directly with experienced users, determines proper delegation of roles, and approves or denies requests.
Required Qualifications/Skills:
Bachelor’s degree (B.S./B.A.) from a four-year college or university and 3+ years’ related experience and/or training; or an equivalent combination of education and experience
Proven experience in integration projects for HPC and Machine Learning environments
C/C++ programming skills
Expertise with autotools, Make, CMake, and containers (Docker, Singularity)
Strong team software development skills including demonstrated expertise with git, Jenkins, Jira, and similar tools.
In-depth knowledge of software development practices including debugging, testing, revision control, documentation, and bug tracking
Experience with virtualization and cloud computing
Linux administration, scripting expertise, cluster tools
Outstanding interpersonal and communication skills
Ability to work well and effectively in a team environment, including with geographically dispersed teams.
Physical Demand & Work Environment:
Must have the ability to perform office-related tasks which may include prolonged sitting or standing
Must have the ability to move from place to place within an office environment
Must be able to use a computer
Must have the ability to communicate effectively
Some positions may require occasional repetitive motion or movements of the wrists, hands, and/or fingers