Objectives of this Role
• Maintain services once they are live by measuring and monitoring availability,
latency and overall system health.
• Build software and systems to manage platform infrastructure for public cloud
and applications
• Measure and optimize system performance, with an eye toward pushing our
capabilities forward, getting ahead of customer needs, and innovating to
continually improve
• Provide primary operational support and engineering for multiple large
distributed software applications
• Improve reliability, quality, and time-to-market of our suite of software
solutions
• Work with the Process team on ISO 27001/2000
Daily and Monthly Responsibilities
• Gather and analyze metrics from both operating systems (CentOS, Ubuntu,
FreeBSD, Windows Server) and virtualization technology (OpenStack, Proxmox)
• Participate in system design consulting and production platform management
through application reviews, capacity planning, and performance tuning
• Troubleshoot hands-on in a distributed Linux systems environment, tracing
problems through applications, systems, and networks
• Use your analytical skills and monitoring tools (Zabbix, PRTG, Cacti, etc.)
to identify issues, propose solutions, and deliver successful resolutions
• Support team members throughout the cloud operations organization as a
prime escalation point, and work the Tier 2 cloud infrastructure engineer
ticket queue using Freshdesk and ClickUp
Required Skills and Qualifications
• Bachelor’s degree in Computer Science or a related technical major, or
commensurate experience
• Advanced systems engineering experience with Linux and Windows (Ubuntu,
CentOS, Red Hat, Windows Server)
• Experience with at least two of the following cloud or virtualization
technologies: Helm, Docker, Kubernetes, AWS, Azure, Google Cloud, OpenStack,
OpenShift, VMware vSphere
• Advanced experience with networking and network troubleshooting (TCP/IP, DNS,
routing, switching, firewalls, LAN/WAN, traceroute, iperf, dig, cURL or related)
• Experience with distributed storage technologies such as NFS, Ceph, and S3
• Exposure to at least one of the following databases: MySQL, MariaDB,
Percona, MongoDB
• Develop and maintain installation and configuration procedures
• Deploy applicable server patches and perform system backups
• Scripting in Ansible, Shell, Perl, and Python to automate system tasks
• A proactive approach to spotting problems, areas for improvement, and
performance issues
• Ability to handle and resolve critical customer issues and escalations,
including participation in on-call rotations
• Strong network security fundamentals
• Hands-on, self-starter