Site Reliability Engineer


Make an impact with NTT DATA

Join a company that is pushing the boundaries of what is possible. We are renowned for our technical excellence and leading innovations, and for making a difference to our clients and society. Our workplace embraces diversity and inclusion – it’s a place where you can grow, belong and thrive.

Your day at NTT DATA

The Site Reliability Engineer (SRE) is a seasoned subject‑matter expert, responsible for ensuring the reliability, availability, and performance of company systems and infrastructure. The SRE works closely with development teams, operations teams, and other stakeholders to enhance system resiliency, automate processes, and improve overall system reliability.

Key responsibilities:

- Monitors system health, performance metrics, and alerts to identify and respond to incidents promptly; diagnoses issues, troubleshoots problems, and restores services in a timely manner.
- Implements incident response processes to minimize downtime and improve system availability.
- Designs, develops, and maintains automation tools, scripts, and processes to streamline system management tasks, deployments, and configuration changes.
- Implements infrastructure‑as‑code principles to ensure consistency and repeatability.
- Optimizes system resources, configurations, and processes to enhance performance, scalability, and efficiency.
- Uses monitoring tools and performance testing to identify bottlenecks and implement optimizations.
- Collaborates with teams to forecast system resource needs, plans for capacity growth, and ensures adequate scalability.
- Leads incident response efforts, coordinates with cross‑functional teams, and drives the resolution of system issues.
- Performs thorough post‑incident analysis to identify root causes and implements preventive measures to minimize future incidents.
- Identifies opportunities for automation and drives the implementation of self‑healing, monitoring, and deployment automation tools and frameworks.
- Continuously improves operational efficiency, system reliability, and availability through process enhancements and automation.
- Ensures consistency across environments, tracks changes, and enforces configuration standards.
- Works closely with development teams, operations teams, and other stakeholders to ensure effective collaboration, knowledge sharing, and alignment on reliability goals.
- Implements security best practices, works with security teams to assess and address vulnerabilities, and ensures compliance with security standards and regulations.
- Performs any other related task as required.

To thrive in this role, you need to have:

- Seasoned technical expertise in Linux/Unix systems, networking, and system administration.
- Seasoned proficiency in scripting or programming languages, such as Python, Go, Java, or Ruby.
- Seasoned knowledge of cloud platforms (such as AWS, Azure, or Google Cloud) and associated services.
- Seasoned, proven expertise in performance monitoring, optimization, and troubleshooting using tools such as Prometheus, Grafana, or New Relic.
- Seasoned expertise in incident management, root cause analysis, and post‑incident reviews.
- Excellent problem‑solving and analytical skills, with keen attention to detail.
- Excellent communication, collaboration, and leadership skills.
- Seasoned ability to optimize system performance, scalability, and reliability. Experience with performance monitoring and tuning tools (for example, Prometheus, Grafana, or New Relic) to identify bottlenecks, analyze performance data, and implement optimization strategies.
- Seasoned understanding of security principles, best practices, and compliance requirements. Experience in designing and implementing security controls, performing security assessments, and ensuring compliance with industry standards.

Academic qualifications and certifications:

- Bachelor's degree or equivalent in Computer Science, Information Technology, or a related field.
- Relevant certifications, such as AWS Certified DevOps Engineer - Professional, Google Cloud Professional DevOps Engineer, or Certified Kubernetes Administrator (CKA), preferred.

Required experience:

- Seasoned hands‑on experience in a Site Reliability Engineering role or related roles, including experience in designing and maintaining highly available and scalable systems.
- Seasoned hands‑on experience with Linux/Unix systems, networking, and system administration is crucial. In‑depth knowledge of cloud platforms (such as AWS, Azure, or Google Cloud) and associated services is essential.
- Seasoned proficiency in multiple programming languages like Python, Java, Go, or Ruby is important for developing and maintaining automation tools, frameworks, and complex system integrations. Expertise in scripting languages like Bash or PowerShell is beneficial.
- Seasoned understanding of complex infrastructure architectures, including scalable and fault‑tolerant designs. Experience with infrastructure‑as‑code tools (such as Terraform or CloudFormation) and containerization technologies (such as Docker or Kubernetes) is essential.
- Seasoned experience in designing and implementing robust automation frameworks, CI/CD pipelines, and deployment strategies. Proficiency in tools like Jenkins, GitLab CI/CD, or CircleCI to build, test, and deploy applications with a focus on reliability and scalability.
- Seasoned experience in incident management, troubleshooting complex system issues, and conducting post‑incident analysis. Advanced ability to lead incident response efforts, drive root cause analysis, and implement preventive measures.
- Seasoned understanding of DevOps principles, Agile methodologies, and a strong commitment to continuous improvement and learning. Experience in promoting a DevOps culture and driving the adoption of best practices.

Workplace type:

On‑site Working

About NTT DATA

NTT DATA is a $30+ billion business and technology services leader, serving 75% of the Fortune Global 100. We are committed to accelerating client success and positively impacting society through responsible innovation. We are one of the world’s leading AI and digital infrastructure providers, with unmatched capabilities in enterprise‑scale AI, cloud, security, connectivity, data centers and application services. Our consulting and industry solutions help organizations and society move confidently and sustainably into the digital future. As a Global Top Employer, we have experts in more than 50 countries. We also offer clients access to a robust ecosystem of innovation centers as well as established and start‑up partners. NTT DATA is part of NTT Group, which invests over $3 billion each year in R&D.

Equal Opportunity Employer

NTT DATA is proud to be an Equal Opportunity Employer with a global culture that embraces diversity. We are committed to providing an environment free of unfair discrimination and harassment. We do not discriminate based on age, race, colour, gender, sexual orientation, religion, nationality, disability, pregnancy, marital status, veteran status, or any other protected category. Join our growing global team and accelerate your career with us. Apply today.

Third parties fraudulently posing as NTT DATA recruiters

NTT DATA recruiters will never ask job seekers or candidates for payment or banking information during the recruitment process, for any reason. Please remain vigilant of third parties who may attempt to impersonate NTT DATA recruiters, whether in writing or by phone, in order to deceptively obtain personal data or money from you. All email communications from an NTT DATA recruiter will come from an @nttdata.com email address. If you suspect any fraudulent activity, contact us.
