Role: Incident Manager
Shift: PST (Night Shift)
Experience: 8+ years
We are seeking a dedicated Incident Manager to join our Infrastructure Operations (IOPS) team. As part of the Service Management group, you will lead the processes for incident, problem, and change management. During major incidents, you will act as the Incident Commander, taking full responsibility for managing the organization’s response. This includes leading investigations, assigning tasks, coordinating with stakeholders, and ensuring timely service restoration.
To succeed in this role, you should have a foundational understanding of internet technologies and a willingness to learn new systems and procedures. A top candidate will be able to multitask effectively and remain composed under pressure.
Key Responsibilities:
– Lead and coordinate incident response during major incidents, including communication and escalation.
– Manage the incident lifecycle and act as Incident Commander during critical events.
– Quickly respond to service disruptions, assess the situation, and initiate incident management protocols.
– Prioritize incidents based on urgency and business impact.
– Maintain and update service management documentation, including standardized definitions and SLAs.
– Ensure adherence to established protocols in collaboration with the service management team.
– Document incidents and resolutions, and support root cause analysis and post-incident reviews.
– Continuously improve the incident management process for greater efficiency.
– Keep senior leadership informed during major incidents.
– Host weekly production health meetings to align teams on current issues, changes, and releases.
– Support team members by helping prioritize tasks and reschedule non-urgent work.
Requirements:
– Bachelor’s degree in IT, engineering, or a related discipline.
– Minimum of two years of experience in IT service management or a similar role (ITIL preferred).
– One to two years of leadership experience is a plus.
– Familiarity with critical incident response, emergency protocols, or risk mitigation.
– Proficiency in ITSM tools such as ServiceNow and JIRA.
– Experience with monitoring and alerting tools like PagerDuty, SignalFX, Splunk, and Zabbix.
– Strong communication and collaboration skills.
– Ability to communicate clearly and concisely during incidents and in meetings.
– Strong analytical thinking and the ability to perform under pressure.
– Excellent organizational, problem-solving, and time management abilities.
About Rackspace Technology:
Rackspace Technology is a leader in multicloud solutions. We partner with top technology providers to deliver comprehensive solutions across applications, data, and security. Our approach includes advising clients, designing scalable solutions, managing deployments, and optimizing performance. Recognized by Fortune, Forbes, and Glassdoor as a top workplace, we are committed to developing exceptional talent. Join us in our mission to embrace innovation, empower our clients, and shape the future.
More About Rackspace Technology:
At Rackspace, we are united by a shared mission: to be a valued member of a high-performing team with a meaningful purpose. We bring our authentic selves to work and believe that diverse perspectives drive innovation and better serve our global customers and communities. We are proud to be an equal opportunity employer and welcome applicants from all backgrounds. If you require accommodations due to a disability or special need, please let us know.