Job Overview:
The Incident Response Analyst – Data Center Infrastructure is a critical member of the Incident Response Center (IRC), responsible for real-time monitoring, incident triage, troubleshooting, and escalation of alerts and alarms related to data center infrastructure, facilities systems, and supporting IT services.
This role is primarily focused on Data Center Infrastructure Management (DCIM), facilities systems, and environmental monitoring, including UPS, power distribution, temperature, humidity, and HVAC systems, while also providing triage-level support for network, server, storage, and cloud-related alerts.
The Analyst is expected to operate independently during shifts, demonstrate strong situational awareness during incidents, and ensure accurate escalation and communication to downstream teams.
Key Responsibilities:
- Monitoring & Detection
- Monitor alarms across DCIM, BMS, NMS, cloud dashboards, and monitoring platforms.
- Actively monitor: UPS systems, batteries, bypass states, load, and runtime; power distribution systems (PDUs, breakers, feeds); environmental conditions (temperature, humidity, airflow); and HVAC / cooling systems (CRAC, CRAH, and chillers where applicable).
- Monitor network devices, servers, storage, WAN circuits, and PoP sites for triage and escalation.
- Track external risk events (weather, seismic activity, fires) impacting data center operations.
- Monitor CCTV systems for physical security or safety-related events.
- Incident Triage & Response
- Validate alarms and identify false positives, duplicates, and correlated events.
- Assess impact and assign severity based on SOPs and thresholds.
- Perform troubleshooting, including redundancy validation, load and capacity checks, and trend and threshold analysis.
- Create and manage incidents using Jira or equivalent ITSM tools.
- Escalate incidents to Facilities, Electrical, HVAC, Network, Compute, Storage, Security, and Cloud teams with complete diagnostic context.
- Provide timely incident updates via Everbridge, email, and IRC communication channels.
- Maintain accurate handover notes across shift changes.
- Post-Incident & Operational Excellence
- Contribute to Post-Incident Reviews (PIRs) and timeline documentation.
- Track vendor notifications related to planned maintenance and emergency activities.
- Review and improve SOPs, runbooks, and escalation procedures.
- Act as a mentor for junior analysts during active incidents.
Required Qualifications
4–6 years of experience in Data Center Operations, DCIM, Facilities Monitoring, NOC/SOC, or Incident Response roles.
Strong working knowledge of UPS and power distribution systems, environmental monitoring and HVAC fundamentals, and DCIM and monitoring tools.
Practical experience handling live incidents in a 24×7 operational environment.
Solid understanding of network and server fundamentals for alert triage.
Excellent written and verbal communication skills.
Preferred Qualifications
Hands-on experience with DCIM / BMS platforms.
Familiarity with SolarWinds, Nagios, Grafana, or similar tools.
Exposure to Oracle Cloud Infrastructure (OCI) or other public cloud platforms.
ITIL Incident / Problem / Change Management knowledge.
Experience working with facilities vendors and service providers.
Astreya Dublin, Dublin, IRL Office
40 Mespil Road, 5th floor, Dublin, County Dublin, Ireland, D04 C2N4


