As a global leader in cybersecurity, CrowdStrike protects the people, processes and technologies that drive modern organizations. Since 2011, our mission hasn’t changed — we’re here to stop breaches, and we’ve redefined modern security with the world’s most advanced AI-native platform. Our customers span all industries, and they count on CrowdStrike to keep their businesses running, their communities safe and their lives moving forward. We’re also a mission-driven company. We cultivate a culture that gives every CrowdStriker both the flexibility and autonomy to own their careers. We’re always looking to add talented CrowdStrikers to the team who have limitless passion, a relentless focus on innovation and a fanatical commitment to our customers, our community and each other. Ready to join a mission that matters? The future of cybersecurity starts with you.
About the Role
The CrowdStrike Information Technology team is looking for a Senior IT Monitoring Engineer/Site Reliability Engineer (SRE) to lead the design, implementation, and evolution of our enterprise monitoring and observability platforms. In this leadership role, you will architect scalable monitoring solutions, drive reliability initiatives, and serve as a technical authority for monitoring best practices. You will mentor junior team members, collaborate with cross-functional teams to establish SLOs, and play a key role in major incident management. This position requires advanced technical expertise, strategic thinking, and the ability to balance operational excellence with innovation.
What You’ll Need:
Required Skills and Qualifications
8+ years of experience with enterprise monitoring platforms and observability tools (LogicMonitor, Datadog, LogScale, Zscaler Digital Experience (ZDX), ThousandEyes)
Advanced proficiency in multiple scripting/programming languages (Python, Go, Bash)
Expert knowledge of modern monitoring ecosystems (Prometheus, Grafana, ELK)
Demonstrated experience architecting monitoring solutions at scale across hybrid environments
Strong background in SRE practices, including SLO definition, error budgets, and reliability engineering
Advanced knowledge of cloud platforms (AWS, GCP) and their native monitoring capabilities
Expertise in log aggregation, metric and KPI collection, and distributed tracing implementations
Experience designing and implementing automated remediation systems
Strong understanding of Infrastructure as Code and GitOps principles
Proven ability to mentor junior engineers and provide technical leadership
Shift timings: 12 PM to 9 PM IST
What You'll Do:
Technical Leadership
Architect and implement enterprise-wide monitoring and observability solutions
Establish monitoring standards, best practices, and governance frameworks
Lead the evaluation and adoption of new monitoring technologies and approaches
Design scalable, resilient monitoring infrastructure, defined and managed as Infrastructure as Code
Serve as the technical escalation point for complex monitoring issues
Reliability Engineering
Lead the implementation of SRE practices across the organization
Partner with service owners to define appropriate SLOs and error budgets
Drive reliability improvements through data-driven analysis and recommendations
Design and implement advanced alerting strategies
Develop comprehensive observability strategies covering metrics, logs, and traces
Incident Management
Lead major incident response for critical service disruptions
Conduct thorough post-incident reviews and drive systematic improvements
Establish incident management processes and tooling improvements
Mentor team members on effective incident response techniques
Analyze incident patterns to identify and address systemic issues
Strategic Initiatives
Develop the monitoring and observability roadmap aligned with business objectives
Lead monitoring platform migrations and major upgrades
Implement cost optimization strategies for monitoring infrastructure
Drive automation initiatives to reduce toil and improve operational efficiency
Collaborate with security teams to integrate security monitoring capabilities
Team Development
Mentor junior engineers on monitoring best practices and SRE principles
Provide technical guidance and code reviews for monitoring implementations
Create documentation and knowledge-sharing materials for the broader organization
Contribute to hiring and team development activities
Foster a culture of continuous improvement and learning
Bonus Points:
Advanced certifications in cloud platforms or SRE practices
Experience leading incident response for complex, high-impact service disruptions
Experience with AIOps and ML-based monitoring approaches
Background in performance engineering or capacity management
Experience with chaos engineering and resilience testing
Bachelor's or Master's degree in Computer Science, Engineering, or a related field
Benefits of Working at CrowdStrike:
Remote-friendly and flexible work culture
Market leader in compensation and equity awards
Comprehensive physical and mental wellness programs
Competitive vacation and holidays to recharge
Paid parental and adoption leave
Professional development opportunities for all employees regardless of level or role
Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
Vibrant office culture with world-class amenities
Great Place to Work Certified™ across the globe
CrowdStrike is proud to be an equal opportunity employer. We are committed to fostering a culture of belonging where everyone is valued for who they are and empowered to succeed. We support veterans and individuals with disabilities through our affirmative action program.
CrowdStrike is committed to providing equal employment opportunity for all employees and applicants for employment. The Company does not discriminate in employment opportunities or practices on the basis of race, color, creed, ethnicity, religion, sex (including pregnancy or pregnancy-related medical conditions), sexual orientation, gender identity, marital or family status, veteran status, age, national origin, ancestry, physical disability (including HIV and AIDS), mental disability, medical condition, genetic information, membership or activity in a local human rights commission, status with regard to public assistance, or any other characteristic protected by law. We base all employment decisions--including recruitment, selection, training, compensation, benefits, discipline, promotions, transfers, lay-offs, return from lay-off, terminations and social/recreational programs--on valid job requirements.
If you need assistance accessing or reviewing the information on this website or need help submitting an application for employment or requesting an accommodation, please contact us at [email protected] for further assistance.