Lessons Learned from the CrowdStrike Outage: A Guide for Public Safety Answering Points (PSAPs)

By Paresh Patel, CISO, Carbyne


The CrowdStrike outage of July 19, 2024, sent shockwaves through the cybersecurity community and beyond. Triggered by a faulty content update to CrowdStrike’s Falcon sensor, the incident crashed an estimated 8.5 million Windows devices and disrupted critical services worldwide. For Public Safety Answering Points (PSAPs), which depend on uninterrupted IT services to answer emergency calls, the lessons from this outage are especially important. Here’s a breakdown of the key takeaways and how PSAPs can apply them to strengthen their resilience and operational efficiency.

1. Expect the Unexpected: Outages Happen

No system is immune to failure. The CrowdStrike outage is a stark reminder that even widely trusted cybersecurity solutions can fail catastrophically. PSAPs must acknowledge this reality and prepare for disruptions. That means maintaining a comprehensive disaster recovery (DR) plan and testing it regularly to ensure it works when needed.

2. Comprehensive Update Management

The outage was caused by a defective update that led to widespread system crashes¹. This highlights the importance of a meticulous update management strategy. PSAPs should:

  • Test Updates Thoroughly: Before deploying updates, conduct extensive testing in a controlled environment to identify potential issues.
  • Staggered Rollouts: Implement updates in phases rather than all at once to minimize the impact of any unforeseen problems.
  • Rollback Plans: Have a clear rollback plan to revert to a previous stable state if an update causes issues.
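The three practices above fit together as a ring-based deployment loop: update a small group first, check health, and only then widen the rollout, reverting everything touched if a ring goes bad. The Python sketch below is a minimal illustration, not any vendor’s tooling. The `Host` type, the `apply_update` and `is_healthy` callbacks, the ring layout, and the host names are all hypothetical stand-ins for a PSAP’s own patching and health-check tools.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Host:
    name: str
    version: str


def staged_rollout(
    rings: List[List[Host]],
    new_version: str,
    apply_update: Callable[[Host, str], None],  # hypothetical install hook
    is_healthy: Callable[[Host], bool],         # hypothetical health check
    max_failure_rate: float = 0.0,
) -> bool:
    """Update hosts ring by ring. If any ring's failure rate exceeds
    max_failure_rate, restore every host touched so far and stop."""
    touched: List[Tuple[Host, str]] = []  # (host, version before this rollout)
    for ring in rings:
        for host in ring:
            touched.append((host, host.version))
            apply_update(host, new_version)
        failures = sum(1 for host in ring if not is_healthy(host))
        if ring and failures / len(ring) > max_failure_rate:
            # Roll back in reverse order, newest change first.
            for host, previous_version in reversed(touched):
                apply_update(host, previous_version)
            return False
    return True


# --- Simulated environment (hypothetical, for illustration only) ---
def set_version(host: Host, version: str) -> None:
    host.version = version


def is_healthy(host: Host) -> bool:
    # Pretend that any version tagged "-bad" crashes the host.
    return not host.version.endswith("-bad")


rings = [
    [Host("lab-ws-1", "1.0")],                           # ring 0: test bench
    [Host("ctx-ws-1", "1.0"), Host("ctx-ws-2", "1.0")],  # ring 1: a few live positions
    [Host(f"ctx-ws-{i}", "1.0") for i in range(3, 11)],  # ring 2: everyone else
]
print(staged_rollout(rings, "2.0-bad", set_version, is_healthy))  # False
print({h.version for ring in rings for h in ring})                # {'1.0'}
```

Here the bad update stops at the single test-bench machine and the live call-taking positions are never touched. One caveat on the design: a machine that crashes on boot may not be reachable for an automated rollback at all, which is why keeping the early rings small matters more than the rollback step itself.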

3. Proactive Monitoring and Anomaly Detection

Continuous monitoring of systems for anomalies is crucial. The CrowdStrike incident underscores the need for real-time monitoring and rapid response capabilities. PSAPs should invest in monitoring tools that detect unusual activity, such as a large share of workstations going offline at once, and trigger immediate alerts so staff can act quickly.
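One simple way to catch a fleet-wide failure like this one is to watch for correlated silence: every workstation sends a periodic heartbeat, and an alert fires when too many stop at the same time. The Python sketch below assumes a hypothetical `HeartbeatMonitor` with illustrative thresholds (60 seconds of silence, 20% of the fleet). It is not a feature of any specific monitoring product.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat from each workstation and flags silent ones."""
    timeout_s: float = 60.0       # hypothetical: host counts as silent after 60 s
    alert_fraction: float = 0.2   # hypothetical: alert if more than 20% are silent
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, host: str, now: Optional[float] = None) -> None:
        """Store the time of the latest heartbeat from `host`."""
        self.last_seen[host] = time.time() if now is None else now

    def silent_hosts(self, now: Optional[float] = None) -> List[str]:
        """Hosts whose last heartbeat is older than timeout_s."""
        now = time.time() if now is None else now
        return sorted(h for h, t in self.last_seen.items() if now - t > self.timeout_s)

    def should_alert(self, now: Optional[float] = None) -> bool:
        """True when the silent share of the fleet exceeds alert_fraction."""
        if not self.last_seen:
            return False
        return len(self.silent_hosts(now)) / len(self.last_seen) > self.alert_fraction


monitor = HeartbeatMonitor(timeout_s=60, alert_fraction=0.2)
for i in range(10):
    monitor.record(f"ws-{i}", now=0.0)    # all 10 stations check in at t=0
for i in range(7):
    monitor.record(f"ws-{i}", now=90.0)   # only 7 check in again at t=90
print(monitor.silent_hosts(now=100.0))    # ['ws-7', 'ws-8', 'ws-9']
print(monitor.should_alert(now=100.0))    # True (3 of 10 silent > 20%)
```

A single silent workstation is routine. Alerting on the fraction of the fleet rather than on individual hosts is what separates a correlated event, such as a bad update, from ordinary noise.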

4. Enhanced Resilience and Redundancy

Building resilience into IT infrastructure is essential. This includes:

  • Redundant Systems: Ensure critical systems have redundant backups that can take over in case of a failure.
  • Geographical Diversity: Distribute critical infrastructure across multiple locations to avoid a single point of failure.
  • Regular Drills: Conduct regular drills to simulate outage scenarios and test the effectiveness of your response plans.
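Redundant, geographically diverse systems only help if something decides when to switch to the backup. The Python sketch below shows one common pattern under stated assumptions: sites are listed in priority order, and a site is treated as down only after several consecutive failed health checks, so one dropped probe does not trigger a failover. The class name, site names, and threshold are hypothetical.

```python
from typing import Dict, List, Optional


class FailoverSelector:
    """Routes traffic to the highest-priority site that is considered up.

    A site is marked down only after `failures_to_trip` consecutive failed
    health checks, so a single dropped probe does not cause a failover.
    """

    def __init__(self, sites: List[str], failures_to_trip: int = 3) -> None:
        self.sites = sites  # ordered by priority: primary first
        self.failures_to_trip = failures_to_trip
        self.consecutive_failures: Dict[str, int] = {s: 0 for s in sites}

    def report(self, site: str, ok: bool) -> None:
        """Record one health-check result; a success resets the counter."""
        self.consecutive_failures[site] = 0 if ok else self.consecutive_failures[site] + 1

    def active_site(self) -> Optional[str]:
        """First site in priority order that is not tripped, or None if all are."""
        for site in self.sites:
            if self.consecutive_failures[site] < self.failures_to_trip:
                return site
        return None


selector = FailoverSelector(["primary-dc", "backup-dc"], failures_to_trip=3)
for _ in range(3):
    selector.report("primary-dc", ok=False)
print(selector.active_site())  # backup-dc
selector.report("primary-dc", ok=True)
print(selector.active_site())  # primary-dc
```

This sketch fails back to the primary after a single successful check. A more conservative design would require several consecutive successes before failing back, and regular drills are a natural place to exercise both transitions.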

5. Vendor Management and Accountability

The CrowdStrike outage also highlights the risks associated with relying on third-party vendors. PSAPs should:

  • Evaluate Vendors Thoroughly: Assess the reliability and track record of vendors before integrating their solutions.
  • Clear SLAs: Establish clear Service Level Agreements (SLAs) that define the vendor’s responsibilities and the penalties for non-compliance.
  • Regular Audits: Conduct regular audits of vendor performance and security practices.
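When reviewing an SLA, it helps to translate the availability percentage into the downtime it actually permits: allowed downtime equals (1 minus availability) times the length of the period. The short Python sketch below applies that arithmetic to a 365-day year. The availability tiers shown are illustrative and are not taken from any particular vendor’s contract.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Downtime permitted by an availability target over a given period."""
    return (1 - availability_pct / 100) * period_minutes


for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {allowed_downtime_minutes(pct):.1f} min/year")
# 99.9% -> 525.6 min/year
# 99.99% -> 52.6 min/year
# 99.999% -> 5.3 min/year
```

Each additional "nine" cuts the permitted downtime by a factor of ten, which is worth keeping in mind when comparing vendor commitments for systems that sit in the emergency call path.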

6. Communication and Transparency

During the outage, effective communication was critical in managing the crisis¹. PSAPs should:

  • Establish Communication Protocols: Define clear communication protocols for internal and external stakeholders during an outage.
  • Transparency: Be transparent about the issues and the steps being taken to resolve them. This builds trust and ensures stakeholders are informed.

How Carbyne Can Help

Carbyne, a leader in emergency communication solutions, offers several features that can significantly enhance the resilience and efficiency of PSAPs:

  • Emergency Communications as a Service (ECaaS): Carbyne’s ECaaS integrates Next Generation 911 (NG911) capabilities like rich media, translation, and geolocation into existing PSAP workflows. This seamless integration ensures that PSAP staff can quickly adapt to new tools without extensive training.
  • Resilient Infrastructure: Carbyne’s platforms are built on government-grade public safety architecture, ensuring high availability and robust security. This resilience is crucial for maintaining uninterrupted service during crises.
  • Rapid Deployment: Carbyne’s solutions can be deployed quickly, minimizing downtime and ensuring that PSAPs can continue to operate effectively even during unexpected outages.
  • Advanced Features: With capabilities like live video streaming, pinpoint location tracking, and real-time translation, Carbyne enhances situational awareness and decision-making for emergency responders.

Conclusion

The CrowdStrike outage serves as a powerful reminder of the vulnerabilities inherent in our digital infrastructure. For PSAPs, the lessons learned from this incident are invaluable. By implementing robust disaster recovery plans, comprehensive update management strategies, proactive monitoring, enhanced resilience, effective vendor management, and leveraging advanced solutions like those offered by Carbyne, PSAPs can better prepare for and mitigate the impact of future outages. Remember, the key to resilience is not just in preventing failures but in being prepared to respond effectively when they occur.

Sources

Devolutions Blog
https://blog.devolutions.net/2024/07/the-crowdstrike-it-outage-what-we-know-and-lessons-learned-so-far/

Techopedia
https://www.techopedia.com/lessons-learned-from-the-global-crowdstrike-outage-expert-panel

Netwoven
https://netwoven.com/cloud-infrastructure-and-security/what-really-happened-the-crowdstrike-outage-explained/
