Did Delta's Aging IT Systems Turn a Tech Outage Into a $500 Million Disaster?

05.09.2024

In the wake of Delta Air Lines’ prolonged recovery following the CrowdStrike-induced tech outage in July 2024, a complex blame game has ensued. Central to the discussion is whether Delta’s outdated IT systems hampered its ability to recover swiftly while competitors bounced back in days. The dispute has become a legal battle between Delta, CrowdStrike, and Microsoft, with each pointing fingers over who was the primary source of Delta’s operational meltdown.

The Wall Street Journal published an article titled “The Day Delta’s ‘On-Time Machine’ Broke, and the Blame Game It Sparked,” which prompted WebProNews to investigate the aging-IT question further.

On July 19, 2024, a faulty update from CrowdStrike’s Falcon antivirus software crashed millions of Microsoft Windows-based computers across the globe. Airlines, hospitals, banks, and other industries were hit, but Delta Air Lines experienced the brunt of the fallout. The Atlanta-based airline canceled over 7,000 flights within five days, leaving thousands of passengers stranded, while other airlines, also affected, managed to resume operations in a fraction of that time.

According to Delta CEO Ed Bastian, the damage to Delta was significant—$500 million in losses—and largely due to the airline’s reliance on Microsoft’s Windows systems. “The CrowdStrike update took us out,” Bastian said in an interview with CNBC, “but the recovery process was excruciatingly slow. We had to manually reset 40,000 servers.” While Delta scrambled to get its systems back online, other airlines that use similar tech infrastructure returned to normal operations within a couple of days. This disparity has fueled speculation that Delta’s aging IT systems exacerbated the crisis.

Microsoft and CrowdStrike have strongly pushed back against Delta’s narrative, with both companies implying that Delta’s IT infrastructure was behind the times. “Our preliminary review suggests that Delta, unlike its competitors, has not modernized its IT infrastructure, either for the benefit of its customers or for its pilots and flight attendants,” Microsoft attorney Mark Cheffo wrote in a letter to Delta.

CrowdStrike echoed this sentiment, stating, “It’s clear that Delta’s IT decisions—such as its heavy reliance on outdated systems—played a role in their slow recovery. We worked quickly to address the initial problem, but Delta did not act on the solutions offered.”

According to sources familiar with Delta’s internal operations, much of the company’s critical crew-tracking software, which schedules pilots and flight attendants, was running on older systems that were ill-prepared to handle a disruption of this magnitude. A former Delta executive noted, “This crew-tracking system was not robust enough to catch up quickly once it was overwhelmed by the backlog of data during the outage.”

Bastian has remained adamant that Delta’s systems were not to blame for the drawn-out recovery. “Since 2016, we’ve invested billions in IT upgrades and infrastructure,” he said, arguing that the severity of the outage was unprecedented and that Delta was simply unlucky. “We recognized the need to move away from older systems and had already been overhauling our crew-tracking software, which has performed well in previous disruptions.”

Delta’s COO Mike Spanos, who left the airline shortly after the incident, backed this claim, saying, “In the heat of the moment, we opted not to cancel flights en masse, believing a more surgical approach would help minimize customer impact. In hindsight, a more aggressive cancellation strategy could have reduced the chaos.”

However, despite these assertions, internal reports from Delta pilots and union officials suggest that Delta’s systems, particularly its Gate Keeper and crew-tracking systems, were slow to recover and overdue for a major overhaul. “The crew-tracking system is ancient,” said a Delta pilot who asked to remain anonymous. “We were relying on outdated technology, and it failed when we needed it the most.”

The blame game reached its peak when Delta hired David Boies, a high-profile litigator, to pursue damages against both CrowdStrike and Microsoft. In a letter to CrowdStrike dated July 29, Delta claimed that the tech company was responsible for “substantial harm” and demanded $500 million in compensation. CrowdStrike’s response was swift and firm, with a company spokesperson calling Delta’s legal threats “unreasonable” and asserting that their liability was contractually capped at single-digit millions.

Delta’s expectations, according to CrowdStrike, were unrealistic. “Delta wanted us to take full responsibility without any substantiation,” said a CrowdStrike representative. “They didn’t take into account the role their own IT decisions played in prolonging their recovery.”

Microsoft also entered the fray, accusing Delta of declining offers of help. “Microsoft made daily offers to assist Delta,” Cheffo wrote, “but these were turned down repeatedly.” Microsoft claims that Delta’s crew-tracking system, which does not run on Microsoft platforms, was the primary cause of the delays. “This was not a Microsoft problem,” a source close to the company said. “Delta’s internal systems weren’t able to handle the situation.”

While the CrowdStrike software update has been identified as the trigger for the meltdown, many industry experts and insiders argue that Delta’s underlying technology was the real culprit behind the airline’s delayed recovery. Delta’s rivals, including American Airlines and United, also faced the same faulty update, yet they managed to bounce back far more swiftly, with minimal disruption to their schedules. So why did Delta, a company that prides itself on operational efficiency and punctuality, stumble so badly in its recovery?

According to Microsoft, Delta’s reliance on outdated infrastructure played a significant role in dragging out the recovery process. In a letter from Microsoft’s attorney, Mark Cheffo, the tech giant did not mince words, stating that Delta “apparently has not modernized its IT infrastructure, either for the benefit of its customers or for its pilots and flight attendants.” Microsoft further highlighted that Delta had declined offers of technical support during the outage, raising questions about how prepared Delta really was for such a crisis.

Delta Turned Down Help From Microsoft

“Delta was offered daily assistance from Microsoft starting July 19, when the outage occurred, through July 23,” said Cheffo. “Yet, each time, the airline turned it down. Senior executives from Microsoft also reached out to their Delta counterparts, including CEO Ed Bastian, but they received no response.” This narrative paints a picture of an airline caught flat-footed by the outage and unwilling, or unable, to accept help when it was most needed.

But Delta has pushed back hard on this assertion. In an internal memo to employees, Delta CEO Ed Bastian claimed the airline had made significant investments in its IT systems over the years, citing billions of dollars spent on technology upgrades since 2016. “We have a long track record of investing in safe, reliable, and elevated service for our customers and employees,” Bastian wrote. He also underscored that Delta’s recovery challenges were a direct result of its heavy reliance on Microsoft and CrowdStrike systems. “It’s important to recognize that Delta’s IT infrastructure is among the most complex in the industry, and the failure of CrowdStrike’s update to properly integrate with Microsoft Windows caused significant disruptions that could not be easily fixed.”

Delta’s Crew Tracking System “Limping Along for Years”

However, others within the airline and cybersecurity sectors suggest that Delta’s IT infrastructure has long been overdue for a comprehensive overhaul. According to a pilot union leader, who requested anonymity, Delta’s crew-tracking system — a critical component that matches pilots and flight attendants to flights — was already a known weak point before the CrowdStrike outage. “That system has been limping along for years. It doesn’t surprise me that it crashed so spectacularly,” the union leader said. “The sheer volume of data and operational complexity was just too much for the outdated system to handle once the outage hit.”

Moreover, internal documents viewed by The Wall Street Journal reveal that Delta has been planning to modernize its crew IT systems for years, but much of that work was slowed down during the COVID-19 pandemic. A presentation to pilots in June 2024, just weeks before the outage, outlined a roadmap for updating the crew-tracking infrastructure. Still, it was not clear whether those changes would have been in place in time to mitigate the fallout from the CrowdStrike incident. “We recognize the need to move away from these 40-plus-year-old systems,” said Philip Higgins, Delta’s managing director of operations, in a recording of the meeting.

Delta IT “Running on Fumes for Years”

Yet, the pace of these upgrades raises questions about Delta’s prioritization of its tech investments. A former Delta executive, speaking on the condition of anonymity, was more critical, suggesting that the airline’s leadership has been too focused on short-term cost savings at the expense of long-term resilience. “Look, it’s no secret that Delta likes to spend money on things passengers can see — the flashy airport lounges, the new planes, the service upgrades. But when it comes to the backbone of their operation, they’ve been running on fumes for years,” the former executive said. “This outage exposed the cracks in that approach. You can’t keep running mission-critical systems on legacy infrastructure and expect everything to hold together when disaster strikes.”

Delta’s crew-tracking system was at the heart of the airline’s operational meltdown, taking days to catch up with the flood of delayed and canceled flights. Pilots and flight attendants were stranded in the wrong cities and unable to be reassigned to new flights because the system couldn’t process the backlog of data fast enough. The airline was forced to rely on manual processes to match available crew to planes — a method that was not only inefficient but entirely unsustainable given the scale of the disruption. “We had pilots who couldn’t get through to the system for days,” said a Delta pilot, who was caught in the chaos. “We were being asked to self-report where we were on Monday because the airline had lost track of us by Friday. It was an absolute mess.”

Delta’s Outage a “Wake-Up Call” to the Entire IT Industry

Adding to the complexity, Delta’s scheduling systems, including its Gate Keeper program, which manages the flow of planes through the airline’s Atlanta hub, also struggled to recover from the outage. The software snarls forced Delta to reduce traffic to just 20 arrivals and departures per hour, far below the normal 50 to 60 flights per hour, resulting in a cascading effect throughout Delta’s global network. Thousands of flights were canceled, and tens of thousands of passengers were left stranded at airports across the world. “The slow recovery wasn’t just about fixing computers — it was about trying to untangle a massive web of disrupted operations,” said John Laughter, Delta’s senior vice president of operations.
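To put the Atlanta figures in perspective, the drop in hub throughput can be worked out directly from the numbers quoted above. The short Python sketch below is illustrative only; the function name is our own and has nothing to do with Delta's actual systems.

```python
def capacity_reduction(normal_per_hour: float, reduced_per_hour: float) -> float:
    """Return the fractional drop in hourly movements (0.0 to 1.0)."""
    return 1 - reduced_per_hour / normal_per_hour


# Figures quoted in the article: 20 arrivals and departures per hour during
# the outage, against a normal range of 50 to 60 per hour.
low = capacity_reduction(50, 20)
high = capacity_reduction(60, 20)

print(f"Hub throughput fell by roughly {low:.0%} to {high:.0%}")
```

On those figures, the hub was running at between a third and two-fifths of its usual pace.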

This slow recovery and the evident strain on Delta’s aging systems have led to a deeper conversation within the aviation industry about how airlines allocate resources to their IT infrastructure. For years, airlines have been heavily dependent on complex, interlocking systems to manage everything from ticket sales to flight operations. “When those systems work, they’re invisible,” said one airline IT consultant who has worked with major U.S. carriers. “But when they fail, the cracks become very visible, very quickly. Delta’s outage should be a wake-up call not just for them, but for the entire industry.”

Delta’s technology woes may lead to substantial changes in how the airline approaches its IT investments. Bastian has acknowledged that Delta is conducting a thorough review of its response to the outage, and the airline is reportedly considering a new vendor to replace CrowdStrike. However, replacing one software provider may not be enough to resolve the deeper issue of outdated infrastructure. “This isn’t just about blaming CrowdStrike or Microsoft,” said a former Delta IT manager. “This is about Delta taking a hard look at the systems they’ve been relying on for decades and realizing that if they don’t modernize now, they’re going to keep facing these kinds of crises.”

What Delta Should Do ASAP to Modernize Its IT:

In the aftermath of Delta Air Lines’ disastrous IT outage caused by a faulty CrowdStrike software update in July 2024, the question of whether Delta’s outdated IT infrastructure exacerbated the airline’s slow recovery has become a focal point of discussion. While the blame game between Delta, CrowdStrike, and Microsoft continues, it is clear that Delta’s reliance on legacy systems has left the airline vulnerable to such operational meltdowns. Modernizing Delta’s IT infrastructure is not just an operational necessity—it’s a strategic imperative.

WebProNews reviewed expert commentary published after Delta’s IT breakdown, and the recommendations come down to a handful of clear strategic IT initiatives. Delta should consider implementing the following steps as soon as possible to safeguard its operations from future disruptions and maintain its reputation as an industry leader in reliability.

1. Move to a Cloud-First Approach

Delta has long relied on on-premises systems, particularly in areas such as crew scheduling, gate management, and customer service. While cloud technology has been revolutionizing industries for over a decade, Delta’s commitment to fully transitioning to the cloud has been slow.

“Delta’s infrastructure is a patchwork of legacy systems and on-premise software that’s been duct-taped together over the years,” said an IT consultant familiar with Delta’s operations. “The first step toward real modernization is adopting a cloud-first strategy.”

The cloud offers the scalability, flexibility, and redundancy that Delta needs to avoid future disasters. Cloud-based systems would allow Delta to scale up resources when necessary, better handle the vast amounts of data generated by airline operations, and ensure that systems can quickly recover from failures.

What Delta should do: Transition critical applications, such as crew scheduling and gate management, to cloud-based solutions. This could involve partnerships with cloud service providers like Amazon Web Services (AWS) or Microsoft Azure, ensuring real-time access to vital systems from anywhere, along with built-in redundancy and disaster recovery options.

2. Overhaul Crew Scheduling and Operations Systems

One of the most significant pain points during the recent outage was the failure of Delta’s crew scheduling system, which left pilots and flight attendants stranded in the wrong locations with no easy way to track or reassign them. This outdated system, running on legacy software, couldn’t handle the backlog of flights and crews, contributing to Delta’s slow recovery.

A former Delta pilot described the chaos, saying, “The system was completely overwhelmed. We were essentially invisible to the airline for days, and that led to more cancellations and delays than necessary. It’s a system that should have been upgraded years ago.”

Modernizing crew scheduling with more robust, real-time solutions is critical. These systems must be able to process large amounts of data instantly, ensure crews are in the right place at the right time, and quickly recover from outages or other disruptions.

What Delta should do: Invest in modern crew management software that can handle the airline’s complex operations. Leveraging AI and machine learning could optimize crew allocation during irregular operations (IROPS) and provide predictive analytics for more efficient planning.
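As a rough illustration of the matching problem such software has to solve, here is a minimal Python sketch of a greedy crew-to-flight assignment. It is not how Delta's system or any commercial crew-management product works; the `Crew` and `Flight` classes, the `rested` flag (a simplified stand-in for duty-time rules), and the `assign_crews` function are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Crew:
    crew_id: str
    airport: str   # where the crew currently is
    rested: bool   # simplified stand-in for duty-time legality


@dataclass
class Flight:
    flight_no: str
    origin: str


def assign_crews(flights, crews):
    """Greedy match: give each flight the first rested crew already at its origin."""
    # Index available crews by their current airport.
    available = {}
    for crew in crews:
        if crew.rested:
            available.setdefault(crew.airport, []).append(crew)

    assignments, uncovered = {}, []
    for flight in flights:
        pool = available.get(flight.origin, [])
        if pool:
            assignments[flight.flight_no] = pool.pop(0).crew_id
        else:
            uncovered.append(flight.flight_no)
    return assignments, uncovered


crews = [Crew("C1", "ATL", True), Crew("C2", "JFK", True), Crew("C3", "ATL", False)]
flights = [Flight("DL1", "ATL"), Flight("DL2", "ATL"), Flight("DL3", "JFK")]
assignments, uncovered = assign_crews(flights, crews)
print(assignments, uncovered)
```

The sketch only works if the airline knows where each crew is, which is exactly the information pilots say was lost during the outage. A production system would also have to weigh duty-time limits, aircraft qualifications, and repositioning options.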

3. Invest in AI-Driven Operational Resilience

Airlines are complex, interconnected systems, and disruptions can ripple through them rapidly. Artificial intelligence (AI) has the potential to predict disruptions, optimize recovery, and offer real-time solutions to mitigate the impact of technical failures.

AI-driven tools can help airlines like Delta analyze vast amounts of operational data to foresee delays, optimize flight paths, and even provide real-time recommendations during operational crises. Several airlines, including Delta’s competitors, have already begun integrating AI-driven solutions into their operations to enhance resilience.

“AI offers a way for airlines to preemptively manage disruptions, rather than just reacting after the fact,” said an industry expert in aviation technology. “It’s a tool that, when fully integrated, could have minimized the impact of this summer’s outage.”

What Delta should do: Partner with AI-focused technology firms to implement predictive maintenance systems, real-time crew management tools, and optimization algorithms that can keep operations running smoothly during both normal conditions and disruptions.

4. Modernize Gate and Ground Operations Technology

Delta’s Atlanta hub is one of the busiest in the world, and during the CrowdStrike outage, gate operations ground to a halt. The system that controls the movement of planes through Delta’s hub, known as Gate Keeper, became a choke point during the crisis, reducing the number of flights Delta could handle by more than 50%. This technology, much like the crew scheduling system, needs an immediate overhaul.

Beyond gate management, the airline’s ground operations technology (the systems that manage baggage, boarding, and customer service) needs to be fully modernized. Any delays or malfunctions in these systems have a direct impact on Delta’s ability to deliver on its promise of timely, efficient service.

“Gate and ground operations are critical cogs in the machine of any airline. The moment these systems start faltering, the entire network can fall apart,” said a former Delta executive. “Delta can’t afford to be complacent with outdated systems in this area.”

What Delta should do: Upgrade the Gate Keeper system and other ground operations technology with modern, cloud-based solutions that are capable of handling large volumes of data and traffic. Implement redundancy and failover systems to ensure smooth operations even in the event of a failure.

5. Develop a Robust Disaster Recovery Plan

One of the most glaring lessons from the CrowdStrike outage is the importance of having a robust disaster recovery (DR) and business continuity plan (BCP). While Delta undoubtedly had some level of disaster recovery in place, the scale and duration of the outage suggest that these plans were either insufficient or not properly executed.

“Other airlines experienced the same software bug but managed to recover quickly because they had well-thought-out disaster recovery plans,” said an industry insider. “Delta’s delayed recovery is evidence that they need to go back to the drawing board.”

What Delta should do: Create a comprehensive disaster recovery plan that includes regular testing, real-time simulations, and redundant systems that can be quickly brought online in the event of a failure. This plan should encompass all critical systems, including crew scheduling, flight operations, customer service, and IT infrastructure.

6. Enhance Cybersecurity and Vendor Management

The relationship between Delta, CrowdStrike, and Microsoft is strained, with each side pointing fingers over the July outage. While Delta works to modernize its IT infrastructure, it must also reassess its cybersecurity posture and vendor management practices.

CrowdStrike’s security software was supposed to protect Delta, but when the update went wrong, the reliance on third-party software proved to be a vulnerability. In a world where cyber threats are increasing, Delta can no longer afford to leave its cybersecurity entirely in the hands of external vendors.

What Delta should do: Conduct a thorough audit of all third-party software and service providers, and implement strict oversight to ensure all systems are secure and regularly updated. Delta should also consider bringing more cybersecurity capabilities in-house to reduce its reliance on external vendors for mission-critical systems.

7. Rethink IT Talent and Culture

A critical component of any IT modernization effort is the people behind it. Delta has historically outsourced much of its IT work, relying on vendors like IBM and other managed service providers. While this may reduce costs in the short term, it can lead to a lack of institutional knowledge and agility during times of crisis.

In the long run, Delta needs to invest in building a strong internal IT team capable of handling both day-to-day operations and emergencies. This team must also foster a culture of innovation and continuous improvement to keep pace with technological advancements.

What Delta should do: Focus on recruiting top IT talent, particularly in cloud computing, AI, and cybersecurity. Delta should also invest in training and development to ensure its in-house IT team is well-equipped to handle future challenges.

The CrowdStrike outage was a wake-up call for Delta Air Lines, exposing deep vulnerabilities in its IT infrastructure. Delta needs to act decisively and swiftly to modernize its technology to prevent a repeat of the summer’s chaos. The path to IT modernization is clear, running from cloud migration and AI implementation to stronger cybersecurity and a robust disaster recovery plan. The question now is whether the airline will seize this opportunity and future-proof its operations, or risk being left behind in an industry that depends more than ever on technology.

Delta’s meltdown has sparked broader conversations about the vulnerability of modern airlines to tech outages and the need for robust disaster recovery plans. Industry experts are quick to point out that while tech failures like the CrowdStrike update are unpredictable, the ability to recover quickly is a matter of preparation.

One senior IT consultant remarked, “It’s not about whether your system will fail; it’s about how quickly you can get it back online. Delta’s response shows that they didn’t have a strong enough business continuity plan in place, and that’s what really prolonged the chaos.”

As the legal battle unfolds, the question of whether Delta’s outdated IT infrastructure exacerbated the recovery remains at the center of the controversy. What’s clear is that in today’s interconnected world, airlines—and other industries—are only as resilient as their tech systems allow them to be. Delta’s experience may well serve as a cautionary tale for companies relying on aging systems in an era of rapid technological advancement.

It should be noted that decisions on IT spending are always obvious in hindsight and difficult to make when operations are running smoothly. Delta is one of the largest airlines in the world and made strategic decisions to go with premium partners like Microsoft and CrowdStrike. The company was clearly spending a lot on infrastructure via these partnerships.

The question is, did Delta wait too long to upgrade other company-controlled software and IT systems that were critical to their operations, or were they simply victims of a bad software update by a valued partner? The real answer is likely very nuanced, with plenty of blame to go around. In other words, shit happens.
