Beyond Airline Disruptions: Air Travel and Telecoms: Different Disruptions, Same Management Problem

As the airlines' race for traffic growth intensifies, knowing how far they can go with low fares without incurring losses has become crucial for their survival. Their inability to control authentic costs and service quality has become a critical issue, especially for major hub operators. The difficulties arise because much of the cost and service quality cannot be easily quantified and consequently has no place in the existing performance metrics and management practices.

The legacy mindset still strongly influences how work is designed and managed within airlines. Fragmented information, departmental detachment, local optimisations, and functional hierarchies are just some of the issues. This keeps managers out of touch with reality, resulting in a continuous decline in passenger experience and a rise in costs.

The main focus of my work is to overcome these obstacles and help airlines create a new framework for decision making that is more in line with the dynamics and complexity of the airline business. It is based on my first-hand experience and on opportunities to see problems from various departmental and systems perspectives. Some of these problems and solutions are described in my books.

I have found much inspiration for my work in systems thinkers from within and outside the airline industry, among them Martin Geddes. Martin is a telecoms expert but his connection with the airline industry is deeply personal. His father worked for British Airways (and its predecessor) for 34 years as a maintenance engineer. Martin grew up in a Heathrow neighbourhood in a home 'scented with kerosene', watching Concorde streak by his window. He witnessed the historical ups and downs of BA, and his travel experience stretches from a standby passenger to a 'Gold' customer. On the professional side, for a brief period in the late 1990s, Martin worked as an IT consultant to BA, architecting the first Web check-in systems.

But this is only part of the story of Martin's interest in the airline industry. More of it is published in his insightful articles, including "Brand suicide case study: British Airways", which was my first encounter with his work. It is a rare mix of personal experiences, business insights, and parallels between the telecom and airline industries, offering glimpses of new possibilities for improvement on many fronts in both.

This prompted me to further explore Martin's work. The universal principles described in his blueprint for a lean industry transformation are applicable to many areas of the airline business and the industry as a whole. By transcending industry boundaries, Martin deepens our understanding of new possibilities and challenges conventional thinking.

Needless to say, I was thrilled when Martin accepted my invitation for an interview. We had an amazing conversation that lasted much longer than we had planned. We discussed the issues faced by airlines and telecoms, each experiencing different disruptions but sharing the same underlying management problems.

Martin's ability to rise above industrial divisions, quickly grasp common underlying problems, "see" the solutions, and explain them in an easily digestible and inspiring way was truly astonishing.

The following are excerpts from our conversation, focusing on disruption-related issues, mostly from an airline perspective, including quality, cost, optimization, risk, passenger experience, and some aspects of the lean quality revolution.

(JR-Jasenka Rapajic, MG-Martin Geddes)

JR: It seems that airlines have forgotten that their core purpose is to ensure that passengers reach their destination at or near the time they were promised when they bought their ticket, and that they will be cared for if their flight is delayed or cancelled. Most of the problems are related to the absence of measures of system values like quality of service and authentic costing. This neglect leads to fragile operational performance, growing passenger dissatisfaction, and a higher risk of financial failure.

MG: It feels to me like airlines have fundamentally misunderstood their business, as has happened in networking. The core (wrong) belief about packet networks is that they exist to deliver “bandwidth”, and thus should process as many packets as possible as quickly as possible. In reality, this is an insane economic model where revenues are tied to the quality-insensitive traffic and costs to the quality-sensitive. Instead, they should be thinking about resource trades.

Likewise, airlines see themselves as being in the people cargo business, when really, they are meant to be designing systems of travel and identifying the profitable “trades” of supply and demand in space and time.

Airlines are carrying lots of historical baggage, especially constraints like runway and slot capacity. To resolve this undesirable situation, airlines need to deliver supply that neither under-delivers nor over-delivers quality, and that is agile in responding to inevitable changes in the nature and structure of demand. This means rethinking the operating model.

JR: What does this mean in practice?

MG: The airline industry is still based on stocks rather than flows. There is a stock of seats they are trying to fill up. Even if the planes physically move around, they have a static stock view of the nature of industry. The moment you sell the seat reservation, you sell the arrival option.

To reinvent aviation, you can create a “virtual airline” which buys capacity and sells arrival options to different segments, categorised by performance. The basic idea being borrowed here from computer science is the concept of “semantics” or meaning. There are three meanings: intentional, denotational, and operational; or, in other words: what do we want, what do we ask for, and what do we get. And success is lining these three things up.

JR: Cost-saving measures can be the underlying cause of costly disruptions. For example, under pressure to save costs, the maintenance department can decide to reduce the stock of spare parts. This may seriously disrupt operations, generating losses disproportionate in comparison to the expected savings. It may become a cause of disruptions, with ripples spread across the network, affecting passengers and a whole range of operational services, flight, and maintenance schedules. But no one would notice that, because the links between spare parts, disruption costs, and passengers experiencing disruptions are not visible.

MG: The cost optimisation models are tied to the static world whereby disruptions are seen as exceptions: we wish them to go away, but they keep coming. They are probably unable to understand the relationship between the nature of the cost optimisation they do for normal operation, and the impact they need to recover from normal variation. They are trying to locally optimise local functions. This happens in every part of an organisation.

In the marketing department, for example, the efforts are turned on the normal operational case, rather than on how we experience disruptions. This is because there is no revenue attached to it. If the accounting system reflected the actual reality, which is the choice of the customer to fly with you again, some of the revenue of future flights would have to be reattributed to how well you handled previous disruptions. If the accounting system doesn’t attach any monetary value to accumulating goodwill, not just pure cash income, the whole framing of disruptions becomes a cost centre. It is this cost accounting that is the fatal flaw that “lean” manufacturing overcame in the 1970s and 1980s.

In aviation today the CFO may see the slack as a target for cost-cutting. The result is that the airline locally optimises for each function, such as maintenance, crew rostering, aircraft scheduling, seat inventory management, etc. Yet the collective set of interactions, which is what the passenger experiences, is highly suboptimal. This is how cost accounting takes over and damages the customer experience.
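The spare-parts example above can be put into numbers. What follows is a minimal sketch with invented figures: the function name, the saving, the number of extra aircraft-on-ground events, and the cost per event are all assumptions made for illustration, not data from the conversation.

```python
# Hypothetical figures, for illustration only; none come from the interview.

def net_effect_of_stock_cut(annual_saving, extra_events_per_year, cost_per_event):
    """Return the system-level result of a local cost saving.

    annual_saving: what the maintenance budget gains by holding fewer spares
    extra_events_per_year: expected additional aircraft-on-ground events
    cost_per_event: knock-on cost of one event (delays, rebooking, crew, goodwill)
    """
    expected_disruption_cost = extra_events_per_year * cost_per_event
    return annual_saving - expected_disruption_cost


# What the maintenance department reports: a saving.
local_view = 200_000

# What the airline as a whole experiences once the ripples are counted.
system_view = net_effect_of_stock_cut(200_000, 3, 150_000)

print(local_view, system_view)  # 200000 -250000
```

The two numbers describe the same decision. The first is visible in a departmental budget; the second only appears if disruption costs are linked back to their cause.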

JR: Airlines don’t like the mention of disruptions. They tend to avoid discussing them at system level unless they hit the headline news.

MG: Airlines see “disruptions” - delays and cancellations - as a pure negative, resulting in revenue loss, internal cost, and foregone customer goodwill. As such, they work to eliminate as many disruptions as possible. This comes at a price, as you must have slack in the system (such as spare parts, idle aircraft, standby crews, and extra landing slots). The internal carrier rewards are attached to revenue-growing activities, not risk-reducing ones, so there’s little promotion value from fine-tuning this slack.

The very name “disruption” implies something is wrong, when actually “arrival time variability” is a perfectly acceptable and necessary part of travel. For the airline, the financial management system fails to attach retention rewards to designing good “disruption experiences”, so there is underinvestment in that area of the passenger journey.

Both industries - broadband and aviation - struggle to deal with failure as a normal part of operations. Variability happens in all kinds of orders of magnitude. You have the base level of normal operations, then normal variability which is ordinary weather (predictable fog), and the next order in variability could be the strikes – and there are other higher orders like volcanoes or war.

Indeed, the real issue is a deep re-framing of the very nature of travel. Passengers would ideally like to be teleported like in Star Trek, or at least told to deplane right after sitting down, having been magically whisked around the world without the wait. From the passenger perspective, the whole experience of sitting in a noisy cramped metal tube is a “disruption” with respect to their dream experience.

JR: What level of variability would then be associated with consistent disruptions caused by airport congestion? Take Heathrow as an example. It already operates at maximum technical capacity, yet still pursues traffic growth, mainly by keeping more aircraft and passengers in holding stacks before landing. You can’t see this in airline reports. Nor can you see how much idle time an aircraft spends on the ground and in the air, or the true cost of the slack intended to keep punctuality at a competitive level.

MG: What currently happens with Heathrow is operation in saturation, the same thing we have (by design) with packet networks. Now, in packet networks we observe a local optimum of keeping the next transmission link busy, but that results in a global network performance pessimum. We have been building new kinds of networks that challenge these fundamental (wrong) beliefs, and the results are “impossible” within the incumbent paradigm. We have to be careful translating metaphors, as we routinely “shoot dead” packets, but that’s not such a great outcome in the travel industry (even if temptingly desirable at times).

To achieve this transformational goal, I and my colleagues are trying to define a service that includes an acceptable level of failure. Another way to put it is “how much network impairment are you willing to take to deliver what kind of application experience disappointment?”. And no one else has done it yet this way.

This process is designed to align three essential aspects: what you wanted, what you asked for, and what you got. These have fancy names in computer science - intentional, denotational, and operational semantics. To this end, we have developed a quantitative statistical language in which to express what we are supposed to do: i.e. what kind of network service impairment should lead to what kind of user experience disappointment. After all, you can’t get rid of impairment or disappointment because it is the very nature of the thing.

In some way, the airline industry needs to reframe the nature of the job to be done. It seems to take the primarily positivist view of making the planes fly, while the core job is actually managing negativist variation. What really distinguishes success is lining up these three semantics: can you make sure that the customer expectation was assessed appropriately, that it was then sold and defined correctly, and that it was delivered on? After all, this is what Ryanair did historically. Their message is clear: you know it is a point-to-point service, that there is no through collection of baggage, no misconnection, with high chances that you will arrive on time. It is then up to you to decide whether to fly with them or not.

So, we have taken the language and concepts of safety-critical systems and brought them into telecoms. How at risk is the experience of going outside of the accepted failure? This is the idea of the “performance hazard” which, applied to aviation, is about how at risk passenger experience is of being unacceptable given the expectation that was set. Is it an “armed” hazard, with an impending possibility of an unexpectedly bad experience? Or has it already happened? And would you even know? What we are trying to do is to surface this underlying risk potential.
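One simple way to picture a “performance hazard” is as the share of arrivals that fall outside the tolerance that was sold, compared with the failure rate that was accepted. The sketch below is a deliberate simplification and not the mathematical definition Martin refers to; the function names, the delay sample, and the thresholds are all hypothetical.

```python
# A simplified, hypothetical reading of "performance hazard".
# All names and numbers are assumptions made for illustration.

def performance_hazard(delays_min, tolerance_min):
    """Share of observed arrivals later than the tolerance that was sold."""
    if not delays_min:
        raise ValueError("need at least one observation")
    late = sum(1 for d in delays_min if d > tolerance_min)
    return late / len(delays_min)


def hazard_is_armed(delays_min, tolerance_min, accepted_failure_rate):
    """True when failures occur more often than the contract allows."""
    return performance_hazard(delays_min, tolerance_min) > accepted_failure_rate


# Invented arrival delays, in minutes, for ten flights on one route.
observed = [0, 5, 12, 3, 45, 8, 20, 90, 2, 15]

# A passenger sold "within 30 minutes, at most 1 time in 10 later".
print(performance_hazard(observed, 30))      # 0.2
print(hazard_is_armed(observed, 30, 0.10))   # True

# A passenger sold "within 2 hours" is served well by the same operation.
print(hazard_is_armed(observed, 120, 0.10))  # False
```

The same operational record is a hazard for one expectation and perfectly acceptable for another, which is why the expectation has to be stated before the risk can be measured.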

JR: With growing network congestion, airline and airport slacks keep shrinking. This destabilises the operation, resulting in hidden losses.

MG: There are three parts to any commercial transaction: cost, benefit, and risk. The issue is that risk is misunderstood and mispriced on both the demand and supply side. Airlines are unable to correctly model their performance hazards, and therefore their internal cost models are wrong. And then the customers are also carrying risks they don’t understand.

There is a word for it - SLAZARD (“slack” and “hazard” put together). We have a mathematical way of defining performance hazards. First, we have a framework for discussing intentional semantics. This originates in the military, as they are willing to engage with failure. Most commercial people see deviations as disruptions, and disruptions as bad: a fault or a failure. Actually, disruption is an essential and unavoidable part of the airline business, just as death is a part of war.

The term “disruption” implies a normative world in which there is a perfect laminar flow of value — that doesn’t exist. The very word disruption seems to be a false framing of ordinary things being abnormal, yet this is the normal variation of flying, not a special variation due to unforeseeable events. In the latter case, passengers can cope with that idea, and understand that extreme situations are a shared risk.

The economic model is currently such that airlines and passengers are engaged in a tug-of-war pushing risk back and forth. The risk hasn’t been priced, and therefore the passengers who are risk intolerant are being partly mis-sold the product. Others who are very risk and variation tolerant have no product at all to buy! (There are some lessons from telecoms on why standby tickets were not a good product, and how to fix this.)

JR: Airline executives are often too focused on the financial aspects of the business to fully grasp the underlying operational issues and their impact on cost and quality. This crucial information is not presented in a form that is easily digestible for them. Is it possible to define an acceptable level of failure in such a dynamic and complex industry? What criteria should be used to establish this standard?

MG: To do that, you need to understand what makes a profitable “trade” of resources under management control. So, if I move all my planes from A to B, that’s a “trade” as I have more supply now at B than A. The nature of demand is dynamic, and you need to have a model to know how to reallocate resources to keep the “network of travel supply” aligned to it (and not just flights). You cannot wait for failure to happen to discover demand. Instead, you have to model the kind of things that could happen (and go wrong) and make appropriate resource trades.

Getting someone to somewhere else on time is not so much a “baseline” as a “ceiling”; it’s the best we could possibly do. The real differentiator is how we manage variation at the intentional, denotational and operational levels. By selling the seats on the plane service, you are really selling the option to be delivered somewhere within a certain time. There are some customers who will have different tolerances for different arrival windows.

At the moment customers are not allowed to engage in these resource trade-offs. So, I may be a 'Gold' customer, but on this particular leisurely flight you don’t need to prioritise me for rebooking: my intentional semantics is hidden from you. I might be happy to go the long way, or even be sent from a trip I never really wanted, and now I have an excuse not to go!

To understand why the model is so inflexible, you need to understand that the whole industry is held captive by the history of reservation systems. You cannot change the business model because you cannot fix the IT system, since it only has a concept of a physical plane with a seat stock which is reserved by the reservation system. It is actually a journey system with a supply-side view. I don’t care which plane it is. It all comes down to one thing - the contract between supply and demand.

It is about finding the management system as a series of different contracts to deliver some kind of flow value. And the customer experience is the composition of all of that. Whether this experiential contract is explicit or implicit, good or bad, legal or social, quantified or unquantified: the customer experience is the end result. The job of the management system is to explicitly define those roles, talk about intentional and unintentional operational semantics and start to define those “performance contracts” at internal and external boundaries.

These “contracts” include constraints on demand and supply for any resource. One constraint is that an aircraft cannot be flown continuously forever, as it needs maintenance. To define the “performance system” as a whole requires capturing the interfaces between those functions, and the constraints placed upon them, and how they are composed together to deliver the passenger experience.

To optimise the system and the trade-offs, you need to know what experience you aim to deliver, and what the customers are willing to pay for improvement beyond that (or lowering of the outcome).

Low-cost carriers like Ryanair have engineered a simple product where the intentional semantic is clear, albeit limited in flexibility. Whether they are operationally going to deliver it or not is also clear. The things that can go wrong are the system design priority, and the misalignment between the product and its delivery is quite small. For BA, however, which is dealing with a complex global system that can be multi-carrier, the impact of schedule variations on another continent is much higher.

EasyJet is in-between, and it remains to be seen how Norwegian will do, especially relating to Gatwick capacity issues and how this affects the airline’s punctuality.

JR: Passenger experience is mostly in the hands of outsourced service providers. The same applies to surveys about passenger satisfaction. These are additional layers that can impact the airline's understanding of quality of passenger services.

MG: The visibility of true customer experience is limited. Airlines really don’t understand how the sum of the operational parts results in the experience as a whole. I discussed this at length in my popular article on “Brand suicide case study: British Airways”. The passenger experience is not a series of disjointed activities, but the cumulative effect of their interaction. We just don’t know how to repeatedly compose an experience from its service elements. The core service delivery knowledge is often tacit, held by a select few personnel with long service.

When you are BA and you tell me that I am now a 'Gold' customer, and you are going to send me off to the first-class lounge every time, I don’t expect to find that the first-class queue is the longest queue in Terminal 5. This is wrong.

I live very near Heathrow, and if my flight is delayed or cancelled, I can even walk back home in an hour. The nature of the transport contract I want to have is different to someone who has travelled from Cornwall, which is different from someone in transit from America to Africa. I am willing to tolerate very large amounts of risk for that price, but I don’t need to be accommodated.

It is about a change of attitude to what success is, about defining the kinds of failure you want to have for different kinds or different classes of travel. I am not trying to make something better all the time for everyone. It is about working with variability as an opportunity, not a wrongness. The thing to focus on as well is the tail risk (no pun) and how it impacts the customer. It is all about making bad experiences rare, and very bad experiences very rare.

JR: In your article “Why we need antifragile applications and polyservice networks” you mentioned that Nassim Nicholas Taleb’s idea of antifragility has significant implications for telecommunications. This universal concept also has an important place in the complex, interdependent, and highly stressed airline industry. Your idea of a polyservice network that optimises for long-term systemic stability with predictable performance and lower cost is quite inspiring.

MG: This is because the essence of an antifragile system is that the system gains strength (and hence longevity) from experiencing variability and disorder. Continued small stressors to the system create a “learned state”, which makes it more adaptive to future large stressors. As a consequence, large “tail risk” shocks are no longer catastrophic. This learning process requires “optionality”: the availability of choices when stressed. The end result is a system with a decelerating response to stress, and predictable systemic outcomes that emerge out of the randomness of individual events.

JR: In the airline world, major hub networks are fragile as are the passengers. The long-term systemic stability is not in sight. Robustness is often associated with a scheduling “mechanism” which, in the stressed environment, becomes fragile on a regular basis.

MG: Antifragile systems contrast with “fragile” ones. In fragile systems, there is no strength gain from small stressors. As a result, there is an accelerating response to stress, so catastrophe can and does happen at relatively moderate levels of stress. The strength and longevity of such systems is low. Taleb emphasises that “antifragile” and “robust” are distinct and disjoint ideas. “Antifragile” makes a feature out of the inevitability of stress; “robustness” treats it as damage which has to be expensively mitigated.
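The difference between an accelerating and a decelerating response to stress can be shown with a toy calculation. The sketch below assumes a power-law harm curve, which is an illustrative choice and not something stated in the conversation. It captures only the shape of the response, not the learning process that produces it.

```python
# Toy model: the power-law shape and all numbers are assumptions.

def harm(stress, exponent):
    """Harm caused by one stressor.

    exponent > 1 gives an accelerating response (each extra unit hurts more);
    exponent < 1 gives a decelerating response (each extra unit hurts less).
    """
    return stress ** exponent


def shock_ratio(total_stress, n_small, exponent):
    """Harm of one large shock divided by the harm of the same total
    stress arriving as n_small equal, separate stressors."""
    one_big = harm(total_stress, exponent)
    many_small = n_small * harm(total_stress / n_small, exponent)
    return one_big / many_small


# Accelerating response: the single large shock is far worse.
print(shock_ratio(10, 10, 2.0))  # 10.0

# Decelerating response: the ratio is below 1, so the large shock
# does less harm than the same stress delivered in small pieces.
print(shock_ratio(10, 10, 0.5) < 1)  # True
```

With the accelerating curve, concentrating stress into one event multiplies the damage, which is how catastrophe arrives at moderate overall stress. With the decelerating curve, the tail event is no longer the dominant risk.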

We are building networks with the low returns from fragile customer experiences, yet with the cost structure of robustness. Only an antifragile approach, using polyservice ‘option trading’ technology, can get us the best of all worlds: a rational incentive structure and sustainable economic model.

Today’s (monoservice) contract is ‘offer any load you like, whenever you like, and anything might happen as a result’. In contrast, a polyservice network offers many kinds of “options contracts” and brokers the different demands to allocate the resources to where they create most value.

In a true polyservice network, the ‘quality’ on offer is separated out into multiple classes. They have little or no overlap in quality with classes which are differentially priced to reflect their cost. Crucially, the higher classes allow for a higher level of choice between price and performance.
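The brokering idea can be sketched as a small allocation routine. This is a hypothetical illustration, not a description of any real polyservice system: the two classes, the greedy rule, the field names, and all figures are assumptions, chosen to mirror the airline parallel of passengers with different arrival-time tolerances.

```python
# Hypothetical two-class broker; every name and number is an assumption.
from dataclasses import dataclass


@dataclass
class Request:
    passenger: str
    tolerated_delay_h: float   # how late this passenger accepts arriving
    willingness_to_pay: float


def broker(requests, on_time_seats, flexible_seats, flexible_delay_h):
    """Allocate scarce on-time capacity to the highest-value demand and
    offer the flexible class to passengers who tolerate its delay."""
    allocation = {}
    by_value = sorted(requests, key=lambda r: r.willingness_to_pay, reverse=True)
    for r in by_value:
        if on_time_seats > 0:
            allocation[r.passenger] = "on-time"
            on_time_seats -= 1
        elif flexible_seats > 0 and r.tolerated_delay_h >= flexible_delay_h:
            allocation[r.passenger] = "flexible"
            flexible_seats -= 1
        else:
            allocation[r.passenger] = "unserved"
    return allocation


requests = [
    Request("A", tolerated_delay_h=0, willingness_to_pay=500),
    Request("B", tolerated_delay_h=6, willingness_to_pay=120),
    Request("C", tolerated_delay_h=0, willingness_to_pay=300),
    Request("D", tolerated_delay_h=8, willingness_to_pay=80),
]

result = broker(requests, on_time_seats=2, flexible_seats=1, flexible_delay_h=4)
print(result)
# {'A': 'on-time', 'C': 'on-time', 'B': 'flexible', 'D': 'unserved'}
```

Under a single-class contract, passenger B would either compete for an on-time seat or have nothing to buy. With a second class, the tolerance B already has becomes something that can be sold and priced.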

JR: While adapting to disruptive travel, which is becoming a “new normal”, may come more naturally to frequent travellers, some of whom have already mastered the skills of antifragility and robustness, the majority of air passengers will still remain fragile to unexpected changes in their travel plans. One thing is certain: the level of passenger tolerance to disruptions will keep shaping the course of air travel in years to come.

MG: Flight delays and cancellations affect different kinds of passengers differently. In the example in my article, I described the reactions of three types of passengers sharing the experience of a power outage at Gatwick airport: Ms Fragile, with Valium in her handbag, goes into a frenzy; the well-travelled Mr Robust, with Kindle in his laptop bag, goes into a stoical state in response to the power cut at the airport; and the adventurous Mrs Antifragile, with Kendal Mint Cake in her backpack, sees this as an opportunity. The moral of the story is that our past experiences, and our responses to them, condition our ability to cope with future stressors. Their equipment gave them optionality, their attitude the ability to exploit it, and their past learning contributed to both.

JR: You are a 'Gold' BA customer exceptionally tolerant to disruptions. At what point did you decide that after so many years, your super loyalty to BA would come to an end?

MG: It was during my travel with Iberia. I described my experience in my article “Brand suicide case study: British Airways”. The moment I decided that I had had enough was when I lost trust in British Airways. The final straw was my really bad experience with Iberia, their merger partner. BA broke the promise they had given me as their customer by refusing to help when I needed their help. I believed I would be cared for throughout my journey by BA who told me I am their 'Gold' customer. But they didn’t care about it or me at all.

JR: You said that the airline industry is stuck with an industrial-era “batch” paradigm and has to prepare itself for a “lean” quality revolution. Could you tell us more?

MG: The “lean” quality revolution means moving from planning concrete resources, such as plane seats and time slots, to a model that abstracts the end-user value, like arrival at the destination. You then sell the abstract outcome, not the underlying concrete resource. In this case, this means switching from managing stocks (with unpredictable quality) to flows (with predictable quality, possibly with wide bounds). That in turn decomposes into three things: what to change (a new intent), what to change to (a new denotational requirement), and how to cause the change (a new operational behaviour).

Good abstraction is what lets us cut through complexity when done across the whole service lifecycle, from strategy to operations. It allows us to make rational management interventions, safe in the knowledge that we won’t experience unintended side-effects. It is how we go from emergent to engineered experiences.

Low-cost carriers transformed the airline industry with radically simplified services that break the cost vs quality trade-off. This is achieved by scheduling resource appropriately and redefining our services as user-centric outcomes rather than network-centric inputs.

A lean business transformation is ultimately rooted in a management system designed to deliver change. That change could be of a continuous improvement nature, which over time results in a qualitative and radical change; or it could be more of a “leap” in capability designed to be delivered as a whole. There is no right answer, as it depends on the context and need. This breaks the historic trade-off between lower cost and better experience, allowing both to be improved simultaneously. This means relating quality to the business process at every stage. For instance, global travellers between continents could be offered a richer set of products that give the airline more flexibility and share arrival time risk in different ways.

In order to comprehend a new paradigm, you first need to become aware of the one you are presently working inside. This is a matter of making the unconscious and unexamined into the conscious and examined. It can be a difficult thing, as it may challenge our core beliefs, making us want to double-down on them to stay safe. It may also call into question our self-identity and status as “experts”; this is very uncomfortable and can provoke intense internal resistance.

JR: I thought that the following quotes from Martin’s “Blueprint for a ‘lean’ business transformation” would wrap up our conversation well:

“There is a balance between human issues and technology issues. And the initial balance is 100% human and 0% technology. What matters most is not the mathematics, high-fidelity network measurements, or clever new mechanisms. What matters is you, the manager, and the quality management system you are operating.”

“Every business has a quality management system of some kind, and a process for improving it. The first step is to surface what your present system is, identify what is most unsatisfactory about it, and begin to understand what the true underlying root causes are. We always start with the people, then understand the processes they operate, and finally the technology gets attention.”

Monday, 30 October 2017

Air Travel and Telecoms: Different Disruptions, Same Management Problem - In Conversation with Martin Geddes