“You know, a cellular network with QoS is a little like the mule with a spinning wheel. No one knows how he got it, and danged if he knows how to use it!”
-Bastardising Lyle Lanley's quote about ‘a town with money'
A typical conversation between different groups in a cellular operator (say products, finance and engineering) might go something like:
- Engineering: Our network is terribly congested — our customers are getting a terrible user experience. We really need to increase our capital investment in hardware so that we can split some sectors, add some new carriers and build some new sites.
- Finance: Ouch! That sounds awfully expensive — isn't there some way we can achieve the same goal, but without spending any money? Mwhahaha!
- Products: I know — how about we use QoS?
- Engineering: That could make some people happy, but we will still fundamentally have a congested network — in fact it'll probably make things worse overall.
- Products and Finance (together): Oh really? That doesn't sound intuitive — why is that?
I should note that this article is written from the viewpoint of an engineer and, in the spirit of honesty, it is entirely possible that the finance department don't cackle maniacally as they refuse yet another request to spend money that the business doesn't have. It is, therefore, very important that the engineering department have access to tools and information that allow them to make reasonable and well-justified requests for additional CAPEX, especially when dealing with augmenting network capacity.
The focus of this blog post is to describe why Quality of Service (QoS) is not a useful tool for solving capacity issues, and can, in fact, decrease overall network capacity.
When this article refers to QoS, it's actually talking about ‘Differentiated QoS’: mechanisms put in place to provide differentiated performance between classes of users and/or services.
In a fixed network, resources can be managed in a predictable and deterministic manner. This means that the relationship between QoS and capacity is an amicable one.
For example, imagine a fixed network that has a capacity of C (and for the sake of using numbers, let's say C = 10Mbps). If we then pretend that capacity is a pipe (making the internet a series of tubes):
If there are 5 (full-buffer) users and resources are shared fairly amongst all users, then each user will get on average 2Mbps.
Likewise, if there are 10 users, then each user will get on average 1Mbps.
Differentiated QoS can be used to give more bandwidth to one class of users at the expense of reducing the bandwidth given to other users:
Whilst the amount of capacity (C) doesn't change, the prioritised users are made happy at the expense of the remaining users.
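To make the arithmetic concrete, here is a tiny sketch of that weighted sharing (the 10 Mbps pipe and the 2:1 weighting are just illustrative numbers, not any particular standard's QoS scheme):

```python
# A toy illustration of weighted sharing on a fixed pipe. The 10 Mbps
# capacity and the 2:1 'gold' weighting are illustrative numbers only.

def fixed_share(capacity_mbps, weights):
    """Split a fixed capacity among users in proportion to their QoS weights."""
    total_weight = sum(weights)
    return [capacity_mbps * w / total_weight for w in weights]

C = 10.0  # Mbps

# Fair sharing: 5 equal users get 2 Mbps each.
print(fixed_share(C, [1, 1, 1, 1, 1]))

# Differentiated QoS: weight one user 2x. They get ~3.33 Mbps and the
# remaining four drop to ~1.67 Mbps, but the total is still exactly C.
print(fixed_share(C, [2, 1, 1, 1, 1]))
```

However much the weights are shuffled, the shares always sum to C; in the fixed world, QoS simply moves slices of the pie around.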
Things are simple in the fixed-line world; however, when we talk about wireless/radio, things get much more complicated.
For a start, performance and capacity are dependent on radio conditions. Users in good radio conditions will be able to achieve higher data rates than users in poor radio conditions:
To further complicate this, radio conditions fluctuate quickly due to a phenomenon known as fading.
The diagrams above show that on average User 1 is in better radio conditions than Users 2 or 3. However, there are periods of time when User 1 is in ‘a deep fade’ and Users 2 or 3 briefly experience better radio conditions. A smart radio scheduler can exploit this fact for capacity gains (known as a ‘scheduling gain’). So, unlike in the fixed-line world, wireless channel capacity (C) is not a constant; it depends on a number of factors, including:
- Radio Conditions
- Scheduling Algorithm
- Number of Schedulable Users
The most capacity-efficient scheduling algorithm is referred to as Max C/I; it always chooses the user in the best radio conditions to schedule. An unfortunate side effect of this is that users in suboptimal radio conditions tend to get poor performance (they are starved). This is an example of an unfair scheduling algorithm.
On the flip side, the Equal Rate scheduling algorithm is perfectly fair. This algorithm schedules users in bad radio conditions more frequently than those in good radio conditions, ensuring that all users get approximately the same user experience (user throughput). Unsurprisingly, this is not an efficient approach from a capacity perspective.
A compromise between these two extremes is the Proportional Fair scheduler. This algorithm schedules users in good radio conditions most of the time, but also occasionally schedules users in suboptimal radio conditions to make sure that nobody is completely starved. This tends to be the best trade-off between capacity and fairness and is typically what mobile operators use.
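To give a feel for how these three schedulers behave, here is a rough single-cell simulation. It assumes Rayleigh fading (exponentially distributed channel power), Shannon-capacity rates and three full-buffer users with made-up mean SNRs, and it approximates Equal Rate by always serving whichever user is furthest behind in delivered throughput. It is a sketch of the trade-off, not anybody's real scheduler:

```python
# Toy single-cell scheduler comparison under Rayleigh fading.
# Mean SNRs, TTI count and the PF averaging constant are illustrative.
import math
import random

random.seed(0)

MEAN_SNR = [10.0, 3.0, 1.0]   # linear mean SNR per user (User 1 has the best radio conditions)
N_TTI = 100_000               # number of scheduling intervals simulated
PF_TC = 100.0                 # Proportional Fair averaging window, in TTIs

def run(scheduler):
    """Return (per-user, total) throughput in bit/s/Hz for a scheduling rule."""
    served = [0.0] * len(MEAN_SNR)   # cumulative rate actually delivered to each user
    avg = [1e-6] * len(MEAN_SNR)     # PF moving-average throughput per user
    for _ in range(N_TTI):
        # instantaneous spectral efficiency this TTI: Rayleigh fade x mean SNR
        rates = [math.log2(1 + snr * random.expovariate(1.0)) for snr in MEAN_SNR]
        u = scheduler(rates, served, avg)
        served[u] += rates[u]
        for i in range(len(avg)):    # update the PF moving averages
            avg[i] += (rates[i] * (i == u) - avg[i]) / PF_TC
    per_user = [s / N_TTI for s in served]
    return per_user, sum(per_user)

max_ci     = lambda rates, served, avg: max(range(len(rates)), key=lambda i: rates[i])
equal_rate = lambda rates, served, avg: min(range(len(rates)), key=lambda i: served[i])
prop_fair  = lambda rates, served, avg: max(range(len(rates)), key=lambda i: rates[i] / avg[i])

for name, sched in [("Max C/I", max_ci), ("Equal Rate", equal_rate), ("Proportional Fair", prop_fair)]:
    per_user, cell = run(sched)
    users = " ".join(f"{x:.2f}" for x in per_user)
    print(f"{name:18s} per-user [{users}]  cell total {cell:.2f} bit/s/Hz")
```

Running it should show the expected pattern: Max C/I produces the highest cell total but starves User 3, Equal Rate hands every user roughly the same throughput at the lowest cell total, and Proportional Fair lands somewhere in between.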
It's also important to realise that the capacity of a cell grows (on average) when the pool of schedulable users increases. This is simply because with a larger pool of schedulable users, the probability that one or more users are in good radio conditions (not everybody is experiencing a deep fade) increases. This is known as a ‘statistical multiplexing gain'.
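This is easy to demonstrate numerically: the average rate of the best user in a pool of independently fading users climbs as the pool grows. A quick sketch, again assuming Rayleigh fading and an arbitrary mean SNR:

```python
# Multiuser diversity sketch: the expected best-user rate grows with the
# size of the schedulable pool. Rayleigh fading and a 5x mean SNR are assumed.
import math
import random

random.seed(1)

def mean_best_rate(pool_size, mean_snr=5.0, trials=50_000):
    """Average spectral efficiency of the best user out of `pool_size` fading users."""
    total = 0.0
    for _ in range(trials):
        best_gain = max(random.expovariate(1.0) for _ in range(pool_size))
        total += math.log2(1 + mean_snr * best_gain)
    return total / trials

for pool in (1, 2, 5, 10, 20):
    print(f"pool of {pool:2d} schedulable users -> best-user rate {mean_best_rate(pool):.2f} bit/s/Hz")
```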
This variability in channel capacity means that using Differentiated QoS to prioritise a class of users can materially affect channel capacity and the manner in which capacity is affected can be somewhat unpredictable.
Imagine the situation where we use QoS to prioritise User 1 (the person in good radio conditions). This would make the Proportional Fair scheduler behave more like a Max C/I scheduler (the user in good radio conditions gets scheduled more than the users in poor radio conditions); channel capacity is increased. On the other hand, if User 3 was prioritised, then capacity would substantially decrease because more time would be spent scheduling users in bad radio conditions which is inefficient.
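One simple way to see this is to put a QoS weight on the Proportional Fair metric (weight × instantaneous rate ÷ average throughput), so that the prioritised user wins the scheduling contest more often. The SNRs and the arbitrary 8× weight below are purely illustrative; weighting User 1 should push the cell total up towards Max C/I behaviour, while weighting User 3 should drag it down:

```python
# Weighted Proportional Fair sketch: prioritisation shifts cell capacity
# up or down depending on who is prioritised. All constants are illustrative.
import math
import random

random.seed(2)

MEAN_SNR = [10.0, 3.0, 1.0]   # User 1 in good radio conditions, User 3 in poor
N_TTI = 100_000
PF_TC = 100.0                 # PF averaging window, in TTIs

def cell_throughput(weights):
    """Total cell throughput (bit/s/Hz) under weighted Proportional Fair."""
    avg = [1e-6] * len(MEAN_SNR)
    total = 0.0
    for _ in range(N_TTI):
        rates = [math.log2(1 + snr * random.expovariate(1.0)) for snr in MEAN_SNR]
        # weighted PF metric: a larger weight means the user is scheduled more often
        u = max(range(len(rates)), key=lambda i: weights[i] * rates[i] / avg[i])
        total += rates[u]
        for i in range(len(avg)):
            avg[i] += (rates[i] * (i == u) - avg[i]) / PF_TC
    return total / N_TTI

print(f"plain PF               : {cell_throughput([1, 1, 1]):.2f} bit/s/Hz")
print(f"prioritise User 1 (x8) : {cell_throughput([8, 1, 1]):.2f} bit/s/Hz")  # drifts towards Max C/I
print(f"prioritise User 3 (x8) : {cell_throughput([1, 1, 8]):.2f} bit/s/Hz")  # time spent on poor radio
```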
Finally, even if the class of prioritised users is spread across a variety of different radio conditions, using QoS tends to decrease channel capacity because it limits the pool of users that the scheduler can choose from. If a class of users is prioritised, then those users need to be scheduled more frequently. Just as enlarging the pool of schedulable users increases capacity (there is a higher probability that somebody is in good radio conditions), shrinking the effective pool to the prioritised class reduces it (the probability that a prioritised, schedulable user is in good radio conditions is lower).
All of the above leads to the following observation: QoS is not a panacea that can be used to improve cell capacity; in fact, introducing QoS tends to reduce available capacity, and that lost capacity costs money. There are certainly legitimate uses for QoS, particularly for managing Quality of Experience (QoE). These include things such as:
- Prioritising users in congested cells (e.g. Emergency Services)
- Managing sessions such as VoLTE and Video over LTE
- Limiting user throughputs (e.g. when a data cap has been exceeded)
Properly understanding the effects of QoS on capacity and performance in a wireless network is a complicated endeavour, beyond the scope of either a pen-and-paper study or Excel spreadsheet modelling. An increasingly common approach used by sophisticated wireless operators is to run tele-traffic simulations customised to their customer-experience KPI targets. These simulations allow call arrivals and departures to be simulated and the QoE for individual users to be modelled. The basics of tele-traffic simulation and how it can be applied to ‘what-if’ capacity modelling will be the subject of a forthcoming blog entry.
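As a taster, the toy below gives a flavour of what a flow-level tele-traffic simulation looks like: Poisson flow arrivals, exponentially distributed flow sizes and a fixed cell capacity shared equally among whoever is active, with per-flow throughput as a crude QoE proxy. Every parameter here is made up for illustration:

```python
# Minimal flow-level tele-traffic sketch: Poisson arrivals, exponential
# flow sizes, equal sharing of a fixed cell capacity. Illustrative numbers only.
import random

random.seed(4)

CAPACITY_MBPS = 50.0      # cell capacity, shared equally among active flows
ARRIVAL_RATE = 4.0        # mean new flows per second (Poisson arrivals)
MEAN_FLOW_MBITS = 10.0    # mean flow size (exponentially distributed)
SIM_SECONDS = 2_000.0
DT = 0.01                 # simulation time step, in seconds

active = []               # each entry: [remaining_mbits, size_mbits, arrival_time]
completed = []            # (size_mbits, duration_s) for finished flows
next_arrival = random.expovariate(ARRIVAL_RATE)
t = 0.0

while t < SIM_SECONDS:
    t += DT
    # admit any flows whose Poisson arrival time has now passed
    while t >= next_arrival:
        size = random.expovariate(1.0 / MEAN_FLOW_MBITS)
        active.append([size, size, next_arrival])
        next_arrival += random.expovariate(ARRIVAL_RATE)
    # serve every active flow with an equal share of the cell capacity
    if active:
        share = (CAPACITY_MBPS / len(active)) * DT
        for flow in active:
            flow[0] -= share
        for flow in [f for f in active if f[0] <= 0]:
            active.remove(flow)
            # quantise the duration to at least one time step
            completed.append((flow[1], max(t - flow[2], DT)))

# per-flow throughput as a simple QoE proxy
throughputs = [size / duration for size, duration in completed]
print(f"offered load    : {ARRIVAL_RATE * MEAN_FLOW_MBITS:.0f} Mbps on a {CAPACITY_MBPS:.0f} Mbps cell")
print(f"flows completed : {len(completed)}")
print(f"mean flow QoE   : {sum(throughputs) / len(throughputs):.1f} Mbps per flow")
```

From a model like this, one can start asking ‘what-if’ questions: what happens to per-flow QoE if the offered load doubles, or if one class of flows is handed a bigger share of the capacity.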