Analytical modelling of Software and Hardware Switches with Internal Buffer in Software-Defined Networks

Deepak Singh*, Bryan Ng*,†††, Yuan-Cheng Lai†, Ying-Dar Lin‡, Winston K. G. Seah‡

*School of Engineering and Computer Science, Victoria University of Wellington, New Zealand
†Dept of Information Management, National Taiwan University of Science and Technology, Taipei, Taiwan
‡Dept of Computer Science, National Chiao Tung University, Hsinchu, Taiwan

Abstract

OpenFlow supports internal buffering of data packets in Software-Defined Networking (SDN) switches, whereby only a fraction of the data packet (its header) is sent to the controller instead of the entire data packet. This internal buffering increases the robustness and the utilization of the link between SDN switches and the controller by absorbing temporary bursts of packets that may overwhelm the controller. Existing queueing models for SDN have focused on switches that immediately send packets to the controller for decisioning, and no existing model investigates the impact of the internal buffer in SDN software and hardware switches. In this paper, we propose a unified queueing model to characterise the performance of SDN software and hardware switches with the internal buffer. This unified queueing model is an analytical tool for network engineers to predict delay and loss during SDN deployments in delay- and loss-sensitive environments. Our results show that a hardware switch achieves up to 80% lower average packet transfer delay and 99% lower packet loss rate than a software switch, at the cost of requiring up to 50% more queue capacity. The proposed models are validated with a discrete event simulation, with errors between 0.6% and 2.8% for both average packet transfer delay and average packet loss rate. Moreover, a hardware switch outperforms a software switch as the number of hosts per switch increases, suggesting that a hardware switch has better scalability. We use the insights from the model to develop guidelines that help network engineers decide between a software and a hardware switch in their SDN deployments.

1. Introduction

Software-Defined Networking (SDN) is a new networking architecture that simplifies the switch by moving forwarding decisions away from the switch to a centralised system, which is typically realised as a software-based controller. The concept of SDN is realised with OpenFlow, which is among the first (and most widely used) specifications to define the communication between the controller and the switch in an SDN architecture [1]. At present, OpenFlow is the dominant protocol for programming SDN switches [2]. OpenFlow defines different types of messages sent from the controller to the switch or, conversely, from the switch to the controller. The switch-to-controller messages are called asynchronous messages, as they are sent without controller solicitation [3].

In the OpenFlow specifications [3], an OpenFlow switch maintains one or more flow tables to make decisions on packet forwarding behavior. Flow tables are linked together to form a pipeline, where each flow table has flow table entries (FTEs) that consist of match fields and actions. Incoming data packets are matched against the match fields and if there is no matching FTE, an asynchronous message called a “packet-in” is generated and sent to the controller.
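The table lookup just described can be illustrated with a minimal Python sketch. The `FlowEntry` and `FlowTable` classes and the `run_pipeline` helper are hypothetical names, not part of any OpenFlow library. The sketch assumes, as in the paragraph above, that a table miss produces a packet-in (in a real switch the action taken on a miss is configurable), and its priority and `goto_table` handling only approximate the specification's pipeline semantics.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Packet = Dict[str, object]  # header fields, e.g. {"in_port": 1, "eth_dst": "aa:bb"}


@dataclass
class FlowEntry:
    match: Dict[str, object]          # field -> required value; absent fields are wildcards
    actions: List[str]                # illustrative action strings, e.g. "output:2"
    priority: int = 0
    goto_table: Optional[int] = None  # next table in the pipeline, if any

    def matches(self, pkt: Packet) -> bool:
        return all(pkt.get(k) == v for k, v in self.match.items())


@dataclass
class FlowTable:
    entries: List[FlowEntry] = field(default_factory=list)

    def lookup(self, pkt: Packet) -> Optional[FlowEntry]:
        # The highest-priority matching entry wins; None signals a table miss.
        hits = [e for e in self.entries if e.matches(pkt)]
        return max(hits, key=lambda e: e.priority, default=None)


def run_pipeline(tables: List[FlowTable], pkt: Packet) -> Tuple[str, object]:
    """Walk the pipeline from table 0; return ("apply", actions) or ("packet-in", pkt)."""
    actions: List[str] = []
    table_id = 0
    while True:
        entry = tables[table_id].lookup(pkt)
        if entry is None:
            return ("packet-in", pkt)  # no matching FTE: ask the controller
        actions.extend(entry.actions)
        if entry.goto_table is None:
            return ("apply", actions)
        if entry.goto_table <= table_id:
            raise ValueError("goto_table must point to a later table")
        table_id = entry.goto_table
```

A packet that misses in any table of the chain is handed to the controller, which is the event that the queueing models in this paper treat as occurring with the table miss probability.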

As SDN deployments move away from traditional data centers to wide area SDNs, wireless access networks (called SDWANs, software-defined wireless access networks) and mobile SDNs, the usual assumption of a reliable and highly available control channel no longer holds. Fortunately, the OpenFlow specifications have provisions for switches to internally buffer the data packets of packet-in messages destined for the controller.

Packet-in messages are sent by the switch when there is no matching flow information for a packet arriving at the switch. Depending on the memory available in the switch for internal buffering, a packet-in message carries either the entire data packet or only a fraction of the data packet header. The data packet header contains routing information, which the controller uses to make forwarding decisions. If a switch has sufficient memory to buffer packets, then the packet header, along with a buffer ID (an identifier of the buffered packet), is sent with the packet-in message. Conversely, switches that do not support internal buffering require the full data packet (not just the header) to be sent with the packet-in message.
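The two packet-in variants can be made concrete with a minimal sketch built around a hypothetical `InternalBuffer` class and `make_packet_in` helper. The constant `OFP_NO_BUFFER = 0xFFFFFFFF` (the buffer ID meaning "not buffered") and the 128-byte default for the number of bytes sent with a buffered packet follow the OpenFlow specification; the buffer capacity, the ID allocation scheme and the data layout are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Optional

OFP_NO_BUFFER = 0xFFFFFFFF    # OpenFlow buffer_id meaning "packet not buffered"
DEFAULT_MISS_SEND_LEN = 128   # bytes of the packet sent when it is buffered


@dataclass
class PacketIn:
    buffer_id: int
    total_len: int  # length of the original frame
    data: bytes     # header fraction (buffered) or full frame (not buffered)


class InternalBuffer:
    """Hypothetical fixed-capacity internal buffer keyed by buffer ID."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.slots: Dict[int, bytes] = {}
        self._next_id = 0

    def store(self, frame: bytes) -> Optional[int]:
        if len(self.slots) >= self.capacity:
            return None  # no free memory: the frame cannot be buffered
        buf_id = self._next_id
        self._next_id += 1
        self.slots[buf_id] = frame
        return buf_id

    def release(self, buf_id: int) -> bytes:
        # Called when the controller's reply references this buffer ID.
        return self.slots.pop(buf_id)


def make_packet_in(frame: bytes, buf: InternalBuffer,
                   miss_send_len: int = DEFAULT_MISS_SEND_LEN) -> PacketIn:
    buf_id = buf.store(frame)
    if buf_id is None:
        # Not buffered: the whole data packet travels to the controller.
        return PacketIn(OFP_NO_BUFFER, len(frame), frame)
    # Buffered: send only the leading bytes (the header) plus the buffer ID.
    return PacketIn(buf_id, len(frame), frame[:miss_send_len])
```

When the internal buffer is full, the sketch falls back to sending the whole packet, which is exactly the extra control-channel load that internal buffering is meant to avoid.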

1.1. Internal Buffer

The internal buffer in a switch plays an important role in packet forwarding behaviour. Data packets are internally buffered in an SDN switch to avoid congestion and improve network throughput. The concept of internal buffering is not new to SDN switches and has traditionally been used in Banyan switches [4], which are networks of crossover switches designed to avoid blocking between packets at the input ports. In other areas of networking, such as ATM (asynchronous transfer mode), internal buffering has been used in ATM switches to reduce the packet loss rate caused by the asynchronous nature of ATM traffic [5].

In an SDN, the internal buffer helps address the impact of lossy and unreliable control plane behavior, a scenario of increasing importance. The study in [6] showed that a lossy control channel significantly degrades data plane throughput and latency. Benefits of internal buffering in SDN switches include a decreased forwarding delay of data packets [7], improved Quality of Service through reduced packet loss [8], and optimized control channel bandwidth [9].

In an OpenFlow-based SDN switch, if packet-in events are configured for internal buffering and the switch has sufficient memory to buffer a data packet, then a fraction of the data packet (its header) and a buffer ID are encapsulated in the packet-in message. Otherwise, the entire data packet is encapsulated in the packet-in message. The controller processes the packet-in message and sends a packet-out message to the switch with updated flow information [3].

Most existing research in the literature analyses the performance of SDN switches without internal buffering [10–25]. This is perhaps because the OpenFlow specifications are still evolving and, in their current incarnation, leave the buffering of data packets as an optional feature. However, support for internal buffering will become increasingly important for the next generation of SDN switches as SDN deployments diversify. In these new deployments, in domains such as SDWANs, mobile SDN and IoT, connectivity between the SDN switch and the controller may be intermittent.

1.2. Hardware vs. Software Switch

In this paper, we are concerned with modelling both physical SDN switches (i.e. hardware switches) and virtual switches (i.e. software switches), each with the internal buffer. Software and hardware switches both have strengths and weaknesses, and internal buffering may affect their performance in an SDN. To identify the potential bottlenecks that could hinder the performance of an SDN, the trade-offs between choosing a hardware or a software switch with the internal buffer need to be investigated.

An SDN-based software switch with the internal buffer maintains the flow table in SDRAM (synchronous dynamic random access memory), and incoming packets are matched against the FTEs by a CPU (central processing unit) [26]. If there is no matching FTE, the data packet is internally buffered and a packet-in message is sent to the controller, which feeds forwarding information back to the switch and updates the software flow table. The packet processing logic in a software switch is implemented in software [1], usually with the help of optimized software libraries. Open vSwitch (OVS) [27], Pantou/OpenWRT [28], of-softswitch13 [29] and Indigo [30] running on commodity hardware (e.g. desktops with several network interface cards) are a few examples of SDN software switches.

Similarly, in an SDN-based hardware switch with the internal buffer, the packet processing function is embedded in specialised hardware. This specialised hardware includes layer two forwarding tables implemented using content-addressable memories (CAMs), layer three forwarding tables implemented using ternary content-addressable memories (TCAMs) [1], and application-specific integrated circuits (ASICs). In a hardware switch, FTEs are stored in the CAMs and TCAMs of the specialised hardware, and packets are processed by ASICs. Hardware switches are also equipped with SDRAM and a CPU, allowing a hardware switch to maintain flow tables in both TCAM and SDRAM [26]. As in software switches, the CPU in a hardware switch internally buffers data packets when there is no matching FTE.

In this paper, we use queueing theory to derive a first-order estimate of an OpenFlow switch's performance and to identify potential trade-offs between SDN-based software and hardware switches with the internal buffer. Queueing models are useful for predicting switch performance trends as parameterized functions and for exposing cause-and-effect relationships in switch performance. The main contributions of this paper are as follows:

- It proposes a unified queueing model to characterise the performance of hardware and software switches with the internal buffer in an SDN.
- It identifies the benefits and trade-offs of hardware switch vs. software switch with the internal buffer in an SDN.
- It validates the unified queueing model with a discrete event simulation.
- It investigates the performance of software and hardware switches as the number of hosts connected to the switch increases, to assess their scalability in an SDN.

The remainder of this paper is structured as follows. Section 2 discusses the related work and the background theory of SDN-based software and hardware switches and internal buffering. Section 3 presents the queueing model for an SDN-based software switch with the internal buffer, which is followed by the queueing model for an SDN-based hardware switch with the internal buffer in Section 4. Section 5 discusses buffer dimensioning. Section 6 discusses the analytical and validation results in detail. Finally, Section 7 discusses the results and concludes the paper.

2. Related work & Theory

While internal buffering has been well studied in traditional switches, the buffering of asynchronous messages over a separated control and data plane remains unexplored. The separation of the data plane and the control plane in SDN brings a different set of challenges for designers of SDN switches. For example, control decisions from the controller may take up to 1 millisecond to reach the switch.

Internal buffering in software-based SDN switches can be easily realised by configuring packet-in events to support buffering of packets. However, very few commodity hardware-based SDN switches support internal buffering. Pica8 switches are among the few that support OpenFlow's feature for configuring temporary buffering of packets [31], while other commodity switch manufacturers such as Cisco [32], HP Enterprise [33], Juniper [34], Arista Networks [35], and Extreme Networks [36] still do not support internal buffering. Fewer commodity switches support internal buffering because of hardware limitations, which is also why almost no experimental research has been conducted on SDN commodity switches to analyse internal buffering.

In [37], the authors adopted SDN for wireless mesh networks and showed that delay variability and limited bandwidth over the wireless links cause throughput degradation and packet losses. However, no internal buffering was considered. Internal buffering in [37] could have improved channel utilization in the SDN-enabled wireless networks as control traffic increases. Earlier studies [7, 9] suggest that smoothing control traffic via the internal buffer would reduce losses during periods of poor wireless connectivity or sudden bursts of new flows to a mesh router. However, these studies have not explored the drawbacks of internal buffering in an SDN.

For SDWAN applications, a multi-path OpenFlow channel for resilience and scalability in wireless environments was proposed in [38]. In SDWANs, the control path may fail for many reasons, such as deep fading and mobility. In such cases, buffering packets in the internal buffer of the switch allows the switch to continue operating momentarily while the control channel recovers to a stable state.

Hu et al. [39] take a radically different approach, whereby the control packets are neither buffered nor sent to the controller immediately, but are instead sent through a looping path; the resulting delay gives the control messages time to be processed and the controller's feedback time to arrive. Internal buffering in [39] could have reduced this delay at the cost of extra memory.

From a performance modelling perspective, queueing theory has been widely used to model and predict the performance of an SDN [10–22, 40]. Most of these studies have modelled a software switch except for [17, 20] which are among the first to model a hardware switch in an SDN. Similarly, the model presented in [40] is among the first to model an SDN switch with the internal buffer.

The above-mentioned models use the generic models shown in Figure 1 for a software switch and Figure 2 for a hardware switch, where the input buffer of the CPU is modelled either as a single shared queue or as a two-priority queue. In the single shared queue model [10–12, 16–18, 21, 22], data traffic and control traffic share a single queue with a FIFO service discipline. In the two-priority queue model [13–15, 19, 20, 40], control traffic goes to a high priority class queue and data traffic goes to a low priority class queue, and service is non-preemptive, so a data packet already in service is not interrupted by arriving control traffic. The single shared queue model is not suitable for modelling internal buffers because it makes no packet-level distinction between data and control traffic. This differentiation is easily modelled in the two-priority queue model, which is thus the most relevant starting point for the work presented in this paper.

Moreover, a key finding from previous modelling work on SDN switches is that using a two-priority queue in the output buffer of a switch better reflects SDN behaviour. Analytical and simulation studies in [19] show that the time to install an FTE is significantly lower with a priority queue than with a single shared queue.

The model presented in [40] shows the benefit of the internal buffer, which significantly reduces the delay and loss rate, but at the cost of the higher memory required by the CPU for internal buffering. However, this model considers internal buffering only in a software switch and cannot model the dynamics of a faster hardware switch that has dual service rates (i.e. the specialised hardware service rate and the CPU service rate), and it therefore produces less accurate estimates for hardware switches.

The model presented in [17] models the input buffer of a switch as a single shared buffer but does not account for the switch-controller interaction. The analysis in [17] also does not map the workings of a hardware switch, such as flow matching and dedicated packet processing, to the queueing model as shown in Figure 2. These limitations of [17] have been addressed in [20] through a unified queueing model for both software and hardware switches. However, the unified queueing model in [20] does not consider the internal buffering capabilities of an SDN switch.

The models in [40] and [20] have paved the way for building a new unified queueing model for internal buffering within SDN switches. A summary of existing queueing models for SDN switches with and without the internal buffer is shown in Table 1. Similarly, Table 2 lists the notations used for performance analysis in this paper.

In the following subsection, generic models for SDN software and hardware switches are discussed.

2.1. Packet Flow in Software and Hardware SDN Switches

Figure 1 shows a generic block diagram of an SDN-based software switch that is connected to the controller and at which external data packets arrive. There are three important phases that an SDN model of a software switch must capture. Phase (1): the first packet of a flow arrives at the switch and there is no matching FTE for the packet in SDRAM. Phase (2): a packet without a matching flow entry is forwarded to the controller, whereas a packet with a matching FTE is serviced by the switch and forwarded to the destination. All packet processing and forwarding in the switch is executed on the CPU and the SDRAM. Finally, Phase (3): the controller feeds the forwarding information back to the switch and updates the flow table. Software switches have been studied and analysed in [10–16, 18–22, 40] based on the generic block model shown in Figure 1.

Figure 2 shows the block diagram of an SDN-based hardware switch where the switch maintains flow tables in both hardware and software. The hardware and software flow tables are synchronised through a middleware layer on the switch [42, 43] to avoid duplicate entries and to ensure consistent forwarding behaviour.

There are four important phases an SDN model of a hardware switch must capture. Phase (1), the first packet of a flow arrives at the specialised hardware in the switch that maintains hardware FTEs and there is no matching FTE for the packet. Phase (2), a packet with the matching FTE in the TCAM is serviced by the ASIC and forwarded to the destination, otherwise a packet without a matching FTE in TCAM is matched against the FTE in SDRAM and processed by the CPU for forwarding to the destination. In phase (3), a packet without any matching FTE in the TCAM or SDRAM is forwarded to the controller. In phase (4), the controller
Table 1: Summary of queueing models for SDN switches with and without the internal buffer.

<table>
<thead>
<tr>
<th>Model</th>
<th>Internal Buffer</th>
<th>CPU Model</th>
<th>Analysis</th>
<th>Switch Type</th>
</tr>
</thead>
<tbody>
<tr>
<td>Jarschel [10]</td>
<td>No</td>
<td>M/M/1</td>
<td>Exact</td>
<td>Software</td>
</tr>
<tr>
<td>Mahmood [11]</td>
<td>No</td>
<td>M/M/1</td>
<td>Exact</td>
<td>Software</td>
</tr>
<tr>
<td>Miao [41]</td>
<td>Yes</td>
<td>M/M/1</td>
<td>Approximate</td>
<td>Software</td>
</tr>
<tr>
<td>Shang [16]</td>
<td>No</td>
<td>M/H/1</td>
<td>Exact</td>
<td>Software</td>
</tr>
<tr>
<td>Sood [17]</td>
<td>No</td>
<td>M/Geo/1</td>
<td>Approximate</td>
<td>Hardware</td>
</tr>
<tr>
<td>Miao [13]</td>
<td>No</td>
<td>MMAP</td>
<td>Exact</td>
<td>Software</td>
</tr>
<tr>
<td>Goto [14]</td>
<td>No</td>
<td>GI/M/1/K</td>
<td>Approximate</td>
<td>Software</td>
</tr>
<tr>
<td>Javed [18]</td>
<td>No</td>
<td>M/G/1</td>
<td>Exact</td>
<td>Software</td>
</tr>
<tr>
<td>Singh [19]</td>
<td>No</td>
<td>GI/M/1/K</td>
<td>Approximate</td>
<td>Software</td>
</tr>
<tr>
<td>Lai [21]</td>
<td>No</td>
<td>MMPP/M/1</td>
<td>Exact</td>
<td>Software</td>
</tr>
<tr>
<td>Singh [40]</td>
<td>Yes</td>
<td>GI/M/1/K</td>
<td>Approximate</td>
<td>Software</td>
</tr>
<tr>
<td>Fahmin [22]</td>
<td>No</td>
<td>M/M/1</td>
<td>Approximate</td>
<td>Software</td>
</tr>
<tr>
<td>Singh [20]</td>
<td>No</td>
<td>GI/M/1/K</td>
<td>Approximate</td>
<td>Software and Hardware</td>
</tr>
<tr>
<td>Our Analysis</td>
<td>Yes</td>
<td>GI/M/1/K</td>
<td>Approximate</td>
<td>Software and Hardware</td>
</tr>
</tbody>
</table>

Table 2: Notations used for performance analysis.

<table>
<thead>
<tr>
<th>Parameters</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\lambda$</td>
<td>External arrival rate at the switch</td>
</tr>
<tr>
<td>$\mu_c$</td>
<td>Service rate of the controller processor</td>
</tr>
<tr>
<td>$\mu_{sp}$</td>
<td>Service rate of the CPU processor</td>
</tr>
<tr>
<td>$\mu_{sh}$</td>
<td>Service rate of the hardware processor</td>
</tr>
<tr>
<td>$\beta$</td>
<td>Table miss probability</td>
</tr>
<tr>
<td>$BER$</td>
<td>Bit Error Rate</td>
</tr>
<tr>
<td>$PER$</td>
<td>Packet Error Ratio</td>
</tr>
<tr>
<td>$n$</td>
<td>Number of bits in the packet</td>
</tr>
<tr>
<td>$N$</td>
<td>Number of hosts connected to the switch</td>
</tr>
<tr>
<td>$\rho$</td>
<td>Server utilisation at the queue</td>
</tr>
<tr>
<td>$T$</td>
<td>Throughput of the queue</td>
</tr>
<tr>
<td>$t$</td>
<td>Mean sojourn time of packets at the queue</td>
</tr>
<tr>
<td>$E[L]$</td>
<td>Mean number of data packets in the system</td>
</tr>
<tr>
<td>$PL$</td>
<td>Average packet loss rate</td>
</tr>
<tr>
<td>$K_{min}$</td>
<td>Minimum queue capacity for a switch</td>
</tr>
<tr>
<td>$m_r$</td>
<td>Controller to CPU Processing Ratio ($\mu_c/\mu_{sp}$)</td>
</tr>
<tr>
<td>$m_s$</td>
<td>Specialised hardware to CPU Processing Ratio ($\mu_{sh}/\mu_{sp}$)</td>
</tr>
<tr>
<td>$\epsilon_k$</td>
<td>Relative minimum capacity</td>
</tr>
<tr>
<td>$\epsilon_d$</td>
<td>Relative average delay</td>
</tr>
<tr>
<td>$\epsilon_l$</td>
<td>Relative packet loss rate</td>
</tr>
</tbody>
</table>

Figure 2: Generic model for an SDN with Hardware Switch.

feeds the forwarding information back to the switch and updates the flow tables in both TCAM and SDRAM. Finally, the packet is serviced by the CPU and forwarded to the destination. A hardware switch has been studied and analysed in [20] based on the generic block model shown in Figure 2.
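The four hardware-switch phases amount to a two-tier lookup with a controller fallback. In the minimal sketch below, plain exact-match dictionaries stand in for the TCAM and SDRAM flow tables (real TCAMs perform ternary, wildcard matching and have limited capacity, both ignored here); the `Path` enum and the `classify` and `install` helpers are hypothetical names chosen for illustration.

```python
from enum import Enum
from typing import Dict, Hashable


class Path(Enum):
    ASIC = "TCAM hit: served by the ASIC"
    CPU = "SDRAM hit: served by the CPU"
    CONTROLLER = "miss in both tables: buffer packet, send packet-in"


def classify(flow_key: Hashable, tcam: Dict, sdram: Dict) -> Path:
    """Phases (1)-(3): try the hardware table first, then the software table."""
    if flow_key in tcam:
        return Path.ASIC
    if flow_key in sdram:
        return Path.CPU
    return Path.CONTROLLER


def install(flow_key: Hashable, action: str, tcam: Dict, sdram: Dict) -> None:
    """Phase (4): the controller's reply updates both flow tables."""
    tcam[flow_key] = action
    sdram[flow_key] = action
```

The three outcomes correspond to the three service paths whose rates appear in the model: the specialised hardware rate for a TCAM hit, the CPU rate for an SDRAM hit, and the controller rate for a miss in both tables.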

Based on these generic models, this paper uses queueing theory to investigate SDN software and hardware switches that support internal buffering. Additionally, a priority queueing structure is used for the CPU that handles data and control packets, and buffer dimensioning is performed to calculate the minimum queue capacity for the switch, as discussed in the following subsection.

2.2. Buffer Dimensioning

The aim of buffer dimensioning in a queueing network is to determine the buffer size \( K \) that keeps losses due to queueing below a desired loss probability. In an SDN queueing network, it is of prime importance to avoid losing control packets, which carry the updated flow table information. The desired loss probability for the outgoing link is given as a bit error rate (BER), which is \( 10^{-12} \) for a 1 Gbps link according to the IEEE 802.3 standard [44]. In this paper, we use this BER value for buffer dimensioning.

For buffer dimensioning, each buffer is first assumed to be an infinite queue, and the queue is then truncated at a finite integer \( K \) such that the desired loss probability is achieved [45, 46]. The required buffer space is measured in packets.

The minimum queue capacity for a switch (denoted by \( K_{\text{min}} \)) can be derived using an infinite queue model (i.e. an M/M/1 queue). However, losses in queues are typically expressed as a Packet Error Ratio (PER), whereas losses on outgoing links are expressed as a BER. The relationship between BER and PER is given as:

\[
\text{PER} = 1 - (1 - \text{BER})^n,
\]

where \( n \) is the number of bits in the packet. For an M/M/1 queue with server utilisation \( \rho \) (\( 0 < \rho < 1 \)), the probability that the queue length \( L \) reaches or exceeds \( K_{\text{min}} \) is \( P(L \geq K_{\text{min}}) = \rho^{K_{\text{min}}} \). Requiring this probability to be no larger than the PER, i.e. \( \rho^{K_{\text{min}}} \leq \text{PER} \), and taking logarithms (both \( \log \rho \) and \( \log \text{PER} \) are negative, so the inequality reverses when dividing by \( \log \rho \)), the value of \( K_{\text{min}} \) is calculated as

\[
K_{\text{min}} \geq \frac{\log(\text{PER})}{\log(\rho)}.
\]

In this paper, \( K_{\text{min}} \) determines the minimum queue capacity of the switch in the queueing model.
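The dimensioning rule can be evaluated directly. The sketch below computes the PER from the BER and returns the smallest integer \( K \) with \( \rho^{K} \leq \text{PER} \), i.e. the ceiling of \( \log(\text{PER})/\log(\rho) \). The function names are hypothetical, and the 1500-byte packet size and utilisation \( \rho = 0.8 \) used in the example are illustrative assumptions, not parameters from this paper.

```python
import math


def packet_error_ratio(ber: float, n_bits: int) -> float:
    """PER = 1 - (1 - BER)^n, evaluated with log1p/expm1 to avoid round-off."""
    return -math.expm1(n_bits * math.log1p(-ber))


def k_min(ber: float, n_bits: int, rho: float) -> int:
    """Smallest integer K with rho**K <= PER (M/M/1 tail approximation)."""
    if not 0.0 < rho < 1.0:
        raise ValueError("the M/M/1 approximation requires 0 < rho < 1")
    per = packet_error_ratio(ber, n_bits)
    # log(per) and log(rho) are both negative, so the ratio is positive.
    return math.ceil(math.log(per) / math.log(rho))
```

For example, with BER \( = 10^{-12} \) and 12000-bit packets, the PER is about \( 1.2 \times 10^{-8} \), and `k_min(1e-12, 12000, 0.8)` gives 82 packets. Computing \( 1 - (1 - \text{BER})^n \) naively would lose most significant digits at such a small BER, which is why `log1p` and `expm1` are used.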

2.3. Quasi-Birth-Death process

In queueing theory, Quasi-Birth-Death (QBD) processes have been widely used to model computer networks because of the flexibility they provide to account for a larger amount of detail [47, 48]. For this reason, the modelling approach in this paper is based on QBD processes, and this subsection describes the notation and concepts behind them.

A QBD process is a continuous-time Markov chain with a multidimensional state space that can be partitioned into disjoint levels [49]. A continuous-time QBD process is a two-dimensional Markov chain represented as \( \{(X_t, Y_t), t \geq 0\} \) with the state space \( \mathbb{S} = \{(i, j) \in \{0, 1, \ldots, K\} \times \{0, 1, \ldots, L\} \} \), where \( i \) and \( j \) denote the level and phase variables of the process, respectively [50]. Similarly, \( K \) and \( L \) determine the capacities of the level and phase variables, respectively, which can be finite or infinite.

In queueing networks, a QBD process can be multi-dimensional, with one level variable and multi-dimensional phase variables, whereby the phase variables track the occupancy of the other nodes or queues in the network. For \( N \) queues, the state of the network can be represented by the vector \( n = (n_1, n_2, \ldots, n_N) \), where \( n_i \) is the number of packets in queue \( i \). If queue \( 1 \) is the queue of interest for the analysis, then the packets at queue \( 1 \) are represented by the level variable, and the packets at the other queues are represented by the phase vector \( r = (n_2, n_3, \ldots, n_N) \) [51].

In a QBD process, transitions are restricted to states within the same level or in adjacent levels. If the transition rates of a QBD process are independent of the level, the QBD process is homogeneous, or level-independent. Conversely, if the transition rates depend on the level, the QBD process is non-homogeneous, or level-dependent [52].

For an SDN switch with the internal buffer, the level variable tracks the number of packets in the internal buffer of the switch, and the phase variables track the number of packets in the switch (excluding the internal buffer) and in the controller. Because the numbers of packets in the controller and the switch depend on the number of packets temporarily held in the internal buffer, the QBD process for an SDN switch with the internal buffer is non-homogeneous.

Using standard QBD notation [53], the transition rate matrix of a non-homogeneous QBD process is given by the infinitesimal generator matrix, denoted as \( G \), which has the repetitive block-tridiagonal structure:

\[
G = \begin{bmatrix}
B_1 & A_0^{(0)} & 0 & 0 & \cdots \\
A_2^{(1)} & A_1^{(1)} & A_0^{(1)} & 0 & \cdots \\
0 & A_2^{(2)} & A_1^{(2)} & A_0^{(2)} & \cdots \\
\vdots & \ddots & \ddots & \ddots & \ddots
\end{bmatrix},
\]

where \( A_0^{(i)} \), \( A_1^{(i)} \), and \( A_2^{(i)} \) are the sub-matrices associated with level \( i \geq 0 \); the superscript marks their dependence on the level. The sub-matrices \( A_0^{(i)} \), \( A_1^{(i)} \), and \( A_2^{(i)} \) contain the phase transition rates when the level variable increases by 1 (i.e. \( i \rightarrow i + 1 \)), remains unchanged (i.e. \( i \rightarrow i \)), and decreases by 1 (i.e. \( i \rightarrow i - 1 \) for \( i > 0 \)), respectively. The sub-matrices \( A_0^{(i)} \) and \( A_2^{(i)} \) are non-negative, whereas \( B_1 \) and \( A_1^{(i)} \) have negative diagonal entries chosen so that every row of \( G \) sums to zero. The use of \( A_0 \), \( A_1 \), \( A_2 \), and \( B_1 \) is standard QBD notation [14, 51, 52]. The sub-matrices \( B_1 \) and \( A_1 \) both represent the phase distribution when the level variable remains unchanged: \( B_1 \) (or \( A_1^{(0)} \)) represents the state of the network when the level variable is 0 (i.e. the internal buffer has no packets in its queue), while \( A_1 \) represents the network when the internal buffer has at least one packet in its queue. Similarly, \( A_0 \) and \( A_2 \) represent the phase distribution when the number of packets in the internal buffer increases or decreases by 1, respectively.
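The block structure of \( G \) can be assembled numerically. The minimal NumPy sketch below uses a hypothetical `assemble_generator` helper: the caller supplies the off-diagonal rates of each block, and the helper recomputes the diagonal of each \( B_1 \)/\( A_1^{(i)} \) block so that every row of \( G \) sums to zero. Blocks may have a different size at each level because, in a level-dependent QBD such as the one for an SDN switch with the internal buffer, the set of admissible phases can change with the level. The one-phase birth-death chain used to check it is an illustration, not a model from this paper.

```python
from typing import List

import numpy as np


def assemble_generator(diag: List[np.ndarray], up: List[np.ndarray],
                       down: List[np.ndarray]) -> np.ndarray:
    """Build a level-dependent QBD generator from its blocks.

    diag[i]: within-level rates for level i (B_1 for i = 0, A_1^(i) otherwise)
    up[i]:   A_0^(i), level i -> i + 1, shape (size_i, size_{i+1})
    down[i]: A_2^(i+1), level i + 1 -> i, shape (size_{i+1}, size_i)
    Diagonals of the diag blocks are ignored and recomputed so rows sum to zero.
    """
    sizes = [d.shape[0] for d in diag]
    off = np.concatenate(([0], np.cumsum(sizes)))
    G = np.zeros((off[-1], off[-1]))
    for i, block in enumerate(diag):
        cur = slice(off[i], off[i + 1])
        G[cur, cur] = block
        if i + 1 < len(diag):
            nxt = slice(off[i + 1], off[i + 2])
            G[cur, nxt] = up[i]
            G[nxt, cur] = down[i]
    np.fill_diagonal(G, 0.0)
    G -= np.diag(G.sum(axis=1))  # generator rows must sum to zero
    return G
```

With one phase per level, arrival rate \( \lambda \) in every \( A_0 \) block and service rate \( \mu \) in every \( A_2 \) block, the result is the familiar tridiagonal generator of an M/M/1/K queue.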

The exact stationary state distribution ($\pi$) of the QBD process is obtained by solving the system $\pi G = 0$ and $\pi \epsilon = 1$, where $\epsilon$ is the column vector with all elements equal to one. These stationary probabilities can be used to determine various performance metrics of the queueing network, such as average delay and throughput.
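For a small, finite (truncated) chain, the system \( \pi G = 0 \), \( \pi \epsilon = 1 \) can be solved directly. The sketch below (hypothetical helper names) replaces one redundant balance equation with the normalisation condition and checks the result against the closed-form M/M/1/K distribution. This dense solve is a swapped-in illustration, not the matrix-geometric solution used in this paper, and its \( O(n^3) \) cost makes it practical only for modest state spaces.

```python
import numpy as np


def stationary_distribution(G: np.ndarray) -> np.ndarray:
    """Solve pi G = 0 subject to pi e = 1 for a finite, irreducible generator G."""
    n = G.shape[0]
    A = G.T.copy()   # pi G = 0  <=>  G^T pi^T = 0
    A[-1, :] = 1.0   # replace one redundant balance equation by normalisation
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)


def mm1k_generator(lam: float, mu: float, K: int) -> np.ndarray:
    """Generator of an M/M/1/K queue (states 0..K), used here only as a check."""
    G = np.zeros((K + 1, K + 1))
    for n in range(K):
        G[n, n + 1] = lam   # arrival
        G[n + 1, n] = mu    # departure
    G -= np.diag(G.sum(axis=1))
    return G
```

Replacing a balance equation is valid because, for an irreducible finite chain, the equations in \( \pi G = 0 \) are linearly dependent and have rank one less than the number of states.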

Throughout this paper, we assume that the controller is modelled as an M/M/1 queue with infinite capacity, and that the CPU of a switch is modelled as a finite-capacity GI/M/1/K queue, which represents independent arrivals with a general inter-arrival time distribution [54]. The total number of hosts connected to the switch is denoted by $N$. External packets from each connected host are assumed to arrive at the switch according to a Poisson process with rate $\lambda$, so the aggregate external arrival rate is $N\lambda$. With probability $\beta$, there is no matching FTE in the switch; the external packet is then temporarily buffered in the internal buffer, and a packet-in message is generated and sent to the controller. The service rates of the CPU, the specialised hardware in the switch, and the controller are denoted by $\mu_{sp}$, $\mu_{sh}$, and $\mu_c$, respectively. The average packet transfer delay and the packet loss rate are the primary performance metrics used to compare SDN software and hardware switches with the internal buffer.

3. Software Switch with the internal buffer: SPQ

We have named our queuing model for a software
switch with the internal buffer as Model SPQ, where
“S” refers to the switch with a software data plane, “P”
refers to use of a two-priority queuing structure, and
“Q” refers to queuing of data packets in the internal
buffer.

As seen in Figure 3, the switch supports internal buffering, and the input buffer of the switch is modelled as finite-capacity, non-preemptive two-priority class queues: Class ES (low priority class for data packets) and Class CS (high priority class for control packets), as in "SPE" in [19].

The packet processing in SPQ can be explained in four steps, as shown in Fig. 3: (1) external data packets arrive at Class ES of the switch from the $N$ hosts connected to the switch; (2) if the switch does not have a matching FTE, the data packet is temporarily buffered in the internal buffer and a fraction of it is forwarded to the controller encapsulated in a "packet-in" control message; otherwise, the data packet is successfully forwarded to its destination through an output port; (3) the controller feeds the forwarding information back with a packet-out message to Class CS of the switch; (4) the switch processes the control packets in Class CS and updates the flow table with the forwarding information, and the temporarily buffered data packets are extracted from the internal buffer and forwarded to the destination through an output port.
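A quick first-order check on these four steps is the offered load at each server. The sketch below rests on our own simplifying assumptions, not on results of the model: no packets are lost, every data packet receives one Class ES service at rate \( \mu_{sp} \), and each table miss adds one packet-in at the controller and one Class CS service at the CPU, also at rate \( \mu_{sp} \). The helper name and the numbers in the example are hypothetical.

```python
def offered_loads(N: int, lam: float, beta: float, mu_sp: float, mu_c: float) -> dict:
    """First-order utilisations for SPQ, ignoring losses (illustrative only)."""
    data_rate = N * lam           # aggregate external arrivals to Class ES
    miss_rate = beta * data_rate  # packet-in messages (and later packet-outs)
    return {
        "controller": miss_rate / mu_c,
        "cpu": (data_rate + miss_rate) / mu_sp,  # Class ES + Class CS work
    }
```

For instance, with 10 hosts at 50 packets/s each, \( \beta = 0.1 \), \( \mu_{sp} = 1000 \) and \( \mu_c = 200 \) packets/s, the controller load is 0.25 and the CPU load is 0.55. Because the controller queue is modelled as infinite-capacity M/M/1, its load must stay below 1 for stability, whereas the finite CPU queue reacts to overload with losses rather than instability.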

SPQ is modelled as a continuous-time Markov process with four state variables,
\[ \{(n_0(t), n_1(t), n_2(t), n_3(t)),\ t \geq 0\}. \]

The state variables denoted by $n_0(t), n_1(t), n_2(t),$ and $n_3(t)$ represent
the number of packets in the internal buffer, controller,
Class CS, and Class ES respectively. Let the Markov
process at time $t$ be defined as:

\[ \{n_0(t), n_1(t), n_2(t), n_3(t)\} = \{w, x, y, z\} \]

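where $w \in \{0, 1, \ldots, K\}$, $x \in \{0, 1, 2, \ldots\}$, $y \in \{0, 1, \ldots, K_1\}$, and $z \in \{0, 1, \ldots, K_2\}$, with $K$, $K_1$, and $K_2$ the capacities of the internal buffer, Class CS, and Class ES, respectively. The number of packets in the controller and Class CS depends on the number of packets in the internal buffer, because every buffered data packet has a corresponding control message that is either a packet-in at the controller or a packet-out waiting in Class CS. Therefore, the state space of the controller and Class CS can be rewritten as $x \in \{0, 1, \ldots, w\}$ and $y = w - x$, subject to $w - x \leq K_1$.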
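Enumerating the state space makes this level dependence explicit. The hypothetical `spq_states` helper below lists every tuple \( (w, x, y, z) \) allowed by the constraints on the controller and Class CS, with \( y = w - x \); the companion `phases_by_level` groups the tuples by \( w \), giving the phase set of each level, whose size grows with \( w \).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

State = Tuple[int, int, int, int]  # (w, x, y, z)


def spq_states(K: int, K1: int, K2: int) -> List[State]:
    """All (w, x, y, z) with 0<=w<=K, 0<=x<=w, y=w-x<=K1, 0<=z<=K2."""
    states = []
    for w in range(K + 1):
        for x in range(w + 1):
            y = w - x  # control messages not at the controller are in Class CS
            if y > K1:
                continue  # Class CS cannot hold that many packet-outs
            for z in range(K2 + 1):
                states.append((w, x, y, z))
    return states


def phases_by_level(states: List[State]) -> Dict[int, List[State]]:
    """Group states by internal-buffer occupancy w (the QBD level)."""
    levels: Dict[int, List[State]] = defaultdict(list)
    for s in states:
        levels[s[0]].append(s)
    return dict(levels)
```

With \( K_1 \geq K \), level \( w \) has \( (w + 1)(K_2 + 1) \) phases, so the blocks \( A_0^{(w)} \), \( A_1^{(w)} \) and \( A_2^{(w)} \) are not square matrices of a common size, which is why the QBD is level-dependent.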

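For example, if the number of packets in the internal buffer at some instant $t$ is 1, i.e. $n_0(t) = 1$, then the permissible state space for the controller and Class CS is $(x, y) \in \{(1, 0), (0, 1)\}$: the control message for the buffered packet is either a packet-in at the controller or a packet-out waiting in Class CS.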

Because of this dependency, the Markov process of SPQ is a non-homogeneous QBD process [51], with the number of packets in the internal buffer as the level variable and the numbers of packets in the controller, Class CS, and Class ES as the phase variables. The permissible transitions for the Markov chain \(\{(n_b(t), n_c(t), n_{cs}(t), n_{es}(t))\}\), i.e. \(\{(n_0(t), n_1(t), n_2(t), n_3(t))\}\), are shown in Table 3, and they are used to derive the sub-matrices \(A_0\), \(A_1\), \(B_1\), and \(A_2\) of the transition generator matrix \(G\) for SPQ. These sub-matrices are inputs to the matrix-geometric solution for computing the stationary distribution \(\pi\), which is used to determine the performance metrics for SPQ.

3.2. Elements of matrix \(A_1\)

The sub-matrix \(A_1\) for SPQ represents the phase distribution of the controller, Class CS, and Class ES when the number of packets in the internal buffer (i.e. \(n_b(t)\) or \(w\) in Eq. (3)) increases by 1:

\[
A_{0(x,x')} = \begin{cases} 
A_{00}^{(x,x')}, & \text{if } x' = x + 1, \\
0, & \text{otherwise},
\end{cases}
\]

where,

\[
A_{00}^{(x,x')} = \begin{cases} 
A_{001}^{(x,x')}, & \text{if } y' = y = 0, \\
0, & \text{otherwise},
\end{cases}
\]

where,

\[
A_{001}^{(x,x')} = \begin{cases} 
\mu_{sp}\beta, & \text{if } z' = z - 1, \\
0, & \text{otherwise}.
\end{cases}
\]

3.2. Elements of matrix \(A_1\)

The sub-matrix \(A_1\) for SPQ represents the phase distribution of the controller, Class CS, and Class ES when the number of packets in the internal buffer remains unchanged and the buffer is not empty (i.e. \(n_0(t)\), or \(w\) in Eq. (3), is a positive integer that remains unchanged):

\[
A_{1(x,x')} = \begin{cases} 
A_{11}^{(x,x')}, & \text{if } x' = x, \\
A_{12}^{(x,x')}, & \text{if } x' = x - 1, \\
0, & \text{otherwise},
\end{cases}
\]

where,

\[
A_{11}^{(x,y')} = \begin{cases} 
A_{111}^{(x,y')}, & \text{if } y' = y, \\
0, & \text{otherwise},
\end{cases}
\]

and

\[
A_{12}^{(x,y')} = \begin{cases} 
A_{120}^{(x,y')}, & \text{if } y' = y + 1, \\
0, & \text{otherwise},
\end{cases}
\]

where,

\[
A_{111}^{(x,y,z')} = \begin{cases} 
N\lambda, & \text{if } z' = z + 1, \\
\mu_{sp}(1 - \beta), & \text{if } y = 0, z' = z - 1, \\
0, & \text{otherwise}.
\end{cases}
\]

and

\[
A_{120}^{(x,y,z')} = \begin{cases} 
\mu_{c}, & \text{if } z' = z, \\
0, & \text{otherwise}.
\end{cases}
\]

The diagonal elements of \(A_{111}^{(x,y,z')}\), where \(z' = z\), have four distinct cases:

(i) when there is no packet in the controller, i.e. \(x = 0\) (so that Class CS holds \(y = w\) packets),

\[
A_{111}^{(x,w)}(i) = \begin{cases} 
-N \lambda - \mu_{sp}, & 0 \leq z < K_2; \\
-\mu_{sp}, & z = K_2; \\
0, & \text{otherwise},
\end{cases}
\]

(ii) when the number of packets in the controller is positive but less than that in the internal buffer, i.e. \(0 < x < w\),

\[
A_{111}^{(x,w)}(ii) = \begin{cases} 
-N \lambda - \mu_{sp} - \mu_c, & 0 \leq z < K_2; \\
-\mu_{sp} - \mu_c, & z = K_2; \\
0, & \text{otherwise},
\end{cases}
\]

(iii) when the number of packets in the controller is equal to that in the internal buffer which is not full, i.e. \(x = w\) and \(w < K_3\),

\[
A_{111}^{(x,w)}(iii) = \begin{cases} 
-N \lambda - \mu_c, & z = 0; \\
-N \lambda - \mu_{sp} - \mu_c, & 0 < z < K_2; \\
-\mu_{sp} - \mu_c, & z = K_2; \\
0, & \text{otherwise},
\end{cases}
\]

(iv) when the number of packets in the controller and the internal buffer are equal to the queue size of the internal buffer, i.e. \(x = w = K_3\),

\[
A_{111}^{(x,w)}(iv) = \begin{cases} 
-N \lambda - \mu_c, & z = 0; \\
-N \lambda - \mu_{sp}(1 - \beta) - \mu_c, & 0 < z < K_2; \\
-\mu_{sp}(1 - \beta) - \mu_c, & z = K_2; \\
0, & \text{otherwise},
\end{cases}
\]

3.3. Elements of matrix \(B_1\)

The sub-matrix \(B_1\) for SPQ represents the phase distribution of the controller, Class CS and Class ES when the number of packets in the internal buffer is unchanged and the internal buffer is empty (i.e. \(n_0(t)\), or \(w\) in Eq. (3), is equal to 0):

\[
B_{1(x,x')} = \begin{cases} 
B_{11}^{(x)}, & x' = x = 0, \\
0, & \text{otherwise},
\end{cases}
\]

where,

\[
B_{11}^{(0,y')} = \begin{cases} 
B_{111}^{(y)}, & y' = y = 0, \\
0, & \text{otherwise},
\end{cases}
\]

and

\[
B_{111}^{(0,y')} = \begin{cases} 
N \lambda, & z' = z + 1, \\
\mu_{sp}(1 - \beta), & z' = z - 1, \\
0, & \text{otherwise}.
\end{cases}
\]

The diagonal elements of \(B_{111}^{(0,y')}\), where \(z' = z\), are expressed as

\[
B_{111}^{(0,y')} = \begin{cases} 
-N \lambda, & z = 0, \\
-N \lambda - \mu_{sp}, & 0 < z < K_2, \\
-\mu_{sp}, & z = K_2, \\
0, & \text{otherwise}.
\end{cases}
\]

3.4. Elements of matrix \(A_2\)

The sub-matrix \(A_2\) for SPQ represents the phase distribution of the controller, Class CS and Class ES when the number of packets in the internal buffer (i.e. \(n_0(t)\), or \(w\) in Eq. (3)) decreases by 1:

\[
A_{2(x,x')} = \begin{cases} 
A_{21}^{(x)}, & x' = x, \\
0, & \text{otherwise},
\end{cases}
\]

where,

\[
A_{21}^{(x)} = \begin{cases} 
A_{212}^{(x)}, & y' = y - 1, \\
0, & \text{otherwise},
\end{cases}
\]

where,

\[
A_{212}^{(x)} = \begin{cases} 
\mu_{sp}, & z' = z, \\
0, & \text{otherwise}.
\end{cases}
\]

3.5. Performance Metrics for SPQ

The throughputs of Class CS (\(T_{cs}\)) and the internal buffer (\(T_b\)) for SPQ are the same, because we assume that a data packet in the internal buffer is extracted instantaneously after a control packet in Class CS has been processed. This assumption is reflected in the permissible transitions for SPQ shown in Table 3. The throughput of the internal buffer for SPQ is therefore the CPU service rate \(\mu_{sp}\) multiplied by the probability that Class CS holds at least one control packet, i.e. that at least one buffered data packet is ready to be forwarded (with \(\pi\) taken as zero outside the permissible state space):

\[
T_b = T_{cs} = \mu_{sp} \sum_{w=1}^{K_3} \sum_{x=0}^{w-1} \sum_{z=0}^{K_2} \pi_{w,x,w-x,z}.
\]

Similarly, the throughput of the controller (\(T_c\)) for SPQ is \(\mu_c\) multiplied by the probability that the controller has at least one control packet to process, which implies that at least one data packet is temporarily buffered in the internal buffer:

\[ T_c = \mu_c \sum_{w=1}^{K_3} \sum_{x=1}^{w} \sum_{z=0}^{K_2} \pi_{w,x,w-x,z}. \quad (5) \]

Also, the throughput of Class ES (\( T_{es} \)) for SPQ is \( \mu_{sp} \) multiplied by the probability that Class ES has at least one data packet to forward and Class CS is empty in the stationary state:

\[ T_{es} = \mu_{sp} \sum_{w=0}^{K_3} \sum_{z=1}^{K_2} \pi_{w,w,0,z}. \quad (6) \]

The average number of data packets in SPQ, \( E[L]_{SPQ} \), counts only the queues that data packets traverse in the switch, i.e. Class ES and the internal buffer. Therefore, \( E[L]_{SPQ} \) is expressed as:

\[ E[L]_{SPQ} = \sum_{w=0}^{K_3} \sum_{x=0}^{w} \sum_{z=0}^{K_2} (w + z)\, \pi_{w,x,w-x,z}. \quad (7) \]

Again, applying Little’s theorem to Eq. (7) yields the average packet transfer delay in SPQ at the switch (i.e. the mean sojourn time of a packet), denoted by \( t_{SPQ} \):

\[ t_{SPQ} = E[L]_{SPQ}/T_{SPQ}, \quad (8) \]

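where \( T_{SPQ} \) is the throughput of SPQ, i.e. the rate at which data packets leave the switch, either from the internal buffer or directly from Class ES after a flow table hit:

\[ T_{SPQ} = T_b + (1 - \beta) T_{es}. \quad (9) \]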

Similarly, the average packet loss rates of Class CS \( (PL_{cs}) \), Class ES \( (PL_{es}) \), and the internal buffer \( (PL_{ib}) \) represent the fraction of incoming packets that are blocked or dropped by Class CS, Class ES, and the CPU’s internal buffer, respectively. The loss rates \( PL_{cs}, PL_{es}, \) and \( PL_{ib} \) for SPQ are expressed as:

\[ PL_{cs} = PL_{ib} = 1 - T_{cs}/T_c, \]

\[ PL_{es} = 1 - T_{es}/(N\lambda). \quad (10) \]

Assuming independence between the arrivals at Class CS, Class ES and the internal buffer, the total packet loss rate for SPQ \( (PL_{SPQ}) \) is the sum of the packet loss rates in Class CS, Class ES and the internal buffer:

\[ PL_{SPQ} = PL_{cs} + PL_{es} + PL_{ib}. \quad (11) \]
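For small capacities, the sub-matrices of Sections 3.1 to 3.4 and the metrics above can be cross-checked numerically. The sketch below is illustrative only: it builds the whole SPQ generator \(G\) from the transition rules, sets each diagonal entry to the negative row sum, and solves \(\pi G = 0\) with `numpy.linalg.lstsq` instead of the matrix-geometric method. The function names are hypothetical, and two blocking rules are assumptions: a table miss cannot enter a full internal buffer, and a controller completion waits while Class CS is full. The throughput of SPQ is taken as buffer releases plus table-hit departures from Class ES.

```python
import numpy as np


def spq_generator(N, lam, mu_sp, mu_c, beta, K1, K2, K3):
    """Assemble the SPQ generator over states (w, x, y, z) with y = w - x."""
    states = [(w, x, w - x, z)
              for w in range(K3 + 1)
              for x in range(w + 1) if w - x <= K1
              for z in range(K2 + 1)]
    index = {s: i for i, s in enumerate(states)}
    G = np.zeros((len(states), len(states)))

    def add(src, dst, rate):
        G[index[src], index[dst]] += rate

    for s in states:
        w, x, y, z = s
        if z < K2:                                  # external arrival at Class ES
            add(s, (w, x, y, z + 1), N * lam)
        if y > 0:                                   # CPU serves Class CS first (priority);
            add(s, (w - 1, x, y - 1, z), mu_sp)     # the buffered packet then leaves (A_2)
        elif z > 0:                                 # otherwise the CPU serves Class ES
            add(s, (w, x, y, z - 1), mu_sp * (1 - beta))         # table hit
            if w < K3:                              # table miss: buffer it, send packet-in (A_0)
                add(s, (w + 1, x + 1, y, z - 1), mu_sp * beta)
        if x > 0 and y < K1:                        # controller returns a packet-out to Class CS
            add(s, (w, x - 1, y + 1, z), mu_c)
    np.fill_diagonal(G, -G.sum(axis=1))             # generator rows sum to zero
    return states, G


def stationary(G):
    """Solve pi G = 0 with sum(pi) = 1 by least squares on the stacked system."""
    n = G.shape[0]
    A = np.vstack([G.T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def spq_metrics(states, pi, N, lam, mu_sp, mu_c, beta):
    """Evaluate the SPQ throughputs, delay and loss on a stationary vector pi."""
    def prob(cond):
        return sum(p for s, p in zip(states, pi) if cond(*s))

    T_b = mu_sp * prob(lambda w, x, y, z: y >= 1)              # T_b = T_cs
    T_c = mu_c * prob(lambda w, x, y, z: x >= 1)               # Eq. (5)
    T_es = mu_sp * prob(lambda w, x, y, z: y == 0 and z >= 1)  # Eq. (6)
    EL = sum((s[0] + s[3]) * p for s, p in zip(states, pi))    # Eq. (7): w + z
    T_spq = T_b + (1 - beta) * T_es     # buffer releases + Class ES table hits
    PL_cs = PL_ib = 1 - T_b / T_c
    PL_es = 1 - T_es / (N * lam)
    return {"T_b": T_b, "T_c": T_c, "T_es": T_es,
            "t_spq": EL / T_spq,                                # Little's theorem
            "PL_spq": PL_cs + PL_es + PL_ib}                    # Eq. (11)


if __name__ == "__main__":
    params = dict(N=4, lam=12.0, mu_sp=100.0, mu_c=100.0, beta=0.5)
    states, G = spq_generator(**params, K1=3, K2=4, K3=3)
    print(spq_metrics(states, stationary(G), **params))
```

In the generated matrix, for example, the diagonal entry of state \((w, x, y, z) = (1, 1, 0, 0)\) is \(-(N\lambda + \mu_c)\), because only an external arrival or a controller completion can occur there. A direct solve scales poorly with the capacities, so a level-by-level method is preferable for realistic buffer sizes.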

4. Hardware Switch with internal buffer: HPQ

Similar to “SPQ” for a software switch with the internal buffer, we name the queueing model for a hardware switch with the internal buffer Model HPQ, where “H” refers to a hardware data plane. HPQ extends SPQ with one additional server and queue for the specialised hardware, modelled as an M/M/1/K queue.

As shown in Figure 4, the switch has two servers: one for the specialised hardware (referred to as the hardware processor, with service rate \( \mu_{sh} \)) and the other for the CPU (referred to as the CPU processor, with service rate \( \mu_{sp} \)). As in SPQ, the CPU is modelled as a finite-capacity, non-preemptive priority queue with two classes: Class HP (analogous to Class ES in SPQ) has low priority, and Class CP (analogous to Class CS in SPQ) has high priority.

![Figure 4: HPQ–An SDN hardware switch with internal buffer.](image)

The packet processing in HPQ can be explained in five steps, as shown in Fig. 4: (1) external data packets arrive at the specialised hardware of the switch from the \( N \) hosts connected to the switch; (2) a data packet is forwarded to Class HP of the CPU if the specialised hardware does not have a matching FTE, and otherwise to its destination through an output port; (3) the data packet is temporarily buffered in the internal memory, and a fraction of it is forwarded to the controller, encapsulated in a packet-in message; (4) the controller feeds back the forwarding information in a packet-out message to Class CP of the CPU; (5) finally, the CPU processes the control packets in Class CP, updates the software flow table and synchronises it with the flow table in the specialised hardware, extracts the temporarily buffered data packets from the internal buffer, and forwards them to the destination through an output port.

HPQ is modelled as a continuous time Markov process with five state variables, \((n_0(t), n_1(t), n_{es}(t), n_{cs}(t), n_{sh}(t)), t \geq 0\). The state variables denoted by \(n_0(t), n_1(t), n_{es}(t), n_{cs}(t), \) and \(n_{sh}(t)\) represent the number of packets in the internal buffer, controller, Class CP, Class HP, and the specialised hardware respectively.

Similar to SPQ, the queue capacities of the internal buffer, Class CP and Class HP are \(K_3, K_1,\) and \(K_2\), respectively, and the controller is assumed to have infinite capacity. The queue capacity of the specialised hardware is \(K_4\). Let the Markov process at time \(t\) be defined as:

\[
[n_0(t), n_1(t), n_{es}(t), n_{cs}(t), n_{sh}(t)] = [v, w, x, y, z] \quad (12)
\]

where \(v \in \{0, 1, \dots, K_3\}\), \(w \in \mathbb{Z}_{\geq 0}\), \(x \in \{0, 1, \dots, K_1\}\), \(y \in \{0, 1, \dots, K_2\}\), and \(z \in \{0, 1, \dots, K_4\}\). The number of packets in the controller and Class CP depends on the number of temporarily buffered packets in the internal buffer; therefore, the state spaces of the internal buffer, controller and Class CP can be rewritten as \(v \in \{0, 1, \dots, K_3\}\), \(w \in \{0, 1, \dots, v\}\), and \(x = v - w \leq K_1\).

Due to the dependency of \(n_1(t)\) and \(n_{es}(t)\) on the internal buffer, the process governing the number of packets in HPQ is also a nonhomogeneous QBD process, with the internal buffer as the level variable and the controller, Class CP, Class HP, and the specialised hardware as phase variables. The permissible transitions of the Markov chain \((n_0(t), n_1(t), n_{es}(t), n_{cs}(t), n_{sh}(t))\) are shown in Table 4. These transitions determine the sub-matrices \(A_0, A_1, B_1\) and \(A_2\) of the generator matrix \(G\) for HPQ, which are the inputs to the matrix-geometric solution that computes the stationary probability distribution \(\pi\) used to determine the performance metrics for HPQ.
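Because the number of phases differs between levels, the blocks \(A_0\), \(A_1\) and \(A_2\) must be indexed by the level. One standard way to obtain \(\pi\) for a finite level-dependent QBD is linear level reduction, sketched below in Python as an illustrative stand-in for the matrix-geometric computation (the authors' implementation may differ). The argument names are hypothetical: `local[0]` plays the role of \(B_1\), `local[k]` of \(A_1\) at level \(k\), `up[k]` of \(A_0\) from level \(k\), and `down[k]` of \(A_2\) from level \(k+1\); blocks may be rectangular.

```python
import numpy as np


def level_reduction(local, up, down):
    """Stationary distribution of a finite level-dependent QBD with levels 0..K.

    local[k]: transitions within level k (k = 0..K); local[0] acts as B_1
    up[k]:    transitions from level k to level k + 1 (k = 0..K-1), like A_0
    down[k]:  transitions from level k + 1 to level k (k = 0..K-1), like A_2
    Returns a list of row vectors pi[k] that together sum to one.
    """
    K = len(local) - 1
    R = [None] * K
    S = local[K]                               # level K censored on itself
    for k in range(K - 1, -1, -1):
        # pi_{k+1} = pi_k R[k] with R[k] = up[k] (-S_{k+1})^{-1}
        R[k] = np.linalg.solve(-S.T, up[k].T).T
        S = local[k] + R[k] @ down[k]          # censored block S_k
    # pi_0 S_0 = 0, provisionally normalised so that pi_0 sums to one
    n0 = S.shape[0]
    A = np.vstack([S.T, np.ones((1, n0))])
    b = np.zeros(n0 + 1)
    b[-1] = 1.0
    pi0, *_ = np.linalg.lstsq(A, b, rcond=None)
    pis = [pi0]
    for k in range(K):
        pis.append(pis[-1] @ R[k])
    total = sum(p.sum() for p in pis)
    return [p / total for p in pis]


if __name__ == "__main__":
    # M/M/1/K check: one phase per level, so pi_k should be proportional to rho**k
    lam, mu, K = 1.0, 2.0, 5
    local = ([np.array([[-lam]])] + [np.array([[-(lam + mu)]])] * (K - 1)
             + [np.array([[-mu]])])
    up = [np.array([[lam]])] * K
    down = [np.array([[mu]])] * K
    print([round(float(p[0]), 4) for p in level_reduction(local, up, down)])
```

The M/M/1/K check in the main block works because that queue is a QBD with one phase per level, so the result should match the truncated geometric distribution \(\pi_k \propto \rho^k\).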

4.1. Elements of matrix \(A_0\)

The sub-matrix \(A_0\) for HPQ represents the phase distribution of the controller, Class CP, Class HP, and the specialised hardware when the number of packets in the internal buffer (i.e., \(n_0(t)\) or \(v\) in Eq. (12)) increases by 1:

\[
A_0\{w, w'\} = \begin{cases} 
A_{00}^{(w)}, & w' = w + 1, \\
0, & \text{otherwise}, 
\end{cases}
\]

where,

\[
A_{00}^{(w)} = \begin{cases} 
A_{001}^{(w)}, & x' = x = 0, \\
0, & \text{otherwise}, 
\end{cases}
\]

where,

\[
A_{001}^{(w)} = \begin{cases} 
A_{0012}^{(w)}, & y' = y - 1, \\
0, & \text{otherwise}, 
\end{cases}
\]

where,

\[
A_{0012}^{(w)} = \begin{cases} 
\mu_{sp}, & z' = z, \\
0, & \text{otherwise}. 
\end{cases}
\]
Table 4: Permissible transitions for HPQ:

- One packet arrives at the switch hardware.
- One packet departs from the hardware out of the system (HPQ).
- One packet arrives at Class HP for CPU processing.
- One packet departs from Class HP to the internal buffer, and one packet-in message is sent to the controller.
- One packet is serviced by the controller and sent to Class CP.
- One packet-out message in Class CP is processed, and one packet departs from the internal buffer out of the system (HPQ).

4.2. Elements of matrix \(A_1\)

The diagonal elements of \( A_{1111}^{(z',z)} \), where \( z' = z \), have four distinct cases:

(i) when there is no packet in the controller (i.e. \( n_1(t) \), or \( w \) in Eq. (12), is equal to 0),

\[ A_{1111}^{(z',z)} = \begin{cases}  
-N\lambda - \mu_{sp}, & 0 \leq y \leq K_2, z = 0; \\
-N\lambda - \mu_{sh} - \mu_{sp}, & 0 \leq y < K_2, 0 < z < K_4; \\
-N\lambda - \mu_{sh}(1 - \beta) - \mu_{sp}, & y = K_2, 0 < z < K_4; \\
-\mu_{sh} - \mu_{sp}, & 0 \leq y < K_2, z = K_4; \\
-\mu_{sh}(1 - \beta) - \mu_{sp}, & y = K_2, z = K_4; \\
0, & \text{otherwise},
\end{cases} \]

(ii) when the number of packets in the controller is less than that in the internal buffer, i.e. \( 0 < w < v \): since both Class CP and the controller are non-empty, the diagonal elements are those of case (i) with an additional \( -\mu_c \) term in every row.

(iii) when the number of packets in the controller is equal to that in the internal buffer which is not full, i.e. \( w = v < K_3 \) (so Class CP is empty),
\[ \begin{align*}
A_{1111}^{(z',z)} &= \begin{cases}
-N\lambda - \mu_c, & y = 0, z = 0; \\
-N\lambda - \mu_{sh} - \mu_c, & y = 0, 0 < z < K_4; \\
-\mu_{sh} - \mu_c, & y = 0, z = K_4; \\
-N\lambda - \mu_{sp} - \mu_c, & 0 < y \leq K_2, z = 0; \\
-N\lambda - \mu_{sp} - \mu_{sh} - \mu_c, & 0 < y < K_2, 0 < z < K_4; \\
-N\lambda - \mu_{sp} - \mu_{sh}(1 - \beta) - \mu_c, & y = K_2, 0 < z < K_4; \\
-\mu_{sp} - \mu_{sh} - \mu_c, & 0 < y < K_2, z = K_4; \\
-\mu_{sp} - \mu_{sh}(1 - \beta) - \mu_c, & y = K_2, z = K_4; \\
0, & \text{otherwise},
\end{cases}
\end{align*} \]

(iv) when the number of packets in the controller and the internal buffer are equal to the queue size of the internal buffer, i.e. \( w = v = K_3 \) (so Class CP is empty and Class HP cannot be served while the internal buffer is full),

\[ \begin{align*}
A_{1111}^{(z',z)} &= \begin{cases}
-N\lambda - \mu_c, & 0 \leq y \leq K_2, z = 0; \\
-N\lambda - \mu_{sh} - \mu_c, & 0 \leq y < K_2, 0 < z < K_4; \\
-N\lambda - \mu_{sh}(1 - \beta) - \mu_c, & y = K_2, 0 < z < K_4; \\
-\mu_{sh} - \mu_c, & 0 \leq y < K_2, z = K_4; \\
-\mu_{sh}(1 - \beta) - \mu_c, & y = K_2, z = K_4; \\
0, & \text{otherwise},
\end{cases}
\end{align*} \]

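4.3. Elements of matrix \(B_1\)

The sub-matrix \(B_1\) for HPQ represents the phase distribution of the controller, Class CP, Class HP, and the specialised hardware when the internal buffer is empty and remains empty (i.e. \(n_0(t)\), or \(v\) in Eq. (12), is equal to 0). Its innermost block, for transitions \(z \to z'\) of the specialised hardware queue, is

\[ B_{1111}^{(z',z)} = \begin{cases}
N\lambda, & z' = z + 1, \\
\mu_{sh}(1 - \beta), & z' = z - 1, \\
0, & \text{otherwise}.
\end{cases} \]

The diagonal elements of \( B_{1111}^{(z',z)} \), where \( z' = z \), are expressed as

\[ \begin{align*}
B_{1111}^{(z',z)} &= \begin{cases}
-N\lambda, & y = 0, z = 0; \\
-N\lambda - \mu_{sh}, & y = 0, 0 < z < K_4; \\
-\mu_{sh}, & y = 0, z = K_4; \\
-N\lambda - \mu_{sp}, & 0 < y \leq K_2, z = 0; \\
-N\lambda - \mu_{sh} - \mu_{sp}, & 0 < y < K_2, 0 < z < K_4; \\
-N\lambda - \mu_{sh}(1 - \beta) - \mu_{sp}, & y = K_2, 0 < z < K_4; \\
-\mu_{sh} - \mu_{sp}, & 0 < y < K_2, z = K_4; \\
-\mu_{sh}(1 - \beta) - \mu_{sp}, & y = K_2, z = K_4; \\
0, & \text{otherwise}.
\end{cases}
\end{align*} \]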

4.4. Elements of matrix \( A_2 \)

The sub-matrix \( A_2 \) for HPQ represents the phase distribution of the controller, Class CP, Class HP, and the specialised hardware when the number of packets in the internal buffer (i.e. \( n_0(t) \) or \( v \) in Eq. (12)) decreases by 1:

\[ A_{2(w,w')} = \begin{cases}
A_{21}^{(w)}, & w' = w, \\
0, & \text{otherwise},
\end{cases} \]

where,

\[ A_{21}^{(w)} = \begin{cases}
A_{212}^{(x)}, & x' = x - 1, \\
0, & \text{otherwise}.
\end{cases} \]

where,

\[ A_{212}^{(x)} = \begin{cases}
A_{2121}^{(y)}, & y' = y, \\
0, & \text{otherwise}.
\end{cases} \]

where,

\[ A_{2121}^{(y)} = \begin{cases}
\mu_{sp}, & z' = z, \\
0, & \text{otherwise}.
\end{cases} \]
4.5. Performance Metrics for HPQ

Like SPQ, the throughputs of the Class CP ($T_{cp}$) and the internal buffer ($T_{ib}$) for HPQ are the same. The throughput of the internal buffer for HPQ is the service rate $\mu_{ib}$ multiplied by the probability that Class CP holds at least one control packet, i.e. that at least one buffered data packet is ready to be forwarded (with $\pi$ taken as zero outside the permissible state space):

$$T_{ib} = T_{cp} = \mu_{ib} \sum_{v=1}^{K_3} \sum_{w=0}^{v-1} \sum_{y=0}^{K_2} \sum_{z=0}^{K_4} \pi_{v,w,v-w,y,z}.$$  \hspace{1cm} (13)

Similarly, the throughput of the controller ($T_c$) for HPQ is $\mu_c$ multiplied by the probability that the controller has at least one control packet to process, which implies that at least one data packet is temporarily buffered in the internal buffer:

$$T_c = \mu_c \sum_{v=1}^{K_3} \sum_{w=1}^{v} \sum_{y=0}^{K_2} \sum_{z=0}^{K_4} \pi_{v,w,v-w,y,z}.$$  \hspace{1cm} (14)

Also, the throughput of Class HP ($T_{hp}$) for HPQ is $\mu_{hp}$ multiplied by the probability that Class HP has at least one data packet to forward and Class CP is empty in the stationary state:

$$T_{hp} = \mu_{hp} \sum_{v=0}^{K_3} \sum_{y=1}^{K_2} \sum_{z=0}^{K_4} \pi_{v,v,0,y,z}.$$  \hspace{1cm} (15)

Finally, the throughput of the specialised hardware ($T_{sh}$) for HPQ is $\mu_{sh}$ multiplied by the probability that the specialised hardware has at least one data packet to forward:

$$T_{sh} = \mu_{sh} \sum_{v=0}^{K_3} \sum_{w=0}^{v} \sum_{y=0}^{K_2} \sum_{z=1}^{K_4} \pi_{v,w,v-w,y,z}.$$  \hspace{1cm} (16)

The average number of data packets in HPQ, $E[L]_{HPQ}$, counts only the queues that data packets traverse: the specialised hardware (TCAM) and the CPU (Class HP and the internal buffer). Therefore, $E[L]_{HPQ}$ is expressed as:

$$E[L]_{HPQ} = \sum_{v=0}^{K_3} \sum_{w=0}^{v} \sum_{y=0}^{K_2} \sum_{z=0}^{K_4} (v + y + z)\,\pi_{v,w,v-w,y,z}.$$  \hspace{1cm} (17)

Again, applying Little’s theorem to Eq. (17) yields the average packet transfer delay in HPQ at the switch (i.e. the mean sojourn time of a packet), denoted by $t_{HPQ}$:

$$t_{HPQ} = E[L]_{HPQ}/T_{HPQ},$$  \hspace{1cm} (18)

where $T_{HPQ}$ is the throughput of HPQ, i.e. the rate at which data packets leave the switch, either from the internal buffer or directly from the specialised hardware after a flow table hit:

$$T_{HPQ} = T_{ib} + (1 - \beta)T_{sh}.$$  \hspace{1cm} (19)

Similarly, assuming independence of packet arrivals between the Class CP, Class HP, internal buffer and the specialised hardware queue, the average packet loss rates of the Class CP ($PL_{cp}$), Class HP ($PL_{hp}$), internal buffer ($PL_{ib}$) and the specialised hardware queue ($PL_{sh}$) represent the fraction of incoming packets blocked or dropped by the Class CP, Class HP, internal buffer and the specialised hardware queue, respectively. The packet loss rates $PL_{cp}$, $PL_{hp}$, $PL_{ib}$ and $PL_{sh}$ for HPQ are expressed as,

$$PL_{cp} = PL_{ib} = 1 - T_{cp}/T_c,$$

$$PL_{hp} = 1 - T_{hp}/(\beta T_{sh}),$$

$$PL_{sh} = 1 - T_{sh}/(N\lambda).$$  \hspace{1cm} (20)

Therefore, the total packet loss rate for HPQ ($PL_{HPQ}$) is the sum of packet loss rate in the Class CP, Class HP, internal buffer and the specialised hardware queue of the switch which is given as,

$$PL_{HPQ} = PL_{cp} + PL_{hp} + PL_{ib} + PL_{sh}.$$  \hspace{1cm} (21)
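The same numerical cross-check can be sketched for HPQ. As before, this is an illustrative direct solve with hypothetical function names, under stated assumptions: one CPU rate \(\mu_{sp}\) serves both classes (so \(\mu_{ib} = \mu_{hp} = \mu_{sp}\)); a hardware table miss is blocked while Class HP is full; the CPU does not serve Class HP while the internal buffer is full; and a controller completion waits while Class CP is full. The loss rate of Class HP is computed against its arrival rate \(\beta T_{sh}\), and the throughput of HPQ as buffer releases plus hardware table hits.

```python
import numpy as np


def hpq_generator(N, lam, mu_sh, mu_sp, mu_c, beta, K1, K2, K3, K4):
    """Assemble the HPQ generator over states (v, w, x, y, z) with x = v - w.

    v: internal buffer (level), w: controller, x: Class CP, y: Class HP,
    z: specialised hardware queue.
    """
    states = [(v, w, v - w, y, z)
              for v in range(K3 + 1)
              for w in range(v + 1) if v - w <= K1
              for y in range(K2 + 1)
              for z in range(K4 + 1)]
    index = {s: i for i, s in enumerate(states)}
    G = np.zeros((len(states), len(states)))

    def add(src, dst, rate):
        G[index[src], index[dst]] += rate

    for s in states:
        v, w, x, y, z = s
        if z < K4:                                   # external arrival at the hardware
            add(s, (v, w, x, y, z + 1), N * lam)
        if z > 0:                                    # hardware flow-table lookup
            add(s, (v, w, x, y, z - 1), mu_sh * (1 - beta))   # hit: leaves the switch
            if y < K2:                               # miss: forwarded to Class HP
                add(s, (v, w, x, y + 1, z - 1), mu_sh * beta)
        if x > 0:                                    # CPU serves Class CP first (A_2)
            add(s, (v - 1, w, x - 1, y, z), mu_sp)
        elif y > 0 and v < K3:                       # CPU serves Class HP (A_0)
            add(s, (v + 1, w + 1, x, y - 1, z), mu_sp)
        if w > 0 and x < K1:                         # packet-out from controller to Class CP
            add(s, (v, w - 1, x + 1, y, z), mu_c)
    np.fill_diagonal(G, -G.sum(axis=1))              # generator rows sum to zero
    return states, G


def stationary(G):
    """Solve pi G = 0 with sum(pi) = 1 by least squares on the stacked system."""
    n = G.shape[0]
    A = np.vstack([G.T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def hpq_metrics(states, pi, N, lam, mu_sh, mu_sp, mu_c, beta):
    """Evaluate the HPQ metrics with mu_ib = mu_hp = mu_sp (assumed)."""
    def prob(cond):
        return sum(p for s, p in zip(states, pi) if cond(*s))

    T_ib = mu_sp * prob(lambda v, w, x, y, z: x >= 1)             # T_ib = T_cp
    T_c = mu_c * prob(lambda v, w, x, y, z: w >= 1)               # controller
    T_hp = mu_sp * prob(lambda v, w, x, y, z: x == 0 and y >= 1)  # Class HP
    T_sh = mu_sh * prob(lambda v, w, x, y, z: z >= 1)             # hardware
    EL = sum((s[0] + s[3] + s[4]) * p for s, p in zip(states, pi))  # v + y + z
    T_hpq = T_ib + (1 - beta) * T_sh        # buffer releases + hardware table hits
    PL_cp = PL_ib = 1 - T_ib / T_c
    PL_hp = 1 - T_hp / (beta * T_sh)        # Class HP arrivals are the table misses
    PL_sh = 1 - T_sh / (N * lam)
    return {"T_ib": T_ib, "T_sh": T_sh, "t_hpq": EL / T_hpq,
            "PL_hpq": PL_cp + PL_hp + PL_ib + PL_sh}


if __name__ == "__main__":
    params = dict(N=2, lam=12.0, mu_sh=1000.0, mu_sp=100.0, mu_c=100.0, beta=0.5)
    states, G = hpq_generator(**params, K1=2, K2=2, K3=2, K4=3)
    print(hpq_metrics(states, pi=stationary(G), **params))
```

Because every data packet that enters the internal buffer must eventually leave it, the stationary rate of \(A_0\) transitions equals the rate of \(A_2\) transitions, which gives a quick sanity check on any implementation.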

In the following section, we will discuss buffer dimensioning for SPQ and HPQ.

5. Buffer Dimensioning for SPQ and HPQ

In this section, to perform buffer dimensioning for SPQ and HPQ, we assume that the switch queues are M/M/1 (see Section 2.2) as opposed to GI/M/1/K (used for the CPU in both SPQ and HPQ) and M/M/1/K (used for the specialised hardware in HPQ).

The minimum capacity of the switch in SPQ, denoted by $(K_{\min})_{SPQ}$, is the sum of $K_1$ (the minimum queue capacity required for the Class CS), $K_2$ (the minimum queue capacity required for the Class ES), and $K_3$ (the minimum queue capacity required for the internal buffer), which are calculated using Eq. (2) as:

$$K_1 \geq \frac{\log(\text{PER})}{\log(\rho_{cs})}, \quad K_2 \geq \frac{\log(\text{PER})}{\log(\rho_{es})}, \quad K_3 \geq \frac{\log(\text{PER})}{\log(\rho_{ib})}, \quad (22)$$

where $\rho_{cs}$, $\rho_{es}$, and $\rho_{ib}$ are the server utilizations at the Class CS, Class ES, and the internal buffer, respectively, which are defined as:

$$\rho_{cs} = \frac{\beta N \lambda}{\mu_{sp}}, \quad \rho_{es} = \frac{N \lambda}{\mu_{sp}}, \quad \rho_{ib} = \frac{\beta N \lambda}{\mu_{sp}}.$$
Therefore, $(K_{\text{min}})_{SPQ}$ can be expressed as

$$(K_{\text{min}})_{SPQ} = K_1 + K_2 + K_3. \quad (23)$$
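Eq. (22) can be evaluated directly. The minimal Python sketch below assumes that Eq. (2) is the M/M/1 tail condition \(\rho^K \leq \mathrm{PER}\), so that the smallest admissible capacity is the ceiling of \(\log(\mathrm{PER})/\log(\rho)\); the helper names and the parameter point are illustrative.

```python
import math


def min_capacity(per, rho):
    """Smallest integer K with rho**K <= per, i.e. K >= log(per) / log(rho)."""
    if not 0.0 < rho < 1.0:
        raise ValueError("utilisation must lie in (0, 1) for the bound to exist")
    return math.ceil(math.log(per) / math.log(rho))


def spq_min_capacity(per, N, lam, mu_sp, beta):
    """Eqs. (22)-(23): returns (K1, K2, K3, (K_min)_SPQ)."""
    rho_cs = beta * N * lam / mu_sp   # Class CS
    rho_es = N * lam / mu_sp          # Class ES
    rho_ib = beta * N * lam / mu_sp   # internal buffer
    K1, K2, K3 = (min_capacity(per, r) for r in (rho_cs, rho_es, rho_ib))
    return K1, K2, K3, K1 + K2 + K3


if __name__ == "__main__":
    print(spq_min_capacity(per=1.2e-8, N=40, lam=12.0, mu_sp=1000.0, beta=0.5))
```

For \(\mathrm{PER} = 1.2 \times 10^{-8}\), \(N = 40\), \(\lambda = 12\) packets/sec, \(\mu_{sp} = 1000\) packets/sec and \(\beta = 0.5\), it returns \(K_1 = K_3 = 13\), \(K_2 = 25\) and \((K_{\min})_{SPQ} = 51\).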

Likewise, for HPQ, the minimum queue capacities for the Class CP, Class HP, internal buffer, and the specialised hardware are denoted as $K_1$, $K_2$, $K_3$, and $K_4$, respectively, and can be calculated using Eq. (2) as:

$$K_1 \geq \frac{\log(\text{PER})}{\log(\rho_{cp})}, \quad K_2 \geq \frac{\log(\text{PER})}{\log(\rho_{hp})}, \quad K_3 \geq \frac{\log(\text{PER})}{\log(\rho_{ib})}, \quad K_4 \geq \frac{\log(\text{PER})}{\log(\rho_{sh})}, \quad (24)$$

where $\rho_{cp}$, $\rho_{hp}$, $\rho_{ib}$, and $\rho_{sh}$ are the server utilization at the Class CP, Class HP, internal buffer of the CPU, and the specialised hardware, respectively, which are defined as:

$$\rho_{cp} = \frac{N \beta \lambda}{\mu_{cp}}, \quad \rho_{hp} = \frac{N \beta \lambda}{\mu_{hp}}, \quad \rho_{ib} = \frac{N \beta \lambda}{\mu_{ib}}, \quad \rho_{sh} = \frac{N \lambda}{\mu_{sh}}.$$

Therefore, the minimum queue capacity for the switch in HPQ is the sum of minimum queue capacity for the Class CP, Class HP, internal buffer, and the specialised hardware:

$$(K_{\text{min}})_{HPQ} = K_1 + K_2 + K_3 + K_4. \quad (25)$$
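Eq. (24) can be evaluated in the same way. In this sketch the CPU-side rates \(\mu_{cp}\), \(\mu_{hp}\) and \(\mu_{ib}\) are passed separately; the example values (all 1000 packets/sec, with \(\mu_{sh} = 10^6\) packets/sec, i.e. \(m_s = 1000\)) and the helper names are assumptions for illustration, and Eq. (2) is again taken to be the M/M/1 tail condition.

```python
import math


def min_capacity(per, rho):
    """Smallest integer K with rho**K <= per, i.e. K >= log(per) / log(rho)."""
    if not 0.0 < rho < 1.0:
        raise ValueError("utilisation must lie in (0, 1) for the bound to exist")
    return math.ceil(math.log(per) / math.log(rho))


def hpq_min_capacity(per, N, lam, beta, mu_cp, mu_hp, mu_ib, mu_sh):
    """Eqs. (24)-(25): returns (K1, K2, K3, K4, (K_min)_HPQ)."""
    rhos = (N * beta * lam / mu_cp,   # Class CP
            N * beta * lam / mu_hp,   # Class HP
            N * beta * lam / mu_ib,   # internal buffer
            N * lam / mu_sh)          # specialised hardware
    Ks = tuple(min_capacity(per, r) for r in rhos)
    return Ks + (sum(Ks),)


if __name__ == "__main__":
    print(hpq_min_capacity(per=1.2e-8, N=20, lam=12.0, beta=0.5,
                           mu_cp=1000.0, mu_hp=1000.0, mu_ib=1000.0, mu_sh=1.0e6))
```

With \(N = 20\), \(\lambda = 12\) packets/sec and \(\beta = 0.5\), the CPU-side utilisations are 0.12, giving \(K_1 = K_2 = K_3 = 9\), while the fast hardware queue needs only \(K_4 = 3\), so \((K_{\min})_{HPQ} = 30\).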

In this paper, the minimum queue capacities of the switch for SPQ and HPQ are $(K_{\text{min}})_{SPQ}$ and $(K_{\text{min}})_{HPQ}$, respectively.

6. Results

This section presents the analytical and discrete event simulation results of the unified queueing model for SDN software and hardware switches with the internal buffer (i.e. SPQ and HPQ respectively). This section is divided into the following subsections:

- **Validation**: analytical results are compared with discrete event simulation results.

- **Relative minimum capacity**: the total minimum queue capacities of SPQ and HPQ are compared.

- **Relative average delay**: the average packet transfer delays of SPQ and HPQ are compared.

- **Relative packet loss rate**: the packet loss rates of SPQ and HPQ are compared.

- **Effect of varying number of hosts connected to the switch**: this effect is investigated and compared between SPQ and HPQ.

- **Effect of varying $\mu_{sh}$ in a hardware switch**: the effect of varying the hardware processing capacity (i.e. $\mu_{sh}$) in HPQ is investigated.

The parameters used for analysis and simulation are shown in Table 5. The table miss probability $\beta$ varies from 0.1 to 1, the switch processor (CPU) processing rate ($\mu_{cp}$) is assumed to be 1000 packets/sec, the controller to switch processing ratio ($m_r$) varies from 0.1 to 2, and the specialised hardware to CPU processing ratio ($m_s$) varies from 1 to 1000. The external arrival rate ($\lambda$) at the switch from each host is assumed to be 12 or 24 packets/sec. We assume an Ethernet network with a $BER$ of $10^{-12}$, and TCP as the transport protocol with a maximum transmission unit (MTU) of 1500 bytes. Thus, the PER is $1.2 \times 10^{-8}$ (using Eq. (1)). The number of hosts per switch ($N$) is varied from 1 to 80.
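Eq. (1) is not reproduced in this section. Assuming it is the usual independent-bit-error relation \(\mathrm{PER} = 1 - (1 - \mathrm{BER})^{L}\) with \(L = 8 \times \mathrm{MTU}\) bits, the quoted value can be checked with a short Python sketch (the function name is hypothetical):

```python
import math


def packet_error_rate(ber, mtu_bytes):
    """PER = 1 - (1 - BER)^L with L = 8 * MTU bits (independent bit errors assumed)."""
    bits = 8 * mtu_bytes
    # log1p/expm1 keep precision when BER is tiny (1 - 1e-12 would lose digits)
    return -math.expm1(bits * math.log1p(-ber))


if __name__ == "__main__":
    print(f"{packet_error_rate(1e-12, 1500):.3e}")
```

For small BER the result is close to \(L \cdot \mathrm{BER} = 12000 \times 10^{-12} = 1.2 \times 10^{-8}\).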

The simulations are repeated one hundred times, and the 95% confidence intervals (CIs) are computed on the assumption that the errors are normally distributed.
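A minimal sketch of the interval computation described above, using the normal quantile 1.96 (with 100 replications, the Student-t quantile of about 1.98 would give a marginally wider interval); the function name and the sample values are hypothetical:

```python
import math
import statistics


def ci95_normal(samples):
    """Return (mean, half_width) of a 95% CI assuming normally distributed errors."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two replications")
    mean = statistics.fmean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width


if __name__ == "__main__":
    delays = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical per-replication results
    m, h = ci95_normal(delays)
    print(f"{m:.3f} +/- {h:.3f}")
```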

In the following subsections, to take the packet loss rate into consideration, we set the queue capacities of the Class ES (in SPQ), the Class HP (in HPQ) and the specialised hardware queue (in HPQ) to half of their minimum queue capacities determined from buffer dimensioning (using Eq. (23) and Eq. (25)). The queue capacities of the Class CS (in SPQ), the Class CP (in HPQ) and the internal buffer (in both SPQ and HPQ) are set to their minimum queue capacities determined from buffer dimensioning, for which the overflow probability does not exceed the PER. This buffer sizing ensures that, in practice, no control packets are lost.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Table miss probability, $\beta$</td>
<td>0.1–1</td>
</tr>
<tr>
<td>CPU processing rate, $\mu_{cp}$ (packets/sec)</td>
<td>1000</td>
</tr>
<tr>
<td>Controller to CPU processing ratio ($\mu_c/\mu_{cp}$), $m_r$</td>
<td>0.1–2</td>
</tr>
<tr>
<td>Specialised hardware to CPU processing ratio ($\mu_{sh}/\mu_{cp}$), $m_s$</td>
<td>1–1000</td>
</tr>
<tr>
<td>Arrival rate per host, $\lambda$ (packets/sec)</td>
<td>12, 24</td>
</tr>
<tr>
<td>Bit error rate, $BER$</td>
<td>$10^{-12}$</td>
</tr>
<tr>
<td>TCP packet size, MTU (bytes)</td>
<td>1500</td>
</tr>
<tr>
<td>Number of hosts per switch, $N$</td>
<td>1–80</td>
</tr>
</tbody>
</table>

Table 5: Parameters used for numerical simulation of both SPQ and HPQ.

6.1. Validation of analytical models

The analytical results for SPQ and HPQ are validated by comparing them with discrete event simulation results. Figures 5 and 6 show the validation results for SPQ and HPQ, respectively, for increasing $\beta$ with $m_r = 1$ and $m_s = 1000$. As shown in Figures 5 and 6, the error between the analytical and simulation predictions of both the average packet transfer delay and the packet loss rate is between 0.6% and 2.8%. This range of error is acceptable for analysis, as the computation of π distributions for a nonhomogeneous QBD process is prone to inaccuracy due to the possibility of a singular matrix becoming nonsingular in machine precision [49].

![Figure 5: Validation of SPQ for (a) average packet transfer delay; (b) packet loss rate.](image)

![Figure 6: Validation of HPQ for (a) average packet transfer delay; (b) packet loss rate.](image)

6.2. Relative minimum capacity

In this subsection, we compute the relative minimum queue capacity between SPQ and HPQ, denoted by \( \epsilon_K \) and defined as,

$$\epsilon_K = \frac{(K_{\min})_{SPQ} - (K_{\min})_{HPQ}}{(K_{\min})_{SPQ}} \times 100\%.$$  

A positive value of \( \epsilon_K \) means that HPQ requires less capacity than SPQ, while a negative value means that HPQ requires more capacity than SPQ.

Figure 7 shows the \( \epsilon_K \) curve for increasing \( \beta \). From Figure 7, we can observe that HPQ requires up to 50% more buffer capacity than SPQ. This is because the switch in HPQ requires queue capacities for the CPU, the specialised hardware, and the internal buffer, whereas the switch in SPQ requires queue capacities only for the CPU and its internal buffer.

6.3. Relative average delay

We compare the average packet transfer delay between SPQ (denoted by \( t_{SPQ} \) as in Eq. (8)) and HPQ (denoted by \( t_{HPQ} \) as in Eq. (18)). This comparison helps to investigate the effect of the internal buffer in a software and hardware switch with reference to average packet transfer delay.

The relative average packet transfer delay (denoted by \( \epsilon_d \)) between SPQ and HPQ (both with finite capacity) is calculated as:

$$\epsilon_d = \frac{(t_{SPQ} - t_{HPQ})}{t_{SPQ}} \times 100\%.$$  

A positive value of \( \epsilon_d \) means that HPQ has a lower average packet transfer delay than SPQ.

Figure 8 shows the relative average packet transfer delay between SPQ and HPQ in percent. Figures 8(a) and 8(b) show the relative average delay for increasing \( \beta \) and \( m_r \), respectively, with \( m_s = 1000 \). From Figure 8(a), HPQ exhibits up to 80% reduction in the average packet delay compared with SPQ for increasing \( \beta \). Similarly, Figure 8(b) shows that, for increasing \( m_r \), HPQ exhibits up to 60% reduction in the average packet delay.

This is because the specialised hardware of the switch processes external packets arriving at the switch much faster than the CPU, which reduces the overall average packet delay. However, this reduction diminishes as \( \beta \) increases and more packets are forwarded to the CPU, as seen in Figure 8(a). Similarly, as the controller processing capacity increases, the average packet delay decreases, and the relative reduction in average packet transfer delay saturates once \( m_r \) exceeds 1, as seen in Figure 8(b).

This shows the benefit of a hardware switch with the internal buffer over a software switch with the internal buffer: it significantly reduces the overall average packet delay for lower \( \beta \) and higher \( m_r \).

6.4. Relative packet loss rate

We compare the average packet loss rate between SPQ (denoted by \( PL_{SPQ} \) as in Eq. (11)) and HPQ (denoted by \( PL_{HPQ} \) as in Eq. (21)). This comparison helps us to investigate the effect of the internal buffer in a software and hardware switch on the average packet loss rate.

The relative average packet loss rate (denoted by \( \epsilon_l \)) between SPQ and HPQ (both with finite capacity) is calculated as:

$$\epsilon_l = \frac{(PL_{SPQ} - PL_{HPQ})}{PL_{SPQ}} \times 100\%.$$  

A positive value of \( \epsilon_l \) means HPQ has lower packet loss rate compared to SPQ.
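The relative measures \(\epsilon_K\), \(\epsilon_d\) and \(\epsilon_l\) share one form, so a single helper (hypothetical name) captures the sign convention:

```python
def relative_gain(spq_value, hpq_value):
    """Relative difference in percent, as in epsilon_K, epsilon_d and epsilon_l.

    Positive: HPQ has the smaller value (less capacity, delay or loss than SPQ).
    Negative: HPQ has the larger value.
    """
    if spq_value == 0:
        raise ValueError("the SPQ reference value must be non-zero")
    return (spq_value - hpq_value) / spq_value * 100.0


if __name__ == "__main__":
    print(relative_gain(10.0, 2.0), relative_gain(10.0, 15.0))  # hypothetical values
```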

Figure 9 shows the relative average packet loss rate between SPQ and HPQ in percent. Figures 9(a) and 9(b) show the relative average packet loss rate for increasing $\beta$ and $m_r$, respectively, with $m_s = 1000$. From Figures 9(a) and 9(b), HPQ exhibits up to 100% reduction in average packet loss rate compared with SPQ for increasing $\beta$ and $m_r$, respectively.

Figure 8: Relative average delay between SPQ and HPQ in %, i.e. $\epsilon_d$, for increasing $\beta$ and $m_r$: (a) $m_r = 1$; (b) $m_r = 0.5$.

Figure 9: Relative average packet loss rate between SPQ and HPQ in %, i.e. $\epsilon_l$, for increasing $\beta$ and $m_r$: (a) $m_r = 1$; (b) $m_r = 0.5$.

This reduction in the average packet loss rate arises because the average waiting time of packets in the specialised hardware queue of the switch is lower than in the CPU queues. Owing to this lower waiting time, the packet loss rate in the specialised hardware queue is also lower than in the CPU.

This shows the benefit of a hardware switch with the internal buffer over a software switch with the internal buffer: it significantly reduces the packet loss rate.

6.5. Effect of varying number of hosts connected to the switch

In this subsection, the effect of varying the number of hosts on both SPQ and HPQ is presented by varying \( N \) from 1 to 80. Figure 10 shows the effect of varying the number of hosts for \( m_r = 1 \) and \( \beta = 0.5 \). Figures 10(a) and 10(b) show the effect on the average packet transfer delay and the packet loss rate, respectively. From Figure 10(a), as the number of hosts increases, HPQ exhibits a much lower average packet transfer delay than SPQ. From Figure 10(b), the packet loss rates of SPQ and HPQ are identical and increase with the number of hosts.

This increase in the packet loss rate occurs because the aggregate arrival rate at both SPQ and HPQ, \( N\lambda \), grows in proportion to the number of hosts. The specialised hardware of HPQ processes these incoming packets at line rate, which results in a lower average delay than in SPQ, where all packets are processed by the slower CPU.

Figure 10: Effect of varying number of hosts for \( m_r = 1 \), \( \beta = 0.5 \) and \( \lambda = 12 \) pkts/sec: (a) average packet transfer delay; (b) packet loss rate.

6.6. Effect of varying \( \mu_{sh} \) in a hardware switch

In this subsection, the effect of varying \( \mu_{sh} \) in a hardware switch with the internal buffer is presented. This is done by varying \( m_s \) (i.e., the ratio of specialised hardware to CPU processing) from 1 to 1000.

Figure 11 shows the results for varying \( \mu_{sh} \) in HPQ with \( m_r = 1 \) and \( \beta = 0.5 \). Figures 11(a) and 11(b) show the effect of varying \( \mu_{sh} \) in HPQ on the average packet transfer delay and the packet loss rate, respectively. From Figures 11(a) and 11(b), both the average packet transfer delay and the packet loss rate become steady for \( m_s \) greater than 100.

From this investigation, the processing capacity of the specialised hardware should be at least 100 times that of the CPU to achieve the largest reduction in packet transfer delay and an almost zero packet loss rate.

7. Conclusion

In this study, we have proposed a unified queueing model for software and hardware switches with the internal buffer. Internal buffering in SDN-based software and hardware switches has not been investigated much, especially from the analytical modelling aspect. Therefore, a unified queueing model is a useful tool for network analysts to get quick insights into SDN-based software and hardware switches with the internal buffer.

The impact of the internal buffer in both software and hardware switches is investigated, and our analysis is summarised as follows:

- A hardware switch reduces the average packet transfer delay by up to 80% compared with a software switch.
- A hardware switch requires up to 50% more buffer capacity than a software switch, which is the tradeoff for the gain in the previous point; this insight is not provided by any of the existing models in the literature.
- A hardware switch reduces the packet loss rate by up to 99% compared with a software switch.
- For an increasing number of hosts connected to the switch, a hardware switch exhibits significantly lower delay than a software switch.

Lastly, the model also suggests that the processing powers of the switch and the controller are intrinsically tied. Our results show that no further improvement in packet loss occurs once the specialised hardware to CPU processing ratio \((m_s)\) exceeds 100.

8. Acknowledgement

This research is partly supported by Victoria’s Huawei NZ Research Programme, Software-Defined Green Internet of Things project (E2881) and a Victoria Doctoral Scholarship.

References

[27] Open vSwitch. URL http://openvswitch.org/
[31] PicOS Support for OpenFlow 1.3. URL https://docs.picas.com/display/PICO20111Cg/PicOS%5BSupport+for+OpenFlow+1.3.