In a large Layer 2 environment, loops in a world without STP would be a sure thing. In fact, depending on the size of the network and the amount of overhead being processed by the CPU, chances are the entire network would come to a grinding halt.
How can these loops form? Two switches, A and B, are connected with two 1 Gbps links for increased speed, and two users sit on opposite ends: laptop C (Gbolahan) facing switch A and laptop D (Toma) facing switch B.
At boot time both laptops get their IP addresses and life is good. Laptop C needs to access a file on laptop D using whatever application/protocol it wants (TFTP/FTP/SFTP). It knows laptop D's IP address but not its MAC address, so it cannot build a frame to speak with laptop D. To overcome this challenge it sends out an ARP request to resolve the IP address it knows to the MAC address it doesn't. An ARP request is essentially a Layer 2 broadcast (flooded by switches, stopped by routers at Layer 3).

Laptop C sends out the request on port 1. Switch A, seeing it is an ARP request (a Layer 2 broadcast), floods it out ports 2 and 3 (interconnecting the switches) but not port 1 (broadcasts are not sent back out the receiving port). The broadcast from port 2 arrives at port 4, and switch B floods it out port 5 and all other ports (apart from port 4). Switch A then gets that broadcast back on port 3 and floods it out all ports including port 2 (the starting point of the broadcast), and we have ourselves a beautiful loop.

Meanwhile both switches are processing frames from other devices, reporting NetFlow/syslog/SNMP traps to an NMS, imposing QoS markings on frames from IP phones, running STP instances for more than 100 VLANs, and queuing frames, with the ARPs alternating between both switches in under a millisecond. You get the flow: the CPU has a lot on its hands, the processor spikes, and the network melts.
To stop this type of occurrence, Radia Perlman developed a protocol (STP) that ensures that no matter the number of links between the switches, there is no redundant path for a broadcast to follow, and hence loops are broken. STP is a control-plane protocol that uses two types of BPDU packets to calculate the best path for frames to follow while keeping redundant paths blocked.
How STP stops the loop in our scenario: STP blocks a link, say port 2 on switch A, leaving only port 3 forwarding on switch A. The broadcast now reaches switch B only on port 5; switch B turns around and floods it out port 4, but thankfully switch A is blocking on port 2, so the frame is discarded there and the loop is broken (otherwise switch A would have flooded it back out port 3).
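The two scenarios above can be sketched as a toy flood simulation. This is an invented model, not real switch code: the link map and port numbers simply mirror the topology described so far (A ports 2/3 patched to B ports 4/5). Without a blocked port the relay only stops because of an artificial hop cap; with one port blocked, the broadcast crosses once and dies.

```python
# Two switches joined by two links: A port 2 <-> B port 4, A port 3 <-> B port 5.
LINKS = {("A", 2): ("B", 4), ("A", 3): ("B", 5),
         ("B", 4): ("A", 2), ("B", 5): ("A", 3)}
TRUNKS = {"A": [2, 3], "B": [4, 5]}

def flood(switch, in_port, blocked, hops_left=10):
    """Return how many frame copies cross the inter-switch links."""
    if hops_left == 0:                       # safety cap for the no-STP case
        return 0
    copies = 0
    for port in TRUNKS[switch]:
        if port == in_port or (switch, port) in blocked:
            continue                         # never echo out the receiving or a blocked port
        peer, peer_port = LINKS[(switch, port)]
        if (peer, peer_port) in blocked:
            continue                         # far end discards frames on a blocked port
        copies += 1 + flood(peer, peer_port, blocked, hops_left - 1)
    return copies

storm = flood("A", 1, blocked=set())         # keeps relaying until the hop cap
calm = flood("A", 1, blocked={("A", 2)})     # STP blocks port 2 on switch A: one crossing, done
```

Without the `hops_left` cap the `blocked=set()` case would recurse forever, which is exactly the broadcast storm; with port 2 blocked the frame crosses the inter-switch link once and has nowhere left to loop.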
Challenge: in modern-day data centers and campuses, having all links forwarding is good for bandwidth's sake. Imagine having a 10 Gbps link sitting idle because STP has blocked it.
Solution: the typical solution is EtherChannel (bundling multiple links together). EtherChannel essentially deceives STP into seeing both physical links as one logical link, so forwarding takes place on both links between switches A and B without either being blocked.
Question about the EtherChannel implication: if both ports are bundled into a single logical port at Layer 2, that doesn't take away the fact that they are individual ports at Layer 1. So if a broadcast is sent out an EtherChannel bundled between switch A and switch B, the frame does in fact still travel via a physical port at Layer 1. Let's imagine, as in the earlier scenario, that switch A sends the broadcast frame out port 2 to switch B on port 4. Switch B ought to turn around and forward it back out port 5, like the earlier scenario, right?

Wrong. The EtherChannel implementation in the IOS software means all member ports behave like one port (one reason all their attributes must match). When switch B receives the frame, from the processor's point of view it arrives on port-channel X, not on physical interface port 4, and this is what breaks the loop: from a physical perspective switch B could forward out physical port 5, but from the bundled perspective ports 4 and 5 are one port (port-channel X), and a frame is never flooded back out the logical port it arrived on.
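A minimal sketch of that flooding decision, with assumed names (the port-to-bundle map and the extra access port 6 toward laptop D are invented for illustration): switch B compares logical ports, not physical ones, when deciding where to flood.

```python
# Switch B: physical ports 4 and 5 are members of the logical Port-channel1.
PORT_CHANNEL = {4: "Po1", 5: "Po1"}     # physical port -> logical port
ALL_PORTS = [4, 5, 6]                   # 6 is a hypothetical access port to laptop D

def flood_ports(in_physical_port):
    """Return the ports a broadcast is flooded out of, comparing
    logical (bundle) identities rather than physical ones."""
    in_logical = PORT_CHANNEL.get(in_physical_port, in_physical_port)
    out = []
    for port in ALL_PORTS:
        logical = PORT_CHANNEL.get(port, port)
        if logical != in_logical:       # skip every member of the receiving bundle
            out.append(port)
    return out
```

A broadcast arriving on physical port 4 is flooded only out port 6; port 5 is skipped because it shares the arriving frame's logical port, so nothing ever loops back through the bundle.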
Another far-reaching implication of EtherChannel bundling: you might think that if Gbolahan and Toma use a lot of bandwidth between the two switches, simply bundling links would increase the bandwidth available to them and hence reduce congestion and latency. Well, wrong. EtherChannel uses a load-balancing algorithm to determine, for each frame, which link to use. The algorithm is flow-based and uses as its input an address in the frame (L2 source, L2 destination, L3 source, L3 destination, or L2 source and destination). You get the drift: PC 1 will always, forever, forward on a single link, because the algorithm gets the same input every time (PC 1's and PC 2's L2/L3 source/destination addresses), which also means it is not truly load balancing, as only one link is being used. (The reason for this, though, is to ensure frames are received in the right sequence, which is only guaranteed if they take the same path out and in.)

To make it truly load balance across all links, you need to feed the algorithm a more varied input, like port numbers. If you load balance based on, say, source and destination TCP ports (source port 344 on Gbolahan's PC and destination port 80 to Toma's PC for flow 1, and source port 444 on Gbolahan's PC and destination port 25 to Toma's PC for flow 2), then flow 1 can follow port 2 on switch A and flow 2 can follow port 3, and you have true load balancing. (Most low-end switches do not load balance based on Layer 4 and above, though, for CPU reasons.)
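The hashing behavior can be illustrated with a toy model. The hash function and field choices here are assumptions for illustration (a real switch uses its own hardware hash, not CRC32), but the effect is the same: identical inputs always pick the same member link, and only varied inputs spread flows.

```python
import zlib

LINKS = ["port2", "port3"]              # the two members of the bundle on switch A

def pick_link(fields):
    """Hash the configured load-balancing inputs down to one member link."""
    key = "|".join(str(f) for f in fields)
    return LINKS[zlib.crc32(key.encode()) % len(LINKS)]

# Balancing on src/dst MAC only: every frame between the same two PCs
# produces the same hash input, so one link carries everything.
mac_only = {pick_link(("aa:aa", "bb:bb")) for _ in range(1000)}
assert len(mac_only) == 1               # "load balancing" onto a single link

# Feeding in L4 ports as well: different TCP flows between the same two
# PCs can now hash to different member links.
spread = {pick_link(("aa:aa", "bb:bb", sport, 80)) for sport in range(1024, 1200)}
assert len(spread) == 2                 # both links carry traffic
```

The first set shows why bundling alone does not speed up a single pair of hosts; the second shows why adding Layer 4 ports to the hash input spreads distinct flows across the members while still keeping each individual flow on one link (preserving frame order within a flow).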
Isn’t networking cool 😀