BEEN A WHILE

The last time I posted here was probably some two years ago.

I consider this outlet more of a thought-formation canvas and a sort of public diary I can look back on in future, so I don't know that anyone has missed my posts, and the stats show this blog isn't viewed much anyway.

So, so, so many things have happened between my last post and this one.

I should probably now add the disclaimer 'All opinions are mine' to this site, as I work for a leading cloud provider.

I used to work as a Solutions Architect for a Cisco Gold partner in my home country of Nigeria, but I am now based out of Cape Town, helping customers with their move to the cloud and with maintaining their infrastructure afterwards.

That has meant a tremendous amount of learning, re-learning and un-learning, as my core skill set has shifted, and is still shifting, from core Layer 3 to somewhere between Layers 3 and 7. It has been fun, since I consider myself naturally curious, albeit with large time trade-offs.

I have learnt so much and will probably start posting agnostically about topics I have learnt, mixed with other life endeavours and interests.

It is impressive how much you think you know, until you realize you don't. I find myself in this battle daily, and I think it is part of being in a fluid industry like tech.

Well, this was just a post to break my multi-year silence. Cheers to the remainder of the year, and may the winds of fortune sail you.


WHITE-BOX SWITCHING & DISAGGREGATION AT FACEBOOK

A colleague at work sent this link: http://www.businessinsider.com/facebook-is-again-putting-the-computer-network-industry-to-shame-2016-11   and titled it: MORE DISRUPTION.

It is an article discussing Facebook's next move in their disaggregation and white-box switching drive. A few years back they internally designed two switching platforms, Wedge and 6pack: Wedge operates at the top of rack, aggregating all server traffic southbound and uplinking to the core/spine (where 6pack sits), all in a Clos topology that lets them scale not only up but out (especially considering that most traffic at hyperscale is east-west, not north-south). The article discusses the successor switches announced last week (Backpack and Wedge 100), a move towards 100Gb/s switching. The switches run Facebook's own internal OS, called FBOSS, and are part of a broader push to reduce their dependence on vendors who typically couple their hardware and OS so tightly that no other company has access to the base code to innovate around it, which is highly critical for hyperscale companies.

The below links give a deeper insight into these:

  1. https://code.facebook.com/posts/681382905244727/introducing-wedge-and-fboss-the-next-steps-toward-a-disaggregated-network/
  2. https://code.facebook.com/posts/145488969140934/open-networking-advances-with-wedge-and-fboss/
  3. https://code.facebook.com/posts/1802489260027439/wedge-100-more-open-and-versatile-than-ever/
  4. http://www.networkworld.com/article/2910014/cisco-subnet/lessons-from-altoona-what-facebooks-newest-data-center-can-teach-us.html

I will write a longer post on the white-box movement, but for now here is my reply to the email on this new move from Facebook:

 

wedge1

wedge2

ROUTING IS SUPERIOR

Radia Perlman designed STP to help us avoid loops, and STP was good for its time: a period in history when routers with 2MB of RAM and 10Mb/s bridge links were considered lightning fast, so interconnected Layer 2 devices could afford to have a redundant link stay idle/blocked until the active forwarding link failed and it took over.

Fast-forward to the 2000s and we have the rise of web applications, HD video, Hadoop computing, and database machines operating in clusters. Things are not as sluggish as they used to be, and finding speeds below 10Gb/s is near impossible in a serious datacenter. The realization of this meant we could no longer afford to have a link staying idle or playing safe (STP blocking to avoid loops); we need every single link forwarding! Radia Perlman, foreseeing this challenge and realizing the drawback of STP, proposed a new protocol: TRILL (Transparent Interconnection of Lots of Links), while Cisco took it and spun their own protocol, FabricPath. Both share the same logic of routing Layer 2.

All of this points to one conclusion: learning by routing has always been, and will always be, superior to learning by flooding. This used to be the standard framing of Layer 2 vs Layer 3: with Layer 2, frames are flooded out all ports except the source port until the end host replies and its MAC address is stored against its port in the switch's CAM table (the mere idea of flooding out all ports except the source stamped in the possibility of loops).

With Layer 2 routing (FabricPath/TRILL), however, this danger is taken care of, as there is no flooding. The nodes participating in Layer 2 routing first build a shortest-path tree to all other participating nodes using IS-IS, and have a demarcation point where they receive classical (legacy) Ethernet and encapsulate it with new FabricPath headers. In other words, the FabricPath core is not aware of the Ethernet (like MPLS in BGP-free cores for SPs) until the frame reaches the egress switch connecting to the destination MAC address, which strips off the FabricPath header and sends the frame on to the end host.
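The shortest-path tree mentioned above is just Dijkstra's SPF algorithm run over the IS-IS link-state database. Here is a minimal Python sketch; the four-switch leaf/spine topology and its link costs are made up purely for illustration:

```python
import heapq

def shortest_path_tree(links, root):
    """Dijkstra's SPF, the same computation each IS-IS node runs:
    cost and predecessor from the root to every other node."""
    dist = {root: 0}
    prev = {}
    pq = [(0, root)]
    while pq:
        d, node = heapq.heappop(pq)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neigh, cost in links.get(node, []):
            nd = d + cost
            if nd < dist.get(neigh, float("inf")):
                dist[neigh], prev[neigh] = nd, node
                heapq.heappush(pq, (nd, neigh))
    return dist, prev

# Hypothetical four-switch fabric: two spines (S1, S2), two leaves (L1, L2)
topology = {
    "L1": [("S1", 10), ("S2", 10)],
    "L2": [("S1", 10), ("S2", 10)],
    "S1": [("L1", 10), ("L2", 10)],
    "S2": [("L1", 10), ("L2", 10)],
}
dist, prev = shortest_path_tree(topology, "L1")
print(dist["L2"])  # 20: L1 -> spine -> L2, computed, not flooded
```

Every node runs the same computation from its own perspective, so frames are forwarded along loop-free shortest paths instead of being flooded.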

It is the same notion playing out in large-scale DCs like Facebook's, and an increasing host of others, that are shrinking the Layer 2 demarcation point and routing with BGP down to the top-of-rack switch, or even all the way to the end host.

In datacenters that are bandwidth-hungry we will keep seeing more and more of this push, where either: 1) the switching boundary is eroded and routing occurs down to the end host, or 2) Layer 2 routing is implemented from the interconnecting switches northbound and eastbound until the Layer 3 routing boundary is reached.

In large-scale, bandwidth-intensive datacenters it is safe to say: adios, Layer 2 switching!

A TALE OF LAYER 2 LOOPS AND BUNDLES

In a large Layer 2 environment, loops in a world without STP would be a sure thing; in fact, depending on the size of the network and the amount of overhead being processed by the CPU, chances are the entire network would come to a grinding halt.

How can these loops form? Two switches, A and B, are connected with two 1Gb/s links for increased speed, and two users sit on opposite ends: laptop C (Gbolahan) facing switch A and laptop D (Toma) facing switch B.

 

Drawing1.jpg

 

At initial boot both laptops get their IP addresses and life is good. Laptop C needs to access a file on laptop D using whatever application/protocol it wants (TFTP/FTP/SFTP). It knows laptop D's IP address but not its MAC address, so it cannot build a frame to speak to laptop D. To overcome this challenge it sends out an ARP request to resolve the IP address it knows to the MAC address it doesn't. An ARP request is essentially a Layer 2 broadcast (forwarded by switches, stopped by routers at Layer 3). Laptop C sends the request out port 1. Switch A, seeing it is an ARP request (Layer 2 broadcast), sends it out ports 2 and 3 (interconnecting the switches) but not port 1 (broadcasts are not sent out the receiving port). The broadcast from port 2 reaches port 4 on switch B, which sends it out port 5 and its other ports (apart from port 4). Switch A then gets that broadcast back on port 3 and forwards it out all ports, including port 2 (the starting point of the broadcast), and we have ourselves a beautiful loop. Both switches are also processing frames from other devices, sending NetFlow/syslog/SNMP traps to an NMS, imposing QoS markings on frames from IP phones, maintaining STP instances for more than 100 VLANs, and queuing frames, with the ARPs alternating between both switches in less than 1ms. You get the flow: the CPU has a lot on its hands, the processor spikes, and the network melts.
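The meltdown can be modelled in a few lines of Python. This is a deliberately simplified sketch (one broadcast, inter-switch links only, no host ports or CPU effects): a frame arriving on one inter-switch link gets flooded back out every other inter-switch link, and since Ethernet frames have no TTL, nothing ever ages out:

```python
def flood(num_links, rounds):
    """Count broadcast copies in flight between two looped switches
    over several forwarding passes."""
    in_flight = 1  # the original ARP request crossing to the far switch
    history = [in_flight]
    for _ in range(rounds):
        # each copy is re-flooded out every link except the one it came in on
        in_flight *= (num_links - 1)
        history.append(in_flight)
    return history

print(flood(2, 5))  # [1, 1, 1, 1, 1, 1] -- the frame circulates forever
print(flood(3, 5))  # [1, 2, 4, 8, 16, 32] -- copies double every pass
```

With two parallel links the broadcast simply never dies; add a third and the copies grow exponentially, which is why an unchecked Layer 2 loop melts a network so quickly.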

 

 

Drawing2.jpg

To stop this type of occurrence, Radia Perlman developed a protocol (STP) that ensures that, no matter the number of links between switches, there is no redundant path for a broadcast to follow, and hence loops are broken. STP is a control-plane protocol that uses two types of BPDU packets to calculate the best path for frames to follow while keeping redundant paths blocked.

 

Drawing3.jpg

How STP stops the loop in our scenario: STP blocks a link, say port 2 on switch A, ensuring only port 3 forwards on switch A. The broadcast goes out port 3, switch B receives it and floods it back out its other inter-switch port, but it arrives at switch A's blocked port 2 and is dropped, so the loop is broken (without the block, switch A would have forwarded it back out port 3 again).

Challenge: in modern-day datacenters and campuses, having all links forward matters for bandwidth's sake. Imagine having a 10Gb/s link sitting idle just because STP has it blocked.

Solution: the typical solution is EtherChannel (bundling multiple links together). EtherChannel essentially deceives STP into seeing both physical links as one logical link, so forwarding takes place on both links between switches A and B without either being blocked.

A question about the EtherChannel implication: if both ports are bundled into a single logical port at Layer 2, that doesn't take away the fact that they are individual ports at Layer 1. So if a broadcast is sent out an EtherChannel bundled between switches A and B, the frame does in fact still travel via a physical port. Imagine, as in the earlier scenario, that switch A sends the broadcast out port 2 to switch B's port 4. Switch B ought to turn around and forward it back out port 5, right? Wrong. The EtherChannel implementation in the IOS software means all member ports behave as one port (one reason all their attributes must match): when switch B receives the frame, from the processor's point of view it arrives on port-channel X, not a physical interface, and since ports 4 and 5 are one logical port (port-channel X), the frame is never flooded back out the bundle, which breaks the loop.

Another far-reaching implication of EtherChannel bundling is the assumption that if Gbolahan and Toma, on their PCs between the two switches, use a lot of bandwidth, just bundling links would increase the bandwidth available to them, and hence reduce congestion and latency. Well, wrong. EtherChannel uses a load-balancing algorithm to determine, for each frame, which link to use. The algorithm is flow-based and takes as input an address in the frame (L2 source, L2 destination, L3 source, L3 destination, or L2 source and destination). You get the drift: PC 1 will always, forever, forward on a single link, because the algorithm gets the same input as its feed every time (PC 1's and PC 2's L2/L3 source/destination addresses), which also means it is not truly load-balancing, as only one link is being used. (The reason, though, is to ensure frames are received in the right sequence, which is only guaranteed if they take the same path out and in.) To make it truly load-balance across all links, you need to feed the algorithm a variable that changes, like port numbers. If you load-balance on, say, source and destination TCP ports (source port 344 on Gbolahan's PC to destination port 80 on Toma's PC for flow 1, and source port 444 to destination port 25 for flow 2), then flow 1 can follow port 2 on switch A while flow 2 follows port 3, and you have true load-balancing. (Most low-end switches do not load-balance on Layer 4 and above, though, for CPU reasons.)
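The flow-hash behaviour can be sketched in Python. Real switches compute this in silicon (typically XORing low-order bits of the chosen fields); the CRC32 here is just a stand-in for "some deterministic hash", and the MAC addresses and port numbers are hypothetical:

```python
import zlib

def pick_member_link(src_mac, dst_mac, num_links, src_port=0, dst_port=0):
    """Toy EtherChannel flow hash: identical inputs always map to the
    same member link, so frames of one flow stay in order."""
    key = zlib.crc32(f"{src_mac}{dst_mac}{src_port}{dst_port}".encode())
    return key % num_links

gbolahan, toma = "aa:aa:aa:aa:aa:01", "bb:bb:bb:bb:bb:02"

# L2-only hash: every frame between the same two hosts picks one link
l2_only = {pick_member_link(gbolahan, toma, 2) for _ in range(100)}
print(len(l2_only))  # 1 -- one member link carries everything

# Feed the hash L4 ports too and separate flows spread across links
flows = {pick_member_link(gbolahan, toma, 2, sport, 80)
         for sport in range(1024, 2048)}
print(sorted(flows))  # [0, 1] -- both member links now in play
```

The same two hosts always hash to the same link when only addresses are fed in; adding a varying field like the TCP source port is what lets distinct flows land on distinct member links.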

Isn’t networking cool 😀

WEEK 24’S WEB READ LIST

Quotes from some of this week’s reads (click on the quotes for full content):

Every John Chambers keynote and presentation I heard from 2002 through 2014 was functionally the same speech. Cisco is awesome, Cisco sold a lot of stuff, spend lots of time listening and talking with customers, and data networking is about improving productivity. I never really understood the concept of productivity until Chambers made me think about it. In some ways, it generated a renaissance in my career in the early 2000’s by encouraging me to think past the technology to the impact that it has on the wider business environment. At a time when I was still grappling with learning enough technology to be competent as an engineer, I started to consider the impact I wanted to create on the business. And then, well something interesting happens. When you start thinking about the “business impact” you start to “make money” for your clients/customers/employers by considering the applied value of technology.

 

Traditional suppliers like HP, Dell and Cisco need high profit margins of >50% to meet shareholder expectations whereas a manufacturing company can make a simpler product that is good enough for most requirements at <25% margins. Expect sales of branded servers to shrink in the years ahead.

 

“I’m Argentinian, and I don’t like Barcelona, so I see it from a different point of view,” said Bryan Smeail of Los Angeles. “I just want to see him play well and lift the trophy to shut everybody up. I know he’s the greatest player. But a lot of fans — and I’m in included in them — see him not play that good for Argentina and then in a couple of weeks score a hat trick in Spain. It can be heartbreaking.

 

As networks become more defined by software, so too will the engineers that run them.

 

Myth No.2: Your failover will work. Applications that live in multiple data centers are complicated beasts. In some ways, they are small-scale examples of distributed computing, with many of the caveats and concerns related to data synchronization. This isn’t a problem when the application is initially set up. Everyone in IT is on their best behavior to make sure firewall policy functions well, routing behaves as expected, and data synchronization is both functional and timely. Failover is tested and even works. Inevitably, time goes by, and as it does, infrastructure changes. Applications get new features. Perhaps ADCs get upgraded with fancy new clustering. A new inter-data center network link is installed. And for whatever reason, testing application failover after each seemingly straightforward change gets overlooked. It’s too hard to schedule, there are too many people involved in the change control process, or no one thinks failover would simply break. The reality is that unless you’re testing your failover regularly, your active-active data center probably isn’t active-active at all. Your failover might just fail.

 

“I try to use every game as an opportunity to witness. I try to do a little signal every time I make a shot as a way to preach the message in little ways that I can,” Curry revealed. “Each game is an opportunity to be on a great stage and be a witness for Christ. When I step on the floor, people should know who I represent, who I believe in.”

 

No doubt the world is changing rapidly with virtualization and cloud technologies, NFV and SDN. Core networking knowledge is not as cool as coding in Python or writing the next great app. The young engineers I mentor today don’t care about redistributing EIGRP or getting IPX to work over the WAN. People often ask me if I think that CCIE still matters. My stock response is that it means that a CCIE knows (or at least knew) a little “something about something.” A newly minted CCIE might get a raise or change jobs but it does not pronounce them King (or Queen) of the Networking Universe. What is way more important, in my opinion, is that most CCIEs had a passion to learn and then performed successfully in a high pressure environment under time duress. Oh and don’t forget the lab fee (now $1600) plus travel and expenses for those who are paying for it out of their own pocket. So here’s to all of you that still have that passion to learn. Keep doing what you’re doing because that is what pushes this industry forward with innovation.

 

Musk extrapolated: “So given that we’re clearly on a trajectory to have games that are indistinguishable from reality, and those games could be played on any set-top box or on a PC or whatever, and there would probably be billions of such computers or set-top boxes, it would seem to follow that the odds that we’re in base reality is one in billions.”

 

Over the years, Cisco three times funded and later purchased MPLS-led startups whose products became crucial to Cisco’s business. Insieme, the most recent of the startups, helped develop a line of programmable switching systems and related software that Robbins estimated is generating revenue at an annual rate of $2 billion.

 

Nigeria gets an average of three hours of electricity a day from the national grid, usually in short bursts. The average Nigerian uses barely one hundredth as much electric power as the average American — just enough to keep a lightbulb on. When the grid goes dark, the generators fire up with a loud roar before settling into a steady background drone. Solving the power shortage is arguably the biggest development challenge for Nigeria, and for much of the African continent.

 

JEVONS PARADOX

I find myself focused these days more on datacenter- and cloud-oriented projects (greenfield and brownfield) and readings, and my experience on a recent project forced me to ask the ultimate business question of any IT project submitted to management: how does this help us minimize costs in the long run? (I think a well-crafted response to this lets CEOs/CFOs sign off a lot faster.)

Virtualization helped accelerate the push to the cloud, private and public, away from the separate discrete systems of the past with one OS per hardware box (one application to one server, unlike the many-to-one of virtualization, on-premise or off). The capex cuts are pretty obvious: reduced datacenter space and land acquisition, less cooling, lower electricity bills, less cabling, fewer switches/routers/storage switches, and so on. But I find myself asking about the opex costs; they don't seem as obvious to me.

If company X moves to, say, a Vblock 340 or FlexPod or any of the others out there, a team of engineers from an SI has to be brought in to implement it, costing $$$ in professional services, hotels, air travel, etc. When you want to upgrade the infrastructure, you call the same team again, because your in-house staff don't have the requisite hands-on knowledge to work on the systems; they only deploy on top of the built infrastructure and monitor it. If you choose to train the staff, you probably need to fly them out and pay for specialized training; a third option is to employ specialized systems engineers to help manage the pod. All in all, I am not too sold on the opex savings yet.

The good news is that in the long run savings are the cumulative of capex + opex, and even though the opex savings are neither large nor obvious, the capex savings are huge, as I stated before: reduced datacenter space and land acquisition, less cooling, lower electricity bills and cabling, fewer switches/routers/storage switches, etc. I think this is where a CTO or project leader can hammer home the case to management.

On the side, I heard of an economic postulation from the 1800s around energy and the environment, the Jevons paradox, that is being used to understand the cost implications of cloud computing. It is not directly related to this post, but it is an interesting finding:
https://en.wikipedia.org/wiki/Jevons_paradox
http://www.cio.com/article/2934733/cloud-computing/cloud-computing-has-its-jevons-moment.html

THE PHILOSOPHY OF PYTHON

>>> import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren’t special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
….

The aim of the Zen is that while coding you should strive to keep your code simple and straight to the point. Python is considered one of the easiest programming languages to learn because of its easy syntax and a design philosophy that mirrors what most expert coders/network engineers believe, i.e. KISS (keep it simple, stupid).
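As a small illustration of the Zen in practice (my own toy example, not from the Zen itself), here are two versions of the same packet classifier. Both return the same answers, but the second reads the way "Flat is better than nested" recommends:

```python
def classify_nested(proto, port):
    # the nested version: correct, but every branch adds indentation
    if proto == "tcp":
        if port == 80:
            return "http"
        else:
            if port == 443:
                return "https"
            else:
                return "other"
    else:
        return "other"

def classify_flat(proto, port):
    # the flat version: early returns keep every case at one level
    if proto != "tcp":
        return "other"
    if port == 80:
        return "http"
    if port == 443:
        return "https"
    return "other"

print(classify_flat("tcp", 443))  # https
```

Same logic, same output; the flat one is simply easier to read and extend, which is the whole point of the Zen.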

THE SHRINKING HALF-LIFE OF KNOWLEDGE/SKILL SETS

I work in the network industry, as those of you who know me personally already know, and I was watching a presentation by one of my favourite network enigmas, Russ White (presentation title: Defining Software Defined Networks). He made a statement I have over time come to know and accept with mixed feelings, but which he proposed with measurable metrics: many years back, the half-life of a skill set was about 8-10 years; it is now about two and a half years.

I remember clearly, years back, jumping into the industry and being bamboozled with talk of IGPs, BGP, STP, DNS, IPv6, GLBP, MPLS, VRFs and the sort, knowledge of which is necessary to run a network. It all felt fuzzy, and I wondered how you gobble it all up. A few years later, after studying for, taking and passing an exam considered the pinnacle of networking, the CCIE, I felt confident enough to run up and down Mount Olympus in seconds, hold the weight of the Earth like Atlas, and send thunderbolt access-lists at bad packets like Zeus. But nay, the journey had only started.

Jumping right into highly complex projects that required quickly demonstrating grasp of topics far outside my knowledge vertical, and keeping up with head-scratching topics one's counterparts in the West were embracing and bringing down here, it wasn't long before I knew I had a career dilemma on my hands: do you stubbornly stick to your knowledge of old protocols and watch your relevance in the industry die out, or do you shorten what's left of your social life to go through rounds of learning again?

Knowledge is the legal tender you bring to the table in exchange for remuneration from a company, and you are only paid as much as your knowledge is perceived to tangibly contribute to the economic outlook of that company. So anyone looking to stay relevant and get paid needs to embrace the idea that your current knowledge half-life is roughly two to three years; down here in Africa it's longer, but the trend is similar.

There are two approaches to coping with this, and they depend on the sector you work in:

1) If you work for a systems integrator, you are hit harder by this. To survive, get a broad understanding of everything you need to hold conversational discussions with clients, and for the areas you are very interested in, or get the nod to implement projects on, dive deep.

2) If you work for a service provider, you are hit less, because of a narrower tech scope. But that means to survive you need to dive deeper into the technologies you work with, as knowledge in this sector can easily be commoditized.

So whether it is a new project on IWAN, Vblock, FlexPod, SDN or SDDC, involving learning about OTV, PfR, vPC, FabricPath, ISE, VM-FEX and the cacophony of complex tech out there, embrace the change with tact. Remember it's easier to move with the flow when you are curious and truly love the industry, and pray a day doesn't come when our knowledge half-life is one week.

By the way, if you didn't know: half-life, as defined by Wikipedia, is the time required for the amount of something to fall to half its initial value.
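Since half-life is an exponential-decay concept, Russ White's numbers are easy to play with. A quick sketch (the 2.5-year figure is his; the decay model is the standard half-life formula, applied to skills purely as an illustration):

```python
def relevance(years, half_life=2.5):
    """Fraction of a skill set still current after `years`, modelled
    with the same exponential decay as radioactive half-life."""
    return 0.5 ** (years / half_life)

print(round(relevance(2.5), 2))      # 0.5  -- half gone at one half-life
print(round(relevance(10), 2))       # 0.06 -- a decade on, four half-lives later
print(round(relevance(10, 10), 2))   # 0.5  -- under the old 8-10 year regime
```

Ten years under a 2.5-year half-life leaves about 6% of your knowledge current, versus 50% under the old regime, which is the whole career dilemma above in one number.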

CHOICE OF ROUTING PROTOCOL

A show run command on a router is always an easy pointer for me to gauge: 1) how efficiently packets are moved, 2) how quickly the network self-heals in times of trouble, and 3) how quickly issues can be spotted when troubleshooting.

I believe a good network design is 50% of the work in a network deployment, and a critical portion, though easy to overlook, is the choice of routing protocol.

In layman's terms, all a routing protocol does is give a network device a map of how to reach nodes and subnets, built by exchanging this information with contiguous (physically or virtually) nodes running the same protocol. It was a no-brainer alternative to the tedium of static routes, which are constrained by numerous manual inputs.

Through the craziness of the IP world we have had eight routing protocols for communication within the same autonomous system/domain and between different autonomous systems.

A 'which is better: RIP vs EIGRP vs OSPF vs IS-IS' argument is one you will definitely come across, but in my opinion it has no absolute answer and should be case-study dependent. Every routing protocol has the upper hand in certain deployments, and the question should sound like: which should I use for my small network of 30 routers, or for my resource-constrained routers, or for my WAN connected by low-bandwidth links, or which works well with a certain WAN overlay that interconnects my sites, or which suits my mixed-vendor environment? You see how easily the answer can be deduced from these questions. The best way to get an answer is to refine the question.

Below is a short and concise look at the pros, cons and best use cases for the three popular IGPs.

1) RIPv2:
Pros:

– Easy to configure.
– Updates by multicast (224.0.0.9), not broadcast; broadcast disturbs non-RIP-speaking routers on the same subnet by interrupting their processing cycle (they unwrap the unneeded message and discard it when they realize it is not for them).

Cons:

– Cannot be used in large networks, since it has a maximum hop count of 15.
– Slow convergence, since routing is by rumour (all neighbors need to tell all neighbors about connected routes or routes they know, and this update takes 30 seconds per router).
– Sub-optimal path selection: RIP uses hop count to determine the best route, so a path of 4 hops over intermediate 1Gb/s links is considered 'farther' than 2 hops over intermediate 100Mb/s links, even though the 4-hop path is obviously faster.
– Susceptible to loops and black holes: when a link goes down, each router has to be notified to flush out the route after a 3-minute timer. The problem is that after router A flushes the route, another router B that has not yet flushed its copy can tell A that it still has it, and you have yourself a nice-looking loop.
– Not good for low-bandwidth links: RIP periodically sends updates containing its entire routing table every 30 seconds, which can eat up a significant portion of your bandwidth.

Use cases: small networks of fewer than 15 routers that need basic policies.
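The sub-optimal path selection con above is easy to put in numbers. A quick sketch comparing RIP's hop count with a bandwidth-derived cost (the link speeds are the illustrative ones from the cons list, and the 1Gb/s reference bandwidth is my own choice for the example, not a protocol default):

```python
def rip_metric(link_speeds_mbps):
    """RIP: best path = fewest hops, link speed ignored entirely."""
    return len(link_speeds_mbps)

def bandwidth_metric(link_speeds_mbps, ref_mbps=1000):
    """A bandwidth-aware cost: each link costs reference/bandwidth,
    so slower links are 'farther'."""
    return sum(ref_mbps / bw for bw in link_speeds_mbps)

gig_path  = [1000, 1000, 1000, 1000]  # 4 hops, all 1Gb/s
slow_path = [100, 100]                # 2 hops, all 100Mb/s

print(rip_metric(gig_path), rip_metric(slow_path))              # 4 2
print(bandwidth_metric(gig_path), bandwidth_metric(slow_path))  # 4.0 20.0
```

RIP installs the 2-hop 100Mb/s path (metric 2 beats 4), while any bandwidth-aware metric correctly sees the 4-hop gigabit path as five times 'closer'.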

2) EIGRP:
Pros:

– Easy to configure.
– Can load-balance traffic across unequal links (the only one of these IGPs with this great ability).
– Has an intrinsic loop-prevention feature (the feasibility condition).
– Has a rich, granular set of inputs to choose from when determining the best route (lowest bandwidth, total delay, reliability, load, MTU), depending on the network engineer's choice, and is not restricted to just hop count or bandwidth.
– More CPU-friendly than OSPF (OSPF's SPF algorithm, which is comparatively more processor-intensive, runs on every router at startup and in certain scenarios).
– Flexible with network designs, allowing summarization at any point (OSPF can only summarize at an ABR or ASBR).
– Self-heals almost instantaneously when a route goes down, by keeping track of pre-computed alternate routes (feasible successors), if any.
– Can use HMAC-SHA-256 for authentication, which is superior to MD5 and protects against packet-replay attacks (RIP doesn't support HMAC-SHA-256).

Cons:

– Easily the most mentioned: its proprietary nature. Cisco 'opened' EIGRP up in 2013 and there is an informational IETF draft (http://tools.ietf.org/html/draft-savage-eigrp-00); however, not all features can be run with open EIGRP for now.
– On a shared segment, EIGRP routers form full adjacency with all other routers and subsequently exchange route updates with each of them, which can be a bandwidth hog; with OSPF, on a shared segment all routers form full adjacency with only two routers (a DR and a BDR) and get route updates from them alone.

Use cases: small, medium and large networks. Very efficient with large- and small-scale WAN networks with large or small bandwidth constraints (IPsec, MPLS, DMVPN, etc.).

3) OSPF:
Pros:

– Updates by multicast, ensuring only concerned routers process the OSPF message; full adjacency on shared segments is formed with only two routers (the DR and BDR), avoiding the n(n-1)/2 full adjacencies, and the accompanying route exchange at startup and during SPF reruns, that can easily hog the bandwidth.
– Uses the concept of area 0 to avoid loops (forming a leaf-spine-like skeleton and ensuring only ABRs can take in type-1 LSAs and advertise them into other areas as type-3 LSAs).
– Can use HMAC-SHA-256 for authentication, which is superior to MD5 and protects against packet-replay attacks (RIP doesn't support HMAC-SHA-256).

Cons:

– Resource-intensive: each router keeps its own copy of the link-state database and runs a full or partial SPF calculation in certain scenarios.
– Summarization is constrained to the ABRs (one leg in area 0, another in a different area) and cannot be done at any arbitrary position.
– Harder to learn than EIGRP/RIP, with lots of caveats (area types, route types, etc.), ignorance of which leads to sub-optimal routing or no routing at all; in complex networks with OSPF in its full glory, it takes a highly skilled network engineer to handle it well.
– In hub-and-spoke topologies (like DMVPN), the DR must be deterministic (in DMVPN's case the DR should be the NHRP NHS), and that link should have the largest bandwidth.

Use cases: small, medium and large networks (with expert planning). Good for general WAN tech, but not as good as EIGRP for DMVPN.
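The adjacency savings from the DR/BDR mechanism mentioned above are easy to quantify. A sketch comparing a full mesh of n routers on a shared segment with OSPF's broadcast-segment behaviour (counting FULL adjacencies only; this is a simplified model of the protocol behaviour, not router output):

```python
def full_mesh_adjacencies(n):
    """Every router pairs with every other router on the segment:
    the classic n(n-1)/2 full mesh."""
    return n * (n - 1) // 2

def ospf_dr_adjacencies(n):
    """OSPF broadcast segment: routers go FULL only with the DR and BDR.
    That is one DR-BDR adjacency plus two per remaining router."""
    others = n - 2  # everyone who is neither DR nor BDR
    return 1 + others * 2

for n in (5, 10, 50):
    print(n, full_mesh_adjacencies(n), ospf_dr_adjacencies(n))
# 5 routers: 10 vs 7; 10 routers: 45 vs 17; 50 routers: 1225 vs 97
```

The full mesh grows quadratically while the DR/BDR model grows linearly, which is exactly why the DR election matters on large shared segments.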

***The above is written with the notion that each routing protocol is running with its default values***

DMVPN-OSPF CHALLENGE

Topology: The ISP provides internet connectivity to the bank's branches at their different geographical locations. Lagos is the headquarters, while Abuja, PH and Delta are branches of the same bank. DMVPN is the bank's choice for data connectivity from the HQ to its branches, riding over the internet access provided by the ISP, with Lagos as the hub and the branches as the spokes.

full - new

Challenge: Finacle (a popular banking application) at the Abuja branch is being accessed by a banker at the PH branch when suddenly the connection to the PH branch is lost, and you are called in as the network expert to resolve the issue.
Pointers: The only information staff can glean about the outage is that there was a power failure at the Lagos headquarters, and after some minutes power was restored, so the Lagos router went off and later came back on; since then, users across Abuja, PH and Delta have been complaining of connectivity issues across branches.
Based on the pointers and configurations presented below:
1) What do you suspect could be the technical issue that led to the loss of connection between devices across the PH and Abuja branches?
2) What should be done/implemented to prevent a re-occurrence, even if there is another power outage at the headquarters?

BELOW ARE USEFUL CONFIGS, LOGS, SHOW COMMANDS AND VERIFICATION TO HELP YOU TROUBLESHOOT THE PROBLEM:

full - new

LAGOS – HUB – ROUTER – OSPF ROUTE TABLE
PH FINACLE USER – ABJ – USER SUCCESSFUL PING
RELOAD (USED TO SIMULATE POWER OUTAGE) ON LAGOS HUB ROUTER
PH – FINACLE USER TO ABJ – FINACLE USER PING FAILURE BEFORE AND AFTER LAGOS ROUTER POWER OUTAGE
LAGOS – HUB – ROUTER AFTER RELOAD AND RE-ESTABLISHMENT OF FULL ADJACENCY

iou2 (LAGOS)

interface Tunnel1
ip address 10.1.1.10 255.255.255.0
no ip redirects
ip mtu 1400
ip nhrp map multicast dynamic
ip nhrp network-id 10
ip tcp adjust-mss 1360
ip ospf network broadcast
tunnel source Ethernet0/0
tunnel mode gre multipoint
tunnel key 10
!
interface Ethernet0/1
ip address 192.168.11.2 255.255.255.0

interface Ethernet0/0
ip address 10.10.10.1 255.255.255.0

router ospf 1
network 10.0.0.0 0.255.255.255 area 0

ip route 0.0.0.0 0.0.0.0 192.168.11.1

iou3 (ABUJA)

interface Tunnel1
ip address 10.1.1.2 255.255.255.0
no ip redirects
ip mtu 1400
ip nhrp map multicast 192.168.11.2
ip nhrp map 10.1.1.10 192.168.11.2
ip nhrp network-id 10
ip nhrp nhs 10.1.1.10
ip tcp adjust-mss 1360
ip ospf network broadcast
tunnel source Ethernet0/1
tunnel mode gre multipoint
tunnel key 10
!
interface Ethernet0/0
ip address 10.10.20.1 255.255.255.0
!
interface Ethernet0/1
ip address 192.168.2.2 255.255.255.0

router ospf 1
network 10.0.0.0 0.255.255.255 area 0

ip route 0.0.0.0 0.0.0.0 192.168.2.1

iou4 (PH)

interface Tunnel1
ip address 10.1.1.3 255.255.255.0
no ip redirects
ip mtu 1400
ip nhrp map multicast 192.168.11.2
ip nhrp map 10.1.1.10 192.168.11.2
ip nhrp network-id 10
ip nhrp nhs 10.1.1.10
ip tcp adjust-mss 1360
ip ospf network broadcast
tunnel source Ethernet0/2
tunnel mode gre multipoint
tunnel key 10

interface Ethernet0/0
ip address 10.10.30.1 255.255.255.0
!
interface Ethernet0/2
ip address 192.168.3.2 255.255.255.0
!
router ospf 1
network 10.0.0.0 0.255.255.255 area 0
!
ip route 0.0.0.0 0.0.0.0 192.168.3.1
!

iou5 (DELTA)
interface Tunnel1
ip address 10.1.1.4 255.255.255.0
no ip redirects
ip mtu 1400
ip nhrp map multicast 192.168.11.2
ip nhrp map 10.1.1.10 192.168.11.2
ip nhrp network-id 10
ip nhrp nhs 10.1.1.10
ip tcp adjust-mss 1360
ip ospf network broadcast
tunnel source Ethernet0/3
tunnel mode gre multipoint
tunnel key 10
!
interface Ethernet0/0
ip address 10.10.40.1 255.255.255.0
!
interface Ethernet0/2
no ip address
!
interface Ethernet0/3
ip address 192.168.4.2 255.255.255.0
!
router ospf 1
network 10.0.0.0 0.255.255.255 area 0

ip route 0.0.0.0 0.0.0.0 192.168.4.1
!

*Obviously there are other devices in between, like L2 devices; this is just a logical representation of the DMVPN hub and spokes. And obviously a company would have an alternate form of power supply backing up the main grid.*

THE QUESTIONS ONCE AGAIN:

Based on the pointers and configurations presented above:

1) What do you suspect could be the technical issue that led to the loss of connection between devices across the PH and Abuja branches?
2) What should be done/implemented to prevent a re-occurrence, even if there is another power outage at the headquarters?

DIFFICULTY LEVEL: MEDIUM