This should be short and sweet. Simple ISP outage.

At almost exactly midnight on August 3rd, extreme latency was detected on all VPN links, and my game of Factorio that was in progress shit the bed. After some quick debugging, I determined that all external connectivity was gone and the VPN tunnels had gone completely dead. Checking the core router, I found that no upstream gateway was responding, and udhcpc was sending discover requests with no reply.

(The timestamp above is UTC-5).

At this stage, I concluded that an ISP failure was the cause. I unplugged and replugged the CAT6 cable going to the fiber ONT to look for a link down/up event in the syslog of the core router. The event was there, telling me that a mouse had not eaten the CAT6 cable. I then went outside to the side of the house and cracked open the ONT's waterproof housing to check the status lights. Nothing was amiss, according to the ONT. I power cycled the ONT with no effect.

At that point, downdetector.com lit up, telling me that there was a northeast USA Verizon outage. There was nothing I could do from this point forward but keep sending udhcpc requests manually, hoping Verizon would reply. At about 1:30am, all services were restored and stable.
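The manual retrying could be automated with a small loop. What follows is only a sketch, not what I actually ran that night: the interface name `eth0`, the 30 second interval, and the helper names `udhcpc_once` and `wait_for_lease` are all made up for illustration. It also assumes a one-shot BusyBox udhcpc run is acceptable on the router, which may not be true if a udhcpc daemon is already managing that interface.

```python
import subprocess
import time
from typing import Callable

# Assumption: the WAN interface is called "eth0"; substitute the real name.
WAN_IFACE = "eth0"


def udhcpc_once(iface: str = WAN_IFACE) -> bool:
    """Run one short BusyBox udhcpc attempt; True if a lease was obtained."""
    # -n: exit with failure if no lease is obtained
    # -q: exit after obtaining a lease
    # -t 3: send up to 3 discover packets, -T 3: pause 3 seconds between them
    cmd = ["udhcpc", "-i", iface, "-n", "-q", "-t", "3", "-T", "3"]
    return subprocess.run(cmd, check=False).returncode == 0


def wait_for_lease(
    attempt: Callable[[], bool],
    interval_s: float = 30.0,
    max_attempts: int = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call attempt() until it succeeds.

    Returns the number of attempts it took, or -1 if max_attempts was
    reached without success. max_attempts=0 means retry forever.
    """
    tries = 0
    while max_attempts == 0 or tries < max_attempts:
        tries += 1
        if attempt():
            return tries
        sleep(interval_s)
    return -1
```

Calling `wait_for_lease(udhcpc_once)` on the router would keep asking until the ISP answers. The attempt and sleep functions are passed in so the loop can be exercised without touching a real interface.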

So, what have we learned?

  • That single point of failure is still a single point of failure. Shocker.
  • I'm a bit more comfortable with my core router after poking and prodding it.
  • We are still static IP agnostic! I lost the DHCP lease on the WAN IP I had for the past year. Nothing broke.
  • I need to do a wiring check on everything from the core router to the ONT, and ensure the ONT is clean and waterproof. Just in case.
  • My Uptime Robot monitors could really be sorted into folders/separate pages.
  • We need that redundant WAN implemented.

So, I ordered the 5G LTE T-Mobile service and it'll be here next week. I also ordered a 4x1GbE PCIe card for the core router so I have enough ports to add redundant WAN (and later, redundant LAN). There will be some downtime events to add the card and then test a failover script. I'm excited to see how fast the 5G gateway is - but I'm not hopeful that it will meet my 50 Mbps minimum target on upload. We'll see. (UPDATE: It was shit. 17 Mbps. Unusable.)
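For the curious, here is roughly the kind of logic a failover script needs. This is a hedged sketch and not the script itself: the thresholds, the interface name `eth0`, the probe target `1.1.1.1`, and the names `FailoverState`, `next_state`, and `primary_reachable` are all assumptions for illustration. The idea is to switch to the backup WAN only after several consecutive failed probes, and to switch back only after the primary has been healthy for a while, so a single dropped ping does not flap the default route.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverState:
    active: str = "primary"   # which WAN currently carries the default route
    fail_streak: int = 0      # consecutive failed probes of the primary
    ok_streak: int = 0        # consecutive successful probes of the primary


def next_state(
    state: FailoverState,
    primary_ok: bool,
    fail_threshold: int = 3,
    recover_threshold: int = 5,
) -> FailoverState:
    """Return the new state after one probe of the primary WAN."""
    if primary_ok:
        ok = state.ok_streak + 1
        if state.active == "backup" and ok >= recover_threshold:
            return FailoverState("primary", 0, 0)  # fail back
        return FailoverState(state.active, 0, ok)
    fails = state.fail_streak + 1
    if state.active == "primary" and fails >= fail_threshold:
        return FailoverState("backup", 0, 0)       # fail over
    return FailoverState(state.active, fails, 0)


def primary_reachable(iface: str = "eth0", target: str = "1.1.1.1") -> bool:
    """Send one ping out of the primary interface (hypothetical names)."""
    # -I: source interface, -c 1: one probe, -W 2: wait up to 2 seconds
    cmd = ["ping", "-I", iface, "-c", "1", "-W", "2", target]
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False
    )
    return result.returncode == 0

# When `active` changes, the real script would swap the default route,
# e.g. with `ip route replace default via <gateway> dev <iface>`.
# The gateway and interface values are deliberately left unfilled here.
```

Keeping the decision in a pure function means the thresholds can be tested on a laptop before any of it goes near the core router.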

This re-adds that $50/mo cost to Lain.la's budget, by the way. Ouch.