The network chart in my previous post is a good thing to read before starting this article.
There were a few issues I had identified with the V1 architecture that were not a problem during the first year of running lain.la, but had begun to worry me. I liked what I had built, but I needed to mitigate more risks to uptime and network integrity, as well as upgrade my core network. I devised the V2 architecture to solve those problems. Here's what they were:
- Single point of failure in router hardware/PSU
- Single point of failure in switching hardware/PSU
- Incompatibility with multi-host clustering for redundancy
- Weak routing hardware based on Broadcom/ARM chips
- Slow Gigabit internal network (for storage and VM movement, for example)
- I'm out of switchports!
- No VLAN support at switch level
The V2 network solves all of these problems in one fell swoop. Twin geographically diverse VPN connections ensure that if a VPS dies in one datacenter, lain.la will not lose its connection to the world. All core components are enterprise-grade hardware with redundant PSUs and redundant storage. Ten Gigabit switching gives host-to-host traffic (storage and VM movement, for example) ten times the bandwidth of the old Gigabit network. Layer 3 switching allows for features such as VLANs, port channels, and so much more.
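The failover idea behind the twin VPN connections can be sketched in a few lines. This is only an illustration under stated assumptions, not the actual lain.la configuration: the tunnel names `vps-a` and `vps-b` and the `pick_tunnel` helper are hypothetical, and the health check is passed in as a function so the selection logic stays independent of how a tunnel is actually probed.

```python
from typing import Callable, Optional, Sequence


def pick_tunnel(
    tunnels: Sequence[str],
    is_up: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy tunnel in priority order, or None if all are down.

    `tunnels` is ordered by preference; `is_up` is any health check
    (ping, TCP connect, etc.) supplied by the caller.
    """
    for tunnel in tunnels:
        if is_up(tunnel):
            return tunnel
    return None


# Hypothetical tunnel names, one per datacenter.
TUNNELS = ["vps-a", "vps-b"]

# Simulated health state: the primary datacenter has gone down.
status = {"vps-a": False, "vps-b": True}

active = pick_tunnel(TUNNELS, lambda name: status[name])
print(active)  # vps-b
```

With both tunnels healthy the primary wins, and with both down the function returns `None`, which is the case the remaining single points of failure would still produce.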
My choices for hardware were simple.
- The Dell T5810 tower is the rank-and-file of my hosting infrastructure. It is by far the best performance-per-dollar platform today. So I retrofitted another one.
- A Dell R730 provides a platform for even more redundant hosting infrastructure for critical applications with remote management and dual PSUs, as well as local RAID.
- The Brocade ICX 6610 is a monster of a switch, with 48 Gigabit ports and up to 16 SFP+ 10G ports, and even 2 QSFP 40G ports for a future storage server.
- A Dell R320 is a 1U light-duty server, perfect for a routing role, although DD-WRT is certainly a bit on the light side even for it.
- Mellanox 10G SFP+ cards were simple single-NIC additions that bring 10G connectivity to any PCIe-equipped machine.
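To put the link speeds in this list in perspective, here is a rough back-of-the-envelope calculation of how long a VM disk would take to move between hosts. The 100 GB disk size is a made-up figure for illustration, and the numbers assume the link runs at full line rate with no protocol or storage overhead, so real transfers will be slower.

```python
def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Ideal time to move `size_gb` gigabytes over a `link_gbps` gigabit/s link."""
    size_gigabits = size_gb * 8  # 8 bits per byte
    return size_gigabits / link_gbps


VM_DISK_GB = 100  # hypothetical VM disk size, not a figure from the article

for name, gbps in [("1G copper", 1), ("10G SFP+", 10), ("40G QSFP", 40)]:
    seconds = transfer_seconds(VM_DISK_GB, gbps)
    print(f"{name}: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

Under these idealized assumptions the same disk that ties up a Gigabit link for over thirteen minutes moves in under a minute and a half at 10G.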
With all of this in mind, there are still a few weaknesses for the future:
- Single ISP. Hard to really fix this... Not cheap.
- No SAN. Again, hard to fix this... Not cheap. Some VMs live entirely on a NAS and those would fail over, though, and RAID5 SSDs are a step forward.
- Single pfSense outbound gateway. Can fix this but really, really low risk.
- Single endpoint. Medium risk but super hard to fail over properly.
A special thanks to the author of this page: https://forums.servethehome.com/index.php?threads/brocade-icx-series-cheap-powerful-10gbe-40gbe-switching.21107/