10/22/2022 4:00am - Incident Post-Mortem Analysis
Two incidents in a month. Ouch. And always when I'm sleeping, too.
So, at about 4:00am, the two main NYC lain.la nodes on BuyVM had their 1TB block storage mounts effectively jammed. No reads, no writes, no nothing. This wreaked all sorts of havoc - CPU deadlocked, Nginx deadlocked, nothing was going in or out of these two nodes. The other two nodes in Miami were fine. I woke up at about 10am, saw the problem, yanked the nodes, and popped open a ticket with BuyVM.
10/13/2022 7:30am - Incident Post-Mortem Analysis
At approximately 7:30am on October 13th, 2022, Pomf suffered an upstream failure (uncached GETs and all POSTs) due to a loss of storage connectivity on the hypervisor it was running on (esxi3.lain.local). The issue was traced to a transient failure of a Mellanox ConnectX-2 10GbE NIC, which timed out while responding to OS commands. This crashed the software iSCSI stack unrecoverably. The condition was rectified at 11:30am with a hard reboot of the host, after approximately 45 minutes of troubleshooting and evacuating VMs from it.
Storage. Storage Storage Storage.
I jumped the gun on the storage server project. Decided to just go for it. And guess what? It's done! Pomf is on the new server right now!

August Updates and Metrics
Hello again! Time for updates and metrics.
I don't need your money. But I have big ideas.
Hello. I'm 7666. I single-handedly run this massive collection of services known as Lain.la. I started all this a little over three years ago (August 31st, 2020) as a productive outlet for myself, to make systems and services I'd be proud of that were also useful to me and my friends, while keeping privacy and security paramount.
The Joys of NAT, IP Masquerading, Default Routes, and More
Recently, a friend asked me to solve a long-standing issue in my network affecting their freebie server. When NAT is done at least two network hops upstream, the client's source IP address gets lost in translation, making it impossible for a server inside my network to determine the real IP address of an incoming client. That server only sees the upstream edge node's IP (such as 10.13.0.1), not the user's.
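To make the "lost in translation" part concrete, here is a toy model of an edge node that port-forwards a client packet to an inner server. It is only a sketch and not necessarily how this network is set up or how the post solves it. The client, public, and server addresses are hypothetical; only 10.13.0.1 comes from the text. Real NAT also rewrites ports and tracks connections, and this model ignores both.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str  # source IP as seen by whoever receives the packet
    dst: str  # destination IP


def edge_forward(pkt: Packet, server_ip: str, edge_ip: str, masquerade: bool) -> Packet:
    """Toy model of an edge node forwarding a client packet to an inner server."""
    # DNAT: retarget the packet at the internal server (source is untouched).
    pkt = replace(pkt, dst=server_ip)
    if masquerade:
        # SNAT/masquerade: overwrite the source with the edge node's own address
        # so replies route back through it. The client's IP is gone from here on.
        pkt = replace(pkt, src=edge_ip)
    return pkt


# Hypothetical addresses: 203.0.113.7 is the client, 198.51.100.10 the public IP,
# 10.13.0.50 the inner server.
client = Packet(src="203.0.113.7", dst="198.51.100.10")
seen = edge_forward(client, server_ip="10.13.0.50", edge_ip="10.13.0.1", masquerade=True)
print(seen.src)  # 10.13.0.1

direct = edge_forward(client, server_ip="10.13.0.50", edge_ip="10.13.0.1", masquerade=False)
print(direct.src)  # 203.0.113.7
```

Dropping the masquerade step preserves the real client IP. The catch is that the inner server's replies must then route back through the NAT node, for example via its default route or policy routing. Otherwise the return traffic takes a different path and the connection breaks.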
8/3/2022 12:00am - Incident Post-Mortem Analysis
This should be short and sweet. Simple ISP outage.
At almost exactly midnight on August 3rd, extreme latency was detected on all VPN links, and my game of Factorio that was in progress shit the bed. After some quick debugging, I determined that all external connectivity was gone, and the VPN tunnels went completely dead. Checking the core router, I found that no upstream gateway was responding, and udhcpc was sending discover requests with no reply.
The Indonesian Porn Incident, Round 2
Well, here we are again. Indonesian porn breaking my infrastructure. We had a half hour outage of all HTTP and VPN based services on OPT1-4 due to overload from Pomf traffic. This was a little different, however. My script DIDN'T cause a cascade failure. No, actually there was just so much load spread out across all four endpoints that the whole thing collapsed. Let's go into that load profile.
First, an image.
I Missed an Article for June!
Hey, I have a good excuse though. Multiple, actually!
Vacation!
I took a vacation in the middle of June! That's why not much got done. Did you notice though? Probably not - except for a poorly timed ISP outage, Lain.la kept humming along wonderfully while I was frolicking around London. I had to step in once or twice to handle abuse requests but otherwise, pain-free. Very happy that my infrastructure basically runs and heals itself in my absence. Makes the time investment minimal for daily operations.
May Updates and Metrics
Here's another updates and metrics article for your enjoyment.
Updates:
We've had quite a few improvements lately: