Lain.La Infrablog

The DIY UPS Project - Power Outages? Not a problem!

Oh boy. Where do I begin with this article? Let's start with: Happy New Year!

Okay, maybe a table of contents is a better start:


Caching LVM for Pomf

After the slice range improvement, IOWait (the share of time the CPU sits stalled waiting for storage calls to finish) across the four edge nodes used for Pomf traffic went up quite a bit, due to the need to address a larger number of fragmented files on the slow storage cache disk. Before, it wasn't really a problem, but now that there are over 100,000 slices to manage, the IOPS requirements have changed quite a bit.
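For anyone who wants to watch the same number on their own Linux box, here is a minimal sketch of how IOWait can be derived from two samples of the aggregate `cpu` line in `/proc/stat`. The function names are mine and purely illustrative, and this is not the monitoring actually used on the edge nodes; it only shows the arithmetic behind the percentage.

```python
import time


def parse_cpu_line(line):
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' line samples."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    # Field order: user, nice, system, idle, iowait, irq, softirq, steal.
    # The guest fields that follow are already counted in user/nice, so skip them.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.readline()


def sample_iowait(interval=1.0):
    """Take two samples 'interval' seconds apart and return the iowait percentage."""
    before = read_cpu_line()
    time.sleep(interval)
    return iowait_percent(before, read_cpu_line())
```

Calling `sample_iowait(5.0)` on a node gives the average over a five second window. A longer window smooths out bursts, which is probably what you want when comparing before and after a cache change.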


Funding for Ideas - Short Version

Update 1/7/2025: BuyVM has been acquired, and this throws Lain.la's future into doubt.

BuyVM is the provider of the edge nodes that I use to publish Lain.la, and Francisco, the owner, has sold the company. While I don't expect anything to change immediately, I do expect that what I am doing today (e.g. pushing 500TB to 600TB of traffic a month) is in jeopardy.
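To put that traffic figure in perspective, here is a rough back-of-the-envelope conversion from monthly volume to average sustained throughput. It assumes decimal terabytes (10^12 bytes) and a 30-day month, neither of which is stated above, and it says nothing about peaks.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month


def average_gbps(terabytes_per_month):
    """Average throughput in Gbit/s for a monthly volume in decimal terabytes."""
    bits = terabytes_per_month * 1e12 * 8
    return bits / SECONDS_PER_MONTH / 1e9


for volume in (500, 600):
    print(f"{volume} TB/month is about {average_gbps(volume):.2f} Gbit/s on average")
```

Under those assumptions, 500TB to 600TB a month works out to roughly 1.5 to 1.9 Gbit/s around the clock, spread across the edge nodes.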


Pomf Now Uses Cache Slicing!

Hello again dear reader! Pomf has continued to scale to wild heights, and so the cracks are starting to show. One such "crack" was the issue of cache refills. Read more on how I have just solved this problem, hopefully forever.


November Updates and Metrics

I'm still in one of the most aggressive infrastructure upgrade periods I've ever had for Lain.la. I spent 13 hours today just working on upgrades, servers, patching, etc., not to mention this month's maintenance window, the largest I've ever had to put up with. Let's go over some stuff.


Rant: EC-Council - Beware the False Prophets of Security

Preface: This article is entirely my opinion, based on my direct experiences with EC-Council courseware, training, and examinations. I currently hold an EC-Council certification. This might change if they ever read this and manage to round up enough outsourced Indians to figure out who I am.


October Updates and Metrics

Oh boy, how things have changed. This has been one of the most aggressive few months I've had for upgrades and changes.

Updates:

Stor1.lain.local is now fully operational, with a usable storage pool of 115TB.
Stor2.lain.local is now fully operational, with a usable storage pool of 90TB.


10/22/2022 4:00am - Incident Post-Mortem Analysis

Two incidents in a month. Ouch. And always when I'm sleeping, too.

So, at about 4:00am, the two main NYC lain.la nodes on BuyVM had their 1TB block storage mounts effectively jammed. No reads, no writes, no nothing. This wreaked all sorts of havoc - CPU deadlocked, Nginx deadlocked, nothing was going in or out of these two nodes. The other two nodes in Miami were fine. I woke up at about 10am, saw the problem, yanked the nodes, and popped open a ticket to BuyVM.


10/13/2022 7:30am - Incident Post-Mortem Analysis

At approximately 7:30am on October 13th, 2022, Pomf suffered an upstream failure (affecting uncached GETs and all POSTs) due to a loss of storage connectivity on the hypervisor it was running on (esxi3.lain.local). The issue was traced back to a transient failure of the NIC, a Mellanox ConnectX-2 10GbE card, which had timed out responding to OS commands. This led to the software iSCSI stack crashing irrecoverably. The condition was rectified at 11:30am by a hard reboot of the host, following approximately 45 minutes of troubleshooting and evacuating VMs from it.