Pictured: Ms Majors outside the former Linden Lab office in Brighton, UK. Yes, her shirt says “Works for L$”.
Charity Majors has a great blog post about creating culture and rituals in the engineering team of a technology company, and she knows what she writes about. She is the CTO and co-founder of Honeycomb, a leading provider of observability, i.e. a service your company uses if it has a very complex computer network and you want to see in real time, throughout the system, what is working well and what is not. Charity cut her teeth at Linden Lab, a company with a crazy, stupidly complex computer network, and shares some of the insider rituals Linden engineers practiced to keep morale up and keep Second Life online and running as smoothly as (relatively) possible.
Which brings us to the Shrek Ears:
We had a tangled green felt headband with ogre ears called the Shrek Ears. The first time an engineer broke production, they wore the Ears for the day. It might sound unpleasant, like a stupid hat, but no, it was a rite of passage. It was a mark of distinction! Everyone eventually breaks production if they are working on something meaningful.
If you wore the Shrek Ears, people would stop you during the day, excitedly ask what happened, and reminisce about the first time they broke production. It became a way for 1) new engineers to meet many of their teammates, 2) to absorb a lot of production wisdom and risk factors, and 3) to normalize the fact that yes, sometimes things break, and everything is fine; no one will yell at you.
So here is some inside knowledge for Second Life users: whenever the virtual world went offline or kept crashing, chances are someone in San Francisco was roaming the office with green felt ears on their head. Those ears were worn, Charity tells me, for “[l]iterally every outage.”
But it wasn’t a mandatory hazing ritual, she quickly adds: “The spirit of it was always very lighthearted and fun. We congratulated people. Did you break production? Congratulations, you are finally one of us. Our thesis was that if you never broke production, you probably weren’t moving fast enough or taking risks. (Second Life was an insanely complex distributed system, even by today’s standards, at a time when there was no devops, IaaC, or any of the modern toolkit. The world was much more fragile.)”
I remember the Shrek Ears well because wearing them became a ritual during my time at Linden Lab, from 2003 to 2006. In fact, they spread from the engineering team to be worn by random people in the company after they messed up in their own department. (I even put them on for a while after I mixed up some New World Notes details, but I don’t think anyone noticed.)
Speaking of Second Life’s downtime, Charity told me about some of the big incidents she dealt with on the engineering team between 2004 and 2010. Buckle up, we’re heading into the tech weeds, sharing a photo of the Shrek Ears, and also talking about the “Transactpocalypse”:
Pictured: Linden alum Erica Firment in 2008 showing off the Shrek Ears: “I messed up a Subversion commit and overwrote Callum [Linden]’s code. I wore the Shrek Ears for a couple of hours as punishment.” (Via her Flickr)
“There was a time when we upgraded MySQL from 4.1 to 5.0 and ended up having to roll back and lose an entire day’s updates because the performance was terrible. (After that, I spent a year working on MySQL performance testing tools, trying to eliminate the risk of upgrading.)
“Another funny thing about MySQL being down was that the world literally could never come back up again after a complete crash… because people trying to log in [would overwhelm the] login service and mysql.agni. That’s what led us to develop the ‘velvet rope’ process to slowly and selectively allow people to come back into the system in waves.
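The “velvet rope” idea, letting a backlog of logins in a wave at a time instead of all at once, can be illustrated with a minimal sketch. Everything here is hypothetical: the function name, the `wave_size` and `capacity` parameters, and the first-come ordering are assumptions for illustration, not Linden Lab's actual implementation, and the "selective" part of the process is not modeled.

```python
from collections import deque


def velvet_rope(waiting, wave_size, capacity):
    """Admit queued logins in waves until a capacity limit is reached.

    Yields one list of admitted users per wave. Users who do not fit
    under `capacity` are simply left in the queue (hypothetical policy).
    """
    queue = deque(waiting)
    admitted = 0
    while queue and admitted < capacity:
        # Never admit more than the wave size or the remaining capacity.
        room = min(wave_size, capacity - admitted)
        wave = [queue.popleft() for _ in range(min(room, len(queue)))]
        admitted += len(wave)
        yield wave
```

In a real system the caller would presumably pause between waves and check that the login service and database were still healthy before admitting the next one.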
“There was also what we called the ‘transactionpocalypse,’ or something like that, when we noticed that there was an auto-increment column in the transaction database that had only a few days left before running out of integers and taking down the entire world indefinitely.”
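To see how an auto-increment column can have "a few days left," here is a rough back-of-the-envelope sketch. The source does not say which integer type the column used or how fast IDs were consumed, so the signed 32-bit default and the numbers in the usage note are assumptions; only the MySQL type limits themselves are standard.

```python
# Upper bounds of MySQL integer column types.
INT_SIGNED_MAX = 2**31 - 1       # 2,147,483,647
INT_UNSIGNED_MAX = 2**32 - 1     # 4,294,967,295
BIGINT_SIGNED_MAX = 2**63 - 1


def days_until_exhausted(current_id, ids_per_day, max_id=INT_SIGNED_MAX):
    """Estimate how many days remain before an auto-increment column overflows.

    `current_id` is the latest value handed out; `ids_per_day` is the
    observed consumption rate (both hypothetical inputs).
    """
    if ids_per_day <= 0:
        raise ValueError("ids_per_day must be positive")
    remaining = max(max_id - current_id, 0)
    return remaining / ids_per_day
```

For example, with a made-up rate of one million new rows a day and three million IDs of headroom, `days_until_exhausted(INT_SIGNED_MAX - 3_000_000, 1_000_000)` gives three days, while the same rate against `BIGINT_SIGNED_MAX` leaves billions of years.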
According to her, the Lindens fixed this: “By creating a new column with a different integer type, then copying the content from the old column to the new one: everything first, then syncing the changes made since you started copying in smaller and smaller incremental passes, then locking the table and moving the old transaction column to a temporary name and the new one to the old name. Nowadays, you can do fancy ‘online migrations’ for your tables, but back then we had to do it all by hand.
“From a user perspective, the only impact was about two seconds of write errors, virtually unnoticeable. (It was huge, and those were the days of spinning rust, pre-SSD.)” Fortunately, the world was saved by the keen eye of a Linden engineer named Ryan, who quickly found a solution.
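The copy-then-catch-up procedure Charity describes can be sketched as a small in-memory simulation. This is not MySQL and not the Lindens' actual script: the table is a plain dictionary, the column names `txn_id`, `txn_id_big`, and `txn_id_old` are invented, and concurrent writes are passed in as a list per pass. Python integers do not overflow, so the wider integer type is not modeled either; the sketch only shows the order of operations.

```python
def widen_column(table, writes_per_pass):
    """Simulate backfilling a new column in passes, then swapping names.

    table: {row_id: {"txn_id": int}}
    writes_per_pass: one list of (row_id, value) writes per copy pass,
        representing writes that land while that pass is running.
    Returns the number of unlocked copy passes performed.
    """
    dirty = set(table)  # the first pass has to copy every row
    passes = 0
    for concurrent_writes in writes_per_pass:
        to_copy, dirty = dirty, set()
        for row_id in to_copy:
            table[row_id]["txn_id_big"] = table[row_id]["txn_id"]
        # Writes during the pass still hit the old column, so the rows
        # they touch must be copied again in the next (smaller) pass.
        for row_id, value in concurrent_writes:
            table.setdefault(row_id, {})["txn_id"] = value
            dirty.add(row_id)
        passes += 1
    # "Lock" the table: no more writes. Copy the last stragglers...
    for row_id in dirty:
        table[row_id]["txn_id_big"] = table[row_id]["txn_id"]
    # ...then move the old column aside and give the new one its name.
    for row in table.values():
        row["txn_id_old"] = row.pop("txn_id")
        row["txn_id"] = row.pop("txn_id_big")
    return passes
```

Because each pass only has to copy what changed during the previous one, the final locked step touches very few rows, which is consistent with the short window of write errors described above.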
Read Charity’s full post here and marvel at the rituals required to ensure that a fully 3D, user-generated virtual world accessed by approximately 1 million people a month does not become a mess of offline data.