“We're Out of Columns” – Best Worst Codebase

Oh, the merchants2 table? Yeah, we ran out of columns in merchants, so we made merchants2.

When I started programming as a kid, I didn’t know people got paid to code. Even when I graduated high school, I assumed the world of “professional development” looked very different than the code I was writing in my spare time. When I was fortunate enough to land my first software job, I quickly learned how wrong I was, and how right I was. My first job was a trial by fire, and to this day, that codebase remains the worst and best codebase I’ve ever worked on. While that codebase will forever remain locked within the proprietary walls of that particular company, I hope I can share with you some of the funniest and scariest stories from it.


The database lives forever

In a large legacy system, the database is not just a place to store data, it is a creator of culture. The database sets the constraints for the operation of the system as a whole. It is the place where all the code meets. The database is the “watering hole”. In our case, this watering hole was quite dirty.

Did you know that SQL Server has a limit on the number of columns in a table? I didn't either. At the time it was 1,024, and for an ordinary table it still is (4,096 is the limit on columns in a single SELECT statement). Needless to say, most people don't need to know this. But we did. The reason is that Merchants (our table for storing customer information) ran out of columns a long time ago. The solution was Merchants2, a table that (if I remember correctly) had 500+ columns of its own.

Merchants (and its best friend Merchants2) were the lifeblood of the system. Everything came back to Merchants. It wouldn't have been so bad if Merchants were just one weird table (okay, two) off on its own. But there were also many properly normalized tables, all of which had foreign keys to Merchants. And one of them will always hold a special place in my heart: SequenceKey.

Sequence Key

For ease of understanding, here is the entire SequenceKey table: one column, one row, one number. Yes, you read that right, that's the entire table. If simplicity is a virtue, then SequenceKey could be declared the perfect table. What could be simpler?

But you may be asking yourself, what use would a table with one column and one row have? Generating IDs. At the time, I heard a story that SQL Server once didn't support auto-incrementing IDs. That was the accepted, correct answer. My research to find out whether it was true turned up nothing. In practice, though, the table did much more than generate IDs.

SequenceKey was the glue. In each stored procedure that created new entities, you first took the key from SequenceKey and incremented it, and then inserted it as an ID into N different tables. Now you had an implicit join between all those entity tables. If you saw an ID in the system, there was a good chance that adjacent tables would have a row with the exact same ID. Quite clever, frankly.
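The pattern is easier to see in code. What follows is only a minimal sketch in Python with SQLite, not the original stored procedures: the column name `Value`, the tables `EntityName` and `EntityPhone`, and the starting number 1000 are all made up for illustration.

```python
import sqlite3


def next_sequence_key(conn: sqlite3.Connection) -> int:
    """Bump the single SequenceKey row and return the new value."""
    # "Value" is a hypothetical column name; the post only says the table
    # has one column and one row.
    conn.execute("UPDATE SequenceKey SET Value = Value + 1")
    (key,) = conn.execute("SELECT Value FROM SequenceKey").fetchone()
    return key


def create_entity(conn: sqlite3.Connection, name: str, phone: str) -> int:
    """Insert one logical entity into several tables under one shared ID."""
    with conn:  # commit on success, roll back on error
        key = next_sequence_key(conn)
        # EntityName and EntityPhone are made-up stand-ins for the
        # "N different tables".
        conn.execute("INSERT INTO EntityName (Id, Name) VALUES (?, ?)", (key, name))
        conn.execute("INSERT INTO EntityPhone (Id, Phone) VALUES (?, ?)", (key, phone))
    return key


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with the one-row SequenceKey table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE SequenceKey (Value INTEGER NOT NULL);
        INSERT INTO SequenceKey (Value) VALUES (1000);
        CREATE TABLE EntityName (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE EntityPhone (Id INTEGER PRIMARY KEY, Phone TEXT);
        """
    )
    return conn


def lookup(conn: sqlite3.Connection, key: int):
    """The 'implicit join': the same ID finds the matching row in each table."""
    return conn.execute(
        "SELECT n.Name, p.Phone FROM EntityName n "
        "JOIN EntityPhone p ON p.Id = n.Id WHERE n.Id = ?",
        (key,),
    ).fetchone()
```

Calling `create_entity` twice on a fresh `make_demo_db()` hands out two consecutive keys, and `lookup` finds both halves of each entity without any foreign key declaring the relationship.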

Calendar

The database may live forever, but our login system was at the mercy of a calendar. I don't mean an actual calendar. I mean a database table called calendar. What did it contain? A calendar, filled in by hand. When I asked our resident shaman (his name was Munch), he informed me that when the calendar ran out, nobody could log in. That had happened a few years earlier, so they had an intern fill in another 5 years to make sure it wouldn't happen again anytime soon. What system used that calendar? Nobody knew.

Employees

Every morning at 7:15, the employees table was dropped. All the data would be gone. Then a CSV from ADP would be loaded into the table. During this time, you couldn't log in. Sometimes this process would fail. But the process didn't end there. The data had to be replicated to headquarters. So an email was sent to a person who would press a button every day to copy the data.
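To show why nobody could log in during the reload, here is a rough sketch of a drop-and-reload job in Python with SQLite. It is an illustration under assumptions, not the real job: the column names `username` and `full_name` and the sample row are hypothetical, and the real CSV layout is unknown.

```python
import csv
import io
import sqlite3


def can_log_in(conn: sqlite3.Connection, username: str) -> bool:
    """A login check that depends on the employees table existing."""
    try:
        row = conn.execute(
            "SELECT 1 FROM employees WHERE username = ?", (username,)
        ).fetchone()
    except sqlite3.OperationalError:
        # Raised by SQLite when the table is missing, i.e. mid-reload.
        return False
    return row is not None


def drop_employees(conn: sqlite3.Connection) -> None:
    """Step 1 of the morning job: throw the whole table away."""
    conn.execute("DROP TABLE IF EXISTS employees")
    conn.commit()


def load_employees(conn: sqlite3.Connection, csv_text: str) -> None:
    """Step 2: recreate the table from the CSV export."""
    conn.execute("CREATE TABLE employees (username TEXT PRIMARY KEY, full_name TEXT)")
    rows = csv.DictReader(io.StringIO(csv_text))
    conn.executemany("INSERT INTO employees VALUES (:username, :full_name)", rows)
    conn.commit()


# Hypothetical export contents, for demonstration only.
SAMPLE_CSV = "username,full_name\nalice,Alice Example\n"
```

Between `drop_employees` and a successful `load_employees`, `can_log_in` returns `False` for everyone, and if the load fails it stays that way until someone reruns the job.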

Database Replacement

You're probably thinking: Can't anyone clean up this database? Make it easier to work with? Well, the company beat you to it. A copy of the database was created. The data in that copy was about 10 minutes out of date. It only synced one way. But that database was normalized. How normalized? It took 7 joins to get from a salesperson to a phone number.

Sales figures

Each salesperson had a quota they had to hit each month, called a “win.” The tables that stored this data (not financial, but sales-specific) were incredibly complex. Every day, you had to figure out which rows had been added and updated, and sync them with some system at headquarters. This wasn’t a problem until one salesperson figured out that he could ask for these records to be changed manually.

This salesperson had already hit their win for the month and then closed another big deal. They wanted to roll it over to the next month. An intern was assigned to do it. Word spread, and over the next three years the number of requests grew exponentially. At one point we had 3 interns whose full-time job was writing these SQL queries. Writing an app for this was considered too difficult. Before leaving, however, I tried to help the interns create their own app. I don't know if they succeeded.

Code base

But what's a database without a codebase? And what a great codebase it was. When I joined, everything was in Team Foundation Server. If you're not familiar, that was a centralized version control system built by Microsoft. The core codebase I was working with was half VB, half C#. It ran in IIS and used session state for everything. What did that mean in practice? If you reached a page via Path A rather than Path B, you'd see completely different things on that page.

But to describe this codebase as half VB, half C# would be to flatter it. Every JavaScript framework that existed at the time was checked into this repository, usually with some custom modifications that the author felt were necessary. Most notably Knockout, Backbone, and Marionette. But of course, there was also some jQuery and a few jQuery plugins.

But this codebase wasn't isolated. There were about a dozen SOAP services and a few native Windows apps next to it. The most notable was the delivery manager. Legend has it that the entire application was built in a weekend by a single developer. Let's call him Gilfoyle. By all accounts, Gilfoyle was an incredibly fast programmer. I never met him, but I felt like I knew him, not just from his code in the repositories, but from all the code he left on hard drives.

Gilfoyle Hard Drives

Munch (yes, that was his real name) kept Gilfoyle's hard drive in a RAID configuration on his desktop for years after Gilfoyle left the company. Why? Because Gilfoyle was known for not checking in code. Not only that, but for creating the occasional one-off Windows app for a single user. So it wasn't unusual for a user to contact us with a bug report for an app that existed only on Gilfoyle's hard drive.

The delivery bug

Most of my job involved tracking down bugs that teams didn't want to spend time on. One particularly nasty bug would pop up every few months. After we shipped items, there would be items in the shipping queue that were marked as both shipped and not shipped. I used a number of workarounds (a SQL script, a Windows app, etc.) to get us out of the mess each time. I was advised not to try to find the root cause. But I couldn't help myself.

Along the way, I learned how Gilfoyle thought. The delivery app pulled the entire database, then filtered by date, keeping all orders after the app's launch date. The app relied on a SOAP service, but not for anything service-like. No, the service was a pure function. It was the client that produced all the side effects. In this client, I found a huge class hierarchy: 120 classes, each with different methods, with inheritance going 10 levels deep. The only problem? ALL THE METHODS WERE EMPTY. I'm not exaggerating here. Not partly empty. Completely empty.

This stumped me for a while. Eventually I learned that the hierarchy existed to create a structure that he could then use reflection on. That reflection let him build a delimited string (whose structure was entirely database dependent, but completely static) that he would send over the socket. It turns out that all of this was eventually sent to Kewill, a service that interacted with carriers. So why did the bug happen? Kewill reused its 9-digit numbers every month, and someone had disabled the cron job that deleted old orders.
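A toy model makes the failure mode concrete. This is a hedged sketch in Python, not the real schema: the `Order` fields, the month format, and the sample number are all invented. The only facts taken from the story are that the numbers were reused monthly and that old orders stopped being deleted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    month: str           # hypothetical field, e.g. "2015-03"
    carrier_number: str  # 9-digit number that gets reused each month
    shipped: bool


def shipped_states(orders: list[Order], carrier_number: str) -> set[bool]:
    """Every shipped/not-shipped state recorded under one carrier number."""
    return {o.shipped for o in orders if o.carrier_number == carrier_number}


def purge_before(orders: list[Order], month: str) -> list[Order]:
    """Stand-in for the cleanup job: drop orders older than `month`."""
    return [o for o in orders if o.month >= month]


# Last month's order already shipped; this month's order reuses its number.
DEMO_ORDERS = [
    Order(order_id=1, month="2015-03", carrier_number="123456789", shipped=True),
    Order(order_id=2, month="2015-04", carrier_number="123456789", shipped=False),
]
```

With the cleanup disabled, `shipped_states(DEMO_ORDERS, "123456789")` contains both `True` and `False`, which is exactly "shipped and not shipped". Running `purge_before(DEMO_ORDERS, "2015-04")` first leaves a single, unambiguous state.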

Beautiful mess

There's so much more to say about this codebase. Like the team of Super Senior developers who spent 5 years rewriting everything without shipping a single piece of code. Or the Red Hat consultants who built a single database to rule them all. There were so many crazy corners of this codebase. So many reasons why there were entire teams dedicated to rewriting at least one part of its functionality from scratch.

But I think the most important story to tell is about Justin’s improvement to the seller search page. The seller search page was the entry point to the entire app. Every customer service rep on the phone with a seller would type in their ID or name to find information about them. This would take you to a huge page with all the information about the seller. The page was information-rich in the best sense of the word, full of any information you could possibly need and any links you might want to visit. But it was dog-slow.

Justin was the only senior developer on my team. He was bright, snarky, and couldn't give a shit about the business. He told it like it was, didn't stall, and could always solve problems faster than the team around him. One day, Justin got tired of hearing about how slow the seller search page was, so he went and fixed it. Each field on that screen became its own endpoint. When the page loaded, it would start fetching data, and as each request finished, the next would be kicked off. Page load times dropped from minutes to seconds.
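Why does splitting one slow page into many small requests help so much? Here is a rough illustration in Python with asyncio, not Justin's actual code (which ran in the browser against real endpoints). The field names and latencies are invented, and the sketch assumes the per-field requests are independent of each other, so they can all be in flight at once.

```python
import asyncio
import time

# Hypothetical per-field latencies, in seconds.
FIELD_LATENCY = {"name": 0.05, "address": 0.10, "open_orders": 0.15, "notes": 0.05}


async def fetch_field(field: str) -> tuple[str, str]:
    """Stand-in for calling one field's endpoint."""
    await asyncio.sleep(FIELD_LATENCY[field])
    return field, f"<{field} data>"


async def load_page_sequentially(fields: list[str]) -> dict[str, str]:
    """Old behavior: nothing is ready until every field has been fetched in turn."""
    page = {}
    for field in fields:
        name, value = await fetch_field(field)
        page[name] = value
    return page


async def load_page_concurrently(fields: list[str]) -> dict[str, str]:
    """New behavior: one request per field, all in flight at the same time."""
    results = await asyncio.gather(*(fetch_field(f) for f in fields))
    return dict(results)


def timed(loader, fields: list[str]) -> tuple[dict[str, str], float]:
    """Run a loader to completion and report how long it took."""
    start = time.perf_counter()
    page = asyncio.run(loader(fields))
    return page, time.perf_counter() - start
```

Both loaders produce the same page, but the sequential one takes roughly the sum of the latencies while the concurrent one takes roughly the slowest single request. A real page would also render each field as soon as its own response arrived, which this sketch leaves out.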

Two ways to decouple

Why was Justin able to do this? Because there was no master plan for this codebase. There was no overall design that the system was supposed to fit into. There was no expected format for the API. There was no documented design system. There was no architectural board to make sure everything was aligned. The application was a complete and utter mess. No one could fix it, so no one tried. What did we do instead? We dug in and made our own little world of sanity.

This monolithic app, out of necessity, became a microcosm of many little apps at the edges. Anyone tasked with improving some part of the app inevitably abandoned the untangling of this web and found some cozy corner to build new things. And then slowly updated the links to point to the new things, crowding out the old.

It may seem like a mess to you. But it was surprisingly pleasant to work in. The worries about code duplication were gone. The worries about consistency were gone. The worries about extensibility were gone. Code was written to serve its purpose, to affect as little of the area around it as possible, and to be easily replaceable. Our code was decoupled simply because coupling it would have been much harder.

After

In my subsequent career, I have never worked in such a remarkably ugly codebase. Every ugly codebase I have encountered since has never gotten over its need for consistency. Maybe it was because the codebase had been abandoned by “serious” developers long before. What remained was a ragtag group of interns and junior developers. Or maybe it was because there was no layer between those developers and the users: no translation, no requirements gathering, no cards. You were just standing behind a support rep's desk, asking how to make their lives better.

I miss that direct connection. The quick feedback. The lack of need to make grand plans. A simple problem and code. Maybe it's just naive nostalgia. But just as I sometimes lie on the couch wishing I could go back to the worst years of my childhood, whenever I encounter yet another “enterprise design pattern”, memories of that beautiful, terrible codebase flash through my mind.
