Mistakes I made choosing MongoDB as the main database in a startup
In this article, I want to talk about the mistakes I made when I wrote a service that used MongoDB as the main database for storing user data (and not only user data, but more on that below).
By no means do I think that MongoDB is a bad database that should not be used. Moreover, I believe it was only my own clumsy hands that got me into a situation I had to get out of by rewriting the service for a different database (I moved to Postgres and I’m enjoying it).
However, one cannot know everything, and reading the documentation does not save you from disaster during the actual implementation of a project, especially if your expectations of the tool diverge from reality.
In my opinion, MongoDB marketers have embellished the database’s areas of application on their site. MongoDB is not a general-purpose database. It is far from universal, so don’t even try to look at it as a solution to all your problems.
Didn’t use an ORM, or at least DTOs
I’ll start with the first 155 mm shell in the foot. If, after reading this section, you want to laugh and say “well, that’s obvious,” then believe me, I’m not the only one this naive.
With MongoDB, it is very easy to code naively when you are in a hurry and don’t spend time writing dataclasses or DTOs.
When I started writing, I took pymongo and nothing else. It seemed very convenient: I created a client, loaded the configuration, and off I went. You can quickly access collections and write to them without thinking about data schemas at all. Of course, if you are the only developer, then in the beginning you have nothing to worry about. The service is small, you are alone. The whole project fits in your head. What problems can there be?
After a month or two, you will start making mistakes in field names anyway. The first time it will look like a simple typo that you spend half an hour hunting for. “Why wasn’t this field saved?” Ah, because I’m an idiot, that’s why: I wrote user_email instead of just email. You are lucky if you notice this before the error reaches production and part of the data is created with the wrong field. If it did get there, kindly write a migration and move the data. And again: you are lucky if it’s just a wrong field name and has nothing to do with the data type.
I tried to fix the problem by switching from PyMongo to MongoEngine. Spoiler: it didn’t get much easier.
Use classes or containers for data. Write data validation. And write tests. All of this is obvious, but only after the fact.
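To make that advice concrete, here is a minimal sketch of what such a container could look like, using only the standard library. The User class, its two fields, and the to_document helper are hypothetical names chosen for illustration; they are not taken from the original service.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class User:
    """Hypothetical DTO: the only place where field names are spelled out."""
    email: str
    age: int

    def __post_init__(self):
        # Validate on construction so bad data never reaches the database.
        if not isinstance(self.email, str) or "@" not in self.email:
            raise ValueError(f"invalid email: {self.email!r}")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.age, bool) or not isinstance(self.age, int) or self.age < 0:
            raise ValueError(f"invalid age: {self.age!r}")

    def to_document(self) -> dict:
        # The dict that would be handed to the driver, e.g. insert_one().
        return asdict(self)


user = User(email="alice@example.com", age=30)
document = user.to_document()

# A typo in a field name now fails immediately instead of silently
# creating a new field in the collection.
try:
    User(user_email="alice@example.com", age=30)
except TypeError as exc:
    typo_error = exc
```

The point of the design is that the field names exist in exactly one place, so a user_email typo becomes an error at construction time rather than a surprise in production data.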
I believed that MongoDB would be easier without migrations and data schemas
At first it seems that “adding value to the project” will be easier without database schemas. No, no, and once again, no.
Migrations for a developing service are not only about renaming fields or adding new ones. They are also about the integrity and clarity of the data. In MongoDB, you can write documents with very different structures to a single collection. They can be completely different, without a single field (well, except for _id) in common. And this is a very big problem.
I won’t even mention building a quick DSL on top of a data schema, because... this data schema of yours does not exist. If you feel the urge to rework something, you will have to build a very complex query (I honestly ended up googling it after giving up on the second day) just to reconstruct the structure of a complex document that has been filled in for almost a year.
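As an illustration of what reconstructing a document’s structure involves, here is a rough client-side sketch. It is not the server-side query mentioned above; it is a simpler alternative that walks documents in plain Python. The infer_schema function and the sample documents are assumptions made up for the example, and the list stands in for whatever a real collection would return.

```python
from collections import defaultdict


def infer_schema(documents):
    """Map each dotted field path to the sorted type names seen for it."""
    schema = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # All items of a list share one path, marked with [].
            for item in value:
                walk(item, f"{path}[]")
        else:
            schema[path].add(type(value).__name__)

    for doc in documents:
        walk(doc, "")
    return {path: sorted(types) for path, types in schema.items()}


# Stand-in for documents that accumulated in one collection over a year.
docs = [
    {"_id": 1, "email": "a@example.com", "age": 30},
    {"_id": 2, "user_email": "b@example.com", "age": "31"},
    {"_id": 3, "email": "c@example.com", "files": [{"name": "cv.pdf"}]},
]

schema = infer_schema(docs)
```

On this sample, the result lists both email and user_email as separate fields and reports two different types for age, which is exactly the kind of drift a schema would have prevented.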
Data schemas are about order and discipline. Data schemas are validation. They let you glance at the project’s data and quickly understand how your model or object works. Without that plan, it hurts a lot. Add my first mistake to this, and you can imagine how my ass was on fire when I moved everything to SQL.
Shot myself in the foot with data types
Int or string? MongoDB doesn’t care. You can have two documents with the same field, say age, stored as a number in one and as a string in the other, and if the application code makes a mistake, get ready to catch exceptions. Thousands of exceptions, and loud swearing in the middle of the night. Without tests or DTOs, in the rush that almost every startup lives in, you will hit this problem all the time. And your favorite IDE may not notice it (specify types in your functions, yeah, I know).
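A tiny sketch of how this bites, under the assumption that two code paths wrote age with different types. The sample documents and the read_age helper are hypothetical; the list stands in for documents read from a single collection.

```python
# "age" was written as an int by one code path and as a string by another.
docs = [{"name": "Ann", "age": 30}, {"name": "Bob", "age": "31"}]

try:
    total = sum(doc["age"] for doc in docs)
except TypeError as exc:
    failure = exc  # an int and a str cannot be added together


def read_age(doc):
    """Hypothetical helper: coerce age to int at the boundary, or fail loudly."""
    value = doc.get("age")
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise ValueError(f"unexpected age: {value!r}")
    return int(value)  # raises ValueError for strings such as "thirty"


ages = [read_age(doc) for doc in docs]
```

Coercing at the boundary does not fix the stored data, but it moves the failure to one well-known place instead of a random line deep in the business logic.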
Thought MongoDB was fast and always would be
MongoDB really can be fast when you just need to dump writes and never think about having to read the data back or, oh my god, run complex queries where you want to aggregate or simulate a join across several collections.
Have you read somewhere that MongoDB is RAM efficient? In my experience, that is a lie. MongoDB consumes more and more RAM as the data grows. How much depends on two factors at once: the amount of data in the database and the kind of operation you want to run.
JOINs are very expensive, and queries on nested documents are guaranteed to melt your brain
The MongoDB authors recommend against using Mongo JOINs. They exist, sort of, but they are shit. The “correct” way is to make one document and store all related data in it. Everything seems great, until you try to query that very data with a complex query that involves exclusions and “or” logic. It hurts like hell.
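For a feel of what such a query looks like, here is a hedged sketch. The users collection, its embedded files array, and every field name are made up for the example. The operators ($ne, $or, $elemMatch, $size, $gt) are standard MongoDB query operators, and the SQL version assumes files live in their own table.

```python
# Hypothetical filter for a "users" collection with an embedded "files" array.
# With pymongo it would be passed as collection.find(mongo_filter).
mongo_filter = {
    "status": {"$ne": "blocked"},
    "$or": [
        # at least one embedded file that is a PDF larger than 1 MB
        {"files": {"$elemMatch": {"type": "pdf", "size": {"$gt": 1_000_000}}}},
        # or no files at all (an empty array)
        {"files": {"$size": 0}},
    ],
}

# Roughly the same question in SQL, assuming a separate "files" table.
sql_query = """
SELECT u.*
FROM users u
WHERE u.status <> 'blocked'
  AND (
    EXISTS (SELECT 1 FROM files f
            WHERE f.user_id = u.id AND f.type = 'pdf' AND f.size > 1000000)
    OR NOT EXISTS (SELECT 1 FROM files f WHERE f.user_id = u.id)
  )
"""
```

The two are only roughly equivalent: missing fields and NULLs are treated differently, which is one more thing to keep in your head when the conditions pile up.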
Analytics is a circus
Actually, this problem follows from the previous one. Calculating anything more complicated than a simple total takes a lot of effort. Answering simple questions like “how many N objects do we have for users B who are in categories C, W, Z and did such and such things” turns, in MongoDB, into multi-level JSON-like queries that are unreadable and difficult to maintain. You can, of course, teach a monkey to smoke, but what’s the point? MongoDB is not for analytical data.
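Here is a sketch of that kind of question asked both ways. The users and orders collections, their fields, and the sample rows are assumptions invented for the example. The pipeline is shown only as data (with pymongo it would go to aggregate()), while the SQL side actually runs against an in-memory SQLite database.

```python
import sqlite3

# "How many paid orders do users in categories C, W, Z have, per category?"
# Hypothetical aggregation pipeline for a "users" collection.
pipeline = [
    {"$match": {"category": {"$in": ["C", "W", "Z"]}}},
    {"$lookup": {
        "from": "orders",
        "localField": "_id",
        "foreignField": "user_id",
        "as": "orders",
    }},
    {"$unwind": "$orders"},
    {"$match": {"orders.status": "paid"}},
    {"$group": {"_id": "$category", "paid_orders": {"$sum": 1}}},
    {"$sort": {"_id": 1}},
]

# The same question in SQL, executed on sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT);
    INSERT INTO users VALUES (1, 'C'), (2, 'W'), (3, 'A');
    INSERT INTO orders VALUES
        (1, 1, 'paid'), (2, 1, 'paid'), (3, 2, 'new'), (4, 3, 'paid');
""")
rows = conn.execute("""
    SELECT u.category, COUNT(*) AS paid_orders
    FROM users u
    JOIN orders o ON o.user_id = u.id
    WHERE u.category IN ('C', 'W', 'Z') AND o.status = 'paid'
    GROUP BY u.category
    ORDER BY u.category
""").fetchall()
conn.close()
```

Six nested stages against one SELECT with a JOIN: neither is hard in isolation, but only one of them stays readable once a third collection and a few more conditions are added.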
Search is also a circus, with two-legged horses
The MongoDB site has sweet examples of inventory, a store, music collections, or books. They are so cool and so detached from reality that the moment you need to implement a search with varying conditions, your hair stands on end. What is easy in SQL even for beginners turns into pure hell in MongoDB.
I also thought that search would be easy to implement. As easy as writing data without migrations, schemas, and everything else that usually looks scary when comparing SQL and NoSQL databases.
I tried to avoid search problems by splitting large documents into separate collections. You know, pulling the list of downloaded documents out of user and moving it to a separate files collection. But that means you have to join it back whenever you need it. And the more simultaneous requests hit the database, the more it hurts: RAM consumption grows by leaps and bounds.
Search in MongoDB is expensive and complicated, and it cannot compare to the simplicity of SQL.
What else hurt?
All in one list, so I don’t have to get up twice.
There are few or no recipes and ready-made solutions
Forget about computations and stored procedures. Without JS experience, they are much harder than in SQL, and expensive
There are no decent free GUIs (and I say that as someone who knows how to use the terminal and loves console applications). All of them are heavily cut down (hello, Robo3T), and you are still forced to write queries by hand, which, I emphasize again, are nowhere near as sweet as SQL
Support is paid and sooooo bad (those who have used it know)
So what is MongoDB really good for?
Still, people do use the tool. Here is what I see as the right uses for MongoDB:
Logs. Good old logs that are written in a stream from different services and read once every six months if an incident occurs.
Data that you will never query beyond the top level. At most, you fetch a document by its ID or another field (for example, user_id, which you previously got from another database).
Data that you mostly write as one whole document and rarely need to change field by field.
A large cache, split into separate fields for convenience. It can be read quickly and invalidated easily. As a structured document, it can be used by your application right away.
Sessions (although the question of speed remains). Similar to the cache: you create a document, throw it into MongoDB, and read one or two fields some time later. No deep search. As simple as it gets. MongoDB, by the way, is convenient to shard and replicate, and this really is its advantage.
Queues. More specifically, you can use MongoDB as a database to store queues of data for later processing. As with the cache and sessions, MongoDB lets you store a structured document with the queue data.
As you can see, I am mostly writing about things that do not involve searching data in MongoDB and will not require joins, aggregation, or any other kind of contortion.
Results
All the mistakes I made were multiplied by the number of developers who joined the project and added something of their own, sometimes sticking to what was already written and sometimes adding their own creativity, depending on their experience. As a result, the number of problems grew, and at some point adding new features and debugging became simply unbearable. Given that this was not my first project, I was shocked at how I ended up in this situation.
The absence of a data schema is not an advantage of MongoDB but a huge disadvantage. The impossibility of making normal joins (I emphasize: normal, not the Mongolian kind with a horse’s worth of overhead) is also a problem, because it does not let you break the database into logical blocks (or, more simply, put things in order). Any project can be fit into any popular SQL database, but not every project fits into MongoDB.
Someone may say that I simply failed to master the tool and should RTFM. Maybe, but I don’t want to struggle with a tool and bend it into shape. I want to use it, be sure that it will not let me down, and know exactly what it can and cannot do. I have enough experience to understand that there is no silver bullet. I think I fell for the MongoDB marketers’ sweet talk and decided that it would help the startup grow faster. Result: this article and +10 experience.
However, don’t be afraid to experiment and rewrite. In the end, I still consider it a good experience. Fortunately, I had the brains to write the application code in such a way that the transition from NoSQL to SQL did not take too much time and money. And I noticed the problem fairly early. I was even able to rewrite everything bit by bit, without stopping the service, but that’s another story.
Most importantly, if you are as naive as I am and have understood what your mistake was, share your screw-ups with others: maybe someone is about to jump off a cliff but will change their mind after reading something like this.