I’ve recently started using MongoDB at work: it forms a core part of our tracking and reporting infrastructure, and every event we track is slotted tidily into a Mongo database. Mongo has been getting some criticism of late, and while most of it has been misdirected (choosing the wrong tool for the job doesn’t mean the tool was the problem), this piece was particularly interesting:
MongoDB does not give us much control over where data is placed, so the frequently accessed data (or data that is scanned together) may be spread over a large area. When scanning data only once, there is no way to prevent that data evicting the more frequently accessed data from memory. Once the frequently accessed data is no longer in memory, MongoDB becomes IO bound and lock contention becomes an issue.
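The eviction problem described in that quote can be illustrated with a toy model. The sketch below assumes a plain least-recently-used cache, which is a simplification: real kernel page caches use more elaborate policies, and the class name, page names, and sizes here are all made up for illustration. It only shows the general effect of a one-off scan pushing hot data out of a fixed-size cache.

```python
from collections import OrderedDict


class LRUCache:
    """Toy least-recently-used page cache holding at most `capacity` pages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def touch(self, page):
        """Access a page; return True on a cache hit, False on a miss."""
        if page in self.pages:
            self.pages.move_to_end(page)  # mark as most recently used
            return True
        self.pages[page] = None  # simulate reading the page from disk
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return False


def hit_count(cache, pages):
    """Touch each page in order and count how many were already cached."""
    return sum(cache.touch(p) for p in pages)


cache = LRUCache(capacity=100)
hot_pages = [f"hot-{i}" for i in range(50)]      # hypothetical frequently used pages
scan_pages = [f"scan-{i}" for i in range(1000)]  # hypothetical one-off scan

hit_count(cache, hot_pages)                # warm the cache (all misses)
hits_before = hit_count(cache, hot_pages)  # every hot page is resident
hit_count(cache, scan_pages)               # scan cold data exactly once
hits_after = hit_count(cache, hot_pages)   # hot pages are gone again

print(hits_before, hits_after)
```

In this model the 50 hot pages are all cache hits before the scan and all misses after it, even though the scanned pages are never read a second time.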
My introduction to MongoDB was the book MongoDB in Action. I remember being struck by this quote:
Database tuning, which in most RDBMSs means tinkering with a wide array of parameters controlling memory allocation and the like, has become something of a black art. MongoDB’s design philosophy dictates that memory management is better handled by the operating system than by a DBA or application developer. Thus, data files are mapped to a system’s virtual memory using the mmap() system call. This effectively offloads memory management responsibilities to the OS kernel.
This sounds great. Why bother configuring when the kernel will probably do a better job than you ever could? It turns out this may have been a poor design decision: the kernel is good at managing memory in general, but it can’t do the best job here because it knows nothing about how MongoDB uses that memory, such as which pages hold frequently accessed data and which are only being touched by a one-off scan.
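To make the mmap() approach concrete, here is a minimal sketch using Python’s standard `mmap` module. It is not MongoDB’s code: the temporary file is a stand-in for a data file and its contents are invented. The point is that the application reads the file as if it were a byte array and never decides what stays in memory.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

# Write a small stand-in "data file": 8 pages, each filled with its page number
with tempfile.NamedTemporaryFile(delete=False) as f:
    for page_no in range(8):
        f.write(bytes([page_no]) * PAGE)
    path = f.name

try:
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Plain indexing: the OS brings the page in when it is first touched
            first_byte_of_page_5 = mm[5 * PAGE]
            mapped_size = len(mm)
finally:
    os.remove(path)

print(first_byte_of_page_5, mapped_size // PAGE)
```

Nothing in this code says which pages to keep resident or when to drop them. That is the convenience the book describes, and it is also why the application has no say when the kernel’s choice turns out to be the wrong one.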
It’s always difficult to separate the hype from the reality with any new technology, as this issue shows.
1. Use Mongo as WEB SCALE DOCUMENT STORE OF CHOICE LOL
2. Assume basic engineering principles applied throughout due to HEAVY MARKETING SUGGESTING AWESOMENESS.
3. Spend 6 months fighting plebbery across the spectrum, mostly succeed.
4. NIGHT BEFORE INVESTOR DEMO, TRY UPLOADING SOME DATA WITH "{$ref: '#/mongodb/plebtastic'"
5. LOL WTF?!?!? PYMONGO CRASH?? :OOO LOOOL WEBSCALE
6. It’s 4am now. STILL INVESTIGATING
7. DISCOVER PYMONGO DOES NOT CHECK RETURN VALUES IN MULTIPLE PLACES. DISCOVER ORIGINAL AUTHOR SHOULD NOT BE ALLOWED NEAR COMPUTER
8. REALIZE I CAN CRASH 99% OF ALL WEB 3.9 SHIT-TASTIC WEBSCALE MONGO-DEPLOYING SERVICES WITH 16 BYTE POST
9. REALIZE 10GEN ARE TOO WORTHLESSLY CLUELESS TO LICENCE A STATIC ANALYZER THAT WOULD HAVE NOTICED THIS PROBLEM IN 0.0000001 NANOSECONDS?!!?!?@#