2. What Are Documents?
Structure Optional
The majority of the previous page was dedicated to describing a conceptual structure for our data, and how that structure is represented in a high-level language with an ORM library. This is not a bad thing on its own; most data has a defined structure. But what happens when that structure changes? Or when we do not know the structure at all?
This is where the document database can provide benefits. We did not show the SQL to create the tables in the library example, but our book type might look something like this in SQLite:
CREATE TABLE book ( id INTEGER NOT NULL PRIMARY KEY, title TEXT NOT NULL, copies_on_hand INTEGER NOT NULL DEFAULT 0);
If we wanted to add, for example, the date the library obtained the book, we would have to change the structure of the table…
ALTER TABLE book ADD COLUMN date_obtained DATE;
Document databases do not require anything like this. For example, creating a book collection in MongoDB, using its JavaScript API, is…
db.createCollection('book')
The only structural requirement is that each document have some field that can serve as an identifier for documents in that collection. MongoDB always uses _id for this; some other document databases let you choose the field for each collection.
Mapping the Entities
In our library, we had books, authors, and patrons as entities. In an equivalent document database setup, we would likely still have separate collections for each. A book document might look something like…
{ "Id": 342136, "Title": "Little Women", "CopiesOnHand": 3 }
Because no assumptions are made about structure, if we began adding books with a DateObtained field, the database would simply add it, no questions asked.
{ "Id": 452343, "Title": "The Hunt for Red October", "DateObtained": "1986-10-20", "CopiesOnHand": 1 }
The only field the database cares about is Id, assuming we specified that for our collection's ID.
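To make the "only the ID matters" idea concrete, here is a minimal sketch in plain JavaScript. The Collection class, its insert and findById methods, and the describeObtained helper are all hypothetical names invented for illustration; this is an in-memory stand-in, not the API of MongoDB or any other database.

```javascript
// Hypothetical in-memory stand-in for a document collection (not a real database API).
class Collection {
  constructor(idField) {
    this.idField = idField; // the one field every document must carry
    this.docs = new Map();
  }

  insert(doc) {
    const id = doc[this.idField];
    if (id === undefined || id === null) {
      throw new Error(`Document is missing required field "${this.idField}"`);
    }
    if (this.docs.has(id)) {
      throw new Error(`Duplicate ${this.idField}: ${id}`);
    }
    // No other field is checked; any shape is accepted.
    this.docs.set(id, { ...doc });
    return id;
  }

  findById(id) {
    return this.docs.get(id);
  }
}

const books = new Collection("Id");
books.insert({ Id: 342136, Title: "Little Women", CopiesOnHand: 3 });
books.insert({
  Id: 452343,
  Title: "The Hunt for Red October",
  DateObtained: "1986-10-20",
  CopiesOnHand: 1,
});

// Code that reads documents has to cope with the field being absent.
function describeObtained(book) {
  return book.DateObtained ?? "unknown";
}
```

The store accepts both shapes without complaint, but the question of what to do when DateObtained is missing has moved into the application code.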
Mapping the Relations
We certainly could bring book_author and book_checked_out across as documents in their own collections. However, document databases do not (generally) have the concept of foreign keys.
Let's first tackle the book/author relationship. JSON has an array type, which lets a single property hold multiple values. We can add an Authors property to our book document:
{ "Id": 342136, "Title": "Little Women", "Authors": [55923], "CopiesOnHand": 3 }
With this structure, if we're rendering search results and want to display the author's name(s) next to the title, we will either need to query the author collection for each ID in our Authors array, or come up with a projection that crosses two collections. Since we're still storing properties of a book, though, we could include the author's name.
{ "Id": 342136, "Title": "Little Women", "Authors": [{ "Id": 55923, "Name": "Alcott, Louisa May" }], "CopiesOnHand": 3 }
This document does a lot for us; we can now see the title and the authors all together, and the IDs being there would allow us to dig into the data further. If we were writing a Single-Page Application (SPA), this could be used without any transformation at all.
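For comparison, the first option mentioned earlier (keeping only IDs in the Authors array and looking each one up) might look roughly like the sketch below. The authors Map stands in for the author collection, and withAuthorNames is a hypothetical helper; a real application would issue a database query where this code calls get.

```javascript
// Hypothetical in-memory data standing in for the author collection.
const authors = new Map([
  [55923, { Id: 55923, Name: "Alcott, Louisa May" }],
]);

// The ID-only form of the book document.
const book = { Id: 342136, Title: "Little Women", Authors: [55923], CopiesOnHand: 3 };

// Resolve each author ID with a separate lookup (one "query" per ID).
function withAuthorNames(bookDoc, authorLookup) {
  const resolved = bookDoc.Authors.map((id) => {
    const author = authorLookup.get(id);
    // Nothing enforces the reference, so an unknown ID is kept with a null name.
    return { Id: id, Name: author ? author.Name : null };
  });
  return { ...bookDoc, Authors: resolved };
}

const display = withAuthorNames(book, authors);
console.log(JSON.stringify(display));
```

The result has the same shape as the document with embedded names; the difference is whether that work happens once when the document is written or on every read.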
On the other hand, any application code would have to be aware of this structure. Our C# code from the last page would now likely need a DisplayAuthor type, and Authors would be ICollection<DisplayAuthor>. We also see our first instance of repeated data. The next page will be a deeper discussion of the trade-offs we should consider.
For now, though, we still need to represent the checked out books. We can use a technique similar to the one we used for authors, including the return date.
{ "Id": 342136, "Title": "Little Women", "Authors": [{ "Id": 55923, "Name": "Alcott, Louisa May" }], "CopiesOnHand": 3, "CheckedOut": [{ "Id": 45112, "Name": "Anderson, Alice", "ReturnDate": "2025-04-02" }, { "Id": 38472, "Name": "Brown, Barry", "ReturnDate": "2025-03-27" }] }
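With checkouts embedded in the book, questions about them become operations on a single document. The sketch below assumes that ReturnDate is the date a copy is due back and that it is always stored as an ISO 8601 date string (which sorts correctly as plain text); dueOnOrBefore is a hypothetical helper name.

```javascript
const book = {
  Id: 342136,
  Title: "Little Women",
  Authors: [{ Id: 55923, Name: "Alcott, Louisa May" }],
  CopiesOnHand: 3,
  CheckedOut: [
    { Id: 45112, Name: "Anderson, Alice", ReturnDate: "2025-04-02" },
    { Id: 38472, Name: "Brown, Barry", ReturnDate: "2025-03-27" },
  ],
};

// Assumption: ReturnDate is a due date in YYYY-MM-DD form, so string
// comparison orders the dates correctly.
function dueOnOrBefore(bookDoc, isoDate) {
  // CheckedOut may be absent on books that have never been checked out.
  return (bookDoc.CheckedOut ?? [])
    .filter((checkout) => checkout.ReturnDate <= isoDate)
    .map((checkout) => checkout.Name);
}

console.log(dueOnOrBefore(book, "2025-03-31")); // [ 'Brown, Barry' ]
```

Note the fallback for a missing CheckedOut array: since the database does not require the field, every reader has to decide what its absence means.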
Structure Reconsidered
One of the big marketing points for document databases is their ability to handle “unstructured data.” I won't go as far as saying that's something that doesn't exist, but the vast majority of data described this way is data whose structure is unknown to the person considering doing something with it. The data itself has structure, but they do not know what it is when they get started, and knowing it is usually a prerequisite for creating the data store. On rare occasions, there may be data sets with several structures mixed together in the same set; even in these data sets, though, the cacophony usually turns out to be a finite set of structures, mixed inconsistently.
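One rough way to see the "finite set of structures" in practice is to group documents by the set of top-level fields they carry. The sketch below is only an illustration under that simplification (it ignores nested fields and value types); shapeOf and countShapes are hypothetical helper names, and the third sample document is made up for the example.

```javascript
// Build a signature from the sorted top-level field names of a document.
function shapeOf(doc) {
  return Object.keys(doc).sort().join(",");
}

// Count how many documents share each signature.
function countShapes(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const shape = shapeOf(doc);
    counts.set(shape, (counts.get(shape) ?? 0) + 1);
  }
  return counts;
}

const docs = [
  { Id: 342136, Title: "Little Women", CopiesOnHand: 3 },
  { Id: 452343, Title: "The Hunt for Red October", DateObtained: "1986-10-20", CopiesOnHand: 1 },
  // Hypothetical document with the same fields in a different order.
  { Title: "Example Title", CopiesOnHand: 2, Id: 1 },
];

for (const [shape, count] of countShapes(docs)) {
  console.log(`${count} document(s) with fields: ${shape}`);
}
```

Three documents collapse into two shapes here; on a larger "unstructured" data set, a count like this is a quick first look at how many structures are really present.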
Keep that in mind as we look at some of the trade-offs between document and relational databases. Just as your body needs its skeletal structure against which your muscles and organs can work, your data has structure. Document databases do not abstract that away.
Next: 3. Relational / Document Trade-Offs
Previous: 1. A Brief History of Relational Data