Loading…

1. A Brief History of Relational Data

Relational Data

Relational databases were not the first data structure, but when people talk about document databases, their behavior is usually contrasted with relational databases. We do not need a PhD-level knowledge of these databases, but some high-level concepts will serve us well.

The “relation” in “relational database” names the concept that instances of data can be linked (related) to other pieces of data. Think of a system that keeps track of books for a library; some items they would need to track the book's title, its author, how many copies they have, and who has a copy checked out. If they were to make a new “book” entry for each physical copy, they would end up repeating nearly everything. With a relational database, though, we could structure tables where very little would be duplicated.

In this example, we have 5 tables to hold the information, and two of those are there solely for the purpose of associating entities with each other. When we think about this structure, there are some interesting ways of parsing the data that weren't covered by the description above.

Notice the word “could” in the descriptions of the tables; there are different ways to define relations among entities, and database purists could also come up with scenarios that this structure does not cover. The intent here is to present an isolated yet non-trivial working example that we can use as we think through how this data is structured.

The ORM Bridge

In high-level programming languages, developers create structures with data representing some entity. Most database drivers deal in tables, rows, and columns. If we were to use one of those libraries, we would end up writing several queries and lots of property setting to constitute a domain object from the relational data in our tables.

An Object-Relational Mapper (ORM) library helps to translate between these structures and the relational database. There are lots of these tools, and they have some pretty solid up-sides in most cases. Take, for example, C# objects and Microsoft's Entity Framework Core (EF Core).

// This is the definition of a book
public class Book
{
    public long Id {get; set;} = 0L;
    public string Title {get; set;} = "";
    public int CopiesOnHand {get; set;} = 0;
}

// This is how we retrieve a book
//  - ctx is an open database context (connection)
//  - theId is the variable with the ID of the book we want to retrieve
var book = await ctx.Books.FirstOrDefaultAsync(book => book.Id = theId);

This illustrates the simplicity of code using an ORM library. Rather than create a query, execute it, check to see if anything was returned, then assign each column from the result to a property in the class - it's the single line above. (If no Book for that ID exists, the book variable will be null.)

That's great for the book information, but that's not all we need to access; we need authors for the book and we need patrons to check them out. We're not trying to become EF Core experts, but adding this information looks something like this…

public class Author
{
    public long Id {get; set;} = 0L;
    public string Name {get; set;} = ""; // naive
    // dates of birth/death, bio, etc.
    // A "navigation property" to find books by author
    public ICollection<Book> Books {get; init;} = new List<Book>();
}

public class Patron
{
    public long Id {get; set;} = 0L;
    public string Name {get; set;} = "";
    public string Phone {get; set;} = "";
    // skipping the navigation property here
}

public class CheckOut
{
    public long BookId {get; set;} = 0L;
    public long PatronId {get; set;} = 0L;
    public DateTime ReturnDate {get; set;} = DateTime.Now;
    // "Navigation properties"
    public Book Book {get; set;} = default!;
    public Patron Patron {get; set;} = default!;
}

// A new Book class
public class Book
{
    // properties as before, then...
    // ...more navigation properties
    public ICollection<Author> Authors {get; init;} = new List<Author>();
    public ICollection<CheckOut> CheckOuts {get; init;} = new List<CheckOut>();
}

Notice that the Author has a collection of Books, and a Book has a collection of Authors. This is how the book_author table is represented. For checked-out books, we actually have a type that links a Book to a Patron and also stores the return date. EF Core's “navigation properties” are how it exposes these other entities within .NET code. If we do that same simple Book retrieval command from the first example, we can now traverse these properties to display the author's name, who has books checked out, and when they're due back.

Referential Integrity

The term “referential integrity” is used to describe how these relations are kept in sync. In our example above, we wouldn't want a book_author record pointing to either a book or an author that does not exist. We would not want to allow a patron to be deleted while they still had books checked out. (And, we may not want to allow them to be deleted at all! If we need to keep records of who checked out what book, when, and when they were returned, we would lose those records if we deleted the patron.)

This is an area where relational databases excel. The IDs we have in some tables that point to tables where rows have that ID are called foreign keys. Relational databases allow us to define foreign keys that must exist; ones that must exist or can be missing; and what should happen when the parent record is deleted. To accomplish this, indexes will be applied to these key fields; this not only lets the integrity checks happen quickly, these can also be used to create queries to join the information efficiently.

So… What's the Problem?

There are no problems, only trade-offs. Let's look at how we get to documents from here.

Next: 2. What Are Documents?


« Using Relational Databases as Document Stores • Home