Loading…

BitBadger.Npgsql.Documents
Advanced Usage: Related Documents / Custom Queries

Document stores generally have fewer relationships than traditional relational databases, particularly those that arise when data is structured in Third Normal Form; related collections are stored in the document, and ever-increasing surrogate keys (a la sequences and such) do not play well with distributed data. Unless all data is stored in a single document, though, there will still be a natural relation between documents.

Thinking back to our earlier examples, we did not store the collection of rooms in each hotel's document; each room is its own document, and contains the ID of the hotel as one of its properties.

// F#; for C#, picture a standard class or record type
type Hotel =
    {   Id : string
        // ... more properties ...
    }

type Room =
    {   Id : string
        HotelId : string
        // ... more properties ...
    }

Document Table SQL in Depth

The library creates tables with a data column of type JSONB, with a unique index on the configured ID name that serves as the primary key (for these examples, we'll assume it's the default Id). The indexes created by the library all apply to the data column. The by-ID query for a hotel would be…

SELECT data FROM hotel WHERE data ->> 'Id' = @id

...with the ID passed as the @id parameter.

Continue reading; using WhereById (C#) / whereById (F#) will create the data ->> 'Id' = @id criteria using the configured ID name.

Finding all the rooms for a hotel would use a JSON containment query, which looks like…

SELECT data FROM room WHERE data @> @criteria

...with something like new { HotelId = "abc123" } passed as the matching document in the @criteria parameter.

So far, so good; but, if we're looking up a room, we do not want to have to make 2 queries just to also be able to display the hotel's name. The WHERE clause on the first query above uses the expression data ->> 'Id'; this extracts a field from a JSON or JSONB field as TEXT (there is also a -> operator, which extracts the field as JSONB). Since this is the value our unique index indexes, and we are using a relational database, we can write an efficient JOIN between these two tables.

SELECT r.data, h.data AS hotel_data
  FROM room r
       INNER JOIN hotel h ON h.data ->> 'Id' = r.data ->> 'HotelId'
 WHERE r.data ->> 'Id' = @id

(This syntax would work without the unique index; it would default to using the GIN index (Full or Optimized), if it exists, but it wouldn't be quite as efficient as a zero-or-one unique index lookup.)

Using “Building Blocks”

For every method or function described in Basic Usage, the Query module / static class contains the query for that operation. Each Query method or function takes the table name as it (usually) lone parameter. All ById queries declare an @id parameter; all ByContains queries declare a @criteria parameter; and all ByJsonPath queries declare a @path parameter. It also contains SelectFromTable, WhereById, WhereDataContains, and WhereJsonPathMatches functions to generate those portions of a SQL statement.

There are also two functions that help translate queries into domain objects. FromData<T> deserializes a data column into the requested type, while FromDocument<T> does the same, but allows the column to be named as well. We'll see how we can use these in further examples.

Finally, there are two functions to assist with parameter creation. JsonbDocParam will create a JSONB parameter, but the user is responsible for providing the name. DocParameters creates two parameters, one for the document's ID as @id, and the other for the document itself as @data.

Executing Queries in F#

But, C# users, please read this part, as we'll build on it in the next section.

As this library uses Npgsql.FSharp to implement its data access, harnessing it to run our custom queries works. (Ensure open Npgsql.FSharp is present.) The query is passed as a string, the parameters are passed as a list of tuples, and the function passed to the execution function is responsible for extracting the desired data from the query.

/// Inferred types - roomId = string, return = Task<Room * Hotel>
let findRoomById roomId = backgroundTask {
    Configuration.dataSource ()
    |> Sql.fromDataSource
    |> Sql.query $"""
        SELECT r.data, h.data AS hotel_data
          FROM room r
               INNER JOIN hotel h ON h.data ->> '{Configuration.idField ()}' = r.data ->> 'HotelId'
         WHERE r.{Query.whereById "@id"}"""
    |> Sql.parameters [ "@id", Sql.string roomId ]
    |> Sql.executeRowAsync (fun row ->
        fromData<Room> row, fromDocument<Hotel> "hotel_data" row)
}

The query above demonstrates using Configuration.idField and whereById to use the configured ID field. Creating custom queries using these building blocks allows us to utilize the configured value without hard-coding it throughout our custom queries. If the configuration changes, these queries will pick up the new field name seamlessly.

While this example retrieves the entire document, this is not required. If we only care about the name of the associated hotel, we could amend the query to retrieve only that information.

/// Inferred types - roomId = string, return = Task<Room * string>
let findRoomByIdImproved roomId = backgroundTask {
    Configuration.dataSource ()
    |> Sql.fromDataSource
    |> Sql.query $"""
        SELECT r.data, h.data ->> 'Name' AS hotel_name
          FROM room r
               INNER JOIN hotel h ON h.data ->> '{Configuration.idField ()}' = r.data ->> 'HotelId'
         WHERE r.{Query.whereById "@id"}"""
    |> Sql.parameters [ "@id", Sql.string roomId ]
    |> Sql.executeRowAsync (fun row ->
        fromData<Room> row, row.string "hotel_name")
}

This query is amazingly efficient, using 2 unique index lookups to return this data. Even though we do not have a foreign key between these two tables, simply being in a relational database allows us to retrieve this related data.

Executing Queries in C#

(jump to the solution, which applies to F# as well)

The Challenge

In the above examples, all of the Sql. calls are function calls. The result of each one is piped to the next one, where it is provided as the final parameter. This is a process called “currying,” the concept being that each function only has one parameter. For functions that have multiple parameters, they can be thought of as functions that return another function with one fewer parameter required; the final function in the chain returns the value instead of another function. In C#, instead of…

/// Type is Func<string, string, string, string>
public void UnnecessaryConcatenation(string value1, string value2, string value3) =>
    value1.Concat(value2).Concat(value3);

...it would be defined as…

/// Type is Func<string, Func<string, Func<string, string>>>
public void StillUnnecessaryConcatenation(string value1) =>
    (string value2) =>
        (string value3) => value1.Concat(value2).Concat(value3);

Currying goes hand-in-hand with the concept of partial application. In the F# examples above, Sql.query has two parameters - string and SqlProps - but we call it just with the string (partial application), resulting in a function SqlProps -> SqlProps (Func<SqlProps, SqlProps> in traditional .NET parlance). The pipe operator |> takes the output from the previous function (the SqlProps that define the connection), and returns a new SqlProps that has the query definition as well. This pattern continues through the pipeline, until the final call returns the domain type rather than more SqlProps.

This is a bit different from “fluent” functions, which use method chaining to handle several calls in one chain. Fluent approaches like LINQ return an object that has both state (like SqlProps does), and also has behavior to apply future changes to that object.

    // We cannot do this; SqlProps has no methods
    Sql.FromDataSource(Configuration.DataSource())
        .Query("the query text")
        .Parameters( /* the parameters */ )
        .ExecuteAsync(FromData<Room>);

Since the behavior (Sql. functions) is defined separately from the state (SqlProps), state is preserved through pipelining rather than method chaining. People have written books about the differences this brings - our purpose here is documentation, after all, not education - but for our purposes, that is the main reason we cannot “just use a method chain instead.”

The Solution

Since we cannot use the pipeline operator, this library provides a few methods that bridge this gap. The Document class has a Custom class, with methods that can be called with queries and parameters, based on the expected return type. These methods also utilize the configured data source and serializer, so they may be preferable for F# users as well; there are F# functions available as well.

Method When to Use It
Custom.Single Retrieving 0 or 1 result; return value may be null
Custom.List Retrieving a list of matching documents
Custom.NonQuery Executing a query that will not return results

Using the second query from the F# examples above, we'll use Custom.Single to retrieve a Tuple<Room, string>.

// C#
    public async Task<Tuple<Room, string>?> FindRoomById(string roomId) =>
        await Document.Custom.Single(
            $"SELECT r.data, h.data ->> 'Name' AS hotel_name
                FROM room r
                     INNER JOIN hotel h ON h.data ->> '{Configuration.IdField()}' = r.data ->> 'HotelId'
               WHERE r.{(Query.WhereById("@id"))}",
            new { Tuple.Create("@id", Sql.string(roomId)) },
            row => Tuple.Create(Document.FromData<Room>(row), row.string("hotel_name")));

If we wanted to retrieve all rooms for any hotel in Los Angeles, we can combine Query class calls, joins, and the Custom module to create that query:

// C#
    public async Task<List<Room>> FindRoomsInLA() =>
        await Document.Custom.List(
            $"{(Query.SelectFromTable("room"))} r
                    INNER JOIN hotel h ON h.data ->> '{Configuration.IdField()}' = r.data ->> 'HotelId'
              WHERE h.{(Query.WhereDataContains("@criteria"))}",
            new { Tuple.Create("@criteria", Query.JsonbDocParam(new { City = "Los Angeles" }) },
            Document.FromData<Room>);

A few notes on the above:

F# can use a similar technique, but with triple-quoted strings to allow us to interpolate a function call that also requires quotes.

// F#
    // implied return type - Task<Room list>
    let findRoomsInLA () =
        Custom.list
            $"""{Query.selectFromTable "room"} r
                       INNER JOIN hotel h ON h.data ->> '{Configuration.idField ()}' = r.data ->> 'HotelId'
                 WHERE h.{Query.whereDataContains "@criteria"}"""
            [ "@criteria", Query.jsonbDocParam {| City = "Los Angeles" |} ]
            fromData<Room>

Custom Scalar Queries

The Count and Exists modules and static classes represent a form of scalar query; they return a single value. If the need arises to customize these queries, Custom.Scalar provides the ability to perform these, using the data source configured for the library. Continuing our example, above, if we just wanted the count of rooms in LA hotels, we could obtain that via:

// C#
    public async Task<int> CountRoomsInLA() =>
        await Document.Custom.Scalar(
            $"SELECT COUNT(*) AS the_count
                FROM room r
                    INNER JOIN hotel h ON h.data ->> '{Configuration.IdField()}' = r.data ->> 'HotelId'
              WHERE h.{(Query.WhereDataContains("@criteria"))}",
            new { Tuple.Create("@criteria", Query.JsonbDocParam(new { City = "Los Angeles" }) },
            row => row.int("the_count"));

Using This Library for Non-Document Queries

Some of the methods and functions provided by the library can be used with non-document tables as well, particularly the Custom functions. This may be a convenient and consistent way to access your data, while delegating connection management to the library and its configured data source.

Let's walk through a short example using C#; F# uses very similar patterns.

// C#
    using Npgsql.FSharp; // Needed for RowReader and Sql types

    // Stores metadata for a given user
    public class MetaData
    {
        public string Id { get; set; } = "";
        public string UserId { get; set; } = "";
        public string Key { get; set; } = "";
        public string Value { get; set; } = "";
    }

    // Static class to hold mapping functions
    public static class Map
    {
        // These parameters are the column names from the underlying table
        public MetaData ToMetaData(RowReader row) =>
            new MetaData
            {
                Id = row.string("id"),
                UserId = row.string("user_id"),
                Key = row.string("key"),
                Value = row.string("value")
            };
    }

    // somewhere in a class, retrieving data
    public Task<List<MetaData>> MetaDataForUser(string userId) =>
        Document.Custom.List("SELECT * FROM user_metadata WHERE user_id = @userId",
            new { Tuple.Create("@userId", Sql.string(userId)) },
            Map.ToMetaData);

« Back to Advanced Usage