BitBadger.Npgsql.Documents
Advanced Usage: Related Documents / Custom Queries
Document stores generally have fewer relationships than traditional relational databases, particularly those that arise when data is structured in Third Normal Form; related collections are stored in the document, and ever-increasing surrogate keys (a la sequences and such) do not play well with distributed data. Unless all data is stored in a single document, though, there will still be a natural relation between documents.
Thinking back to our earlier examples, we did not store the collection of rooms in each hotel's document; each room is its own document, and contains the ID of the hotel as one of its properties.
// F#; for C#, picture a standard class or record type type Hotel = { Id : string // ... more properties ... } type Room = { Id : string HotelId : string // ... more properties ... }
Document Table SQL in Depth
The library creates tables with a data
column of type JSONB
, with a unique index on the configured ID name that serves as the primary key (for these examples, we'll assume it's the default Id
). The indexes created by the library all apply to the data
column. The by-ID query for a hotel would be…
SELECT data FROM hotel WHERE data ->> 'Id' = @id
...with the ID passed as the @id
parameter.
Continue reading; using
WhereById
(C#) /whereById
(F#) will create thedata ->> 'Id' = @id
criteria using the configured ID name.
Finding all the rooms for a hotel would use a JSON containment query, which looks like…
SELECT data FROM room WHERE data @> @criteria
...with something like new { HotelId = "abc123" }
passed as the matching document in the @criteria
parameter.
So far, so good; but, if we're looking up a room, we do not want to have to make 2 queries just to also be able to display the hotel's name. The WHERE
clause on the first query above uses the expression data ->> 'Id'
; this extracts a field from a JSON
or JSONB
field as TEXT
(there is also a ->
operator, which extracts the field as JSONB
). Since this is the value our unique index indexes, and we are using a relational database, we can write an efficient JOIN between these two tables.
SELECT r.data, h.data AS hotel_data FROM room r INNER JOIN hotel h ON h.data ->> 'Id' = r.data ->> 'HotelId' WHERE r.data ->> 'Id' = @id
(This syntax would work without the unique index; it would default to using the GIN index (Full
or Optimized
), if it exists, but it wouldn't be quite as efficient as a zero-or-one unique index lookup.)
Using “Building Blocks”
For every method or function described in Basic Usage, the Query
module / static class contains the query for that operation. Each Query
method or function takes the table name as it (usually) lone parameter. All ById
queries declare an @id
parameter; all ByContains
queries declare a @criteria
parameter; and all ByJsonPath
queries declare a @path
parameter. It also contains SelectFromTable
, WhereById
, WhereDataContains
, and WhereJsonPathMatches
functions to generate those portions of a SQL statement.
There are also two functions that help translate queries into domain objects. FromData<T>
deserializes a data
column into the requested type, while FromDocument<T>
does the same, but allows the column to be named as well. We'll see how we can use these in further examples.
Finally, there are two functions to assist with parameter creation. JsonbDocParam
will create a JSONB parameter, but the user is responsible for providing the name. DocParameters
creates two parameters, one for the document's ID as @id
, and the other for the document itself as @data
.
Executing Queries in F#
But, C# users, please read this part, as we'll build on it in the next section.
As this library uses Npgsql.FSharp
to implement its data access, harnessing it to run our custom queries works. (Ensure open Npgsql.FSharp
is present.) The query is passed as a string, the parameters are passed as a list of tuples, and the function passed to the execution function is responsible for extracting the desired data from the query.
/// Inferred types - roomId = string, return = Task<Room * Hotel> let findRoomById roomId = backgroundTask { Configuration.dataSource () |> Sql.fromDataSource |> Sql.query $""" SELECT r.data, h.data AS hotel_data FROM room r INNER JOIN hotel h ON h.data ->> '{Configuration.idField ()}' = r.data ->> 'HotelId' WHERE r.{Query.whereById "@id"}""" |> Sql.parameters [ "@id", Sql.string roomId ] |> Sql.executeRowAsync (fun row -> fromData<Room> row, fromDocument<Hotel> "hotel_data" row) }
The query above demonstrates using Configuration.idField
and whereById
to use the configured ID field. Creating custom queries using these building blocks allows us to utilize the configured value without hard-coding it throughout our custom queries. If the configuration changes, these queries will pick up the new field name seamlessly.
While this example retrieves the entire document, this is not required. If we only care about the name of the associated hotel, we could amend the query to retrieve only that information.
/// Inferred types - roomId = string, return = Task<Room * string> let findRoomByIdImproved roomId = backgroundTask { Configuration.dataSource () |> Sql.fromDataSource |> Sql.query $""" SELECT r.data, h.data ->> 'Name' AS hotel_name FROM room r INNER JOIN hotel h ON h.data ->> '{Configuration.idField ()}' = r.data ->> 'HotelId' WHERE r.{Query.whereById "@id"}""" |> Sql.parameters [ "@id", Sql.string roomId ] |> Sql.executeRowAsync (fun row -> fromData<Room> row, row.string "hotel_name") }
This query is amazingly efficient, using 2 unique index lookups to return this data. Even though we do not have a foreign key between these two tables, simply being in a relational database allows us to retrieve this related data.
Executing Queries in C#
(jump to the solution, which applies to F# as well)
The Challenge
In the above examples, all of the Sql.
calls are function calls. The result of each one is piped to the next one, where it is provided as the final parameter. This is a process called “currying,” the concept being that each function only has one parameter. For functions that have multiple parameters, they can be thought of as functions that return another function with one fewer parameter required; the final function in the chain returns the value instead of another function. In C#, instead of…
/// Type is Func<string, string, string, string> public void UnnecessaryConcatenation(string value1, string value2, string value3) => value1.Concat(value2).Concat(value3);
...it would be defined as…
/// Type is Func<string, Func<string, Func<string, string>>> public void StillUnnecessaryConcatenation(string value1) => (string value2) => (string value3) => value1.Concat(value2).Concat(value3);
Currying goes hand-in-hand with the concept of partial application. In the F# examples above, Sql.query
has two parameters - string
and SqlProps
- but we call it just with the string
(partial application), resulting in a function SqlProps -> SqlProps
(Func<SqlProps, SqlProps>
in traditional .NET parlance). The pipe operator |>
takes the output from the previous function (the SqlProps
that define the connection), and returns a new SqlProps
that has the query definition as well. This pattern continues through the pipeline, until the final call returns the domain type rather than more SqlProps
.
This is a bit different from “fluent” functions, which use method chaining to handle several calls in one chain. Fluent approaches like LINQ return an object that has both state (like SqlProps
does), and also has behavior to apply future changes to that object.
// We cannot do this; SqlProps has no methods Sql.FromDataSource(Configuration.DataSource()) .Query("the query text") .Parameters( /* the parameters */ ) .ExecuteAsync(FromData<Room>);
Since the behavior (Sql.
functions) is defined separately from the state (SqlProps
), state is preserved through pipelining rather than method chaining. People have written books about the differences this brings - our purpose here is documentation, after all, not education - but for our purposes, that is the main reason we cannot “just use a method chain instead.”
The Solution
Since we cannot use the pipeline operator, this library provides a few methods that bridge this gap. The Document
class has a Custom
class, with methods that can be called with queries and parameters, based on the expected return type. These methods also utilize the configured data source and serializer, so they may be preferable for F# users as well; there are F# functions available as well.
Method | When to Use It |
---|---|
Custom.Single |
Retrieving 0 or 1 result; return value may be null |
Custom.List |
Retrieving a list of matching documents |
Custom.NonQuery |
Executing a query that will not return results |
Using the second query from the F# examples above, we'll use Custom.Single
to retrieve a Tuple<Room, string>
.
// C# public async Task<Tuple<Room, string>?> FindRoomById(string roomId) => await Document.Custom.Single( $"SELECT r.data, h.data ->> 'Name' AS hotel_name FROM room r INNER JOIN hotel h ON h.data ->> '{Configuration.IdField()}' = r.data ->> 'HotelId' WHERE r.{(Query.WhereById("@id"))}", new { Tuple.Create("@id", Sql.string(roomId)) }, row => Tuple.Create(Document.FromData<Room>(row), row.string("hotel_name")));
If we wanted to retrieve all rooms for any hotel in Los Angeles, we can combine Query
class calls, joins, and the Custom
module to create that query:
// C# public async Task<List<Room>> FindRoomsInLA() => await Document.Custom.List( $"{(Query.SelectFromTable("room"))} r INNER JOIN hotel h ON h.data ->> '{Configuration.IdField()}' = r.data ->> 'HotelId' WHERE h.{(Query.WhereDataContains("@criteria"))}", new { Tuple.Create("@criteria", Query.JsonbDocParam(new { City = "Los Angeles" }) }, Document.FromData<Room>);
A few notes on the above:
SelectFromTable
,WhereById
, andWhereDataContains
require a string parameter; wrapping these statements in parentheses lets us use them in an interpolated string (even if the above code confuses this site's syntax highlighter).- The table aliases for
room
andhotel
are placed where they apply to the query portions being used to construct the query. - As we are using the standard select statement, we can pass
Document.FromData
as the function to deserialize the results.
F# can use a similar technique, but with triple-quoted strings to allow us to interpolate a function call that also requires quotes.
// F# // implied return type - Task<Room list> let findRoomsInLA () = Custom.list $"""{Query.selectFromTable "room"} r INNER JOIN hotel h ON h.data ->> '{Configuration.idField ()}' = r.data ->> 'HotelId' WHERE h.{Query.whereDataContains "@criteria"}""" [ "@criteria", Query.jsonbDocParam {| City = "Los Angeles" |} ] fromData<Room>
Custom Scalar Queries
The Count
and Exists
modules and static classes represent a form of scalar query; they return a single value. If the need arises to customize these queries, Custom.Scalar
provides the ability to perform these, using the data source configured for the library. Continuing our example, above, if we just wanted the count of rooms in LA hotels, we could obtain that via:
// C# public async Task<int> CountRoomsInLA() => await Document.Custom.Scalar( $"SELECT COUNT(*) AS the_count FROM room r INNER JOIN hotel h ON h.data ->> '{Configuration.IdField()}' = r.data ->> 'HotelId' WHERE h.{(Query.WhereDataContains("@criteria"))}", new { Tuple.Create("@criteria", Query.JsonbDocParam(new { City = "Los Angeles" }) }, row => row.int("the_count"));
Using This Library for Non-Document Queries
Some of the methods and functions provided by the library can be used with non-document tables as well, particularly the Custom
functions. This may be a convenient and consistent way to access your data, while delegating connection management to the library and its configured data source.
Let's walk through a short example using C#; F# uses very similar patterns.
// C# using Npgsql.FSharp; // Needed for RowReader and Sql types // Stores metadata for a given user public class MetaData { public string Id { get; set; } = ""; public string UserId { get; set; } = ""; public string Key { get; set; } = ""; public string Value { get; set; } = ""; } // Static class to hold mapping functions public static class Map { // These parameters are the column names from the underlying table public MetaData ToMetaData(RowReader row) => new MetaData { Id = row.string("id"), UserId = row.string("user_id"), Key = row.string("key"), Value = row.string("value") }; } // somewhere in a class, retrieving data public Task<List<MetaData>> MetaDataForUser(string userId) => Document.Custom.List("SELECT * FROM user_metadata WHERE user_id = @userId", new { Tuple.Create("@userId", Sql.string(userId)) }, Map.ToMetaData);