BitBadger.Documents
Advanced Usage: Related Documents / Custom Queries
NOTE: This page is longer than the ideal documentation page. Understanding how to assemble custom queries requires understanding how data is stored, and the list of ways to retrieve information can be… a lot. The hope is that one reading will serve as education, and the lists of options will serve as reference lists that will assist you in crafting your queries.
Overview
Document stores generally have fewer relationships than traditional relational databases, particularly those that arise when data is structured in Third Normal Form; related collections are stored in the document, and ever-increasing surrogate keys (a la sequences and such) do not play well with distributed data. Unless all data is stored in a single document, though, there will still be a natural relation between documents.
Thinking back to our earlier examples, we did not store the collection of rooms in each hotel's document; each room is its own document and contains the ID of the hotel as one of its properties.
// C# public class Hotel { public string Id { get; set; } = ""; // ... more properties } public class Room { public string Id { get; set; } = ""; public string HotelId { get; set; } = ""; // ... more properties }
// F# [<CLIMutable>] type Hotel = { Id: string // ... more fields } [<CLIMutable>] type Room = { Id: string HotelId: string // ... more fields }
The
CLIMutable
attribute is required on record types that are instantiated by the CLR; this attribute generates a zero-parameter constructor.
Document Table SQL in Depth
The library creates tables with a data
column of type JSONB
(PostgreSQL) or TEXT
(SQLite), with a unique index on the configured ID name that serves as the primary key (for these examples, we'll assume it's the default Id
). The indexes created by the library all apply to the data
column. The by-ID query for a hotel would be…
SELECT data FROM hotel WHERE data->>'Id' = @id
...with the ID passed as the @id
parameter.
Using a “building block” method/function
Query.WhereById
will create thedata->>'Id' = @id
criteria using the configured ID name.
Finding all the rooms for a hotel, using our indexes we created earlier, could use a field comparison query…
SELECT data FROM room WHERE data->>'HotelId' = @field
...with @field
being “abc123”; PostgreSQL could also use a JSON containment query…
SELECT data FROM room WHERE data @> @criteria
...with something like new { HotelId = "abc123" }
passed as the matching document in the @criteria
parameter.
So far, so good; but, if we're looking up a room, we do not want to have to make 2 queries just to also be able to display the hotel's name. The WHERE
clause on the first query above uses the expression data->>'Id'
; this extracts a field from a JSON column as TEXT
in PostgreSQL (or “best guess” in SQLite, but usually text). Since this is the value our unique index indexes, and we are using a relational database, we can write an efficient JOIN between these two tables.
SELECT r.data, h.data AS hotel_data FROM room r INNER JOIN hotel h ON h.data->>'Id' = r.data->>'HotelId' WHERE r.data->>'Id' = @id
(This syntax would work without the unique index; for PostgreSQL, it would default to using the GIN index (Full
or Optimized
), if it exists, but it wouldn't be quite as efficient as a zero-or-one unique index lookup. For SQLite, this would result in a full table scan. Both PostgreSQL and SQLite also support a ->
operator, which extracts the field as a JSON value instead of its text.)
Using Building Blocks
Most of the data access methods in both libraries are built up from query fragments and reusable functions; these are exposed for use in building custom queries.
Queries
For every method or function described in Basic Usage, the Query
static class/module contains the building blocks needed to construct query for that operation. Both the parent and implementation namespaces have a Query
module; in C#, you'll need to qualify the implementation module namespace.
In BitBadger.Documents.Query
, you'll find:
- StatementWhere takes a SQL statement and a
WHERE
clause and puts them together on either side of the textWHERE
- Definition contains methods/functions to ensure tables, their keys, and field indexes exist.
- Insert, Save, Count, Find, Update, and Delete are the prefixes of the queries for those actions; they all take a table name and return this query (with no
WHERE
clause) - Exists also requires a
WHERE
clause, due to how the query is constructed because it is inserted as a subquery
Within each implementation's Query
module:
- WhereByFields takes a
FieldMatch
case and a set of fields.Field
has constructor functions for each comparison it supports; these functions generally take a field name and a value, though the latter two do not require a value.- Equal uses
=
to create an equality comparison - Greater uses
>
to create a greater-than comparison - GreaterOrEqual uses
>=
to create a greater-than-or-equal-to comparison - Less uses
<
to create a less-than comparison - LessOrEqual uses
<=
to create a less-than-or-equal-to comparison - NotEqual uses
<>
to create a not-equal comparison - Between uses
BETWEEN
to create a range comparison - In uses
IN
to create an equality comparison within a set of given values - InArray uses
?|
in PostgreSQL, and a combination ofEXISTS
/json_each
/IN
in SQLite, to create an equality comparison within a given set of values against an array in a JSON document - Exists uses
IS NOT NULL
to create an existence comparison - NotExists uses
IS NULL
to create a non-existence comparison; fields are considered null if they are either not part of the document, or if they are part of the document but explicitly set tonull
- Equal uses
- WhereById takes a parameter name and generates a field
Equal
comparison against the configured ID field. - Patch and RemoveFields use each implementation's unique syntax for partial updates and field removals.
- ByFields, ByContains (PostgreSQL), and ByJsonPath (PostgreSQL) are functions that take a statement and the criteria, and construct a query to fit that criteria. For
ByFields
, each field parameter will use its specified name if provided (an incrementingfield[n]
if not).ByContains
uses@criteria
as its parameter name, which can be any object.ByJsonPath
uses@path
, which should be astring
.
That's a lot of reading! Some examples a bit below will help this make sense.
Parameters
Traditional ADO.NET data access involves creating a connection object, then adding parameters to that object. This library follows a more declarative style, where parameters are passed via IEnumerable
collections. To assist with creating these collections, each implementation has some helper functions. For C#, these calls will need to be prefixed with Parameters
; for F#, this module is auto-opened. This is one area where names differ in other than just casing, so both will be listed.
- Parameters.Id / idParam generate an
@id
parameter with the numeric,string
, orToString()
ed value of the ID passed. - Parameters.Json / jsonParam generate a user-provided-named JSON-formatted parameter for the value passed (this can be used for PostgreSQL's JSON containment queries as well)
- Parameters.AddFields / addFieldParams append field parameters to the given parameter list
- Parameters.FieldNames / fieldNameParams create parameters for the list of field names to be removed; for PostgreSQL, this returns a single parameter, while SQLite returns a list of parameters
- Parameters.None / noParams is an empty set of parameters, and can be cleaner and convey intent better than something like
new[] { }
(For C# 12 or later, the collection expression[]
is much terser.)
If you need a parameter beyond these, both NpgsqlParameter
and SqliteParameter
have a name-and-value constructor; that isn't many more keystrokes.
Results
The Results
module is implementation specific. Both libraries provide Results.FromData<T>
, which deserializes a data
column into the requested type; and FromDocument<T>
, which does the same thing, but allows the column to be named as well. We'll see how we can use these in further examples. As with parameters, C# users need to qualify the class name, but the module is auto-opened for F#.
Putting It All Together
The Custom static class/module has four methods/functions:
- List requires a query, parameters, and a mapping function, and returns a list of documents.
- Single requires a query, parameters, and a mapping function, and returns one or no documents (C#
TDoc?
, F#'TDoc option
) - Scalar requires a query, parameters, and a mapping function, and returns a scalar value (non-nullable; used for counts, existence, etc.)
- NonQuery requires a query and parameters and has no return value
Within each library, every other call is written in terms of
Custom.List
,Custom.Scalar
, orCustom.NonQuery
; your custom queries will use the same path the provided ones do!
Let's jump in with an example. When we query for a room, let's say that we also want to retrieve its hotel information as well. We saw the query above, but here is how we can implement it using a custom query.
// C#, All // return type is Tuple<Room, Hotel>? var data = await Custom.Single( $"SELECT r.data, h.data AS hotel_data FROM room r INNER JOIN hotel h ON h.data->>'{Configuration.IdField()}' = r.data->>'HotelId' WHERE r.{Query.WhereById("@id")}", new[] { Parameters.Id("my-room-key") }, // rdr's type will be RowReader for PostgreSQL, SqliteDataReader for SQLite rdr => Tuple.Create(Results.FromData<Room>(rdr), Results.FromDocument<Hotel>("hotel_data", rdr)); if (data is not null) { var (room, hotel) = data; // do stuff with the room and hotel data }
// F#, All // return type is (Room * Hotel) option let! data = Custom.single $"""SELECT r.data, h.data AS hotel_data FROM room r INNER JOIN hotel h ON h.data->>'{Configuration.idField ()}' = r.data->>'HotelId' WHERE r.{Query.whereById "@id"}""" [ idParam "my-room-key" ] // rdr's type will be RowReader for PostgreSQL, SqliteDataReader for SQLite fun rdr -> (fromData<Room> rdr), (fromDocument<Hotel> "hotel_data" rdr) match data with | Some (Room room, Hotel hotel) -> // do stuff with room and hotel | None -> ()
These queries use Configuration.IdField
and WhereById
to use the configured ID field. Creating custom queries using these building blocks allows us to utilize the configured value without hard-coding it throughout our custom queries. If the configuration changes, these queries will pick up the new field name seamlessly.
While this example retrieves the entire document, this is not required. If we only care about the name of the associated hotel, we could amend the query to retrieve only that information.
// C#, All // return type is Tuple<Room, string>? var data = await Custom.Single( $"SELECT r.data, h.data ->> 'Name' AS hotel_name FROM room r INNER JOIN hotel h ON h.data->>'{Configuration.IdField()}' = r.data->>'HotelId' WHERE r.{Query.WhereById("@id")}", new[] { Parameters.Id("my-room-key") }, // PostgreSQL row => Tuple.Create(Results.FromData<Room>(row), row.string("hotel_name"))); // SQLite; could use rdr.GetString(rdr.GetOrdinal("hotel_name")) below as well // rdr => Tuple.Create(Results.FromData<Room>(rdr), rdr.GetString(1))); if (data is not null) { var (room, hotelName) = data; // do stuff with the room and hotel name }
// F#, All // return type is (Room * string) option let! data = Custom.single $"""SELECT r.data, h.data->>'Name' AS hotel_name FROM room r INNER JOIN hotel h ON h.data->>'{Configuration.idField ()}' = r.data->>'HotelId' WHERE r.{Query.whereById "@id"}""" [ idParam "my-room-key" ] // PostgreSQL fun row -> (fromData<Room> row), row.string "hotel_name" // SQLite; could use rdr.GetString(rdr.GetOrdinal("hotel_name")) below as well // fun rdr -> (fromData<Room> rdr), rdr.GetString(1) match data with | Some (Room room, string hotelName) -> // do stuff with room and hotel name | None -> ()
These queries are amazingly efficient, using 2 unique index lookups to return this data. Even though we do not have a foreign key between these two tables, simply being in a relational database allows us to retrieve this related data.
Revisiting our “take these rooms out of service” SQLite query from the Basic Usage page, here's how that could look using building blocks available since version 4 (PostgreSQL will accept this query syntax as well, though the parameter types would be different):
// C#, SQLite var fields = [Field.GreaterOrEqual("RoomNumber", 221), Field.LessOrEqual("RoomNumber", 240)]; await Custom.NonQuery( Sqlite.Query.ByFields(Sqlite.Query.Patch("room"), FieldMatch.All, fields, new { InService = false }), Parameters.AddFields(fields, []));
// F#, SQLite let fields = [ Field.GreaterOrEqual "RoomNumber" 221; Field.LessOrEqual "RoomNumber" 240 ] do! Custom.nonQuery (Query.byFields (Query.patch "room") All fields {| InService = false |}) (addFieldParams fields []))
This uses two field comparisons to incorporate the room number range instead of a BETWEEN
clause; we would definitely want to have that field indexed if this was going to be a regular query or our data was going to grow beyond a trivial size.
You may be thinking “wait - what's the difference between that an the regular Patch
call?” And you'd be right; that is exactly what Patch.ByFields
does. Between
is also a better comparison for this, and either FieldMatch
type will work, as we're only passing one field. No building blocks required!
// C#, All await Patch.ByFields("room", FieldMatch.Any, [Field.Between("RoomNumber", 221, 240)], new { InService = false });
// F#, All do! Patch.byFields "room" Any [ Field.Between "RoomNumber 221 240 ] {| InService = false |}
Going Even Further
Updating Data in Place
One drawback to document databases is the inability to update values in place; however, with a bit of creativity, we can do a lot more than we initially think. For a single field, SQLite has a json_set
function that takes an existing JSON field, a field name, and a value to which it should be set. This allows us to do single-field updates in the database. If we wanted to raise our rates 10% for every room, we could use this query:
-- SQLite UPDATE room SET data = json_set(data, 'Rate', data ->> 'Rate' * 1.1)
If we get any more complex, though, Common Table Expressions (CTEs) can help us. Perhaps we decided that we only wanted to raise the rates for hotels in New York, Chicago, and Los Angeles, and we wanted to exclude any brand with the word “Value” in its name. A CTE lets us select the source data we need to craft the update, then use that in the UPDATE
's clauses.
-- SQLite WITH to_update AS (SELECT r.data->>'Id' AS room_id, r.data->>'Rate' AS current_rate, r.data AS room_data FROM room r INNER JOIN hotel h ON h.data->>'Id' = r.data->>'HotelId' WHERE h.data->>'City' IN ('New York', 'Chicago', 'Los Angeles') AND LOWER(h.data->>'Name') NOT LIKE '%value%') UPDATE room SET data = json_set(to_update.room_data, 'Rate', to_update.current_rate * 1.1) WHERE room->>'Id' = to_update.room_id
Both PostgreSQL and SQLite provide JSON patching, where multiple fields (or entire structures) can be changed at once. Let's revisit our rate increase; if we are making the rate more than $500, we'll apply a status of “Premium” to the room. If it is less than that, it should keep its same value.
First up, PostgreSQL:
-- PostgreSQL WITH to_update AS (SELECT r.data->>'Id' AS room_id, (r.data->>'Rate')::decimal AS rate, r.data->>'Status' AS status FROM room r INNER JOIN hotel h ON h.data->>'Id' = r.data->>'HotelId' WHERE h.data->>'City' IN ('New York', 'Chicago', 'Los Angeles') AND LOWER(h.data ->> 'Name') NOT LIKE '%value%') UPDATE room SET data = data || ('{"Rate":' || to_update.rate * 1.1 || '","Status":"' || CASE WHEN to_update.rate * 1.1 > 500 THEN 'Premium' ELSE to_update.status END || '"}') WHERE room->>'Id' = to_update.room_id
In SQLite:
-- SQLite WITH to_update AS (SELECT r.data->>'Id' AS room_id, r.data->>'Rate' AS rate, r.data->>'Status' AS status FROM room r INNER JOIN hotel h ON h.data->>'Id' = r.data->>'HotelId' WHERE h.data->>'City' IN ('New York', 'Chicago', 'Los Angeles') AND LOWER(h.data->>'Name') NOT LIKE '%value%') UPDATE room SET data = json_patch(data, json( '{"Rate":' || to_update.rate * 1.1 || '","Status":"' || CASE WHEN to_update.rate * 1.1 > 500 THEN 'Premium' ELSE to_update.status END || '"}')) WHERE room->>'Id' = to_update.room_id
For PostgreSQL, ->>
always returns text, so we need to cast the rate to a number. In either case, we do not want to use this technique for user-provided data; however, in place, it allowed us to complete all of our scenarios without having to load the documents into our application and manipulate them there.
Updates in place may not need parameters (though it would be easy to foresee a “rate adjustment” feature where the 1.1 adjustment was not hard-coded); in fact, none of the samples in this section used the document libraries at all. These queries can be executed by Custom.NonQuery
, though, providing parameters as required.
Using This Library for Non-Document Queries
The Custom
methods/functions can be used with non-document tables as well. This may be a convenient and consistent way to access your data, while delegating connection management to the library and its configured data source.
Let's walk through a short example using C# and PostgreSQL:
// C#, PostgreSQL using Npgsql.FSharp; // Needed for RowReader and Sql types using static CommonExtensionsAndTypesForNpgsqlFSharp; // Needed for Sql functions // Stores metadata for a given user public class MetaData { public string Id { get; set; } = ""; public string UserId { get; set; } = ""; public string Key { get; set; } = ""; public string Value { get; set; } = ""; } // Static class to hold mapping functions public static class Map { // These parameters are the column names from the underlying table public MetaData ToMetaData(RowReader row) => new MetaData { Id = row.string("id"), UserId = row.string("user_id"), Key = row.string("key"), Value = row.string("value") }; } // somewhere in a class, retrieving data public Task<List<MetaData>> MetaDataForUser(string userId) => Document.Custom.List("SELECT * FROM user_metadata WHERE user_id = @userId", new { Tuple.Create("@userId", Sql.string(userId)) }, Map.ToMetaData);
For F#, the using static
above is not needed; that module is auto-opened when Npgsql.FSharp
is opened. For SQLite in either language, the mapping function uses a SqliteDataReader
object, which implements the standard ADO.NET DataReader
functions of Get[Type](idx)
(and GetOrdinal(name)
for the column index).