Imagine the following data model, in which we have three collections Invoice
, Suburb
and Campaign
. The Invoice
type has a nested object type Addresses
that has foreign keys to the Address
collection. Invoice
also has a nested array of Discount
objects, of which each has a foreign key to the Campaign
collection. So Invoice
can be said to have "nested relationships" to both Address
and Campaign
.
// Collection
type Invoice = {
invoiceId: int // PK
customerId: int
addresses: Addresses
discounts: Discount[]
total: decimal
}
// Nested object in Invoice
type Addresses = {
billingAddressId: int // FK to Address collection
shippingAddressId: int // FK to Address collection
}
// Collection
type Address = {
addressId : int // PK
streetAddress: string
name: string
city: string
postcode: string
country: string
}
// Nested object inside nested array in Invoice
type Discount = {
description: string
percentage: decimal
campaignId: int // FK to Campaign collection
}
// Collection
type Campaign = {
campaignId: int // PK
name: string
}
Such a model would not be unusual in a document database such as MongoDB, or someone using Postgres as a pseudo-document database by using jsonb
columns. If we want MongoDB to be a first-class experience in Hasura, we need to be able to handle nested relationships as well as we handle regular non-nested relationships.
We are currently able to navigate these nested relationships during field selection, like so:
collection: Invoice
query:
fields:
billingAddress:
type: column
column: addresses
fields:
type: object # Nest into the object
fields:
billingAddress:
type: relationship # Navigate the relationship
relationship: BillingAddressToAddress
arguments: {}
query:
fields:
name:
type: column
column: name
discounts:
type: column
column: discounts
fields:
type: array # Nest into the array
fields:
type: object # Nest into the object
fields:
campaign:
type: relationship # Navigate the relationship
relationship: DiscountToCampaign
arguments: {}
query:
fields:
name:
type: column
column: name
arguments: {}
collection_relationships:
BillingAddressToAddress:
column_mapping:
billingAddressId: addressId
relationship_type: object
target_collection: Address
arguments: {}
DiscountToCampaign:
column_mapping:
campaignId: campaignId
relationship_type: object
target_collection: Campaign
arguments: {}
However, there are other contexts where we cannot navigate nested relationships. Specifically:
-
ExistsInCollection::Related
(used in filter predicates to navigate relationships) has no facility to descend into nested fields before navigating the relationship (example usage). -
PathElement
is used to traverse relationships, but it does not have a nested field traversal facility. It is used in:-
ComparisonTarget::Aggregate
- part of filter predicates; where the left hand side of a comparison operation references an aggregate (example usage) -
ComparisonValue::Column
- part of filter predicates; where the right hand side of a comparison operation references a column (this is currently unused in v3-engine, but is intended to be used in model permissions, as it was used for that purpose in v2 permissions) -
OrderByTarget::Column
- when you want to order by a column across an object relationship (example usage) -
OrderByTarget::Aggregate
- when you want to order by an aggregate that happens across a nested object relationship (example usage) -
Dimension::Column
- when selecting a column to group by that occurs across a nested object relationship (example usage) -
GroupOrderByTarget::Aggregate
- when you want order your groups by an aggregate applied to the group (example usage)
-
We are also unable to target a relationship at a collection where the target columns are nested. This is because Relationship.column_mapping
does not allow for the target column to reference anything other than a column directly on the collection type. So if we wanted to follow the Invoice.addresses.billingAddressId -> Address.addressId
object relationship in reverse, from Address (ie Address.addressId -> Invoice.addresses.billingAddressId
), we couldn't.
collection_relationships:
AddressToInvoiceBillingAddress:
column_mapping:
addressId: ["addresses", "billingAddressId"] # This is not supported, we only allow a single column name here
relationship_type: array
target_collection: Invoice
arguments: {}
We also have an issue where nested relationships cannot be described in the schema, because the foreign keys concept has a column mapping that only works as a flat mapping from one collection object type to another and cannot address nested fields.
schema:
collections:
- name: Invoice
type: Invoice
foreign_keys:
BillingAddressToAddress:
column_mapping:
## Can't describe addresses.billingAddress -> addressId
foreign_collection: Address
pub enum ExistsInCollection {
Related {
#[serde(skip_serializing_if = "Option::is_none", default)]
/// Path to a nested field within an object column that must be navigated
/// before the relationship is navigated
field_path: Option<Vec<FieldName>>,
/// The name of the relationship to follow
relationship: RelationshipName,
/// Values to be provided to any collection arguments
arguments: BTreeMap<ArgumentName, RelationshipArgument>,
},
...
}
pub struct PathElement {
#[serde(skip_serializing_if = "Option::is_none", default)]
/// Path to a nested field within an object column that must be navigated
/// before the relationship is navigated
pub field_path: Option<Vec<FieldName>>,
/// The name of the relationship to follow
pub relationship: RelationshipName,
/// Values to be provided to any collection arguments
pub arguments: BTreeMap<ArgumentName, RelationshipArgument>,
/// A predicate expression to apply to the target collection
pub predicate: Option<Box<Expression>>,
}
This will allow navigation of nested fields before navigating through the relationship when using ExistsInCollection::Related
in predicates, and everywhere where PathElement
is used (predicates, ordering, groups, aggregates). It is imporant to note that the field_path
only allows traversal of nested objects and does not allow traversal into nested arrays. This means that no implicit existential quantification is introduced.
Relationships in objects in nested arrays can be traversed in predicates via ExistsInCollection::NestedCollection
and then using ExistsInCollection::Related
.
pub struct Relationship {
/// A mapping between columns on the source collection to columns on the target collection
pub column_mapping: BTreeMap<FieldName, Vec<FieldName>>,
pub relationship_type: RelationshipType,
/// The name of a collection
pub target_collection: CollectionName,
/// Values to be provided to any collection arguments
pub arguments: BTreeMap<ArgumentName, RelationshipArgument>,
}
This would allow addressing a nested column in a target collection by navigating through the field path. This would be a breaking change in NDC Spec v0.2.0, but a relatively trivial one for connectors to adopt as, unless they enable the new capabilities, they will only receive arrays of one FieldName
here.
BREAKING: Move foreign_keys
in the schema from Collections to ObjectTypes and modify ForeignKeyConstraint.column_mapping
pub struct ObjectType {
/// Description of this type
pub description: Option<String>,
/// Fields defined on this object type
pub fields: BTreeMap<FieldName, ObjectField>,
/// Any foreign keys defined for this object type's columns
pub foreign_keys: BTreeMap<String, ForeignKeyConstraint>,
}
pub struct ForeignKeyConstraint {
/// The columns on which you want want to define the foreign key.
pub column_mapping: BTreeMap<FieldName, Vec<FieldName>>,
/// The name of a collection
pub foreign_collection: CollectionName,
}
The actual foreign key columns are really defined on ObjectTypes rather than on Collections. Collections are simply a way to query an array of the ObjectType. So if we move the foreign keys to the ObjectType, we can define FKs that are "nested" by simply defining them on the nested type. This also makes sense if the ObjectType is used in more than one place in the schema; logically, a relationship could be formed from wherever that ObjectType occurs in the schema to the target of the relationship.
The ForeignKeyConstraint.column_mapping
also needs to swap its type for the target column from FieldName
to Vec<FieldName>
to be able to address nested columns in the target Collection.
This is a breaking change in NDC Spec v0.2.0, but should be straightforward to implement. Connectors generally have an object type defined per collection and so they can simply move their foreign keys to the object type from the collection. The column_mapping
change should also be straightforward as, unless they enable the new capabilities, they will only receive arrays of one FieldName
.
Since these new features are significant for connectors to implement, we should gate them behind capabilities. The relationships.nested
capability has been added to indicate that a connector can navigate relationships from inside a nested object. This includes in selection, filtering, ordering, etc. If a connector does not declare this capability, it would not see the new field_path
fields used, and relationship fields would not be requested inside nested field selections.
The relationships.nested.array
capability has also been added, which additionally declares that a connector can support navigating a relationship from inside a nested object inside a nested array (ie. a nested array of objects). This is separated from the basic relationships.nested
capability as it's harder to implement and not all data sources may be able to do so (eg. Clickhouse) (example query).
{
"query": {
"aggregates": {},
"variables": {},
"nested_fields": {
"filter_by": {},
"order_by": {},
"aggregates": {}
},
"exists": {
"nested_collections": {}
}
},
"mutation": {},
"relationships": {
"nested": { // !NEW! Does the connector support navigating a relationship from inside a nested object
"array": {} // !NEW! Does the connector support navigating a relationship from inside a nested object inside a nested array
},
"relation_comparisons": {},
"order_by_aggregate": {}
}
}
These examples demonstrate GraphQL queries that v3-engine
needs to expose and the matching NDC queries that satisfy the GraphQL query, using the proposed spec changes.
This example filters invoices to only include invoices that have a billing address with a suburb of "Southbank".
query {
Invoice(
where: {
addresses: { # Nested object
billingAddress: { # Object relationship
suburb: { _eq: "Southbank" }
}
}
}
) {
invoiceId
}
}
collection: Invoice
query:
fields:
invoiceId:
type: column
column: invoiceId
predicate:
type: exists
in_collection:
type: related
field_path: ["addresses"]
relationship: BillingAddressToAddress
arguments: {}
predicate:
type: binary_comparison_operator
column:
type: column
name: suburb
operator: eq
value:
type: scalar
value: Southbank
arguments: {}
collection_relationships:
BillingAddressToAddress:
column_mapping:
billingAddressId: ["addressId"]
relationship_type: object
target_collection: Address
arguments: {}
This example filters invoices to only include invoices where there is at least one discount that belongs to a campaign named "EOFY 2024".
query {
Invoice(
where: {
discounts: { # Nested array
campaign: { # Object relationship
name: { _eq: "EOFY 2024" }
}
}
}
) {
invoiceId
}
}
collection: Invoice
query:
fields:
invoiceId:
type: column
column: invoiceId
predicate:
type: exists
in_collection:
type: nested_collection
column_name: discounts
predicate:
type: exists
in_collection:
type: related
relationship: DiscountCampaignToCampaign
arguments: {}
predicate:
type: binary_comparison_operator
column:
type: column
name: name
operator: eq
value:
type: scalar
value: EOFY 2024
arguments: {}
collection_relationships:
DiscountCampaignToCampaign:
column_mapping:
campaignId: ["campaignId"]
relationship_type: object
target_collection: Campaign
arguments: {}
This example filters invoices to only include invoices where the billing address has been used as a billing address on more than two invoices.
query {
Invoice(
where: {
addresses: { # Nested object
billingAddressInvoices_aggregate { # Aggregated array relationship
predicate: {
_count: { # Aggregation
_gt: 2
}
}
}
}
}
) {
invoiceId
}
}
collection: Invoice
query:
fields:
invoiceId:
type: column
column: invoiceId
predicate:
type: binary_comparison_operator
column:
type: aggregate
aggregate:
type: star_count
path:
- field_path: ["addresses"]
relationship: BillingAddressToInvoicesBillingAddress
arguments: {}
predicate: null
operator: gt
value:
type: scalar
value: 2
arguments: {}
collection_relationships:
BillingAddressToInvoicesBillingAddress:
column_mapping:
billingAddressId: ["addresses", "billingAddresssId"]
relationship_type: array
target_collection: Invoice
arguments: {}
This example orders invoices by the suburb of their billing address ascending.
query {
Invoice(
order_by: {
addresses: { # Nested object
billingAddress: { # Object relationship
suburb: Asc
}
}
}
) {
invoiceId
}
}
collection: Invoice
query:
fields:
invoiceId:
type: column
column: invoiceId
order_by:
elements:
- target:
type: column
path:
- field_path: ["addresses"]
relationship: BillingAddressToAddress
arguments: {}
predicate: null
name: suburb
order_direction: asc
arguments: {}
collection_relationships:
BillingAddressToAddress:
column_mapping:
billingAddressId: ["addressId"]
relationship_type: object
target_collection: Address
arguments: {}
This example orders invoices by those invoices whose billing address has been used in the most number of invoices first.
query {
Invoice(
order_by: {
addresses: { # Nested object
billingAddressInvoices_aggregate { # Aggregated array relationship
_count: Desc
}
}
}
) {
invoiceId
}
}
collection: Invoice
query:
fields:
invoiceId:
type: column
column: invoiceId
order_by:
elements:
- target:
type: aggregate
aggregate:
type: star_count
path:
- field_path: ["addresses"]
relationship: BillingAddressToInvoicesBillingAddress
arguments: {}
predicate: null
order_direction: desc
arguments: {}
collection_relationships:
BillingAddressToInvoicesBillingAddress:
column_mapping:
billingAddressId: ["addresses", "billingAddresssId"]
relationship_type: array
target_collection: Invoice
arguments: {}
This example groups invoices by their billing address's suburb.
query {
Invoice_groups(
grouping_keys: {
addresses: { # Nested object
billingAddress: { # Object relationship
_scalar_field: suburb
}
}
}
) {
grouping_key {
addresses {
billingAddress { suburb }
}
}
}
}
collection: Invoice
query:
groups:
dimensions:
- type: column
path:
- field_path: ["addresses"]
relationship: BillingAddressToAddress
arguments: {}
predicate: null
column_name: suburb
aggregates: {}
arguments: {}
collection_relationships:
BillingAddressToAddress:
column_mapping:
billingAddressId: ["addressId"]
relationship_type: object
target_collection: Address
arguments: {}
This example groups invoices by their customerId, and then orders the groups by those customers that have invoices shipped to the most different countries first.
query {
Invoice_groups(
grouping_keys: { _scalar_field: customerId } # Group by Invoice customer
order_by: { # Order Customer-Invoice groups by those customers that have shipped to the most different countries first
group_aggregate: {
addresses: { # Nested object
shippingAddress: { # Object relationship
country: { _count_distinct: Desc }
}
}
}
}
) {
grouping_key {
customerId
}
}
}
collection: Invoice
query:
groups:
dimensions:
- type: column
path: []
column_name: customerId
aggregates: {}
order_by:
elements:
- target:
type: aggregate
aggregate:
type: column_count
column: country
distinct: true
path:
- field_path: ["addresses"]
relationship: ShippingAddressToAddress
arguments: {}
predicate: null
order_direction: desc
arguments: {}
collection_relationships:
ShippingAddressToAddress:
column_mapping:
shippingAddressId: ["addressId"]
relationship_type: object
target_collection: Address
arguments: {}