Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hierarchal Batch Table in 3D Tiles Next? #558

Open
pjcozzi opened this issue Nov 14, 2021 · 2 comments
Open

Hierarchal Batch Table in 3D Tiles Next? #558

pjcozzi opened this issue Nov 14, 2021 · 2 comments

Comments

@pjcozzi
Copy link
Contributor

pjcozzi commented Nov 14, 2021

In 3D Tiles 1.0 3DTILES_batch_table_hierarchy enables storage-efficient representations of hierarchal relationships such as

  • a door is in a room is on a floor is in a building.

This is used for tiling CityGML in Cesium ion, and for Cesium OSM Buildings. CC @kring

AFAIK this is not currently possible in 3D Tiles Next; instead, duplicate metadata needs to be stored. I think it is OK to solidify the current specs, #556, before starting anything new, but can we:

  1. Communicate to the developer community why 3D Tiles Next doesn't support this
  2. Add a comment here on how 3D Tiles Next may approach hierarchal metadata representation
@ptrgags
Copy link
Contributor

ptrgags commented Nov 17, 2021

I found some detailed notes I had about allowing a hierarchy of classes. I had a few ideas, though at the time it was low priority so never came up for discussion.

NOTE: examples below were based on the older 3DTILES_feature_metadata schema.

NOTE: Towards the end I have some diagrams in the section on Space Complexity

Option 1: Inherit Columns

{
  "classes": {
    "building": {
      "properties": {
        "estimatedHeight": {
          "type": "STRING"
        },
        "longitude": {
          "type": "FLOAT64"
        },
        "latitude": {
          "type": "FLOAT64"
        }
      }
    },
    "buildingPart": {
      // point to parent class
      "parent": "building",
      "properties": {
        "elementType": {
          "type": "STRING"
        },
        "elementID": {
          "type": "FLOAT64"
        },
        // Let's make a name collision with
        // geometry.color with different types. I'm proposing that
        // the subclass overrides this. Other options would be to
        // require that the types match, or just disallow collisions.
        "color": {
          "type": "ARRAY",
          "componentType": "UINT8",
          "componentCount": 4
        }
      }
    },
    "geometry": {
      // point to parent class
      "parent": "buildingPart",
      "properties": {
        "color": {
          "type": "STRING"
        } 
      }
    }
  },
  "instanceTables": {
    "buildingTable": {
      "class": "building",
      "properties": {
        "estimatedHeight": {
          "bufferView": 1,
          "offsetBufferViews": [2]
        },
        "longitude": {"bufferView": 3},
        "latitude": {"bufferView": 4}
      }
    },
    "buildingPartTable": {
      "class": "buildingPart",
      "properties": {
        // Separate copy of columns from parents
        "estimatedHeight": {
          "bufferView": 1,
          "offsetBufferViews": [2]
        },
        "longitude": {"bufferView": 3},
        "latitude": {"bufferView": 4},
        // plus subclass-only columns
        "elementID": {
          "bufferView": 5,
          "offsetBufferViews": [6]
        },
        // name collision example.
        // This is the parent table, so this has type ARRAY[UINT8, 4]
        "color": {"bufferView": 7}
      }
    },
    "geometryTable": {
      // Separate copy of columns from parents
      "estimatedHeight": {
        "bufferView": 1,
        "offsetBufferViews": [2]
      },
      "longitude": {"bufferView": 3},
      "latitude": {"bufferView": 4},
      "elementID": {
        "bufferView": 5,
        "offsetBufferViews": [6]
      },
      // plus subclass-only columns
      // name collision example. This is the subclass, so it
      // shadows the parent variable. Thus this has type STRING
      "color": {
        "bufferView": 7,
        "offsetBufferViews": 8
      }
    }
  }
}

Pros:

  • no extra auxillary buffers needed

Cons:

  • more bufferviews to keep track of

Example

buildingTable:

instanceId estimatedHeight longitude latitude
0 100.0 -75.164031 39.953264
1 400.0 -74.013334 40.712270

buildingPartTable:

Notice that we inherit columns from the parent, adding elementType, elementId and color.
The values are independent of the parent though.

instanceId estimatedHeight longitude latitude elementType elementId color
0 200.0 -122.477886 37.813934 "elementType1" 0 [255, 0, 0]
1 200.0 -122.479090 37.825700 "elementType2" 1 [255, 0, 0]

geometryTable:

Notice that there was a name collision. Here I suggest that the subclass' property shadows the parent. See notes about name collisions below.

instanceId estimatedHeight longitude latitude elementType elementId color
0 180.0 38.624548 -90.184862 "elementType1" 0 "GREY"
1 50.0 38.624548 -90.184862 "elementType2" 1 "GREY"

Resolving Name Collisions:

  • Option 1.1: Subclass properties shadow their parents. This is like the example above, where buildingPart.color: ARRAY[UINT8, 4] is shadowed by geometry.color: STRING. This could be problematic though in cases where geometry needs to be used as a buildingPart
  • Option 1.2: like 1.1, but require that the types match (e.g. the above example would be invalid)
  • Option 1.3: Disallow name conflicts between parent and child.

Option 2: Include a parentIds column

In this option, each child table includes an extra property array called parentId which stores instanceIds that index into the parent table.

{
  "classes": {
    "building": {
      "properties": {
        "estimatedHeight": {
          "type": "STRING"
        },
        "longitude": {
          "type": "FLOAT64"
        },
        "latitude": {
          "type": "FLOAT64"
        }
      }
    },
    "buildingPart": {
      // point to parent class
      "parent": "building",
      "properties": {
        "elementType": {
          "type": "STRING"
        },
        "elementID": {
          "type": "FLOAT64"
        },
        // Let's make a name collision with
        // geometry.color with different types. Unlike the 
        // "inherit columns" approach, we can store values at both
        // the parent and child scope. However, then it becomes a question
        // of how to access each version of color from the API.
        // we could always make it explicit: buildingPart.color vs geometry.color.
        // Or we could disallow name conflicts.
        "color": {
          "type": "ARRAY",
          "componentType": "UINT8",
          "componentCount": 4
        }
      }
    },
    "geometry": {
      // point to parent class
      "parent": "buildingPart",
      "properties": {
        "color": {
          "type": "STRING"
        } 
      }
    }
  },
  "instanceTables": {
    "buildingTable": {
      "class": "building",
      "properties": {
        // These bufferViews must store values for both the parents
        // and _all children classes_ that inherit from building
        // so we have instances for building, buildingPart, and geometry
        "estimatedHeight": {
          "bufferView": 1,
          "offsetBufferViews": [2]
        },
        "longitude": {"bufferView": 3},
        "latitude": {"bufferView": 4}
      }
    },
    "buildingPartTable": {
      "class": "buildingPart",
      // parentIds is like a UINT (what size?) property
      // that is an instance ID in the _parent_ table.
      // in this case, it indexes into buildings
      "parentIds": {
        "bufferView": 4
      },
      "properties": {
        // In this case, we have instances from buildingPart and geometry
        "elementID": {
          "bufferView": 5,
          "offsetBufferViews": [6]
        },
        // name collision example. This property array can be defined
        // separately from geometryTable.properties.color. See note
        // in the class definition.
        "color": {"bufferView": 7}
      }
    },
    "geometryTable": {
      "class": "geometry",
      "parentIds": {
        "bufferView": 8
      },
      "properties": {
        "color": {
          "bufferView": 7,
          "offsetBufferViews": 8
        }
      }
    }
  }
}

Pros:

  • Data for each table is stored by the relevant class
  • Fewer bufferView declarations
  • Naming collisions allow parent and class version of variable, even with different types)

Cons:

  • Need additional parentId property arrays
  • since parentId is essentially a pointer, need to be careful about dangling references

Example

buildingTable:

Notice that the parent table now has many more entries for all the subclass instances

instanceId estimatedHeight longitude latitude
(building instances)
0 100.0 -75.164031 39.953264
1 400.0 -74.013334 40.712270
(buildingPart instances)
2 200.0 -122.477886 37.813934
3 200.0 -122.479090 37.825700
(geometry instances)
4 180.0 38.624548 -90.184862
5 50.0 38.624548 -90.184862

buildingPartTable:

Notice the new column parentId, and again, we have additional rows from the geometry subclass. Furthermore, due to the name collision, there are two different "color" properties, one in the parent table and one in the child table. No collisions in the data, but you'd need to allow a way to distinguish them, perhaps with a fully qualified name (e.g. geometry.color)

instanceId elementType elementId color parentId
(buildingPart instances)
0 "elementType1" 0 [255, 0, 0] 2
1 "elementType2" 1 [255, 0, 0] 3
(geometry instances)
2 "elementType1" 0 [128, 128, 128] 4
3 "elementType2" 1 [128, 128, 128] 5

geometryTable:

instanceId color parentId
0 "GREY" 2
1 "GREY" 3

Resolving Name Collisions

Option 2.1: Consider each version of the name as a separate variable, identified by a fully-qualified name like buildingPart.color vs. geometry.color to resolve conflicts.
Option 2.2: Disallow naming conflicts between parent and child class.

Option 3: Virtual parentId

Option 2 can be tweaked to remove the overhead of storing pointers to parents. However, there's a caveat about generalizing to trees of classes.

If we enforced a rule that the parent table is listed in order from base class down the inheritance chain, then we can (at runtime) compute the offsets:

classOffsets = {A: 0, B: hA, C: hA + hB]

Then parentInstanceId = classOffsets[parentClass] + child.instanceId.
Thus no pointers need to be stored!

EDIT: One catch: supporting inheritance trees may be slightly trickier. See the section on "Inheritance Trees" below.

{
  "classes": {
    "building": {
      "properties": {
        "estimatedHeight": {
          "type": "STRING"
        },
        "longitude": {
          "type": "FLOAT64"
        },
        "latitude": {
          "type": "FLOAT64"
        }
      }
    },
    "buildingPart": {
      // point to parent class
      "parent": "building",
      "properties": {
        "elementType": {
          "type": "STRING"
        },
        "elementID": {
          "type": "FLOAT64"
        },
        // Let's make a name collision with
        // geometry.color with different types. Unlike the 
        // "inherit columns" approach, we can store values at both
        // the parent and child scope. However, then it becomes a question
        // of how to access each version of color from the API.
        // we could always make it explicit: buildingPart.color vs geometry.color.
        // Or we could disallow name conflicts.
        "color": {
          "type": "ARRAY",
          "componentType": "UINT8",
          "componentCount": 4
        }
      }
    },
    "geometry": {
      // point to parent class
      "parent": "buildingPart",
      "properties": {
        "color": {
          "type": "STRING"
        } 
      }
    }
  },
  "instanceTables": {
    "buildingTable": {
      "class": "building",
      "properties": {
        // These bufferViews must store values for both the parents
        // and _all children classes_ that inherit from building
        // so we have instances for building, buildingPart, and geometry
        "estimatedHeight": {
          "bufferView": 1,
          "offsetBufferViews": [2]
        },
        "longitude": {"bufferView": 3},
        "latitude": {"bufferView": 4}
      }
    },
    "buildingPartTable": {
      "class": "buildingPart",
      // No parentIds needed! 
      "properties": {
        "elementID": {
          "bufferView": 5,
          "offsetBufferViews": [6]
        },
        "color": {"bufferView": 7}
      }
    },
    "geometryTable": {
      "class": "geometry",
      // No parentIds needed!
      "properties": {
        "color": {
          "bufferView": 7,
          "offsetBufferViews": 8
        }
      }
    }
  }
}

Pros:

  • No need to store pointers in the file
  • strict storage layout could be a good thing, only one correct way to store data.
  • Since the pointers are implicit, validation is easier (just check that the length of the parent array is correct). No worries about dangling pointers!

Cons:

  • Small amount of runtime overhead to compute offsets at each access
  • This method could be more tricky to explain clearly.

Inheritance Trees

This method has one caveat: It was derived using an example with a single inheritance chain. What changes when the user specifies an inheritance tree?

  1. How do we define the ordering of classes for storage in the parent class table? A preorder of the tree would be fine, except siblings are unordered... we could require an explicit listing of subclasses of the direct children. Here's one idea, though it feels a bit weird:
"classes": {
  "parent": {
     // ordering the parents' direct children
     childrenOrder: ["childA", "childB"]
  },
  "childA": {
    "parent": "parent"
  },
  "childB": {
    "parent": "parent"
  }
}
  1. how does calculating the classOffsets change? It would be similar to before; a cumulative sum of instance counts, but you'd need to do this on a preordered list of classes.

Comparison with Batch Table Hierarchy

The 3D Tiles Batch Table Hierarchy spec works a little bit differently in the following way:

Batch Table Hierarchy:

  • All instances of all classes are assumed to be listed in one big array, so each has a unique batchId.
  • Each class has a unique id, numbered in order that they appear in the instance table
  • up to 3 auxillary arrays of length numberOfInstances are needed to mention parents:
    1. classIds: each instance explicitly labels a class
    2. parentCounts: how many levels up the inheritance tree are valid for this instance. This enables partial inheritance chains (why??)
    3. parentIds: the batch Id of the parent instance. Since this indexes into the global array of indices, this allows some corner cases like an instance with a parent in the same class (behavior of this explicitly undefined in the spec)

3D Tiles Next:

  • Each class has a different set of instanceIds
  • Classes are stored as a dictionary, not an array, so they don't have a defined order
  • If we make the following assumptions, you can get rid of the classIds and parentCounts buffer, and make other simplifications:
    • inheritance is only defined between classes, not between arbitrary instances
    • A subclass must inherit from all its ancestors, no partial inheritance chains
  • if columns are inherited, then no parentIds buffer is needed.

Space Complexity Analysis

Let's consider a case like above where we have class C extends B extends A. I want to compute the number of cells in all three tables and any auxillary buffers needed.

Note: In the analysis below, I assume that there are no name collisions for simplicity.

To do this, let's define some helper variables:

Variable Description
wA "width" of instance table A, i.e. the number of properties (columns) unique to class A
hA "height" of instance table A, i.e. the number of instances (rows) unique to class A
wB "width" of instance table B
hB "height" of instance table B
wC "width" of instance table C
hC "height" of instance table C

Option 1: Inherit columns

IMG_20210112_101347081

Overhead needed: 0 (we only store the values we need, no pointers)
bufferViews needed:

  • class A: wA
  • class B: wA + wB (need a separate bufferView for each parent column. This is not shown in the diagram)
  • class C: wA + wB + wC
  • total: 3wA + 2wB + wC

Option 2: parentId

IMG_20210112_101357319

overhead: hB + 2hC (parentId pointers)
the generalized pattern is 1hB + 2hC + 3hD + ... (n-1)hX where n is the length of the inheritance chain. this grows quadratically with n

Option 3: Virtual parentId

IMG_20210112_101409869

Overhead: 0
bufferVIews needed:

  • class A: wA
  • class B: wB (parent columns are stored in the table for classA, no bufferViews needed)
  • class C: wC

Batch Table Hierarchy for Comparison:

IMG_20210112_101405223

Overhead: 3hA + 2 * 3hB + 3 * 3hC
General pattern: 1 * (3hA) + 2 * (3hB) + 3 * (3hC) + ... + n * (3hX). Again, n stands for the length of the inheritance chain. Again, this is quadratic in n, but has strictly greater storage requirements than Option 2 because even the base class stores parent pointers to itself, plus there are 3 arrays rather than just a single pointer.

@lilleyse
Copy link
Contributor

lilleyse commented Mar 1, 2022

I'm reading this again after having opened #641. It seems like the most important decision is this one.

inheritance is only defined between classes, not between arbitrary instances

I'm not sure if class inheritance alone provides enough composability, especially for city and BIM/CAD use cases where you might have a "door" entity that could belong to many types of parent entities.

One concept I like in option 2 is the separate parentIds column. Originally I thought PARENT_IDS could be a semantic but it doesn't really make sense to include a property like that in the class definition. A separate property next to the property table makes more sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants