
Chunk Observations

Sam Crawford edited this page Jan 17, 2023 · 8 revisions

Migrated from #3196.


What is a Chunk?

There are many types of 'fundamental knowledge' in Drasil, including (but not limited to):

  • name
  • abbreviation
  • domains
  • term
  • symbol
  • space / type
  • constraints
  • reasonable value
  • unit
  • uncertainty
  • definition
  • notes
  • defining expression
  • ...

(Are these really "fundamental"? We don't have a good answer to that, but it has been sufficient so far.)

If these fundamental pieces of knowledge are atoms, then chunks are molecules (i.e., collections of atoms). As with molecules, some combinations can arise and some cannot, which means there is an "order" to how things assemble. The molecules that interest us are the ones that end up getting defined. So far this process has been quite ad hoc: whenever the same bundle of facts about something we cared about kept recurring in practice, we gave that bundle a name.

The classes that arise from this let you see particular atoms and sub-molecules that make sense on their own. The underlying theory we should be using is Formal Concept Analysis (FCA), which is a "way of deriving a concept hierarchy or formal ontology from a collection of objects and their properties." (Other analytical techniques might also make sense.) Here, the properties would be of the form "has information X in it", with X drawn from the list above. Our chunks are then the nodes of the resulting concept lattice that occur in practice, with classes to help us navigate that lattice.
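To make the FCA idea a little more concrete, here is a minimal Haskell sketch. It assumes a toy "knowledge database" in which each chunk is recorded only by which attributes it has; the attribute constructors mirror a few entries from the list above, and the chunk names (`lengthQ`, `softEng`, ...) are hypothetical. It relies on the standard FCA fact that the concept intents are exactly the intersections of object intents (plus the full attribute set), and pairs each intent with its extent.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Set (Set)

-- A few of the 'atoms' listed above, used as FCA attributes (illustrative only).
data Attr = Name | Abbreviation | Symbol | Unit | Definition
  deriving (Eq, Ord, Show, Enum, Bounded)

-- Hypothetical object names; in Drasil these would presumably be chunk UIDs.
type Obj = String

-- A formal context: which attributes each object has.
type Context = Map.Map Obj (Set Attr)

-- A formal concept: (extent, intent).
type Concept = (Set Obj, Set Attr)

allAttrs :: Set Attr
allAttrs = Set.fromList [minBound .. maxBound]

-- Extent of an attribute set: the objects that have every attribute in it.
extent :: Context -> Set Attr -> Set Obj
extent ctx attrs = Map.keysSet (Map.filter (attrs `Set.isSubsetOf`) ctx)

-- Concept intents are closed under intersection: start from the object
-- intents plus the full attribute set, and intersect until nothing new appears.
intents :: Context -> Set (Set Attr)
intents ctx = go (Set.insert allAttrs (Set.fromList (Map.elems ctx)))
  where
    go s =
      let s' = Set.union s (Set.fromList [ Set.intersection a b
                                         | a <- Set.toList s, b <- Set.toList s ])
      in if s' == s then s else go s'

-- Every concept of the context, as (extent, intent) pairs.
concepts :: Context -> [Concept]
concepts ctx = [ (extent ctx i, i) | i <- Set.toList (intents ctx) ]

-- Hypothetical sample data.
sample :: Context
sample = Map.fromList
  [ ("lengthQ", Set.fromList [Name, Symbol, Unit])
  , ("speedQ",  Set.fromList [Name, Symbol, Unit, Definition])
  , ("softEng", Set.fromList [Name, Abbreviation])
  , ("program", Set.fromList [Name])
  ]
```

On this sample the lattice has five nodes; for instance, `{Name, Symbol, Unit}` has extent `{lengthQ, speedQ}`, which is the kind of recurring node we would then name. The repeated pairwise intersection is only for illustration; dedicated FCA algorithms (e.g. Ganter's NextClosure) scale far better.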

An understanding of FCA also makes it clear that using Maybe is a hack: a proper concept should have an exact list of attributes that it embodies.

The "Proper" Process for Forming Chunks

  1. Settle on an analysis technique for concepts
  2. List all the attributes we have
  3. Derive the concepts we need (by using co-occurrence in our actual knowledge database)
  4. Give names to these extracted concepts
  5. Create data structures for these concepts
  6. Create accessors for all information
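Step 3 could start from something much cruder than full FCA: counting how often each exact combination of attributes occurs. The sketch below does that. It is a simpler frequency heuristic swapped in for illustration, not FCA itself; the `Attr` constructors, `sampleDb`, and the `minCount` threshold are all hypothetical.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.List (sortOn)
import Data.Ord (Down (..))

-- Hypothetical attribute names, standing in for the atoms listed earlier.
data Attr = Name | Abbreviation | Symbol | Unit | Definition
  deriving (Eq, Ord, Show)

-- A knowledge item, described only by the attributes it carries.
type Signature = [Attr]

-- Count how often each exact attribute signature occurs
-- (signatures are sorted and de-duplicated first, so order does not matter).
signatureCounts :: [Signature] -> Map.Map Signature Int
signatureCounts = Map.fromListWith (+) . map (\s -> (normalise s, 1))
  where
    normalise = Set.toList . Set.fromList

-- Signatures seen at least minCount times, most frequent first:
-- candidates to be given names (step 4).
candidateConcepts :: Int -> [Signature] -> [(Signature, Int)]
candidateConcepts minCount =
  sortOn (Down . snd) . filter ((>= minCount) . snd) . Map.toList . signatureCounts

-- Hypothetical sample "database".
sampleDb :: [Signature]
sampleDb =
  [ [Name, Symbol, Unit]
  , [Unit, Name, Symbol]
  , [Name, Symbol, Unit, Definition]
  , [Name, Abbreviation]
  , [Name, Abbreviation]
  , [Name]
  ]
```

Here `candidateConcepts 2 sampleDb` keeps `[Name, Abbreviation]` and `[Name, Symbol, Unit]`, each seen twice. Unlike the concept lattice, this only finds exact signatures, not the shared sub-molecules between them.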

This should really be done from scratch, but Dr. Carette is "quite confident that a lot of what we currently have will stay as is, or with minor modifications."

Some Potential Chunks to be Formed

(Note that this is just a brainstorming list; this knowledge could end up inside a chunk or be tracked in a different way.)

  • The local symbol used to represent a quantity. I think we currently "bake" the symbol into the quantity, making it difficult to change, but symbols aren't universal; they can change. In some cases, symbols are changed to avoid clashes between conventions when different domains are mixed (for instance, sigma is used for standard deviation, stress, and the Stefan-Boltzmann constant). In other cases, symbols are changed because of author or community preferences.
  • The unit system. We implicitly (I believe) assume SI for everything, but we will also want to be able to use imperial units.
  • Rationale information. For example, we may want to include rationale for constraints (see #3197). Our "detailed derivations" currently provide rationale information for how we combine theories and assumptions to come up with a new theory.
  • Refinement traceability information. Many theories will depend on other theories for their justification (rationale).
  • Theory pre-conditions. Conditions that need to be true to invoke a theory; that is, you can only use a theory if you can satisfy its pre-conditions. The pre-conditions will be assumptions.
  • Theory post-conditions. The conditions that have to be true once a theory has been invoked.
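One way to stop baking the symbol into a quantity, as the first bullet suggests, is to give each quantity only a default symbol and let each document supply overrides. The sketch below assumes that design; `Quantity`, `SymbolContext`, `symbolIn`, and `clashes` are hypothetical names, not existing Drasil definitions.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (group, sort)

type UID = String

-- A quantity records its identity and only a *default* symbol.
data Quantity = Quantity
  { qUid           :: UID
  , qName          :: String
  , qDefaultSymbol :: String
  } deriving (Show)

-- Per-document (or per-domain) overrides: UID -> symbol to use locally.
type SymbolContext = Map.Map UID String

-- The symbol to display in a given context, falling back to the default.
symbolIn :: SymbolContext -> Quantity -> String
symbolIn ctx q = Map.findWithDefault (qDefaultSymbol q) (qUid q) ctx

-- Symbols used by more than one quantity in the given context.
clashes :: SymbolContext -> [Quantity] -> [String]
clashes ctx qs = [ s | (s:_:_) <- group (sort (map (symbolIn ctx) qs)) ]

stressQ, stefanBoltzmannQ :: Quantity
stressQ          = Quantity "stress" "stress" "σ"
stefanBoltzmannQ = Quantity "stefanBoltzmann" "Stefan-Boltzmann constant" "σ"

-- A document mixing mechanics and radiation renames one sigma locally.
thermoMechCtx :: SymbolContext
thermoMechCtx = Map.fromList [("stefanBoltzmann", "σ_SB")]
```

With an empty context both quantities render as `σ` and `clashes` reports the collision; with `thermoMechCtx` the clash disappears without touching either quantity's definition.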

To bring it full circle: there is all sorts of well-defined knowledge that doesn't possess an abbreviation. So we can't make abbreviations mandatory, as that would undermine our whole system. But when abbreviations exist, they should be used. From a pure programming point of view, that screams for Maybe, doesn't it? We've learned that this is not a good solution. A better solution seems to lie at the "knowledge retrieval" stage, where we can have functions that retrieve abbreviations if they exist, and our code deals with the fact that abbreviations are not always present.

There are two places where Maybe can be used:

  1. in the data representation,
  2. in what the data accessors return.

So we'd have both HasX and MayHaveX classy lenses. We could have instances of MayHaveX for all sorts of things where we already know there is no X, but where asking the question isn't silly. We do need to be careful not to implement MayHaveX where the question should not be asked.
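The HasX/MayHaveX split can be sketched with plain getter classes (the page proposes classy lenses; plain getters are used here only to keep the example short). `PlainTerm`, `AbbrevTerm`, and the class names are hypothetical simplifications, not Drasil's real chunk types.

```haskell
-- Hypothetical simplified chunk types (not Drasil's actual definitions).
data PlainTerm  = PlainTerm  { ptTerm :: String }
data AbbrevTerm = AbbrevTerm { atTerm :: String, atAbbrev :: String }

class HasTerm c where
  term :: c -> String

-- HasX: the abbreviation is guaranteed to be present.
class HasAbbrev c where
  abbrev :: c -> String

-- MayHaveX: asking is sensible, but the answer may be "no".
class MayHaveAbbrev c where
  maybeAbbrev :: c -> Maybe String

instance HasTerm PlainTerm  where term = ptTerm
instance HasTerm AbbrevTerm where term = atTerm

instance HasAbbrev AbbrevTerm where
  abbrev = atAbbrev

instance MayHaveAbbrev AbbrevTerm where
  maybeAbbrev = Just . abbrev

-- We already know a PlainTerm has no abbreviation, but the question isn't silly.
instance MayHaveAbbrev PlainTerm where
  maybeAbbrev _ = Nothing

-- Retrieval-time code deals with absence: "term (ABBR)" or just "term".
introduce :: (HasTerm c, MayHaveAbbrev c) => c -> String
introduce c = maybe (term c) (\a -> term c ++ " (" ++ a ++ ")") (maybeAbbrev c)
```

The Maybe appears only in what `maybeAbbrev` returns, never in the stored data. A type for which the question should not be asked simply gets no `MayHaveAbbrev` instance, so passing it to `introduce` is a compile-time error.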

From the point of view of our usage, lenses are just polymorphic getters. We want to be able to "get X" from some representation without caring how X is embedded in the data we've been handed, as long as we're promised that X is in there somewhere.
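To illustrate "lenses as polymorphic getters", here is a self-contained sketch using the van Laarhoven lens representation (the one the `lens` library uses), defined locally so no external package is needed. The `HasUnit` class and the `Measured`/`Constant` types are hypothetical; the point is that `unitLabel` reads the unit without knowing where it is stored.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Const (Const (..))

-- van Laarhoven lens: a getter/setter pair encoded as one polymorphic function.
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

-- Read through a lens by choosing f = Const.
view :: Lens' s a -> s -> a
view l s = getConst (l Const s)

-- A "classy lens": any chunk promising a unit provides a lens to it.
class HasUnit c where
  unit :: Lens' c String

-- Hypothetical chunks storing the unit at different depths.
data Measured = Measured { mName :: String, mUnit :: String }
data Constant = Constant { cName :: String, cValue :: Double, cQty :: Measured }

instance HasUnit Measured where
  unit f m = fmap (\u -> m { mUnit = u }) (f (mUnit m))

-- The unit is nested one level down; this instance reuses Measured's lens.
instance HasUnit Constant where
  unit f c = fmap (\q -> c { cQty = q }) (unit f (cQty c))

-- Works for any chunk that promises a unit, however it is embedded.
unitLabel :: HasUnit c => c -> String
unitLabel c = "[" ++ view unit c ++ "]"
```

In practice the `lens` library's `makeClassy` Template Haskell helper can generate classes like `HasUnit` automatically; the hand-written version just shows what they amount to.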

Reconstructing our thinking from ~6-7 (!!!) years ago: we noticed that many important 'concepts' (where I use the term informally) had a tell-tale sign that they were more important than others: they came with an abbreviation. This was, of course, purely an observation on the sample we had, though it still seems to hold. Where we seem to have made an error was in enshrining this in our data representation.

Taking a step back, it does seem odd to enforce the existence of an abbreviation. An abbreviation really is something that may exist. Therefore, the distinction between NamedChunks and IdeaDicts might not be useful.

We really do need to go back to the blackboard (perhaps even literally!) and revisit all our chunks (their contents, their name, their intent, their constructors). An in-person design meeting is likely needed.

Beyond this, if we decide that attaching a domain at the Idea level is a code smell, then I propose we merge the two chunks. Otherwise, I think that keeping them separate makes sense: both would have a Maybe String for an abbreviation, and CI would also contain a list of domains (we could even make this more explicit by having a CI be an IdeaDict and a list of domains).
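The "keep them separate" option can be sketched as follows: both carry an optional abbreviation, and a CI is literally an IdeaDict plus a list of domains. The type names come from the paragraph above, but the field names and helper functions are hypothetical and this is not Drasil's current definition.

```haskell
import Data.Maybe (fromMaybe)

type UID = String

-- An idea with an optional abbreviation (field names are illustrative).
data IdeaDict = IdeaDict
  { idUid    :: UID
  , idTerm   :: String
  , idAbbrev :: Maybe String
  } deriving (Eq, Show)

-- The proposal made explicit: a CI is an IdeaDict plus a list of domains.
data CI = CI
  { ciIdea    :: IdeaDict
  , ciDomains :: [UID]
  } deriving (Eq, Show)

-- Any CI can be used wherever an IdeaDict is expected.
toIdea :: CI -> IdeaDict
toIdea = ciIdea

-- Prefer the abbreviation when it exists, otherwise fall back to the term.
shortName :: IdeaDict -> String
shortName i = fromMaybe (idTerm i) (idAbbrev i)

inDomain :: UID -> CI -> Bool
inDomain d = elem d . ciDomains

-- Hypothetical sample chunk.
softEngCI :: CI
softEngCI = CI (IdeaDict "softEng" "software engineering" (Just "SE")) ["compSci"]
```

If attaching domains at the Idea level turns out to be a code smell, the merged alternative would collapse these into a single type (or drop `ciDomains` altogether), and `toIdea` would disappear.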
