drasil-bot edited this page Jun 13, 2024 · 22 revisions

Warning: Wiki should not be edited directly. Edit the files in the ./wiki/ folder instead and make a PR.

Chunks are the basis for all information encoding in Drasil, and they are integral to using and maintaining its current database of knowledge. At its core, a chunk is a data type specialized to hold a specific kind of information for a specific purpose. For example, NamedChunks are often used for objects that have a unique identifier and an associated term. ConceptChunks mirror real-world concepts by including the idea, definition, and domain for a particular concept. Something like a QuantityDict can have an idea, the space in which it exists, units, and a symbol. Many other chunks exist within Drasil; together they hold the required information and its meaning so that knowledge may be used in generated models, definitions, and theories.

Structure

Chunks are usually made up of lower-level types with different purposes. A chunk whose purpose is to hold all the information needed for a mathematical variable would need a symbol, description/definition, and units (as shown below). This particular example gives a name to the concept which is built from a quantity and its units. The structure of a chunk can be thought of as a wrapper of sorts. It encases only the necessary information to perform its job, but its contents may be unwrapped and used one at a time. The wrapper itself may be wrapped again with more things added to it (like an abbreviation or a domain). This is primarily how one idea can be built upon in Drasil.

[Figure: ChunkDiagram]

Alternatively, here is a diagram of the 'wrapping' analogy. We first start with an identifier, then build up to an idea with a name, then a concept, so on and so forth:

[Figure: diagram of the 'wrapping' analogy]

Implementation

So, how do we represent this in code? Conveniently, we can use Haskell's record syntax along with lenses to define, set, and get the information we need from within the chunk wrapper. This way, we can wrap wrappers without worrying about the "level" of wrapping around one particular identifier. As a result, one UID can be represented in a hierarchy of chunks, with no information lost when upgrading to a larger chunk. A straightforward example of this is the progression from a low-level NamedChunk to something much larger like a TheoryModel. One of the smallest chunks (NamedChunk) is defined as follows:

data NamedChunk = NC {_uu :: UID, _np :: NP}

It contains a unique identifier (UID) and a term that can be used in creating sentences (as a noun phrase, NP). As of now, we don't know what this NamedChunk is or what it can do, but we do know that it exists and we can use it in a sentence with proper pluralization and capitalization. Most likely, these chunks will be common nouns that are significant enough to have a name. Two NamedChunks may also be combined to produce a new NamedChunk that carries both of their terms. We can start to define single words and simple ideas like table_ and symbol and then combine those to make a tableOfSymbol NamedChunk idea, which is more complex. Using the wrapper analogy, we unwrap the term from table_ and symbol, then rewrap them after placing an "of" between them to get a tableOfSymbol chunk.
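The unwrap-and-rewrap step described above can be sketched in a few lines. This is a minimal illustration, not Drasil's actual combinator: UID and NP are reduced to plain strings, and `ofChunk` and `capitalise` are hypothetical helpers introduced only for this example.

```haskell
import Data.Char (toUpper)

-- Simplified stand-ins (assumption): Drasil's UID and NP are richer types.
type UID = String
type NP  = String

data NamedChunk = NC { _uu :: UID, _np :: NP }
  deriving (Show, Eq)

-- Upper-case the first character; used when joining two UIDs.
capitalise :: String -> String
capitalise []       = []
capitalise (c : cs) = toUpper c : cs

-- Hypothetical helper: unwrap both terms, place "of" between them, and
-- rewrap the result under a new UID.
ofChunk :: NamedChunk -> NamedChunk -> NamedChunk
ofChunk a b = NC
  { _uu = _uu a ++ "Of" ++ capitalise (_uu b)
  , _np = _np a ++ " of " ++ _np b
  }

table_, symbol, tableOfSymbol :: NamedChunk
table_        = NC "table" "table"
symbol        = NC "symbol" "symbol"
tableOfSymbol = table_ `ofChunk` symbol
```

Loading this in GHCi and evaluating `tableOfSymbol` shows a chunk that carries both original terms under one new identifier.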

A NamedChunk can either be used as a way of getting a defined term or be built upon. The "next step" up from a NamedChunk is an IdeaDict, which contains a NamedChunk and maybe an abbreviation. We can see the direct progression in its type definition:

data IdeaDict = IdeaDict { _nc' :: NamedChunk, mabbr :: Maybe String }

As we learn more about what exactly we want this chunk to represent, we can wrap the idea in a richer type that carries the extra detail. From this point, there are many options available for adding information. If the idea should be made into a concept, we can use a ConceptChunk to wrap the idea along with a definition and its domain:

data ConceptChunk = ConDict { _idea :: IdeaDict -- ^ Contains the idea of the concept.
                            , _defn' :: Sentence -- ^ The definition of the concept.
                            , cdom' :: [UID] -- ^ UID of the domain of the concept.
                            }
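To illustrate how lenses let us ignore the "level" of wrapping, here is a minimal sketch that hand-rolls a small lens type using only the base library. It is an assumption-laden simplification: UID, NP, and Sentence are plain strings, and the `HasTerm` class, `view`, and `set` are defined locally for this example rather than taken from Drasil or a lens package.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- A van Laarhoven lens, written out by hand for this sketch.
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

-- Read the focused field.
view :: ((a -> Const a a) -> s -> Const a s) -> s -> a
view l = getConst . l Const

-- Replace the focused field, leaving everything else untouched.
set :: ((a -> Identity a) -> s -> Identity s) -> a -> s -> s
set l x = runIdentity . l (const (Identity x))

-- Simplified stand-ins (assumption) for Drasil's richer types.
type UID      = String
type NP       = String
type Sentence = String

data NamedChunk   = NC { _uu :: UID, _np :: NP }
data IdeaDict     = IdeaDict { _nc' :: NamedChunk, mabbr :: Maybe String }
data ConceptChunk = ConDict { _idea :: IdeaDict, _defn' :: Sentence, cdom' :: [UID] }

-- Hypothetical class: anything that wraps a term exposes the same lens.
class HasTerm c where
  term :: Lens' c NP

instance HasTerm NamedChunk where
  term f (NC u n) = NC u <$> f n

-- Each wrapper delegates to the chunk it wraps.
instance HasTerm IdeaDict where
  term f i = (\n -> i { _nc' = n }) <$> term f (_nc' i)

instance HasTerm ConceptChunk where
  term f c = (\i -> c { _idea = i }) <$> term f (_idea c)

accuracy :: ConceptChunk
accuracy = ConDict
  (IdeaDict (NC "accuracy" "accuracy") Nothing)
  "the quality or state of being correct or precise"
  []
```

With these definitions, `view term` works identically on a NamedChunk, an IdeaDict, or a ConceptChunk, and `set term "precision" accuracy` changes only the term while keeping the definition and domain intact.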

If we know the concept is a quantity or can be treated as one, it may become a QuantityDict or DefinedQuantityDict:

data DefinedQuantityDict = DQD { _con :: ConceptChunk
                               , _symb :: Stage -> Symbol
                               , _spa :: Space
                               , _unit' :: Maybe UnitDefn
                               }

By continuously wrapping the information needed, we can successfully encode relevant knowledge in a useful and practical manner.
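As a rough end-to-end sketch of this wrapping, the following builds a quantity for a pendulum arm and then unwraps it again. The record shapes follow the definitions above, but Symbol, Space, Stage, UnitDefn, and Sentence are simplified stand-ins, and `rodSymbol`, `uidOf`, `termOf`, and `describe` are hypothetical helpers that exist only in this example.

```haskell
-- Simplified stand-ins (assumption) for Drasil's richer types.
type UID      = String
type NP       = String
type Sentence = String
type Symbol   = String
type UnitDefn = String

data Stage = Equational | Implementation deriving (Show, Eq)
data Space = Real | Boolean deriving (Show, Eq)

data NamedChunk   = NC { _uu :: UID, _np :: NP }
data IdeaDict     = IdeaDict { _nc' :: NamedChunk, mabbr :: Maybe String }
data ConceptChunk = ConDict { _idea :: IdeaDict, _defn' :: Sentence, cdom' :: [UID] }
data DefinedQuantityDict = DQD
  { _con   :: ConceptChunk
  , _symb  :: Stage -> Symbol
  , _spa   :: Space
  , _unit' :: Maybe UnitDefn
  }

-- Hypothetical symbol choice: a short symbol for equations,
-- a longer name for generated code.
rodSymbol :: Stage -> Symbol
rodSymbol Equational     = "l"
rodSymbol Implementation = "rodLength"

lenRod :: DefinedQuantityDict
lenRod = DQD
  { _con   = ConDict (IdeaDict (NC "lenRod" "length of the rod") Nothing)
                     "the distance from the pivot to the end of the pendulum arm"
                     ["physics"]
  , _symb  = rodSymbol
  , _spa   = Real
  , _unit' = Just "m"
  }

-- Unwrap every layer to reach the information stored at the core.
uidOf :: DefinedQuantityDict -> UID
uidOf = _uu . _nc' . _idea . _con

termOf :: DefinedQuantityDict -> NP
termOf = _np . _nc' . _idea . _con

-- Combine information from several layers into one description.
describe :: Stage -> DefinedQuantityDict -> String
describe st q =
  termOf q ++ " (" ++ _symb q st ++ ")" ++ maybe "" (" in " ++) (_unit' q)
```

Here `describe Equational lenRod` pulls the term from the innermost NamedChunk and the symbol and unit from the outermost layer, which is the kind of layered access the wrapper analogy describes.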

Over time, we build up relevant chunks by spotting common patterns in examples and actual documentation. We have various high-level chunks dedicated to units (UnitDefn, UnitaryConceptDict, UnitaryChunk, UnitalChunk), relations (RelationConcept), quantities (QuantityDict, DefinedQuantityDict), uncertainties (UncertainChunk, UncertQ), and much more. Our foundation of knowledge is built upon these chunks, and Haskell's strong typing emphasizes the semantic meaning that should be associated with each type. As Drasil grows, more chunk types will be added, allowing our database of knowledge to grow alongside it. For more information on the chunks currently available in Drasil, please see the Haddock documentation.

Documentation of Chunks

This section contains a list of the chunks currently defined in drasil-lang (as of August 3, 2021), along with a short description for each of them.

| Chunk Name | Description | Example |
|---|---|---|
| NamedChunk | One of the lowest-level chunks. Used for anything worth naming; only contains a term and its UID. | A pendulum arm will start out by being named as such, before we can add any values or equations to it. |
| IdeaDict | It is simply a NamedChunk that could have an abbreviation (similar to CI, but it does not require an abbreviation and does not have a domain). | The project name "Double Pendulum" may have the abbreviation "DblPendulum". |
| CI | A common idea is something that is worth naming, similar to a NamedChunk. However, it also includes an abbreviation and the domains of knowledge in which it appears. | The term "Operating System" has the abbreviation "OS" and comes from the domain of computer science. |
| ConceptChunk | Used to make a concept that has a term and definition. It may also be tagged with some domain of knowledge. | The concept of "Accuracy" may be defined as the quality or state of being correct or precise. |
| CommonConcept | Similar to a ConceptChunk, but it must have an abbreviation. Not used widely across Drasil. | "HGHC" is defined as dcc' "hghc" (cn "HGHC") "HGHC program" "HGHC". |
| ConceptInstance | Used for a concept that can be referred to. Often used in Goal Statements, Assumptions, Requirements, etc. | A concept that we would want to reference back to. Something like the assumption that gravitational acceleration is 9.81 m/s^2. When we write our equations, we can then link this assumption so that we do not have to derive that assumption to verify our work. |
| QuantityDict | In a similar way to DefinedQuantityDict, this chunk adds a space, symbol, and units. However, the information may not necessarily be a concept, but rather anything that is named through an IdeaDict. | A pendulum arm does not necessarily have to be defined as a concept before we assign a space (Real numbers), a symbol (l), or units (cm, m, etc.). |
| NamedArgument | This chunk type is a wrapper for a QuantityDict, but used more for generating code and ODEs. | Can be used to define inline arguments in generated code. |
| DefinedQuantityDict | For when we want to assign a quantity to a concept. Includes the space, symbol, and units for that quantity. | A pendulum arm can be defined as a concept with a symbol (l), space (Real numbers), and units (cm, m, etc.). |
| ConstrainedChunk | These are symbolic quantities with some constraints and maybe a reasonable value. | Measuring the length of a pendulum would have some reasonable value (between 1 cm and 2 m) and the constraint that the length cannot be a negative value. |
| ConstrConcept | Similar to a ConstrainedChunk, but is instead built off of a DefinedQuantityDict. This means that the value also has a definition and an associated domain of knowledge. | We could use a similar example to the one for ConstrainedChunk, except we would know the definition of a pendulum arm and its domain (physics). |
| UnitalChunk | Similar to a DefinedQuantityDict, these are for concepts with quantities that must have a unit definition. | A pendulum arm is a tangible object with a symbol (l) and units (cm, m, etc.). |
| UnitaryChunk | Similar to a QuantityDict, these are for ideas with quantities that must have a unit definition. | A pendulum arm is an idea associated with a symbol (l) and units (cm, m, etc.). |
| UnitaryConceptDict | Same as UnitalChunk in terms of record fields, just with a different wrapping. | A pendulum arm is an idea associated with a symbol (l) and units (cm, m, etc.). |
| UnitDefn | Comprised of a concept, a unit symbol, and a list of contributing units. | Meter is a unit of length defined by the symbol (m). |
| UncertainChunk | Uncertain chunks are constrained values that have some form of uncertainty. Uncertainties can only be added after we know that the concept is a symbolic quantity with some form of constraints. | Measuring the length of a pendulum arm may be recorded with an uncertainty value. |
| UncertQ | Similar to an UncertainChunk, this type is used to store constrained values with some form of uncertainty. However, it is built off of a ConstrConcept rather than a ConstrainedChunk. | Measuring the length of a pendulum arm may be recorded with an uncertainty value. |
| RelationConcept | These are for concepts that may have an associated expression. Used often in creating definitions and models. | We can describe a pendulum arm and then apply an associated equation so that we know its behaviour. |
| QDefinition | Building off of a QuantityDict, we now have a defining expression with inputs, a definition, and a domain. Used to make definitions and models. | Finding the velocity of a pendulum arm through a QDefinition would entail an equation to find velocity and input values. |
| Citation | A citation refers to other people's work. In Drasil, the reference address of a citation becomes the UID for that citation. It also contains other necessary information such as the kind of citation and citation fields. | A reference to a thesis paper like Koothoor's "Document driven approach to certifying scientific computing software" would include the affiliated university, publishing year, and city. |
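To make the constrained and uncertain rows of the table more concrete, here is a small sketch of how constraints, a reasonable value, and an uncertainty might layer on top of a quantity. Every type, field, and function name in it is hypothetical and heavily simplified; Drasil's real chunks use expression-based constraints rather than plain numbers.

```haskell
-- All names below are hypothetical simplifications for illustration.
type UID = String

data Quantity = Quantity { qUid :: UID, qSymbol :: String, qUnit :: Maybe String }

-- A constraint is reduced here to a simple numeric bound.
data Constraint = AtLeast Double | AtMost Double

-- A quantity plus its constraints and, maybe, a reasonable value.
data Constrained = Constrained
  { cQuantity    :: Quantity
  , cConstraints :: [Constraint]
  , cReasonable  :: Maybe Double
  }

-- A constrained quantity plus a relative uncertainty (0.1 means 10%).
data Uncertain = Uncertain { uConstrained :: Constrained, uRelative :: Double }

satisfies :: Double -> Constraint -> Bool
satisfies x (AtLeast lo) = x >= lo
satisfies x (AtMost hi)  = x <= hi

-- A value is acceptable only if it meets every constraint.
withinConstraints :: Constrained -> Double -> Bool
withinConstraints c x = all (satisfies x) (cConstraints c)

-- Interval implied by the relative uncertainty around a measured value.
uncertaintyBounds :: Uncertain -> Double -> (Double, Double)
uncertaintyBounds u x = (x - delta, x + delta)
  where delta = abs x * uRelative u

-- Pendulum arm length in metres: cannot be negative, reasonable value 0.5 m.
lenRod :: Constrained
lenRod = Constrained (Quantity "lenRod" "l" (Just "m")) [AtLeast 0] (Just 0.5)

lenRodUncertain :: Uncertain
lenRodUncertain = Uncertain lenRod 0.1
```

The layering mirrors the table: the uncertain value wraps the constrained one, which in turn wraps the plain quantity, so each level adds information without discarding what lies beneath it.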

Analyzing Chunks

It can be quite difficult to see the dependencies of each chunk, so making graphs and data tables (by running make analysis) can help us to fine-tune which chunks should exist and which chunks need to be modified.
