Wrote bytecode-format.txt
It's annoying that I can only work for two hours at a time
Ratstail91 committed Aug 31, 2024
1 parent 65087b1 commit 023cf9c
Showing 2 changed files with 102 additions and 48 deletions.
79 changes: 31 additions & 48 deletions .notes/SECD-concept.txt
@@ -1,3 +1,7 @@
This file is messy and confusing, and makes sense to nobody but me - so don't worry about understanding it too much - better docs will come later.

===

SECD = Stack, Environment, Control, Dump

The idea of "Landin's SECD Machine" is to store the working memory in S, the variable-value bindings in E, the code/instructions in C, and the program stack in D.
@@ -7,10 +11,16 @@ Notes:

The environment, denoted with an E, is created on routine start, and destroyed on routine end - however, it uses the parent routine's environment as the starting point for its creation, so closures work as expected

unlike version 1, identifiers are not a valid datatype.
unlike version 1, identifiers are not a valid datatype - they're just an index representing a symbol, like "standard::clock"

placeholder opcodes - EOF, PASS, ERROR,

a "value" can be of any valid datatype, and may point to various parts of memory to define its value

Symbols will be awkward... I suspect the symbol table might need to be rebuilt on startup, as the order of the modules will not necessarily be the same each time

The various instances of S could be the same array in memory, simply marked as "unused"? You could stick C on there as a value before "pushing" for a new routine

Things to consider later:
type cast?
rest parameter?
@@ -28,13 +38,13 @@ ASSERT
PRINT
pop S(0), and print the output
SET
read one word from C, saves the key E[word] to the value S(0), popping S(0)
read one word from C, saves the key E[SYMBOL(word)] to the value S(0), popping S(0)
GET
read one word from C, finds the value of E[word], leaves the value on S
read one word from C, finds the value of E[SYMBOL(word)], leaves the value on S
DECLARE
read two words from C, create a new entry in E with the key E[word1], the type defined by word2, the value 'null'
read two words from C, create a new entry in E with the key E[SYMBOL(word1)], the type defined by word2, the value 'null'
DEFINE
read two words from C, create a new entry in E with the key E[word1], the type defined by word2, the value popped from S(0)
read two words from C, create a new entry in E with the key E[SYMBOL(word1)], the type defined by word2, the value popped from S(0)


//arithmetic instructions
@@ -54,7 +64,7 @@ MODULO
COMPARE_EQUAL
pops S(-1) and S(0), replacing it with TRUE or FALSE, depending on equality
COMPARE_LESS
pops S(-1) and S(0), replacing it with TRUE or FALSE, depending on comparisoncomparison
pops S(-1) and S(0), replacing it with TRUE or FALSE, depending on comparison
COMPARE_LESS_EQUAL
pops S(-1) and S(0), replacing it with TRUE or FALSE, depending on comparison
COMPARE_GREATER
@@ -76,13 +86,14 @@ INVERT

//control instructions
JUMP
read one value from C, and move the program counter to that location
read one value from C, and move the program counter to that location (relative to the current position)
JUMP_IF_FALSE
read one value from C, pops S(0), and move the program counter to that location if the popped value is falsy
read one value from C, pops S(0), and move the program counter to that location (relative to the current position) if the popped value is falsy
FN_CALL
*read a list of arguments specified in C into 'A', store (S, E, C, D) as D, wipe S and E, move the stack pointer to the specified routine, set E based on the contents of 'A'
*read a list of arguments specified in C into 'A', store (S, E, C, D) as D, push S, move the stack pointer to the specified routine, push a new E based on the contents of 'A'
FN_RETURN
*read a list of return values specified in C into 'R', wipe S and E, restoroutine re (S, E, C, D) from D(0) popping it, store the contents of 'R' in E or S based on the next few parts of C
This
*read a list of return values specified in C into 'R', pop S, restore (S, E, C, D) from D(0) popping it, store the contents of 'R' in E or S based on the next few parts of C
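
A minimal sketch of the relative JUMP and JUMP_IF_FALSE behaviour described above. The opcode values, the signed 32-bit operand, and the choice to measure the offset from the position just after the operand are all assumptions for illustration; the notes only say the jump is relative to the current position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical opcode values; the notes do not assign numbers to opcodes.
enum { OP_JUMP = 1, OP_JUMP_IF_FALSE = 2 };

// Read a signed 32-bit operand from C and advance the program counter (assumed encoding).
static int32_t read_offset(const uint8_t *code, size_t *pc) {
    int32_t offset;
    memcpy(&offset, code + *pc, sizeof offset);
    *pc += sizeof offset;
    return offset;
}

// Execute the jump instruction at 'pc' and return the new program counter.
// 'popped_value' stands in for the value popped from S(0) by JUMP_IF_FALSE.
size_t step_jump(const uint8_t *code, size_t pc, bool popped_value) {
    assert(code != NULL);
    uint8_t opcode = code[pc++];
    int32_t offset = read_offset(code, &pc);

    bool take = (opcode == OP_JUMP) ||
                (opcode == OP_JUMP_IF_FALSE && !popped_value);

    // Relative jump: the offset is applied to the position after the operand (assumption).
    return take ? (size_t)((int64_t)pc + offset) : pc;
}
```

A negative offset jumps backwards, which is how a loop would return to its condition under this encoding.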

//bespoke utility instructions
IMPORT
@@ -96,21 +107,21 @@ SCOPE_END

===

FN_CALLonly
FN_CALL
read word: read the following N arguments

for 0 to N do:
read word as match: # this allows literals and identifiers as arguments
stack: then pop S(0) into 'A'
**env: then read word, load E[word]*** into 'A'
**env: then read word, load E[SYMBOL(word)] into 'A'

read word:
store (S,E,C,D) as D
wipe S and E
jump C to routines[word]
determine where the routine is (is it new or is it a value?) and hold it for a moment
push E and C into a frame marker on S
jump C to the routine

read word:
read the following N parameter names, storing each member of 'A' as their value in E[name]***
read the following N parameter names, storing each member of 'A' as their value in E[SYMBOL(name)]

continue

@@ -120,21 +131,20 @@ FN_RETURN
for 0 to N do:
read word as match: # this allows literals and identifiers as arguments
stack: then pop S(0) into 'R'
**env: then read word, load E[word]*** into 'R'
**env: then read word, load E[SYMBOL(word)] into 'R'

restore (S,E,C,D) from D(0), popping it # this wipes S and C from the routine, and returns C to the pre-call position
pop E and S
extract and restore E and C from the frame marker on S

read word: read the following N storage locations for the values within `R`

for 0 to N do:
read word as match: # you're effectively reversing the prior reads
stack: then push from 'R' onto S
**env: then read word, save 'R' into E[word]***
**env: then read word, save 'R' into E[SYMBOL(word)]

**This could work by listing the sources as e.g. "SSSExS" - three stacks and one environment variable loaded onto the stack, then one more stack for a total of four values

***E[word] would more accurately be E[.data[word]], where '.data' is for the currently loaded routine

Notes:
the bytecode of a function call would look like:

@@ -144,30 +154,3 @@ Notes:

===

.header:
N total length
N .args count
N .data count
N .routine count
.args start
.code start
.datatable start
.data start
.routine start
//any additional metadata can go here

.args: # these keys stored in E before execution begins

.code:
READ 0
LOAD 0
ASSERT

.datatable: # could list the starts as a jump table, since members of data and routines have unknown sizes
0 -> 0x00

.data:
"Hello world"

.routines: # this stores inner routines, in sequence

71 changes: 71 additions & 0 deletions .notes/bytecode-format.txt
@@ -0,0 +1,71 @@
The bytecode format

===

There are four components in the bytecode header:

TOY_VERSION_MAJOR
TOY_VERSION_MINOR
TOY_VERSION_PATCH
TOY_VERSION_BUILD

The first three are each one unsigned byte, and the fourth is a null-terminated C-string.

* Under no circumstances should you ever run bytecode whose major version is different from the interpreter’s major version
* Under no circumstances should you ever run bytecode whose minor version is above the interpreter’s minor version
* You may, at your own risk, attempt to run bytecode whose patch version is different from the interpreter’s patch version
* You may, at your own risk, attempt to run bytecode whose build version is different from the interpreter’s build version
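
As a rough sketch, the header layout and the four rules above could be checked like this. The names VersionHeader, read_version_header, check_version and the three verdict values are hypothetical and not part of the interpreter; only the layout (three unsigned bytes followed by a null-terminated string) and the rules come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical in-memory view of the four header components.
typedef struct {
    uint8_t major;
    uint8_t minor;
    uint8_t patch;
    const char *build; // points into the bytecode buffer, not copied
} VersionHeader;

// Hypothetical verdicts: safe to run, run at your own risk, or refuse.
typedef enum { VERSION_OK, VERSION_RISKY, VERSION_REJECT } VersionVerdict;

// Parse the header from raw bytes. Returns the header's size in bytes,
// or 0 if the buffer is too short or the build string is unterminated.
size_t read_version_header(const uint8_t *bytes, size_t length, VersionHeader *out) {
    assert(bytes != NULL && out != NULL);
    if (length < 4) return 0;

    const uint8_t *terminator = memchr(bytes + 3, '\0', length - 3);
    if (terminator == NULL) return 0;

    out->major = bytes[0];
    out->minor = bytes[1];
    out->patch = bytes[2];
    out->build = (const char *)(bytes + 3);
    return (size_t)(terminator - bytes) + 1;
}

// Apply the rules in order: major must match, minor must not be newer,
// and a differing patch or build is allowed only at the caller's own risk.
VersionVerdict check_version(const VersionHeader *bytecode, const VersionHeader *interpreter) {
    assert(bytecode != NULL && interpreter != NULL);
    if (bytecode->major != interpreter->major) return VERSION_REJECT;
    if (bytecode->minor > interpreter->minor) return VERSION_REJECT;
    if (bytecode->patch != interpreter->patch) return VERSION_RISKY;
    if (strcmp(bytecode->build, interpreter->build) != 0) return VERSION_RISKY;
    return VERSION_OK;
}
```

Returning a three-way verdict rather than a boolean leaves the "at your own risk" decision to the caller, which matches the wording of the last two rules.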

An additional note: The contents of the build string may be anything, such as:

* the compilation date and time of the interpreter
* a marker identifying the current fork and/or branch
* identification information, such as the developer's copyright
* a link to Rick Astley's "Never Gonna Give You Up" on YouTube

===

At this time, a 'module' consists of a single 'routine', which acts as its global scope.

Additional information may be added later, or multiple 'modules' listed sequentially may be a possibility.

===

# the routine structure, which is potentially recursive

# symbol shorthand : 'module::identifier'
# where 'module' can be omitted if it's local to this module ('identifier' within the symbols is calculated at the module level, it's always unique)

.header:
total size # size of this routine, including all data and subroutines
N .param count # the number of parameter fields expected
N .data count # the number of data fields expected
N .routine count # the number of routines present
.param start # absolute address of .param; omitted if not needed
.code start # absolute address of .code; mandatory
.datatable start # absolute address of .datatable; omitted if not needed
.data start # absolute address of .data; omitted if not needed
.routine start # absolute address of .routine; omitted if not needed
# additional metadata fields can be added later

.param:
# a list of symbols to be used as keys in the environment

.code:
# instructions read and 'executed' by the interpreter
READ 0
LOAD 0
ASSERT

.datatable:
# a 'symbol -> pointer' jumptable for quickly looking up values in .data and .routines
0 -> {string, 0x00}
1 -> {fn, 0xFF}

.data:
# data that can't really be embedded into .code
"Hello world"

.routines:
# inner routines, each of which conforms to this spec
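
A minimal sketch of how the 'symbol -> pointer' lookup in .datatable might work, assuming each entry is a fixed-size {type, address} pair indexed directly by symbol. The struct layout, the field widths, and the names EntryType, DataTableEntry and datatable_lookup are assumptions for illustration; the notes do not fix any of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical type tags, mirroring the 'string' and 'fn' entries in the example above.
typedef enum { ENTRY_STRING, ENTRY_FN } EntryType;

// One jumptable entry: what the value is, and where it lives in .data or .routines.
typedef struct {
    EntryType type;
    uint32_t address; // assumed width; the notes do not specify one
} DataTableEntry;

// Look up a symbol by index; returns NULL if the symbol is out of range.
const DataTableEntry *datatable_lookup(const DataTableEntry *table, size_t count, size_t symbol) {
    assert(table != NULL);
    if (symbol >= count) return NULL;
    return &table[symbol];
}
```

Because the entries are a fixed size, the lookup is constant-time even though members of .data and .routines have unknown sizes.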
