Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Less strict design for typed arrays #73005

Open
vonagam opened this issue Feb 9, 2023 · 19 comments
Open

Less strict design for typed arrays #73005

vonagam opened this issue Feb 9, 2023 · 19 comments

Comments

@vonagam
Copy link
Contributor

vonagam commented Feb 9, 2023

Currently typed arrays do work: there should be no memory corruptions, unexpected overwrites on variable assignments, unexpected inferred element types and so on. But they do so in a very strict manner, a typed array has a type info during runtime and any potential element of the array is tested for a match with the type. It allows to speed up type safe read access - var a: int = ints[i] - since an element type is guaranteed a type check can be skipped on such assignment. But it comes with the cost of performance for writing and, more importantly, with loss of flexibility of types.

Inflexibility comes from a need to have all type info present at runtime, from how those types are represented and from underlying incompatibility between typed and untyped arrays. Some examples of such problems:

  1. Inability to specify deep nested element type - Array[Array[int]]. Right now this is only about arrays, but then other built-in types can get type parameters and the limitation will be more clear.
  2. Should array.map(to_int) return a typed array or not? That's a choice and because with current design they are incompatible it will always lead to clumsiness in some cases.
  3. Should typed arrays with different typed but same elements be considered equal and have the same hash or not? Again, that's a choice and strong incompatibility can be ignored but will lead to inconsistent behavior.
  4. Same array literal produces incompatible types depending on context. Which creates confusion for users.

Those problems would go away with more relaxed design. Instead of storing type information in the array itself keep type enforcement only to usage sites.

So, yes, the type will become less enforced:

var ints: Array[int] = []
var untyped: Variant = ints
var strings: Array[String] = untyped

Assignment of untyped to strings will not lead to an error by itself (like right now) but any consequent place where an element will be accessed and its type will be expected to be a string will produce an error. And with properly typed code many wrong assignments will still be caught during the analysis - so strings = ints will not need to get to runtime to see the problem.

Maybe it is too late to have a discussion about such change since the release candidate is here, but at least it should be noted that there was such possibility.

@tjw123hh
Copy link

I think it's reasonable because now I can't even export a deep nested element type like 3.x

@Bromeon
Copy link
Contributor

Bromeon commented Feb 25, 2023

I agree that the current typed array design has quite a few flaws and feels unfinished. As a prominent example, covariance ("upcast" Array[int] -> Array) breaks type safety when writing, moving errors to runtime. Modeling this relation in GDExtension bindings is also quite hard, as the array can be in 3 states (Array, Array[T], Array but typed at runtime) and there are a lot of edge cases to consider.

That said, where would the point of having a type declaration be if it's not reliable? If I can have an Array[String] storing ints, why not stay with Array like in Godot 3? Such type annotations then feel more like recommendations à la Python, which would mark a deviation from other types in GDScript.

It will be harder to build robust interfaces, because the user needs to manually check whether an Array[String] indeed only contains strings. And even then, such user-side checks are only snapshots and guarantee no invariants -- another reference to the array could always add ints after the check.

Fundamentally, GDScript is a very dynamic language -- there are tons of reflection APIs like call(), set(), emit_signal() etc. Without runtime type information, this would all break apart.

@vonagam
Copy link
Contributor Author

vonagam commented Feb 25, 2023

In this theoretical design an element type is an access guard. Meaning that theoretically Array[String] can store ints inside, but if during read or write through a typed array variable an element ends up being anything other than String then an error will be produced.

So it is not a "hint" as I worded it initially but a specified type (so easier typing of things) and a access type guarantee (so not just comment).

@Bromeon
Copy link
Contributor

Bromeon commented Feb 25, 2023

But that would mean you would propagate runtime errors from the point of declaration until much later, when the array is accessed. If that element is not accessed, but e.g. appended to another (untyped) array, you'd never catch it at all.

I personally don't think such a design would be better. Why not?

  • You can achieve "access guard" semantics relatively easily by writing a utility function that checks the type. The only advantage typed arrays give you here is syntactic sugar, so you can write array[i] instead of get_int(array, i).
  • The opposite is not true: you cannot emulate an invariant on element-types if the type information is not stored alongside the array.

As I see it, the current approach is strictly more powerful in terms of type safety.

@vonagam
Copy link
Contributor Author

vonagam commented Feb 25, 2023

you cannot emulate an invariant on element-types if the type information is not stored alongside the array.

It is backwards. Currently there is no support invariant/contraviant of types because they are stored in the array.

@Bromeon
Copy link
Contributor

Bromeon commented Feb 25, 2023

I meant "invariant" as "enforced runtime constraint", not in terms of invariance 😉

@vonagam
Copy link
Contributor Author

vonagam commented Feb 25, 2023

You can achieve "access guard" semantics relatively easily

No, you cannot, for that you would need to write such utility access functions for each type that you use, it is straight forward but not acceptable variant (how many functions would you need to write?). Also how would that work with typed arrays in native classes api?

The opposite is not true: you cannot emulate

Not everything is needed to be emulated. The question is what gives more usable and useful api.

Current design of arrays will stay, it is too late to change it now. This issue is just a note for future and present when dynamic nature of gdscript meets rigidity of typed arrays and creates problems.

@Bromeon
Copy link
Contributor

Bromeon commented Feb 25, 2023

No, you cannot, for that you would need to write such utility access functions for each type that you use, it is straight forward but not acceptable variant (how many functions would you need to write?).

You don't even need such a function, it's just convenience. You can simply declare a local variable with a type:

var s: String = array[i] 

This already creates an error when the element type is int. The array doesn't need to know about it.
But I'm aware that the : String would not be necessary in your proposal.


Not everything is needed to be emulated. The question is what gives more usable and useful api.

Exactly, and here is where I disagree 🙂 If I type a parameter explicitly as Array[String], I would like to rely on the fact that it's an array of strings. Just like when I have an int parameter, I don't want a String to be passed.

As mentioned, catching errors as early as possible (at the parameter level of a function) allows to enforce invariants that hold inside the function, and thus enables more robust APIs. I no longer need to manually check for contents if the type declaration gives certain guarantees. The opposite, propagating logic errors due to a weaker type system, can cause hard-to-find bugs and shifts more effort to the user (through testing, debugging, etc).


This issue is just a note for future and present when dynamic nature of gdscript meets rigidity of typed arrays and creates problems.

I agree that there are some remaining issues with typed arrays, but I don't think this is one of them -- what you describe can be solved quite well with the current untyped Array. Without runtime type information, typed arrays would lose most of their advantages in my opinion.

@tcoxon
Copy link
Contributor

tcoxon commented Jul 26, 2023

It's very difficult to use typed arrays as they are right now. In some cases, they create runtime errors on code that is logically correct, e.g.:

var object = load("res://some_script.gd").new()
object.func_with_typed_array_param(["string1", "string2"]) # Error on this line

Where some_script.gd is:

func func_with_typed_array_param(_arg: Array[String]) -> void:
    pass

IIUC, array literals use context to infer which TypedArray to create. Since the type of object is not statically known, it falls back on creating a plain untyped Array, which then fails the type check when the function is called.

Being able to assign an Array value to an Array[T] parameter/variable, which the proposal would allow, would make typed arrays much more usable. Analogous code where a Variant argument is converted to match a String parameter doesn't cause errors:

var object = load("res://some_script.gd").new()
var x: Variant = "hello"
object.func_with_string_param(x)
func func_with_string_param(_arg: String) -> void:
    pass

@anvilfolk
Copy link
Contributor

Got pinged, figured I'd link godotengine/godot-proposals#7364, where a discussion on desired behavior of typed arrays is ongoing :)

@Bromeon
Copy link
Contributor

Bromeon commented Jul 28, 2023

Being able to assign an Array value to an Array[T] parameter/variable, which the proposal would allow, would make typed arrays much more usable.

The challenge is to allow that without significantly weakening type safety. Such code should not be allowed:

var untyped = ["hello", 42]
var typed: Array[String] = untyped

The solution to this problem is a more lenient/flexible interpretation of literals:

  • ["string1", "string2"] should be assignable to both Array[String] and Array.
  • ["string", 42] should only be assignable to Array.

Note that ["string1", "string2"] cannot simply be inferred as Array[String], even if that is implicitly convertible to Array -- because that means the array loses the ability to store other types. But even that would likely be better than the status quo.

@ttencate
Copy link
Contributor

TypeScript might be a good source of inspiration, since it also adds gradual static typing to a previously existing dynamically typed language, and is designed to be pragmatic. In TypeScript, array types can be converted freely:

let arr1: number[] = [1, 2]

// Implicit upcast: allowed
let arr2: any[] = arr1
arr2.push("three")
console.log(arr1) // [1, 2, "three"] despite arr1 being number[]

// Implicit downcast: allowed
let arr3: number[] = arr2

function takeTyped(arr: number[]) {}
function takeUntyped(arr: any[]) {}

// All these are also allowed:
takeTyped(arr1)
takeUntyped(arr1)
takeTyped(arr2)
takeUntyped(arr2)

Playground link

I have worked with TypeScript extensively, and I don't recall any problems resulting from this unsoundness. It's definitely not a commonly encountered footgun.

Note that TypeScript compiles down to JavaScript, which (unlike GDScript) does not attach any runtime type information to arrays. We might consider its removal in GDScript as well.

@vnen
Copy link
Member

vnen commented Jul 29, 2023

TypeScript might be a good source of inspiration, since it also adds gradual static typing to a previously existing dynamically typed language, and is designed to be pragmatic.

Note that the relationship between TypeScript and JavaScript is not the same as typed GDScript and untyped GDScript. For TS, it all ends up being transpiled to dynamic JS in the end. GDScript however uses the type information to guarantee soundness even when mixed with a dynamic runtime.

This creates a tradeoff where some code is slower to check types at runtime but others become quite faster by being validated at compile time. This is an expectation we set from the get go and we should abide by it.

That's also why I don't like the idea of being loose with typed arrays. If we are removing the effects of typed arrays, their existence becomes almost pointless. They would do well in a complete typed code but break as soon as they interact with dynamic type land. So except for very local contexts there would be no type optimization possible, and even runtime checks would have to exist on completely typed code (so more of a performance penalty than he opposite).

@vnen
Copy link
Member

vnen commented Jul 29, 2023

The solution to this problem is a more lenient/flexible interpretation of literals:

This can be done but invariably will have to spill into core code. We would need to differentiate a Variant Array typed value and an untyped literal. This would also hold only for actual literals:

var untyped = [1, 2, 3]
var typed: Array[int]

typed = [1, 2, 3] # OK
typed = untyped # Not OK

For the core this is all the same, that's why it will need some "mark as literal" functionality in the core Array.

Unless we make the type be forced into untyped code:

var untyped = [1, 2, 3]
var typed: Array[int]

typed = untyped # Now `untyped` is `Array[int]`
untyped.push_back("a") # Fails at runtime

Which makes it workable in more cases but also creates more unexpected scenarios.

I do think that having more flexible literals is a good compromise.

@Bromeon
Copy link
Contributor

Bromeon commented Jul 29, 2023

Out of curiosity, aren't flexible literals already implemented to an extent?

var n = 7         # int
var n: float = 7  # float

var s = "str"              # String
var s: StringName = "str"  # StringName

var a = [1, 2, 3]                    # Array
var a: PackedInt32Array = [1, 2, 3]  # PackedInt32Array 

@setzer22
Copy link

setzer22 commented Jul 29, 2023

Unless we make the type be forced into untyped code

@vnen You're almost describing type inference here. Many languages have local limited type inference for literals.

When you write var untyped = [1, 2, 3], the type of the array is not yet determined. It can be considered to be in a superposition of types: Maybe it is an Array[int], maybe it is an Array. It is definitely not an Array[String]. The type checker can take note of that.

This "superposition" of types is forced to be resolved later on, looking at the usages of this variable. By assigning it to another variable declared to be Array[int], the type of the "untyped" array is contrained to in fact be an array of ints.

If there weren't any usages that resolve the type, a sane default of Array should be taken. OTOH, if there are incompatible usages, like assigning to a variable typed Array[int] and passing as parameter to a function taking an Array, then you emit a compile time error about the type of this variable being ambiguous.

This is of course, just a possibility. But I think it might help solve many of your concerns without breaking type safety!

@vnen
Copy link
Member

vnen commented Jul 30, 2023

Out of curiosity, aren't flexible literals already implemented to an extent?

Indeed, I was referring specifically for the case of typed arrays. This is actually also the case already for typed arrays, when the type is known:

var int_arr:  Array[int] = [1, 2, 3]

But this isn't the issue. The issue is when the actual type ins't known at compile time:

$SomeNode.int_arr = [1, 2, 3]

$SomeNode can be any descendant of Node so the type of the int_arr property isn't known at compile time. This can only be properly solved at runtime, but at this point the VM or the core cannot tell it was a literal array, hence my mention of adding a marker to the Array itself. For types passed by value this does not matter because we can just convert, it becomes a problem with Array because it is passed by reference.


Unless we make the type be forced into untyped code

@vnen You're almost describing type inference here. Many languages have local limited type inference for literals.

When you write var untyped = [1, 2, 3], the type of the array is not yet determined. It can be considered to be in a superposition of types: Maybe it is an Array[int], maybe it is an Array. It is definitely not an Array[String]. The type checker can take note of that.
[...]
If there weren't any usages that resolve the type, a sane default of Array should be taken. OTOH, if there are incompatible usages, like assigning to a variable typed Array[int] and passing as parameter to a function taking an Array, then you emit a compile time error about the type of this variable being ambiguous.

It is not exactly type inference, because the type isn't resolved at compile time (basically what I just said above in reply to Bromeon, it could be if the type is known, but I'm thinking in the cases where it isn't). For a user that does not add static typing, this might become unpredictable.

The example I posted is self-contained for the sake of illustration, we can create a more expanded example that's not unlikely to happen in real codebases:

# addons/some-third-party-addon/CustomNode.gd
extends Node
class_name CustomNode
var arr: Array[int]
# my-untyped-node.gd
extends Node
var my_arr = [1, 2, 3]
func called_at_some_point():
	$CustomNode.arr = my_arr # (a)
# other-untyped-node.gd
extends Node
func called_at_some_other_point():
	$MyUntypedNode.my_arr.push_back("a") # (b)

If we change the type at runtime, the order of calling can generate one of two errors:

  1. If (a) runs first, it will coerce my_arr to Array[int], since the contents are valid. When (b) runs, this will generate a runtime error (note that this kind of error does not break in the debugger because there's still no mechanism to bubble this error up to the GDScript interpreter). The user may notice the error message in the logs, but will be left to their own devices to discover where the array changed typed, since they might not even know the CustomNode uses static typing.

  2. If (b) runs first then (a) will generate a runtime error, because the array cannot be converted to Array[int] anymore. I believe this does break in the debugger, which is good, and it's a much more obvious error (the type is Array[int], which is told in the error, and the value contains an element that cannot be converted to int).

If this is chosen as a solution, then the case 1 must have very good error messages. Pointing out why the call fails and where exactly the array has changed the type. This might be tricky to actually implement as the array has to carry some debug info about the type change to be able to point out the source.

@ttencate
Copy link
Contributor

@vnen As your example illustrates, changing types at runtime is a recipe for confusion. I'm concerned that such an approach would create more problems than it solves.

@vnen
Copy link
Member

vnen commented Jul 31, 2023

As your example illustrates, changing types at runtime is a recipe for confusion. I'm concerned that such an approach would create more problems than it solves.

@ttencate I agree. While I don't like this approach, I still prefer to present multiple solutions so everyone can be aware of the implications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants