RFC: Support __len metamethod for tables and rawlen function #536

Merged
merged 4 commits into from
Jun 28, 2022
43 changes: 43 additions & 0 deletions rfcs/len-metamethod-rawlen.md
@@ -0,0 +1,43 @@
# Support `__len` metamethod for tables and `rawlen` function

## Summary

The `__len` metamethod will be called by the `#` operator on tables, matching Lua 5.2.

## Motivation

Lua 5.1 invokes `__len` only on userdata objects, whereas Lua 5.2 extends this to tables. In addition to making `__len` metamethod more uniform and making Luau
@Blockzez Blockzez Jun 13, 2022

I am sure that Lua 5.1 invokes the __len metamethod on everything except for tables and strings, not just userdata, akin to the __add metamethod being invoked on everything except numbers and strings that can be cast to numbers.
So did I miss something in the statement "Lua 5.1 invokes __len only on userdata objects"? #number errors with "attempt to get length of a number value", so __len ought to be invocable on numbers: the error is a sign that no raw operation is performed, so it falls back to the metamethod (with the exception of __tostring, which is invoked on strings!).

Should __len also invoke on strings as well? To add to that, should __add invoke on numbers and should __concat invoke on strings? Should we check metamethod first then perform raw operation later? (though vanilla Lua never had this and this is likely a feature too esoteric to be added). e.g.

function metatable_of_the_strings.__len(self)
    return tonumber(self) or 0
end
print(#"1234") --> 1234
print(#"42") --> 42
print(#"hello world") --> 0
print(rawlen("1234")) --> 4
print(rawlen("42")) --> 2
print(rawlen("hello world")) --> 11

Collaborator Author

While you're technically correct in that a host program, or debug.setmetatable, can change the metatable on other object types, Luau doesn't support debug.setmetatable, so this is written assuming a user-space program that doesn't use the debug API.

more compatible with later versions of Lua, this has the important advantage of making it possible to implement an index-based container.

Before `__iter` and `__len` it was possible to implement a custom container using `__index`/`__newindex`, but to iterate through the container a custom function was
necessary, because Luau didn't support generalized iteration, `__pairs`/`__ipairs` from Lua 5.2, or `#` override.

With generalized iteration, a custom container can implement its own iteration behavior, so as long as code uses `for k,v in obj` iteration style, the container can
be interfaced with the same way as a table. However, when the container uses integer indices, manual iteration via `#` would still not work, yet such iteration is required for some
more complicated algorithms, or even to simply iterate through the container backwards.

Supporting `__len` would make it possible to implement a custom integer-based container that exposes the same interface as a table does.
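
As a rough illustration of the container described above, the sketch below builds a proxy table whose elements live in a hidden array. The names `newList`, `items` and `count` are hypothetical and not part of this RFC, and the code assumes integer keys and a runtime where `#` honors `__len` on tables (Lua 5.2+ or Luau with this proposal).

```lua
-- Hypothetical integer-based container; `newList` is an illustrative name.
local function newList(...)
    local items = { ... }            -- hidden storage, the proxy itself stays empty
    local count = select("#", ...)   -- logical length tracked by the container
    local proxy = {}
    return setmetatable(proxy, {
        __index = function(_, i) return items[i] end,
        __newindex = function(_, i, v)
            items[i] = v
            if i > count then count = i end   -- assumes integer keys
        end,
        __len = function() return count end,
    })
end

local list = newList("a", "b", "c")
list[4] = "d"

-- Backwards iteration only works if `#list` consults `__len`.
local reversed = {}
for i = #list, 1, -1 do
    reversed[#reversed + 1] = list[i]
end
print(table.concat(reversed, ","))
```

Without `__len` support, `#list` would report the raw length of the empty proxy, so the loop body would never run.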

## Design

`#v` will call `__len` metamethod if the object is a table and the metamethod exists; the result of the metamethod will be returned if it's a number (an error will be raised otherwise).
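
A minimal sketch of the rule above, assuming a runtime where `#` consults `__len` on tables; the variable names are illustrative only. The second half probes the non-number case with `pcall`: under this proposal the call is expected to fail, whereas stock Lua 5.2+ does not check the returned type, so no expected output is shown for it.

```lua
-- `#` dispatches to `__len` when the table's metatable defines it.
local sized = setmetatable({}, { __len = function() return 42 end })
print(#sized)

-- A metamethod that returns a non-number. Under the rule proposed here
-- `ok` is expected to be false; stock Lua 5.2+ performs no such check.
local bad = setmetatable({}, { __len = function() return "many" end })
local ok, result = pcall(function() return #bad end)
print(ok, result)
```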

This seems a bit inconsistent with other metamethods.

__concat, __unm, and all the binary/math operator metamethods, don't seem to do any checks.
__eq, __le and __lt convert the return to a boolean silently, seemingly via truthiness rules.

On the other side of the argument, __tostring does enforce a string return.

Collaborator Author

I don't think __unm/__concat are comparable:

__unm doesn't have to return a number - in fact, it should likely return the value of the same type as its argument for arithmetic consistency.
__concat is likely to be used in contexts where "concatenation" similarly should preserve the source type instead of manufacturing a string; for example, container types may override .. to mean "container concatenation", e.g. returning a new container that contains elements from both sides.

I agree with eq etc.; they should ideally be restricted to booleans, but it's not a strictly necessary change (since every type is implicitly convertible to a boolean anyway), so the caller already has a guarantee that a == b always evaluates to a boolean.

However, not every type can be converted to a number, so # either has to fail on non-numbers, or return the value of arbitrary type to the caller. The latter is worse from the type safety perspective; since today # can only return a number and it's not clear why # should be able to return a value of another type, it seems beneficial to keep it as such.

Essentially the general rule would be that if it makes sense semantically to return an arbitrary type (like __unm/__concat), then we should not restrict the returned type; if there's realistically only one semantically meaningful returned type, then it's better to guarantee to the caller that you either get the result of that type or an error.

@Blockzez Blockzez Jun 14, 2022

Would the # operation, if run with __len, cast the return value to an unsigned 8/16/32/64/128/etc. bit integer (if possible), akin to some standard library functions (e.g. math.random for signed and bit32 for unsigned)? Or would # with __len allow all values of the VM 'number' data type (most commonly IEC 559 binary64), so that instead of just nonnegative ℤ it would include ℝ (actually a subset of ℚ with a denominator of 2^k), ±Infinity, and ±NaNQ(x)/±NaNS(x)/Ind?
I believe that currently # only returns nonnegative integers, so it might as well cast the result to a nonnegative integer data type.

Collaborator Author

The path of least resistance / maximum performance would be to preserve the number as is (which would allow the full range of IEEE values). I'm not sure truncating the number silently is very useful, and we don't really have a precedent of erroring on inexact conversions I believe.


`table.` functions that implicitly compute table length, such as `table.getn`, `table.insert`, will continue using the actual table length. This is consistent with the
general policy that Luau doesn't support metamethods in `table.` functions.

A new function, `rawlen(v)`, will be added to the standard library; given a string or a table, it will return the length of the object without calling any metamethods.
The new function has the previous behavior of the `#` operator, with the exception of not supporting userdata inputs, as userdata doesn't have an inherent definition of length.
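
The difference between `#` and `rawlen` can be sketched as follows, assuming a runtime that provides `rawlen` and honors `__len` on tables (Lua 5.2+ or Luau with this proposal). The table contents and the metamethod's return value are arbitrary illustrative choices.

```lua
-- A table with three array elements whose metatable overrides `#`.
local t = setmetatable({ 10, 20, 30 }, { __len = function() return 100 end })

print(#t)               -- 100: the value returned by `__len`
print(rawlen(t))        -- 3: the actual table length, no metamethod call
print(rawlen("hello"))  -- 5: strings are accepted as well
```

Code that needs the actual array length of a table whose metatable may define `__len` can call `rawlen` instead of `#`.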

## Drawbacks

`#` is an operator that is used frequently and as such an extra metatable check here may impact performance. However, `#` is usually called on tables without metatables,
and even when the table has one, with the existing metamethod-absence-caching approach we use for many other metamethods, a test version of the change to support `__len` shows no
statistically significant difference on the existing benchmark suite. This does complicate the `#` computation a little more, which may affect JIT as well, but even if the
table doesn't have a metatable, the process of computing `#` involves a series of condition checks and as such will likely require slow paths anyway.

This technically changes the semantics of `#` when called on tables with an existing `__len` metamethod, and as such has the potential to change the behavior of an existing valid program.
That said, it's unlikely that any table would have a metatable with a `__len` metamethod, as outside of userdata it would not do anything, and this drawback is not feasible to resolve with any alternate version of the proposal.

## Alternatives

Do not implement `__len`.