Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Project Arbex (Arbitrary Expressions) #4777

Merged
merged 4 commits into from
Aug 31, 2017
Merged

Conversation

anandthakker
Copy link
Contributor

@anandthakker anandthakker commented Jun 2, 2017

Intent

Design and implement a style specification syntax for defining "style functions" using arbitrary expressions. Such functions should be usable for:

Background:

Initial discussion: #4715
Development of working draft: #4754

Important use cases that the final design should cover:

  • unit conversion
  • number formatting
    • decimal precision
    • thousands separator
  • conditionals (for the token defaults use case)
  • concatenation (for the token defaults use case)
  • customizable interpolation
  • arithmetic on feature properties within functions
  • localization

Plan

  • Develop working draft of spec
  • GL JS Implementation JS arbex implementation: parse, typecheck, compile expressions #4841
  • Tail work for GL JS implementation:
    • Implement cubic-bezier interpolation type for curve expressions
    • Drop new bezier-easing dependency in favor of pre-existing unitbezier
    • Reinstate support for color space interpolation option
    • Check out how native handles validation for a property function whose stop outputs are of the wrong type (see JS arbex implementation: parse, typecheck, compile expressions #4841 (comment))
    • Refactor style function call sites to pass (globalProperties, feature) rather than (globalProperties, featureProperties) (needed for geometry_type and id)
    • Generalize match expression to support multiple keys per output value
    • Guard against JS keywords in let-bound variable names
    • Update {text,icon}-size DDS path to use expressions
    • Benchmark.
  • Add documentation for each expression into style-spec JSON and the generated docs pages.
  • Cross-platform test suite
  • GL Native Implementation Expressions mapbox-gl-native#9439
  • SDK Implementations

Benchmarks

Benchmark scores:

benchmark master 45fde16 feature/expressions 3fc994c
map-load 148 ms 121 ms
style-load 76 ms 132 ms
buffer 1,058 ms 1,091 ms
fps 60 fps 60 fps
frame-duration 6 ms, 0% > 16ms 8 ms, 1% > 16ms
query-point 0.85 ms 0.84 ms
query-box 51.25 ms 62.33 ms
geojson-setdata-small 7 ms 9 ms
geojson-setdata-large 135 ms 151 ms

Looking at the dds-specific benchmark being added in #5207, which exercises tile layout with many zoom-and-property functions, looks like about a 2x hit using expressions compared to existing functions:

update: After a couple quick fixes, it's more like 1.5x instead of 2x:

screen shot 2017-08-30 at 11 55 30 am

@mapbox/gl @mapbox/studio @mapbox/cartography-cats @mapbox/support

@anandthakker anandthakker added cross-platform 📺 Requires coordination with Mapbox GL Native (style specification, rendering tests, etc.) under development labels Jun 2, 2017

###Decision:
- `["case", cond1: Boolean, result1: T, cond2: Boolean, result2: T, ..., cond_m, result_m: T, result_otherwise: T] -> T`
- `["match", x: T, a_1: T, y_1: U, a_2: T, y_2: U, ..., a_m: T, y_m: U, y_else: U]` - `a_1`, `a_2`, ... must be _literal_ values of type `T`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we generalize this so that multiple literal values can be mapped to the same output expression?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplest syntax for this would be:

["match", x: T, [a_1: T, a_2: T, ... a_N: T], y_a: U, [b_1: T, b_2: T, ..., b_M], y_b: U, ..., y_else: U]

Only bad thing about this is that it would take us away from the purity of every array in the JSON representing an expression (since we could consider the curve types like ["linear"] to be expressions of an private CurveType type). Is maintaining such syntax purity worthwhile? Some ways we could do that:

  1. ["match", x: T, [ "vector", a_1: T, a_2: T, ... a_N: T], y_a: U, [ "vector", b_1: T, b_2: T, ..., b_M], y_b: U, ..., y_else: U]. But allowing a "vector" constructor may not be a great idea, since we don't have a way for users to specify arbitrary types.

  2. We could use "json_array" instead -- but that's not great type-wise, since it specifically produces Vector<Value>, not Vector<T>.

  3. A specialized "match_case" expression.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use the simplest syntax. It's fine if "match" is a special form. (case is in scheme too.)

- `Boolean`
- Literal: `true` or `false`
- `Color`
- `JSONObject`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just call it Object.

- TODO: without type inference, 0-length arrays and vectors can't be typed
- `Value`: A "variant" type representing the set of possible values retrievable from a feature's `properties` object (`Null | String | Number | Boolean | JSONObject | Vector<Value>`)
- `Error`: a subtype of all other types. Used wherever an expression is unable to return the appropriate supertype. Carries diagnostic information such as an explanatory message and the expression location where the Error value was generated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about enumerated types? I think we'll find that leaving them stringly-typed is less than ideal.

Copy link
Contributor Author

@anandthakker anandthakker Jun 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jfirebaugh Can you say more about why you think we’ll want these over strings? specifically: are there cases where such types would be useful for checking what a subexpression evaluates to?

So far, the use cases I can think of for enums are for checking the final output of an expression that's being used for an enum-typed style property like symbol-placement. If it's true that this is the main reason for them, then the question is which is simpler/better: adding enums to the type system here, or treating the validation of style property values as a separate step?

Since there are, e.g., numeric properties like *-opacity that can't really be validated via types (at least not normal ones), seems like we're gonna have the extra validation step anyway, which makes me lean towards the latter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be wrong, just a gut feeling that we'll run into issues in practice, or as we implement it (especially in native). I don't have any concrete examples.

###Lookup:
- `["get", obj: JSONObject, key: String ] -> Value`
- `["has", obj: JSONObject, key: String ] -> Boolean`
- `["at", arr: JSONArray, index: Number] -> Value`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is redundant to the polymorphic version on the next line, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, yep thanks.

- min, max

###String:
- `["concat", expr1: T, expr2: U, …] -> String`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

["concat", expr: String, …] -> String?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking it might be okay, as a convenience, for concat to implicitly convert all of its arguments, on the premise that this should always be possible without error as long as the value isn't an Error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason to support implicit conversion for concat but not other expressions (+, &&, up/downcase, ...)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a hugely compelling one... just the combination of:

  • Combining string literals, string property values, and numeric property values will be a very common use case (i.e. the equivalent of using {} token replacement for text-field
  • Automatic conversions to String will never fail for values that aren't already Error, so the explicit cast doesn't seem to me to add any value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there by any advantage to having a string formatting function instead of concat?

On one hand it would add complexity because it would add function-specific syntax. On the other hand you could handle the entire "this is what I want the displayed text to be" in a single function, which might be more UI-friendly.

}
```

## Property Expressions
Copy link
Contributor

@1ec5 1ec5 Jun 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This document refers exclusively to “expressions”, but the title of this PR refers to “arbitrary expressions”. To avoid confusion, we should settle on one name for the feature and eliminate the other name. As catchy as “Arbex” sounds, I prefer simply “expressions” for these reasons:

  • Studio users, who may be unfamiliar with the style specification’s evolution, may perceive “arbitrary” as a negative trait.
  • “Arbitrary expressions” would seem to imply a non-arbitrary expression feature that doesn’t exist.
  • “Expressions” aligns with common usage on the iOS platform and in other kinds of software.

/cc @dasulit

brandonbloom added a commit to brandonbloom/mapbox-gl-js that referenced this pull request Jun 14, 2017
Ideally this will no longer be necessary after this happens:
mapbox#4777
@anandthakker anandthakker force-pushed the feature/expressions branch 2 times, most recently from b75efe5 to c1c29cb Compare June 30, 2017 12:27
@ivovandongen
Copy link
Contributor

@anandthakker Could we consider using a (formal) specification for the expression syntax? Something like ABNF would let us use existing parser generators like ANTLR and others to generate platform specific bindings and also parts of the core implementations and tests.

@anandthakker
Copy link
Contributor Author

@ivovandongen Yeah, I've definitely been pondering this. There are some things that would make it challenging to specify expressions fully and formally:

  • It's a JSON 'syntax', not text, so I don't think existing methods (like BNF) would work.
  • Thus far, we've had no need to include a way to represent every type in the system. Types exist, but only in the implementation--not in the language itself. In order to formally specify each expression in a spec, we'd need to either introduce a serialized representation of types, so that we could specify, e.g., that the 'length' expression is typed (Array<T>, String) => Number.
  • Each expression has not only unique evaluation behavior, but also potentially custom parsing needs. That's not a dealbreaker for a formal spec, but it does limit how much useful code we can generate from such a spec.

Beyond that, it's also just more machinery in general. As such, when @jfirebaugh and I discussed this yesterday we landed on the notion that, at least for now, we could just use the module in GL JS that defines each expression (found at src/style-spec/function/definitions/index.js) as the "source of truth". Even though that module's first purpose is to actually provide the GL JS implementation of each expression, we can also use it to build scripts to generate docs and core / SDK code. If/when that becomes problematic, we can always revisit the question of moving the 'source of truth' into a more complete specification within the style spec reference JSON.

brandonbloom added a commit to brandonbloom/mapbox-gl-js that referenced this pull request Jul 5, 2017
Ideally this will no longer be necessary after this happens:
mapbox#4777
throw new ParsingError(key, `Expected string, but found ${typeof name} instead`);

if (context.definitions[name])
throw new ParsingError(key, `"${name}" is reserved, so it cannot not be used as a "let" binding.`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is getting compiled to function (name, ...) { ... }, we need to either prohibit JS keywords as well, or sanitize names, say by prefixing them with _.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 yep, tracking that on my list of tail work but forgot to add it to the top of this ticket. Adding now.

- `[ "e" ] -> Number`

### Variable binding:
- `[ "let", name1: String, e1, name2: String, e2, ..., e_result ]` - Bind expression `e1` to the string `name1`, `e2` to `name2`, etc., before evaluating `e_result`. The bound expressions may be referenced within `e_result` with `[ name1 ]`, `[ name2 ]`, etc. (E.g.: `["let", "a", 1, "b", ["number", ["get", "blah"]], [ "+", ["a"], ["b"] ]`.)
Copy link
Contributor

@jfirebaugh jfirebaugh Jul 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to use [name1] as the syntax for this -- that would mean that any new expression form we want to add in a future version would potentially break existing expressions that used that name as an identifier. Better to have an explicit form like ["var", name].

Copy link
Contributor Author

@anandthakker anandthakker Jul 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦‍♂️ yeah. ["ref", name1]? ["var", name1]? Less concise, but actually might allow us to avoid some special logic for var reference expressions.

Copy link
Contributor

@jfirebaugh jfirebaugh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than extending LambdaExpression, the Expression subclasses for match, case, curve, and coalesce should directly implement Expression, providing custom parse, typecheck, and compile methods, and using structured member data rather than a generic args. For example, the member data for MatchExpression should be something like:

    input: Expression;
    cases: Array<[Array<string | number>, Expression]>;
    otherwise: Expression;

This will reduce the complexity of LambdaExpression and other parts of the codebase, likely removing the need for typename, nargs, isGeneric, and so on. At the same time, using more strictly typed member data will reveal edge cases that are not currently handled/tested. As an experiment, I sketched this out for match. Edge cases:

  • No arguments (["match"])
  • One argument (["match", foo])
  • Two arguments (["match", foo, z])
  • Null or boolean key (["match", foo, null, x, z]) -- should we allow this? It's better written withcase IMO.
  • Object or array key (["match", foo, {...}, x, [[...]], y, z])
  • Mixed key types (["match", foo, 0, x, "0", y, z], ["match", foo, [0, "0"], x, z]). This is currently allowed, which surprised me, but I guess it is useful for munging datasets that are inconsistently typed.
  • Mismatched input/key types (["match", 0, "0", x, "1", y, z])
  • Mismatched result types (["match", foo, 0, 0, 1, "1", false])
  • Repeated keys (["match", foo, 0, x, 0, y, z])
  • Non-integer keys (["match", foo, 0.333, x, z])

@jfirebaugh
Copy link
Contributor

Continuing to look at how to remove TypeName from the Type variant list...

Besides match, case, curve, and coalesce, the expression forms with "special" typing rules are:

  • The forms that take a variable number of arguments of a single type: concat, +, *, &&, and ||. These are fairly easy to handle with a special case varargs signature type, a simplified version of nargs which only CompoundExpression#typecheck needs to handle (type Signature = Array<Type> | Varargs).
  • length: the array signature wants to match an array of any type. This seems like it should be handled by the array subtyping rules somehow.
  • at: the output type is the element type of the input array type. This is the closest to "inference" we seem to get. Maybe it should just have a custom typecheck override?

Another thing I noticed is that typecheck operates one level "deeper" than typing judgements are usually expressed. Typing judgements are usually of the form

let t1 be the type of subexpression a, t2 be the type of subexpression b, if <some predicate on t1 and t2>, then the type of [expr, a, b] is type t3

Note that there's no "expected" type input to this judgement: if you need to check that t3 is a subtype of some expected type, than you do that externally to checking expr itself. I think our typechecking code would be clearer if it was written like this.

@anandthakker
Copy link
Contributor Author

@jfirebaugh 👍 I came to similar tentative conclusions re: Varargs and length. For at, I was thinking of two possibilities:

  • at: (Number, Array) => Value (Array being Array<Value>). As with length, array subtyping should allow any array here (though I think currently, Value might, incorrectly, fail to match Array<T> for T != Value).
  • Promote at to be a special form with custom typecheck.

typecheck operates one level "deeper" than typing judgements are usually expressed

Oh yeah, good point -- this was originally the case to help make type inference of generics more straightforward, but now that we're eliminating that need, we can simplify it 🙌

context.key,
'The "zoom" expression may only be used as the input to a top-level "curve" expression.'
);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check should happen in function/index.js rather than here, because this restriction on [zoom] only applies to expressions used for style property values (as opposed to, say, filters)

@anandthakker anandthakker force-pushed the feature/expressions branch 6 times, most recently from cab3b58 to db7c15c Compare August 5, 2017 03:12
@anandthakker anandthakker force-pushed the feature/expressions branch 3 times, most recently from 75fcd51 to 73fff34 Compare August 29, 2017 22:33
@anandthakker
Copy link
Contributor Author

@mourner @jfirebaugh posted some benchmarks in the top of the ticket - looks like DDS is about 2x slower with expressions, but hopefully quick profiling will yield some low-hanging fixes that will reclaim some of that.

@mourner
Copy link
Member

mourner commented Aug 30, 2017

@anandthakker thanks! The frame-duration 6ms to 8ms change is also worth looking into.

@ChrisLoer
Copy link
Contributor

The frame-duration 6ms to 8ms change is also worth looking into.

#5150 adds to frame-duration as well (because it's doing collision detection during rendering). It's mainly an issue for zooms/maps with very dense line labeling, which is kind of the worst case for collision detection. We should try to keep an eye on how the two PRs interact.

@anandthakker
Copy link
Contributor Author

@mourner @ChrisLoer After 6f4a6c4 the impact on frame-duration is approximately 0.84 ms:

screen shot 2017-08-30 at 11 53 57 am


return {
result: 'success',
function: fn.bind(evaluationContext()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth trying a custom bind or an arrow fn — calls of natively binded functions are expensive in many browsers

} else if (expectedType.kind === 'Value') {
typeError = true;
} else if (expectedType.kind !== 'Array') {
typeError = typeof value !== expectedType.kind.toLowerCase();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: toLowerCase will be called on almost every get; might be preferable to make a lookup table for kinds instead

} catch (e) {
if (thunks.length === 0) throw e;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we do away with a simple for loop here? Here, we create an array from arguments, and then pop values one by one from it when we can do away with just iterating over them, which is much cheaper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is actually obsolete code after df245c1 - will remove.

@anandthakker anandthakker merged commit 9ee7362 into master Aug 31, 2017
@ChrisLoer ChrisLoer mentioned this pull request Sep 5, 2017
18 tasks
@anandthakker anandthakker deleted the feature/expressions branch September 16, 2017 01:23
anandthakker added a commit to mapbox/mapbox-gl-native that referenced this pull request Nov 8, 2017
Ports mapbox/mapbox-gl-js#4777 (and its several follow-ups)
@gabimoncha
Copy link

I see that issue #4820 Support contains, begins with, and ends with filter option for strings is referenced in this package. But I cannot figure out the syntax for the expression to filter strings that begin with a given value. Can anyone help me with it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cross-platform 📺 Requires coordination with Mapbox GL Native (style specification, rendering tests, etc.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants