Proposal for ATD modules #265

mjambon · 2022-03-24T01:35:35Z

The ATD language currently doesn't have a dedicated syntax for expressing dependencies on external type definitions and implementations. The current mechanisms use the special pseudo-type abstract or use the special wrap constructor. This is all clunky and hard to remember. As a result, users do their best to fit all the definitions in a single .atd file. It's usually fine but in some cases, it's limiting.

Issues being addressed

We want to split ATD files into multiple pieces that depend on one another and fit well within the structure of the application. For example, basic types such as date can be defined in A.atd while two other APIs B.atd and C.atd use types defined in A.atd but are otherwise independent of one another.
External implementations that are not derived from ATD files should be easier to use and reference. Such implementations are typically opaque data structures but they offer a way to view them as JSON. It could be that they support only JSON export but not import. For example, JSON can be used to dump metadata about an image loaded in memory (see example in the comments).

Proposal

Add a top-level import construct that declares which ATD-compatible units the current ATD file depends on, e.g. import basic_types.
External types are referenced using a dot notation e.g. basic_types.date.
Separate compilation: External types may remain opaque i.e. the code generators atdgen, atdj, atdpy, etc. should not be forced to access external ATD files when generating code for one given ATD file. The existence of an external type or its arity (number of type parameters of a polymorphic type) may not be checked because of this.

Syntax

We introduce a new top-level import construct, which should (must?) occur at the beginning of the ATD file as it will affect all type definitions (since type definitions are reordered based on their mutual dependencies anyway).

import REAL_NAME [as SHORT_NAME]

e.g.

import basic_types as base
import time_and_date

REAL_NAME should be the name of the original ATD file without the .atd extension if such file exists.

Referencing the types they provide must be unambiguous without having access to the external definitions. For that, the type names use the dot notation:

type msg = {
  id: string;
  date: time_and_date.date;
  signature: base.signature;
}

Note that this import semantics is similar to Python's. We don't really need to support the form from MODULE import MEMBER because it can be achieved using type aliases. For shorter type names, we can do this:

import basic_types as base
import time_and_date

type date = time_and_date.date

type msg = {
  id: string;
  date: date;
  signature: base.signature;
}

Mapping to target languages

Each target programming language has a slightly different way of managing program units. It's up to the translator (atdgen, atdpy, ...) to do the right thing and support language-specific ATD annotations. For the OCaml and Python targets (atdgen, atdpy), useful annotations could look like this:

import JSON
  <ocaml module="Yojson.Safe">
  <python module="atdpy_json">

type raw_json = JSON.t

type foo = {
  name: string;
  bar: raw_json;
}

or more directly:

import raw_json
  <ocaml module="Yojson.Safe">
  <python module="atdpy_json">

type foo = {
  name: string;
  bar: raw_json.t;
}

For commonly used external modules such as a JSON module, a code generator can place them in scope automatically. In OCaml for example, atdgen could emit the code open Atdgen_runtime which would make a JSON module directly available. For Python where the convention is for modules to use lowercase and where a json module already exists, we might want the name JSON to map to atdpy_json rather than mapping to JSON (nonstandard) or json (already taken). The goal is to make the use of common or standard external libraries not require ATD annotations. We want the following to just work:

import JSON

type foo = {
  name: string;
  bar: JSON.t;
}

Note that the t represents the type of the main data structure provided by the module as is conventional in OCaml. It's a little nicer than JSON.json or than having to create an alias type json = JSON.json. It's not a requirement imposed by ATD.

So far, all the implementations of an ATD module must provide the same collection of types and functions that work on those types. We could provide a way to express "hey, Python's JSON.t isn't named t but is named json" but it's not clear if it's necessary since many other constraints exist for a module interface to be ATD-compatible. Whoever decides that a cross-language module such as JSON should exist, should also specify the set of types it provides.

The text was updated successfully, but these errors were encountered:

aryx · 2022-03-24T08:56:46Z

A poor's man alternative is to simply use the C preprocessor (or another preprocessor) with #include to split an ATD file in different parts and have a Makefile generate the final .atd from those different parts.

jchavarri · 2022-03-24T15:41:05Z

Thanks for this proposal. I am not sure I understand what are the upsides in terms of functionality.

We want to split ATD files into multiple pieces that depend on one another and fit well within the structure of the application. For example, basic types such as date can be defined in A.atd while two other APIs B.atd and C.atd use types defined in A.atd but are otherwise independent of one another.

Isn't this problem solved already with abstract and module? If yes, is it worth the trouble of implementing new syntax for essentially the same functionality? And would adopting this proposal mean that abstract and module would go away at some point in the future? To avoid redundant functionality and confusion. Maybe the proposal should include some deprecation plans as well? e.g. first minor version with deprecation, second major version with removal.

External implementations that are not derived from ATD files should be easier to use and reference. Such implementations are typically opaque data structures but they offer a way to view them as JSON. It could be that they support only JSON export but not import.

I am not sure I understand what this means. Could you show an example maybe to illustrate the problem and how this proposal would fix it, please?

mjambon · 2022-03-24T19:02:15Z

@jchavarri in a previous company, we have an application-wide 2000-line file named api.atd that started with the following "imports":

type cardid = string wrap <ocaml module="Cardid">
type color = string wrap <ocaml module="Color">
type colorid = string wrap <ocaml module="Colorid">
type commentid = string wrap <ocaml module="Commentid">
type cusid = string wrap <ocaml module="Cusid">
type email = string wrap <ocaml module="Email">
type empty_record = string wrap <ocaml module="Empty_record">
type error_details <ocaml from="Error_details" t="t"> = abstract
type eventid = string wrap <ocaml module="Eventid.Public">
type full_calid = string wrap <ocaml module="Full_calid">
type gender = string wrap <ocaml module="Gender">
type group_member_role = string wrap <ocaml module="Group_member_role">
type groupid = string wrap <ocaml module="Groupid">
type hashtag = string wrap <ocaml module="Hashtag">
type json <ocaml module="Yojson.Safe"> = abstract
type messageid = string wrap <ocaml module="Messageid">
type planid = string wrap <ocaml module="Planid">
type platform = string wrap <ocaml module="Platform">
type label = string wrap <ocaml module="Label">
type recurrence <ocaml from="Icalendar" t="recurrence"> = abstract
type simple_eventid = string wrap <ocaml module="Eventid.Simple">
type slack_teamid = string wrap <ocaml module="Slack_api_teamid">
type slack_userid = string wrap <ocaml module="Slack_api_userid">
type slack_username = string wrap <ocaml module="Slack_api_username">
type storable_eventid = string wrap <ocaml module="Eventid.Storable">
type teamid = string wrap <ocaml module="Teamid">
type timeonly = string wrap <ocaml module="Util_timeonly">
type timestamp = string wrap <ocaml module="Util_time">
type timezone = string wrap <ocaml module="Util_timezone">
type token = string wrap <ocaml module="Token">
type uid = string wrap <ocaml module="Uid">
type unixtime = float wrap <ocaml module="Util_time.As_unixtime">

What bothers me about this is:

2000 lines is a lot of type definitions. It might be worth splitting this file into smaller units.
type error_details <ocaml from="Error_details" t="t"> = abstract is just weird to me. It's not obvious to the casual reader (or to me) that it means import error_details followed by type error_details = error_details.t. The imported type is referenced as error_details instead of the more direct error_details.t in the new proposal.

The many type ... = string wrap <...> definitions wouldn't change in the new proposal. One thing we could do is move all these definitions to their own .atd file, though:

File core_types.atd:

type cardid = string wrap <ocaml module="Cardid">
type color = string wrap <ocaml module="Color">
type colorid = string wrap <ocaml module="Colorid">
type commentid = string wrap <ocaml module="Commentid">
type cusid = string wrap <ocaml module="Cusid">
type email = string wrap <ocaml module="Email">
...

File api.atd:

import core_types

type foo = {
   email: core_types.email;
}

Isn't this problem solved already with abstract and module? If yes, is it worth the trouble of implementing new syntax for essentially the same functionality?

A difference is that import would declare a whole module rather than individual types. It would makes the grouping shown above possible (import core_types) and I think it would generally facilitate the splitting into multiple files.

And would adopting this proposal mean that abstract and module would go away at some point in the future? To avoid redundant functionality and confusion. Maybe the proposal should include some deprecation plans as well? e.g. first minor version with deprecation, second major version with removal.

Yes, sure. I see no urgent need to remove the existing features, though. Deprecation notices should be good enough for a while, I think.

mjambon · 2022-03-24T19:27:03Z

External implementations that are not derived from ATD files should be easier to use and reference. Such implementations are typically opaque data structures but they offer a way to view them as JSON. It could be that they support only JSON export but not import.

I am not sure I understand what this means. Could you show an example maybe to illustrate the problem and how this proposal would fix it, please?

This is an example of a partial export to JSON. We have a profile object that includes an image that was loaded in memory. Exporting this profile to JSON only exports the image metadata because perhaps the full image data is too big or simply should not be returned to the client of our service.

image_metadata.atd:

type t = {
  format: image_format;
  width: int;
  height: int;
}

Handwritten image_j.ml:

(*
   Dump only the metadata of an image.

   This module name and interface comply with the requirements of an ATD module
   that supports OCaml and JSON.
*)

(* The type of an image loaded in memory *)
type t = {
  metadata: Image_metadata_t.t;
  raw_contents: string;
}

(* We can't recover an image from just its metadata *)
let read_t _ _ = failwith "not implemented"

(* Export metadata only *)
let write_t ob x = Image_metadata_j.write_t ob x.metadata

export.atd:

import image (* expects an OCaml module named 'Image_j' with the correct interface *)

(* Can only be exported to JSON, not imported from JSON since the JSON representation
   doesn't include the full image data. *)
type profile = {
  username: string;
  image: image.t;
}

This saves the application from defining whole different types for internal use and for JSON export.

mjambon · 2022-03-24T21:03:54Z

@aryx

A poor's man alternative is to simply use the C preprocessor (or another preprocessor) with #include to split an ATD file in different parts and have a Makefile generate the final .atd from those different parts.

Yes, but with such an approach, the namespace remains global i.e. all the definitions are in scope whether you want it or not, which isn't great. For example, I want to be able to define a new date type locally even if there's already one defined in another ATD module, possibly because we implement two different external APIs that use two different date representations:

util.atd:

type date = ...  (* our preferred date type for internal use *)

calendar.atd:

import util

type event = {
  start: util.date;
  ...
}

google_calendar.atd:

type date = ... (* whatever representation Google Calendar uses *)

aryx · 2022-05-02T11:06:35Z

another issue with the "abstract" trick for modularity is that you can't inherit a property defined in another file:

dune build
      atdgen src/core/Semgrep_scan_output_j.{ml,mli} (exit 1)
(cd _build/default/src/core && /home/pad/.opam/4.12.0/bin/atdgen -j -j-std Semgrep_scan_output.atd)
Cannot inherit from type abstract
make: *** [Makefile:23: all] Error 1

But maybe this would not be supported either by the ATD module system because of this sentence "External types may remain opaque" ?

mjambon · 2022-05-02T19:26:44Z

@aryx

But maybe this would not be supported either by the ATD module system because of this sentence "External types may remain opaque" ?

Right. The goal of the current proposal is to compile an ATD file without inspecting its dependencies (other ATD files, OCaml implementation, Python implementation, etc.). The following wouldn't be possible:

import foo
type bar = {
  inherit foo.t;  (* can't do that because we don't know the representation of foo.t *)
  bar: int list;
}

We could consider another proposal where this is possible, but it seems to me that it would be strictly more complicated and could be implemented later as an extension if we change our minds.

aryx · 2022-05-02T19:41:57Z

yes, less important for now. I can always move the definitions to the file that does the inherit.

mjambon · 2022-05-02T20:07:00Z

Do you really need to use inherit? We don't have this in OCaml and we're doing fine.

aryx · 2022-05-03T08:07:14Z

well the JSON output of the CLI is inlining for example the location: field in its parent, so inherit is perfect for that.

aryx · 2022-05-03T08:07:59Z

the semgrep-core JSON output for a match result has a location: (with inside a start: end: path:), but the JSON output of the CLI for a match result has directly the start: end: path:

MisterDA · 2022-05-30T15:18:19Z

A poor's man alternative is to simply use the C preprocessor (or another preprocessor) with #include to split an ATD file in different parts and have a Makefile generate the final .atd from those different parts.

Yes, but with such an approach, the namespace remains global i.e. all the definitions are in scope whether you want it or not, which isn't great.

Still, I've also got a growing atd file, and all of its types are currently in the global namespace (which is mostly fine in my use case). I'm looking at splitting the atd file, and although I can do it at the build level by concatenating all the splitted files, a textual include statement would probably be nicer (or a documentation that it doesn't exist 😉).

mjambon · 2022-05-31T03:50:29Z

@MisterDA I started working on this but it's on hold due to other priorities. See #297

ELLIOTTCABLE · 2022-08-24T15:42:22Z

This looks like a fantastic idea, to me. In particular, I don't see the only benefit as 'remove extra lines at the top'; there's also the benefit of namespacing and readability. type event = { start: company_shared.date; } makes it immediately obvious what's being used; and although this can be aped with naming conventions like shared_date or sth, I much prefer first-class namespacing.

Content-aware inheritance is also important for our use-case, but I'd personally rather that use a different keyword. It's a fundamentally distinct set of constraints and trade-offs. Stick with import for content-unaware abstract references like this; then use something like examine or open for content-aware "inclusion" / inheritance-compatible analysis?

mjambon added the design and planning Various discussions about the future label Mar 24, 2022

mjambon changed the title ~~Proposal for multi-file type definitions~~ Proposal for ATD modules Mar 24, 2022

mjambon mentioned this issue Apr 24, 2022

Fix minimum atd version in opam files #282

Merged

3 tasks

mjambon added the work in progress Work started; no guarantee about current activity label Apr 27, 2022

mjambon mentioned this issue May 31, 2022

ATD modules (work in progress) #297

Draft

3 tasks

mjambon mentioned this issue Jun 28, 2023

Provide a way to define simple defaults regardless of target language #345

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal for ATD modules #265

Proposal for ATD modules #265

mjambon commented Mar 24, 2022 •

edited

Loading

aryx commented Mar 24, 2022

jchavarri commented Mar 24, 2022

mjambon commented Mar 24, 2022

mjambon commented Mar 24, 2022 •

edited

Loading

mjambon commented Mar 24, 2022

aryx commented May 2, 2022 •

edited

Loading

mjambon commented May 2, 2022

aryx commented May 2, 2022

mjambon commented May 2, 2022

aryx commented May 3, 2022

aryx commented May 3, 2022

MisterDA commented May 30, 2022

mjambon commented May 31, 2022

ELLIOTTCABLE commented Aug 24, 2022

Proposal for ATD modules #265

Proposal for ATD modules #265

Comments

mjambon commented Mar 24, 2022 • edited Loading

Issues being addressed

Proposal

Syntax

Mapping to target languages

aryx commented Mar 24, 2022

jchavarri commented Mar 24, 2022

mjambon commented Mar 24, 2022

mjambon commented Mar 24, 2022 • edited Loading

mjambon commented Mar 24, 2022

aryx commented May 2, 2022 • edited Loading

mjambon commented May 2, 2022

aryx commented May 2, 2022

mjambon commented May 2, 2022

aryx commented May 3, 2022

aryx commented May 3, 2022

MisterDA commented May 30, 2022

mjambon commented May 31, 2022

ELLIOTTCABLE commented Aug 24, 2022

mjambon commented Mar 24, 2022 •

edited

Loading

mjambon commented Mar 24, 2022 •

edited

Loading

aryx commented May 2, 2022 •

edited

Loading