Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for ATD modules #265

Open
mjambon opened this issue Mar 24, 2022 · 14 comments
Open

Proposal for ATD modules #265

mjambon opened this issue Mar 24, 2022 · 14 comments
Labels
design and planning Various discussions about the future work in progress Work started; no guarantee about current activity

Comments

@mjambon
Copy link
Collaborator

mjambon commented Mar 24, 2022

The ATD language currently doesn't have a dedicated syntax for expressing dependencies on external type definitions and implementations. The current mechanisms use the special pseudo-type abstract or use the special wrap constructor. This is all clunky and hard to remember. As a result, users do their best to fit all the definitions in a single .atd file. It's usually fine but in some cases, it's limiting.

Issues being addressed

  • We want to split ATD files into multiple pieces that depend on one another and fit well within the structure of the application. For example, basic types such as date can be defined in A.atd while two other APIs B.atd and C.atd use types defined in A.atd but are otherwise independent of one another.
  • External implementations that are not derived from ATD files should be easier to use and reference. Such implementations are typically opaque data structures but they offer a way to view them as JSON. It could be that they support only JSON export but not import. For example, JSON can be used to dump metadata about an image loaded in memory (see example in the comments).

Proposal

  • Add a top-level import construct that declares which ATD-compatible units the current ATD file depends on, e.g. import basic_types.
  • External types are referenced using a dot notation e.g. basic_types.date.
  • Separate compilation: External types may remain opaque i.e. the code generators atdgen, atdj, atdpy, etc. should not be forced to access external ATD files when generating code for one given ATD file. The existence of an external type or its arity (number of type parameters of a polymorphic type) may not be checked because of this.

Syntax

We introduce a new top-level import construct, which should (must?) occur at the beginning of the ATD file as it will affect all type definitions (since type definitions are reordered based on their mutual dependencies anyway).

import REAL_NAME [as SHORT_NAME]

e.g.

import basic_types as base
import time_and_date

REAL_NAME should be the name of the original ATD file without the .atd extension if such file exists.

Referencing the types they provide must be unambiguous without having access to the external definitions. For that, the type names use the dot notation:

type msg = {
  id: string;
  date: time_and_date.date;
  signature: base.signature;
}

Note that this import semantics is similar to Python's. We don't really need to support the form from MODULE import MEMBER because it can be achieved using type aliases. For shorter type names, we can do this:

import basic_types as base
import time_and_date

type date = time_and_date.date

type msg = {
  id: string;
  date: date;
  signature: base.signature;
}

Mapping to target languages

Each target programming language has a slightly different way of managing program units. It's up to the translator (atdgen, atdpy, ...) to do the right thing and support language-specific ATD annotations. For the OCaml and Python targets (atdgen, atdpy), useful annotations could look like this:

import JSON
  <ocaml module="Yojson.Safe">
  <python module="atdpy_json">

type raw_json = JSON.t

type foo = {
  name: string;
  bar: raw_json;
}

or more directly:

import raw_json
  <ocaml module="Yojson.Safe">
  <python module="atdpy_json">

type foo = {
  name: string;
  bar: raw_json.t;
}

For commonly used external modules such as a JSON module, a code generator can place them in scope automatically. In OCaml for example, atdgen could emit the code open Atdgen_runtime which would make a JSON module directly available. For Python where the convention is for modules to use lowercase and where a json module already exists, we might want the name JSON to map to atdpy_json rather than mapping to JSON (nonstandard) or json (already taken). The goal is to make the use of common or standard external libraries not require ATD annotations. We want the following to just work:

import JSON

type foo = {
  name: string;
  bar: JSON.t;
}

Note that the t represents the type of the main data structure provided by the module as is conventional in OCaml. It's a little nicer than JSON.json or than having to create an alias type json = JSON.json. It's not a requirement imposed by ATD.

So far, all the implementations of an ATD module must provide the same collection of types and functions that work on those types. We could provide a way to express "hey, Python's JSON.t isn't named t but is named json" but it's not clear if it's necessary since many other constraints exist for a module interface to be ATD-compatible. Whoever decides that a cross-language module such as JSON should exist, should also specify the set of types it provides.

@mjambon mjambon added the design and planning Various discussions about the future label Mar 24, 2022
@mjambon mjambon changed the title Proposal for multi-file type definitions Proposal for ATD modules Mar 24, 2022
@aryx
Copy link

aryx commented Mar 24, 2022

A poor's man alternative is to simply use the C preprocessor (or another preprocessor) with #include to split an ATD file in different parts and have a Makefile generate the final .atd from those different parts.

@jchavarri
Copy link
Member

Thanks for this proposal. I am not sure I understand what are the upsides in terms of functionality.

We want to split ATD files into multiple pieces that depend on one another and fit well within the structure of the application. For example, basic types such as date can be defined in A.atd while two other APIs B.atd and C.atd use types defined in A.atd but are otherwise independent of one another.

Isn't this problem solved already with abstract and module? If yes, is it worth the trouble of implementing new syntax for essentially the same functionality? And would adopting this proposal mean that abstract and module would go away at some point in the future? To avoid redundant functionality and confusion. Maybe the proposal should include some deprecation plans as well? e.g. first minor version with deprecation, second major version with removal.

External implementations that are not derived from ATD files should be easier to use and reference. Such implementations are typically opaque data structures but they offer a way to view them as JSON. It could be that they support only JSON export but not import.

I am not sure I understand what this means. Could you show an example maybe to illustrate the problem and how this proposal would fix it, please?

@mjambon
Copy link
Collaborator Author

mjambon commented Mar 24, 2022

@jchavarri in a previous company, we have an application-wide 2000-line file named api.atd that started with the following "imports":

type cardid = string wrap <ocaml module="Cardid">
type color = string wrap <ocaml module="Color">
type colorid = string wrap <ocaml module="Colorid">
type commentid = string wrap <ocaml module="Commentid">
type cusid = string wrap <ocaml module="Cusid">
type email = string wrap <ocaml module="Email">
type empty_record = string wrap <ocaml module="Empty_record">
type error_details <ocaml from="Error_details" t="t"> = abstract
type eventid = string wrap <ocaml module="Eventid.Public">
type full_calid = string wrap <ocaml module="Full_calid">
type gender = string wrap <ocaml module="Gender">
type group_member_role = string wrap <ocaml module="Group_member_role">
type groupid = string wrap <ocaml module="Groupid">
type hashtag = string wrap <ocaml module="Hashtag">
type json <ocaml module="Yojson.Safe"> = abstract
type messageid = string wrap <ocaml module="Messageid">
type planid = string wrap <ocaml module="Planid">
type platform = string wrap <ocaml module="Platform">
type label = string wrap <ocaml module="Label">
type recurrence <ocaml from="Icalendar" t="recurrence"> = abstract
type simple_eventid = string wrap <ocaml module="Eventid.Simple">
type slack_teamid = string wrap <ocaml module="Slack_api_teamid">
type slack_userid = string wrap <ocaml module="Slack_api_userid">
type slack_username = string wrap <ocaml module="Slack_api_username">
type storable_eventid = string wrap <ocaml module="Eventid.Storable">
type teamid = string wrap <ocaml module="Teamid">
type timeonly = string wrap <ocaml module="Util_timeonly">
type timestamp = string wrap <ocaml module="Util_time">
type timezone = string wrap <ocaml module="Util_timezone">
type token = string wrap <ocaml module="Token">
type uid = string wrap <ocaml module="Uid">
type unixtime = float wrap <ocaml module="Util_time.As_unixtime">

What bothers me about this is:

  • 2000 lines is a lot of type definitions. It might be worth splitting this file into smaller units.
  • type error_details <ocaml from="Error_details" t="t"> = abstract is just weird to me. It's not obvious to the casual reader (or to me) that it means import error_details followed by type error_details = error_details.t. The imported type is referenced as error_details instead of the more direct error_details.t in the new proposal.

The many type ... = string wrap <...> definitions wouldn't change in the new proposal. One thing we could do is move all these definitions to their own .atd file, though:

File core_types.atd:

type cardid = string wrap <ocaml module="Cardid">
type color = string wrap <ocaml module="Color">
type colorid = string wrap <ocaml module="Colorid">
type commentid = string wrap <ocaml module="Commentid">
type cusid = string wrap <ocaml module="Cusid">
type email = string wrap <ocaml module="Email">
...

File api.atd:

import core_types

type foo = {
   email: core_types.email;
}

Isn't this problem solved already with abstract and module? If yes, is it worth the trouble of implementing new syntax for essentially the same functionality?

A difference is that import would declare a whole module rather than individual types. It would makes the grouping shown above possible (import core_types) and I think it would generally facilitate the splitting into multiple files.

And would adopting this proposal mean that abstract and module would go away at some point in the future? To avoid redundant functionality and confusion. Maybe the proposal should include some deprecation plans as well? e.g. first minor version with deprecation, second major version with removal.

Yes, sure. I see no urgent need to remove the existing features, though. Deprecation notices should be good enough for a while, I think.

@mjambon
Copy link
Collaborator Author

mjambon commented Mar 24, 2022

External implementations that are not derived from ATD files should be easier to use and reference. Such implementations are typically opaque data structures but they offer a way to view them as JSON. It could be that they support only JSON export but not import.

I am not sure I understand what this means. Could you show an example maybe to illustrate the problem and how this proposal would fix it, please?

This is an example of a partial export to JSON. We have a profile object that includes an image that was loaded in memory. Exporting this profile to JSON only exports the image metadata because perhaps the full image data is too big or simply should not be returned to the client of our service.

image_metadata.atd:

type t = {
  format: image_format;
  width: int;
  height: int;
}

Handwritten image_j.ml:

(*
   Dump only the metadata of an image.

   This module name and interface comply with the requirements of an ATD module
   that supports OCaml and JSON.
*)

(* The type of an image loaded in memory *)
type t = {
  metadata: Image_metadata_t.t;
  raw_contents: string;
}

(* We can't recover an image from just its metadata *)
let read_t _ _ = failwith "not implemented"

(* Export metadata only *)
let write_t ob x = Image_metadata_j.write_t ob x.metadata

export.atd:

import image (* expects an OCaml module named 'Image_j' with the correct interface *)

(* Can only be exported to JSON, not imported from JSON since the JSON representation
   doesn't include the full image data. *)
type profile = {
  username: string;
  image: image.t;
}

This saves the application from defining whole different types for internal use and for JSON export.

@mjambon
Copy link
Collaborator Author

mjambon commented Mar 24, 2022

@aryx

A poor's man alternative is to simply use the C preprocessor (or another preprocessor) with #include to split an ATD file in different parts and have a Makefile generate the final .atd from those different parts.

Yes, but with such an approach, the namespace remains global i.e. all the definitions are in scope whether you want it or not, which isn't great. For example, I want to be able to define a new date type locally even if there's already one defined in another ATD module, possibly because we implement two different external APIs that use two different date representations:

util.atd:

type date = ...  (* our preferred date type for internal use *)

calendar.atd:

import util

type event = {
  start: util.date;
  ...
}

google_calendar.atd:

type date = ... (* whatever representation Google Calendar uses *)

@mjambon mjambon added the work in progress Work started; no guarantee about current activity label Apr 27, 2022
@aryx
Copy link

aryx commented May 2, 2022

another issue with the "abstract" trick for modularity is that you can't inherit a property defined in another file:

dune build
      atdgen src/core/Semgrep_scan_output_j.{ml,mli} (exit 1)
(cd _build/default/src/core && /home/pad/.opam/4.12.0/bin/atdgen -j -j-std Semgrep_scan_output.atd)
Cannot inherit from type abstract
make: *** [Makefile:23: all] Error 1

But maybe this would not be supported either by the ATD module system because of this sentence "External types may remain opaque" ?

@mjambon
Copy link
Collaborator Author

mjambon commented May 2, 2022

@aryx

But maybe this would not be supported either by the ATD module system because of this sentence "External types may remain opaque" ?

Right. The goal of the current proposal is to compile an ATD file without inspecting its dependencies (other ATD files, OCaml implementation, Python implementation, etc.). The following wouldn't be possible:

import foo
type bar = {
  inherit foo.t;  (* can't do that because we don't know the representation of foo.t *)
  bar: int list;
}

We could consider another proposal where this is possible, but it seems to me that it would be strictly more complicated and could be implemented later as an extension if we change our minds.

@aryx
Copy link

aryx commented May 2, 2022

yes, less important for now. I can always move the definitions to the file that does the inherit.

@mjambon
Copy link
Collaborator Author

mjambon commented May 2, 2022

Do you really need to use inherit? We don't have this in OCaml and we're doing fine.

@aryx
Copy link

aryx commented May 3, 2022

well the JSON output of the CLI is inlining for example the location: field in its parent, so inherit is perfect for that.

@aryx
Copy link

aryx commented May 3, 2022

the semgrep-core JSON output for a match result has a location: (with inside a start: end: path:), but the JSON output of the CLI for a match result has directly the start: end: path:

@MisterDA
Copy link
Contributor

A poor's man alternative is to simply use the C preprocessor (or another preprocessor) with #include to split an ATD file in different parts and have a Makefile generate the final .atd from those different parts.

Yes, but with such an approach, the namespace remains global i.e. all the definitions are in scope whether you want it or not, which isn't great.

Still, I've also got a growing atd file, and all of its types are currently in the global namespace (which is mostly fine in my use case). I'm looking at splitting the atd file, and although I can do it at the build level by concatenating all the splitted files, a textual include statement would probably be nicer (or a documentation that it doesn't exist 😉).

@mjambon
Copy link
Collaborator Author

mjambon commented May 31, 2022

@MisterDA I started working on this but it's on hold due to other priorities. See #297

@ELLIOTTCABLE
Copy link

This looks like a fantastic idea, to me. In particular, I don't see the only benefit as 'remove extra lines at the top'; there's also the benefit of namespacing and readability. type event = { start: company_shared.date; } makes it immediately obvious what's being used; and although this can be aped with naming conventions like shared_date or sth, I much prefer first-class namespacing.

Content-aware inheritance is also important for our use-case, but I'd personally rather that use a different keyword. It's a fundamentally distinct set of constraints and trade-offs. Stick with import for content-unaware abstract references like this; then use something like examine or open for content-aware "inclusion" / inheritance-compatible analysis?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design and planning Various discussions about the future work in progress Work started; no guarantee about current activity
Projects
None yet
Development

No branches or pull requests

5 participants