-
-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor AST codegen #4442
Comments
I think going for simplicity is generally a good thing, and there are many flaws in the current implementation that we should fix as we move forward. I just want to elaborate on something, You've said:
I beg the differ, I think using syn types is the standard way for doing any meta programming in Rust, A custom schema developed in-house although being simpler since we don't need the information all at once is much harder to learn as there would be no prior knowledge nor documentation/guides out there. On the other side, While not all people have used syn before; The ones with experience developing procedural macros(which are most likely to want to contribute to this area) should be familiar with it. Also, people who are new to Rust's procedural macros have lots of resources to study and they can carry this knowledge with them throughout their journey as Rust developers. Edit:Maybe we should get rid of RTypes, We can use |
Oh dear! It seems we fundamentally disagree, and both feel strongly about it. I'll try and debate this a bit more:
I do see your argument here. BUT I don't see what codegen is doing as "meta programming" in the usual sense. "Meta programming" in Rust basically means proc macros. A proc macro:
Whereas in codegen we are doing something quite different:
These are relatively complex operations, which go far beyond simple AST manipulation. So I'd say it's less like "meta programming", and more like a compiler. And in a compiler, the pattern of creating an intermediate representation to work from is a common idiom. That's what I'm suggesting here (the "schema" is an IR). Here's an analogy from another part of Oxc: In I feel that staying in "AST land" in the codegen is operating at wrong level of abstraction, in the same way as trying to resolve bindings direct from the AST would be. Just like we're better off using the
Yes it's true that It's true that our "schema" types are non-standard, but I don't think they're hard for anyone to understand: pub struct StructDef {
pub name: TypeName,
pub fields: Vec<FieldDef>,
pub has_lifetime: bool,
}
pub struct FieldDef {
/// `None` if unnamed
name: Option<String>,
r#type: TypeName,
} (compare to
The codegen is becoming completely fundamental to Oxc. I really don't think we should require in-depth knowledge of |
After going through the comments and the code itself, I think some kind of guidance on creating a new generator is needed, everything else is fine by me and they can be iterated upon. It's often hard to get to the final state of what we wish for for this kind of partially ad-hoc code. For example, the test runners in So given the code is not going to change often in In software engineering we should spend more time on crafting code that will change a lot for a higher impact value. |
Closing as "issue too broad". Prefer incremental or more concrete changes. |
This PR introduces `EarlyCtx` and `LateCtx` in place of the old `CodegenCtx`, Early passes operate at the AST level while generators and other late passes operate on the schema. It will also replace the confusing `RType` name with something more idiomatic ~~(open for suggestions, I haven't found a good name yet)~~ I've named it `AstType` and dropped the `R` prefix for `REnum` and `RStruct`. There are some qualities of life improvements too, Things like `to_type_elide` can be used to simplify the code. Related to #4442 (and can potentially mark it "close as fixed").
rzvxa and I came to agreement in the end. Codegen has been refactored to be more schema-centric in #4682. |
I've been reading through the AST codegen (most recent version as in draft PR #4404). @rzvxa and I are meeting to discuss this evening, but just to put some thoughts down on "paper" in advance...
My opinion of current state
The codegen is doing a great job of automating creating a ton of Oxc's code now. It makes it possible to make changes to AST without having to make corresponding updates in a bunch of other places (and probably making mistakes/omissions in the process). This is hugely valuable.
However, personally I find the implementation of the codegen itself complicated and hard to understand. The primary reasons for this are:
syn
types throughout, which are unfamiliar to those of us who are not macro maestros.syn
types with our own "metadata" types, but with similar names (e.g.TypeDef
,TypeRef
).Proposed refactor
I propose that we refactor it along these lines:
Generate from schema
How it is now
Currently the way codegen works (roughly speaking, I may not have understood it correctly) is:
.rs
source files and generateREnum
+RStruct
types, which are mostly wrappers aroundsyn
types.R*
types.Schema
object which is a simpler representation of the types.Proposal: Simplify the pipeline
.rs
files.syn
types).Arguments against
There are valid arguments against what I'm suggesting:
syn
Ident
s where there was already anIdent
in source file AST that can be reused.syn
types is to make the code flexible so it could converted easily to macros if we want to do that.However, I feel like we are optimizing for the wrong things here.
Arguments for
The main argument is simplicity.
I have high hopes that the codegen is going to grow and grow. We can use it to generate more code, and also pull off some more advanced tricks which would be infeasible/unsafe to do by hand. e.g.:
inherit_variants!
(including calculating an optimal set of values for the enum discriminants).Serialize
impls.As we get into more complex areas, working from a simpler model will be very valuable. Especially when generating
unsafe
code, it's vitally important that the codegen creating that code is easy to understand, so can validate the soundness of its logic.Building + using the schema
I suggest the following steps (mostly as it is now, but a little bit more structured):
.rs
files codegen need to read.Vec
of(TypeId, TypeName, Item)
containing all types with#[ast]
attr. Throw away everything else.TypeId
is index into thisVec
. Simultaneously build aHashMap<TypeName, TypeId>
.Vec
again, this time fully parsing the structs + enums to createStructDef
/EnumDef
objects for each. "Links" between types can useTypeId
s (as we can resolve type names to IDs after step 3). This is our schema.NB: If some generators need access to original
syn
types for some reason, that's not impossible - they can get them via theVec
built in step 3.Code style
Again, aiming for simplicity... While we are refactoring, I think we could also move to a more imperative style in parts. In my opinion, the "depth" of abstractions is a little excessive at present. If the structure was less labyrinthine, I think we also won't need
RefCell
s any more, which will further simplify the code. I would hope this will make the code easier to follow.The text was updated successfully, but these errors were encountered: