Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redo internal cell memory layout #4461

Draft
wants to merge 39 commits into
base: main
Choose a base branch
from
Draft

Conversation

widlarizer
Copy link
Collaborator

@widlarizer widlarizer commented Jun 17, 2024

Brief

What are the reasons/motivation for this change?
small is fast!
Explain how this is achieved.
make cell small.
If applicable, please suggest to reviewers how they can test the change.
right now, nothing works

Full

Core idea

Yosys is conceptually flexible in what a cell can be. This makes it easy to do odd custom things that match use cases. The way this is achieved is inefficient for commonly represented types. Each cell has a dict from interned parameter name strings to parameter value constants. It also has a dict from port name strings to signals represented by SigSpecs. However, most cells that Yosys uses in common flows have a fixed set of ports and parameters. So instead of dynamically growing space for a dict and hashing to access the value stored in it, we can have preallocated space and a compile-time known mapping if we just use a struct specific to a given cell type:

// $not etc
struct RTLIL::Unary {
	SigSpec a;
	SigSpec y;
	Const a_width;
	Const y_width;
	Const is_signed;
};

We can then use a tagged union to construct a new, slimmer, faster to access cell type, with a fallback pointer to the old cell structure.

struct RTLIL::Cell
{
	RTLIL::IdString type;
	RTLIL::IdString name;
	RTLIL::Module* module;
	union {
		RTLIL::Unary not_;
		RTLIL::Unary pos;
		RTLIL::Unary neg;
		RTLIL::OldCell* flexible;
	};
};

Wait, wasn't this supposed to be a Yosys rewrite in Rust?

This isn't next generation Yosys (project nyan). Rust or Zig would be better to do what I'm doing, but Yosys is a C++ codebase, so C++ it is. This is a surgical, gradual transition to more efficient internals with minimal impact on stuff outside kernel/rtlil.h and kernel/rtlil.cc. The Yosys codebase (~200k LOC) and user plugins have a large number of code like cell->getParam(ID::A_WIDTH), we have to reproduce these interfaces rather than change them all to cell->not_.a_width. There's a good number of such functions. The annoying part is mocking for example the parameters dictionary. Instead, there's struct FakeParams which provides iterators to hide this. Same goes for connections. Implementations of getParam etc are simple switches over the cell type so they should inline well.

However, there's a lot of "internal" cells we may want to be fast, and their struct layouts and methods are basically duplicating the same information. Manually writing these methods is also very error prone, accessing tagged unions incorrectly is common UB. This is why, to add support for the various cell types, the next stage will get a bit weird:

Writing a compiler

Modern C++ offers multiple ways to have code that thinks about types. However, what we need here is to construct functions and construct their return types based on struct layouts. This level of reflection if possible seems too difficult to achieve with templates and constexpr and such. We don't have Rust's proc_macro, we don't have Zig's comptime, so one possible solution is a light Python utility which from information-dense Python or plain text structures emits generates C++ which is committed into the repository and checked with CI. Yosys already depends on Python in a similar way, see passes/pmgen/pmgen.py.

Beyond the cell

It's possible that the changes outlined above alone won't make a huge difference. RTLIL::Const is fairly inefficient as well. Size is virtuous in a positive feedback loop: the fewer bytes per cell, the more it helps to save a single byte per cell.

Size goals

The current flexible RTLIL::Cell is 192 bytes. The proposed one is a bit bigger while it supports only unary elements, but doesn't require external allocations. With the current flexible cell, this simple verilog eats 6kB memory per inverter-and-FF-bit:

module invert_n_times #(parameter N = 10000) (
    input wire in,
    output wire out
);
    wire [N:0] intermediate;

    assign intermediate[0] = in;

    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : invert_loop
            assign intermediate[i+1] = ~intermediate[i];
        end
    endgenerate

    assign out = intermediate[N];
endmodule

This suggests that focused effort can bring it down by a significant factor.

@widlarizer
Copy link
Collaborator Author

This was meant to be a draft PR and I hit the wrong button writing the description. Heck

@povik
Copy link
Member

povik commented Jun 17, 2024

fundamental issues with the Yosys IR

What would be the fundamental issues provided we switch to the buffer-normalized mode of #3967 (which this PR should be compatible with)?

don't come with benchmarks

I pushed for @widlarizer to open a draft PR no matter what state this is in. Benchmarks can come later.

@povik
Copy link
Member

povik commented Jun 17, 2024

so one possible solution is a light Python utility which from information-dense Python or plain text structures emits generates C++ which is committed into the repository and checked with CI. Yosys already depends on Python in a similar way, see passes/pmgen/pmgen.py.

FWIW pmgen-generated C++ code is not checked into the repository. It's treated as a build artifact.

@whitequark
Copy link
Member

whitequark commented Jun 17, 2024

What would be the fundamental issues provided we switch to the buffer-normalized mode of #3967 (which this PR should be compatible with)?

That's an improvement, although other issues remain (like the unnecessary coarse/fine cell split, or the combined treatment of logic and inout wires in the same netlist).

@widlarizer
Copy link
Collaborator Author

Benchmarks are coming. The work presented here right now is exclusively an incomplete proof of concept. Incomplete as in it immediately segfaults on anything in the current state. I want to find a case where some cells being fast tracked really shows improvements. If we find better fundamental approach, all this is free to get canned, in such a case it still will have been a helpful exercise in my eyes

@widlarizer
Copy link
Collaborator Author

the unnecessary coarse/fine cell split

Can you elaborate on the adverse effects of this? From a performance point of view it doesn't seem too concerning, but I may be missing something.

combined treatment of logic and inout wires in the same netlist

I wanted to redesign the netlist first, but that was before I learned about bufnorm, after which it was suggested to me by povik that redoing how connections are handled would require excessive code changes and that bufnorm + smaller internal cells takes priority as a result.

@whitequark
Copy link
Member

From a performance point of view it doesn't seem too concerning, but I may be missing something.

I don't think this affects performance but in general I don't care a whole lot about Yosys performance; as you may know I'm quite happy using the YoWASP version despite a ~2x perf hit, so clearly that much performance loss isn't an issue to me. But the IR is poorly defined and difficult to work with (I think the single most obnoxious issue from my perspective is the batshit behavior of \init, which is directly downstream from the coarse/fine split and some other bad decisions) and I'd much rather see that addressed than any performance improvements.

@povik
Copy link
Member

povik commented Jun 17, 2024

I think the single most obnoxious issue from my perspective is the batshit behavior of \init, which is directly downstream from the coarse/fine split and some other bad decisions

I don't follow. How is \init related to the coarse/fine split, if that's what you are saying?

@whitequark
Copy link
Member

whitequark commented Jun 17, 2024

I don't follow. How is \init related to the coarse/fine split, if that's what you are saying?

\init is only an attribute and not a parameter because of fine cells (which would otherwise have to triple in quantity), which in turn only lack parameters because of BLIF export. Nobody cares how many fine cells Yosys has, there's hundreds already, just do the 0, 1, X variant for them and stop imposing the cost of \init handling on every other pass.

@widlarizer
Copy link
Collaborator Author

YoWASP

I would be remiss if I didn't bring up that the road to performance I'm talking about is through major runtime memory size reduction which should matter for use cases of EDA in browsers.

IR is poorly defined and difficult to work with

As I understand, there are many open questions on design decisions in Yosys, as viewed from the perspective of correctness and developer experience and maintainability, but I'm not aware of any interaction this PR would have on those topics, other than this being an opportunity to have correct-by-construction, rather than correct-by-cell-checker cells.

@povik
Copy link
Member

povik commented Jun 17, 2024

Nobody cares how many fine cells Yosys has, there's hundreds already, just do the 0, 1, X variant for them and stop imposing the cost of \init handling on every other pass.

Huh, so e.g. the state writeback in sim would involve changing the cell types on fine flip-flops, and adjusting a parameter on coarse flip-flops. In exchange we make init disappear. I think I like the change.

I'd much rather see that addressed than any performance improvements.

We should mention a bit about the background for the PR. It's motivated by company interests which allow us to invest significant effort into performance improvements. If we can fix some outstanding ills of the IR along the way, that's all the better, but it's not the case that we can pick any IR work instead. I guess we shouldn't be doubling down on a wrong design decision by writing specialized memory layouts around it, but that doesn't seem to be happening, at least not with the issues brought up so far.

@whitequark
Copy link
Member

I would be remiss if I didn't bring up that the road to performance I'm talking about is through major runtime memory size reduction which should matter for use cases of EDA in browsers.

Interestingly, YoWASP already greatly benefits from such size reduction compared to native builds because it uses an ABI with 32-bit pointers. This is for example why nextpnr is ~on par with a native build, despite other inefficiencies.

I do welcome performance improvements (who doesn't?) but I feel that there are many other things to address in Yosys that I'm a lot more excited about, since the IR structure is a major pain point and performance rarely is, at least from where I stand.

other than this being an opportunity to have correct-by-construction, rather than correct-by-cell-checker cells.

I'm uncertain on whether this design would further entrench the existing IR flaws or not.

I think ultimately I'll choose to withhold any judgement I would have before I see benchmarks. That seems only right if this was intended to be a draft (which I didn't realize at first); I don't want to be negative on something that is clearly not the final work.

Huh, so e.g. the state writeback in sim would involve changing the cell types on fine flip-flops, and adjusting a parameter on coarse flip-flops.

Yes. The problem with \init is that it's antithetical to correctness-by-construction because it's just ignored on wires with no flops, and conflicts resolve kind of randomly? We've had a lot of difficulty with emitting it in Amaranth correctly as well, but that's not the main motivation.

@whitequark
Copy link
Member

It's motivated by company interests which allow us to invest significant effort into performance improvements.

So on second thought and having read this, I think I'll just leave this PR to be driven by the company interests. Looks like it pinged me because of the CXXRTL code owner bit, I'll pre-emptively sign off on any changes there since they're quite minimal.

@widlarizer
Copy link
Collaborator Author

This PR was initiated by my personal interest which I found to be well aligned with company interests of making specifically opt_expr faster

@widlarizer
Copy link
Collaborator Author

Major road block identified: Yosys until now has been built with the assumption that at will one can modify cell types and erase and add params and ports. At coverage tag "opt.opt_expr.eqneq.isnot" that's exactly what's done. With my changes, it is possible to attempt it but it's union access UB. As a solution, the type field could become private, which will create large code change where read-only type access is replaced with a const getter (from cell->type to cell->type()). This will bring large code changes. Or we can just fix the places where types are modified (there seem to be 79 locations where type is set outside of addCell) and temporarily keep the UB hazard for new code

@widlarizer widlarizer mentioned this pull request Jul 18, 2024
@widlarizer widlarizer added the status-superseded Status: Work continues in a different PR or was made redundant label Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status-superseded Status: Work continues in a different PR or was made redundant
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants