Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When using Google provider + Python the import generated file get so big that the IDEs can't handle it #1606

Closed
fbrodrigorezino opened this issue Mar 1, 2022 · 14 comments
Labels
bug Something isn't working language/python performance Gotta go fast priority/important-longterm Medium priority, to be worked on within the following 1-2 business quarters. provider get / generation provider/google
Milestone

Comments

@fbrodrigorezino
Copy link

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

cdktf & Language Versions

Language: Python
Version: All

Affected Resource(s)

Developer experience.

Expected Behavior

When you install the google provider, you can use it like any other python package.

Actual Behavior

The contact of the imported __init__.py is so big, that IDEs can't read the symbols, so we don't have any code hint.

Steps to Reproduce

  1. Install cdktf in your python project
  2. Add google provider
  3. Run cdktf get
  4. Open PyCharm, and try to use the generated classes.

Important Factoids

This is not a problem in the cdktf itself. But how jsii generated python packages.
I'm posting it here, as I think terraform would be interested in collaborating to jsii development and solve it.

You can find extra information here: aws/jsii#2436

@fbrodrigorezino fbrodrigorezino added bug Something isn't working new Un-triaged issue labels Mar 1, 2022
@jsteinich
Copy link
Collaborator

We've previously ran into this with the AWS provider. There we explicitly carve up the code into submodules which I believe are more IDE friendly. Could most likely do the same with the Google provider.

@thomasschockaert
Copy link

Just chiming in: I am experiencing a similar nuisance with the Azure provider - albeit due to prospector (python) maxing out a core when parsing the file because it's so big - it doesn't crash, just can't finish in a timely fashion. Pylance in VS Code seems to handle it fairly ok, so it can be used as fallback for certain linting purposes, but then you're dependent on VS Code settings for proper linting, instead of a prospector config file.

I'm unsure if carving up into submodules would solve that prospector issue, as it would still need to parse just as much code, maybe more - but maybe it can do it more efficiently if it's split up.

@ansgarm
Copy link
Member

ansgarm commented Mar 2, 2022

@thomasschockaert You could try that with the AWS provider (which is already split up) and report back. Would be helpful to know 👍

@ansgarm ansgarm added language/python needs-priority Issue has not yet been prioritized; this will prompt team review provider get / generation provider/google and removed new Un-triaged issue labels Mar 2, 2022
@thomasschockaert
Copy link

Hi @ansgarm,

I'm not experiencing the issue that @fbrodrigorezino mentioned with either azure or aws code when using VS Code, so there is that. (making my comment slightly off topic, unless this issue becomes/is relevant for splitting up all provider code).

Wall of text below on the how/why, imo, splitting up is relevant for all large provider code bases for dev tools and consequently IDE's.

I tested the aws provider as you requested - the slowness is less present, but still is for the larger files - it's definitely a prospector being "slow" issue (more specifically one or more of the tools it runs).

I've tried the following files (in linecount order) from the aws package:
497302 ./wafv2/init.py
48982 ./ec2/init.py
34222 ./vpc/init.py
24846 ./appmesh/init.py

The first one doesn't complete in a timely manner (+15mins).
The second one does complete, but it also takes a little over 2 minutes.

If I compare that to the azure package we get:
778662 ./init.py

=> with similar results: too slow.

I'm unsure how effective the splitting-up procedure would be, but if it yields files above 40k lines, it's gonna be slow but at least functional, anything above that 40k mark is unbearably slow (on a i7-1185G7 - ymmv of course, but ~500k lines just takes ages).

In the name of being thorough, I ran and timed each of the tools manually as well (with the same configs as what prospector would pass) and it seems like it's only pylint and mypy that are causing the biggest slowdowns: for the aws ec2 init file alone, pylint takes 00:01:35 and mypy takes 00:01:17, so even if these are run as efficiently as possible, that's still over a minute to get results. Mypy also drags in a lot of the other files (as that's what it does for type checking), but does so very efficiently. If it needs to draw in that wafv2 init file as well, that number changes drastically though.

All-in-all I'm leaning towards the idea that splitting up code won't do much for mypy, but will do it for pylint. So it's ultimately a matter of the tools having to check a library that's 700k+ (azure) or 1.26m+ (aws) lines of code to provide insightful information - and that will inevitably be slow. Moving the mypy and pylint checks to pre-commit instead of handling them "each time when you save a file in IDE" is a necessary workflow modification in my opinion.

I do enjoy the split up version of the generated code more than a single big file, because I don't need to tell the IDE to turn off pretty much every feature it has when opening such a large file. VS Code even automatically disables certain things in such case.

So my conclusion is: split up == good, 700k+ lines of code == slow for dev tools and in some cases also slow for IDE; split up helps IDE, but does not always help all dev tools (it does for pylint, but not for mypy).

Kind regards,

Thomas

@ansgarm
Copy link
Member

ansgarm commented Mar 3, 2022

Thank you for your extensive profiling @thomasschockaert!
Good that code splitting helps for at least some of the tools 👍

The main problem with the wafv2 resources is, that the underlying schema at AWS is recursive which Terraform does not support. So the workaround in the AWS Terraform provider is to explicitly define the schema up until a certain depth. As the CDKTF bases its generated code on that schema, it produces a ridiculous amount of code for that schema instead of using recursion (which would be possible with JSII, I think). So tackling that issue could improve this problem.

@DanielMSchmidt DanielMSchmidt added needs-research priority/important-longterm Medium priority, to be worked on within the following 1-2 business quarters. and removed needs-priority Issue has not yet been prioritized; this will prompt team review labels Mar 3, 2022
@jsteinich
Copy link
Collaborator

The current splitting approach is really only viable for a handful of providers (~10 I'd say) since it requires manual effort (maybe could be automated some). I checked a few other providers:

  • Google: ~471k lines
  • OCI: ~738k lines
  • IBM: ~301k lines (this is the 10th largest provider by number of resources (~219) from a snapshot of data that I have)

It would be nice to have a metric (number of resources / loc / something else) to use as guidance as to when splitting a provider is necessary for IDE functionality.

@thomasschockaert would it be possible to exclude the generated files from the tools?

@thomasschockaert
Copy link

Hi @jsteinich

Both mypy and pylint support ignoring files / patterns, but in the case of mypy this only affects "which files to discover for checking", not "when following imports" (cfr --exclude docs mypy).

Pylance for VS Code doesn't seem to have issues with the large files when it comes to providing code hints and completion, so that part of the dev experience is intact when using VS Code.
The propsector / pylint / mypy / .... part of the dev experience is affected and directly related to either the size of the code base it's checking or the per-file-size, depending on the tool.

Everything is optional, so maybe it all comes down to guidance on how to use a specific dev environment when dealing with large files. Most tools do just fine, some are just slow, but those things only pose problems in certain workflows like when you want to use the "problems" tab in VS code which implies semi-continuous (on save/edit) running of the tools, versus a workflow that runs the tools only as a pre-commit hook.

Maybe this means it's 'just' a "nice to have code split up" as it's not a "full fix" and very dependent on the workflow you choose (i.e. you could change workflow for a project based on the fact that there are a lot of large files). None of this is unique to cdktf; anything with large files will present this very same issue for an IDE, and it's not always fixable per definition.

@fbrodrigorezino
Copy link
Author

JetBrains IDEs by default don't support files with more than 25MB. So the experience gets very poor for CDKTF for a variety of providers. Given the number of users of this IDE, it might be smart as product management to improve the support for it.

Another thing, though in VS Code it "works" for me it's been very slow and very very memory-consuming, so I'd say that's far from a nice experience.

@jsteinich
Copy link
Collaborator

@thomasschockaert and @fbrodrigorezino thanks for the information. It makes sense and is very helpful.

After doing some more investigation, this seems to only be an issue with how jsii generates python code. Other languages have a a file per resource (need to double check go). Except for a couple edge cases (aws waf), this results in fairly reasonably sized files.

I haven't looked into the feasibility of making python generate a file per resource in jsii, but that seems like a much better route to pursue than just splitting into arbitrary (generally service level) submodules.

@fbrodrigorezino
Copy link
Author

@jsteinich yes, you are correct, you can find this info in the Important Factoids

@nlucansk
Copy link

For JetBrains you can use idea.max.intellisense.filesize=999999 (Edit custom properties) + increase the allocated memory (Change memory settings). Then you will get "hints / indexing will work" (but experience is slow)

Both are in the Help "menu":

image

I am using azurerm + python

It really needs to be separated.
Also generated docs on constructs.dev are slow (I guess thanks to that)

@mcouthon
Copy link

mcouthon commented Sep 4, 2022

It's not just the IDE. Running mypy on a project that uses these imports (Azure in our case) runs forever (maybe 6 minutes vs ~20 seconds without it). And yes, we're excluding the import folders, but as mentioned above, mypy follows the imports, and we won't have proper typing support if we wouldn't allow that.

For reference, the azurerm __init__.py file is 50M in size, with 1030915 lines (that's over 1 million lines).

@xiehan xiehan added this to the 0.13 (tentative) milestone Sep 19, 2022
@xiehan xiehan added performance Gotta go fast and removed needs-research labels Oct 3, 2022
@ansgarm
Copy link
Member

ansgarm commented Oct 6, 2022

Hi 👋
We recently released CDKTF 0.13 which splits each resource into its own namespace which has the neat side effect of drastically reduced file size for Python files. Read more about the release and its backgrounds: https://cdk.tf/0.13

@ansgarm ansgarm closed this as completed Oct 6, 2022
@github-actions
Copy link
Contributor

I'm going to lock this issue because it has been closed for 30 days. This helps our maintainers find and focus on the active issues. If you've found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working language/python performance Gotta go fast priority/important-longterm Medium priority, to be worked on within the following 1-2 business quarters. provider get / generation provider/google
Projects
None yet
Development

No branches or pull requests

8 participants