Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code marked as unreachable after pandas concat #5630

Closed
ldouteau opened this issue Mar 13, 2024 · 27 comments
Closed

Code marked as unreachable after pandas concat #5630

ldouteau opened this issue Mar 13, 2024 · 27 comments
Assignees
Labels
external bug typestub Issue relating to our bundled type stubs

Comments

@ldouteau
Copy link

Environment data

  • Language Server version: 2024.3.1
  • OS and version: win32 x64
  • Python version (and distribution if applicable, e.g. Anaconda): 3.11.5
  • python.analysis.indexing: true
  • python.analysis.typeCheckingMode: off

Code Snippet

import pandas as pd

a = 1
pd.concat()
a = 1
pd.concat([])
a = 1

Repro Steps

  1. Open this piece of code
  2. Code at line 7 is flagged as unreachable by Pylance

Expected behavior

Code flagged as unreachable, and thus greyed out

Actual behavior

Code should be reachable

Logs

I'm not sure what i should provide here, or if it's relevant. Let me know if more is required

@github-actions github-actions bot added the needs repro Issue has not been reproduced yet label Mar 13, 2024
@ArthurPERE
Copy link

Hello,

I have the same problem, but I am on debian 12.

How do i know it's pylance, when i disable pylance and reload, the code is suddenly reachable.

My version of python is 3.12.1 in an environment conda.
And my version of pandas is :
pandas 2.2.1 py312h526ad5a_0

@clementlefevre
Copy link

Hello same here with pylance v2024.3.1

each pandas.concat call make the following lines unreachable.

Version: 1.87.2 (user setup)
Commit: 863d2581ecda6849923a2118d93a088b0745d9d6
Date: 2024-03-08T15:20:17.278Z
Electron: 27.3.2
ElectronBuildId: 26836302
Chromium: 118.0.5993.159
Node.js: 18.17.1
V8: 11.8.172.18-electron.0
OS: Windows_NT x64 10.0.22621

@torext
Copy link

torext commented Mar 13, 2024

Have a duplicate report here with additional details possibly: #5631

@debonte
Copy link
Contributor

debonte commented Mar 13, 2024

The matching overload of concat in pandas-stubs currently returns Never, so Pylance is behaving correctly here. https://github.com/pandas-dev/pandas-stubs/blob/main/pandas-stubs/core/reshape/concat.pyi#L39

This overload was added in pandas-dev/pandas-stubs#858.

Please file an issue against pandas-stubs at https://github.com/pandas-dev/pandas-stubs/issues.

Btw, in this case there are multiple matching overloads because the input parameter is list[Any]. Following our overload matching rules we choose the first matching overload, which in this case is the new Never overload.

They seemt to be abusing Never in this case. I believe they want it to mean "not a legal call" whereas it really means "this call never returns".

@debonte debonte added typestub Issue relating to our bundled type stubs external bug and removed needs repro Issue has not been reproduced yet labels Mar 13, 2024
@debonte debonte closed this as completed Mar 13, 2024
@debonte
Copy link
Contributor

debonte commented Mar 13, 2024

Btw, while waiting for a fix, you could work around the pandas-stubs issue by commenting out the Never overload in ...\.vscode\extensions\ms-python.vscode-pylance-2024.3.1\dist\bundled\stubs\pandas\core\reshape\concat.pyi.

@overload
def concat(
    objs: Iterable[None] | Mapping[HashableT1, None],
    *,
    axis: Axis = ...,
    join: Literal["inner", "outer"] = ...,
    ignore_index: bool = ...,
    keys: Iterable[HashableT2] = ...,
    levels: Sequence[list[HashableT3] | tuple[HashableT3, ...]] = ...,
    names: list[HashableT4] = ...,
    verify_integrity: bool = ...,
    sort: bool = ...,
    copy: bool = ...,
) -> Never: ...

@torext
Copy link

torext commented Mar 13, 2024

Thanks for the explanation @debonte, I've submitted an issue to to Pandas devs here: pandas-dev/pandas-stubs#888

@Dr-Irv
Copy link

Dr-Irv commented Mar 13, 2024

Note - both of the calls below will raise an exception with pandas, so in the example by the OP, pylance is absolutely correct in terms of what is reported!

pd.concat()
pd.concat([])

@ldouteau
Copy link
Author

Note - both of the calls below will raise an exception with pandas, so in the example by the OP, pylance is absolutely correct in terms of what is reported!

pd.concat()
pd.concat([])

FYI Pylance (through pandas-stubs) doesn't mark code as unreachable after the first code, only after the second one.
About the relevance of the test case, sure it's a minimal example that doesnt make much sense.

@Dr-Irv
Copy link

Dr-Irv commented Mar 14, 2024

Please see #5640 . pylance is doing the unreachable analysis by looking at the stubs even if type checking is off.

@Dr-Irv
Copy link

Dr-Irv commented Mar 14, 2024

FYI Pylance (through pandas-stubs) doesn't mark code as unreachable after the first code, only after the second one.
About the relevance of the test case, sure it's a minimal example that doesnt make much sense.

I'm guessing that you had type checking turned off. Because if you turned it on, the first line would be flagged as a typing error.

The second line will raise an exception, so then code below it is unreachable.

What's debatable here is whether there should be some control over whether you can turn on/off the unreachable analysis, or whether pylance should be doing the unreachable analysis when you are not using type checking.

@zorion
Copy link

zorion commented Mar 14, 2024

Why is this (and #5631) closed? Is it fixed? If it is not, which open issue will fix the question?

I will refer to #5631, but since that is duplicated to this one, the arguments should be relevant here.

Pylance says it is ok to decide that it is unreachable code because pandas-stubs starts checking if all list is None then it will fail. So pylance is correct.
Pandas-stubs says it is ok to check if all list elements are None, then this will obviously fail. They claim that pylance shouldn't state that "Any" means all-None.

I've been in both sides while reading the issues but now I feel like pandas-stubs are right and the issue is that if we have Any there is a possibility that we end with a unreachable code (all-None), but it isn't guaranteed. You can receive a list including some DataFrame. Don't you?
For instance, the #5631 question.
Ok, #5631's OP could state that generate_series will return a pandas.Series, which it does, but it's not always that easy.

from typing import Any

import pandas as pd


def generate_series() -> Any:
    return pd.Series([1, 2, 3])


df = pd.concat({k: generate_series() for k in range(10)}, axis=1)

print("Unreachable?!")

Thanks!

@Dr-Irv
Copy link

Dr-Irv commented Mar 14, 2024

Ok, #5631's OP could state that generate_series will return a pandas.Series, which it does, but it's not always that easy.

I agree with the aspect of it not always being that easy. You could have a 3rd party library that is fully untyped (or improperly typed!) where the type checker cannot infer the result of a method or function from that library to be a Series or DataFrame (or an Iterable of either). So you have code that calls that 3rd party library and now the type checker thinks the result is Iterable[Any], even though you "know" it is Iterable[Series], so your only solution is to cast in order for the call to pd.concat() to work.

@erictraut
Copy link
Contributor

Why is this closed?

Pyright and pylance are working as intended here (and in compliance with the Python type system). There's nothing actionable for the pylance or pyright teams, so it's appropriate to close this issue. The behavior you're seeing is due to the way the pandas stubs are using NoReturn in overloads. I recommend that the maintainers of these stubs reconsider the way they're using this part of the Python typing system. If necessary, we can discuss ways to extend the type system to better support their needs.

@Dr-Irv
Copy link

Dr-Irv commented Mar 14, 2024

Pyright and pylance are working as intended here (and in compliance with the Python type system).

Is "reachability" part of the python type system? I don't see that in the link you provided.

Remember, with the stubs as they are in pandas-stubs, the code at #5630 (comment) passes fine with pyright on command line. So the stubs are correct with respect to both mypy and pyright as command line type checkers. So I would argue that the stubs are correct with respect to formal type checking.

You're saying they are incorrect in the context of doing "reachability analysis".

The issue here is the assumptions about reachability inferred by pyright in the context of pylance and VSCode, and the lack of control for users to turn off the reachability analysis.

@debonte
Copy link
Contributor

debonte commented Mar 14, 2024

the code at #5630 (comment) passes fine with pyright on command line

@Dr-Irv, please see my recent reply on the pandas-stubs issue. While pyright doesn't emit any diagnostics about unreachability, the Never overload effectively causes false negatives in the code after the concat call. pandas-dev/pandas-stubs#888 (comment)

@Dr-Irv
Copy link

Dr-Irv commented Mar 14, 2024

@Dr-Irv, please see my recent reply on the pandas-stubs issue. While pyright doesn't emit any diagnostics about unreachability, the Never overload effectively causes false negatives in the code after the concat call. pandas-dev/pandas-stubs#888 (comment)

@debonte and see my response at pandas-dev/pandas-stubs#888 (comment)

Command line pyright does not exhibit what you show (and neither does mypy).

The fundamental issue is about the reachability analysis done within pylance/VSCode, which is why I opened #5640

@erictraut
Copy link
Contributor

As I've already said, correct type evaluation requires reachability analysis. One cannot be done without the other. You can't just "turn off reachability analysis" and expect to get correct types.

@Dr-Irv
Copy link

Dr-Irv commented Mar 14, 2024

As I've already said, correct type evaluation requires reachability analysis. One cannot be done without the other. You can't just "turn off reachability analysis" and expect to get correct types.

But if a user turns off type analysis in VS Code, then shouldn't the reachability analysis also be turned off if one can't be done without the other?

@erictraut
Copy link
Contributor

There is no way to turn off type analysis in Pylance. All functionality within Pylance (completion suggestions, hover text, signature help, semantic tokens, etc.) relies on type analysis. If you set typeCheckingMode to "off", that simply suppresses certain classes of diagnostics. Reachability analysis and type analysis go hand in hand.

@Dr-Irv
Copy link

Dr-Irv commented Mar 14, 2024

There is no way to turn off type analysis in Pylance. All functionality within Pylance (completion suggestions, hover text, signature help, semantic tokens, etc.) relies on type analysis. If you set typeCheckingMode to "off", that simply suppresses certain classes of diagnostics. Reachability analysis and type analysis go hand in hand.

OK, that makes sense. Maybe the option that is needed is "ignore stubs included with pylance" or "ignore specific packaged stubs included with pylance" ?? At least that would give people the option to temporarily suppress the issue if any stub package like pandas-stubs had an error like this one.

@debonte
Copy link
Contributor

debonte commented Mar 14, 2024

Maybe the option that is needed is "ignore stubs included with pylance" or "ignore specific packaged stubs included with pylance" ?? At least that would give people the option to temporarily suppress the issue if any stub package like pandas-stubs had an error like this one.

That seems excessive to me. I suspect that a small percentage of our users have sufficient understanding of Python typing, stubs, Pylance/Pyright, etc. to know whether such a command would help in any particular scenario. Those that do would likely know enough to work around the issue in their code or in the stubs, thereby retaining the benefit of the unbroken/non-buggy portion of the stubs rather than throwing the baby out with the bath water. In the case of pandas-dev/pandas-stubs#888, they could cast the parameter passed to concat or remove the Never overload from the stubs. Worst-case, this type of advanced user could easily find the bundled stubs in our install directory and delete them manually.

Btw, you pointed out that Pylance shipped with the stubs from your main branch rather than your most recent release. It sounded like you might have been frustrated that we were shipping prerelease stubs. To be honest, that hadn't occurred to me before. Currently we just grab whatever's in main once a week. Would you recommend that we always ship the latest official release instead?

@Dr-Irv
Copy link

Dr-Irv commented Mar 14, 2024

Btw, you pointed out that Pylance shipped with the stubs from your main branch rather than your most recent release. It sounded like you might have been frustrated that we were shipping prerelease stubs. To be honest, that hadn't occurred to me before. Currently we just grab whatever's in main once a week. Would you recommend that we always ship the latest official release instead?

Whatever is in main has been tested with our CI and would go in the "next" release whenever I decide to do that release. So pulling once a week is fine. In fact, in this case, it helped us identify an issue that we can fix quickly (which I think we will do, based the latest remarks in the pandas-stubs issue). It would be worthwhile if you could do something (you could add a Literal to pandas-stubs/_version.pyi) that indicated which commit you pulled from in case we have to figure out the difference again.

@zorion
Copy link

zorion commented Mar 15, 2024

This worked like a charm:
#5630 (comment)

I think this is better than adding a flag saying "I don't want pylance using pandas-stub", so I wouldn't add that option.

Thanks for your great support to both teams, pylance(-release) and pandas(-stubs). I love your projects!

@rcyost
Copy link

rcyost commented Mar 18, 2024

Hi guys is this fixed? Pandas concat messes up my editor viewing.

@debonte
Copy link
Contributor

debonte commented Mar 18, 2024

Hi guys is this fixed? Pandas concat messes up my editor viewing.

It has been fixed in pandas-stubs. We will pick up that fix in our next prerelease build, which will likely ship on Wednesday.

In the meantime you can use the workaround I mentioned above.

@ahmedmohammed107
Copy link

ahmedmohammed107 commented Mar 21, 2024

A naive solution is to define my_concat function that returns pd.concat. Something like:

def my_concat(dfs_list, axis=0): return pd.concat(dfs_list, axis=axis)

df = my_concat([df_1, df_2], axis=0)

@debonte
Copy link
Contributor

debonte commented Mar 25, 2024

This issue is also fixed in our most recent stable release -- 2024.3.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
external bug typestub Issue relating to our bundled type stubs
Projects
None yet
Development

No branches or pull requests

10 participants