/descendant::span[1] should not be equivalent to //span[1] #93

Closed
yayuyokitano opened this issue Mar 10, 2024 · 1 comment

@yayuyokitano

Note: I ran into this issue while using htmlquery, but it does not appear to be fixed here either, so I am raising the issue here.

Per the XPath spec, /descendant::span[1] and //span[1] are not the same.

/descendant::span[1] should select the first span descendant of the context node, in document order, while //span[1] should select every span that is the first span child of its parent. This package currently seems to treat both like //span[1].

Example:

<div id="wrapper">
  <span>span one</span>
  <div>
    <span>span two</span>
  </div>
</div>

In this case, //div[@id='wrapper']//span[1] should match both span one and span two, because each is the first span child of its parent. However, //div[@id='wrapper']/descendant::span[1] should match only span one, because the position is counted among all span descendants of div[@id='wrapper'], not among the children of each direct parent.

See the XPath spec, section 2.5:

NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.
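
The distinction in the spec note can be reproduced outside this package. The following is a minimal sketch using only Python's standard library, not this package's API: it assumes that ElementTree's limited XPath dialect applies `[1]` per parent (which mirrors //span[1]), and it uses `iter()` as a stand-in for the descendant axis, which ElementTree does not support directly.

```python
import xml.etree.ElementTree as ET

# The example document from this issue.
DOC = """
<div id="wrapper">
  <span>span one</span>
  <div>
    <span>span two</span>
  </div>
</div>
"""

wrapper = ET.fromstring(DOC)  # the root element is div[@id='wrapper']

# //span[1] semantics: every span that is the first span child of its parent.
# ElementTree evaluates the [1] predicate relative to each element's parent.
first_per_parent = [e.text for e in wrapper.findall(".//span[1]")]

# /descendant::span[1] semantics: the first span descendant in document order.
# Assumption: iter() walks descendants in document order, so taking the first
# item stands in for the unsupported descendant:: axis.
first_descendant = [e.text for e in list(wrapper.iter("span"))[:1]]

print(first_per_parent)   # ['span one', 'span two']
print(first_descendant)   # ['span one']
```

The two lists differ, which is the behavior the issue expects from the two XPath expressions.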

@yayuyokitano
Author

A likely closely related observation, which may or may not be a bug: following-sibling does not seem to take the context node into account.

Example:

<div id="wrapper">
  <a>anchor 1</a>
  <div>div 1</div>
  <a>anchor 2</a>
  <div>div 2</div>
  <div>div 3</div>
</div>

If I run a queryall here on //div[@id='wrapper']/a/following-sibling::div[1], I would expect both div 1 and div 2 to be returned, since there are two matching anchors, each with its own first following div sibling. However, only one match (div 1) seems to be returned.

However, I am not entirely sure this is even a bug: on an initial check, I did not find it spelled out in the XPath spec as clearly as the main issue (though I did not read through it very thoroughly either). All I know is that the behavior I expected matches how the Chromium DevTools XPath implementation works.
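
To make the expected result concrete, here is a minimal sketch in plain Python (standard library only, not this package's API) that evaluates the step by hand: for each `a` child of the wrapper, it takes the first following sibling `div`, then collects the results without duplicates. The helper name `first_following_sibling` is hypothetical, and the per-context-node evaluation is my reading of how a location step with a positional predicate is meant to work.

```python
import xml.etree.ElementTree as ET

# The example document from this comment.
DOC = """
<div id="wrapper">
  <a>anchor 1</a>
  <div>div 1</div>
  <a>anchor 2</a>
  <div>div 2</div>
  <div>div 3</div>
</div>
"""

wrapper = ET.fromstring(DOC)
children = list(wrapper)  # direct children, in document order


def first_following_sibling(siblings, index, tag):
    """Return the first element after siblings[index] with the given tag, or None."""
    for candidate in siblings[index + 1:]:
        if candidate.tag == tag:
            return candidate
    return None


# Apply following-sibling::div[1] once per matching <a> (one context node each),
# then merge the hits into a single duplicate-free result.
matches = []
for i, child in enumerate(children):
    if child.tag != "a":
        continue
    hit = first_following_sibling(children, i, "div")
    if hit is not None and hit not in matches:
        matches.append(hit)

print([m.text for m in matches])  # ['div 1', 'div 2']
```

Under this reading, anchor 1 contributes div 1 and anchor 2 contributes div 2, so two nodes are expected rather than one.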

zhengchun added a commit that referenced this issue Mar 28, 2024