Tags out of order in returned list when using css to specify multiple tags #104

Closed
pushshift opened this issue Nov 11, 2023 · 5 comments

pushshift commented Nov 11, 2023

When using css selection, I want to grab two different tags (p and h3). When I use the selector like this:

html.css("p,h3")

It selects the appropriate tags but the list gives all p tags first and the h3 tag last.

Example:

<p>   1 </p>
<h3>  2 </h3>
<p>   3 </p>

I would expect the returned list to give: [<node p>, <node h3>, <node p>]

Instead it returns: [<node p>, <node p>, <node h3>]

However, if I use html.css("*") it does return them in correct order but I have to loop through and throw out all unneeded nodes.

If this is indeed a bug, I'd give it a low priority since using css("*") is an alternative where I can simply loop through and only grab what I'm interested in. I just wasn't sure if this was a bug or expected behavior.

If this is expected behavior when selecting multiple CSS elements, is there a way to get them in the order they appear in the parent (similar to using "*" as the CSS selector)?
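For reference, the css("*") workaround described above can be sketched roughly as follows. This is only an illustration: `filter_tags_in_order` is a hypothetical helper name, and it assumes each node exposes its tag name as `.tag`, as selectolax nodes do. Stand-in objects are used so the snippet runs without any parsing library installed.

```python
from types import SimpleNamespace


def filter_tags_in_order(nodes, wanted_tags):
    """Keep only nodes whose tag is wanted, preserving the input order."""
    wanted = set(wanted_tags)
    return [node for node in nodes if node.tag in wanted]


# Stand-in nodes for illustration only; real code would pass html.css("*").
all_nodes = [
    SimpleNamespace(tag="html"),
    SimpleNamespace(tag="body"),
    SimpleNamespace(tag="p"),
    SimpleNamespace(tag="h3"),
    SimpleNamespace(tag="p"),
]

ordered = filter_tags_in_order(all_nodes, {"p", "h3"})
print([node.tag for node in ordered])  # ['p', 'h3', 'p']
```

With selectolax itself, the call would presumably look like `filter_tags_in_order(html.css("*"), {"p", "h3"})`, since css("*") already returns nodes in document order.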

Also, please provide Patreon or Bitcoin wallet if possible so I can contribute for your time. Thank you for creating such an amazing tool. I use this often since it is lightweight, efficient and easy to use.

Owner

rushter commented Nov 15, 2023

Yeah, that's indeed unexpected behavior. I will have a look a bit closer this week.

@lexborisov is there a way to fix this? It looks like both modest and lexbor are affected.

> Also, please provide Patreon or Bitcoin wallet if possible so I can contribute for your time. Thank you for creating such an amazing tool. I use this often since it is lightweight, efficient and easy to use.

That would be unfair to take all the credit for this library since most of the hard work is done by @lexborisov. @lexborisov do you accept donations?

@lexborisov

Hi @rushter @pushshift

> Yeah, that's indeed unexpected behavior. I will have a look a bit closer this week.
>
> @lexborisov is there a way to fix this? It looks like both modest and lexbor are affected.

Yeah, it's my fault.
I'll try to fix it by Monday.

> > Also, please provide Patreon or Bitcoin wallet if possible so I can contribute for your time. Thank you for creating such an amazing tool. I use this often since it is lightweight, efficient and easy to use.
>
> That would be unfair to take all the credit for this library since most of the hard work is done by @lexborisov. @lexborisov do you accept donations?

I seriously hadn't considered accepting donations. It doesn't seem to make sense, since not that many people would donate.
@rushter you can safely accept donations. It's your binding and people like it. I don't see anything wrong with it.

@lexborisov

@pushshift @rushter

Sorry, I haven't forgotten about this issue. I have a lot to do at my day job. I hope to solve it soon.

lexborisov added a commit to lexbor/lexbor that referenced this issue Feb 8, 2024
Two factors led to a complete rewrite of the algorithm for finding nodes by
selectors:
1. The order in which the nodes were found.

Previously, the algorithm returned matched nodes in an order of its own.
This matched neither the specification nor the behavior of modern browsers.

For example, the correct order:
    HTML: <div><p class="x"></p><p id="y">"abc"</p></div>
    Selectors: .x, div
    Result: div, p

Previous result: p, div.

This relates to issue rushter/selectolax#104 on GitHub.

2. A limitation on the nesting of pseudo-class functions.

The specification does not limit the nesting of pseudo functions in any way.

For example:
    Selectors: :not(:not(:not(:not(:not( <and 4k times :not()> )))))

Previously, all nested pseudo-class function calls were made on the stack.
This caused a stack overflow with deep nesting.

Now the stack is not used for nested functions, and there is no recursion.
This makes the code safer.
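
As a rough illustration of why avoiding recursion helps with deep nesting, here is a toy sketch that evaluates a chain of nested :not() functions by counting levels in a loop. It is not lexbor's implementation: `matches_nested_not` is a hypothetical name, and it only handles a bare type selector (such as `p`) at the innermost level.

```python
def matches_nested_not(selector: str, tag: str) -> bool:
    """Evaluate :not(:not(...(type)...)) against a tag name without recursion."""
    prefix = ":not("
    depth = 0
    pos = 0
    # Count the nesting levels in a loop instead of recursing per level.
    while selector.startswith(prefix, pos):
        pos += len(prefix)
        depth += 1
    inner_end = len(selector) - depth
    if inner_end < pos or selector[inner_end:] != ")" * depth:
        raise ValueError("unbalanced parentheses")
    inner = selector[pos:inner_end]
    result = inner == tag
    # Each :not() level negates the result once, so only the parity matters.
    if depth % 2 == 1:
        result = not result
    return result


# 4000 nested levels: far beyond Python's default recursion limit of 1000.
deep = ":not(" * 4000 + "p" + ")" * 4000
print(matches_nested_not(deep, "p"))   # True
print(matches_nested_not(deep, "h3"))  # False
```

Because the loop keeps only a counter and a position, memory use does not grow with nesting depth, which is the property the commit message describes.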

Also, options have been added to change the search behavior.
Added new tests and fuzzer.
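
The ordering change in point 1 can be illustrated with a small sketch that uses Python's standard `xml.etree.ElementTree` rather than lexbor. The predicates `has_class_x` and `is_div` are hand-written stand-ins for the selectors `.x` and `div`, and the "per selector" variant is only one way the previous ordering could arise, not a description of the old algorithm.

```python
import xml.etree.ElementTree as ET


def select_in_document_order(root, predicates):
    """Return elements matching any predicate, in document (pre-order) order."""
    return [el for el in root.iter() if any(pred(el) for pred in predicates)]


def select_per_selector(root, predicates):
    """Return matches grouped by selector: all hits for the first, then the next."""
    return [el for pred in predicates for el in root.iter() if pred(el)]


def has_class_x(el):
    # Stand-in for the selector ".x".
    return "x" in el.get("class", "").split()


def is_div(el):
    # Stand-in for the selector "div".
    return el.tag == "div"


root = ET.fromstring('<div><p class="x"></p><p id="y">abc</p></div>')
selectors = [has_class_x, is_div]

print([el.tag for el in select_in_document_order(root, selectors)])  # ['div', 'p']
print([el.tag for el in select_per_selector(root, selectors)])       # ['p', 'div']
```

Walking the tree once and testing every selector at each node yields document order regardless of the order in which the selectors are listed.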

@lexborisov

@rushter @pushshift

Sorry, it took time and a complete rewrite of the algorithm.
Fixed in lexbor/lexbor@7ed557d

Owner

rushter commented Feb 11, 2024

I've deployed a new release with an updated lexbor backend.

rushter closed this as completed Feb 11, 2024