Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalizing the public API #1122

Closed
jugglinmike opened this issue Dec 10, 2017 · 8 comments
Closed

Normalizing the public API #1122

jugglinmike opened this issue Dec 10, 2017 · 8 comments

Comments

@jugglinmike
Copy link
Member

The project has two notions of "static methods":

  1. Methods defined on the Cheerio constructor function. There are six in total:
    • 3 mimic methods defined on the global function defined by the jQuery project: .contains(), .merge(), and .parseHTML()
    • 3 are specific to Cheerio, exposed to address concerns that are not relevant in the browser context .html() and .xml(), .root()
  2. Methods defined on the module's exported value (e.g. the value returned from require('cheerio')). There are two of these: .load() and .text().

As of today, however, the source does not reflect this distinction. All eight methods are defined in both places.

The exported value is itself the constructor for the Cheerio object. We have never documented the direct use of this constructor. As far as I remember, we have always instructed folks to use it indirectly, after being "bound" by the load method (as trying to use it directly would involve re-implementing the load method).

These details make the API difficult to document, and they may cause confusion for developers who are new to the project. (The methods in the first set don't make sense in the context of the Cheerio module itself, and the methods in the second set don't apply to a Cheerio function that has been "bound" to a document.) Because we are preparing for a major release, I would like to discuss taking the opportunity to make a breaking change:

  1. Export an ordinary object value (not a function)
  2. Re-implement exports.html() and exports.xml() to operate in this context (rendering the "outer" markup of a "selection" defined by a provided Cheerio object)
  3. Persist the following exported values: exports.load(), exports.text(), and exports.version
  4. Remove the Cheerio.load() and Cheerio.text() methods from the Cheerio constructor function

#2 directly contradicts the documentation on rendering, but the upgrade path is fairly straightforward:

  • $.html('.pear') becomes cheerio.html($('.pear'))
  • $.html($('.pear')) becomes cheerio.html($('.pear'))
  • $.html() becomes cheerio.html($.root())

I think this is warranted because it makes the API easier to understand/document and because it further reduces disparities between "Cheerio objects" and "jQuery objects".

@fb55 @matthewmueller does this sound good to you? And are there any consumers out there that can share a use case which would be prevented by the change I'm proposing?

@fb55
Copy link
Member

fb55 commented Dec 25, 2017

Sorry for the delay, was traveling. This sounds exactly like the kind of change we should consider for the final 1.0. This should be a nice simplification, and remove some pitfalls — I'm all for it.

@matthewmueller
Copy link
Member

matthewmueller commented Dec 28, 2017

Would it be possible to upgrade this without breaking anything? I agree with these changes and given that we're still in 0.x, I'm open to this change.

I do think it'd be better if we can avoid breaking anything, alias these functions and update the docs with the improved API.

Lots of users these days – over 5 million downloads per month (!!). For better or for worse, there's going to be some package.json's with { "cheerio": "*" } 🙃

@jugglinmike
Copy link
Member Author

Thanks for the review, you two!

Would it be possible to upgrade this without breaking anything? I agree with
these changes and given that we're still in 0.x, I'm open to making this
change.

I do think it'd be better if we can avoid breaking anything and alias these
functions. lots of users these days – over 5 million downloads per month (!!)

One way to think about the current API is that it already defines a bunch of
aliases. For instance, $.html() is just a convenience for
cheerio.html($.root()). Removing the aliases (i.e. making breaking changes)
is the goal.

The change we're discussing is unrelated to correctness, so there's a case to
be made for not touching anything. In other words, "If it ain't broke, don't
fix it."

From my perspective, though, it is broken. The convenience APIs are not available
in jQuery. It's also generally harder to document and learn about an API that
offers many ways to do the same thing.

I should mention that I made an error in the description of this issue. When I
wrote,

The exported value is itself the constructor for the Cheerio object. We have
never documented the direct use of this constructor. As far as I remember, we
have always instructed folks to use it indirectly, after being "bound" by the
load method (as trying to use it directly would involve re-implementing the
load method).

I was wrong. The README.md file contains the following examples:

Optionally, you can also load in the HTML by passing the string as the context:

const $ = require('cheerio');
$('ul', '<ul id="fruits">...</ul>');

Or as the root:

const $ = require('cheerio');
$('li', 'ul', '<ul id="fruits">...</ul>');

This doesn't make the change any more "breaking", but it probably means that
more consumers rely on the API than my original report suggested.

If we'd like to be more conservative, we could release 1.0 without this change,
and instead document the methods as "deprecated". In that case, I would like to
also announce an estimated release date for version 2.0 where we eventually do
remove those things. But I'm not sure this is actually helpful--I think most
consumers will be unaware of API deprecation status, so deferring the change
just delays the interruption. We're already planning to break some use cases
for the 1.0 release. If we're all in agreement that this change is desirable,
then it seems better to get this change over with.

@fb55
Copy link
Member

fb55 commented Jan 19, 2018

I always found the constructor pretty confusing and would agree with deprecating its direct usage. Removing it directly might be a bit too aggressive, so for now just adding a deprecation notice if using any of the methods we want to remove is IMHO the best way forward.

jugglinmike added a commit to jugglinmike/cheerio that referenced this issue Apr 22, 2018
Methods named `load`, `html`, `xml`, and `text` are defined in many
locations:

Today, Cheerio defines multiple versions of methods named `load`,
`html`, `xml`, `text`, and `parseHTML`. These alternate versions may be
defined in up to three distinct parts of the API:

- exported by the Cheerio module
- as static methods of the "loaded" Cheerio factory function
- as instance methods of the "loaded" Cheerio factory function

Some of these are surperfluous, and because some unecessarily conflict
with idiomatic jQuery coding patterns, they have been designated for
future removal [1].

Add tests for the deprecated methods in order to avoid regressions prior
to their removal. Insert comments to delineate the methods which are
endorsed and which have been deprecated. For the latter group of
methods, include recommendation for the preferred alternatives.

[1] cheeriojs#1122
@jugglinmike
Copy link
Member Author

I agree that it's in everyone's interest if we don't move quite so fast on this--I just needed some time to slow down :)

Still, the 1.0 announcement will be a great time to alert users to our intention here, and I think there are some things we can do to make the transition easier.

To move toward this goal for version 1.0, I'd like to document the irregular API as "deprecated" but also to improve test coverage. Currently, our coverage is incomplete, and that's particularly hard to see given the irregular use of the $ internally.

  1. Improving variable names used within the test suite to consistently refer to the module value as cheerio and the "loaded" factory function as $
  2. Extending test coverage to include the deprecated APIs
  3. Refactoring the internals to more clearly designate the deprecated APIs

We just landed a patch for step 1, so patches for the next steps should be easier to read. A patch for step 2 is currently under review. Once that's through, we'll have reasonable certainty that the refactoring in step 3 isn't introducing any regressions in those unfortunate methods.

@matthewmueller
Copy link
Member

matthewmueller commented Apr 22, 2018

From my perspective, though, it is broken. The convenience APIs are not available
in jQuery. It's also generally harder to document and learn about an API that
offers many ways to do the same thing.

Ah I think I got it, so you're coming at this as "let's align ourselves better with jQuery". That makes a lot of sense to me.

I'm happy to accept whatever you decide here, but my preference would be that we undocument those irregularities, while leaving them in and perhaps adding a deprecation note. However, in any case where we're at odds with jQuery, that'd be a really good signal that we should make a breaking change.

Everything else also sounds really wonderful. The $(...) stuff is definitely some hairy code.

Thanks for taking the charge on this!

@jugglinmike
Copy link
Member Author

I'm happy to accept whatever you decide here, but my preference would be that we undocument those irregularities, while > leaving them in and perhaps adding a deprecation note.

I know where you're coming from--it's why I initially suggested we remove the code outright. By deciding to deprecate for 1.0 (rathen than remove, as would be our right in a major release) then we're recognizing that some users are currently dependent on that code and that we don't want to shut them out. Removing documentation will make things harder for them, but adding documentation with clear deprecation notices more closely aligns with our committments. It may also cause some folks to discover and use the methods, but I'll have less concern about breaking their code in Cheerio version 2

Everything else also sounds really wonderful. The $(...) stuff is > definitely some hairy code.

Thanks for taking the charge on this!

Awesome, thanks for saying so!

fb55 pushed a commit that referenced this issue May 9, 2018
Methods named `load`, `html`, `xml`, and `text` are defined in many
locations:

Today, Cheerio defines multiple versions of methods named `load`,
`html`, `xml`, `text`, and `parseHTML`. These alternate versions may be
defined in up to three distinct parts of the API:

- exported by the Cheerio module
- as static methods of the "loaded" Cheerio factory function
- as instance methods of the "loaded" Cheerio factory function

Some of these are surperfluous, and because some unecessarily conflict
with idiomatic jQuery coding patterns, they have been designated for
future removal [1].

Add tests for the deprecated methods in order to avoid regressions prior
to their removal. Insert comments to delineate the methods which are
endorsed and which have been deprecated. For the latter group of
methods, include recommendation for the preferred alternatives.

[1] #1122
fb55 added a commit that referenced this issue Oct 3, 2018
* Use parse5 as a default parser (closes #863)

* Use documents via $.load

* Add test for #997

* Change options format

* Update unit test

Update test phrasing according to recent changes in parsing logic.

* 1.0.0-rc.1

* Improve release process

Limit responsibility of "pre-publish" script to simply validate the
project's `History.md` file (by verifying an entry for the current
release). Define a separate script for history generation. Separating
the workflow in this way allows for manual modification of the release
notes.

* Correct errors in Readme.md

* Document advanced usage with htmlparser2

* Update History.md (and include migration guide)

* Remove documentation for `xmlMode` option

Simply expose an option named `xml` that enables XML parsing via
htmlparser2 with the ability to specify additional options for that
parser.

* Rename `useHtmlParser2` option

This flag is used to control parsing behavior internally, but it is not
intended for use by consumers. Prefix the name with an underscore in
order to discourage users from relying on it.

* Re-write migration guide for version 1.0 (#1078)

Incorporate recent feedback from consumers who have experimented with
the version 1.0 release candidate.

* Pass locationInfo option to parse5 (#1155)

* Update css-select to the latest version 🚀 (#1158)

Breaking change: Invalid selectors now throw Errors, not SyntaxErrors.

* Use eslint & prettier, add precommit hook (#1152)

* chore(package): update mocha to version 5.0.4 (#1088)

* Ensure text nodes expose the DOM level 1 API

Since enabling the `withDomLvl1` parsing option, nodes cannot be created
with an object literal. Create new text nodes using the `evaluate`
function to ensure they expose the correct attributes.

* fixing missing prop(‘outerHTML’) implementation.
Added an ‘outerHTML’ case to the switch in the prop function, which wraps a clone of  in a container element, and sets  to that container's HTML (#945)

* Do not lint files excluded from version control (#1162)

This includes code coverage reports as generated by the command `make
test-cov`.

* Correct typo in git hook configuration (#1163)

* Correct typo in git hook configuration

* Reformat package manifest to satisfy linter

* Fix .text with a function as the argument

* Fix `.text` being called on a collection with size > 1 with a function

* chore(package): update coveralls to version 3.0.0 (#1086)

* Update jsdom to the latest version 🚀 (#1008)

* Throw a useful error on invalid input to cheerio.load() (#1087)

* Procedurally generate API documentation from source (#1165)

* Use parse5 to serialize the DOM, use lodash to clone dom

* Fix DoS/RCE vulnerability in lodash@4.15.0 (#1179)

fixes #1175

*  Add eslint-plugin-jsdoc, improve documentation (#1168)

* Improve variable names (#1183)

Promote consistency in variable names within the project's source and
unit tests. This helps to highlight the distinction between the object
exported by the module and the function produced by the `load` method.
The latter value is expected to mimic the jQuery API, while the former
value generally should only be used for a small set of methods specific
to Cheerio:

- `load`
- `html`
- `xml`
- `text`

Other usages of the exported object are discouraged, and a future patch
will update the unit tests to reflect the usages that are endorsed for
long-term stability.

* Formally test deprecated APIs (#1184)

Methods named `load`, `html`, `xml`, and `text` are defined in many
locations:

Today, Cheerio defines multiple versions of methods named `load`,
`html`, `xml`, `text`, and `parseHTML`. These alternate versions may be
defined in up to three distinct parts of the API:

- exported by the Cheerio module
- as static methods of the "loaded" Cheerio factory function
- as instance methods of the "loaded" Cheerio factory function

Some of these are surperfluous, and because some unecessarily conflict
with idiomatic jQuery coding patterns, they have been designated for
future removal [1].

Add tests for the deprecated methods in order to avoid regressions prior
to their removal. Insert comments to delineate the methods which are
endorsed and which have been deprecated. For the latter group of
methods, include recommendation for the preferred alternatives.

[1] #1122

* Implement for...of iterator via Symbol.iterator (#1197)

* Implement for...of iterator via Symbol.iterator

Similar to jQuery: https://github.com/jquery/jquery/blob/1ea092a54b00aa4d902f4e22ada3854d195d4a18/src/core.js#L371-L373

Fixes #1191

* Assert that the iterator ends

#1197 (comment)
@jugglinmike
Copy link
Member Author

We just completed step 3 via gh-1232, so it's safe to call this issue resolved :)

fb55 pushed a commit that referenced this issue Sep 10, 2020
Methods named `load`, `html`, `xml`, and `text` are defined in many
locations:

Today, Cheerio defines multiple versions of methods named `load`,
`html`, `xml`, `text`, and `parseHTML`. These alternate versions may be
defined in up to three distinct parts of the API:

- exported by the Cheerio module
- as static methods of the "loaded" Cheerio factory function
- as instance methods of the "loaded" Cheerio factory function

Some of these are surperfluous, and because some unecessarily conflict
with idiomatic jQuery coding patterns, they have been designated for
future removal [1].

Add tests for the deprecated methods in order to avoid regressions prior
to their removal. Insert comments to delineate the methods which are
endorsed and which have been deprecated. For the latter group of
methods, include recommendation for the preferred alternatives.

[1] #1122
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants