code duplication issue between rdflib and pymicrodata #582

Closed
1 of 4 tasks
joernhees opened this issue Jan 25, 2016 · 10 comments
Labels: cleanup, discussion, enhancement, meta

@joernhees
Member

There seems to be a code duplication problem between https://github.com/RDFLib/rdflib and https://github.com/RDFLib/pymicrodata, which has led to problems around #443 before. There are several ways to solve this:

  • remove the pymicrodata part from rdflib, put it in its own package and make rdflib depend on that...
  • make a git submodule
  • remove this repository and just have the one in rdflib
  • keep both and maybe have this problem again

Once #443 is solved, i'd love to move that tick mark to some other box...

@gromgull
Member

This is a "known issue"

At the time @iherman wanted to keep the code BOTH in a separate repo and in RDFLib, due to the way pymicrodata (and it's exactly the same for the rdfa code) was deployed at w3c.

I was skeptical, but he assured me he would keep them in sync :)

Since I am a "unionist" and like having all the stuff in one repo (not the npm disease :) ), I vote for removing the separate project and keeping only the rdflib one.

@joernhees added this to the rdflib 4.3.0 milestone on Jan 25, 2016
@joernhees added the meta label on Jan 25, 2016
@joernhees
Member Author

hmm, while https://github.com/RDFLib/rdflib/tree/master/rdflib/plugins/parsers/pyMicrodata and https://github.com/RDFLib/pymicrodata/tree/master/pyMicrodata are clearly meant to be the same, i'm not entirely sure what to do about the docs and scripts in https://github.com/RDFLib/pymicrodata .

sounds like a clear argument for a separate package to me.

Main problem with that would be docs and intra-package tests, as we recently experienced with SPARQLWrapper (a change there broke rdflib's tests quite a while later...)

@iherman
Contributor

iherman commented Jan 25, 2016

The reasons I do not want to formally merge are internal to the way things are deployed at W3C and, to be frank, the time I devote to this these days. For example, I use the RDFLib installed by the system team on the service, which is never 100% up to date. Etc.

Let us close this particular issue...

(The same holds for the RDFa service vs parser, b.t.w.)

@gromgull
Member

I appreciate your concern about deployment at w3c @iherman, but RDFLib has many, many other users than w3c - there have already been many problems because of this slightly odd setup. All RDFLib maintainers work on it voluntarily in their spare time, I am very much interested in having my hobby be as smooth and painless to work on as possible, and I cannot really set aside extra time to do things awkwardly to support the deployment politics of the W3C.

We need to solve this issue somehow NOW - if the easiest way for you to do any pyMicrodata/RDFa work is to keep them in separate projects, let's make them RDFLib plugins.

We also discussed #391

I was always against re-separating the projects, since 1. it was a bit of work to move them together, it'll be work again to take them apart, work that could be better spent actually fixing bugs and implementing features. (Re: my hobby again, it is: making the library that makes working with RDF data as easy as possible, not refactoring) 2. I remember the massive support-effort it was to explain to everyone about rdflib-sparql etc.

Number 2 may well be better now - more people (hopefully) use pip or similar, and everyone is already used to npm etc. to install 10,000 small packages to get anything done.

@gromgull
Member

One RDFLib maintainer was stepping down and another taking over; on the first day the old maintainer said: prepare three envelopes ...
http://www.notboring.com/jokes/work/3.htm

;)

@iherman
Contributor

iherman commented Jan 26, 2016

@gromgull:

> I appreciate your concern about deployment at w3c @iherman, but RDFLib has many, many other users than w3c - there have already been many problems because of this slightly odd setup. All RDFLib maintainers work on it voluntarily in their spare time, I am very much interested in having my hobby be as smooth and painless to work on as possible, and I cannot really set aside extra time to do things awkwardly to support the deployment politics of the W3C.
>
> We need to solve this issue somehow NOW - if the easiest way for you to do any pyMicrodata/RDFa work is to keep them in separate projects, let's make them RDFLib plugins.

I believe that boat has sailed, insofar as I think it would be a major mistake removing the parsers from the core RDFLib distribution (users probably did not bother installing separate libraries if it was part of the core). Whether it was a mistake to incorporate them into the core or not back then is an academic discussion. (And, frankly, I do not even remember how we got there, it was eons ago for me, having moved out of the active RDF world...)

We can declare that the separate microdata and RDFa libraries are closed and no longer maintained, directing people towards the core distribution, and solve possible problems only for the embedded parsers. I am fine with that; if there are problems popping up in those versions it will become my problem (or the problem of those who take over the maintenance at W3C if I move on). I would probably clone those repositories under my name simply to maintain history and use version control for my own purposes.

(B.t.w., the separate versions are not plugins. They contain the same parsing library, and an application layer on top, but I am not sure they contain the plugin 'binding'. In this sense, they are utilities on top of RDFLib rather than plugins.)

@joernhees modified the milestones: rdflib 5.0.0, rdflib 4.3.0 on Jan 27, 2016
@joernhees
Member Author

@RDFLib/owners are there any use cases for installing rdflib without pip, conda or any other package manager?

@joernhees modified the milestones: rdflib 6.0.0, rdflib 5.0.0 on Jan 28, 2016
@joernhees
Member Author

ok, so as you might've noticed we're nearing an end wrt. #443. As the new parser changes the interface and returns different results when parsing the same file, i have to make it a new major release of rdflib in terms of http://semver.org ... i'm not afraid of bumping that number, but as you might have noticed, this causes me to move a lot of issues to milestone 6.0.0 (such as this one), as they on their own have the potential to change rdflib in a backwards incompatible way.

As a new major release is a signal to system maintainers that things changed in a backwards incompatible fashion, doing this too often is not really good style. It's also a very strong reason to externalize individual changing parsers into sub-packages again, allowing old parsers to be used with an updated rdflib core so that your parsing results do not break...

The obvious downside of that is that splitting causes a lot of redundant work (making sub-packages, setting up travis, where to run the tests, specifying which version combinations are supported, etc.) and divides the few developers even further.

I'm still undecided what's best here, also "best" relies on how often upstream format specs change in a backwards incompatible way... let's hope they rarely do and this was an exception...
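
The versioning argument above can be sketched in a few lines. This is a simplified illustration of the semver.org rules for plain `MAJOR.MINOR.PATCH` strings only (no pre-release or build tags); the helper names `parse`, `bump` and `is_compatible` are hypothetical and not part of rdflib.

```python
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump(version, change):
    """Return the next version for a 'breaking', 'feature' or 'fix' change."""
    major, minor, patch = parse(version)
    if change == "breaking":
        return f"{major + 1}.0.0"
    if change == "feature":
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


def is_compatible(installed, required):
    """True if 'installed' has the same major version as 'required' and is not older."""
    return parse(installed)[0] == parse(required)[0] and parse(installed) >= parse(required)
```

A parser whose output changes for the same input counts as a "breaking" change here, so it forces the major number up for whichever package contains it. If the parser lived in its own package, only that package's major version would need to move.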

@iherman
Contributor

iherman commented Jan 29, 2016

> ok, so as you might've noticed we're nearing an end wrt. #443. As the new parser changes the interface and returns different results when parsing the same file, i have to make it a new major release of rdflib in terms of http://semver.org ... i'm not afraid of bumping that number, but as you might have noticed, this causes me to move a lot of issues to milestone 6.0.0 (such as this one), as they on their own have the potential to change rdflib in a backwards incompatible way.

I understand...

> As a new major release is a signal to system maintainers that things changed in a backwards incompatible fashion, doing this too often is not really good style. It's also a very strong reason to externalize individual changing parsers into sub-packages again, allowing old parsers to be used with an updated rdflib core so that your parsing results do not break...
>
> The obvious downside of that is that splitting causes a lot of redundant work (making sub-packages, setting up travis, where to run the tests, specifying which version combinations are supported, etc.) and divides the few developers even further.
>
> I'm still undecided what's best here, also "best" relies on how often upstream format specs change in a backwards incompatible way... let's hope they rarely do and this was an exception...

I believe this is an exception, at least for these features.

On the RDFa side, RDFa1.1 is a Recommendation. This means changing it is permitted only if a fully new Working Group is formed, and the charter of that one may require backward compatibility (just as RDFa1.1 is backward compatible with RDFa1.0, even if some RDFa1.0 features are now obsolete).

Microdata is a little bit different, insofar as that is only a Note. For various reasons it is not a Rec, and it will not be, primarily because the HTML WG has not published it as a Recommendation. The update of the Note was done to align it with the way schema.org uses microdata; taking into account that schema.org is the only major user of microdata, but also that it has a major deployment by now, I do not expect anything will be changed in a backward incompatible way either.

Of course, I cannot guarantee anything:-)

Thanks!

@ashleysommer
Contributor

This was closed via #828 and is in the RDFLib 5.0.0 release, but this issue is tagged as 6.0.0.
Changing milestone to 5.0.0 for inclusion in the RDFLib 5.0.0 changelog.
