Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BNode value not random enough #185

Closed
ghost opened this issue Feb 20, 2012 · 7 comments
Closed

BNode value not random enough #185

ghost opened this issue Feb 20, 2012 · 7 comments
Labels
bug Something isn't working fix-in-progress

Comments

@ghost
Copy link

ghost commented Feb 20, 2012

bcq.nng@gmail.com, 2011-12-23T16:13:30.000Z

When rdflib is used in a application that fork the current Python process, for exemple when using flup.server.*_fork, BNode's value generation in these processes:

  • share the same _prefix
  • use independant serial number generators that start with the same value

This means that the parent and child processes will all get the same BNode values.

I fixed this issue using

value = _unique_id() + str(random.randint(1, 99999999))

but there might be a cleaner solution.

@ghost ghost self-assigned this Feb 20, 2012
@joernhees
Copy link
Member

Uhm, why not just use UUIDs?
For example uuid.uuid4()
http://docs.python.org/library/uuid.html#uuid.uuid4

@ghost
Copy link
Author

ghost commented Mar 28, 2012

Sound suggestion, IMO. I've covered this ground in a comment to issue #195 because using uuid4 BNode ids has the potential of supporting some of the set-theoretic graph operations.

uuid/4 was introduced in Python 2.5, so it has to be mimicked for 2.4

I set up a uuid solution (leaving the existing BNode private API undisturbed) and exercised it in a couple of tests, setting them to fail if the issue was solved:

https://github.com/RDFLib/rdflib/blob/master/test/test_issue200.py#L90 (issue 200 was the old gcode issue #)

So far, I haven't merged Bertrand's pull request because:

  1. the change in BNode private API results in the above-mentioned tests raising Exceptions (easily fixed)
  2. using the code in the pull request, the first of the two non-uuid fork tests continues to pass, thereby
    demonstrating a continued requirement for each forked process to initialise the random seed.

I do think the argument for a switch to uuid4 is fairly compelling.

@gromgull
Copy link
Member

Looking at this again I'd vote for UUIDs - do we have a source for a 2.4 compatible uuid faker?

@ghost
Copy link
Author

ghost commented Apr 18, 2012

  • do we have a source for a 2.4 compatible uuid faker?

There's a putative implementation here:

https://github.com/RDFLib/rdflib/blob/master/test/test_issue200.py#L17

where I created a solution for this issue using the tests as proof-of-concept.

All seems to be working according to the CI build, I think it's now just a matter of propagating the implementation back into the RDFLib source.

@bcroq
Copy link
Contributor

bcroq commented Apr 19, 2012

I have just closed my pull request... my proposed fix was bcroq/rdflib@6e8aac9

ghost pushed a commit that referenced this issue Apr 20, 2012
@ghost
Copy link
Author

ghost commented Apr 20, 2012

Thank you for the proposed fix, Bertrand. It's great to see people pitching in.

BNode ids are declared to conform to NCName specs. Unadorned uuid4() won't the cut the mustard as a valid NCName due to its regrettable habit of randomly starting with an integer:

ParserError: None:6:2: rdf:nodeID value is not a valid NCName: 557cb5ba-9713-4b54-a2fb-65524e8154b1

My naive trial of a "urn:uuid:" prefix was also not valid:

ParserError: None:6:2: rdf:nodeID value is not a valid NCName: urn:uuid:a389b738-c914-42c0-06de-94e77112dabb

Comments in BNode.new assert:

# BNode identifiers must be valid NCNames" _:[A-Za-z][A-Za-z0-9]*
# http://www.w3.org/TR/2004/REC-rdf-testcases-20040210/#nodeID

However, the NCName reference given in https://github.com/RDFLib/rdflib/blob/master/rdflib/namespace.py#L364 (as the implementation spec for RDFLib's is_ncname() test) no longer seems to describe the same specification as used in RDFLib. Is this worth looking at in more detail?

In the end, I used a deeply unimaginative "_" (underbar) prefix.

@ghost
Copy link
Author

ghost commented Apr 20, 2012

Fixed in commit #03c77ea0af

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working fix-in-progress
Projects
None yet
Development

No branches or pull requests

3 participants