New API: Add api end point to move dataverses #4406

Closed
kcondon opened this issue Jan 10, 2018 · 12 comments

Comments

@kcondon
Contributor

kcondon commented Jan 10, 2018

We frequently receive support requests to move dataverses. Providing an API endpoint would empower end users to do this themselves.

@mheppler
Contributor

Related/duplicate of #2278 from 2015-06-17.

@djbrooke
Contributor

From backlog grooming:

  • There are indexing considerations here
  • If we open this to a group beyond superadmins, we have to be very careful about letting a user leave a dataverse in a broken state

@kcondon
Contributor Author

kcondon commented Jan 10, 2018

Note: we currently move a dataverse by updating the owner_id field in the dvobject table for the dv being moved with the id of the new parent dv, then running index on the moved dataverse and all its child objects:

Move dataverse id 23 into dataverse 5:
    update dvobject set owner_id=5 where id=23;

Get the list of child objects of dataverse 23:
    select id, dtype from dvobject where owner_id=23;

If any child objects are sub-dataverses, repeat the query for each one to collect their child objects as well.

Index all moved objects (dataverse 23 and all of its child objects):
    curl http://localhost:8080/api/admin/index/dataverses/23
    curl http://localhost:8080/api/admin/index/datasets/28
    curl http://localhost:8080/api/admin/index/datasets/31
    curl http://localhost:8080/api/admin/index/datasets/33
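
The manual procedure above (walk the child objects, descending into sub-dataverses, then index each one) could be sketched roughly as follows. This is only an illustration, not the implementation that was merged: the function name `collect_index_urls`, the in-memory `rows` list standing in for the `dvobject` query results, and the `dtype` values `"Dataverse"` and `"Dataset"` are assumptions. Only the two index URL patterns are taken from the commands above.

```python
from collections import defaultdict

# Base of the admin index endpoints shown in the curl commands above.
BASE_URL = "http://localhost:8080/api/admin/index"


def collect_index_urls(rows, moved_id):
    """Return the index URLs for a moved dataverse and everything under it.

    rows: iterable of (id, dtype, owner_id) tuples, standing in for
          "select id, dtype, owner_id from dvobject" (hypothetical input).
    moved_id: id of the dataverse that was moved.
    """
    # Group objects by owner so each level can be looked up directly.
    children = defaultdict(list)
    for obj_id, dtype, owner_id in rows:
        children[owner_id].append((obj_id, dtype))

    urls = [f"{BASE_URL}/dataverses/{moved_id}"]
    stack = [moved_id]
    while stack:
        parent = stack.pop()
        for obj_id, dtype in sorted(children[parent]):
            if dtype == "Dataverse":  # assumed dtype value
                urls.append(f"{BASE_URL}/dataverses/{obj_id}")
                stack.append(obj_id)  # descend into the sub-dataverse
            elif dtype == "Dataset":  # assumed dtype value
                urls.append(f"{BASE_URL}/datasets/{obj_id}")
            # Other object types are skipped; no endpoint for them is shown above.
    return urls


rows = [(23, "Dataverse", 5), (28, "Dataset", 23),
        (31, "Dataset", 23), (33, "Dataset", 23)]
for url in collect_index_urls(rows, 23):
    print("curl", url)
```

With the example rows, this prints the same four curl commands listed above. An explicit stack is used instead of recursion so that deeply nested dataverses do not hit Python's recursion limit.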

@pdurbin
Member

pdurbin commented Jan 11, 2018

My ears perked up when we started talking about recursive indexing, which we don't do currently. It's not clear to me whether we should tackle recursive indexing as part of this issue or not, but it's a potentially time-consuming operation that might be a nice tool in a Dataverse keeper's toolbox.

@ferrys
Contributor

ferrys commented Feb 28, 2018

Some notes about changes so far:

  • If a Dataverse contains a guestbook and is moving to a Dataverse without that guestbook, the guestbook is removed (if the move is forced)
  • The same goes for Dataverse templates
  • If a Dataverse is featured in its parent, that feature association is removed
  • If a metadata field is selected on a Dataverse and the metadata block does not exist in the destination Dataverse, it is disassociated from the moved Dataverse
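
The guestbook and template rules above can be read as a set difference: anything the moved Dataverse uses that the destination does not offer gets dropped. A minimal sketch of that reading follows. The function `plan_forced_move`, its dictionary inputs, and the `force` parameter are hypothetical names for illustration, and refusing an unforced move is an assumption inferred from "(if the move is forced)", not confirmed behavior.

```python
def plan_forced_move(moved_items, destination_items, force=False):
    """Work out which associations a move would drop.

    moved_items / destination_items: dicts mapping a kind such as
    "guestbooks" or "templates" to a list of names (hypothetical shape).
    Returns {kind: [names to remove]}.
    """
    to_remove = {}
    for kind, items in moved_items.items():
        # Anything used by the moved dataverse but absent at the destination.
        missing = set(items) - set(destination_items.get(kind, ()))
        if missing:
            to_remove[kind] = sorted(missing)

    # Assumption: without force, a move that would drop something is refused.
    if to_remove and not force:
        raise ValueError(f"move would remove {to_remove}; retry with force=True")
    return to_remove


moved = {"guestbooks": ["gb1", "gb2"], "templates": ["t1"]}
destination = {"guestbooks": ["gb1"], "templates": ["t1"]}
print(plan_forced_move(moved, destination, force=True))
```

Returning the planned removals, rather than applying them silently, would also give the caller something to report back to the user.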

@pameyer
Contributor

pameyer commented Mar 2, 2018

Are warnings generated when these changes (guestbook removal, metadata block dissociation, etc.) happen? If not, is it worth the effort to add them?

@scolapasta
Contributor

scolapasta commented Mar 2, 2018

We are planning to discuss that when @ferrys and @kcondon are both back from vacation.

@kcondon
Contributor Author

kcondon commented Apr 6, 2018

Tested the basic functionality and it all works great with reasonable performance for test sets.

  • Had a question on how linked objects should be handled or if it was considered. Need to test.
  • Did a performance test of moving a large-scale institutional dv (62 child dvs, 853 datasets) to a new root dv. It is still running after 17 hours, so something is going on there.
  • In testing a long-running job, noticed there was no logging about the operation, so it is hard to tell how far along it is and more challenging to determine whether it all worked.
  • Also noticed the API is synchronous, which is fine for small moves but maybe less convenient for larger ones. It was suggested that it act like index all: synchronous at first, to determine the size of the move and anything that would prevent it, then async with logging.
  • When moving larger dataverses, the admin needs to be confident that all items were moved and also needs some sense of progress if the move is going to take a while. The way we do this with indexing is: at the top of index all we log a message with a count of what needs indexing, then log each id and object type as it is indexed, then at the end log a completed message with elapsed time and (I think) counts of indexed objects and failures.
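
The suggestion in the last two bullets (check synchronously, then do the slow work in the background while logging a start count, each object, and a final summary) might look something like the sketch below. It is not the code that was written for this issue. `start_move`, `index_one`, and `precheck` are hypothetical names, and a plain thread stands in for whatever async mechanism the application actually uses.

```python
import logging
import threading
import time

logger = logging.getLogger("move_sketch")


def start_move(objects, index_one, precheck=None):
    """Validate synchronously, then index in the background with logging.

    objects: list of (id, object_type) pairs to index after the move.
    index_one: callable(id, object_type) that indexes one object (hypothetical).
    precheck: optional callable(objects) returning an error string or None.
    Returns (thread, result); result is filled in as the worker runs.
    """
    # Synchronous phase: fail fast if anything would prevent the move.
    if precheck is not None:
        problem = precheck(objects)
        if problem:
            raise ValueError(problem)

    result = {"indexed": 0, "failed": 0, "elapsed_ms": None}

    def worker():
        start = time.monotonic()
        logger.info("move started: %d objects to index", len(objects))
        for obj_id, obj_type in objects:
            try:
                index_one(obj_id, obj_type)
                result["indexed"] += 1
                logger.info("indexed %s %s", obj_type, obj_id)
            except Exception:
                result["failed"] += 1
                logger.exception("failed to index %s %s", obj_type, obj_id)
        result["elapsed_ms"] = int((time.monotonic() - start) * 1000)
        logger.info("move completed: %d indexed, %d failed, %d ms",
                    result["indexed"], result["failed"], result["elapsed_ms"])

    # Asynchronous phase: the caller gets control back immediately.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, result
```

A caller would keep the returned thread (or poll the log) to see progress, and the final log line gives the admin the counts needed to confirm that everything was moved.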

@kcondon
Contributor Author

kcondon commented Apr 20, 2018

OK, tested the linking conflict behavior and link removal endpoints. Plus, retested performance.

  • Link conflicts are detected and require forceMove or removal, as designed. If forceMove is used, the link is not actually removed, and if the dataverse is moved back, the link is displayed once again. Not sure whether this is intentional.
  • Performance moving 12 dvs and 444 datasets with prod data is 18 mins. It seemed to index and pause, and both the pauses and the indexing got slower toward the end. Not sure what a good time is or what the benchmark would be, except against the current raw index. Need to simulate somehow, maybe with selective index continue. I imagine it is also not fast but much faster.
  • Logging is much improved: it now has start counts, progress, and a finish count and time. There is a little chattiness around detecting links at the very beginning; a large number of entries just say checking dataset links.

I will try to check straight index time and post for comparison here.

OK, performance for index without move is 11 mins:
12 dataverses and 444 datasets indexed. index all took 655610 milliseconds.

Performance for index when moving is 10.5 mins:
12 dataverses and 444 datasets indexed. Total time to index 630978

So, it looks like we're not adding any extra overhead to indexing, which is the most time consuming part. This looks good to go.
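
For reference, the minute figures quoted above follow from the logged millisecond totals. A quick check, where `ms_to_minutes` is just an illustrative helper name:

```python
def ms_to_minutes(ms):
    """Convert a duration in milliseconds to minutes."""
    return ms / 60_000


index_only_ms = 655610         # "index all took 655610 milliseconds"
index_during_move_ms = 630978  # "Total time to index 630978"

print(round(ms_to_minutes(index_only_ms), 1))         # 10.9
print(round(ms_to_minutes(index_during_move_ms), 1))  # 10.5
print(index_only_ms - index_during_move_ms)           # 24632
```

The two runs differ by about 25 seconds, which supports the conclusion that the move adds no measurable indexing overhead.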

@djbrooke
Contributor

@kcondon - thanks for the details. I moved this to QA since it sounds like you're still taking a look at some stuff. If you're done with it feel free to move it to the appropriate spot.

@ferrys
Contributor

ferrys commented Apr 23, 2018

@kcondon thanks for the feedback! I fixed the issue where links weren't removed during a move & cleaned up the logging.
