Convert meet data to incremental additions #112
Conversation
Ran this PR + #99 with --meet. On the first run, got this error:
On second run, got:
Force-pushed from 478250c to 1f41047
@denglender Sorry about that. The first error seems to be something transient that I've heard about, but have never been able to repro before. If it is coming up frequently, let me know. The second error was a start condition I hadn't checked. It should be fixed now. Just to clarify, you do not need #99 as well; just this one on top of the main branch should do.
Force-pushed from bba9cfb to 86d011a
Ahh, it looks like the changed startTime causes the request to return nothing on subsequent pages. I just tested storing the startTime parameter from the first request and reusing it, and this seems to be working better (I pulled all the data, deleted the last few days, and then pulled again). Thanks for your patience @denglender. Hoping we have a working solution now.
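The pagination fix described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `fetch_page` is a hypothetical stand-in for the real API call, and the point is only that the start time is pinned once before the first request and reused on every page, so a moving "now" can't make later pages return empty.

```python
from datetime import datetime, timezone

def pull_meet_data(fetch_page, start_time=None):
    """Pull every page of Meet activity with a single, pinned start_time.

    `fetch_page(start_time=..., page_token=...)` is a hypothetical callable
    returning (records, next_page_token); next_page_token is None on the
    last page.
    """
    # Pin the window once, before the first request, so subsequent pages
    # query the exact same time range as the first one.
    if start_time is None:
        start_time = datetime.now(timezone.utc).isoformat()
    records, token = [], None
    while True:
        page, token = fetch_page(start_time=start_time, page_token=token)
        records.extend(page)
        if token is None:
            return records
```

The key design point is that `start_time` is computed (or passed in) exactly once, outside the pagination loop.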
Sorry for the delay on feedback on this. It still appears to be missing data after a few runs, but I haven't been able to identify yet exactly the parameters of what's missing. If I run the incremental sync a few times over the course of a few hours, pull the data, and then run the full sync (i.e. without the patch), the data won't exactly match.
Hi @denglender, let me know if you can determine what data might be missing. I've tested by running a fresh pull, deleting some data, and then calling it again to make sure it has the same data, which should have the same effect, but perhaps there's a timezone issue cropping up somewhere? I'll keep playing around with this to see if I can identify the issue you're having as well.
I don't think it's a time zone issue. It looks like there's just some data near the edge of the refresh which isn't getting pulled in.

As an example, I ran a full refresh last night, then incremental refreshes today at 11:08 AM, 12:30 PM, and 1:26 PM. I pulled a copy of the whole DB table for today, then re-ran the sync using the full-sync code (i.e. what's currently live in the repo), pulled a copy of the whole DB again, and compared. After eliminating the data from after 1:26 PM, the DB from the incremental syncs has 11262 rows, while the DB from the full sync has 11409 rows. The missing rows are scattered mostly around the 11:08, 12:30, and 1:26 time periods. I've shown them in the attached screenshot (yellow rows are rows that are in the full sync but not the incremental; data is sorted chronologically with the latest at top). In each grouping of the missing/yellow rows, here are the ranges that all the yellow/missing rows fall within: 1:06 PM - 1:13 PM, 12:04 PM - 12:18 PM, and 10:42 AM - 10:56 AM. Not all the rows in those time frames are missing from the incremental version, but no rows are missing outside of those ranges.

So it kind of looks like there's an approx. 15-minute period of time where the data reported by the API is incomplete, plus an additional approx. 15-minute delay in reporting it. I.e., at the 1:26 pull, there was no data reported after 1:13 (~15 min), and the data after 1:06 was incomplete (another ~10 minutes in this case, but 15 min in some of the others). I'm not sure how reliable those time gaps are. If they're moderately reliable, then perhaps looking back a little further (e.g. an hour) prior to the most recent DB entry whenever the incremental sync is run could do the trick.
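The row comparison described above (full sync vs. accumulated incremental syncs) amounts to a set difference on row identities. A minimal sketch, with `missing_rows` and its `key` parameter being illustrative names rather than anything in the repo:

```python
def missing_rows(full_rows, incremental_rows, key=lambda r: r):
    """Return rows present in the full sync but absent from the incremental
    sync. `key` extracts a hashable identity per row, e.g. a
    (timestamp, meeting_code) tuple for Meet activity records."""
    seen = {key(r) for r in incremental_rows}
    return [r for r in full_rows if key(r) not in seen]
```

Sorting the result by timestamp is what surfaces the clustered ~15-minute gaps near each incremental run.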
@denglender Makes sense, and understandable that Google wouldn't have up-to-the-second data available in the API. The safest approach is probably to drop the last day and start from there, as we don't want to rely on how quickly Google processes data and makes it available. I'll take a look at making those changes in the next few days and update this PR with it.
Force-pushed from 86d011a to 373ef2e
@denglender Thanks for your patience. The latest version here should hopefully work. It gets the last date and drops the prior 24 hours, so that it can replace any outdated data with up-to-date data that might've been missing in the last pull. Give that a try and see if this is complete now!
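The 24-hour lookback described above can be sketched in a few lines. This is an assumption-laden illustration, not the PR's code: `incremental_start` is a hypothetical helper, and the caller is assumed to delete stored rows newer than the returned cutoff before re-pulling, so the refreshed window replaces rather than duplicates them.

```python
from datetime import datetime, timedelta

def incremental_start(last_seen, lookback_hours=24):
    """Start the next pull `lookback_hours` before the newest stored record,
    so late-arriving rows near the edge of the previous pull are re-fetched.
    `last_seen` is the max timestamp currently in the DB."""
    return last_seen - timedelta(hours=lookback_hours)
```

A one-hour lookback would likely cover the ~15-minute reporting lag observed above; 24 hours is simply the more conservative choice, trading a little extra API traffic for not depending on how quickly Google makes data available.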
This looks like it's working to me, and it solves the original issue. The data pull with the new code took 3 minutes, compared to 129 minutes for the full load.
Great to hear @denglender! @dchess, thoughts on merging this in? |
Force-pushed from 373ef2e to bcf26da
Rebased and ready to merge whenever you want. |
@zkagin Black formatter is showing no errors on my system but the automated tests are failing. I think my local environment is on a different version of black (19.10b0) than the GitHub Actions runner is. I am going to force merge this for now and then open a separate PR to bump this codebase to Python 3.8.
Fixes #100
Converts Meet pulls so that they only pull the latest data.