Patches: nan summaries breaking some deployments, filter land modes on underlying (not display) label #141

Abby-Wheelis · 2024-07-22T19:12:27Z

To address #140 by updating the way we filter the data from display mode to underlying mode

What I am now using is mode_confirm, which should work for now (all of the label sets we have now using the label "air", but I think using the baseMode from the label file, which would be AIR, but that change will be somewhat more complicated.

If I figure out why there are so many Unknown ble mode entries in dfc-fermata, I'll fix that in this PR as well.

switching the condition from the display mode to the underlying mode

Abby-Wheelis · 2024-07-22T19:51:33Z

Looking into what's happening with dfc-fermata by comparing May in the public and admin dashboards, and the results are the same. There are about 63% UNKNOWN trips on both dashboards for May, so the display is accurate.

We should keep looking into why there are so many trips that are not connected to a mode from a BLE beacon

Abby-Wheelis · 2024-09-06T19:51:37Z

Testing with open-access data for the patch to #142

Abby-Wheelis · 2024-09-06T20:28:02Z

So... air -> Air is not in our current csv way of translating the labels, so they get grouped as "Other"

Great advocacy for getting those labels unified as soon as we can.

I think the quickest fix would be to add "air -> Air" to the csv, flagging that we should get the labels unified (I would try to do that here, but I think #148 / #143 needs to be completed first to prevent mode chaos. The we can retire the csvs as soon as possible as a follow-on/clean-up to the unification of colors (which is proving to require more work with labels than we first thought)

Abby-Wheelis · 2024-09-06T20:53:45Z

@iantei feel free to chime in on what you think about my approach to add air->Air to the csv for now, with the plan of unifying the labels and filtering on the base mode as soon as feasible.

Abby-Wheelis · 2024-09-06T20:56:50Z

This fixes #140 and #142

iantei · 2024-09-06T22:04:34Z

I think the approach to add air->Air to the csv would cover up for the default mapping case, which uses the csv for the mapping.
For the dynamic labels, currently we have usaid-laos-ev, godcgo-label-options label_options which have MODE.value = air, but caebike-label-options doesn't have MODE.value = air.
Would the above change also cover for this program?

An alternative temporary solution, which I can also think about is to modify the merge_small_entries(labels, values) function to restrict the conversion of labels = air from air to Other.
I think introducing new label on csv is also a good solution.

iantei · 2024-09-06T22:06:40Z

The code changes look good! Could you please test for the caebike-label-options program too?

Abby-Wheelis · 2024-09-07T16:00:35Z

@iantei since air is not an option in that program, I would expect any airplane trips to have a user-entered label and so they would get grouped as other, I'll test with that dataset just to be sure!

Abby-Wheelis · 2024-09-07T16:26:53Z

The two labeled charts look the same, despite the fact that there are quite a few trips that do seem to be "air" - but are user-custom labels. I wonder if ca-ebike would like to add the air mode? It would make their distance charts more representative (and not overwhelmed by "Other")

Abby-Wheelis · 2024-09-07T16:28:13Z

Counts are also largely "Other" - IIRC from the data, much of this might be testing related.

shankari · 2024-09-07T16:30:47Z

Can we get the breakdown of the "Other" to verify this?

At a high level, in a future fix, I think we should consider user labels if they represent a large enough proportion of the aggregate data. So if 40% of the data is "Other" and 90% of the "Other" is "Air", maybe we should just show "Air". But that is going to require a lot of judgement calls - what is "large enough proportion"? what if the text entered by users is subtly different? so let's punt on that for now.

Abby-Wheelis · 2024-09-07T16:54:30Z

The labeled trips at least are not air. Labels not in the dynamic labels include - drove_alone, shared_ride, bus, and e_car_shared_ride - all of the data that I have in this dump is from test users, and I'm guessing a lot of it is from before they added their custom labels.

The air/HSR trips (39/79,000 miles) must be unlabeled.

shankari · 2024-09-07T16:58:28Z

all of the data that I have in this dump is from test users, and I'm guessing a lot of it is from before they added their custom labels.

Yup, makes sense.

Abby-Wheelis · 2024-09-07T17:06:06Z

So aside from the tabled point of

At a high level, in a future fix, I think we should consider user labels if they represent a large enough proportion of the aggregate data. So if 40% of the data is "Other" and 90% of the "Other" is "Air", maybe we should just show "Air". But that is going to require a lot of judgement calls - what is "large enough proportion"? what if the text entered by users is subtly different? so let's punt on that for now.

This seems to be working as expected, and can serve as a patch pending the more principled unification of labels, and usage of base modes (I added to the csv, and we still filter on mode_confirm, should update to baseMode when we get rid of the csvs)

check more carefully for missing summaries

shankari · 2024-09-09T17:07:40Z

I'm not going to insist on this right now, but in a cleanup change, I think it would be helpful to maybe have a unified layer that handles both dashboards. Let's wait until the carbon calculations are integrated into the public dashboard, though. If the unified layer is small and generic enough, it could even go into the server codebase, so that anybody else working with mongodumps (e.g. TSDC) could use it as well.

Abby-Wheelis · 2024-09-10T15:16:48Z

I have tested with marking the entries that are malformated as INVALID as we do on the admin dash, but if I do that, they get grouped as OTHER since INVALID is not a valid sensed mode, but later when we merge small entries, they get marked as Other, leaving both OTHER and Other on the chart. I see 3 paths forward with this:

Mark the data as UNKNOWN rather than INVALID (arguably, the mode is unknown, but because the data is invalid)
add INVALID to the list of valid modes - it might become a segment in the chart, and would flag to us and program admins that there is an issue with the data
Continue to allow INVALID to be marked for display as OTHER, but prevent OTHER and Other from appearing on the same chart (merge small entries into OTHER for sensed modes, not Other

I'll collect opinions at our meeting later today!

Abby-Wheelis · 2024-09-10T21:03:37Z

They should be marked as INVALID and I should fix OTHER vs Other

Abby-Wheelis · 2024-09-10T22:38:39Z

What the chart with invalid data will look like now:

What it would look like if INVALID were OTHER (to test if existing OTHER gets pooled with the small entries)

shankari · 2024-09-15T18:07:00Z

@iantei @Abby-Wheelis this is not in "Ready for review", but I am deploying the related admin dashboard changes to staging, so am planning to merge and deploy this as well. Please make sure to pull this into future changes as well!

shankari · 2024-09-15T18:09:20Z

viz_scripts/plots.py

+    if v2l_df.index[0][-1].isupper():
+        print("Found uppercase last letter")
+        # This part if a bit tricky
+        # We could have already had a non-zero other, and it could be small or large
+        if "OTHER" not in v2l_df.index:
+            # zero other will end up with misc_count
+            if misc_count.vals > 0:
+                v2l_df.loc["OTHER"] = misc_count
+        elif "OTHER" in small_chunk.index:
+            # non-zero small other will already be in misc_count
+            v2l_df.loc["OTHER"] = misc_count
+        else:
+            # non-zero large other, will not already be in misc_count
+            v2l_df.loc["OTHER"] = v2l_df.loc["OTHER"] + misc_count


Ouch! I hope this is simplified once we start using the basemode values are remove the CSV

updates to land modes

1f872e8

switching the condition from the display mode to the underlying mode

Abby-Wheelis added 2 commits September 4, 2024 16:57

Merge remote-tracking branch 'upstream/main' into mode-patches

022f340

set nan sensed summaries as "UNKNOWN"

08f5b2f

add air-> Airplane to the csvs

59debf1

Abby-Wheelis marked this pull request as ready for review September 6, 2024 20:52

Abby-Wheelis changed the title ~~Represent only land modes~~ Patches: nan summaries breaking some deployments, filter land modes on underlying (not display) label Sep 6, 2024

Abby-Wheelis mentioned this pull request Sep 9, 2024

Fix Callback Error Updating store-trips.data - TypeError: 'float' object is not subscriptable fixed e-mission/op-admin-dashboard#121

Merged

use the same lambda function as the admin dash

aac5d72

check more carefully for missing summaries

invalid data is INVALID, sensed other as OTHER not Other

0809f2e

Abby-Wheelis mentioned this pull request Sep 12, 2024

Integrate common base mode colors into public dashboard #148

Merged

shankari approved these changes Sep 15, 2024

View reviewed changes

shankari merged commit 6137db4 into e-mission:main Sep 15, 2024

Abby-Wheelis mentioned this pull request Sep 15, 2024

🧹 ✈️ 👤 Pending cleanups #149

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Patches: nan summaries breaking some deployments, filter land modes on underlying (not display) label #141

Patches: nan summaries breaking some deployments, filter land modes on underlying (not display) label #141

Abby-Wheelis commented Jul 22, 2024

Abby-Wheelis commented Jul 22, 2024

Abby-Wheelis commented Sep 6, 2024

Abby-Wheelis commented Sep 6, 2024

Abby-Wheelis commented Sep 6, 2024

Abby-Wheelis commented Sep 6, 2024

iantei commented Sep 6, 2024

iantei commented Sep 6, 2024

Abby-Wheelis commented Sep 7, 2024

Abby-Wheelis commented Sep 7, 2024

Abby-Wheelis commented Sep 7, 2024

shankari commented Sep 7, 2024 •

edited

Loading

Abby-Wheelis commented Sep 7, 2024

shankari commented Sep 7, 2024

Abby-Wheelis commented Sep 7, 2024

shankari commented Sep 9, 2024

Abby-Wheelis commented Sep 10, 2024

Abby-Wheelis commented Sep 10, 2024

Abby-Wheelis commented Sep 10, 2024

shankari commented Sep 15, 2024

shankari Sep 15, 2024

Patches: nan summaries breaking some deployments, filter land modes on underlying (not display) label #141

Patches: nan summaries breaking some deployments, filter land modes on underlying (not display) label #141

Conversation

Abby-Wheelis commented Jul 22, 2024

Abby-Wheelis commented Jul 22, 2024

Abby-Wheelis commented Sep 6, 2024

Abby-Wheelis commented Sep 6, 2024

Abby-Wheelis commented Sep 6, 2024

Abby-Wheelis commented Sep 6, 2024

iantei commented Sep 6, 2024

iantei commented Sep 6, 2024

Abby-Wheelis commented Sep 7, 2024

Abby-Wheelis commented Sep 7, 2024

Abby-Wheelis commented Sep 7, 2024

shankari commented Sep 7, 2024 • edited Loading

Abby-Wheelis commented Sep 7, 2024

shankari commented Sep 7, 2024

Abby-Wheelis commented Sep 7, 2024

shankari commented Sep 9, 2024

Abby-Wheelis commented Sep 10, 2024

Abby-Wheelis commented Sep 10, 2024

Abby-Wheelis commented Sep 10, 2024

shankari commented Sep 15, 2024

shankari Sep 15, 2024

Choose a reason for hiding this comment

shankari commented Sep 7, 2024 •

edited

Loading