Introduce Inferred Labels for Stacked Bar Charts #145

Merged

Commits (24 total; the diff below shows changes from 9 commits):
3ec13b7 (iantei, Aug 16, 2024): 1. Introduce load_viz_notebook_inferred_data(), filter_inferred_trips…
9897c76 (iantei, Aug 16, 2024): Utilize map_trip_data() for common trip mapping functionality in load…
b4d704b (iantei, Aug 18, 2024): Add load_viz_notebook_inferred_data() function for inferred metrics, …
add0e50 (iantei, Aug 19, 2024): 1. Introduce markdown for collecting data from database for Inferred …
5d0546b (iantei, Aug 19, 2024): 1. Add load_viz_notebook_inferred_data() to collect data from db 2. A…
9998bee (iantei, Aug 19, 2024): Uncomment plot_title() and set_title_save() for total trip length in …
0437710 (iantei, Aug 19, 2024): 1. Add commute_labeled/inferred_match regex, stacked_bar_quality_text…
d098c18 (iantei, Aug 19, 2024): 1. Introduce regex to extract labeled_match and inferred_match 2. Use…
704450a (iantei, Aug 19, 2024): Update expanded_ct_inferred_u80 to use expanded_ct_inferred instead o…
49b2da4 (iantei, Aug 21, 2024): Update in map_trip_data() param name from df to expanded_trip_df
62c3af0 (iantei, Sep 13, 2024): Re-order Inferred Trip Stacked Charts above Sensed.
007f9bf (iantei, Sep 13, 2024): Update the index of axis and text_results for sensed and labeled trip…
260e8ad (iantei, Sep 17, 2024): Update expand_inferredlabels(). Iterate over the inferred_ct to see i…
3c33e75 (iantei, Sep 17, 2024): Filter for inferred trip bar - it should have either user_input or in…
72fcb20 (iantei, Sep 18, 2024): Use confidence_threshold to filter labels from inferred_labels.
3d718b5 (iantei, Sep 18, 2024): Update bar_label for inferred bars from Inferred by OpenPATH ... to L…
f096d5b (iantei, Sep 18, 2024): Merge branch 'main' into Introduce_Inferred_Labels_for_Stacked_Bar_Ch…
83e259b (iantei, Sep 21, 2024): In case there is no user_input, and confidence_threshold is not met, …
9905ca7 (iantei, Sep 21, 2024): Replace use of iterrow over panda dataframe with df.apply() method. R…
a983a5b (iantei, Sep 21, 2024): Update expand_inferredlabels() to expand_labeled_inferredlabels(), an…
11d2823 (iantei, Sep 21, 2024): Merge branch 'main' into Introduce_Inferred_Labels_for_Stacked_Bar_Ch…
acf5da6 (iantei, Sep 21, 2024): Fix merge with main - Introduce read_json_resource function. Introduc…
4d8c403 (iantei, Sep 21, 2024): Update load_viz_notebook_inferred_data() to be async, and call it as …
1a5abfb (iantei, Sep 21, 2024): Fix type : debug_df_inferred from debug_df_inferre
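
Several of the commits above describe how the inferred bars are populated: prefer the user's own label, otherwise fall back to an inferred label only when it clears `confidence_threshold`, and use `df.apply()` instead of `iterrows()`. The sketch below illustrates that selection under stated assumptions: a hypothetical row layout where `user_input` is a dict and `inferred_labels` is a list of `{"labels": {...}, "p": probability}` entries, a hypothetical threshold of 0.55, and a hypothetical `pick_mode()` helper (not the PR's `expand_labeled_inferredlabels()`). The message for 83e259b is truncated at the "no user_input and threshold not met" case, so returning `None` there is this sketch's own choice, not necessarily what the PR does.

```python
import pandas as pd

# Hypothetical value; the notebook uses its own confidence_threshold.
CONFIDENCE_THRESHOLD = 0.55


def pick_mode(row, threshold=CONFIDENCE_THRESHOLD):
    """Return the user's mode label if present; otherwise the most probable
    inferred label when its probability meets the threshold; else None
    (the None fallback is an assumption of this sketch)."""
    user_input = row["user_input"]
    if user_input and "mode_confirm" in user_input:
        return user_input["mode_confirm"]
    candidates = row["inferred_labels"] or []
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c["p"])
    if best["p"] >= threshold:
        return best["labels"].get("mode_confirm")
    return None


# Toy trips in the assumed layout.
trips = pd.DataFrame({
    "user_input": [{"mode_confirm": "bike"}, {}, {}],
    "inferred_labels": [
        [],
        [{"labels": {"mode_confirm": "walk"}, "p": 0.8},
         {"labels": {"mode_confirm": "bike"}, "p": 0.2}],
        [{"labels": {"mode_confirm": "car"}, "p": 0.3}],
    ],
})

# Row-wise selection with DataFrame.apply(axis=1) rather than iterrows().
trips["Mode_confirm"] = trips.apply(pick_mode, axis=1)
# Keep only trips that ended up with a label (user-provided or inferred).
labeled_or_inferred = trips.dropna(subset=["Mode_confirm"])
print(labeled_or_inferred["Mode_confirm"].tolist())  # ['bike', 'walk']
```

Trip 2's only inferred label has probability 0.3, below the assumed threshold, so it is dropped from the inferred bar in this sketch.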
91 changes: 73 additions & 18 deletions viz_scripts/generic_metrics.ipynb
@@ -140,6 +140,31 @@
" sensed_algo_prefix)"
]
},
{
"cell_type": "markdown",
"id": "325e5eda",
"metadata": {},
"source": [
"## Collect Data from Database for Inferred Metrics"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c26ff5f5",
"metadata": {},
"outputs": [],
"source": [
"expanded_ct_inferred, file_suffix_inferred, quality_text_inferred, debug_df_inferred = scaffolding.load_viz_notebook_inferred_data(year,\n",
" month,\n",
" program,\n",
" study_type,\n",
" dynamic_labels,\n",
" dic_re,\n",
" dic_pur=dic_pur,\n",
" include_test_users=include_test_users)"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -171,9 +196,14 @@
"labeled_match = re.match(r'Based on ([0-9]+) confirmed trips from ([0-9]+) (users|testers and participants)\\nof ([0-9]+) total trips from ([0-9]+) (users|testers and participants) (\\(([0-9.]+|nan)%\\))', quality_text)\n",
"# labeled_match\n",
"stacked_bar_quality_text_labeled = f\"{labeled_match.group(1)} trips {labeled_match.group(7)}\\n from {labeled_match.group(2)} {labeled_match.group(3)}\"\n",
"\n",
"sensed_match = re.match(r'Based on ([0-9]+) trips from ([0-9]+) (users|testers and participants)', quality_text_sensed)\n",
"stacked_bar_quality_text_sensed = f\"{sensed_match.group(1)} trips (100%)\\n from {sensed_match.group(2)} {sensed_match.group(3)}\"\n",
"stacked_bar_quality_text_labeled, stacked_bar_quality_text_sensed"
"\n",
"inferred_match = re.match(r'Based on ([0-9]+) confirmed trips from ([0-9]+) (users|testers and participants)\\nof ([0-9]+) total trips from ([0-9]+) (users|testers and participants) (\\(([0-9.]+|nan)%\\))', quality_text_inferred)\n",
"stacked_bar_quality_text_inferred = f\"{inferred_match.group(1)} trips {inferred_match.group(7)}\\n from {inferred_match.group(2)} {inferred_match.group(3)}\"\n",
"\n",
"stacked_bar_quality_text_labeled, stacked_bar_quality_text_sensed, stacked_bar_quality_text_inferred"
Comment on lines +203 to +207
Contributor

I hope we fix/clean this up soon. Even if we want to generate parameterized text, it is much clearer to pass a structure holding the parameters than to parse them back out of existing text with regular expressions. And now that we have converted over to bar charts, we no longer need backwards compatibility with the old "Based on..." text.
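
The reviewer's suggestion can be sketched as follows: keep the counts in a small structure and format the bar label from it directly, instead of re-parsing the "Based on ..." sentence. `TripStats`, `bar_label()` and `bar_label_from_text()` are hypothetical names, and the one-decimal percentage format is an assumption; the regex is the one the cell above uses for `labeled_match`/`inferred_match`. The sample sentence is constructed to match that regex, not taken from real notebook output.

```python
import re
from dataclasses import dataclass

# Regex used by the notebook for labeled_match / inferred_match.
QUALITY_RE = (r'Based on ([0-9]+) confirmed trips from ([0-9]+) '
              r'(users|testers and participants)\nof ([0-9]+) total trips from '
              r'([0-9]+) (users|testers and participants) (\(([0-9.]+|nan)%\))')


@dataclass
class TripStats:
    """Hypothetical container for the numbers behind a quality string."""
    trips: int
    users: int
    total_trips: int
    user_label: str = "users"  # or "testers and participants"

    def pct_text(self) -> str:
        # Assumed format: one decimal place, "nan" when there are no trips.
        if self.total_trips == 0:
            return "(nan%)"
        return f"({round(self.trips * 100 / self.total_trips, 1)}%)"


def bar_label(stats: TripStats) -> str:
    """Structured path: format the label straight from the numbers."""
    return f"{stats.trips} trips {stats.pct_text()}\n from {stats.users} {stats.user_label}"


def bar_label_from_text(quality_text: str) -> str:
    """Current path: recover the numbers from text with a regex."""
    m = re.match(QUALITY_RE, quality_text)
    if m is None:
        raise ValueError("quality text does not match the expected sentence")
    return f"{m.group(1)} trips {m.group(7)}\n from {m.group(2)} {m.group(3)}"


stats = TripStats(trips=120, users=10, total_trips=200)
legacy_text = ("Based on 120 confirmed trips from 10 users\n"
               "of 200 total trips from 12 users (60.0%)")
print(bar_label(stats) == bar_label_from_text(legacy_text))  # True
```

With the structured path, rewording the quality sentence cannot silently break the bar labels, and a non-matching string no longer surfaces as an `AttributeError` from calling `.group()` on the `None` that `re.match` returns.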

]
},
{
@@ -203,14 +233,16 @@
"plot_title_no_quality= \"Number of trips for each mode\"\n",
"\n",
"try:\n",
" fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(15,2*2), sharex=True)\n",
" fig, ax = plt.subplots(nrows=3, ncols=1, figsize=(15,3*2), sharex=True)\n",
" # We will have text results corresponding to the axes for simplicity and consistency\n",
" text_results = [[\"Unmodified Alt Text\", \"Unmodified HTML\"], [\"Unmodified Alt Text\", \"Unmodified HTML\"]]\n",
" text_results = [[\"Unmodified Alt Text\", \"Unmodified HTML\"], [\"Unmodified Alt Text\", \"Unmodified HTML\"], [\"Unmodified Alt Text\", \"Unmodified HTML\"]]\n",
" \n",
" plot_and_text_stacked_bar_chart(expanded_ct, lambda df: (df.groupby(\"Mode_confirm\").agg({distance_col: 'count'}).sort_values(by=distance_col, ascending=False)), \n",
" \"Labeled by user\\n\"+stacked_bar_quality_text_labeled, ax[0], text_results[0], colors_mode, debug_df)\n",
" plot_and_text_stacked_bar_chart(expanded_ct_sensed, lambda df: (df.groupby(\"primary_mode\").agg({distance_col: 'count'}).sort_values(by=distance_col, ascending=False)), \n",
" \"Sensed by OpenPATH\\n\"+stacked_bar_quality_text_sensed, ax[1], text_results[1], colors_sensed, debug_df_sensed)\n",
" plot_and_text_stacked_bar_chart(expanded_ct_inferred, lambda df: (df.groupby(\"Mode_confirm\").agg({distance_col: 'count'}).sort_values(by=distance_col, ascending=False)), \n",
" \"Inferred by OpenPATH\\n\"+stacked_bar_quality_text_inferred, ax[2], text_results[2], colors_mode, debug_df_inferred)\n",
" \n",
" set_title_and_save(fig, text_results, plot_title_no_quality, file_name)\n",
"except (AttributeError, KeyError, pd.errors.UndefinedVariableError) as e:\n",
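
Each chart in this diff grows by one panel (two to three, or one to two) by editing `nrows`, `figsize`, and a hand-written `text_results` list in step. A hypothetical helper that derives the size and the text slots from the panel count, assuming the same 15-wide, 2-per-panel sizing used in the diff:

```python
def stacked_bar_layout(n_panels, width=15, height_per_panel=2):
    """Return (figsize, text_results) for n vertically stacked bar panels.

    Hypothetical helper; the sizing mirrors figsize=(15, n*2) from the diff.
    """
    figsize = (width, n_panels * height_per_panel)
    # One independent [alt_text, html] pair per axis. A comprehension is used
    # because [[...]] * n would repeat the same inner list object n times.
    text_results = [["Unmodified Alt Text", "Unmodified HTML"] for _ in range(n_panels)]
    return figsize, text_results


figsize, text_results = stacked_bar_layout(3)
print(figsize)            # (15, 6)
print(len(text_results))  # 3
```

The result would feed `plt.subplots(nrows=3, ncols=1, figsize=figsize, sharex=True)`. With the default `squeeze=True`, `plt.subplots` returns a single `Axes` when `nrows=ncols=1` and an array otherwise, which is why the commute and purpose cells switch from passing `ax` to `ax[0]`/`ax[1]` once they gain a second panel. Independent inner lists matter only if `plot_and_text_stacked_bar_chart` writes into its pair in place, which this diff does not show.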
@@ -253,13 +285,23 @@
"\n",
" expanded_ct_commute = expanded_ct.query(trip_purpose_query)\n",
" commute_quality_text = scaffolding.get_quality_text(expanded_ct, expanded_ct_commute, \"commute\", include_test_users) if not expanded_ct.empty else \"\"\n",
" plot_title = plot_title_no_quality + \"\\n\" + commute_quality_text\n",
" \n",
" expanded_ct_inferred_commute = expanded_ct_inferred.query(trip_purpose_query)\n",
" commute_quality_text_inferred = scaffolding.get_quality_text(expanded_ct_inferred, expanded_ct_inferred_commute, \"commute\", include_test_users) if not expanded_ct_inferred.empty else \"\"\n",
" plot_title = plot_title_no_quality\n",
"\n",
" commute_labeled_match = re.match(r'Based on ([0-9]+) confirmed commute trips from ([0-9]+) (users|testers and participants)\\nof ([0-9]+) total confirmed trips from ([0-9]+) (users|testers and participants) (\\(([0-9.]+|nan)%\\))', commute_quality_text)\n",
" stacked_bar_quality_text_commute_labeled = f\"{commute_labeled_match.group(1)} trips {commute_labeled_match.group(7)}\\n from {commute_labeled_match.group(2)} {commute_labeled_match.group(3)}\"\n",
"\n",
" commute_inferred_match = re.match(r'Based on ([0-9]+) confirmed commute trips from ([0-9]+) (users|testers and participants)\\nof ([0-9]+) total confirmed trips from ([0-9]+) (users|testers and participants) (\\(([0-9.]+|nan)%\\))', commute_quality_text_inferred)\n",
" stacked_bar_quality_text_commute_inferred = f\"{commute_inferred_match.group(1)} trips {commute_inferred_match.group(7)}\\n from {commute_inferred_match.group(2)} {commute_inferred_match.group(3)}\"\n",
"\n",
" # Plot entries\n",
" fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(15,2*1), sharex=True) \n",
" text_results = [\"Unmodified Alt Text\", \"Unmodified HTML\"]\n",
" fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(15,2*2), sharex=True) \n",
" text_results = [[\"Unmodified Alt Text\", \"Unmodified HTML\"], [\"Unmodified Alt Text\", \"Unmodified HTML\"]]\n",
" plot_and_text_stacked_bar_chart(expanded_ct_commute, lambda df: df.groupby(\"Mode_confirm\").agg({distance_col: 'count'}).sort_values(by=distance_col, ascending=False), \n",
" \"Labeled by user\\n (Confirmed trips)\", ax, text_results, colors_mode, debug_df)\n",
" \"Labeled by user\\n\"+stacked_bar_quality_text_commute_labeled, ax[0], text_results[0], colors_mode, debug_df)\n",
" plot_and_text_stacked_bar_chart(expanded_ct_inferred_commute, lambda df: df.groupby(\"Mode_confirm\").agg({distance_col: 'count'}).sort_values(by=distance_col, ascending=False), \n",
" \"Inferred by OpenPATH\\n\"+stacked_bar_quality_text_commute_inferred, ax[1], text_results[1], colors_mode, debug_df_inferred)\n",
" set_title_and_save(fig, text_results, plot_title, file_name)\n",
"except (AttributeError, KeyError, pd.errors.UndefinedVariableError) as e:\n",
" plt.clf()\n",
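
The commute cell applies one `trip_purpose_query` to both the labeled and the inferred frames and computes quality text only for a non-empty frame. A minimal pandas sketch of that pattern; the query string, the sample rows, and the "N trips (P%)" format are hypothetical (the notebook's text comes from `scaffolding.get_quality_text()`). The cell's `except` clause also lists `pd.errors.UndefinedVariableError`, which `.query()` raises when the query references a column the frame does not have.

```python
import pandas as pd

# Hypothetical query; the notebook builds its own trip_purpose_query.
trip_purpose_query = "Trip_purpose == 'Work'"

labeled = pd.DataFrame({"Trip_purpose": ["Work", "Shopping", "Work"],
                        "Mode_confirm": ["Bike", "Walk", "Bus"]})
inferred = pd.DataFrame({"Trip_purpose": ["Work", "Work", "Meal", "Work"],
                         "Mode_confirm": ["Walk", "Car", "Walk", "Bike"]})


def commute_subset(df, query):
    """Apply the same purpose filter to any frame; leave empty frames alone."""
    return df.query(query) if not df.empty else df


def share_text(subset, full):
    """Hypothetical label: size of the subset and its share of the full frame."""
    if full.empty:
        return ""
    return f"{len(subset)} trips ({round(len(subset) * 100 / len(full))}%)"


for name, df in [("Labeled", labeled), ("Inferred", inferred)]:
    commute = commute_subset(df, trip_purpose_query)
    print(name, share_text(commute, df))
# Labeled 2 trips (67%)
# Inferred 3 trips (75%)
```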
@@ -289,10 +331,12 @@
"plot_title_no_quality=\"Number of trips for each purpose\"\n",
"file_name= f\"ntrips_purpose{file_suffix}\"\n",
"try:\n",
" fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(15,2*1), sharex=True)\n",
" text_results = [\"Unmodified Alt Text\", \"Unmodified HTML\"]\n",
" fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(15,2*2), sharex=True)\n",
" text_results = [[\"Unmodified Alt Text\", \"Unmodified HTML\"], [\"Unmodified Alt Text\", \"Unmodified HTML\"]]\n",
" plot_and_text_stacked_bar_chart(expanded_ct, lambda df: df.groupby(\"Trip_purpose\").agg({distance_col: 'count'}).sort_values(by=distance_col, ascending=False), \n",
" \"Labeled by user\\n\"+stacked_bar_quality_text_labeled, ax, text_results, colors_purpose, debug_df)\n",
" \"Labeled by user\\n\"+stacked_bar_quality_text_labeled, ax[0], text_results[0], colors_purpose, debug_df)\n",
" plot_and_text_stacked_bar_chart(expanded_ct_inferred, lambda df: df.groupby(\"Trip_purpose\").agg({distance_col: 'count'}).sort_values(by=distance_col, ascending=False), \n",
" \"Inferred by OpenPATH\\n\"+stacked_bar_quality_text_inferred, ax[1], text_results[1], colors_purpose, debug_df_inferred)\n",
" set_title_and_save(fig, text_results, plot_title_no_quality, file_name)\n",
"except (AttributeError, KeyError, pd.errors.UndefinedVariableError) as e:\n",
" plt.clf()\n",
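
Every panel reduces its frame with the same `groupby(...).agg({distance_col: ...}).sort_values(...)` lambda: `'count'` for the number-of-trips charts and `'sum'` for the trip-length charts further down. On toy data, assuming `distance_col` names a numeric distance column (here the hypothetical `"distance"`, in meters):

```python
import pandas as pd

distance_col = "distance"  # assumption: stands in for the notebook's distance_col

trips = pd.DataFrame({
    "Trip_purpose": ["Work", "Shopping", "Work", "Meal", "Work", "Shopping"],
    "distance": [5000, 1200, 4800, 700, 6100, 2000],  # toy values in meters
})


def by_category(df, category_col, how):
    """Aggregate distance_col per category ('count' or 'sum'), largest first."""
    return (df.groupby(category_col)
              .agg({distance_col: how})
              .sort_values(by=distance_col, ascending=False))


counts = by_category(trips, "Trip_purpose", "count")
totals = by_category(trips, "Trip_purpose", "sum")
print(counts.index.tolist())            # ['Work', 'Shopping', 'Meal']
print(counts[distance_col].tolist())    # [3, 2, 1]
print(totals[distance_col].tolist())    # [15900, 3200, 700]
```

Counting `distance_col` counts its non-null values, so a trip with a missing distance would drop out of the count as well as the sum.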
@@ -336,17 +380,22 @@
" ## We do an existence check for the labeled df because we want to display the sensed value even if we don't have the labeled value\n",
" ## but we don't need to have an existence check for sensed because in that case we will have no data to display\n",
" expanded_ct_u80 = expanded_ct.loc[(expanded_ct['distance'] <= cutoff)] if \"Mode_confirm\" in expanded_ct.columns else None\n",
" expanded_ct_inferred_u80 = expanded_ct_inferred.loc[(expanded_ct_inferred['distance'] <= cutoff)] if \"Mode_confirm\" in expanded_ct_inferred.columns else None\n",
" expanded_ct_sensed_u80 = expanded_ct_sensed.loc[(expanded_ct_sensed['distance'] <= cutoff)]\n",
"\n",
" sensed_u80_quality_text = f\"{len(expanded_ct_sensed_u80)} trips ({round(len(expanded_ct_sensed_u80)/len(expanded_ct_sensed)*100)}% of all trips)\\nfrom {scaffolding.unique_users(expanded_ct_sensed_u80)} {sensed_match.group(3)}\"\n",
" labeled_u80_quality_text = f\"{len(expanded_ct_u80)} trips ({round(len(expanded_ct_u80)/len(expanded_ct)*100)}% of all labeled,\\n{round(len(expanded_ct_u80)/len(expanded_ct_sensed)*100)}% of all trips)\\nfrom {scaffolding.unique_users(expanded_ct_u80)} {sensed_match.group(3)}\" if \"Mode_confirm\" in expanded_ct.columns else \"0 labeled trips\"\n",
" \n",
" inferred_u80_quality_text = f\"{len(expanded_ct_inferred_u80)} trips ({round(len(expanded_ct_inferred_u80)/len(expanded_ct_inferred)*100)}% of all inferred,\\n{round(len(expanded_ct_inferred_u80)/len(expanded_ct_sensed)*100)}% of all trips)\\nfrom {scaffolding.unique_users(expanded_ct_inferred_u80)} {sensed_match.group(3)}\" if \"Mode_confirm\" in expanded_ct_inferred.columns else \"0 inferred trips\"\n",
"\n",
" # Plot entries\n",
" fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(15,2*2), sharex=True)\n",
" text_results = [[\"Unmodified Alt Text\", \"Unmodified HTML\"], [\"Unmodified Alt Text\", \"Unmodified HTML\"]]\n",
" fig, ax = plt.subplots(nrows=3, ncols=1, figsize=(15,3*2), sharex=True)\n",
" text_results = [[\"Unmodified Alt Text\", \"Unmodified HTML\"], [\"Unmodified Alt Text\", \"Unmodified HTML\"], [\"Unmodified Alt Text\", \"Unmodified HTML\"]]\n",
" plot_and_text_stacked_bar_chart(expanded_ct_u80, lambda df: df.groupby(\"Mode_confirm\").agg({distance_col: 'count'}).sort_values(by=distance_col, ascending=False), \n",
" \"Labeled by user\\n\"+labeled_u80_quality_text, ax[0], text_results[0], colors_mode, debug_df)\n",
" plot_and_text_stacked_bar_chart(expanded_ct_sensed_u80, lambda df: df.groupby(\"primary_mode\").agg({distance_col: 'count'}).sort_values(by=distance_col, ascending=False), \n",
" \"Sensed by OpenPATH\\n\"+sensed_u80_quality_text, ax[1], text_results[1], colors_sensed, debug_df_sensed)\n",
" plot_and_text_stacked_bar_chart(expanded_ct_inferred_u80, lambda df: df.groupby(\"Mode_confirm\").agg({distance_col: 'count'}).sort_values(by=distance_col, ascending=False), \n",
" \"Inferred by OpenPATH\\n\"+inferred_u80_quality_text, ax[2], text_results[2], colors_mode, debug_df_inferred)\n",
" set_title_and_save(fig, text_results, plot_title_no_quality, file_name)\n",
"except (AttributeError, KeyError, pd.errors.UndefinedVariableError) as e:\n",
" # we can have a missing attribute error during the pre-processing, in which case we should show the missing plot\n",
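
The under-cutoff cell filters each frame with `distance <= cutoff` and describes the result against two denominators: the frame's own total and all sensed trips. A sketch of that label, with a hypothetical `cutoff` value, sample frames, and a hypothetical `cutoff_text()` helper; it describes the filtered inferred frame, so a panel drawn from `expanded_ct_inferred_u80` (rather than the full `expanded_ct_inferred`) would match its label, and its parentheses stay balanced.

```python
import pandas as pd


def cutoff_text(subset, full, all_trips, noun):
    """'N trips (X% of all <noun>,\\nY% of all trips)' with balanced parentheses."""
    pct_of_group = round(len(subset) * 100 / len(full))
    pct_of_all = round(len(subset) * 100 / len(all_trips))
    return f"{len(subset)} trips ({pct_of_group}% of all {noun},\n{pct_of_all}% of all trips)"


cutoff = 10.0  # hypothetical; the notebook computes its own cutoff
sensed = pd.DataFrame({"distance": [1.0, 2.5, 3.0, 40.0, 120.0]})
inferred = pd.DataFrame({"distance": [1.0, 3.0, 120.0, 2.0]})

# Filter first, then describe (and plot) the filtered frame.
inferred_u80 = inferred.loc[inferred["distance"] <= cutoff]
print(cutoff_text(inferred_u80, inferred, sensed, "inferred"))
# 3 trips (75% of all inferred,
# 60% of all trips)
```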
@@ -380,13 +429,15 @@
"file_name =f\"total_trip_length{file_suffix}\"\n",
"\n",
"try:\n",
" fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(15,2*2), sharex=True)\n",
" fig, ax = plt.subplots(nrows=3, ncols=1, figsize=(15,3*2), sharex=True)\n",
" \n",
" text_results = [[\"Unmodified Alt Text\", \"Unmodified HTML\"], [\"Unmodified Alt Text\", \"Unmodified HTML\"]]\n",
" text_results = [[\"Unmodified Alt Text\", \"Unmodified HTML\"], [\"Unmodified Alt Text\", \"Unmodified HTML\"], [\"Unmodified Alt Text\", \"Unmodified HTML\"]]\n",
" plot_and_text_stacked_bar_chart(expanded_ct, lambda df: df.groupby(\"Mode_confirm\").agg({distance_col: 'sum'}).sort_values(by=distance_col, ascending=False), \n",
" \"Labeled by user\\n\"+stacked_bar_quality_text_labeled, ax[0], text_results[0], colors_mode, debug_df)\n",
" plot_and_text_stacked_bar_chart(expanded_ct_sensed, lambda df: df.groupby(\"primary_mode\").agg({distance_col: 'sum'}).sort_values(by=distance_col, ascending=False), \n",
" \"Sensed by OpenPATH\\n\"+stacked_bar_quality_text_sensed, ax[1], text_results[1], colors_sensed, debug_df_sensed)\n",
" plot_and_text_stacked_bar_chart(expanded_ct_inferred, lambda df: df.groupby(\"Mode_confirm\").agg({distance_col: 'sum'}).sort_values(by=distance_col, ascending=False), \n",
" \"Inferred by OpenPATH\\n\"+stacked_bar_quality_text_inferred, ax[2], text_results[2], colors_mode, debug_df_inferred)\n",
" set_title_and_save(fig, text_results, plot_title_no_quality, file_name) \n",
"except (AttributeError, KeyError, pd.errors.UndefinedVariableError) as e:\n",
" plt.clf()\n",
@@ -420,16 +471,20 @@
" ## We do an existence check for the labeled df because we want to display the sensed value even if we don't have the labeled value\n",
" ## but we don't need to have an existence check for sensed because in that case we will have no data to display\n",
" labeled_land_trips_df = expanded_ct[expanded_ct['Mode_confirm'] != \"Airplane\"] if \"Mode_confirm\" in expanded_ct.columns else None\n",
" inferred_land_trips_df = expanded_ct_inferred[expanded_ct_inferred['Mode_confirm'] != \"Airplane\"] if \"Mode_confirm\" in expanded_ct_inferred.columns else None\n",
" sensed_land_trips_df = expanded_ct_sensed[expanded_ct_sensed['primary_mode'] != \"AIR_OR_HSR\"]\n",
" \n",
" sensed_land_quality_text = f\"{len(sensed_land_trips_df)} trips ({round(len(sensed_land_trips_df)/len(expanded_ct_sensed)*100)}% of all trips)\\nfrom {scaffolding.unique_users(sensed_land_trips_df)} {sensed_match.group(3)}\"\n",
" labeled_land_quality_text = f\"{len(labeled_land_trips_df)} trips ({round(len(labeled_land_trips_df)/len(expanded_ct)*100)}% of all labeled,\\n{round(len(labeled_land_trips_df)/len(expanded_ct_sensed)*100)}% of all trips)\\nfrom {scaffolding.unique_users(labeled_land_trips_df)} {sensed_match.group(3)}\" if \"Mode_confirm\" in expanded_ct.columns else \"0 labeled trips\"\n",
"\n",
" fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(15,2*2), sharex=True)\n",
" inferred_land_quality_text = f\"{len(inferred_land_trips_df)} trips ({round(len(inferred_land_trips_df)/len(expanded_ct_inferred)*100)}% of all inferred,\\n{round(len(inferred_land_trips_df)/len(expanded_ct_sensed)*100)}% of all trips)\\nfrom {scaffolding.unique_users(inferred_land_trips_df)} {sensed_match.group(3)}\" if \"Mode_confirm\" in expanded_ct_inferred.columns else \"0 inferred trips\"\n",
" \n",
" fig, ax = plt.subplots(nrows=3, ncols=1, figsize=(15,3*2), sharex=True)\n",
" plot_and_text_stacked_bar_chart(labeled_land_trips_df, lambda df: df.groupby(\"Mode_confirm\").agg({distance_col: 'sum'}).sort_values(by=distance_col, ascending=False), \n",
" \"Labeled by user\\n\"+labeled_land_quality_text, ax[0], text_results[0], colors_mode, debug_df)\n",
" plot_and_text_stacked_bar_chart(sensed_land_trips_df, lambda df: df.groupby(\"primary_mode\").agg({distance_col: 'sum'}).sort_values(by=distance_col, ascending=False), \n",
" \"Sensed by OpenPATH\\n\"+sensed_land_quality_text, ax[1], text_results[1], colors_sensed, debug_df_sensed)\n",
" plot_and_text_stacked_bar_chart(inferred_land_trips_df, lambda df: df.groupby(\"Mode_confirm\").agg({distance_col: 'sum'}).sort_values(by=distance_col, ascending=False), \n",
" \"Inferred by OpenPATH\\n\"+inferred_land_quality_text, ax[2], text_results[2], colors_mode, debug_df_inferred)\n",
" set_title_and_save(fig, text_results, plot_title_no_quality, file_name) \n",
"except (AttributeError, KeyError, pd.errors.UndefinedVariableError) as e:\n",
" plt.clf()\n",
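
The land-trips cell drops air travel using a different column and value per source (`Mode_confirm != "Airplane"` for labeled and inferred, `primary_mode != "AIR_OR_HSR"` for sensed) and yields `None` when a labeled or inferred frame has no `Mode_confirm` column. A sketch of that filter as one hypothetical `land_trips()` helper; sensed mode names other than `AIR_OR_HSR` are sample values, and applying the column check to the sensed frame too is this sketch's choice (the notebook assumes sensed data always has `primary_mode`).

```python
import pandas as pd


def land_trips(df, mode_col, air_value):
    """Drop air trips; return None when the mode column is missing,
    mirroring the notebook's existence check for labeled/inferred frames."""
    if df is None or mode_col not in df.columns:
        return None
    return df[df[mode_col] != air_value]


labeled = pd.DataFrame({"Mode_confirm": ["Bike", "Airplane", "Bus"],
                        "distance": [3, 900, 12]})
sensed = pd.DataFrame({"primary_mode": ["BICYCLING", "AIR_OR_HSR", "IN_VEHICLE"],
                       "distance": [3, 900, 12]})
no_labels = pd.DataFrame({"distance": [5]})

labeled_land = land_trips(labeled, "Mode_confirm", "Airplane")
sensed_land = land_trips(sensed, "primary_mode", "AIR_OR_HSR")
print(labeled_land["Mode_confirm"].tolist())              # ['Bike', 'Bus']
print(sensed_land["primary_mode"].tolist())               # ['BICYCLING', 'IN_VEHICLE']
print(land_trips(no_labels, "Mode_confirm", "Airplane"))  # None
```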