Some deployments are broken? #142

Open
Abby-Wheelis opened this issue Jul 31, 2024 · 8 comments

Comments

@Abby-Wheelis
Member

I tried to check the open-access dashboard recently and noticed it is broken in a strange way. A few others I've found are broken in the same way:

  • durham
  • caeb-co
  • bikethere-garfield-county
  • mm-masscec

This is what it looks like:
Screenshot 2024-07-31 at 2 57 38 PM

or

Screenshot 2024-07-31 at 2 58 46 PM

This is very confusing, because the pie charts were retired a while ago and some of the charts shown are the bar charts.

I wonder what could be causing this behavior? Is it because they are all old studies/programs?

@Abby-Wheelis
Member Author

For one of the problematic deployments, I found an error in the logs when trying to run generic-metrics:

expanded_ct_sensed, file_suffix_sensed, quality_text_sensed, debug_df_sensed = scaffolding.load_viz_notebook_sensor_inference_data(...

The same call also throws an error in generic_metrics_sensed and in generic_timeseries.

Clearly there is a bug in the load_viz_notebook_sensor_inference_data call. I think my next step is to try to reproduce it locally.

@Abby-Wheelis
Member Author

Looking at the details a bit more while I wait for the data to load, it looks like the pie chart was generated on 5/10, so it is likely from just before the changes were merged. I'm guessing the two charts have the same name.
Screenshot 2024-09-05 at 2 17 48 PM

@Abby-Wheelis
Member Author

Reproduced, and I have the full error now! I ran out of time today but will resume tomorrow.

Loaded expanded_ct with length 41449 for None
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[6], line 1
----> 1 expanded_ct_sensed, file_suffix_sensed, quality_text_sensed, debug_df_sensed = scaffolding.load_viz_notebook_sensor_inference_data(year,
      2                                                                             month,
      3                                                                             program,
      4                                                                             include_test_users,
      5                                                                             sensed_algo_prefix)

File /usr/src/app/saved-notebooks/scaffolding.py:243, in load_viz_notebook_sensor_inference_data(year, month, program, include_test_users, sensed_algo_prefix)
    241 print(f"Loaded expanded_ct with length {len(expanded_ct)} for {tq}")
    242 if len(expanded_ct) > 0:
--> 243     expanded_ct["primary_mode_non_other"] = participant_ct_df.cleaned_section_summary.apply(lambda md: max(md["distance"], key=md["distance"].get))
    244     expanded_ct.primary_mode_non_other.replace({"ON_FOOT": "WALKING"}, inplace=True)
    245     valid_sensed_modes = ["WALKING", "BICYCLING", "IN_VEHICLE", "AIR_OR_HSR", "UNKNOWN"]

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/core/series.py:4771, in Series.apply(self, func, convert_dtype, args, **kwargs)
   4661 def apply(
   4662     self,
   4663     func: AggFuncType,
   (...)
   4666     **kwargs,
   4667 ) -> DataFrame | Series:
   4668     """
   4669     Invoke function on values of Series.
   4670 
   (...)
   4769     dtype: float64
   4770     """
-> 4771     return SeriesApply(self, func, convert_dtype, args, kwargs).apply()

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/core/apply.py:1123, in SeriesApply.apply(self)
   1120     return self.apply_str()
   1122 # self.f is Callable
-> 1123 return self.apply_standard()

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/core/apply.py:1174, in SeriesApply.apply_standard(self)
   1172     else:
   1173         values = obj.astype(object)._values
-> 1174         mapped = lib.map_infer(
   1175             values,
   1176             f,
   1177             convert=self.convert_dtype,
   1178         )
   1180 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1181     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1182     #  See also GH#25959 regarding EA support
   1183     return obj._constructor_expanddim(list(mapped), index=obj.index)

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/_libs/lib.pyx:2924, in pandas._libs.lib.map_infer()

File /usr/src/app/saved-notebooks/scaffolding.py:243, in load_viz_notebook_sensor_inference_data.<locals>.<lambda>(md)
    241 print(f"Loaded expanded_ct with length {len(expanded_ct)} for {tq}")
    242 if len(expanded_ct) > 0:
--> 243     expanded_ct["primary_mode_non_other"] = participant_ct_df.cleaned_section_summary.apply(lambda md: max(md["distance"], key=md["distance"].get))
    244     expanded_ct.primary_mode_non_other.replace({"ON_FOOT": "WALKING"}, inplace=True)
    245     valid_sensed_modes = ["WALKING", "BICYCLING", "IN_VEHICLE", "AIR_OR_HSR", "UNKNOWN"]

TypeError: 'float' object is not subscriptable

@Abby-Wheelis
Member Author

The issue is that one of the rows has nan as the entry for participant_ct_df.cleaned_section_summary, and nan is a float, which is not subscriptable, so md["distance"] raises the TypeError.
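
A minimal sketch of how this failure can be reproduced outside the notebook. The `summaries` series and the `primary_mode` helper are hypothetical stand-ins for participant_ct_df.cleaned_section_summary and the lambda at scaffolding.py:243. The toy values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for participant_ct_df.cleaned_section_summary:
# one well-formed summary and one missing (nan) entry.
summaries = pd.Series([
    {"distance": {"WALKING": 120.0, "BICYCLING": 900.0}},
    np.nan,
])

def primary_mode(md):
    # Same logic as the lambda in the traceback: pick the mode
    # with the largest distance.
    return max(md["distance"], key=md["distance"].get)

try:
    summaries.apply(primary_mode)
    error_message = None
except TypeError as e:
    # The nan row is a plain float, so md["distance"] fails.
    error_message = str(e)

print(error_message)
```

The first row is handled normally. The apply call only fails once it reaches the nan row, which would explain why a single bad entry is enough to break the whole notebook.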

@shankari
Contributor

shankari commented Sep 6, 2024

I wonder if this and e-mission/op-admin-dashboard#120 are related.
Really curious about why we are getting nan; maybe this is a backwards compat issue that we didn't address?!

@Abby-Wheelis
Member Author

It does seem to be present mainly in older deployments, so an unaddressed backwards compat issue would make sense to me.

I have been able to recover from the error by defaulting to UNKNOWN when the summary is nan:

 expanded_ct["primary_mode_non_other"] = participant_ct_df.cleaned_section_summary.apply(lambda md: max(md["distance"], key=md["distance"].get) if not isinstance(md, float) else "UNKNOWN")
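
As a possible alternative to the one-line lambda, the same fallback could be written as a named helper. This is only a sketch: `primary_sensed_mode` is a hypothetical name, and it assumes every non-missing summary is a dict-like mapping with a "distance" key. Unlike the lambda, it also falls back to the default when the "distance" mapping is missing or empty, which goes beyond the fix described above.

```python
from collections.abc import Mapping

import numpy as np
import pandas as pd

def primary_sensed_mode(summary, default="UNKNOWN"):
    """Return the mode with the greatest distance, or `default`
    when the summary is missing (e.g. nan) or has no distances."""
    if not isinstance(summary, Mapping):
        # Covers nan (a float), None, and anything else unexpected.
        return default
    distances = summary.get("distance")
    if not distances:
        # Assumption: an absent or empty distance dict is also UNKNOWN.
        return default
    return max(distances, key=distances.get)

# Toy data for illustration only.
toy = pd.Series([
    {"distance": {"WALKING": 120.0, "BICYCLING": 900.0}},
    np.nan,
    {"distance": {}},
])
modes = toy.apply(primary_sensed_mode)
print(modes.tolist())
```

Checking for a mapping rather than for a float means other unexpected values, such as None, are also treated as UNKNOWN instead of raising.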

@Abby-Wheelis
Member Author

There is a high rate of "UNKNOWN" now:

image

After spot-checking a few months for the rate of UNKNOWN and the number of nan entries:

Month      % UNKNOWN (sensed)   Number of nan
8/2022     33%                  250
12/2022    49%                  291
2/2023     39%                  222
6/2023     41%                  497
7/2023     19%                  184
8/2023     13%                  15
2/2024     23%                  0

It looks like there might be higher rates for older data?
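
A rough sketch of how figures like these could be computed for one month of loaded data. The `unknown_stats` helper is hypothetical, and how the numbers above were actually produced is not stated in this thread. It assumes a dataframe that has both the cleaned_section_summary column and the primary_mode_non_other column, with the nan fallback already applied.

```python
import numpy as np
import pandas as pd

def unknown_stats(df):
    """Return (% of trips whose sensed primary mode is UNKNOWN,
    number of trips with a nan section summary)."""
    n_nan = int(df["cleaned_section_summary"].isna().sum())
    pct_unknown = 100.0 * (df["primary_mode_non_other"] == "UNKNOWN").mean()
    return pct_unknown, n_nan

# Toy data for illustration only, not taken from any deployment.
toy_df = pd.DataFrame({
    "cleaned_section_summary": [
        {"distance": {"WALKING": 50.0}},
        np.nan,
        {"distance": {"UNKNOWN": 10.0}},
        np.nan,
    ],
    "primary_mode_non_other": ["WALKING", "UNKNOWN", "UNKNOWN", "UNKNOWN"],
})
pct_unknown, n_nan = unknown_stats(toy_df)
print(pct_unknown, n_nan)
```

Comparing the two numbers per month separates the trips that are UNKNOWN only because of the nan fallback from the trips where the sensed mode itself was UNKNOWN.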

@shankari
Contributor

shankari commented Sep 6, 2024

I think we should go ahead with this fix, but should file a follow up issue to investigate the older data

Projects
Status: Issues being worked on