Merge pull request #2997 from plotly/trendlines2
Trendline options
nicolaskruchten authored Aug 13, 2021
2 parents 9f8633a + f42c5af commit ed48215
Showing 9 changed files with 568 additions and 85 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,9 @@ This project adheres to [Semantic Versioning](http://semver.org/).
### Added
- Extra flags were added to the `gapminder` and `stocks` dataset to facilitate testing, documentation and demos [#3305](https://github.com/plotly/plotly.py/issues/3305)
- All line-like Plotly Express functions now accept `markers` argument to display markers, and all but `line_mapbox` accept `symbol` to map a field to the symbol attribute, similar to scatter-like functions [#3326](https://github.com/plotly/plotly.py/issues/3326)
- `px.scatter` and `px.density_contour` now support new `trendline` types `'rolling'`, `'expanding'` and `'ewm'` [#2997](https://github.com/plotly/plotly.py/pull/2997)
- `px.scatter` and `px.density_contour` now support a new `trendline_options` argument to parameterize trendlines, with support for controlling the constant term and log-scaling in `'ols'`, specifying the fraction used for `'lowess'`, and passing options through to Pandas for `'rolling'`, `'expanding'` and `'ewm'` [#2997](https://github.com/plotly/plotly.py/pull/2997)
- `px.scatter` and `px.density_contour` now support a new `trendline_scope` argument that accepts the value `'overall'` to request a single trendline for all traces, including across facets and animation frames [#2997](https://github.com/plotly/plotly.py/pull/2997)

### Fixed
- Fixed regression introduced in version 5.0.0 where pandas/numpy arrays with `dtype` of Object were being converted to `list` values when added to a Figure ([#3292](https://github.com/plotly/plotly.py/issues/3292), [#3293](https://github.com/plotly/plotly.py/pull/3293))
3 changes: 3 additions & 0 deletions doc/apidoc/plotly.express.rst
@@ -49,6 +49,8 @@ plotly's high-level API for rapid figure generation. ::
density_heatmap
density_mapbox
imshow
set_mapbox_access_token
get_trendline_results


`plotly.express` subpackages
@@ -60,3 +62,4 @@ plotly's high-level API for rapid figure generation. ::

generated/plotly.express.data.rst
generated/plotly.express.colors.rst
generated/plotly.express.trendline_functions.rst
165 changes: 156 additions & 9 deletions doc/python/linear-fits.md
@@ -5,8 +5,8 @@ jupyter:
text_representation:
extension: .md
format_name: markdown
format_version: '1.1'
jupytext_version: 1.1.1
format_version: '1.2'
jupytext_version: 1.4.2
kernelspec:
display_name: Python 3
language: python
@@ -20,11 +20,12 @@ jupyter:
name: python
nbconvert_exporter: python
pygments_lexer: ipython3
version: 3.6.8
version: 3.7.7
plotly:
description: Add linear Ordinary Least Squares (OLS) regression trendlines or
non-linear Locally Weighted Scatterplot Smoothing (LOWESS) trendlines to scatterplots
in Python.
in Python. Options for moving averages (rolling means) as well as exponentially-weighted
and expanding functions.
display_as: statistical
language: python
layout: base
@@ -39,7 +40,7 @@ jupyter:

[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/).

Plotly Express allows you to add [Ordinary Least](https://en.wikipedia.org/wiki/Ordinary_least_squares) Squares regression trendline to scatterplots with the `trendline` argument. In order to do so, you will need to install `statsmodels` and its dependencies. Hovering over the trendline will show the equation of the line and its R-squared value.
Plotly Express allows you to add [Ordinary Least Squares](https://en.wikipedia.org/wiki/Ordinary_least_squares) regression trendlines to scatterplots with the `trendline` argument. To do so, you will need to [install `statsmodels` and its dependencies](https://www.statsmodels.org/stable/install.html). Hovering over the trendline will show the equation of the line and its R-squared value.

```python
import plotly.express as px
@@ -66,14 +67,160 @@ print(results)
results.query("sex == 'Male' and smoker == 'Yes'").px_fit_results.iloc[0].summary()
```

### Non-Linear Trendlines
### Displaying a single trendline with multiple traces

Plotly Express also supports non-linear [LOWESS](https://en.wikipedia.org/wiki/Local_regression) trendlines.
_new in v5.2_

To display a single trendline using the entire dataset, set the `trendline_scope` argument to `"overall"`. The same trendline will be overlaid on all facets and animation frames. The trendline color can be overridden with `trendline_color_override`.

```python
import plotly.express as px

df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", symbol="smoker", color="sex", trendline="ols", trendline_scope="overall")
fig.show()
```

```python
import plotly.express as px

df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", facet_col="smoker", color="sex",
trendline="ols", trendline_scope="overall", trendline_color_override="black")
fig.show()
```
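
As a rough illustration of what changes between the default per-trace scope and `trendline_scope="overall"`, the sketch below fits one line per group and one line to the pooled data. The toy data and the `fit_line` helper are hypothetical, and `np.polyfit` stands in here for the `statsmodels` fit that the trendline relies on.

```python
import numpy as np

# Hypothetical toy data: two groups that follow different slopes.
x = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 2.0, 4.0, 6.0, 8.0])
group = np.array(["a"] * 4 + ["b"] * 4)


def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept)."""
    slope, intercept = np.polyfit(xs, ys, deg=1)
    return float(slope), float(intercept)


# Per-trace scope: one fit for each group.
per_group = {g: fit_line(x[group == g], y[group == g]) for g in ("a", "b")}

# Overall scope: a single fit that ignores the grouping.
overall = fit_line(x, y)

print(per_group)
print(overall)
```

The pooled line sits between the two per-group lines, which is why an overall trendline can look quite different from any individual trace's trendline.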

### OLS Parameters

_new in v5.2_

OLS trendlines can be fit with log transformations applied to the X data, the Y data, or both, using the `trendline_options` argument, independently of whether or not the plot has [logarithmic axes](https://plotly.com/python/log-plot/).

```python
import plotly.express as px

df = px.data.gapminder(year=2007)
fig = px.scatter(df, x="gdpPercap", y="lifeExp",
trendline="ols", trendline_options=dict(log_x=True),
title="Log-transformed fit on linear axes")
fig.show()
```

```python
import plotly.express as px

df = px.data.gapminder(year=2007)
fig = px.scatter(df, x="gdpPercap", y="lifeExp", log_x=True,
trendline="ols", trendline_options=dict(log_x=True),
title="Log-scaled X axis and log-transformed fit")
fig.show()
```
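
To make the idea of a log-transformed fit concrete, here is a minimal sketch of fitting `y = intercept + slope * log10(x)` by least squares. It assumes a base-10 logarithm and uses `np.polyfit` on synthetic data rather than `statsmodels`; the data and the `predict` helper are hypothetical.

```python
import numpy as np

# Synthetic data that follows y = 2 + 3 * log10(x) exactly.
x = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])
y = 2.0 + 3.0 * np.log10(x)

# Fit a straight line against log10(x) instead of x itself.
slope, intercept = np.polyfit(np.log10(x), y, deg=1)


def predict(x_new):
    """Evaluate the fitted curve on the original (untransformed) X scale."""
    return intercept + slope * np.log10(x_new)


print(slope, intercept)
print(predict(100.0))
```

The fit itself does not depend on how the axes are drawn: the same curve appears bent on a linear X axis and straight on a log-scaled X axis.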

### Locally WEighted Scatterplot Smoothing (LOWESS)

Plotly Express also supports non-linear [LOWESS](https://en.wikipedia.org/wiki/Local_regression) trendlines. In order to use this feature, you will need to [install `statsmodels` and its dependencies](https://www.statsmodels.org/stable/install.html).

```python
import plotly.express as px

df = px.data.stocks(datetimes=True)
fig = px.scatter(df, x="date", y="GOOG", trendline="lowess")
fig.show()
```

_new in v5.2_

The level of smoothing can be controlled via the `frac` trendline option, which indicates the fraction of the data that the LOWESS smoother should include. The default is a fairly smooth line with `frac=0.6666` and lowering this fraction will give a line that more closely follows the data.

```python
import plotly.express as px

df = px.data.stocks(datetimes=True)
fig = px.scatter(df, x="date", y="GOOG", trendline="lowess", trendline_options=dict(frac=0.1))
fig.show()
```

### Moving Averages

_new in v5.2_

Plotly Express can also use Pandas' [`rolling`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html), [`ewm`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.ewm.html) and [`expanding`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.expanding.html) functions in trendlines, for example to display moving averages. Values passed to `trendline_options` are passed directly to the underlying Pandas function (with the exception of the `function` and `function_args` keys, see below).

```python
import plotly.express as px

df = px.data.stocks(datetimes=True)
fig = px.scatter(df, x="date", y="GOOG", trendline="rolling", trendline_options=dict(window=5),
title="5-point moving average")
fig.show()
```

```python
import plotly.express as px

df = px.data.stocks(datetimes=True)
fig = px.scatter(df, x="date", y="GOOG", trendline="ewm", trendline_options=dict(halflife=2),
title="Exponentially-weighted moving average (halflife of 2 points)")
fig.show()
```

```python
import plotly.express as px

df = px.data.stocks(datetimes=True)
fig = px.scatter(df, x="date", y="GOOG", trendline="expanding", title="Expanding mean")
fig.show()
```
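
For reference, the three trendline types above correspond to the following plain Pandas calls. This is a small sketch on a hypothetical five-point series, intended only to show what `window` and `halflife` mean when they are handed to Pandas.

```python
import pandas as pd

# Hypothetical toy series standing in for a column such as "GOOG".
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# trendline="rolling", trendline_options=dict(window=3)
rolling_mean = s.rolling(window=3).mean()

# trendline="ewm", trendline_options=dict(halflife=2)
ewm_mean = s.ewm(halflife=2).mean()

# trendline="expanding" (no options needed)
expanding_mean = s.expanding().mean()

print(pd.DataFrame({
    "rolling": rolling_mean,
    "ewm": ewm_mean,
    "expanding": expanding_mean,
}))
```

Note that the rolling mean is undefined (`NaN`) until a full window of points is available, so a rolling trendline starts later than the data it summarizes.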

### Other Functions

The `rolling`, `expanding` and `ewm` trendlines support functions other than the default `mean`, enabling, for example, a moving-median trendline or an expanding-max trendline.

```python
import plotly.express as px

df = px.data.gapminder().query("year == 2007")
fig = px.scatter(df, x="gdpPercap", y="lifeExp", color="continent", trendline="lowess")
df = px.data.stocks(datetimes=True)
fig = px.scatter(df, x="date", y="GOOG", trendline="rolling", trendline_options=dict(function="median", window=5),
title="Rolling Median")
fig.show()
```

```python
import plotly.express as px

df = px.data.stocks(datetimes=True)
fig = px.scatter(df, x="date", y="GOOG", trendline="expanding", trendline_options=dict(function="max"),
title="Expanding Maximum")
fig.show()
```

In some cases, it is necessary to pass options into the underlying Pandas function. For example, the `std` parameter must be provided if the `win_type` argument to `rolling` is `"gaussian"`. This is possible with the `function_args` trendline option.

```python
import plotly.express as px

df = px.data.stocks(datetimes=True)
fig = px.scatter(df, x="date", y="GOOG", trendline="rolling",
trendline_options=dict(window=5, win_type="gaussian", function_args=dict(std=2)),
title="Rolling Mean with Gaussian Window")
fig.show()
```
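
One way to picture how the `function` and `function_args` keys relate to the other options is the sketch below. The helper `apply_window_function` is hypothetical and is not part of Plotly Express; it assumes that those two keys are separated out and that everything else is forwarded to the Pandas windowing method.

```python
import pandas as pd


def apply_window_function(series, kind, options):
    """Hypothetical helper: apply a Pandas window aggregation to a series.

    kind is "rolling", "expanding" or "ewm"; options mimics trendline_options.
    """
    options = dict(options)  # copy so the caller's dict is left untouched
    function = options.pop("function", "mean")
    function_args = options.pop("function_args", {})
    window = getattr(series, kind)(**options)          # e.g. series.rolling(window=3)
    return getattr(window, function)(**function_args)  # e.g. .median()


s = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0])

rolling_median = apply_window_function(s, "rolling", dict(window=3, function="median"))
expanding_max = apply_window_function(s, "expanding", dict(function="max"))
rolling_std = apply_window_function(
    s, "rolling", dict(window=2, function="std", function_args=dict(ddof=0))
)

print(rolling_median.tolist())
print(expanding_max.tolist())
print(rolling_std.tolist())
```

Keeping `function_args` separate from the window options avoids ambiguity between arguments meant for the window constructor (such as `window` or `win_type`) and arguments meant for the aggregation itself (such as `std` or `ddof`).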

### Displaying only the trendlines

In some cases, it may be desirable to show only the trendlines, by removing the scatter points.

```python
import plotly.express as px

df = px.data.stocks(indexed=True, datetimes=True)
fig = px.scatter(df, trendline="rolling", trendline_options=dict(window=5),
title="5-point moving average")
fig.data = [t for t in fig.data if t.mode == "lines"]
fig.update_traces(showlegend=True)  # trendlines have showlegend=False by default
fig.show()
```

```python

```
3 changes: 2 additions & 1 deletion packages/python/plotly/plotly/express/__init__.py
@@ -60,7 +60,7 @@

from ._special_inputs import IdentityMap, Constant, Range # noqa: F401

from . import data, colors # noqa: F401
from . import data, colors, trendline_functions # noqa: F401

__all__ = [
"scatter",
@@ -100,6 +100,7 @@
"imshow",
"data",
"colors",
"trendline_functions",
"set_mapbox_access_token",
"get_trendline_results",
"IdentityMap",
4 changes: 4 additions & 0 deletions packages/python/plotly/plotly/express/_chart_types.py
@@ -46,7 +46,9 @@ def scatter(
marginal_x=None,
marginal_y=None,
trendline=None,
trendline_options=None,
trendline_color_override=None,
trendline_scope="trace",
log_x=False,
log_y=False,
range_x=None,
@@ -90,7 +92,9 @@ def density_contour(
marginal_x=None,
marginal_y=None,
trendline=None,
trendline_options=None,
trendline_color_override=None,
trendline_scope="trace",
log_x=False,
log_y=False,
range_x=None,