Update readme and overview

e10v committed Jul 14, 2024 · 1 parent e37a462 · commit 4cdc1b7

**tea-tasting** calculates statistics within data backends such as BigQuery, ClickHouse, PostgreSQL, Snowflake, Spark, and others of the 20+ backends supported by [Ibis](https://ibis-project.org/). This approach eliminates the need to import granular data into a Python environment, though Pandas DataFrames are also supported.

**tea-tasting** is still in alpha, but already includes all the features listed above. See the Roadmap section below for the features that are coming soon.

## Installation

```bash
pip install tea-tasting
```

## Basic example

Begin with this simple example to understand the basic functionality:

```python
import tea_tasting as tt


data = tt.make_users_data(seed=42)

experiment = tt.Experiment(
    sessions_per_user=tt.Mean("sessions"),
    orders_per_session=tt.RatioOfMeans("orders", "sessions"),
    orders_per_user=tt.Mean("orders"),
    revenue_per_user=tt.Mean("revenue"),
)

result = experiment.analyze(data)
print(result)
#> metric control treatment rel_effect_size rel_effect_size_ci pvalue
#> sessions_per_user 2.00 1.98 -0.66% [-3.7%, 2.5%] 0.674
#> orders_per_session 0.266 0.289 8.8% [-0.89%, 19%] 0.0762
#> orders_per_user 0.530 0.573 8.0% [-2.0%, 19%] 0.118
#> revenue_per_user 5.24 5.73 9.3% [-2.4%, 22%] 0.123
```

In the following sections, each step of this process will be explained in detail.

### Input data

The `make_users_data` function creates synthetic data for demonstration purposes. This data mimics what you might encounter in an A/B test for an online store. Each row represents an individual user, with the following columns:

- `user`: The unique identifier for each user.
- `variant`: The specific variant (e.g., 0 or 1) assigned to each user in the A/B test.
- `sessions`: The total number of the user's sessions.
- `orders`: The total number of the user's orders.
- `revenue`: The total revenue generated by the user.

**tea-tasting** accepts data as either a Pandas DataFrame or an Ibis Table. [Ibis](https://ibis-project.org/) is a Python package that serves as a DataFrame API to various data backends. It supports 20+ backends, including BigQuery, ClickHouse, DuckDB, Polars, PostgreSQL, Snowflake, and Spark. You can write an SQL query, [wrap](https://ibis-project.org/how-to/extending/sql#backend.sql) it as an Ibis Table, and pass it to **tea-tasting**.

Many statistical tests, like Student's t-test or Z-test, don't need granular data for analysis. For such tests, **tea-tasting** will query aggregated statistics, like mean and variance, instead of downloading all the detailed data.
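To illustrate why aggregates suffice, here is a plain-Python sketch (not **tea-tasting**'s actual implementation) of a two-sided Z-test that uses only each variant's mean, variance, and sample size; the aggregate values below are made up for illustration:

```python
import math
from statistics import NormalDist


def ztest_from_aggregates(mean_c, var_c, n_c, mean_t, var_t, n_t):
    """Two-sided Z-test from per-variant aggregates (unequal variances)."""
    se = math.sqrt(var_c / n_c + var_t / n_t)  # standard error of the difference
    statistic = (mean_t - mean_c) / se
    pvalue = 2 * (1 - NormalDist().cdf(abs(statistic)))
    return statistic, pvalue


# Hypothetical aggregates for orders per user in control and treatment.
statistic, pvalue = ztest_from_aggregates(0.530, 0.65, 2023, 0.573, 0.72, 1977)
```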

**tea-tasting** assumes that:

- The data is grouped by randomization units, such as individual users.
- There is a column indicating the variant of the A/B test (typically labeled as A, B, etc.).
- All necessary columns for metric calculations (like the number of orders, revenue, etc.) are included in the table.

### A/B test definition

The `Experiment` class defines the parameters of an A/B test: the metrics and the variant column name. There are two ways to define metrics:

- Using keyword parameters, with metric names as parameter names and metric definitions as parameter values, as in the example above.
- Using the first argument, `metrics`, which accepts metrics in the form of a dictionary with metric names as keys and metric definitions as values.

By default, **tea-tasting** assumes that the A/B test variant is stored in a column named `"variant"`. You can change this using the `variant` parameter of the `Experiment` class.

Example usage:

```python
experiment = tt.Experiment(
    {
        "sessions per user": tt.Mean("sessions"),
        "orders per session": tt.RatioOfMeans("orders", "sessions"),
        "orders per user": tt.Mean("orders"),
        "revenue per user": tt.Mean("revenue"),
    },
    variant="variant",
)
```

### Metrics

Metrics are instances of metric classes, which define how metrics are calculated. These calculations include the effect size, the confidence interval, the p-value, and other statistics.

Use the `Mean` class to compare averages between variants of an A/B test: for example, the average number of orders per user, where a user is the randomization unit of the experiment. Specify the column containing the metric values using the first parameter, `value`.

Use the `RatioOfMeans` class to compare ratios of averages between variants of an A/B test: for example, the ratio of the average number of orders to the average number of sessions. Specify the columns containing the numerator and denominator values using the `numer` and `denom` parameters.

Use the following parameters of `Mean` and `RatioOfMeans` to customize the analysis:

- `alternative`: Alternative hypothesis. The following options are available:
- `"two-sided"` (default): the means are unequal.
- `"greater"`: the mean in the treatment variant is greater than the mean in the control variant.
- `"less"`: the mean in the treatment variant is less than the mean in the control variant.
- `confidence_level`: Confidence level of the confidence interval. Default is `0.95`.
- `equal_var`: Defines whether equal variance is assumed. If `True`, pooled variance is used for the calculation of the standard error of the difference between two means. Default is `False`.
- `use_t`: Defines whether to use the Student's t-distribution (`True`) or the Normal distribution (`False`). Default is `True`.

Example usage:

```python
experiment = tt.Experiment(
    sessions_per_user=tt.Mean("sessions", alternative="greater"),
    orders_per_session=tt.RatioOfMeans("orders", "sessions", confidence_level=0.9),
    orders_per_user=tt.Mean("orders", equal_var=True),
    revenue_per_user=tt.Mean("revenue", use_t=False),
)
```

Look for other supported metrics in the [Metrics](https://tea-tasting.e10v.me/api/metrics/) reference.

You can change the default values of these four parameters using global settings (see details below).

### Analyzing and retrieving experiment results

After defining an experiment and metrics, you can analyze the experiment data using the `analyze` method of the `Experiment` class. This method takes the data as input and returns an `ExperimentResult` object with the experiment result.

```python
result = experiment.analyze(data)
```

By default, **tea-tasting** assumes that the variant with the lowest ID is the control. Change this behavior using the `control` parameter:

```python
result = experiment.analyze(data, control=0)
```

`ExperimentResult` is a mapping. Get a metric's analysis result using the metric name as a key.

```python
print(result["orders_per_user"])
#> MeanResult(control=0.5304003954522986, treatment=0.5730905412240769,
#> effect_size=0.04269014577177832, effect_size_ci_lower=-0.010800201598205564,
#> effect_size_ci_upper=0.0961804931417622, rel_effect_size=0.08048664016431273,
#> rel_effect_size_ci_lower=-0.019515294044062048,
#> rel_effect_size_ci_upper=0.19068800612788883, pvalue=0.11773177998716244,
#> statistic=1.5647028839586694)
```

The fields in the result depend on metrics. For `Mean` and `RatioOfMeans`, the fields include:

- `metric`: Metric name.
- `control`: Mean or ratio of means in the control variant.
- `treatment`: Mean or ratio of means in the treatment variant.
- `effect_size`: Absolute effect size. Difference between two means.
- `effect_size_ci_lower`: Lower bound of the absolute effect size confidence interval.
- `effect_size_ci_upper`: Upper bound of the absolute effect size confidence interval.
- `rel_effect_size`: Relative effect size. Difference between two means, divided by the control mean.
- `rel_effect_size_ci_lower`: Lower bound of the relative effect size confidence interval.
- `rel_effect_size_ci_upper`: Upper bound of the relative effect size confidence interval.
- `pvalue`: P-value.
- `statistic`: Statistic (standardized effect size).

`ExperimentResult` provides the following methods to serialize and view the experiment result:

- `to_dicts`: Convert the result to a sequence of dictionaries.
- `to_pandas`: Convert the result to a Pandas DataFrame.
- `to_pretty`: Convert the result to a Pandas DataFrame with formatted values (as strings).
- `to_string`: Convert the result to a string.
- `to_html`: Convert the result to HTML.

`print(result)` is the same as `print(result.to_string())`.

```python
print(result)
#> metric control treatment rel_effect_size rel_effect_size_ci pvalue
#> sessions_per_user 2.00 1.98 -0.66% [-3.7%, 2.5%] 0.674
#> orders_per_session 0.266 0.289 8.8% [-0.89%, 19%] 0.0762
#> orders_per_user 0.530 0.573 8.0% [-2.0%, 19%] 0.118
#> revenue_per_user 5.24 5.73 9.3% [-2.4%, 22%] 0.123
```

By default, methods `to_pretty`, `to_string`, and `to_html` return a predefined list of attributes. This list can be customized:

```python
print(result.to_string(names=(
    "control",
    "treatment",
    "effect_size",
    "effect_size_ci",
)))
#> metric control treatment effect_size effect_size_ci
#> sessions_per_user 2.00 1.98 -0.0132 [-0.0750, 0.0485]
#> orders_per_session 0.266 0.289 0.0233 [-0.00246, 0.0491]
#> orders_per_user 0.530 0.573 0.0427 [-0.0108, 0.0962]
#> revenue_per_user 5.24 5.73 0.489 [-0.133, 1.11]
```

In Jupyter and IPython, the output of the line `result` will be a rendered HTML table.

## More features

### Variance reduction with CUPED/CUPAC

**tea-tasting** supports variance reduction with CUPED/CUPAC, within both `Mean` and `RatioOfMeans` classes.
Learn more in the detailed [user guide](https://tea-tasting.e10v.me/user-guide).

Example usage:

```python
import tea_tasting as tt


data = tt.make_users_data(seed=42, covariates=True)

experiment = tt.Experiment(
    sessions_per_user=tt.Mean("sessions", "sessions_covariate"),
    orders_per_session=tt.RatioOfMeans(
        numer="orders",
        denom="sessions",
        numer_covariate="orders_covariate",
        denom_covariate="sessions_covariate",
    ),
    orders_per_user=tt.Mean("orders", "orders_covariate"),
    revenue_per_user=tt.Mean("revenue", "revenue_covariate"),
)

result = experiment.analyze(data)
print(result)
#> metric control treatment rel_effect_size rel_effect_size_ci pvalue
#> sessions_per_user 2.00 1.98 -0.68% [-3.2%, 1.9%] 0.603
#> orders_per_session 0.262 0.293 12% [4.2%, 21%] 0.00229
#> orders_per_user 0.523 0.581 11% [2.9%, 20%] 0.00733
#> revenue_per_user 5.12 5.85 14% [3.8%, 26%] 0.00675
```

Set the `covariates` parameter of the `make_users_data` function to `True` to add the following columns with pre-experimental data:

- `sessions_covariate`: Number of sessions before the experiment.
- `orders_covariate`: Number of orders before the experiment.
- `revenue_covariate`: Revenue before the experiment.

Define the metrics' covariates:

- In `Mean`, specify the covariate using the `covariate` parameter.
- In `RatioOfMeans`, specify the covariates for the numerator and denominator using the `numer_covariate` and `denom_covariate` parameters, respectively.

### Sample ratio mismatch check

The `SampleRatio` class in **tea-tasting** detects mismatches in the sample ratios of different variants of an A/B test.

Example usage:

```python
import tea_tasting as tt


experiment = tt.Experiment(
    sample_ratio=tt.SampleRatio(),
)

data = tt.make_users_data(seed=42)
result = experiment.analyze(data)
print(result.to_string(("control", "treatment", "pvalue")))
#> metric control treatment pvalue
#> sample_ratio 2023 1977 0.477
```

By default, `SampleRatio` expects an equal number of observations across all variants. To specify a different ratio, use the `ratio` parameter. It accepts two types of values:

- The ratio of the number of observations in treatment relative to control, as a positive number. Example: `SampleRatio(0.5)`.
- A dictionary with variants as keys and expected ratios as values. Example: `SampleRatio({"A": 2, "B": 1})`.

The `method` parameter determines the statistical test to apply:

- `"auto"`: Apply exact binomial test if the total number of observations is less than 1000, or normal approximation otherwise.
- `"binom"`: Apply exact binomial test.
- `"norm"`: Apply normal approximation of the binomial distribution.
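For intuition, the exact test can be sketched in pure Python. This is a simplified stand-in, not the actual implementation, following the common convention that the two-sided p-value sums the probabilities of all outcomes no more likely than the observed one:

```python
import math


def binom_test_two_sided(treatment, total, ratio=0.5):
    """Exact two-sided binomial test for a sample ratio mismatch check."""
    pmf = [
        math.comb(total, k) * ratio**k * (1 - ratio) ** (total - k)
        for k in range(total + 1)
    ]
    observed = pmf[treatment]
    # Sum the probabilities of outcomes at most as likely as the observed one.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-7)))


# 58 of 100 observations in treatment under an expected 1:1 split.
pvalue = binom_test_two_sided(58, 100)
```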

The result of the sample ratio mismatch includes the following attributes:

- `metric`: Metric name.
- `control`: Number of observations in control.
- `treatment`: Number of observations in treatment.
- `pvalue`: P-value.

### Global settings

In **tea-tasting**, you can change defaults for the following parameters:

- `alternative`: Alternative hypothesis.
- `confidence_level`: Confidence level of the confidence interval.
- `equal_var`: If `False`, assume unequal population variances in calculation of the standard deviation and the number of degrees of freedom. Otherwise, assume equal population variance and calculate pooled standard deviation.
- `n_resamples`: The number of resamples performed to form the bootstrap distribution of a statistic.
- `use_t`: If `True`, use Student's t-distribution in p-value and confidence interval calculations. Otherwise use Normal distribution.

Use `get_config` with the option name as a parameter to get a global option value:

```python
import tea_tasting as tt


tt.get_config("equal_var")
#> False
```

Use `get_config` without parameters to get a dictionary of global options:

```python
global_config = tt.get_config()
```

Use `set_config` to set a global option value:

```python
tt.set_config(equal_var=True, use_t=False)

experiment = tt.Experiment(
    sessions_per_user=tt.Mean("sessions"),
    orders_per_session=tt.RatioOfMeans("orders", "sessions"),
    orders_per_user=tt.Mean("orders"),
    revenue_per_user=tt.Mean("revenue"),
)

experiment.metrics["orders_per_user"]
#> Mean(value='orders', covariate=None, alternative='two-sided',
#> confidence_level=0.95, equal_var=True, use_t=False)
```

Use `config_context` to temporarily set a global option value within a context:

```python
with tt.config_context(equal_var=True, use_t=False):
    experiment = tt.Experiment(
        sessions_per_user=tt.Mean("sessions"),
        orders_per_session=tt.RatioOfMeans("orders", "sessions"),
        orders_per_user=tt.Mean("orders"),
        revenue_per_user=tt.Mean("revenue"),
    )

experiment.metrics["orders_per_user"]
#> Mean(value='orders', covariate=None, alternative='two-sided',
#> confidence_level=0.95, equal_var=True, use_t=False)
```

### More than two variants

In **tea-tasting**, it's possible to analyze experiments with more than two variants. However, the variants will be compared in pairs through two-sample statistical tests.

How variant pairs are determined:

- Default control variant: When the `control` parameter of the `analyze` method is set to `None`, **tea-tasting** automatically compares each pair of variants. The variant with the lowest ID in each pair serves as the control.
- Specified control variant: If a specific variant is set as `control`, it is compared against each of the other variants.

The result of the analysis is a dictionary of `ExperimentResult` objects with tuples (control, treatment) as keys.

Keep in mind that **tea-tasting** does not adjust for multiple comparisons. When dealing with multiple variant pairs, additional steps may be necessary to account for this, depending on your analysis needs.
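If an adjustment is needed, a conservative Bonferroni correction is easy to apply on top of the per-pair p-values. This is a sketch of one possible approach, not a **tea-tasting** feature, and the p-values are hypothetical:

```python
def adjust_bonferroni(pvalues):
    """Multiply each p-value by the number of comparisons, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical p-values from three (control, treatment) comparisons.
adjusted = adjust_bonferroni([0.01, 0.02, 0.20])
```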
## Roadmap

- Power analysis.
- A/A tests and simulations.
- More statistical tests:
- Asymptotic and exact tests for frequency data.
- Mann–Whitney U test.
- More examples or guides on how to:
- Create a custom metric.
- Use **tea-tasting** with an arbitrary Ibis backend.

## Package name
