diff --git a/README.md b/README.md index e3f137c..3942674 100644 --- a/README.md +++ b/README.md @@ -18,23 +18,7 @@ **tea-tasting** calculates statistics within data backends such as BigQuery, ClickHouse, PostgreSQL, Snowflake, Spark, and others of the 20+ backends supported by [Ibis](https://ibis-project.org/). This approach eliminates the need to import granular data into a Python environment, though Pandas DataFrames are also supported. -**tea-tasting** is still in alpha, but already includes all the features listed above. The following features are coming soon: - -- Power analysis. -- A/A tests and simulations. -- More statistical tests: - - Asymptotic and exact tests for frequency data. - - Mann–Whitney U test. - -## Installation - -```bash -pip install tea-tasting -``` - -## Basic usage - -Begin with this simple example to understand the basic functionality: +## Basic example ```python import tea_tasting as tt data = tt.make_users_data(seed=42) experiment = tt.Experiment( sessions_per_user=tt.Mean("sessions"), orders_per_session=tt.RatioOfMeans("orders", "sessions"), orders_per_user=tt.Mean("orders"), revenue_per_user=tt.Mean("revenue"), ) result = experiment.analyze(data) @@ -58,317 +42,18 @@ print(result) #> revenue_per_user 5.24 5.73 9.3% [-2.4%, 22%] 0.123 ``` -In the following sections, each step of this process will be explained in detail. - -### Input data - -The `make_users_data` function creates synthetic data for demonstration purposes. This data mimics what you might encounter in an A/B test for an online store. Each row represents an individual user, with the following columns: - -- `user`: The unique identifier for each user. -- `variant`: The specific variant (e.g., 0 or 1) assigned to each user in the A/B test. -- `sessions`: The total number of the user's sessions. -- `orders`: The total number of the user's orders. -- `revenue`: The total revenue generated by the user. - -**tea-tasting** accepts data as either a Pandas DataFrame or an Ibis Table. [Ibis](https://ibis-project.org/) is a Python package that serves as a DataFrame API to various data backends. It supports 20+ backends, including BigQuery, ClickHouse, DuckDB, Polars, PostgreSQL, Snowflake, and Spark.
You can write an SQL query, [wrap](https://ibis-project.org/how-to/extending/sql#backend.sql) it as an Ibis Table, and pass it to **tea-tasting**. - -Many statistical tests, like Student's t-test or Z-test, don't need granular data for analysis. For such tests, **tea-tasting** will query aggregated statistics, like mean and variance, instead of downloading all the detailed data. - -**tea-tasting** assumes that: - -- The data is grouped by randomization units, such as individual users. -- There is a column indicating the variant of the A/B test (typically labeled as A, B, etc.). -- All necessary columns for metric calculations (like the number of orders, revenue, etc.) are included in the table. - -### A/B test definition - -The `Experiment` class defines the parameters of an A/B test: metrics and a variant column name. There are two ways to define metrics: - -- Using keyword parameters, with metric names as parameter names and metric definitions as parameter values, as in the example above. -- Using the first argument `metrics`, which accepts metrics in the form of a dictionary with metric names as keys and metric definitions as values. - -By default, **tea-tasting** assumes that the A/B test variant is stored in a column named `"variant"`. You can change it using the `variant` parameter of the `Experiment` class. - -Example usage: - -```python -experiment = tt.Experiment( - { - "sessions per user": tt.Mean("sessions"), - "orders per session": tt.RatioOfMeans("orders", "sessions"), - "orders per user": tt.Mean("orders"), - "revenue per user": tt.Mean("revenue"), - }, - variant="variant", -) -``` - -### Metrics - -Metrics are instances of metric classes, which define how metrics are calculated. These calculations include the effect size, confidence interval, p-value, and other statistics. - -Use the `Mean` class to compare averages between variants of an A/B test. For example, the average number of orders per user, where the user is the randomization unit of the experiment.
Specify the column containing the metric values using the first parameter `value`. - -Use the `RatioOfMeans` class to compare ratios of averages between variants of an A/B test. For example, the average number of orders per average number of sessions. Specify the columns containing the numerator and denominator values using the parameters `numer` and `denom`. - -Use the following parameters of `Mean` and `RatioOfMeans` to customize the analysis: - -- `alternative`: Alternative hypothesis. The following options are available: - - `"two-sided"` (default): the means are unequal. - - `"greater"`: the mean in the treatment variant is greater than the mean in the control variant. - - `"less"`: the mean in the treatment variant is less than the mean in the control variant. -- `confidence_level`: Confidence level of the confidence interval. Default is `0.95`. -- `equal_var`: Defines whether equal variance is assumed. If `True`, pooled variance is used for the calculation of the standard error of the difference between two means. Default is `False`. -- `use_t`: Defines whether to use Student's t-distribution (`True`) or the Normal distribution (`False`). Default is `True`. - -Example usage: - -```python -experiment = tt.Experiment( - sessions_per_user=tt.Mean("sessions", alternative="greater"), - orders_per_session=tt.RatioOfMeans("orders", "sessions", confidence_level=0.9), - orders_per_user=tt.Mean("orders", equal_var=True), - revenue_per_user=tt.Mean("revenue", use_t=False), -) -``` - -Find other supported metrics in the [Metrics](https://tea-tasting.e10v.me/api/metrics/) reference. - -You can change the default values of these four parameters using global settings (see details below). - -### Analyzing and retrieving experiment results - -After defining an experiment and metrics, you can analyze the experiment data using the `analyze` method of the `Experiment` class. This method takes the data as input and returns an `ExperimentResult` object.
- -```python -result = experiment.analyze(data) -``` - -By default, **tea-tasting** assumes that the variant with the lowest ID is the control. Change the default behavior using the `control` parameter: - -```python -result = experiment.analyze(data, control=0) -``` - -`ExperimentResult` is a mapping. Get a metric's analysis result using the metric name as a key. - -```python -print(result["orders_per_user"]) -#> MeanResult(control=0.5304003954522986, treatment=0.5730905412240769, -#> effect_size=0.04269014577177832, effect_size_ci_lower=-0.010800201598205564, -#> effect_size_ci_upper=0.0961804931417622, rel_effect_size=0.08048664016431273, -#> rel_effect_size_ci_lower=-0.019515294044062048, -#> rel_effect_size_ci_upper=0.19068800612788883, pvalue=0.11773177998716244, -#> statistic=1.5647028839586694) -``` - -The fields in the result depend on the metric. For `Mean` and `RatioOfMeans`, the fields include: - -- `metric`: Metric name. -- `control`: Mean or ratio of means in the control variant. -- `treatment`: Mean or ratio of means in the treatment variant. -- `effect_size`: Absolute effect size. Difference between two means. -- `effect_size_ci_lower`: Lower bound of the absolute effect size confidence interval. -- `effect_size_ci_upper`: Upper bound of the absolute effect size confidence interval. -- `rel_effect_size`: Relative effect size. Difference between two means, divided by the control mean. -- `rel_effect_size_ci_lower`: Lower bound of the relative effect size confidence interval. -- `rel_effect_size_ci_upper`: Upper bound of the relative effect size confidence interval. -- `pvalue`: P-value. -- `statistic`: Statistic (standardized effect size). - -`ExperimentResult` provides the following methods to serialize and view the experiment result: - -- `to_dicts`: Convert the result to a sequence of dictionaries. -- `to_pandas`: Convert the result to a Pandas DataFrame. -- `to_pretty`: Convert the result to a Pandas DataFrame with formatted values (as strings).
-- `to_string`: Convert the result to a string. -- `to_html`: Convert the result to HTML. - -`print(result)` is the same as `print(result.to_string())`. - -```python -print(result) -#> metric control treatment rel_effect_size rel_effect_size_ci pvalue -#> sessions_per_user 2.00 1.98 -0.66% [-3.7%, 2.5%] 0.674 -#> orders_per_session 0.266 0.289 8.8% [-0.89%, 19%] 0.0762 -#> orders_per_user 0.530 0.573 8.0% [-2.0%, 19%] 0.118 -#> revenue_per_user 5.24 5.73 9.3% [-2.4%, 22%] 0.123 -``` - -By default, the `to_pretty`, `to_string`, and `to_html` methods include a predefined list of attributes. This list can be customized: - -```python -print(result.to_string(names=( - "control", - "treatment", - "effect_size", - "effect_size_ci", -))) -#> metric control treatment effect_size effect_size_ci -#> sessions_per_user 2.00 1.98 -0.0132 [-0.0750, 0.0485] -#> orders_per_session 0.266 0.289 0.0233 [-0.00246, 0.0491] -#> orders_per_user 0.530 0.573 0.0427 [-0.0108, 0.0962] -#> revenue_per_user 5.24 5.73 0.489 [-0.133, 1.11] -``` - -In Jupyter and IPython, the output of the line `result` will be a rendered HTML table. - -## More features - -### Variance reduction with CUPED/CUPAC - -**tea-tasting** supports variance reduction with CUPED/CUPAC in both the `Mean` and `RatioOfMeans` classes. +Learn more in the detailed [user guide](https://tea-tasting.e10v.me/user-guide).
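The adjustment behind CUPED is compact: subtract from each user's metric the centered pre-experiment covariate, scaled by `theta = cov(x, y) / var(x)`. A stdlib-only sketch with toy data (illustrative of the idea, not of tea-tasting's internals):

```python
from statistics import fmean, variance

# Pre-experiment covariate x and in-experiment metric y for the same users.
# Toy numbers for illustration; real values would come from your data backend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 2.3, 2.9, 4.2, 5.1, 5.8]

x_mean, y_mean = fmean(x), fmean(y)

# theta = cov(x, y) / var(x) minimizes the variance of the adjusted metric.
theta = sum(
    (xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)
) / sum((xi - x_mean) ** 2 for xi in x)

# The adjusted metric keeps the mean of y but has lower variance
# whenever x and y are correlated.
y_cuped = [yi - theta * (xi - x_mean) for xi, yi in zip(x, y)]

print(variance(y_cuped) < variance(y))  # True
```

Because the adjustment does not change the metric's expected value, the effect estimate stays unbiased while its variance shrinks roughly by the squared correlation between covariate and metric; that is what tightens the confidence intervals in the example below.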
-Example usage: +## Roadmap -```python -import tea_tasting as tt - - -data = tt.make_users_data(seed=42, covariates=True) - -experiment = tt.Experiment( - sessions_per_user=tt.Mean("sessions", "sessions_covariate"), - orders_per_session=tt.RatioOfMeans( - numer="orders", - denom="sessions", - numer_covariate="orders_covariate", - denom_covariate="sessions_covariate", - ), - orders_per_user=tt.Mean("orders", "orders_covariate"), - revenue_per_user=tt.Mean("revenue", "revenue_covariate"), -) - -result = experiment.analyze(data) -print(result) -#> metric control treatment rel_effect_size rel_effect_size_ci pvalue -#> sessions_per_user 2.00 1.98 -0.68% [-3.2%, 1.9%] 0.603 -#> orders_per_session 0.262 0.293 12% [4.2%, 21%] 0.00229 -#> orders_per_user 0.523 0.581 11% [2.9%, 20%] 0.00733 -#> revenue_per_user 5.12 5.85 14% [3.8%, 26%] 0.00675 -``` - -Set the `covariates` parameter of the `make_users_data` function to `True` to add the following columns with pre-experimental data: - -- `sessions_covariate`: Number of sessions before the experiment. -- `orders_covariate`: Number of orders before the experiment. -- `revenue_covariate`: Revenue before the experiment. - -Define the metrics' covariates: - -- In `Mean`, specify the covariate using the `covariate` parameter. -- In `RatioOfMeans`, specify the covariates for the numerator and denominator using the `numer_covariate` and `denom_covariate` parameters, respectively. - -### Sample ratio mismatch check - -The `SampleRatio` class in **tea-tasting** detects mismatches in the sample ratios of different variants of an A/B test.
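Under the hood this is a two-sided test of the observed counts against the expected split. For large samples it reduces to the normal approximation of the binomial distribution, which can be sketched with the stdlib alone (the continuity correction here is an assumption about the exact formula, not taken from tea-tasting's code):

```python
from math import erf, sqrt

def srm_pvalue(control: int, treatment: int, ratio: float = 1.0) -> float:
    """Two-sided sample ratio mismatch p-value via the normal
    approximation of the binomial distribution. Illustrative sketch,
    not tea-tasting's API. `ratio` is treatment relative to control."""
    n = control + treatment
    p = ratio / (1.0 + ratio)                # expected share in treatment
    se = sqrt(n * p * (1.0 - p))
    z = (abs(treatment - n * p) - 0.5) / se  # 0.5 = continuity correction
    pvalue = 2.0 * (1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0))))
    return min(1.0, pvalue)

print(round(srm_pvalue(2023, 1977), 3))  # 0.477
```

With a 2023 vs. 1977 split this gives a p-value of about 0.477, in line with the `SampleRatio` output shown below.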
- -Example usage: - -```python -import tea_tasting as tt - - -experiment = tt.Experiment( - sample_ratio=tt.SampleRatio(), -) - -data = tt.make_users_data(seed=42) -result = experiment.analyze(data) -print(result.to_string(("control", "treatment", "pvalue"))) -#> metric control treatment pvalue -#> sample_ratio 2023 1977 0.477 -``` - -By default, `SampleRatio` expects an equal number of observations across all variants. To specify a different ratio, use the `ratio` parameter. It accepts two types of values: - -- Ratio of the number of observations in treatment relative to control, as a positive number. Example: `SampleRatio(0.5)`. -- A dictionary with variants as keys and expected ratios as values. Example: `SampleRatio({"A": 2, "B": 1})`. - -The `method` parameter determines the statistical test to apply: - -- `"auto"`: Apply the exact binomial test if the total number of observations is less than 1000; otherwise, apply the normal approximation. -- `"binom"`: Apply the exact binomial test. -- `"norm"`: Apply the normal approximation of the binomial distribution. - -The result of the sample ratio mismatch check includes the following attributes: - -- `metric`: Metric name. -- `control`: Number of observations in control. -- `treatment`: Number of observations in treatment. -- `pvalue`: P-value. - -### Global settings - -In **tea-tasting**, you can change defaults for the following parameters: - -- `alternative`: Alternative hypothesis. -- `confidence_level`: Confidence level of the confidence interval. -- `equal_var`: If `False`, assume unequal population variances in the calculation of the standard deviation and the number of degrees of freedom. Otherwise, assume equal population variance and calculate the pooled standard deviation. -- `n_resamples`: The number of resamples performed to form the bootstrap distribution of a statistic. -- `use_t`: If `True`, use Student's t-distribution in p-value and confidence interval calculations. Otherwise, use the Normal distribution.
- -Use `get_config` with the option name as a parameter to get a global option value: - -```python -import tea_tasting as tt - - -tt.get_config("equal_var") -#> False -``` - -Use `get_config` without parameters to get a dictionary of global options: - -```python -global_config = tt.get_config() -``` - -Use `set_config` to set a global option value: - -```python -tt.set_config(equal_var=True, use_t=False) - -experiment = tt.Experiment( - sessions_per_user=tt.Mean("sessions"), - orders_per_session=tt.RatioOfMeans("orders", "sessions"), - orders_per_user=tt.Mean("orders"), - revenue_per_user=tt.Mean("revenue"), -) - -experiment.metrics["orders_per_user"] -#> Mean(value='orders', covariate=None, alternative='two-sided', -#> confidence_level=0.95, equal_var=True, use_t=False) -``` - -Use `config_context` to temporarily set a global option value within a context: - -```python -with tt.config_context(equal_var=True, use_t=False): - experiment = tt.Experiment( - sessions_per_user=tt.Mean("sessions"), - orders_per_session=tt.RatioOfMeans("orders", "sessions"), - orders_per_user=tt.Mean("orders"), - revenue_per_user=tt.Mean("revenue"), - ) - -experiment.metrics["orders_per_user"] -#> Mean(value='orders', covariate=None, alternative='two-sided', -#> confidence_level=0.95, equal_var=True, use_t=False) -``` - -### More than two variants - -In **tea-tasting**, it's possible to analyze experiments with more than two variants. However, the variants will be compared in pairs through two-sample statistical tests. - -How variant pairs are determined: - -- Default control variant: When the `control` parameter of the `analyze` method is set to `None`, **tea-tasting** automatically compares each variant pair. The variant with the lowest ID in each pair is the control. -- Specified control variant: If a specific variant is set as `control`, it is then compared against each of the other variants.
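The pairing rules above can be sketched with a small helper (hypothetical, for illustration only; not part of tea-tasting's API):

```python
from itertools import combinations

def variant_pairs(variants, control=None):
    """Return (control, treatment) pairs following the rules above."""
    variants = sorted(variants)
    if control is None:
        # Every pair of variants; the lower ID in each pair acts as control.
        return list(combinations(variants, 2))
    # A fixed control is compared against each of the other variants.
    return [(control, v) for v in variants if v != control]

print(variant_pairs([0, 1, 2]))             # [(0, 1), (0, 2), (1, 2)]
print(variant_pairs([0, 1, 2], control=1))  # [(1, 0), (1, 2)]
```

Note that three variants already produce three pairwise tests with the default control logic, which is why the multiple comparisons caveat matters.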
- -The result of the analysis is a dictionary of `ExperimentResult` objects with tuples (control, treatment) as keys. - -Keep in mind that **tea-tasting** does not adjust for multiple comparisons. When dealing with multiple variant pairs, additional steps may be necessary to account for this, depending on your analysis needs. +- Power analysis. +- A/A tests and simulations. +- More statistical tests: + - Asymptotic and exact tests for frequency data. + - Mann–Whitney U test. +- More examples or guides on how to: + - Create a custom metric. + - Use **tea-tasting** with an arbitrary Ibis backend. ## Package name diff --git a/docs/index.md b/docs/index.md index 3532564..3942674 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,4 +1,4 @@ -# Overview +# tea-tasting: statistical analysis of A/B tests [![CI](https://github.com/e10v/tea-tasting/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/e10v/tea-tasting/actions/workflows/ci.yml) [![Coverage](https://codecov.io/github/e10v/tea-tasting/coverage.svg?branch=main)](https://codecov.io/gh/e10v/tea-tasting) @@ -18,13 +18,42 @@ **tea-tasting** calculates statistics within data backends such as BigQuery, ClickHouse, PostgreSQL, Snowflake, Spark, and others of the 20+ backends supported by [Ibis](https://ibis-project.org/). This approach eliminates the need to import granular data into a Python environment, though Pandas DataFrames are also supported. -**tea-tasting** is still in alpha, but already includes all the features listed above.
The following features are coming soon: +## Basic example + +```python +import tea_tasting as tt + + +data = tt.make_users_data(seed=42) + +experiment = tt.Experiment( + sessions_per_user=tt.Mean("sessions"), + orders_per_session=tt.RatioOfMeans("orders", "sessions"), + orders_per_user=tt.Mean("orders"), + revenue_per_user=tt.Mean("revenue"), +) + +result = experiment.analyze(data) +print(result) +#> metric control treatment rel_effect_size rel_effect_size_ci pvalue +#> sessions_per_user 2.00 1.98 -0.66% [-3.7%, 2.5%] 0.674 +#> orders_per_session 0.266 0.289 8.8% [-0.89%, 19%] 0.0762 +#> orders_per_user 0.530 0.573 8.0% [-2.0%, 19%] 0.118 +#> revenue_per_user 5.24 5.73 9.3% [-2.4%, 22%] 0.123 +``` + +Learn more in the detailed [user guide](https://tea-tasting.e10v.me/user-guide). + +## Roadmap - Power analysis. - A/A tests and simulations. - More statistical tests: - Asymptotic and exact tests for frequency data. - Mann–Whitney U test. +- More examples or guides on how to: + - Create a custom metric. + - Use **tea-tasting** with an arbitrary Ibis backend. ## Package name