Update readme and overview

e10v committed Jul 14, 2024 · 1 parent e37a462 · commit 4cdc1b7

**tea-tasting** calculates statistics within data backends such as BigQuery, ClickHouse, PostgreSQL, Snowflake, Spark, and others of the 20+ backends supported by [Ibis](https://ibis-project.org/). This approach eliminates the need to import granular data into a Python environment, though Pandas DataFrames are also supported.

**tea-tasting** is still in alpha, but already includes all the features listed above. See the Roadmap section below for the features that are coming soon.

## Installation

```bash
pip install tea-tasting
```

## Basic example

Begin with this simple example to understand the basic functionality:

```python
import tea_tasting as tt


data = tt.make_users_data(seed=42)

experiment = tt.Experiment(
    sessions_per_user=tt.Mean("sessions"),
    orders_per_session=tt.RatioOfMeans("orders", "sessions"),
    orders_per_user=tt.Mean("orders"),
    revenue_per_user=tt.Mean("revenue"),
)

result = experiment.analyze(data)
print(result)
#> metric control treatment rel_effect_size rel_effect_size_ci pvalue
#> sessions_per_user 2.00 1.98 -0.66% [-3.7%, 2.5%] 0.674
#> orders_per_session 0.266 0.289 8.8% [-0.89%, 19%] 0.0762
#> orders_per_user 0.530 0.573 8.0% [-2.0%, 19%] 0.118
#> revenue_per_user 5.24 5.73 9.3% [-2.4%, 22%] 0.123
```

In the following sections, each step of this process will be explained in detail.

### Input data

The `make_users_data` function creates synthetic data for demonstration purposes. This data mimics what you might encounter in an A/B test for an online store. Each row represents an individual user, with the following columns:

- `user`: The unique identifier for each user.
- `variant`: The specific variant (e.g., 0 or 1) assigned to each user in the A/B test.
- `sessions`: The total number of the user's sessions.
- `orders`: The total number of the user's orders.
- `revenue`: The total revenue generated by the user.

**tea-tasting** accepts data as either a Pandas DataFrame or an Ibis Table. [Ibis](https://ibis-project.org/) is a Python package that serves as a DataFrame API to various data backends. It supports 20+ backends, including BigQuery, ClickHouse, DuckDB, Polars, PostgreSQL, Snowflake, and Spark. You can write an SQL query, [wrap](https://ibis-project.org/how-to/extending/sql#backend.sql) it as an Ibis Table, and pass it to **tea-tasting**.

Many statistical tests, like Student's t-test or Z-test, don't need granular data for analysis. For such tests, **tea-tasting** will query aggregated statistics, like mean and variance, instead of downloading all the detailed data.
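To illustrate why aggregates suffice, here is a plain-Python sketch (not **tea-tasting**'s actual implementation) of a two-sided Z-test that uses only each variant's mean, variance, and sample size; the aggregate values below are made up for illustration:

```python
import math
from statistics import NormalDist


def ztest_from_aggregates(mean_c, var_c, n_c, mean_t, var_t, n_t):
    """Two-sided Z-test from per-variant aggregates (unequal variances)."""
    se = math.sqrt(var_c / n_c + var_t / n_t)  # standard error of the difference
    statistic = (mean_t - mean_c) / se
    pvalue = 2 * (1 - NormalDist().cdf(abs(statistic)))
    return statistic, pvalue


# Hypothetical aggregates for orders per user in control and treatment.
statistic, pvalue = ztest_from_aggregates(0.530, 0.65, 2023, 0.573, 0.72, 1977)
```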

**tea-tasting** assumes that:

- The data is grouped by randomization units, such as individual users.
- There is a column indicating the variant of the A/B test (typically labeled as A, B, etc.).
- All necessary columns for metric calculations (like the number of orders, revenue, etc.) are included in the table.

### A/B test definition

The `Experiment` class defines the parameters of an A/B test: the metrics and the variant column name. There are two ways to define metrics:

- Using keyword parameters, with metric names as parameter names and metric definitions as parameter values, as in the example above.
- Using the first argument, `metrics`, which accepts metrics in the form of a dictionary with metric names as keys and metric definitions as values.

By default, **tea-tasting** assumes that the A/B test variant is stored in a column named `"variant"`. You can change this using the `variant` parameter of the `Experiment` class.

Example usage:

```python
experiment = tt.Experiment(
    {
        "sessions per user": tt.Mean("sessions"),
        "orders per session": tt.RatioOfMeans("orders", "sessions"),
        "orders per user": tt.Mean("orders"),
        "revenue per user": tt.Mean("revenue"),
    },
    variant="variant",
)
```

### Metrics

Metrics are instances of metric classes, which define how metrics are calculated. These calculations include the effect size, the confidence interval, the p-value, and other statistics.

Use the `Mean` class to compare averages between variants of an A/B test: for example, the average number of orders per user, where a user is the randomization unit of the experiment. Specify the column containing the metric values using the first parameter, `value`.

Use the `RatioOfMeans` class to compare ratios of averages between variants of an A/B test: for example, the ratio of the average number of orders to the average number of sessions. Specify the columns containing the numerator and denominator values using the `numer` and `denom` parameters.

Use the following parameters of `Mean` and `RatioOfMeans` to customize the analysis:

- `alternative`: Alternative hypothesis. The following options are available:
- `"two-sided"` (default): the means are unequal.
- `"greater"`: the mean in the treatment variant is greater than the mean in the control variant.
- `"less"`: the mean in the treatment variant is less than the mean in the control variant.
- `confidence_level`: Confidence level of the confidence interval. Default is `0.95`.
- `equal_var`: Defines whether equal variance is assumed. If `True`, pooled variance is used for the calculation of the standard error of the difference between two means. Default is `False`.
- `use_t`: Defines whether to use the Student's t-distribution (`True`) or the Normal distribution (`False`). Default is `True`.

Example usage:

```python
experiment = tt.Experiment(
    sessions_per_user=tt.Mean("sessions", alternative="greater"),
    orders_per_session=tt.RatioOfMeans("orders", "sessions", confidence_level=0.9),
    orders_per_user=tt.Mean("orders", equal_var=True),
    revenue_per_user=tt.Mean("revenue", use_t=False),
)
```

Look for other supported metrics in the [Metrics](https://tea-tasting.e10v.me/api/metrics/) reference.

You can change the default values of these four parameters using global settings (see details below).

### Analyzing and retrieving experiment results

After defining an experiment and metrics, you can analyze the experiment data using the `analyze` method of the `Experiment` class. This method takes the data as input and returns an `ExperimentResult` object with the experiment result.

```python
result = experiment.analyze(data)
```

By default, **tea-tasting** assumes that the variant with the lowest ID is the control. Change this behavior using the `control` parameter:

```python
result = experiment.analyze(data, control=0)
```

`ExperimentResult` is a mapping. Get a metric's analysis result using the metric name as a key.

```python
print(result["orders_per_user"])
#> MeanResult(control=0.5304003954522986, treatment=0.5730905412240769,
#> effect_size=0.04269014577177832, effect_size_ci_lower=-0.010800201598205564,
#> effect_size_ci_upper=0.0961804931417622, rel_effect_size=0.08048664016431273,
#> rel_effect_size_ci_lower=-0.019515294044062048,
#> rel_effect_size_ci_upper=0.19068800612788883, pvalue=0.11773177998716244,
#> statistic=1.5647028839586694)
```

The fields in the result depend on metrics. For `Mean` and `RatioOfMeans`, the fields include:

- `metric`: Metric name.
- `control`: Mean or ratio of means in the control variant.
- `treatment`: Mean or ratio of means in the treatment variant.
- `effect_size`: Absolute effect size. Difference between two means.
- `effect_size_ci_lower`: Lower bound of the absolute effect size confidence interval.
- `effect_size_ci_upper`: Upper bound of the absolute effect size confidence interval.
- `rel_effect_size`: Relative effect size. Difference between two means, divided by the control mean.
- `rel_effect_size_ci_lower`: Lower bound of the relative effect size confidence interval.
- `rel_effect_size_ci_upper`: Upper bound of the relative effect size confidence interval.
- `pvalue`: P-value.
- `statistic`: Statistic (standardized effect size).

`ExperimentResult` provides the following methods to serialize and view the experiment result:

- `to_dicts`: Convert the result to a sequence of dictionaries.
- `to_pandas`: Convert the result to a Pandas DataFrame.
- `to_pretty`: Convert the result to a Pandas DataFrame with formatted values (as strings).
- `to_string`: Convert the result to a string.
- `to_html`: Convert the result to HTML.

`print(result)` is the same as `print(result.to_string())`.

```python
print(result)
#> metric control treatment rel_effect_size rel_effect_size_ci pvalue
#> sessions_per_user 2.00 1.98 -0.66% [-3.7%, 2.5%] 0.674
#> orders_per_session 0.266 0.289 8.8% [-0.89%, 19%] 0.0762
#> orders_per_user 0.530 0.573 8.0% [-2.0%, 19%] 0.118
#> revenue_per_user 5.24 5.73 9.3% [-2.4%, 22%] 0.123
```

By default, methods `to_pretty`, `to_string`, and `to_html` return a predefined list of attributes. This list can be customized:

```python
print(result.to_string(names=(
    "control",
    "treatment",
    "effect_size",
    "effect_size_ci",
)))
#> metric control treatment effect_size effect_size_ci
#> sessions_per_user 2.00 1.98 -0.0132 [-0.0750, 0.0485]
#> orders_per_session 0.266 0.289 0.0233 [-0.00246, 0.0491]
#> orders_per_user 0.530 0.573 0.0427 [-0.0108, 0.0962]
#> revenue_per_user 5.24 5.73 0.489 [-0.133, 1.11]
```

In Jupyter and IPython, the output of the line `result` will be a rendered HTML table.

## More features

### Variance reduction with CUPED/CUPAC

**tea-tasting** supports variance reduction with CUPED/CUPAC, within both `Mean` and `RatioOfMeans` classes.
Learn more in the detailed [user guide](https://tea-tasting.e10v.me/user-guide).

Example usage:

```python
import tea_tasting as tt


data = tt.make_users_data(seed=42, covariates=True)

experiment = tt.Experiment(
    sessions_per_user=tt.Mean("sessions", "sessions_covariate"),
    orders_per_session=tt.RatioOfMeans(
        numer="orders",
        denom="sessions",
        numer_covariate="orders_covariate",
        denom_covariate="sessions_covariate",
    ),
    orders_per_user=tt.Mean("orders", "orders_covariate"),
    revenue_per_user=tt.Mean("revenue", "revenue_covariate"),
)

result = experiment.analyze(data)
print(result)
#> metric control treatment rel_effect_size rel_effect_size_ci pvalue
#> sessions_per_user 2.00 1.98 -0.68% [-3.2%, 1.9%] 0.603
#> orders_per_session 0.262 0.293 12% [4.2%, 21%] 0.00229
#> orders_per_user 0.523 0.581 11% [2.9%, 20%] 0.00733
#> revenue_per_user 5.12 5.85 14% [3.8%, 26%] 0.00675
```

Set the `covariates` parameter of the `make_users_data` function to `True` to add the following columns with pre-experimental data:

- `sessions_covariate`: Number of sessions before the experiment.
- `orders_covariate`: Number of orders before the experiment.
- `revenue_covariate`: Revenue before the experiment.

Define the metrics' covariates:

- In `Mean`, specify the covariate using the `covariate` parameter.
- In `RatioOfMeans`, specify the covariates for the numerator and denominator using the `numer_covariate` and `denom_covariate` parameters, respectively.

### Sample ratio mismatch check

The `SampleRatio` class in **tea-tasting** detects mismatches in the sample ratios of different variants of an A/B test.

Example usage:

```python
import tea_tasting as tt


experiment = tt.Experiment(
    sample_ratio=tt.SampleRatio(),
)

data = tt.make_users_data(seed=42)
result = experiment.analyze(data)
print(result.to_string(("control", "treatment", "pvalue")))
#> metric control treatment pvalue
#> sample_ratio 2023 1977 0.477
```

By default, `SampleRatio` expects an equal number of observations across all variants. To specify a different ratio, use the `ratio` parameter. It accepts two types of values:

- The ratio of the number of observations in treatment relative to control, as a positive number. Example: `SampleRatio(0.5)`.
- A dictionary with variants as keys and expected ratios as values. Example: `SampleRatio({"A": 2, "B": 1})`.

The `method` parameter determines the statistical test to apply:

- `"auto"`: Apply exact binomial test if the total number of observations is less than 1000, or normal approximation otherwise.
- `"binom"`: Apply exact binomial test.
- `"norm"`: Apply normal approximation of the binomial distribution.
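For intuition, the exact test can be sketched in pure Python. This is a simplified stand-in, not the actual implementation, following the common convention that the two-sided p-value sums the probabilities of all outcomes no more likely than the observed one:

```python
import math


def binom_test_two_sided(treatment, total, ratio=0.5):
    """Exact two-sided binomial test for a sample ratio mismatch check."""
    pmf = [
        math.comb(total, k) * ratio**k * (1 - ratio) ** (total - k)
        for k in range(total + 1)
    ]
    observed = pmf[treatment]
    # Sum the probabilities of outcomes at most as likely as the observed one.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-7)))


# 58 of 100 observations in treatment under an expected 1:1 split.
pvalue = binom_test_two_sided(58, 100)
```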

The result of the sample ratio mismatch includes the following attributes:

- `metric`: Metric name.
- `control`: Number of observations in control.
- `treatment`: Number of observations in treatment.
- `pvalue`: P-value.

### Global settings

In **tea-tasting**, you can change defaults for the following parameters:

- `alternative`: Alternative hypothesis.
- `confidence_level`: Confidence level of the confidence interval.
- `equal_var`: If `False`, assume unequal population variances in calculation of the standard deviation and the number of degrees of freedom. Otherwise, assume equal population variance and calculate pooled standard deviation.
- `n_resamples`: The number of resamples performed to form the bootstrap distribution of a statistic.
- `use_t`: If `True`, use Student's t-distribution in p-value and confidence interval calculations. Otherwise use Normal distribution.

Use `get_config` with the option name as a parameter to get a global option value:

```python
import tea_tasting as tt


tt.get_config("equal_var")
#> False
```

Use `get_config` without parameters to get a dictionary of global options:

```python
global_config = tt.get_config()
```

Use `set_config` to set a global option value:

```python
tt.set_config(equal_var=True, use_t=False)

experiment = tt.Experiment(
    sessions_per_user=tt.Mean("sessions"),
    orders_per_session=tt.RatioOfMeans("orders", "sessions"),
    orders_per_user=tt.Mean("orders"),
    revenue_per_user=tt.Mean("revenue"),
)

experiment.metrics["orders_per_user"]
#> Mean(value='orders', covariate=None, alternative='two-sided',
#> confidence_level=0.95, equal_var=True, use_t=False)
```

Use `config_context` to temporarily set a global option value within a context:

```python
with tt.config_context(equal_var=True, use_t=False):
    experiment = tt.Experiment(
        sessions_per_user=tt.Mean("sessions"),
        orders_per_session=tt.RatioOfMeans("orders", "sessions"),
        orders_per_user=tt.Mean("orders"),
        revenue_per_user=tt.Mean("revenue"),
    )

experiment.metrics["orders_per_user"]
#> Mean(value='orders', covariate=None, alternative='two-sided',
#> confidence_level=0.95, equal_var=True, use_t=False)
```

### More than two variants

In **tea-tasting**, it's possible to analyze experiments with more than two variants. However, the variants will be compared in pairs through two-sample statistical tests.

How variant pairs are determined:

- Default control variant: When the `control` parameter of the `analyze` method is set to `None`, **tea-tasting** automatically compares each pair of variants. The variant with the lowest ID in each pair serves as the control.
- Specified control variant: If a specific variant is set as `control`, it is compared against each of the other variants.

The result of the analysis is a dictionary of `ExperimentResult` objects with tuples (control, treatment) as keys.

Keep in mind that **tea-tasting** does not adjust for multiple comparisons. When dealing with multiple variant pairs, additional steps may be necessary to account for this, depending on your analysis needs.
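If an adjustment is needed, a conservative Bonferroni correction is easy to apply on top of the per-pair p-values. This is a sketch of one possible approach, not a **tea-tasting** feature, and the p-values are hypothetical:

```python
def adjust_bonferroni(pvalues):
    """Multiply each p-value by the number of comparisons, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical p-values from three (control, treatment) comparisons.
adjusted = adjust_bonferroni([0.01, 0.02, 0.20])
```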
## Roadmap

- Power analysis.
- A/A tests and simulations.
- More statistical tests:
- Asymptotic and exact tests for frequency data.
- Mann–Whitney U test.
- More examples or guides on how to:
- Create a custom metric.
- Use **tea-tasting** with an arbitrary Ibis backend.

## Package name
