Merge pull request #55 from e10v/dev
Multiple minor improvements
e10v committed Apr 21, 2024
2 parents e9687db + 796799c commit 2e9551f
Showing 12 changed files with 333 additions and 266 deletions.
70 changes: 51 additions & 19 deletions README.md
@@ -14,23 +14,21 @@
- [Delta method](https://alexdeng.github.io/public/files/kdd2018-dm.pdf) for ratio metrics.
- Variance reduction with [CUPED](https://exp-platform.com/Documents/2013-02-CUPED-ImprovingSensitivityOfControlledExperiments.pdf)/[CUPAC](https://doordash.engineering/2020/06/08/improving-experimental-power-through-control-using-predictions-as-covariate-cupac/) (also in combination with delta method for ratio metrics).
- Confidence interval for both absolute and percent change.
- Sample ratio mismatch check.

**tea-tasting** calculates statistics within data backends such as BigQuery, ClickHouse, PostgreSQL, Snowflake, Spark, and others of the 20+ backends supported by [Ibis](https://ibis-project.org/). This approach eliminates the need to import granular data into a Python environment, though Pandas DataFrames are also supported.

**tea-tasting** is still in alpha, but already includes all the features listed above. The following features are coming soon:

- Sample ratio mismatch check.
- More statistical tests:
- Asymptotic and exact tests for frequency data.
- Bootstrap.
- Quantile test (using Bootstrap).
- Asymptotic and exact tests for frequency data.
- Mann–Whitney U test.
- Power analysis.
- A/A tests and simulations.
- Pretty output for experiment results (rounding etc.).
- Documentation on how to define metrics with custom statistical tests.
- Documentation with MkDocs and Material for MkDocs.
- More examples.
- More documentation and examples.

## Installation

@@ -66,10 +64,10 @@ In the following sections, each step of this process will be explained in detail
The `make_users_data` function creates synthetic data for demonstration purposes. This data mimics what you might encounter in an A/B test for an online store. Each row represents an individual user, with the following columns:

- `user`: The unique identifier for each user.
- `variant`: The specific variant (e.g., 0 or 1) assigned to the user in the A/B test.
- `sessions`: The total number of sessions by the user.
- `orders`: The total number of purchases made by the user.
- `revenue`: The total revenue generated from the user's purchases.
- `variant`: The specific variant (e.g., 0 or 1) assigned to each user in the A/B test.
- `sessions`: The total number of the user's sessions.
- `orders`: The total number of the user's orders.
- `revenue`: The total revenue generated by the user.
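
For instance, you can generate and inspect this data as follows (a minimal sketch, assuming the default Pandas DataFrame output and the `seed` parameter of `make_users_data`):

```python
import tea_tasting as tt

# Generate synthetic users data; the seed makes the sample reproducible.
data = tt.make_users_data(seed=42)

# Inspect the columns described above: user, variant, sessions, orders, revenue.
print(data.head())
```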

**tea-tasting** accepts data as either a Pandas DataFrame or an Ibis Table. [Ibis](https://ibis-project.org/) is a Python package which serves as a DataFrame API to various data backends. It supports 20+ backends, including BigQuery, ClickHouse, DuckDB, Polars, PostgreSQL, Snowflake, and Spark. You can write an SQL query, [wrap](https://ibis-project.org/how-to/extending/sql#backend.sql) it as an Ibis Table, and pass it to **tea-tasting**.
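
For example, the SQL-wrapping workflow might look like this (a sketch assuming a DuckDB connection and a hypothetical `users` table; `Backend.sql` is the Ibis method linked above):

```python
import ibis

# Connect to a backend; DuckDB is used here, but any Ibis-supported backend works.
con = ibis.duckdb.connect()

# Wrap an SQL query as an Ibis Table.
data = con.sql("SELECT * FROM users")

# `data` can now be passed to tea-tasting in place of a Pandas DataFrame.
```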

Expand Down Expand Up @@ -108,19 +106,19 @@ experiment = tt.Experiment(

Metrics are instances of metric classes, which define how metrics are calculated: the effect size, the confidence interval, the p-value, and other statistics.

Use the `Mean` class to compare metric averages between variants of an A/B test. For example, average number of orders per user, where user is a randomization unit of an experiment. Specify the column containing the metric values using the first parameter `value`.
Use the `Mean` class to compare averages between variants of an A/B test. For example, average number of orders per user, where user is a randomization unit of an experiment. Specify the column containing the metric values using the first parameter `value`.

Use the `RatioOfMeans` class to compare ratios of metrics averages between variants of an A/B test. For example, average number of orders per average number of sessions. Specify the columns containing the numerator and denominator values using the parameters `numer` and `denom`.
Use the `RatioOfMeans` class to compare ratios of averages between variants of an A/B test. For example, average number of orders per average number of sessions. Specify the columns containing the numerator and denominator values using the parameters `numer` and `denom`.

Use the following parameters of `Mean` and `RatioOfMeans` to customize the analysis:

- `alternative`: Alternative hypothesis. The following options are available:
- `two-sided` (default): the means are unequal.
- `greater`: the mean in the treatment variant is greater than the mean in the control variant.
- `less`: the mean in the treatment variant is less than the mean in the control variant.
- `"two-sided"` (default): the means are unequal.
- `"greater"`: the mean in the treatment variant is greater than the mean in the control variant.
- `"less"`: the mean in the treatment variant is less than the mean in the control variant.
- `confidence_level`: Confidence level of the confidence interval. Default is `0.95`.
- `equal_var`: If `False` (default), assume unequal population variances in calculation of the standard deviation and the number of degrees of freedom. Otherwise, assume equal population variance and calculate pooled standard deviation.
- `use_t`: If `True` (default), use Student's t-distribution in p-value and confidence interval calculations. Otherwise use Normal distribution.
- `equal_var`: Defines whether equal variance is assumed. If `True`, pooled variance is used for the calculation of the standard error of the difference between two means. Default is `False`.
- `use_t`: Defines whether to use the Student's t-distribution (`True`) or the Normal distribution (`False`). Default is `True`.

Example usage:
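
An illustrative sketch of combining these parameters (the metric names and parameter values are chosen for demonstration and are not taken from the original example):

```python
import tea_tasting as tt

experiment = tt.Experiment(
    # One-sided test with a 90% confidence interval.
    orders_per_user=tt.Mean("orders", alternative="greater", confidence_level=0.9),
    # Ratio metric using the Normal distribution instead of Student's t.
    orders_per_session=tt.RatioOfMeans("orders", "sessions", use_t=False),
)
```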

Expand Down Expand Up @@ -176,7 +174,7 @@ The fields in the result depend on metrics. For `Mean` and `RatioOfMeans`, the f
- `rel_effect_size_ci_lower`: Lower bound of the relative effect size confidence interval.
- `rel_effect_size_ci_upper`: Upper bound of the relative effect size confidence interval.
- `pvalue`: P-value.
- `statistic`: Statistic.
- `statistic`: Statistic (standardized effect size).

## More features

@@ -216,14 +214,48 @@ Define the metrics' covariates:
- In `Mean`, specify the covariate using the `covariate` parameter.
- In `RatioOfMeans`, specify the covariates for the numerator and denominator using the `numer_covariate` and `denom_covariate` parameters, respectively.
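
Putting both together (a sketch; the covariate column names follow the optional `*_covariate` columns that `make_users_data` generates when `covariates=True`):

```python
import tea_tasting as tt

data = tt.make_users_data(seed=42, covariates=True)

experiment = tt.Experiment(
    # CUPED for a simple average: pre-experiment orders as the covariate.
    orders_per_user=tt.Mean("orders", covariate="orders_covariate"),
    # CUPED for a ratio metric: separate covariates for numerator and denominator.
    orders_per_session=tt.RatioOfMeans(
        "orders",
        "sessions",
        numer_covariate="orders_covariate",
        denom_covariate="sessions_covariate",
    ),
)
```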

### Sample ratio mismatch check

The `SampleRatio` class in **tea-tasting** detects mismatches in the sample ratios of different variants of an A/B test.

Example usage:

```python
experiment = tt.Experiment(
sessions_per_user=tt.Mean("sessions"),
orders_per_session=tt.RatioOfMeans("orders", "sessions"),
orders_per_user=tt.Mean("orders"),
revenue_per_user=tt.Mean("revenue"),
sample_ratio=tt.SampleRatio(),
)
```

By default, `SampleRatio` expects an equal number of observations across all variants. To specify a different ratio, use the `ratio` parameter. It accepts two types of values:

- The ratio of the number of observations in treatment relative to control, as a positive number. Example: `SampleRatio(0.5)`.
- A dictionary with variants as keys and expected ratios as values. Example: `SampleRatio({"A": 2, "B": 1})`.

The `method` parameter determines the statistical test to apply:

- `"auto"`: Apply the exact binomial test if the total number of observations is less than 1000; otherwise, apply the normal approximation.
- `"binom"`: Apply the exact binomial test.
- `"norm"`: Apply the normal approximation of the binomial distribution.
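
For example, a 2:1 expected split checked with the exact binomial test might be declared like this (a sketch; the argument spelling follows the parameter descriptions above):

```python
import tea_tasting as tt

sample_ratio = tt.SampleRatio({"A": 2, "B": 1}, method="binom")
```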

The result of the sample ratio mismatch check includes the following attributes:

- `metric`: Metric name.
- `control`: Number of observations in control.
- `treatment`: Number of observations in treatment.
- `pvalue`: P-value.

### Global settings

In **tea-tasting**, you can change defaults for the following parameters:

- `alternative`: Alternative hypothesis.
- `confidence_level`: Confidence level of the confidence interval.
- `equal_var`: If False, assume unequal population variances in calculation of the standard deviation and the number of degrees of freedom. Otherwise, assume equal population variance and calculate pooled standard deviation.
- `use_t`: If True, use Student's t-distribution in p-value and confidence interval calculations. Otherwise use Normal distribution.
- `equal_var`: If `False`, assume unequal population variances in calculation of the standard deviation and the number of degrees of freedom. Otherwise, assume equal population variance and calculate pooled standard deviation.
- `use_t`: If `True`, use Student's t-distribution in p-value and confidence interval calculations. Otherwise use Normal distribution.

Use `set_config` to set a global option value:
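
For example (a sketch; `config_context` is assumed to behave as a context manager, as its name and the package's public imports suggest):

```python
import tea_tasting as tt

# Change the default confidence level for metrics defined afterwards.
tt.set_config(confidence_level=0.99)

# Override options only within a limited scope
# (assuming `config_context` accepts option keyword arguments).
with tt.config_context(equal_var=True):
    metric = tt.Mean("orders")
```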

2 changes: 1 addition & 1 deletion src/tea_tasting/__init__.py
@@ -4,5 +4,5 @@
from tea_tasting.config import config_context, get_config, set_config
from tea_tasting.datasets import make_sessions_data, make_users_data
from tea_tasting.experiment import Experiment
from tea_tasting.metrics import Mean, RatioOfMeans
from tea_tasting.metrics import Mean, RatioOfMeans, SampleRatio
from tea_tasting.version import __version__
51 changes: 29 additions & 22 deletions src/tea_tasting/aggr.py
@@ -37,66 +37,72 @@ def __init__(
var_: dict[str, float | int] = {}, # noqa: B006
cov_: dict[tuple[str, str], float | int] = {}, # noqa: B006
) -> None:
"""Create an object with aggregated statistics.
"""Aggregated statistics.
Args:
count_: Sample size.
mean_: Variables sample means.
var_: Variables sample variances.
cov_: Pairs of variables sample covariances.
count_: Sample size (number of observations).
mean_: Dictionary of sample means with variable names as keys.
var_: Dictionary of sample variances with variable names as keys.
cov_: Dictionary of sample covariances with pairs of variable names as keys.
"""
self.count_ = count_
self.mean_ = mean_
self.var_ = var_
self.cov_ = {_sorted_tuple(*k): v for k, v in cov_.items()}

def with_zero_div(self) -> Aggregates:
"""Return aggregates with values which can be divided by zero without error."""
"""Return aggregates, which don't raise an error on division by zero.
Division by zero returns:
nan if numerator == 0,
inf if numerator > 0,
-inf if numerator < 0.
"""
return Aggregates(
count_=None if self.count_ is None else tea_tasting.utils.Int(self.count_),
mean_={k: tea_tasting.utils.Float(v) for k, v in self.mean_.items()},
var_={k: tea_tasting.utils.Float(v) for k, v in self.var_.items()},
cov_={k: tea_tasting.utils.Float(v) for k, v in self.cov_.items()},
mean_={k: tea_tasting.utils.numeric(v) for k, v in self.mean_.items()},
var_={k: tea_tasting.utils.numeric(v) for k, v in self.var_.items()},
cov_={k: tea_tasting.utils.numeric(v) for k, v in self.cov_.items()},
)

def count(self) -> int:
"""Sample size.
"""Sample size (number of observations).
Raises:
RuntimeError: Count is None (it wasn't defined at init).
Returns:
Number of observations.
Sample size (number of observations).
"""
if self.count_ is None:
raise RuntimeError("Count is None.")
return self.count_

def mean(self, key: str | None) -> float | int:
def mean(self, name: str | None) -> float | int:
"""Sample mean.
Args:
key: Variable name.
name: Variable name.
Returns:
Sample mean.
"""
if key is None:
if name is None:
return 1
return self.mean_[key]
return self.mean_[name]

def var(self, key: str | None) -> float | int:
def var(self, name: str | None) -> float | int:
"""Sample variance.
Args:
key: Variable name.
name: Variable name.
Returns:
Sample variance.
"""
if key is None:
if name is None:
return 0
return self.var_[key]
return self.var_[name]

def cov(self, left: str | None, right: str | None) -> float | int:
"""Sample covariance.
@@ -248,18 +254,19 @@ def read_aggregates(
var_cols: Sequence[str],
cov_cols: Sequence[tuple[str, str]],
) -> dict[Any, Aggregates] | Aggregates:
"""Read aggregated statistics from an Ibis Table.
"""Read aggregated statistics from an Ibis Table or a Pandas DataFrame.
Args:
data: Ibis Table.
data: Granular data.
group_col: Column name to group by before aggregation.
If None, total aggregates are calculated.
has_count: If True, calculate the sample size.
mean_cols: Column names for calculation of sample means.
var_cols: Column names for calculation of sample variances.
cov_cols: Pairs of column names for calculation of sample covariances.
Returns:
Aggregated statistics from the Ibis Table.
Aggregated statistics.
"""
if isinstance(data, pd.DataFrame):
con = ibis.pandas.connect()
4 changes: 2 additions & 2 deletions src/tea_tasting/config.py
@@ -1,4 +1,4 @@
"""Global config."""
"""Global configuration."""

from __future__ import annotations

@@ -28,7 +28,7 @@ def get_config(option: str | None = None) -> Any:
option: The option name.
Returns:
The value of the option if it's not None,
The option value if its name is not None,
or a dictionary with all options otherwise.
"""
if option is not None:
62 changes: 33 additions & 29 deletions src/tea_tasting/datasets.py
@@ -1,4 +1,4 @@
"""Generates a sample of data for examples."""
"""Example datasets."""
# ruff: noqa: PLR0913

from __future__ import annotations
@@ -75,9 +75,9 @@ def make_users_data(
- user identifier,
- variant of the test,
- number of sessions by the user,
- number of orders made by the user,
- revenue generated from user's orders.
- number of user's sessions,
- number of user's orders,
- revenue generated by the user.
Optionally, pre-experimental data can be generated as well.
@@ -86,26 +86,28 @@
in addition to default columns.
seed: Random seed.
n_users: Number of users.
ratio: Ratio of treatment observations to control observations.
sessions_uplift: Relative sessions uplift in the treatment variant.
orders_uplift: Relative orders uplift in the treatment variant.
revenue_uplift: Relative revenue uplift in the treatment variant.
ratio: Ratio of the number of observations in treatment relative to control.
sessions_uplift: Sessions uplift in the treatment variant, relative to control.
orders_uplift: Orders uplift in the treatment variant, relative to control.
revenue_uplift: Revenue uplift in the treatment variant, relative to control.
avg_sessions: Average number of sessions per user.
avg_orders_per_session: Average number of orders per session.
Should be less than 1.
avg_revenue_per_order: Average revenue per order.
to_ibis: If True, return Ibis Table instead if Pandas DataFrame.
to_ibis: If True, return an Ibis Table instead of a Pandas DataFrame.
Returns:
An Ibis Table or a Pandas DataFrame with the following columns:
user: User identifier.
variant: Variant of the test. 0 is control, 1 is treatment.
sessions: Number of sessions.
orders: Number of orders.
revenue: Revenue.
sessions_covariate (optional): Number of sessions before the experiment.
orders_covariate (optional): Number of orders before the experiment.
revenue_covariate (optional): Revenue before the experiment.
sessions: Number of user's sessions.
orders: Number of user's orders.
revenue: Revenue generated by the user.
sessions_covariate (optional): Number of user's sessions
before the experiment.
orders_covariate (optional): Number of user's orders before the experiment.
revenue_covariate (optional): Revenue generated by the user
before the experiment.
"""
return _make_data(
covariates=covariates,
@@ -179,9 +181,9 @@ def make_sessions_data(
- user identifier,
- variant of the test,
- number of sessions by the user,
- number of orders made by the user,
- revenue generated from user's orders.
- number of user's sessions,
- number of user's orders,
- revenue generated by the user.
Optionally, pre-experimental data can be generated as well.
@@ -190,26 +192,28 @@
in addition to default columns.
seed: Random seed.
n_users: Number of users.
ratio: Ratio of treatment observations to control observations.
sessions_uplift: Relative sessions uplift in the treatment variant.
orders_uplift: Relative orders uplift in the treatment variant.
revenue_uplift: Relative revenue uplift in the treatment variant.
ratio: Ratio of the number of observations in treatment relative to control.
sessions_uplift: Sessions uplift in the treatment variant, relative to control.
orders_uplift: Orders uplift in the treatment variant, relative to control.
revenue_uplift: Revenue uplift in the treatment variant, relative to control.
avg_sessions: Average number of sessions per user.
avg_orders_per_session: Average number of orders per session.
Should be less than 1.
avg_revenue_per_order: Average revenue per order.
to_ibis: If True, return Ibis Table instead if Pandas DataFrame.
to_ibis: If True, return an Ibis Table instead of a Pandas DataFrame.
Returns:
An Ibis Table or a Pandas DataFrame with the following columns:
user: User identifier.
variant: Variant of the test. 0 is control, 1 is treatment.
sessions: Number of sessions.
orders: Number of orders.
revenue: Revenue.
sessions_covariate (optional): Number of sessions before the experiment.
orders_covariate (optional): Number of orders before the experiment.
revenue_covariate (optional): Revenue before the experiment.
sessions: Number of user's sessions.
orders: Number of user's orders.
revenue: Revenue generated by the user.
sessions_covariate (optional): Number of user's sessions
before the experiment.
orders_covariate (optional): Number of user's orders before the experiment.
revenue_covariate (optional): Revenue generated by the user
before the experiment.
"""
return _make_data(
covariates=covariates,
