diff --git a/docs/source/learn/core_notebooks/dimensionality.ipynb b/docs/source/learn/core_notebooks/dimensionality.ipynb index d38df91e1a7..1355cd3b4e9 100644 --- a/docs/source/learn/core_notebooks/dimensionality.ipynb +++ b/docs/source/learn/core_notebooks/dimensionality.ipynb @@ -10,35 +10,19 @@ "source": [ "(dimensionality)=\n", "# Distribution Dimensionality\n", - "PyMC provides a number of ways to specify the dimensionality of its distributions. In this document we provide an overview and current best practices.\n", + "PyMC provides a number of ways to specify the dimensionality of its distributions. This document provides an overview, and offers some user tips.\n", "\n", "## Glossary\n", "In this document we'll be using the term dimensionality to refer to the idea of dimensions. Each of the terms below has a specific\n", "semantic and computational definition in PyMC. While we share them here they will make much more sense when viewed in the examples below.\n", "\n", "+ *Support dimensions* → The core dimensionality of a distribution\n", - "+ *Batched dimensions* → Extra dimensions beyond the support dimensionality of a distribution\n", + "+ *Batch dimensions* → Extra dimensions beyond the support dimensionality of a distribution\n", "+ *Implicit dimensions* → Dimensions that follow from the values or shapes of the distribution parameters\n", - "+ *Explicit dimensions* → Dimensions that are explicitly defined by one or more of the following arguments:\n", + "+ *Explicit dimensions* → Dimensions that are explicitly defined by one of the following arguments:\n", " + *Shape* → Number of draws from a distribution\n", - " + *Size* → Number of batched dimensions\n", " + *Dims* → An array of dimension names\n", - "+ *Coords* → A dictionary mapping dimension names to coordinate values\n", - "\n", - "\n", - "## General Recommendations\n", - "### When prototyping implicit dimensions are convenient\n", - "Implicit dimensions are easy to specify and great for quickly 
expanding an existing distribution.\n", - "\n", - "### For reusable code we suggest dims\n", - "For any more important work, or reusable work we suggest dims and coords as the labels will be passed to {class}'arviz.InferenceData'. This is both best practice transparency and readability for others. It also is useful in single developer workflows, for example, in cases where there is a 3 dimensional or higher distribution it'll help indiciate which dimension corresponds to which model concept.\n", - "\n", - "### Use shape if you'd like to be explicit\n", - "Use shape if you'd like to bypass any dimensionality calculations implicit in PyMC. This will strictly specify the dimensionality to Aesara\n", - "\n", - "### When debugging use unique prime numbers\n", - "By using prime numbers it will be easier to determine where how input dimensionalities are being converted to output dimensionalities.\n", - "Once confident with result then change the dimensionalities to match your data or modeling needs." + "+ *Coords* → A dictionary mapping dimension names to coordinate values" ] }, { @@ -52,7 +36,8 @@ "outputs": [], "source": [ "import pymc as pm\n", - "import numpy as np" + "import numpy as np\n", + "import aesara.tensor as at" ] }, { @@ -64,7 +49,7 @@ }, "source": [ "## Univariate distribution example\n", - "We can start with the simplest case, a single Normal distribution. We specify one outside of a PyMC Model as shown below" + "We can start with the simplest case, a single Normal distribution. We use `.dist` to specify one outside of a PyMC Model." ] }, { @@ -88,7 +73,7 @@ } }, "source": [ - "We can then use the `pm.draw` to take a draw from that same distribution" + "We can then use the {func}`~pymc.draw` function to take a random draw from that distribution." 
] }, { @@ -103,7 +88,7 @@ { "data": { "text/plain": [ - "(array(-0.20878582), ())" + "(array(0.2608393), 0)" ] }, "execution_count": 3, @@ -113,7 +98,7 @@ ], "source": [ "normal_draw = pm.draw(normal_dist)\n", - "normal_draw, normal_draw.shape" + "normal_draw, normal_draw.ndim" ] }, { @@ -124,7 +109,7 @@ } }, "source": [ - "In this case we end up with a single scalar value. This is consistent with the distributions support dimensionality, as the smallest random draw dimension is a scalar which has a dimension of zero. The support dimensionality of every distribution is hard_coded as a property." + "In this case we end up with a single scalar value. This means that a Normal distribution has a scalar support dimensionality, as the smallest random draw you can take is a scalar which has a dimension of zero. The support dimensionality of every distribution is hard-coded as a property." ] }, { @@ -185,7 +170,7 @@ { "data": { "text/plain": [ - "array([ 0.18874909, -1.72487846, -0.74330671])" + "array([-1.2658418 , -2.66812355, -1.54211256])" ] }, "execution_count": 5, @@ -206,7 +191,7 @@ } }, "source": [ - "More simply, one can create a *batch* of independent draws from the same distribution family by using the shape or size arguments." + "More simply, one can create a *batch* of independent draws from the same distribution family by using the shape argument." 
] }, { @@ -221,7 +206,7 @@ { "data": { "text/plain": [ - "array([ 0.45616511, -1.09265567, 1.22444712])" + "array([-1.41254788, -1.84113445, -1.19187865])" ] }, "execution_count": 6, @@ -230,34 +215,22 @@ } ], "source": [ - "normal_dists = pm.Normal.dist(size=3)\n", + "normal_dists = pm.Normal.dist(shape=(3,))\n", "pm.draw(normal_dists)" ] }, - { - "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, - "source": [ - "For scalar distributions, shape and size are equivalent" - ] - }, { "cell_type": "code", "execution_count": 7, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, + "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "array([-0.61335381, 0.76041448, 0.34246865])" + "array([[-0.23505347, 0.79175381, 0.44669701],\n", + " [-1.16764321, -0.13278511, -1.47507399],\n", + " [ 0.45846192, -0.38989107, 0.56937159],\n", + " [ 2.77902301, 0.45458159, 0.88682354]])" ] }, "execution_count": 7, @@ -266,7 +239,7 @@ } ], "source": [ - "normal_dists = pm.Normal.dist(shape=3)\n", + "normal_dists = pm.Normal.dist(shape=(4, 3))\n", "pm.draw(normal_dists)" ] }, @@ -278,7 +251,7 @@ } }, "source": [ - "Not only is this more succint, but it produces much more efficient vectorized code. We rarely use the first approach in PyMC." + "Not only is this more succinct, but it produces much more efficient vectorized code. We rarely use the stack approach in PyMC, unless we need to combine draws from distinct distribution families." ] }, { @@ -300,7 +273,7 @@ } }, "source": [ - "It is also possible to create a batch of draws by passing parameters with higher dimensions, without having to specify shape or size." + "It is also possible to create a batch of draws by passing parameters with higher dimensions, without having to specify shape." 
] }, { @@ -315,7 +288,7 @@ { "data": { "text/plain": [ - "array([-1.56573057, 0.17834018, -0.0534052 ])" + "array([ 0.23197231, -1.34410183, -1.367532 ])" ] }, "execution_count": 8, @@ -336,9 +309,9 @@ } }, "source": [ - "This is equivalent to the previous example with explicit shape and size, and we could have passed one of those arguments explicitly here. Because we did not, we refer to these batched dimensions as being *implicit*.\n", + "This is equivalent to the earlier example with `shape=(3,)`, and we could have passed that argument explicitly here. Because we did not, we refer to these batch dimensions as being *implicit*.\n", "\n", - "Where this becomes very useful is when we want the parameters to vary across batched dimensions." + "Where this becomes very useful is when we want the parameters to vary across batch dimensions." ] }, { @@ -353,7 +326,7 @@ { "data": { "text/plain": [ - "array([ 1.00005088, 10.00001614, 99.99992541])" + "array([ 0.99982989, 9.99993687, 99.99977518])" ] }, "execution_count": 9, @@ -400,9 +373,221 @@ "np.broadcast_arrays([1, 10, 100], 0.0001)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It's important to understand how NumPy {ref}`broadcasting ` works. When you do something that is not valid, you can easily run into errors like this one:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "shape mismatch: objects cannot be broadcast to a single shape. 
Mismatch is between arg 0 with shape (3,) and arg 1 with shape (2,).\n", + "Apply node that caused the error: normal_rv{0, (0, 0), floatX, True}(RandomGeneratorSharedVariable(), TensorConstant{[]}, TensorConstant{11}, TensorConstant{[ 1 10 100]}, TensorConstant{(2,) of 0.1})\n", + "Toposort index: 0\n", + "Inputs types: [RandomGeneratorType, TensorType(int64, (0,)), TensorType(int64, ()), TensorType(int64, (3,)), TensorType(float64, (2,))]\n", + "Inputs shapes: ['No shapes', (0,), (), (3,), (2,)]\n", + "Inputs strides: ['No strides', (8,), (), (8,), (8,)]\n", + "Inputs values: [Generator(PCG64) at 0x7F1B827A2DC0, array([], dtype=int64), array(11), array([ 1, 10, 100]), array([0.1, 0.1])]\n", + "Outputs clients: [['output'], ['output']]\n", + "\n", + "HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.\n", + "HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.\n" + ] + } + ], + "source": [ + "try:\n", + " # shapes of (3,) and (2,) can't be broadcasted together\n", + " pm.draw(pm.Normal.dist(mu=[1,10,100], sigma=[0.1, 0.1]))\n", + "except ValueError as error:\n", + " print(error)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Combining implicit and explicit batch dimensions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can combine explicit shape dimensions with implicit batch dimensions. As mentioned above, they can provide the same information." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([1.23987782, 0.42750972, 1.63048778])" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "normal_dists = pm.Normal.dist(mu=np.array([0, 1, 2]), sigma=1, shape=(3,))\n", + "pm.draw(normal_dists)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But shape can also be used to extend beyond any implicit batch dimensions." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[-0.52659658, 0.19331684, 2.55517416],\n", + " [ 2.15805379, 2.46910163, 1.41661417],\n", + " [-0.33302049, -0.30511353, 3.17642351],\n", + " [-0.05964742, 0.49239052, 2.35364406]])" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "normal_dists = pm.Normal.dist(mu=np.array([0, 1, 2]), sigma=1, shape=(4, 3))\n", + "pm.draw(normal_dists)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that, due to broadcasting rules, explicit batch dimensions must always \"go on the left\" of any implicit dimensions. So in the previous example `shape=(4, 3)` is valid, but `shape=(3, 4)` is not, because the `mu` parameter can be broadcasted to the first shape but not to the second." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "shape mismatch: objects cannot be broadcast to a single shape. 
Mismatch is between arg 0 with shape (3, 4) and arg 1 with shape (3,).\n", + "Apply node that caused the error: normal_rv{0, (0, 0), floatX, True}(RandomGeneratorSharedVariable(), TensorConstant{[3 4]}, TensorConstant{11}, TensorConstant{[0 1 2]}, TensorConstant{1.0})\n", + "Toposort index: 0\n", + "Inputs types: [RandomGeneratorType, TensorType(int64, (2,)), TensorType(int64, ()), TensorType(int64, (3,)), TensorType(float64, ())]\n", + "Inputs shapes: ['No shapes', (2,), (), (3,), ()]\n", + "Inputs strides: ['No strides', (8,), (), (8,), ()]\n", + "Inputs values: [Generator(PCG64) at 0x7F1B8544A960, array([3, 4]), array(11), array([0, 1, 2]), array(1.)]\n", + "Outputs clients: [['output'], ['output']]\n", + "\n", + "HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.\n", + "HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.\n" + ] + } + ], + "source": [ + "try:\n", + " pm.draw(pm.Normal.dist(mu=np.array([0, 1, 2]), sigma=1, shape=(3, 4)))\n", + "except ValueError as error:\n", + " print(error)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you need the Normal variables to have `shape=(3, 4)`, you can define them with `shape=(4, 3)` and transpose them afterwards." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[-1.45172328, -1.74405501, -1.16531603, -0.04438762],\n", + " [ 2.45287838, -0.31967906, 2.75234018, 1.99526188],\n", + " [-0.33636071, 2.53181819, 2.35703438, 2.00117122]])" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "transposed_normals = pm.Normal.dist(mu=np.array([0, 1, 2]), sigma=1, shape=(4, 3)).T\n", + "pm.draw(transposed_normals)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + ":::{tip} It's important not to confuse dimensions set in the definition of a distribution versus those set in downstream manipulations like transposition, indexing or broadcasting. When sampling with PyMC (be it via forward sampling or MCMC), the random draws will always emanate from the distribution shape. Notice how in the following example, a different number of \"random\" draws were actually taken, despite the two variables having the same final shape." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "vector_normals = pm.Normal.dist(size=(3,))\n", + "broadcasted_normal = at.broadcast_to(pm.Normal.dist(), (3,))" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(array([-1.47755734, 1.03483024, 0.78655336]),\n", + " array([-0.25915453, -0.25915453, -0.25915453]))" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pm.draw(vector_normals), pm.draw(broadcasted_normal)" + ] + }, { "cell_type": "markdown", "metadata": { + "heading_collapsed": true, "pycharm": { "name": "#%% md\n" } @@ -414,6 +599,7 @@ { "cell_type": "markdown", "metadata": { + "hidden": true, "pycharm": { "name": "#%% md\n" } @@ -424,8 +610,9 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 18, "metadata": { + "hidden": true, "pycharm": { "name": "#%%\n" } @@ -434,10 +621,10 @@ { "data": { "text/plain": [ - "(array([0.59439084, 0.20156665, 1.10092247]), (3,))" + "(array([ 1.14823585, -0.38009172, 0.89913669]), 1)" ] }, - "execution_count": 11, + "execution_count": 18, "metadata": {}, "output_type": "execute_result" } @@ -445,12 +632,13 @@ "source": [ "mvnormal_dist = pm.MvNormal.dist(mu=np.ones(3), cov=np.eye(3))\n", "mvnormal_draw = pm.draw(mvnormal_dist)\n", - "mvnormal_draw, mvnormal_draw.shape" + "mvnormal_draw, mvnormal_draw.ndim" ] }, { "cell_type": "markdown", "metadata": { + "hidden": true, "pycharm": { "name": "#%% md\n" } @@ -461,8 +649,9 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 19, "metadata": { + "hidden": true, "pycharm": { "name": "#%%\n" } @@ -474,7 +663,7 @@ "1" ] }, - "execution_count": 12, + "execution_count": 19, "metadata": {}, "output_type": "execute_result" } @@ -486,6 +675,7 @@ { "cell_type": "markdown", "metadata": { + "hidden": true, "pycharm": { "name": "#%% md\n" } 
@@ -496,8 +686,9 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 20, "metadata": { + "hidden": true, "pycharm": { "name": "#%%\n" } @@ -506,10 +697,10 @@ { "data": { "text/plain": [ - "(array([1.46335572]), (1,))" + "(array([2.22738707]), 1)" ] }, - "execution_count": 13, + "execution_count": 20, "metadata": {}, "output_type": "execute_result" } @@ -517,12 +708,14 @@ "source": [ "smallest_mvnormal_dist = pm.MvNormal.dist(mu=[1], cov=[[1]])\n", "smallest_mvnormal_draw = pm.draw(smallest_mvnormal_dist)\n", - "smallest_mvnormal_draw, smallest_mvnormal_draw.shape" + "smallest_mvnormal_draw, smallest_mvnormal_draw.ndim" ] }, { "cell_type": "markdown", "metadata": { + "heading_collapsed": true, + "hidden": true, "pycharm": { "name": "#%% md\n" } @@ -534,18 +727,20 @@ { "cell_type": "markdown", "metadata": { + "hidden": true, "pycharm": { "name": "#%% md\n" } }, "source": [ - "In the MvNormal examples we just saw, the support dimension was actually implicit. Nowhere did we specify we wanted a vector of 3 or 1 draws. This was inferred from the shape of `mu` and `cov`. As such, we refer to it as being an *implicit support dimension*. We could be explicit by using shape, **but not size**." + "In the MvNormal examples we just saw, the support dimension was actually implicit. Nowhere did we specify we wanted a vector of 3 or 1 draws. This was inferred from the shape of `mu` and `cov`. As such, we refer to it as being an *implicit support dimension*. We could be a bit more explicit by using shape." 
] }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 21, "metadata": { + "hidden": true, "pycharm": { "name": "#%%\n" } @@ -557,7 +752,7 @@ "array([3])" ] }, - "execution_count": 14, + "execution_count": 21, "metadata": {}, "output_type": "execute_result" } @@ -567,134 +762,202 @@ "explicit_mvnormal.shape.eval()" ] }, + { + "cell_type": "markdown", + "metadata": { + "hidden": true + }, + "source": [ + ":::{warning} However, note that at the time of writing shape is simply ignored for support dimensions. It serves merely as a \"type-hint\" for labeling the expected dimensions. :::" + ] + }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 22, "metadata": { - "pycharm": { - "name": "#%%\n" - } + "hidden": true }, "outputs": [ { "data": { "text/plain": [ - "array([3, 3])" + "array([3])" ] }, - "execution_count": 15, + "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "not_the_same = pm.MvNormal.dist(mu=np.ones(3), cov=np.eye(3), size=(3,))\n", - "not_the_same.shape.eval()" + "ignored_shape_mvnormal = pm.MvNormal.dist(mu=np.ones(3), cov=np.eye(3), shape=(4,))\n", + "explicit_mvnormal.shape.eval()" ] }, { "cell_type": "markdown", "metadata": { + "heading_collapsed": true, + "hidden": true, "pycharm": { "name": "#%% md\n" } }, "source": [ - "Size refers to the number of independent batched distributions whereas shape refers to the total number of draws. This is a subtle but important distinction. It is perhaps more apparent when we talk about batched dimensions for multivariate distributions next." + "### Explicit batch dimensions" ] }, { "cell_type": "markdown", "metadata": { + "hidden": true, "pycharm": { "name": "#%% md\n" } }, "source": [ - "### Explicit batch dimensions" + "As with univariate distributions, we can add explicit batched dimensions. We will use another vector distribution to illustrate this: the Multinomial. 
The following snippet defines a matrix of five independent Multinomial distributions, each of which is a vector of size 3." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "hidden": true, + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[0, 1, 4],\n", + " [0, 1, 4],\n", + " [1, 1, 3],\n", + " [0, 3, 2],\n", + " [0, 2, 3]])" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pm.draw(pm.Multinomial.dist(n=5, p=[0.1, 0.3, 0.6], shape=(5, 3)))" ] }, { "cell_type": "markdown", "metadata": { + "hidden": true, "pycharm": { "name": "#%% md\n" } }, "source": [ - "As with univariate distributions, we can add explicit batched dimensions. However, as we just mentioned size and shape now have different meanings. We will use another vector distribution to illustrate this: the Multinomial." + ":::{warning} Again, note that shape has no effect on the support dimensionality :::" ] }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 24, "metadata": { - "pycharm": { - "name": "#%%\n" - } + "hidden": true }, "outputs": [ { "data": { "text/plain": [ - "array([[0, 2, 3],\n", - " [0, 3, 2],\n", - " [0, 2, 3],\n", - " [0, 3, 2],\n", - " [2, 3, 0]])" + "array([[1, 2, 2],\n", + " [1, 2, 2],\n", + " [1, 1, 3],\n", + " [0, 1, 4],\n", + " [1, 0, 4]])" ] }, - "execution_count": 16, + "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "pm.draw(pm.Multinomial.dist(n=5, p=[0.1, 0.3, 0.6], shape=(5, 3)))" + "pm.draw(pm.Multinomial.dist(n=5, p=[0.1, 0.3, 0.6], shape=(5, 4)))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "hidden": true + }, + "source": [ + "For the same reason, you must always define explicit batched dimensions \"to the left\" of the support dimension. The following will not behave as expected." 
] }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 25, "metadata": { - "pycharm": { - "name": "#%%\n" - } + "hidden": true }, "outputs": [ { "data": { "text/plain": [ - "array([[0, 1, 4],\n", - " [1, 4, 0],\n", - " [0, 3, 2],\n", - " [2, 1, 2],\n", - " [1, 2, 2]])" + "array([[0, 0, 5],\n", + " [0, 1, 4],\n", + " [0, 2, 3]])" ] }, - "execution_count": 17, + "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "pm.draw(pm.Multinomial.dist(n=5, p=[0.1, 0.3, 0.6], size=(5,)))" + "pm.draw(pm.Multinomial.dist(n=5, p=[0.1, 0.3, 0.6], shape=(3, 5)))" ] }, { "cell_type": "markdown", "metadata": { - "pycharm": { - "name": "#%% md\n" - } + "hidden": true }, "source": [ - "A size of five reflects the fact that we are sampling a matrix of from 5 independent Multinomial distributions. Each of these Multinomial distributions defines a vector of 3 values that are **not** independent of each other. In this case once we know the first two values of each vector we can infer the third one without ambiguity. Shape, in contrast, simply refers to the dimensions of all the draws combined." + "If you needed the Multinomial variables to have `shape=(3, 5)` you can transpose it after defining it." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "hidden": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[0, 0, 0, 0, 1],\n", + " [2, 3, 1, 1, 1],\n", + " [3, 2, 4, 4, 3]])" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "transposed_multinomials = pm.Multinomial.dist(n=5, p=[0.1, 0.3, 0.6], shape=(5, 3)).T\n", + "pm.draw(transposed_multinomials)" ] }, { "cell_type": "markdown", "metadata": { + "heading_collapsed": true, + "hidden": true, "pycharm": { "name": "#%% md\n" } @@ -706,6 +969,7 @@ { "cell_type": "markdown", "metadata": { + "hidden": true, "pycharm": { "name": "#%% md\n" } @@ -716,8 +980,9 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 27, "metadata": { + "hidden": true, "pycharm": { "name": "#%%\n" } @@ -726,11 +991,11 @@ { "data": { "text/plain": [ - "array([[0, 2, 3],\n", - " [1, 5, 4]])" + "array([[0, 1, 4],\n", + " [0, 4, 6]])" ] }, - "execution_count": 18, + "execution_count": 27, "metadata": {}, "output_type": "execute_result" } @@ -743,6 +1008,7 @@ { "cell_type": "markdown", "metadata": { + "hidden": true, "pycharm": { "name": "#%% md\n" } @@ -753,8 +1019,9 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 28, "metadata": { + "hidden": true, "pycharm": { "name": "#%%\n" } @@ -763,11 +1030,11 @@ { "data": { "text/plain": [ - "array([[0, 2, 3],\n", - " [0, 3, 7]])" + "array([[1, 1, 3],\n", + " [1, 4, 5]])" ] }, - "execution_count": 19, + "execution_count": 28, "metadata": {}, "output_type": "execute_result" } @@ -779,6 +1046,7 @@ { "cell_type": "markdown", "metadata": { + "hidden": true, "pycharm": { "name": "#%% md\n" } @@ -789,8 +1057,9 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 29, "metadata": { + "hidden": true, "pycharm": { "name": "#%%\n" } @@ -800,7 +1069,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "shape mismatch: objects cannot 
be broadcast to a single shape\n" + "shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (2,) and arg 1 with shape (3,).\n" ] } ], @@ -814,18 +1083,19 @@ { "cell_type": "markdown", "metadata": { - "pycharm": { - "name": "#%% md\n" - } + "hidden": true }, "source": [ - "Instead, PyMC takes into consideration the number of dimensions that each parameter has in the core case. In the Multinomial distribution, `n` is a scalar and `p` is a vector. So if we have a vector of two `n`, we should actually broadcast the vector of `p` into a `matrix` with two such vectors, and pair each `n` with each broadcasted row of `p`. This works exactly like `np.vectorize` would." + "To understand what is going on, we need to introduce the concept of parameter core dimensions. The core dimensions of a distribution's parameter are the minimum number of dimensions the parameters need to have in order to define a distribution. In the Multinomial distribution, `n` must at least be an scalar integer, but `p` must be at least a vector that represents the probability of having an outcome on each category. So, for the Multinomial distribution, `n` has 0 core dimensions, and `p` has 1 core dimension. \n", + "\n", + "So if we have a vector of two `n`, we should actually broadcast the vector of `p` into a matrix with two such vectors, and pair each `n` with each broadcasted row of `p`. This works exactly like `np.vectorize`." 
] }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 30, "metadata": { + "hidden": true, "pycharm": { "name": "#%%\n" } @@ -842,11 +1112,11 @@ { "data": { "text/plain": [ - "array([[2, 1, 2],\n", - " [0, 5, 5]])" + "array([[1, 1, 3],\n", + " [2, 2, 6]])" ] }, - "execution_count": 21, + "execution_count": 30, "metadata": {}, "output_type": "execute_result" } @@ -863,6 +1133,7 @@ { "cell_type": "markdown", "metadata": { + "hidden": true, "pycharm": { "name": "#%% md\n" } @@ -873,8 +1144,9 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 31, "metadata": { + "hidden": true, "pycharm": { "name": "#%%\n" } @@ -886,7 +1158,7 @@ "(0, 1)" ] }, - "execution_count": 22, + "execution_count": 31, "metadata": {}, "output_type": "execute_result" } @@ -898,29 +1170,74 @@ { "cell_type": "markdown", "metadata": { + "hidden": true + }, + "source": [ + "Implicit batch dimensions must still respect broadcasting rules. The following example is not valid because `n` has batched dimensions of `shape=(2,)` and `p` has batched dimensions of `shape=(3,)` which cannot be broadcasted together." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "hidden": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "operands could not be broadcast together with remapped shapes [original->remapped]: (2,) and requested shape (3,)\n", + "Apply node that caused the error: multinomial_rv{1, (0, 1), int64, True}(RandomGeneratorSharedVariable(), TensorConstant{[]}, TensorConstant{4}, TensorConstant{[ 5 10]}, TensorConstant{[[0.1 0.3 .. 
0.3 0.6]]})\n", + "Toposort index: 0\n", + "Inputs types: [RandomGeneratorType, TensorType(int64, (0,)), TensorType(int64, ()), TensorType(int64, (2,)), TensorType(float64, (3, 3))]\n", + "Inputs shapes: ['No shapes', (0,), (), (2,), (3, 3)]\n", + "Inputs strides: ['No strides', (8,), (), (8,), (24, 8)]\n", + "Inputs values: [Generator(PCG64) at 0x7F1B7A088740, array([], dtype=int64), array(4), array([ 5, 10]), 'not shown']\n", + "Outputs clients: [['output'], ['output']]\n", + "\n", + "HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.\n", + "HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.\n" + ] + } + ], + "source": [ + "try:\n", + " pm.draw(pm.Multinomial.dist(n=[5, 10], p=[[0.1, 0.3, 0.6], [0.1, 0.3, 0.6], [0.1, 0.3, 0.6]]))\n", + "except ValueError as error:\n", + " print(error)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "heading_collapsed": true, + "hidden": true, "pycharm": { "name": "#%% md\n" } }, "source": [ - "### Combining implicit and explicit dimensions" + "### Combining implicit and explicit batch dimensions" ] }, { "cell_type": "markdown", "metadata": { + "hidden": true, "pycharm": { "name": "#%% md\n" } }, "source": [ - "You can and should combine implicit dimensions from multidimensional parameters with explicit size or shape information, which is easier to reason about." + "You can and should combine implicit dimensions from multidimensional parameters with explicit shape information, which is easier to reason about." 
] }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 33, "metadata": { + "hidden": true, "pycharm": { "name": "#%%\n" } @@ -929,11 +1246,11 @@ { "data": { "text/plain": [ - "array([[1, 1, 3],\n", - " [1, 1, 8]])" + "array([[0, 1, 4],\n", + " [1, 0, 9]])" ] }, - "execution_count": 23, + "execution_count": 33, "metadata": {}, "output_type": "execute_result" } @@ -942,29 +1259,45 @@ "pm.draw(pm.Multinomial.dist(n=[5, 10], p=[0.1, 0.3, 0.6], shape=(2, 3)))" ] }, + { + "cell_type": "markdown", + "metadata": { + "hidden": true + }, + "source": [ + "Explicit batch dimensions can still extend beyond any implicit batch dimensions. Again, due to how broadcasting works, explicit batch dimensions must always \"go on the left\". The following case is invalid, because `n` has batched dimensions of `shape=(2,)`, which cannot be broadcasted to the explicit batch dimensions of `shape=(2, 4)`." + ] + }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 34, "metadata": { - "pycharm": { - "name": "#%%\n" - } + "hidden": true }, "outputs": [ { - "data": { - "text/plain": [ - "array([[0, 2, 3],\n", - " [3, 4, 3]])" - ] - }, - "execution_count": 24, - "metadata": {}, - "output_type": "execute_result" + "name": "stdout", + "output_type": "stream", + "text": [ + "shape mismatch: objects cannot be broadcast to a single shape\n", + "Apply node that caused the error: multinomial_rv{1, (0, 1), int64, True}(RandomGeneratorSharedVariable(), TensorConstant{[2 4]}, TensorConstant{4}, TensorConstant{[ 5 10]}, TensorConstant{[0.1 0.3 0.6]})\n", + "Toposort index: 0\n", + "Inputs types: [RandomGeneratorType, TensorType(int64, (2,)), TensorType(int64, ()), TensorType(int64, (2,)), TensorType(float64, (3,))]\n", + "Inputs shapes: ['No shapes', (2,), (), (2,), (3,)]\n", + "Inputs strides: ['No strides', (8,), (), (8,), (8,)]\n", + "Inputs values: [Generator(PCG64) at 0x7F1B79FF3060, array([2, 4]), array(4), array([ 5, 10]), array([0.1, 0.3, 0.6])]\n", + "Outputs 
clients: [['output'], ['output']]\n", + "\n", + "HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.\n", + "HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.\n" + ] } ], "source": [ - "pm.draw(pm.Multinomial.dist(n=[5, 10], p=[0.1, 0.3, 0.6], size=(2,)))" + "try:\n", + " pm.draw(pm.Multinomial.dist(n=[5, 10], p=[0.1, 0.3, 0.6], shape=(2, 4, 3)))\n", + "except ValueError as error:\n", + " print(error)" ] }, { @@ -975,14 +1308,19 @@ } }, "source": [ - "## Inspecting dimensionality with a model graph\n", - "More often that not distributions are used inside a PyMC model, and as such there are tools that facilitate reasoning about distributions shapes in that context.\n", - "\n" + "## Inspecting dimensionality with a model graph" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "More often than not distributions are used inside a PyMC model, and as such there are tools that facilitate reasoning about distributions shapes in that context." 
] }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 35, "metadata": { "pycharm": { "name": "#%%\n" @@ -990,14 +1328,14 @@ }, "outputs": [ { - "data": { - "text/plain": [ - "{'x': (3,), 'sigma_log__': (), 'sigma': (), 'y': (3,)}" - ] - }, - "execution_count": 25, - "metadata": {}, - "output_type": "execute_result" + "name": "stdout", + "output_type": "stream", + "text": [ + " x: shape=(3,)\n", + "sigma_log__: shape=()\n", + " sigma: shape=()\n", + " y: shape=(3,)\n" + ] } ], "source": [ @@ -1006,7 +1344,8 @@ " sigma = pm.HalfNormal(\"sigma\")\n", " y = pm.Normal(\"y\", mu=mu, sigma=sigma)\n", "\n", - "pmodel.eval_rv_shapes()" + "for rv, shape in pmodel.eval_rv_shapes().items():\n", + " print(f\"{rv:>11}: shape={shape}\")" ] }, { @@ -1017,12 +1356,12 @@ } }, "source": [ - "An even more powerful tool to understand and debug dimensionality in PyMC is the `pm.model_to_graphviz` functionality. Rather than inspecting array outputs we instead can read the Graphviz output to understand the dimensionality." + "An even more powerful tool to understand and debug dimensionality in PyMC is the {func}`~pymc.model_to_graphviz` function. Rather than inspecting array outputs we can instead read the Graphviz output to understand the dimensionality of the variables." 
] }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 36, "metadata": { "pycharm": { "name": "#%%\n" @@ -1048,24 +1387,24 @@ "\n", "3\n", "\n", - "\n", - "\n", - "x\n", - "\n", - "x\n", - "~\n", - "Normal\n", - "\n", "\n", - "\n", + "\n", "y\n", "\n", "y\n", "~\n", "Normal\n", "\n", + "\n", + "\n", + "x\n", + "\n", + "x\n", + "~\n", + "Normal\n", + "\n", "\n", - "\n", + "\n", "x->y\n", "\n", "\n", @@ -1079,7 +1418,7 @@ "HalfNormal\n", "\n", "\n", - "\n", + "\n", "sigma->y\n", "\n", "\n", @@ -1088,10 +1427,10 @@ "\n" ], "text/plain": [ - "" + "" ] }, - "execution_count": 26, + "execution_count": 36, "metadata": {}, "output_type": "execute_result" } @@ -1115,7 +1454,7 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 37, "metadata": { "pycharm": { "name": "#%%\n" @@ -1131,76 +1470,62 @@ "\n", "\n", - "\n", + "\n", "\n", "%3\n", - "\n", + "\n", "\n", "cluster3\n", - "\n", - "3\n", + "\n", + "3\n", "\n", "\n", "cluster4\n", - "\n", - "4\n", + "\n", + "4\n", "\n", - "\n", - "cluster5\n", - "\n", - "5\n", - "\n", - "\n", + "\n", "\n", - "scalar\n", - "\n", - "scalar\n", - "~\n", - "Normal\n", + "scalar (support)\n", + "\n", + "scalar (support)\n", + "~\n", + "Normal\n", "\n", "\n", "\n", "vector (implicit)\n", - "\n", - "vector (implicit)\n", - "~\n", - "Normal\n", + "\n", + "vector (implicit)\n", + "~\n", + "Normal\n", "\n", - "\n", + "\n", "\n", - "vector (shape)\n", - "\n", - "vector (shape)\n", - "~\n", - "Normal\n", - "\n", - "\n", - "\n", - "vector (size)\n", - "\n", - "vector (size)\n", - "~\n", - "Normal\n", + "vector (explicit)\n", + "\n", + "vector (explicit)\n", + "~\n", + "Normal\n", "\n", "\n", "\n" ], "text/plain": [ - "" + "" ] }, - "execution_count": 27, + "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with pm.Model() as pmodel:\n", - " pm.Normal(\"scalar\") # shape=()\n", + " pm.Normal(\"scalar (support)\")\n", " pm.Normal(\"vector (implicit)\", mu=[1,2,3])\n", - " 
pm.Normal(\"vector (shape)\", shape=(4,))\n", - " pm.Normal(\"vector (size)\", size=(5,))\n", + " pm.Normal(\"vector (explicit)\", shape=(4,))\n", " \n", "pm.model_to_graphviz(pmodel)" ] @@ -1213,13 +1538,19 @@ } }, "source": [ - "## Dims\n", - "A new feature of PyMC is `dims` support. With many random variables it can become confusing which dimensionality corresponds to which \"real world\" idea, e.g. number of observations, number of treated units etc. The dims argument is an additional label to help." + "## Dims" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "PyMC supports the concept of `dims`. With many random variables it can become confusing which dimensionality corresponds to which \"real world\" idea, e.g. number of observations, number of treated units etc. The `dims` argument is an additional human-readable label that can convey this meaning." ] }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 38, "metadata": { "pycharm": { "name": "#%%\n" @@ -1284,10 +1615,10 @@ "\n" ], "text/plain": [ - "" + "" ] }, - "execution_count": 28, + "execution_count": 38, "metadata": {}, "output_type": "execute_result" } @@ -1296,8 +1627,8 @@ "with pm.Model() as pmodel:\n", " pm.Normal(\"crayon\", size=2, dims=\"colors\")\n", "\n", - " pm.Normal(\"hyperprior\", [1,2,3,4], dims=\"group\")\n", - " pm.Normal(\"prior\", dims=\"group\")\n", + " hyperprior = pm.Normal(\"hyperprior\", [1,2,3,4], dims=\"group\")\n", + " pm.Normal(\"prior\", mu=hyperprior, dims=\"group\")\n", "\n", "\n", "pm.model_to_graphviz(pmodel)" @@ -1311,12 +1642,12 @@ } }, "source": [ - "Where dims can become increasingly powerful is with the use of `coords` specified in the model itself. With this it becomes easy to track. As an added bonus the coords and dims will also be present in the returned {class}'arviz.InferenceData' simplifying the entire workflow." 
+ "Where `dims` can become increasingly powerful is with the use of `coords` specified in the model itself. This gives a unique label to each `dim` entry, rendering it much more meaningful." ] }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 39, "metadata": { "pycharm": { "name": "#%%\n" @@ -1354,10 +1685,10 @@ "\n" ], "text/plain": [ - "" + "" ] }, - "execution_count": 29, + "execution_count": 39, "metadata": {}, "output_type": "execute_result" } @@ -1380,7 +1711,7 @@ } }, "source": [ - "Note that the dimensionality of the distribution was actually defined by the `dims` used. We did not pass shape or size!" + "Note that the dimensionality of the distribution was actually defined by the `dims` used. We did not pass shape or define implicit batched dimensions." ] }, { @@ -1391,12 +1722,12 @@ } }, "source": [ - "We can use all our dimensionality tools to help us reason about Multivariate distributions as well." + "Let us review the different dimensionality flavours with a Multivariate Normal example."
] }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 40, "metadata": { "pycharm": { "name": "#%%\n" @@ -1412,109 +1743,73 @@ "\n", "\n", - "\n", + "\n", "\n", "%3\n", - "\n", + "\n", "\n", "clustersupport (3)\n", - "\n", - "support (3)\n", + "\n", + "support (3)\n", "\n", "\n", "clusterbatch (4) x support (3)\n", - "\n", - "batch (4) x support (3)\n", + "\n", + "batch (4) x support (3)\n", "\n", - "\n", + "\n", "\n", - "implicit\n", - "\n", - "implicit\n", - "~\n", - "MvNormal\n", - "\n", - "\n", - "\n", - "explicit\n", + "vector\n", "\n", - "explicit\n", + "vector\n", "~\n", "MvNormal\n", "\n", - "\n", - "\n", - "batched size\n", - "\n", - "batched size\n", - "~\n", - "MvNormal\n", - "\n", - "\n", - "\n", - "batched coords\n", - "\n", - "batched coords\n", - "~\n", - "MvNormal\n", + "\n", + "\n", + "matrix (explicit)\n", + "\n", + "matrix (explicit)\n", + "~\n", + "MvNormal\n", "\n", - "\n", - "\n", - "batched shape\n", - "\n", - "batched shape\n", - "~\n", - "MvNormal\n", + "\n", + "\n", + "matrix (implicit)\n", + "\n", + "matrix (implicit)\n", + "~\n", + "MvNormal\n", "\n", "\n", "\n" ], "text/plain": [ - "" + "" ] }, - "execution_count": 30, + "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with pm.Model(coords={\n", - " \"batch\": [2019, 2020, 2021, 2022],\n", + " \"batch\": [0, 1, 2, 3],\n", "}) as pmodel:\n", - " pm.MvNormal(\"implicit\", mu=[0, 0, 0], cov=np.eye(3), dims=(\"support\",))\n", - " pm.MvNormal(\"explicit\", mu=[0, 0, 0], cov=np.eye(3), shape=(3,), dims=(\"support\",))\n", - "\n", - " pm.MvNormal(\"batched size\", mu=[0, 0, 0], cov=np.eye(3), size=4, dims=(\"batch\", \"support\"))\n", - " pm.MvNormal(\"batched shape\", mu=[0, 0, 0], cov=np.eye(3), shape=(4, 3), dims=(\"batch\", \"support\"))\n", - " pm.MvNormal(\"batched coords\", mu=[0, 0, 0], cov=np.eye(3), dims=(\"batch\", \"support\"))\n", + " pm.MvNormal(\"vector\", mu=[0, 0, 0], cov=np.eye(3), dims=(\"support\",))\n", + " 
pm.MvNormal(\"matrix (implicit)\", mu=np.zeros((4, 3)), cov=np.eye(3), dims=(\"batch\", \"support\"))\n", + " pm.MvNormal(\"matrix (explicit)\", mu=[0, 0, 0], cov=np.eye(3), shape=(4, 3), dims=(\"batch\", \"support\"))\n", "\n", "pm.model_to_graphviz(pmodel)" ] }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, - "source": [ - "## Ellipsis" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], + "metadata": {}, "source": [ - "# TODO" + ":::{tip}\nFor final model publication we suggest using `dims` and `coords`, as the labels will be passed to {class}`arviz.InferenceData`. This is best practice for transparency and readability for others. It is also useful in single-developer workflows: for example, for a distribution with 3 or more dimensions, the labels indicate which dimension corresponds to which model concept.\n:::" ] }, { @@ -1536,11 +1831,12 @@ } }, "source": [ - "While we provide all these tools for convenience, and while PyMC does it best to understand user intent, the result of mixed dimensionality tools may not always result in the final dimensionality intended. Sometimes the model may not indicate an error until sampling, or not indicate an issue at all. When working with dimensionality, particular more complex ones we suggest\n", + "While we provide all these tools for convenience, and while PyMC does its best to understand user intent, combining dimensionality tools may not always produce the intended final dimensionality. Sometimes the model may not indicate an error until sampling, or not indicate an issue at all.
When working with dimensionality, particularly in more complex models, we suggest:\n", "\n", - "* Using GraphViz to visualize your model before sampling\n", - "* Using `pm.draw` or `pm.sample_prior predictive` to catch errors early\n", - "* Inspecting the returned `az.InferenceData` object to ensure all array sizes are as intended" + "* Using `model_to_graphviz` to visualize your model before sampling\n", + "* Using `draw` or `sample_prior_predictive` to catch errors early\n", + "* Inspecting the returned `az.InferenceData` object to ensure all array sizes are as intended\n", + "* Using distinct prime numbers for dimension lengths when tracking down errors, so that each dimension can be identified in the output shapes." ] }, { @@ -1561,9 +1857,9 @@ "hash": "f574fac5b7e4a41f7640949d1e1759089329dd116ff7b389caa9cf21f93d872d" }, "kernelspec": { - "display_name": "Python 3", + "display_name": "pymc", "language": "python", - "name": "python3" + "name": "pymc" }, "language_info": { "codemirror_mode": { @@ -1575,9 +1871,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.10" + "version": "3.10.4" } }, "nbformat": 4, "nbformat_minor": 4 -} \ No newline at end of file +}
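The rule stated in the new markdown cell, that explicit batch dimensions must "go on the left" of any implicit batch dimensions, can be sketched with plain NumPy broadcasting, independent of PyMC. This is a minimal illustration under the assumption that PyMC batch shapes follow NumPy broadcasting semantics. It uses `np.broadcast_shapes` (NumPy 1.20+) rather than any PyMC API, and `n_batch_shape` is a hypothetical name for the batch shape implied by `n=[5, 10]`.

```python
import numpy as np

# Hypothetical name: the batch shape implied by n=[5, 10] in the Multinomial example.
n_batch_shape = (2,)

# Valid: the extra explicit batch dimension (4) is prepended on the left,
# so the trailing batch dimension still matches n.
valid = np.broadcast_shapes(n_batch_shape, (4, 2))
print(valid)

# Invalid: the explicit batch dimensions (2, 4) end in 4, which cannot be
# broadcast against n's trailing batch dimension of length 2.
try:
    np.broadcast_shapes(n_batch_shape, (2, 4))
except ValueError as error:
    print("ValueError:", error)
```

By the same reasoning, `shape=(4, 2, 3)` would presumably be accepted in the `Multinomial` example, since its batch part `(4, 2)` ends in the same length as `n`.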