-
Notifications
You must be signed in to change notification settings - Fork 0
/
random-array-generation.dj
82 lines (59 loc) · 5.26 KB
/
random-array-generation.dj
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
{date="2024/02/04"}
# Generate random bit string with k ones, succinct!
While solving [12th day challenge][aoc2023day12] of recent Advent of Code I ran into following subtask required for the solution:
[aoc2023day12]: https://adventofcode.com/2023/day/12
> You need to *uniformly* generate random array of bits B = [b~0~, b~1~, ..., b~n-1~], such that there is exactly k ones
>
> For example, for *n = 4*, *k = 2* there are 6 valid arrays configurations:
>
> 1. *B = 1100*
> 2. *B = 1010*
> 3. *B = 1001*
> 4. *B = 0110*
> 5. *B = 0101*
> 6. *B = 0011*
It's not a direct subtask and couple reductions required before getting into this problem statement --- but this is not so important (and anyway I chose very weird approach to use randomized algorithm with `O(1)` additional space just for fun).
So, how can we uniformly generate random bit string of length *`n`* with exactly *`k`* ones fast using only constant amount of memory?
## Simple approach
The simplest option is to just take valid array configuration and apply random fair shuffle algorithm to it.
```rust
fn generate_non_succinct(rng: &mut SmallRng, n: usize, k: usize) -> Vec<i32> {
let mut array: Vec<i32> = repeat(1).take(k).chain(repeat(0).take(n - k)).collect();
array.shuffle(rng);
return array;
}
```
This is perfect approach which should be used in any real-life problem as it simple, concise, robust and performant enough. But unfortunately, this solution requires `O(n)` additional memory for generating routine -- which is not what we wanted to accomplish.
## Fast solution
Actually, fast succinct solution is pretty easy and straightforward -- we can just maintain amount of generated ones **s** on the prefix of length **i** and put next one with probability **`(k-s)/(n-i)`**. The code for this procedure is very simple (and also cool, thanks to the [scan][rust-scan] stateful method in `std::iter`):
[rust-scan]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.scan
```rust
fn generate_succinct<'a>(rng: &'a mut SmallRng, n: usize, k: usize) -> impl Iterator<Item=i32> + 'a {
return (0..n).scan(0, move |s, i| {
let outcome = if rng.gen_range(0..n-i) < k - *s { 1 } else { 0 };
*s += outcome;
Some(outcome as i32)
});
}
```
It's not so straightforward to prove that every sequence has same probability equals to **`1/C(n, k)`** where **[`C(n,k)`][cnk]** is **`n!/k!/(n-k)!`**.
[cnk]: https://en.wikipedia.org/wiki/Binomial_coefficient
First, we need to show that `generate_succinct` function can generate every possible array with **`k`** ones and no other output can be generated with this function. Indeed, we can't generate sequences with **`> k`** ones as we will have `0%` probability of generating **1** when we reach exactly **`k`** ones in a prefix (**`k - *s == 0`**). Also, we can't generate sequences with **`< k`** ones as at some point we will inevitably have `100%` probability of generating **1** (**`n - i == k - *s`**).
Last move -- we need to prove that every possible outcome will have same probability. We are making exactly **`n`** choices with probability of **`(k-s)/(n-i)`** each. If we multiply all denominators independently we will immediately get **`n!`**. Considering nominator of all positive choices (generating **1**) independently we will get **`k!`**. And finally -- nominators for all negative choices (generating **0**) will get us **`(n-k)!`**.
## Weird solution
In the AoC solution I implemented another approach for generating sequence succinctly. Due to the task specific I was allowed to generated bad sequences given that they can be easily filtered out without any additional memory. Considering this, I chose to generate random binary sequence with skewed one probability of **`k/n`**. This way we will get correct sequence with probability **`C(n,k)*(k/n)`{^`k`^}`*((n-k)/n)`^`n-k`^**. If we are interested in asymptotic approximation we can use [Stirling formula][stirling] and get following probability: **`√n / √(2π k(n-k))`**. We should be careful with applying this formula to edge cases with very small / very large k values as approximation for binomial coefficient will work only if **`k = Ω(1)`** and **`n - k = Ω(1)`**. Although from empiric results it seems like this approximate gives pretty good results:
[stirling]: https://en.wikipedia.org/wiki/Stirling%27s_approximation
```python
>>> import math
>>> probs = [
(c(n, k) * k**k * (n - k)**(n - k) / (n**n), math.sqrt(n / (2 * math.pi * k * (n - k))), n, k)
for n in range(1, 1024)
for k in range(1, n)
]
>>> max([(approx / actual, n, k) for (actual, approx, n, k) in probs])
(1.1283791670955126, 2, 1)
>>> min([(approx / actual, n, k) for (actual, approx, n, k) in probs])
(1.0002444094121852, 1023, 511)
```
We can see that for all possible parameters with **`n<1024`** probability approximation leads to not more than ~13% greater values. So, we can use this to estimate asymptotic of attempts required for good sequence generation. Given that good sequence generated with probability **`p`** it is well known fact (see [geometric distribution][geom]) that average amount of attempts will be equal to **`1/p`** which is **`√(2π k(n-k)) / √n = O(√k √(n-k) / √n)`** which is **`O(√n)`** in worst case when **`k = n/2`**.
[geom]: https://en.wikipedia.org/wiki/Geometric_distribution