Skip to content

wesdoyle/hyperloglog

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

HyperLogLog exploration

This repo explores implementations of the HyperLogLog algorithm for estimating the cardinality of a dataset.

These are basic implementations for learning purposes.

Example using text file on disk

> python3 ./py-version/hyperloglog.py ./data/complete_works_of_shakespeare.txt 16
Comparison of HyperLogLog vs Exact Counting
----------------------------------------------------------
Method         Count     Error (%)
----------------------------------------------------------
HyperLogLog    25838     0.37
Exact          25934     N/A
----------------------------------------------------------

Example using sqlite database on disk

This example uses the Wikibooks dataset (license is CC BY-SA 4.0)

Python Version

> python3 ./py-version/hll_sqlite.py ./data/wikibooks.sqlite en body_text 20

Rust Version

cargo run --release ../../data/wikibooks.sqlite en body_text 22

Comparison of HyperLogLog vs Exact Counting (SQLite)
----------------------------------------------------------
Method          Count      Time (s)   Error (%)
----------------------------------------------------------
HyperLogLog     1361262    19.68      0.01
Exact           1361142    12.87      N/A
----------------------------------------------------------

Releases

No releases published

Packages

No packages published