Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document and give example for alsw implementation #38

Open
tugtugtug opened this issue Jun 8, 2023 · 4 comments
Open

Document and give example for alsw implementation #38

tugtugtug opened this issue Jun 8, 2023 · 4 comments

Comments

@tugtugtug
Copy link

Hi Nugine,
I'm planning to add a new charset support to simd-base32, and is stuck with the alsw decoder implementation.
Given that the charset is:

const BASE32CROCKFORD_CHARSET: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

I have the decode function as:

    #[inline]
    const fn decode(c: u8) -> u8 {
        match c {
            b'0'..=b'9' => c - b'0',
            b'A'..=b'H' => c - b'A' + 10,
            b'J'..=b'K' => c - b'J' + 18,
            b'M'..=b'N' => c - b'M' + 20,
            b'P'..=b'T' => c - b'P' + 22,
            b'V'..=b'Z' => c - b'V' + 27,
            _ => 0xff,
        }
    }

But it is unclear to me how the check_hash function is implemented. Would you please add some documentation to the implementation and the verification of such functions?

Thanks!

@Nugine
Copy link
Owner

Nugine commented Jun 8, 2023

ALSW 利用一个 hash 函数和一个 offset 函数实现 u8 -> u8 的映射,主要用于解码器。特别不规则的映射无法用 ALSW 实现。

ALSW 有两个功能:check 和 decode,各需要两个长度为 16 的查找表,例如 check_hash 函数就是用来生成查找表 CHECK_HASH 的,输入 index,输出 value,另一个查找表 CHECK_OFFSET 根据 CHECK_HASH 自动计算,decode_hash 同理。

ALSW 的 check 功能用于判断输入字符是否有效,当且仅当输出小于 0x80 时,输入有效。
ALSW 的 decode 功能解码输入字符,当输入有效时,输出一定准确。

这是用于 debug 的测试函数,可以在终端打印两张带有颜色的映射表,填写 check_hash 和 decode_hash 函数时可以根据映射表调整数值。

#[cfg(feature = "std")]
#[test]
#[ignore]
fn debug_standard_alsw_check() {
let hash = &StandardAlsw::CHECK_HASH;
let offset = &StandardAlsw::CHECK_OFFSET;
let is_primary = |c: u8| StandardAlsw::decode(c) != 0xff;
vsimd::tools::print_fn_table(is_primary, |c: u8| vsimd::alsw::hash(hash, c));
vsimd::tools::print_fn_table(is_primary, |c: u8| vsimd::alsw::check(hash, offset, c));
}
#[cfg(feature = "std")]
#[test]
#[ignore]
fn debug_standard_alsw_decode() {
let hash = &StandardAlsw::DECODE_HASH;
let offset = &StandardAlsw::DECODE_OFFSET;
let is_primary = |c: u8| StandardAlsw::decode(c) != 0xff;
vsimd::tools::print_fn_table(is_primary, |c: u8| vsimd::alsw::hash(hash, c));
vsimd::tools::print_fn_table(is_primary, |c: u8| vsimd::alsw::decode(hash, offset, c));
}

用 Base32Crockford 举例,其中 check_hash 未经调整

struct Base32CrockfordAlsw;

impl Base32CrockfordAlsw {
    #[inline]
    const fn decode(c: u8) -> u8 {
        match c {
            b'0'..=b'9' => c - b'0',
            b'A'..=b'H' => c - b'A' + 10,
            b'J'..=b'K' => c - b'J' + 18,
            b'M'..=b'N' => c - b'M' + 20,
            b'P'..=b'T' => c - b'P' + 22,
            b'V'..=b'Z' => c - b'V' + 27,
            _ => 0xff,
        }
    }

    #[inline]
    const fn check_hash(i: u8) -> u8 {
        match i {
            0x0 => 1,
            0x1 => 1,
            0x2..=0x7 => 6,
            0x8..=0xA => 1,
            0xB..=0xF => 7,
            _ => unreachable!(),
        }
    }

    #[allow(dead_code)]
    #[inline]
    const fn decode_hash(i: u8) -> u8 {
        Self::check_hash(i)
    }
}

vsimd::impl_alsw!(Base32CrockfordAlsw);
    #[cfg(feature = "std")]
    #[test]
    #[ignore]
    fn debug_crockford_alsw_check() {
        let hash = &Base32CrockfordAlsw::CHECK_HASH;
        let offset = &Base32CrockfordAlsw::CHECK_OFFSET;
        let is_primary = |c: u8| Base32CrockfordAlsw::decode(c) != 0xff;

        vsimd::tools::print_fn_table(is_primary, |c: u8| vsimd::alsw::hash(hash, c));
        vsimd::tools::print_fn_table(is_primary, |c: u8| vsimd::alsw::check(hash, offset, c));
    }

执行命令 (部分由 vscode rust-analyzer 插件生成)

cargo watch -w crates -s 'cargo test --package base32-simd --lib --all-features -- alsw::algorithm::debug_crockford_alsw_check --exact --nocapture --ignored'

红色代表字符集位置,绿色代表小于 0x80,蓝色代表不小于 0x80。
第一张表是 hash 结果,对输入初次变换,第二张表是 check 结果,对 hash 结果加上 offset 处理。
当第二张表中只存在红色和蓝色时,说明成功找到了合适的 check_hash 函数,否则需要手动调整数值分配。

decode 同理,但对第二张表的要求更低,红色位置的数值全部正确即可。

更改函数后保存,上面命令中的 cargo watch 会重新执行测试,自动给出反馈。

目前只能人工调整 check_hash 和 decode_hash,我还没有找到根据字符集生成这两个函数的方法。

@Nugine
Copy link
Owner

Nugine commented Jun 8, 2023

其实初版 base32-simd 计划支持 Base32 Crockford,后来移除了。
Base32 Crockford 与 Base32 RFC4648 的算法不一致,不易复用已有函数,维护负担较大。
所以目前我不会合并相关 PR 进入主线,但你可以在你自己的 fork 中尝试一下。

@tugtugtug
Copy link
Author

tugtugtug commented Jun 9, 2023

thank you @Nugine , that was very informative. I'll give this a try. But afaik, this is basically a brute force process to find the matching hash method. The prints are helpful to identify which of the 16 values can affect the result, but a hand tweaking process may take forever. Is there any technique on how to pick the value of the hash, and is there a typical range of each value?

@Nugine
Copy link
Owner

Nugine commented Jun 10, 2023

先要熟悉一下 ALSW 的计算机制。

可以根据 base64 的数值分配来找找规律

#[cfg(feature = "std")]
#[test]
#[ignore]
fn debug_standard_alsw_check() {
let hash = &StandardAlsw::CHECK_HASH;
let offset = &StandardAlsw::CHECK_OFFSET;
let is_primary = |c: u8| StandardAlsw::decode(c) != 0xff;
vsimd::tools::print_fn_table(is_primary, |c: u8| vsimd::alsw::hash(hash, c));
vsimd::tools::print_fn_table(is_primary, |c: u8| vsimd::alsw::check(hash, offset, c));
}
#[cfg(feature = "std")]
#[test]
#[ignore]
fn debug_standard_alsw_decode() {
let hash = &StandardAlsw::DECODE_HASH;
let offset = &StandardAlsw::DECODE_OFFSET;
let is_primary = |c: u8| StandardAlsw::decode(c) != 0xff;
vsimd::tools::print_fn_table(is_primary, |c: u8| vsimd::alsw::hash(hash, c));
vsimd::tools::print_fn_table(is_primary, |c: u8| vsimd::alsw::decode(hash, offset, c));
}

hash 后的结果通常会将红色区域分割成横向连续的几条,列边界处会垂直分割区域。check 结果是按这样的横条分配的。
check_hash 和 decode_hash 的数值一般在 0 ~ 15 之间,过大会溢出,从而无效。

此外就靠找规律、猜测和数字敏感性了。时间长了我也记不住有什么方法了。

由于 ALSW 的查找表仅由字符集决定,这样的调整过程是一次性的,我觉得一定存在一个确定的算法来计算查找表或者判定无法计算,但还没想到构造方法。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants