Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rewrite bytes2hex to be more efficient #14341

Closed
StefanKarpinski opened this issue Dec 9, 2015 · 16 comments
Closed

rewrite bytes2hex to be more efficient #14341

StefanKarpinski opened this issue Dec 9, 2015 · 16 comments
Labels
good first issue Indicates a good issue for first-time contributors to Julia performance Must go faster

Comments

@StefanKarpinski
Copy link
Sponsor Member

I was being really lazy when I wrote this one-liner:

bytes2hex(arr::Vector{UInt8}) = join([hex(i,2) for i in arr])

It's a good intro issue to rewrite this to pre-allocate a byte buffer, go through and fill it with hex characters appropriately and then return that buffer wrapped in a string object.

@StefanKarpinski StefanKarpinski added the good first issue Indicates a good issue for first-time contributors to Julia label Dec 9, 2015
@kshyatt kshyatt added the performance Must go faster label Dec 9, 2015
@ScottPJones
Copy link
Contributor

Would this be better as a method of hex? Currently, there is no method hex(arr::Vector{UInt8})

@StefanKarpinski
Copy link
Sponsor Member Author

Perhaps, but there's a nice symmetry between bytes2hex and hex2bytes.

@StefanKarpinski
Copy link
Sponsor Member Author

While whoever does this is at it, adding tests for hex2bytes and bytes2hex would be good.

@ScottPJones
Copy link
Contributor

Come to think of it, num2hex could also maybe be better as hex(val::AbstractFloat,pad).
About the symmetry, it might be OK to leave those as aliases to hex, but it doesn't seem very julian to have hex, bytes2hex, and num2hex (with some overlap), and not have hex be able to handle all cases.

@mason-bially
Copy link
Contributor

Sort of ties into #14340 doesn't it? About creating a well factored set of names.

@tkelman
Copy link
Contributor

tkelman commented Dec 10, 2015

function names with 2 in them do often seem like artifacts of languages that avoid dealing with types

@StefanKarpinski
Copy link
Sponsor Member Author

I agree about num2hex, but I'm unconvinced about making this a method of hex – what does hex mean then? One natural option would be to vectorize it, in which case it would create an array of strings that are the hex representation of each number. But I agree that it would be nice to rejigger all of this functionality in some way so that we can express all of these things more generally.

@nalimilan
Copy link
Member

map(hex, arr) exists for when you want to create an array of strings. hex(::AbstractArray{UInt8}) should rather return a single string, replacing bytes2hex.

@StefanKarpinski
Copy link
Sponsor Member Author

Ok, fine, but what does hex(::AbstractArray{UInt16}) or hex(::AbstractArray{Int}) do? This is the burden of generic functions – it's not enough to say what one method should do, we have to figure out what the function means.

@StefanKarpinski
Copy link
Sponsor Member Author

One possibility is to say that hex{T<:Unsigned}(::AbstractArray{T}) interprets the array of unsigned values as representing a number as a sequence of digits where each unsigned integer is a digit. The number is then printed as a hexadecimal string. With that meaning, hex(::Vector{UInt8}) would almost do what the current bytes2hex does – the main difference being padding.

@nalimilan
Copy link
Member

I can't really say what definition makes the most sense, I don't see what are the uses of this function.

ScottPJones added a commit to ScottPJones/julia that referenced this issue Dec 14, 2015
Add hex(x::Vector{UInt8}) & hex(x::ByteString)
@ScottPJones
Copy link
Contributor

It is a common thing to want to hex encode data that have control characters etc., or to encode it for display or debugging.
Since Julia does not have a binary string type (even though the b"..." macro makes it look like it does), and Vector{UInt8} has to be used for that, it seems to make sense to have hex handle those binary strings. Actually, I think it should be extended to handle string types as well, for ByteStrings, just outputting the bytes in hex, and for UTF16String and UTF32String, reinterpret the arrays (minus the last word of 0) as UInt8 and output that as hex.
I'm not sure just what the use case for hex(::AbstractArray{UInt16}) or hex(::AbstractArray{Int}) would be - but I think the best generic approach would be take the underlying bytes and output those as hex.

@StefanKarpinski
Copy link
Sponsor Member Author

Saying "I think this method of this function should do this because I find that specific behavior useful" is not sufficient. We need a definition of what hex means for arrays, such that the method for byte arrays is a special case. The hex function for integers prints the integers as hexadecimal strings. What does the hex function for arrays of integers mean? Just wedging this behavior into that function because it's kind of sort of similar is not good enough. I've already proposed a couple of interpretations and some issues.

@mason-bially
Copy link
Contributor

I think it's pretty clear hex is a term used to mean "convert this to display in hex". As some examples:

In a multiple dispatch language most of these names are superfluous, like @tkelman seemed to be saying. I can't conceive of another use for Base.hex(::AbstractArray{UInt16}) or Base.hex(::AbstractArray{Int}) (I include base because someone might use hex to mean other things in their module) besides converting the argument to a form that can be displayed as hex.

Unless you are saying we should pack that functionality up into a narrower protocol (say, get rid of hex altogether and pack the functionality into show/repr etc), to which I would agree (although it seems unlikely). But having bytes2hex, num2hex, and whatever other methods we may need (I note num2hex doesn't support BigInts / BigFloats, is that a different helper method (bignum2hex)? or merely not implemented). And while we're at it we should probably figure out how to pack hex2num into parse (say, parse(Float64, s, 16)).

What I find funny is num2hex(num) and hex(num, 16) are the exact same number of characters and appear to do exactly the same thing for integers. Talk about duplicate functionality.

@StefanKarpinski
Copy link
Sponsor Member Author

Do we have analogous methods for dec? How does dec(v::Vector{UInt8}) display values in decimal? Do we concatenate the base ten representations of all of the individual numbers? Should they be padded to three digits? Or do we consider the entire vector v to encode a single number, base 256, which we then print in decimal? Or do we say this only makes sense for bases that are powers of 2? I'm not saying we shouldn't do this – on the contrary, I think it would be great to generalize functions like hex – I just think we need a clearer picture before we start changing names and adding deprecations.

@ScottPJones
Copy link
Contributor

hex already has an extra method not in dec, I don't think it makes sense for dec to have all of the same methods, however it could be useful for bin to act the same as hex. (do people even use oct any more? I'd move that out to a package, doesn't seem like it's really needed in Base at all. I know that some of the later standards have even removed \ooo support from strings, only \0 is still allowed).

Note: no names were changed in my PR, nor any deprecated, although I do think that some deprecations of some x2y named functions would be good in another PR in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Indicates a good issue for first-time contributors to Julia performance Must go faster
Projects
None yet
Development

No branches or pull requests

6 participants