
improve documentation of numeric literals #11081

Closed
ScottPJones opened this issue May 1, 2015 · 38 comments
Labels
domain:docs This change adds or pertains to documentation

Comments

@ScottPJones
Contributor

In #8964, I was pointing out some serious inconsistencies that occur with the default or built-in numeric types in Julia, and how numeric constants are not handled consistently.
A decimal integer literal will start off as Int64 (or, I imagine, Int32 on 32-bit machines?), then go to Int128, and finally to Base.GMP.BigInt, depending on the value, not its length (0, 00, 00000000... are all Int64, no matter how many leading 0s there are).
A hexadecimal literal, on the other hand, starts off as an unsigned integer, but it is the length, not the actual value, that determines its type... i.e. 0x0 and 0x00 are both UInt8s, but 0x000 and 0x0000 are UInt16, and 0x00000 - 0x00000000 are UInt32... and so forth, up through UInt64, UInt128, and then, magically, it changes from an unsigned type to a Base.GMP.BigInt...
This is very apparent if you use the ~ operator on the literal... a frequent thing to do with a mask value.
Floats are even worse - they are always Float64, even if Float64 is not large enough to represent the value (forgetting for the moment that binary floating point is not suitable for exactly representing base-10 literals...).
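To make the integer behavior concrete, here is what a 64-bit session reports (decimal literals would instead start at Int32 on a 32-bit build):

julia> typeof(123)
Int64

julia> typeof(12345678901234567890)                        # larger than typemax(Int64)
Int128

julia> typeof(1234567890123456789012345678901234567890)   # larger than typemax(Int128)
BigInt

julia> typeof(0x0), typeof(0x000), typeof(0x00000), typeof(0x000000000), typeof(0x00000000000000000)
(UInt8, UInt16, UInt32, UInt64, UInt128)

julia> ~0x000f     # the width of the mask follows the number of digits written
0xfff0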

@StefanKarpinski
Sponsor Member

This is all by design.

@StefanKarpinski added the status:won't change label (Indicates that work won't continue on an issue or pull request) on May 1, 2015
@mbauman
Sponsor Member

mbauman commented May 1, 2015

And it's documented: http://docs.julialang.org/en/latest/manual/integers-and-floating-point-numbers/#integers. Also, see #197.

Int128 and BigInt literals could use a mention, though. Have you read through the manual yet, @ScottPJones?

The floating point issues seem like they should be a separate issue. Please try to keep issues focused on exactly one topic!

@jiahao
Member

jiahao commented May 1, 2015

Floats are even worse - they are always Float64, even if Float64 is not large enough to represent the value (forgetting for the moment that binary floating point is not suitable for exactly representing base-10 literals...).

  1. The statement "Float64 is not large enough to represent the value" is always false. The largest numbers can always be rounded in floating point to Inf or -Inf. The issue of exact representation is a separate issue of precision, not dynamic range.
  2. The statement "binary floating point is not suitable for exactly representing base-10 literals" is true in general, but remember that there are also many decimal literals that can be represented exactly in both binary and decimal floats, such as the whole numbers 0.0, 1.0, ..., 1000000.0, ...
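Both points can be checked directly at the REPL:

julia> 1e308 * 10          # beyond the Float64 range the result saturates at Inf
Inf

julia> 1000000.0 == 10^6   # whole numbers of this size are represented exactly
true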

I find that the flaws of binary floating point are severely exaggerated. While it is true that some applications do require accurate decimal representations, binary floats are here to stay and they aren't going away. Without support for computing with decimal floating point numbers that runs at speeds comparable to binary floating point numbers, the former are not very useful for practical computations.

@JeffBezanson
Sponsor Member

I don't think selecting an integer type based on length in base-10 digits is reasonable...

@StefanKarpinski
Sponsor Member

I don't think selecting an integer type based on length in base-10 digits is reasonable...

Agreed, which is why we don't do that. Some people seem not to care for the selection of unsigned integer types based on the number of hex digits, but I generally find it to be really practical and intuitive.

The only thing that seems fishy to me here is that over-long hex numbers probably shouldn't construct BigInts – i.e. 0x000000000000000000000000000000000 should be a syntax error.
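For reference, that literal currently parses as a BigInt rather than erroring, which is the behavior described above:

julia> typeof(0x000000000000000000000000000000000)   # more hex digits than fit in UInt128
BigInt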

@ScottPJones
Contributor Author

@mbauman What precisely is off topic? My topic was "Inconsistencies in numeric literal handling". Is 0.0 not a numeric literal?

@ScottPJones
Contributor Author

@JeffBezanson My issue is that it is inconsistent. Why do hex numeric literals behave one way (i.e. scaled based on length), decimal integer literals behave another (based on value), and floating point literals a third way (always the same, no matter the number)?
Can you please give a good justification for all of these inconsistencies?
I do understand that it might seem easier to have hex numeric literals be based on the length, but that is not what people from other languages would expect... and it can lead to bugs (I ran into this trying to handle both UInt16 and UInt32 masks for checking for surrogates).
At the very least, this needs a big warning in the documentation, both in the part that @mbauman mentioned, and especially in the part http://docs.julialang.org/en/release-0.3/manual/noteworthy-differences/ .

Literal numbers without a decimal point (such as 42) create integers instead of floating point numbers. Arbitrarily large integer literals are supported. But this means that some operations such as 2^-1 will throw a domain error as the result is not an integer (see the FAQ entry on domain errors for details).

It says nothing about the way hex literals are treated differently, nor that integers that happen to have a .0 or e+xx at the end don't get automatically promoted to BigFloat, in a way consistent with the way
decimal or hex literals get promoted to BigInts.
YES, I have been going through the manual! (that comment seemed almost insulting)
I didn't know when reading the manual that I also had to pore over GitHub issues (like #197) either...

@JeffBezanson and @StefanKarpinski I didn't say that selecting an integer type based on length in base-10 digits is reasonable... just that it is NOT consistent with what is done with hex or floating point literals, one on length, one on value, the other fixed.

@jiahao I don't think I ever exaggerated the flaws of binary floating point numbers... but unless you use a real binary (or octal or hex) for representing floating point literals, you are really dealing with decimal floating point literals, which you are converting to an inexact binary representation, and when you print it out, you are also not representing exactly the binary value... (unless you use around 53 or 54 digits for a Float64).
I also never stated that binary floating point was not better in many cases... it is definitely faster...
but each has its place... ignoring one because your programs don't need it is not very good...
(I had to deal with the opposite case for years... the language I worked on only had decimal floating point... we finally had to add support for IEEE binary floating point, to be able to handle scientific data well...)
With floating point literals, if somebody has a 30 digit literal, why shouldn't that be promoted to a BigFloat, in a consistent fashion with decimal and hex literals?
That's simply the point I'm trying to make here.

@ScottPJones
Contributor Author

@StefanKarpinski Your "This is all by design." is an argument by authority.

@JeffBezanson
Sponsor Member

I don't think it's entirely an argument by authority. It means that these were deliberate decisions, with reasons behind them, not just some mess that happened because nobody was paying attention. It's not always obvious which features have good justifications, and people's initial impressions tend to differ.

@StefanKarpinski
Sponsor Member

Yes, signed integer literals, unsigned integer literals, and floating-point literals behave differently, but there are good reasons for these differences, so the inconsistency isn't a bug. Duly noted that you don't care for this particular design decision.

@JeffBezanson
Sponsor Member

As for inconsistency, isn't it sometimes the case that different things are different? Integers overflow, floats saturate at Inf. Is that inconsistent?

It seems to me that when somebody writes 0x0f they have a clear intent to work with bytes. If 15 and 0x0f both gave Int64s, that would enable switching between bases for your numeric constants just for the hell of it. Would code that did that be consistent?

@jiahao
Member

jiahao commented May 1, 2015

@ScottPJones kindly do not put words in my mouth. I did not say that decimal floats are useless, nor did I ever advocate ignoring the needs of any applications that require decimal floats. My point is that all floats can be read into Float64s, albeit with different levels of precision in the roundoff that produces the final representation. Maybe you'll want more precision, and in that case you can use BigFloats, but you have to pay the performance penalty of working with the latter.
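If you do want a constant at BigFloat precision, recent versions let you write it with the big"..." string macro (or BigFloat("...")), which parses the digits directly instead of going through Float64 first:

julia> big"0.1" == 0.1       # parsed at BigFloat precision, so it differs from the rounded Float64
false

julia> BigFloat(0.1) == 0.1  # converting the already-rounded Float64 loses nothing further
true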

The outputting of floats is a different matter entirely. Algorithms like grisu specialize in producing a decimal literal that can be unambiguously read back into the underlying binary representation. I think it is a clever use of the lack of exact conversion between base 2 and base 10.
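For example, the stored value of 0.1 is not exactly one tenth, yet its printed form reads back to the identical Float64 (using current parse syntax):

julia> 0.1 == 1//10                        # the underlying binary value is not exactly 1/10
false

julia> parse(Float64, string(0.1)) == 0.1  # but printing and re-reading round-trips exactly
true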

@mbauman
Sponsor Member

mbauman commented May 1, 2015

Of course it's not off-topic; you defined the topic! I just didn't think the topic was very well focused - which is reflected in the responses here. I find I get the best responses to the issues I create when they are very specific and sharply pointed. I also try to search the manual and existing issues — and if I don't find anything I often say so.

Do you bring up Floating Point numbers because you'd like to see automatic parsing as BigFloats if the precision is over-specified for Float64? Then create an issue "Automatic BigFloat parsing for large and precise numbers," citing integers as precedent. Or do you want Base 10 float support from the parser? That's a little different and has been discussed some before, too (#4278 (comment)).

Each of these topics takes time from the developers. I don't mean to be insulting at all - I just want to make sure that we use all our resources effectively.

@ScottPJones
Contributor Author

@StefanKarpinski kindly do not put words in my mouth. Did I say whether I cared or not about the design decision? No, I did say that it was confusing to people coming from elsewhere, and at least should be noted prominently in the appropriate section of the documentation, which I had read, and referenced here. An action item of this issue would be to add some appropriate warnings to the documentation... I don't consider myself a particularly good writer, but unless someone else steps up, I'll do so and issue a PR. Once I realized the rather different way that Julia treated hex constants from everybody else, and even from Julia's own integer constants and float constants, I was fine; I fixed the masking operations in my Julia code (just had to be a bit different from my C code). I didn't ever say that this feature should be removed. Everybody should stop being so defensive and taking things personally, IMO.

@ScottPJones
Contributor Author

@jiahao I don't think I was putting words in your mouth... here are your very own words:

I find that the flaws of binary floating point are severely exaggerated. While it is true that some applications do require accurate decimal representations, binary floats are here to stay and they aren't going away. Without support for computing with decimal floating point numbers that runs at speeds comparable to binary floating point numbers, the former are not very useful for practical computations.

Saying that they are "not very useful for practical computations" does seem rather strong!
Do you consider financial or healthcare applications to not be doing "practical computations"?

Please take a look at the following:
http://www.sinenomine.net/sites/default/files/Cowlishaw-DecimalArithmetic-Hillgang2008_0.pdf

@StefanKarpinski
Sponsor Member

Documentation improvements would be great.

@StefanKarpinski added the domain:docs label (This change adds or pertains to documentation) and removed the status:won't change label (Indicates that work won't continue on an issue or pull request) on May 1, 2015
@StefanKarpinski changed the title from "Inconsistencies with numeric literal handling" to "improve documentation of numeric literals" on May 1, 2015
@ScottPJones
Contributor Author

@JeffBezanson If @StefanKarpinski had pointed to the discussion, and not just stuck a "won't fix" immediately on it, then it wouldn't have been "argument by authority". What he did, certainly was.

@ScottPJones
Contributor Author

@StefanKarpinski There are still the issues I raised about the inconsistencies between the way floats are not promoted, but integers (both decimal and hex) are, so I think it is premature to change the title on me to "improve documentation".

@ScottPJones
Contributor Author

To everybody... I don't mean to be a PITA, I am just trying to have a reasoned discussion of some issues that I have found... whether they need to be fixed by better documentation (which is fine for me in the hex "length vs value" case, I wouldn't have run into the problem if there had been something in the documentation section for people coming from other languages), fixed by a syntax error (if that is the best approach for large hex literals... I'm not sure, and that may require separate discussion...), or by changing floating point literals to promote from Float64 to BigFloat in a fashion consistent with integer and hex literal promotion.

@ScottPJones
Contributor Author

Another thing that I think causes confusion, is that Julia conflates two separate things... whether a literal is represented as hex or decimal, and whether a number is signed or unsigned...
C/C++/C# have the U suffix for that, totally independent of the 0x for hex instead of decimal...
Is there some issue # for a discussion of that design decision?
(Again, I am not saying that it is necessarily bad, just that it is confusing for people coming from certain other (very common) languages... may just need a bit more documentation, especially in the part that newbies look at first... the section on significant differences)

@jiahao
Member

jiahao commented May 1, 2015

@ScottPJones I stand by what I said. The presentation you showed does not change my opinion. I happen to be working right now with teams who are specialists in medical data and in financial data, and all the code they have shown me runs on hardware binary floats. Show me a single application that runs on modern hardware that Julia can run on where decimal floats are supported and are used because binary floats are genuinely harmful, and I will reconsider my position.

@mbauman
Sponsor Member

mbauman commented May 1, 2015

Part of the reason you're not getting good answers here is that these answers are already out there.

This is a subjective call, but I think it's worked out pretty well. In my experience when you use hex or binary, you're interested in a specific pattern of bits – and you generally want it to be unsigned. When you're just interested in a numeric value you use decimal because that's what we're most familiar with. In addition, when you're using hex or binary, the number of digits you use for input is typically significant, whereas in decimal, it isn't. So that's how literals work in Julia: decimal gives you a signed integer of a type that the value fits in, while hex and binary give you an unsigned value whose storage size is determined by the number of digits.

Usually when you use hex notation for numbers, you are trying to create an unsigned value with a particular bit pattern. To facilitate that, it has been proposed that 0x hex integer literals represent unsigned integers of size determined by the number of digits…

  • Why is it tricky to parse a floating point number by its digits? Here's a comment from the thread I linked above that talks about a parse-time floating point type that keeps exactly the digits you wrote:

I don't think that introducing second-class, compile-time-only types is a satisfactory solution – it just complicates matters further. In Go, for example, you get completely different behavior if you operate on a literal than if you assign that literal to a variable and do the same operation on it, which, of course, confuses people. Instead of just explaining to people that computers use binary instead of decimal – which is something they're going to need to know about to do numerical computing effectively – you also have to explain that there's this subtle difference between literal numbers and runtime numbers. So now people have two confusing things to learn about instead of just one.

I don't think the documentation needs to defend every design decision. It simply must explain the behaviors adequately. You can fall back on secondary sources (GitHub, Google, StackOverflow) for rationales.

@JeffBezanson
Sponsor Member

On the topic of floats: I believe we give a syntax error for float literals that don't fit in Float64 (i.e. would overflow). Writing extra decimal digits is another matter entirely: it's a common practice of numerical library developers to put a few extra digits in constants to make sure they get the right value. Switching to BigFloat in those cases would be surprising, especially since it's hard to predict where the cutoff will be.

FWIW I'm not a huge fan of giving BigInts from literals. It makes it that much harder to remove the dependency on a BigInt library.

@ScottPJones
Contributor Author

@JeffBezanson If it did give a syntax error, instead of silently truncating, I wouldn't have been bothered... My example was of an integer value, where the literal ended in .0.
I think giving an error in that case would be fine, and consistent with @StefanKarpinski's wanting hex literals to give a syntax error instead of silently moving from unsigned to signed. Consistency is all I've been asking for!

@StefanKarpinski
Sponsor Member

julia> 1e100000
ERROR: syntax: overflow in numeric constant "1e100000"

@ScottPJones
Contributor Author

@mbauman I didn't say the documentation needed to explain every design decision... just that the documentation section that I'd read particularly thoroughly (about significant differences from other languages) said nothing on these issues. I just think some better warnings about the differences in literals is needed there, to help people from wasting any time trying to figure out why something didn't work as expected... I was just getting errors... I didn't even know at first that the errors were caused by Julia treating hex literals differently from C/C++, so I could not have known to do the search you suggested.

@ScottPJones
Contributor Author

@StefanKarpinski Please check the more reasonable case that I was talking about:
julia> 12341234123412341234123412341234123412341234213412341234234.0
1.2341234123412342e58

@StefanKarpinski
Sponsor Member

What do you want that to produce? This is how floating-point literals work in basically every language:

Python:
>>> 12341234123412341234123412341234123412341234213412341234234.0
1.2341234123412342e+58

R:
> 12341234123412341234123412341234123412341234213412341234234.0
[1] 1.234123e+58

Ruby:
irb(main):001:0> 12341234123412341234123412341234123412341234213412341234234.0
=> 1.2341234123412342e+58

Perl:
  DB<1> x 12341234123412341234123412341234123412341234213412341234234.0
0  '1.23412341234123e+58'

Same in C, C++, Java, etc.

@ScottPJones
Contributor Author

@jiahao That surprises me a bit, about the medical and financial teams you are working with... and I thought Mike Cowlishaw's telephone billing company example was rather compelling.
Also, are you saying that it has to run on a platform with hardware support for decimal arithmetic?
Not too many of those... The majority of the medical data in the US is stored in databases that didn't even support binary floating point until less than 10 years ago (and those customers don't ever use the binary floating point... that was only added for customers wanting to store scientific data). Financial data also... do you ever trade with TD/Ameritrade, for example?

@JeffBezanson
Sponsor Member

@ScottPJones whenever you have a specific example, it's always good to show it up front --- avoids a lot of confusion!

See my comment about extra precision in float literals. This is what numerical programmers expect. I would call it not "silent truncation" but "expected rounding". Given any float literal in decimal, it must be rounded to a particular binary floating point precision. Writing floating point numbers without excess digits requires a fancy algorithm that human beings can't run in their heads. You would have to fuss with your digit strings until you stopped getting syntax errors.
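For example, the extra digits below are simply rounded away, and the result is the identical Float64 you get from the short form:

julia> 3.14159265358979323846264338327950288
3.141592653589793

julia> 3.14159265358979323846264338327950288 === 3.141592653589793
true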

@ScottPJones
Contributor Author

@StefanKarpinski It seemed rather inconsistent to me: decimal integer literals get promoted up to BigInt, but float literals stay fixed at Float64.
I didn't know those languages would silently truncate also, I guess I would have expected an error...

@ScottPJones
Contributor Author

@JeffBezanson Now, that's a better answer. I'd still say that the case with hex literals -> BigInt deserves at least a warning message from the compiler... since it is inconsistent with decimal literals, and is changing from an unsigned to a signed type... isn't that a violation of some type safety rules?

@jiahao
Member

jiahao commented May 1, 2015

@ScottPJones the example given in the slides is actually a straw man argument. To quote slides 10-11:

• Add 5% sales tax to a € 0.70 telephone call, rounded to the nearest cent
1.05 x 0.70 using binary double is exactly 0.73499999999999998667732370449812151491641998291015625 (should have been 0.735)
• rounds to € 0.73, instead of € 0.74
Hence…
• Binary floating-point cannot be used for commercial or human-centric applications
– cannot meet legal and financial requirements

First, "rounded to the nearest cent" is ambiguous in IEEE-754/854, even for decimal floats. The rounding operation is defined only in the context of a global rounding mode, which is specified in these standards. Julia even lets you specify the rounding mode. It turns out that in this example, only one of these rounding modes produces the "erroneous" result of 73.0 instead of 74.0:

julia> round(1.05*0.70*100) #default RoundNearest rounding mode recommended by IEEE
74.0

julia> round(1.05*0.70*100, RoundUp)
74.0

julia> round(1.05*0.70*100, RoundDown)
73.0

julia> round(1.05*0.70*100, RoundNearestTiesAway)
74.0

julia> round(1.05*0.70*100, RoundNearestTiesUp)
74.0

Second, anyone who does any serious computations knows to avoid premature rounding. Computations should only be rounded off at the very end, and never in the middle. This is why IEEE-754 takes pains to define all operations as if they were done in infinite precision, and only then apply a rounding operation to make the result float-representable.

In the financial and medical projects I work on, the sensitive numbers like dosage and prices are all stored in proprietary binary formats, presumably to preserve representability. However the analytics all use double precision floats internally. To my knowledge, no one does anything more complicated than elementary arithmetic with decimal floats. The nice thing about Julia's type system is that we actually now have the ability to do more sophisticated things (like linear algebra) on decimal floats, now that there is a package that implements the latter.

@ScottPJones
Contributor Author

@jiahao Not sure why you think that is a straw man argument, and why do you think that that calculation is being rounded "in the middle"? There is one multiplication, followed by a round to 2 decimal places...
and that is the final result.
About the global rounding mode - that's something from IEEE standards that didn't even exist until 20 years after the language used for most medical applications came about... rounding up is all that people ever used. Can you say just what systems you are talking about, when you say the dosage and prices are stored in proprietary binary formats for representability? I could hazard a guess as to exactly what that "proprietary format" is... ;-)
I also said nothing about whether analytics would be best done using decimal or binary arithmetic... I suppose that it might be different for each calculation... (are you doing a sum of monetary amounts?
or are you doing some fancy statistics?)
I found this interesting tidbit, although I can't personally attest to its veracity:

In 1976 Intel began planning to produce a floating point coprocessor. Dr John Palmer, the manager of the effort, persuaded them that they should try to develop a standard for all their floating point operations. William Kahan was hired as a consultant; he had helped improve the accuracy of Hewlett Packard's calculators. Kahan initially recommended that the floating point base be decimal[8] but the hardware design of the coprocessor was too far advanced to make that change.

It is not true at all that no one does anything more complicated than elementary arithmetic with decimal floats... although it is true that the vast majority of the computations are just elementary arithmetic... for those cases, the decimal arithmetic is often as fast as binary floating point... and back in the day when there was no hardware binary floating point support, it was actually much faster.
An example... I have a bunch of numbers that happen to represent money... and I want to add them up... most all of them probably already have the same scale (for US money, -2), so all you have to do is add the 8-byte integer values... just a single 64-bit integer addition. Let's say you are going to multiply by 100... that is just adding 2 to the scale... divide by 100, subtract 2... output a number? If the scale S is positive, simply output the integer followed by S 0's. All simple, fairly fast stuff.
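As a toy illustration of that (not how any particular decimal library actually stores things): keep dollar amounts as integer cents with an implicit scale of -2, sum them with plain integer additions, and only rescale at the very end:

julia> sum([1095, 250, 74])   # $10.95 + $2.50 + $0.74, held as integer cents
1419

julia> 1419 / 100             # rescale only for display
14.19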
There are a lot of other examples... like using a number as a subscript in an index... maybe to quickly find all of the patients who are getting 0.02g of lisinopril...

I am very glad that Steve Johnson is doing the DecFP package, it is pretty much a necessity for what I'm working on... I would have had to do it myself (I was already starting to do so, using Cowlishaw's decNumber package).

@jiahao
Member

jiahao commented May 1, 2015

Not sure why you think that is a straw man argument, and why do you think that that calculation is being rounded "in the middle"?

The title of Slide 10 is "Where it costs real money…" The implication is very clear that the example is meant to be used in the context of additional computations. (Otherwise, why would 1 cent be so onerous?) Legal and financial uses aside, summation on the rounded numbers should be avoided for accurate computations.

The argument is a straw man because "wrong rounding" is not a good reason to reject binary floats. Did you notice that only the non-standard "RoundDown" mode produces the "wrong" answer in the example given?

About the global rounding mode - that's something from IEEE standards that didn't even exist until 20 years after the language used for most medical applications came about... rounding up is all that people ever used.

Wrong. Just look at the Wikipedia article on the history of rounding:

The Round-to-even method has served as the ASTM (E-29) standard since 1940. The origin of the terms unbiased rounding and statistician's rounding are fairly self-explanatory. In the 1906 4th edition of Probability and Theory of Errors [16] Robert Simpson Woodward called this "the computer's rule" indicating that it was then in common use by human computers who calculated mathematical tables. Churchill Eisenhart indicated the practice was already "well established" in data analysis by the 1940s.

...

Until the 1980s, the rounding method used in floating-point computer arithmetic was usually fixed by the hardware, poorly documented, inconsistent, and different for each brand and model of computer.

While it is true that IEEE 754 was the first time a standardized interface was provided for different rounding modes on machines, the claim that "rounding up is all that people ever used" is simply false. Human computers consistently used "round to even" for statistical purposes, and there are machines that used "round to even" as the default rounding mode, even pre-IEEE 754-1985. An example is the machine used by the National Physical Laboratory of the UK in the 1960s, whose rounding behavior was documented by J. H. Wilkinson, "Rounding errors in algebraic processes", 1963, p. 4 as:
[screenshot: quoted passage from J. H. Wilkinson, Rounding Errors in Algebraic Processes (1963), p. 4]

ScottPJones added a commit to ScottPJones/julia that referenced this issue May 2, 2015
@ScottPJones
Contributor Author

@jiahao Why would you think it would be correct to have additional calculations on the intermediate, unrounded numbers? That is totally incorrect! If the telephone company bills me $0.74, then they'd better not be adding up $0.0349999 when trying to calculate the amount of tax charged that they have to pay the government! How can you say "legal and financial issues aside"? That's kind of the whole point... you have to sum up the rounded numbers, because the rounded numbers are the real value charged the customer... not $0.7349999 or even $0.735...

What do you mean "non-standard"? I was taught in school that from 0-4, you round down, 5-9, round up. That is also what is done when those tax payments are rounded... and do you really want to use something from Wiki as definitive? That statement about "Until the 1980's" is totally wrong in my experience... first off, for this sort of thing, people used fixed point, or decimal floating point, and for those, I've never seen anything but 5 rounding up as standard (and I don't mean "de facto" standard, I mean, standard as defined in ANSI standard computer languages... look up Cobol and Mumps, which were ANSI standards along with FORTRAN back in the 70's); secondly, even for binary floating point, most platforms did not even have floating point hardware, and even multiplication and division of integers were very costly operations back then. People who wanted consistent results used decimal arithmetic, because the software floating point libraries (and hardware on big machines) had different formats and each generated different results... It was a big mess back then to try to use binary floating point... The IEEE standardization in 1985 (which didn't get used everywhere until a good deal later... people were still using machines with IBM or DEC or xxx's floating point for years afterwards) was a great step forward... probably hardly anybody except some scientists would be using binary floating point if that hadn't happened...

I'm still interested in what systems those financial and healthcare teams you are working with are using... especially for the healthcare teams, it is highly likely that the decimal arithmetic they are using is something I architected and implemented most of (a good friend, now at Partner's Healthcare, also worked on the implementation for IBM VM/CMS with me, I did the design and Intel implementation) back in 1986.

@jiahao
Member

jiahao commented May 3, 2015

first off, for this sort of thing, people used fixed point, or decimal floating point, and for those, I've never seen anything but 5 rounding up as standard (and I don't mean "de facto" standard, I mean, standard as defined in ANSI standard computer languages... look up Cobol and Mumps, which were ANSI standards along with FORTRAN back in the 70's); secondly, even for binary floating point, most platforms did not even have floating point hardware, and even multiplication and division of integers were very costly operations back then.

I'm not disputing the fact that people use round up rounding, and that for business applications that might even be preferable. My point is simply that you have overstated the case that everyone used round up rounding pre-IEEE 754. There are people who have had experiences different from yours, and I have already provided evidence that proves it.

You question my choice of citing Wikipedia. Fine. But you happily sidestep the fact that I have quoted you evidence from the primary literature on floating point computations that clearly describes machines that did not use round-up rounding. And it is not just any source: Wilkinson quite literally wrote one of the earliest books on floating point computations. Your claim that no one used anything other than "round up" is patently, demonstrably false, even though it may indeed be true that in your personal experience you have not encountered pre-standardization machines that did anything else. There are generations of statisticians who use floating point who know that rounding up introduces systematic errors into their computations that they should avoid.

What do you mean "non-standard"

In the context I used it, "non-standard" refers to the round down rounding mode.

You should reread the example I posted. I thought I had made myself very clear:

julia> round(1.05*0.70*100) #default RoundNearest rounding mode recommended by IEEE
74.0

julia> round(1.05*0.70*100, RoundUp)
74.0

julia> round(1.05*0.70*100, RoundDown)
73.0

julia> round(1.05*0.70*100, RoundNearestTiesAway)
74.0

julia> round(1.05*0.70*100, RoundNearestTiesUp)
74.0

The point is that

  1. Julia lets you use all the rounding modes specified in IEEE.
  2. The default rounding mode is not round down.
  3. Only in round down did I observe the example computation you gave to provide the "wrong" 73 cent answer.

Therefore, I don't see that the example given on the slides correctly presents the "dangers" of rounding. While I can see that there can be problems, the slide deck happened to pick an example that doesn't work to illustrate the point.

@ScottPJones
Contributor Author

@jiahao Sorry, but your examples aren't the example Mike Cowlishaw was talking about, and the default in Julia has the problem...

julia> round(.7*.05*100)
3.0

Decimal floating point would give the expected answer (the one the government will insist you pay, i.e. 4 cents, not 3...).
I would say that the example VERY clearly shows the danger of using binary floating point for data like currency (which is why databases all have decimal number support, and binary floating point was generally added much later...)

Also, I didn't say that nobody used different rounding modes for binary floating point... as I said, until the mid 1980's, binary floating point was a mess, and for some years afterwards, because there were still a lot of platforms that used their own proprietary software libraries or hardware...
I was talking about facts, which you could check, about the standards available in computer languages prior to 1985 with the advent of IEEE standard... and even the IEEE standard apparently recommends rounding in the way I have stated all along!

Until the 1980s, the rounding method used in floating-point computer arithmetic was usually fixed by the hardware, poorly documented, inconsistent, and different for each brand and model of computer. This situation changed after the IEEE 754 floating point standard was adopted by most computer manufacturers. The standard allows the user to choose among several rounding modes, and in each case specifies precisely how the results should be rounded. These features made numerical computations more predictable and machine-independent, and made possible the efficient and consistent implementation of interval arithmetic.

Looking at the rounding names in your example, I see that what I had always heard called "rounding up" is called RoundNearest, which is also the default in Julia (on the Internet, I saw that what I'm used to is also called "rounding half up"...). It is also called "banker's rounding"... I've even heard that that method was found used on tables dating back to the time of the Sumerians, with a base 60 system...
The "statistician's rounding" is comparatively recent, and is not the norm for computer languages
(even in Julia, you'd have to select it...).

I never saw any other rounding mode than 5-9 rounding up in the ANSI standard languages of the time, or the calculators of the time either (which all used decimal arithmetic).
Please, if you have a reference that shows otherwise, please show it!

Here is another example: from Julia:

julia> x = BigFloat(1.05)
1.0500000000000000444089209850062616169452667236328125e+00 with 256 bits of precision
julia> x * .7 *100
7.349999999999999844568776552478064619168935941990020956785867930344258169839122e+01 with 256 bits of precision
julia> round(x * .7 * 100)
7.3e+01 with 256 bits of precision

I see the exact same error here, that was shown on Cowlishaw's presentation.
I also am very surprised that you seem to doubt his presentation... he was a major contributor to the IEEE 754 standard... I'd say he is probably the best known person in the world on the subject!

BTW, I am still curious what system those healthcare teams you are working with use? Is that a secret, or do you not know?

@ScottPJones ScottPJones mentioned this issue May 5, 2015
jakebolewski added a commit that referenced this issue May 6, 2015
#11081 Add C/C++ section to Noteworthy differences documentation
mbauman pushed a commit to mbauman/julia that referenced this issue Jun 6, 2015
tkelman pushed a commit to tkelman/julia that referenced this issue Jun 6, 2015