суббота, 16 апреля 2022 г.

О числах и вычислениях в MUMPS

На комитете MDC проходило обсуждение как различные реализации используют вычисления с дробными числами. Часть реализаций использует стандартное представление IEEE 754, часть нет. Стандарт MUMPS никак не регламентирует внутренности реализации, поэтому допустимо и то и то. Но как результат имеются некоторые расхождения в поведении разных реализаций. Приведу копипаст фрагмента обсуждения. Исследователем выступал David Wicksell.

David Wicksell — 12.04.2022
dlw@gondor:~$ node
Welcome to Node.js v16.14.2.
Type ".help" for more information.
> 0.1 + 0.2
0.30000000000000004
> 
dlw@gondor:~$ python
Python 3.9.7 (default, Sep 10 2021, 14:59:43) 
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 0.1 + 0.2
0.30000000000000004
>>> 
dlw@gondor:~$ irb
irb(main):001:0> 0.1 + 0.2
=> 0.30000000000000004
irb(main):002:0> 
dlw@gondor:~$ rsm
RSM [MGR]> w .1+.2
.3
RSM [MGR]> h
dlw@gondor:~$
Floating-point in JavaScript, Python, and Ruby, followed by fixed-point in M.

Rick Marshall — 12.04.2022
Thanks, David. That's a much more useful overview of the article than I was able to write. Much appreciated.

David Wicksell — 12.04.2022
I am hoping to finish reading the paper today, and I might have more to say, or not.

David Wicksell — 12.04.2022
I'm not finished with the paper yet, but so far, the author seems to be an ally to those of us that think that floating-point (IEEE 754) implementations of real numbers have a lot of problems in some domains, and he is proposing a solution.

David Wicksell — 12.04.2022
Early on in the paper, he gives an example of a calculation that is hard to get right, using other methods, but that his algorithms do get right. The square root of 17, squared. That should return 17 directly, but even most calculator apps will get it wrong, or return lots of extra zeroes. Here we can see that YottaDB even gets it wrong, and RSM did as well, in a similar fashion. RSM has a good fixed-point implementation, and it still got this wrong. To me, this is more evidence that this is a good paper, and we should consider it, as @Dave Whitten proposed.
dlw@gondor:~$ yottadb -direct_mode

YDB:WV>w 25**(1/2)
5
YDB:WV>w 25**(1/2)**2
25
YDB:WV>w 17**(1/2)**2
16.9999999999999977
YDB:WV>
 
dlw@gondor:~$ rsm
RSM [MGR]> w 25**(1/2)
5
RSM [MGR]> w 25**(1/2)**2
25
RSM [MGR]> w 17**(1/2)**2
16.99999999999999999982344980477923828004
RSM [MGR]>
However, Cache gets this one right!
Node: gondor, Instance: CACHE

USER>w 17**(1/2)**2
17
Rick Marshall — 12.04.2022
Reading this paper yesterday morning got me thinking about the MVTS and what should be included in the test suite for numeric handling. These examples definitely belong there.

mcglk — 12.04.2022
Well, that Cachй gets it right may be due to forcible rounding. I mean, this thing is 16.999... for 15 places, which I think is M's implementation limit. If the following digit is 5 or greater, Cachй may just round that off—and that's not an uncommon approach.
(Mathematica, on the other hand, uses a simplification approach that does result in the correct answer, but by definition, it has the freedom to rearrange the calculation in any way that still makes sense in order to solve it.)

David Wicksell — 12.04.2022
Decimal precision is configurable in RSM, from 0 to 64 digits of decimal precision. It defaults to 18. I might be able to hit 17 with enough precision. Or if you're correct, I might be able to hit it with less precision.

David Wicksell — 13.04.2022
Yeah, it's kind of all over the place. RSM doesn't support high enough precision right now, and it doesn't employ any tricks (like the paper in review) to get calculations like that correct. But I gave it a shot.
dlw@gondor:~$ rsm
RSM [MGR]> w 17**(1/2)**2
16.99999999999999999982344980477923828004
RSM [MGR]> w ^$j($J,"precision")
18
RSM [MGR]> s ^$j($J,"precision")=64
RSM [MGR]> w ^$j($J,"precision")
64
RSM [MGR]> w 17**(1/2)**2
17.000000000000000000000000000000000000000
  0000000000000000000000000416075260005415
  1914345701281561269003882534833609417640247304361
RSM [MGR]> s ^$j($J,"precision")=0
RSM [MGR]> w ^$j($J,"precision")
0
RSM [MGR]> w 17**(1/2)**2
1
RSM [MGR]> s ^$j($J,"precision")=2
RSM [MGR]> w 17**(1/2)**2
16.999129
RSM [MGR]> s ^$j($J,"precision")=1
RSM [MGR]> w 17**(1/2)**2
17.0569
RSM [MGR]>
(здесь перенос длинной строки с цифрами сделал я)

But it gets really close when I use 64 decimal digits of precision.
I'm actually bothered that my precision for 2 and 1 doesn't actually work correctly. I've tested this before, and never had that happen. They should be 16.99 and 17.0 respectively.
Some sort of bug there.
Actually they are all off. Well that's weird. Some sort of regression. I'll have to fix that A.S.A.P.
Okay, it's probably not a regression, rather a long-standing bug in the power function. I probably never tested its precision. When I tested division, it worked perfectly. So just another issue for my TODO file!
RSM [MGR]> w ^$j($J,"precision")
1
RSM [MGR]> w 22/7
3.1
RSM [MGR]> s ^$j($J,"precision")=64
RSM [MGR]> w 22/7
3.1428571428571428571428571428571428571428571428571428571428571429
RSM [MGR]> s ^$j($J,"precision")=2
RSM [MGR]> w 22/7
3.14
RSM [MGR]>
Here you can see the precision works perfectly.

David Wicksell — 13.04.2022
Cache doesn't always get it right either. So you are probably right that they got it right because of how they round.
Node: andronicus, Instance: CACHE

USER>f i=1:1:25 w i_" : ",i**(1/2)**2,!
1 : 1
2 : 2.000000000000000001
3 : 3.000000000000000002
4 : 4
5 : 5.000000000000000003
6 : 5.999999999999999999
7 : 7.000000000000000003
8 : 8.000000000000000002
9 : 9
10 : 10
11 : 11
12 : 12
13 : 13
14 : 14
15 : 15
16 : 16
17 : 17
18 : 18.00000000000000001
19 : 19
20 : 20
21 : 21
22 : 22
23 : 22.99999999999999998
24 : 23.99999999999999999
25 : 25
But, it's a pretty good algorithm, and seems to be more precise than most of the other M implementations. However, MiniM is our champ here! It seems to have a really good implementation, at least for this particular type of calculation
USER>f i=1:1:25 w i_" : ",i**(1/2)**2,!
1 : 1
2 : 2
3 : 3
4 : 4
5 : 5
6 : 6
7 : 7
8 : 8
9 : 9
10 : 10
11 : 11
12 : 12
13 : 13
14 : 14
15 : 15
16 : 16
17 : 17
18 : 18
19 : 19
20 : 20
21 : 21
22 : 22
23 : 23
24 : 24
25 : 25

USER>w $zv
MiniM for Linux x86-64 64 bit 1.31 release build
USER>
mcglk — 13.04.2022
Hm. I wonder what algorithm it's using for that.

David Wicksell — 13.04.2022
You'll have to ask @Eugene Karataev.
Open Mumps is also pretty good.
> f i=1:1:25 w i_" : ",i**(1/2)**2,!
1 : 1
2 : 2.0000001
3 : 3
4 : 4
5 : 5.0000001
6 : 5.9999998
7 : 6.9999999
8 : 7.9999999
9 : 9
10 : 10
11 : 11
12 : 12
13 : 13
14 : 14
15 : 15
16 : 16
17 : 17
18 : 18
19 : 19
20 : 20
21 : 21
22 : 22
23 : 23
24 : 24
25 : 25
But that's one of nearly infinite types of real number calculations. It would take a lot more testing of all the implementations to really know which ones shine and which ones don't. I'm sure that there are trade-offs here.
They all do a much better job with accuracy than IEEE 754 does.

mcglk — 13.04.2022
Well, again, with this one type of calculation. IEEE 754 has the benefit of being fast, and accurate to a well-defined degree. It just doesn't do arbitrary precision well.

David Wicksell — 13.04.2022
Right. Floating-point is very fast, and accurate for the values it can store directly.

mcglk — 13.04.2022
And, as has been shown time and again, the best way out of the mess is symbolic math—but it's considerably slower. So: tradeoffs.

David Wicksell — 13.04.2022
I have not seen anything in the M standard that sets any requirements for numeric precision and accuracy. An M implementation could totally use IEEE 754 and would still be standard. I think that since banking is a big domain for M, the M implementers kind of gravitated towards more accurate fixed-point algorithms. Since 0.1 + 0.2 better equal 0.3 in banking applications, and that is something IEEE 754 cannot do.

Был рад узнать что MiniM содержит весьма неплохие внутренности )))

Комментариев нет:

Отправить комментарий