Conditional Distribution

A professor in a computing class I took emphasized the difficulty of generating random numbers and suggested statistics that might expose their weaknesses.

We were challenged to concoct an algorithm that would produce random results.

We were warned that a jumble of arbitrary operations was not likely to produce random results.

Frequency distributions were discussed.

I programmed a test that would tally frequency in two dimensions, current sample vs. previous sample.

I remember only that I was pleased to get the program to work, fill the page, and label the result. I probably tested the built-in generator and found it to be reasonably good.

I've reimplemented this program in CoffeeScript.

dist = []
size = 10
rand = -> Math.floor Math.random()*size

# build a size-by-size table of zero counts
for i in [1..size]
  dist.push (0 for j in [1..size])

# tally each sample against the previous one
samp = {x:rand()}
for [1..90000]
  samp = {y:samp.x, x:rand()}
  dist[samp.y][samp.x]++
console.log dist

Each run produces similar but distinct results; one such run follows. Redefining rand to be, say, random()*random() produces appropriately non-uniform distributions (see the sketch after the table).

[ [ 887, 864, 939, 841, 909, 899, 889, 843, 937, 892 ],
  [ 895, 922, 873, 883, 934, 902, 911, 942, 873, 871 ],
  [ 920, 923, 891, 909, 915, 892, 947, 837, 943, 930 ],
  [ 872, 896, 876, 887, 920, 860, 860, 875, 890, 933 ],
  [ 872, 898, 917, 907, 891, 910, 899, 930, 897, 918 ],
  [ 929, 853, 934, 891, 907, 901, 906, 877, 880, 931 ],
  [ 897, 903, 946, 901, 876, 910, 923, 868, 919, 902 ],
  [ 869, 884, 874, 879, 908, 854, 879, 918, 901, 946 ],
  [ 889, 870, 945, 888, 880, 940, 916, 923, 908, 866 ],
  [ 870, 993, 912, 884, 899, 941, 914, 899, 877, 899 ] ]
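A minimal sketch of that redefinition, assuming the same tally loop as above; the product of two uniform samples has density -ln(x) on (0,1], so cells near zero should tally high:

# product of two uniform variates skews toward zero,
# so the two-dimensional tally is appropriately non-uniform
rand = -> Math.floor Math.random()*Math.random()*size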

I recently tried making random numbers myself. Knowing that a floating-point multiply can work reasonably well, I repeatedly multiplied by pi, preserving the fraction part each time.

r = r0 = Math.PI

# keep only the fractional part
frac = (q) -> q - Math.floor(q)

# three multiply-by-pi steps, keeping the fraction each time
rand = ->
  r = frac r*r0
  r = frac r*r0
  r = frac r*r0
  Math.floor r*size

I found that a few iterations produced a reasonably smooth surface, but not a uniform one: the distribution favored samples nearer [0,0]. I used a shorter sample sequence so the peak count would be comparable to the previous run's average.
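A hedged sketch of that shortened run; the original sample count isn't given, so the 50000 below is only a guess chosen to bring the peak bin near the ~900 average of the uniform run:

# same tally loop with fewer samples; 50000 is hypothetical,
# not taken from the original program (assumes a freshly zeroed dist)
samp = {x:rand()}
for [1..50000]
  samp = {y:samp.x, x:rand()}
  dist[samp.y][samp.x]++
console.log dist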

[ [ 860, 611, 617, 619, 631, 608, 639, 659, 651, 624 ],
  [ 695, 703, 558, 543, 532, 583, 531, 508, 543, 572 ],
  [ 632, 528, 508, 480, 527, 478, 462, 540, 483, 487 ],
  [ 804, 691, 621, 665, 544, 415, 356, 319, 356, 315 ],
  [ 570, 538, 457, 450, 421, 443, 483, 461, 433, 447 ],
  [ 540, 497, 432, 443, 367, 419, 467, 462, 421, 446 ],
  [ 579, 521, 463, 434, 447, 435, 470, 466, 447, 375 ],
  [ 580, 532, 437, 423, 461, 428, 486, 488, 417, 422 ],
  [ 568, 498, 451, 449, 416, 395, 432, 482, 410, 401 ],
  [ 691, 649, 580, 580, 358, 290, 311, 289, 341, 403 ] ]

A little research shows that the method is known as a Linear Congruential Generator: the recurrence here is X_{n+1} = frac(pi * X_n), i.e. multiplication by pi modulo 1, applied three times per sample. LCGs have a known limitation: if used to choose points in an n-dimensional space, the points will lie on, at most, m^(1/n) hyperplanes. This is due to serial correlation between successive values of the sequence X_n, as shown below when we compute with only one iteration. wikipedia
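Presumably the one-iteration variant is just the same rand with two of its three steps removed; a minimal sketch, reusing frac, r, r0, and size from above:

# one multiply-and-frac per sample: successive samples stay strongly
# correlated, so the (previous, current) tally collapses onto stripes
rand = ->
  r = frac r*r0
  Math.floor r*size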

[ [ 2101, 2132, 2065, 300, 0, 0, 0, 0, 0, 0 ],
  [ 0, 0, 0, 1823, 1857, 1659, 459, 0, 0, 0 ],
  [ 0, 0, 0, 0, 0, 0, 1217, 1641, 1622, 678 ],
  [ 1531, 1653, 964, 0, 0, 0, 0, 0, 0, 968 ],
  [ 0, 0, 699, 1564, 1567, 1066, 0, 0, 0, 0 ],
  [ 0, 0, 0, 0, 0, 418, 1378, 1477, 1250, 0 ],
  [ 1463, 1436, 0, 0, 0, 0, 0, 0, 189, 1426 ],
  [ 0, 14, 1430, 1429, 1472, 163, 0, 0, 0, 0 ],
  [ 0, 0, 0, 0, 0, 1217, 1460, 1390, 345, 0 ],
  [ 1503, 563, 0, 0, 0, 0, 0, 0, 1006, 1405 ] ]