Comparing the use of theoretical SD vs empirical SD for calculation methods used by the GCP
 

Most of the analysis done by the GCP uses methods that give a numerical value for the likelihood that the data produced in a given time was influenced by something anomalous.  The nature of the influence is not known at this time, so it is important to rule out any influence that may be inherent to the generators themselves.  Comparing all the data generated to theoretical values gives an indication of how well the generators conform to expectation. The random number generators (called eggs) used to create the data are electronic devices, and even when the same components are used to make them, each generator has its own characteristics.  As a result, although the generators behave well, there are small differences between them.

Each egg makes 200 decisions per second, each with an equal chance of being a one or a zero (a digit representing one or zero is known in the binary number system as a bit).  The one bits are summed to give a value that should have a mean of 100 and a standard deviation of 7.071.  To ensure that the probability is exactly 0.5, 100 of the 200 bits are inverted.  The mean value has been confirmed by testing 18 months of data generated by the eggs of the GCP.
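The expected mean and standard deviation quoted above follow from the binomial distribution for 200 fair bits. A minimal sketch of that arithmetic is shown here; the function name `binomial_mean_sd` is only an illustrative choice and is not part of any GCP software.

```python
import math

def binomial_mean_sd(n_bits, p_one):
    """Mean and SD of the number of one bits in n_bits independent bits."""
    mean = n_bits * p_one
    sd = math.sqrt(n_bits * p_one * (1 - p_one))
    return mean, sd

# 200 bits per trial, each with probability 0.5 of being a one (from the text).
mean, sd = binomial_mean_sd(200, 0.5)
print(mean)          # 100.0
print(round(sd, 3))  # 7.071, the square root of 50
```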

The theoretical SD of the groups of 200 is the square root of 50.  To test this value, the SD of each egg was found for a six month period, and to check that the SD is not changing with time, three six month periods were tested.  The SD does differ from expected in some eggs, but each egg's SD seems to remain stable over the 18 months tested.

Some of the eggs had SDs greater than the theoretical value, and some had SDs less than it.  I have constructed a table to compare the results of using the theoretical SD vs the empirical SD in calculations.  Only eggs that reported data for at least 2/3 of the possible seconds in a given six month period are used here.  If an egg met this criterion for more than one of the six month periods, an average value was used.  For the table below, 36 eggs have empirical SDs for the time period used.

The standard method of analysis computes the Stouffer z-score for each second.  To find a second's z-score, compute (X-100) / SD for each 200 bit trial in that second, sum these z-scores, and divide the sum by the square root of N, the number of trials summed.  The second's z-score can then be squared and added to the squared z-scores of other seconds, and the resulting sum will have a chi squared distribution.
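The per-second calculation can be sketched as follows. This is only an illustration of the steps described above, not the project's actual code: the function name `second_z`, its arguments, and the trial sums in the usage example are all hypothetical.

```python
import math

THEORETICAL_SD = math.sqrt(50)  # SD of the sum of 200 fair bits

def second_z(trial_sums, sds=None):
    """Stouffer z-score for one second.

    trial_sums: the 200 bit sums reported in that second, one per egg.
    sds: optional per-egg SDs (for example empirical values); when omitted,
         the theoretical SD is used for every egg.
    """
    if sds is None:
        sds = [THEORETICAL_SD] * len(trial_sums)
    z_scores = [(x - 100) / sd for x, sd in zip(trial_sums, sds)]
    return sum(z_scores) / math.sqrt(len(z_scores))

# Hypothetical second with four eggs reporting.
z = second_z([103, 97, 110, 100])
print(round(z, 4))  # 0.7071
```

Passing a list of empirical SDs as the second argument gives the empirical SD variant of the same calculation.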

The monthly z-score is found by adding together the squared z-scores of every second in the month.  A chi squared sum with DF degrees of freedom has a mean of DF and a standard deviation of sqrt(2*DF), so the sum is converted to a z-score with the following formula:
( (Σz²) - DF ) / ( sqrt(2) * sqrt(DF) )
with DF = number of seconds used.
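As a small sketch of this conversion, the function below takes a list of per-second z-scores and returns the z-score for the whole period. The name `period_z` and the input values in the example are assumptions made for illustration.

```python
import math

def period_z(second_z_scores):
    """Convert a sum of squared z-scores into a z-score for the period.

    DF is the number of z-scores that were squared and summed.
    """
    df = len(second_z_scores)
    sum_sq = sum(z * z for z in second_z_scores)
    return (sum_sq - df) / (math.sqrt(2) * math.sqrt(df))

# Hypothetical example: two seconds with z-scores 2.0 and 0.0.
# Sum of squares is 4 and DF is 2, so the result is (4 - 2) / 2, which is 1.
print(period_z([2.0, 0.0]))
```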

A second method of analysis adds the z-squares for all groups of 200 in a given period of time.  This is labeled in the table as the z-squares method.  For each 200 bit group, a z-score is found by dividing (result - 100) by the standard deviation. The z-score is then squared and added to the other z-squares, increasing the degrees of freedom by one.  The monthly z-score is then found with the formula:
( (Σz²) - DF ) / ( sqrt(2) * sqrt(DF) )
with DF = number of 200 bit groups used.
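A minimal sketch of the z-squares method is given below, assuming the trials are available as (egg ID, sum of 200 bits) pairs. The function name, the input layout, and the sample values are hypothetical. Supplying a dictionary of per-egg SDs gives the empirical SD variant; without it, the theoretical SD is used.

```python
import math

THEORETICAL_SD = math.sqrt(50)  # SD of the sum of 200 fair bits

def z_squares_period_z(trials, sd_by_egg=None):
    """Z-squares method over an iterable of (egg_id, sum_of_200_bits) pairs.

    sd_by_egg: optional dict mapping egg_id to an empirical SD. Eggs that
    are not listed fall back to the theoretical SD.
    """
    sd_by_egg = sd_by_egg or {}
    sum_sq = 0.0
    df = 0
    for egg_id, x in trials:
        sd = sd_by_egg.get(egg_id, THEORETICAL_SD)
        z = (x - 100) / sd
        sum_sq += z * z
        df += 1  # each 200 bit group adds one degree of freedom
    return (sum_sq - df) / (math.sqrt(2) * math.sqrt(df))

# Hypothetical trials from two eggs.
trials = [(1, 110), (37, 90)]
print(z_squares_period_z(trials))                      # theoretical SD
print(z_squares_period_z(trials, {1: 10.0, 37: 10.0})) # made-up per-egg SDs
```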
 
 
 
MONTH            STANDARD METHOD   STANDARD METHOD   Z-SQUARES METHOD  Z-SQUARES METHOD
                 (theoretical SD)  (empirical SD)    (theoretical SD)  (empirical SD)
Jan   2000        1.10              0.75              1.07             -0.69
July  2000        0.74              0.44              0.74             -0.88
Sept  2000       -0.85             -1.12              1.54              0.24
Oct   2000 (1)    1.12              0.80              1.97              0.31
Nov   2000       -1.18             -1.44              2.48              1.37
Dec   2000        2.04              1.73              1.07             -0.38
Jan   2001        1.70              1.36              1.76             -0.07
Feb   2001        0.02             -0.30              0.68             -1.61
Mar   2001 (2)   -0.04             -0.38              1.68             -0.11
April 2001 (2)    1.48              1.14              2.75              0.74
May   2001 (2)    1.42              1.06              2.70              0.56
June  2001 (2)   -0.43             -0.80              2.72              0.62

Notes: 1. data from egg #1000 is not used in this month
       2. data from egg #28 is not used in these months
 
 

The values in the table are the values that would have resulted if the entire month had been predicted to have been influenced.  For the standard method, the results calculated with the theoretical SD and with the empirical SD are close: the empirical result is lower in every month, by 0.26 to 0.37.  This shows that the standard method will work with theoretical values.

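For the z-squares method, the result calculated with the theoretical SD and the result calculated with the empirical SD are not well matched.  The theoretical result is higher in every month, by 1.11 to 2.29, which indicates that the theoretical calculation is biased.  Only short duration events should be analyzed with the z-squares method using the theoretical SD.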
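One possible way to see why the z-squares method is more sensitive to the choice of SD is sketched below. This reasoning is an addition and is not stated in the original analysis. It assumes independent trials whose true SD is slightly larger than the theoretical value, so that each squared z-score computed with the theoretical SD has an expected value of (true SD)² / 50 instead of 1. The expected shift in the period z-score is then (ratio - 1) * sqrt(DF / 2), which grows with the degrees of freedom. The month length, the number of eggs, and the single true SD used here are hypothetical round figures, not values taken from the data.

```python
import math

THEORETICAL_VARIANCE = 50.0  # variance of the sum of 200 fair bits

def expected_shift(true_sd, df):
    """Expected shift in the period z-score when z-scores are computed with
    the theoretical SD but the trials actually have SD true_sd."""
    ratio = true_sd ** 2 / THEORETICAL_VARIANCE
    return (ratio - 1) * math.sqrt(df / 2)

SECONDS = 30 * 24 * 60 * 60  # hypothetical 30 day month
EGGS = 30                    # hypothetical number of eggs reporting
TRUE_SD = 7.0722             # hypothetical SD, slightly above sqrt(50)

# Standard method: one squared Stouffer z-score per second.
standard_shift = expected_shift(TRUE_SD, SECONDS)
# Z-squares method: one squared z-score per 200 bit group.
z_squares_shift = expected_shift(TRUE_SD, SECONDS * EGGS)

print(round(standard_shift, 2), round(z_squares_shift, 2))
```

Under these assumptions the shift is roughly 0.36 for the standard method and roughly 2.0 for the z-squares method, which is the same order of magnitude as the differences seen in the table. The z-squares shift is larger by a factor of the square root of the number of eggs, and both shifts shrink as the period gets shorter.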
 
 
 
 

Empirical Values
 

The following table lists each egg's standard deviation during three periods of six months.  A period is entered only if the egg reported over 10 million trials in it, with most of the eggs reporting about 15 million trials in a six month period.  The average is weighted by the number of trials in each period, and was found by performing the following calculation:

( (SD1*N1) + (SD2*N2) + (SD3*N3) ) / ( N1 + N2 + N3)
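The weighted average can be checked against the table with a short sketch such as the following. The function name `weighted_avg_sd` is an illustrative choice; the input values are the three periods listed for egg 1 in the table below.

```python
def weighted_avg_sd(periods):
    """Average SD weighted by the number of 200 bit groups in each period.

    periods: list of (sd, n) pairs, one per six month period that qualified.
    """
    total_n = sum(n for _, n in periods)
    return sum(sd * n for sd, n in periods) / total_n

# Egg 1: (empirical SD, number of 200 bit groups in millions) for each period.
egg_1 = [(7.06756, 14.7), (7.07004, 14.3), (7.07015, 15.0)]
print(round(weighted_avg_sd(egg_1), 5))  # 7.06925, matching the avg SD column
```

Because the group counts appear in both the numerator and the denominator, they can be left in millions.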
 
 
 
SD = empirical SD; N = number of 200 bit groups (*10^6)
EGG ID#  SD Jan00 to Jun00  N Jan00 to Jun00  SD Jul00 to Dec00  N Jul00 to Dec00  SD Jan01 to Jun01  N Jan01 to Jun01  avg SD
1 7.06756 14.7 7.07004 14.3 7.07015 15.0 7.06925
28 7.06999 15.3 7.06785 15.6 7.06891
33 7.07017 13.9 7.07227 12.4 7.07140
37 7.06718 15.2 7.07141 15.9 7.07086 15.4 7.06894
100 7.07187 15.1 7.07307 13.6 7.07244
101 7.07417 14.6 7.07311 15.3 7.07219 14.9 7.07315
102 7.07322 15.7 7.07173 15.9 7.07182 14.0 7.07227
103 7.07236 15.7 7.07284 15.8 7.07012 10.9 7.07196
105 7.07329 14.5 7.07393 13.7 7.07320
106 7.07471 12.6 7.07361 14.9 7.07637 15.5 7.07493
107 7.07391 15.7 7.07342 15.3 7.07250 15.6 7.07327
108 7.07527 15.7 7.07495 15.9 7.07544 15.3 7.07522
109 7.07402 15.4 7.07457 15.9 7.07425 13.9 7.07428
110 7.07558 15.3 7.07355 15.8 7.07280 15.6 7.07396
111 7.07161 13.9 7.07181 15.5 7.07196 15.6 7.07180
112 7.06970 15.1 7.07176 15.8 7.06898 15.3 7.07016
114 7.07246 15.7 7.07282 15.8 7.07269 14.6 7.07266
115 7.07655 15.4 7.07594 15.3 7.07585 14.3 7.07612
116 7.07055 11.8 7.07415 15.5 7.07709 12.8 7.07402
118 7.07162 14.8 7.07349 12.1 7.07354 13.0 7.07282
119 7.07442 15.8 7.07151 14.5 7.07303
134 7.07350 15.6 7.07350
142 7.07254 14.3 7.07254
161 7.07185 15.4 7.07185
1000 7.07078 12.8 7.07078
1005 7.07279 15.7 7.07180 15.3 7.07086 13.8 7.07185
1021 7.06866 15.7 7.07225 15.7 7.07107 15.6 7.07066
1022 7.07188 15.7 7.07011 14.0 7.07151 15.6 7.07120
1024 7.07107 13.7 7.06905 13.3 7.06936 14.9 7.06982
1025 7.06964 15.7 7.07194 12.5 7.07008 15.5 7.07045
1026 7.07271 12.3 7.07284 12.2 7.07278
1027 7.07205 14.6 7.06987 15.6 7.06933 15.1 7.07039
1029 7.06946 15.2 7.06955 15.6 7.06951
2000 7.06898 15.6 7.06898
2002 7.07284 14.6 7.07284
2173 7.07529 15.0 7.07529