Data collected by an egg basket are stored in a compact binary file format consisting essentially of all the packets received from eggs reporting to the basket, in the order received. Since some eggs may report only sporadically to a basket (due to network outages, egg machine downtime, or the egg not having a permanent connection to the Internet), the order of packets in a basket data file may be entirely different from the time sequence in which they were collected (although the basket software does guarantee that packets are always stored in the file containing data for the day in which the egg made the samples they contain).
Analysis of the collected basket data will often require examination of the samples performed by various eggs at specific time intervals. The basketran program, which can be run stand-alone or accessed through the Web-based extract request form, processes raw basket data files and produces a Comma Separated Value (CSV) file containing a table of time-aligned samples for a given interval of time for all eggs which supplied data during that period. A CSV file can be easily read by any programming language, and can be directly loaded into most spreadsheet packages. The following table describes the format of this CSV file. The column Variable gives the name into which the value from header records is read by Perl language utilities for reading files in this format, which are available for downloading.
Each basket data CSV file consists of a series of records, each identified by a type code in the first field of the record. Header records have unique item codes in the second field, which identify the value the remaining field(s) supply. Header records also contain string comments which provide a primate-readable description of the value field.
Field 1: Type | Field 2: Item | Field 3: Value | Field 4: Comment | Variable |
---|---|---|---|---|
10 | 1 | Samples per record | "Samples per record" | $samp_rec |
10 | 2 | Seconds per record | "Seconds per record" | $sec_rec |
10 | 3 | Records per packet | "Records per packet" | $rec_pkt |
10 | 4 | Trials per sample | "Trial size" | $trialsz |
Field 1: Type | Field 2: Item | Field 3: Value | Field 4: Comment | Variable |
---|---|---|---|---|
11 | 1 | Number of eggs | "Eggs reporting" | $numEggs |
11 | 2 | Start time | "Start time" | $startTime |
11 | 3 | End time | "End time" | $endTime |
11 | 4 | Seconds of data | "Seconds of data" | $tableSeconds |
The date and time in the Start time (item 2) and End time (item 3) records are given as a decimal Unix time value (number of seconds since 1970-01-01 00:00:00 UTC, not counting leap seconds). These records contain an additional fifth field which shows this time and date value as yyyy-mm-dd hh:mm:ss to make it easier to identify the file.
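As an illustration of the header layout described above, here is a minimal Python sketch that collects the type 10 and type 11 records into a dictionary and expands a Unix time value into the yyyy-mm-dd hh:mm:ss form. It is not the Perl utility mentioned earlier; the function names `read_headers` and `unix_to_civil` are hypothetical, the dictionary keys simply reuse the names from the Variable column, and the sketch assumes every header value is an integer.

```python
import csv
import io
from datetime import datetime, timezone

# Keys are (type, item) pairs; names are taken from the "Variable" column
# of the tables above (without the Perl "$" sigil).
HEADER_NAMES = {
    (10, 1): "samp_rec",
    (10, 2): "sec_rec",
    (10, 3): "rec_pkt",
    (10, 4): "trialsz",
    (11, 1): "numEggs",
    (11, 2): "startTime",
    (11, 3): "endTime",
    (11, 4): "tableSeconds",
}

def read_headers(lines):
    """Collect type 10 and type 11 header values into a dict (hypothetical helper)."""
    headers = {}
    for fields in csv.reader(lines):
        if not fields or fields[0] not in ("10", "11"):
            continue  # skip type 12 and 13 records and blank lines
        key = (int(fields[0]), int(fields[1]))
        if key in HEADER_NAMES:
            # Assumption: header values are always integers.
            headers[HEADER_NAMES[key]] = int(fields[2])
    return headers

def unix_to_civil(seconds):
    """Format a Unix time value as yyyy-mm-dd hh:mm:ss in UTC."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")

sample = io.StringIO(
    '10,4,200,"Trial size"\n'
    '11,2,905954400,"Start time",1998-09-16 14:00:00\n'
    '11,3,905954429,"End time",1998-09-16 14:00:29\n'
)
headers = read_headers(sample)
print(headers["trialsz"], unix_to_civil(headers["startTime"]))
```

Using a CSV parser rather than splitting on commas keeps the quoted comment fields intact even if a comment were to contain a comma.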
Field 1: Type | Field 2: Comment | Field 3: Comment | Fields 4-n: Egg IDs | Variable |
---|---|---|---|---|
12 | "gmtime" | "Date/Time" | Egg IDs for columns | @eggNumbers |
This record provides the egg IDs for the columns in the type 13 egg data records which follow. It also provides comments for the time fields in the data record. If expansion of the Unix time values in the data records has been suppressed, the label for that column in field 3 will be void.
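The type 12 record can be read along the following lines. This is only a sketch: `parse_egg_ids` is a hypothetical name, and it assumes that a suppressed civil time label shows up as an empty third field (quoted or unquoted), which is how a void field would normally appear in CSV.

```python
import csv

def parse_egg_ids(line):
    """Return (egg_ids, civil_time_expanded) from a type 12 record (hypothetical helper)."""
    fields = next(csv.reader([line]))
    if fields[0] != "12":
        raise ValueError("not a type 12 record")
    # A void label in field 3 means expansion of Unix times was suppressed.
    civil_time_expanded = fields[2] != ""
    # Fields 4-n hold the egg IDs, in the same order as the data columns.
    egg_ids = [int(field) for field in fields[3:]]
    return egg_ids, civil_time_expanded

egg_ids, expanded = parse_egg_ids('12,"gmtime","Date/Time",1,28,33,37,1000,1003')
print(egg_ids, expanded)
```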
The actual egg sample data appears in type 13 records, one for each second in the interval which the file spans (hence, the number of type 13 records is equal to the Seconds of data value provided in the type 11, item 4 record).
Field 1: Type | Field 2: Unix time | Field 3: Civil time | Fields 4-n: Egg sample data |
---|---|---|---|
13 | time() value | yyyy-mm-dd hh:mm:ss | Egg trial data for columns |
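Because the number of type 13 records should equal the Seconds of data value, a reader can use that as a sanity check on a file. The sketch below uses hypothetical helper names and assumes that the Start time and End time seconds are both included in the interval, which is consistent with the header values of the sample file in this document (start 905954400, end 905954429, 30 seconds of data).

```python
def expected_record_count(start_time, end_time):
    """Seconds in an interval whose first and last seconds are both included (assumption)."""
    return end_time - start_time + 1

def record_count_matches(lines, table_seconds):
    """Count type 13 records and compare with the Seconds of data header value."""
    count = sum(1 for line in lines if line.startswith("13,"))
    return count == table_seconds

print(expected_record_count(905954400, 905954429))
```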
The sample data consists of the number of one bits observed by the egg in a trial of Trials per sample bits (the value given by the type 10, item 4 record) made during the second indicated by field 2. Field 3 gives the date and time corresponding to the Unix time value in field 2 unless expansion of date and time has been suppressed, in which case field 3 will be void. (If you're going to process the CSV file with a program which understands Unix time values, suppressing the civil date and time expansion makes for smaller files which are quicker to generate and read.) The remainder of the record gives the results observed for each egg during that second, arranged in columns which correspond to the egg ID numbers given in the corresponding fields of the type 12 record.
Missing data (where the basket has received no sample from a given egg for a given second) is represented by a void field. Be careful that your analysis program does not confuse a void field with a trial result of zero, which is a possible (albeit improbable in the extreme) result. Note that missing data in the CSV file denotes a sample not received by the basket at the time the CSV file was generated. If an egg has actually collected a sample for a given second, but not yet reported it to the basket, the sample will appear in a CSV file generated after the basket receives the data from the egg. One record appears in the CSV file for every second in the interval it covers, even if data are missing for all eggs for that second.
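One way to keep void fields distinct from zero results is to map them to a separate "no value" marker when reading each type 13 record. The following sketch (with the hypothetical name `parse_data_record`) pairs each sample with its egg ID and stores `None` for missing data; treating a record with fewer fields than egg IDs as missing data is an assumption of the sketch, not something the format promises.

```python
import csv

def parse_data_record(line, egg_ids):
    """Return (unix_time, {egg_id: sample or None}) from a type 13 record."""
    fields = next(csv.reader([line]))
    if fields[0] != "13":
        raise ValueError("not a type 13 record")
    unix_time = int(fields[1])
    values = fields[3:]
    # Assumption: if trailing fields are absent, treat them as missing data.
    values += [""] * (len(egg_ids) - len(values))
    samples = {}
    for egg_id, value in zip(egg_ids, values):
        # A void field is missing data; "0" is a genuine trial result.
        samples[egg_id] = int(value) if value != "" else None
    return unix_time, samples

egg_ids = [1, 28, 33, 37, 1000, 1003]
when, samples = parse_data_record(
    "13,905954409,1998-09-16 14:00:09,102,103,100,89,,", egg_ids
)
# Test against None explicitly: "if value" would also discard a result of zero.
present = [value for value in samples.values() if value is not None]
print(when, samples, len(present))
```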
The following is a CSV file representing 30 seconds of data from the time 14:00:00 to 14:00:29 UTC on 1998-09-16 (September 16th, 1998). The file was generated with the option to expand Unix time values into a more readable form. The actual data collected for this interval are complete; I have artificially introduced some missing data in several records to indicate how it appears.
10,1,10,"Samples per record"
10,2,10,"Seconds per record"
10,3,6,"Records per packet"
10,4,200,"Trial size"
11,1,6,"Eggs reporting"
11,2,905954400,"Start time",1998-09-16 14:00:00
11,3,905954429,"End time",1998-09-16 14:00:29
11,4,30,"Seconds of data"
12,"gmtime","Date/Time",1,28,33,37,1000,1003
13,905954400,1998-09-16 14:00:00,112,104,100,109,95,100
13,905954401,1998-09-16 14:00:01,98,96,86,100,92,92
13,905954402,1998-09-16 14:00:02,109,96,,109,87,101
13,905954403,1998-09-16 14:00:03,90,86,100,98,96,100
13,905954404,1998-09-16 14:00:04,102,99,105,98,112,103
13,905954405,1998-09-16 14:00:05,99,92,103,90,89,93
13,905954406,1998-09-16 14:00:06,120,97,96,95,105,96
13,905954407,1998-09-16 14:00:07,111,102,98,106,90,
13,905954408,1998-09-16 14:00:08,97,103,104,99,88,98
13,905954409,1998-09-16 14:00:09,102,103,100,89,,
13,905954410,1998-09-16 14:00:10,112,86,84,95,90,105
13,905954411,1998-09-16 14:00:11,98,107,101,107,92,86
13,905954412,1998-09-16 14:00:12,,87,108,103,95,91
13,905954413,1998-09-16 14:00:13,91,99,90,101,91,89
13,905954414,1998-09-16 14:00:14,108,102,101,85,102,97
13,905954415,1998-09-16 14:00:15,95,97,111,102,101,115
13,905954416,1998-09-16 14:00:16,,,,,,
13,905954417,1998-09-16 14:00:17,105,99,99,118,102,91
13,905954418,1998-09-16 14:00:18,108,110,100,91,95,110
13,905954419,1998-09-16 14:00:19,93,113,93,102,99,95
13,905954420,1998-09-16 14:00:20,91,92,94,104,92,117
13,905954421,1998-09-16 14:00:21,103,94,102,105,96,73
13,905954422,1998-09-16 14:00:22,105,113,108,104,105,94
13,905954423,1998-09-16 14:00:23,101,96,104,99,102,98
13,905954424,1998-09-16 14:00:24,91,120,108,104,118,111
13,905954425,1998-09-16 14:00:25,100,,,,99,101
13,905954426,1998-09-16 14:00:26,100,105,105,102,102,100
13,905954427,1998-09-16 14:00:27,101,103,114,97,83,83
13,905954428,1998-09-16 14:00:28,106,109,109,106,98,93
13,905954429,1998-09-16 14:00:29,99,96,105,102,104,108
Data function and description by John Walker, September 24th, 1998