"Looking at the raw stream led me to believe that "e002e001" was a packet
delimiter of some sort (pretty close to the truth).
"Then I looked at all the stuff right after the delimiter looking for
length, packet sequence number, packet type identifiers, etc. After
staring at a lot of packets, I found the sequence numbers OK, and I also
found what I thought was the length. But the packet always seemed to be
a bit longer than the length.
"This mystery was solved when I realized "e0" was a special character.
Of course it would have to be, for "e002e001" to work as a delimiter.
If an "e0" is followed by a zero byte (00), the pair is to be taken as a
literal "e0" in the data stream. If it is followed by anything else, it is
to be interpreted as a packet delimiter of some sort.
"Once I got rid of all the zeroes following the "e0" characters, the
length field was correct.
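A minimal sketch of the unstuffing step described above, in Python. It
assumes the "0" after "e0" is a single zero byte and that every marker is
exactly two bytes ("e0" plus one more byte); the name split_stream and the
(marker, payload) return shape are made up for illustration and are not
from the original program.

```python
ESC = 0xE0  # the special byte named in the text


def split_stream(raw: bytes):
    """Split a raw capture into (marker, payload) pairs.

    marker is the byte that followed an unescaped 0xE0 (None for any data
    seen before the first marker). payload is the data up to the next
    marker, with each stuffed "e0 00" pair reduced to a single 0xE0.
    """
    segments = []
    marker = None
    payload = bytearray()
    i = 0
    while i < len(raw):
        byte = raw[i]
        if byte != ESC:
            payload.append(byte)
            i += 1
        elif i + 1 >= len(raw):
            break  # lone trailing 0xE0: cannot classify it without more data
        elif raw[i + 1] == 0x00:
            payload.append(ESC)  # "e0 00" stands for a literal e0 data byte
            i += 2
        else:
            # "e0 xx" with xx != 0: a delimiter of some sort
            segments.append((marker, bytes(payload)))
            marker = raw[i + 1]
            payload = bytearray()
            i += 2
    segments.append((marker, bytes(payload)))
    return segments
```

With the stuffing removed this way, a length field found at the front of
a payload can be compared directly against len(payload).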
"One by one I was knocking off the length and purpose for the fields
in the packets. I noticed that there were time packets which used
BCD encoding for the digits.
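For readers who have not met BCD: a short Python sketch of decoding it.
This assumes packed BCD (two decimal digits per byte, high nibble first);
the text does not say whether the feed packs digits this way, nor how the
time fields are laid out, so the helper name and the sample bytes are
purely illustrative.

```python
def bcd_to_digits(data: bytes) -> str:
    """Decode packed BCD bytes into a string of decimal digits."""
    digits = []
    for byte in data:
        hi, lo = byte >> 4, byte & 0x0F
        if hi > 9 or lo > 9:
            # A nibble above 9 means this field is not plain BCD
            raise ValueError(f"not a BCD byte: {byte:#04x}")
        digits.append(f"{hi}{lo}")
    return "".join(digits)


# Hypothetical example: three bytes that might hold hours, minutes, seconds
print(bcd_to_digits(bytes([0x09, 0x30, 0x15])))
```

A ValueError on a candidate field is a useful hint in itself: it suggests
the guess about where the BCD digits sit is wrong.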
"The quote packets were tougher, but it was easy to see that BCD encoding
was used there too. The symbols were simply packed ASCII offset by 0x40.
"At that point I had figured out enough to get the quote packets for
NYSE, AMEX, NMS, and mutual funds, but options and other esoteric stuff
eluded me for a while.
"I would display the symbols and values on the stock portfolio screen
to verify my guesses. I had all weekend to work on it since the data
never changed.
"The news articles were a breeze compared to the quote packets.
"A lot of the guesswork was made easier when I realized that fairly
standard data transmission protocol techniques were being used, and
data encoding was done in a way that would minimize bandwidth requirements.
"I figured there was a checksum of some sort over the packet, and I found
where it was kept. But I didn't take the time to figure out the algorithm.
My data feed is fairly free of errors, so I get what I need without
using the checksum.
"It took me 3-4 days to get to the point where I could start accumulating
stock quotes and news articles. What I have isn't fancy, but it works
for me. In the process, I've found that the data stream contains many
glitches - not transmission errors, but database errors. Some of the
stock quotes are hosed, so you have to use a glitch filter to keep the
HLC (high/low/close) data reasonable. It's usually the high or low that's
munged. Makes
the graphs look real strange. Out of 1700 or so issues each night, I would
say there's an average of two glitches."
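The quoted account does not say how its glitch filter works. One possible
approach, sketched in Python under stated assumptions: reject a day's
figures if they are internally inconsistent, or if the high or low strays
too far from the previous close. The function name, its parameters, and
the 50% threshold are all invented for illustration.

```python
def is_glitch(high, low, close, prev_close, max_move=0.5):
    """Return True if a day's high/low/close looks like a database glitch.

    max_move is an assumed limit on how far the high or low may sit from
    the previous close, as a fraction (0.5 means 50%).
    """
    # Internal consistency: prices positive and low <= close <= high
    if not (0 < low <= close <= high):
        return True
    # Range check against the previous close
    too_high = high > prev_close * (1 + max_move)
    too_low = low < prev_close * (1 - max_move)
    return too_high or too_low


print(is_glitch(52.0, 49.5, 51.0, 50.0))   # ordinary day
print(is_glitch(520.0, 49.5, 51.0, 50.0))  # high off by a factor of ten
```

A flagged day could be dropped from the graph or have its high and low
replaced by the close; which is better depends on what the graphs are for.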
--
-Brian Smithson
Motorola Inc., Computer Group, Commercial Systems Division
10700 N. De Anza Boulevard, Cupertino, CA 95014 USA, (408)366-4104
brian@csd.mot.com, {apple | pyramid}!motcsd!brian