Thursday, April 23, 2009

Creating an Excel Add-In for Market Data - Part 3 - Google Data Provider

Now that we've got our market data proxy layer in place, we need to look at actual market data provider adapters. There are a number of ways we could retrieve market data. The most brute force mechanism would be to just screen-scrape html from websites directly. This is obviously subject to unexpected html format changes on the site, or unpredictable responses when we query for things we haven't seen before - essentially we are working blind without any kind of contractual specification, so we have to way to ensure we can write a truly reliable service using this technique - it would essentialy be trial and error, and best effort to just keep it up to spec with the site.

Luckily, some sites realize that consumers are going to want to get their data in all kinds of ways, and they make it easier to do so. Google is a good starting point, because they essentially try to make all of their data available in one way or another. They even have published APIs so you can bypass the whole html request stage entirely. Unfortunately, after lots of research and examining their code samples, it is clear that the finance data is not really provided by their API yet, so we have to use an alternative method. Note they do have a Google Finance API", but this deals with the financial porfolio management features on Google - essentially if you use Google to maintain your portfolio, you could use their API to get updated data about the stocks in the portfolio. But that's a bit beyond the scope of what we are trying to accomplish right now. There are also some interesting techniques that have been demonstrated using Google Spreadsheets, which apparently can plug in directly to some finance lookup functions. One example shows using Ruby to connect to Google Spreadsheets to call a finance function and return the result. I guess it would be interesting to see if we could get more coverage this way, or to see how that solution performs. There is also some more detail about using Google Spreadsheets to get XML format output and also something called JSON format.

It turns out they have another simple solution - a specific request format that provides limited data in a no-frills, simple delimited text output format. This is much better than having to do all the work of screen scraping, and also gives us a fairly reliable and well-defined contractual interface that we can depend on. And it also involves transferring a lot less data back and forth, because it eliminates all of the images and formatting you would normally be getting with a full html page.

So to get data from Google's simplified market data service, we form a request with a special URL, like this:
string url = "http://www.google.com/finance/info?client=ig&q=" + ticker;

The response is a simple delimited text format. For example, if we query for MSFT, we get something like this:
// [
{
"id": "358464"
,"t" : "MSFT"
,"e" : "NASDAQ"
,"l" : "19.20"
,"l_cur" : "19.20"
,"ltt":"4:00PM EDT"
,"lt" : "Apr 17, 4:00PM EDT"
,"c" : "-0.56"
,"cp" : "-2.83"
,"ccol" : "chr"
,"el": "19.26"
,"el_cur": "19.26"
,"elt" : "Apr 17, 7:59PM EDT"
,"ec" : "+0.06"
,"ecp" : "0.31"
,"eccol" : "chg"
}
]


It's not easy to find a good reference for what these fields are exactly, but we can more or less figure it out by comparing the values we get with what we see on Google Finance. From what I have deciphered, here are what these fields mean:
id : Internal Google Security ID?
t : Ticker
e : Exchange Name
l : Last Price
l_cur : ???
ltt : Last Trade Time
ltt : Last Trade Date/Time
c : Change (in ccy) - formatted with +/-
cp : Change (%)
ccol : ???
el : Last Price in Extended Hours Trading
el_cur : ???
elt : Last Trade Date/Time in Extended Hours Trading
ec : Extended Hours Change (in ccy) - formatted with +/-
ec : Extended Hours Change (%)
eccol : ???


One problem with this method is that it only seems to work for equities. For example, I can't seem to get it working for FX - so we'll have to have a separate provider for other security types, that uses the basic html-scraping method on their main webpage - or maybe we could go back and try some of these other techniques such as the Google Spreadsheets method.

Now, we need to do a bit of mapping to convert between the Bloomberg ticker format and the ticker format for our specific provider - Google in this case. Bloomberg tickers generally have 3 parts - the security ticker, the exchange code (if applicable), and the security type. So for example, "MSFT US Equity", or "GBPUSD Curncy". We need to assume that the requests coming in from Excel are still in Bloomberg ticker format, because we don't want to have to change our spreadsheets at all to work with our new add-in. So we will standardize on Bloomberg ticker format as the convention for incoming requests. Then the first thing we will do is parse this request into a Security object, which has a ticker field, an exchange code, and a security type (enum). So assuming we have built up our security object, here is the code we use to determine the Google ticker for our security:
private string GetGoogleTicker(Security security)
{
switch (security.Type)
{
case SecurityType.Equity:
string exchangeCode = GetGoogleExchangeCode(security);
if (!string.IsNullOrEmpty(exchangeCode))
return exchangeCode + ":" + security.Ticker;
else
return security.Ticker;
case SecurityType.Fx:
return security.Ticker;
default:
throw new MarketDataException(string.Format("The security type [{0}] is not supported by the Google market data provider", security.Type));
}
}

private string GetGoogleExchangeCode(Security security)
{
switch (security.ExchangeCode)
{
case "US":
return null;
case "LN":
return "LON";
case "AU":
return "ASX";
case "IT":
return "BIT";
case "GY":
switch (security.Ticker.ToUpper())
{
case "PUM":
return "FRA";
default:
return "ETR";
}
case "FR":
return "ETA";
default:
throw new MarketDataException(string.Format("The exhange [{0}] is not supported by the Google market data provider", security.ExchangeCode));
}
}


This is obviously incomplete - we've only listed a small set of exchanges to be supported here - so this will need to grow as we find new ones. It would be nice if this could just be mapped automatically, but so far I haven't though of any way to do that. Also it is a bit imprecise - as you can see with the "GY" example. It might just be that Google's data is incorrect or incomplete - but Puma is certainly not a French security - if that is what "FRA" is intended to mean. So who knows why PUM is listed in Google under FRA, while Adidas is listed under ETR, and why other French securities are apparently listed under ETA? For now I'll just do the minimum work here to map securities I'm actually interested in - in reality 90% of what I care about are US securities anyways, so it's not really an issue for me to maintian a few special mappings by hand.

Also note I have provided support for only equities and fx securities. There are obviously many other security types that could be supported - and I haven't taken the time to test them all with Google, so I'm not sure what it could support through either the text request interface or the normal html page request. But this should be enough to get us started.

Finally, note that the set of fields we can get through the text interface is pretty limited. So we may want to get some additional fields by scraping the html page or from another source. But we could do these at a different frequency - for example, we could poll for price updates from the text interface every second or even faster - but we could keep udpates to other fields (like EPS, YesterdayClose, etc.) to once every 30 min or something like that. This will avoid us killing the html request channel to get updates on fields we really don't need updated that frequently. Ultimately it would probably be nice if we could configure the min update frequency we would like on a field by field basis.

3 comments:

Unknown said...

I reckon 'ccol' field in the response from Google identifies a display colour to apply to the 'c' field data. I've seen values like chg and chr, which logically correspond to green and red respectively.

Pity no volume information is returned, but like you say it can be got separately by screen-scraping.

T said...

Hi Geoffrey!

I very much appreciate your efforts. If I understand correctly, one of the goals is to get a (free) workaround/replacement for the BB BDH/BLP. I found this a bit difficult, as BB abuses Excel UDFs via timing component in order to write a whole batch of cells instead of just one (which is the only thing regular functions allow). Have you found a solution, i.e. do you know if it is possible to re-build a BB Emulator, if even only for the simple commands such as name/hi/lo/opn/clse/vol? Or has somebody even done it already?

Unknown said...

ccol & eccol look like they're referring to last trade price delta display color and extended hours price delta display color. Valid return values include chg (green), chr (red), chb (black).

I'd like to find out what the l_fix and l_cur values refer to. They return the same value as "l" when I test.