Thursday, April 23, 2009

Creating an Excel Add-In for Market Data - Part 3 - Google Data Provider

Now that we've got our market data proxy layer in place, we need to look at actual market data provider adapters. There are a number of ways we could retrieve market data. The most brute force mechanism would be to just screen-scrape html from websites directly. This is obviously subject to unexpected html format changes on the site, or unpredictable responses when we query for things we haven't seen before - essentially we are working blind without any kind of contractual specification, so we have to way to ensure we can write a truly reliable service using this technique - it would essentialy be trial and error, and best effort to just keep it up to spec with the site.

Luckily, some sites realize that consumers are going to want to get their data in all kinds of ways, and they make it easier to do so. Google is a good starting point, because they essentially try to make all of their data available in one way or another. They even have published APIs so you can bypass the whole html request stage entirely. Unfortunately, after lots of research and examining their code samples, it is clear that the finance data is not really provided by their API yet, so we have to use an alternative method. Note they do have a Google Finance API", but this deals with the financial porfolio management features on Google - essentially if you use Google to maintain your portfolio, you could use their API to get updated data about the stocks in the portfolio. But that's a bit beyond the scope of what we are trying to accomplish right now. There are also some interesting techniques that have been demonstrated using Google Spreadsheets, which apparently can plug in directly to some finance lookup functions. One example shows using Ruby to connect to Google Spreadsheets to call a finance function and return the result. I guess it would be interesting to see if we could get more coverage this way, or to see how that solution performs. There is also some more detail about using Google Spreadsheets to get XML format output and also something called JSON format.

It turns out they have another simple solution - a specific request format that provides limited data in a no-frills, simple delimited text output format. This is much better than having to do all the work of screen scraping, and also gives us a fairly reliable and well-defined contractual interface that we can depend on. And it also involves transferring a lot less data back and forth, because it eliminates all of the images and formatting you would normally be getting with a full html page.

So to get data from Google's simplified market data service, we form a request with a special URL, like this:
string url = "http://www.google.com/finance/info?client=ig&q=" + ticker;

The response is a simple delimited text format. For example, if we query for MSFT, we get something like this:
// [
{
"id": "358464"
,"t" : "MSFT"
,"e" : "NASDAQ"
,"l" : "19.20"
,"l_cur" : "19.20"
,"ltt":"4:00PM EDT"
,"lt" : "Apr 17, 4:00PM EDT"
,"c" : "-0.56"
,"cp" : "-2.83"
,"ccol" : "chr"
,"el": "19.26"
,"el_cur": "19.26"
,"elt" : "Apr 17, 7:59PM EDT"
,"ec" : "+0.06"
,"ecp" : "0.31"
,"eccol" : "chg"
}
]


It's not easy to find a good reference for what these fields are exactly, but we can more or less figure it out by comparing the values we get with what we see on Google Finance. From what I have deciphered, here are what these fields mean:
id : Internal Google Security ID?
t : Ticker
e : Exchange Name
l : Last Price
l_cur : ???
ltt : Last Trade Time
ltt : Last Trade Date/Time
c : Change (in ccy) - formatted with +/-
cp : Change (%)
ccol : ???
el : Last Price in Extended Hours Trading
el_cur : ???
elt : Last Trade Date/Time in Extended Hours Trading
ec : Extended Hours Change (in ccy) - formatted with +/-
ec : Extended Hours Change (%)
eccol : ???


One problem with this method is that it only seems to work for equities. For example, I can't seem to get it working for FX - so we'll have to have a separate provider for other security types, that uses the basic html-scraping method on their main webpage - or maybe we could go back and try some of these other techniques such as the Google Spreadsheets method.

Now, we need to do a bit of mapping to convert between the Bloomberg ticker format and the ticker format for our specific provider - Google in this case. Bloomberg tickers generally have 3 parts - the security ticker, the exchange code (if applicable), and the security type. So for example, "MSFT US Equity", or "GBPUSD Curncy". We need to assume that the requests coming in from Excel are still in Bloomberg ticker format, because we don't want to have to change our spreadsheets at all to work with our new add-in. So we will standardize on Bloomberg ticker format as the convention for incoming requests. Then the first thing we will do is parse this request into a Security object, which has a ticker field, an exchange code, and a security type (enum). So assuming we have built up our security object, here is the code we use to determine the Google ticker for our security:
private string GetGoogleTicker(Security security)
{
switch (security.Type)
{
case SecurityType.Equity:
string exchangeCode = GetGoogleExchangeCode(security);
if (!string.IsNullOrEmpty(exchangeCode))
return exchangeCode + ":" + security.Ticker;
else
return security.Ticker;
case SecurityType.Fx:
return security.Ticker;
default:
throw new MarketDataException(string.Format("The security type [{0}] is not supported by the Google market data provider", security.Type));
}
}

private string GetGoogleExchangeCode(Security security)
{
switch (security.ExchangeCode)
{
case "US":
return null;
case "LN":
return "LON";
case "AU":
return "ASX";
case "IT":
return "BIT";
case "GY":
switch (security.Ticker.ToUpper())
{
case "PUM":
return "FRA";
default:
return "ETR";
}
case "FR":
return "ETA";
default:
throw new MarketDataException(string.Format("The exhange [{0}] is not supported by the Google market data provider", security.ExchangeCode));
}
}


This is obviously incomplete - we've only listed a small set of exchanges to be supported here - so this will need to grow as we find new ones. It would be nice if this could just be mapped automatically, but so far I haven't though of any way to do that. Also it is a bit imprecise - as you can see with the "GY" example. It might just be that Google's data is incorrect or incomplete - but Puma is certainly not a French security - if that is what "FRA" is intended to mean. So who knows why PUM is listed in Google under FRA, while Adidas is listed under ETR, and why other French securities are apparently listed under ETA? For now I'll just do the minimum work here to map securities I'm actually interested in - in reality 90% of what I care about are US securities anyways, so it's not really an issue for me to maintian a few special mappings by hand.

Also note I have provided support for only equities and fx securities. There are obviously many other security types that could be supported - and I haven't taken the time to test them all with Google, so I'm not sure what it could support through either the text request interface or the normal html page request. But this should be enough to get us started.

Finally, note that the set of fields we can get through the text interface is pretty limited. So we may want to get some additional fields by scraping the html page or from another source. But we could do these at a different frequency - for example, we could poll for price updates from the text interface every second or even faster - but we could keep udpates to other fields (like EPS, YesterdayClose, etc.) to once every 30 min or something like that. This will avoid us killing the html request channel to get updates on fields we really don't need updated that frequently. Ultimately it would probably be nice if we could configure the min update frequency we would like on a field by field basis.

Wednesday, April 22, 2009

Creating an Excel Add-In for Market Data - Part 2 - Market Data Provider Proxy

Last time we looked at just putting in place the basic infrastructure to support our RTD component. Now we dive in to the details of our actual market data provider implementation.

For this particular function, we need to find alternative data sources for our financial data because we don't want to necessarily depend on having Bloomberg available at all. There are plenty of good sources of free market data out there (usually delayed). Some good examples are Google Finance, Yahoo Finance, MSN Money, Level2StockQuotes.com, etc. Also, assuming you use some kind of broker that you deal with, they probably provide you real-time data as well - for example, I use Banc of America Investment Services to manage my IRA, and they provide real-time quotes when you are logged in with your account. So we'd like to have a plugin-layer where we can have multiple possible implementations, and it would be nice if we could actually have all of the implementations available, and ingelligently route between them based on priority, security type, availability of markets or even individual securities, or even query performance, load balancing, etc. So before we put specific market data provider adapters in place, we need a proxy-cache layer that can do this intelligent routing and management of our security universe.

We can start by defining a generic interface for any market data provider. For now let's assume we have only a synchronous model, and no actual asynchronous or event-based providers. Obviously if we wanted to properly plug in a real-time provider such as the Bloomberg tick event subscription, we would need to extend this further. We define a GetSecurityFieldValues() function to do the synchronous lookup, and for the purposes of our smart proxy functionality, we define a CanGetFieldForSecurityType() function.

public interface IMarketDataProvider
{
bool CanGetFieldForSecurityType(SecurityType type, string field);

decimal?[] GetSecurityFieldValues(Security security, string[] fields);
}


Then we define a simple proxy provider implementation:

public class MarketDataProxyProvider : IMarketDataProvider
{
private readonly IMarketDataProvider[] _providers;

public MarketDataProxyProvider(IMarketDataProvider[] providers)
{
_providers = providers;
}

public bool CanGetFieldForSecurityType(SecurityType type, string field)
{
return _providers.Any(provider => provider.CanGetFieldForSecurityType(type, field));
}

public decimal?[] GetSecurityFieldValues(Security security, string[] fields)
{
decimal?[] returnValues = new decimal?[fields.Length];
bool[] fieldsCovered = new bool[fields.Length];
foreach (IMarketDataProvider provider in _providers)
{
List fieldsToRetrieve = new List();
List fieldIndexes = new List();
for (int f = 0; f < fields.Length; f++)
{
if ((!fieldsCovered[f]) && provider.CanGetFieldForSecurityType(security.Type, fields[f]))
{
fieldsToRetrieve.Add(fields[f]);
fieldIndexes.Add(f);
fieldsCovered[f] = true;
}
}
if (fieldIndexes.Count > 0)
{
decimal?[] values = provider.GetSecurityFieldValues(security, fieldsToRetrieve.ToArray());
for (int f = 0; f < values.Length; f++)
returnValues[fieldIndexes[f]] = values[f];
}
if (fieldsCovered.All(fc => fc))
break;
}
return returnValues;
}
}


This is just a starting point, but it should give us the basic functionality to proxy for several providers. Some further work could be done to make this class dynamically configurable, and also possibly automatically adaptable to remember and re-prioritize providers based on prior performance.

Tuesday, April 21, 2009

Creating an Excel Add-In for Market Data - Part 1 - Excel RTD Component

Since I left my previous job, I have lost my access to Bloomberg real-time data, and so now I have a lot of spreadsheets that use the BDP function for real-time lookup of stock prices, fx rates, etc. - and I don't have the Bloomberg plugin to make them work anymore. Rather than going back to the world of manually filling this data in periodically, I decided to see if I could create my own Excel add-in, and make it look and behave just like the Bloomberg one. This way, if I get access to Bloomberg again in the future, I can continue using the same spreadsheets without changing anything.

I've had plenty of experience writing Excel add-ins in the past, in particular with ManagedXLL. Unfortunately, my license for this was limited to my PC at work, and none of that code seems to work at all from my home PC - and presumably it would violate the licensing agreement anyways. It's too bad that there isn't a free or cheap "non-profit" version of ManagedXLL available.

So now we embark on trying to make it work without the use of ManagedXLL. Some of this turns out to be pretty easy, while other parts are more challenging. So I guess that's the most valuable thing about ManagedXLL - it provides a single nice and neat package to do everything you want, without having to piece together multiple languages and technologies.

For our mock-Bloomberg add-in, the first step is to look at the RTD piece. In ManagedXLL this was really easy - you can create a RTD server by just exposing a class with a few attributes. It turns out this alone is still pretty easy to do in plain C# code. We just need a few attributes, and we need to implement the IRtdServer interface from Microsoft.Office.Interop.Excel. The code for this ends up looking pretty similar to what it would have been with ManagedXLL. Finally we need to make sure our .NET dll gets registered - either using the project properties when we compile - or using RegAsm.exe to register it after we've built it.

Here's an abbreviated version of the code for the RTD server, showing the relevant parts...
[ClassInterface(ClassInterfaceType.AutoDual)]
[ComVisible(true), ProgId(PROGID)]
public class BdpRtdServer : Excel.IRtdServer
{
public const string PROGID = "Glaze.Finance.ExcelPlugin.Rtd";
private Excel.IRTDUpdateEvent _rtdUpdateEvent;

public object ConnectData(int topicId, ref Array requestStrings, ref bool getNewValues)
{
string[] fields = requestStrings.Cast().ToArray();
string ticker = fields[0];
string field = fields[1];
// ... add new topic subscription for ticker/field ...
}

public void DisconnectData(int topicID)
{
// ... remove topic subscription and unsubscribe ...
}

public Array RefreshData(ref int topicCount)
{
// ... refresh and notify any topics that have updates ...
}

public int Heartbeat()
{
return 1;
}

public int ServerStart(Excel.IRTDUpdateEvent rtdUpdateEvent)
{
// start background processing
_rtdUpdateEvent = rtdUpdateEvent;
}

public void ServerTerminate()
{
// shut down background processing
}

private void NotifyUpdate()
{
if (_rtdUpdateEvent != null)
_rtdUpdateEvent.UpdateNotify();
}

[ComRegisterFunctionAttribute]
public static void RegisterFunction(Type t)
{
Microsoft.Win32.Registry.ClassesRoot.CreateSubKey("CLSID\\{" + t.GUID.ToString().ToUpper() + "}\\Programmable");
}

[ComUnregisterFunctionAttribute]
public static void UnregisterFunction(Type t)
{
Microsoft.Win32.Registry.ClassesRoot.DeleteSubKey("CLSID\\{" + t.GUID.ToString().ToUpper() + "}\\Programmable");
}
}

The main challenge with the RTD piece then becomes the actual application code, which is naturally a bit more complicated than a simple function because we need to implement all of the real-time server management going on behind the scenes. In my implementation, this involved some background threads waiting for updates and batching them up to notify back to Excel, and some smart dictionary caching to keep track of what we are currently "subscribed" to across all of our client calls.

Monday, April 20, 2009

Blogging about Blog Development

While researching some other stuff, I came across a blog framework called BlogEngine.net, and was so impressed I thought it would be worth grabbing it and giving it a try. In fact, I've been wanting to add blogging and other features to my Big Chief Golf website for some time now, so this might be the perfect technology for it. The guys that maintain this also run their own blog, and have written about new features of their blogging engine.

More about this once I've had a chance to give it a try.