Friday, October 8, 2010

Screen Scraping Content To Excel via Web Connectors

My friends love to tease me for being a "spreadsheet guy" because I make spreadsheets for nearly everything under the sun.  Mostly product comparisons, financial analysis(ish), stuff like that.

One spreadsheet that I work on very frequently deals with 401k fund options from my employer.  Due diligence on 150 funds is a time consuming process.  To help me make semi-educated picks I browse finance.yahoo.com to look at stats such as the beta, expense ratio, manager tenure, turnover, yield, (1, 3, 5, 10) year load adjusted returns, etc.

The problem? These stats change constantly, and manually updating a spreadsheet is a chore!

Recently I was investigating data streams for a keystroke biometric project that I'm working on and came across some information that led me to believe that creating data connections through Excel would be a snap. 

If you open MS Excel 2007+ and go to the Data tab you'll notice there are several external data connection options (Access, Web, Text).

If you jab the "Web" button it'll open your default IE home page within a window with Yellow boxes and black arrows all over.  These boxes identify html table structures.  By selecting one (or more) of these tables you are indicating to the web connector that you'd like to import that content. Click ok and it'll give you the option to import that content into an existing worksheet or a new one. 

Rather than creating unique connections via the UI, you could create and point to saved data connectors (.iqy files).  Click the "Existing Connections" button and you'll uncover 3 examples from MSN Money, 2 of which are templates that once connected allow you to map your variables. 
I've set my connections to update upon file open and that process takes approximately 1.5 minutes for 24 connectors, so just under 4 seconds per record.  I'm assuming that I can extrapolate that number out to 9 minutes for the 150 connectors I'll eventually need (based upon a template using a single .iqy file I wrote).  While I'm not very patient when it comes to waiting for files to open, the reality is that having to do this manually would take me about 2 hours, so 9 minutes doesn't seem all that bad.

I wish Access had a feature where I could schedule the imports! But then again I'm sure there is a more elegant solution to my problem here.  So with that said, if anyone has suggestions for alternate ways to extract mutual fund data for analysis (to a database, XML, excel or whatever) I'd be all ears!

1 comment:

  1. I found the SMF/RCH excel add-in functions to be much more efficient than the web portal connection method for retrieving yahoo financial data.

    ReplyDelete