Topic: General

Data Sources

Friday, April 17th, 2009

A web site that analyzes financial data needs good data sources. The ideal source for evaluating financial analysis screens would have at least 20 years of quarterly financial data in a “point in time” format, meaning each value is recorded as it was known on a given date rather than as later revised. It would also have daily market data, including indexes and ETFs. It should be delivered in a format that is easily incorporated into a screening system that requires time-aligned data, very fast parsing, and consistent database keys. The license must allow some reasonable “redistribution” of the data in derived form. And it needs to be affordable.

Unfortunately, no data source satisfies all of these requirements. Two come close, each with different advantages and drawbacks: Standard and Poor’s Compustat and AAII’s SI Pro.

These are two very different data sources. Compustat is a big database covering up to 50,000 global companies and over 1,500 data items, with 20 years of quarterly and annual fundamental data and 20 years of daily market data. The “Xpressfeed” version of the product is delivered as incremental relational database table updates three times daily. Downsides: it is expensive and very large; it is not point-in-time (although it retains data for inactive companies); it requires time alignment (matching record dates across many different tables); and its redistribution license is extremely restrictive unless you buy an even more expensive subscription.
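The time alignment mentioned above is essentially an “as-of” join: for each trading day, attach the most recent fundamental record that was already available on that day. Below is a minimal sketch in Python under stated assumptions. The two tables, their field layout, and the “available date” column are hypothetical illustrations, not Compustat’s actual schema. The records are keyed on the date each figure became public rather than on the fiscal period end, because aligning on period end would leak results into trading days before they were reported.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical quarterly fundamentals for one company:
# (date the figure became available, EPS). Not a real Compustat layout.
fundamentals = [
    (date(2008, 4, 25), 0.50),
    (date(2008, 7, 24), 0.55),
    (date(2008, 10, 23), 0.48),
]

# Hypothetical daily closing prices for the same company: (trade date, close).
prices = [
    (date(2008, 4, 1), 28.0),
    (date(2008, 7, 1), 31.0),
    (date(2008, 7, 25), 32.0),
    (date(2008, 10, 2), 25.0),
]


def align_asof(prices, fundamentals):
    """Attach to each price row the latest fundamental value available
    on or before the trade date (an "as-of" join). Both inputs must be
    sorted by date. Rows with no prior fundamental get None."""
    fund_dates = [d for d, _ in fundamentals]
    aligned = []
    for trade_date, close in prices:
        # Number of fundamental records dated on or before trade_date.
        i = bisect_right(fund_dates, trade_date)
        eps = fundamentals[i - 1][1] if i > 0 else None
        aligned.append((trade_date, close, eps))
    return aligned


for row in align_asof(prices, fundamentals):
    print(row)
```

Note that the 2008-10-02 price row still carries the July EPS figure, since the October figure had not yet been published. With many tables, the same join has to be repeated for each one, which is part of what makes this step costly.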

SI Pro is a much smaller database covering about 9,000 US-listed companies and about 120 fundamental data items, with only 7 years and 8 quarters of historical data. It is distributed as a complete database each week (each month before 2003). However, SI Pro has a very big advantage: by collecting every database release, it is possible to build a true point-in-time database covering 1997 through today. SI Pro also includes data from a number of original sources; for example, it includes insider and institutional ownership data, which requires an additional subscription with Compustat. SI Pro is inexpensive, and its license is less restrictive than Compustat’s, allowing publication of reasonable subsets of derived data. Its major drawback is limited data item coverage: it includes most, but not all, of the major GAAP financial statement items.
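The idea of building a point-in-time database from collected releases can be sketched as a store that keeps every snapshot keyed by its release date and answers “what did the database say as of date D?” by returning the latest snapshot released on or before D. This is a minimal illustration under assumptions: the `PointInTimeStore` class, the ticker `XYZ`, and the `eps_ttm` field are hypothetical, and loading the actual SI Pro weekly files is omitted entirely.

```python
from bisect import bisect_right
from datetime import date


class PointInTimeStore:
    """Hypothetical store that keeps every database snapshot keyed by its
    release date, so queries see only what had been published by then."""

    def __init__(self):
        self._dates = []      # release dates, kept sorted
        self._snapshots = []  # snapshot dicts, parallel to _dates

    def add_snapshot(self, release_date, snapshot):
        # Insert at the sorted position so lookups can use binary search.
        i = bisect_right(self._dates, release_date)
        self._dates.insert(i, release_date)
        self._snapshots.insert(i, snapshot)

    def as_of(self, query_date):
        """Latest snapshot released on or before query_date, else None."""
        i = bisect_right(self._dates, query_date)
        return self._snapshots[i - 1] if i > 0 else None


store = PointInTimeStore()
store.add_snapshot(date(2003, 1, 3), {"XYZ": {"eps_ttm": 1.10}})
# A later release revises the same figure; earlier queries must not see it.
store.add_snapshot(date(2003, 1, 10), {"XYZ": {"eps_ttm": 1.25}})

print(store.as_of(date(2003, 1, 8))["XYZ"]["eps_ttm"])
print(store.as_of(date(2003, 1, 10))["XYZ"]["eps_ttm"])
```

Because each snapshot is kept whole, a revision published on January 10 cannot leak into a query for January 8, which is exactly the property a single continuously updated database lacks.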

At this time, all data presented on Napali Research’s web site is based on nearly 400 point-in-time SI Pro database snapshots. For this purpose, the advantage of a nearly bias-free back-testing methodology weighs heavily in SI Pro’s favor. A full historical back test of a screen idea through all SI Pro snapshots generally takes 10 to 20 minutes, and can be scripted to run hundreds of combinations unattended.
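Scripting many screen variations over a series of snapshots can look roughly like the sketch below. It assumes hypothetical snapshots (tickers `AAA`, `BBB`, `CCC` with made-up `pe`, `growth`, and `price` fields), a hypothetical “cheap and growing” screen, and a simple equal-weighted return from each snapshot to the next. It is not Napali Research’s actual back-test code. One deliberate simplification: stocks missing from the next snapshot are dropped, whereas a careful back test would account for delisted stocks to avoid survivorship bias.

```python
from itertools import product
from statistics import mean

# Hypothetical snapshots in release order: ticker -> fields.
snapshots = [
    {"AAA": {"pe": 8.0, "growth": 0.10, "price": 10.0},
     "BBB": {"pe": 25.0, "growth": 0.30, "price": 20.0},
     "CCC": {"pe": 12.0, "growth": -0.05, "price": 5.0}},
    {"AAA": {"pe": 9.0, "growth": 0.12, "price": 11.0},
     "BBB": {"pe": 22.0, "growth": 0.25, "price": 18.0},
     "CCC": {"pe": 10.0, "growth": 0.02, "price": 6.0}},
    {"AAA": {"pe": 9.5, "growth": 0.08, "price": 12.1},
     "BBB": {"pe": 20.0, "growth": 0.20, "price": 19.8},
     "CCC": {"pe": 11.0, "growth": 0.04, "price": 5.4}},
]


def screen(snapshot, max_pe, min_growth):
    """Tickers passing a hypothetical low-P/E, positive-growth screen."""
    return [t for t, f in snapshot.items()
            if f["pe"] <= max_pe and f["growth"] >= min_growth]


def backtest(snapshots, max_pe, min_growth):
    """Mean of equal-weighted returns from each snapshot to the next,
    holding only stocks that passed the screen in the earlier snapshot.
    Stocks absent from the next snapshot are simply skipped here."""
    period_returns = []
    for now, nxt in zip(snapshots, snapshots[1:]):
        picks = [t for t in screen(now, max_pe, min_growth) if t in nxt]
        if picks:
            period_returns.append(
                mean(nxt[t]["price"] / now[t]["price"] - 1 for t in picks))
    return mean(period_returns) if period_returns else None


# Run every combination of parameters unattended and collect the results.
results = {}
for max_pe, min_growth in product([10, 15, 30], [0.0, 0.1]):
    results[(max_pe, min_growth)] = backtest(snapshots, max_pe, min_growth)

for params, avg_return in sorted(results.items()):
    print(params, avg_return)
```

Each screen is evaluated only against data in the snapshot current at that time, so the parameter sweep inherits the point-in-time property of the snapshots themselves.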

Data source offerings are always changing, so we will continue to look for better sources.