A first exploration of financial data
Some FAQs about financial data
Is it hard to find financial data?
That depends on what you are looking for. If you want data on
stocks (equities), this is reasonably easy to get hold of. You
can go to Google Finance or to Yahoo Finance, for example, and you
then have access to the prices of many different stocks around the
world. There are some places where you can get hold of foreign
exchange (FX) data, such as Olsen, but there is usually some
restriction, such as having to register, or only being able to
download a limited amount of data (500 days at Olsen), or having to
pay for the data. Options data is essentially unobtainable without
paying up front, or without an inside connection in the industry.
Much the same is true of futures price data.
Is it hard to download stuff?
If you go to Google Finance, you can get hold of some data very
rapidly. You first type the stock ticker (= identifier) into the
search box (for example, "T" is the ticker for AT&T); clicking on
the `Get quotes' button then takes you to the page for that stock.
On the left, there is a link to historical prices, which you click
to get through to the historical data. You can then choose your
data window, and download the data to a spreadsheet. You will get
information on high, low, open and close prices, as well as the
volume of trade. Yahoo Finance works similarly.
How do I know which stock ticker belongs to which company?
You can get the tickers of the stocks in the S&P 500 index from
Wikipedia by typing "S&P 500" in the search box.
OK, so I have some data in a spreadsheet - what now?
You now need to load the data into some package where it is easy
to work with it. There are at least two good (and completely free)
alternatives here: one is Scilab, and the other is R. Both can be
downloaded and installed in a matter of minutes. Scilab is a general
mathematical computational package similar to Matlab, and is in my
view very easy to use. I find R a bit more clumsy, but it is well
set up to handle standard statistical analyses, and copes with missing
data and dates more efficiently than Scilab.
I've installed Scilab and R - how do I import and work on the data?
For now, let me just explain the use of Scilab, for which there is a
worked example in DATAPAK. Download the files in DATAPAK, and place
them all in one directory.
In Windows, you double-click on the Scilab icon,
which opens a Scilab session. Go to the `File' button on the toolbar,
click on this, and select `Change current directory ...' from the list.
Then select the directory where the data and programs from DATAPAK have
been placed. In Linux, you open an xterm, and cd to the directory
where the files from DATAPAK have been put, and then you type the
command `scilab' at the prompt. Either way, you now have an open
Scilab window in the correct directory.
The main script is called intro.sci. This contains a lot of
comments (prefixed by the character string "//") as well as a
small amount of code. You can read the script in whatever text
editor you are used to, and reading this will get you started
with the use of Scilab commands. The file SomeSP.txt
contains ten years of daily prices for 29 stocks from the S&P 500
index, arranged in a big array with 30 columns, the first of which
contains the date (in terms of days counted from some arbitrary
starting date), the rest containing the closing prices. There are
also column headers to identify the different stocks; Scilab and R
can both cope with these.
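For readers who would like to see this layout handled outside Scilab, here is a minimal Python sketch (standard library only). It is not part of DATAPAK. The function name load_prices is hypothetical, and the sketch assumes the file is whitespace-delimited with one header row and no missing entries, which has not been checked against SomeSP.txt. A tiny made-up sample stands in for the real file.

```python
import io

def load_prices(source):
    """Read a whitespace-delimited table: a header row, then date + price columns.

    Assumed layout (not verified against SomeSP.txt): the first row holds the
    column names, the first column holds the day count, and the remaining
    columns hold closing prices, one column per stock.
    """
    header = source.readline().split()
    names = header[1:]                      # skip the name of the date column
    days = []
    prices = {name: [] for name in names}
    for line in source:
        fields = line.split()
        if not fields:
            continue                        # skip blank lines
        days.append(float(fields[0]))
        for name, value in zip(names, fields[1:]):
            prices[name].append(float(value))
    return days, prices

# Made-up sample with two fictitious tickers, standing in for the real file.
sample = io.StringIO(
    "Day AAA BBB\n"
    "1 10.0 20.0\n"
    "2 10.5 19.0\n"
    "3 10.2 19.5\n"
)
days, prices = load_prices(sample)
print(days)
print(prices["AAA"])
```

With the real data one would pass an open file instead of the sample, for example `with open("SomeSP.txt") as f: days, prices = load_prices(f)`, provided the layout assumptions above hold.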
Perhaps the simplest thing is just to type
exec intro.sci;
at the prompt in the Scilab window. This will execute all the
commands in the script intro.sci; at various points, you will be
offered self-explanatory choices, which you can work through. This
first script steps you through some simple exploratory data analyses
designed to answer some simple but natural questions about stock
price data:
Are log-returns Gaussian, as they would be if the Black-Scholes
model were correct?
Are log-returns stationary?
Are the tails of the distribution of log-returns polynomial?
Is the autocorrelation of log-returns consistent with a model
with stationary independent increments?
There is no formal statistical testing carried out here, but once you
have seen the plots, you will be left in no doubt that the answer to
all of the above questions is "No".
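The script intro.sci answers these questions with plots. As a rough numerical companion, the following Python sketch (standard library only, and not a transcription of intro.sci) computes log-returns together with two simple diagnostics: the sample excess kurtosis, which is near zero for Gaussian data, and the sample autocorrelation. The function names are hypothetical, and the price series is simulated from a Gaussian random walk, so here the diagnostics should come out close to zero; the point of the exercise is to compare these numbers with what you get from real prices.

```python
import math
import random

def log_returns(prices):
    """Log-returns r_t = log(S_t / S_{t-1}) of a price series."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def excess_kurtosis(x):
    """Sample excess kurtosis m4 / m2^2 - 3; close to 0 for Gaussian data."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0

def autocorrelation(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var

# Simulated prices with i.i.d. Gaussian log-returns (an assumption made
# purely for illustration; real prices are what you want to test).
random.seed(0)
sim_prices = [100.0]
for _ in range(5000):
    sim_prices.append(sim_prices[-1] * math.exp(random.gauss(0.0, 0.01)))

r = log_returns(sim_prices)
abs_r = [abs(v) for v in r]
print("excess kurtosis:", excess_kurtosis(r))
print("lag-1 autocorrelation of returns:", autocorrelation(r, 1))
print("lag-1 autocorrelation of |returns|:", autocorrelation(abs_r, 1))
```

Applying the same three functions to a column of real closing prices gives a quick check on the tails (excess kurtosis) and on independence (autocorrelation of the absolute returns), though it is no substitute for looking at the plots.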
If log-returns are not Gaussian, stationary, or independent, why
do so many people use models which assume that they are?
That's not so easy to answer. At some point, when I have the
time to expand these notes, I will explain why it's not completely
stupid.
How could I make some money trading these stocks?
This is a more interesting question! Well, you have the data, you
have Scilab, now you could try to build some trading rule and see
what it did for you. For example, you could simply buy-and-hold the
stocks, splitting your wealth equally between them at time 0, and
just sitting on that portfolio. Or you could try the 1/N strategy,
where at the beginning of each day, you adjust your holdings of the
different stocks so that the monetary values of your holdings of
each stock are the same. You should find that this beats the buy-and-hold
strategy pretty conclusively; I have not tried it on this data, but
I would expect it to give you a Sharpe ratio in the range
0.6 to 0.8. Don't forget to allow for transaction costs, which might
amount to (say) 10 basis points (= 1 part in 1000) on the change
in your position.
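The two rules just described can be sketched in a few lines. This is a Python illustration (standard library only), not code from DATAPAK, and it rests on several assumptions: the function names are hypothetical, the input is a list of days, each a list of gross returns (today's price divided by yesterday's) for the N stocks, the cost of setting up the initial portfolio is ignored, the fee is computed on the trade sizes before the fee itself is deducted, and the Sharpe ratio is annualised with 252 trading days and a zero riskless rate.

```python
import math
import statistics

def buy_and_hold(gross_returns):
    """Wealth path: split unit wealth equally at time 0, then never trade."""
    n = len(gross_returns[0])
    holdings = [1.0 / n] * n
    wealth = [1.0]
    for day in gross_returns:
        holdings = [h * g for h, g in zip(holdings, day)]
        wealth.append(sum(holdings))
    return wealth

def one_over_n(gross_returns, cost=0.001):
    """Wealth path: rebalance to equal monetary values at the start of each day.

    cost is the proportional fee on the change in position; the default
    0.001 is 10 basis points.
    """
    n = len(gross_returns[0])
    holdings = [1.0 / n] * n
    wealth = [1.0]
    for day in gross_returns:
        total = sum(holdings)
        target = total / n
        # Simplification: fee based on trade sizes computed before the fee.
        fee = cost * sum(abs(target - h) for h in holdings)
        holdings = [(total - fee) / n] * n
        holdings = [h * g for h, g in zip(holdings, day)]
        wealth.append(sum(holdings))
    return wealth

def sharpe_ratio(wealth, periods_per_year=252):
    """Annualised mean over standard deviation of simple daily returns."""
    rets = [b / a - 1.0 for a, b in zip(wealth, wealth[1:])]
    return statistics.mean(rets) / statistics.stdev(rets) * math.sqrt(periods_per_year)

# Two made-up stocks over two days: each gains 10% one day and loses 10% the other.
g = [[1.1, 0.9], [0.9, 1.1]]
print("buy-and-hold:", buy_and_hold(g)[-1])
print("1/N, no costs:", one_over_n(g, cost=0.0)[-1])
print("1/N, 10bp costs:", one_over_n(g)[-1])
```

In this toy example the rebalancing rule recovers the loss that buy-and-hold suffers, less a small fee; whether the same holds on real data, and what Sharpe ratio results, is exactly what you would find out by feeding in returns computed from SomeSP.txt and passing the resulting wealth path to sharpe_ratio.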