Fetching market data into Excel using Python

Microsoft Excel is the 'go-to' solution for data manipulation and analysis in finance. However, it lags behind the reality of how data, particularly financial or trading data, is consumed in the age of the Internet. Here is a simple, yet powerful way to enable Excel to work with a wide variety of financial databases and sources.

AUTHOR'S BIO

Felix Zumstein is the founder of Zoomer Analytics GmbH, a consulting company for financial software. He is also the creator of xlwings, a Python-Excel package. He holds a Master's degree in Finance and Economics from the University of St. Gallen and is a CFA charterholder.

Excel is widely used in business, particularly for financial calculations. Within the financial industry itself, traders are particularly heavy users of Excel. It is even common for hedge funds - at least in their early years - to trade millions of dollars based on a few numbers that are the output of an Excel spreadsheet.
Traders, financial analysts, risk managers and those in similar roles often need to transfer market data into Excel, where all the calculations are subsequently carried out. These might be exchange rates used for financial reporting or historical security prices that feed into a trader's pricing model. Yet, with the presence of numerous data providers and the lack of a one-size-fits-all Excel functionality for pulling financial data into a spreadsheet, the latter can turn into a complex, provider-specific task. In this article, we explore a few Python tricks that can make life easier.

Bloomberg's infamous BDH formula

To move historical data into Excel from Bloomberg, traders frequently use the BDH formula (which stands for Bloomberg Data History). Assuming that you have Bloomberg's Excel add-in installed, this formula works as follows: If we write

=BDH("TICKER", "PX_LAST",...)

into a cell, we will get the end-of-period prices for that ticker.

The interesting behaviour of the above formula is that it actually fills the sheet with an array of historical prices from a single cell, without it being a proper Excel array formula! While this is slightly against the idea of how Excel is supposed to work, it is very useful in practice.

In contrast, proper array formulae in Excel are those that require the user to hit Ctrl-Shift-Enter to enter them and then appear with curly braces around the formula. They have the disadvantage that they don't work well if the data is changing size, as in the case of time series data: Excel arrays require the output array to be selected before obtaining the data. This is a significant limitation as we do not usually know how many rows will be returned when requesting the end-of-period prices of a certain stock during the last couple of years. On top of that, we might want to update these sheets every day, which would require deleting and recreating the entire array every single time.

Excel is older than the internet

Thanks to the Internet, it is now easier than ever to gain data from all sorts of providers. Not everybody has a license for a Bloomberg terminal or another commercial provider. Services that offer free data, like Quandl, Yahoo! Finance, Google Finance or even direct sources like central banks or exchanges are a welcome and frequently used addition to users' data repositories.

However, to get data from these providers into Excel, there is an issue: Excel has been around long before the Internet became widespread and VBA has not been updated for years. For example, VBA does not offer an easy way to deal with the JSON format, the de facto standard to exchange data over the Internet nowadays.

To gather data in Excel, we usually have to use a separate add-in for each data provider. Moreover, these generally do not support a BDH-style formula.

Python to the rescue

Fortunately, there is a solution using Python - a free and open-source programming language that is becoming ever more popular in finance. Python has mature libraries for scientific computing like NumPy and Pandas and it is easy to get started with. At the same time, it is powerful enough to serve in the plumbing of some of the biggest websites like YouTube (Hoff, 2012) or trading systems at BoAML (Langworth, 2013) and J.P. Morgan (JPMorgan Chase & Co., 2013).

One of Python's strengths is that it has an ecosystem of third-party packages that interface with pretty much any other system out there - the reason why Python is also often called a 'glue language'. As a result, major data providers (like Bloomberg, Thomson Reuters Eikon or Quandl) offer an official Python package, while almost all other providers of interest are easy to query using other third-party packages.

The only piece left missing then is the link between Python and Excel - specifically how to feed the data from Python into Excel in a BDH-style formula. This can be easily achieved using a Python package called xlwings.

xlwings 101

To introduce xlwings, let's push data into Excel in an interactive way, before turning the attention to building user defined functions (UDFs) in the next section.

Note: If you have never dealt with Python before, the easiest way to get started is to download the Anaconda distribution (see Links section at the end of the article). It is a single-click installer that already comes with all the packages we need, including xlwings.

Once Anaconda is installed with the defaults, let's start the Python interpreter by typing 'python' at a command prompt:

C:Usersfelix> python
>>>

We can now import the two packages that we need: quandl for data acquisition and xlwings for connecting to Excel:

>>> import quandl
>>> import xlwings as xw

As sample data for the next step, we are going to download the Apple stock price since the beginning of the year from Quandl. The quandl package is going to give us back a Pandas DataFrame that has become Python's standard data container for time series data:

>>> df = quandl.get("WIKI/AAPL",
start_date="2016-01-01")

To transfer the data to Excel, we can let xlwings write the DataFrame to a range in the active Excel workbook like this (note that it is enough to specify the top-left corner of where you want the data to be written to):

>>> xw.Range("A1").value = df

Note: xw.Range is a shortcut to a range on the active sheet of the active workbook, so you need to have an Excel workbook open to make this command work.

The Excel workbook should now look like that of Figure 01.

Note: While Quandl offers the service for free, you need to use a token that comes with every free account to be able to make more than just a few calls. You would authenticate by executing this code:

>>> quandl.ApiConfig.api_key = 'MY_API_KEY'

Building the get_data() function

Let's now turn this interactive session into an Excel user defined function so that we can simply go into a cell and type

=get_data("WIKI/AAPL", "2010-01-01")

to obtain the same result as with the interactive version above.

To do this, there is a little more work required than for the interactive session above. First, we need to install the xlwings Excel add-in. This is done by executing the following from a command prompt (after either typing Ctrl-D to exit the Python interpreter or by starting a new command prompt):

Having restarted Excel, we now have a new 'xlwings' tab in the ribbon with an 'Import' button, as illustrated in Figure 02. We still need to change one final Excel setting before we can get started: Enable 'Trust access to the VBA project object model' under File > Options > Trust Center > Trust Center Settings > Macro Settings.
Let's move back to the command prompt and create a new project, i.e. a macro-enabled workbook and a Python source code file:

C:Usersfelix> xlwings quickstart get_data

Figure 02: The installed xlwings tab in Microsoft Excel

This will create a new directory "get_data" with the two files get_data.xlsm and get_data.py. Using a text editor, we write the following function into get_data.py:

import xlwings as xw # 1
import quandl

@xw.func # 2
@xw.ret(expand='table') # 3
def get_data(ticker,
start_date=None,
end_date=None): # 4
return quandl.get(ticker,
start_date= start_date) # 5

Notes:

1. Required libraries are imported first.
2. Functions that we want the xlwings add-in to import need to be decorated with @xw.func.
3. The return decorator has the expand option set to 'table' resulting in the data being returned as dynamic array, i.e. similar to how data is returned when using the BDH formula.
4. The actual get_data function is defined and takes three arguments (of which the second and third are optional).
5. The Pandas DataFrame returned by the Quandl call is then passed on by get_data().

We save the Python source file, switch back to Excel and click on 'Import Python UDFs' . Upon writing the formula

=get_data("WIKI/AAPL", "2010-01-01")

into cell A1, we are presented with the same picture as in the interactive case above.

So far, we have used a simple example to illustrate how the xlwings Python package works. In the next step, let's extend the get_data() function to add more flexibility.

Here, we are going to add functionality for:

• Specifying an end date
• Specifying a specific column
• Switching between different data providers (Quandl and Yahoo! Finance in this example).

The extended get_data() Python function is shown in Listing 01. After reimporting it into Excel, we can examine how the corresponding Excel get_data() UDF has become smarter. The new Excel formula looks as follows:

=get_data(ticker, [column], [start_date], [end_date], [provider])

A few examples of how this formula looks like when applied:

• Get the history for Apple from Quandl since 1 January 2016:

=get_data("WIKI/AAPL", ,"2016-01-01")

• Get the history for Apple from Yahoo! Finance from 1 January 2016 until 31 January 2016:

=get_data("AAPL", ,"2016-01-01", "2016-01-31", "yahoo")

• Get only the 'Close' column from Quandl (Hint: we can use the column number - 4 in this example - instead of 'Close' to make the call more efficient.) Quandl will then only retrieve that specific column.

=get_data("WIKI/AAPL", "Close","2016-01-01")

• Get only the 'Close' column from Yahoo! (Yahoo! Finance doesn't accept column numbers):

=get_data("AAPL", "Close","2016-01-01", , "yahoo")

Note: It is of course possible to reference date-formatted cells instead of passing the date as a string.

Conclusion

Python is a very powerful programming language and - thanks to its many packages - can complement Excel to allow it to interact with numerous financial data providers. It is also possible to extend the formulae discussed here to fetch data from all kinds of internal databases. We have seen that scripts and functions can be easily built to deliver data in the desired way. Ultimately, this means that we can even replace the entire set of vendor-specific add-ins with functions written in Python.

References

Hoff, T. 2012 'Seven years of YouTube scalability lessons in 30 minutes'
http://bit.ly/1MCrHZm
Langworth, H. 2013 'Technology at Bank of America Merrill Lynch: become part of the Quartz community'
http://bit.ly/2fwtR0Q
JPMorgan Chase & Co. 2013 Athena information sheet
http://techcareers.jpmorgan.com/cm/BlobServer/athena_jpmc.pdf?blobkey=id&blobwhere=