Alle blogs
Do you, as a data & analytics specialist, turn your nose up at Excel?

Do you, as a data & analytics specialist, turn your nose up at Excel?

21-11-2022
Raymond te Veldhuis

Many people with a quantitative background prefer not to be associated with Excel. A missed opportunity! Because Excel certainly has something to offer data experts who do most of their work in R or Python. In this article, we help data & analytics specialists and beginning analysts get started with MS Excel.

As bonus material, we also explain how to connect Python with Excel to enhance a spreadsheet with some beautiful scripting. You’ll find it all in this article.

Why do data & analytics specialists often turn up their noses at Excel?

Who in their right mind decided to translate input syntax based on user language? There’s a reason data & analytics specialists often scoff at Microsoft Excel. In Excel, you rely too much on the mouse, and it doesn’t offer a sleek IDE that colorizes your input in “MATRIX”-like styles. Moreover, Excel includes some utterly ridiculous examples of digital illiteracy. “vlookup()” is a perfectly fine function name—why the need to translate it into “vert.zoeken()”?

Excel a hack tool?

Identity may also play a role. People with a business background often can't program and have learned to hack things together in Excel. People with a more quantitative background are taught early on to script as much as possible and tend to avoid clunky GUIs that help one avoid writing proper code.

They also prefer not to be associated with Excel as a hack tool and with people who can’t program. As a result, some Python or Julia experts have no idea how to use even the most basic Excel features. That’s a missed opportunity because Excel has a lot to offer data experts who usually work in R or Python. So go ahead and comment out your snob module’s import statement and read in this article what makes Excel useful.

Excel excels at quickly processing limited amounts of two-dimensional data and patching, comparing, and filtering data from different CSV and Excel files side by side. With the push of a button, you can convert a table into a chart in your corporate style—something you can proudly present to clients.

Sure, you can do all the same operations in Python, but it takes more time to set up a script and shape the data for presentation. There’s simply no better editor for two-dimensional data than Excel. That’s likely why Excel’s layout has remained unchanged for so long and why other spreadsheet software and statistical packages use similar layouts and functions. Excel data tables are easy to color, sort, and transform. Sure, you can only store about 17 million data points in an Excel sheet—not a huge amount—but for analyzing outputs or running small-scale analyses, that’s more than enough. An added benefit: most colleagues can open an Excel file, but struggle with a zipped folder full of CSVs and a Python script.

In this blog post

In this article, we help data & analytics specialists and beginning analysts get started with MS Excel. As bonus material, we also explain how to connect Python with Excel to enhance a spreadsheet with scripting. Read on to find out how!

Creative combinations of Excel functions

The essence of Excel is creatively applying functions and extrapolating them across your sheet. For those less familiar with Excel, here’s an overview of useful functions. Click here to learn how to apply formulas across an entire column.

1. Using combinations of vlookup and match

Suppose you want to retrieve data from a table based on a country code and a specific month. With vlookup() and match() combinations, you can quickly query data from another sheet. See this video for how to combine vlookup() and match(). I often used this combo to build data tables for panel data regression analyses. Don’t forget to use dollar signs to lock cell references in the correct direction when copying the formula.

2. Using averageifs() for frequency transformations

Say you have a CSV file with countries in column one, daily dates in column two, and price data in column three. With averageifs(), you can generate monthly averages in a new table. See this video for how to use the averageifs function. You can even add more criteria! But be aware: this formula is a bit clunky and demands significant computing power. With small datasets, you won’t notice, but for larger ones, it can slow down your sheet considerably.

3. Interpolating missing data

Functions like if(), ifna(), or iferror() help interpolate missing data you tried retrieving with earlier formulas. You can replace missing cells with an empty string (""), zero, or a formula—like an average of surrounding values using “average()”, or a lookup from another table. The possibilities are endless!

4. Extracting numbers from a text cell

Sometimes, date labels appear in odd formats, e.g., 2022M1. Using left() and right(), you can extract the year. Combine that with numbervalue() so Excel sees it as a number, not text. The formula would look like: =numbervalue(left(cell reference,4)).

5. Lifehack: combining Excel with Notepad

Sometimes downloaded data stores numbers as strings. If a text cell contains only digits, Excel suggests “convert to number.” But with large datasets, this is inefficient. It's faster to copy the column into Notepad and paste it back into a new column. In Notepad, you can also use “find and replace” for mass edits—like replacing periods with commas, or separating values by replacing spaces with tabs.

Once you master these basic functions, you can start hacking. There are tons of powerful functions—like indirect(), which retrieves data from other sheets or files based on other cell inputs. Combined with the earlier formulas, you can create complex, intertwined Excel systems.

Excelhell

We call the product of all this Excel hacking “Excelhell.” Cobbling together many formulas into a complex mess is a hobby for many office workers with some logical thinking skills. Someone even managed to build a full 3D game engine using only Excel formulas. Look around any organization and you’ll see the world is held together by opaque, half-functional Excel sheets. Probably even the U.S. nuclear program is run this way. Comforting thought, isn’t it?

Analysis Toolkit and Solver

The Analysis Toolkit and Solver are two Excel add-ins that simplify analyses. The Analysis Toolkit lets you estimate statistics and regression models. Solver helps minimize or match a cell value using one or more input cells. I’ve used it to minimize sums of squares in non-standard regression models. Solver supports not only simple iterative methods but also advanced algorithms like gradient descent with multistart to avoid local minima—letting you solve fairly advanced optimization problems. Sure, Python, R, or Matlab are more efficient—but this allowed my manager to better understand my work.

Python and Excel

Using Python’s win32com.client module, you can create a live connection with Excel. Here's how:

import win32com.client
import pandas as pd

app = win32com.client.Dispatch("Excel.Application")
aes = app.ActiveSheet

The active sheet “Table1” contains economic growth data for eurozone countries. I can load this into a pandas dataframe:

delta_gdp = aes.Range("Table1").Value
headers = aes.ListObjects("Table1").HeaderRowRange.Value[0]
df = pd.DataFrame(list(delta_gdp), columns=headers)

>>> df
       Country      Quarter      DeltaGDP
0      AUT          Q3           1.8
1      ESP          Q3           3.8
2      NLD          Q3           3.2
3      BEL          Q3           1.2
4      IRL          Q3           6.7

Now we can perform operations on the data—like calculating average growth per quarter and writing results back to Excel:

mean_growth = df.groupby('Quarter').DeltaGDP.mean()

aes.Range("D1:D4").Value = (('Eurozone_average_Q1',), ('Eurozone_average_Q2',), ('Eurozone_average_Q3',), ('Eurozone_average_Q4',))

aes.Range("E1:E4").Value = tuple((_,) for _ in mean_growth)

Because Python reads and writes directly in Excel’s data array storage, these operations scale easily to thousands of rows. You can also combine Excel with SciPy packages this way.

Excel: a tool to love

While Excel can sometimes drive us crazy, it's also one of the most vital tools in the modern office world. In this blog post, we’ve only scratched the surface of its features—for those who’ve never given the program a second glance. Don’t get us wrong—we’re not trying to replace Python, R, or Julia with Excel. But we firmly believe that strong Excel skills belong in every data specialist’s toolkit. The negative bias many hardcore data & analytics professionals have toward Excel is unjustified.

About DataJobs

DataJobs aims to guide and connect people and organizations active in data & analytics within the job market. Our goal is to reduce noise, confusion, and frustration—and make recruitment agencies largely redundant.

Are you active in the data & analytics world and want to be part of the DataJobs community? Follow our website www.datajobs.nl and our rapidly growing LinkedIn group.