Seeing your data

19 Oct

Seeing things from a new perspective is powerful.

Seeing data laid out graphically is illuminating in a way that reading columns of numbers in spreadsheets just isn’t.

In this, the first of a two-part post on data visualisation, we will use a little code to read a spreadsheet of a weight estimate, and plot the results. All of this graphical magic using free, open source code!

The final version of the code from this post is available at our GitHub repo to save you from having to copy / paste all the parts.

The second post in this series will apply what we’ve used here to using the spreadsheet to build data + geometry inside the Rhino CAD programme.

From this:

To this:

Stay with me here

Coding invariably involves a bit of set up, like sanding a job before you paint it. Once you’ve got a collection of this ‘boilerplate’ code that you can use as a basis, life becomes easier. Don’t be put off :)

[ This was prepared using Microsoft Excel on Windows - the process on Mac should be similar, but I’m afraid I’ve not had time to test it… ]

Coding is like plumbing

…the main thing is to work out what comes in, where it needs to go and what you need to do to get it there. After you have a sense of what you want to achieve, breaking down the task into manageable steps is a pretty good guide of where to start.

What we want to achieve: turn a spreadsheet into a graphical representation (implied: reliably, accurately and repeatably)

Our steps:

Read spreadsheet
Draw picture

So far, so good. Fleshing it out a bit:

Open the spreadsheet file
Read the data from each row
If what we read is valid, store it
Convert the data into something we can draw pictures of
Draw the picture

To do this, we’ll use a programming language called Python.

Python

There are bajillions of programming languages. They’re all handy for something. I’ve used Python here because it allows users to write their own extensions to the language to solve specific problems and publish them, so you and I can take advantage of them. Extensions to do things like scientific and numerical analysis (scipy and numpy), and in our case, read Excel files using openpyxl (a massive shout out to Eric Gazoni and Charlie Clark for their work on this).

[ An aside for the purists: the following is an introduction. Feel free to imagine the words ‘generally’ or ‘usually’ sprinkled liberally throughout the following paragraphs. Now is not the time to go into JIT compiling or the magic of LLVM. ]

Python is an interpreted language. This means that when you ‘run’ a python program, the python interpreter will read your code, and do whatever the code tells it to do.

Python programs are text files that the interpreter ‘reads’ in order to work out what to do.

Patience is vital in programming; programs almost never work properly the first time and need to be de-bugged. Visual Studio code has handy built-in debugging to help you. More info from Microsoft here.

This post is not an introduction on how to program Python - the internet is laden with great guides. Start at www.python.org, look for help at stackoverflow.com, and your favourite search engine is a great avenue for problem-solving.

Preparations

Install python: https://www.python.org/downloads/

Once you’ve installed Python, use its nifty ‘pip’ command line tool to install numpy:

and similarly for openpyxl: pip install openpyxl

2020-10-14 10_57_14-Windows PowerShell.png

Okay, now we need an elegant way to edit these text files that make up our python code.

There are many good text editor options. For Python my go-to is Microsoft’s free Visual Studio Code (Visual Studio 2017/2019 will also work, and is great for C++ and hybrid C++/Python coding…)

Getting down to business

This approach is highly flexible, but this post assumes that you’ve got a weight estimate spreadsheet that’s laid out similarly to our post on the weight estimate.

Breaking the task into steps, by the end we will be:

Defining a ‘class’ to store the info we read from the spreadsheet (if you don’t know what a class is, think of it simply as a container for our data)
Opening the spreadsheet
Reading the data
Setting up to draw the pictures
Configuring the plot
Adding the data
Adding a legend
Annotating interesting items
Tidying up
Changing the grid and tick marks
Adding a colour bar to show the scale
Showing the plot

But we’ll start as minimally as we can:

#------ Versions:  
# 001   basic file reader
import openpyxl                 # import the code to read spreadsheets  
import io  
import numpy as np  
import matplotlib  
import matplotlib.pyplot as plt
from   matplotlib.ticker import (AutoMinorLocator, MultipleLocator)

# 1 ---- define a class to store weight information ----------------------------------------  
class weightItem:  
    def __init__(self):  
        self.weight = 0         # store the weight
        self.cg = [0,0,0]       # store the x,y and z coordinates of the weight
        self.group = ""         # store the 'group' info from the spreadsheet  
        self.subgroup = ""      # store the 'subgroup' info from the spreadsheet  
        self.item = ""          # store the 'item' details from the spreadsheet

So far we’ve imported the python extensions we need, and have defined our class for storing the data, called ‘weightItem’.

Now we’ll read the spreadsheet:

# 2 ---- open spreadsheet ----------------------------------------      
xlsx_filename = "C:\\Users\\Public\\code_example\\weights.xlsx"             #   a variable for holding the name of our spreadsheet     

with open(xlsx_filename, "rb") as f:                                    #   Open the file from disk - will read the last SAVED version  
    in_mem_file = io.BytesIO(f.read())                                  #   read the file into RAM - means we can read the file even if it's open in Excel  
    spreadSheet = openpyxl.load_workbook(in_mem_file,read_only=True,keep_vba=True,data_only=True)   #   load the workbook  
    weightWorksheet = spreadSheet['Weights']  
    
  

Note that we could have just read the file straight in to the openpyxl, but we first read the file into memory, and then read the memory copy of that file into openpyxl. The second approach works even when the spreadsheet is still open, the first way will throw an error. This way the spreadsheet just needs to be saved in order to plot the data.

This next bit is important

Note that the code below will read the first 10000 lines of your spreadsheet, or it will stop when it finds the text ‘[end]’ in the first column (see the last few lines of code in the next section). Telling the code when to stop helps the process to run more quickly, but does introduce the risk of a bug: if you accidentally put the ‘[end]’ text in before the end of the data, then python will stop reading the file, and your data will be incomplete. If your data is more than 10000 lines, you need to change the code to suit.

This highlights your responsibility when coding - you must be fastidious in making sure the code is doing what you think it should. It can be complex but it’s crucial. There are whole industries built around it. Check and re check, and beware of the edge-cases that can make good-looking results be misleading.

# 3 ---- read the data _____________________________________________________________________ 
allTheWeights = []          # where we're storing the data we read from the spreadsheet
rowInSpreadsheet=2          # start reading from row 2

while True:
    wi = weightItem()       #   createan 'instance' of our class to store the weight info
    weight  = weightWorksheet.cell(row=rowInSpreadsheet, column=8).value            #   read the cell contents
    group   = weightWorksheet.cell(row=rowInSpreadsheet, column=1).value            #   read the cell contents

    if(weight is not None) and (weight > 0):                                        #   check - did we read any weight data?

        wi.weight = weight

        subgroup= weightWorksheet.cell(row=rowInSpreadsheet, column=2).value        #   read the cell contents
        item    = weightWorksheet.cell(row=rowInSpreadsheet, column=3).value        #

        wi.cg[0] = weightWorksheet.cell(row=rowInSpreadsheet, column=9).value       #   store LCG
        wi.cg[1] = weightWorksheet.cell(row=rowInSpreadsheet, column=10).value      #   store TCG
        wi.cg[2] = weightWorksheet.cell(row=rowInSpreadsheet, column=11).value      #   store VCG

        if group != None:               #   if we read a value for 'group'....
            wi.group = group            #   ...store it

        if subgroup != None:            #   if we read a value for 'subgroup'....
            wi.subgroup = subgroup      #   ...store it
    
        if item != None:                #   if we read a value for 'item'....
            wi.item = item              #   ...store it

        allTheWeights.append(wi)        #   add the data we read into our list of weight items                                        

    rowInSpreadsheet+=1                 #   move on to reading the next row

    if( rowInSpreadsheet > 10000 ):     #   STOP READING at line 10000
        break

    if group is not None and group=='[end]':  #   STOP READING if we find '[end]' in the first column
        break

The next few steps are the minimum needed to draw a picture - we’ll add to them later on.

# 4 ---- okay, now we've read all the data, let's draw pictures.

# 5 ---- plot configuration ---------------------------------------------------------- 
fig=plt.figure()
ax=fig.add_subplot()

# 6 ---- add the data ------------
weight = []
posLong = []
posTrans = []
posVert = []

for item in allTheWeights:
    weight.append(   item.weight )
    posLong.append(  item.cg[0] )
    posTrans.append( item.cg[1] )
    posVert.append(  item.cg[2] )
    
scatter = plt.scatter(posLong,posVert,s=weight,c=weight,label="Long/vert",cmap='nipy_spectral')  # see: https://matplotlib.org/3.1.1/gallery/shapes_and_collections/scatter.html#sphx-glr-gallery-shapes-and-collections-scatter-py

# 12 ---- show the plot
plt.show()

This should give you an image something like this:

What the code has done, has been to read through the spreadsheet, and for each item it has drawn a coloured circle.

The X-location of the circle is defined by the longitudinal position of the weight,
The Y-location is from the vertical position of the weight.
The diameter of the circle represents the item’s weight
The colour of the circle also represents the item’s weight

The actual drawing is done by the plt.scatter command. The first two parameters tell python what’s controlling our x and y data.

The ‘s=’ parameter tells python that we’d like the size of the circle to be proportional to the weight.

The ‘c=’ parameter tells python that we’d also like the colour of the circle to be proportional to the weight.

The ‘cmap=’ parameter tells python that we’d like to use the ‘nipy-spectral’ colour map. Check out matplotlib’s vast array of colourmaps to see what’s possible.

It may not hang in the Louvre, but it’s a solid start.

This should give an image something like this:

What it needs is grid lines, labels, scales and better control over the X and Y extents.

Make the following changes to your code: replace the code at section ‘5’ with:

# 5 ---- plot configuration ----------------------------------------------------------
fig=plt.figure(figsize=(12,9))              # make the plot bigger
ax=fig.add_subplot(1,1,1)
plt.style.use('seaborn-whitegrid')          # use a plot style: https://matplotlib.org/3.1.0/gallery/style_sheets/style_sheets_reference.html
cmapLines=plt.cm.get_cmap('nipy_spectral')  # get a 'colormap' - handy for specifying colours!

…and insert sections 7, 9, 10 and 11 before section 12 (section 8 comes later):

# 7 ---- add a legend with the unique colors from the scatter 
legend1 = ax.legend(*scatter.legend_elements(),loc="upper left", title="Weights",framealpha=0.5,frameon=True)      # see: https://matplotlib.org/3.1.1/gallery/lines_bars_and_markers/scatter_with_legend.html
ax.add_artist(legend1)

# 9 ---- plot tidy up ----------------------------------------------------------
ax.set_xlim(0,10)
ax.set_ylim(0,2)
ax.set_title('Weight estimate',fontsize=12)
ax.set_xlabel('Longitudinal position')
ax.set_ylabel('Vertical position')


# 10 ---- Change tick markers
ax.xaxis.set_major_locator(MultipleLocator(1))
ax.xaxis.set_minor_locator(MultipleLocator(0.2))
ax.yaxis.set_major_locator(MultipleLocator(0.2))
ax.yaxis.set_minor_locator(MultipleLocator(0.1))
ax.grid(which='major',color='#CCCCCC',linestyle='-')        # draw the grid
ax.grid(which='minor',color='#CCCCCC',linestyle=':')

# 11 ---- add a colour bar to show the scale
plt.colorbar()

Now our plot is looking better.

It has a ‘colorbar’ that shows the mapping between the weight of an item and its colour.

It also has a scale that shows similar mapping, in the upper left-hand corner.

The X and Y axes also now have more sensible maxima and minima.

Now for bonus points, we can get python to add annotations to interesting points in the data.

We can add code to look at all the data from the spreadsheet, and add annotations to the items we deem to be of interest. Add the following code between sections 7 and 9:

# 8 ---- annotate interesting items -------------------------------------------- 
for item in allTheWeights:
    if item.weight > 1000:
        col = cmapLines(0.5)
        arrowEndX = item.cg[0]
        arrowEndY = item.cg[2]
        tag = item.group + ": " + item.subgroup + ": " + item.item
        ax.annotate(tag,xy=(arrowEndX,arrowEndY), xycoords='data',xytext= (-150,30),textcoords='offset points',
            size=9,bbox=dict(boxstyle="square",fc=col,ec=col,alpha=0.2),arrowprops=dict(arrowstyle="->",ec=col))

What’s going on here is that the program iterates over the list of items, and any item whose weight is greater than 1000, gets a note added to it:

The note font, size, arrow style, transparency etc etc can be adjusted endlessly. Go down that rabbit hole here.

So now you’ve built the code, any time you need to review the estimate, simply re-run the code to get a visual representation of the data.

This final version of the code is on our GitHub repo https://github.com/ElanDesignStudio/2020_10_weightEstimate/blob/main/excelReader_003.py for your copy/paste convenience.

Final note

This but scratches the surface, but nonetheless I hope it’s a useful introduction to what’s possible. Don’t forget to always be checking your code and its outputs - debugging is a vital part of the process.

Check out the next post to see how this technique can be integrated directly into a CAD system.

As always, to comment on any of this or to find out more, hit the ‘contact’ page and get in touch!

All the best,

Nick.

Nick Stark