Pivoting in Microsoft Excel using Python

Author: Stratos Matzouranis

My journey began in 2008, when I officially started working in information technology (IT). From my first semester of studies I was drawn to databases and automation. I have worked with databases such as Microsoft SQL Server and Oracle Database, and with data analysis and automation using the command line (CLI), Visual Basic for Applications and Python. Over the years I have developed these skills to make my own work easier. In my view, the goal of every IT professional and every office worker is to know their tools well enough to produce a lot with little effort. Through this website, DataPlatform.gr, I try to share knowledge and propose solutions to everyday problems.

In an earlier article we saw how to perform Excel functions such as vlookup with Python. In this article we will see how to pivot data with Python.

We will look at ways to answer questions such as which customer made the most expensive purchase, or what total amount was spent on each product.

Let's see the steps one by one, starting with the resources.

We have an Excel file with the total sales named sales.xlsx.

There is a second Excel file with the customer names called pelates.xlsx.

Apart from Python itself, we need the following libraries, which can be installed by running these commands in the command prompt. Note that pandas reads .xlsx files through openpyxl; xlrd is only needed for legacy .xls files.

pip install xlrd
pip install pandas
pip install numpy
pip install openpyxl

We import the libraries.

import pandas as pd
import numpy as np

We load the records of each Excel file into its own dataframe. With the sheet_name parameter we can also choose the sheet where the data is located.

df_pelates = pd.read_excel('pelates.xlsx',sheet_name='Sheet1')

df_sales = pd.read_excel('sales.xlsx',sheet_name='Sheet1')
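For readers who do not have the two workbooks at hand, the following is a minimal sketch of stand-in dataframes built in memory. The column names are the ones used by the code in this article; every value is invented for illustration and does not come from the original files.

```python
import pandas as pd

# Hypothetical stand-in for pelates.xlsx (customers); all values are invented.
df_pelates = pd.DataFrame({
    'id': [1, 2, 3],
    'onoma': ['Maria', 'Nikos', 'Eleni'],
    'epitheto': ['Papadopoulou', 'Georgiou', 'Ioannou'],
})

# Hypothetical stand-in for sales.xlsx; customer 4 deliberately has no
# matching row in df_pelates, so the effect of the join type becomes visible.
df_sales = pd.DataFrame({
    'customer_id': [1, 1, 2, 4],
    'eidos': ['laptop', 'mouse', 'laptop', 'keyboard'],
    'posotita': [1, 2, 1, 3],
    'kostos': [800.0, 15.0, 800.0, 40.0],
})

print(df_pelates)
print(df_sales)
```

With these in place of the two read_excel calls, the remaining steps can be followed without the Excel files.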

We should rename the id field to customer_id so that it has the same name as it has in the sales dataframe.

df_pelates.rename(columns={'id':'customer_id'}, inplace=True)

This step is the equivalent of vlookup: we merge the two dataframes into a new one (df_final) on the customer_id field. The how parameter declares the type of join, as in SQL; the options include left, right, outer and inner.

With how='right' we declare that we want all the records from the second dataframe (df_sales), together with the matching records from the first (df_pelates).

If a customer_id in the sales has no match among the customers, the customer fields of that row get the value NaN (null), which appears as an empty cell in Excel.

df_final = pd.merge(df_pelates, df_sales, on='customer_id', how='right')
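To check which sales found no matching customer, one option is the indicator parameter of pd.merge, which adds a _merge column describing where each row came from. This is a sketch on invented data; the customers and sales dataframes here are hypothetical and much smaller than the real files.

```python
import pandas as pd

# Invented sample data: customer 4 exists only in the sales.
customers = pd.DataFrame({'customer_id': [1, 2], 'onoma': ['Maria', 'Nikos']})
sales = pd.DataFrame({'customer_id': [1, 2, 4], 'kostos': [800.0, 15.0, 40.0]})

# indicator=True adds a '_merge' column with the values
# 'both', 'left_only' or 'right_only' for each row.
merged = pd.merge(customers, sales, on='customer_id', how='right', indicator=True)

# Rows that exist only in the right (sales) dataframe have NaN customer fields.
unmatched = merged[merged['_merge'] == 'right_only']

print(merged)
print(unmatched)
```

Dropping or reviewing these rows before aggregating is a design choice; by default groupby leaves out rows whose grouping keys are NaN.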

Let's see the fields that the merged dataframe now has.

print(df_final.columns)

We can also choose which of these fields to keep. We keep customer_id as well, because we will group by it in the next steps.

df_final = df_final[['customer_id','onoma','epitheto','eidos','posotita','kostos']]

The result so far is this.

print(df_final)

Now we calculate the amount spent on each purchase by multiplying the quantity by the cost.

df_final['kostos_agoras'] = df_final['posotita']*df_final['kostos']

At this point, however, we do not know how much money each customer has spent. With one line of code we can group by customer and get the total for each. We store the result in a new dataframe (df_total), so that df_final keeps the individual purchases for the examples that follow.

df_total = df_final.groupby(['customer_id','epitheto','onoma'],as_index=False)['kostos_agoras'].sum()

Let's rename the sum field to Sinolikes_agores ("total purchases").

df_total.rename(columns={'kostos_agoras':'Sinolikes_agores'}, inplace=True)

Let's see what we did.

df_total = df_total[['onoma','epitheto','Sinolikes_agores']]

print(df_total)

Instead of sum we can use another function, such as max, to find the largest single purchase made by each customer. This must be applied to the dataframe of individual purchases (the one with the kostos_agoras field), not to an already aggregated result.

df_max = df_final.groupby(['customer_id','epitheto','onoma'],as_index=False)['kostos_agoras'].max()
df_max.rename(columns={'kostos_agoras':'Megisti_agora'}, inplace=True)
df_max = df_max[['onoma','epitheto','Megisti_agora']]

print(df_max)
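To answer the question of which customer made the single most expensive purchase, one possible approach is idxmax, which returns the index label of the largest value in a column. The sketch below uses an invented purchases dataframe; only the column names follow the article.

```python
import pandas as pd

# Invented sample of individual purchases.
purchases = pd.DataFrame({
    'onoma': ['Maria', 'Maria', 'Nikos'],
    'epitheto': ['Papadopoulou', 'Papadopoulou', 'Georgiou'],
    'kostos_agoras': [800.0, 30.0, 950.0],
})

# idxmax gives the row label of the largest kostos_agoras;
# loc then retrieves that whole row as a Series.
top = purchases.loc[purchases['kostos_agoras'].idxmax()]

print(top['onoma'], top['epitheto'], top['kostos_agoras'])
```

If several purchases tie for the maximum, idxmax returns only the first one, so sorting with sort_values may be preferable when ties matter.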

We can also count how many purchases each customer has made (count), again working on the dataframe of individual purchases.

df_count = df_final.groupby(['customer_id','epitheto','onoma'],as_index=False)['kostos_agoras'].count()
df_count.rename(columns={'kostos_agoras':'plithos_agorwn'}, inplace=True)
df_count = df_count[['onoma','epitheto','plithos_agorwn']]
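The sum, max and count steps can also be computed in a single pass. The following is a minimal sketch using pandas named aggregation on invented data; the purchases dataframe and its values are hypothetical, while the output column names are the ones used in this article.

```python
import pandas as pd

# Invented sample of individual purchases.
purchases = pd.DataFrame({
    'customer_id': [1, 1, 2],
    'epitheto': ['Papadopoulou', 'Papadopoulou', 'Georgiou'],
    'onoma': ['Maria', 'Maria', 'Nikos'],
    'kostos_agoras': [800.0, 30.0, 950.0],
})

# Each keyword argument names an output column and maps it to
# (input column, aggregation function).
summary = (
    purchases
    .groupby(['customer_id', 'epitheto', 'onoma'], as_index=False)
    .agg(
        Sinolikes_agores=('kostos_agoras', 'sum'),
        Megisti_agora=('kostos_agoras', 'max'),
        plithos_agorwn=('kostos_agoras', 'count'),
    )
)

print(summary)
```

This avoids the separate rename calls, since the result columns are named directly in the agg call.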

We can also group by product, to get the total amount spent on each one. Here too the grouping is applied to the individual purchases.

df_final = df_final.groupby(['eidos'],as_index=False)['kostos_agoras'].sum()

print(df_final)
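For a layout closer to an Excel pivot table, with customers as rows and products as columns, pandas also offers pd.pivot_table. This is a sketch on invented data, not the method used above; the purchases dataframe and its values are hypothetical.

```python
import pandas as pd

# Invented sample of individual purchases.
purchases = pd.DataFrame({
    'epitheto': ['Papadopoulou', 'Papadopoulou', 'Georgiou', 'Georgiou'],
    'eidos': ['laptop', 'mouse', 'laptop', 'keyboard'],
    'kostos_agoras': [800.0, 30.0, 800.0, 120.0],
})

# Rows: customers, columns: products, cells: total amount spent.
# fill_value=0 replaces the NaN of combinations that never occurred.
pivot = pd.pivot_table(
    purchases,
    index='epitheto',
    columns='eidos',
    values='kostos_agoras',
    aggfunc='sum',
    fill_value=0,
)

print(pivot)
```

The result is a regular dataframe, so it can be written to Excel with to_excel like any other; keep the index in that case, because it holds the customer names.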

Final result

Finally we can save the results in a new Excel file.

df_final.to_excel('pivoting_with_python.xlsx', index=False)
