What is a Pivot Table?
A pivot table is a table of statistics that summarizes the data of a more extensive table. The summary of data is reached through various aggregate functions – sum, average, min, max, etc.
A pivot table is a data processing technique to derive useful information from a table.
Pandas pivot_table() function
Pandas pivot_table() function is used to create pivot table from a DataFrame object. We can generate useful information from the DataFrame rows and columns. The pivot_table() function syntax is:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
def pivot_table( data, values=None, index=None, columns=None, aggfunc="mean", fill_value=None, margins=False, dropna=True, margins_name="All", observed=False, ) |
- data: the DataFrame instance from which pivot table is created.
- values: column to aggregate.
- index: the column to group by on the pivot table index.
- columns: the column to group by on the pivot table column.
- aggfunc: the aggregate function to run on the data, default is numpy.mean
- fill_value: value to replace null or missing value in the pivot table.
- margins: add all rows/columns. It’s useful in generating grand total of the records.
- dropna: don’t include columns whose entries are all NaN.
- margins_name: Name of the row / column that will contain the totals when margins is True.
- observed: This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
Pandas Pivot Table Examples
It’s better to use real-life data to understand the actual benefit of pivot tables. I have downloaded a sample CSV file from this link. Here is the direct download link for the CSV file.
The CSV file is a listing of 1,460 company funding records reported by TechCrunch. The below image shows the sample data from the file.
We are interested in the columns – ‘company’, ‘city’, ‘state’, ‘raisedAmt’, and ’round’. Let’s create some pivot tables to generate useful statistics from this data.
1. Simple Pivot Table Example
Let’s try to create a pivot table for average funding by the state.
1 2 3 4 5 6 7 8 9 |
import pandas as pd import numpy as np df = pd.read_csv('TechCrunchcontinentalUSA.csv', usecols=['company', 'city', 'state', 'raisedAmt', 'round']) print('DataFrame Records:n', df.head(6)) # average funding by State df1 = pd.pivot_table(df, values="raisedAmt", columns="state") print('nAverage Funding by State:n', df1) |
Output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
DataFrame Records: company city state raisedAmt round 0 LifeLock Tempe AZ 6850000 b 1 LifeLock Tempe AZ 6000000 a 2 LifeLock Tempe AZ 25000000 c 3 MyCityFaces Scottsdale AZ 50000 seed 4 Flypaper Phoenix AZ 3000000 a 5 Infusionsoft Gilbert AZ 9000000 a Average Funding by State: state AZ CA ... VA WA raisedAmt 5613750.0 1.072324e+07 ... 1.158261e+07 8.140103e+06 [1 rows x 33 columns] |
We can also call pivot_table() function directly on the DataFrame object. The above pivot table can be generated using the below code snippet too.
1 2 3 |
df1 = df.pivot_table(values="raisedAmt", columns="state") |
2. Pivot Table with Agreegate Function
The default aggregate function is numpy.mean
. We can specify the aggregate function as numpy.sum
to generate the total funding by the state.
1 2 3 4 |
df1 = pd.pivot_table(df, values="raisedAmt", columns="state", aggfunc=np.sum) print('nTotal Funding by State:n', df1) |
Output:
1 2 3 4 5 6 |
Total Funding by State: state AZ CA CO ... UT VA WA raisedAmt 50523750 9361385000 126470000 ... 153080000 266400000 789590000 [1 rows x 33 columns] |
3. Total Funding by Company
1 2 3 4 |
df1 = pd.pivot_table(df, values="raisedAmt", columns="company", aggfunc=np.sum) print('nTotal Funding by Company:n', df1) |
Output:
1 2 3 4 5 6 |
Total Funding by Company: company 23andMe 3Jam 4HomeMedia ... vbs tv x+1 xkoto raisedAmt 9000000 4000000 2850000 ... 10000000 16000000 7500000 [1 rows x 909 columns] |
4. Setting Index Column in the Pivot Table
Let’s try to create a pivot table for the average funding by round grouped by the state. The trick is to generate a pivot table with ’round’ as the index column.
1 2 3 4 |
df1 = pd.pivot_table(df, values="raisedAmt", columns="state", index='round') print('nAverage Funding by round in State:n', df1) |
Output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
Average Funding by round in State: state AZ CA ... VA WA round ... a 6.000000e+06 7.158314e+06 ... 9910000.0 6.570476e+06 angel 2.337500e+05 1.006784e+06 ... NaN 8.935714e+05 b 6.850000e+06 1.238483e+07 ... 9850000.0 1.187826e+07 c 2.500000e+07 2.369708e+07 ... 19500000.0 1.592222e+07 d NaN 3.012188e+07 ... 20000000.0 8.500000e+06 debt_round NaN 1.660833e+07 ... NaN NaN e NaN 3.132500e+07 ... NaN 2.200000e+07 seed 1.466667e+05 8.778214e+05 ... 350000.0 7.800000e+05 unattributed NaN 1.933000e+07 ... NaN 2.050000e+07 [9 rows x 33 columns] |
5. Replacing Null Values with a default value
1 2 3 4 |
df1 = pd.pivot_table(df, values="raisedAmt", columns="state", index='round', aggfunc=np.sum, fill_value=0) print('nTotal Funding by round in State:n', df1) |
Output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
Total Funding by round in State: state AZ CA CO ... UT VA WA round ... a 18000000 2526885000 25650000 ... 31800000 99100000 275960000 angel 233750 74502000 3950000 ... 0 0 12510000 b 6850000 2898050000 66900000 ... 67200000 68950000 273200000 c 25000000 2109040000 28850000 ... 54000000 78000000 143300000 d 0 963900000 0 ... 0 20000000 17000000 debt_round 0 199300000 500000 ... 0 0 0 e 0 250600000 0 ... 0 0 44000000 seed 440000 49158000 620000 ... 80000 350000 3120000 unattributed 0 289950000 0 ... 0 0 20500000 [9 rows x 33 columns] |
5. Multiple Index Columns Pivot Table Example
Let’s look at a more complex example. We will create a pivot table of total funding per company per round, state wise.
1 2 3 4 |
df1 = pd.pivot_table(df, values="raisedAmt", columns="state", index=['company', 'round'], aggfunc=np.sum, fill_value=0) print('nTotal Funding by company and round in State:n', df1) |
Output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
Total Funding by round in State: state AZ CA CO CT DC FL GA ... PA RI TN TX UT VA WA company round ... 23andMe a 0 9000000 0 0 0 0 0 ... 0 0 0 0 0 0 0 3Jam a 0 4000000 0 0 0 0 0 ... 0 0 0 0 0 0 0 4HomeMedia a 0 2850000 0 0 0 0 0 ... 0 0 0 0 0 0 0 5min a 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 angel 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 ... .. ... .. .. .. .. .. ... .. .. .. .. .. .. .. uber b 0 7600000 0 0 0 0 0 ... 0 0 0 0 0 0 0 utoopia seed 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 vbs tv seed 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 x+1 a 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 xkoto b 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 [1405 rows x 33 columns] |