Pandas pivot_table() - DataFrame Data Analysis With Examples

What is a Pivot Table?

A pivot table is a table of statistics that summarizes the data of a larger table. The summary is computed with aggregate functions such as sum, average, min, and max.

As a data processing technique, pivoting groups the rows of a table by one or more columns and reduces each group to a single value, which makes patterns in the data easier to see.

Pandas pivot_table() function

The pandas pivot_table() function creates a pivot table from a DataFrame object, summarizing its rows and columns. It accepts the following parameters:

  • data: the DataFrame from which the pivot table is created.
  • values: the column (or list of columns) to aggregate.
  • index: the column(s) to group by along the rows (index) of the pivot table.
  • columns: the column(s) to group by along the columns of the pivot table.
  • aggfunc: the aggregate function to apply to the data; the default computes the mean (numpy.mean).
  • fill_value: the value used to replace missing values (NaN) in the resulting pivot table.
  • margins: if True, add a row and a column of totals. It’s useful for generating the grand total of the records.
  • dropna: if True, don’t include columns whose entries are all NaN.
  • margins_name: the name of the row/column that contains the totals when margins is True.
  • observed: applies only if any of the groupers are Categoricals. If True, show only observed values for categorical groupers; if False, show all values for categorical groupers.
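
To make the parameters concrete, here is a minimal sketch that uses several of them in one call. The DataFrame is made-up sample data (not the TechCrunch file used later), and the label "Total" is an arbitrary choice for margins_name.

```python
import pandas as pd

# Made-up funding records, for illustration only.
df = pd.DataFrame({
    "state": ["CA", "CA", "WA", "WA"],
    "round": ["a", "b", "a", "a"],
    "raisedAmt": [1000000, 3000000, 2000000, 6000000],
})

table = pd.pivot_table(
    data=df,
    values="raisedAmt",    # column to aggregate
    index="state",         # one row per state
    columns="round",       # one column per funding round
    aggfunc="sum",         # total instead of the default mean
    fill_value=0,          # WA has no round "b" record, so show 0 instead of NaN
    margins=True,          # add a totals row and a totals column
    margins_name="Total",  # label for the totals row/column
)
print(table)
```

The totals row and column are computed from the underlying records, so the bottom-right cell holds the grand total of raisedAmt.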

Pandas Pivot Table Examples

It’s better to use real-life data to understand the actual benefit of pivot tables. I have downloaded a sample CSV file from this link. Here is the direct download link for the CSV file.

The CSV file lists 1,460 company funding records reported by TechCrunch. The image below shows sample data from the file.

[Image: sample data from the CSV file]

We are interested in the columns ‘company’, ‘city’, ‘state’, ‘raisedAmt’, and ‘round’. Let’s create some pivot tables to generate useful statistics from this data.

1. Simple Pivot Table Example

Let’s create a pivot table of the average funding by state.
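
As a minimal sketch of this step, the snippet below builds the pivot table on a small made-up DataFrame that only borrows the column names of the CSV file; the values are not real TechCrunch data. Passing ‘state’ as columns (rather than index) is an assumption.

```python
import pandas as pd

# Made-up records with the same column names as the CSV file.
df = pd.DataFrame({
    "company": ["Alpha", "Alpha", "Beta", "Gamma"],
    "state": ["CA", "CA", "WA", "WA"],
    "round": ["a", "b", "a", "a"],
    "raisedAmt": [1000000, 3000000, 2000000, 6000000],
})

# aggfunc is omitted, so the default (mean) is used.
avg_by_state = pd.pivot_table(df, values="raisedAmt", columns="state")
print(avg_by_state)
```

With only columns given, the result has a single row labelled raisedAmt and one column per state.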

Output:

We can also call pivot_table() directly on the DataFrame object as a method; it produces the same pivot table as the module-level function.
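
A short sketch comparing the two call styles, again on made-up sample data rather than the real CSV file:

```python
import pandas as pd

# Made-up records, for illustration only.
df = pd.DataFrame({
    "state": ["CA", "CA", "WA", "WA"],
    "raisedAmt": [1000000, 3000000, 2000000, 6000000],
})

via_function = pd.pivot_table(df, values="raisedAmt", columns="state")
via_method = df.pivot_table(values="raisedAmt", columns="state")

# Both calls build the same table.
print(via_method.equals(via_function))
```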

2. Pivot Table with Aggregate Function

The default aggregate function is numpy.mean. We can set aggfunc to numpy.sum (the string 'sum' is also accepted) to generate the total funding by state.
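
A minimal sketch of the total-by-state table on made-up sample data. It passes the string "sum" rather than numpy.sum, which is a choice made here because recent pandas versions may warn when given the NumPy callable.

```python
import pandas as pd

# Made-up records, for illustration only.
df = pd.DataFrame({
    "state": ["CA", "CA", "WA", "WA"],
    "raisedAmt": [1000000, 3000000, 2000000, 6000000],
})

total_by_state = pd.pivot_table(
    df, values="raisedAmt", columns="state", aggfunc="sum"
)
print(total_by_state)
```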

Output:

3. Total Funding by Company
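
The same approach gives the total funding per company. This is a sketch on made-up company names and amounts; grouping through the columns argument is an assumption, and index="company" would give one row per company instead.

```python
import pandas as pd

# Made-up records; "Alpha" has two funding rounds.
df = pd.DataFrame({
    "company": ["Alpha", "Alpha", "Beta", "Gamma"],
    "raisedAmt": [1000000, 3000000, 2000000, 6000000],
})

total_by_company = pd.pivot_table(
    df, values="raisedAmt", columns="company", aggfunc="sum"
)
print(total_by_company)
```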

Output:

4. Setting Index Column in the Pivot Table

Let’s create a pivot table of the average funding by round, broken down by state. The trick is to use ‘round’ as the index column, so that each round becomes a row and each state a column.
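
A minimal sketch on made-up sample data, with ‘round’ as the index and ‘state’ as the columns. The data deliberately has no round "b" record for WA, so that cell comes out as NaN.

```python
import pandas as pd

# Made-up records, for illustration only.
df = pd.DataFrame({
    "state": ["CA", "CA", "WA", "WA"],
    "round": ["a", "b", "a", "a"],
    "raisedAmt": [1000000, 3000000, 2000000, 6000000],
})

avg_by_round = pd.pivot_table(
    df, values="raisedAmt", index="round", columns="state"
)
print(avg_by_round)
```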

Output:

5. Replacing Null Values with a Default Value
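
Combinations of index and column values that have no records show up as NaN in the pivot table. The fill_value parameter replaces them; the sketch below uses 0 as the default, on made-up sample data.

```python
import pandas as pd

# Made-up records; there is no round "b" record for WA.
df = pd.DataFrame({
    "state": ["CA", "CA", "WA", "WA"],
    "round": ["a", "b", "a", "a"],
    "raisedAmt": [1000000, 3000000, 2000000, 6000000],
})

filled = pd.pivot_table(
    df, values="raisedAmt", index="round", columns="state", fill_value=0
)
print(filled)
```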

Output:

6. Multiple Index Columns Pivot Table Example

Let’s look at a more complex example. We will create a pivot table of the total funding per company per round, broken down by state.
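
One way to sketch this is to pass a list of columns as the index, which produces a hierarchical (MultiIndex) row index. The data below is made up, and filling the empty cells with 0 is a choice made for readability.

```python
import pandas as pd

# Made-up records, for illustration only.
df = pd.DataFrame({
    "company": ["Alpha", "Alpha", "Beta", "Gamma"],
    "state": ["CA", "CA", "WA", "WA"],
    "round": ["a", "b", "a", "a"],
    "raisedAmt": [1000000, 3000000, 2000000, 6000000],
})

totals = pd.pivot_table(
    df,
    values="raisedAmt",
    index=["company", "round"],  # two-level row index
    columns="state",
    aggfunc="sum",
    fill_value=0,
)
print(totals)
```

Each row is identified by a (company, round) pair, so a single cell can be read with totals.loc[("Alpha", "b"), "CA"].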

Output:


By admin
