Pandas DataFrame merge() function is used to merge two DataFrame objects with a database-style join operation. The joining is performed on columns or indexes.
If the joining is done on columns, indexes are ignored. This function returns a new DataFrame and the source DataFrame objects are unchanged.
Pandas DataFrame merge() Function Syntax
The merge() function syntax is:
def merge(
self,
right,
how="inner",
on=None,
left_on=None,
right_on=None,
left_index=False,
right_index=False,
sort=False,
suffixes=("_x", "_y"),
copy=True,
indicator=False,
validate=None,
)
- right: The other DataFrame to merge with the source DataFrame.
- how: {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’. This is the most important parameter to define the merge operation type. These are similar to SQL left outer join, right outer join, full outer join, and inner join.
- on: Column or index level names to join on. These columns must be present in both the DataFrames. If not provided, the intersection of the columns in both DataFrames are used.
- left_on: Column or index level names to join on in the left DataFrame.
- right_on: Column or index level names to join on in the right DataFrame.
- left_index: Use the index from the left DataFrame as the join key(s).
- right_index: Use the index from the right DataFrame as the join key.
- sort: Sort the join keys lexicographically in the result DataFrame.
- suffixes: Suffix to apply to overlapping column names in the left and right side, respectively.
- indicator: If True, adds a column to output DataFrame called “_merge” with information on the source of each row.
- validate: used to validate the merge process. The valid values are {“one_to_one” or “1:1”, “one_to_many” or “1:m”, “many_to_one” or “m:1”, “many_to_many” or “m:m”}.
Pandas DataFrame merge() Examples
Let’s look at some examples of merging two DataFrame objects.
1. Default Merging – inner join
import pandas as pd
d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'Country': ['India', 'India', 'USA'], 'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)
print('DataFrame 1:n', df1)
df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})
print('DataFrame 2:n', df2)
df_merged = df1.merge(df2)
print('Result:n', df_merged)
Output:
DataFrame 1:
Name Country Role
0 Pankaj India CEO
1 Meghna India CTO
2 Lisa USA CTO
DataFrame 2:
ID Name
0 1 Pankaj
1 2 Anupam
2 3 Amit
Result:
Name Country Role ID
0 Pankaj India CEO 1
2. Merging DataFrames with Left, Right, and Outer Join
print('Result Left Join:n', df1.merge(df2, how='left'))
print('Result Right Join:n', df1.merge(df2, how='right'))
print('Result Outer Join:n', df1.merge(df2, how='outer'))
Output:
Result Left Join:
Name Country Role ID
0 Pankaj India CEO 1.0
1 Meghna India CTO NaN
2 Lisa USA CTO NaN
Result Right Join:
Name Country Role ID
0 Pankaj India CEO 1
1 Anupam NaN NaN 2
2 Amit NaN NaN 3
Result Outer Join:
Name Country Role ID
0 Pankaj India CEO 1.0
1 Meghna India CTO NaN
2 Lisa USA CTO NaN
3 Anupam NaN NaN 2.0
4 Amit NaN NaN 3.0
3. Merging DataFrame on Specific Columns
import pandas as pd
d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'ID': [1, 2, 3], 'Country': ['India', 'India', 'USA'],
'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})
print(df1.merge(df2, on='ID'))
print(df1.merge(df2, on='Name'))
Output:
Name_x ID Country Role Name_y
0 Pankaj 1 India CEO Pankaj
1 Meghna 2 India CTO Anupam
2 Lisa 3 USA CTO Amit
Name ID_x Country Role ID_y
0 Pankaj 1 India CEO 1
4. Specify Left and Right Columns for Merging DataFrame Objects
import pandas as pd
d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'ID1': [1, 2, 3], 'Country': ['India', 'India', 'USA'],
'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame({'ID2': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})
print(df1.merge(df2))
print(df1.merge(df2, left_on='ID1', right_on='ID2'))
Output;
Name ID1 Country Role ID2
0 Pankaj 1 India CEO 1
Name_x ID1 Country Role ID2 Name_y
0 Pankaj 1 India CEO 1 Pankaj
1 Meghna 2 India CTO 2 Anupam
2 Lisa 3 USA CTO 3 Amit
5. Using Index as the Join Keys for Merging DataFrames
import pandas as pd
d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'Country': ['India', 'India', 'USA'], 'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})
df_merged = df1.merge(df2)
print('Result Default Merge:n', df_merged)
df_merged = df1.merge(df2, left_index=True, right_index=True)
print('nResult Index Merge:n', df_merged)
Output:
Result Default Merge:
Name Country Role ID
0 Pankaj India CEO 1
Result Index Merge:
Name_x Country Role ID Name_y
0 Pankaj India CEO 1 Pankaj
1 Meghna India CTO 2 Anupam
2 Lisa USA CTO 3 Amit