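The pandas DataFrame append() function adds the rows of another DataFrame object to the end of the caller. It returns a new DataFrame object and doesn’t change the source objects. If the columns don’t match, the new columns are added to the result DataFrame and the missing cells are filled with NaN. Note that append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the examples here apply to earlier versions.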
1. Pandas DataFrame append() Parameters
The append() function syntax is:
append(other, ignore_index=False, verify_integrity=False, sort=None)
- other: The DataFrame, Series or Dict-like object whose rows will be added to the caller DataFrame.
- ignore_index: if True, the indexes from the source DataFrame objects are ignored.
- verify_integrity: if True, raise ValueError if the resulting index contains duplicates.
- sort: whether to sort the columns when the columns of the source DataFrames are not aligned. Sorting by default is deprecated, so we have to pass sort=True to keep sorting and silence the warning message. If sort=False is passed, the columns are not sorted and no warning is shown.
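Because append() is not available in pandas 2.0 and later, it can help to see how the same four parameters map onto pd.concat(). The following is a minimal sketch, not the library's own implementation: the helper name append_rows is hypothetical, and it assumes that other is a DataFrame (Series and dict inputs are not handled).

```python
import pandas as pd


def append_rows(df, other, ignore_index=False, verify_integrity=False, sort=False):
    """Hypothetical stand-in for df.append(other, ...) built on pd.concat().

    Assumes `other` is a DataFrame; Series and dict inputs are not handled.
    """
    return pd.concat(
        [df, other],
        ignore_index=ignore_index,
        verify_integrity=verify_integrity,
        sort=sort,
    )


df1 = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['David'], 'ID': [3]})

# A new DataFrame is returned; df1 and df2 are left unchanged.
result = append_rows(df1, df2, ignore_index=True)
print(result)
```

The keyword names are kept identical to those of append() so that existing calls can be moved over with little change.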
Let’s look into some examples of the DataFrame append() function.
2. Appending Two DataFrames
import pandas as pd
df1 = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['David'], 'ID': [3]})
print(df1)
print(df2)
df3 = df1.append(df2)
print('\nResult DataFrame:\n', df3)
Output:
Name ID
0 Pankaj 1
1 Lisa 2
Name ID
0 David 3
Result DataFrame:
Name ID
0 Pankaj 1
1 Lisa 2
0 David 3
3. Appending and Ignoring DataFrame Indexes
If you look at the previous example, the output contains duplicate indexes. We can pass ignore_index=True to discard the source indexes and assign a new index to the output DataFrame.
df3 = df1.append(df2, ignore_index=True)
print(df3)
Output:
Name ID
0 Pankaj 1
1 Lisa 2
2 David 3
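The renumbering can also be done after the fact. The sketch below uses pd.concat(), which replaces append() in pandas 2.0 and later, and compares passing ignore_index=True with calling reset_index(drop=True) on a result that still carries duplicate indexes. The variable names are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['David'], 'ID': [3]})

# Keeps the source indexes, so the labels are 0, 1, 0.
stacked = pd.concat([df1, df2])

# Option 1: drop the old labels afterwards and number the rows from 0.
renumbered = stacked.reset_index(drop=True)

# Option 2: ask for a fresh index while concatenating.
direct = pd.concat([df1, df2], ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
print(list(direct.index))
```

Both options give rows labelled 0, 1, 2; passing ignore_index=True simply avoids the intermediate object.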
4. Raising ValueError for Duplicate Indexes
We can pass verify_integrity=True to raise ValueError if the two DataFrame objects have overlapping index values.
import pandas as pd
df1 = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['David'], 'ID': [3]})
df3 = df1.append(df2, verify_integrity=True)
Output:
ValueError: Indexes have overlapping values: Int64Index([0], dtype='int64')
Let’s look at another example where we don’t have duplicate indexes.
import pandas as pd
df1 = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]}, index=[100, 200])
df2 = pd.DataFrame({'Name': ['David'], 'ID': [3]}, index=[300])
df3 = df1.append(df2, verify_integrity=True)
print(df3)
Output:
Name ID
100 Pankaj 1
200 Lisa 2
300 David 3
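In practice the ValueError is usually caught rather than left to stop the program. Here is a minimal sketch of that pattern, written with pd.concat() (the replacement for append() in pandas 2.0 and later, which accepts the same verify_integrity flag). Falling back to ignore_index=True is an assumption about what the caller wants, not something the library does on its own.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['David'], 'ID': [3]})

try:
    # Both frames use the default index, so label 0 appears twice.
    result = pd.concat([df1, df2], verify_integrity=True)
    overlap_detected = False
except ValueError as exc:
    overlap_detected = True
    print('Overlapping index values:', exc)
    # Assumed fallback: discard the old labels and renumber the rows.
    result = pd.concat([df1, df2], ignore_index=True)

print(result)
```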
5. Appending DataFrame objects with Non-Matching Columns
import pandas as pd
df1 = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['Pankaj', 'David'], 'ID': [1, 3], 'Role': ['CEO', 'Author']})
df3 = df1.append(df2, sort=False)
print(df3)
Output:
Name ID Role
0 Pankaj 1 NaN
1 Lisa 2 NaN
0 Pankaj 1 CEO
1 David 3 Author
We are explicitly passing sort=False to avoid sorting the columns and to suppress the FutureWarning. If you don’t pass this parameter, the output will contain the following warning message.
FutureWarning: Sorting because the non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=False'.
To retain the current behavior and silence the warning, pass 'sort=True'.
Let’s see what happens when we pass sort=True.
import pandas as pd
df1 = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['Pankaj', 'David'], 'ID': [1, 3], 'Role': ['CEO', 'Author']})
df3 = df1.append(df2, sort=True)
print(df3)
Output:
ID Name Role
0 1 Pankaj NaN
1 2 Lisa NaN
0 1 Pankaj CEO
1 3 David Author
Notice that the columns are sorted in the result DataFrame object. Note that it is the implicit sorting (when sort is not passed) that is deprecated; passing sort=True explicitly is how you keep the sorted column order.
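The effect of the sort flag is easiest to see by comparing the column order directly. This sketch uses pd.concat(), which replaces append() in pandas 2.0 and later and takes the same sort parameter; the variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['Pankaj', 'David'], 'ID': [1, 3], 'Role': ['CEO', 'Author']})

# sort=False keeps the columns in order of first appearance.
unsorted_cols = pd.concat([df1, df2], sort=False)

# sort=True orders the column labels alphabetically.
sorted_cols = pd.concat([df1, df2], sort=True)

print(list(unsorted_cols.columns))  # ['Name', 'ID', 'Role']
print(list(sorted_cols.columns))    # ['ID', 'Name', 'Role']
```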
Let’s look at another example where we have non-matching columns with int values.
import pandas as pd
df1 = pd.DataFrame({'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['Pankaj', 'Lisa']})
df3 = df1.append(df2, sort=False)
print(df3)
Output:
ID Name
0 1.0 NaN
1 2.0 NaN
0 NaN Pankaj
1 NaN Lisa
Notice that the ID values are converted to floating-point numbers, because an integer column cannot hold NaN values.
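If the IDs need to stay integers, one option is to convert the column to the nullable Int64 dtype after combining the frames. This is a sketch of that workaround rather than part of append() itself, and it uses pd.concat() because append() is not available in pandas 2.0 and later.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['Pankaj', 'Lisa']})

df3 = pd.concat([df1, df2], sort=False)
print(df3['ID'].dtype)  # float64, because of the NaN values

# Nullable integer dtype: whole numbers are kept and gaps become <NA>.
ids = df3['ID'].astype('Int64')
print(ids.dtype)  # Int64
print(ids)
```

Note the capital "I" in 'Int64'; the lowercase 'int64' dtype cannot represent missing values.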