The Pandas read_csv() function reads a CSV file into a DataFrame object. A CSV file is like a two-dimensional table in which the values in each row are separated by a delimiter.
1. Pandas read_csv() Example
Let’s say we have a CSV file “employees.csv” with the following content.
Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
Let’s see how to read it into a DataFrame using Pandas read_csv() function.
import pandas
emp_df = pandas.read_csv('employees.csv')
print(emp_df)
Output:
Emp ID Emp Name Emp Role
0 1 Pankaj Kumar Admin
1 2 David Lee Editor
2 3 Lisa Ray Author
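read_csv() also accepts any file-like object, so you can try the example without creating employees.csv on disk. The sketch below assumes the same three rows held in an io.StringIO buffer that stands in for the file. It checks the shape, the column labels taken from the header row, and the integer type inferred for the Emp ID column.

```python
import io

import pandas

# In-memory stand-in for employees.csv (same content as above)
csv_text = """Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
"""

emp_df = pandas.read_csv(io.StringIO(csv_text))

print(emp_df.shape)          # (rows, columns)
print(list(emp_df.columns))  # the header row becomes the column labels
print(emp_df.dtypes)         # 'Emp ID' is inferred as an integer column
```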
2. Specifying Delimiter with Pandas read_csv() function
By default, read_csv() expects a comma as the delimiter, but we can use any other delimiter too. Let’s say our CSV file uses # as the delimiter.
Emp ID#Emp Name#Emp Role
1#Pankaj Kumar#Admin
2#David Lee#Editor
3#Lisa Ray#Author
In this case, we can specify the sep parameter when calling the read_csv() function.
import pandas
emp_df = pandas.read_csv('employees.csv', sep='#')
print(emp_df)
Output:
Emp ID Emp Name Emp Role
0 1 Pankaj Kumar Admin
1 2 David Lee Editor
2 3 Lisa Ray Author
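To see why sep matters, here is a minimal sketch, assuming the #-delimited content above is held in an io.StringIO buffer instead of a file. Without sep, no commas are found, so each line lands in a single column. It also uses delimiter, which read_csv() accepts as an alias for sep.

```python
import io

import pandas

# In-memory stand-in for the #-delimited employees.csv
csv_text = """Emp ID#Emp Name#Emp Role
1#Pankaj Kumar#Admin
2#David Lee#Editor
3#Lisa Ray#Author
"""

# Forgetting sep: the default comma never matches, so each line is one field
wrong_df = pandas.read_csv(io.StringIO(csv_text))

# A single-character sep is handled by the default C engine
emp_df = pandas.read_csv(io.StringIO(csv_text), sep='#')

# 'delimiter' is an alias for 'sep' and produces the same DataFrame
same_df = pandas.read_csv(io.StringIO(csv_text), delimiter='#')

print(wrong_df.shape, emp_df.shape, emp_df.equals(same_df))
```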
3. Reading only specific Columns from the CSV File
We can specify the usecols parameter to read specific columns from the CSV file. This is very helpful when the CSV file has many columns but we are interested in only a few of them.
import pandas
emp_df = pandas.read_csv('employees.csv', usecols=['Emp Name', 'Emp Role'])
print(emp_df)
Output:
Emp Name Emp Role
0 Pankaj Kumar Admin
1 David Lee Editor
2 Lisa Ray Author
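Two details of usecols are easy to miss. First, the order in which you list the columns is ignored, and they come back in file order. Second, usecols also accepts a callable that is evaluated against each column name. The sketch below assumes the original comma-separated content in an io.StringIO buffer.

```python
import io

import pandas

# In-memory stand-in for the original employees.csv
csv_text = """Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
"""

# Listing 'Emp Role' first does not change the order: file order wins
file_order = pandas.read_csv(io.StringIO(csv_text), usecols=['Emp Role', 'Emp Name'])

# Re-select with [[...]] when a specific column order is needed
role_first = file_order[['Emp Role', 'Emp Name']]

# A callable keeps every column for which it returns True
no_id = pandas.read_csv(io.StringIO(csv_text), usecols=lambda name: name != 'Emp ID')

print(list(file_order.columns), list(role_first.columns), list(no_id.columns))
```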
4. Reading a CSV File without a Header Row
A CSV file doesn’t have to contain a header row. If the CSV file doesn’t have one, we can still read it by passing header=None to the read_csv() function.
Let’s say our employees.csv file has the following content.
1,Pankaj Kumar,Admin
2,David Lee,Editor
Let’s see how to read this CSV file into a DataFrame object.
import pandas
emp_df = pandas.read_csv('employees.csv', header=None)
print(emp_df)
Output:
0 1 2
0 1 Pankaj Kumar Admin
1 2 David Lee Editor
Notice that the column labels are auto-assigned as integers from 0 to N-1, where N is the number of columns. We can pass these integer labels in the usecols parameter to read specific columns.
import pandas
emp_df = pandas.read_csv('employees.csv', header=None, usecols=[1])
print(emp_df)
Output:
1
0 Pankaj Kumar
1 David Lee
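If the default integer labels are inconvenient, the names parameter lets you supply your own column labels for a header-less file. The sketch below assumes the two-row, header-less content above in an io.StringIO buffer. It also shows that with header=None, the positions passed in usecols become the column labels.

```python
import io

import pandas

# In-memory stand-in for the header-less employees.csv
csv_text = """1,Pankaj Kumar,Admin
2,David Lee,Editor
"""

# Provide our own labels instead of the default 0..N-1
named_df = pandas.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=['Emp ID', 'Emp Name', 'Emp Role'],
)

# Without names, the selected positions (0 and 2) are kept as the labels
pos_df = pandas.read_csv(io.StringIO(csv_text), header=None, usecols=[0, 2])

print(list(named_df.columns), list(pos_df.columns))
```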
5. Specifying the Header Row
We can also specify which row contains the header. Any rows before the header row are discarded. Let’s say the CSV file has the following data.
# some random data
invalid data
Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
The header data is in the 3rd row, which has index 2 because rows are counted from 0. So we pass header=2 to read the CSV data from the file.
import pandas
emp_df = pandas.read_csv('employees.csv', header=2)
print(emp_df)
Output:
Emp ID Emp Name Emp Role
0 1 Pankaj Kumar Admin
1 2 David Lee Editor
2 3 Lisa Ray Author
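An equivalent approach is to skip the leading lines with skiprows and let read_csv() infer the header from the first remaining line. The sketch below assumes the content above in an io.StringIO buffer and checks that both calls produce the same DataFrame.

```python
import io

import pandas

# In-memory stand-in for the file with two junk lines before the header
csv_text = """# some random data
invalid data
Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
"""

# Use line index 2 as the header; earlier lines are discarded
by_header = pandas.read_csv(io.StringIO(csv_text), header=2)

# Drop the first two lines, then the next line is inferred as the header
by_skip = pandas.read_csv(io.StringIO(csv_text), skiprows=2)

print(by_header.equals(by_skip))
```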
6. Skipping CSV Rows
We can pass the skiprows parameter to skip rows from the CSV file. Let’s say we want to skip the 3rd and 4th lines of our original comma-separated file. Line numbers are 0-based and the header is line 0, so these are lines 2 and 3.
import pandas
emp_df = pandas.read_csv('employees.csv', skiprows=[2, 3])
print(emp_df)
Output:
Emp ID Emp Name Emp Role
0 1 Pankaj Kumar Admin
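Besides a list of line numbers, skiprows also accepts a callable that receives each 0-based line number and returns True for lines to skip. The sketch below assumes the original comma-separated content in an io.StringIO buffer. The filtering rule, which skips even-numbered lines after the header, is only an illustrative choice.

```python
import io

import pandas

# In-memory stand-in for the original employees.csv
csv_text = """Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
"""

# Same as the example above: skip file lines 2 and 3
by_list = pandas.read_csv(io.StringIO(csv_text), skiprows=[2, 3])

# Skip even-numbered lines but keep line 0 (the header);
# with this file, only line 2 (David Lee) is skipped
by_func = pandas.read_csv(
    io.StringIO(csv_text),
    skiprows=lambda i: i > 0 and i % 2 == 0,
)

print(by_list['Emp Name'].tolist(), by_func['Emp Name'].tolist())
```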
7. Specifying Parser Engine for Pandas read_csv() function
Let’s say our CSV file uses ‘##’, a multi-character string, as the delimiter.
Emp ID##Emp Name##Emp Role
1##Pankaj Kumar##Admin
2##David Lee##Editor
3##Lisa Ray##Author
Let’s see what happens when we try to read this CSV file.
import pandas
emp_df = pandas.read_csv('employees.csv', sep='##')
print(emp_df)
Output:
/Users/pankaj/Documents/PycharmProjects/AskPython/hello-world/journaldev/pandas/pandas_read_csv.py:5: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine="python".
emp_df = pandas.read_csv('employees.csv', sep='##')
Emp ID Emp Name Emp Role
0 1 Pankaj Kumar Admin
1 2 David Lee Editor
2 3 Lisa Ray Author
We can avoid the warning by specifying the ‘engine’ parameter in the read_csv() function.
emp_df = pandas.read_csv('employees.csv', sep='##', engine="python")
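read_csv() has two main parser engines, c and python. The C engine is the default and is faster, while the python engine is more feature-complete; for example, it supports regular-expression separators such as '##'. Pandas 1.4 and later also offer an experimental pyarrow engine.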
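One way to confirm that engine="python" suppresses the warning is to record warnings around the call. The sketch below assumes the ##-delimited content above in an io.StringIO buffer. It filters the recorded warnings for pandas.errors.ParserWarning, the category shown in the output earlier.

```python
import io
import warnings

import pandas

# In-memory stand-in for the ##-delimited employees.csv
csv_text = """Emp ID##Emp Name##Emp Role
1##Pankaj Kumar##Admin
2##David Lee##Editor
3##Lisa Ray##Author
"""

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning instead of printing it
    emp_df = pandas.read_csv(io.StringIO(csv_text), sep='##', engine='python')

# Keep only parser warnings; none are expected because the engine was explicit
parser_warnings = [
    w for w in caught if issubclass(w.category, pandas.errors.ParserWarning)
]

print(len(parser_warnings), emp_df.shape)
```

Because a multi-character sep is treated as a regular expression, a delimiter that contains regex metacharacters, such as '||', should be escaped, for example with re.escape('||').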