Pandas read_csv()

The pandas read_csv() function reads a CSV file into a DataFrame object. A CSV file is like a two-dimensional table in which the values are separated by a delimiter.

1. Pandas read_csv() Example

Let’s say we have a CSV file “employees.csv” with the following content.


Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author

Let’s see how to read it into a DataFrame using the pandas read_csv() function.


import pandas
emp_df = pandas.read_csv('employees.csv')
print(emp_df)

Output:


   Emp ID      Emp Name Emp Role
0       1  Pankaj Kumar    Admin
1       2     David Lee   Editor
2       3      Lisa Ray   Author
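
To try this without creating a file on disk, here is a minimal sketch that feeds the same sample data to read_csv() through an in-memory buffer. The csv_text variable and the use of io.StringIO are our own additions for illustration, not part of the original example.

```python
import io

import pandas

# The same three-row sample used above, held in memory instead of a file.
csv_text = """Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
"""

# read_csv() accepts a file path or any file-like object with a read() method.
emp_df = pandas.read_csv(io.StringIO(csv_text))

print(emp_df.shape)          # (number of rows, number of columns)
print(list(emp_df.columns))  # column labels taken from the header row
print(emp_df.dtypes)         # type inferred for each column
```

The shape and dtypes attributes are a quick way to check that the file was parsed as expected.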

2. Specifying Delimiter with Pandas read_csv() function

By default, read_csv() expects a comma as the delimiter, but a CSV file can use any other delimiter too. Let’s say our CSV file delimiter is #.


Emp ID#Emp Name#Emp Role
1#Pankaj Kumar#Admin
2#David Lee#Editor
3#Lisa Ray#Author

In this case, we can specify the sep parameter while calling the read_csv() function.


import pandas
emp_df = pandas.read_csv('employees.csv', sep='#')
print(emp_df)

Output:


   Emp ID      Emp Name Emp Role
0       1  Pankaj Kumar    Admin
1       2     David Lee   Editor
2       3      Lisa Ray   Author
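
To see why the sep parameter matters, the following sketch parses the same #-delimited text twice, once with sep='#' and once with the default comma. The hash_text and wrong_df names and the in-memory buffer are assumptions made for this illustration.

```python
import io

import pandas

# '#'-delimited sample data, kept in memory for the demonstration.
hash_text = """Emp ID#Emp Name#Emp Role
1#Pankaj Kumar#Admin
2#David Lee#Editor
3#Lisa Ray#Author
"""

# With the right delimiter, each line is split into three columns.
emp_df = pandas.read_csv(io.StringIO(hash_text), sep='#')

# With the default comma delimiter, every line is read as a single column.
wrong_df = pandas.read_csv(io.StringIO(hash_text))

print(emp_df.shape)
print(wrong_df.shape)
```

A DataFrame with a single unexpected column is usually a sign that the delimiter is wrong.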

3. Reading only specific Columns from the CSV File

We can specify the usecols parameter to read only specific columns from the CSV file. This is very helpful when the CSV file has many columns but we are interested in only a few of them.


import pandas
emp_df = pandas.read_csv('employees.csv', usecols=['Emp Name', 'Emp Role'])
print(emp_df)

Output:


       Emp Name Emp Role
0  Pankaj Kumar    Admin
1     David Lee   Editor
2      Lisa Ray   Author
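
The usecols parameter also accepts column positions instead of names. The sketch below shows both forms on in-memory sample data; the variable names are ours. It also shows that the order of the values in usecols does not change the order of the columns in the result.

```python
import io

import pandas

csv_text = """Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
"""

# Select columns by their header names.
by_name = pandas.read_csv(io.StringIO(csv_text), usecols=['Emp Name', 'Emp Role'])

# Select the same columns by zero-based position.
by_position = pandas.read_csv(io.StringIO(csv_text), usecols=[1, 2])

# The order given in usecols is ignored; columns keep their order from the file.
reordered = pandas.read_csv(io.StringIO(csv_text), usecols=['Emp Role', 'Emp Name'])

print(list(by_name.columns))
print(list(by_position.columns))
print(list(reordered.columns))
```

If a different column order is needed, we can reorder the DataFrame after reading it, for example with reordered[['Emp Role', 'Emp Name']].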

4. Reading CSV File without Header

A header row is not mandatory in a CSV file. If the file has no header row, we can still read it by passing header=None to the read_csv() function.

Let’s say our employees.csv file has the following content.


1,Pankaj Kumar,Admin
2,David Lee,Editor

Let’s see how to read this CSV file into a DataFrame object.


import pandas
emp_df = pandas.read_csv('employees.csv', header=None)
print(emp_df)

Output:


   0             1       2
0  1  Pankaj Kumar   Admin
1  2     David Lee  Editor

Notice that the column labels are auto-assigned integers starting from 0. We can pass these values in the usecols parameter to read specific columns.


import pandas
emp_df = pandas.read_csv('employees.csv', header=None, usecols=[1])
print(emp_df)

Output:


              1
0  Pankaj Kumar
1     David Lee
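
The following sketch puts the header=None options together on in-memory data. It also uses the names parameter of read_csv() to assign our own labels; the labels 'id', 'name' and 'role' are hypothetical and chosen only for this illustration.

```python
import io

import pandas

# Two data rows and no header row.
no_header_text = "1,Pankaj Kumar,Admin\n2,David Lee,Editor\n"

# Without a header, the columns are labelled 0, 1, 2.
raw_df = pandas.read_csv(io.StringIO(no_header_text), header=None)

# Integer labels can be used in usecols to pick a column.
names_only = pandas.read_csv(io.StringIO(no_header_text), header=None, usecols=[1])

# The names parameter assigns our own (hypothetical) column labels.
named_df = pandas.read_csv(
    io.StringIO(no_header_text), header=None, names=['id', 'name', 'role']
)

print(list(raw_df.columns))
print(names_only[1].tolist())
print(list(named_df.columns))
```

Giving the columns readable names up front makes later code such as named_df['name'] easier to follow than raw_df[1].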

5. Specifying Header Row in the CSV File

We can also specify which row holds the header. Any rows before the header row are discarded. Let’s say the CSV file has the following data.


# some random data
invalid data
Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author

The header is in the 3rd row. Row numbers are zero-based, so we have to pass header=2 to read the CSV data from the file.


import pandas
emp_df = pandas.read_csv('employees.csv', header=2)
print(emp_df)

Output:


   Emp ID      Emp Name Emp Role
0       1  Pankaj Kumar    Admin
1       2     David Lee   Editor
2       3      Lisa Ray   Author
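
As a small sketch of the same idea on in-memory data, the code below reads the file with header=2 and, for comparison, with skiprows=2, which skips the first two lines and then treats the next line as the header. The variable names are assumptions for this example.

```python
import io

import pandas

# Two junk lines, followed by the real header and the data rows.
messy_text = """# some random data
invalid data
Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
"""

# header=2 means the third line (zero-based index 2) holds the column names.
by_header = pandas.read_csv(io.StringIO(messy_text), header=2)

# Skipping the first two lines and using the default header gives the same result.
by_skiprows = pandas.read_csv(io.StringIO(messy_text), skiprows=2)

print(list(by_header.columns))
print(by_header.equals(by_skiprows))
```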

6. Skipping CSV Rows

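We can pass the skiprows parameter to skip rows from the CSV file. A list of values is treated as zero-based line numbers counted from the top of the file, header line included. Let’s say we want to skip the 3rd and 4th lines of our original CSV file.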


import pandas
emp_df = pandas.read_csv('employees.csv', skiprows=[2, 3])
print(emp_df)

Output:


   Emp ID      Emp Name Emp Role
0       1  Pankaj Kumar    Admin
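
Besides a list, skiprows also accepts a callable that receives each zero-based line number and returns True for the lines to skip. The sketch below shows both forms on in-memory data; the variable names and the lambda condition are our own choices for illustration.

```python
import io

import pandas

csv_text = """Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
"""

# Skip the lines at index 2 and 3 (the 3rd and 4th lines of the file).
skipped_df = pandas.read_csv(io.StringIO(csv_text), skiprows=[2, 3])

# A callable is evaluated for every line index; here only line 2 is skipped.
callable_df = pandas.read_csv(io.StringIO(csv_text), skiprows=lambda i: i == 2)

print(skipped_df['Emp Name'].tolist())
print(callable_df['Emp Name'].tolist())
```

Line 0 is the header row, so it has to stay out of skiprows if we want to keep the column names.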

7. Specifying Parser Engine for Pandas read_csv() function

Let’s say our CSV file uses a delimiter made of multiple characters, such as ‘##’.


Emp ID##Emp Name##Emp Role
1##Pankaj Kumar##Admin
2##David Lee##Editor
3##Lisa Ray##Author

Let’s see what happens when we try to read this CSV file.


import pandas
emp_df = pandas.read_csv('employees.csv', sep='##')
print(emp_df)

Output:


/Users/pankaj/Documents/PycharmProjects/AskPython/hello-world/journaldev/pandas/pandas_read_csv.py:5: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine="python".
  emp_df = pandas.read_csv('employees.csv', sep='##')
   Emp ID      Emp Name Emp Role
0       1  Pankaj Kumar    Admin
1       2     David Lee   Editor
2       3      Lisa Ray   Author

We can avoid the warning by specifying the ‘engine’ parameter in the read_csv() function.


emp_df = pandas.read_csv('employees.csv', sep='##', engine="python")

The two main parser engines are c and python. The C parser engine is faster and is the default, while the python parser engine is more feature-complete. Newer pandas releases also accept engine=‘pyarrow’.
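
To check that engine='python' really avoids the warning, the sketch below reads ‘##’-delimited text from memory and records any warnings raised during parsing. The variable names and the use of the standard warnings module are assumptions made for this illustration.

```python
import io
import warnings

import pandas

double_hash_text = """Emp ID##Emp Name##Emp Role
1##Pankaj Kumar##Admin
2##David Lee##Editor
3##Lisa Ray##Author
"""

# Record every warning raised while parsing with an explicit engine.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    emp_df = pandas.read_csv(
        io.StringIO(double_hash_text), sep='##', engine='python'
    )

# Keep only the parser warnings, if any were raised.
parser_warnings = [
    w for w in caught if issubclass(w.category, pandas.errors.ParserWarning)
]

print(len(parser_warnings))
print(list(emp_df.columns))
```

Because a separator longer than one character is interpreted as a regular expression, characters with a special meaning in regular expressions (for example | or .) would need to be escaped.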
