How to drop duplicates in Pandas

You can drop duplicate rows from a Pandas DataFrame with the drop_duplicates() method. In this article, you will see 3 examples of dropping duplicates in Pandas.

Step 1: Install Pandas Library

Install the Pandas library with the following command if it is not already installed.

pip install pandas

Example 1: Drop all duplicates from the DataFrame

# Import the Pandas library as pd
import pandas as pd

# Initialize a dictionary (named data so it does not shadow the built-in dict)
data = {'Students': ['John', 'Harry', 'John', 'Chris'],
        'Scores': [84, 73, 84, 84],
        'Values': [84, 75, 84, 84]}

# Create DataFrame from dictionary
df = pd.DataFrame(data)

# Display the DataFrame
print(df)

# Drop duplicates from DataFrame
df = df.drop_duplicates()

# Display the DataFrame
print(df)

Output:

  Students  Scores  Values
0     John      84      84
1    Harry      73      75
2     John      84      84
3    Chris      84      84
  Students  Scores  Values
0     John      84      84
1    Harry      73      75
3    Chris      84      84
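
By default, drop_duplicates() keeps the first occurrence of each duplicated row, which is why row 0 survives and row 2 is removed above. The keep parameter changes that choice. The following is a minimal sketch that reuses the Example 1 data; the variable names first_kept, last_kept, and none_kept are only illustrative.

```python
import pandas as pd

data = {'Students': ['John', 'Harry', 'John', 'Chris'],
        'Scores': [84, 73, 84, 84],
        'Values': [84, 75, 84, 84]}
df = pd.DataFrame(data)

# keep='first' (the default) keeps the first occurrence of each duplicated row
first_kept = df.drop_duplicates(keep='first')

# keep='last' keeps the last occurrence instead
last_kept = df.drop_duplicates(keep='last')

# keep=False drops every row that has a duplicate anywhere in the DataFrame
none_kept = df.drop_duplicates(keep=False)

print(first_kept.index.tolist())  # [0, 1, 3]
print(last_kept.index.tolist())   # [1, 2, 3]
print(none_kept.index.tolist())   # [1, 3]
```

Use keep=False when a duplicated record should be discarded entirely rather than reduced to a single copy.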

Example 2: Drop rows with duplicate values in a specific column

# Import the Pandas library as pd
import pandas as pd

# Initialize a dictionary (named data so it does not shadow the built-in dict)
data = {'Students': ['John', 'Harry', 'John', 'Chris'],
        'Scores': [85, 73, 84, 86],
        'Values': [91, 75, 84, 91]}

# Create DataFrame from dictionary
df = pd.DataFrame(data)

# Display the DataFrame
print(df)

# Drop rows whose "Values" entry repeats one from an earlier row
df = df.drop_duplicates(subset="Values")

# Display the DataFrame
print(df)

Output:

  Students  Scores  Values
0     John      85      91
1    Harry      73      75
2     John      84      84
3    Chris      86      91
  Students  Scores  Values
0     John      85      91
1    Harry      73      75
2     John      84      84
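
Before dropping anything, it can help to see which rows would be removed. One way to do this, sketched here with the Example 2 data, is the duplicated() method, which accepts the same subset argument and returns a Boolean mask. The names mask and removed are only illustrative.

```python
import pandas as pd

data = {'Students': ['John', 'Harry', 'John', 'Chris'],
        'Scores': [85, 73, 84, 86],
        'Values': [91, 75, 84, 91]}
df = pd.DataFrame(data)

# True for each row whose "Values" entry already appeared in an earlier row
mask = df.duplicated(subset="Values")

# The rows that drop_duplicates(subset="Values") would remove
removed = df[mask]

print(mask.tolist())  # [False, False, False, True]
print(removed)
```

Here only row 3 is flagged, because its value 91 already appears in row 0.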

Example 3: Drop rows with duplicate values across two columns

# Import the Pandas library as pd
import pandas as pd

# Initialize a dictionary (named data so it does not shadow the built-in dict)
data = {'Students': ['John', 'Harry', 'John', 'Chris'],
        'Scores': [67, 67, 88, 87],
        'Values': [65, 65, 89, 88]}

# Create DataFrame from dictionary
df = pd.DataFrame(data)

# Display the DataFrame
print(df)

# Drop rows whose ("Scores", "Values") pair repeats one from an earlier row
df = df.drop_duplicates(subset=["Scores", "Values"])

# Display the DataFrame
print(df)

Output:

  Students  Scores  Values
0     John      67      65
1    Harry      67      65
2     John      88      89
3    Chris      87      88
  Students  Scores  Values
0     John      67      65
2     John      88      89
3    Chris      87      88
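
Notice that the result keeps the original index labels (0, 2, 3), so there is a gap where row 1 was removed. If a continuous index is preferred, one option is the ignore_index parameter, available in pandas 1.0 and later. The following is a minimal sketch with the Example 3 data; the name deduped is only illustrative.

```python
import pandas as pd

data = {'Students': ['John', 'Harry', 'John', 'Chris'],
        'Scores': [67, 67, 88, 87],
        'Values': [65, 65, 89, 88]}
df = pd.DataFrame(data)

# ignore_index=True relabels the remaining rows 0, 1, 2, ...
deduped = df.drop_duplicates(subset=["Scores", "Values"], ignore_index=True)

print(deduped.index.tolist())        # [0, 1, 2]
print(deduped['Students'].tolist())  # ['John', 'John', 'Chris']
```

On older pandas versions, calling reset_index(drop=True) on the result should give the same relabeling.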
