Filtering DataFrame rows by column values in Pandas using NumPy array

Last update on December 21 2024 07:43:30 (UTC/GMT +8 hours)

Extract rows from a Pandas DataFrame where a specific column's values are in a given NumPy array.

Sample Solution:

Python Code:

import pandas as pd
import numpy as np

# Create a sample DataFrame
data = {'Name': ['Teodosija', 'Sutton', 'Taneli', 'David', 'Emily'],
        'Age': [25, 30, 22, 35, 28],
        'Salary': [50000, 60000, 45000, 70000, 55000]}

df = pd.DataFrame(data)

# Define a NumPy array with values to filter by
selected_age_values = np.array([25, 35])

# Extract rows where 'Age' column values are in the NumPy array
selected_rows = df[df['Age'].isin(selected_age_values)]

# Display the selected rows
print(selected_rows)

Output:

        Name  Age  Salary
0  Teodosija   25   50000
3      David   35   70000

Explanation:

In the exercise above -

We first create a sample DataFrame (df) with columns 'Name', 'Age', and 'Salary'.
We define a NumPy array selected_age_values containing the values we want to filter by in the 'Age' column.
The df['Age'].isin(selected_age_values) condition creates a boolean Series, and boolean indexing is used to extract rows where the condition is True.
The resulting DataFrame (selected_rows) contains only rows where the 'Age' column values are in the specified NumPy array.
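
To make the intermediate step visible, the sketch below prints the boolean mask before it is used for indexing. The names mask, excluded_rows, and numpy_mask are not part of the solution above and are introduced here only for illustration. It also shows two variations that may be useful: inverting the mask with ~ to keep the rows that are not in the array, and building the same mask with np.isin.

```python
import numpy as np
import pandas as pd

# Same sample data as in the solution above
df = pd.DataFrame({
    'Name': ['Teodosija', 'Sutton', 'Taneli', 'David', 'Emily'],
    'Age': [25, 30, 22, 35, 28],
    'Salary': [50000, 60000, 45000, 70000, 55000],
})
selected_age_values = np.array([25, 35])

# isin() returns a boolean Series aligned with df's index
mask = df['Age'].isin(selected_age_values)
print(mask.tolist())  # [True, False, False, True, False]

# Boolean indexing keeps the rows where the mask is True
selected_rows = df[mask]

# ~ inverts the mask, keeping the rows whose 'Age' is NOT in the array
excluded_rows = df[~mask]
print(excluded_rows['Name'].tolist())  # ['Sutton', 'Taneli', 'Emily']

# np.isin builds an equivalent mask as a plain NumPy array
numpy_mask = np.isin(df['Age'].to_numpy(), selected_age_values)
print(df[numpy_mask]['Name'].tolist())  # ['Teodosija', 'David']
```

Both masks select the same rows here. The Series returned by isin() carries the DataFrame's index, while np.isin returns a positional array, so the Pandas form is usually the safer choice when the index is not the default 0..n-1 range.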
