w3resource

Mastering Pandas: 100 Exercises with solutions for Python data analysis


Welcome to w3resource's collection of 100 Pandas exercises. The set is designed to help you master the fundamentals of Pandas, a data manipulation and analysis library for Python. Whether you are a beginner or an experienced user, the exercises give you hands-on practice with creating DataFrames, selecting and filtering rows, grouping, reshaping, and computing rolling and expanding statistics.

Exercise 1:

Create a DataFrame from a dictionary of lists.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(df)

Output:

   X  Y
0  1  5
1  2  6
2  3  7
3  4  8

Exercise 2:

Select the first 3 rows of a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(df.head(3))

Output:

   X  Y
0  1  5
1  2  6
2  3  7

Exercise 3:

Select the 'X' column from a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(df['X'])

Output:

0    1
1    2
2    3
3    4
Name: X, dtype: int64

Exercise 4:

Filter rows based on a column condition.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
filtered_df = df[df['X'] > 2]
print(filtered_df)

Output:

   X  Y
2  3  7
3  4  8

Exercise 5:

Add a new column to an existing DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
df['Z'] = df['X'] + df['Y']
print(df)

Output:

   X  Y   Z
0  1  5   6
1  2  6   8
2  3  7  10
3  4  8  12

Exercise 6:

Remove a column from a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8], 'Z': [9, 10, 11, 12]}
df = pd.DataFrame(data)
df.drop(columns=['Z'], inplace=True)
print(df)

Output:

   X  Y
0  1  5
1  2  6
2  3  7
3  4  8

Exercise 7:

Sort a DataFrame by a column.

Solution:

import pandas as pd
data = {'X': [4, 3, 2, 1], 'Y': [8, 7, 6, 5]}
df = pd.DataFrame(data)
df.sort_values(by='X', inplace=True)
print(df)

Output:

   X  Y
3  1  5
2  2  6
1  3  7
0  4  8

Exercise 8:

Group a DataFrame by a column and calculate the mean of each group.

Solution:

import pandas as pd
data = {'X': [1, 2, 1, 2], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
grouped_df = df.groupby('X').mean()
print(grouped_df)

Output:

     Y
X     
1  6.0
2  7.0
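A group can also be summarized with several statistics at once. The sketch below uses named aggregation on the same data; the output column names (y_mean, y_min, y_max) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 1, 2], 'Y': [5, 6, 7, 8]})

# Named aggregation: each keyword becomes an output column,
# built from a (source column, function) pair
summary = df.groupby('X').agg(
    y_mean=('Y', 'mean'),
    y_min=('Y', 'min'),
    y_max=('Y', 'max'),
)
print(summary)
```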

Exercise 9:

Replace missing values in a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, None, 4], 'Y': [5, None, 7, 8]}
df = pd.DataFrame(data)
df.fillna(0, inplace=True)
print(df)

Output:

     X    Y
0  1.0  5.0
1  2.0  0.0
2  0.0  7.0
3  4.0  8.0
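Filling with a constant is not the only option. As a minimal sketch, the same data can be filled with each column's own mean; whether that is appropriate depends on the data, so treat it as one possible choice rather than a recommendation.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, None, 4], 'Y': [5, None, 7, 8]})

# df.mean() skips missing values and returns one mean per column;
# fillna() then matches those means to columns by label
filled = df.fillna(df.mean())
print(filled)
```

Here fillna returns a new DataFrame, so df itself still contains the missing values.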

Exercise 10:

Convert a column to datetime.

Solution:

import pandas as pd
data = {'X': ['2020-01-01', '2020-01-02', '2020-01-03']}
df = pd.DataFrame(data)
df['X'] = pd.to_datetime(df['X'])
print(df)

Output:

           X
0 2020-01-01
1 2020-01-02
2 2020-01-03

Exercise 11:

Create a DataFrame with specific column names.

Solution:

import pandas as pd
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data)
print(df)

Output:

   col1  col2
0     1     4
1     2     5
2     3     6

Exercise 12:

Calculate the sum of values in each column.

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
print(df.sum())

Output:

X     6
Y    15
dtype: int64

Exercise 13:

Calculate the mean of values in each row.

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
print(df.mean(axis=1))

Output:

0    2.5
1    3.5
2    4.5
dtype: float64

Exercise 14:

Concatenate two DataFrames.

Solution:

import pandas as pd
data1 = {'X': [1, 2, 3]}
data2 = {'Y': [4, 5, 6]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
concatenated_df = pd.concat([df1, df2], axis=1)
print(concatenated_df)

Output:

   X  Y
0  1  4
1  2  5
2  3  6

Exercise 15:

Merge two DataFrames on a key.

Solution:

import pandas as pd
data1 = {'key': ['X', 'Y', 'Z'], 'value1': [1, 2, 3]}
data2 = {'key': ['X', 'Y', 'D'], 'value2': [4, 5, 6]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
merged_df = pd.merge(df1, df2, on='key')
print(merged_df)

Output:

  key  value1  value2
0   X       1       4
1   Y       2       5
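The merge above is an inner join, so keys 'Z' and 'D' disappear from the result. A possible variation, sketched here with the same data, keeps every key with how='outer' and uses indicator=True to record where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['X', 'Y', 'Z'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['X', 'Y', 'D'], 'value2': [4, 5, 6]})

# how='outer' keeps keys found in either frame;
# indicator=True adds a '_merge' column with left_only / right_only / both
merged = pd.merge(df1, df2, on='key', how='outer', indicator=True)
print(merged)
```

Values that have no match on the other side show up as NaN.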

Exercise 16:

Create a pivot table from a DataFrame.

Solution:

import pandas as pd
data = {'X': ['foo', 'foo', 'bar', 'bar'], 'Y': ['one', 'two', 'one', 'two'], 'Z': [1, 2, 3, 4]}
df = pd.DataFrame(data)
pivot_table = df.pivot_table(values='Z', index='X', columns='Y')
print(pivot_table)

Output:

Y    one  two
X            
bar  3.0  4.0
foo  1.0  2.0

Exercise 17:

Reshape a DataFrame from long to wide format.

Solution:

import pandas as pd
data = {'X': ['foo', 'foo', 'bar', 'bar'], 'Y': ['one', 'two', 'one', 'two'], 'Z': [1, 2, 3, 4]}
df = pd.DataFrame(data)
wide_df = df.pivot(index='X', columns='Y', values='Z')
print(wide_df) 

Output:

Y    one  two
X            
bar    3    4
foo    1    2
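Exercises 16 and 17 give the same shape because every (X, Y) pair occurs once. The sketch below uses made-up data with a repeated pair to show where the two methods differ: pivot() raises an error, while pivot_table() aggregates the duplicates.

```python
import pandas as pd

# ('foo', 'one') appears twice
df = pd.DataFrame({
    'X': ['foo', 'foo', 'foo', 'bar'],
    'Y': ['one', 'one', 'two', 'one'],
    'Z': [1, 3, 5, 7],
})

# pivot() cannot place two values in one cell, so it raises ValueError
try:
    df.pivot(index='X', columns='Y', values='Z')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() combines duplicates with an aggregation function
table = df.pivot_table(values='Z', index='X', columns='Y', aggfunc='mean')
print(pivot_failed)
print(table)
```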

Exercise 18:

Calculate the correlation between columns in a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [4, 3, 2, 1]}
df = pd.DataFrame(data)
correlation = df.corr()
print(correlation)

Output:

     X    Y
X  1.0 -1.0
Y -1.0  1.0

Exercise 19:

Iterate over rows in a DataFrame using iterrows().

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
for index, row in df.iterrows():
    print(index, row['X'], row['Y'])

Output:

0 1 4
1 2 5
2 3 6
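iterrows() builds a Series for every row, which can be slow on large frames. Two common alternatives are sketched below under the same data: itertuples() for cases that really need a loop, and plain column arithmetic when the loop only computes a value per row.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3], 'Y': [4, 5, 6]})

# itertuples() yields lightweight namedtuples; the index is exposed as .Index
rows = [(row.Index, row.X, row.Y) for row in df.itertuples()]
print(rows)

# Column arithmetic avoids the Python-level loop entirely
totals = df['X'] + df['Y']
print(totals.tolist())
```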

Exercise 20:

Apply a function to each element in a DataFrame.

Solution:

import pandas as pd  # Import the Pandas library
# Create a sample DataFrame
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
# Apply a function to each element using the map method
df = df.apply(lambda col: col.map(lambda x: x * 2))
print(df)

Output:

   X   Y
0  2   8
1  4  10
2  6  12
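For simple arithmetic like this, an element-wise function is often unnecessary. As a sketch, operating on the whole DataFrame gives the same result as the map-based solution above.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3], 'Y': [4, 5, 6]})

# Element-wise version, as in the exercise
mapped = df.apply(lambda col: col.map(lambda x: x * 2))

# Vectorized version: the operator is applied to every element at once
doubled = df * 2

print((mapped == doubled).all().all())
```

The map approach remains useful when the function cannot be expressed with vectorized operators.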

Exercise 21:

Create a DataFrame from a list of dictionaries.

Solution:

import pandas as pd
data = [{'X': 1, 'Y': 2}, {'X': 3, 'Y': 4}]
df = pd.DataFrame(data)
print(df)

Output:

   X  Y
0  1  2
1  3  4

Exercise 22:

Rename columns in a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df.rename(columns={'X': 'A', 'Y': 'B'}, inplace=True)
print(df)

Output:

   A  B
0  1  4
1  2  5
2  3  6

Exercise 23:

Filter rows by multiple conditions.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]}
df = pd.DataFrame(data)
filtered_df = df[(df['X'] > 2) & (df['Y'] < 7)]
print(filtered_df)

Output:

   X  Y
2  3  6

Exercise 24:

Calculate the cumulative sum of a column.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Sum'] = df['X'].cumsum()
print(df)

Output:

   X  Cumulative_Sum
0  1               1
1  2               3
2  3               6
3  4              10

Exercise 25:

Drop rows with missing values.

Solution:

import pandas as pd
data = {'X': [1, 2, None, 4], 'Y': [4, 5, 6, None]}
df = pd.DataFrame(data)
df.dropna(inplace=True)
print(df)

Output:

     X    Y
0  1.0  4.0
1  2.0  5.0
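By default dropna() removes a row if any column is missing. The sketch below shows two narrower variants on the same data: checking only selected columns with subset, and dropping only rows that are entirely empty with how='all'.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, None, 4], 'Y': [4, 5, 6, None]})

# Only look at column 'X' when deciding which rows to drop
only_x = df.dropna(subset=['X'])
print(only_x)

# Drop a row only if every column in it is missing (none are here)
all_missing = df.dropna(how='all')
print(len(all_missing))
```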

Exercise 26:

Replace values in a DataFrame based on a condition.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
df.loc[df['X'] > 2, 'Y'] = 0
print(df)

Output:

   X  Y
0  1  5
1  2  6
2  3  0
3  4  0

Exercise 27:

Create a DataFrame with a MultiIndex.

Solution:

import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))
data = {'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data, index=index)
print(df)

Output:

              Value
Group Number       
X     1          10
      2          20
Y     1          30
      2          40

Exercise 28:

Calculate the rolling mean of a column.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Rolling_Mean'] = df['X'].rolling(window=3).mean()
print(df)

Output:

   X  Rolling_Mean
0  1           NaN
1  2           NaN
2  3           2.0
3  4           3.0
4  5           4.0
5  6           5.0

Exercise 29:

Create a DataFrame from a list of tuples.

Solution:

import pandas as pd
data = [(1, 2), (3, 4), (5, 6)]
df = pd.DataFrame(data, columns=['X', 'Y'])
print(df)

Output:

   X  Y
0  1  2
1  3  4
2  5  6

Exercise 30:

Add a row to a DataFrame.

Solution:

import pandas as pd  # Import the Pandas library
# Create a sample DataFrame
data = {'X': [1, 2], 'Y': [3, 4]}
df = pd.DataFrame(data)

# Create a new row as a DataFrame
new_row = pd.DataFrame({'X': [5], 'Y': [6]})
# Concatenate the new row to the DataFrame
df = pd.concat([df, new_row], ignore_index=True)
print(df)

Output:

   X  Y
0  1  3
1  2  4
2  5  6

Exercise 31:

Create a DataFrame with random values.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df)

Output:

          X         Y         Z
0  0.688292  0.950264  0.665916
1  0.497719  0.840536  0.923938
2  0.285218  0.091178  0.722034
3  0.037824  0.248689  0.584696
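Because the data is random, the numbers above change on every run. If repeatable output is wanted, one option is a seeded NumPy generator, sketched here; the helper name make_frame and the seed value 42 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

def make_frame(seed):
    """Build a 4x3 DataFrame of uniform random floats from a seeded generator."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.random((4, 3)), columns=['X', 'Y', 'Z'])

first = make_frame(42)
second = make_frame(42)

# The same seed produces the same values
print(first.equals(second))  # True
```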

Exercise 32:

Calculate the rank of values in a DataFrame.

Solution:

import pandas as pd
data = {'X': [3, 1, 4, 1], 'Y': [2, 3, 1, 4]}
df = pd.DataFrame(data)
df['Rank'] = df['X'].rank()
print(df)

Output:

   X  Y  Rank
0  3  2   3.0
1  1  3   1.5
2  4  1   4.0
3  1  4   1.5

Exercise 33:

Change the data type of a column.

Solution:

import pandas as pd
data = {'X': ['1', '2', '3']}
df = pd.DataFrame(data)
df['X'] = df['X'].astype(int)
print(df)

Output:

   X
0  1
1  2
2  3
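astype(int) raises an error if any string is not a valid integer. When the column may contain bad values, one option is pd.to_numeric with errors='coerce', sketched below with a made-up invalid entry.

```python
import pandas as pd

df = pd.DataFrame({'X': ['1', '2', 'three']})

# Values that cannot be parsed become NaN instead of raising an error
df['X_num'] = pd.to_numeric(df['X'], errors='coerce')
print(df)
```

The resulting column is float64 because NaN cannot be stored in a plain integer column.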

Exercise 34:

Filter rows based on string matching.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'baz', 'qux']}
df = pd.DataFrame(data)
filtered_df = df[df['X'].str.contains('ba')]
print(filtered_df)

Output:

     X
1  bar
2  baz
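Two details of str.contains() are worth knowing: the pattern is treated as a regular expression by default, and missing values produce a missing result that cannot be used directly as a boolean mask. The sketch below uses made-up data to show the na and regex parameters.

```python
import pandas as pd

df = pd.DataFrame({'X': ['foo', 'bar', None, 'b.z']})

# na=False treats missing values as "no match" so the mask is purely boolean
mask = df['X'].str.contains('ba', na=False)
print(df[mask])

# regex=False matches the text literally, so '.' is a dot, not a wildcard
literal = df['X'].str.contains('b.', regex=False, na=False)
print(df[literal])
```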

Exercise 35:

Create a DataFrame with specified row and column labels.

Solution:

import pandas as pd
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
df = pd.DataFrame(data, index=['row1', 'row2', 'row3'], columns=['col1', 'col2', 'col3'])
print(df)

Output:

      col1  col2  col3
row1     1     2     3
row2     4     5     6
row3     7     8     9

Exercise 36:

Transpose a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
transposed_df = df.T
print(transposed_df)

Output:

   0  1  2
X  1  2  3
Y  4  5  6

Exercise 37:

Set a column as the index of a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df.set_index('X', inplace=True)
print(df)

Output:

   Y
X   
1  4
2  5
3  6

Exercise 38:

Reset the index of a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df.set_index('X', inplace=True)
df.reset_index(inplace=True)
print(df)

Output:

   X  Y
0  1  4
1  2  5
2  3  6

Exercise 39:

Add a prefix or suffix to column names.

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df = df.add_prefix('col_')
print(df)

Output:

   col_X  col_Y
0      1      4
1      2      5
2      3      6

Exercise 40:

Filter rows based on datetime index.

Solution:

import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=5, freq='D')
data = {'X': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data, index=date_range)
# Use .loc for label-based row slicing; both endpoints are included
filtered_df = df.loc['2020-01-03':'2020-01-05']
print(filtered_df)

Output:

            X
2020-01-03  3
2020-01-04  4
2020-01-05  5

Exercise 41:

Create a DataFrame with duplicate rows and remove duplicates.

Solution:

import pandas as pd
data = {'X': [1, 2, 2, 3], 'Y': [4, 5, 5, 6]}
df = pd.DataFrame(data)
df.drop_duplicates(inplace=True)
print(df)

Output:

   X  Y
0  1  4
1  2  5
3  3  6
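drop_duplicates() compares whole rows by default. As a sketch with slightly different data, the subset and keep parameters restrict the comparison to chosen columns and control which duplicate survives.

```python
import pandas as pd

# Rows 1 and 2 share the same X but differ in Y
df = pd.DataFrame({'X': [1, 2, 2, 3], 'Y': [4, 5, 9, 6]})

# Compare on 'X' only and keep the last occurrence of each value
last_per_x = df.drop_duplicates(subset='X', keep='last')
print(last_per_x)
```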

Exercise 42:

Create a DataFrame with hierarchical index.

Solution:

import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))
data = {'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data, index=index)
print(df)

Output:

              Value
Group Number       
X     1          10
      2          20
Y     1          30
      2          40

Exercise 43:

Calculate the difference between consecutive rows in a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 3, 6, 10]}
df = pd.DataFrame(data)
df['Difference'] = df['X'].diff()
print(df)

Output:

    X  Difference
0   1         NaN
1   3         2.0
2   6         3.0
3  10         4.0

Exercise 44:

Create a DataFrame with hierarchical columns.

Solution:

import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], ['C1', 'C2', 'C1', 'C2']]
columns = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Type'))
data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
df = pd.DataFrame(data, columns=columns)
print(df)

Output:

Group  X       Y    
Type  C1  C2  C1  C2
0      1   2   3   4
1      5   6   7   8
2      9  10  11  12

Exercise 45:

Filter rows based on the length of strings in a column.

Solution:

import pandas as pd
data = {'X': ['foo', 'barn', 'baz', 'quux']}
df = pd.DataFrame(data)
filtered_df = df[df['X'].str.len() > 3]
print(filtered_df)

Output:

      X
1  barn
3  quux

Exercise 46:

Calculate the percentage change between rows in a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Pct_Change'] = df['X'].pct_change()
print(df)

Output:

   X  Pct_Change
0  1         NaN
1  2    1.000000
2  3    0.500000
3  4    0.333333

Exercise 47:

Create a DataFrame from a dictionary of Series.

Solution:

import pandas as pd
data = {'X': pd.Series([1, 2, 3]), 'Y': pd.Series([4, 5, 6])}
df = pd.DataFrame(data)
print(df)

Output:

   X  Y
0  1  4
1  2  5
2  3  6

Exercise 48:

Filter rows based on whether a column value is in a list.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
filtered_df = df[df['X'].isin([2, 3])]
print(filtered_df)

Output:

   X  Y
1  2  6
2  3  7

Exercise 49:

Calculate the z-score of values in a DataFrame.

Solution:

import pandas as pd
import numpy as np
data = {'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]}
df = pd.DataFrame(data)
df['zscore_A'] = (df['X'] - np.mean(df['X'])) / np.std(df['X'])
print(df)

Output:

   X  Y  zscore_A
0  1  4 -1.341641
1  2  5 -0.447214
2  3  6  0.447214
3  4  7  1.341641
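The result depends on which standard deviation is used. np.std() defaults to the population formula (ddof=0), while the pandas std() method defaults to the sample formula (ddof=1). The sketch below compares the two on the same column.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4])

pop_std = s.std(ddof=0)   # population standard deviation, same as np.std(s)
sample_std = s.std()      # sample standard deviation (ddof=1)

z_pop = (s - s.mean()) / pop_std
z_sample = (s - s.mean()) / sample_std

print(z_pop.round(6).tolist())
print(z_sample.round(6).tolist())
```

With only four values the two versions differ noticeably; the gap shrinks as the number of rows grows.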

Exercise 50:

Create a DataFrame with random integers and calculate descriptive statistics.

Solution:

import pandas as pd
import numpy as np
data = np.random.randint(1, 100, size=(5, 3))
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.describe())

Output:

               X          Y          Z
count   5.000000   5.000000   5.000000
mean   60.600000  71.800000  42.600000
std    38.435661  13.971399  12.218838
min     5.000000  53.000000  28.000000
25%    40.000000  64.000000  34.000000
50%    69.000000  72.000000  41.000000
75%    91.000000  82.000000  55.000000
max    98.000000  88.000000  55.000000

Exercise 51:

Calculate the rank of values in each column of a DataFrame.

Solution:

import pandas as pd
data = {'X': [3, 1, 4, 1], 'Y': [2, 3, 1, 4]}
df = pd.DataFrame(data)
df['Rank_A'] = df['X'].rank()
df['Rank_B'] = df['Y'].rank()
print(df)

Output:

   X  Y  Rank_A  Rank_B
0  3  2     3.0     2.0
1  1  3     1.5     3.0
2  4  1     4.0     1.0
3  1  4     1.5     4.0

Exercise 52:

Filter rows based on multiple string conditions.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'baz', 'qux']}
df = pd.DataFrame(data)
filtered_df = df[df['X'].str.contains('ba|qu')]
print(filtered_df)

Output:

     X
1  bar
2  baz
3  qux

Exercise 53:

Create a DataFrame with random values and calculate the skewness.

Solution:

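import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.skew())

Output:

The values differ on every run because the data is random. The result is a float64 Series with one skewness value for each of the columns X, Y, and Z.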

Exercise 54:

Create a DataFrame and calculate the kurtosis.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.kurt())

Output:

X    2.958407
Y   -2.639654
Z    2.704430
dtype: float64

Exercise 55:

Calculate the cumulative product of a column in a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Product'] = df['X'].cumprod()
print(df) 

Output:

   X  Cumulative_Product
0  1                   1
1  2                   2
2  3                   6
3  4                  24

Exercise 56:

Create a DataFrame and calculate the rolling standard deviation.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Rolling_Std'] = df['X'].rolling(window=3).std()
print(df)

Output:

   X  Rolling_Std
0  1          NaN
1  2          NaN
2  3          1.0
3  4          1.0
4  5          1.0
5  6          1.0

Exercise 57:

Create a DataFrame and calculate the expanding mean.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Expanding_Mean'] = df['X'].expanding().mean()
print(df)

Output:

   X  Expanding_Mean
0  1             1.0
1  2             1.5
2  3             2.0
3  4             2.5
4  5             3.0
5  6             3.5

Exercise 58:

Create a DataFrame with random values and calculate the covariance matrix.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.cov()) 

Output:

          X         Y         Z
X  0.054079  0.007398 -0.031403
Y  0.007398  0.053211 -0.020480
Z -0.031403 -0.020480  0.048057

Exercise 59:

Create a DataFrame with random values and calculate the correlation matrix.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.corr())

Output:

          X         Y         Z
X  1.000000 -0.258187  0.541044
Y -0.258187  1.000000 -0.432419
Z  0.541044 -0.432419  1.000000

Exercise 60:

Create a DataFrame and calculate the rolling correlation between two columns.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6], 'Y': [6, 5, 4, 3, 2, 1]}
df = pd.DataFrame(data)
df['Rolling_Corr'] = df['X'].rolling(window=3).corr(df['Y'])
print(df) 

Output:

   X  Y  Rolling_Corr
0  1  6           NaN
1  2  5           NaN
2  3  4          -1.0
3  4  3          -1.0
4  5  2          -1.0
5  6  1          -1.0

Exercise 61:

Create a DataFrame and calculate the expanding variance.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Expanding_Var'] = df['X'].expanding().var()
print(df) 

Output:

   X  Expanding_Var
0  1            NaN
1  2       0.500000
2  3       1.000000
3  4       1.666667
4  5       2.500000
5  6       3.500000

Exercise 62:

Create a DataFrame with datetime index and resample by month.

Solution:

import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=100, freq='D')
data = {'X': range(100)}
df = pd.DataFrame(data, index=date_range)
# 'M' is the month-end alias; pandas 2.2 and later deprecate it in favor of 'ME'
monthly_df = df.resample('M').sum()
print(monthly_df)

Output:

               X
2020-01-31   465
2020-02-29  1305
2020-03-31  2325
2020-04-30   855

Exercise 63:

Create a DataFrame and calculate the exponential moving average.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['EMA'] = df['X'].ewm(span=3, adjust=False).mean()
print(df) 

Output:

   X      EMA
0  1  1.00000
1  2  1.50000
2  3  2.25000
3  4  3.12500
4  5  4.06250
5  6  5.03125

Exercise 64:

Create a DataFrame with random integers and calculate the mode.

Solution:

import pandas as pd
import numpy as np
data = np.random.randint(1, 10, size=(5, 3))
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.mode()) 

Output:

   X    Y    Z
0  2  1.0  2.0
1  3  3.0  7.0
2  5  NaN  NaN
3  6  NaN  NaN
4  9  NaN  NaN

Exercise 65:

Create a DataFrame and calculate the z-score of each column.

Solution:

import pandas as pd
import numpy as np
data = {'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]}
df = pd.DataFrame(data)
df['zscore_A'] = (df['X'] - np.mean(df['X'])) / np.std(df['X'])
df['zscore_B'] = (df['Y'] - np.mean(df['Y'])) / np.std(df['Y'])
print(df) 

Output:

   X  Y  zscore_A  zscore_B
0  1  4 -1.341641 -1.341641
1  2  5 -0.447214 -0.447214
2  3  6  0.447214  0.447214
3  4  7  1.341641  1.341641

Exercise 66:

Create a DataFrame with random values and calculate the median.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.median())

Output:

X    0.787042
Y    0.477837
Z    0.696911
dtype: float64

Exercise 67:

Create a DataFrame and apply a custom function to each column.

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df = df.apply(lambda x: x + 1)
print(df)

Output:

   X  Y
0  2  5
1  3  6
2  4  7

Exercise 68:

Create a DataFrame with hierarchical index and calculate the mean for each group.

Solution:

import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))
data = {'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data, index=index)
grouped_df = df.groupby('Group').mean()
print(grouped_df) 

Output:

       Value
Group       
X       15.0
Y       35.0

Exercise 69:

Create a DataFrame and calculate the percentage of missing values in each column.

Solution:

import pandas as pd
data = {'X': [1, 2, None, 4], 'Y': [4, None, 6, 8]}
df = pd.DataFrame(data)
missing_percentage = df.isnull().mean() * 100
print(missing_percentage)

Output:

X    25.0
Y    25.0
dtype: float64

Exercise 70:

Create a DataFrame and apply a custom function to each row.

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df['Sum'] = df.apply(lambda row: row['X'] + row['Y'], axis=1)
print(df)

Output:

   X  Y  Sum
0  1  4    5
1  2  5    7
2  3  6    9

Exercise 71:

Create a DataFrame with random values and calculate the quantiles.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.quantile([0.25, 0.5, 0.75])) 

Output:

             X         Y         Z
0.25  0.174265  0.184036  0.520573
0.50  0.468040  0.315593  0.644571
0.75  0.767870  0.436426  0.771297

Exercise 72:

Create a DataFrame and calculate the interquartile range (IQR).

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1
print(IQR) 

Output:

X    0.354244
Y    0.329573
Z    0.245520
dtype: float64
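A common use of the IQR is flagging outliers. The sketch below applies the conventional 1.5 x IQR rule to a made-up Series; the 1.5 multiplier is a widely used convention, not something this exercise prescribes.

```python
import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 100])

q1 = s.quantile(0.25)
q3 = s.quantile(0.75)
iqr = q3 - q1

# Values outside [q1 - 1.5*iqr, q3 + 1.5*iqr] are treated as outliers
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
mask = s.between(lower, upper)

print(s[mask].tolist())    # values kept
print(s[~mask].tolist())   # values flagged as outliers
```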

Exercise 73:

Create a DataFrame with datetime index and calculate the rolling mean.

Solution:

import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=10, freq='D')
data = {'X': range(10)}
df = pd.DataFrame(data, index=date_range)
df['Rolling_Mean'] = df['X'].rolling(window=3).mean()
print(df) 

Output:

            X  Rolling_Mean
2020-01-01  0           NaN
2020-01-02  1           NaN
2020-01-03  2           1.0
2020-01-04  3           2.0
2020-01-05  4           3.0
2020-01-06  5           4.0
2020-01-07  6           5.0
2020-01-08  7           6.0
2020-01-09  8           7.0
2020-01-10  9           8.0

Exercise 74:

Create a DataFrame and calculate the cumulative maximum.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 2, 1]}
df = pd.DataFrame(data)
df['Cumulative_Max'] = df['X'].cummax()
print(df) 

Output:

   X  Cumulative_Max
0  1               1
1  2               2
2  3               3
3  2               3
4  1               3

Exercise 75:

Create a DataFrame and calculate the cumulative minimum.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 2, 1]}
df = pd.DataFrame(data)
df['Cumulative_Min'] = df['X'].cummin()
print(df)

Output:

   X  Cumulative_Min
0  1               1
1  2               1
2  3               1
3  2               1
4  1               1

Exercise 76:

Create a DataFrame with random values and calculate the cumulative variance.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Cumulative_Var'] = df['X'].expanding().var()
print(df)

Output:

          X         Y         Z  Cumulative_Var
0  0.315669  0.900791  0.404858             NaN
1  0.462000  0.463257  0.922495        0.010706
2  0.328968  0.200027  0.967625        0.006548
3  0.630370  0.992849  0.231884        0.021460
4  0.574397  0.968600  0.926893        0.020023
5  0.204077  0.889864  0.589022        0.027130
6  0.386806  0.630882  0.242157        0.022759
7  0.319831  0.935747  0.829739        0.020630
8  0.786435  0.377739  0.879458        0.034407
9  0.523467  0.077937  0.764476        0.031194

Exercise 77:

Create a DataFrame and apply a custom function to each element.

Solution:

import pandas as pd
# Create a DataFrame
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
# Define the custom function
def custom_function(x):
    return x * 2
# Apply the function to each element using map on each column
df = df.apply(lambda col: col.map(custom_function))
# Print the DataFrame
print(df) 

Output:

   X   Y
0  2   8
1  4  10
2  6  12

Exercise 78:

Create a DataFrame with random values and calculate the z-score for each element.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df = df.apply(lambda x: (x - x.mean()) / x.std(), axis=0)
print(df) 

Output:

          X         Y         Z
0  1.027393  0.656858  1.032853
1  0.674079 -1.277904 -0.220065
2 -0.996641 -0.298841  0.475217
3 -0.704831  0.919887 -1.288005

Exercise 79:

Create a DataFrame and calculate the cumulative sum for each group.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Sum'] = df.groupby('X')['Y'].cumsum()
print(df) 

Output:

     X  Y  Cumulative_Sum
0  foo  1               1
1  bar  2               2
2  foo  3               4
3  bar  4               6

Exercise 80:

Create a DataFrame with random values and calculate the rank for each element.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df = df.rank()
print(df)

Output:

     X    Y    Z
0  4.0  3.0  3.0
1  3.0  2.0  2.0
2  1.0  4.0  1.0
3  2.0  1.0  4.0

Exercise 81:

Create a DataFrame and calculate the cumulative product for each group.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Product'] = df.groupby('X')['Y'].cumprod()
print(df) 

Output:

     X  Y  Cumulative_Product
0  foo  1                   1
1  bar  2                   2
2  foo  3                   3
3  bar  4                   8

Exercise 82:

Create a DataFrame with random values and calculate the expanding sum.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Expanding_Sum'] = df['X'].expanding().sum()
print(df) 

Output:

          X         Y         Z  Expanding_Sum
0  0.815750  0.062819  0.699743       0.815750
1  0.128772  0.843222  0.411903       0.944522
2  0.857516  0.219424  0.234460       1.802038
3  0.011010  0.774375  0.259412       1.813048

Exercise 83:

Create a DataFrame and calculate the expanding minimum for each group.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Expanding_Min'] = df.groupby('X')['Y'].expanding().min().reset_index(level=0, drop=True)
print(df)

Output:

     X  Y  Expanding_Min
0  foo  1            1.0
1  bar  2            2.0
2  foo  3            1.0
3  bar  4            2.0
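An expanding minimum within each group is the same thing as a per-group cumulative minimum. As a sketch, groupby().cummin() gives the same numbers without the reset_index step and keeps the integer dtype.

```python
import pandas as pd

df = pd.DataFrame({'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]})

# Approach from the exercise: expanding window per group, then realign
via_expanding = (
    df.groupby('X')['Y'].expanding().min()
    .reset_index(level=0, drop=True)
    .sort_index()
)

# Per-group cumulative minimum, already aligned with the original index
via_cummin = df.groupby('X')['Y'].cummin()

print((via_expanding == via_cummin).all())
```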

Exercise 84:

Create a DataFrame with random values and calculate the expanding maximum for each group.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
# Random floats are all distinct, so add a categorical column to group on
df['Group'] = ['foo', 'bar', 'foo', 'bar']
df['Expanding_Max'] = df.groupby('Group')['Y'].expanding().max().reset_index(level=0, drop=True)
print(df)

Output:

The values differ on every run because the data is random. Rows 0 and 1 start their groups, so Expanding_Max equals Y there; rows 2 and 3 show the larger of the two Y values seen so far in 'foo' and 'bar' respectively.

Exercise 85:

Create a DataFrame and calculate the expanding variance for each group.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Expanding_Var'] = df.groupby('X')['Y'].expanding().var().reset_index(level=0, drop=True)
print(df) 

Output:

     X  Y  Expanding_Var
0  foo  1            NaN
1  bar  2            NaN
2  foo  3            2.0
3  bar  4            2.0

Exercise 86:

Create a DataFrame with random values and calculate the expanding standard deviation.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Expanding_Std'] = df['X'].expanding().std()
print(df) 

Output:

          X         Y         Z  Expanding_Std
0  0.693184  0.088273  0.109510            NaN
1  0.031186  0.163005  0.803467       0.468103
2  0.294881  0.409395  0.278145       0.333272
3  0.918778  0.854961  0.791329       0.397322

Exercise 87:

Create a DataFrame and calculate the expanding covariance.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [4, 3, 2, 1]}
df = pd.DataFrame(data)
df['Expanding_Cov'] = df['X'].expanding().cov(df['Y'])
print(df)

Output:

   X  Y  Expanding_Cov
0  1  4            NaN
1  2  3      -0.500000
2  3  2      -1.000000
3  4  1      -1.666667

Exercise 88:

Create a DataFrame with random values and calculate the expanding correlation.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Expanding_Corr'] = df['X'].expanding().corr(df['Y'])
print(df)

Output:

          X         Y         Z  Expanding_Corr
0  0.094026  0.320246  0.044218             NaN
1  0.422531  0.002172  0.995907       -1.000000
2  0.265459  0.391239  0.589878       -0.751147
3  0.118812  0.061489  0.837821       -0.372750

Exercise 89:

Create a DataFrame and calculate the expanding median.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Expanding_Median'] = df['X'].expanding().median()
print(df)

Output:

   X  Expanding_Median
0  1               1.0
1  2               1.5
2  3               2.0
3  4               2.5
4  5               3.0
5  6               3.5

Exercise 90:

Create a DataFrame with datetime index and calculate the expanding mean for each group.

Solution:

import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=10, freq='D')
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data, index=date_range)
df['Expanding_Mean'] = df.groupby('X')['Y'].expanding().mean().reset_index(level=0, drop=True)
print(df) 

Output:

              X  Y  Expanding_Mean
2020-01-01  foo  0             0.0
2020-01-02  bar  1             1.0
2020-01-03  foo  2             1.0
2020-01-04  bar  3             2.0
2020-01-05  foo  4             2.0
2020-01-06  bar  5             3.0
2020-01-07  foo  6             3.0
2020-01-08  bar  7             4.0
2020-01-09  foo  8             4.0
2020-01-10  bar  9             5.0

Exercise 91:

Create a DataFrame with random values and calculate the rolling sum for each group.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
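# Note: X holds continuous random floats, so every group contains a single row;
# a 3-row window never fills and Rolling_Sum is NaN throughout.
df['Rolling_Sum'] = df.groupby('X')['Y'].rolling(window=3).sum().reset_index(level=0, drop=True)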
print(df) 

Output:

          X         Y         Z  Rolling_Sum
0  0.342706  0.579330  0.902681          NaN
1  0.182432  0.163406  0.156607          NaN
2  0.983085  0.052785  0.588865          NaN
3  0.756982  0.123991  0.704262          NaN
4  0.876875  0.710953  0.923588          NaN
5  0.359818  0.135520  0.277327          NaN
6  0.693156  0.590918  0.985834          NaN
7  0.892253  0.633529  0.169000          NaN
8  0.084238  0.007579  0.076730          NaN
9  0.663869  0.780832  0.644874          NaN
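
A grouped rolling sum only fills its windows when the grouping key repeats. The following is a minimal sketch under that assumption: it adds a hypothetical Group column with two alternating labels (not part of the original solution) and uses an arbitrary seed so the run is repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed for repeatable output
df = pd.DataFrame(rng.random((10, 3)), columns=['X', 'Y', 'Z'])

# Hypothetical categorical key: alternate two labels so each group has 5 rows
df['Group'] = ['A', 'B'] * 5

# Rolling sum of Y within each group; dropping the group level leaves the
# original row index, so the result aligns with df on assignment
df['Rolling_Sum'] = (
    df.groupby('Group')['Y']
      .rolling(window=3)
      .sum()
      .reset_index(level=0, drop=True)
)
print(df)
```

With five rows per group and a window of 3, the first two rows of each group stay NaN and the remaining three hold sums.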

Exercise 92:

Create a DataFrame and calculate the rolling mean for each group.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
df['Rolling_Mean'] = df.groupby('X')['Y'].rolling(window=3).mean().reset_index(level=0, drop=True)
print(df) 

Output:

     X  Y  Rolling_Mean
0  foo  0           NaN
1  bar  1           NaN
2  foo  2           NaN
3  bar  3           NaN
4  foo  4           2.0
5  bar  5           3.0
6  foo  6           4.0
7  bar  7           5.0
8  foo  8           6.0
9  bar  9           7.0

Exercise 93:

Create a DataFrame with random values and calculate the rolling standard deviation for each group.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
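# Note: grouping by the continuous column X yields one row per group,
# so the 3-row window never fills and Rolling_Std is NaN throughout.
df['Rolling_Std'] = df.groupby('X')['Y'].rolling(window=3).std().reset_index(level=0, drop=True)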
print(df) 

Output:

          X         Y         Z  Rolling_Std
0  0.154838  0.162793  0.808882          NaN
1  0.740167  0.920318  0.650240          NaN
2  0.033449  0.007883  0.249656          NaN
3  0.983601  0.261995  0.399816          NaN
4  0.883155  0.051084  0.125735          NaN
5  0.986930  0.470328  0.612276          NaN
6  0.981338  0.016731  0.627210          NaN
7  0.670522  0.247346  0.530971          NaN
8  0.978909  0.752500  0.903401          NaN
9  0.185614  0.362602  0.541459          NaN
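
One way to form usable groups from a continuous column is to bin it first. The following is a rough sketch, assuming a split of X at 0.5 into two bins labelled 'low' and 'high'. The bin edges, the labels, the X_bin column name and the seed are all illustrative choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # arbitrary seed for repeatable output
df = pd.DataFrame(rng.random((10, 3)), columns=['X', 'Y', 'Z'])

# Hypothetical grouping: split X at 0.5 into two labelled bins
df['X_bin'] = pd.cut(df['X'], bins=[0.0, 0.5, 1.0],
                     labels=['low', 'high'], include_lowest=True)

# observed=True keeps only the bins that actually contain rows
df['Rolling_Std'] = (
    df.groupby('X_bin', observed=True)['Y']
      .rolling(window=3)
      .std()
      .reset_index(level=0, drop=True)
)
print(df)
```

How many values are filled depends on how the random rows fall into the bins: a bin with n rows contributes n - 2 values when n is at least 3.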

Exercise 94:

Create a DataFrame and calculate the rolling variance for each group.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
df['Rolling_Var'] = df.groupby('X')['Y'].rolling(window=3).var().reset_index(level=0, drop=True)
print(df) 

Output:

     X  Y  Rolling_Var
0  foo  0          NaN
1  bar  1          NaN
2  foo  2          NaN
3  bar  3          NaN
4  foo  4          4.0
5  bar  5          4.0
6  foo  6          4.0
7  bar  7          4.0
8  foo  8          4.0
9  bar  9          4.0

Exercise 95:

Create a DataFrame with random values and calculate the rolling correlation for each group.

Solution:

import pandas as pd
import numpy as np
# Create a DataFrame with random values
np.random.seed(42)  # For reproducibility
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
# Add a categorical column to group by
df['Group'] = np.random.choice(['A', 'B'], size=10)
# Calculate the rolling correlation for each group
df['Rolling_Corr'] = df.groupby('Group').apply(lambda group: group['Y'].rolling(window=3).corr(group['Z'])).reset_index(level=0, drop=True)
print(df) 

Output:

          X         Y         Z Group  Rolling_Corr
0  0.374540  0.950714  0.731994     A           NaN
1  0.598658  0.156019  0.155995     A           NaN
2  0.058084  0.866176  0.601115     A      0.992633
3  0.708073  0.020584  0.969910     A     -0.095420
4  0.832443  0.212339  0.181825     A     -0.180021
5  0.183405  0.304242  0.524756     B           NaN
6  0.431945  0.291229  0.611853     B           NaN
7  0.139494  0.292145  0.366362     A     -0.869948
8  0.456070  0.785176  0.199674     B     -0.984073
9  0.514234  0.592415  0.046450     B     -0.788379

Exercise 96:

Create a DataFrame and calculate the rolling covariance for each group.

Solution:

import pandas as pd

# Create a DataFrame with sample data
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
        'Y': range(10), 'Z': range(10, 20)}
df = pd.DataFrame(data)
# Calculate the rolling covariance for each group
rolling_cov = df.groupby('X').apply(lambda group: group['Y'].rolling(window=3).cov(group['Z'])).reset_index(level=0, drop=True)
# Add the rolling covariance to the original DataFrame
df['Rolling_Cov'] = rolling_cov
print(df) 

Output:

     X  Y   Z  Rolling_Cov
0  foo  0  10          NaN
1  bar  1  11          NaN
2  foo  2  12          NaN
3  bar  3  13          NaN
4  foo  4  14          4.0
5  bar  5  15          4.0
6  foo  6  16          4.0
7  bar  7  17          4.0
8  foo  8  18          4.0
9  bar  9  19          4.0

Exercise 97:

Create a DataFrame with random values and calculate the rolling skewness for each group.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
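# Note: every value of X is distinct, so each group has a single row
# and Rolling_Skew is NaN throughout.
df['Rolling_Skew'] = df.groupby('X')['Y'].rolling(window=3).skew().reset_index(level=0, drop=True)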
print(df)

Output:

          X         Y         Z  Rolling_Skew
0  0.808397  0.304614  0.097672           NaN
1  0.684233  0.440152  0.122038           NaN
2  0.495177  0.034389  0.909320           NaN
3  0.258780  0.662522  0.311711           NaN
4  0.520068  0.546710  0.184854           NaN
5  0.969585  0.775133  0.939499           NaN
6  0.894827  0.597900  0.921874           NaN
7  0.088493  0.195983  0.045227           NaN
8  0.325330  0.388677  0.271349           NaN
9  0.828738  0.356753  0.280935           NaN
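
For comparison, here is a minimal sketch of a grouped rolling skewness that fills its windows. It assumes a hypothetical Group column that puts the first five rows in 'A' and the last five in 'B', and it uses transform instead of rolling on the groupby followed by reset_index. The seed is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)  # arbitrary seed for repeatable output
df = pd.DataFrame(rng.random((10, 3)), columns=['X', 'Y', 'Z'])

# Hypothetical key: first five rows in group 'A', last five in group 'B'
df['Group'] = np.repeat(['A', 'B'], 5)

# transform returns a Series indexed like the input, so it can be
# assigned directly without dropping an index level
df['Rolling_Skew'] = df.groupby('Group')['Y'].transform(
    lambda s: s.rolling(window=3).skew()
)
print(df)
```

Skewness needs at least three observations, so a window of 3 is the smallest that produces values.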

Exercise 98:

Create a DataFrame and calculate the rolling kurtosis for each group.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
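# Note: sample kurtosis needs at least 4 observations, so a 3-row window
# returns NaN for every row.
df['Rolling_Kurt'] = df.groupby('X')['Y'].rolling(window=3).kurt().reset_index(level=0, drop=True)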
print(df) 

Output:

     X  Y  Rolling_Kurt
0  foo  0           NaN
1  bar  1           NaN
2  foo  2           NaN
3  bar  3           NaN
4  foo  4           NaN
5  bar  5           NaN
6  foo  6           NaN
7  bar  7           NaN
8  foo  8           NaN
9  bar  9           NaN
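
Sample kurtosis is only defined for four or more observations. The following sketch repeats the solution with the window widened to 4, which is an illustrative choice and not part of the original exercise.

```python
import pandas as pd

data = {'X': ['foo', 'bar'] * 5, 'Y': range(10)}
df = pd.DataFrame(data)

# window=4 is the smallest window for which sample kurtosis is defined
df['Rolling_Kurt'] = (
    df.groupby('X')['Y']
      .rolling(window=4)
      .kurt()
      .reset_index(level=0, drop=True)
)
print(df)
```

Each group has five rows, so only its last two rows are filled. Every filled window holds four equally spaced numbers, so all filled rows should report the same kurtosis.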

Exercise 99:

Create a DataFrame with random values and calculate the rolling median for each group.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
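# Note: X is a continuous random column, so each group holds one row
# and Rolling_Median is NaN throughout.
df['Rolling_Median'] = df.groupby('X')['Y'].rolling(window=3).median().reset_index(level=0, drop=True)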
print(df) 

Output:

          X         Y         Z  Rolling_Median
0  0.542696  0.140924  0.802197             NaN
1  0.074551  0.986887  0.772245             NaN
2  0.198716  0.005522  0.815461             NaN
3  0.706857  0.729007  0.771270             NaN
4  0.074045  0.358466  0.115869             NaN
5  0.863103  0.623298  0.330898             NaN
6  0.063558  0.310982  0.325183             NaN
7  0.729606  0.637557  0.887213             NaN
8  0.472215  0.119594  0.713245             NaN
9  0.760785  0.561277  0.770967             NaN
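
The following is a minimal sketch of a grouped rolling median with no NaN rows. It assumes a hypothetical alternating Group column and sets min_periods=1 so that partially filled windows still produce a value. The column name, the seed and the min_periods setting are illustrative choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)  # arbitrary seed for repeatable output
df = pd.DataFrame(rng.random((10, 3)), columns=['X', 'Y', 'Z'])

# Hypothetical categorical key: alternate two labels so each group has 5 rows
df['Group'] = ['A', 'B'] * 5

# min_periods=1 emits a median as soon as one row of the group is available,
# instead of waiting for the full 3-row window
df['Rolling_Median'] = (
    df.groupby('Group')['Y']
      .rolling(window=3, min_periods=1)
      .median()
      .reset_index(level=0, drop=True)
)
print(df)
```

The first row of each group is its own median, the second is the mean of two values, and from the third row on the window is full.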

Exercise 100:

Create a DataFrame and calculate the expanding sum for each group.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
df['Expanding_Sum'] = df.groupby('X')['Y'].expanding().sum().reset_index(level=0, drop=True)
print(df) 

Output:

     X  Y  Expanding_Sum
0  foo  0            0.0
1  bar  1            1.0
2  foo  2            2.0
3  bar  3            4.0
4  foo  4            6.0
5  bar  5            9.0
6  foo  6           12.0
7  bar  7           16.0
8  foo  8           20.0
9  bar  9           25.0
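
An expanding sum within each group is the same quantity as a grouped cumulative sum. The sketch below places both side by side. The Cumulative_Sum column name is an illustrative addition, not part of the original solution.

```python
import pandas as pd

data = {'X': ['foo', 'bar'] * 5, 'Y': range(10)}
df = pd.DataFrame(data)

# Expanding sum per group, as in the solution above
df['Expanding_Sum'] = (
    df.groupby('X')['Y'].expanding().sum().reset_index(level=0, drop=True)
)

# Grouped cumsum keeps the original row index and the integer dtype,
# so no index level has to be dropped
df['Cumulative_Sum'] = df.groupby('X')['Y'].cumsum()
print(df)
```

The two columns hold the same values. They differ only in dtype: the expanding version is float, while cumsum stays integer.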
