Mastering Pandas: 100 Exercises with solutions for Python data analysis
Welcome to w3resource's 100 Pandas exercises collection! This comprehensive set of exercises is designed to help you master the fundamentals of Pandas, a powerful data manipulation and analysis library in Python. Whether you're a beginner or an experienced user looking to improve your skills, these exercises cover a wide range of topics. They provide practical challenges to enhance your Pandas understanding.
Exercise 1:
Create a DataFrame from a dictionary of lists.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(df)
Output:
   X  Y
0  1  5
1  2  6
2  3  7
3  4  8
Exercise 2:
Select the first 3 rows of a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(df.head(3))
Output:
   X  Y
0  1  5
1  2  6
2  3  7
Exercise 3:
Select the 'X' column from a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(df['X'])
Output:
0    1
1    2
2    3
3    4
Name: X, dtype: int64
Exercise 4:
Filter rows based on a column condition.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
filtered_df = df[df['X'] > 2]
print(filtered_df)
Output:
   X  Y
2  3  7
3  4  8
Exercise 5:
Add a new column to an existing DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
df['Z'] = df['X'] + df['Y']
print(df)
Output:
   X  Y   Z
0  1  5   6
1  2  6   8
2  3  7  10
3  4  8  12
Exercise 6:
Remove a column from a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8], 'Z': [9, 10, 11, 12]}
df = pd.DataFrame(data)
df.drop(columns=['Z'], inplace=True)
print(df)
Output:
   X  Y
0  1  5
1  2  6
2  3  7
3  4  8
Exercise 7:
Sort a DataFrame by a column.
Solution:
import pandas as pd
data = {'X': [4, 3, 2, 1], 'Y': [8, 7, 6, 5]}
df = pd.DataFrame(data)
df.sort_values(by='X', inplace=True)
print(df)
Output:
   X  Y
3  1  5
2  2  6
1  3  7
0  4  8
Exercise 8:
Group a DataFrame by a column and calculate the mean of each group.
Solution:
import pandas as pd
data = {'X': [1, 2, 1, 2], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
grouped_df = df.groupby('X').mean()
print(grouped_df)
Output:
     Y
X
1  6.0
2  7.0
Exercise 9:
Replace missing values in a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, None, 4], 'Y': [5, None, 7, 8]}
df = pd.DataFrame(data)
df.fillna(0, inplace=True)
print(df)
Output:
     X    Y
0  1.0  5.0
1  2.0  0.0
2  0.0  7.0
3  4.0  8.0
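fillna() also accepts a dictionary that maps column names to fill values, so each column can be filled differently. Below is a minimal sketch on the same data. Filling X with 0 and Y with its column mean is only an illustrative choice:

```python
import pandas as pd

data = {'X': [1, 2, None, 4], 'Y': [5, None, 7, 8]}
df = pd.DataFrame(data)
# Map each column to its own replacement value
df = df.fillna({'X': 0, 'Y': df['Y'].mean()})
print(df)
```

Here the missing Y value becomes the mean of 5, 7 and 8 (about 6.666667).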
Exercise 10:
Convert a column to datetime.
Solution:
import pandas as pd
data = {'X': ['2020-01-01', '2020-01-02', '2020-01-03']}
df = pd.DataFrame(data)
df['X'] = pd.to_datetime(df['X'])
print(df)
Output:
           X
0 2020-01-01
1 2020-01-02
2 2020-01-03
Exercise 11:
Create a DataFrame with specific column names.
Solution:
import pandas as pd
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data)
print(df)
Output:
   col1  col2
0     1     4
1     2     5
2     3     6
Exercise 12:
Calculate the sum of values in each column.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
print(df.sum())
Output:
X     6
Y    15
dtype: int64
Exercise 13:
Calculate the mean of values in each row.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
print(df.mean(axis=1))
Output:
0    2.5
1    3.5
2    4.5
dtype: float64
Exercise 14:
Concatenate two DataFrames.
Solution:
import pandas as pd
data1 = {'X': [1, 2, 3]}
data2 = {'Y': [4, 5, 6]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
concatenated_df = pd.concat([df1, df2], axis=1)
print(concatenated_df)
Output:
   X  Y
0  1  4
1  2  5
2  3  6
Exercise 15:
Merge two DataFrames on a key.
Solution:
import pandas as pd
data1 = {'key': ['X', 'Y', 'Z'], 'value1': [1, 2, 3]}
data2 = {'key': ['X', 'Y', 'D'], 'value2': [4, 5, 6]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
merged_df = pd.merge(df1, df2, on='key')
print(merged_df)
Output:
  key  value1  value2
0   X       1       4
1   Y       2       5
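pd.merge() defaults to an inner join (how='inner'), which is why the keys 'Z' and 'D' are missing above. This sketch uses how='outer' with indicator=True instead. That keeps every key and adds a _merge column showing which frame each row came from:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['X', 'Y', 'Z'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['X', 'Y', 'D'], 'value2': [4, 5, 6]})
# Outer join keeps keys from both sides; values missing on one side become NaN
merged = pd.merge(df1, df2, on='key', how='outer', indicator=True)
print(merged)
```

Rows found only in df1 are marked left_only, rows found only in df2 are marked right_only, and matched rows are marked both. The row order of an outer join can differ between pandas versions.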
Exercise 16:
Create a pivot table from a DataFrame.
Solution:
import pandas as pd
data = {'X': ['foo', 'foo', 'bar', 'bar'], 'Y': ['one', 'two', 'one', 'two'], 'Z': [1, 2, 3, 4]}
df = pd.DataFrame(data)
pivot_table = df.pivot_table(values='Z', index='X', columns='Y')
print(pivot_table)
Output:
Y    one  two
X
bar  3.0  4.0
foo  1.0  2.0
Exercise 17:
Reshape a DataFrame from long to wide format.
Solution:
import pandas as pd
data = {'X': ['foo', 'foo', 'bar', 'bar'], 'Y': ['one', 'two', 'one', 'two'], 'Z': [1, 2, 3, 4]}
df = pd.DataFrame(data)
wide_df = df.pivot(index='X', columns='Y', values='Z')
print(wide_df)
Output:
Y    one  two
X
bar    3    4
foo    1    2
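pivot() requires every index/column pair to be unique. If the long data contains duplicates, it raises a ValueError, whereas pivot_table() aggregates them (by mean, by default). The sketch below demonstrates this with a duplicated ('foo', 'one') entry added purely for illustration:

```python
import pandas as pd

data = {'X': ['foo', 'foo', 'foo', 'bar'], 'Y': ['one', 'one', 'two', 'one'], 'Z': [1, 3, 2, 4]}
df = pd.DataFrame(data)
try:
    df.pivot(index='X', columns='Y', values='Z')
    pivot_failed = False
except ValueError as err:
    # pivot() cannot decide between the two ('foo', 'one') values
    pivot_failed = True
    print('pivot failed:', err)
# pivot_table() averages the duplicates instead: (1 + 3) / 2 = 2.0
table = df.pivot_table(values='Z', index='X', columns='Y', aggfunc='mean')
print(table)
```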
Exercise 18:
Calculate the correlation between columns in a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [4, 3, 2, 1]}
df = pd.DataFrame(data)
correlation = df.corr()
print(correlation)
Output:
     X    Y
X  1.0 -1.0
Y -1.0  1.0
Exercise 19:
Iterate over rows in a DataFrame using iterrows().
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
for index, row in df.iterrows():
    print(index, row['X'], row['Y'])
Output:
0 1 4
1 2 5
2 3 6
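iterrows() builds a new Series for every row. That is slow on large frames and can upcast mixed dtypes. itertuples() yields lightweight named tuples instead. Here is a sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3], 'Y': [4, 5, 6]})
rows = []
# Each row is a namedtuple; the index label is available as row.Index
for row in df.itertuples():
    rows.append((row.Index, row.X, row.Y))
    print(row.Index, row.X, row.Y)
```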
Exercise 20:
Apply a function to each element in a DataFrame.
Solution:
import pandas as pd # Import the Pandas library
# Create a sample DataFrame
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
# Apply a function to each element using the map method
df = df.apply(lambda col: col.map(lambda x: x * 2))
print(df)
Output:
   X   Y
0  2   8
1  4  10
2  6  12
Exercise 21:
Create a DataFrame from a list of dictionaries.
Solution:
import pandas as pd
data = [{'X': 1, 'Y': 2}, {'X': 3, 'Y': 4}]
df = pd.DataFrame(data)
print(df)
Output:
   X  Y
0  1  2
1  3  4
Exercise 22:
Rename columns in a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df.rename(columns={'X': 'A', 'Y': 'B'}, inplace=True)
print(df)
Output:
   A  B
0  1  4
1  2  5
2  3  6
Exercise 23:
Filter rows by multiple conditions.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]}
df = pd.DataFrame(data)
filtered_df = df[(df['X'] > 2) & (df['Y'] < 7)]
print(filtered_df)
Output:
   X  Y
2  3  6
Exercise 24:
Calculate the cumulative sum of a column.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Sum'] = df['X'].cumsum()
print(df)
Output:
   X  Cumulative_Sum
0  1               1
1  2               3
2  3               6
3  4              10
Exercise 25:
Drop rows with missing values.
Solution:
import pandas as pd
data = {'X': [1, 2, None, 4], 'Y': [4, 5, 6, None]}
df = pd.DataFrame(data)
df.dropna(inplace=True)
print(df)
Output:
     X    Y
0  1.0  4.0
1  2.0  5.0
Exercise 26:
Replace values in a DataFrame based on a condition.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
df.loc[df['X'] > 2, 'Y'] = 0
print(df)
Output:
   X  Y
0  1  5
1  2  6
2  3  0
3  4  0
Exercise 27:
Create a DataFrame with a MultiIndex.
Solution:
import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))
data = {'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data, index=index)
print(df)
Output:
              Value
Group Number
X     1          10
      2          20
Y     1          30
      2          40
Exercise 28:
Calculate the rolling mean of a column.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Rolling_Mean'] = df['X'].rolling(window=3).mean()
print(df)
Output:
   X  Rolling_Mean
0  1           NaN
1  2           NaN
2  3           2.0
3  4           3.0
4  5           4.0
5  6           5.0
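The first two values are NaN because a window of 3 needs three observations. The min_periods parameter lets the window start producing results earlier. With min_periods=1, the first rows average whatever values are available so far:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 4, 5, 6]})
# Allow partial windows at the start of the series
df['Rolling_Mean'] = df['X'].rolling(window=3, min_periods=1).mean()
print(df)
```

Row 0 is then 1.0 and row 1 is 1.5 (the mean of 1 and 2). From row 2 onward the values match the full-window results above.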
Exercise 29:
Create a DataFrame from a list of tuples.
Solution:
import pandas as pd
data = [(1, 2), (3, 4), (5, 6)]
df = pd.DataFrame(data, columns=['X', 'Y'])
print(df)
Output:
   X  Y
0  1  2
1  3  4
2  5  6
Exercise 30:
Add a row to a DataFrame.
Solution:
import pandas as pd # Import the Pandas library
# Create a sample DataFrame
data = {'X': [1, 2], 'Y': [3, 4]}
df = pd.DataFrame(data)
# Create a new row as a DataFrame
new_row = pd.DataFrame({'X': [5], 'Y': [6]})
# Concatenate the new row to the DataFrame
df = pd.concat([df, new_row], ignore_index=True)
print(df)
Output:
   X  Y
0  1  3
1  2  4
2  5  6
Exercise 31:
Create a DataFrame with random values.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df)
Output (sample; the data is random, so your numbers will differ):
          X         Y         Z
0  0.688292  0.950264  0.665916
1  0.497719  0.840536  0.923938
2  0.285218  0.091178  0.722034
3  0.037824  0.248689  0.584696
Exercise 32:
Calculate the rank of values in a DataFrame.
Solution:
import pandas as pd
data = {'X': [3, 1, 4, 1], 'Y': [2, 3, 1, 4]}
df = pd.DataFrame(data)
df['Rank'] = df['X'].rank()
print(df)
Output:
   X  Y  Rank
0  3  2   3.0
1  1  3   1.5
2  4  1   4.0
3  1  4   1.5
Exercise 33:
Change the data type of a column.
Solution:
import pandas as pd
data = {'X': ['1', '2', '3']}
df = pd.DataFrame(data)
df['X'] = df['X'].astype(int)
print(df)
Output:
   X
0  1
1  2
2  3
Exercise 34:
Filter rows based on string matching.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'baz', 'qux']}
df = pd.DataFrame(data)
filtered_df = df[df['X'].str.contains('ba')]
print(filtered_df)
Output:
     X
1  bar
2  baz
Exercise 35:
Create a DataFrame with specified row and column labels.
Solution:
import pandas as pd
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
df = pd.DataFrame(data, index=['row1', 'row2', 'row3'], columns=['col1', 'col2', 'col3'])
print(df)
Output:
      col1  col2  col3
row1     1     2     3
row2     4     5     6
row3     7     8     9
Exercise 36:
Transpose a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
transposed_df = df.T
print(transposed_df)
Output:
   0  1  2
X  1  2  3
Y  4  5  6
Exercise 37:
Set a column as the index of a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df.set_index('X', inplace=True)
print(df)
Output:
   Y
X
1  4
2  5
3  6
Exercise 38:
Reset the index of a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df.set_index('X', inplace=True)
df.reset_index(inplace=True)
print(df)
Output:
   X  Y
0  1  4
1  2  5
2  3  6
Exercise 39:
Add a prefix or suffix to column names.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df = df.add_prefix('col_')
print(df)
Output:
   col_X  col_Y
0      1      4
1      2      5
2      3      6
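The exercise title also mentions suffixes. add_suffix() works the same way as add_prefix() but appends the string to each column name. A minimal sketch follows; the suffix '_col' is only an example:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3], 'Y': [4, 5, 6]})
# Append '_col' to every column name
df = df.add_suffix('_col')
print(df)
```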
Exercise 40:
Filter rows based on datetime index.
Solution:
import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=5, freq='D')
data = {'X': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data, index=date_range)
filtered_df = df.loc['2020-01-03':'2020-01-05']  # label-based slice; both end dates are included
print(filtered_df)
Output:
            X
2020-01-03  3
2020-01-04  4
2020-01-05  5
Exercise 41:
Create a DataFrame with duplicate rows and remove duplicates.
Solution:
import pandas as pd
data = {'X': [1, 2, 2, 3], 'Y': [4, 5, 5, 6]}
df = pd.DataFrame(data)
df.drop_duplicates(inplace=True)
print(df)
Output:
   X  Y
0  1  4
1  2  5
3  3  6
Exercise 42:
Create a DataFrame with hierarchical index.
Solution:
import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))
data = {'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data, index=index)
print(df)
Output:
              Value
Group Number
X     1          10
      2          20
Y     1          30
      2          40
Exercise 43:
Calculate the difference between consecutive rows in a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 3, 6, 10]}
df = pd.DataFrame(data)
df['Difference'] = df['X'].diff()
print(df)
Output:
    X  Difference
0   1         NaN
1   3         2.0
2   6         3.0
3  10         4.0
Exercise 44:
Create a DataFrame with hierarchical columns.
Solution:
import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], ['C1', 'C2', 'C1', 'C2']]
columns = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Type'))
data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
df = pd.DataFrame(data, columns=columns)
print(df)
Output:
Group   X       Y
Type   C1  C2  C1  C2
0       1   2   3   4
1       5   6   7   8
2       9  10  11  12
Exercise 45:
Filter rows based on the length of strings in a column.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'baz', 'qux']}
df = pd.DataFrame(data)
filtered_df = df[df['X'].str.len() > 3]
print(filtered_df)
Output (every string has exactly 3 characters, so no row passes the filter):
Empty DataFrame
Columns: [X]
Index: []
Exercise 46:
Calculate the percentage change between rows in a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Pct_Change'] = df['X'].pct_change()
print(df)
Output:
   X  Pct_Change
0  1         NaN
1  2    1.000000
2  3    0.500000
3  4    0.333333
Exercise 47:
Create a DataFrame from a dictionary of Series.
Solution:
import pandas as pd
data = {'X': pd.Series([1, 2, 3]), 'Y': pd.Series([4, 5, 6])}
df = pd.DataFrame(data)
print(df)
Output:
   X  Y
0  1  4
1  2  5
2  3  6
Exercise 48:
Filter rows based on whether a column value is in a list.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
filtered_df = df[df['X'].isin([2, 3])]
print(filtered_df)
Output:
   X  Y
1  2  6
2  3  7
Exercise 49:
Calculate the z-score of values in a DataFrame.
Solution:
import pandas as pd
import numpy as np
data = {'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]}
df = pd.DataFrame(data)
df['zscore_X'] = (df['X'] - np.mean(df['X'])) / np.std(df['X'])
print(df)
Output:
   X  Y  zscore_X
0  1  4 -1.341641
1  2  5 -0.447214
2  3  6  0.447214
3  4  7  1.341641
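np.std() defaults to the population standard deviation (ddof=0, dividing by n). pandas' Series.std() defaults to the sample standard deviation (ddof=1, dividing by n - 1), so z-scores built with the two functions have slightly different scales. This sketch compares them on column X:

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 2, 3, 4])
pop_std = np.std(x)     # ddof=0: divide by n
sample_std = x.std()    # ddof=1: divide by n - 1
print(pop_std, sample_std)
# Passing ddof explicitly makes pandas match NumPy's default
print(x.std(ddof=0))
```

For this column the population value is about 1.118034 and the sample value about 1.290994.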
Exercise 50:
Create a DataFrame with random integers and calculate descriptive statistics.
Solution:
import pandas as pd
import numpy as np
data = np.random.randint(1, 100, size=(5, 3))
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.describe())
Output (sample; the data is random, so your numbers will differ):
               X          Y          Z
count   5.000000   5.000000   5.000000
mean   60.600000  71.800000  42.600000
std    38.435661  13.971399  12.218838
min     5.000000  53.000000  28.000000
25%    40.000000  64.000000  34.000000
50%    69.000000  72.000000  41.000000
75%    91.000000  82.000000  55.000000
max    98.000000  88.000000  55.000000
Exercise 51:
Calculate the rank of values in each column of a DataFrame.
Solution:
import pandas as pd
data = {'X': [3, 1, 4, 1], 'Y': [2, 3, 1, 4]}
df = pd.DataFrame(data)
df['Rank_X'] = df['X'].rank()
df['Rank_Y'] = df['Y'].rank()
print(df)
Output:
   X  Y  Rank_X  Rank_Y
0  3  2     3.0     2.0
1  1  3     1.5     3.0
2  4  1     4.0     1.0
3  1  4     1.5     4.0
Exercise 52:
Filter rows based on multiple string conditions.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'baz', 'qux']}
df = pd.DataFrame(data)
filtered_df = df[df['X'].str.contains('ba|qu')]
print(filtered_df)
Output:
     X
1  bar
2  baz
3  qux
Exercise 53:
Create a DataFrame with random values and calculate the skewness.
Solution:
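import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.skew())
Output (the data is random, so the numbers differ on every run):
One skewness value for each of X, Y and Z, returned as a Series with dtype float64.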
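Skewness measures asymmetry. It is 0 for perfectly symmetric data and positive when a long tail stretches to the right. Random data gives different numbers on every run, so this sketch uses two small fixed columns instead (invented for illustration) to make the sign easy to check:

```python
import pandas as pd

df = pd.DataFrame({'Symmetric': [1, 2, 3, 4, 5],
                   'Right_Tail': [1, 1, 2, 2, 10]})
# DataFrame.skew() returns one skewness value per column
skew = df.skew()
print(skew)
```

The symmetric column gives 0.0. The single large value 10 pulls the second column's skewness above zero.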
Exercise 54:
Create a DataFrame and calculate the kurtosis.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.kurt())
Output (sample; the data is random, so your numbers will differ):
X    2.958407
Y   -2.639654
Z    2.704430
dtype: float64
Exercise 55:
Calculate the cumulative product of a column in a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Product'] = df['X'].cumprod()
print(df)
Output:
   X  Cumulative_Product
0  1                   1
1  2                   2
2  3                   6
3  4                  24
Exercise 56:
Create a DataFrame and calculate the rolling standard deviation.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Rolling_Std'] = df['X'].rolling(window=3).std()
print(df)
Output:
   X  Rolling_Std
0  1          NaN
1  2          NaN
2  3          1.0
3  4          1.0
4  5          1.0
5  6          1.0
Exercise 57:
Create a DataFrame and calculate the expanding mean.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Expanding_Mean'] = df['X'].expanding().mean()
print(df)
Output:
   X  Expanding_Mean
0  1             1.0
1  2             1.5
2  3             2.0
3  4             2.5
4  5             3.0
5  6             3.5
Exercise 58:
Create a DataFrame with random values and calculate the covariance matrix.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.cov())
Output (sample; the data is random, so your numbers will differ):
          X         Y         Z
X  0.054079  0.007398 -0.031403
Y  0.007398  0.053211 -0.020480
Z -0.031403 -0.020480  0.048057
Exercise 59:
Create a DataFrame with random values and calculate the correlation matrix.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.corr())
Output (sample; the data is random, so your numbers will differ):
          X         Y         Z
X  1.000000 -0.258187  0.541044
Y -0.258187  1.000000 -0.432419
Z  0.541044 -0.432419  1.000000
Exercise 60:
Create a DataFrame and calculate the rolling correlation between two columns.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6], 'Y': [6, 5, 4, 3, 2, 1]}
df = pd.DataFrame(data)
df['Rolling_Corr'] = df['X'].rolling(window=3).corr(df['Y'])
print(df)
Output:
   X  Y  Rolling_Corr
0  1  6           NaN
1  2  5           NaN
2  3  4          -1.0
3  4  3          -1.0
4  5  2          -1.0
5  6  1          -1.0
Exercise 61:
Create a DataFrame and calculate the expanding variance.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Expanding_Var'] = df['X'].expanding().var()
print(df)
Output:
   X  Expanding_Var
0  1            NaN
1  2       0.500000
2  3       1.000000
3  4       1.666667
4  5       2.500000
5  6       3.500000
Exercise 62:
Create a DataFrame with datetime index and resample by month.
Solution:
import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=100, freq='D')
data = {'X': range(100)}
df = pd.DataFrame(data, index=date_range)
monthly_df = df.resample('M').sum()  # 'M' = month end; pandas 2.2+ prefers the alias 'ME' and warns that 'M' is deprecated
print(monthly_df)
Output:
               X
2020-01-31   465
2020-02-29  1305
2020-03-31  2325
2020-04-30   855
Exercise 63:
Create a DataFrame and calculate the exponential moving average.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['EMA'] = df['X'].ewm(span=3, adjust=False).mean()
print(df)
Output:
   X      EMA
0  1  1.00000
1  2  1.50000
2  3  2.25000
3  4  3.12500
4  5  4.06250
5  6  5.03125
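With adjust=False, ewm(span=3) applies the recursion EMA_t = alpha * x_t + (1 - alpha) * EMA_(t-1). Here alpha = 2 / (span + 1) = 0.5, and the first EMA equals the first value. A hand-written loop reproduces the column above:

```python
import pandas as pd

x = [1, 2, 3, 4, 5, 6]
span = 3
alpha = 2 / (span + 1)  # 0.5 for span=3
ema = [float(x[0])]     # the recursion starts from the first observation
for value in x[1:]:
    ema.append(alpha * value + (1 - alpha) * ema[-1])
print(ema)  # [1.0, 1.5, 2.25, 3.125, 4.0625, 5.03125]
pandas_ema = pd.Series(x).ewm(span=span, adjust=False).mean().tolist()
print(pandas_ema)
```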
Exercise 64:
Create a DataFrame with random integers and calculate the mode.
Solution:
import pandas as pd
import numpy as np
data = np.random.randint(1, 10, size=(5, 3))
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.mode())
Output (sample; the data is random, so your numbers will differ):
   X    Y    Z
0  2  1.0  2.0
1  3  3.0  7.0
2  5  NaN  NaN
3  6  NaN  NaN
4  9  NaN  NaN
Exercise 65:
Create a DataFrame and calculate the z-score of each column.
Solution:
import pandas as pd
import numpy as np
data = {'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]}
df = pd.DataFrame(data)
df['zscore_X'] = (df['X'] - np.mean(df['X'])) / np.std(df['X'])
df['zscore_Y'] = (df['Y'] - np.mean(df['Y'])) / np.std(df['Y'])
print(df)
Output:
   X  Y  zscore_X  zscore_Y
0  1  4 -1.341641 -1.341641
1  2  5 -0.447214 -0.447214
2  3  6  0.447214  0.447214
3  4  7  1.341641  1.341641
Exercise 66:
Create a DataFrame with random values and calculate the median.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.median())
Output (sample; the data is random, so your numbers will differ):
X    0.787042
Y    0.477837
Z    0.696911
dtype: float64
Exercise 67:
Create a DataFrame and apply a custom function to each column.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df = df.apply(lambda x: x + 1)
print(df)
Output:
   X  Y
0  2  5
1  3  6
2  4  7
Exercise 68:
Create a DataFrame with hierarchical index and calculate the mean for each group.
Solution:
import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))
data = {'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data, index=index)
grouped_df = df.groupby('Group').mean()
print(grouped_df)
Output:
       Value
Group
X       15.0
Y       35.0
Exercise 69:
Create a DataFrame and calculate the percentage of missing values in each column.
Solution:
import pandas as pd
data = {'X': [1, 2, None, 4], 'Y': [4, None, 6, 8]}
df = pd.DataFrame(data)
missing_percentage = df.isnull().mean() * 100
print(missing_percentage)
Output:
X    25.0
Y    25.0
dtype: float64
Exercise 70:
Create a DataFrame and apply a custom function to each row.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df['Sum'] = df.apply(lambda row: row['X'] + row['Y'], axis=1)
print(df)
Output:
   X  Y  Sum
0  1  4    5
1  2  5    7
2  3  6    9
Exercise 71:
Create a DataFrame with random values and calculate the quantiles.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.quantile([0.25, 0.5, 0.75]))
Output (sample; the data is random, so your numbers will differ):
             X         Y         Z
0.25  0.174265  0.184036  0.520573
0.50  0.468040  0.315593  0.644571
0.75  0.767870  0.436426  0.771297
Exercise 72:
Create a DataFrame and calculate the interquartile range (IQR).
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1
print(IQR)
Output (sample; the data is random, so your numbers will differ):
X    0.354244
Y    0.329573
Z    0.245520
dtype: float64
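A common use of the IQR is Tukey's rule of thumb: values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are flagged as potential outliers. This sketch uses fixed data (invented for illustration) so the result is predictable:

```python
import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 11, 50])
q1, q3 = s.quantile(0.25), s.quantile(0.75)   # 11.0 and 12.5 with linear interpolation
iqr = q3 - q1                                 # 1.5
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # 8.75 and 14.75
outliers = s[(s < lower) | (s > upper)]
print(outliers)
```

Only the value 50 (index 6) falls outside the fences.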
Exercise 73:
Create a DataFrame with datetime index and calculate the rolling mean.
Solution:
import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=10, freq='D')
data = {'X': range(10)}
df = pd.DataFrame(data, index=date_range)
df['Rolling_Mean'] = df['X'].rolling(window=3).mean()
print(df)
Output:
            X  Rolling_Mean
2020-01-01  0           NaN
2020-01-02  1           NaN
2020-01-03  2           1.0
2020-01-04  3           2.0
2020-01-05  4           3.0
2020-01-06  5           4.0
2020-01-07  6           5.0
2020-01-08  7           6.0
2020-01-09  8           7.0
2020-01-10  9           8.0
Exercise 74:
Create a DataFrame and calculate the cumulative maximum.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 2, 1]}
df = pd.DataFrame(data)
df['Cumulative_Max'] = df['X'].cummax()
print(df)
Output:
   X  Cumulative_Max
0  1               1
1  2               2
2  3               3
3  2               3
4  1               3
Exercise 75:
Create a DataFrame and calculate the cumulative minimum.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 2, 1]}
df = pd.DataFrame(data)
df['Cumulative_Min'] = df['X'].cummin()
print(df)
Output:
   X  Cumulative_Min
0  1               1
1  2               1
2  3               1
3  2               1
4  1               1
Exercise 76:
Create a DataFrame with random values and calculate the cumulative variance.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Cumulative_Var'] = df['X'].expanding().var()
print(df)
Output (sample; the data is random, so your numbers will differ):
          X         Y         Z  Cumulative_Var
0  0.315669  0.900791  0.404858             NaN
1  0.462000  0.463257  0.922495        0.010706
2  0.328968  0.200027  0.967625        0.006548
3  0.630370  0.992849  0.231884        0.021460
4  0.574397  0.968600  0.926893        0.020023
5  0.204077  0.889864  0.589022        0.027130
6  0.386806  0.630882  0.242157        0.022759
7  0.319831  0.935747  0.829739        0.020630
8  0.786435  0.377739  0.879458        0.034407
9  0.523467  0.077937  0.764476        0.031194
Exercise 77:
Create a DataFrame and apply a custom function to each element.
Solution:
import pandas as pd
# Create a DataFrame
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
# Define the custom function
def custom_function(x):
    return x * 2
# Apply the function to each element using map on each column
df = df.apply(lambda col: col.map(custom_function))
# Print the DataFrame
print(df)
Output:
   X   Y
0  2   8
1  4  10
2  6  12
Exercise 78:
Create a DataFrame with random values and calculate the z-score for each element.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df = df.apply(lambda x: (x - x.mean()) / x.std(), axis=0)
print(df)
Output (sample; the data is random, so your numbers will differ):
          X         Y         Z
0  1.027393  0.656858  1.032853
1  0.674079 -1.277904 -0.220065
2 -0.996641 -0.298841  0.475217
3 -0.704831  0.919887 -1.288005
Exercise 79:
Create a DataFrame and calculate the cumulative sum for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Sum'] = df.groupby('X')['Y'].cumsum()
print(df)
Output:
     X  Y  Cumulative_Sum
0  foo  1               1
1  bar  2               2
2  foo  3               4
3  bar  4               6
Exercise 80:
Create a DataFrame with random values and calculate the rank for each element.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df = df.rank()
print(df)
Output (sample; the data is random, so your numbers will differ):
     X    Y    Z
0  4.0  3.0  3.0
1  3.0  2.0  2.0
2  1.0  4.0  1.0
3  2.0  1.0  4.0
Exercise 81:
Create a DataFrame and calculate the cumulative product for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Product'] = df.groupby('X')['Y'].cumprod()
print(df)
Output:
     X  Y  Cumulative_Product
0  foo  1                   1
1  bar  2                   2
2  foo  3                   3
3  bar  4                   8
Exercise 82:
Create a DataFrame with random values and calculate the expanding sum.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Expanding_Sum'] = df['X'].expanding().sum()
print(df)
Output (sample; the data is random, so your numbers will differ):
          X         Y         Z  Expanding_Sum
0  0.815750  0.062819  0.699743       0.815750
1  0.128772  0.843222  0.411903       0.944522
2  0.857516  0.219424  0.234460       1.802038
3  0.011010  0.774375  0.259412       1.813048
Exercise 83:
Create a DataFrame and calculate the expanding minimum for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Expanding_Min'] = df.groupby('X')['Y'].expanding().min().reset_index(level=0, drop=True)
print(df)
Output:
     X  Y  Expanding_Min
0  foo  1            1.0
1  bar  2            2.0
2  foo  3            1.0
3  bar  4            2.0
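For a running minimum within groups, GroupBy.cummin() gives the same values as expanding().min(). Its result already keeps the original row index, so the reset_index(level=0, drop=True) step is not needed. Here is a sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]})
# cummin() is computed per group and aligned to the original rows
df['Cumulative_Min'] = df.groupby('X')['Y'].cummin()
print(df)
```

Unlike the expanding version, the result keeps the integer dtype of Y.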
Exercise 84:
Create a DataFrame with random values and calculate the expanding maximum for each group.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
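# Add group labels; grouping by the random float column 'X' would put every row in its own group
df['Group'] = ['A', 'B', 'A', 'B']
df['Expanding_Max'] = df.groupby('Group')['Y'].expanding().max().reset_index(level=0, drop=True)
print(df)
Output (the data is random, so the numbers differ on every run):
Rows 0 and 1 repeat their own Y value. Row 2 holds the larger Y of rows 0 and 2 (group A), and row 3 holds the larger Y of rows 1 and 3 (group B).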
Exercise 85:
Create a DataFrame and calculate the expanding variance for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Expanding_Var'] = df.groupby('X')['Y'].expanding().var().reset_index(level=0, drop=True)
print(df)
Output:
     X  Y  Expanding_Var
0  foo  1            NaN
1  bar  2            NaN
2  foo  3            2.0
3  bar  4            2.0
Exercise 86:
Create a DataFrame with random values and calculate the expanding standard deviation.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Expanding_Std'] = df['X'].expanding().std()
print(df)
Output (sample; the data is random, so your numbers will differ):
          X         Y         Z  Expanding_Std
0  0.693184  0.088273  0.109510            NaN
1  0.031186  0.163005  0.803467       0.468103
2  0.294881  0.409395  0.278145       0.333272
3  0.918778  0.854961  0.791329       0.397322
Exercise 87:
Create a DataFrame and calculate the expanding covariance.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [4, 3, 2, 1]}
df = pd.DataFrame(data)
df['Expanding_Cov'] = df['X'].expanding().cov(df['Y'])
print(df)
Output:
   X  Y  Expanding_Cov
0  1  4            NaN
1  2  3      -0.500000
2  3  2      -1.000000
3  4  1      -1.666667
Exercise 88:
Create a DataFrame with random values and calculate the expanding correlation.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Expanding_Corr'] = df['X'].expanding().corr(df['Y'])
print(df)
Output (sample; the data is random, so your numbers will differ):
          X         Y         Z  Expanding_Corr
0  0.094026  0.320246  0.044218             NaN
1  0.422531  0.002172  0.995907       -1.000000
2  0.265459  0.391239  0.589878       -0.751147
3  0.118812  0.061489  0.837821       -0.372750
Exercise 89:
Create a DataFrame and calculate the expanding median.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Expanding_Median'] = df['X'].expanding().median()
print(df)
Output:
   X  Expanding_Median
0  1               1.0
1  2               1.5
2  3               2.0
3  4               2.5
4  5               3.0
5  6               3.5
Exercise 90:
Create a DataFrame with datetime index and calculate the expanding mean for each group.
Solution:
import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=10, freq='D')
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data, index=date_range)
df['Expanding_Mean'] = df.groupby('X')['Y'].expanding().mean().reset_index(level=0, drop=True)
print(df)
Output:
              X  Y  Expanding_Mean
2020-01-01  foo  0             0.0
2020-01-02  bar  1             1.0
2020-01-03  foo  2             1.0
2020-01-04  bar  3             2.0
2020-01-05  foo  4             2.0
2020-01-06  bar  5             3.0
2020-01-07  foo  6             3.0
2020-01-08  bar  7             4.0
2020-01-09  foo  8             4.0
2020-01-10  bar  9             5.0
Exercise 91:
Create a DataFrame with random values and calculate the rolling sum for each group.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
# Add group labels; grouping by the random float column 'X' would put every row in its own group
df['Group'] = ['A', 'B'] * 5
df['Rolling_Sum'] = df.groupby('Group')['Y'].rolling(window=3).sum().reset_index(level=0, drop=True)
print(df)
Output (the data is random, so the numbers differ on every run):
Rows 0 to 3 are NaN in Rolling_Sum because neither group has three rows yet. From row 4 onward, each value is the sum of the current and the two previous Y values in the same group.
Exercise 92:
Create a DataFrame and calculate the rolling mean for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
df['Rolling_Mean'] = df.groupby('X')['Y'].rolling(window=3).mean().reset_index(level=0, drop=True)
print(df)
Output:
     X  Y  Rolling_Mean
0  foo  0           NaN
1  bar  1           NaN
2  foo  2           NaN
3  bar  3           NaN
4  foo  4           2.0
5  bar  5           3.0
6  foo  6           4.0
7  bar  7           5.0
8  foo  8           6.0
9  bar  9           7.0
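An alternative that avoids the reset_index(level=0, drop=True) step is GroupBy.transform(). It runs a function inside each group and returns a result aligned to the original rows. Here is a sketch on the same data:

```python
import pandas as pd

data = {'X': ['foo', 'bar'] * 5, 'Y': range(10)}
df = pd.DataFrame(data)
# transform() applies the rolling mean within each group and keeps the original index
df['Rolling_Mean'] = df.groupby('X')['Y'].transform(lambda s: s.rolling(window=3).mean())
print(df)
```

The Rolling_Mean column matches the output above: NaN for rows 0 to 3, then 2.0 through 7.0.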
Exercise 93:
Create a DataFrame with random values and calculate the rolling standard deviation for each group.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
# Add group labels; grouping by the random float column 'X' would put every row in its own group
df['Group'] = ['A', 'B'] * 5
df['Rolling_Std'] = df.groupby('Group')['Y'].rolling(window=3).std().reset_index(level=0, drop=True)
print(df)
Output (the data is random, so the numbers differ on every run):
Rows 0 to 3 are NaN in Rolling_Std because neither group has three rows yet. From row 4 onward, each value is the standard deviation of the current and the two previous Y values in the same group.
Exercise 94:
Create a DataFrame and calculate the rolling variance for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
df['Rolling_Var'] = df.groupby('X')['Y'].rolling(window=3).var().reset_index(level=0, drop=True)
print(df)
Output:
     X  Y  Rolling_Var
0  foo  0          NaN
1  bar  1          NaN
2  foo  2          NaN
3  bar  3          NaN
4  foo  4          4.0
5  bar  5          4.0
6  foo  6          4.0
7  bar  7          4.0
8  foo  8          4.0
9  bar  9          4.0
Exercise 95:
Create a DataFrame with random values and calculate the rolling correlation for each group.
Solution:
import pandas as pd
import numpy as np
# Create a DataFrame with random values
np.random.seed(42) # For reproducibility
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
# Assign each row to group A or B at random
df['Group'] = np.random.choice(['A', 'B'], size=10)
# Calculate the rolling correlation for each group
df['Rolling_Corr'] = df.groupby('Group').apply(lambda group: group['Y'].rolling(window=3).corr(group['Z'])).reset_index(level=0, drop=True)
print(df)
Output:
          X         Y         Z  Group  Rolling_Corr
0  0.374540  0.950714  0.731994      A           NaN
1  0.598658  0.156019  0.155995      A           NaN
2  0.058084  0.866176  0.601115      A      0.992633
3  0.708073  0.020584  0.969910      A     -0.095420
4  0.832443  0.212339  0.181825      A     -0.180021
5  0.183405  0.304242  0.524756      B           NaN
6  0.431945  0.291229  0.611853      B           NaN
7  0.139494  0.292145  0.366362      A     -0.869948
8  0.456070  0.785176  0.199674      B     -0.984073
9  0.514234  0.592415  0.046450      B     -0.788379
Exercise 96:
Create a DataFrame and calculate the rolling covariance for each group.
Solution:
import pandas as pd
# Create a DataFrame with sample data
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
'Y': range(10), 'Z': range(10, 20)}
df = pd.DataFrame(data)
# Calculate the rolling covariance for each group
rolling_cov = df.groupby('X').apply(lambda group: group['Y'].rolling(window=3).cov(group['Z'])).reset_index(level=0, drop=True)
# Add the rolling covariance to the original DataFrame
df['Rolling_Cov'] = rolling_cov
print(df)
Output:
     X  Y   Z  Rolling_Cov
0  foo  0  10          NaN
1  bar  1  11          NaN
2  foo  2  12          NaN
3  bar  3  13          NaN
4  foo  4  14          4.0
5  bar  5  15          4.0
6  foo  6  16          4.0
7  bar  7  17          4.0
8  foo  8  18          4.0
9  bar  9  19          4.0
Exercise 97:
Create a DataFrame with random values and calculate the rolling skewness for each group.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
# Add group labels; grouping by the random float column 'X' would put every row in its own group
df['Group'] = ['A', 'B'] * 5
df['Rolling_Skew'] = df.groupby('Group')['Y'].rolling(window=3).skew().reset_index(level=0, drop=True)
print(df)
Output (the data is random, so the numbers differ on every run):
Rows 0 to 3 are NaN in Rolling_Skew because neither group has three rows yet. From row 4 onward, each value is the skewness of the current and the two previous Y values in the same group.
Exercise 98:
Create a DataFrame and calculate the rolling kurtosis for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
# Kurtosis needs at least four observations, so a window of 3 would return only NaN
df['Rolling_Kurt'] = df.groupby('X')['Y'].rolling(window=4).kurt().reset_index(level=0, drop=True)
print(df)
Output:
     X  Y  Rolling_Kurt
0  foo  0           NaN
1  bar  1           NaN
2  foo  2           NaN
3  bar  3           NaN
4  foo  4           NaN
5  bar  5           NaN
6  foo  6          -1.2
7  bar  7          -1.2
8  foo  8          -1.2
9  bar  9          -1.2
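pandas needs at least four observations to compute kurtosis, so a rolling window of 3 can only produce NaN. As a quick check, Series.kurt() on four evenly spaced values (one 'foo' window from this exercise) returns approximately -1.2. That is the excess kurtosis of a flat, uniform-like spread. Three values give NaN:

```python
import pandas as pd

four = pd.Series([0, 2, 4, 6]).kurt()   # approximately -1.2
three = pd.Series([0, 2, 4]).kurt()     # NaN: fewer than four observations
print(four, three)
```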
Exercise 99:
Create a DataFrame with random values and calculate the rolling median for each group.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
# Add group labels; grouping by the random float column 'X' would put every row in its own group
df['Group'] = ['A', 'B'] * 5
df['Rolling_Median'] = df.groupby('Group')['Y'].rolling(window=3).median().reset_index(level=0, drop=True)
print(df)
Output (the data is random, so the numbers differ on every run):
Rows 0 to 3 are NaN in Rolling_Median because neither group has three rows yet. From row 4 onward, each value is the median of the current and the two previous Y values in the same group.
Exercise 100:
Create a DataFrame and calculate the expanding sum for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
df['Expanding_Sum'] = df.groupby('X')['Y'].expanding().sum().reset_index(level=0, drop=True)
print(df)
Output:
     X  Y  Expanding_Sum
0  foo  0            0.0
1  bar  1            1.0
2  foo  2            2.0
3  bar  3            4.0
4  foo  4            6.0
5  bar  5            9.0
6  foo  6           12.0
7  bar  7           16.0
8  foo  8           20.0
9  bar  9           25.0