Pandas Cheat Sheet
Importing Pandas
import pandas as pd
Data Structures
1) Series: 1D labeled array.
s = pd.Series([1, 3, 5, np.nan, 6, 8])
2) DataFrame: 2D labeled data structure with columns.
df = pd.DataFrame({
'A': [1, 2, 3],
'B': ['a', 'b', 'c']
})
Reading Data
1) CSV: Read from a CSV file.
df = pd.read_csv('data.csv')
2) Excel: Read from an Excel file.
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
Data Exploration
1) Basic Info: Get an overview of the DataFrame.
df.info()
2) Summary Statistics: Calculate statistics for numerical columns.
df.describe()
3) Column Names: Get column names.
df.columns
Data Selection
1) Select Columns: Access columns by name.
df['Column_Name']
2) Select Rows: Access rows by index.
df.iloc[index]
3) Filtering: Filter rows based on conditions.
df[df['Column'] > 5]
Data Manipulation
1) Adding Columns: Create a new column.
df['New_Column'] = df['A'] + df['B']
2) Renaming Columns: Rename one or more columns.
df.rename(columns={'Old_Name': 'New_Name'}, inplace=True)
3) Dropping Columns/Rows: Remove columns or rows.
df.drop('Column_Name', axis=1, inplace=True) # Drop column
df.drop(0, axis=0, inplace=True) # Drop row by index
4) Sorting: Sort by one or more columns.
df.sort_values(by='Column_Name', ascending=False)
5) Grouping and Aggregation: Group data and apply functions.
df.groupby('Column_Name').agg({'A': 'mean', 'B': 'count'})
6) Missing Data: Handle missing data.
df.dropna() # Drop rows with missing values
df.fillna(value) # Fill missing values with a specific value
7) Merging and Joining: Combine DataFrames.
merged_df = pd.concat([df1, df2], axis=0) # Concatenate vertically
merged_df = pd.merge(df1, df2, on='Key_Column', how='inner') # Merge
Data Visualization
1) Plotting: Create basic plots.
df.plot(x='Column1', y='Column2', kind='scatter')
2) Matplotlib Integration: Customize plots using Matplotlib.
import matplotlib.pyplot as plt
df['Column'].plot.hist(bins=10)
plt.show()
3) Seaborn Integration: Use Seaborn for more advanced plots.
import seaborn as sns
sns.boxplot(x='Column1', y='Column2', data=df)