Pandas Cheat Sheet

Importing Pandas

import pandas as pd
import numpy as np  # needed for np.nan in the Series example

Data Structures

1) Series: 1D labeled array.

s = pd.Series([1, 3, 5, np.nan, 6, 8])

2) DataFrame: 2D labeled data structure with columns.

df = pd.DataFrame({
   'A': [1, 2, 3],
   'B': ['a', 'b', 'c']
})

Reading Data

1) CSV: Read from a CSV file.

df = pd.read_csv('data.csv')

2) Excel: Read from an Excel file.

df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
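
A minimal sketch of the CSV case that runs without a file on disk. The CSV text and the column names (name, score) are made up for illustration; in practice you would pass a path such as 'data.csv'.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = "name,score\nAda,90\nLin,85\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

n_rows, n_cols = df.shape          # (rows, columns)
column_names = list(df.columns)    # header row becomes the column labels
```

The first line of the CSV is treated as the header by default, so it does not count as a data row.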

Data Exploration

1) Basic Info: Get an overview of the DataFrame.

df.info()

2) Summary Statistics: Calculate statistics for numerical columns.

df.describe()

3) Column Names: Get column names.

df.columns

Data Selection

1) Select Columns: Access columns by name.

df['Column_Name']

2) Select Rows: Access rows by integer position (use df.loc for label-based access).

df.iloc[index]

3) Filtering: Filter rows based on conditions.

df[df['Column'] > 5]
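
A small sketch tying the three selection styles together. The data and the string index labels (x, y, z) are assumptions chosen so that position and label differ visibly.

```python
import pandas as pd

# Hypothetical data with string labels as the index.
df = pd.DataFrame(
    {"A": [1, 6, 9], "B": ["a", "b", "c"]},
    index=["x", "y", "z"],
)

col = df["A"]                 # select a column by name
first_row = df.iloc[0]        # select a row by integer position
same_row = df.loc["x"]        # select the same row by index label
big = df[df["A"] > 5]         # boolean mask keeps only matching rows

# Combine conditions with & / | and wrap each one in parentheses.
both = df[(df["A"] > 5) & (df["B"] == "c")]
```

When the index is the default 0..n-1 range, iloc and loc happen to agree, which is why the distinction is easy to miss.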

Data Manipulation

1) Adding Columns: Create a new column.

df['New_Column'] = df['A'] + df['B']  # A and B need compatible dtypes (e.g. both numeric)

2) Renaming Columns: Rename one or more columns.

df.rename(columns={'Old_Name': 'New_Name'}, inplace=True)

3) Dropping Columns/Rows: Remove columns or rows.

df.drop('Column_Name', axis=1, inplace=True) # Drop column
df.drop(0, axis=0, inplace=True)             # Drop row by index

4) Sorting: Sort by one or more columns.

df.sort_values(by='Column_Name', ascending=False)  # Returns a sorted copy; df itself is unchanged

5) Grouping and Aggregation: Group data and apply functions.

df.groupby('Column_Name').agg({'A': 'mean', 'B': 'count'})
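
A worked sketch of the aggregation above on made-up data. The grouping column name (team) and the values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data: one missing value in B to show how 'count' behaves.
df = pd.DataFrame({
    "team": ["red", "red", "blue"],
    "A": [1.0, 3.0, 10.0],
    "B": ["x", None, "z"],
})

# One row per group; each column gets its own aggregation.
summary = df.groupby("team").agg({"A": "mean", "B": "count"})
```

Note that 'count' counts non-missing values only, and the group keys become the index of the result, sorted by default.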

6) Missing Data: Handle missing data.

df.dropna()          # Return a copy without rows that contain missing values
df.fillna(value)     # Return a copy with missing values replaced by value (a placeholder here)
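
A short sketch of both calls on made-up data. The fill values (0.0 and 'unknown') are arbitrary choices, and df.isna() is added here as one common way to count missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": ["a", "b", None]})

missing_per_column = df.isna().sum()   # count missing values in each column

dropped = df.dropna()                  # keeps only rows with no missing values

# A dict gives each column its own fill value.
filled = df.fillna({"A": 0.0, "B": "unknown"})
```

Both methods return new DataFrames, so assign the result if you want to keep it.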

7) Merging and Joining: Combine DataFrames.

merged_df = pd.concat([df1, df2], axis=0) # Concatenate vertically
merged_df = pd.merge(df1, df2, on='Key_Column', how='inner') # Inner join on Key_Column
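
A sketch contrasting the two on made-up frames. The value columns (left_val, right_val) and the key values are assumptions; ignore_index and how='left' are standard options shown for comparison.

```python
import pandas as pd

# Hypothetical frames that share keys 2 and 3.
df1 = pd.DataFrame({"Key_Column": [1, 2, 3], "left_val": ["a", "b", "c"]})
df2 = pd.DataFrame({"Key_Column": [2, 3, 4], "right_val": ["x", "y", "z"]})

# concat stacks rows; columns missing from one frame are filled with NaN.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# merge matches rows on the key, like a SQL join.
inner = pd.merge(df1, df2, on="Key_Column", how="inner")  # keys in both frames
left = pd.merge(df1, df2, on="Key_Column", how="left")    # every key from df1
```

Use concat when the frames hold the same kind of rows, and merge when they hold different attributes of the same entities.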

Data Visualization

1) Plotting: Create basic plots.

df.plot(x='Column1', y='Column2', kind='scatter')

2) Matplotlib Integration: Customize plots using Matplotlib.

import matplotlib.pyplot as plt
df['Column'].plot.hist(bins=10)
plt.show()

3) Seaborn Integration: Use Seaborn for more advanced plots.

import seaborn as sns
sns.boxplot(x='Column1', y='Column2', data=df)