Pandas

Missing values are inevitable in real-world datasets. This guide covers proven methods to handle missing data in pandas without compromising data integrity or analytical accuracy. TL;DR Use df.isnull().sum() to audit missing values before doing anything. Drop rows/columns only when missingness is random and < 5% of data. Fill with mean/median for numerical columns with low missingness. Forward/backward fill for time series; interpolation for smooth numerical sequences. Never fill categoricals with mean — use mode or a dedicated “Unknown” category. What Are Missing Values in Pandas # Missing values in pandas are represented as NaN (Not a Number), None, or NaT (Not a Time) for datetime objects. These occur due to:

Pandas

Detect and Remove Outliers in Python: IQR and Z-Score

Handle Missing Values in Pandas Without Losing Information