Pandas

Detect and Remove Outliers in Python: IQR and Z-Score

·1925 words·10 mins
Outliers can skew statistical analysis and degrade machine learning model performance. This guide covers practical methods to detect, visualize, and handle outliers in Python, from IQR and Z-Score to Isolation Forest, with runnable code at each step.
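
As a rough illustration of the two methods named in the title, the sketch below flags outliers in a small made-up series. The helper names `iqr_outliers` and `zscore_outliers`, the sample values, and the cutoffs (1.5 × IQR and |z| > 3) are assumptions chosen for illustration, not taken from the guide itself.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


def zscore_outliers(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Boolean mask: True where the absolute z-score exceeds the threshold."""
    z = (s - s.mean()) / s.std(ddof=0)  # population standard deviation
    return z.abs() > threshold


# Hypothetical sample: 19 ordinary readings and one extreme value (95).
values = pd.Series(
    [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
     12, 10, 13, 11, 12, 11, 12, 13, 10, 95]
)

iqr_mask = iqr_outliers(values)
z_mask = zscore_outliers(values)

# Removing the flagged rows is one option; capping or imputing are others.
cleaned = values[~iqr_mask]
print(values[iqr_mask].tolist(), values[z_mask].tolist())
```

One caveat with the Z-Score rule: with a population standard deviation, no value in a sample of n points can have |z| above sqrt(n - 1), so a threshold of 3 can never fire on samples of 10 or fewer. The IQR rule does not have this limitation.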

Handle Missing Values in Pandas Without Losing Information

·1090 words·6 mins
Missing values are inevitable in real-world datasets. This guide covers proven methods to handle missing data in pandas without compromising data integrity or analytical accuracy.

TL;DR
- Use df.isnull().sum() to audit missing values before doing anything.
- Drop rows/columns only when missingness is random and < 5% of data.
- Fill with mean/median for numerical columns with low missingness.
- Forward/backward fill for time series; interpolation for smooth numerical sequences.
- Never fill categoricals with mean; use mode or a dedicated “Unknown” category.

What Are Missing Values in Pandas

Missing values in pandas are represented as NaN (Not a Number), None, or NaT (Not a Time) for datetime objects. These occur due to:
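
The TL;DR rules above can be sketched on a toy DataFrame as follows. The column names and values are invented for illustration, and the choice of fill strategy per column is an assumption about how the rules might be applied, not the guide's own example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in a numeric sequence, a numeric column,
# and a categorical column.
df = pd.DataFrame({
    "temperature": [21.0, np.nan, 23.0, 24.0, np.nan, 26.0],
    "price": [10.0, 12.0, np.nan, 11.0, 13.0, 100.0],
    "city": ["Oslo", None, "Bergen", "Oslo", "Oslo", None],
})

# Step 1: audit missing values per column before changing anything.
missing_counts = df.isnull().sum()

filled = df.copy()

# Numeric column with low missingness: fill with the median,
# which is less sensitive to the extreme value (100.0) than the mean.
filled["price"] = filled["price"].fillna(filled["price"].median())

# Smooth numeric sequence: linear interpolation between neighbours.
filled["temperature"] = filled["temperature"].interpolate()

# Categorical column: a dedicated "Unknown" category instead of a mean.
filled["city"] = filled["city"].fillna("Unknown")

# Alternative for time series: carry the last observed value forward.
ffilled_temp = df["temperature"].ffill()

print(missing_counts)
print(filled)
```

Working on a copy keeps the original frame available, so the audit counts can be compared before and after filling.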