Outliers can significantly skew statistical analysis and machine learning model performance. This guide covers every practical method to detect, visualize, and handle outliers in Python — from IQR and Z-Score to Isolation Forest — with runnable code at each step.
Missing values are inevitable in real-world datasets. This guide covers proven methods to handle missing data in pandas without compromising data integrity or analytical accuracy.
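As a rough illustration of the kind of handling this guide describes, the sketch below audits a small DataFrame and then fills each column with a strategy suited to its type. The data and the column names (`age`, `city`, `reading`) are hypothetical and chosen only for the example; which strategy is appropriate depends on why the values are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0, np.nan],
    "city": ["Paris", None, "Lima", "Paris", "Oslo"],
    "reading": [1.0, np.nan, 3.0, np.nan, 5.0],
})

# 1. Audit: count missing values per column before changing anything.
missing_counts = df.isnull().sum()
print(missing_counts)

# 2. Numerical column: fill with the median (less sensitive to extreme values than the mean).
df["age"] = df["age"].fillna(df["age"].median())

# 3. Categorical column: use a dedicated "Unknown" category, never a mean.
df["city"] = df["city"].fillna("Unknown")

# 4. Ordered numerical sequence: linear interpolation between known neighbours.
df["reading"] = df["reading"].interpolate()

print(df)
```

Running the audit first makes it easy to check afterwards that every column was handled: a second `df.isnull().sum()` should report zero for all three columns.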
TL;DR
Use df.isnull().sum() to audit missing values before doing anything.
Drop rows/columns only when missingness is random and affects less than 5% of the data.
Fill with mean/median for numerical columns with low missingness.
Use forward/backward fill for time series and interpolation for smooth numerical sequences.
Never fill categoricals with the mean; use the mode or a dedicated “Unknown” category.

What Are Missing Values in Pandas

Missing values in pandas are represented as NaN (Not a Number), None, or NaT (Not a Time) for datetime objects. These occur due to:
NumPy’s reshape() and flatten() are both used for array manipulation, but they serve different purposes and have distinct behaviors. This guide explains when and how to use each method effectively.
TL;DR
reshape() returns a view (no copy) when possible: it is memory-efficient, and changes to the view affect the original array.
flatten() always returns a copy, which is safe to modify independently.
Use ravel() instead of flatten() when you want a view (like reshape(-1)) to save memory.
Use reshape(-1) to flatten without copying when possible; use flatten() only when you need an independent 1D copy.

What is reshape() in NumPy

The reshape() method changes the shape of an array without changing its data. It returns a new view of the array with a different shape when possible; otherwise it returns a copy.
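The view-versus-copy distinction can be checked directly with `np.shares_memory`. The following is a minimal sketch on a small hypothetical array (the variable names are illustrative); it also includes one case where `reshape(-1)` cannot return a view, which is what "when possible" refers to.

```python
import numpy as np

a = np.arange(6)               # contiguous 1D array: [0 1 2 3 4 5]

view = a.reshape(2, 3)         # contiguous input, so this is a view of a
flat_copy = view.flatten()     # flatten() always allocates a new array
flat_view = view.ravel()       # ravel() returns a view when it can

shares_reshape = np.shares_memory(a, view)
shares_flatten = np.shares_memory(a, flat_copy)
shares_ravel = np.shares_memory(a, flat_view)

view[0, 0] = 99                # writes through to a, because view shares its buffer
flat_copy[1] = -1              # only changes the copy; a is untouched

# A transposed array is not C-contiguous, so flattening it in C order
# forces NumPy to copy even with reshape(-1).
transposed_flat = view.T.reshape(-1)
shares_transposed = np.shares_memory(a, transposed_flat)

print(shares_reshape, shares_flatten, shares_ravel, shares_transposed)
print(a)
```

If you rely on a result being a view, checking `np.shares_memory` (or the result's `.base` attribute) is safer than assuming it, since the outcome depends on the memory layout of the input.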