Data Science

Detect and Remove Outliers in Python: IQR and Z-Score

30 September 2025 · 1925 words · 10 mins

Outliers can distort statistical analyses and degrade machine learning model performance. This guide covers practical methods to detect, visualize, and handle outliers in Python, from IQR and Z-Score to Isolation Forest, with runnable code at each step.
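As a rough illustration of the two methods named in the title, here is a minimal sketch assuming a small made-up pandas Series. The helper names iqr_outlier_mask and zscore_outlier_mask, the sample values, and the thresholds are hypothetical choices for this example, not code from the guide itself.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


def zscore_outlier_mask(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Flag points whose absolute z-score exceeds the threshold (hypothetical helper)."""
    z = (values - values.mean()) / values.std(ddof=0)  # population standard deviation
    return z.abs() > threshold


# Made-up sample: nine ordinary readings and one obvious outlier.
data = pd.Series([10, 12, 11, 13, 12, 11, 10, 12, 13, 95])

iqr_mask = iqr_outlier_mask(data)
# 2.5 is used only because the sample is tiny; 3 is a common default on larger data.
z_mask = zscore_outlier_mask(data, threshold=2.5)

cleaned = data[~iqr_mask]  # keep only the rows that the IQR rule did not flag
print(data[iqr_mask].tolist(), data[z_mask].tolist(), len(cleaned))
```

On a sample this small, the extreme point inflates the standard deviation so much that its own z-score stays just under 3, which is one reason the IQR rule is often preferred for short or heavily skewed series.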

Handle Missing Values in Pandas Without Losing Information

17 September 2025 · 1090 words · 6 mins

Missing values are inevitable in real-world datasets. This guide covers proven methods to handle missing data in pandas without compromising data integrity or analytical accuracy.

TL;DR

- Use df.isnull().sum() to audit missing values before doing anything.
- Drop rows/columns only when missingness is random and < 5% of data.
- Fill with mean/median for numerical columns with low missingness.
- Forward/backward fill for time series; interpolation for smooth numerical sequences.
- Never fill categoricals with mean; use mode or a dedicated “Unknown” category.

What Are Missing Values in Pandas

Missing values in pandas are represented as NaN (Not a Number), None, or NaT (Not a Time) for datetime objects. These occur due to:
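The TL;DR steps above can be sketched on a toy DataFrame. The column names (age, city, temp) and all values are invented for illustration, and the fill strategy chosen for each column is one reasonable option rather than the guide's prescribed recipe.

```python
import numpy as np
import pandas as pd

# Made-up data with gaps in a numeric, a categorical, and a sequential column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0, np.nan],
    "city": ["Paris", None, "Rome", "Paris", "Rome"],
    "temp": [20.0, np.nan, 22.0, np.nan, 26.0],
})

# Step 1: audit missing values per column before changing anything.
missing_counts = df.isnull().sum()

filled = df.copy()
# Numerical column: fill with the median of the observed values.
filled["age"] = filled["age"].fillna(filled["age"].median())
# Categorical column: use a dedicated "Unknown" category instead of a mean.
filled["city"] = filled["city"].fillna("Unknown")
# Smooth numerical sequence: linear interpolation between neighbours.
filled["temp"] = filled["temp"].interpolate()

# Alternative for time series: carry the last observed value forward.
temp_ffill = df["temp"].ffill()

print(missing_counts)
print(filled)
```

Working on a copy keeps the original frame available, so the audit in the first step can be repeated after filling to confirm that no gaps remain.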

Difference between reshape() and flatten() in NumPy

25 July 2025 · 1442 words · 7 mins

NumPy’s reshape() and flatten() both manipulate array shape, but they serve different purposes and behave differently. This guide explains when and how to use each method effectively.

TL;DR

- reshape() returns a view (no copy) when possible: memory-efficient, and changes affect the original.
- flatten() always returns a copy, which is safe to modify independently.
- Use ravel() instead of flatten() when you want a view where possible (like reshape(-1)) to save memory.
- Use reshape(-1) to flatten without copying whenever the memory layout allows it; use flatten() only when you need an independent 1D copy.

What is reshape() in NumPy

The reshape() method changes the shape of an array without changing its data. When possible, it returns a new view of the array with the new shape; otherwise it returns a copy.
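The view-versus-copy distinction can be checked directly with np.shares_memory. The sketch below uses a small made-up array; the variable names are illustrative only.

```python
import numpy as np

a = np.arange(6)               # [0 1 2 3 4 5]
reshaped = a.reshape(2, 3)     # view: shares memory with a
flat_copy = reshaped.flatten() # always an independent copy
raveled = reshaped.ravel()     # view here, because reshaped is contiguous

print(np.shares_memory(a, reshaped))          # view of a
print(np.shares_memory(reshaped, flat_copy))  # separate buffer
print(np.shares_memory(reshaped, raveled))    # view of the same buffer

reshaped[0, 0] = 99   # writes through to a, since reshaped is a view
flat_copy[1] = -1     # does not touch reshaped or a

# "When possible" matters: a transposed array is not C-contiguous,
# so flattening it with reshape(-1) has to allocate a copy.
t = reshaped.T
t_flat = t.reshape(-1)
print(np.shares_memory(t, t_flat))
```

The last check is the case to watch for: code that relies on reshape(-1) writing through to the original array can break silently once the input is transposed or otherwise non-contiguous.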