This project focuses on cleaning and preprocessing raw data to improve its quality and usability. We applied Pandas and NumPy functions to handle missing values, duplicates, and formatting issues.
- Name: Customer Sales Data
- Issues Found:
  - Missing values in key columns.
  - Inconsistent date formats.
  - Duplicate records.
  - Irregular capitalization in text fields.
- Handling Missing Data: Used `.fillna()`, `.dropna()`, and `.replace()` functions.
- Fixing Duplicates: Removed duplicate records.
- Standardizing Formats: Converted dates to a common format (`YYYY-MM-DD`).
- Normalizing Text Data: Corrected case inconsistencies in categorical fields.
- Outlier Detection: Identified and removed outliers affecting sales trends.
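
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`customer`, `region`, `order_date`, `sales`), the sample rows, the `"N/A"` placeholder, the median fill, and the 1.5 × IQR outlier rule are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the Customer Sales Data.
raw = pd.DataFrame({
    "customer": [" alice ", "BOB", "bob", "Carol", None, "dave", "Erin", "frank"],
    "region": ["north", "South", "south", "N/A", "East", "EAST", "west", "West"],
    "order_date": ["2023-01-05", "2023/01/06", "Jan 6, 2023", "2023-01-07",
                   "2023-01-08", "8 January 2023", "2023-01-09", "2023/01/10"],
    "sales": [120.0, 135.0, 135.0, np.nan, 110.0, 125.0, 9999.0, 130.0],
})


def to_iso(value):
    """Parse one date string and format it as YYYY-MM-DD (assumes no missing dates)."""
    return pd.to_datetime(value).strftime("%Y-%m-%d")


def clean_sales(df):
    df = df.copy()

    # Missing data: turn placeholder strings into NaN, drop rows missing the
    # key column, then fill the remaining gaps.
    df = df.replace({"N/A": np.nan, "": np.nan})
    df = df.dropna(subset=["customer"])
    df["region"] = df["region"].fillna("Unknown")
    df["sales"] = df["sales"].fillna(df["sales"].median())

    # Text normalization: trim whitespace and use a single capitalization style.
    for col in ["customer", "region"]:
        df[col] = df[col].str.strip().str.title()

    # Date standardization: parse each value on its own so mixed input
    # formats are tolerated, then emit YYYY-MM-DD strings.
    df["order_date"] = df["order_date"].map(to_iso)

    # Duplicates: removed after normalization so "BOB" and "bob" match.
    df = df.drop_duplicates().reset_index(drop=True)

    # Outliers (assumed rule): keep sales within 1.5 * IQR of the quartiles.
    q1, q3 = df["sales"].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = df["sales"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[keep].reset_index(drop=True)


clean = clean_sales(raw)
print(clean)
```

In this sketch, duplicates are dropped only after text and dates are normalized; otherwise records that differ just in capitalization or date format would not be recognized as duplicates. The outlier threshold is a common default and would likely need tuning for real sales data.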