data-preprocessing-pipelines

Here are 13 public repositories matching this topic...

data-prep-kit / data-prep-kit

Open source project for data preparation for GenAI applications

python data spark malware code-quality data-preprocessing ray data-preparation deduplication data-prep finetuning data-preprocessing-pipelines datacuration large-language-models llm llmapps large-scale-data-processing datarecipes

Updated May 15, 2026
HTML

preprocessy / preprocessy

Python package for Customizable Data Preprocessing Pipelines

data-science machine-learning python-library pipelines data-engineering preprocessing under-construction hacktoberfest data-preprocessing-pipelines hacktoberfest2022

Updated Apr 6, 2026
Jupyter Notebook

shamspias / gpt3-data-preprocessing

This repository contains code for preprocessing text data from PDF and DOCX files for use with GPT-3. It includes steps such as tokenization, removal of stop words and punctuation, and formatting for GPT-3 input.

data-science machine-learning artificial-intelligence data-preprocessing gpt-3 data-preprocessing-pipelines

Updated Jan 29, 2023
Python

bohyy / Video-Processing-Pipeline

Video quality assessment and filtering pipeline for ML training data. Automatically handles format conversion, scene segmentation, face detection, text detection, and audio-video sync checking. Supports 127 concurrent processes with checkpoint recovery.

opencv machine-learning ffmpeg pipelines video-processing face-detection video-streaming training-data training-project data-preprocessing-pipelines

Updated Feb 12, 2026
Python

firefly-cpp / succulent

A lightweight framework for collecting and processing data from HTTP POST requests

raspberry-pi data-science machine-learning esp32 data-collection data-preprocessing-pipelines

Updated May 14, 2026
Python

DigitalLifeYZQiu / Data-Process-Library

A data processing library to support better understanding of industrial data.

data-preprocessing-pipelines data-understanding

Updated Jun 25, 2025
Jupyter Notebook

vuanhngo14 / Decision-Tree-from-Scratch

Understand and implement a decision tree

data-visualization data-preprocessing decision-tree data-preprocessing-pipelines decision-tree-from-scratch

Updated Feb 3, 2024
Jupyter Notebook

kolhesamiksha / Nemo_Curator

This repository contains sample text data-preparation code using NeMo Curator for pre-training or synthetic data generation.

nvidia nemo curator synthetic-dataset-generation data-preprocessing-pipelines generative-ai finetuning-llms

Updated Dec 25, 2024
Jupyter Notebook

amadou-6e / pymimic3

Pymimic3 is a scalable experimentation platform for MIMIC-III, featuring ready-to-run models, fully tested utilities for concept drift research, and a parallelized, configurable data pipeline.

machine-learning neural-networks data-preprocessing mimic-iii parallel-processing concept-drift machine-learning-models medical-data machine-learning-projects medical-dataset neural-networks-and-deep-learning clinical-informatics medical-datasets healthcare-ai data-preprocessing-pipelines medical-data-analysis concept-drift-detection data-preprocessing-and-augmentation

Updated Oct 30, 2024
Jupyter Notebook

SaraLittleSquirrel / Obesity-estimator

Project for Machine Learning Data Mining course

machine-learning data-mining random-forest numpy rbf-kernel sklearn pandas adaboost support-vector-machines decision-tree voting-classifier polynomial-kernel linear-kernel extra-trees k-neighbors data-preprocessing-pipelines

Updated Nov 4, 2023
Jupyter Notebook

gobind-works / facial-emotion-recognition-deeplearning

Comparative study of CNN and SVM models for facial emotion recognition on CK+ (CNN: 96%, SVM: 97%) and RAF-DB (CNN: 85%, SVM: 77%) datasets. Full data preprocessing pipeline in Python. Published in Springer 2024.

python computer-vision deep-learning numpy keras cnn matplotlib decision-tree-classifier svm-classifier tenserflow data-preprocessing-pipelines sciket-learn

Updated May 4, 2026
Jupyter Notebook

PrasunDatta / adorsho-praniSheba_Preprocessing-Pipeline-of-Muzzle-Data-of-Cow

This work highlights my contribution as an ML Engineer at "adorsho praniSheba" (an ML-based agro-farming company in Bangladesh), where I was assigned the task of designing the preprocessing pipeline.

jupyter-notebook python-script image-preprocessing data-preprocessing-pipelines

Updated Nov 13, 2022
Jupyter Notebook

MustofAhmed41 / Data-Preprocessing-using-Distributed-Database

Machine learning models cannot be applied directly to raw data. This desktop application consists of a central server and two client servers. The central server sends raw data to the clients, where the data is preprocessed and prepared to be fed to the machine learning model.

machine-learning database plsql distributed-database data-preprocessing-pipelines

Updated Oct 24, 2022
