🛡️ End-to-End MLOps Project: Network Security Threat Detection

Built with industry-grade practices: ML pipeline automation, containerization, CI/CD deployment, cloud-native architecture.

🚀 Overview

This project presents a production-ready Machine Learning pipeline developed around a network security dataset. While the dataset is domain-specific, the true focus was to design a modular, real-world MLOps system that could seamlessly integrate into any enterprise stack.

The pipeline was built with automation, traceability, and deployment scalability as primary goals. Everything — from data ingestion to cloud deployment — has been containerized, tracked, and versioned.

🎯 Project Objectives

✅ Build a reusable, extensible ML pipeline
✅ Automate data ingestion, transformation, training, evaluation, and deployment
✅ Implement MLflow for experiment tracking and model registry
✅ Use Docker + AWS ECR/EC2 for containerized deployment
✅ Enable CI/CD via GitHub Actions to simulate a production rollout

🔄 Workflow Architecture

📊 ML Pipeline: Ingestion to Inference

flowchart LR
    A[MongoDB] --> B[Data Ingestion Component]
    B --> C[Data Validation]
    C --> D[Data Transformation]
    D --> E[Model Trainer]
    E --> F[Model Evaluation]
    F -- Accepted --> G[Model Pusher to AWS S3 / Azure]
    G --> H[Deployed on EC2 via Docker]

📦 Modular Breakdown

Each block is modular and configurable using YAML configs. All logs, model files, and metrics are stored and versioned automatically.

⚙️ Pipeline Components

1️⃣ Data Ingestion (From MongoDB)

2️⃣ Data Validation

3️⃣ Data Transformation

4️⃣ Model Training & Evaluation

This component trains and evaluates multiple classification models to detect potential network intrusions. The following models were included:

🌲 Random Forest
🌳 Decision Tree
🚀 Gradient Boosting
📈 Logistic Regression
⚡ AdaBoost

Each model is evaluated using:

precision
recall
f1-score

All models are trained on the transformed datasets with hyperparameter tuning and are logged into MLflow for comparison.

✅ RandomForestClassifier and GradientBoostingClassifier are trained with verbosity enabled to provide iterative training feedback.

📈 MLflow Tracking & Experimentation

MLflow is used to:

Log metrics (accuracy, precision, recall, F1)
Store trained models and preprocessing artifacts
Compare performance visually across experiments

You can view each experiment by model type and choose the best-performing one using the F1-score.

🎯 In our tests, Random Forest and Gradient Boosting consistently outperformed others in terms of precision-recall balance.

🔍 Visual Comparison (Across 4 Runs)

📊 Metrics Table

🚀 CI/CD Pipeline

GitHub Actions Workflow

☁️ Tech Stack

Category	Tools Used
Data Storage	MongoDB
ML Models & Preprocessing	Random Forest, Gradient Boosting, Decision Tree, Logistic Regression, AdaBoost, Scikit-learn, Imbalanced-learn, NumPy
Tracking	MLflow, DVC, Dagshub
Orchestration	Python scripts, YAML configs
Containerization	Docker
CI/CD	GitHub Actions
Cloud Deployment	Amazon ECR, EC2

🧠 Challenges & Solutions

Challenge	Solution
Automating data drift detection	Implemented data validation schema with drift logic
Managing multiple model experiments	MLflow integration with metrics tracking and visual comparison
Avoiding manual deployment	CI/CD setup via GitHub Actions
Ensuring model reproducibility	All artifacts (model, preprocessing, metrics) versioned & stored
Cloud deployment configuration	ECR + EC2 with secured credentials from GitHub secrets

📌 Key Learnings

End-to-end MLOps projects go far beyond model accuracy — it's about designing scalable, maintainable systems.
Tracking, versioning, and reproducibility are as important as model performance.
Real-world deployments benefit from cloud-native, containerized infrastructure.

🔍 How to Reproduce This Project

# 1. Clone the repository
git clone https://github.com/wasnikh0/NetworkSecurity.git
cd network-security-mlops

# 2. Set up virtual environment
python conda create -p venv python==3.10
source conda activate venv/  # or venv\Scripts\activate on Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Create .env or setup secrets for MongoDB and AWS

# 5. Run the pipeline
python main.py  # Or run modular scripts

# 6. Build Docker image
docker build -t networksecurity .

# 7. Push to ECR (after aws login)
docker tag networksecurity:latest <aws_account_id>.dkr.ecr.<region>.amazonaws.com/<repository>:latest
docker push <aws_account_id>.dkr.ecr.<region>.amazonaws.com/<repository>:latest

📬 Contact

👨‍💻 Author: Harsh Wasnik
📫 LinkedIn
🛠️ Open to freelance, consulting, or full-time opportunities

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
.github/workflows		.github/workflows
Network_Data		Network_Data
__pycache__		__pycache__
data_schema		data_schema
docs		docs
final_model		final_model
networksecurity		networksecurity
prediction_output		prediction_output
templates		templates
valid_data		valid_data
.gitattributes		.gitattributes
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
app.py		app.py
main.py		main.py
push_data.py		push_data.py
requirements.txt		requirements.txt
setup.py		setup.py
streamlit_app.py		streamlit_app.py
testing.ipynb		testing.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🛡️ End-to-End MLOps Project: Network Security Threat Detection

🚀 Overview

🎯 Project Objectives

🔄 Workflow Architecture

📊 ML Pipeline: Ingestion to Inference

📦 Modular Breakdown

⚙️ Pipeline Components

1️⃣ Data Ingestion (From MongoDB)

2️⃣ Data Validation

3️⃣ Data Transformation

4️⃣ Model Training & Evaluation

📈 MLflow Tracking & Experimentation

🔍 Visual Comparison (Across 4 Runs)

📊 Metrics Table

🚀 CI/CD Pipeline

GitHub Actions Workflow

☁️ Tech Stack

🧠 Challenges & Solutions

📌 Key Learnings

🔍 How to Reproduce This Project

📬 Contact

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🛡️ End-to-End MLOps Project: Network Security Threat Detection

🚀 Overview

🎯 Project Objectives

🔄 Workflow Architecture

📊 ML Pipeline: Ingestion to Inference

📦 Modular Breakdown

⚙️ Pipeline Components

1️⃣ Data Ingestion (From MongoDB)

2️⃣ Data Validation

3️⃣ Data Transformation

4️⃣ Model Training & Evaluation

📈 MLflow Tracking & Experimentation

🔍 Visual Comparison (Across 4 Runs)

📊 Metrics Table

🚀 CI/CD Pipeline

GitHub Actions Workflow

☁️ Tech Stack

🧠 Challenges & Solutions

📌 Key Learnings

🔍 How to Reproduce This Project

📬 Contact

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages