This project is a complete end-to-end machine learning pipeline designed to predict a student's math score based on various inputs such as gender, ethnicity, parental education level, lunch type, test preparation course, reading score, and writing score. The project is deployed as a web application using Flask and hosted on Render.
The application takes input through a web form and returns the predicted math score using a trained regression model. It covers data ingestion, preprocessing, model training, and inference.
- Clean modular structure (components separated logically)
- Data ingestion and train/test split
- Data transformation using pipelines with handling of categorical and numerical features
- Model training using multiple regressors (Random Forest, XGBoost, CatBoost, etc.)
- Automatic model selection based on best R² score
- Persisted preprocessor and model using `dill`
- Flask-based web interface for real-time prediction
- Deployed on Render
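The "automatic model selection based on best R² score" step can be sketched roughly as follows. This is a minimal illustration, not the project's actual training code: the helper name `select_best_model`, the synthetic data, and the choice of scikit-learn-only candidates (XGBoost and CatBoost are left out to keep the example dependency-light) are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def select_best_model(models, X_train, y_train, X_test, y_test):
    """Fit every candidate and return (best_name, best_model, scores).

    Hypothetical helper: scores each model by R² on the held-out split.
    """
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    best_name = max(scores, key=scores.get)
    return best_name, models[best_name], scores


# Synthetic stand-in data (assumption): a nearly linear target with mild noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

candidates = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
best_name, best_model, scores = select_best_model(
    candidates, X_train, y_train, X_test, y_test
)
print(best_name, round(scores[best_name], 3))
```

Scoring on the held-out split rather than the training split keeps the comparison from rewarding models that simply memorize the training data.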
    ML_PROJECTS/
    ├── artifacts/            # Holds model.pkl, preprocessor.pkl, etc.
    ├── logs/                 # Stores log files
    ├── notebook/             # Contains the dataset (data/stud.csv)
    ├── src/                  # Source files (components, pipeline, utils)
    │   ├── components/       # Data ingestion, transformation, training
    │   ├── pipeline/         # train_pipeline.py for full pipeline run
    │   ├── utils.py          # Helper functions (save/load object, evaluation)
    │   ├── logger.py         # Logging
    │   └── exception.py      # Custom exceptions
    ├── templates/            # HTML files for Flask
    ├── venv/                 # Virtual environment (should be ignored)
    ├── application.py        # Flask app
    ├── requirements.txt      # Contains all Python dependencies
    ├── Procfile              # For Render deployment; specifies how to run the Flask app using Gunicorn
    ├── .gitignore
    ├── README.md
    └── setup.py              # Can be used if you're packaging this as a module
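The save/load helpers mentioned for `utils.py` might look something like the sketch below. The function names `save_object` and `load_object` and the demo payload are assumptions, not taken from the repository. Because `dill` mirrors the `dump`/`load` interface of the standard-library `pickle`, the sketch falls back to `pickle` when `dill` is not installed so that it still runs.

```python
import os
import tempfile

try:
    import dill
except ImportError:  # Assumption: fall back to pickle so the sketch still runs.
    import pickle as dill


def save_object(file_path, obj):
    """Serialize obj to file_path, creating parent directories if needed."""
    os.makedirs(os.path.dirname(file_path) or ".", exist_ok=True)
    with open(file_path, "wb") as f:
        dill.dump(obj, f)


def load_object(file_path):
    """Load a previously serialized object from file_path."""
    with open(file_path, "rb") as f:
        return dill.load(f)


# Round-trip a stand-in object through a temporary artifacts/ directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "artifacts", "model.pkl")
    save_object(path, {"name": "demo", "coef": [1.0, 2.0]})
    restored = load_object(path)

print(restored)
```

The same pair of helpers would work for both the fitted preprocessor and the trained model, since both are ordinary Python objects.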
- Data Ingestion: Loads CSV data from `notebook/data/stud.csv` and splits it into train/test sets.
- Data Transformation: Applies a preprocessing pipeline including `SimpleImputer`, `StandardScaler`, and `OneHotEncoder` with `handle_unknown='ignore'`.
- Model Training: Trains multiple models, performs hyperparameter tuning, selects the best model, and saves it.
- Prediction: Loads the model and preprocessor, transforms incoming form data, and returns the predicted math score.
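The transformation step described above could be sketched as below. The column names, the tiny in-memory dataset, and the imputation strategies are illustrative assumptions; the real column names in `stud.csv` and the project's exact pipeline settings may differ. The new row uses a gender value not seen during fitting to show what `handle_unknown='ignore'` does.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names, chosen for illustration only.
numeric_cols = ["reading_score", "writing_score"]
categorical_cols = ["gender", "lunch"]

numeric_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])
categorical_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("encoder", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer(
    [
        ("num", numeric_pipeline, numeric_cols),
        ("cat", categorical_pipeline, categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array for easy inspection
)

# Tiny stand-in training frame with one missing value per feature type.
train = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "lunch": ["standard", "free/reduced", "standard", np.nan],
    "reading_score": [72, 90, np.nan, 57],
    "writing_score": [74, 88, 93, 44],
})
X_train = preprocessor.fit_transform(train)

# A category unseen at fit time is encoded as all zeros instead of raising.
new_row = pd.DataFrame({
    "gender": ["other"],
    "lunch": ["standard"],
    "reading_score": [80],
    "writing_score": [78],
})
X_new = preprocessor.transform(new_row)
print(X_train.shape, X_new.shape)
```

At prediction time the fitted preprocessor is only ever asked to `transform`, never to `fit`, so incoming form data is scaled and encoded exactly as the training data was.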
- Clone the repository: `git clone https://github.com/DataProjectHub/ML_Students_Performance.git`
- Create and activate a virtual environment (Windows syntax shown): `python -m venv venv`, then `venv\Scripts\activate`
- Install dependencies: `pip install -r requirements.txt`
- Run the training pipeline (optional): `python src/pipeline/train_pipeline.py`
- Start the Flask server: `python application.py`
Visit http://localhost:5000/predictdata to use the form.
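For orientation, a route like `/predictdata` can serve the form on GET and return a prediction on POST. The sketch below is a simplified assumption, not the contents of `application.py`: the form fields, the function names, and especially `predict_math_score` (a placeholder that averages two scores instead of calling the trained model) are hypothetical.

```python
from flask import Flask, request

app = Flask(__name__)

# Inline stand-in for the HTML template the real app would render.
FORM_HTML = """
<form method="post">
  <input name="reading_score" type="number">
  <input name="writing_score" type="number">
  <button type="submit">Predict</button>
</form>
"""


def predict_math_score(reading_score, writing_score):
    # Placeholder for "load preprocessor and model, transform, predict".
    return round((reading_score + writing_score) / 2, 1)


@app.route("/predictdata", methods=["GET", "POST"])
def predict_datapoint():
    if request.method == "GET":
        return FORM_HTML
    reading = float(request.form["reading_score"])
    writing = float(request.form["writing_score"])
    return f"Predicted math score: {predict_math_score(reading, writing)}"


# Exercise the route with Flask's test client, without starting a server.
client = app.test_client()
form_page = client.get("/predictdata")
response = client.post(
    "/predictdata", data={"reading_score": "70", "writing_score": "80"}
)
print(response.get_data(as_text=True))
```

Using the test client this way is a quick check that the route accepts form data before opening the page in a browser.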
The app is deployed on Render. You can access it here:
https://student-performance-deployment.onrender.com/predictdata
- Python
- Scikit-learn
- XGBoost
- CatBoost
- Flask
- HTML/CSS (Bootstrap)
- Render (for deployment)
Project built and deployed by Pooja Anilkumar as part of learning full-cycle machine learning implementation and deployment.
Connect with me on LinkedIn
GitHub: DataProjectHub