CarDVLM: Car Damage Assessment using Vision-Language Models

Accurate vehicle damage assessment is critical for automotive eCommerce platforms, where buyers depend on transparent visual evidence to make informed purchasing decisions. Traditional object detectors (e.g., YOLO, Mask R-CNN) can localise damage but do not provide contextual explanations of its type, location, and severity.

To address this gap, we introduce CarDVLM, a multimodal framework that integrates GroundingCarDD with a fine-tuned vision–language model (LLaVA). The system detects and localises damages, then generates structured, query-driven textual descriptions, enabling interpretable and user-centric assessments.
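Conceptually, this is a two-stage pipeline: grounded detection followed by query-driven description. The sketch below is a minimal illustration of that flow; the function names and return formats are placeholders, not the repository's actual API.

# Minimal sketch of the two-stage CarDVLM flow (placeholder names, not the real API).

def detect_damage(image_path):
    # Stage 1 (stub): GroundingCarDD-style phrase-grounded detection,
    # returning bounding boxes with damage phrases and confidence scores.
    return [{"box": [120, 80, 340, 210], "phrase": "dent", "part": "front bumper", "score": 0.91}]

def describe_damage(image_path, detections, query):
    # Stage 2 (stub): the fine-tuned LLaVA model answers the user's query,
    # conditioned on the image and the grounded detections.
    return {"type": "dent", "location": "front bumper", "severity": "moderate"}

detections = detect_damage("car.jpg")
print(describe_damage("car.jpg", detections, "Describe any damage on this vehicle."))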


📑 Paper

Paper (Under Review)

Key Contributions

  • CarDVLM Framework
    A domain-adapted system combining phrase-grounded object detection with multimodal reasoning for interpretable car damage analysis.

  • Structured Automotive Dataset
    Constructed from public (CarDD) and private sources, with bounding-box annotations and semantically aligned descriptions covering 25 vehicle body parts (a hypothetical record is sketched after this list).

  • CarDamageEval
    A two-tier evaluation framework that combines structured accuracy metrics (precision, recall, F1) with human-aligned evaluation.

  • Ablation Study under Real-World Scenarios
    Comprehensive testing across three challenging conditions:

    • Clearly visible damages
    • Spatially ambiguous damages (left/right or front/rear orientation)
    • Extremely subtle or partially obscured damages

    These tests validate CarDVLM's robustness and practical deployment readiness.
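To make the dataset contribution above concrete, a single annotation record might look like the following. The field names are assumptions for illustration, not the released schema.

# Hypothetical shape of one training record (field names are assumed, not the real schema).
record = {
    "image": "images/000123.jpg",
    "boxes": [[95, 40, 310, 180]],      # [x1, y1, x2, y2] in pixels
    "damage_types": ["scratch"],
    "body_part": "left rear door",      # one of the 25 annotated body parts
    "description": "A long scratch running across the left rear door.",
}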

Results

Structured Evaluation (Pair-Matching)

Model      Accuracy   Precision   Recall   F1 Score
ChatGPT    59.5       59.6        65.5     61.5
LLaMA      70.6       77.6        75.5     75.3
Qwen       74.9       87.4        80.5     82.3
CarDVLM    86.7       88.8        90.2     89.5

CarDVLM delivers the strongest structured-evaluation results, outperforming all baselines on every metric.
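For intuition, pair-matching can be scored by comparing predicted (field, value) pairs against the reference answer. The snippet below is a simplified sketch; the exact matching rule used in CarDamageEval is not reproduced here, and exact string equality is an assumption for illustration.

# Simplified pair-matching scorer (exact string match is an illustrative assumption).
def pair_match_scores(pred: dict, ref: dict):
    matched = sum(1 for k, v in pred.items() if ref.get(k) == v)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

pred = {"type": "dent", "location": "front bumper", "severity": "severe"}
ref = {"type": "dent", "location": "front bumper", "severity": "moderate"}
print(pair_match_scores(pred, ref))  # (0.666..., 0.666..., 0.666...)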


Model Architecture


Figure: CarDVLM integrates GroundingCarDD with a fine-tuned VLM (CLIP + LLaMA-2 13B + LoRA).
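As a rough sketch of the LoRA fine-tuning named in the figure, the snippet below attaches adapters to a LLaMA-2 13B backbone with the peft library (installed in the guide below). The rank, alpha, and target modules are illustrative values, not the paper's reported hyperparameters.

# Illustrative LoRA setup with peft; hyperparameters are assumptions, not the paper's.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")
lora = LoraConfig(
    r=16,                       # adapter rank (assumed)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable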





Installation Guide

1. Install Miniconda

Install Miniconda for your platform by following the official conda documentation, then restart your shell so that conda is on your PATH.


2. Install LLaVA for Testing

2.1 Clone the Repository

git clone https://github.com/HelloJahid/CarDVLM
cd CarDVLM

2.2 Set Up Conda Environment

conda create -n llava python=3.10 -y
conda activate llava

2.3 Install Dependencies

pip install --upgrade pip                  
pip install -e .
pip install -e ".[train]"
pip install flash-attn==2.2.0 --no-build-isolation
pip install peft==0.10.0
pip install deepspeed
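A quick sanity check of the environment (run inside the activated conda env; this assumes the editable install exposes the llava package, as in upstream LLaVA):

# Verify that PyTorch, CUDA, and the editable install are usable.
import torch
import llava  # fails here if the editable install did not work

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())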

3. Set up GroundingCarDD

Follow the instructions here: GroundingCarDD Installation Guide


🙏 Acknowledgements

This project builds upon the LLaVA-1.5 vision–language model developed by Haotian Liu and contributors.

For more details, see the LICENSE and NOTICE files.

