
IMAGE-test-graphics

Scripts and graphics used for ongoing testing of IMAGE

Scripts are used to iterate through the graphics, get output from the server, and compare outputs.

testset.py:

Iterates through the collection of graphics and builds a test set based on the specified flags, sends a POST request for each of these graphics, and writes the output to a timestamped JSON file. The default server is unicorn if none is specified.
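
For orientation, the per-graphic work reduces to roughly the following (a minimal sketch; the endpoint path, payload shape, and server URL are assumptions rather than the script's actual values):

import base64
import json
from datetime import datetime
from pathlib import Path

import requests

SERVER = "https://unicorn.example.org"  # hypothetical URL standing in for the default server

def run_graphic(path: Path) -> None:
    data = base64.b64encode(path.read_bytes()).decode("ascii")
    resp = requests.post(
        f"{SERVER}/render/preprocess",  # assumed endpoint
        json={"graphic": f"data:image/png;base64,{data}"},
        timeout=120,
    )
    resp.raise_for_status()
    # Timestamp format matches the -t arguments that testdiff.py expects
    stamp = datetime.now().strftime("%m_%d_%Y_%H_%M_%S")
    out = path.with_name(f"{path.stem}_{stamp}.json")
    out.write_text(json.dumps(resp.json(), indent=2))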

examples:

to iterate through the entire test set on pegasus:

./testset.py -s pegasus

to iterate through graphics that are larger than 1000000 bytes and are tagged as "outdoor":

./testset.py --minBytes 1000000 -t outdoor

testdiff.py:

Compares preprocessor output for any two JSON files.
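
Conceptually the comparison looks like this (a sketch; the per-file JSON layout, here a top-level "preprocessors" object, is an assumption):

import json
from pathlib import Path

def diff_preprocessors(file_a: Path, file_b: Path, only=None):
    a = json.loads(file_a.read_text()).get("preprocessors", {})
    b = json.loads(file_b.read_text()).get("preprocessors", {})
    # `only` mirrors the --preprocessor flag: restrict the diff to named preprocessors
    names = set(only) if only else set(a) | set(b)
    for name in sorted(names):
        if a.get(name) != b.get(name):
            print(f"{name}: output differs between {file_a.name} and {file_b.name}")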

examples:

to compare the output from August 6 2022 at midnight and August 7 2022 at midnight for all graphics which were run at that time:

./testdiff.py -t 08_06_2022_00_00_00 08_07_2022_00_00_00

The output will be a list of all graphics that have both timestamps, plus any differences.

to compare the output from August 6 2022 at midnight and August 7 2022 at midnight for graphic 35:

./testdiff.py -t 08_06_2022_00_00_00 08_07_2022_00_00_00 -n 35

to compare the grouping, sorting, and semantic segmentation output from August 6 2022 at midnight and August 7 2022 at midnight for graphic 35:

./testdiff.py -t 08_06_2022_00_00_00 08_07_2022_00_00_00 -n 35 --preprocessor grouping sorting semanticSegmentation

to get a list of objects found by object detection:

Provide two timestamps, then use the --od flag to specify which timestamp corresponds to which model. For example, if Azure was run on August 6 at 12:00:00 AM and YOLO on August 6 at 12:00:01 AM for graphic 35:

./testdiff.py -t 08_06_2022_00_00_00 08_06_2022_00_00_01 -n 35 --od Azure YOLO

The -d flag on testset.py will run testdiff.py on the JSON file that was just created and the next most recent JSON file for the graphic(s), if one exists.
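
Finding that "next most recent" file amounts to something like the following (a hypothetical sketch; the one-timestamped-JSON-per-run naming is an assumption):

from pathlib import Path

def latest_two(graphic_dir: Path):
    # Assumes one timestamped JSON per run, e.g. 35_08_06_2022_00_00_00.json
    files = sorted(graphic_dir.glob("*.json"), key=lambda p: p.stat().st_mtime)
    if len(files) < 2:
        return None
    return files[-1], files[-2]  # the just-created file and the next most recent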

azure.sh and yolo.sh:

azure.sh switches the docker compose for object detection from YOLO (the default) to Azure. yolo.sh switches back via a restore of the unstable configuration.

example:

to compare YOLO and Azure outputs for all indoor graphics:

./testset.py -t indoor
./azure.sh
./testset.py -t indoor -d
./yolo.sh

llm-caption-test.py

Automated testing script for evaluating multimodal LLM descriptions of images from the IMAGE-test-graphics repository.

Requirements

  • Python 3.7+
  • Ollama running locally (ollama serve)
  • Internet connection (for GitHub API access)

Installation

pip install requests pandas pillow

Usage

python llm-caption-test.py

Configuration

Models

Edit the MODELS list in the script with model names and temperature settings:

MODELS = [
    ("gemma3:12b", 0.0),
    ("gemma3:12b", 1.0),          
    ("llama3.2-vision:latest", 0.0),
    ("llama3.2-vision:latest", 1.0)
]

Other Parameters

  • Prompt: the PROMPT parameter is applied to all models
  • Image size: modify max_size in image_to_base64() (default: 2048x2048; see the sketch after this list)
  • API endpoint: update url in run_ollama_model() if not using localhost:11434
  • Image formats: add to the IMAGE_EXTENSIONS set to support other formats
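
For context, the two helpers named above might look roughly like this (a sketch; only max_size, url, and the default Ollama endpoint come from this documentation, the rest is assumed):

import base64
import io

import requests
from PIL import Image

def image_to_base64(path: str, max_size=(2048, 2048)) -> str:
    img = Image.open(path)
    img.thumbnail(max_size)  # resize in place, preserving aspect ratio
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG")
    return base64.b64encode(buf.getvalue()).decode("ascii")

def run_ollama_model(model: str, temperature: float, prompt: str, image_b64: str) -> str:
    url = "http://localhost:11434/api/generate"  # default Ollama endpoint
    resp = requests.post(url, json={
        "model": model,
        "prompt": prompt,
        "images": [image_b64],
        "stream": False,
        "options": {"temperature": temperature},
    }, timeout=300)
    resp.raise_for_status()
    return resp.json()["response"]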

Output Files

  1. llm_test_results.csv - Main results with columns:

    • folder: Folder number (0000-0067)
    • filename: Original image filename
    • image: HTML-embedded thumbnail
    • Model description columns (e.g., "llama3.2-vision:latest (t=0.0)")
  2. llm_test_results.html - Formatted HTML view with embedded images

  3. intermediate_results.csv - Auto-saved every 5 images as a backup (sketched below)
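
The intermediate backup is a simple checkpoint pattern along these lines (an illustration; process_images() and the row layout are hypothetical stand-ins, only the filename comes from the script):

import pandas as pd

def process_images():
    # Hypothetical stand-in for the per-image model calls
    for n in range(12):
        yield {"folder": f"{n:04d}", "filename": f"img{n}.png", "description": "..."}

rows = []
for i, record in enumerate(process_images(), start=1):
    rows.append(record)
    if i % 5 == 0:  # auto-save every 5 images as a backup
        pd.DataFrame(rows).to_csv("intermediate_results.csv", index=False)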

Notes

  • Processes 68 folders (0000-0067) from Shared-Reality-Lab/IMAGE-test-graphics
  • Images are resized to max 2048x2048
  • 1-second delay between model calls to avoid overload
  • Handles various image formats (JPG, JPEG, PNG, GIF, BMP, WEBP)
  • Error handling for missing images or API failures
