ZeroVO: Visual Odometry with Minimal Assumptions

This repository contains the code that accompanies our CVPR 2025 paper ZeroVO: Visual Odometry with Minimal Assumptions. Please visit our project page for more details.

Example 1

Overview

We introduce ZeroVO, a novel visual odometry (VO) algorithm that achieves zero-shot generalization across diverse cameras and environments, overcoming limitations in existing methods that depend on predefined or static camera calibration setups. Our approach incorporates three main innovations. First, we design a calibration-free, geometry-aware network structure capable of handling noise in estimated depth and camera parameters. Second, we introduce a language-based prior that infuses semantic information to enhance robust feature extraction and generalization to previously unseen domains. Third, we develop a flexible, semi-supervised training paradigm that iteratively adapts to new scenes using unlabeled data, further boosting the models' ability to generalize across diverse real-world scenarios. We analyze complex autonomous driving contexts, demonstrating over 30% improvement against prior methods on three standard benchmarks (KITTI, nuScenes, and Argoverse 2), as well as a newly introduced, high-fidelity synthetic dataset derived from Grand Theft Auto (GTA). By not requiring fine-tuning or camera calibration, our work broadens the applicability of VO, providing a versatile solution for real-world deployment at scale.

Datasets

We use the KITTI, Argoverse 2, and nuScenes datasets, along with in-the-wild YouTube videos. Please refer to their websites for dataset setup.

Dataset download links:

KITTI: The KITTI dataset can be downloaded from the official source here. All other datasets, after processing, will adhere to the same directory structure as the KITTI dataset.
Argoverse 2: The Argoverse 2 dataset can be downloaded from the official source here. Once downloaded, the subset corresponding to the VO task can be extracted using the provided script located in the data directory.
nuScenes: The nuScenes dataset can be downloaded from the official source here. Once downloaded, the subset corresponding to the VO task can be extracted using the provided script located in the data directory (an illustrative extraction sketch follows this list).
GTA V: The GTA dataset can be downloaded here. Once downloaded, please extract the contents by running the following command: unzip GTA.zip
YouTube: Approximately 50 hours of driving footage were selected from videos published on the YouTube channel J Utah, featuring a diverse range of driving scenarios. A more comprehensive list of driving videos from YouTube can be found here.
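
As an illustration of the nuScenes extraction step referenced above, the sketch below walks each scene's front-camera frames with the nuscenes-devkit. It is a rough, assumption-laden example only (the dataroot, the output layout, and the frames.txt file are placeholders); the script shipped in the data directory is the authoritative version.

# nusc_extract_sketch.py -- illustrative only; the actual extraction script lives in the data directory.
# The dataroot and the frames.txt output below are assumptions for this sketch.
import os
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version="v1.0-trainval", dataroot="/path/to/nuscenes", verbose=True)

for scene in nusc.scene:
    out_dir = os.path.join("data/nuScenes/sequences", scene["name"])
    os.makedirs(out_dir, exist_ok=True)
    frame_paths = []
    sample_token = scene["first_sample_token"]
    while sample_token:
        sample = nusc.get("sample", sample_token)
        cam = nusc.get("sample_data", sample["data"]["CAM_FRONT"])
        frame_paths.append(os.path.join(nusc.dataroot, cam["filename"]))
        sample_token = sample["next"]
    # record the front-camera image paths for this scene
    with open(os.path.join(out_dir, "frames.txt"), "w") as f:
        f.write("\n".join(frame_paths))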

The directory structure within the data folder is organized as follows:

data/ 
├── KITTI/
│   ├── kitti_est_intrs.json
│   ├── text_feature/ 
│   ├── depth_est_intrs/ 
│   ├── sequences/ 
│   └── poses/
├── Argoverse 2/
│   ├── stereo_front_left_est_intrs.json
│   ├── text_feature/ 
│   ├── depth_est_intrs/ 
│   ├── sequences/ 
│   └── poses/
├── nuScenes/
│   ├── cam_front_est_intrs.json
│   ├── text_feature/ 
│   ├── depth_est_intrs/ 
│   ├── sequences/ 
│   └── poses/
├── GTA/
│   ├── est_intrs.json
│   ├── text_feature/ 
│   ├── GTA_Depth_est_intrs/ 
│   ├── sequences/ 
│   └── poses/

The estimated camera intrinsics, metric depth, and text features are available for download here. Alternatively, users may regenerate these components using WildCamera for intrinsics estimation, Metric3Dv2 for metric depth prediction, and LLaVA-NeXT and SentenceTransformers for text feature extraction.
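
For the text features, a possible two-stage pipeline is to caption each frame with LLaVA-NeXT and then embed the captions with SentenceTransformers. The minimal sketch below covers only the embedding step and assumes the sentence-transformers package is installed; the embedding model name and the captions.txt / text_feature.npy paths are assumptions, not the exact configuration behind the released features.

# text_feature_sketch.py -- a minimal sketch of the embedding step only.
# The embedding model and the captions.txt / text_feature.npy paths are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice

# one caption per frame, e.g. generated beforehand with LLaVA-NeXT
with open("captions.txt") as f:
    captions = [line.strip() for line in f]

# encode captions into an (N, D) array of sentence embeddings and save them
features = model.encode(captions, convert_to_numpy=True)
np.save("text_feature.npy", features.astype(np.float32))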

Environment Requirements and Installation

# create a new environment
conda create -n ZVO python=3.9
conda activate ZVO
# install pytorch
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
conda install -c iopath iopath
# install pytorch3d
wget https://anaconda.org/pytorch3d/pytorch3d/0.7.5/download/linux-64/pytorch3d-0.7.5-py39_cu117_pyt201.tar.bz2
conda install pytorch3d-0.7.5-py39_cu117_pyt201.tar.bz2
rm pytorch3d-0.7.5-py39_cu117_pyt201.tar.bz2
# export CUDA 11.7 
export CUDA_HOME=/usr/local/cuda-11.7
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
pip install PyYAML==6.0.2 timm==1.0.16 matplotlib==3.5.3 pandas==2.3.0 opencv-python==4.11.0.86 a-unet==0.0.16 mmcv-full==1.7.2 numpy==1.26.4 pillow==11.0.0 av2==0.2.1 nuscenes-devkit==1.1.11

# from the repository root, move the customized vision transformer into the ZVO environment's timm package
mv vision_transformer_cross.py $CONDA_PREFIX/lib/python3.9/site-packages/timm/models/
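
After installation, a quick sanity check (a hypothetical snippet, not part of the repository) can confirm that the core dependencies and the relocated timm module import correctly:

# check_env.py -- hypothetical sanity check, not part of the repository
import torch
import pytorch3d
import timm
import cv2

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("pytorch3d:", pytorch3d.__version__)
print("timm:", timm.__version__)
print("opencv:", cv2.__version__)

# importable only after vision_transformer_cross.py has been moved into timm/models
from timm.models import vision_transformer_cross  # noqa: F401
print("vision_transformer_cross imported successfully")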

Training

  1. Install the correlation package
    The correlation package must be installed first:

    cd model/correlation_package
    python setup.py install
    
  2. Preprocess the dataset
    The labels are provided in the poses directory. To regenerate them or review the corresponding implementation details, refer to the code and run the following command (a sketch of the underlying pose format follows this list):

    python3 preprocess.py
    
  3. Download initial weights

    Download the initial weights to the init_weights directory. The initial weights can be found here.

  4. Run training

    Supervised Training on nuScenes OneNorth:

    # update params.py
    self.train_video = {'NUSC': nusc_scene_map['singapore-onenorth'],}
    self.checkpoint_path = 'saved_models/zvo_nusc_sl'
    

    Self-Training on nuScenes OneNorth and YouTube:

    # update params.py
    self.train_video = {
        'NUSC': nusc_scene_map['singapore-onenorth'],
        'YouTube': [str(i).zfill(2) for i in range(49)],
        }
    self.checkpoint_path = 'saved_models/zvo_nusc_ssl'
    

    and run:

    python3 main.py
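
The ground-truth labels follow the KITTI odometry convention, where each line of a pose file is a flattened 3x4 camera-to-world matrix. As a rough, hypothetical illustration of that format (not the repository's preprocess.py implementation), the sketch below converts absolute poses into frame-to-frame relative motions:

# relative_pose_sketch.py -- illustrative only; preprocess.py generates the actual labels.
import numpy as np

def load_kitti_poses(path):
    # each line holds a flattened 3x4 camera-to-world matrix (12 numbers)
    poses = []
    with open(path) as f:
        for line in f:
            mat = np.array(line.split(), dtype=np.float64).reshape(3, 4)
            pose = np.eye(4)
            pose[:3, :4] = mat
            poses.append(pose)
    return poses

def relative_motions(poses):
    # frame-to-frame motion: T_rel = inv(T_i) @ T_{i+1}
    return [np.linalg.inv(a) @ b for a, b in zip(poses[:-1], poses[1:])]

poses = load_kitti_poses("data/KITTI/poses/00.txt")
print(len(relative_motions(poses)), "relative motions")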
    

Test

Download the model checkpoints to the saved_models directory. The checkpoints can be found here.

We test on KITTI, Argoverse 2, the unseen regions of nuScenes, and GTA:

# update test_utils.py
args.model_path = "/saved_models/ZVO"
gta_scenes = sorted(os.listdir(args.data_path['GTA']+'/sequences/'))
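# uncomment one of the following testing sets: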
# args.testing_data = {'ARGO2_Stereo': {'ARGO2_Stereo': [str(i).zfill(3) for i in range(1000) if i not in argo2_stereo_remove]}}
# args.testing_data = {'NUSC_X': {'NUSC_X': nusc_scene_map['boston-seaport']+nusc_scene_map['singapore-queenstown']+nusc_scene_map['singapore-hollandvillage']}}
# args.testing_data = {'GTA': {'GTA': gta_scenes}}
# args.testing_data = {'KITTI': {'KITTI': ['00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10']}}
fast_test(args)

and run:

python3 test_utils.py

Evaluation

cd odom-eval
# update eval.py
eval_dirs = ['ZVO']

and run:

python3 eval.py

The VO evaluation tool is adapted from https://github.com/Huangying-Zhan/kitti-odom-eval.
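
The tool reports standard KITTI-style relative errors averaged over fixed-length subsequences. As a minimal, hypothetical sketch of the per-pair error it builds on (subsequence selection, optional scale alignment, and averaging are handled by the tool itself):

# pose_error_sketch.py -- hypothetical; the odom-eval tool handles subsequence
# selection, scale alignment, and averaging.
import numpy as np

def pose_error(gt_rel, pred_rel):
    # error transform between two 4x4 relative poses
    err = np.linalg.inv(gt_rel) @ pred_rel
    # rotation angle of the error (radians), via the trace identity
    d = 0.5 * (np.trace(err[:3, :3]) - 1.0)
    rot_err = np.arccos(np.clip(d, -1.0, 1.0))
    # translation magnitude of the error (meters)
    trans_err = np.linalg.norm(err[:3, 3])
    return rot_err, trans_err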

Inference

If you would like to use a trained model to generate predictions on new input data, we provide an inference.py script for this purpose.

Results

We show trajectory prediction results across the four most complex driving sequences (00, 02, 05, and 08) from the KITTI dataset. Each subplot illustrates the trajectories generated by our proposed model and the baseline models alongside the ground truth trajectory. The qualitative results demonstrate that our approach achieves the highest alignment with the ground truth, particularly in challenging turns and extended straight paths. These findings highlight the robustness of our method in handling complex and diverse driving scenarios.
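
To produce a similar plot for your own runs, the hypothetical snippet below draws x-z trajectories from KITTI-format pose files; it is not the figure code used for the paper, and the prediction path is a placeholder.

# plot_traj_sketch.py -- hypothetical plotting snippet; results/00_pred.txt is a placeholder path.
import numpy as np
import matplotlib.pyplot as plt

def load_xz(path):
    # KITTI pose files: flattened 3x4 matrices; translation sits in columns 3, 7, 11
    P = np.loadtxt(path)
    return P[:, 3], P[:, 11]  # x (right) and z (forward)

for name, path in [("Ground truth", "data/KITTI/poses/00.txt"), ("ZeroVO", "results/00_pred.txt")]:
    x, z = load_xz(path)
    plt.plot(x, z, label=name)
plt.xlabel("x (m)"); plt.ylabel("z (m)")
plt.axis("equal"); plt.legend()
plt.savefig("traj_00.png")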

Demo 1

Contact

If you have any questions or comments, please feel free to contact me at leilai@bu.edu.

License

Our work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
