Finn Busch, Matti Vahs, Quantao Yang, Jesús Gerardo Ortega Peimbert, Yixi Cai, Jana Tumova, Olov Andersson
Division of Robotics, Perception, and Learning at KTH Royal Institute of Technology
LoTIS is a model for visual navigation that provides robot-agnostic image-space guidance by localizing a reference RGB trajectory in the robot's current view. Given a reference trajectory (a sequence of RGB images) and a query image from the robot's current viewpoint, LoTIS predicts the 2D image-space coordinates, visibility, and relative distance of each trajectory pose as it would appear in the query view.
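Concretely, for an N-pose trajectory the prediction can be thought of as N tuples of image coordinates, a visibility score, and a relative distance. A minimal sketch of how a downstream controller might consume such an output to pick a short-horizon image-space goal (all names and values here are illustrative, not the LoTIS API):

```python
# Illustrative hand-made predictions for a 4-pose trajectory.
# Each entry: (u, v) image coordinates, visibility score, relative distance.
predictions = [
    ((120.0, 200.0), 0.92, 3.1),
    ((180.0, 190.0), 0.75, 2.0),
    ((260.0, 185.0), 0.40, 1.2),   # below the visibility threshold
    ((330.0, 180.0), 0.88, 0.6),
]

VIS_THRESHOLD = 0.5  # mirrors infer.py's --vis-threshold default

def guidance_point(preds, threshold=VIS_THRESHOLD):
    """Pick the visible trajectory pose closest to the robot as a
    short-horizon image-space goal; return None if nothing is visible."""
    visible = [(uv, dist) for uv, vis, dist in preds if vis >= threshold]
    if not visible:
        return None
    uv, _ = min(visible, key=lambda p: p[1])
    return uv

print(guidance_point(predictions))  # -> (330.0, 180.0)
```

The third pose is skipped because its visibility score falls below the threshold, even though it is closer than the first two.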
1. Clone this repository and install dependencies

```bash
git clone https://github.com/KTH-RPL/lotis.git
cd lotis
```

Dependencies are managed with uv. `uv run` will automatically install everything on first use. To install manually:

```bash
uv sync
```

2. Clone the DINOv3 repository
LoTIS uses DINOv3 as a frozen backbone. Clone Meta's repository into the project root:

```bash
git clone https://github.com/facebookresearch/dinov3.git
```

3. Download DINOv3 weights
Request access to DINOv3 pretrained weights at ai.meta.com/resources/models-and-libraries/dinov3-downloads. Once approved, you will receive an email with download URLs. Download the ViT-B/16 pretrain weights (dinov3_vitb16_pretrain.pth) and set the path:
```bash
export DINOV3_WEIGHTS=/path/to/dinov3_vitb16_pretrain.pth
```

Or pass it directly via `--dinov3-weights` when running scripts.
4. Download LoTIS model weights
```bash
huggingface-cli download fnnBsch/lotis final_model.pth final_config.yaml --local-dir .
```

Localize a reference trajectory in a query (image, video, or directory), visualized live in Rerun:
```bash
uv run python infer.py \
    --trajectory examples/00_KTH_Campus/Courtyard/trajectory.mp4 \
    --query examples/00_KTH_Campus/Courtyard/queries/Forward/query.mp4
```

Live webcam: encode a trajectory once, then localize from your camera in real time:
```bash
uv run python infer.py \
    --trajectory path/to/trajectory.mp4 \
    --usb-cam
```

Use `--cam-id 1` to select a different camera. All infer.py options:
```
--trajectory PATH       Reference trajectory (video, image dir, or single image)
--query PATH            Query input (video, image dir, or single image)
--usb-cam               Use webcam as live query [mutually exclusive with --query]
--cam-id INT            Camera device ID (default: 0)
--checkpoint PATH       LoTIS checkpoint (default: final_model.pth)
--config PATH           LoTIS config (default: final_config.yaml)
--dinov3-weights PATH   DINOv3 weights file (or set DINOV3_WEIGHTS)
--dinov3-repo PATH      Path to cloned dinov3 repo (default: ./dinov3)
--device cuda|cpu       Inference device (auto-detected if not set)
--vis-threshold FLOAT   Visibility threshold for displaying points (default: 0.5)
```
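The `--query`/`--usb-cam` exclusivity noted above is the kind of constraint argparse expresses with a mutually exclusive group. A minimal, self-contained sketch of how such a CLI might be declared (illustrative only, not the actual infer.py source; only a few of the options are shown):

```python
import argparse

def build_parser():
    """Build a toy CLI mirroring a subset of the options listed above."""
    parser = argparse.ArgumentParser(
        description="Localize a reference trajectory in a query view")
    parser.add_argument("--trajectory", required=True,
                        help="Reference trajectory (video, image dir, or image)")
    # --query and --usb-cam cannot be combined; exactly one is required.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--query", help="Query video, image dir, or single image")
    source.add_argument("--usb-cam", action="store_true",
                        help="Use webcam as live query")
    parser.add_argument("--cam-id", type=int, default=0, help="Camera device ID")
    parser.add_argument("--vis-threshold", type=float, default=0.5,
                        help="Visibility threshold for displaying points")
    return parser

args = build_parser().parse_args(["--trajectory", "traj.mp4", "--usb-cam"])
print(args.usb_cam, args.cam_id)  # -> True 0
```

Passing both `--query` and `--usb-cam` to this parser exits with an error, matching the mutual exclusivity stated in the option list.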
infer.py also accepts Rerun's additional flags (e.g. `--serve` to stream to a remote viewer). Run `python infer.py --help` for the full list.
If running on a remote machine, start the Rerun viewer locally and forward the port:

```bash
# On your local machine
rerun

# Forward the Rerun port from remote to local
ssh -R 9876:localhost:9876 <remote-host>

# On the remote machine
uv run python infer.py --trajectory ... --query ... --connect
```

A full interactive demo is available at the HuggingFace Space. To run it locally:
```bash
uv run python app.py
```

Set `DINOV3_WEIGHTS` (and optionally `DINOV3_REPO`) before running.
```python
from lotis import TrajectoryLocalizer, TrajectoryEncoding

localizer = TrajectoryLocalizer.from_checkpoint(
    checkpoint_path="final_model.pth",
    config_path="final_config.yaml",
    dinov3_weights="/path/to/dinov3_vitb16_pretrain.pth",
)

# Encode a trajectory once and reuse it
encoding = localizer.encode_trajectory("path/to/trajectory.mp4")

# Localize a query image
result = localizer.localize("query.jpg", encoding)
print(f"Closest trajectory frame: {result.closest_frame()}")
print(f"Visible frames: {result.visible_indices()}")

# Save and reload the trajectory encoding
import torch
torch.save(encoding.to_dict(), "encoding.pt")
encoding = TrajectoryEncoding.from_dict(torch.load("encoding.pt"))
```

`localize()` accepts single images or lists, and single encodings or lists; see `lotis/localizer.py` for the full batching API.
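The single-or-list convention for `localize()` is a common API pattern. A minimal, self-contained sketch of how such input normalization typically works (illustrative only; the real batching logic lives in `lotis/localizer.py`):

```python
def _as_list(x):
    """Wrap a single item into a list; pass lists through unchanged."""
    return x if isinstance(x, list) else [x]

def localize_batched(queries, encodings, localize_one):
    """Accept single items or lists for both arguments and pair them up.

    A lone encoding is broadcast across all queries, so one trajectory
    can be localized against many query images in a single call.
    """
    queries, encodings = _as_list(queries), _as_list(encodings)
    if len(encodings) == 1:
        encodings = encodings * len(queries)
    if len(queries) != len(encodings):
        raise ValueError("queries and encodings must pair up")
    return [localize_one(q, e) for q, e in zip(queries, encodings)]

# Toy stand-in for the model call:
fake_localize = lambda q, e: f"{q}@{e}"
print(localize_batched(["a.jpg", "b.jpg"], "enc", fake_localize))
# -> ['a.jpg@enc', 'b.jpg@enc']
```

Broadcasting a single encoding across a list of queries matches the typical use case of re-localizing against one pre-encoded trajectory.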
- Inference code + Gradio Demo
- Evaluation code
- Training code
```bibtex
@misc{busch2026learninglocalizereferencetrajectories,
  title={Learning to Localize Reference Trajectories in Image-Space for Visual Navigation},
  author={Finn Lukas Busch and Matti Vahs and Quantao Yang and Jesús Gerardo Ortega Peimbert and Yixi Cai and Jana Tumova and Olov Andersson},
  year={2026},
  eprint={2602.18803},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2602.18803},
}
```