LoTIS: Learning to Localize Reference Trajectories in Image-Space for Visual Navigation


arXiv Project Page HuggingFace

Finn Busch, Matti Vahs, Quantao Yang, Jesús Gerardo Ortega Peimbert, Yixi Cai, Jana Tumova, Olov Andersson

Division of Robotics, Perception, and Learning at KTH Royal Institute of Technology

LoTIS is a model for visual navigation that provides robot-agnostic image-space guidance by localizing a reference RGB trajectory in the robot's current view. Given a reference trajectory (a sequence of RGB images) and a query image from the robot's current viewpoint, LoTIS predicts the 2D image-space coordinates, visibility, and relative distance of each trajectory pose as it would appear in the query view.
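To make the output concrete, here is a hypothetical sketch of what such a per-pose prediction contains and how it could be consumed. The names below (PosePrediction, visible_indices, closest_frame) are illustrative stand-ins, not the actual LoTIS API:

```python
from dataclasses import dataclass

@dataclass
class PosePrediction:
    # Hypothetical per-pose output; field names are illustrative.
    u: float           # image-space x coordinate (pixels) in the query view
    v: float           # image-space y coordinate (pixels) in the query view
    visibility: float  # score for whether the pose is visible in the query view
    distance: float    # predicted relative distance to the query viewpoint

def visible_indices(preds, threshold=0.5):
    """Indices of trajectory poses whose visibility exceeds the threshold."""
    return [i for i, p in enumerate(preds) if p.visibility >= threshold]

def closest_frame(preds, threshold=0.5):
    """Index of the closest visible trajectory pose, or None if none is visible."""
    idxs = visible_indices(preds, threshold)
    return min(idxs, key=lambda i: preds[i].distance) if idxs else None

preds = [
    PosePrediction(120.0, 80.0, 0.9, 3.2),
    PosePrediction(200.0, 95.0, 0.7, 1.1),
    PosePrediction(310.0, 90.0, 0.2, 0.4),  # low visibility: occluded / out of view
]
print(visible_indices(preds))  # -> [0, 1]
print(closest_frame(preds))    # -> 1
```

This mirrors the closest_frame() and visible_indices() helpers exposed by the Python API below.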

Setup

1. Clone this repository and install dependencies

git clone https://github.com/KTH-RPL/lotis.git
cd lotis

Dependencies are managed with uv; uv run installs everything automatically on first use. To install manually:

uv sync

2. Clone the DINOv3 repository

LoTIS uses DINOv3 as a frozen backbone. Clone Meta's repository into the project root:

git clone https://github.com/facebookresearch/dinov3.git

3. Download DINOv3 weights

Request access to DINOv3 pretrained weights at ai.meta.com/resources/models-and-libraries/dinov3-downloads. Once approved, you will receive an email with download URLs. Download the ViT-B/16 pretrain weights (dinov3_vitb16_pretrain.pth) and set the path:

export DINOV3_WEIGHTS=/path/to/dinov3_vitb16_pretrain.pth

Or pass it directly via --dinov3-weights when running scripts.
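Since the path can come from either the flag or the environment variable, a script presumably resolves it with the flag taking precedence. A minimal sketch of that resolution logic (illustrative only, not the actual infer.py code):

```python
import os

def resolve_dinov3_weights(cli_path=None):
    """Resolve the DINOv3 weights path.

    Sketch of the assumed precedence: an explicit --dinov3-weights value
    wins over the DINOV3_WEIGHTS environment variable.
    """
    path = cli_path or os.environ.get("DINOV3_WEIGHTS")
    if not path:
        raise SystemExit(
            "DINOv3 weights not found: pass --dinov3-weights or set DINOV3_WEIGHTS"
        )
    return path

os.environ["DINOV3_WEIGHTS"] = "/weights/dinov3_vitb16_pretrain.pth"
print(resolve_dinov3_weights())              # falls back to the env var
print(resolve_dinov3_weights("/tmp/w.pth"))  # explicit flag wins
```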

4. Download LoTIS model weights

huggingface-cli download fnnBsch/lotis final_model.pth final_config.yaml --local-dir .

Running inference

Localize a reference trajectory in a query input (an image, video, or image directory), visualized live in Rerun:

uv run python infer.py \
    --trajectory examples/00_KTH_Campus/Courtyard/trajectory.mp4 \
    --query examples/00_KTH_Campus/Courtyard/queries/Forward/query.mp4

Live webcam — encode a trajectory once, then localize from your camera in real time:

uv run python infer.py \
    --trajectory path/to/trajectory.mp4 \
    --usb-cam

Use --cam-id 1 to select a different camera. All infer.py options:

--trajectory PATH       Reference trajectory (video, image dir, or single image)
--query PATH            Query input (video, image dir, or single image)
--usb-cam               Use webcam as live query  [mutually exclusive with --query]
--cam-id INT            Camera device ID (default: 0)
--checkpoint PATH       LoTIS checkpoint (default: final_model.pth)
--config PATH           LoTIS config (default: final_config.yaml)
--dinov3-weights PATH   DINOv3 weights file (or set DINOV3_WEIGHTS)
--dinov3-repo PATH      Path to cloned dinov3 repo (default: ./dinov3)
--device cuda|cpu       Inference device (auto-detected if not set)
--vis-threshold FLOAT   Visibility threshold for displaying points (default: 0.5)

Rerun also accepts additional flags (e.g. --serve to stream to a remote viewer). Run python infer.py --help for the full list.

Remote visualization

If running on a remote machine, start the Rerun viewer locally and forward the port:

# On your local machine
rerun

# Forward the Rerun port from remote to local
ssh -R 9876:localhost:9876 <remote-host>

# On the remote machine
uv run python infer.py --trajectory ... --query ... --connect

Gradio demo

A full interactive demo is available at the HuggingFace Space. To run it locally:

uv run python app.py

Set DINOV3_WEIGHTS (and optionally DINOV3_REPO) before running.

Python API

from lotis import TrajectoryLocalizer, TrajectoryEncoding

localizer = TrajectoryLocalizer.from_checkpoint(
    checkpoint_path="final_model.pth",
    config_path="final_config.yaml",
    dinov3_weights="/path/to/dinov3_vitb16_pretrain.pth",
)

# Encode a trajectory — do this once and reuse
encoding = localizer.encode_trajectory("path/to/trajectory.mp4")

# Localize a query image
result = localizer.localize("query.jpg", encoding)

print(f"Closest trajectory frame: {result.closest_frame()}")
print(f"Visible frames: {result.visible_indices()}")

# Save and reload the trajectory encoding
import torch
torch.save(encoding.to_dict(), "encoding.pt")
encoding = TrajectoryEncoding.from_dict(torch.load("encoding.pt"))

localize() accepts single images or lists, and single encodings or lists — see lotis/localizer.py for the full batching API.
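Accepting "single or list" inputs usually means normalizing everything to paired lists internally. A hypothetical sketch of that dispatch pattern (not the actual lotis/localizer.py code; see that file for the real batching API):

```python
def normalize_batch(queries, encodings):
    """Pair query inputs with trajectory encodings, accepting singles or lists.

    Hypothetical sketch of a batching convention: a single encoding is
    broadcast across all queries; otherwise lengths must match.
    """
    if not isinstance(queries, list):
        queries = [queries]
    if not isinstance(encodings, list):
        encodings = [encodings]
    if len(encodings) == 1 and len(queries) > 1:
        encodings = encodings * len(queries)  # broadcast one encoding
    if len(queries) != len(encodings):
        raise ValueError("number of queries and encodings must match")
    return list(zip(queries, encodings))

print(normalize_batch("q1.jpg", "enc"))               # single query, single encoding
print(normalize_batch(["q1.jpg", "q2.jpg"], "enc"))   # one encoding broadcast to both
```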

Roadmap

  • Inference code + Gradio Demo
  • Evaluation code
  • Training code

Citation

@misc{busch2026learninglocalizereferencetrajectories,
      title={Learning to Localize Reference Trajectories in Image-Space for Visual Navigation}, 
      author={Finn Lukas Busch and Matti Vahs and Quantao Yang and Jesús Gerardo Ortega Peimbert and Yixi Cai and Jana Tumova and Olov Andersson},
      year={2026},
      eprint={2602.18803},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2602.18803}, 
}
