ILIAS:
Instance-Level Image retrieval At Scale

Giorgos Kordopatis-Zilos Vladan Stojnić Anna Manko Pavel Šuma Nikolaos-Antonios Ypsilantis Nikos Efthymiadis Zakaria Laskar Jiří Matas Ondřej Chum Giorgos Tolias
Visual Recognition Group, Faculty of Electrical Engineering, Czech Technical University in Prague

Dataset intro

ILIAS is a large-scale test dataset for evaluation on Instance-Level Image retrieval At Scale.
It is designed to support future research in image-to-image and text-to-image retrieval for particular objects and serves as a benchmark for evaluating representations of foundation or customized vision and vision-language models, as well as specialized retrieval techniques.

The dataset includes 1,000 object instances across diverse domains, with:

  • 1,232 image queries, each depicting a query object on a clean or uniform background.
  • 4,715 positive images, featuring the query objects in real-world conditions with clutter, occlusions, scale variations, and partial views.
  • 1,000 text queries, providing fine-grained textual descriptions of the query objects.
  • 100M distractors from YFCC100M to evaluate retrieval performance in large-scale settings, while ensuring noise-free ground truth.

  • 1,000 object instances: 5,947 manually collected images -- 1,232 query images and 4,715 positives -- and 1,000 text queries.
  • Retrieval in large-scale settings is achieved using 100M distractors from the YFCC100M dataset (see the search sketch after this list).
  • Noise-free ground truth is guaranteed by collecting only objects not publicly available before 2014, the YFCC100M compilation date.
  • Foundation and legacy models are evaluated, both off-the-shelf and combined with linear adaptation and various re-ranking techniques.
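Exhaustive cosine-similarity search against 100M distractor descriptors is practical when the database is processed in chunks. Below is a minimal sketch of such chunked search; the chunk files, their layout, and the shortlist size are illustrative assumptions, not the official evaluation code.

```python
# Minimal sketch of exhaustive search against a large distractor set,
# processed in descriptor chunks to bound memory. Chunk files and k are
# illustrative assumptions.
import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

def search(queries: np.ndarray, db_files: list[str], k: int = 1000):
    """queries: (nq, d) L2-normalized; db_files: chunks of (n_i, d) arrays."""
    q = torch.from_numpy(queries).to(device)
    all_scores, all_ids, offset = [], [], 0
    for path in db_files:
        chunk = torch.from_numpy(np.load(path)).to(device)  # (n_i, d), L2-normalized
        sim = q @ chunk.T                                   # cosine similarity
        s, i = sim.topk(min(k, sim.shape[1]), dim=1)        # per-chunk shortlist
        all_scores.append(s)
        all_ids.append(i + offset)
        offset += chunk.shape[0]
    # Merge the per-chunk shortlists into a global top-k ranking.
    scores = torch.cat(all_scores, dim=1)
    ids = torch.cat(all_ids, dim=1)
    s, order = scores.topk(min(k, scores.shape[1]), dim=1)
    return s.cpu(), torch.gather(ids, 1, order).cpu()
```

Keeping only a per-chunk top-k before merging bounds memory at O(nq x k) regardless of database size.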

Benchmark

Performance on image-to-image retrieval

Evaluation is based on cosine similarity between global image representations extracted from frozen backbones.
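To make the protocol concrete, the sketch below extracts L2-normalized global descriptors with a frozen timm backbone and ranks a toy database by cosine similarity. The model name and image paths are placeholder assumptions, and resizing so that the largest image side matches the test resolution in the table is omitted for brevity.

```python
# Minimal sketch of the evaluation protocol: global descriptors from a
# frozen backbone, L2-normalized and compared by cosine similarity.
import timm
import torch
import torch.nn.functional as F
from PIL import Image

model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
model.eval()
cfg = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**cfg, is_training=False)

@torch.no_grad()
def embed(paths):
    batch = torch.stack([transform(Image.open(p).convert("RGB")) for p in paths])
    return F.normalize(model(batch), dim=-1)   # (n, dims) global descriptors

queries = embed(["query.jpg"])                 # placeholder file names
database = embed(["db_0.jpg", "db_1.jpg"])
ranking = (queries @ database.T).argsort(dim=1, descending=True)
```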

Table columns:

  • name: name of the model
  • year: year of the model's release
  • repo: source repository for model weights and code (timm: pytorch-image-models library; torchvision: torchvision library; github: official GitHub repository)
  • arch: architecture of the model (ViT-(S|B|L): small, base, or large Vision Transformer; CN-(B|L): base or large ConvNext; R(50|101): ResNet-50 or ResNet-101; CNN: other convolutional neural network)
  • train: training scheme used for model learning (sup: supervised learning; ssl: self-supervised learning; dist: distillation; vla: vision-language alignment)
  • dims: dimensionality of the descriptors
  • dataset: dataset used to train the model
  • data size: size of the training dataset
  • train res: size of the images used during training
  • test res: size of the largest image side used during testing
  • 5M: mAP@1k on mini-ILIAS (5M distractors)
  • 100M: mAP@1k on full ILIAS (100M distractors)

name year repo arch train dims dataset data-size train-res test-res 5M 100M
AlexNet 2012 torchvision CNN sup 256 in1k 1M 224 384 2.0 1.5
VGG16 2015 torchvision CNN sup 512 in1k 1M 224 384 3.0 2.3
ResNet50 2016 torchvision R50 sup 2048 in1k 1M 224 384 2.3 1.7
ResNet101 2016 torchvision R101 sup 2048 in1k 1M 224 384 2.7 1.9
DenseNet169 2016 torchvision CNN sup 2048 in1k 1M 224 384 3.2 2.4
Inception-v4 2017 torchvision CNN sup 1536 in1k 1M 299 512 1.7 1.1
NASNet 2018 torchvision CNN sup 4032 in1k 1M 331 512 1.7 1.0
EffNet 2019 timm CNN sup+dist 1792 in1k 1M 380 512 3.8 2.6
SWAV 2020 github R50 ssl 2048 in1k 1M 224 384 2.2 1.7
ViT-B 2021 timm ViT-B sup 768 in1k 1M 224 384 1.4 1.0
ViT-B-in22k 2021 timm ViT-B sup 768 in21k 14M 224 384 4.2 3.0
ViT-L-in22k 2021 timm ViT-L sup 1024 in21k 14M 224 384 6.0 4.6
ViT-L 2021 timm ViT-L sup 1024 in1k 1M 224 384 5.1 3.6
ViT-L@384 2021 timm ViT-L sup 1024 in1k 1M 384 512 7.2 5.3
OAI-CLIP-R50 2021 github R50 vla 1024 openai 400M 224 384 4.4 3.2
OAI-CLIP-B 2021 timm ViT-B vla 512 openai 400M 224 384 5.9 4.2
OAI-CLIP-L 2021 timm ViT-L vla 768 openai 400M 224 384 9.0 7.0
OAI-CLIP-L@336 2021 timm ViT-L vla 768 openai 400M 336 512 12.1 9.4
DINO-R50 2021 github R50 ssl 2048 in1k 1M 224 384 3.8 2.9
DINO-ViT-B 2021 github ViT-B ssl 768 in1k 1M 224 384 5.0 3.7
MoCov3-R50 2021 github R50 ssl 2048 in1k 1M 224 384 3.3 2.6
MoCov3-ViT-B 2021 github ViT-B ssl 768 in1k 1M 224 384 2.5 1.9
OpenCLIP-ViT-L 2022 timm ViT-L vla 768 laion2b 2B 224 384 11.8 9.4
ConvNext-B 2022 timm CN-B sup 1024 in1k 1M 288 384 2.8 2.0
ConvNext-B-in22k 2022 timm CN-B sup 1536 in22k 14M 224 384 8.9 6.4
ConvNext-L 2022 timm CN-L sup 1024 in1k 1M 288 384 3.2 2.2
ConvNext-L-in22k 2022 timm CN-L sup 1536 in22k 14M 288 384 8.6 6.6
OpenCLIP-CN-B 2022 timm CN-B vla 640 laion2b 2B 256 384 10.7 7.9
OpenCLIP-CN-L@320 2022 timm CN-L vla 768 laion2b 2B 320 512 12.7 9.6
Recall@k-R50-SOP 2022 github R50 sup 512 sop 60k 224 384 2.3 1.6
Recall@k-ViT-B-SOP 2022 github ViT-B sup 512 sop 60k 224 384 6.8 5.0
CVNet-R50 2022 github R50 sup 2048 gldv2 1M 512 724 3.7 2.9
CVNet-R101 2022 github R101 sup 2048 gldv2 1M 512 724 3.9 3.0
DeiT3-B 2022 timm ViT-B sup+dist 768 in1k 1M 224 384 1.9 1.2
DeiT3-L 2022 timm ViT-L sup+dist 1024 in1k 1M 224 384 2.0 1.5
EVA-MIM-B 2023 timm ViT-B ssl 768 in22k 14M 224 384 3.1 2.1
EVA-MIM-L 2023 timm ViT-L ssl 1024 in22k 14M 224 384 2.5 1.5
EVA-MIM-L 2023 timm ViT-L ssl 1024 merged38m 38M 224 384 6.7 4.7
EVA-CLIP-B 2023 timm ViT-B vla 512 merged2b 2B 224 384 7.8 5.9
EVA-CLIP-L 2023 timm ViT-L vla 768 merged2b 2B 336 512 13.6 10.9
HIER-ViT-S-SOP 2023 github ViT-S sup 384 sop 60k 224 384 4.6 3.3
Unicom-B 2023 github ViT-B dist 768 laion400m 400M 224 384 13.8 11.0
Unicom-L 2023 github ViT-L dist 768 laion400m 400M 224 384 18.0 13.8
Unicom-L@336 2023 github ViT-L dist 768 laion400m 400M 336 512 17.8 13.9
Unicom-B-GLDv2 2023 github ViT-B sup 768 gldv2 1M 512 724 3.7 3.0
Unicom-B-SOP 2023 github ViT-B sup 768 sop 60k 224 384 12.2 9.1
SG-R50 2023 github R50 sup 2048 gldv2 1M 512 724 4.3 3.4
SG-R101 2023 github R101 sup 2048 gldv2 1M 512 724 4.5 3.4
USCRR-CLIP 2023 github ViT-B sup 768 uned 2.8M 224 384 5.7 3.8
SigLIP-B 2023 timm ViT-B vla 768 webli 10B 224 384 14.1 11.2
SigLIP-B@256 2023 timm ViT-B vla 768 webli 10B 256 384 14.6 11.5
SigLIP-B@384 2023 timm ViT-B vla 768 webli 10B 384 512 19.3 15.6
SigLIP-B@512 2023 timm ViT-B vla 768 webli 10B 512 724 20.1 16.6
SigLIP-L@256 2023 timm ViT-L vla 1024 webli 10B 256 384 18.8 15.2
SigLIP-L@384 2023 timm ViT-L vla 1024 webli 10B 384 512 24.2 19.6
DINOv2-B 2024 github ViT-B ssl 768 lvd142m 142M 518 724 14.3 11.5
DINOv2-L 2024 github ViT-L ssl 1024 lvd142m 142M 518 724 18.5 15.3
MetaCLIP-B 2024 timm ViT-B vla 768 2pt5b 2.5B 224 384 8.8 6.6
MetaCLIP-L 2024 timm ViT-L vla 1024 2pt5b 2.5B 224 384 14.4 11.7
DINOv2-B-reg 2024 github ViT-B ssl 768 lvd142m 142M 518 724 11.8 9.4
DINOv2-L-reg 2024 github ViT-L ssl 1024 lvd142m 142M 518 724 15.9 12.7
UNIC-L 2024 github ViT-L dist 1024 in1k 1M 518 512 11.4 8.9
UDON-ViT-B 2024 github ViT-B sup 768 uned 2.8M 224 384 7.5 5.5
UDON-CLIP 2024 github ViT-B sup 768 uned 2.8M 224 384 8.3 5.9
SigLIP2-B@384 2025 timm ViT-B vla 768 webli 10B 384 512 18.4 15.0
SigLIP2-B@512 2025 timm ViT-B vla 768 webli 10B 512 724 18.6 15.4
SigLIP2-L@384 2025 timm ViT-L vla 1024 webli 10B 384 512 24.6 19.9
SigLIP2-L@512 2025 timm ViT-L vla 1024 webli 10B 512 724 25.3 20.8
PE-B 2025 timm ViT-B vla 1024 meta 2.3B 224 384 20.2 9.7
PE-L@336 2025 timm ViT-L vla 1024 meta 2.3B 336 512 27.1 22.0
DINOv3-B 2025 github ViT-B ssl 768 lvd1689m 1.7B 768 768 26.4 22.0
DINOv3-L 2025 github ViT-L ssl 1024 lvd1689m 1.7B 768 768 31.1 26.5
Franca-L 2025 github ViT-L ssl 1024 laion600m 600M 224 384 9.7 7.6

Performance on image-to-image retrieval with linear adaptation

Evaluation is based on cosine similarity between adapted image representations. Adaptation is performed via a linear layer learned on top of frozen backbones using supervised multi-domain learning on UnED.
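A minimal sketch of this setup follows, with random features and a plain classification objective standing in for the actual UnED data and multi-domain training; only the linear adapter is kept for retrieval.

```python
# Minimal sketch of linear adaptation on precomputed frozen-backbone
# descriptors. The objective, label set, and random data are
# illustrative stand-ins for the supervised multi-domain training on
# UnED used for the results below.
import torch
import torch.nn as nn
import torch.nn.functional as F

dims, num_classes = 768, 1000
feats = F.normalize(torch.randn(10_000, dims), dim=-1)   # frozen descriptors
labels = torch.randint(0, num_classes, (10_000,))

adapter = nn.Linear(dims, dims, bias=False)   # kept for retrieval
head = nn.Linear(dims, num_classes)           # discarded after training
opt = torch.optim.AdamW([*adapter.parameters(), *head.parameters()], lr=1e-3)

for epoch in range(5):
    perm = torch.randperm(len(feats))
    for i in range(0, len(feats), 256):
        idx = perm[i:i + 256]
        loss = F.cross_entropy(head(adapter(feats[idx])), labels[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()

# At test time, descriptors become F.normalize(adapter(x), dim=-1) and
# are compared with cosine similarity exactly as for frozen backbones.
```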

The columns are as in the image-to-image retrieval table above.

name year repo arch train dims dataset data-size train-res test-res 5M 100M
AlexNet 2012 torchvision CNN sup 256 in1k 1M 224 384 1.9 1.3
VGG16 2015 torchvision CNN sup 512 in1k 1M 224 384 2.3 1.6
ResNet50 2016 torchvision R50 sup 2048 in1k 1M 224 384 2.5 1.8
ResNet101 2016 torchvision R101 sup 2048 in1k 1M 224 384 2.7 1.8
DenseNet169 2016 torchvision CNN sup 2048 in1k 1M 224 384 2.9 2.0
Inception-v4 2017 torchvision CNN sup 1536 in1k 1M 299 512 1.5 1.0
NASNet 2018 torchvision CNN sup 4032 in1k 1M 331 512 1.6 1.0
EffNet 2019 timm CNN sup+dist 1792 in1k 1M 380 512 4.3 2.9
SWAV 2020 github R50 ssl 2048 in1k 1M 224 384 2.9 2.1
ViT-B 2021 timm ViT-B sup 768 in1k 1M 224 384 1.9 1.3
ViT-B-in22k 2021 timm ViT-B sup 768 in21k 14M 224 384 6.2 4.4
ViT-L-in22k 2021 timm ViT-L sup 1024 in21k 14M 224 384 7.3 5.3
ViT-L 2021 timm ViT-L sup 1024 in1k 1M 224 384 6.6 4.7
ViT-L@384 2021 timm ViT-L sup 1024 in1k 1M 384 512 8.7 6.4
OAI-CLIP-R50 2021 github R50 vla 1024 openai 400M 224 384 8.5 6.0
OAI-CLIP-B 2021 timm ViT-B vla 512 openai 400M 224 384 10.7 7.9
OAI-CLIP-L 2021 timm ViT-L vla 768 openai 400M 224 384 15.8 11.9
OAI-CLIP-L@336 2021 timm ViT-L vla 768 openai 400M 336 512 19.9 15.2
DINO-R50 2021 github R50 ssl 2048 in1k 1M 224 384 4.1 2.9
DINO-ViT-B 2021 github ViT-B ssl 768 in1k 1M 224 384 6.6 4.8
MoCov3-R50 2021 github R50 ssl 2048 in1k 1M 224 384 3.4 2.6
MoCov3-ViT-B 2021 github ViT-B ssl 768 in1k 1M 224 384 3.2 2.3
OpenCLIP-ViT-L 2022 timm ViT-L vla 768 laion2b 2B 224 384 17.5 13.7
ConvNext-B 2022 timm CN-B sup 1024 in1k 1M 288 384 3.9 2.7
ConvNext-B-in22k 2022 timm CN-B sup 1536 in22k 14M 224 384 9.9 7.6
ConvNext-L 2022 timm CN-L sup 1024 in1k 1M 288 384 4.2 2.9
ConvNext-L-in22k 2022 timm CN-L sup 1536 in22k 14M 288 384 9.1 6.9
OpenCLIP-CN-B 2022 timm CN-B vla 640 laion2b 2B 256 384 18.1 14.0
OpenCLIP-CN-L@320 2022 timm CN-L vla 768 laion2b 2B 320 512 22.9 18.3
Recall@k-R50-SOP 2022 github R50 sup 512 sop 60k 224 384 3.1 2.1
Recall@k-ViT-B-SOP 2022 github ViT-B sup 512 sop 60k 224 384 7.3 5.3
CVNet-R50 2022 github R50 sup 2048 gldv2 1M 512 724 3.5 2.6
CVNet-R101 2022 github R101 sup 2048 gldv2 1M 512 724 4.2 3.1
DeiT3-B 2022 timm ViT-B sup+dist 768 in1k 1M 224 384 2.7 1.8
DeiT3-L 2022 timm ViT-L sup+dist 1024 in1k 1M 224 384 3.3 2.4
EVA-MIM-B 2023 timm ViT-B ssl 768 in22k 14M 224 384 4.7 3.2
EVA-MIM-L 2023 timm ViT-L ssl 1024 in22k 14M 224 384 3.9 2.7
EVA-MIM-L 2023 timm ViT-L ssl 1024 merged38m 38M 224 384 8.8 6.1
EVA-CLIP-B 2023 timm ViT-B vla 512 merged2b 2B 224 384 11.7 8.7
EVA-CLIP-L 2023 timm ViT-L vla 768 merged2b 2B 336 512 20.9 16.0
HIER-ViT-S-SOP 2023 github ViT-S sup 384 sop 60k 224 384 5.1 3.6
Unicom-B 2023 github ViT-B dist 768 laion400m 400M 224 384 13.8 11.1
Unicom-L 2023 github ViT-L dist 768 laion400m 400M 224 384 17.7 13.8
Unicom-L@336 2023 github ViT-L dist 768 laion400m 400M 336 512 18.6 14.6
Unicom-B-GLDv2 2023 github ViT-B sup 768 gldv2 1M 512 724 4.1 3.3
Unicom-B-SOP 2023 github ViT-B sup 768 sop 60k 224 384 12.8 9.9
SG-R50 2023 github R50 sup 2048 gldv2 1M 512 724 3.8 2.8
SG-R101 2023 github R101 sup 2048 gldv2 1M 512 724 4.5 3.2
USCRR-CLIP 2023 github ViT-B sup 768 uned 2.8M 224 384 6.4 4.3
SigLIP-B 2023 timm ViT-B vla 768 webli 10B 224 384 19.4 15.7
SigLIP-B@256 2023 timm ViT-B vla 768 webli 10B 256 384 20.6 16.7
SigLIP-B@384 2023 timm ViT-B vla 768 webli 10B 384 512 26.2 21.5
SigLIP-B@512 2023 timm ViT-B vla 768 webli 10B 512 724 27.5 23.0
SigLIP-L@256 2023 timm ViT-L vla 1024 webli 10B 256 384 26.3 21.8
SigLIP-L@384 2023 timm ViT-L vla 1024 webli 10B 384 512 34.3 28.9
DINOv2-B 2024 github ViT-B ssl 768 lvd142m 142M 518 724 15.0 12.1
DINOv2-L 2024 github ViT-L ssl 1024 lvd142m 142M 518 724 18.8 15.3
MetaCLIP-B 2024 timm ViT-B vla 768 2pt5b 2.5B 224 384 12.7 9.4
MetaCLIP-L 2024 timm ViT-L vla 1024 2pt5b 2.5B 224 384 21.7 16.9
DINOv2-B-reg 2024 github ViT-B ssl 768 lvd142m 142M 518 724 13.5 10.7
DINOv2-L-reg 2024 github ViT-L ssl 1024 lvd142m 142M 518 724 17.1 13.6
UNIC-L 2024 github ViT-L dist 1024 in1k 1M 518 512 15.3 11.7
UDON-ViT-B 2024 github ViT-B sup 768 uned 2.8M 224 384 7.3 5.3
UDON-CLIP 2024 github ViT-B sup 768 uned 2.8M 224 384 9.2 6.7
SigLIP2-B@384 2025 timm ViT-B vla 768 webli 10B 384 512 27.5 22.6
SigLIP2-B@512 2025 timm ViT-B vla 768 webli 10B 512 724 28.6 23.5
SigLIP2-L@384 2025 timm ViT-L vla 1024 webli 10B 384 512 36.3 30.3
SigLIP2-L@512 2025 timm ViT-L vla 1024 webli 10B 512 724 37.3 31.3
PE-B 2025 timm ViT-B vla 1024 meta 2.3B 224 384 20.2 16.1
PE-L@336 2025 timm ViT-L vla 1024 meta 2.3B 336 512 39.6 33.4
DINOv3-B 2025 github ViT-B ssl 768 lvd1689m 1.7B 768 768 26.4 22.5
DINOv3-L 2025 github ViT-L ssl 1024 lvd1689m 1.7B 768 768 32.9 28.3
Franca-L 2025 github ViT-L ssl 1024 laion600m 600M 224 384 12.0 9.0

Performance on image-to-image retrieval with re-ranking

The initial ranking is obtained via exhaustive search with global image representations; the similarity of the top-ranked images is then refined by methods relying on local descriptors or on refined global descriptors. Evaluation is based on the refined similarities.
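Among the methods below, αQE is compact enough to sketch: the query descriptor is replaced by a weighted combination of itself and its top-n neighbors, with weights given by the cosine similarity raised to a power alpha. Reading the suffix in αQE1/αQE2/αQE5 as the neighbor count n is an assumption here.

```python
# Minimal sketch of alpha query expansion (αQE) over global descriptors.
# Treating the αQE1/αQE2/αQE5 suffix as the neighbor count n is an
# assumption; alpha is a tunable exponent.
import torch
import torch.nn.functional as F

def alpha_qe(q: torch.Tensor, db: torch.Tensor, n: int = 2, alpha: float = 3.0):
    """q: (d,) query and db: (N, d) database, all L2-normalized."""
    sims = db @ q                                # cosine similarities to the query
    top = sims.topk(n).indices                   # top-n neighbors
    weights = sims[top].clamp(min=0) ** alpha    # similarity-based weights
    q_new = q + (weights[:, None] * db[top]).sum(dim=0)
    return F.normalize(q_new, dim=0)             # re-query the database with this
```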

Table columns:

  • name: name of the re-ranking method and global-descriptor combination (adapt: linearly adapted representations)
  • year: year of the re-ranking method's publication
  • type: type of re-ranking (local or global)
  • global: global descriptors used for the initial ranking and, for global-type methods, for re-ranking (adapt: linearly adapted representations)
  • local: local descriptors used for re-ranking, with their number per image in parentheses
  • top-NN: number of top nearest neighbors used for re-ranking
  • 100M: mAP@1k on full ILIAS
  • oracle: mAP@1k with oracle re-ranking of the top-1k

name year type global local top-NN 100M oracle
AMES + SigLIP (adapt) 2024 local SigLIP-L@384 (adapt) AMES-bin-dist (600) 10k 38.9 56.0
AMES + SigLIP2 (adapt) 2024 local SigLIP2-L@512 (adapt) AMES-bin-dist (100) 1k 38.4 62.7
AMES + SigLIP (adapt) 2024 local SigLIP-L@384 (adapt) AMES-bin-dist (100) 10k 36.7 56.0
AMES + SigLIP (adapt) 2024 local SigLIP-L@384 (adapt) AMES-bin-dist (100) 1k 35.6 56.0
AMES + DINOv2 (adapt) 2024 local DINOv2-L (adapt) AMES-bin-dist (100) 1k 21.8 34.0
AMES + OpenCLIP (adapt) 2024 local OpenCLIP-CN-L@320 (adapt) AMES-bin-dist (100) 1k 27.1 48.0
AMES + SigLIP 2024 local SigLIP-L@384 AMES-bin-dist (100) 1k 26.4 48.7
SP + SigLIP (adapt) 2007 local SigLIP-L@384 (adapt) DINOv2-B-reg + ITQ (100) 1k 30.5 56.0
SP + SigLIP 2007 local SigLIP-L@384 DINOv2-B-reg + ITQ (100) 1k 21.8 56.0
CS + SigLIP (adapt) 2014 local SigLIP-L@384 (adapt) DINOv2-B-reg + ITQ (100) 1k 32.5 56.0
CS + SigLIP 2014 local SigLIP-L@384 DINOv2-B-reg + ITQ (100) 1k 22.9 48.7
αQE1 + SigLIP (adapt) 2019 global SigLIP-L@384 (adapt) -- full 33.7 56.9
αQE2 + SigLIP (adapt) 2019 global SigLIP-L@384 (adapt) -- full 31.5 54.4
αQE5 + SigLIP (adapt) 2019 global SigLIP-L@384 (adapt) -- full 23.5 49.3
αQE1 + SigLIP 2019 global SigLIP-L@384 -- full 22.1 44.7
αQE2 + SigLIP 2019 global SigLIP-L@384 -- full 20.4 40.8
αQE5 + SigLIP 2019 global SigLIP-L@384 -- full 14.3 34.9

Performance on text-to-image retrieval

Evaluation is based on cosine similarity between the text query and the global representations of database images, extracted with the textual and visual encoders of vision-language models (VLMs).
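A minimal sketch of this protocol using open_clip is given below; the checkpoint, query text, and image paths are placeholder assumptions.

```python
# Minimal sketch of text-to-image retrieval with a CLIP-style model via
# open_clip; checkpoint, query, and file names are placeholders.
import open_clip
import torch
import torch.nn.functional as F
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="laion2b_s32b_b82k")
tokenizer = open_clip.get_tokenizer("ViT-L-14")
model.eval()

@torch.no_grad()
def rank(text: str, image_paths: list[str]):
    t = F.normalize(model.encode_text(tokenizer([text])), dim=-1)
    imgs = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in image_paths])
    v = F.normalize(model.encode_image(imgs), dim=-1)
    return (t @ v.T).argsort(dim=1, descending=True)   # ranked database indices

order = rank("a red ceramic teapot with a bamboo handle", ["db_0.jpg", "db_1.jpg"])
```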

The columns are as in the image-to-image retrieval table above, except that there is no train column and repo is one of timm (pytorch-image-models library), hf (huggingface library), or oc (open-clip library).

name year repo arch dims dataset data-size train-res test-res 5M 100M
OAI-CLIP-R50 2021 oc R50 1024 openai 400M 224 384 2.3 1.5
OAI-CLIP-B 2021 timm+oc ViT-B 512 openai 400M 224 384 2.7 1.6
OAI-CLIP-L 2021 timm+oc ViT-L 768 openai 400M 224 384 6.7 4.6
OAI-CLIP-L@336 2021 timm+oc ViT-L 768 openai 400M 336 512 8.4 5.8
OpenCLIP-ViT-L 2022 timm+oc ViT-L 768 laion2b 2B 224 384 9.4 7.0
OpenCLIP-CN-B 2022 timm+oc CN-B 640 laion2b 2B 256 384 7.0 4.6
OpenCLIP-CN-L@320 2022 timm+oc CN-L 768 laion2b 2B 320 512 11.5 8.1
EVA-CLIP-B 2023 timm+oc ViT-B 512 merged2b 2B 224 384 4.4 2.5
EVA-CLIP-L 2023 timm+oc ViT-L 768 merged2b 2B 336 512 10.6 7.2
SigLIP-B 2023 timm+hf ViT-B 768 webli 10B 224 384 10.1 7.1
SigLIP-B@256 2023 timm+hf ViT-B 768 webli 10B 256 384 10.3 7.5
SigLIP-B@384 2023 timm+hf ViT-B 768 webli 10B 384 512 14.4 11.0
SigLIP-B@512 2023 timm+hf ViT-B 768 webli 10B 512 724 14.6 11.1
SigLIP-L@256 2023 timm+hf ViT-L 1024 webli 10B 256 384 16.4 12.8
SigLIP-L@384 2023 timm+hf ViT-L 1024 webli 10B 384 512 22.2 18.1
MetaCLIP-B 2024 timm+oc ViT-B 768 2pt5b 2.5B 224 384 7.6 4.9
MetaCLIP-L 2024 timm+oc ViT-L 1024 2pt5b 2.5B 224 384 13.1 9.2
SigLIP2-B@384 2025 timm+hf ViT-B 768 webli 10B 384 512 15.1 11.1
SigLIP2-B@512 2025 timm+hf ViT-B 768 webli 10B 512 724 14.6 10.4
SigLIP2-L@384 2025 timm+hf ViT-L 1024 webli 10B 384 512 23.7 18.6
SigLIP2-L@512 2025 timm+hf ViT-L 1024 webli 10B 512 724 24.7 19.8
PE-B 2025 timm+oc ViT-B 1024 meta 2.3B 224 384 7.9 5.5
PE-L@336 2025 timm+oc ViT-L 1024 meta 2.3B 336 512 19.5 14.6

Explore the collected data for your instance-level research!

Browse ILIAS


Citation

If you find our project useful, please consider citing us:

@inproceedings{ilias2025,
title={{ILIAS}: Instance-Level Image retrieval At Scale},
author={Kordopatis-Zilos, Giorgos and Stojnić, Vladan and Manko, Anna and Šuma, Pavel and Ypsilantis, Nikolaos-Antonios and Efthymiadis, Nikos and Laskar, Zakaria and Matas, Jiří and Chum, Ondřej and Tolias, Giorgos},
booktitle={Computer Vision and Pattern Recognition (CVPR)},
year={2025},
}

Results

Submit your results here:

If you have any further questions, please don't hesitate to reach out to kordogeo@fel.cvut.cz