ILIAS: Instance-Level Image retrieval At Scale
Giorgos Kordopatis-Zilos Vladan Stojnić Anna Manko Pavel Šuma Nikolaos-Antonios Ypsilantis Nikos Efthymiadis Zakaria Laskar Jiří Matas Ondřej Chum Giorgos Tolias
Visual Recognition Group, Faculty of Electrical Engineering, Czech Technical University in Prague
Dataset intro
ILIAS is a large-scale test dataset for evaluation on Instance-Level Image retrieval At Scale. It is designed to support future research in image-to-image and text-to-image retrieval for particular objects and serves as a benchmark for evaluating representations of foundation or customized vision and vision-language models, as well as specialized retrieval techniques.
The dataset includes 1,000 object instances across diverse domains, with:
- 1,232 image queries, depicting the query objects on clean or uniform backgrounds.
- 4,715 positive images, featuring the query objects in real-world conditions with clutter, occlusions, scale variations, and partial views.
- 1,000 text queries, providing fine-grained textual descriptions of the query objects.
- 100M distractors from YFCC100M to evaluate retrieval performance in large-scale settings, while guaranteeing noise-free ground truth.
- All 5,947 images -- 1,232 query images and 4,715 positives -- and the 1,000 text queries were collected manually, one text query per object instance.
- Retrieval in large-scale settings is evaluated using 100M distractors from the YFCC100M dataset.
- Noise-free ground truth is guaranteed by collecting only objects that were not publicly available before 2014, the YFCC100M compilation date.
- Foundation and legacy models are evaluated, in combination with linear adaptation and various re-ranking techniques.
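All benchmark tables below report mAP@1k, the mean average precision over the top-1k ranked images. As a reference for the metric, here is a minimal sketch of its computation over binary relevance lists, assuming the common definition in which average precision is truncated at rank k and normalized by the number of positives (capped at k); the function names are illustrative, not the benchmark's evaluation code.

```python
import numpy as np

def average_precision_at_k(ranked_relevance, num_positives, k=1000):
    """AP@k for one query: `ranked_relevance` is a binary list over the
    ranked database (1 = positive), `num_positives` is the total number
    of positives for this query in the database."""
    rel = np.asarray(ranked_relevance[:k], dtype=float)
    if num_positives == 0:
        return 0.0
    hits = np.cumsum(rel)                           # running count of positives
    precision_at_i = hits / (np.arange(len(rel)) + 1)
    # average precision over retrieved positives, normalized by min(P, k)
    return float((precision_at_i * rel).sum() / min(num_positives, k))

def map_at_k(all_relevance, all_num_positives, k=1000):
    """Mean AP@k over all queries."""
    return float(np.mean([average_precision_at_k(r, n, k)
                          for r, n in zip(all_relevance, all_num_positives)]))
```

For example, a query whose two positives land at ranks 1 and 3 scores AP@1k = (1/1 + 2/3) / 2 ≈ 0.83.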
Benchmark
Performance on image-to-image retrieval. Evaluation is based on cosine similarity between global image representations extracted from frozen backbones.
Columns:
- name: name of the model
- year: year of the model's release
- repo: source repository for model weights and code (timm: pytorch-image-models library; torchvision: torchvision library; github: official GitHub repository)
- arch: architecture of the model (ViT-S|B|L: small, base, or large Vision Transformer; CN-B|L: base or large ConvNeXt; R50|R101: ResNet-50 or ResNet-101; CNN: other convolutional neural network)
- train: training scheme used for model learning (sup: supervised learning; ssl: self-supervised learning; dist: distillation; vla: vision-language alignment)
- dims: dimensionality of descriptors
- dataset: dataset used to train the model
- data size: size of the training dataset
- train res: size of the images used during training
- test res: size of the largest image side used during testing
- 5M: mAP@1k on mini-ILIAS
- 100M: mAP@1k on ILIAS

name | year | repo | arch | train | dims | dataset | data size | train res | test res | 5M | 100M |
---|---|---|---|---|---|---|---|---|---|---|---|
AlexNet | 2012 | torchvision | CNN | sup | 256 | in1k | 1M | 224 | 384 | 2.0 | 1.5 |
VGG16 | 2015 | torchvision | CNN | sup | 512 | in1k | 1M | 224 | 384 | 3.0 | 2.3 |
ResNet50 | 2016 | torchvision | R50 | sup | 2048 | in1k | 1M | 224 | 384 | 2.3 | 1.7 |
ResNet101 | 2016 | torchvision | R101 | sup | 2048 | in1k | 1M | 224 | 384 | 2.7 | 1.9 |
DenseNet169 | 2016 | torchvision | CNN | sup | 2048 | in1k | 1M | 224 | 384 | 3.2 | 2.4 |
Inception-v4 | 2017 | torchvision | CNN | sup | 1536 | in1k | 1M | 299 | 512 | 1.7 | 1.1 |
NASNet | 2018 | torchvision | CNN | sup | 4032 | in1k | 1M | 331 | 512 | 1.7 | 1.0 |
EffNet | 2019 | timm | CNN | sup+dist | 1792 | in1k | 1M | 380 | 512 | 3.8 | 2.6 |
SWAV | 2020 | github | R50 | ssl | 2048 | in1k | 1M | 224 | 384 | 2.2 | 1.7 |
ViT-B | 2021 | timm | ViT-B | sup | 768 | in1k | 1M | 224 | 384 | 1.4 | 1.0 |
ViT-B-in22k | 2021 | timm | ViT-B | sup | 768 | in21k | 14M | 224 | 384 | 4.2 | 3.0 |
ViT-L-in22k | 2021 | timm | ViT-L | sup | 1024 | in21k | 14M | 224 | 384 | 6.0 | 4.6 |
ViT-L | 2021 | timm | ViT-L | sup | 1024 | in1k | 1M | 224 | 384 | 5.1 | 3.6 |
ViT-L@384 | 2021 | timm | ViT-L | sup | 1024 | in1k | 1M | 384 | 512 | 7.2 | 5.3 |
OAI-CLIP-R50 | 2021 | github | R50 | vla | 1024 | openai | 400M | 224 | 384 | 4.4 | 3.2 |
OAI-CLIP-B | 2021 | timm | ViT-B | vla | 512 | openai | 400M | 224 | 384 | 5.9 | 4.2 |
OAI-CLIP-L | 2021 | timm | ViT-L | vla | 768 | openai | 400M | 224 | 384 | 9.0 | 7.0 |
OAI-CLIP-L@336 | 2021 | timm | ViT-L | vla | 768 | openai | 400M | 336 | 512 | 12.1 | 9.4 |
DINO-R50 | 2021 | github | R50 | ssl | 2048 | in1k | 1M | 224 | 384 | 3.8 | 2.9 |
DINO-ViT-B | 2021 | github | ViT-B | ssl | 768 | in1k | 1M | 224 | 384 | 5.0 | 3.7 |
MoCov3-R50 | 2021 | github | R50 | ssl | 2048 | in1k | 1M | 224 | 384 | 3.3 | 2.6 |
MoCov3-ViT-B | 2021 | github | ViT-B | ssl | 768 | in1k | 1M | 224 | 384 | 2.5 | 1.9 |
OpenCLIP-ViT-L | 2022 | timm | ViT-L | vla | 768 | laion2b | 2B | 224 | 384 | 11.8 | 9.4 |
ConvNext-B | 2022 | timm | CN-B | sup | 1024 | in1k | 1M | 288 | 384 | 2.8 | 2.0 |
ConvNext-B-in22k | 2022 | timm | CN-B | sup | 1536 | in22k | 14M | 224 | 384 | 3.2 | 6.4 |
ConvNext-L | 2022 | timm | CN-L | sup | 1024 | in1k | 1M | 288 | 384 | 8.9 | 2.2 |
ConvNext-L-in22k | 2022 | timm | CN-L | sup | 1536 | in22k | 14M | 288 | 384 | 8.6 | 6.6 |
OpenCLIP-CN-B | 2022 | timm | CN-B | vla | 640 | laion2b | 2B | 256 | 384 | 10.7 | 7.9 |
OpenCLIP-CN-L@320 | 2022 | timm | CN-L | vla | 768 | laion2b | 2B | 320 | 512 | 12.7 | 9.6 |
Recall@k-R50-SOP | 2022 | github | R50 | sup | 512 | sop | 60k | 224 | 384 | 2.3 | 1.6 |
Recall@k-ViT-B-SOP | 2022 | github | ViT-B | sup | 512 | sop | 60k | 224 | 384 | 6.8 | 5.0 |
CVNet-R50 | 2022 | github | R50 | sup | 2048 | gldv2 | 1M | 512 | 724 | 3.7 | 2.9 |
CVNet-R101 | 2022 | github | R101 | sup | 2048 | gldv2 | 1M | 512 | 724 | 3.9 | 3.0 |
DeiT3-B | 2022 | timm | ViT-B | sup+dist | 768 | in1k | 1M | 224 | 384 | 1.9 | 1.2 |
DeiT3-L | 2022 | timm | ViT-L | sup+dist | 1024 | in1k | 1M | 224 | 384 | 2.0 | 1.5 |
EVA-MIM-B | 2023 | timm | ViT-B | ssl | 768 | in22k | 14M | 224 | 384 | 3.1 | 2.1 |
EVA-MIM-L | 2023 | timm | ViT-L | ssl | 1024 | in22k | 14M | 224 | 384 | 2.5 | 1.5 |
EVA-MIM-L | 2023 | timm | ViT-L | ssl | 1024 | merged38m | 38M | 224 | 384 | 6.7 | 4.7 |
EVA-CLIP-B | 2023 | timm | ViT-B | vla | 512 | merged2b | 2B | 224 | 384 | 7.8 | 5.9 |
EVA-CLIP-L | 2023 | timm | ViT-L | vla | 768 | merged2b | 2B | 336 | 512 | 13.6 | 10.9 |
HIER-ViT-S-SOP | 2023 | github | ViT-S | sup | 384 | sop | 60k | 224 | 384 | 4.6 | 3.3 |
Unicom-B | 2023 | github | ViT-B | dist | 768 | laion400m | 400M | 224 | 384 | 13.8 | 11.0 |
Unicom-L | 2023 | github | ViT-L | dist | 768 | laion400m | 400M | 224 | 384 | 18.0 | 13.8 |
Unicom-L@336 | 2023 | github | ViT-L | dist | 768 | laion400m | 400M | 336 | 512 | 17.8 | 13.9 |
Unicom-B-GLDv2 | 2023 | github | ViT-B | sup | 768 | gldv2 | 400M | 512 | 724 | 3.7 | 3.0 |
Unicom-B-SOP | 2023 | github | ViT-B | sup | 768 | sop | 400M | 224 | 384 | 12.2 | 9.1 |
SG-R50 | 2023 | github | R50 | sup | 2048 | gldv2 | 1M | 512 | 724 | 4.3 | 3.4 |
SG-R101 | 2023 | github | R101 | sup | 2048 | gldv2 | 1M | 512 | 724 | 4.5 | 3.4 |
USCRR-CLIP | 2023 | github | ViT-B | sup | 768 | uned | 2.8M | 224 | 384 | 5.7 | 3.8 |
SigLIP-B | 2023 | timm | ViT-B | vla | 768 | webli | 10B | 224 | 384 | 14.1 | 11.2 |
SigLIP-B@256 | 2023 | timm | ViT-B | vla | 768 | webli | 10B | 256 | 384 | 14.6 | 11.5 |
SigLIP-B@384 | 2023 | timm | ViT-B | vla | 768 | webli | 10B | 384 | 512 | 19.3 | 15.6 |
SigLIP-B@512 | 2023 | timm | ViT-B | vla | 768 | webli | 10B | 512 | 724 | 20.1 | 16.6 |
SigLIP-L@256 | 2023 | timm | ViT-L | vla | 1024 | webli | 10B | 256 | 384 | 18.8 | 15.2 |
SigLIP-L@384 | 2023 | timm | ViT-L | vla | 1024 | webli | 10B | 384 | 512 | 24.2 | 19.6 |
DINOv2-B | 2024 | github | ViT-B | ssl | 768 | lvd142m | 142M | 518 | 724 | 14.3 | 11.5 |
DINOv2-L | 2024 | github | ViT-L | ssl | 1024 | lvd142m | 142M | 518 | 724 | 18.5 | 15.3 |
MetaCLIP-B | 2024 | timm | ViT-B | vla | 768 | 2pt5b | 2.5B | 224 | 384 | 8.8 | 6.6 |
MetaCLIP-L | 2024 | timm | ViT-L | vla | 1024 | 2pt5b | 2.5B | 224 | 384 | 14.4 | 11.7 |
DINOv2-B-reg | 2024 | github | ViT-B | ssl | 768 | lvd142m | 142M | 518 | 724 | 11.8 | 9.4 |
DINOv2-L-reg | 2024 | github | ViT-L | ssl | 1024 | lvd142m | 142M | 518 | 724 | 15.9 | 12.7 |
UNIC-L | 2024 | github | ViT-L | dist | 1024 | in1k | 1M | 518 | 512 | 11.4 | 8.9 |
UDON-ViT-B | 2024 | github | ViT-B | sup | 768 | uned | 2.8M | 224 | 384 | 7.5 | 5.5 |
UDON-CLIP | 2024 | github | ViT-B | sup | 768 | uned | 2.8M | 224 | 384 | 8.3 | 5.9 |
SigLIP2-B@384 | 2025 | timm | ViT-B | vla | 768 | webli | 10B | 384 | 512 | 18.4 | 15.0 |
SigLIP2-B@512 | 2025 | timm | ViT-B | vla | 768 | webli | 10B | 512 | 724 | 18.6 | 15.4 |
SigLIP2-L@384 | 2025 | timm | ViT-L | vla | 1024 | webli | 10B | 384 | 512 | 24.6 | 19.9 |
SigLIP2-L@512 | 2025 | timm | ViT-L | vla | 1024 | webli | 10B | 512 | 724 | 25.3 | 20.8 |
PE-B | 2025 | timm | ViT-B | vla | 1024 | meta | 2.3B | 224 | 384 | 20.2 | 9.7 |
PE-L@336 | 2025 | timm | ViT-L | vla | 1024 | meta | 2.3B | 336 | 512 | 27.1 | 22.0 |
DINOv3-B | 2025 | github | ViT-B | ssl | 768 | lvd1689m | 1.7B | 768 | 768 | 26.4 | 22.0 |
DINOv3-L | 2025 | github | ViT-L | ssl | 1024 | lvd1689m | 1.7B | 768 | 768 | 31.1 | 26.5 |
Franca-L | 2025 | github | ViT-L | ssl | 1024 | laion600m | 600M | 224 | 384 | 9.7 | 7.6 |
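The protocol above ranks the database by cosine similarity of global descriptors via exhaustive search. A minimal NumPy sketch of that ranking step, assuming descriptors have already been extracted from a frozen backbone (e.g. via timm); at 100M scale the search is done in batches or with a vector index rather than a single matrix product.

```python
import numpy as np

def rank_by_cosine(query_desc, db_desc, topk=1000):
    """Rank database images for each query by cosine similarity.
    `query_desc` (Q x D) and `db_desc` (N x D) are global descriptors,
    e.g. CLS tokens or pooled features from a frozen backbone."""
    q = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
    x = db_desc / np.linalg.norm(db_desc, axis=1, keepdims=True)
    sims = q @ x.T                                  # cosine similarity matrix
    order = np.argsort(-sims, axis=1)[:, :topk]     # descending similarity
    return order, np.take_along_axis(sims, order, axis=1)
```

The returned ranking is what mAP@1k is computed on, per query.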
Performance on image-to-image retrieval with linear adaptation. Evaluation is based on cosine similarity between adapted image representations. Adaptation is performed via a linear layer learned on top of frozen backbones using supervised multi-domain learning on UnED.
Columns as in the image-to-image retrieval table above.

name | year | repo | arch | train | dims | dataset | data size | train res | test res | 5M | 100M |
---|---|---|---|---|---|---|---|---|---|---|---|
AlexNet | 2012 | torchvision | CNN | sup | 256 | in1k | 1M | 224 | 384 | 1.9 | 1.3 |
VGG16 | 2015 | torchvision | CNN | sup | 512 | in1k | 1M | 224 | 384 | 2.3 | 1.6 |
ResNet50 | 2016 | torchvision | R50 | sup | 2048 | in1k | 1M | 224 | 384 | 2.5 | 1.8 |
ResNet101 | 2016 | torchvision | R101 | sup | 2048 | in1k | 1M | 224 | 384 | 2.7 | 1.8 |
DenseNet169 | 2016 | torchvision | CNN | sup | 2048 | in1k | 1M | 224 | 384 | 2.9 | 2.0 |
Inception-v4 | 2017 | torchvision | CNN | sup | 1536 | in1k | 1M | 299 | 512 | 1.5 | 1.0 |
NASNet | 2018 | torchvision | CNN | sup | 4032 | in1k | 1M | 331 | 512 | 1.6 | 1.0 |
EffNet | 2019 | timm | CNN | sup+dist | 1792 | in1k | 1M | 380 | 512 | 4.3 | 2.9 |
SWAV | 2020 | github | R50 | ssl | 2048 | in1k | 1M | 224 | 384 | 2.9 | 2.1 |
ViT-B | 2021 | timm | ViT-B | sup | 768 | in1k | 1M | 224 | 384 | 1.9 | 1.3 |
ViT-B-in22k | 2021 | timm | ViT-B | sup | 768 | in21k | 14M | 224 | 384 | 6.2 | 4.4 |
ViT-L-in22k | 2021 | timm | ViT-L | sup | 1024 | in21k | 14M | 224 | 384 | 7.3 | 5.3 |
ViT-L | 2021 | timm | ViT-L | sup | 1024 | in1k | 1M | 224 | 384 | 6.6 | 4.7 |
ViT-L@384 | 2021 | timm | ViT-L | sup | 1024 | in1k | 1M | 384 | 512 | 8.7 | 6.4 |
OAI-CLIP-R50 | 2021 | github | R50 | vla | 1024 | openai | 400M | 224 | 384 | 8.5 | 6.0 |
OAI-CLIP-B | 2021 | timm | ViT-B | vla | 512 | openai | 400M | 224 | 384 | 10.7 | 7.9 |
OAI-CLIP-L | 2021 | timm | ViT-L | vla | 768 | openai | 400M | 224 | 384 | 15.8 | 11.9 |
OAI-CLIP-L@336 | 2021 | timm | ViT-L | vla | 768 | openai | 400M | 336 | 512 | 19.9 | 15.2 |
DINO-R50 | 2021 | github | R50 | ssl | 2048 | in1k | 1M | 224 | 384 | 4.1 | 2.9 |
DINO-ViT-B | 2021 | github | ViT-B | ssl | 768 | in1k | 1M | 224 | 384 | 6.6 | 4.8 |
MoCov3-R50 | 2021 | github | R50 | ssl | 2048 | in1k | 1M | 224 | 384 | 3.4 | 2.6 |
MoCov3-ViT-B | 2021 | github | ViT-B | ssl | 768 | in1k | 1M | 224 | 384 | 3.2 | 2.3 |
OpenCLIP-ViT-L | 2022 | timm | ViT-L | vla | 768 | laion2b | 2B | 224 | 384 | 17.5 | 13.7 |
ConvNext-B | 2022 | timm | CN-B | sup | 1024 | in1k | 1M | 288 | 384 | 3.9 | 2.7 |
ConvNext-B-in22k | 2022 | timm | CN-B | sup | 1536 | in22k | 14M | 224 | 384 | 9.9 | 7.6 |
ConvNext-L | 2022 | timm | CN-L | sup | 1024 | in1k | 1M | 288 | 384 | 4.2 | 2.9 |
ConvNext-L-in22k | 2022 | timm | CN-L | sup | 1536 | in22k | 14M | 288 | 384 | 9.1 | 6.9 |
OpenCLIP-CN-B | 2022 | timm | CN-B | vla | 640 | laion2b | 2B | 256 | 384 | 18.1 | 14.0 |
OpenCLIP-CN-L@320 | 2022 | timm | CN-L | vla | 768 | laion2b | 2B | 320 | 512 | 22.9 | 18.3 |
Recall@k-R50-SOP | 2022 | github | R50 | sup | 512 | sop | 60k | 224 | 384 | 3.1 | 2.1 |
Recall@k-ViT-B-SOP | 2022 | github | ViT-B | sup | 512 | sop | 60k | 224 | 384 | 7.3 | 5.3 |
CVNet-R50 | 2022 | github | R50 | sup | 2048 | gldv2 | 1M | 512 | 724 | 3.5 | 2.6 |
CVNet-R101 | 2022 | github | R101 | sup | 2048 | gldv2 | 1M | 512 | 724 | 4.2 | 3.1 |
DeiT3-B | 2022 | timm | ViT-B | sup+dist | 768 | in1k | 1M | 224 | 384 | 2.7 | 1.8 |
DeiT3-L | 2022 | timm | ViT-L | sup+dist | 1024 | in1k | 1M | 224 | 384 | 3.3 | 2.4 |
EVA-MIM-B | 2023 | timm | ViT-B | ssl | 768 | in22k | 14M | 224 | 384 | 4.7 | 3.2 |
EVA-MIM-L | 2023 | timm | ViT-L | ssl | 1024 | in22k | 14M | 224 | 384 | 3.9 | 2.7 |
EVA-MIM-L | 2023 | timm | ViT-L | ssl | 1024 | merged38m | 38M | 224 | 384 | 8.8 | 6.1 |
EVA-CLIP-B | 2023 | timm | ViT-B | vla | 512 | merged2b | 2B | 224 | 384 | 11.7 | 8.7 |
EVA-CLIP-L | 2023 | timm | ViT-L | vla | 768 | merged2b | 2B | 336 | 512 | 20.9 | 16.0 |
HIER-ViT-S-SOP | 2023 | github | ViT-S | sup | 384 | sop | 60k | 224 | 384 | 5.1 | 3.6 |
Unicom-B | 2023 | github | ViT-B | dist | 768 | laion400m | 400M | 224 | 384 | 13.8 | 11.1 |
Unicom-L | 2023 | github | ViT-L | dist | 768 | laion400m | 400M | 224 | 384 | 17.7 | 13.8 |
Unicom-L@336 | 2023 | github | ViT-L | dist | 768 | laion400m | 400M | 336 | 512 | 18.6 | 14.6 |
Unicom-B-GLDv2 | 2023 | github | ViT-B | sup | 768 | gldv2 | 400M | 512 | 724 | 4.1 | 3.3 |
Unicom-B-SOP | 2023 | github | ViT-B | sup | 768 | sop | 400M | 224 | 384 | 12.8 | 9.9 |
SG-R50 | 2023 | github | R50 | sup | 2048 | gldv2 | 1M | 512 | 724 | 3.8 | 2.8 |
SG-R101 | 2023 | github | R101 | sup | 2048 | gldv2 | 1M | 512 | 724 | 4.5 | 3.2 |
USCRR-CLIP | 2023 | github | ViT-B | sup | 768 | uned | 2.8M | 224 | 384 | 6.4 | 4.3 |
SigLIP-B | 2023 | timm | ViT-B | vla | 768 | webli | 10B | 224 | 384 | 19.4 | 15.7 |
SigLIP-B@256 | 2023 | timm | ViT-B | vla | 768 | webli | 10B | 256 | 384 | 20.6 | 16.7 |
SigLIP-B@384 | 2023 | timm | ViT-B | vla | 768 | webli | 10B | 384 | 512 | 26.2 | 21.5 |
SigLIP-B@512 | 2023 | timm | ViT-B | vla | 768 | webli | 10B | 512 | 724 | 27.5 | 23.0 |
SigLIP-L@256 | 2023 | timm | ViT-L | vla | 1024 | webli | 10B | 256 | 384 | 26.3 | 21.8 |
SigLIP-L@384 | 2023 | timm | ViT-L | vla | 1024 | webli | 10B | 384 | 512 | 34.3 | 28.9 |
DINOv2-B | 2024 | github | ViT-B | ssl | 768 | lvd142m | 142M | 518 | 724 | 15.0 | 12.1 |
DINOv2-L | 2024 | github | ViT-L | ssl | 1024 | lvd142m | 142M | 518 | 724 | 18.8 | 15.3 |
MetaCLIP-B | 2024 | timm | ViT-B | vla | 768 | 2pt5b | 2.5B | 224 | 384 | 12.7 | 9.4 |
MetaCLIP-L | 2024 | timm | ViT-L | vla | 1024 | 2pt5b | 2.5B | 224 | 384 | 21.7 | 16.9 |
DINOv2-B-reg | 2024 | github | ViT-B | ssl | 768 | lvd142m | 142M | 518 | 724 | 13.5 | 10.7 |
DINOv2-L-reg | 2024 | github | ViT-L | ssl | 1024 | lvd142m | 142M | 518 | 724 | 17.1 | 13.6 |
UNIC-L | 2024 | github | ViT-L | dist | 1024 | in1k | 1M | 518 | 512 | 15.3 | 11.7 |
UDON-ViT-B | 2024 | github | ViT-B | sup | 768 | uned | 2.8M | 224 | 384 | 7.3 | 5.3 |
UDON-CLIP | 2024 | github | ViT-B | sup | 768 | uned | 2.8M | 224 | 384 | 9.2 | 6.7 |
SigLIP2-B@384 | 2025 | timm | ViT-B | vla | 768 | webli | 10B | 384 | 512 | 27.5 | 22.6 |
SigLIP2-B@512 | 2025 | timm | ViT-B | vla | 768 | webli | 10B | 512 | 724 | 28.6 | 23.5 |
SigLIP2-L@384 | 2025 | timm | ViT-L | vla | 1024 | webli | 10B | 384 | 512 | 36.3 | 30.3 |
SigLIP2-L@512 | 2025 | timm | ViT-L | vla | 1024 | webli | 10B | 512 | 724 | 37.3 | 31.3 |
PE-B | 2025 | timm | ViT-B | vla | 1024 | meta | 2.3B | 224 | 384 | 20.2 | 16.1 |
PE-L@336 | 2025 | timm | ViT-L | vla | 1024 | meta | 2.3B | 336 | 512 | 39.6 | 33.4 |
DINOv3-B | 2025 | github | ViT-B | ssl | 768 | lvd1689m | 1.7B | 768 | 768 | 26.4 | 22.5 |
DINOv3-L | 2025 | github | ViT-L | ssl | 1024 | lvd1689m | 1.7B | 768 | 768 | 32.9 | 28.3 |
Franca-L | 2025 | github | ViT-L | ssl | 1024 | laion600m | 600M | 224 | 384 | 12.0 | 9.0 |
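Linear adaptation trains only a single linear layer on top of frozen descriptors. A rough PyTorch sketch under simplified assumptions: a plain cross-entropy classifier head stands in for the supervised multi-domain learning on UnED used in the benchmark, and `train_linear_adaptation` is a hypothetical helper, not the paper's training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_linear_adaptation(feats, labels, out_dim=512, epochs=10, lr=1e-3):
    """Learn a linear projection on top of frozen descriptors.
    `feats` (N x D) are precomputed global descriptors; `labels` are
    instance/class ids (a stand-in for the UnED supervision)."""
    n_classes = int(labels.max()) + 1
    proj = nn.Linear(feats.shape[1], out_dim)   # the adaptation layer
    head = nn.Linear(out_dim, n_classes)        # discarded after training
    opt = torch.optim.Adam([*proj.parameters(), *head.parameters()], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        z = F.normalize(proj(feats), dim=1)     # adapted, L2-normalized
        loss = F.cross_entropy(head(z), labels)
        loss.backward()
        opt.step()
    return proj  # apply to queries and database, then rank by cosine similarity
```

Only `proj` is kept at test time; the backbone stays frozen throughout.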
Performance on image-to-image retrieval with re-ranking. An initial ranking is obtained with global image representations via exhaustive search, and image similarities are then refined with methods relying on local or refined global descriptors. Evaluation is based on the refined similarities.
Columns:
- name: name of the combination (adapt: linearly adapted representations)
- year: year of the re-ranking method's publication
- type: type of re-ranking
- global: global descriptors used for the initial ranking (adapt: linearly adapted representations)
- local: local descriptors used for re-ranking, with their number in parentheses
- top-NN: top nearest neighbors used for re-ranking
- 100M: mAP@1k on full ILIAS
- oracle: mAP@1k with oracle re-ranking on the top-1k

name | year | type | global | local | top-NN | 100M | oracle |
---|---|---|---|---|---|---|---|
AMES + SigLIP (adapt) | 2024 | local | SigLIP-L@384 (adapt) | AMES-bin-dist (600) | 10k | 38.9 | 56.0 |
AMES + SigLIP2 (adapt) | 2024 | local | SigLIP2-L@512 (adapt) | AMES-bin-dist (100) | 1k | 38.4 | 62.7 |
AMES + SigLIP (adapt) | 2024 | local | SigLIP-L@384 (adapt) | AMES-bin-dist (100) | 10k | 36.7 | 56.0 |
AMES + SigLIP (adapt) | 2024 | local | SigLIP-L@384 (adapt) | AMES-bin-dist (100) | 1k | 35.6 | 56.0 |
AMES + DINOv2 (adapt) | 2024 | local | DINOv2-L (adapt) | AMES-bin-dist (100) | 1k | 21.8 | 34.0 |
AMES + OpenCLIP (adapt) | 2024 | local | OpenCLIP-CN-L@320 (adapt) | AMES-bin-dist (100) | 1k | 27.1 | 48.0 |
AMES + SigLIP | 2024 | local | SigLIP-L@384 | AMES-bin-dist (100) | 1k | 26.4 | 48.7 |
SP + SigLIP (adapt) | 2007 | local | SigLIP-L@384 (adapt) | DINOv2-B-reg + ITQ (100) | 1k | 30.5 | 56.0 |
SP + SigLIP | 2007 | local | SigLIP-L@384 | DINOv2-B-reg + ITQ (100) | 1k | 21.8 | 56.0 |
CS + SigLIP (adapt) | 2014 | local | SigLIP-L@384 (adapt) | DINOv2-B-reg + ITQ (100) | 1k | 32.5 | 56.0 |
CS + SigLIP | 2014 | local | SigLIP-L@384 | DINOv2-B-reg + ITQ (100) | 1k | 22.9 | 48.7 |
αQE1 + SigLIP (adapt) | 2019 | global | SigLIP-L@384 (adapt) | -- | full | 33.7 | 56.9 |
αQE2 + SigLIP (adapt) | 2019 | global | SigLIP-L@384 (adapt) | -- | full | 31.5 | 54.4 |
αQE5 + SigLIP (adapt) | 2019 | global | SigLIP-L@384 (adapt) | -- | full | 23.5 | 49.3 |
αQE1 + SigLIP | 2019 | global | SigLIP-L@384 | -- | full | 22.1 | 44.7 |
αQE2 + SigLIP | 2019 | global | SigLIP-L@384 | -- | full | 20.4 | 40.8 |
αQE5 + SigLIP | 2019 | global | SigLIP-L@384 | -- | full | 14.3 | 34.9 |
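The αQE1/αQE2/αQE5 rows apply alpha query expansion with 1, 2, or 5 neighbors: each query is re-issued as a weighted average of itself and its top-ranked neighbors, weighted by similarity raised to the power α. A minimal NumPy sketch of the standard technique; the α value and implementation details here are illustrative, not the benchmark's exact configuration.

```python
import numpy as np

def alpha_qe(query_desc, db_desc, n_neighbors=1, alpha=3.0):
    """Alpha query expansion over L2-normalized global descriptors.
    Returns expanded, re-normalized queries to use for a second ranking."""
    q = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
    x = db_desc / np.linalg.norm(db_desc, axis=1, keepdims=True)
    sims = q @ x.T
    nn_idx = np.argsort(-sims, axis=1)[:, :n_neighbors]
    expanded = []
    for i in range(q.shape[0]):
        w = np.clip(sims[i, nn_idx[i]], 0, None) ** alpha   # similarity weights
        new_q = q[i] + (w[:, None] * x[nn_idx[i]]).sum(axis=0)
        expanded.append(new_q / np.linalg.norm(new_q))
    return np.stack(expanded)  # re-rank the full database with these queries
```

The "full" top-NN entries in the table correspond to re-ranking the entire database with the expanded queries.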
Performance on text-to-image retrieval. Evaluation is based on cosine similarity between the text query and database global image representations, extracted using the textual and visual encoders of VLMs.
Columns:
- name: name of the model
- year: year of the model's release
- repo: source repository for model weights and code (timm: pytorch-image-models library; hf: huggingface library; oc: open-clip library)
- arch: architecture of the model (ViT-S|B|L: small, base, or large Vision Transformer; CN-B|L: base or large ConvNeXt; R50|R101: ResNet-50 or ResNet-101; CNN: other convolutional neural network)
- dims: dimensionality of descriptors
- dataset: dataset used to train the model
- data size: size of the training dataset
- train res: size of the images used during training
- test res: size of the largest image side used during testing
- 5M: mAP@1k on mini-ILIAS
- 100M: mAP@1k on ILIAS

name | year | repo | arch | dims | dataset | data size | train res | test res | 5M | 100M |
---|---|---|---|---|---|---|---|---|---|---|
OAI-CLIP-R50 | 2021 | oc | R50 | 1024 | openai | 400M | 224 | 384 | 2.3 | 1.5 |
OAI-CLIP-B | 2021 | timm+oc | ViT-B | 512 | openai | 400M | 224 | 384 | 2.7 | 1.6 |
OAI-CLIP-L | 2021 | timm+oc | ViT-L | 768 | openai | 400M | 224 | 384 | 6.7 | 4.6 |
OAI-CLIP-L@336 | 2021 | timm+oc | ViT-L | 768 | openai | 400M | 336 | 512 | 8.4 | 5.8 |
OpenCLIP-ViT-L | 2022 | timm+oc | ViT-L | 768 | laion2b | 2B | 224 | 384 | 9.4 | 7.0 |
OpenCLIP-CN-B | 2022 | timm+oc | CN-B | 640 | laion2b | 2B | 256 | 384 | 7.0 | 4.6 |
OpenCLIP-CN-L@320 | 2022 | timm+oc | CN-L | 768 | laion2b | 2B | 320 | 512 | 11.5 | 8.1 |
EVA-CLIP-B | 2023 | timm+oc | ViT-B | 512 | merged2b | 2B | 224 | 384 | 4.4 | 2.5 |
EVA-CLIP-L | 2023 | timm+oc | ViT-L | 768 | merged2b | 2B | 336 | 512 | 10.6 | 7.2 |
SigLIP-B | 2023 | timm+hf | ViT-B | 768 | webli | 10B | 224 | 384 | 10.1 | 7.1 |
SigLIP-B@256 | 2023 | timm+hf | ViT-B | 768 | webli | 10B | 256 | 384 | 10.3 | 7.5 |
SigLIP-B@384 | 2023 | timm+hf | ViT-B | 768 | webli | 10B | 384 | 512 | 14.4 | 11.0 |
SigLIP-B@512 | 2023 | timm+hf | ViT-B | 768 | webli | 10B | 512 | 724 | 14.6 | 11.1 |
SigLIP-L@256 | 2023 | timm+hf | ViT-L | 1024 | webli | 10B | 256 | 384 | 16.4 | 12.8 |
SigLIP-L@384 | 2023 | timm+hf | ViT-L | 1024 | webli | 10B | 384 | 512 | 22.2 | 18.1 |
MetaCLIP-B | 2024 | timm+oc | ViT-B | 768 | 2pt5b | 2.5B | 224 | 384 | 7.6 | 4.9 |
MetaCLIP-L | 2024 | timm+oc | ViT-L | 1024 | 2pt5b | 2.5B | 224 | 384 | 13.1 | 9.2 |
SigLIP2-B@384 | 2025 | timm+hf | ViT-B | 768 | webli | 10B | 384 | 512 | 15.1 | 11.1 |
SigLIP2-B@512 | 2025 | timm+hf | ViT-B | 768 | webli | 10B | 512 | 724 | 14.6 | 10.4 |
SigLIP2-L@384 | 2025 | timm+hf | ViT-L | 1024 | webli | 10B | 384 | 512 | 23.7 | 18.6 |
SigLIP2-L@512 | 2025 | timm+hf | ViT-L | 1024 | webli | 10B | 512 | 724 | 24.7 | 19.8 |
PE-B | 2025 | timm+oc | ViT-B | 1024 | meta | 2.3B | 224 | 384 | 7.9 | 5.5 |
PE-L@336 | 2025 | timm+oc | ViT-L | 1024 | meta | 2.3B | 336 | 512 | 19.5 | 14.6 |
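Text-to-image retrieval embeds the text query and the database images in the shared space of a VLM and ranks by cosine similarity. A minimal sketch of the ranking step; the open_clip calls in the docstring illustrate how such embeddings are typically obtained and do not reproduce the exact models of the table.

```python
import numpy as np

def text_to_image_rank(text_emb, image_embs, topk=1000):
    """Rank database images for one text query by cosine similarity.
    The embeddings come from the paired text/visual encoders of a VLM,
    e.g. with open_clip (model tag shown for illustration only):
        model, _, preprocess = open_clip.create_model_and_transforms(
            'ViT-L-14', pretrained='laion2b_s32b_b82k')
        tokenizer = open_clip.get_tokenizer('ViT-L-14')
        text_emb = model.encode_text(tokenizer(['a red vintage teapot']))
    """
    t = text_emb / np.linalg.norm(text_emb)
    x = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = x @ t                       # cosine similarity to each image
    order = np.argsort(-sims)[:topk]   # descending similarity
    return order, sims[order]
```

mAP@1k is then computed on this ranking, with one text query per object instance.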
Explore the collected data for your instance-level research!
Citation
If you find our project useful, please consider citing us:
@inproceedings{ilias2025,
  title={{ILIAS}: Instance-Level Image retrieval At Scale},
  author={Kordopatis-Zilos, Giorgos and Stojnić, Vladan and Manko, Anna and Šuma, Pavel and Ypsilantis, Nikolaos-Antonios and Efthymiadis, Nikos and Laskar, Zakaria and Matas, Jiří and Chum, Ondřej and Tolias, Giorgos},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  year={2025},
}
Results
Submit your results here:
If you have any further questions, please don't hesitate to reach out to kordogeo@fel.cvut.cz