No, you embed all the images with CLIP and use an approximate nearest neighbour library (like Faiss) to find the ones most similar to the query in sub-linear (roughly logarithmic) time, instead of comparing the query against every image. The embeddings are also robust to small variations in the image.
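A rough sketch of the idea, in plain Python so it runs without any dependencies. The vectors here are random stand-ins for CLIP embeddings, the names (`img_42`, `cosine_top_k`, etc.) are made up for illustration, and the search is an exact linear scan rather than a real approximate index. In practice a library like Faiss would replace that scan.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_top_k(index, query, k=3):
    """Return the k (name, similarity) pairs closest to the query.

    This is an exact O(n) scan; an ANN library would do this step sub-linearly.
    """
    q = normalize(query)
    scored = [(sum(a * b for a, b in zip(vec, q)), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]

rng = random.Random(0)
DIM = 64  # assumption: toy dimensionality, real CLIP embeddings are larger

# Stand-in "image embeddings": 1000 random unit vectors keyed by a fake image id.
index = {
    f"img_{i}": normalize([rng.gauss(0, 1) for _ in range(DIM)])
    for i in range(1000)
}

# The query is a slightly perturbed copy of img_42, imitating a small
# variation (crop, recompression) of an image that is already indexed.
query = [x + rng.gauss(0, 0.01) for x in index["img_42"]]

results = cosine_top_k(index, query, k=3)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

The perturbed query still lands on `img_42` as the top hit, which is the "small variations" point above: near-duplicates end up close together in embedding space, so you don't need exact matching.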
You can try this on images.yandex.com - they do similarity search with embeddings. Upload any photo and you'll get millions of similar photos, unlike Google, which only does exact-duplicate search. It's diverse like Pinterest but without the logins.
Query image: https://cdn.discordapp.com/attachments/1005626182869467157/1...
Yandex similarity search results: https://yandex.com/images/search?rpt=imageview&url=https%3A%...