## Analysis of CLIP Embeddings of the Imagenette Validation Split

First, we install the CLIP model and PyTorch so that we can create embeddings from some sample data. We follow the setup instructions from OpenAI's [CLIP](https://github.com/openai/CLIP) repository.

In [None]:
%pip install ftfy packaging regex tqdm torch torchvision
%pip install git+https://github.com/openai/CLIP.git

### Imports

In [None]:
import os

import clip
import numpy as np
import pandas as pd
import torch
import torchvision
from PIL import Image
from tqdm import tqdm

import cobalt

### Data

We use the [Imagenette dataset](https://github.com/fastai/imagenette), a subset of ImageNet made up of 10 easily distinguished classes. In this example, we use the CLIP model to analyse the images in its validation split.

In [None]:
# run the commented-out line if you have not downloaded the dataset previously
# imagenette = torchvision.datasets.Imagenette(root=".", split="val", download=True)
imagenette = torchvision.datasets.Imagenette(root=".", split="val")

### Model
We use the CLIP model, which maps an image or a piece of text to a $512$-dimensional vector. These vectors have the property that two images that look similar to the human eye tend to be a small distance apart in the embedding space.
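
As a rough illustration of what "small distance" means, the sketch below compares toy vectors with cosine similarity, the measure CLIP itself uses when scoring image and text features against each other. The three 4-dimensional vectors are made-up stand-ins for real 512-dimensional embeddings, and the metric Cobalt uses internally is not stated in this notebook, so treat this as an assumption-laden example rather than a description of the pipeline.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two 1-D vectors (1.0 means same direction)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy 4-dimensional stand-ins for CLIP image embeddings (values are invented).
horn_a = np.array([0.9, 0.1, 0.0, 0.2])
horn_b = np.array([0.8, 0.2, 0.1, 0.3])
church = np.array([0.0, 0.9, 0.4, 0.1])

sim_same = cosine_similarity(horn_a, horn_b)  # two "similar" images
sim_diff = cosine_similarity(horn_a, church)  # two "dissimilar" images
```

Here `sim_same` comes out much larger than `sim_diff`, which is the behaviour we hope to see between real embeddings of images from the same class versus different classes.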

In [None]:
device = "cpu"
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"

device

In [None]:
model, preprocess = clip.load("ViT-B/32", device=device)

### Processing

We collect the path of every image in the Imagenette validation split in the variable `paths`, and compute a CLIP embedding for each path. We then stack these embeddings vertically to form an embedding matrix with one row per image. The image paths and the embedding matrix are the inputs Cobalt needs.

In [None]:
image_root = imagenette._image_root

# Create a list of Image Paths.
paths = []
target_indices = []
for c in os.listdir(image_root):
    class_path = os.path.join(image_root, c)
    target_index = imagenette.wnid_to_idx[c]
    for f in os.listdir(class_path):
        paths.append(os.path.join(class_path, f))
        target_indices.append(target_index)
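
The directory walk above relies on every entry under the image root being a class folder. A slightly more defensive variant, sketched here with a hypothetical helper `list_image_paths` (not part of torchvision or Cobalt), sorts the listings so the row order is reproducible and skips stray entries such as `.DS_Store`. It assumes the same one-folder-per-class layout and a `wnid_to_idx`-style mapping like the one used above.

```python
import os


def list_image_paths(image_root, wnid_to_idx):
    """Return (paths, target_indices) for every file under known class folders."""
    paths, target_indices = [], []
    # sorted() makes the row order reproducible across machines.
    for wnid in sorted(os.listdir(image_root)):
        class_path = os.path.join(image_root, wnid)
        # Skip stray files and folders that are not known classes.
        if wnid not in wnid_to_idx or not os.path.isdir(class_path):
            continue
        for fname in sorted(os.listdir(class_path)):
            if fname.startswith("."):  # hidden files, e.g. .DS_Store
                continue
            paths.append(os.path.join(class_path, fname))
            target_indices.append(wnid_to_idx[wnid])
    return paths, target_indices
```

With the objects defined in this notebook, the call would be `paths, target_indices = list_image_paths(image_root, imagenette.wnid_to_idx)`.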

In [None]:
# Iterate through image paths to build an embedding of shape (N x 512).
# This takes about 2 minutes with the PyTorch MPS backend.
embedding = []
for p in tqdm(paths):
    with torch.no_grad():
        image = preprocess(Image.open(p)).unsqueeze(0).to(device)
        image_features = model.encode_image(image)
        embedding.append(image_features)

embedding_np = [element.cpu().numpy() for element in embedding]
embedding_matrix = np.concatenate(embedding_np)
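
Before handing the matrix to other tools, it can be worth checking that it has one row per image path and contains no invalid values. The sketch below uses a hypothetical helper, `stack_embeddings`, on synthetic data. The cast to `float32` is an assumption: CLIP may return half-precision (`float16`) features on GPU backends, and many downstream libraries are more comfortable with `float32`, but this is not a documented Cobalt requirement.

```python
import numpy as np


def stack_embeddings(per_image, n_expected, dim=512):
    """Concatenate (1, dim) arrays into an (N, dim) float32 matrix and validate it."""
    matrix = np.concatenate(per_image).astype(np.float32)
    if matrix.shape != (n_expected, dim):
        raise ValueError(f"expected {(n_expected, dim)}, got {matrix.shape}")
    if not np.isfinite(matrix).all():
        raise ValueError("embedding contains NaN or inf values")
    return matrix


# Synthetic stand-in for the per-image arrays in `embedding_np`.
rng = np.random.default_rng(0)
fake = [rng.standard_normal((1, 512)).astype(np.float16) for _ in range(3)]
checked = stack_embeddings(fake, n_expected=3)
```

In this notebook the equivalent call would be `stack_embeddings(embedding_np, n_expected=len(paths))`.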

We're ready to explore how CLIP groups the Imagenette dataset. Let's load this data (the target label associated with each image, the embeddings, and the raw images) into Cobalt.
- We add the targets so that we can see whether the CLIP embeddings are consistent with the target labels of the dataset.
- We use `add_embedding_array` to add the embedding. Note that every row in your DataFrame needs a corresponding row in the embedding array.
- We take one additional step, `add_media_column`, to pass in a list of paths that Cobalt should interpret as images to display.

In [None]:
index_to_class = [c[0] for c in imagenette.classes]
targets = [index_to_class[y] for y in target_indices]

In [None]:
df = pd.DataFrame({"targets": targets})
df["targets"] = df["targets"].astype("category")
ds = cobalt.CobaltDataset(df)
ds.add_embedding_array(embedding_matrix)
ds.add_media_column(paths, local_root_path=".")
w = cobalt.Workspace(ds)

## UI Overview
Let's open the UI and see how Cobalt helps us understand the embeddings.

In [None]:
w.ui.table_image_size = (160, 160)
w.ui

### Exploring the data in the UI
The different classes seem to be well separated in the graph shown above. You can double-click on nodes of the graph to select them, and open the data table to see the data contained in the selected node(s). 

You can also explore the automatically-generated clusters and see how well they align with the target classes. Each cluster seems to correspond to one class, but some classes are split into multiple clusters. See if you can come up with a hypothesis for why this might be.

## API usage

It is also possible to access the results and algorithms without using the UI.

In [None]:
clusters = w.clustering_results["auto_cluster"]

`subgroups` is a list of all of the clusters our algorithms return. Each cluster is a `CobaltDataSubset`. 

In [None]:
subgroups = [g.subset for g in clusters.groups]

# If you want to visualize the images stored in _image_path, scroll down below to see
# w.view_table in action.
subgroups[0]

You can access a more familiar `pd.DataFrame` version of this object by running:


In [None]:
subgroups[0].df

And you can see the images in it by running:

In [None]:
w.view_table(subgroups[0])

In this case, this group contains a lot of French horns.
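
To put a number on an observation like this, one option is to tabulate the `targets` column of each subgroup's DataFrame. The sketch below uses a hypothetical helper, `summarize_groups`, on toy DataFrames with invented labels. It assumes each group is non-empty and has a `targets` column like the one we created earlier.

```python
import pandas as pd


def summarize_groups(frames, label_col="targets"):
    """For each DataFrame, report its size, most common label, and that label's share."""
    rows = []
    for i, frame in enumerate(frames):
        # value_counts sorts by frequency, so the first entry is the dominant label.
        shares = frame[label_col].value_counts(normalize=True)
        rows.append(
            {
                "group": i,
                "size": len(frame),
                "top_label": shares.index[0],
                "purity": float(shares.iloc[0]),
            }
        )
    return pd.DataFrame(rows)


# Toy stand-ins for [g.df for g in subgroups]; the labels are made up.
toy_frames = [
    pd.DataFrame({"targets": ["French horn"] * 9 + ["church"]}),
    pd.DataFrame({"targets": ["church"] * 4}),
]
summary = summarize_groups(toy_frames)
```

With the real clustering results, `summarize_groups([g.df for g in subgroups])` would give one row per cluster. A low `purity` value would point to clusters that mix several classes, and a label that appears as `top_label` in several rows would indicate a class that was split across clusters.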