MMFM model with IconQA dataset

The IconQA (icon question answering) dataset aims to highlight the importance of abstract diagram understanding and comprehensive cognitive reasoning in real-world diagram word problems. It consists of three sub-tasks: multi-image-choice, multi-text-choice, and filling-in-the-blank. IconQA requires not only perception skills like object recognition and text understanding, but also diverse cognitive reasoning skills, such as geometric reasoning, commonsense reasoning, and arithmetic reasoning.

In this project, we use the multi-text-choice data with a pre-trained Multimodal Foundation Model (MMFM).

Using Tensorleap, we can explore the latent space, detect unlabeled clusters, and identify clusters with high loss. This quick start guide will walk you through the steps to get started with this example repository project.

Population Exploration

Below is a population exploration plot. It is a sample-similarity map based on the model's latent space, built from the features extracted by the trained model.

It visualizes the latent space: each dot represents a sample, and its color and size reflect the sample's loss value. In our case, the latent space is clustered by question type.



Detecting High Loss Clusters

– Using Tensorleap Insight

When we filter the latent space to the higher-loss samples, the first "low performance" insight, which correlates with the question word "how" and other metadata, indicates that the model fails to predict how many marbles are in the image.
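To make the idea behind such an insight concrete, here is a minimal sketch that compares how often each question word appears among high-loss samples versus the whole population. It is not Tensorleap's insight algorithm. The helper name `high_loss_enrichment`, the sample dictionary keys (`question_word`, `loss`), and the quantile cut are all hypothetical choices made for illustration.

```python
from collections import Counter


def high_loss_enrichment(samples, loss_quantile=0.8):
    """Hypothetical helper: over-representation of each question word among high-loss samples."""
    losses = sorted(s["loss"] for s in samples)
    # Simple nearest-rank threshold at the requested loss quantile.
    threshold = losses[int(loss_quantile * (len(losses) - 1))]
    high = [s for s in samples if s["loss"] >= threshold]
    all_counts = Counter(s["question_word"] for s in samples)
    high_counts = Counter(s["question_word"] for s in high)
    # Ratio > 1 means the word is more common among high-loss samples than overall.
    return {
        word: (high_counts[word] / len(high)) / (all_counts[word] / len(samples))
        for word in all_counts
    }
```

On toy data where only the "how" questions have high loss, "how" gets a ratio above 1 and every other word gets 0.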


– Using PE

When examining the population exploration (PE), we notice a group in the upper part of the latent space (marked with a yellow circle) that contains images associated with the same question: "On which color is the spinner less likely to land?"



Further investigation revealed two distinct groups: one with higher loss and one with lower loss. We found that the model consistently chooses ‘white’ as the answer, regardless of the actual conditions in the images.
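A constant-answer bias like this can be checked by looking at the distribution of predicted answers for a single question next to the accuracy. The sketch below uses a hypothetical `answer_bias` helper and assumed record keys (`question`, `pred`, `label`). If one answer takes almost all predictions while the labels vary, accuracy collapses to the share of samples whose label happens to be that answer. That pattern matches the split into a low-loss group and a high-loss group.

```python
from collections import Counter


def answer_bias(records, question):
    """Hypothetical helper: share of each predicted answer for one question, plus accuracy."""
    subset = [r for r in records if r["question"] == question]
    preds = Counter(r["pred"] for r in subset)
    # Booleans sum as 0/1, so this counts correct predictions.
    accuracy = sum(r["pred"] == r["label"] for r in subset) / len(subset)
    shares = {answer: n / len(subset) for answer, n in preds.items()}
    return shares, accuracy
```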


– Using Dashboards

The Tensorleap platform makes it easy to create and use dashboards. Each sample is annotated with one or more skills required to answer the question correctly. Using a dashboard, we found that samples requiring the 'fraction' skill tend to have a higher loss value.
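Because a sample can require several skills, a per-skill loss aggregate must count each sample toward every skill it lists. Here is a minimal sketch of that aggregation. The `mean_loss_per_skill` helper and the `skills`/`loss` keys are hypothetical and do not describe how Tensorleap dashboards compute their values.

```python
from collections import defaultdict


def mean_loss_per_skill(samples):
    """Hypothetical helper: mean loss per skill; multi-skill samples count toward each skill."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for s in samples:
        for skill in s["skills"]:
            totals[skill] += s["loss"]
            counts[skill] += 1
    means = ((skill, totals[skill] / counts[skill]) for skill in totals)
    # Highest mean loss first, so the weakest skills are listed at the top.
    return dict(sorted(means, key=lambda kv: kv[1], reverse=True))
```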


Detecting Unlabeled Clusters in the Latent Space

Now, let’s look for additional clusters in our data using an unsupervised clustering algorithm on the model’s latent space.

Upon examining these clusters, we can see that clusters 6, 13, and 18, located close to each other, contain different questions and images, but they are all related to time and clocks. The proximity of these clusters in the latent space suggests that the model has recognized a higher-level relationship among these concepts, grouping them together because of their shared relevance to the theme of time and clocks.
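The text does not name the clustering algorithm, so the following sketch assumes plain k-means (Lloyd's algorithm) as a stand-in, applied to latent vectors represented as lists of floats. It only shows how unsupervised cluster labels can be derived from embeddings. The `kmeans` function is our own and is not Tensorleap's implementation.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means (Lloyd's algorithm), assumed here as the clustering method."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids
```

In practice, the number of clusters `k` is a tuning choice. Clusters that end up adjacent, like 6, 13, and 18 above, can also be found by measuring distances between the returned centroids.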



Fetching similar samples

Another approach to finding clusters using the model's latent space is fetching samples similar to a selected sample. This lets you identify a cluster with an intrinsic property you want to investigate. By detecting this cluster, you can gain insights into how the model interprets this sample and, more generally, retrieve clusters with more abstract patterns.
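One simple way to fetch similar samples is to rank all other samples by cosine similarity between their latent vectors and the selected sample's vector. This sketch assumes that cosine similarity is the measure. The `most_similar` helper and the id-to-vector dictionary layout are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / norm


def most_similar(query_id, embeddings, top_k=3):
    """Hypothetical helper: ids of the top_k samples closest to query_id in latent space."""
    query = embeddings[query_id]
    scored = [(cosine(query, vec), sid) for sid, vec in embeddings.items() if sid != query_id]
    scored.sort(reverse=True)  # highest similarity first
    return [sid for _, sid in scored[:top_k]]
```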

The figure below shows a cluster of images with the question: "What has been done to this letter?"



Upon analysis, we noticed that the model consistently fails when the ground-truth answer is 'flip'. This suggests that the model has specific difficulty identifying changes that involve flipping a letter.
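A failure tied to one ground-truth answer shows up clearly when accuracy is grouped by label rather than averaged over the whole cluster. Here is a minimal sketch of that grouping. The `accuracy_by_label` helper and the `pred`/`label` keys are hypothetical.

```python
from collections import defaultdict


def accuracy_by_label(records):
    """Hypothetical helper: prediction accuracy grouped by ground-truth answer."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        totals[r["label"]] += 1
        hits[r["label"]] += r["pred"] == r["label"]  # True counts as 1
    return {label: hits[label] / totals[label] for label in totals}
```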


Sample Loss Analysis

This section shows the results of a gradient-based explainability algorithm that helps interpret what drives the model to make specific predictions. It lets us analyze which input features contribute most to the loss. We then generate a heatmap over these features that highlights the relevant information.

Let's analyze the following sample, which contains the question: "Are there enough carrots for every rabbit?". The model correctly predicts the answer "no". The token with the most impact on the model's prediction is 'enough'.
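The specific gradient-based algorithm is not named here. As an illustration of the general idea, the sketch below computes gradient × input token scores for a deliberately tiny, hypothetical model: a logistic classifier on the mean of the token embeddings, trained with binary cross-entropy. This model is simple enough that its gradient can be written analytically. The embeddings and weights in the test are invented toy values and are not taken from the MMFM.

```python
import math


def token_saliency(tokens, embeddings, weights, label):
    """Gradient x input scores for a toy model p = sigmoid(w . mean(token embeddings))."""
    n = len(tokens)
    # Mean-pool the token embeddings into a single feature vector.
    pooled = [sum(embeddings[t][d] for t in tokens) / n for d in range(len(weights))]
    logit = sum(w * x for w, x in zip(weights, pooled))
    p = 1.0 / (1.0 + math.exp(-logit))
    # Binary cross-entropy: dLoss/dlogit = p - label; dlogit/d(token embedding) = w / n.
    grad = [(p - label) * w / n for w in weights]
    # Gradient x input, one score per token position (one row of a heatmap).
    return [(t, abs(sum(g * e for g, e in zip(grad, embeddings[t])))) for t in tokens]
```

In a real multimodal model, the gradient would come from automatic differentiation instead of a closed form, but the per-token scoring step stays the same.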


Getting Started with Tensorleap Project

This quick start guide will walk you through the steps to get started with this example repository project.


Before you begin, ensure that you have the following prerequisites installed:

Tensorleap CLI Installation

with curl:

curl -s | bash

Tensorleap CLI Usage

Tensorleap Login

To login to Tensorleap:

leap auth login [api key] [api url]


How To Generate CLI Token from the UI

  1. Log in to the platform at ‘’
  2. Scroll down to the bottom of the Resources Management page, then click GENERATE CLI TOKEN in the bottom-left corner.
  3. Once a CLI token is generated, copy the whole text and paste it into your shell.

Tensorleap Project Deployment

To deploy your local changes:

leap projects push

Tensorleap files

Tensorleap files in the repository include and leap.yaml. These files contain the configuration required to integrate the code with the Tensorleap engine:


The leap.yaml file points to a dataset in your Tensorleap environment and is synced with the dataset saved in that environment.

For any additional file being used, we add its path under the include parameter:

    - mmfm/
    - [...]

The binder file configures all binding functions used to bind to the Tensorleap engine. These are the functions used to evaluate and train the model, visualize the variables, and enrich the analysis with external metadata variables.


To test the integration, run the leap_binder.check() function using Poetry:

poetry run test

This runs several tests on the binding script to assert that the implemented binding functions (preprocess, encoders, metadata, etc.) run smoothly.
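To illustrate what a binding check of this kind verifies, here is a hypothetical, framework-free smoke test. It runs a preprocess function, every encoder, and every metadata function on a few samples, and it raises on the first problem. It does not reproduce `leap_binder.check()`. The names `smoke_test`, `encoders`, and `metadata`, and their dictionary-of-functions layout, are assumptions made for this sketch.

```python
def smoke_test(preprocess, encoders, metadata, n_samples=3):
    """Hypothetical binding check: run every function on a few samples and fail loudly."""
    samples = preprocess()
    assert len(samples) > 0, "preprocess returned no samples"
    for idx, sample in enumerate(samples[:n_samples]):
        for name, encode in encoders.items():
            # Encoders must produce a value for every sample.
            assert encode(sample) is not None, f"encoder {name!r} returned None for sample {idx}"
        for name, extract in metadata.items():
            # Metadata functions only need to run without raising.
            extract(sample)
    return True
```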

For further details, please refer to the docs.

Data Type: Visual Question Answering (VQA)

Chen Rothschild and Tom Koren