CelebA Dataset Gender and Age Group Prediction with MobileNetV2

CelebA Dataset Gender and Age Group Prediction with MobileNetV2

This project showcases a deep learning classification task for predicting the gender and age group of face individuals in images. We train a MobileNetV2 model to perform this task on CelebA benchmark using Keras and Tensorflow implementation.

Using Tensorleap we can easily debug and improve the model development process. This example project demonstrates how.

The Dataset

The dataset is a large-scale face dataset with attribute-based annotations. Cropped and aligned face regions are utilized as the training source. The data consist of 162,770 train images and 39,829 test images.


Training the MobileNetV2, with initialized weights – pretrained on Imagenet, with a new head for the multi-label binary task.

First, the model was trained for 5 epochs with a batch size of 64, on Binary Cross Entropy (BCE) loss, reaching to accuracy of 87.7% and loss value of 0.6497 for test set.

Latent Space Exploration

First, since the model’s task is to classify the gender and age group based on face individuals in images we can significantly learn about the model’s performance quality by investigating its latent space that TL creates for us automatically.

Gender Analysis

We can color the samples by their actual label and we notice that the image samples are projected in such a way that most of ‘Male’ samples are within the right side and ‘Female’ samples tend to be are within the left side. Also, when coloring the samples based on the model’s prediction we see the clear distinction in accordance to the actual label spread.


Model’s Latent Space colored by Gender GT


Model’s Latent Space colored by Gender Prediction

Age Analysis

Looking at the differences for the actual Age labels vs the model’s prediction it’s clear-cut that the model is weaker on this prediction task. The samples predicted as ‘Non-Young’ are mainly spread at the top whilst based on the actual label there isn’t any clear distinction and both age groups spread in the space.


Model’s Latent Space colored by Age GT


Model’s Latent Space colored by Age Prediction

We plot using TL dashboard the accuracy per class and indeed the Gender accuracy is around 90.5% and the Age accuracy is around 84%.


Gender and Age Accuracy

From plotting the binary accuracy vs the actual labels, we see that the model is failing on the ‘Non-Young’ samples compared to the ‘Young’ samples. The accuracy for ‘Non-Young’ ‘Male’ cohort is the lowest in around 6% from the ‘Non-Young’ ‘Female’ cohort. In addition, on average, the model is weaker on ‘Male’ samples compared to ‘Female’.


Binary Accuracy per labels combination

Image 1 Image 2
Gender and Age Accuracies per labels combination

The data labels distribution in the training set is imbalanced, particularly to ‘Age’. ‘Young’ samples consist of around 77% of the data and ‘Male’ samples consist around of 42%.

This might have caused the model to be more biased, likely to predict ‘Female’ and ‘Young’ rather than ‘Male’ and ‘Non-Young’ . Now we can fine-tune the model using a weighted loss function or by choosing a sampling method that will balance the distribution (upsampling/ downsampling) to improve the performance.

Untitled Data samples count per label


Data samples count per labels combination

We trained the model using a weighted BCE loss function based on the classes distributions. After finetuning the model we can compare in TL dashboard both models:

We see that the loss (used a non-weighted BCE loss function) is decreased by half from around 0.66 to 0.33. The Binary Accuracy is improved for around 90%, Gender Accuracy increased to 95.5% and Age accuracy increased to 86%.


Accuracy and Loss (BCE): BCE-model (left) and Weighted-BCE model (right)


Gender and Age Accuracies: BCE-model (left) and Weighted-BCE model (right)

Also, in terms of the four cohorts: the accuracy is much better for the ‘Non-Young’-‘Male’ samples however the model is still failing on ‘Non-Young’-‘Female’ which needs to be further investigated.


Binary Accuracy for per labels

Combination: BCE-model (up) and Weighted-BCE model (bottom)

Sample Analysis

We can observe that the more ‘Male-like’ features the model extracts are around the chin and under eye and neck. When the more ‘Women-like’ features are around the hair. Interesting to see that the ‘Young-like’ features are similar to those who direct the prediction towards Woman. This might be an important observation that that needs to be addressed and perhaps might explain why we found that the model is still failing on such population. The ‘Non-Young’ features are around the wrinkles which are in the forehead, in the neck, around eyebrows and the nose.

Image 1
Features direct the model towards ‘Young’ prediction

Image 2

Features direct the model towards ‘Non-Young’ prediction


Image 1

Features direct the model towards ‘Male’ prediction


Image 2

Features direct the model towards ‘Female’ prediction


To assess it, we split the confusion matrix for the Age prediction and indeed we see that the model tends to falsely predict for ‘Female’ samples as being ‘Young’ (FP) (left matrix).

Project Quick Start

Tensorleap CLI Installation


Before you begin, ensure that you have the following prerequisites installed:


with curl:

curl -s https://raw.githubusercontent.com/tensorleap/leap-cli/master/install.sh | bash

with wget:

wget -q -O - https://raw.githubusercontent.com/tensorleap/leap-cli/master/install.sh | bash

Tensorleap CLI Usage

Tensorleap Login

To login to Tensorealp:

leap auth login [api key] [api url].
  • See how to generate a CLI token here

Tensorleap Project Deployment

Navigate to the project directory.

To push your local project files (model + code files):

leap projects push <modelPath> [flags]

To deploy only the project’s code files:

leap code push

Tensorleap files

Tensorleap files in the repository include leap_binder.py and leap.yaml. The files consist of the required configurations to make the code integrate with the Tensorleap engine:


leap.yaml file is configured to a dataset in your Tensorleap environment and is synced to the dataset saved in the environment.

For any additional file being used we add its path under include parameter:


 - leap_binder.py


leap_binder.py file

leap_binder.py configure all binding functions used to bind to Tensorleap engine. These are the functions used to evaluate and train the model, visualize the variables, and enrich the analysis with external metadata variables


To test the system we can run leap_test.py file using poetry:

poetry run test

This file will execute several tests on leap_binder.py script to assert that the implemented binding functions: preprocess, encoders, metadata, etc, run smoothly.

For further explanation please refer to the docs.

Find out more about how Tensorleap can answer your needs. Click here.

Inspected models



CelebA and Tensorleap inc.



Data Type





Danielle Ben Bashat
Danielle Ben Bashat

Data Scientist