Zero-shot learning (ZSL) is a machine learning approach that enables models to recognize and classify items without prior examples of those specific categories. This differs from traditional supervised learning, which requires large labeled datasets. Obtaining labeled data for every category is often difficult, especially for rare diseases or newly discovered species.
Zero-shot learning solves this by using auxiliary information, such as textual descriptions or semantic attributes, rather than explicit labels. For example, a model trained on cats and dogs can identify birds from a text description of what a bird looks like. This allows models to apply existing knowledge to generalize to categories they have never been explicitly trained on.

This article explores zero-shot learning, its methodology, its various types, and its limitations.
What is Zero-Shot Learning?
Zero-shot learning is a subfield of transfer learning where knowledge is transferred from seen to unseen classes. It allows models to predict unseen, unlabeled data without requiring prior training on specific classes. Key aspects of zero-shot learning include:
- Pre-trained models: ZSL usually uses models that have been pre-trained on large datasets, allowing them to learn general features and representations of the world.
- Auxiliary information: Instead of explicit labels, ZSL models depend on auxiliary information, such as textual descriptions, semantic information, or attributes. This information provides a form of supervision but is not a label.
- Seen and unseen classes: The classes that the model has been trained on are referred to as “seen” classes, and the classes the model is asked to predict are “unseen” classes. The model is able to classify instances from “unseen” classes despite never having seen labeled examples of them during training.
- Knowledge transfer: Zero-shot learning is based on the idea of transferring knowledge gained from training on the “seen” classes to classify the “unseen” classes. This knowledge transfer is achieved by understanding the relationships and similarities between “seen” and “unseen” classes.
- Semantic space: Both seen and unseen classes are represented in a shared high-dimensional vector space, called the semantic space, where knowledge from seen classes can be transferred to unseen classes (see the minimal sketch after this list).
- Generalization: The goal of zero-shot learning is to enable models to generalize to new, previously unseen data, similar to how humans can recognize new objects by understanding the relationship between known and unknown categories.
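To make the idea of a shared semantic space concrete, here is a minimal sketch in Python. The class names and hand-crafted attribute vectors are illustrative assumptions, standing in for the richer embeddings a real system would use; the point is only that an unseen class can be positioned relative to seen classes by similarity alone.

```python
import numpy as np

# Toy semantic space: each class is a vector of attributes
# (has_fur, has_wings, lays_eggs, is_domestic). Values are illustrative.
seen_classes = {
    "cat": np.array([1.0, 0.0, 0.0, 1.0]),
    "dog": np.array([1.0, 0.0, 0.0, 1.0]),
    "eagle": np.array([0.0, 1.0, 1.0, 0.0]),
}
unseen_class = ("penguin", np.array([0.0, 1.0, 1.0, 0.0]))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The unseen class is placed relative to the seen classes purely through
# similarity in this shared space, without any labeled penguin examples.
name, vec = unseen_class
for seen_name, seen_vec in seen_classes.items():
    print(f"similarity({name}, {seen_name}) = {cosine(vec, seen_vec):.2f}")
```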
How Does Zero-Shot Learning Work?
The zero-shot learning process generally involves two main stages:
- Training
- Inference
Training Stages of Zero-Shot Learning
During training, the model is exposed to a variety of data, such as images and text, to develop a rich, general understanding of the world and the relationships between objects and their attributes. The model learns important features or traits that describe different things and how these features correlate with known categories.
The model may learn from auxiliary information such as text descriptions, attributes, or semantic embeddings. The model is trained on a set of “seen” classes using this auxiliary information. The goal is to create a shared feature space where both seen and unseen classes can be mapped and compared.
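As a hedged illustration of what this training stage might look like, the sketch below learns a simple linear projection from input features into an attribute space using only seen classes. The feature dimensions, attribute vectors, and choice of least squares are assumptions made for brevity, not a specific published method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: two seen classes, each described by a 3-dim attribute
# vector (e.g. has_fur, has_wings, is_domestic). Values are illustrative.
seen_attributes = {
    "cat": np.array([1.0, 0.0, 1.0]),
    "eagle": np.array([0.0, 1.0, 0.0]),
}

# Synthetic "image features": each seen-class example is a noisy copy of a
# class-specific prototype in a 10-dim feature space.
prototypes = {name: rng.normal(size=10) for name in seen_attributes}
X, Y = [], []
for name, attrs in seen_attributes.items():
    for _ in range(100):
        X.append(prototypes[name] + 0.1 * rng.normal(size=10))
        Y.append(attrs)
X, Y = np.array(X), np.array(Y)

# Training: learn a linear map W from feature space into the shared
# attribute (semantic) space, using only the seen classes.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("projection shape:", W.shape)  # (10, 3): features -> attributes
```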
Inference Stages of Zero-Shot Learning
In the inference stage, the trained model is presented with a new classification task, including “unseen” classes, without any additional training. The model receives a description or embedding vector that explains what those new classes represent rather than labeled examples.
The model uses its pre-existing knowledge and learned feature space to infer connections between the new class descriptions and their features. It uses semantic similarities between the known and unknown classes to make predictions.
The model matches new data instances to the concepts based on this association, even though it has never seen labeled examples of the “unseen” class. It outputs a probability vector that represents the likelihood that a given input belongs to certain classes.
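Continuing the toy setup from the previous sketch, inference might look like the following: a new input is projected into the semantic space, scored against description vectors of unseen classes, and a softmax turns the similarities into the probability vector mentioned above. The class names, vectors, and the stand-in projection W are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# W stands in for the feature->semantic projection learned on seen classes
# in the previous sketch (random here to keep the example self-contained).
W = rng.normal(size=(10, 3))

# Unseen classes are described only by attribute/description vectors.
unseen_classes = {
    "penguin": np.array([0.0, 1.0, 0.0]),
    "zebra": np.array([1.0, 0.0, 0.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def softmax(scores):
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

# A new, unlabeled input: project it into the semantic space ...
x_new = rng.normal(size=10)
z = x_new @ W

# ... then score it against each unseen class description and normalize.
names = list(unseen_classes)
scores = np.array([cosine(z, unseen_classes[n]) for n in names])
probs = softmax(scores)
for name, p in zip(names, probs):
    print(f"P({name} | x_new) = {p:.2f}")
```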
Types of Zero-Shot Learning
While the core concept of zero-shot learning (ZSL) involves classifying unseen data, there are different variations that address the challenge in slightly different ways. These variations can be categorized based on how they approach the learning problem:
- Standard Zero-Shot Learning
- Generalized Zero-Shot Learning (GZSL)
- Transductive Zero-Shot Learning
Standard Zero-Shot Learning
Standard zero-shot learning focuses on transferring knowledge from seen classes to entirely new, unseen classes. The model is trained on a set of labeled “seen” classes and then tested on a different set of “unseen” classes, without any overlap.
Standard ZSL maps both seen and unseen classes into a shared semantic space, then classifies new data into unseen categories based on how close it lies to those class representations.
Generalized Zero-Shot Learning (GZSL)
Generalized zero-shot learning is a more realistic scenario that goes a step further than standard ZSL by incorporating both seen and unseen classes into the evaluation process.
In GZSL, the model is tested on a dataset that might contain samples from either seen classes or unseen classes. This means that the model must not only classify unseen data but also correctly distinguish between seen and unseen classes.
Generalized zero-shot learning addresses a common bias in standard ZSL models, which tend to favor seen classes over unseen ones. Because evaluation now includes both kinds of classes, GZSL typically requires additional techniques to reduce this bias and improve performance on unseen classes, as illustrated in the sketch below.
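One common mitigation is calibrated stacking: a fixed penalty is subtracted from seen-class scores before the final decision, shifting some predictions toward unseen classes. The sketch below uses made-up class names, scores, and penalty value purely for illustration.

```python
# Assumed raw compatibility scores for one test sample across all classes.
scores = {"cat": 2.1, "dog": 1.9, "penguin": 1.8, "zebra": 1.2}  # illustrative
seen = {"cat", "dog"}
gamma = 0.5  # calibration penalty; tuned on a validation split in practice

# Subtract the penalty from seen-class scores only.
calibrated = {
    name: score - gamma if name in seen else score
    for name, score in scores.items()
}

print("uncalibrated:", max(scores, key=scores.get))       # biased toward a seen class
print("calibrated:  ", max(calibrated, key=calibrated.get))  # an unseen class can now win
```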
Transductive Zero-Shot Learning
Transductive zero-shot learning further expands on GZSL by utilizing unseen classes and unlabeled data points during the training phase. Unlike standard and generalized ZSL, which keep the unseen data completely separate during training, transductive ZSL allows the model to have some exposure to the unseen data, even though the data is unlabeled.
The unlabeled data used in transductive ZSL does not always come from the same distribution as the labeled training data. Handling this mismatch is intended to improve the model’s ability to generalize and make better predictions when faced with real-world tasks.
The model can learn more about the data distribution by incorporating unlabeled data, improving its ability to classify data from unseen categories.
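As a hedged sketch of how unlabeled data from unseen classes might be folded in, the snippet below pseudo-labels unlabeled samples with an initial semantic-similarity classifier and keeps only confident assignments for a further training pass. The data, class vectors, and confidence threshold are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Unseen-class description vectors and unlabeled samples that, in a real
# pipeline, would come from the test-time distribution.
unseen_classes = {"penguin": np.array([0.0, 1.0]), "zebra": np.array([1.0, 0.0])}
unlabeled = rng.normal(size=(50, 2))  # stand-in for projected features

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Step 1: pseudo-label each unlabeled sample with the closest class description.
pseudo_labeled = []
for x in unlabeled:
    scored = {name: cosine(x, vec) for name, vec in unseen_classes.items()}
    best = max(scored, key=scored.get)
    # Step 2: keep only confident pseudo-labels for the next training round.
    if scored[best] > 0.8:
        pseudo_labeled.append((x, best))

print(f"kept {len(pseudo_labeled)} of {len(unlabeled)} samples for retraining")
```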
Zero-Shot Learning Methods
There are several approaches to zero-shot learning (ZSL), each with its own way of using auxiliary information to bridge the gap between seen and unseen classes. These methods can be broadly categorized into the following approaches:
Attribute-Based Methods
Attribute-based methods depend on human-defined attributes to describe the characteristics of different classes. Instead of training a classifier on labeled examples of each class, these methods train classifiers on labeled features or attributes of certain data classes, like color, shape, or other key characteristics.
Each class can be described by a set of attributes. For example, an animal might be described by attributes such as “has fur,” “has wings,” “is a mammal,” or “is carnivorous”.
The model learns to associate these attributes with the seen classes during training. It can then infer the label of an unseen class by matching the class’s attribute description against the attributes it learned to recognize. These methods are useful when labeled examples of a target class are unavailable, but labeled examples of its characteristic attributes are relatively abundant.
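The sketch below shows the basic attribute-matching step under assumed attribute definitions: per-attribute detectors (stubbed out here) produce an attribute vector for an input, which is then matched against class attribute signatures, including those of classes never seen during training.

```python
import numpy as np

# Class signatures over human-defined attributes:
# (has_fur, has_wings, is_carnivorous). Values are illustrative assumptions.
class_signatures = {
    "cat": np.array([1, 0, 1]),    # seen during training
    "dog": np.array([1, 0, 1]),    # seen during training
    "eagle": np.array([0, 1, 1]),  # unseen: described only by its attributes
}

def predict_attributes(image):
    """Stand-in for per-attribute classifiers trained on seen-class data."""
    # A real system would run learned detectors; here we return a fixed vector.
    return np.array([0, 1, 1])

# Classify by matching predicted attributes to the closest class signature.
attrs = predict_attributes(image=None)
distances = {name: int(np.abs(attrs - sig).sum()) for name, sig in class_signatures.items()}
print("predicted class:", min(distances, key=distances.get))  # -> "eagle"
```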
Embedding-Based Methods
Embedding-based methods use semantic embeddings to represent classes in a high-dimensional space.
Semantic embeddings are vector representations of classes, words, or attributes. Embeddings, such as word vectors or contextual embeddings, capture the semantic relationships between words and concepts. Both seen and unseen classes are mapped into a shared feature or semantic space, where the similarities between classes can be measured.
The model classifies a sample by checking how similar its embedding is to those of different classes. The similarity between embeddings is used to predict the class of unseen data.
For example, a model performing zero-shot text classification might use a pre-trained transformer model to convert words into vector embeddings. Likewise, a zero-shot image classification model might use a pre-trained convolutional neural network to identify important image features that could inform classification.
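As one example of the embedding-based idea in the text domain, the snippet below uses the Hugging Face transformers zero-shot classification pipeline, shown here with the commonly used facebook/bart-large-mnli checkpoint as an assumed model choice, to score a sentence against candidate labels the model was never trained on as classes.

```python
# Requires: pip install transformers torch
from transformers import pipeline

# A pre-trained NLI model repurposed for zero-shot classification: candidate
# labels are supplied as text at inference time, not as training classes.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The new update drains my phone battery within a few hours.",
    candidate_labels=["battery life", "screen quality", "shipping", "pricing"],
)

for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")
```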
Generative-Based Methods
Generative-based methods use generative models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) to generate synthetic examples of unseen classes. The generative model uses auxiliary information, such as a textual description of a class, to generate samples that can be used to convert the zero-shot learning problem into a standard supervised learning problem.
For instance, a large language model (LLM) can be used to create descriptions for new classes or generate data that helps train other models. These methods are useful in cases where labeled samples for the unseen classes are not available.
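The sketch below conveys the generative idea with a deliberately simple stand-in for a GAN or VAE: synthetic features for unseen classes are “generated” by a fixed linear transform of their attribute vectors plus noise, and a standard classifier is then trained on those synthetic examples. The generator, attribute vectors, and class names are illustrative assumptions.

```python
# Requires: pip install numpy scikit-learn
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Unseen classes are known only through attribute vectors (auxiliary info).
unseen_attributes = {"penguin": np.array([0.0, 1.0, 1.0]),
                     "zebra": np.array([1.0, 0.0, 1.0])}

# Stand-in "generator": a fixed linear map from attribute space (3-dim) to
# feature space (8-dim) plus Gaussian noise. A real method would train a
# GAN or VAE conditioned on the auxiliary information.
G = rng.normal(size=(3, 8))

X, y = [], []
for label, attrs in unseen_attributes.items():
    for _ in range(100):
        X.append(attrs @ G + 0.1 * rng.normal(size=8))
        y.append(label)

# With synthetic examples in hand, the problem becomes ordinary supervised
# classification over the previously unseen classes.
clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)
print(clf.predict([unseen_attributes["penguin"] @ G]))  # -> ['penguin']
```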
Instance-Based Methods
Instance-based methods first obtain labeled instances for the unseen classes and then use these instances to train the zero-shot classifier. The instances can be obtained with different techniques, such as projection methods, which project feature-space instances and semantic-space prototypes into a shared space, or synthesizing methods, which create pseudo-instances for the unseen classes.
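A minimal sketch of the projection flavor, under assumed toy data: feature-space instances and semantic-space class prototypes are projected into a common space, each instance is provisionally labeled by its nearest projected prototype, and those labeled instances can then train an ordinary classifier. The projection matrices are random placeholders for mappings a real method would learn.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy data: 6-dim feature instances and 3-dim class prototypes (attributes).
instances = rng.normal(size=(20, 6))
prototypes = {"penguin": np.array([0.0, 1.0, 1.0]),
              "zebra": np.array([1.0, 0.0, 1.0])}

# Assumed projections into a shared 4-dim space (learned in a real method).
P_feat = rng.normal(size=(6, 4))
P_sem = rng.normal(size=(3, 4))

proj_protos = {name: vec @ P_sem for name, vec in prototypes.items()}

# Label each projected instance with its nearest projected prototype; the
# resulting (instance, label) pairs can then train a standard classifier.
labels = []
for x in instances @ P_feat:
    nearest = min(proj_protos, key=lambda n: np.linalg.norm(x - proj_protos[n]))
    labels.append(nearest)
print(labels[:5])
```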
Why is Zero-Shot Learning Important?
Zero-shot learning (ZSL) is a significant advancement in machine learning because it addresses several limitations of traditional supervised learning methods. Its importance stems from its ability to enable models to generalize to new, unseen categories without requiring labeled data for every possible class. This capability is important for building more adaptable and efficient AI systems. Here’s a breakdown of why ZSL is important:
- Overcoming Data Scarcity: Traditional machine learning models require a large number of labeled examples for each class they need to recognize. However, in many real-world scenarios, obtaining such data is impractical, expensive, or even impossible. ZSL addresses this challenge by enabling models to learn from descriptions or attributes of classes rather than depend on labeled examples.
- Reducing Data Labeling Costs and Efforts: Data labeling is a labor-intensive, time-consuming, and expensive process. ZSL can help to reduce the need for extensive labeled datasets, which streamlines the development process for machine learning applications. This is particularly beneficial when specialized experts are needed for annotations, such as in biomedical datasets.
- Enhanced Scalability and Flexibility: ZSL models can generalize to new classes without needing to be retrained. This adaptability means that AI systems can quickly process new data in real-world circumstances, making them more scalable and flexible. ZSL provides a new level of flexibility in AI, allowing models to adapt to completely new data and tasks without additional labeling or retraining.
- Improved Generalization: Zero-shot models can generalize better by transferring knowledge from seen classes to unseen ones. This opens up new possibilities in AI applications where models must adapt to new environments or recognize novel objects.
- Dynamic Recognition of New Concepts: ZSL models can recognize new concepts dynamically without any additional data collection or retraining, relying on descriptions.
- Cost-Effective Innovation: ZSL can enable companies to innovate and personalize their offerings cost-effectively. It also helps assess risks, identify anomalies, and continuously improve processes.
Limitations of Zero-Shot Learning
Despite its advantages, zero-shot learning (ZSL) also has several limitations that can impact its performance and applicability. These limitations are important to consider when implementing ZSL models in real-world scenarios:
- Bias towards Seen Classes: ZSL models are often biased towards the classes they were trained on, which means they may favor predicting unseen data samples as belonging to one of the seen classes. This occurs because the model’s training is primarily based on the data and labels of seen classes. This bias becomes more pronounced when the model is evaluated on samples from both seen and unseen classes.
- Domain Shift: The “domain shift” problem is a common issue in ZSL, which occurs when the statistical distribution of data in the training set (seen classes) differs significantly from the testing set, which may include samples from seen or unseen classes. This means that ZSL models might not perform well on images or data that are far from the domain on which the model was trained.
- Reduced Performance: Because ZSL models make inferences about unseen classes, there is a risk of incorrect generalization, particularly when the unseen classes are very different from the training data. This can result in poor accuracy and end up costing time instead of saving it.
- Greater Complexity: Because of the vast and varied nature of real-world data, ZSL models can perform well during standard ZSL scenarios, but may struggle in generalized or transductive settings.
- Dependence on Auxiliary Information: The quality and relevance of auxiliary information (such as textual descriptions or attributes) significantly impact the performance of ZSL models. If the auxiliary information is incomplete, inaccurate, or not rich enough, the model might struggle to classify unseen classes accurately.
- Generalization Challenges: Some methods of ZSL assume that every class can be described with a single vector of attributes, which is not always true. Also, attribute-based methods cannot generalize to classes whose attributes are unknown or not present in available samples.
Conclusion
Zero-shot learning (ZSL) is a significant advancement in machine learning that enables models to classify unseen objects or concepts without requiring labeled examples for every category. ZSL addresses data scarcity and reduces data labeling costs. This increases the scalability and adaptability of AI systems. While ZSL faces challenges such as bias, domain shift, hubness, and semantic loss, ongoing research continues to improve its performance.
Ultimately, ZSL is paving the way for more generalized, efficient, and autonomous AI systems, with widespread applications across multiple fields.