Zero-shot learning (ZSL) is a machine learning approach that enables models to recognize and classify items without prior examples of those specific categories. This differs from traditional supervised learning, which requires large labeled datasets. Obtaining labeled data for every category is often difficult, especially for rare diseases or newly discovered species.
Zero-shot learning solves this by using auxiliary information, such as textual descriptions or semantic attributes, rather than explicit labels. For example, a model trained on cats and dogs can identify birds from a text description of what a bird looks like. This allows models to apply existing knowledge to generalize to categories they have never been explicitly trained on.

This article explores zero-shot learning, its methodology, its various types, and its limitations.
What is Zero-Shot Learning?
Zero-shot learning is a subfield of transfer learning where knowledge is transferred from seen to unseen classes. It allows models to predict unseen, unlabeled data without requiring prior training on specific classes. Key aspects of zero-shot learning include:
- Pre-trained models: ZSL usually uses models that have been pre-trained on large datasets, allowing them to learn general features and representations of the world.
- Auxiliary information: Instead of explicit labels, ZSL models depend on auxiliary information, such as textual descriptions, semantic information, or attributes. This information provides a form of supervision but is not a label.
- Seen and unseen classes: The classes that the model has been trained on are referred to as “seen” classes, and the classes the model is asked to predict are “unseen” classes. The model is able to classify instances from “unseen” classes despite never having seen labeled examples of them during training.
- Knowledge transfer: Zero-shot learning is based on the idea of transferring knowledge gained from training on the “seen” classes to classify the “unseen” classes. This knowledge transfer is achieved by understanding the relationships and similarities between “seen” and “unseen” classes.
- Semantic space: Both seen and unseen classes are represented in a shared high-dimensional vector space, called the semantic space, where knowledge from seen classes can be transferred to unseen classes (see the minimal sketch after this list).
- Generalization: The goal of zero-shot learning is to enable models to generalize to new, previously unseen data, similar to how humans can recognize new objects by understanding the relationship between known and unknown categories.
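To make the idea of a shared semantic space concrete, here is a minimal sketch in Python. The class names and hand-crafted attribute vectors are illustrative assumptions, standing in for the richer embeddings a real system would use; the point is only that an unseen class can be positioned relative to seen classes by similarity alone.

```python
import numpy as np

# Toy semantic space: each class is a vector of attributes
# (has_fur, has_wings, lays_eggs, is_domestic). Values are illustrative.
seen_classes = {
    "cat": np.array([1.0, 0.0, 0.0, 1.0]),
    "dog": np.array([1.0, 0.0, 0.0, 1.0]),
    "eagle": np.array([0.0, 1.0, 1.0, 0.0]),
}
unseen_class = ("penguin", np.array([0.0, 1.0, 1.0, 0.0]))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The unseen class is placed relative to the seen classes purely through
# similarity in this shared space, without any labeled penguin examples.
name, vec = unseen_class
for seen_name, seen_vec in seen_classes.items():
    print(f"similarity({name}, {seen_name}) = {cosine(vec, seen_vec):.2f}")
```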
How Does Zero-Shot Learning Work?
The zero-shot learning process generally involves two main stages:
- Training
- Inference
Training Stages of Zero-Shot Learning
During training, the model is exposed to a variety of data, such as images and text, to develop a rich, general understanding of the world and the relationships between objects and their attributes. The model learns important features or traits that describe different things and how these features correlate with known categories.
The model may learn from auxiliary information such as text descriptions, attributes, or semantic embeddings. The model is trained on a set of “seen” classes using this auxiliary information. The goal is to create a shared feature space where both seen and unseen classes can be mapped and compared.
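As a hedged illustration of what this training stage might look like, the sketch below learns a simple linear projection from input features into an attribute space using only seen classes. The feature dimensions, attribute vectors, and choice of least squares are assumptions made for brevity, not a specific published method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: two seen classes, each described by a 3-dim attribute
# vector (e.g. has_fur, has_wings, is_domestic). Values are illustrative.
seen_attributes = {
    "cat": np.array([1.0, 0.0, 1.0]),
    "eagle": np.array([0.0, 1.0, 0.0]),
}

# Synthetic "image features": each seen-class example is a noisy copy of a
# class-specific prototype in a 10-dim feature space.
prototypes = {name: rng.normal(size=10) for name in seen_attributes}
X, Y = [], []
for name, attrs in seen_attributes.items():
    for _ in range(100):
        X.append(prototypes[name] + 0.1 * rng.normal(size=10))
        Y.append(attrs)
X, Y = np.array(X), np.array(Y)

# Training: learn a linear map W from feature space into the shared
# attribute (semantic) space, using only the seen classes.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("projection shape:", W.shape)  # (10, 3): features -> attributes
```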
Inference Stages of Zero-Shot Learning
In the inference stage, the trained model is presented with a new classification task, including “unseen” classes, without any additional training. The model receives a description or embedding vector that explains what those new classes represent rather than labeled examples.
The model uses its pre-existing knowledge and learned feature space to infer connections between the new class descriptions and their features. It uses semantic similarities between the known and unknown classes to make predictions.
The model matches new data instances to the concepts based on this association, even though it has never seen labeled examples of the “unseen” class. It outputs a probability vector that represents the likelihood that a given input belongs to certain classes.
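Continuing the toy setup from the previous sketch, inference might look like the following: a new input is projected into the semantic space, scored against description vectors of unseen classes, and a softmax turns the similarities into the probability vector mentioned above. The class names, vectors, and the stand-in projection W are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# W stands in for the feature->semantic projection learned on seen classes
# in the previous sketch (random here to keep the example self-contained).
W = rng.normal(size=(10, 3))

# Unseen classes are described only by attribute/description vectors.
unseen_classes = {
    "penguin": np.array([0.0, 1.0, 0.0]),
    "zebra": np.array([1.0, 0.0, 0.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def softmax(scores):
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

# A new, unlabeled input: project it into the semantic space ...
x_new = rng.normal(size=10)
z = x_new @ W

# ... then score it against each unseen class description and normalize.
names = list(unseen_classes)
scores = np.array([cosine(z, unseen_classes[n]) for n in names])
probs = softmax(scores)
for name, p in zip(names, probs):
    print(f"P({name} | x_new) = {p:.2f}")
```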
Types of Zero-Shot Learning
While the core concept of zero-shot learning (ZSL) involves classifying unseen data, there are different variations that address the challenge in slightly different ways. These variations can be categorized based on how they approach the learning problem:
- Standard Zero-Shot Learning
- Generalized Zero-Shot Learning (GZSL)
- Transductive Zero-Shot Learning
Standard Zero-Shot Learning
Standard zero-shot learning focuses on transferring knowledge from seen classes to entirely new, unseen classes. The model is trained on a set of labeled “seen” classes and then tested on a different set of “unseen” classes, without any overlap.
Standard ZSL maps both seen and unseen classes into a shared semantic space, then classifies new data into unseen categories based on how close it lies to those class representations.
Generalized Zero-Shot Learning (GZSL)
Generalized zero-shot learning is a more realistic scenario that goes a step further than standard ZSL by incorporating both seen and unseen classes into the evaluation process.
In GZSL, the model is tested on a dataset that might contain samples from either seen classes or unseen classes. This means that the model must not only classify unseen data but also correctly distinguish between seen and unseen classes.
Generalized zero-shot learning addresses a common bias in standard ZSL models, which tend to favor seen classes over unseen ones. Because evaluation now includes both kinds of classes, GZSL typically requires additional techniques to reduce this bias and improve performance on unseen classes, as illustrated in the sketch below.
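One common mitigation is calibrated stacking: a fixed penalty is subtracted from seen-class scores before the final decision, shifting some predictions toward unseen classes. The sketch below uses made-up class names, scores, and penalty value purely for illustration.

```python
# Assumed raw compatibility scores for one test sample across all classes.
scores = {"cat": 2.1, "dog": 1.9, "penguin": 1.8, "zebra": 1.2}  # illustrative
seen = {"cat", "dog"}
gamma = 0.5  # calibration penalty; tuned on a validation split in practice

# Subtract the penalty from seen-class scores only.
calibrated = {
    name: score - gamma if name in seen else score
    for name, score in scores.items()
}

print("uncalibrated:", max(scores, key=scores.get))       # biased toward a seen class
print("calibrated:  ", max(calibrated, key=calibrated.get))  # an unseen class can now win
```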
Transductive Zero-Shot Learning
Transductive zero-shot learning further expands on GZSL by utilizing unseen classes and unlabeled data points during the training phase. Unlike standard and generalized ZSL, which keep the unseen data completely separate during training, transductive ZSL allows the model to have some exposure to the unseen data, even though the data is unlabeled.
The unlabeled data used in transductive ZSL does not always come from the same distribution as the labeled training data. Handling this mismatch is intended to improve the model’s ability to generalize and make better predictions when faced with real-world tasks.
The model can learn more about the data distribution by incorporating unlabeled data, improving its ability to classify data from unseen categories.
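As a hedged sketch of how unlabeled data from unseen classes might be folded in, the snippet below pseudo-labels unlabeled samples with an initial semantic-similarity classifier and keeps only confident assignments for a further training pass. The data, class vectors, and confidence threshold are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Unseen-class description vectors and unlabeled samples that, in a real
# pipeline, would come from the test-time distribution.
unseen_classes = {"penguin": np.array([0.0, 1.0]), "zebra": np.array([1.0, 0.0])}
unlabeled = rng.normal(size=(50, 2))  # stand-in for projected features

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Step 1: pseudo-label each unlabeled sample with the closest class description.
pseudo_labeled = []
for x in unlabeled:
    scored = {name: cosine(x, vec) for name, vec in unseen_classes.items()}
    best = max(scored, key=scored.get)
    # Step 2: keep only confident pseudo-labels for the next training round.
    if scored[best] > 0.8:
        pseudo_labeled.append((x, best))

print(f"kept {len(pseudo_labeled)} of {len(unlabeled)} samples for retraining")
```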
Zero-Shot Learning Methods
There are several approaches to zero-shot learning (ZSL), each with its own way of using auxiliary information to bridge the gap between seen and unseen classes. These methods can be broadly categorized into the following approaches:
Attribute-Based Methods
Attribute-based methods depend on human-defined attributes to describe the characteristics of different classes. Instead of training a classifier on labeled examples of each class, these methods train classifiers on labeled features or attributes of certain data classes, like color, shape, or other key characteristics.
Each class can be described by a set of attributes. For example, an animal might be described by attributes such as “has fur,” “has wings,” “is a mammal,” or “is carnivorous”.
The model learns to associate these attributes with the seen classes during training. It can then infer the label of an unseen class by matching the class’s attribute description against the attributes it learned to recognize. These methods are useful when labeled examples of a target class are unavailable, but labeled examples of its characteristic attributes are relatively abundant.
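The sketch below shows the basic attribute-matching step under assumed attribute definitions: per-attribute detectors (stubbed out here) produce an attribute vector for an input, which is then matched against class attribute signatures, including those of classes never seen during training.

```python
import numpy as np

# Class signatures over human-defined attributes:
# (has_fur, has_wings, is_carnivorous). Values are illustrative assumptions.
class_signatures = {
    "cat": np.array([1, 0, 1]),    # seen during training
    "dog": np.array([1, 0, 1]),    # seen during training
    "eagle": np.array([0, 1, 1]),  # unseen: described only by its attributes
}

def predict_attributes(image):
    """Stand-in for per-attribute classifiers trained on seen-class data."""
    # A real system would run learned detectors; here we return a fixed vector.
    return np.array([0, 1, 1])

# Classify by matching predicted attributes to the closest class signature.
attrs = predict_attributes(image=None)
distances = {name: int(np.abs(attrs - sig).sum()) for name, sig in class_signatures.items()}
print("predicted class:", min(distances, key=distances.get))  # -> "eagle"
```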
Embedding-Based Methods
Embedding-based methods use semantic embeddings to represent classes in a high-dimensional space.
Semantic embeddings are vector representations of classes, words, or attributes. Embeddings, such as word vectors or contextual embeddings, capture the semantic relationships between words and concepts. Both seen and unseen classes are mapped into a shared feature or semantic space, where the similarities between classes can be measured.
The model classifies a sample by checking how similar its embedding is to those of different classes. The similarity between embeddings is used to predict the class of unseen data.
For example, a model performing zero-shot text classification might use a pre-trained transformer model to convert words into vector embeddings. Likewise, a zero-shot image classification model might use a pre-trained convolutional neural network to identify important image features that could inform classification.
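As one example of the embedding-based idea in the text domain, the snippet below uses the Hugging Face transformers zero-shot classification pipeline, shown here with the commonly used facebook/bart-large-mnli checkpoint as an assumed model choice, to score a sentence against candidate labels the model was never trained on as classes.

```python
# Requires: pip install transformers torch
from transformers import pipeline

# A pre-trained NLI model repurposed for zero-shot classification: candidate
# labels are supplied as text at inference time, not as training classes.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The new update drains my phone battery within a few hours.",
    candidate_labels=["battery life", "screen quality", "shipping", "pricing"],
)

for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")
```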
Generative-Based Methods
Generative-based methods use generative models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) to generate synthetic examples of unseen classes. The generative model uses auxiliary information, such as a textual description of a class, to generate samples that can be used to convert the zero-shot learning problem into a standard supervised learning problem.
For instance, a large language model (LLM) can be used to create descriptions for new classes or generate data that helps train other models. These methods are useful in cases where labeled samples for the unseen classes are not available.
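The sketch below conveys the generative idea with a deliberately simple stand-in for a GAN or VAE: synthetic features for unseen classes are “generated” by a fixed linear transform of their attribute vectors plus noise, and a standard classifier is then trained on those synthetic examples. The generator, attribute vectors, and class names are illustrative assumptions.

```python
# Requires: pip install numpy scikit-learn
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Unseen classes are known only through attribute vectors (auxiliary info).
unseen_attributes = {"penguin": np.array([0.0, 1.0, 1.0]),
                     "zebra": np.array([1.0, 0.0, 1.0])}

# Stand-in "generator": a fixed linear map from attribute space (3-dim) to
# feature space (8-dim) plus Gaussian noise. A real method would train a
# GAN or VAE conditioned on the auxiliary information.
G = rng.normal(size=(3, 8))

X, y = [], []
for label, attrs in unseen_attributes.items():
    for _ in range(100):
        X.append(attrs @ G + 0.1 * rng.normal(size=8))
        y.append(label)

# With synthetic examples in hand, the problem becomes ordinary supervised
# classification over the previously unseen classes.
clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)
print(clf.predict([unseen_attributes["penguin"] @ G]))  # -> ['penguin']
```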
Instance-Based Methods
Instance-based methods first obtain labeled instances for the unseen classes and then use these instances to train the zero-shot classifier. The instances can be obtained with different techniques, such as projection methods, which project feature-space instances and semantic-space prototypes into a shared space, or synthesizing methods, which create pseudo-instances for the unseen classes.
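A minimal sketch of the projection flavor, under assumed toy data: feature-space instances and semantic-space class prototypes are projected into a common space, each instance is provisionally labeled by its nearest projected prototype, and those labeled instances can then train an ordinary classifier. The projection matrices are random placeholders for mappings a real method would learn.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy data: 6-dim feature instances and 3-dim class prototypes (attributes).
instances = rng.normal(size=(20, 6))
prototypes = {"penguin": np.array([0.0, 1.0, 1.0]),
              "zebra": np.array([1.0, 0.0, 1.0])}

# Assumed projections into a shared 4-dim space (learned in a real method).
P_feat = rng.normal(size=(6, 4))
P_sem = rng.normal(size=(3, 4))

proj_protos = {name: vec @ P_sem for name, vec in prototypes.items()}

# Label each projected instance with its nearest projected prototype; the
# resulting (instance, label) pairs can then train a standard classifier.
labels = []
for x in instances @ P_feat:
    nearest = min(proj_protos, key=lambda n: np.linalg.norm(x - proj_protos[n]))
    labels.append(nearest)
print(labels[:5])
```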
Why is Zero-Shot Learning Important?
Zero-shot learning (ZSL) is a significant advancement in machine learning because it addresses several limitations of traditional supervised learning methods. Its importance stems from its ability to enable models to generalize to new, unseen categories without requiring labeled data for every possible class. This capability is important for building more adaptable and efficient AI systems. Here’s a breakdown of why ZSL is important:
- Overcoming Data Scarcity: Traditional machine learning models require a large number of labeled examples for each class they need to recognize. However, in many real-world scenarios, obtaining such data is impractical, expensive, or even impossible. ZSL addresses this challenge by enabling models to learn from descriptions or attributes of classes rather than depend on labeled examples.
- Reducing Data Labeling Costs and Efforts: Data labeling is a labor-intensive, time-consuming, and expensive process. ZSL can help to reduce the need for extensive labeled datasets, which streamlines the development process for machine learning applications. This is particularly beneficial when specialized experts are needed for annotations, such as in biomedical datasets.
- Enhanced Scalability and Flexibility: ZSL models can generalize to new classes without needing to be retrained. This adaptability means that AI systems can quickly process new data in real-world circumstances, making them more scalable and flexible. ZSL provides a new level of flexibility in AI, allowing models to adapt to completely new data and tasks without additional labeling or retraining.
- Improved Generalization: Zero-shot models can generalize better by transferring knowledge from seen classes to unseen ones. This opens up new possibilities in AI applications where models must adapt to new environments or recognize novel objects.
- Dynamic Recognition of New Concepts: ZSL models can recognize new concepts dynamically without any additional data collection or retraining, relying on descriptions.
- Cost-Effective Innovation: ZSL can enable companies to innovate and personalize their offerings cost-effectively. It also helps assess risks, identify anomalies, and continuously improve processes.
Limitations of Zero-Shot Learning
Despite its advantages, zero-shot learning (ZSL) also has several limitations that can impact its performance and applicability. These limitations are important to consider when implementing ZSL models in real-world scenarios:
- Bias towards Seen Classes: ZSL models are often biased towards the classes they were trained on, which means they may favor predicting unseen data samples as belonging to one of the seen classes. This occurs because the model’s training is primarily based on the data and labels of seen classes. This bias becomes more pronounced when the model is evaluated on samples from both seen and unseen classes.
- Domain Shift: The “domain shift” problem is a common issue in ZSL, which occurs when the statistical distribution of data in the training set (seen classes) differs significantly from the testing set, which may include samples from seen or unseen classes. This means that ZSL models might not perform well on images or data that are far from the domain on which the model was trained.
- Reduced Performance: Because ZSL models make inferences about unseen classes, there is a risk of incorrect generalization, particularly when the unseen classes are very different from the training data. This can result in poor accuracy and end up costing time instead of saving it.
- Greater Complexity: Because of the vast and varied nature of real-world data, ZSL models can perform well during standard ZSL scenarios, but may struggle in generalized or transductive settings.
- Dependence on Auxiliary Information: The quality and relevance of auxiliary information (such as textual descriptions or attributes) significantly impact the performance of ZSL models. If the auxiliary information is incomplete, inaccurate, or not rich enough, the model might struggle to classify unseen classes accurately.
- Generalization Challenges: Some methods of ZSL assume that every class can be described with a single vector of attributes, which is not always true. Also, attribute-based methods cannot generalize to classes whose attributes are unknown or not present in available samples.
Conclusion
Zero-shot learning (ZSL) is a significant advancement in machine learning that enables models to classify unseen objects or concepts without requiring labeled examples for every category. ZSL addresses data scarcity and reduces data labeling costs. This increases the scalability and adaptability of AI systems. While ZSL faces challenges such as bias, domain shift, hubness, and semantic loss, ongoing research continues to improve its performance.
Ultimately, ZSL is paving the way for more generalized, efficient, and autonomous AI systems, with widespread applications across multiple fields.