Deep Encoders with Auxiliary Parameters for Extreme Classification

¹IIT Delhi, ²Microsoft Research, ³Microsoft, ⁴IIT Kanpur

Abstract

The task of annotating a data point with labels most relevant to it from a large universe of labels is referred to as Extreme Classification (XC). State-of-the-art XC methods have applications in ranking, recommendation, and tagging, and mostly employ a combination architecture comprising a deep encoder and a high-capacity classifier. These two components are often trained in a modular fashion to conserve compute. This paper shows that in XC settings where data paucity and semantic gap issues abound, this modular training can lead to suboptimal encoder training, which negatively affects the performance of the overall architecture. The paper then proposes a lightweight alternative, DEXA, that augments encoder training with auxiliary parameters. Incorporating DEXA into existing XC architectures requires minimal modifications, and the method can scale to datasets with 40 million labels and offer predictions that are up to 6% and 15% more accurate than embeddings offered by existing deep XC methods on benchmark and proprietary datasets, respectively. The paper also analyzes DEXA theoretically and shows that it offers provably better encoder training than existing Siamese training strategies in certain realizable settings.

Method Overview

DEXA introduces a novel approach to addressing the semantic gap in Extreme Classification (XC) applications, especially those involving short-text data points and labels. Traditional XC methods often suffer from suboptimal encoder training due to the modular approach where the encoder and classifier are trained separately. DEXA addresses this by incorporating shared auxiliary parameters during encoder training, significantly improving the embedding quality and subsequent classification performance.

The core idea behind DEXA is that "related" labels likely have similar correction terms. Instead of assigning individual correction terms to each label, DEXA clusters labels into groups, each represented by a shared auxiliary vector. During training, these auxiliary vectors help adjust the label embeddings, ensuring that the encoder captures the nuanced relationships between datapoints and labels without overfitting to label texts. This clustering approach reduces computational overhead and enhances scalability, allowing DEXA to perform effectively on datasets with millions of labels.
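The cluster-level correction described above can be illustrated with a minimal sketch. It assumes the auxiliary vector is added to the encoder's label embedding and the sum is then normalized to unit length; the additive form, the normalization, the function names, and the toy values are illustrative assumptions rather than details stated on this page. Plain Python lists are used to keep the example dependency-free.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def augment_label_embeddings(label_embs, cluster_of, aux_vectors):
    """Add each label's cluster-level auxiliary vector to its encoder embedding.

    label_embs:  one encoder embedding per label (hypothetical encoder output)
    cluster_of:  cluster index for each label
    aux_vectors: one shared auxiliary vector per cluster
    """
    augmented = []
    for emb, c in zip(label_embs, cluster_of):
        corrected = [e + a for e, a in zip(emb, aux_vectors[c])]
        augmented.append(normalize(corrected))
    return augmented

# Toy setup: 4 labels grouped into 2 clusters, 2-dimensional embeddings.
label_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
cluster_of = [0, 0, 1, 1]                 # labels 0 and 1 share one auxiliary vector
aux_vectors = [[0.0, 0.5], [0.5, 0.0]]    # K = 2 auxiliary vectors in total

augmented = augment_label_embeddings(label_embs, cluster_of, aux_vectors)
```

Because labels in the same cluster share one vector, the number of extra parameters grows with the number of clusters K rather than with the number of labels.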

DEXA's architecture involves two key modules:

Module I: Encoder Training with Auxiliary Vectors - Labels are clustered, and each cluster is assigned an auxiliary vector. The encoder is trained to embed data points and augmented labels into a shared space, leveraging these auxiliary vectors to bridge the semantic gap.
Module II: Classifier Training - The encoder is frozen, and label classifiers are initialized with the augmented label embeddings. These classifiers are then fine-tuned to achieve high accuracy in predicting the relevant labels for given data points.
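Module II can be sketched as follows: the data-point embedding is treated as a fixed output of the frozen encoder, classifiers start as copies of the augmented label embeddings, and only the classifier weights are updated. The per-label logistic loss, the learning rate, and all names and toy values here are assumptions made for illustration; this page does not specify the loss used.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_classifiers(augmented_label_embs):
    """Start each label classifier from its augmented label embedding."""
    return [list(emb) for emb in augmented_label_embs]

def finetune_step(classifiers, x_emb, relevant, lr=0.1):
    """One SGD step on a per-label logistic loss (an assumed loss).

    x_emb comes from the frozen encoder and is never modified;
    only the classifier weight vectors are updated in place.
    """
    for label, w in enumerate(classifiers):
        target = 1.0 if label in relevant else 0.0
        grad_scale = sigmoid(dot(w, x_emb)) - target
        for j in range(len(w)):
            w[j] -= lr * grad_scale * x_emb[j]

# Toy setup: 2 labels, one data point whose only relevant label is label 0.
augmented_label_embs = [[1.0, 0.0], [0.0, 1.0]]
x_emb = [0.6, 0.8]          # hypothetical frozen-encoder embedding
classifiers = init_classifiers(augmented_label_embs)

before = [dot(w, x_emb) for w in classifiers]
finetune_step(classifiers, x_emb, relevant={0})
after = [dot(w, x_emb) for w in classifiers]
```

In this toy run the score of the relevant label rises and the score of the irrelevant label falls, while the encoder output stays untouched.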

This approach enables DEXA to outperform existing XC methods, providing significant gains in prediction accuracy while maintaining computational efficiency.

Effect of Auxiliary Parameters

Performance of DEXA as the number of auxiliary vectors (K) varies on the proprietary SponsoredSearch-40M dataset. DEXA yielded 7-15% gains over NGAME embeddings. Note that K = 0 in the figure corresponds to NGAME. The 3-layer MiniLM-L3-v2 model performed on par with a 6-layer DistilBERT model when both were augmented with K ≈ L/40 auxiliary vectors.
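To give a sense of scale for the K ≈ L/40 setting, the short calculation below compares the number of extra parameters for shared auxiliary vectors against a hypothetical scheme with one correction vector per label. It assumes L denotes the number of labels and takes L = 40 million, as the dataset name suggests; the embedding dimension d = 384 is a hypothetical value chosen only for illustration.

```python
def num_aux_vectors(num_labels, labels_per_vector=40):
    """Number of auxiliary vectors under the K ≈ L/40 rule of thumb."""
    return num_labels // labels_per_vector

def extra_parameters(num_vectors, dim):
    """Parameter count for a set of vectors of the given dimension."""
    return num_vectors * dim

L = 40_000_000   # assumed label count for SponsoredSearch-40M
d = 384          # hypothetical embedding dimension

K = num_aux_vectors(L)
shared = extra_parameters(K, d)      # cluster-level auxiliary vectors
per_label = extra_parameters(L, d)   # one correction vector per label, for comparison

print(K, shared, per_label)  # 1000000 384000000 15360000000
```

Under these assumptions, sharing vectors across clusters needs 40 times fewer extra parameters than per-label corrections would.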

Example Predictions: DEXA vs NGAME

This example is taken from the LF-WikiSeeAlso-320K dataset, which involves predicting relevant 'See Also' Wikipedia articles for a given input text.
Document	DEXA Predictions	NGAME Predictions
Constitutional reforms of Julius Caesar: The constitutional reforms of Julius Caesar were a series of laws pertaining to the Constitution of the Roman Republic enacted between 49 and 44 BC, during Caesar's dictatorship. Caesar died in 44 BC before the implications of his constitutional actions could ...	Acta Senatus; Centuria; Roman Law; Interrex; Byzantine Senate	Julius Caesar; Assassination of Julius Caesar; Caesarism; Constitution of the Roman Republic; Caesar's civil war
