SAGE: A Unified Evaluation Framework for Data Augmentation and Few-Shot Learning on Small and Imbalanced Tabular Datasets

Yuhao Yan; Linlu Chen; Houyan Zhang; Chong Chen; Leran Liang; Meng Yang

doi:10.31449/inf.v50i13.12581

Abstract

In critical domains such as healthcare and finance, structured data often suffers from sample scarcity and class imbalance, undermining the traditional machine learning assumption that training data adequately reflects the true distribution. To address this challenge, this study proposes SAGE (Small-sample Adaptive Generalization Evaluation), a unified framework for systematically comparing data-driven augmentation methods with model-driven few-shot learning (FSL) approaches. The framework integrates a standardized data conditioning pipeline, a suite of 12 models (6 classical classifiers and 6 FSL architectures), and multi-dimensional evaluation metrics. Experimental validation was conducted on three datasets covering medical and forensic domains: UCI Heart Disease (297 samples), Hepatitis (155 samples), and Glass Identification (214 samples). Results demonstrate the complementary strengths of the two paradigms. Among data-driven methods, CatBoost combined with Large Language Model (LLM) based augmentation achieved a Macro-F1 of 0.4219 on the heart disease dataset, significantly outperforming traditional oversampling methods such as SMOTE (p<0.001). Under extreme scarcity, however, model-driven approaches proved superior: Siamese Networks achieved the highest Macro-F1 of 0.5959 on heart disease, attained an F1-score of 0.64 on the rarest class (Class 4), and remained robust across datasets. Furthermore, SHAP analysis confirmed that the best-performing models captured clinically relevant features, such as albumin levels in hepatitis prediction. The SAGE framework thus provides empirical evidence to guide paradigm selection: FSL for extreme scarcity and LLM-based augmentation for enhancing ensemble classifiers.
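
The Macro-F1 scores quoted in the abstract average the per-class F1 values without weighting by class size, which is why the metric suits imbalanced data: a model that ignores a rare class is penalized even if its accuracy looks high. The following is a minimal sketch of that computation, not the authors' evaluation code. The function name `macro_f1` and the toy labels are assumptions made here for illustration and are not taken from the paper's datasets.

```python
def macro_f1(y_true, y_pred):
    """Return (macro_f1, per_class_f1) for two equal-length label sequences.

    Macro-F1 is the unweighted mean of per-class F1 over every label that
    appears in either y_true or y_pred.
    """
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class is
        # never predicted correctly and the denominator is zero.
        denom = 2 * tp + fp + fn
        per_class[label] = 2 * tp / denom if denom else 0.0
    return sum(per_class.values()) / len(per_class), per_class


# Hypothetical toy example: 8 majority samples (class 0), 2 rare samples (class 4),
# and a classifier that always predicts the majority class.
y_true = [0] * 8 + [4] * 2
y_pred = [0] * 10
score, per_class = macro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f}, macro_f1={score:.4f}, per_class={per_class}")
```

In this toy case accuracy is 0.80 while Macro-F1 is about 0.44, because the rare class contributes an F1 of zero to the average.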

Published

05/18/2026

How to Cite

Yan, Y., Chen, L., Zhang, H., Chen, C., Liang, L., & Yang, M. (2026). SAGE: A Unified Evaluation Framework for Data Augmentation and Few-Shot Learning on Small and Imbalanced Tabular Datasets. Informatica, 50(13). https://doi.org/10.31449/inf.v50i13.12581

Issue

Vol. 50 No. 13 (2026): Online-only issue

Section

Online-only

License

Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.

All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.

Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.
