TokenSeek: Memory Efficient Fine Tuning via Instance-Aware Token Ditching

Runjia Zeng1, Qifan Wang2, Qiang Guan3, Ruixiang Tang4, Lifu Huang5, Zhenting Wang6, Xueling Zhang1, Cheng Han7, Dongfang Liu1†

1Rochester Institute of Technology, 2Meta AI, 3Kent State University, 4Rutgers University,
5UC Davis, 6Accenture, 7University of Missouri-Kansas City
†Corresponding author

Figure: Memory Efficient Fine Tuning via Instance-Aware Token Ditching.

Abstract

Fine-tuning has become the de facto approach for adapting large language models (LLMs) to downstream tasks, but the large training memory footprint inherited from LLMs makes the process inefficient. Among existing memory-efficient approaches, activation-related optimization has proven particularly effective, as activations consistently dominate overall memory consumption. Although prior work offers various activation optimization strategies, their data-agnostic nature ultimately leads to ineffective and unstable fine-tuning.
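As a rough sanity check on the claim that activations dominate training memory (this is not the paper's measurement protocol, and the layer size, batch, and sequence length below are arbitrary illustrative choices), the PyTorch snippet uses saved-tensor hooks to tally how many bytes autograd stashes for the backward pass of a single transformer layer:

import torch
import torch.nn as nn

# A single stand-in transformer layer; real LLM layers are larger still.
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
x = torch.randn(4, 1024, 512, requires_grad=True)  # (batch, tokens, hidden)

param_bytes = sum(p.numel() * p.element_size() for p in layer.parameters())

saved_bytes = 0
def count_saved(t):
    # Called once per tensor autograd saves for backward; tensors saved by
    # several ops are counted each time, so this is a rough upper estimate.
    global saved_bytes
    saved_bytes += t.numel() * t.element_size()
    return t

with torch.autograd.graph.saved_tensors_hooks(count_saved, lambda t: t):
    layer(x).sum().backward()

print(f"parameters:  {param_bytes / 2**20:6.1f} MiB")
print(f"activations: {saved_bytes / 2**20:6.1f} MiB saved for backward")

On a typical PyTorch build the saved activations exceed the parameter memory by a wide margin, and the gap grows with batch size and sequence length; this per-token activation cost is what TokenSeek targets.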

In this paper, we propose TokenSeek, a universal plug-in for transformer-based models that performs instance-aware token seeking and ditching, achieving substantial fine-tuning memory savings (e.g., requiring only 14.8% of the memory on Llama 3.2 1B) with on-par or even better performance. Furthermore, our interpretable token seeking process reveals the underlying reasons for its effectiveness, offering valuable insights for future research on token efficiency.
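The specific seeking criterion is the paper's contribution, so the sketch below is only a generic illustration of the instance-aware token-ditching idea: each sequence scores its own tokens (here with a placeholder hidden-state-norm score, which is an assumption, not TokenSeek's criterion) and keeps only the top-scoring ones, so fewer activations are stored for the backward pass.

import torch

def ditch_tokens(hidden_states, attention_mask, keep_ratio=0.25):
    """Keep the top-`keep_ratio` tokens of each instance.

    hidden_states:  (batch, tokens, hidden)
    attention_mask: (batch, tokens), 1 for real tokens, 0 for padding
    Scoring by hidden-state L2 norm is a placeholder; TokenSeek's actual
    importance criterion is described in the paper.
    """
    scores = hidden_states.norm(dim=-1)                       # (B, T)
    scores = scores.masked_fill(attention_mask == 0, float("-inf"))

    num_keep = max(1, int(hidden_states.size(1) * keep_ratio))
    keep_idx = scores.topk(num_keep, dim=1).indices           # per-instance picks
    keep_idx, _ = keep_idx.sort(dim=1)                        # keep original order

    kept = torch.gather(
        hidden_states, 1,
        keep_idx.unsqueeze(-1).expand(-1, -1, hidden_states.size(-1)),
    )
    return kept, keep_idx

# Example: 2 sequences of 16 tokens, hidden size 64 -> 4 tokens kept each.
h = torch.randn(2, 16, 64)
mask = torch.ones(2, 16, dtype=torch.long)
kept, idx = ditch_tokens(h, mask)
print(kept.shape)  # torch.Size([2, 4, 64])

Because different instances select different tokens, the pruning is data-dependent rather than fixed, which is the property the abstract credits for avoiding the instability of data-agnostic activation optimizations.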

P.S. The video and slides are on the way!!


BibTeX

If you find our work useful, please consider citing our paper:

@inproceedings{zeng2026tokenseek,
  title={TokenSeek: Memory Efficient Fine Tuning via Instance-Aware Token Ditching},
  author={Zeng, Runjia and Wang, Qifan and Guan, Qiang and Tang, Ruixiang and Huang, Lifu and Wang, Zhenting and Zhang, Xueling and Han, Cheng and Liu, Dongfang},
  booktitle={ICLR},
  year={2026}
}