DELIP

DELIP: Dense Enhanced Language-Image Pretraining

News 📢

Introduction

In this project, we gathered a large number of high-quality <image, dense caption> pairs. Initializing from the CLIP model, we conduct continued pre-training to enhance the model's capabilities. A key improvement in our method is raising the text token limit from 77 to 2k; this expansion allows our model to learn more detailed and precise alignments between images and text.
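Extending a pretrained text encoder's context from 77 to 2k tokens requires initializing the new, longer positional-embedding table from the original 77 positions. The README does not specify how DELIP does this, but a common approach is linear interpolation of the pretrained embeddings; the sketch below illustrates that idea (the function name and dimensions are illustrative, not DELIP's actual code):

```python
import torch
import torch.nn.functional as F

def extend_positional_embeddings(pos_emb: torch.Tensor, new_len: int) -> torch.Tensor:
    """Interpolate a (old_len, dim) positional-embedding table to (new_len, dim).

    This is one common way to seed a longer context window from pretrained
    weights; DELIP's actual initialization scheme is not described here.
    """
    # (old_len, dim) -> (1, dim, old_len) so we can interpolate along the sequence axis
    x = pos_emb.t().unsqueeze(0)
    x = F.interpolate(x, size=new_len, mode="linear", align_corners=False)
    return x.squeeze(0).t()  # back to (new_len, dim)

# Example: CLIP's text encoder has 77 positions; extend to 2048.
old = torch.randn(77, 512)
new = extend_positional_embeddings(old, 2048)
print(new.shape)  # torch.Size([2048, 512])
```

After re-initializing the table this way, continued pre-training lets the model adapt the interpolated positions to real long captions.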

Citation

If you find DELIP useful in your research, please consider citing our work:

@misc{DELIP2023,
  title={DELIP: Dense Enhanced Language-Image Pretraining},
  author={Zhiyuan Fan and Zhihong Chen and Benyou Wang},
  year={2023},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/Elvisambition/DELIP}}
}