Vulnerability of Text-to-Image Models to Prompt Template Stealing: A Differential Evolution Approach

Yurong Wu1,3*, Fangwen Mu1,3*, Qiuhong Zhang1,3*, Jinjing Zhao4, Xinrun Xu1,3, Lingrui Mei3, Yang Wu3, Lin Shi1,3, Junjie Wang1,3, Zhiming Ding1,3, Yiwei Wang2
1Institute of Software, Chinese Academy of Sciences
2University of California at Merced
3University of Chinese Academy of Sciences
4The University of Sydney

*Indicates Equal Contribution

Abstract

Prompt trading has emerged as a significant intellectual property concern in recent years, where vendors entice users by showcasing sample images before selling prompt templates that can generate similar images. This work investigates a critical security vulnerability: attackers can steal prompt templates using only a limited number of sample images. To investigate this threat, we introduce Prism, a prompt-stealing benchmark consisting of 50 templates and 450 images, organized into Easy and Hard difficulty levels. To identify the vulnerability of VLMs to prompt stealing, we propose EvoStealer, a novel template-stealing method that operates without model fine-tuning by leveraging differential evolution algorithms. The system first initializes population sets using multimodal large language models (MLLMs) based on predefined patterns, then iteratively generates enhanced offspring through MLLMs. During evolution, EvoStealer identifies common features across offspring to derive generalized templates. Our comprehensive evaluation conducted across open-source (InternVL2-26B) and closed-source models (GPT-4o and GPT-4o-mini) demonstrates that EvoStealer's stolen templates can reproduce images highly similar to the originals and effectively generalize to other subjects, significantly outperforming baseline methods with an average improvement of over 10%. Moreover, our cost analysis reveals that EvoStealer achieves template stealing with negligible computational expense. Our code and dataset are available at https://github.com/whitepagewu/evostealer.

Prompt template stealing


Top: Creators develop prompt templates that conform to the desired stylistic requirements by progressively incorporating appropriate stylistic modifiers. During this iterative process, it is crucial for creators to test the templates across diverse subjects to ensure their robustness and applicability. Once the template design is finalized, creators upload a set of publicly accessible sample images alongside the corresponding prompt template to prompt marketplaces, where the template becomes available for purchase. Bottom: Malicious actors may analyze the limited number of publicly accessible sample images to deduce and unlawfully replicate the underlying prompt template. Using the stolen template, attackers can combine it with custom subjects to generate derivative images, thereby violating the creator's intellectual property rights and undermining their creative efforts.

Evolution process of EvoStealer


EvoStealer first employs multimodal large language models (MLLMs) to extract the triplet from the input images. The system then refines the extracted content through a differential evolution algorithm, which iteratively optimizes the results. The evolutionary process consists of several key stages: identifying differences and commonalities, performing mutation, introducing mutation additions, and executing crossover operations. Finally, EvoStealer applies a fitness function to evaluate the generated offspring and select the optimal results, ensuring the effectiveness of the extracted and refined content.
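The mutation/crossover/selection loop above can be sketched as a set-based differential evolution over style modifiers. This is a minimal illustration, not the paper's implementation: it assumes templates are represented as sets of modifier tokens and uses Jaccard similarity as a stand-in for the image-similarity fitness; in EvoStealer the mutation and crossover steps are performed by an MLLM rather than set operations.

```python
import random

def fitness(template, target):
    # Jaccard similarity between modifier sets; a stand-in for
    # comparing images generated from the candidate vs. the samples.
    return len(template & target) / len(template | target)

def mutate(a, b, c):
    # Set-based analogue of DE mutation a + (b - c): keep a's modifiers
    # and add those present in b but absent from c (their "difference").
    return a | (b - c)

def crossover(parent, mutant, cr=0.5, rng=random):
    # Keep modifiers common to both (the "commonalities"); include the
    # rest with probability cr.
    child = set()
    for tok in parent | mutant:
        if tok in parent and tok in mutant:
            child.add(tok)
        elif rng.random() < cr:
            child.add(tok)
    return child or parent

def evolve(population, target, generations=30, rng=None):
    rng = rng or random.Random(0)
    pop = [set(p) for p in population]
    for _ in range(generations):
        new_pop = []
        for i, parent in enumerate(pop):
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            child = crossover(parent, mutate(a, b, c), rng=rng)
            # Greedy selection: keep the fitter of parent and offspring,
            # so per-individual fitness is non-decreasing.
            new_pop.append(max(parent, child, key=lambda t: fitness(t, target)))
        pop = new_pop
    return max(pop, key=lambda t: fitness(t, target))
```

With a population of at least four candidate templates, repeated generations drive the best candidate's fitness monotonically upward under the greedy selection rule, mirroring the upward fitness trend reported for EvoStealer.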

Performance


EvoStealer is evaluated on the Prism dataset to assess the quality of its extracted and optimized prompt templates. The results show that both the fitness score and the quality of the best generated prompt templates trend consistently upward, whether the data is in-domain or out-of-domain. These findings demonstrate that EvoStealer effectively optimizes and generates high-quality prompt templates across varying data distributions, highlighting its robustness and adaptability.

BibTeX

@article{wu2025vulnerability,
        title={Vulnerability of Text-to-Image Models to Prompt Template Stealing: A Differential Evolution Approach},
        author={Wu, Yurong and Mu, Fangwen and Zhang, Qiuhong and Zhao, Jinjing and Xu, Xinrun and Mei, Lingrui and Wu, Yang and Shi, Lin and Wang, Junjie and Ding, Zhiming and others},
        journal={arXiv preprint arXiv:2502.14285},
        year={2025}
}