PC Environment
Windows 11
RTX 3080 Laptop (16 GB VRAM)
CUDA 11.8
Python 3.12
Python Environment Setup
pip install torch==2.5.1+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install gguf
pip install git+https://github.com/huggingface/diffusers
pip install accelerate transformers protobuf sentencepiece
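To confirm that pip picked up the CUDA 11.8 build rather than the CPU-only one, a quick check like the following works (the commented values are what the commands above should produce, not output copied from this machine):

import torch

print(torch.__version__)              # expect 2.5.1+cu118
print(torch.version.cuda)             # expect 11.8
print(torch.cuda.is_available())      # expect True
print(torch.cuda.get_device_name(0))  # expect the RTX 3080 Laptop GPU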
Results
I tried three quantization levels; the loss in image quality was negligible in every case.

flux1-dev-Q8_0.gguf
text_encoder:
torch.cuda.max_memory_allocated: 9.33 GB
transformer:
torch.cuda.max_memory_allocated: 14.53 GB
time: 149.64 sec
GPU 0 - Used memory: 15.98/16.00 GB
flux1-dev-Q6_K.gguf
text_encoder:
torch.cuda.max_memory_allocated: 9.33 GB
transformer:
torch.cuda.max_memory_allocated: 11.79 GB
time: 168.83 sec
GPU 0 - Used memory: 14.55/16.00 GB
flux1-dev-Q4_K_S.gguf
text_encoder:
torch.cuda.max_memory_allocated: 9.33 GB
transformer:
torch.cuda.max_memory_allocated: 9.33 GB
time: 147.56 sec
GPU 0 - Used memory: 11.51/16.00 GB
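Summarizing the three runs (all numbers taken from the logs above):

GGUF file               peak VRAM (transformer)   time         GPU used memory
flux1-dev-Q8_0.gguf     14.53 GB                  149.64 sec   15.98/16.00 GB
flux1-dev-Q6_K.gguf     11.79 GB                  168.83 sec   14.55/16.00 GB
flux1-dev-Q4_K_S.gguf   9.33 GB                   147.56 sec   11.51/16.00 GB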
Python Script
import gc

import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

from decorator import gpu_monitor, time_monitor


def flush():
    gc.collect()
    torch.cuda.empty_cache()


@gpu_monitor(interval=0.5)
@time_monitor
def main():
    # downloaded from https://huggingface.co/city96/FLUX.1-dev-gguf
    gguf_file = "flux1-dev-Q8_0.gguf"
    model_id = "black-forest-labs/Flux.1-Dev"

    # Stage 1: load only the text encoders and encode the prompt
    pipeline = FluxPipeline.from_pretrained(
        model_id,
        transformer=None,
        vae=None,
        torch_dtype=torch.bfloat16
    ).to("cuda")

    prompt = "A cat holding a sign that says hello world"

    with torch.no_grad():
        prompt_embeds, pooled_prompt_embeds, text_ids = pipeline.encode_prompt(
            prompt=prompt,
            prompt_2=None,
        )
    print("text_encoder:")
    print(f"torch.cuda.max_memory_allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")

    # Free the text encoders before loading the transformer
    del pipeline
    flush()

    # Stage 2: load the GGUF-quantized transformer and generate the image
    transformer = FluxTransformer2DModel.from_single_file(
        gguf_file,
        quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
        torch_dtype=torch.bfloat16
    )
    pipeline = FluxPipeline.from_pretrained(
        model_id,
        transformer=transformer,
        text_encoder=None,
        text_encoder_2=None,
        tokenizer=None,
        tokenizer_2=None,
        torch_dtype=torch.bfloat16
    ).to("cuda")

    image = pipeline(
        prompt_embeds=prompt_embeds,
        pooled_prompt_embeds=pooled_prompt_embeds,
        generator=torch.manual_seed(0)
    ).images[0]

    save_file = gguf_file.replace(".gguf", ".jpg")
    image.save(save_file)

    print("transformer:")
    print(f"torch.cuda.max_memory_allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")


if __name__ == "__main__":
    main()
Other
The benchmarks were taken with the monitoring script described in this post: touch-sp.hatenablog.com
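For readers who do not want to open the linked post, here is a minimal sketch of decorators with the same interface as the gpu_monitor and time_monitor imported above. This is an illustrative approximation, not the exact code from the linked post; the polling approach and print formats are assumptions.

# Illustrative approximations of the monitoring decorators used above.
import functools
import threading
import time

import torch


def time_monitor(func):
    # Print the wall-clock time the wrapped function takes.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"time: {time.perf_counter() - start:.2f} sec")
        return result
    return wrapper


def gpu_monitor(interval=0.5):
    # Sample GPU memory usage every `interval` seconds on a background
    # thread and report the peak when the wrapped function returns.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            peak = 0
            stop = threading.Event()

            def poll():
                nonlocal peak
                while not stop.is_set():
                    free, total = torch.cuda.mem_get_info()
                    peak = max(peak, total - free)
                    time.sleep(interval)

            thread = threading.Thread(target=poll, daemon=True)
            thread.start()
            try:
                return func(*args, **kwargs)
            finally:
                stop.set()
                thread.join()
                total = torch.cuda.mem_get_info()[1]
                print(f"GPU 0 - Used memory: {peak / 1024**3:.2f}/{total / 1024**3:.2f} GB")
        return wrapper
    return decorator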