From 4e3a3a78a3e68cf3bc30e9fffcf053110595fa10 Mon Sep 17 00:00:00 2001
From: Philipp Emanuel Weidmann
Date: Fri, 22 May 2026 14:51:24 +0530
Subject: [PATCH] docs: update README

---
 README.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 94c24a5..708927d 100644
--- a/README.md
+++ b/README.md
@@ -116,8 +116,9 @@ a configuration file.
 
 At the start of a program run, Heretic benchmarks the system to determine
 the optimal batch size to make the most of the available hardware.
-On an RTX 3090, with the default configuration, decensoring Llama-3.1-8B-Instruct
-takes about 45 minutes. Note that Heretic supports model quantization with
+On an RTX 3090, with the default configuration, decensoring
+[Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
+takes about 20-30 minutes. Note that Heretic supports model quantization with
 bitsandbytes, which can drastically reduce the amount of VRAM required to process
 models. Set the `quantization` option to `bnb_4bit` to enable quantization.
 