# ComfyUI ExLlamaV2 Nodes
A simple local text generator for [ComfyUI](https://github.com/comfyanonymous/ComfyUI) using [ExLlamaV2](https://github.com/turboderp/exllamav2).

## Installation
Clone the repository to `custom_nodes` and install the requirements:
```
git clone https://github.com/Zuellni/ComfyUI-ExLlama-Nodes custom_nodes/ComfyUI-ExLlamaV2-Nodes
pip install -r custom_nodes/ComfyUI-ExLlamaV2-Nodes/requirements.txt
```

On Windows, use the prebuilt wheels for [ExLlamaV2](https://github.com/turboderp/exllamav2/releases/latest) and [FlashAttention](https://github.com/bdashore3/flash-attention/releases/latest):
```
pip install exllamav2-X.X.X+cuXXX.torch2.X.X-cp3XX-cp3XX-win_amd64.whl
pip install flash_attn-X.X.X+cuXXX.torch2.X.X-cp3XX-cp3XX-win_amd64.whl
```

## Usage
Only EXL2, 4-bit GPTQ, and unquantized models are supported. You can find them on [Hugging Face](https://huggingface.co).

To use a model with the nodes, clone its repository with `git` or manually download all of its files and place them in `models/llm`. For example, to download the 6-bit [Llama-3-8B-Instruct](https://huggingface.co/turboderp/Llama-3-8B-Instruct-exl2), use the following commands:
```
git lfs install
git clone https://huggingface.co/turboderp/Llama-3-8B-Instruct-exl2 -b 6.0bpw models/llm/Llama-3-8B-Instruct-exl2-6.0bpw
```

> [!TIP]
> You can add your own `llm` path to the [extra_model_paths.yaml](https://github.com/comfyanonymous/ComfyUI/blob/master/extra_model_paths.yaml.example) file and put the models there instead.
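As a rough sketch, an `extra_model_paths.yaml` entry pointing the nodes at a custom `llm` folder could look like the snippet below. The `my_models` name and the paths are placeholders, so check the linked example file for the exact layout ComfyUI expects:
```
# Hypothetical extra_model_paths.yaml entry; the section name and paths are placeholders.
my_models:
    base_path: /path/to/my_models/
    llm: llm/
```
With an entry like this, models placed in `/path/to/my_models/llm/` should be picked up in addition to those in the default `models/llm` directory.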
## Nodes

| Node | Parameter | Description |
| --- | --- | --- |
| **Loader** | | Loads models from the `llm` directory. |
| | `cache_bits` | A lower value reduces VRAM usage but also affects generation speed and quality. |
| | `fast_tensors` | Enabling reduces RAM usage and speeds up model loading. |
| | `flash_attention` | Enabling reduces VRAM usage. Not supported on cards with compute capability below 8.0. |
| | `max_seq_len` | Max context length; a higher value means higher VRAM usage. 0 defaults to the model config. |
| **Generator** | | Generates text based on the given prompt. Refer to SillyTavern for sampler parameters. |
| | `unload` | Unloads the model after each generation to reduce VRAM usage. |
| | `stop_conditions` | A list of strings to stop generation on, e.g. `["\n"]` to stop on newline. Leave empty to stop only on the EOS token (see the example after the table). |
| | `max_tokens` | Max new tokens; 0 uses all available context. |
| **Previewer** | | Displays generated text in the UI. |
| **Replacer** | | Replaces variable names in brackets, e.g. `[a]`, with their values. |
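As a small illustration of the `stop_conditions` format described above, the following value would stop generation on a newline or on a hypothetical `User:` turn marker (both strings are examples, not defaults):
```
["\n", "User:"]
```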
## Workflow
An example workflow is embedded in the image below and can be opened in ComfyUI.

![workflow](https://github.com/Zuellni/ComfyUI-ExLlama-Nodes/assets/123005779/bf688acb-6f7a-4410-98ff-cf22b6937ae7)