# llama_test

**Repository Path**: sparkle__code__guy/llama_test

## Basic Information

- **Project Name**: llama_test
- **Description**: Try out the LLaMA 7B model and showcase its results
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2023-03-17
- **Last Updated**: 2023-05-11

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

## Usage

### 1. Import the library and choose a model size

```python
import llama

MODEL = 'decapoda-research/llama-7b-hf'
```

**We currently support the following model sizes.** Options for `MODEL`:

- `decapoda-research/llama-7b-hf`
- `decapoda-research/llama-13b-hf`
- `decapoda-research/llama-30b-hf`
- `decapoda-research/llama-65b-hf`

**Note:** The model size refers to the number of parameters in the model. The larger the model, the more accurate its outputs tend to be, but the slower, heavier, and more expensive it is to run.

### 2. Load the tokenizer and model

```python
tokenizer = llama.LLaMATokenizer.from_pretrained(MODEL)
model = llama.LLaMAForCausalLM.from_pretrained(MODEL)
model.to('cuda')
```

### 3. Encode the prompt

> For example, we will use the prompt: "Yo mama"
>
> We will use the `tokenizer` to encode the prompt into a tensor of integers.

```python
PROMPT = 'Yo mama'
encoded = tokenizer(PROMPT, return_tensors="pt")
```

### 4. Generate the output

> We will use the `model` to generate the output.

```python
generated = model.generate(encoded["input_ids"].cuda())[0]
```

### 5. Decode the output

```python
decoded = tokenizer.decode(generated)
```

### 6. Print the output

```python
print(decoded)
```
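
### Putting it all together

The steps above can be combined into a single script. The sketch below is a minimal example that assumes the `llama` package exposes the same `LLaMATokenizer` / `LLaMAForCausalLM` interface used in the snippets above; the generation keyword arguments (`max_new_tokens`, `do_sample`, `temperature`) are standard Hugging Face `generate()` options and are shown here only as an illustrative assumption, not as part of the original recipe.

```python
# Minimal end-to-end sketch combining steps 1-6, assuming the `llama`
# package mirrors the Hugging Face transformers API shown above.
import llama

MODEL = 'decapoda-research/llama-7b-hf'
PROMPT = 'Yo mama'

# Load the tokenizer and model, then move the model to the GPU.
tokenizer = llama.LLaMATokenizer.from_pretrained(MODEL)
model = llama.LLaMAForCausalLM.from_pretrained(MODEL)
model.to('cuda')

# Encode the prompt into input token ids.
encoded = tokenizer(PROMPT, return_tensors="pt")

# Generate a continuation; take the first (and only) sequence in the batch.
generated = model.generate(
    encoded["input_ids"].cuda(),
    max_new_tokens=64,   # assumed setting: cap the length of the continuation
    do_sample=True,      # assumed setting: sample instead of greedy decoding
    temperature=0.8,     # assumed setting: softens the sampling distribution
)[0]

# Decode the generated token ids back into text and print.
decoded = tokenizer.decode(generated)
print(decoded)
```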
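
If the full-precision checkpoint does not fit in your GPU memory, one common workaround is to load the weights in half precision. The snippet below is a hedged sketch: `torch_dtype` is a standard `transformers.from_pretrained` argument, and whether this particular `llama` fork forwards it unchanged is an assumption.

```python
# Assumption: from_pretrained accepts the standard transformers
# `torch_dtype` argument; loading in float16 roughly halves GPU memory use.
import torch
import llama

MODEL = 'decapoda-research/llama-7b-hf'
model = llama.LLaMAForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16)
model.to('cuda')
```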