# llama_test

**Repository Path**: sparkle__code__guy/llama_test

## Basic Information

- **Project Name**: llama_test
- **Description**: Try out the LLaMA 7B model and showcase its results
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2023-03-17
- **Last Updated**: 2023-05-11

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

## Usage

### 1. Import the library and choose a model size

```python
import llama

MODEL = 'decapoda-research/llama-7b-hf'
```

**We currently support the following model sizes.** Options for `MODEL`:

- `decapoda-research/llama-7b-hf`
- `decapoda-research/llama-13b-hf`
- `decapoda-research/llama-30b-hf`
- `decapoda-research/llama-65b-hf`

**Note:** The model size refers to the number of parameters in the model. The larger the model, the more accurate its outputs tend to be, but the slower, heavier, and more expensive it is to run.

### 2. Load the tokenizer and model

```python
tokenizer = llama.LLaMATokenizer.from_pretrained(MODEL)
model = llama.LLaMAForCausalLM.from_pretrained(MODEL)
model.to('cuda')
```

### 3. Encode the prompt

> For example, we will use the prompt: "Yo mama"
>
> We will use the `tokenizer` to encode the prompt into a tensor of integers.

```python
PROMPT = 'Yo mama'
encoded = tokenizer(PROMPT, return_tensors="pt")
```

### 4. Generate the output

> We will use the `model` to generate the output.

```python
generated = model.generate(encoded["input_ids"].cuda())[0]
```

### 5. Decode the output

```python
decoded = tokenizer.decode(generated)
```

### 6. Print the output

```python
print(decoded)
```
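
### Putting it all together

The steps above can be combined into a single script. The sketch below is a minimal example that assumes the `llama` package exposes the same `LLaMATokenizer` / `LLaMAForCausalLM` interface used in the snippets above; the generation keyword arguments (`max_new_tokens`, `do_sample`, `temperature`) are standard Hugging Face `generate()` options and are shown here only as an illustrative assumption, not as part of the original recipe.

```python
# Minimal end-to-end sketch combining steps 1-6, assuming the `llama`
# package mirrors the Hugging Face transformers API shown above.
import llama

MODEL = 'decapoda-research/llama-7b-hf'
PROMPT = 'Yo mama'

# Load the tokenizer and model, then move the model to the GPU.
tokenizer = llama.LLaMATokenizer.from_pretrained(MODEL)
model = llama.LLaMAForCausalLM.from_pretrained(MODEL)
model.to('cuda')

# Encode the prompt into input token ids.
encoded = tokenizer(PROMPT, return_tensors="pt")

# Generate a continuation; take the first (and only) sequence in the batch.
generated = model.generate(
    encoded["input_ids"].cuda(),
    max_new_tokens=64,   # assumed setting: cap the length of the continuation
    do_sample=True,      # assumed setting: sample instead of greedy decoding
    temperature=0.8,     # assumed setting: softens the sampling distribution
)[0]

# Decode the generated token ids back into text and print.
decoded = tokenizer.decode(generated)
print(decoded)
```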
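
If the full-precision checkpoint does not fit in your GPU memory, one common workaround is to load the weights in half precision. The snippet below is a hedged sketch: `torch_dtype` is a standard `transformers.from_pretrained` argument, and whether this particular `llama` fork forwards it unchanged is an assumption.

```python
# Assumption: from_pretrained accepts the standard transformers
# `torch_dtype` argument; loading in float16 roughly halves GPU memory use.
import torch
import llama

MODEL = 'decapoda-research/llama-7b-hf'
model = llama.LLaMAForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16)
model.to('cuda')
```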