# webarena
**Repository Path**: jasonlp/webarena
## Basic Information
- **Project Name**: webarena
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: 33-adding-option-to-clear-text
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-10
- **Last Updated**: 2025-04-10
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# WebArena: A Realistic Web Environment for Building Autonomous Agents
WebArena is a standalone, self-hostable web environment for building autonomous agents.

## Roadmap
- [ ] In-house end-to-end evaluation. We are working on an API that accepts predicted actions from any interface and then returns the subsequent observation.
- [ ] Support more agents with different prompting mechanisms such as [ASH](https://arxiv.org/pdf/2305.14257.pdf).
## News
* [8/4/2023] Added the instructions and the Docker resources to host your own WebArena environment. Check out [this page](environment_docker/README.md) for details.
* [7/29/2023] Added [a well-commented script](minimal_example.py) to walk through the environment setup.
## Install
```bash
# Python 3.10+
conda create -n webarena python=3.10; conda activate webarena
pip install -r requirements.txt
playwright install
pip install -e .
# optional, dev only
pip install -e ".[dev]"
mypy --install-types --non-interactive browser_env agents evaluation_harness
pip install pre-commit
pre-commit install
```
## Quick Walkthrough
Check out [this script](minimal_example.py) for a quick walkthrough of how to set up the browser environment and interact with it using the demo sites we host. This script is for educational purposes only; to perform *reproducible* experiments, please see the next section. In a nutshell, using WebArena is very similar to using OpenAI Gym. The following code snippet shows how to interact with the environment.
```python
import random

from browser_env import ScriptBrowserEnv, create_id_based_action

# init the environment
env = ScriptBrowserEnv(
    headless=False,
    observation_type="accessibility_tree",
    current_viewport_only=True,
    viewport_size={"width": 1280, "height": 720},
)
# prepare the environment for a configuration defined in a json file
config_file = "config_files/0.json"
obs, info = env.reset(options={"config_file": config_file})
# get the text observation (e.g., html, accessibility tree) through obs["text"]
# create a random action; the element id must be interpolated into the string
id = random.randint(0, 1000)
action = create_id_based_action(f"click [{id}]")
# take the action
obs, _, terminated, _, info = env.step(action)
```
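The Gym-style interaction pattern can also be tried without a browser. The following is a minimal sketch, assuming only that `reset` returns `(obs, info)` and `step` returns the same five-tuple as in the snippet above. `StubEnv` and `run_episode` are hypothetical stand-ins written for illustration and are not part of WebArena.

```python
from __future__ import annotations

from typing import Any


class StubEnv:
    """Hypothetical stand-in for ScriptBrowserEnv; it only mimics the call shapes."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.steps_taken = 0

    def reset(self, options: dict[str, Any] | None = None) -> tuple[dict[str, str], dict[str, Any]]:
        self.steps_taken = 0
        return {"text": "page at step 0"}, {}

    def step(self, action: str) -> tuple[dict[str, str], float, bool, bool, dict[str, Any]]:
        # Terminate once the step budget is used up, regardless of the action.
        self.steps_taken += 1
        terminated = self.steps_taken >= self.max_steps
        obs = {"text": f"page at step {self.steps_taken}"}
        return obs, 0.0, terminated, False, {}


def run_episode(env: StubEnv, actions: list[str]) -> list[str]:
    """Feed actions to the environment until it terminates; return the text observations."""
    obs, info = env.reset(options={"config_file": "config_files/0.json"})
    seen = [obs["text"]]
    for action in actions:
        obs, _, terminated, _, info = env.step(action)
        seen.append(obs["text"])
        if terminated:
            break
    return seen


trajectory = run_episode(StubEnv(max_steps=2), ["click [1]", "click [2]", "click [3]"])
print(trajectory)
```

With the real environment, the same loop shape should apply once `StubEnv` is replaced by `ScriptBrowserEnv` and the action strings are wrapped with `create_id_based_action`.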
## End-to-end Evaluation
1. Set up the standalone environment.
Please check out [this page](environment_docker/README.md) for details.
2. Configure the URLs for each website.
```bash
export SHOPPING=":7770"
export SHOPPING_ADMIN=":7780/admin"
export REDDIT=":9999"
export GITLAB=":8023"
export MAP=":3000"
export WIKIPEDIA=":8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing"
export HOMEPAGE=":4399" # this is a placeholder
```
> You are encouraged to update the environment variables in the [GitHub workflow](.github/workflows/tests.yml#L7) to ensure the correctness of the unit tests.
3. Generate the config file for each test example.
```bash
python scripts/generate_test_data.py
```
You will see `*.json` files generated in the [config_files](./config_files) folder. Each file contains the configuration for one test example.
4. Obtain the auto-login cookies for all websites
```bash
mkdir -p ./.auth
python browser_env/auto_login.py
```
5. Export `OPENAI_API_KEY=your_key`. A valid OpenAI API key starts with `sk-`.
6. Launch the evaluation
```bash
# p_cot_id_actree_2s.json is the reasoning agent prompt we used in the paper
python run.py \
  --instruction_path agent/prompts/jsons/p_cot_id_actree_2s.json \
  --test_start_idx 0 \
  --test_end_idx 1 \
  --model gpt-3.5-turbo \
  --result_dir
```
This script will run the first example with the GPT-3.5 reasoning agent. The trajectory will be saved in `/0.html`.
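Before launching a run, it can help to confirm that the variables from step 2 are set. The check below is a minimal sketch: `missing_site_vars` is a hypothetical helper, not part of the repository, and it only tests that each variable is non-empty, not that the site is reachable.

```python
import os
from typing import Mapping

# Variable names taken from the export commands in step 2.
SITE_VARS = ("SHOPPING", "SHOPPING_ADMIN", "REDDIT", "GITLAB", "MAP", "WIKIPEDIA", "HOMEPAGE")


def missing_site_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of site variables that are unset or blank in env."""
    return [name for name in SITE_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_site_vars()
    if missing:
        print("Unset site variables:", ", ".join(missing))
    else:
        print("All site variables are set.")
```

Passing the mapping as an argument keeps the helper easy to exercise with a plain dictionary instead of the real process environment.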
## Develop Your Prompt-based Agent
1. Define the prompts. We provide two baseline agents whose corresponding prompts are listed [here](./agent/prompts/raw). Each prompt is a dictionary with the following keys:
```python
prompt = {
    "intro": ,
    "examples": [
        (
            example_1_observation,
            example_1_response
        ),
        (
            example_2_observation,
            example_2_response
        ),
        ...
    ],
    "template": ,
    "meta_data": {
        "observation": ,
        "action_type": ,
        "keywords": ,
        "prompt_constructor": ,
        "action_splitter":
    }
}
```
2. Implement the prompt constructor. An example prompt constructor using Chain-of-thought/ReAct style reasoning is [here](./agent/prompts/prompt_constructor.py#L184). The prompt constructor is a class with the following methods:
* `construct`: constructs the input fed to an LLM
* `_extract_action`: given the generation from an LLM, extracts the phrase that corresponds to the action
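The two methods can be illustrated with a self-contained sketch. Everything here is an assumption made for illustration: the class name `SketchPromptConstructor`, the `{objective}` and `{observation}` template fields, the single-string prompt layout, and the `@@` action splitter. The constructors in `agent/prompts/prompt_constructor.py` may differ in all of these.

```python
import re


class SketchPromptConstructor:
    """Hypothetical constructor; not the class shipped in the repository."""

    def __init__(self, prompt: dict) -> None:
        self.intro = prompt["intro"]
        self.examples = prompt["examples"]
        self.template = prompt["template"]
        self.action_splitter = prompt["meta_data"]["action_splitter"]

    def construct(self, objective: str, observation: str) -> str:
        """Assemble intro, few-shot examples, and the current state into one prompt."""
        parts = [self.intro]
        for example_observation, example_response in self.examples:
            parts.append(f"{example_observation}\n{example_response}")
        parts.append(self.template.format(objective=objective, observation=observation))
        return "\n\n".join(parts)

    def _extract_action(self, response: str) -> str:
        """Return the text between the first pair of action splitters."""
        splitter = re.escape(self.action_splitter)
        match = re.search(rf"{splitter}(.+?){splitter}", response, flags=re.DOTALL)
        if match is None:
            raise ValueError(f"No action found in response: {response!r}")
        return match.group(1).strip()


# Illustrative prompt values; the splitter "@@" is an assumed placeholder.
prompt = {
    "intro": "You are a web agent.",
    "examples": [("OBSERVATION: [1] button 'Go'", "The next action is @@click [1]@@")],
    "template": "OBJECTIVE: {objective}\nOBSERVATION: {observation}",
    "meta_data": {"action_splitter": "@@"},
}
constructor = SketchPromptConstructor(prompt)
action = constructor._extract_action("I should open the menu. @@click [42]@@")
print(action)
```

Raising an error when no splitter pair is found, rather than returning an empty string, makes malformed model output visible to the caller.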
## Citation
If you use our environment or data, please cite our paper:
```
@article{zhou2023webarena,
title={WebArena: A Realistic Web Environment for Building Autonomous Agents},
author={Zhou, Shuyan and Xu, Frank F and Zhu, Hao and Zhou, Xuhui and Lo, Robert and Sridhar, Abishek and Cheng, Xianyi and Bisk, Yonatan and Fried, Daniel and Alon, Uri and others},
journal={arXiv preprint arXiv:2307.13854},
year={2023}
}
```