# webarena

**Repository Path**: jasonlp/webarena

## Basic Information

- **Project Name**: webarena
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: 33-adding-option-to-clear-text
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-10
- **Last Updated**: 2025-04-10

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# WebArena: A Realistic Web Environment for Building Autonomous Agents

WebArena is a standalone, self-hostable web environment for building autonomous agents.

Website • Paper

![Overview](media/overview.png)

## Roadmap

- [ ] In-house end-to-end evaluation. We are working on an API that accepts predicted actions from any interface and then returns the subsequent observation.
- [ ] Support more agents with different prompting mechanisms such as [ASH](https://arxiv.org/pdf/2305.14257.pdf).

## News

* [8/4/2023] Added the instructions and the docker resources to host your own WebArena environment. Check out [this page](environment_docker/README.md) for details.
* [7/29/2023] Added [a well-commented script](minimal_example.py) to walk through the environment setup.

## Install

```bash
# Python 3.10+
conda create -n webarena python=3.10; conda activate webarena
pip install -r requirements.txt
playwright install
pip install -e .

# optional, dev only
pip install -e ".[dev]"
mypy --install-types --non-interactive browser_env agents evaluation_harness
pip install pre-commit
pre-commit install
```

## Quick Walkthrough

Check out [this script](minimal_example.py) for a quick walkthrough on how to set up the browser environment and interact with it using the demo sites we host. This script is for educational purposes only; to perform *reproducible* experiments, please check out the next section.

In a nutshell, using WebArena is very similar to using OpenAI Gym. The following code snippet shows how to interact with the environment.
```python
import random

from browser_env import ScriptBrowserEnv, create_id_based_action

# init the environment
env = ScriptBrowserEnv(
    headless=False,
    observation_type="accessibility_tree",
    current_viewport_only=True,
    viewport_size={"width": 1280, "height": 720},
)

# prepare the environment for a configuration defined in a json file
config_file = "config_files/0.json"
obs, info = env.reset(options={"config_file": config_file})

# get the text observation (e.g., html, accessibility tree) through obs["text"]

# create a random action that clicks the element with a random id
element_id = random.randint(0, 1000)
action = create_id_based_action(f"click [{element_id}]")

# take the action
obs, _, terminated, _, info = env.step(action)
```

## End-to-end Evaluation

1. Set up the standalone environment. Please check out [this page](environment_docker/README.md) for details.
2. Configure the URLs for each website.
   ```bash
   export SHOPPING=":7770"
   export SHOPPING_ADMIN=":7780/admin"
   export REDDIT=":9999"
   export GITLAB=":8023"
   export MAP=":3000"
   export WIKIPEDIA=":8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing"
   export HOMEPAGE=":4399" # this is a placeholder
   ```
   > You are encouraged to update the environment variables in the [GitHub workflow](.github/workflows/tests.yml#L7) to ensure the correctness of unit tests.
3. Generate the config file for each test example.
   ```bash
   python scripts/generate_test_data.py
   ```
   You will see `*.json` files generated in the [config_files](./config_files) folder. Each file contains the configuration for one test example.
4. Obtain the auto-login cookies for all websites.
   ```bash
   mkdir -p ./.auth
   python browser_env/auto_login.py
   ```
5. Export your OpenAI API key with `export OPENAI_API_KEY=your_key`; a valid OpenAI API key starts with `sk-`.
6. Launch the evaluation.
   ```bash
   # p_cot_id_actree_2s.json is the reasoning agent prompt we used in the paper
   python run.py \
     --instruction_path agent/prompts/jsons/p_cot_id_actree_2s.json \
     --test_start_idx 0 \
     --test_end_idx 1 \
     --model gpt-3.5-turbo \
     --result_dir
   ```
   This script runs the first example with the GPT-3.5 reasoning agent. The trajectory will be saved in `/0.html`.

## Develop Your Prompt-based Agent

1. Define the prompts. We provide two baseline agents whose corresponding prompts are listed [here](./agent/prompts/raw). Each prompt is a dictionary with the following keys:
   ```python
   prompt = {
       "intro": ,
       "examples": [
           (
               example_1_observation,
               example_1_response
           ),
           (
               example_2_observation,
               example_2_response
           ),
           ...
       ],
       "template": ,
       "meta_data": {
           "observation": ,
           "action_type": ,
           "keywords": ,
           "prompt_constructor": ,
           "action_splitter": 
       }
   }
   ```
2. Implement the prompt constructor. An example prompt constructor using Chain-of-Thought/ReAct-style reasoning is [here](./agent/prompts/prompt_constructor.py#L184). The prompt constructor is a class with the following methods:
   * `construct`: constructs the input fed to an LLM
   * `_extract_action`: given the generation from an LLM, extracts the phrase that corresponds to the action

## Citation

If you use our environment or data, please cite our paper:

```
@article{zhou2023webarena,
  title={WebArena: A Realistic Web Environment for Building Autonomous Agents},
  author={Zhou, Shuyan and Xu, Frank F and Zhu, Hao and Zhou, Xuhui and Lo, Robert and Sridhar, Abishek and Cheng, Xianyi and Bisk, Yonatan and Fried, Daniel and Alon, Uri and others},
  journal={arXiv preprint arXiv:2307.13854},
  year={2023}
}
```
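To make the prompt-constructor interface from "Develop Your Prompt-based Agent" more concrete, here is a minimal, self-contained sketch of a class with `construct` and `_extract_action` methods. The class name `MiniPromptConstructor`, the `{objective}`/`{observation}` template slots, the `OBSERVATION:`/`RESPONSE:` labels, and the `###` action splitter are all hypothetical choices made for illustration; the repository's own constructor in `agent/prompts/prompt_constructor.py` is the reference and likely differs in its details.

```python
import re
from dataclasses import dataclass


@dataclass
class MiniPromptConstructor:
    """Hypothetical, simplified prompt constructor (illustration only).

    The fields loosely mirror keys of the prompt dictionary ("intro",
    "examples", "template", "action_splitter"); this is not the
    repository's implementation.
    """

    intro: str
    examples: list[tuple[str, str]]
    # Assumed template format with {objective} and {observation} slots.
    template: str
    # Hypothetical splitter that wraps the action in the LLM's response.
    action_splitter: str = "###"

    def construct(self, observation: str, objective: str) -> str:
        """Build a few-shot prompt: intro, worked examples, then the current query."""
        parts = [self.intro]
        for example_obs, example_response in self.examples:
            parts.append(f"OBSERVATION:\n{example_obs}\nRESPONSE:\n{example_response}")
        parts.append(self.template.format(objective=objective, observation=observation))
        return "\n\n".join(parts)

    def _extract_action(self, response: str) -> str:
        """Return the text between the first pair of action splitters."""
        splitter = re.escape(self.action_splitter)
        match = re.search(f"{splitter}(.*?){splitter}", response, re.DOTALL)
        if match is None:
            raise ValueError(f"Cannot parse an action from: {response!r}")
        return match.group(1).strip()


if __name__ == "__main__":
    constructor = MiniPromptConstructor(
        intro="You are an autonomous agent operating a web browser.",
        examples=[("[1] link 'Home'", "I should open the home page. ###click [1]###")],
        template="OBJECTIVE: {objective}\nOBSERVATION:\n{observation}",
    )
    print(constructor.construct("[5] button 'Search'", "Search for running shoes"))
    llm_output = "The search button has id 5. ###click [5]###"
    print(constructor._extract_action(llm_output))  # click [5]
```

Keeping prompt construction and action extraction in one class ties the splitter shown in the few-shot examples to the splitter used when parsing the model's output, so the two cannot silently drift apart.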