# Localizing Persona Representations in LLMs

We present a study on how and where personas – defined by distinct sets of human characteristics, values, and beliefs – are encoded in the representation space of large language models (LLMs). Using a range of dimension reduction and pattern recognition methods, we first identify the model layers that show the greatest divergence in encoding these representations. We then analyze the activations within a selected layer to examine how specific personas are encoded relative to others, including their shared and distinct embedding spaces. We find that, across multiple pre-trained decoder-only LLMs, the analyzed personas show large differences in representation space only within the final third of the decoder layers. We observe overlapping activations for specific ethical perspectives, suggesting a degree of polysemy. In contrast, political ideologies like conservatism and liberalism appear to be represented in more distinct regions. See more details on [arXiv](https://arxiv.org/pdf/2505.24539).

> [!CAUTION]
> The dataset used includes potentially offensive sample statements.

![overview](./overview.png)

## Repo organization

- See [notebooks/representation_extraction.ipynb](./notebooks/representation_extraction.ipynb) for loading models and tokenizers, extracting activations from statements, and visualizing dimension reduction techniques.
- See [notebooks/run_deepscan.ipynb](./notebooks/run_deepscan.ipynb) for examples of how to run DeepScan and details regarding hyperparameter definitions.
- See [notebooks/node_visualization.ipynb](./notebooks/node_visualization.ipynb) for examples of extracting node information from DeepScan output run files and visualizing it.
- Several auxiliary [utils](./utils) modules are set up for [layer-wise operations](./utils/utils_layers.py), [node extraction and filtering](./utils/utils_nodes.py), and [visualization](./utils/utils_viz.py).
- See the [data](./data) folder for statement examples.
- See the [output](./output) folder for DeepScan output run examples.
- See the [deepscan](./deepscan) folder for the basic DeepScan functionality used to run the experiments in [run_deepscan.ipynb](./notebooks/run_deepscan.ipynb).

## Citation

TBD

## Setup

```
$ git clone https://github.com/IBM/personas-llms-analysis.git
$ cd personas-llms-analysis
$ uv venv --python 3.10
$ uv sync
$ uv pip install -e .
$ uv run --with jupyter jupyter lab
```

## References

- [[Speakman 2018]](https://arxiv.org/abs/1810.08676) Speakman, S., Sridharan, S., Remy, S., et al. 2018. Subset scanning over neural network activations.
- [[Cintas 2021]](https://www.ijcai.org/proceedings/2020/0122.pdf) Cintas, C., Speakman, S., Akinwande, V., et al. 2021. Detecting adversarial attacks via subset scanning of autoencoder activations and reconstruction error. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (pp. 876-882).
- [[Rateike 2023]](https://arxiv.org/abs/2312.02798) Rateike, M., Cintas, C., Wamburu, J., Akumu, T., and Speakman, S. 2023. Weakly supervised detection of hallucinations in LLM activations. Socially Responsible Language Modelling Research (SoLaR) Workshop.
- [[Cintas 2025]](https://www.nature.com/articles/s41598-025-09717-1) Cintas, C., Das, P., Ross, J., et al. 2025. Property-driven localization and characterization in deep molecular representations. Scientific Reports 15, 29365.
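
As a rough illustration of the layer-selection idea from the overview at the top of this README (finding the layers where persona representations diverge most), the sketch below scores each layer by how far apart two groups of activation vectors sit. It is not the code from the notebooks: the function names are hypothetical, the activations are synthetic Gaussian noise standing in for real model outputs, and the centroid-distance ratio is a simple stand-in for the dimension reduction and pattern recognition methods used in the study.

```python
import math
import random


def centroid(vectors):
    """Mean vector of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def mean_distance_to(vectors, point):
    """Average Euclidean distance from each vector to `point`."""
    return sum(math.dist(v, point) for v in vectors) / len(vectors)


def separation_score(group_a, group_b):
    """Distance between group centroids divided by the average within-group spread."""
    ca, cb = centroid(group_a), centroid(group_b)
    spread = (mean_distance_to(group_a, ca) + mean_distance_to(group_b, cb)) / 2
    return math.dist(ca, cb) / (spread + 1e-12)


def most_divergent_layer(acts_a, acts_b):
    """acts_a / acts_b hold one list of activation vectors per layer, in the same layer order."""
    scores = [separation_score(a, b) for a, b in zip(acts_a, acts_b)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


def synthetic_activations(n_samples, dim, shift_per_layer, rng):
    """Hypothetical stand-in for extracted activations: unit Gaussian noise plus a per-layer offset."""
    return [
        [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n_samples)]
        for shift in shift_per_layer
    ]


rng = random.Random(0)
n_layers = 6
# Assumption for the demo: persona A stays centred at 0 in every layer,
# while persona B drifts away only in the last third of the layers.
shifts_a = [0.0] * n_layers
shifts_b = [0.0, 0.0, 0.0, 0.0, 2.0, 4.0]
acts_a = synthetic_activations(50, 8, shifts_a, rng)
acts_b = synthetic_activations(50, 8, shifts_b, rng)

best_layer, scores = most_divergent_layer(acts_a, acts_b)
print("separation per layer:", [round(s, 2) for s in scores])
print("most divergent layer index:", best_layer)
```

With real data, `acts_a` and `acts_b` would be replaced by per-layer activations extracted for two personas, and the score could be swapped for whichever divergence measure suits the analysis.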