# fair_flearn

**Repository Path**: giteewpu/fair_flearn

## Basic Information

- **Project Name**: fair_flearn
- **Description**: Fair Resource Allocation in Federated Learning (ICLR '20)
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 2
- **Created**: 2025-05-27
- **Last Updated**: 2025-05-27

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Fair Resource Allocation in Federated Learning


This repository contains the code and experiments for the paper:

> [Fair Resource Allocation in Federated Learning](https://openreview.net/forum?id=ByexElSYDr)
> 
> [ICLR '20](https://iclr.cc/)

 
## Preparation

### Download Dependencies

```
pip3 install -r requirements.txt
```


### Generate Datasets

See the `README` files in separate `data/$dataset` folders for instructions on preprocessing and/or sampling each dataset.

For example,

under ```fair_flearn/data/fmnist```, we clearly describe how to generate and preprocess the Fashion MNIST dataset.


**In order to run the following demo on the Vehicle dataset, please go to `fair_flearn/data/vehicle`, download, and generate the Vehicle dataset following the `README` file under that directory.**

## Get Started

### Example: the Vehicle dataset

*[We provide a quick demo on the Vehicle dataset here. Don't need to change any default parameters in any scripts.]*

First specify GPU ids (we can just use CPUs for Vehicle with a linear SVM)

```
export CUDA_VISIBLE_DEVICES=

```
Then go to the `fair_flearn` directory, and start running:

```
bash run.sh $dataset $method $data_partition_seed $q $sampling_device_method | tee $log
```
For Vehicle, `$dataset` is `vehicle`, `$data_partition_seed` can be set to 1, `q` is `0` for FedAvg, and `5` for q-FedAvg (the proposed objective). For sampling with weights proportional to the number of data points, `$sampling_device_method` is `2`; for uniform sampling (one of the baselines), `$sampling_device_method` is `1`. The exact command lines are as follows.

(1) Experiments to verify the fairness of the q-FFL objective, and compare with uniform sampling schemes:

```
mkdir log_vehicle
bash run.sh vehicle qffedavg 1 0 2 | tee log_vehicle/ffedavg_run1_q0
bash run.sh vehicle qffedavg 1 5 2 | tee log_vehicle/ffedavg_run1_q5
bash run.sh vehicle qffedavg 1 0 1 | tee log_vehicle/fedavg_uniform_run1

```

Plot to re-produce the results in the manuscript:

(we use `seaborn` to draw the fitting curves of accuracy distributions)

```
pip install seaborn
python plot_fairness.py
```

We can then compare the generated `fairness_vehicle.pdf` with Figure 1 (the Vehicle subfigure) and Figure 2 (the Vehicle subfigure) in the paper to validate reproducibility. Note that the accuracy distributions reported (both in figures and tables) are the results averaged across 5 different train/test/validation data partitions with data parititon seeds 1, 2, 3, 4, and 5.

(2) Experiments to demonstrate the communication-efficiency of the proposed method q-FedAvg:

```
bash run.sh vehicle qffedsgd 1 5 2 | tee log_vehicle/ffedsgd_run1_q5
```

Plot to re-produce the results in the paper:

```
python plot_efficiency.py
```

We can then compare the generated `efficiency_qffedavg.pdf` fig with Figure 3 (the Vehicle subfigure) to verify reproducibility.

### Run on other datasets

* First, config `run.sh` based on all hyper-parameters (e.g., batch size, learning rate, etc) reported in the manuscript (appendix B.2.3). 
* If you would like to run on Sent140, you also need to download a pre-trained embedding file using the following commands (this may take 3-5 minutes):

```
cd fair_flearn/flearn/models/sent140
bash get_embs.sh
```
* We use different models for different datasets, so you need to change the model name specified by `--model`. The corrsponding model associated with a dataset is described in `fair_flearn/models/$dataset/$model.py`. For instance, if you would like to run on the Shakespeare dataset, you can find the model name under `fair_flearn/models/shakespeare/`, which is `stacked_lstm`, and pass this parameter to `--model='stacked_lstm'`. 
* You also need to specify total communication rounds using `--num_rounds`. Suggested number of rounds based on our previous experiments are:

```
Vehicle: default
synthetic: 20000
sent140: 200
shakespeare: 80
fashion mnist: 6000
adult: 600
```

For fairness and efficiency experiments, we use four datasets: Vehicle, Sythetic, sent140 and Shakespeare. `method` can be chosen from `[qffedavg, qffedsgd]`. `$sampling` is `2` (with weights of sampling devices proportional to the number of local data points).

```
mkdir log_$dataset
bash run.sh $dataset $method $seed $q $sampling | tee log_$dataset/$method_run$seed_q$q
```

In particular, `$dataset` can be chosen from `[vehicle, synthetic, sent140, shakespeare]`, in accordance with the data directory names under the `fair_flearn/data/` folder.

**Compare with AFL.** We compare wtih the AFL baseline using the two datasets (samplaed Fashion MNIST and Adult) following the [AFL paper](https://arxiv.org/abs/1902.00146). 

* Generate data. (data generation process is as described above) 
* Specify parameters. `method` should be specified to be `afl` in order to run AFL algorithms. `data_partition_seed` should be set to 0, such that it won't randomly partition datasets into train/test/validation splits. This allows us to use the same standard public testing set as that in the AFL paper. `track_individual_accuracy` should be set to 1. Here is an example `run.sh` for the Adult dataset:

```
python3  -u main.py --dataset=$1 --optimizer=$2  \
            --learning_rate=0.1 \
            --learning_rate_lambda=0.1 \
            --num_rounds=600 \
            --eval_every=1 \
            --clients_per_round=2 \
            --batch_size=10 \
            --q=$4 \
            --model='lr' \
            --sampling=$5  \
            --num_epochs=1 \
            --data_partition_seed=$3 \
            --log_interval=100 \
            --static_step_size=0 \
            --track_individual_accuracy=1 \
            --output="./log_$1/$2_samp$5_run$3_q$4"
```
And then run:

```
bash run.sh adult qffedsgd 0 5 2 | tee log_adult/qffedsgd_q5
bash run.sh adult afl 0 0 2 | tee log_adult/afl
```
* You can find the accuracy numbers in the log files `log_adult/qffedsgd_q5` and `log_adult/afl`, respectively. 


## References

See our [Fair Federated Learning](https://openreview.net/pdf?id=ByexElSYDr)  manuscript for more details as well as all references.