# android-malware-analysis

**Repository Path**: devdasdevdasjan/android-malware-analysis

## Basic Information

- **Project Name**: android-malware-analysis
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: GPL-3.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-05-04
- **Last Updated**: 2025-05-04

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Getting an API Key
AndroTotal has simplified the process for getting an API key. Log in or create an account at http://andrototal.org/, then open your profile settings. The API tab contains your key.

This repository contains a set of scripts to automate the process of
gathering data from malware samples, training a machine learning model
on that data, and plotting its classification accuracy.

0. Make a copy of config-template.ini called config.ini and edit it.

1. Ensure that the "tools" subdirectory has been initialized by running `git submodule update --init tools`.

2. Either use `get_samples.py` to download samples or copy them into "all_apks" from another source.
If you're using `get_samples.py`, you can monitor it in another shell by running `watch "ls -l *.apk | wc -l"`

3. `sort_malicious.py` uses andrototal.org to sort the samples into "malicious_apk" and "benign_apk" folders.
You can monitor it in another shell by running `watch "ls -l benign_apk/*.apk | wc -l && ls -l malicious_apk/*.apk | wc -l"`

4. `extract_apks_parallel.sh` unpacks the .apk files into folders and processes some of the data therein.
You can monitor it in another shell by running `watch "wc -l benign_apk/valid_apks.txt; wc -l malicious_apk/valid_apks.txt"`

5. Run one of the following scripts to generate feature vectors:
    * `parse_xml.py` for permissions. Generates "app_permission_vectors.json".
    * `parse_maline_output.py` for syscalls. Generates "app_syscall_vectors.json". You will have to run [maline](https://github.com/soarlab/maline) first for this to work.
    * `parse_disassembled.py` for API calls. Generates "app_method_vectors.json".
    * `parse_ssdeep.py` for fuzzy hashes. Generates "app_hash_vectors.json". You will have to run [ssdeep](http://ssdeep.sourceforge.net/) first for this to work.
    * `combine_features.py` for a combination of the top-weighted features. Generates "app_feature_vectors.json". This only works if you've previously trained a network on the specified features and the feature weights files are named appropriately.

6. Run `run_trials.sh app_feature_vectors.json` (or whichever JSON file you want). It runs the `tensorflow_learn.py` script (where the ML happens) a number of times and puts the results into a folder. It also runs `plot_data.py` and `match_features.py` to create a plot and a list of top-weighted features, respectively.

7. Change the parameters or input data and repeat step 6. Each run should be non-destructive, so you can compare the results of different runs.
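
To make step 5 more concrete, here is a minimal sketch of how permission feature vectors could be built from decoded `AndroidManifest.xml` files. It is not the code in `parse_xml.py`: the function names, the example manifests, and the output layout (a `features` list plus one 0/1 vector per app) are assumptions made for illustration, and the real "app_permission_vectors.json" may be structured differently.

```python
import json
import xml.etree.ElementTree as ET

# Namespace used for android:* attributes in a decoded AndroidManifest.xml.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def permissions_from_manifest(manifest_xml):
    """Return the set of permission names requested in one manifest."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS
    return {
        element.get(name_attr)
        for element in root.iter("uses-permission")
        if element.get(name_attr)
    }


def build_vectors(manifests):
    """Map each app to a 0/1 vector over the sorted set of all permissions.

    `manifests` maps an app name to its manifest XML text. The returned
    layout is hypothetical, not the format used by this repository.
    """
    per_app = {app: permissions_from_manifest(xml) for app, xml in manifests.items()}
    vocabulary = sorted(set().union(*per_app.values()))
    vectors = {
        app: [1 if permission in perms else 0 for permission in vocabulary]
        for app, perms in per_app.items()
    }
    return {"features": vocabulary, "apps": vectors}


# Made-up manifests, used only to exercise the functions above.
EXAMPLE_MANIFESTS = {
    "benign_app": (
        '<manifest xmlns:android="http://schemas.android.com/apk/res/android">'
        '<uses-permission android:name="android.permission.INTERNET"/>'
        "</manifest>"
    ),
    "malicious_app": (
        '<manifest xmlns:android="http://schemas.android.com/apk/res/android">'
        '<uses-permission android:name="android.permission.INTERNET"/>'
        '<uses-permission android:name="android.permission.SEND_SMS"/>'
        "</manifest>"
    ),
}

if __name__ == "__main__":
    print(json.dumps(build_vectors(EXAMPLE_MANIFESTS), indent=2))
```

Every app gets a vector of the same length because the vocabulary is computed over all apps first, which is what a fixed-size classifier input needs.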

Note: If you want to use an SVM instead of a neural network, use `sklearn_svm.py` in place of `tensorflow_learn.py`. You can also use `sklearn_tree.py` for a decision tree.
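
Whichever classifier is used, comparing runs (step 7) usually comes down to summarizing the accuracy of repeated trials. The sketch below shows one way to do that with the standard library. The `summarize_trials` helper and the sample accuracy values are hypothetical placeholders, not output from this repository, and the format in which `run_trials.sh` stores its results is not assumed here.

```python
import statistics


def summarize_trials(accuracies):
    """Summarize per-trial classification accuracies (fractions in [0, 1])."""
    if not accuracies:
        raise ValueError("at least one trial accuracy is required")
    return {
        "trials": len(accuracies),
        "mean": statistics.mean(accuracies),
        # The sample standard deviation needs at least two trials.
        "stdev": statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0,
        "min": min(accuracies),
        "max": max(accuracies),
    }


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured results.
    placeholder_accuracies = [0.90, 0.92, 0.94]
    for key, value in summarize_trials(placeholder_accuracies).items():
        print(key, value)
```

Reporting the spread alongside the mean helps show whether a difference between two configurations is larger than the run-to-run variation.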