# proxy-redis **Repository Path**: youc-project/proxy-redis ## Basic Information - **Project Name**: proxy-redis - **Description**: No description available - **Primary Language**: Java - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2020-03-30 - **Last Updated**: 2020-12-19 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # Proxy Service This service interacts with a [Redis](https://redis.io/) Datastore via HTTP. Setup is meant to be scalable and handle concurrent requests. Service uses the below building blocks, - Webserver is based on [Dropwizard](https://www.dropwizard.io/1.0.0/docs/getting-started.html), a Java framework for developing RESTful web apps - Exposes a HTTP GET endpoint to perform read operations from Redis - Exposes a HTTP POST endpoint, a backdoor to perform write operation to Redis - Caches `` pairs into a [Guava](https://github.com/google/guava/wiki) based LRU cache System in it's current form, 1. Achieves eventual Consistency 2. Is not Highly Available (HA). HA can be achieved with minor changes to this setup though ## Assumptions and Design Choice A software application typically desires one of the below two properties based on the use case and live with a slight compromise on the other 1. *High Consistency* 2. *High Availability* In a realistic Data Center where nodes can be down for arbitrary time and *Network Partitions* are common it is impossible to achieve both consistency and availability. Based on the use case we prefer one over the other to fit the requirements. Data consistency is important for systems hosting transactional data. Non-transactional data (typically) demands less on consistency i.e small fraction of data loss is acceptable if that helps with High Availability so that it is available for huge throughput of writes e.g. a case of clickstream in a typical e-commerce website. This setup establishes a foundation towards high availability while compromising on consistency. Redis can be setup for asynchronous replication. In case a Redis host goes down, replica can take over to serve as a replacement. Only gotcha is it may not have all the data. Such fault tolerance offers high availability at the cost of data consistency. Currently, Redis is running as a single node. TODO: Setup Redis slave and turn on replication mechanism for high availability. ## Development - Running Proxy Service ### Prerequisites - A runtime setup requires - `docker`, `docker-compose`, `bash`, `make` - Execute `docker login` for ability to pull public images. Clone repo with commands below - Webserver runs on port 8080 and Redis is exposed on 7001. Make sure these two ports are available or change ports in `Dockerfile.build` and `resources/redis-app.yml` - Clone repo and change directory ```shell git clone git@github.com:abhinavmehta14/proxy-redis.git cd proxy-redis ``` ### Run Tests ```shell make test ``` This executes the below steps in they are mentioned, 1. Builds a docker image with mvn and code base 2. Runs Maven based tests, loads Redis store with key value pairs specified in `RedisAppResourceIntegrationTest.initRedisStore` 3. A new build is triggered only if all tests pass ### Run Service Below command stops existing proxy container and spins up a new Webserver container ```shell make run ``` Note that, above command requires at least one execution of `make test` before `make run` to generate image for Webserver. The Webserver is not yet pulled from Docker cloud yet. To access Webserver logs, ```shell docker-compose exec proxy-redis /bin/bash tail -f logs/* ``` Logs are appended in two separate files, - `logs/proxy-redis-request.log` contains request logs - `logs/proxy-redis.log` contains application logs To watch Redis metrics or stats, ```shell > redis-cli -h localhost -p 7001 localhost:7001> info # Server redis_version:5.0.5 ... # Stats total_connections_received:7 total_commands_processed:22 instantaneous_ops_per_sec:0 total_net_input_bytes:663 total_net_output_bytes:14857 instantaneous_input_kbps:0.00 instantaneous_output_kbps:0.00 rejected_connections:0 sync_full:0 sync_partial_ok:0 sync_partial_err:0 expired_keys:0 expired_stale_perc:0.00 expired_time_cap_reached_count:0 evicted_keys:0 keyspace_hits:1 keyspace_misses:1 pubsub_channels:0 pubsub_patterns:0 latest_fork_usec:0 migrate_cached_sockets:0 slave_expires_tracked_keys:0 active_defrag_hits:0 active_defrag_misses:0 active_defrag_key_hits:0 active_defrag_key_misses:0 ``` Note that the above method assumes you have `redis-cli` setup which is not a part of system requirements #### Admin and Metrics Endpoint - is admin dashboard for Webserver - is useful to monitor the Webserver e.g. `timers` shows useful metrics like `min`, `max`, `t99`, `stddev` etc for each endpoint - shows various threads Webserver is running and their state #### GET Endpoint One can access the GET endpoint as , where KEY is a required parameter. Webserver returns, - http code=200 and value of key if it exists in Redis DB or Cache - http code=400 if key is not passed e.g. for GET request - http code=404 if key is not found in Redis DB or Cache #### POST Endpoint This is a backdoor (not part of requirements) to add key value pairs to Redis. Sample request ```shell curl -i -XPOST 'http://localhost:8080/v1.0/proxy?key=a5&value=b5'; echo ``` Returns, - http code=200 if pair was added or updated successfully - http code=400 if either key or value parameters are not passed - http code=500 if something went wrong while performing update and DB state for this key value is unknown Alternatively, one can install and use `redis-cli` to access Redis DB for writes (or reads) as follows, ```shell redis-cli -h localhost -p 7001 localhost:7001> set a1 b1 OK localhost:7001> get a1 "b1" ``` #### Adding Key, Value Pairs to Redis Inline with requirements, key value pairs can be added to Redis at by appending them in `RedisAppResourceIntegrationTest.initRedisStore` followed by `make test` which ensures pairs are set in Redis. These pairs are persisted in DB until `docker-compose rm` or `docker-compose rm redis` is executed. ### Stop Service To stop and remove all running containers including data on Redis, ```shell > make stop docker-compose stop Stopping proxy ... done Stopping redis-db ... done docker-compose rm Going to remove proxy, proxy-redis_redis_run_1, proxy-redis_proxy-test_run_4, proxy-redis_proxy-test_run_3, proxy-redis_proxy-test_run_2, proxy-redis_proxy-test_run_1, redis-db Are you sure? [yN] y Removing proxy ... done Removing proxy-redis_redis_run_1 ... done Removing proxy-redis_proxy-test_run_4 ... done Removing proxy-redis_proxy-test_run_3 ... done Removing proxy-redis_proxy-test_run_2 ... done Removing proxy-redis_proxy-test_run_1 ... done Removing redis-db ... done ``` To stop Webserver container only, ```shell docker-compose stop proxy-redis ``` ### Run Everything To bring up Redis store, run tests, build webserver image, start Webserver one can invoke the below command ```shell make all ``` This combines all the make tasks. ### Test sending concurrent request [Work in progress] Class `com.amehta.proxy.redis.interact.test.RedisAppConcurrentRequestsTest` can be used send concurrent request to HTTP Webserver. The number of requests can be configured in `main()` method. TODO: Figure out how to run a standalone from dropwizard based fat jar and use the command to dockerize this test. Currently, I can run this class in a native dev environment and see that concurrent requests are being handled. ## Architecture TODO: Architecture Diagram 1. GET endpoint Endpoint is served by a multi-threaded webserver where each thread handles requests from a bounded queue of pending requests. Request handler first checks Guava Cache for presence of key - If found, return value - If not found, get key value from Redis. Add it to Cache return Cache has various tunable parameters - TTL, cache size in terms of number of entries Redis keys are available until a configurable global expiry time. 2. POST endpoint TODO ## Algorithmic Complexity Guava Cache or Redis lookup or insert is O(1) for all practical purposes. In practice, the lookup in each _segment_ of _ConcurrentHashMap_ is non-constant (a TODO: `log` or `linear` function of size of segment). Also, multiple threads writing to same _segment_ can cause contention issues. Hence, complexity is also a function of number of threads writing to Cache. TODO: Elaborate on details above. ## Future Work ### Must - Allocate memory limit for Redis and cpu limits for Redis and Proxy containers - ~~For graceful shutdown, `CachedRedisService` should implement `io.dropwizard.lifecycle.Managed` interface~~ - Address problems with concurrency test class `ProxyLoadTestCommand` to send concurrent requests and assert response http code. ~~Dockerize and run from fat jar~~ - ~~Another bash based attempt `scripts/curl_test_concurrent.sh` needs to be Dockerized~~ Not Applicable - Read `globalExpiry` from config in `RedisAppResourceIntegrationTest` - ~~Test to validate config. Such errors are showing up on runtime currently~~ Command `java -jar FAT_JAR` addresses this - Impose timeout on DB Read / Write calls - HTTP GET support for non-string value [data types](https://redis.io/topics/data-types) ### Good to have - ~~Log cache hit / miss metrics periodically~~ - Sharding (Run N redis containers and shard based on key range) - Run a Replica Redis container for failover to ensure High Availability - Cleanup POST endpoint which is a backdoor (since it is not a part of requirements) to add key value pairs to Redis ## Benchmark Benchmarks below are done using [Apache HTTP server benchmarking tool](https://httpd.apache.org/docs/2.4/programs/ab.html) for GET endpoint hosted by single replica of Proxy Service. Command to run benchmark, ```shell make benchmark_ab ``` Various parameters (`concurrency_level`, `concurrency_level`, `keep_alive`) are controlled from within the `Makefile` Results, | Special conditions | #Keys queried | #Requests | Concurrency | mean | t50 | t75 | t95 | t99 | t100 | Non-2xx responses | Requests per second | response body size (bytes) | | --- | ------------- | --------- | ----------- | ---- | --- | --- | --- | --- | ---- | --- | --- | --- | | no connection keep alive | 1 | 100,000 | 10 | 34 | 26 | 32 | 46 | 208 | 1259 | - | - | 13 | no connection keep alive | 1 | 100,000 | 32 | 113 | 87 | 107 | 275 | 540 | 4853 | - | - | 13 | **with connection keep alive** | 1 | 1000,000 | 32 | 3 | 0 | 1 | 14| 65 | 400 | 100 | 640 | 13 Note that, `t{xx}` is in milliseconds TODO: Improve benchmarking capabilities for multiple url using JMeter (`docker pull justb4/jmeter`) ## Benchmark (work in progress) This benchmark uses `ProxyLoadTestCommand` class, a command line utility for advanced assertions on load test like response code, body TODOs: 1. setup stdout logs from benchmark not appearing on console 2. Investigate `NoRouteToHost` exception Benchmark can be invoked with the command below, ``` make benchmark ``` To connect to running benchmark container ```shell docker-compose ps # To get service name for benchmark e.g. proxy-redis_benchmark_run_3 docker exec -it proxy-redis_benchmark_run_[%s] bash # use suffix from the output of the command above ```