Role-based Debloating for Web Applications
Previous debloating schemes produce one debloated copy of the target application that includes the features required by all users. In this work, we built a pipeline that identifies clusters of users who interact with a similar set of features and assigns them to dynamically generated roles. We then produce debloated applications tailored to these roles. As a result, we produce web applications that are smaller than those of prior work and are exposed to fewer CVEs. Our tool, DBLTR, comes with the clustering algorithm used to generate roles. It also incorporates a transparent reverse-proxy that identifies successful logins and redirects users to their underlying debloated web applications.
The paper is available at https://www.securitee.org/files/dbltr_2023codaspy.pdf
The source code of DBLTR is available at: https://github.com/pragseclab/DBLTR_Demo
The main modules in the repository are as follows:
In this playbook, we go over the steps for debloating and serving a web application using DBLTR. At a high level, we first use the Less is More platform to generate a baseline usage profile of the web application users in the form of line coverage logs. Next, we import the code coverage data into the DBLTR Jupyter notebook and apply a clustering algorithm to group users with similar behavior under the same role (i.e., cluster). Finally, we deploy the docker compose environment with the configurations generated by DBLTR to serve the debloated web applications to the users.
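To make the grouping step concrete, here is a minimal sketch of clustering users by the overlap of their line coverage. It is not the algorithm used in the DBLTR notebook: the Jaccard similarity, the single-linkage grouping, the `threshold` parameter, and the file names in the sample data are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two sets of covered lines."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_users(coverage: dict, threshold: float) -> list:
    """Group users whose coverage similarity is at least `threshold`.

    Uses a small union-find so that similarity chains (A~B, B~C)
    end up in the same group (single-linkage behavior).
    """
    parent = {user: user for user in coverage}

    def find(user):
        while parent[user] != user:
            parent[user] = parent[parent[user]]  # path compression
            user = parent[user]
        return user

    for u, v in combinations(coverage, 2):
        if jaccard(coverage[u], coverage[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for user in coverage:
        groups.setdefault(find(user), set()).add(user)
    return list(groups.values())


# Hypothetical coverage data: "file:line" strings are made up.
coverage = {
    "alice": {"a.php:10", "a.php:11", "b.php:5"},
    "bob": {"a.php:10", "a.php:11", "b.php:5", "c.php:3"},
    "charlie": {"d.php:1"},
}
roles = cluster_users(coverage, threshold=0.7)
```

With this sample data, Alice and Bob share three of four covered lines and fall into one group, while Charlie forms a group of his own. Each resulting group would correspond to one role with its own debloated copy of the application.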
Less is More can be set up using the following guide: https://lessismore.debloating.com/. More details and a playbook are available at: https://playground.debloating.com/. After this step is done, we export the code coverage of each user into CSV format. The sql_to_csv.py script can help automate this process. For this demonstration, Less is More is hosted under the LIM/training directory for debloating phpMyAdmin. In this setup, we have 5 users (Alice, Bob, Charlie, David, and Eli) who perform a small set of actions on phpMyAdmin (kept minimal for demonstration purposes). Alice creates a database and inserts some rows of data. Bob does the same but also views the list of users. Charlie views the existing databases without making any changes. David views databases and runs manual queries. Finally, Eli only views various phpMyAdmin parameters. After exporting the code coverage data from LIM, we use the provided Python script (sql_to_csv.py) on a system with mysql-server installed to convert the database backup to CSV files for DBLTR.
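As an illustration of how per-user coverage might be consumed once it is in CSV form, the sketch below reads rows into one set of covered lines per user. The column names (`user`, `file`, `line`) and the sample rows are assumptions made for this example; the actual layout produced by sql_to_csv.py may differ.

```python
import csv
import io
from collections import defaultdict


def load_coverage(csv_file) -> dict:
    """Map each user to the set of (file, line) pairs they executed.

    Assumes a header row with the hypothetical columns
    'user', 'file', and 'line'.
    """
    per_user = defaultdict(set)
    for row in csv.DictReader(csv_file):
        per_user[row["user"]].add((row["file"], int(row["line"])))
    return dict(per_user)


# Hypothetical sample standing in for an exported coverage file.
sample = io.StringIO(
    "user,file,line\n"
    "alice,index.php,10\n"
    "alice,index.php,11\n"
    "charlie,index.php,10\n"
)
coverage = load_coverage(sample)
```

Representing coverage as sets keeps later comparisons between users (intersections and unions) simple.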
Now we switch our focus to the Jupyter notebook "rbd_dataanalysis". This notebook is hosted under the analysis directory and can be set up using the provided docker-compose environment by running docker compose up -d and then navigating to http://localhost:8888/lab/tree/work/rbd_dataanalysis.ipynb. The token to access this notebook is set in the docker-compose env variable and is currently set to "jupytersecrettokenabhsyd68ay". We can follow the cells in the notebook. Certain steps can take a long time, from 30 minutes to a couple of hours, to complete on large applications with many users. We have therefore also provided the output of lengthy steps in the form of Python pickled objects. At the end of each section, a cell restores the corresponding pickle file, which is an alternative to running the individual cells in that section.
For the sections where a pickle file is available, you can jump to the end of the section and quickly restore the data from the pickle file. For new web applications outside our dataset, the whole process needs to be followed instead of restoring pickle files.
To serve the debloated web applications, we use the generated mappings.txt configuration file, which contains the user-to-role mappings, along with the docker-compose.yaml in the root of this repository to host the DBLTR setup. The web applications will be served under localhost:8080. Upon login, the authentication cookie of each user is extracted by our OpenResty Lua modules and stored in the Redis datastore. Subsequent requests from users containing the authentication cookie instruct the reverse-proxy to transparently route their requests to their custom debloated web applications. Responses from DBLTR include an "active_proxy" HTTP header that shows which backend served that request.
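The routing logic described above can be sketched as a small simulation. This is not the OpenResty Lua implementation: a plain dictionary stands in for Redis, and the `user role` line format for the mappings, the `Router` class, and the backend names are all hypothetical.

```python
def parse_mappings(text: str) -> dict:
    """Parse 'user role' lines (an assumed format) into a user -> role dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        user, role = line.split()
        mapping[user] = role
    return mapping


class Router:
    """Simulates cookie-based routing to per-role backends."""

    def __init__(self, user_roles: dict, default_backend: str):
        self.user_roles = user_roles
        self.default_backend = default_backend
        self.sessions = {}  # stands in for Redis: cookie -> backend

    def on_login(self, user: str, cookie: str) -> str:
        """Record which backend the user's session cookie maps to."""
        backend = self.user_roles.get(user, self.default_backend)
        self.sessions[cookie] = backend
        return backend

    def route(self, cookie) -> str:
        """Pick the backend for a request; unknown cookies get the default."""
        return self.sessions.get(cookie, self.default_backend)


# Hypothetical mappings and backend names.
roles = parse_mappings("alice role_0\nbob role_0\ncharlie role_1\n")
router = Router(roles, default_backend="login_backend")
router.on_login("alice", cookie="cookie-a")
backend = router.route("cookie-a")
```

In this sketch, a request without a known cookie falls back to a default backend, and the value returned by `route` plays the part of what the "active_proxy" response header would report.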
Demo of DBLTR protecting users against CVE-2019-12616:
Set up the web application under a LIM-like setup to collect code coverage data from the web application users for a period of time.
The files for this module are located under docker/reverse-proxy/lua/. The skeleton of this code is available in common.lua, alongside application-specific files under pma (phpMyAdmin login detection) and wp (WordPress login detection). The default.conf file, an Nginx/OpenResty config file, is used to activate the Lua module. At a high level:
We are a team of security researchers at PragSec Lab, Stony Brook University (https://securitee.org).
For any queries or questions, contact Babak Amin Azad at [email protected]