lucabadiali committed on
Commit d5eadf2 · 1 Parent(s): 7f4471a

Final Edits

Files changed (2):
  1. README.md +76 -2
  2. snapshots/grafana_dashboard.png +3 -0
README.md CHANGED
@@ -8,11 +8,21 @@ app_port: 7860
 pinned: false
 ---

 # Sentiment Analysis API

-The top part of this file is meant to run a docker image in Hugging Face (more details below).

-Structure of the project folder:

 .github/
 └── workflows/
@@ -43,3 +53,67 @@ pytest.ini
 README.md
 requirements.txt
pinned: false
---

The top part of this file is needed to run a Docker image in a Hugging Face Docker Space (more details below).

# Sentiment Analysis API

This small project lets the user fine-tune or download a pretrained sentiment analysis model from this [Hugging Face repo](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest), and then use the model via an API to classify input texts into three categories: positive, negative, and neutral.

## Project Deliverables Overview

- ✔ Sentiment analysis model using `cardiffnlp/twitter-roberta-base-sentiment-latest`
- ✔ Fine-tuning script on a public sentiment dataset (`tweet_eval`)
- ✔ CI pipeline (pytest + linting)
- ✔ CD pipeline deploying a Dockerized FastAPI app on HF Spaces
- ✔ Continuous monitoring with Prometheus + Grafana

## Structure of the project folder

.github/
└── workflows/

README.md
requirements.txt

## Project Configuration

The file [env_config.sh](env_config.sh) defines the following environment variables for the shell that launches the app:
- MODEL_SOURCE: "hf" or "local". It indicates whether the app should download the published model from the Hugging Face Hub ("hf") or use a locally trained model ("local");
- EVAL_SAMPLE_SIZE: how many samples to use for app monitoring (see below for more details);
- EVAL_PERIOD_MIN1: how often the monitoring tasks should be run;
- EVAL_BATCH_SIZE: batch size used when evaluating the model for monitoring tasks;
- TRAIN_FRACTION_SIZE: fraction of the training dataset to use for fine-tuning (training can take several hours on my GPU, so this option allows a faster run);
- EVAL_FRACTION_SIZE: same as above, but for the evaluation performed during training.

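A minimal sketch of what such a file might look like (the values below are illustrative, not the project's actual defaults):

```bash
# env_config.sh — illustrative values only
export MODEL_SOURCE="hf"          # "hf" = published Hub model, "local" = locally trained model
export EVAL_SAMPLE_SIZE=200       # samples drawn per monitoring run
export EVAL_PERIOD_MIN1=5         # how often monitoring tasks run
export EVAL_BATCH_SIZE=32         # batch size for monitoring evaluations
export TRAIN_FRACTION_SIZE=0.25   # fraction of the train split used for fine-tuning
export EVAL_FRACTION_SIZE=0.25    # fraction of the eval split used during training
```

The file is meant to be sourced (`source env_config.sh`) by the shell that then launches the app, so the variables are visible to the Python process.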
## Train Model

The user can fine-tune the pretrained model by running the Python script [src/train_model.py](src/train_model.py). If no dataset is already present in the project folder, the script downloads the *tweet_eval* dataset for the *sentiment* task from the Hugging Face Hub. Once the script completes, the fine-tuned model is saved in the *models* folder.

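The TRAIN_FRACTION_SIZE / EVAL_FRACTION_SIZE options amount to training on a random fraction of each split. A minimal sketch of that idea (`take_fraction` is a hypothetical helper, not the actual code in src/train_model.py):

```python
import random

def take_fraction(examples, fraction, seed=42):
    """Return a reproducible random subset containing `fraction` of `examples`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, int(len(examples) * fraction))
    rng = random.Random(seed)
    return rng.sample(examples, k)

train_split = list(range(1000))        # stand-in for the tweet_eval train split
small_train = take_fraction(train_split, 0.1)
print(len(small_train))                # 100
```

Seeding the sampler keeps runs comparable: the same fraction of the dataset is drawn every time.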
## API calls

The user can choose between a locally fine-tuned model and the latest model published in the HF repo linked above; this and other app settings are described in the Project Configuration section.
The FastAPI app implemented in *src/app/app.py* exposes a *predict* endpoint that receives a list of input texts and, for each text, returns the predicted sentiment together with a probability score for every possible sentiment.
For demonstration purposes, the script *src/app/app_post.py* can be run to obtain some predictions.

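The described response — a prediction plus one score per sentiment — can be post-processed as below. The payload shape and field names here are hypothetical; the actual schema is defined in *src/app/app.py*:

```python
# Hypothetical shape of one item returned by the predict endpoint:
# {"text": ..., "scores": {"negative": ..., "neutral": ..., "positive": ...}}

def top_sentiment(scores):
    """Pick the sentiment with the highest probability score."""
    label = max(scores, key=scores.get)
    return label, scores[label]

response_item = {
    "text": "I love this!",
    "scores": {"negative": 0.05, "neutral": 0.15, "positive": 0.80},
}
label, prob = top_sentiment(response_item["scores"])
print(label, prob)  # positive 0.8
```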
## CI

The file *tests/test_app.py* implements some pytest tests that check the app's responses to POST requests. These tests run automatically on every push to the [project GitHub repo](https://github.com/lucabadiali/MLOPS_Project/actions), through the GitHub Action defined in *MLOPS_Project/.github/workflows/python-app.yml*. The same action also runs flake8 on the project for code linting.

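A workflow of this kind typically looks roughly like the following (a sketch, not the actual contents of *python-app.yml*):

```yaml
name: Python app

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: flake8 .        # linting
      - run: pytest          # runs tests/test_app.py
```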
## CD

The app is hosted at [this HF Docker Space](https://huggingface.co/spaces/lucabadiali/ML_OPS_Project). HF runs the Docker image specified in the *Dockerfile* of the project folder. The *Dockerfile* tells the container to:
- install the required Python version;
- install the packages listed in *requirements.txt*;
- load the configuration variables defined in *env_config.sh* and run the app via *uvicorn*.

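A sketch of a Dockerfile matching those three steps (the base image tag and the app module path are assumptions, not the project's actual file; the port comes from the `app_port: 7860` front matter):

```dockerfile
# Assumed base image and module path — the project's actual Dockerfile may differ
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Load env_config.sh, then start the FastAPI app on the port declared in the README front matter
EXPOSE 7860
CMD ["bash", "-c", "source env_config.sh && uvicorn src.app.app:app --host 0.0.0.0 --port 7860"]
```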
A second remote pointing to the HF Space repo was added to my local repo, so that the local repo can push both to GitHub and to the HF Space. To keep the two automatically in sync, I added the GitHub Action defined in *MLOPS_Project/.github/workflows/huggingface-space-deploy.yml*. This action, triggered on every push to the GitHub repo, pushes the exact same project folder to the HF Space repo. For this action to work, I created an HF access token for my HF repo and saved it as a GitHub secret (see the yml file).

Overall, this automation ensures that every time edits are pushed from my local repo to GitHub, HF builds and hosts the most up-to-date version of the app.

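A sync action of this kind is commonly written along these lines (a sketch, not the actual *huggingface-space-deploy.yml*; the secret name `HF_TOKEN` is an assumption):

```yaml
name: Sync to Hugging Face Space

on:
  push:
    branches: [main]

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history, needed to push the branch
      - name: Push to HF Space
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          git push https://lucabadiali:$HF_TOKEN@huggingface.co/spaces/lucabadiali/ML_OPS_Project main
```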
## Monitoring

A *Prometheus* container is set up to scrape metrics from the app running on HF and to make them available to a *Grafana* container. Both containers are composed locally through the [docker-compose.yml](docker-compose.yml) file in the project folder. Prometheus-specific settings, such as the endpoint to scrape, are defined in the [prometheus.yml](prometheus.yml) file.
The metrics collected by Prometheus can be read at https://lucabadiali-ml-ops-project.hf.space/metrics .

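A *prometheus.yml* that scrapes the endpoint above would look roughly like this (the scrape interval and job name are illustrative, not the project's actual settings):

```yaml
global:
  scrape_interval: 30s

scrape_configs:
  - job_name: sentiment-api        # illustrative job name
    scheme: https
    metrics_path: /metrics
    static_configs:
      - targets: ["lucabadiali-ml-ops-project.hf.space"]
```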
Since no real-time labelled data stream is available, the monitoring loop uses randomly sampled labelled data from the test set to simulate incoming data. Specifically, a job scheduler periodically runs the following tasks:
- creation of a random subset of the test data and evaluation of the model accuracy on it;
- creation of a random subset of the test data, model prediction on it, and computation of the resulting sentiment distribution.

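The two scheduled tasks boil down to sampling, scoring, and counting. A pure-Python sketch of that logic (the real code runs the model in batches of EVAL_BATCH_SIZE; `predict_fn` here is a stand-in for the model):

```python
import random
from collections import Counter

def sample_subset(test_data, n, seed=None):
    """Randomly sample n labelled examples to simulate incoming data."""
    rng = random.Random(seed)
    return rng.sample(test_data, min(n, len(test_data)))

def accuracy(subset, predict_fn):
    """Task 1: fraction of sampled examples the model labels correctly."""
    correct = sum(predict_fn(text) == label for text, label in subset)
    return correct / len(subset)

def sentiment_distribution(subset, predict_fn):
    """Task 2: share of each predicted sentiment in the sample."""
    counts = Counter(predict_fn(text) for text, _ in subset)
    return {s: counts[s] / len(subset) for s in ("negative", "neutral", "positive")}

# Stand-in model that predicts "positive" for everything
predict_fn = lambda text: "positive"
data = [("great!", "positive"), ("meh", "neutral"),
        ("awful", "negative"), ("nice", "positive")]
subset = sample_subset(data, 4, seed=0)
print(accuracy(subset, predict_fn))                # 0.5
print(sentiment_distribution(subset, predict_fn))  # positive share is 1.0
```

In the project, the resulting numbers would be exported as Prometheus metrics on each scheduler tick so that Grafana can plot them over time.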
Finally, in Grafana I created a dashboard with:
- a time-series visualization of the model accuracy;
- a time-series visualization of the sentiment distribution;
- a pie-chart visualization of the latest sentiment distribution.

Snapshots of these panels can be found in the [snapshots](snapshots) folder.
snapshots/grafana_dashboard.png ADDED

Git LFS Details

  • SHA256: 9c5b7539e2f0e80e01d80a207b621f00af40c0a868ff28fa1dd2f24e4cec2799
  • Pointer size: 131 Bytes
  • Size of remote file: 177 kB