galihboy committed on
Commit ae27454 · verified · 1 Parent(s): 84fcdaa

Upload 3 files

Files changed (3)
  1. README.md +82 -0
  2. app.py +755 -0
  3. requirements.txt +6 -0
README.md ADDED
@@ -0,0 +1,82 @@
---
title: Semantic Embedding API
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
license: mit
short_description: Embedding + LLM analysis for proposal similarity checks
---

# 🤖 Semantic Embedding & LLM Analysis API

An API for detecting similarity between thesis proposals using AI embeddings and Google Gemini.

## Features

### Embedding (Sentence Transformers)
- **Single/Batch Embedding** - Generate 384-dimensional embedding vectors
- **Similarity Check** - Compute semantic similarity
- **Supabase Cache** - Shared cache for performance

### LLM Analysis (Google Gemini)
- **In-Depth Analysis** - Reasoning like a human reviewer
- **Verdict** - AMAN / PERLU_REVIEW / BERMASALAH (safe / needs review / problematic; see the example response below)
- **Concrete Suggestions** - Recommendations for students
- **Auto Cache** - Results are stored in Supabase

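The full analysis endpoint (`/llm_analyze_pair`) returns a JSON object following the schema requested in the Gemini prompt in `app.py`, plus a few bookkeeping fields (`pair_hash`, `model_used`, `from_cache`) added by the app. The values below are illustrative only:

```json
{
  "similarity_score": 45,
  "verdict": "AMAN",
  "similar_aspects": { "topik": true, "dataset": false, "metode": false, "pendekatan": true },
  "differentiator": "metode",
  "reasoning": "...",
  "saran": "...",
  "pair_hash": "...",
  "model_used": "gemini-2.5-pro",
  "from_cache": false
}
```
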
## Model & Tech

| Component | Technology |
|-----------|------------|
| Embedding | `paraphrase-multilingual-MiniLM-L12-v2` (384 dim) |
| LLM | Google Gemini 2.5 Pro |
| Cache | Supabase (PostgreSQL) |
| API | Gradio |

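Under the hood the similarity score is a plain cosine over the 384-dimensional vectors. A minimal sketch mirroring `calculate_similarity` in `app.py` (the two sample sentences are arbitrary):

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Encode two texts into 384-dimensional vectors
emb = model.encode(["Analisis sentimen ulasan aplikasi", "Klasifikasi sentimen komentar media sosial"])

# Cosine similarity, reported as a percentage like the /calculate_similarity endpoint
similarity = np.dot(emb[0], emb[1]) / (np.linalg.norm(emb[0]) * np.linalg.norm(emb[1]))
print(f"{similarity * 100:.2f}%")
```
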
## Required Secrets

Set these in **Settings > Repository secrets**:

```
SUPABASE_URL     - Supabase project URL
SUPABASE_KEY     - Supabase anon/service key
GEMINI_API_KEY_1 - Gemini API key #1
GEMINI_API_KEY_2 - Gemini API key #2 (optional)
GEMINI_API_KEY_3 - Gemini API key #3 (optional)
GEMINI_API_KEY_4 - Gemini API key #4 (optional)
```

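For local development, `app.py` will also read these values from a `.env` file when `python-dotenv` is installed (it is optional and not listed in `requirements.txt`). A minimal sketch with placeholder values:

```
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-anon-or-service-role-key
GEMINI_API_KEY_1=your-first-gemini-key
GEMINI_API_KEY_2=your-second-gemini-key
```
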
## API Endpoints

| Endpoint | Function |
|----------|----------|
| `/get_embedding` | Single text embedding |
| `/get_embeddings_batch` | Batch embeddings |
| `/calculate_similarity` | Cosine similarity |
| `/db_get_all_embeddings` | Get cached embeddings |
| `/db_save_embedding` | Save embedding (API only) |
| `/llm_check_status` | Check Gemini status |
| `/llm_analyze_pair` | Full LLM analysis |

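One way to call these endpoints from Python is the `gradio_client` package; a hedged sketch (the Space id is a placeholder, and the `api_name` values assume Gradio's default mapping from the function names in `app.py`):

```python
from gradio_client import Client  # pip install gradio_client

client = Client("OWNER/semantic-embedding-api")  # placeholder Space id

result = client.predict(
    "Analisis sentimen dengan SVM",
    "Klasifikasi sentimen dengan Naive Bayes",
    api_name="/calculate_similarity",
)
print(result)  # e.g. {"similarity": 0.73, "percentage": "73.00%"} (illustrative)
```

The raw HTTP flow (`POST /gradio_api/call/<endpoint>`, then fetch the result by `event_id`) is shown in the JavaScript example inside `app.py`.
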
## Built For

**Thesis Proposal Monitoring (Monitoring Proposal Skripsi)**
KK E (Computer Science) - Informatics Engineering Study Program
Universitas Komputer Indonesia (UNIKOM)

🔗 [Website](https://galih-hermawan-unikom.github.io/monitoring-proksi/)

## Developer

**Galih Hermawan**
🌐 [galih.eu](https://galih.eu) • 🐙 [github.com/galihboy](https://github.com/galihboy) • 🐙 [github.com/Galih-Hermawan-Unikom](https://github.com/Galih-Hermawan-Unikom)

📅 Last updated: 30 November 2025

## License

MIT License
app.py ADDED
@@ -0,0 +1,755 @@
import gradio as gr
from sentence_transformers import SentenceTransformer
import json
import numpy as np
import os
import httpx
import hashlib

# Load environment variables from .env file (optional, for local development)
try:
    from dotenv import load_dotenv
    load_dotenv()
    print("✅ Loaded .env file")
except ImportError:
    print("ℹ️ python-dotenv not installed, using system environment variables")

# Google GenAI SDK (new library) - optional, graceful fallback if not available
try:
    from google import genai
    from google.genai import types
    GENAI_AVAILABLE = True
    print("✅ google-genai loaded successfully")
except ImportError as e:
    GENAI_AVAILABLE = False
    print(f"⚠️ google-genai not available: {e}")
    genai = None
    types = None

# ==================== CONFIGURATION ====================

# Model - auto-downloaded from the HF Hub on first use
HF_MODEL_NAME = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"

# Local path for development (optional, ignored if it does not exist)
LOCAL_MODEL_PATH = r"E:\huggingface_models\hub\models--sentence-transformers--paraphrase-multilingual-MiniLM-L12-v2\snapshots"

# Supabase configuration (read from environment variables for security)
# On an HF Space: Settings > Repository secrets
# Locally: set environment variables or use the defaults for testing
SUPABASE_URL = os.environ.get("SUPABASE_URL", "")
SUPABASE_KEY = os.environ.get("SUPABASE_KEY", "")

# Gemini API configuration with key rotation
GEMINI_MODEL = os.environ.get("GEMINI_MODEL", "gemini-2.5-pro")  # or gemini-2.5-flash, gemini-2.5-flash-lite

# Load multiple API keys for rotation
GEMINI_API_KEYS = []
for i in range(1, 10):  # Support up to 9 keys
    key = os.environ.get(f"GEMINI_API_KEY_{i}", "")
    if key:
        GEMINI_API_KEYS.append(key)

# Fallback to a single key if no numbered keys are found
if not GEMINI_API_KEYS:
    single_key = os.environ.get("GEMINI_API_KEY", "")
    if single_key:
        GEMINI_API_KEYS.append(single_key)

# Track the current key index for rotation
current_key_index = 0

def get_gemini_client():
    """Get Gemini client with current API key"""
    global current_key_index
    if not GENAI_AVAILABLE or genai is None:
        return None
    if not GEMINI_API_KEYS:
        return None
    return genai.Client(api_key=GEMINI_API_KEYS[current_key_index])

def rotate_api_key():
    """Rotate to next API key"""
    global current_key_index
    if len(GEMINI_API_KEYS) > 1:
        current_key_index = (current_key_index + 1) % len(GEMINI_API_KEYS)
        print(f"🔄 Rotated to API key #{current_key_index + 1}")
    return current_key_index

def call_gemini_with_retry(prompt: str, max_retries: int = None):
    """Call Gemini API with automatic key rotation on rate limit"""
    global current_key_index

    if not GEMINI_API_KEYS:
        return None, "No API keys configured"

    if max_retries is None:
        max_retries = len(GEMINI_API_KEYS)

    last_error = None

    for attempt in range(max_retries):
        try:
            client = get_gemini_client()
            response = client.models.generate_content(
                model=GEMINI_MODEL,
                contents=prompt
            )
            return response, None

        except Exception as e:
            error_str = str(e).lower()
            last_error = str(e)

            # Check if rate limit error
            if "429" in error_str or "rate" in error_str or "quota" in error_str or "resource" in error_str:
                print(f"⚠️ Rate limit hit on key #{current_key_index + 1}: {e}")
                rotate_api_key()
                continue
            else:
                # Non-rate-limit error, don't retry
                return None, str(e)

    return None, f"All API keys exhausted. Last error: {last_error}"

# Initialize and print status
if GEMINI_API_KEYS:
    print(f"✅ Gemini configured with {len(GEMINI_API_KEYS)} API key(s)")
    print(f"   Model: {GEMINI_MODEL}")
else:
    print("⚠️ No Gemini API keys found")

def get_model_path():
    """Detect the environment and return the appropriate model path"""
    # Check whether the local folder exists
    if os.path.exists(LOCAL_MODEL_PATH):
        # Find the latest snapshot
        snapshots = os.listdir(LOCAL_MODEL_PATH)
        if snapshots:
            return os.path.join(LOCAL_MODEL_PATH, snapshots[0])
    # Fall back to the HF Hub (for deployment on a Space)
    return HF_MODEL_NAME

# Load the model at startup
print("Loading model...")
model = None
try:
    model_path = get_model_path()
    print(f"Using model from: {model_path}")
    model = SentenceTransformer(model_path)
    print("✅ Model loaded successfully!")
except Exception as e:
    print(f"❌ Failed to load model: {e}")
    model = None


def get_embedding(text: str):
    """Generate an embedding for a single text"""
    if model is None:
        return {"error": "Model not loaded"}
    if not text or not text.strip():
        return {"error": "Text tidak boleh kosong"}

    try:
        embedding = model.encode(text.strip())
        return {"embedding": embedding.tolist()}
    except Exception as e:
        return {"error": str(e)}


def get_embeddings_batch(texts_json: str):
    """Generate embeddings for multiple texts (JSON array)"""
    try:
        texts = json.loads(texts_json)
        if not isinstance(texts, list):
            return {"error": "Input harus JSON array"}

        if len(texts) == 0:
            return {"error": "Array tidak boleh kosong"}

        # Filter empty strings
        texts = [t.strip() for t in texts if t and t.strip()]

        if len(texts) == 0:
            return {"error": "Semua text kosong"}

        embeddings = model.encode(texts)
        return {"embeddings": embeddings.tolist()}
    except json.JSONDecodeError:
        return {"error": "Invalid JSON format. Gunakan format: [\"teks 1\", \"teks 2\"]"}
    except Exception as e:
        return {"error": str(e)}


def calculate_similarity(text1: str, text2: str):
    """Compute the cosine similarity between two texts"""
    if not text1 or not text1.strip():
        return {"error": "Text 1 tidak boleh kosong"}
    if not text2 or not text2.strip():
        return {"error": "Text 2 tidak boleh kosong"}

    try:
        embeddings = model.encode([text1.strip(), text2.strip()])

        # Cosine similarity
        similarity = np.dot(embeddings[0], embeddings[1]) / (
            np.linalg.norm(embeddings[0]) * np.linalg.norm(embeddings[1])
        )

        return {
            "similarity": float(similarity),
            "percentage": f"{similarity * 100:.2f}%"
        }
    except Exception as e:
        return {"error": str(e)}


# ==================== SUPABASE PROXY FUNCTIONS ====================

def get_supabase_headers():
    """Get the headers for Supabase API calls"""
    return {
        "apikey": SUPABASE_KEY,
        "Authorization": f"Bearer {SUPABASE_KEY}",
        "Content-Type": "application/json",
        "Prefer": "return=representation"
    }


def db_get_all_embeddings():
    """Fetch all embeddings from Supabase"""
    if not SUPABASE_URL or not SUPABASE_KEY:
        return {"error": "Supabase not configured"}

    try:
        url = f"{SUPABASE_URL}/rest/v1/proposal_embeddings?select=nim,content_hash,embedding_combined,embedding_judul,embedding_deskripsi,embedding_problem,embedding_metode,nama,judul"

        with httpx.Client(timeout=30.0) as client:
            response = client.get(url, headers=get_supabase_headers())

        if response.status_code == 200:
            return {"data": response.json(), "count": len(response.json())}
        else:
            return {"error": f"Supabase error: {response.status_code}", "detail": response.text}
    except Exception as e:
        return {"error": str(e)}


def db_get_embedding(nim: str, content_hash: str):
    """Fetch the embedding for a specific NIM and content_hash"""
    if not SUPABASE_URL or not SUPABASE_KEY:
        return {"error": "Supabase not configured"}

    try:
        url = f"{SUPABASE_URL}/rest/v1/proposal_embeddings?nim=eq.{nim}&content_hash=eq.{content_hash}&select=*"

        with httpx.Client(timeout=30.0) as client:
            response = client.get(url, headers=get_supabase_headers())

        if response.status_code == 200:
            data = response.json()
            return {"data": data[0] if data else None, "found": len(data) > 0}
        else:
            return {"error": f"Supabase error: {response.status_code}"}
    except Exception as e:
        return {"error": str(e)}


def db_save_embedding(data_json: str):
    """Save an embedding to Supabase (upsert)"""
    if not SUPABASE_URL or not SUPABASE_KEY:
        return {"error": "Supabase not configured"}

    try:
        data = json.loads(data_json)

        # Validate required fields
        if not data.get("nim") or not data.get("content_hash"):
            return {"error": "nim and content_hash are required"}

        if not data.get("embedding_combined"):
            return {"error": "embedding_combined is required"}

        url = f"{SUPABASE_URL}/rest/v1/proposal_embeddings"
        headers = get_supabase_headers()
        headers["Prefer"] = "resolution=merge-duplicates,return=representation"

        payload = {
            "nim": data["nim"],
            "content_hash": data["content_hash"],
            "embedding_combined": data["embedding_combined"],
            "embedding_judul": data.get("embedding_judul"),
            "embedding_deskripsi": data.get("embedding_deskripsi"),
            "embedding_problem": data.get("embedding_problem"),
            "embedding_metode": data.get("embedding_metode"),
            "nama": data.get("nama"),
            "judul": data.get("judul")
        }

        with httpx.Client(timeout=30.0) as client:
            response = client.post(url, headers=headers, json=payload)

        if response.status_code in [200, 201]:
            return {"success": True, "data": response.json()}
        else:
            return {"error": f"Supabase error: {response.status_code}", "detail": response.text}
    except json.JSONDecodeError:
        return {"error": "Invalid JSON format"}
    except Exception as e:
        return {"error": str(e)}


def db_check_connection():
    """Test the connection to Supabase"""
    if not SUPABASE_URL or not SUPABASE_KEY:
        return {"connected": False, "error": "Supabase URL or KEY not configured"}

    try:
        url = f"{SUPABASE_URL}/rest/v1/proposal_embeddings?select=id&limit=1"

        with httpx.Client(timeout=10.0) as client:
            response = client.get(url, headers=get_supabase_headers())

        return {
            "connected": response.status_code == 200,
            "status_code": response.status_code,
            "supabase_url": SUPABASE_URL[:30] + "..." if len(SUPABASE_URL) > 30 else SUPABASE_URL
        }
    except Exception as e:
        return {"connected": False, "error": str(e)}


# ==================== LLM CACHE FUNCTIONS (SUPABASE) ====================

def db_get_llm_analysis(pair_hash: str):
    """Fetch a cached LLM analysis from Supabase by pair_hash"""
    if not SUPABASE_URL or not SUPABASE_KEY:
        return None

    try:
        url = f"{SUPABASE_URL}/rest/v1/llm_analysis?pair_hash=eq.{pair_hash}&select=*"

        with httpx.Client(timeout=10.0) as client:
            response = client.get(url, headers=get_supabase_headers())

        if response.status_code == 200:
            data = response.json()
            if data and len(data) > 0:
                result = data[0]
                # Parse similar_aspects from JSONB
                if isinstance(result.get('similar_aspects'), str):
                    result['similar_aspects'] = json.loads(result['similar_aspects'])
                result['from_cache'] = True
                return result
        return None
    except Exception as e:
        print(f"Error getting cached LLM analysis: {e}")
        return None


def db_save_llm_analysis(pair_hash: str, proposal1_judul: str, proposal2_judul: str, result: dict):
    """Save an LLM analysis result to Supabase"""
    if not SUPABASE_URL or not SUPABASE_KEY:
        return False

    try:
        url = f"{SUPABASE_URL}/rest/v1/llm_analysis"
        headers = get_supabase_headers()
        headers["Prefer"] = "resolution=merge-duplicates"  # Upsert

        payload = {
            "pair_hash": pair_hash,
            "proposal1_judul": proposal1_judul[:500] if proposal1_judul else "",
            "proposal2_judul": proposal2_judul[:500] if proposal2_judul else "",
            "similarity_score": result.get("similarity_score"),
            "verdict": result.get("verdict"),
            "reasoning": result.get("reasoning"),
            "saran": result.get("saran"),
            "similar_aspects": json.dumps(result.get("similar_aspects", {})),
            "differentiator": result.get("differentiator"),
            "model_used": result.get("model_used", GEMINI_MODEL)
        }

        with httpx.Client(timeout=10.0) as client:
            response = client.post(url, headers=headers, json=payload)

        if response.status_code in [200, 201]:
            print(f"✅ LLM result cached: {pair_hash[:8]}...")
            return True
        else:
            print(f"⚠️ Failed to cache LLM result: {response.status_code}")
            return False
    except Exception as e:
        print(f"Error saving LLM analysis: {e}")
        return False


# ==================== LLM FUNCTIONS (GEMINI) ====================

def generate_pair_hash(proposal1: dict, proposal2: dict) -> str:
    """Generate a unique hash for a pair of proposals"""
    def proposal_hash(p):
        content = f"{p.get('nim', '')}|{p.get('judul', '')}|{p.get('deskripsi', '')}|{p.get('problem', '')}|{p.get('metode', '')}"
        return hashlib.md5(content.encode()).hexdigest()[:16]

    h1 = proposal_hash(proposal1)
    h2 = proposal_hash(proposal2)
    # Sort for consistency (A,B = B,A)
    sorted_hashes = sorted([h1, h2])
    return hashlib.md5(f"{sorted_hashes[0]}|{sorted_hashes[1]}".encode()).hexdigest()[:32]


def llm_analyze_pair(proposal1_json: str, proposal2_json: str, use_cache: bool = True):
    """Analyze the similarity of two proposals using the Gemini LLM"""
    if not GEMINI_API_KEYS:
        return {"error": "Gemini API key not configured. Set GEMINI_API_KEY_1, GEMINI_API_KEY_2, etc in .env file"}

    try:
        proposal1 = json.loads(proposal1_json)
        proposal2 = json.loads(proposal2_json)
    except json.JSONDecodeError:
        return {"error": "Invalid JSON format for proposals"}

    # Generate the pair hash used for caching
    pair_hash = generate_pair_hash(proposal1, proposal2)

    # Check cache first
    if use_cache:
        cached_result = db_get_llm_analysis(pair_hash)
        if cached_result:
            print(f"📦 Using cached LLM result: {pair_hash[:8]}...")
            return cached_result

    # Build prompt
    prompt = f"""Anda adalah penilai kemiripan proposal skripsi yang ahli dan berpengalaman. Analisis dua proposal berikut dengan KRITERIA AKADEMIK yang benar.

ATURAN PENILAIAN PENTING:
1. Proposal skripsi dianggap BERMASALAH hanya jika KETIGA aspek ini SAMA: Topik/Domain + Dataset/Objek Penelitian + Metode/Algoritma
2. Jika METODE BERBEDA (walaupun topik & dataset sama) → AMAN, karena memberikan kontribusi ilmiah berbeda
3. Jika DATASET/OBJEK BERBEDA (walaupun topik & metode sama) → AMAN, karena studi kasus berbeda
4. Jika TOPIK/DOMAIN BERBEDA → AMAN
5. Penelitian replikasi dengan variasi adalah HAL YANG WAJAR dalam dunia akademik

PROPOSAL 1:
- NIM: {proposal1.get('nim', 'N/A')}
- Nama: {proposal1.get('nama', 'N/A')}
- Judul: {proposal1.get('judul', 'N/A')}
- Deskripsi: {proposal1.get('deskripsi', 'N/A')[:500] if proposal1.get('deskripsi') else 'N/A'}
- Problem Statement: {proposal1.get('problem', 'N/A')[:500] if proposal1.get('problem') else 'N/A'}
- Metode: {proposal1.get('metode', 'N/A')}

PROPOSAL 2:
- NIM: {proposal2.get('nim', 'N/A')}
- Nama: {proposal2.get('nama', 'N/A')}
- Judul: {proposal2.get('judul', 'N/A')}
- Deskripsi: {proposal2.get('deskripsi', 'N/A')[:500] if proposal2.get('deskripsi') else 'N/A'}
- Problem Statement: {proposal2.get('problem', 'N/A')[:500] if proposal2.get('problem') else 'N/A'}
- Metode: {proposal2.get('metode', 'N/A')}

ANALISIS dengan cermat, lalu berikan output JSON (HANYA JSON, tanpa markdown):
{{
  "similarity_score": <0-100, tinggi HANYA jika topik+dataset+metode SEMUA sama>,
  "verdict": "<BERMASALAH jika score>=80, PERLU_REVIEW jika 50-79, AMAN jika <50>",
  "similar_aspects": {{
    "topik": <true/false - apakah tema/domain penelitian sama>,
    "dataset": <true/false - apakah objek/data penelitian sama>,
    "metode": <true/false - apakah algoritma/metode sama>,
    "pendekatan": <true/false - apakah framework/pendekatan sama>
  }},
  "differentiator": "<aspek pembeda utama: metode/dataset/domain/tidak_ada>",
  "reasoning": "<analisis mendalam 4-5 kalimat: jelaskan persamaan dan perbedaan dari aspek topik, dataset, dan metode. Jelaskan mengapa proposal ini aman/bermasalah berdasarkan kriteria akademik>",
  "saran": "<nasihat konstruktif 2-3 kalimat untuk mahasiswa: jika aman, beri saran penguatan diferensiasi. Jika bermasalah, beri warning dan alternatif arah penelitian>"
}}"""

    # Call Gemini API with retry/rotation
    response, error = call_gemini_with_retry(prompt)

    if error:
        return {"error": f"Gemini API error: {error}"}

    try:
        # Parse response
        response_text = response.text.strip()

        # Clean response (remove markdown code blocks if present)
        if response_text.startswith("```"):
            lines = response_text.split("\n")
            response_text = "\n".join(lines[1:-1])  # Remove first and last lines

        result = json.loads(response_text)
        result["pair_hash"] = pair_hash
        result["model_used"] = GEMINI_MODEL
        result["api_key_used"] = current_key_index + 1
        result["from_cache"] = False

        # Save to cache
        db_save_llm_analysis(
            pair_hash=pair_hash,
            proposal1_judul=proposal1.get('judul', ''),
            proposal2_judul=proposal2.get('judul', ''),
            result=result
        )

        return result

    except json.JSONDecodeError as e:
        return {
            "error": "Failed to parse LLM response as JSON",
            "raw_response": response_text if 'response_text' in dir() else "No response",
            "parse_error": str(e)
        }


def llm_check_status():
    """Check Gemini API status"""
    if not GENAI_AVAILABLE:
        return {
            "configured": False,
            "error": "google-genai package not available"
        }
    if not GEMINI_API_KEYS:
        return {
            "configured": False,
            "error": "No GEMINI_API_KEY found in environment"
        }

    response, error = call_gemini_with_retry("Respond with only: OK")

    if error:
        return {
            "configured": True,
            "total_keys": len(GEMINI_API_KEYS),
            "model": GEMINI_MODEL,
            "status": "error",
            "error": error
        }

    return {
        "configured": True,
        "total_keys": len(GEMINI_API_KEYS),
        "current_key": current_key_index + 1,
        "model": GEMINI_MODEL,
        "status": "connected",
        "test_response": response.text.strip()[:50]
    }


def llm_analyze_simple(judul1: str, judul2: str, metode1: str, metode2: str):
    """Simplified analysis - title and method only (for quick testing)"""
    if not GEMINI_API_KEYS:
        return {"error": "Gemini API key not configured"}

    prompt = f"""Anda adalah penilai kemiripan proposal skripsi yang ahli. Bandingkan dua proposal berikut dengan KRITERIA AKADEMIK yang benar.

ATURAN PENILAIAN PENTING:
1. Proposal skripsi dianggap BERMASALAH hanya jika KETIGA aspek ini SAMA: Topik/Domain + Dataset + Metode
2. Jika METODE BERBEDA (walaupun topik sama) → AMAN, karena kontribusi berbeda
3. Jika DATASET BERBEDA (walaupun topik & metode sama) → AMAN, karena studi kasus berbeda
4. Jika TOPIK/DOMAIN BERBEDA → AMAN

Proposal 1:
- Judul: {judul1}
- Metode: {metode1}

Proposal 2:
- Judul: {judul2}
- Metode: {metode2}

ANALISIS dengan cermat, lalu berikan output JSON (HANYA JSON, tanpa markdown):
{{
  "similarity_score": <0-100, tinggi HANYA jika topik+dataset+metode SEMUA sama>,
  "verdict": "<BERMASALAH jika score>=80, PERLU_REVIEW jika 50-79, AMAN jika <50>",
  "topik_sama": <true/false>,
  "metode_sama": <true/false>,
  "differentiator": "<aspek pembeda utama: metode/dataset/domain/tidak_ada>",
  "reasoning": "<analisis mendalam 3-4 kalimat: jelaskan persamaan, perbedaan, dan mengapa aman/bermasalah>",
  "saran": "<nasihat konstruktif untuk mahasiswa, misal: cara memperkuat diferensiasi, atau warning jika terlalu mirip>"
}}"""

    response, error = call_gemini_with_retry(prompt)

    if error:
        return {"error": error}

    try:
        response_text = response.text.strip()

        if response_text.startswith("```"):
            lines = response_text.split("\n")
            response_text = "\n".join(lines[1:-1])

        result = json.loads(response_text)
        result["model_used"] = GEMINI_MODEL
        result["api_key_used"] = current_key_index + 1
        return result

    except json.JSONDecodeError as e:
        return {"error": f"Failed to parse response: {e}", "raw": response_text}


# Gradio Interface
with gr.Blocks(title="Semantic Embedding API") as demo:
    gr.Markdown("# 🔤 Semantic Embedding API")
    gr.Markdown("API untuk menghasilkan text embedding menggunakan `paraphrase-multilingual-MiniLM-L12-v2`")
    gr.Markdown("**Model**: Multilingual, mendukung 50+ bahasa termasuk Bahasa Indonesia")

    with gr.Tab("🔒 Single Embedding"):
        gr.Markdown("Generate embedding vector untuk satu teks")
        text_input = gr.Textbox(
            label="Input Text",
            placeholder="Masukkan teks untuk di-embed...",
            lines=2
        )
        single_output = gr.JSON(label="Embedding Result")
        single_btn = gr.Button("Generate Embedding", variant="primary")
        single_btn.click(fn=get_embedding, inputs=text_input, outputs=single_output)

    with gr.Tab("📦 Batch Embedding"):
        gr.Markdown("Generate embeddings untuk multiple teks sekaligus")
        batch_input = gr.Textbox(
            label="JSON Array of Texts",
            placeholder='["teks pertama", "teks kedua", "teks ketiga"]',
            lines=4
        )
        batch_output = gr.JSON(label="Embeddings Result")
        batch_btn = gr.Button("Generate Embeddings", variant="primary")
        batch_btn.click(fn=get_embeddings_batch, inputs=batch_input, outputs=batch_output)

    with gr.Tab("📊 Similarity Check"):
        gr.Markdown("Hitung kemiripan semantik antara dua teks")
        with gr.Row():
            sim_text1 = gr.Textbox(label="Text 1", placeholder="Teks pertama...", lines=2)
            sim_text2 = gr.Textbox(label="Text 2", placeholder="Teks kedua...", lines=2)
        sim_output = gr.JSON(label="Similarity Result")
        sim_btn = gr.Button("Calculate Similarity", variant="primary")
        sim_btn.click(fn=calculate_similarity, inputs=[sim_text1, sim_text2], outputs=sim_output)

    with gr.Tab("💾 Database (Supabase)"):
        gr.Markdown("### Supabase Cache Operations")
        gr.Markdown("Proxy untuk akses Supabase (API key aman di server)")
        gr.Markdown("*Note: Operasi write (save) hanya tersedia melalui API untuk keamanan.*")

        with gr.Row():
            db_check_btn = gr.Button("🔌 Check Connection", variant="secondary")
            db_check_output = gr.JSON(label="Connection Status")
        db_check_btn.click(fn=db_check_connection, outputs=db_check_output)

        gr.Markdown("---")

        gr.Markdown("#### Get All Cached Embeddings")
        db_all_btn = gr.Button("📥 Get All Embeddings", variant="primary")
        db_all_output = gr.JSON(label="All Embeddings")
        db_all_btn.click(fn=db_get_all_embeddings, outputs=db_all_output)

        gr.Markdown("---")

        gr.Markdown("#### Get Single Embedding by NIM")
        with gr.Row():
            db_nim_input = gr.Textbox(label="NIM", placeholder="10121xxx")
            db_hash_input = gr.Textbox(label="Content Hash", placeholder="abc123...")
        db_get_btn = gr.Button("🔍 Get Embedding", variant="primary")
        db_get_output = gr.JSON(label="Embedding Result")
        db_get_btn.click(fn=db_get_embedding, inputs=[db_nim_input, db_hash_input], outputs=db_get_output)

    with gr.Tab("🤖 LLM Analysis (Gemini)"):
        gr.Markdown("### Analisis Kemiripan dengan LLM")
        gr.Markdown("Menggunakan Google Gemini untuk analisis mendalam dengan penjelasan")

        with gr.Row():
            llm_check_btn = gr.Button("🔌 Check Gemini Status", variant="secondary")
            llm_check_output = gr.JSON(label="Gemini Status")
        llm_check_btn.click(fn=llm_check_status, outputs=llm_check_output)

        gr.Markdown("---")

        gr.Markdown("#### Quick Analysis (Judul + Metode saja)")
        with gr.Row():
            with gr.Column():
                llm_judul1 = gr.Textbox(label="Judul Proposal 1", placeholder="Analisis Sentimen dengan SVM...", lines=2)
                llm_metode1 = gr.Textbox(label="Metode 1", placeholder="Support Vector Machine")
            with gr.Column():
                llm_judul2 = gr.Textbox(label="Judul Proposal 2", placeholder="Klasifikasi Sentimen dengan SVM...", lines=2)
                llm_metode2 = gr.Textbox(label="Metode 2", placeholder="Support Vector Machine")

        llm_simple_btn = gr.Button("🚀 Analyze (Quick)", variant="primary")
        llm_simple_output = gr.JSON(label="Quick Analysis Result")
        llm_simple_btn.click(
            fn=llm_analyze_simple,
            inputs=[llm_judul1, llm_judul2, llm_metode1, llm_metode2],
            outputs=llm_simple_output
        )

        gr.Markdown("---")

        gr.Markdown("#### Full Analysis (Complete Proposal Data)")
        gr.Markdown("*Hasil di-cache ke Supabase. Request yang sama akan menggunakan cache.*")
        with gr.Row():
            llm_proposal1 = gr.Textbox(
                label="Proposal 1 (JSON)",
                placeholder='{"nim": "123", "nama": "Ahmad", "judul": "...", "deskripsi": "...", "problem": "...", "metode": "..."}',
                lines=5
            )
            llm_proposal2 = gr.Textbox(
                label="Proposal 2 (JSON)",
                placeholder='{"nim": "456", "nama": "Budi", "judul": "...", "deskripsi": "...", "problem": "...", "metode": "..."}',
                lines=5
            )

        with gr.Row():
            llm_use_cache = gr.Checkbox(label="Gunakan Cache", value=True, info="Uncheck untuk force refresh dari Gemini")
            llm_full_btn = gr.Button("🔍 Analyze (Full)", variant="primary")

        llm_full_output = gr.JSON(label="Full Analysis Result")
        llm_full_btn.click(
            fn=llm_analyze_pair,
            inputs=[llm_proposal1, llm_proposal2, llm_use_cache],
            outputs=llm_full_output
        )

        gr.Markdown("""
**Output mencakup:**
- `similarity_score`: Skor 0-100 (tinggi hanya jika topik+dataset+metode sama)
- `verdict`: BERMASALAH / PERLU_REVIEW / AMAN
- `reasoning`: Analisis mendalam dari AI
- `similar_aspects`: Aspek yang mirip (topik/dataset/metode/pendekatan)
- `differentiator`: Pembeda utama
- `saran`: Nasihat untuk mahasiswa
- `from_cache`: true jika hasil dari cache
""")

    with gr.Accordion("📑 API Usage (untuk Developer)", open=False):
        gr.Markdown("""
### Endpoints

#### Embedding
- `get_embedding` - Single text embedding
- `get_embeddings_batch` - Batch text embeddings
- `calculate_similarity` - Compare two texts

#### Database (Supabase Proxy)
- `db_check_connection` - Test Supabase connection
- `db_get_all_embeddings` - Get all cached embeddings
- `db_get_embedding` - Get embedding by NIM + hash
- `db_save_embedding` - Save embedding to cache

### Example API Call
```javascript
// Get all cached embeddings
const response = await fetch("YOUR_SPACE_URL/gradio_api/call/db_get_all_embeddings", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ data: [] })
});
const result = await response.json();
const eventId = result.event_id;

// Get result
const dataResponse = await fetch(`YOUR_SPACE_URL/gradio_api/call/db_get_all_embeddings/${eventId}`);
```
""")

    gr.Markdown("---")
    gr.Markdown("*Dibuat untuk Monitoring Proposal Skripsi KK E - UNIKOM*")

# Launch with the API enabled
demo.launch()
requirements.txt ADDED
@@ -0,0 +1,6 @@
gradio>=4.0.0
sentence-transformers>=2.2.0
torch
numpy
httpx>=0.24.0
google-genai>=1.0.0