Upload folder using huggingface_hub

- LICENSE +17 -0
- README.md +197 -0
- model.py +193 -0
- model.safetensors +3 -0
LICENSE ADDED
@@ -0,0 +1,17 @@
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

Copyright (c) 2025 HighkeyPrxneeth

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
README.md ADDED
@@ -0,0 +1,197 @@
---
language:
- "en"
pretty_name: "ModernTrajectoryNet: Transaction Embedding Classifier"
tags:
- embedding
- pytorch
- finance
- transaction-classifier
- contrastive-learning
license: "apache-2.0"
datasets:
- "HighkeyPrxneeth/BusinessTransactions"
library_name: "pytorch"
---

# ModernTrajectoryNet: Transaction Embedding Classifier

A PyTorch embedding classifier for transaction categorization, built from modern deep-learning components. The model learns to project transaction embeddings toward their target category embeddings through trajectory-based contrastive learning.

## Model Architecture

**ModernTrajectoryNet** combines several modern architectural components:

### Core Components

1. **RMSNorm (Root Mean Square Layer Normalization)**
   - More stable and computationally efficient than LayerNorm
   - Used in LLaMA, PaLM, and Gopher
   - Provides consistent gradient flow through deep networks

2. **SwiGLU (Swish-Gated Linear Unit)**
   - High-performing activation function for feed-forward networks
   - Outperforms GELU and ReLU in expressivity
   - Gate mechanism: `silu(w1(x)) * w2(x)`, where `silu(z) = z * sigmoid(z)` (see `model.py`)

3. **SEBlock (Squeeze-and-Excitation)**
   - Channel attention mechanism
   - Allows dynamic weighting of embedding dimensions
   - Context-aware feature recalibration

4. **ModernBlock (Pre-Norm Architecture)**
   - RMSNorm → SwiGLU → SEBlock → residual connection
   - Incorporates layer scaling and stochastic depth (DropPath)
   - Enables training of very deep networks

### Configuration

- **Input dimension**: 768 (embedding size)
- **Hidden layers**: 12 stacked pre-norm residual blocks (ModernBlocks)
- **Expansion ratio**: 4x, scaled by 2/3 inside SwiGLU (effective hidden width 2048 for d=768)
- **Dropout**: 0.1
- **Stochastic depth**: Drop probability increases linearly with depth (0.0 → 0.1)

## Training Objective: Hybrid Trajectory Learning

The model is trained with **HybridTrajectoryLoss**, combining two objectives:

### 1. Adaptive InfoNCE (Contrastive Component)
- Learnable temperature parameter for dynamic scaling
- Contrastive loss with label smoothing (0.1)
- Ensures the model maps input embeddings close to their true target embedding
- Equation: `L_contrastive = CrossEntropy(logits / T, labels)`

### 2. Monotonic Ranking (Trajectory Component)
- Enforces **monotonically increasing similarity** along the embedding trajectory (the per-block intermediate outputs)
- Each step in the trajectory should have higher similarity to the target than the previous step
- The final embedding must achieve high similarity (ideally 1.0) with the target
- Margin constraint: `sim[i+1] > sim[i] + 0.01`
- Ensures the model learns the **path** to the target, not just the endpoint

### Loss Formulation

```
Total Loss = InfoNCE Loss + Monotonicity Loss
```
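
The training loss itself is not shipped in this repository, so the following is a minimal sketch of how the two terms could fit together, assuming the trajectory tensor from `model(x, return_trajectory=True)` has shape `[B, n_layers, 768]` and that each batch row's own target serves as its positive. The class name matches the README, but the exact weighting and margin handling here are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridTrajectoryLoss(nn.Module):
    """Sketch: adaptive InfoNCE + monotonic ranking, as described above."""
    def __init__(self, init_temp: float = 0.07, margin: float = 0.01,
                 label_smoothing: float = 0.1):
        super().__init__()
        # Learnable temperature, stored as a log for positivity.
        self.log_temp = nn.Parameter(torch.log(torch.tensor(init_temp)))
        self.margin = margin
        self.label_smoothing = label_smoothing

    def forward(self, output, trajectory, targets):
        # output:     [B, D]    final projected embeddings
        # trajectory: [B, L, D] per-block intermediate embeddings
        # targets:    [B, D]    target category embeddings
        out = F.normalize(output, dim=-1)
        tgt = F.normalize(targets, dim=-1)

        # 1. Adaptive InfoNCE: each row's positive is its own target.
        logits = out @ tgt.t() / self.log_temp.exp()
        labels = torch.arange(out.size(0), device=out.device)
        contrastive = F.cross_entropy(logits, labels,
                                      label_smoothing=self.label_smoothing)

        # 2. Monotonic ranking: penalize steps where sim[i+1] <= sim[i] + margin,
        #    and push the final step's similarity toward 1.0.
        traj = F.normalize(trajectory, dim=-1)
        sims = (traj * tgt.unsqueeze(1)).sum(-1)              # [B, L]
        violations = F.relu(sims[:, :-1] + self.margin - sims[:, 1:])
        monotonic = violations.mean() + (1.0 - sims[:, -1]).mean()

        return contrastive + monotonic
```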

**Why Trajectory Learning?**
- Transactions often evolve gradually toward their correct category
- Intermediate embeddings should show progression toward the target
- This inductive bias improves generalization and interpretability

## Training Details

- **Optimizer**: AdamW with weight decay (1e-4)
- **Learning rate**: Cosine annealing from 3e-4 to 1e-6
- **Batch size**: 128
- **Gradient clipping**: 1.0
- **Epochs**: 50 with early stopping (patience=5)
- **EMA (Exponential Moving Average)**: Decay=0.99 for evaluation stability
- **Augmentation**: Input masking (p=0.15) and Gaussian noise (std=0.01) during training
- **Mixed Precision**: AMP enabled for faster training on CUDA
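
A condensed sketch of this recipe, assuming the loss sketch above; validation and early stopping are omitted, and `model`, `criterion`, and `train_loader` are placeholders:

```python
import copy
import torch

# Assumed to exist: model, criterion (HybridTrajectoryLoss), train_loader
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=50, eta_min=1e-6)
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())

# Simple EMA copy of the weights for evaluation stability (decay=0.99).
ema_model = copy.deepcopy(model).eval()
ema_decay = 0.99

for epoch in range(50):
    for inputs, targets in train_loader:
        # Train-time augmentation: random masking (p=0.15) + Gaussian noise.
        mask = (torch.rand_like(inputs) > 0.15).float()
        noisy = inputs * mask + torch.randn_like(inputs) * 0.01

        with torch.autocast(device_type="cuda", enabled=scaler.is_enabled()):
            output, trajectory = model(noisy, return_trajectory=True)
            loss = criterion(output, trajectory, targets)

        optimizer.zero_grad(set_to_none=True)
        scaler.scale(loss).backward()
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(optimizer)
        scaler.update()

        # EMA update of the evaluation weights.
        with torch.no_grad():
            for p_ema, p in zip(ema_model.parameters(), model.parameters()):
                p_ema.mul_(ema_decay).add_(p, alpha=1 - ema_decay)
    scheduler.step()
```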

## Performance Metrics

The model optimizes for:
1. **Last Similarity**: Similarity of the final embedding with the target (Target: ≈1.0)
2. **Monotonicity Accuracy**: % of trajectory transitions with strictly increasing similarity (Target: 100%)
3. **Contrastive Accuracy**: Ability to distinguish the true target from the other targets in the batch
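
A minimal sketch of how these three metrics could be computed for one evaluation batch, reusing the trajectory output; the helper name is illustrative:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def trajectory_metrics(model, inputs, targets):
    """Compute last similarity, monotonicity accuracy, contrastive accuracy."""
    output, trajectory = model(inputs, return_trajectory=True)
    out = F.normalize(output, dim=-1)        # [B, D]
    tgt = F.normalize(targets, dim=-1)       # [B, D]
    traj = F.normalize(trajectory, dim=-1)   # [B, L, D]

    # 1. Last similarity: final embedding vs. its own target (target ~1.0).
    last_sim = (out * tgt).sum(-1).mean()

    # 2. Monotonicity accuracy: fraction of strictly increasing transitions.
    sims = (traj * tgt.unsqueeze(1)).sum(-1)  # [B, L]
    mono_acc = (sims[:, 1:] > sims[:, :-1]).float().mean()

    # 3. Contrastive accuracy: own target nearest among in-batch targets.
    nearest = (out @ tgt.t()).argmax(dim=1)
    labels = torch.arange(len(out), device=out.device)
    contrastive_acc = (nearest == labels).float().mean()

    return last_sim.item(), mono_acc.item(), contrastive_acc.item()
```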

## How to Load

```python
import torch
from safetensors.torch import load_file

from model import ModernTrajectoryNet

# config.py is not shipped in this repo; a minimal stand-in with the
# documented hyperparameters is enough for inference.
class Config:
    d_model = 768
    n_layers = 12
    dropout = 0.1
    expand = 4
    drop_path_rate = 0.1

# Load weights
weights = load_file("model.safetensors")

# Instantiate model
config = Config()
model = ModernTrajectoryNet(config)
model.load_state_dict(weights)
model.eval()

# Use model
with torch.no_grad():
    input_embedding = torch.randn(1, 768)  # Your transaction embedding
    output_embedding = model(input_embedding)
    print(output_embedding.shape)  # torch.Size([1, 768])
```

## Usage Example

```python
import torch
from torch.nn.functional import normalize

# Assuming you have transaction embeddings and category embeddings
transaction_emb = model(input_embedding)  # [B, 768]

# Compute cosine similarity with category embeddings
category_embs = normalize(category_embeddings, p=2, dim=1)     # [N_cats, 768]
transaction_emb_norm = normalize(transaction_emb, p=2, dim=1)  # [B, 768]

similarities = torch.matmul(transaction_emb_norm, category_embs.t())  # [B, N_cats]
predicted_category = torch.argmax(similarities, dim=1)                # [B]
```

## Intended Uses

- **Transaction categorization**: Classify business transactions into merchant categories
- **Embedding refinement**: Project raw transaction embeddings into a more discriminative space
- **Contrastive learning**: Extract improved embeddings for downstream tasks
- **Research**: Study trajectory-based learning for sequential decision problems

## Limitations & Biases

- **Synthetic data**: Trained on synthetic transaction strings generated from Foursquare Open-Source (FSQ OS) business names and categories using the `qwen2.5-4b-instruct` LLM
- **FSQ OS biases**: Inherits biases from the FSQ OS dataset (e.g., geographic coverage, business type distribution)
- **Generation artifacts**: LLM-generated synthetic data may not reflect real-world transaction diversity
- **Category coverage**: Limited to categories present in FSQ OS (typically 200-500 merchant types)
- **Language**: Trained on English transaction strings; may not generalize to other languages

**Recommendation**: Validate performance on your specific transaction domain before production deployment.

## Dataset

- **Source**: Foursquare Open-Source (FSQ OS) business names and categories
- **Processing**: LLM-based synthetic transaction generation
- **Size**: ~1M synthetic transaction embeddings
- **Train/Val split**: 90% / 10%

See the [dataset](https://huggingface.co/datasets/HighkeyPrxneeth/BusinessTransactions) for more details.

## Files in This Repository

- `model.safetensors`: Model weights in SafeTensors format (~160 MB)
- `model.py`: Model architecture (`ModernTrajectoryNet` and its building blocks)
- `README.md`: This file
- `LICENSE`: Apache 2.0 license

## License

Apache License 2.0. See the LICENSE file for details.

## Citation

If you use this model, please cite:

```bibtex
@software{transactionclassifier2024,
  title={TransactionClassifier: Embedding-based Transaction Categorization},
  author={HighkeyPrxneeth},
  year={2024},
  url={https://huggingface.co/HighkeyPrxneeth/ModernTrajectoryNet}
}
```

## Contact & Support

- **Repository**: [GitHub - TransactionClassifier](https://github.com/HighkeyPrxneeth/TransactionClassifier)
- **Issues**: Open an issue in the main project repository
- **Author**: HighkeyPrxneeth

For questions about the model architecture, training, or usage, feel free to reach out!
model.py ADDED
@@ -0,0 +1,193 @@
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """
    Root Mean Square Layer Normalization.
    More stable and computationally efficient than LayerNorm.
    Used in LLaMA, PaLM, Gopher.
    """
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def _norm(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

    def forward(self, x):
        output = self._norm(x.float()).type_as(x)
        return output * self.weight


class SwiGLU(nn.Module):
    """
    Swish-Gated Linear Unit.
    High-performing activation for FFNs (outperforms GELU/ReLU).
    """
    def __init__(self, dim: int, hidden_dim: int, dropout: float = 0.0):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(dim, hidden_dim, bias=False)
        self.w3 = nn.Linear(hidden_dim, dim, bias=False)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # Gate mechanism: silu(w1(x)) * w2(x), where silu(z) = z * sigmoid(z)
        x1 = self.w1(x)
        x2 = self.w2(x)
        hidden = F.silu(x1) * x2
        return self.w3(self.dropout(hidden))


class SEBlock(nn.Module):
    """
    Squeeze-and-Excitation Block.
    Allows the model to dynamically weight different dimensions of the embedding
    based on global context.
    """
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        # Defined for 1-D feature-map inputs; unused for flat [B, D] vectors.
        self.avg_pool = nn.AdaptiveAvgPool1d(1)
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        # Inputs are flat vectors [B, D], so the pooling "squeeze" step is a
        # no-op here; the excitation MLP operates on the embedding directly.
        y = self.fc(x)  # [B, D] channel weights in (0, 1)
        return x * y


class DropPath(nn.Module):
    """Stochastic depth regularizer."""
    def __init__(self, drop_prob: float = 0.0):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        if self.drop_prob == 0.0 or not self.training:
            return x
        keep_prob = 1.0 - self.drop_prob
        # One Bernoulli draw per sample, broadcast over the remaining dims.
        shape = (x.shape[0],) + (1,) * (x.ndim - 1)
        random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
        random_tensor.floor_()
        return x.div(keep_prob) * random_tensor


class ModernBlock(nn.Module):
    """
    A Pre-Norm block combining RMSNorm, SwiGLU, and channel attention.
    """
    def __init__(self, dim: int, expand: int = 4, dropout: float = 0.1,
                 layer_scale_init: float = 1e-6, drop_path: float = 0.0):
        super().__init__()

        # 1. Normalization
        self.norm = RMSNorm(dim)

        # 2. Feed-forward (SwiGLU)
        # SwiGLU usually uses 2/3 of the standard MLP hidden dim to match
        # parameter counts; we keep the expansion high for expressivity.
        self.ffn = SwiGLU(dim, int(dim * expand * 2 / 3), dropout=dropout)

        # 3. Channel attention (context awareness)
        self.se = SEBlock(dim, reduction=4)

        # 4. Regularization
        self.layer_scale = nn.Parameter(torch.ones(dim) * layer_scale_init) if layer_scale_init > 0 else None
        self.drop_path = DropPath(drop_path)

    def forward(self, x):
        residual = x

        # Pre-Norm architecture
        out = self.norm(x)
        out = self.ffn(out)
        out = self.se(out)  # apply channel attention

        if self.layer_scale is not None:
            out = out * self.layer_scale

        out = self.drop_path(out)

        return residual + out


class ModernTrajectoryNet(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.d_model = config.d_model
        self.n_layers = config.n_layers

        # Config defaults
        dropout = getattr(config, "dropout", 0.1)
        expand = getattr(config, "expand", 4)
        drop_path_rate = getattr(config, "drop_path_rate", 0.1)

        # Input projection (projects to the latent space)
        self.input_proj = nn.Sequential(
            RMSNorm(self.d_model),
            nn.Linear(self.d_model, self.d_model)
        )

        # Backbone; drop-path probability increases linearly with depth.
        # max(..., 1) guards against division by zero when n_layers == 1.
        self.blocks = nn.ModuleList([
            ModernBlock(
                dim=self.d_model,
                expand=expand,
                dropout=dropout,
                drop_path=drop_path_rate * (i / max(self.n_layers - 1, 1))
            ) for i in range(self.n_layers)
        ])

        self.final_norm = RMSNorm(self.d_model)

        # Projector head (SimCLR / CLIP style).
        # Important: keep the full dimension for the final linear probe.
        self.head = nn.Sequential(
            nn.Linear(self.d_model, self.d_model),
            nn.GELU(),
            nn.Linear(self.d_model, self.d_model)
        )

        self.apply(self._init_weights)

    def _init_weights(self, m):
        if isinstance(m, nn.Linear):
            torch.nn.init.trunc_normal_(m.weight, std=.02)
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.LayerNorm):
            # Kept for completeness; the model itself uses RMSNorm.
            nn.init.constant_(m.bias, 0)
            nn.init.constant_(m.weight, 1.0)

    def forward(self, x, return_trajectory=False):
        # Handle a sequence dimension if present: mean-pool [B, T, D] -> [B, D]
        if x.dim() == 3:
            x = x.mean(dim=1)

        x = self.input_proj(x)

        trajectory = []
        for block in self.blocks:
            x = block(x)
            trajectory.append(x)

        x = self.final_norm(x)

        # Residual connections to the input run implicitly through the blocks;
        # for trajectory learning, the final head dictates the shift.
        output = self.head(x)

        # OPTIONAL: add a denoising / residual connection to the input
        # output = output + input_tensor_if_saved

        if return_trajectory:
            return output, torch.stack(trajectory, dim=1)

        return output


# Backwards compatibility
HybridMambaAttentionModel = ModernTrajectoryNet
```
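A quick smoke test for the module above; the `SimpleNamespace` stands in for the missing `config.py`, with `d_model` and `n_layers` taken from the README and the rest falling back to the `getattr` defaults in `__init__`:

```python
import torch
from types import SimpleNamespace

# Stand-in config: values follow the README's documented configuration.
config = SimpleNamespace(d_model=768, n_layers=12)
model = ModernTrajectoryNet(config).eval()

x = torch.randn(2, 768)
with torch.no_grad():
    out, traj = model(x, return_trajectory=True)
print(out.shape)   # torch.Size([2, 768])
print(traj.shape)  # torch.Size([2, 12, 768]) -- one entry per block
```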
model.safetensors ADDED
@@ -0,0 +1,3 @@

version https://git-lfs.github.com/spec/v1
oid sha256:dd2615d46943643727d117d14950b2437000a0a7346d1a969d15728c5cadd56b
size 167580472