# MCC Monitoring Stack

A complete monitoring solution for Microsoft Connected Cache (MCC), built on Grafana, Loki, Promtail, Prometheus, InfluxDB, and Telegraf.

![License](https://img.shields.io/badge/license-MIT-blue.svg)
![Docker](https://img.shields.io/badge/docker-required-blue.svg)

## 📊 Features

- **Zero Configuration**: Auto-detects network interface, MCC container, log paths
- **Real-time Cache Monitoring**: Track HIT, MISS, STALE, and UPDATING cache statuses
- **BGP Monitoring**: Monitor BGP status, routes learned, and prefix lists
- **Network Metrics**: Bandwidth usage, TCP connections, packet statistics
- **System Performance**: CPU, RAM, disk usage monitoring
- **Log Analysis**: Real-time log parsing and visualization with Loki
- **Beautiful Dashboards**: Pre-configured Grafana dashboard with all metrics
- **Log Format Detection**: Detects the MCC log format (semicolon-separated or JSON) automatically
- **Docker Compose Auto-Install**: Installs Docker Compose if not present

## 🏗️ Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                         Grafana                             │
│                    (Port 3000)                              │
│                   Visualization Layer                        │
└─────────────────┬───────────────┬───────────────────────────┘
                  │               │
        ┌─────────▼────────┐  ┌──▼──────────┐
        │   Loki (3100)    │  │ InfluxDB    │
        │   Log Storage    │  │ (8086)      │
        └─────────▲────────┘  └──▲──────────┘
                  │               │
        ┌─────────┴────────┐  ┌──┴──────────┐
        │    Promtail      │  │  Telegraf   │
        │  Log Collector   │  │  Metrics    │
        └─────────▲────────┘  └──▲──────────┘
                  │               │
                  └───────┬───────┘
                          │
                    ┌─────▼─────┐
                    │    MCC    │
                    │  Server   │
                    └───────────┘
```

## 📋 Prerequisites

- Docker (version 20.10+)
- Docker Compose (version 2.0+ or docker-compose 1.29+)
- Minimum 2GB RAM
- 10GB free disk space
- Root/sudo access
- MCC container running (for full functionality)
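
The requirements above can be checked before running the setup script. The following is a minimal sketch, not part of this repository: `version_ge` and `check_prereqs` are hypothetical helper names, and the version comparison assumes GNU `sort -V` is available.

```bash
# version_ge A B: succeed when version A is greater than or equal to version B.
# Assumes GNU coreutils `sort -V` (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_prereqs: print one line per prerequisite; it reports and never aborts.
check_prereqs() {
  local docker_ver mem_kb free_kb
  docker_ver="$(docker version --format '{{.Server.Version}}' 2>/dev/null || true)"
  if [ -n "$docker_ver" ] && version_ge "$docker_ver" "20.10"; then
    echo "docker: ok ($docker_ver)"
  else
    echo "docker: not reachable or older than 20.10"
  fi
  # Total RAM in kB as reported by the Linux kernel.
  mem_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)"
  echo "ram: $(( ${mem_kb:-0} / 1024 )) MB total (README minimum: 2 GB)"
  # Free space on the filesystem holding the current directory.
  free_kb="$(df -Pk . | awk 'NR == 2 {print $4}')"
  echo "disk: $(( ${free_kb:-0} / 1024 / 1024 )) GB free (README minimum: 10 GB)"
}
```

Run `check_prereqs` from the repository directory so the disk figure reflects the filesystem the stack will write to.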

## 🚀 Quick Start

### 1. Clone the Repository

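```bash
# Clone into mcc-monitoring-stack so the directory matches the rest of this README
git clone https://github.com/saambd/mcc-graffana.git mcc-monitoring-stack
cd mcc-monitoring-stack
```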

### 2. Run the Setup Script

```bash
# Standard setup
sudo ./setup.sh

# OR Fresh install (removes existing data)
sudo ./setup.sh --clean
```

The setup script will automatically:
- ✅ Install Docker Compose if not present
- ✅ Auto-detect network interface
- ✅ Auto-detect MCC container and log path
- ✅ Generate dynamic UIDs for datasources
- ✅ Configure Grafana datasources
- ✅ Update dashboard with correct UIDs and interface
- ✅ Auto-detect MCC log format (semicolon/JSON)
- ✅ Configure Promtail for your log format
- ✅ Enable MCC nginx logging if disabled
- ✅ Start all monitoring services
- ✅ Perform health checks
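
To illustrate the log-format step, here is one way such a check could work. This is a sketch under assumptions, not the logic `setup.sh` actually uses: `detect_log_format` is a hypothetical helper that classifies the first non-empty log line.

```bash
# detect_log_format: read log lines on stdin and print "json", "semicolon",
# or "unknown" based on the first non-empty line.
detect_log_format() {
  local line=""
  while IFS= read -r line; do
    [ -n "$line" ] && break
  done
  case "$line" in
    \{*\}) echo "json" ;;       # line is wrapped in braces
    *\;*)  echo "semicolon" ;;  # line contains at least one semicolon
    *)     echo "unknown" ;;
  esac
}
```

For example, `tail -n 5 /media/data/node1/logs/access.log | detect_log_format` classifies the most recent entries at the default log path.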

### 3. Access Grafana

Open your browser and go to:
```
http://localhost:3000
```

**Default credentials:**
- Username: `admin`
- Password: `admin`

The MCC dashboard is loaded automatically.
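
If the page does not load right away, Grafana may still be starting. The sketch below polls Grafana's `/api/health` endpoint until it reports a healthy database. The helper names `grafana_health_ok` and `wait_for_grafana` are hypothetical, and the URL assumes the default port shown above.

```bash
# grafana_health_ok: read the /api/health JSON on stdin; succeed if the
# database field is "ok".
grafana_health_ok() {
  grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# wait_for_grafana [URL] [TRIES]: poll every 2 seconds until healthy.
wait_for_grafana() {
  local url="${1:-http://localhost:3000}" tries="${2:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "$url/api/health" 2>/dev/null | grafana_health_ok; then
      echo "Grafana is healthy at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "Grafana did not become healthy at $url" >&2
  return 1
}
```

Calling `wait_for_grafana` with no arguments waits up to about a minute for `http://localhost:3000`.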

## ⚙️ Configuration

### Environment Variables

The setup script **auto-detects** most settings, but you can override them:

```bash
# Variables are placed after `sudo` so they are not dropped when sudo resets the environment

# Override network interface (auto-detected by default)
sudo NETWORK_INTERFACE=eth0 ./setup.sh

# Override MCC container name (auto-detected by default)
sudo MCC_CONTAINER_NAME=my-mcc-container ./setup.sh

# Override log path
sudo MCC_LOG_PATH=/path/to/mcc/logs ./setup.sh

# All options together
sudo MCC_CONTAINER_NAME=MCC \
  MCC_LOG_PATH=/media/data/node1/logs \
  NETWORK_INTERFACE=ens160 \
  CACHE_DISK_PATH=/media/data \
  ./setup.sh --clean
```

### Auto-Detection Features

The setup script automatically detects:

| Feature | Detection Method |
|---------|-----------------|
| **Network Interface** | Primary interface from default route |
| **MCC Container** | Container with 'mcc' or 'cache' in name |
| **Log Path** | Common MCC log directories |
| **Cache Disk** | Common mount points (/media/data, /mnt/data) |
| **Log Format** | Semicolon-separated or JSON format |
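
The first two rows of the table can be approximated with standard tools. This is a sketch of the general technique, not the code in `setup.sh`: `parse_default_iface` and `pick_mcc_container` are hypothetical helpers that read command output on stdin.

```bash
# parse_default_iface: read `ip route` output on stdin and print the device
# named on the first default route.
parse_default_iface() {
  awk '$1 == "default" {
         for (i = 2; i < NF; i++) if ($i == "dev") { print $(i + 1); exit }
       }'
}

# pick_mcc_container: read container names on stdin and print the first one
# containing "mcc" or "cache" (case-insensitive).
pick_mcc_container() {
  grep -iE 'mcc|cache' | head -n 1
}
```

Typical usage would be `ip route show default | parse_default_iface` and `docker ps --format '{{.Names}}' | pick_mcc_container`. If either prints the wrong value, set the corresponding override variable instead.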

### Default Configuration

| Variable | Default Value | Description |
|----------|---------------|-------------|
| `MCC_CONTAINER_NAME` | Auto-detected or `MCC` | Name of MCC Docker container |
| `MCC_LOG_PATH` | Auto-detected or `/media/data/node1/logs` | Path to MCC access logs |
| `NETWORK_INTERFACE` | Auto-detected (e.g., `ens160`, `eth0`) | Network interface for monitoring |
| `CACHE_DISK_PATH` | Auto-detected or `/media/data` | MCC cache disk path |
| `GRAFANA_ADMIN_PASSWORD` | `admin` | Grafana admin password |
| `INFLUXDB_TOKEN` | `mcc-monitoring-token-2025` | InfluxDB API token |

## 📁 Directory Structure

```
mcc-monitoring-stack/
├── docker-compose.yml          # Main compose file
├── setup.sh                    # Automated setup script
├── diagnose.sh                 # Diagnostic script
├── README.md                   # This file
│
├── grafana/
│   ├── provisioning/
│   │   ├── dashboards/
│   │   │   └── dashboards.yml  # Dashboard provisioning
│   │   └── datasources/
│   │       ├── influxdb.yml    # InfluxDB datasource
│   │       └── loki.yml        # Loki datasource
│   └── dashboards/
│       └── mcc_dashboard.json  # Main MCC dashboard
│
├── loki/
│   └── config.yml              # Loki configuration
│
├── promtail/
│   └── config.yml              # Promtail configuration
│
├── prometheus/
│   └── prometheus.yml          # Prometheus configuration
│
└── telegraf/
    ├── telegraf.conf           # Telegraf configuration
    └── scripts/
        ├── bgp_status.sh       # BGP status collector
        └── bgp_prefixes.sh     # BGP prefix collector
```

## 🔍 Dashboard Panels

### Cache Metrics
- **Cache Status Distribution**: Pie chart showing HIT/MISS/STALE/UPDATING percentages
- **Current Activities**: Real-time table with color-coded cache status
  - 🟢 Green = HIT
  - 🔴 Red = MISS
  - 🟡 Yellow = STALE
  - 🟠 Orange = UPDATING
- **Cache Served**: Number of logged cache operations
- **Cache Domain**: Top domains being cached
- **Cache Bandwidth Saved**: Bandwidth saved by cache hits

### Storage
- **Cache Disk Usage**: Donut chart showing used vs free space
- **Cache Disk Usage Over Time**: Historical disk usage
- **Total Size / Used / Free**: Current disk statistics

### BGP Monitoring
- **BGP Status**: Current BGP state (UP/DOWN)
- **BGP Routes Learned**: Total number of routes
- **BGP Routes Over Time**: Historical route count
- **BGP Learned Prefixes**: Table listing all BGP prefixes

### Network
- **Network Bandwidth**: Upload/download speeds
- **TCP Connections**: Active TCP connections
- **Network Packets**: Packets in/out statistics

### System Performance
- **CPU Usage**: Real-time CPU utilization
- **RAM Usage**: Memory utilization
- **System Uptime**: System uptime display

## 🐛 Troubleshooting

### Run Diagnostic Script

```bash
sudo ./diagnose.sh
```

This will check:
- Container status
- Service health
- MCC log availability
- Loki data ingestion
- InfluxDB metrics
- Network connectivity
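
The container-status check can also be done by hand. The sketch below is an illustration only and does not reproduce `diagnose.sh`: `missing_services` is a hypothetical helper that compares a list of running names against the names you expect.

```bash
# missing_services NAME...: read running service names on stdin (one per
# line) and print every expected NAME that is not among them.
missing_services() {
  local running name
  running="$(cat)"
  for name in "$@"; do
    printf '%s\n' "$running" | grep -qxF -- "$name" || echo "$name"
  done
}
```

A possible invocation, assuming the Compose service names match the directory names in this repository, is `docker-compose ps --services --filter status=running | missing_services grafana loki promtail influxdb telegraf`. Empty output means all listed services are running.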

### Common Issues

#### No Data in Grafana

1. **Check if all containers are running:**
   ```bash
   docker-compose ps
   ```

2. **Check Telegraf logs:**
   ```bash
   docker-compose logs telegraf
   ```

3. **Check Promtail logs:**
   ```bash
   docker-compose logs promtail
   ```

4. **Verify MCC logs exist:**
   ```bash
   ls -la /media/data/node1/logs/access.log
   tail -5 /media/data/node1/logs/access.log
   ```
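
A log file that exists but is no longer being written to produces the same empty panels. As a rough check, the sketch below tests whether the file was modified recently. `log_is_fresh` is a hypothetical helper, and it relies on `find -mmin`, which GNU and BSD `find` support.

```bash
# log_is_fresh FILE [MINUTES]: succeed if FILE exists and was modified
# within the last MINUTES minutes (default 5).
log_is_fresh() {
  local file="$1" minutes="${2:-5}"
  [ -f "$file" ] && [ -n "$(find "$file" -mmin "-$minutes" 2>/dev/null)" ]
}
```

For example: `log_is_fresh /media/data/node1/logs/access.log 10 || echo "access.log has not changed in 10 minutes"`. On an idle cache a stale log can be normal, so treat the result as a hint rather than a failure.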

#### BGP Data Not Showing

1. **Check if MCC container is accessible:**
   ```bash
   docker exec MCC birdc show protocols
   ```

2. **Verify BIRD is running:**
   ```bash
   docker exec MCC ps aux | grep bird
   ```
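
To reduce the `birdc show protocols` output to just the BGP sessions, a small filter helps. This sketch assumes the usual column layout (Name, Proto, Table, State, Since, Info); `bgp_states` and `bgp_all_up` are hypothetical helpers, not the collectors shipped in `telegraf/scripts/`.

```bash
# bgp_states: read `birdc show protocols` output on stdin and print
# "<name> <state>" for each protocol whose Proto column is BGP.
bgp_states() {
  awk '$2 == "BGP" { print $1, $4 }'
}

# bgp_all_up: succeed only if at least one BGP protocol is listed and
# every one of them is in state "up".
bgp_all_up() {
  awk '$2 == "BGP" { n++; if ($4 != "up") bad++ }
       END { exit (n > 0 && bad == 0) ? 0 : 1 }'
}
```

Usage: `docker exec MCC birdc show protocols | bgp_states`. A state other than `up` (for example `start`) usually means the session is not established.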

#### Grafana Won't Start

1. **Check for datasource conflicts:**
   ```bash
   ls -la grafana/provisioning/datasources/
   ```
   
2. **Clean restart:**
   ```bash
   sudo ./setup.sh --clean
   ```

### View Service Logs

```bash
# All services
docker-compose logs -f

# Specific service
docker-compose logs -f grafana
docker-compose logs -f promtail
docker-compose logs -f telegraf
docker-compose logs -f loki
docker-compose logs -f influxdb
```
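
The commands in this README use the standalone `docker-compose` binary. On hosts that only have the Compose v2 plugin, the equivalent is `docker compose`. A small wrapper can pick whichever is installed; `compose_cmd` below is a hypothetical helper, shown as a sketch.

```bash
# compose_cmd: print "docker compose" if the v2 plugin responds, otherwise
# "docker-compose" if that binary exists; fail if neither is available.
compose_cmd() {
  if docker compose version >/dev/null 2>&1; then
    echo "docker compose"
  elif command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose"
  else
    echo "no Docker Compose found" >&2
    return 1
  fi
}
```

Usage: `$(compose_cmd) logs -f grafana`. The command substitution is left unquoted on purpose so that `docker compose` splits into two words.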

## 🔄 Updating

To update the stack:

```bash
# Pull latest changes
git pull

# Restart with updated config
sudo ./setup.sh
```

To update and start from a clean state (this removes existing monitoring data):
```bash
sudo ./setup.sh --clean
```
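
Because `--clean` removes existing data, it can be worth archiving dashboards and data directories first. The following is a minimal sketch: `backup_stack` is a hypothetical helper, and the data directory names are taken from the Uninstalling section of this README, so adjust them if your layout differs.

```bash
# backup_stack SRC_DIR OUT_FILE: archive whichever of the known directories
# exist under SRC_DIR into a gzip-compressed tarball.
backup_stack() {
  local src="$1" out="$2" d found=""
  for d in grafana/dashboards grafana/provisioning grafana/data influxdb/data loki/data; do
    [ -e "$src/$d" ] && found="$found $d"
  done
  [ -n "$found" ] || { echo "nothing to back up in $src" >&2; return 1; }
  # $found is intentionally unquoted: it is a space-separated list of the
  # fixed relative paths above.
  tar -czf "$out" -C "$src" $found
}
```

Usage: `backup_stack . "/tmp/mcc-backup-$(date +%F).tgz"`. Stopping the containers first avoids archiving database files while they are being written.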

## 🗑️ Uninstalling

To completely remove the stack:

```bash
# Stop and remove containers, networks, volumes
docker-compose down -v

# Remove data directories (optional)
sudo rm -rf loki/data grafana/data influxdb/data prometheus/data
```

## 📊 Sample Queries

### InfluxDB Flux Queries

**Disk Usage:**
```flux
from(bucket: "metrics")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "disk")
  |> filter(fn: (r) => r["path"] == "/media/data")
  |> filter(fn: (r) => r["_field"] == "used")
```

**BGP Routes:**
```flux
from(bucket: "metrics")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "bgp_status")
  |> filter(fn: (r) => r["_field"] == "learned")
```

### LogQL Queries (Loki)

**Cache Status Distribution:**
```logql
sum by (cache_status) (count_over_time({job="mcc"} | json [$__interval]))
```

**Cache Bandwidth Saved:**
```logql
sum(sum_over_time({job="mcc"} | json | cache_status = "HIT" | unwrap body_bytes_sent [$__interval]))
```
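
Outside Grafana, `$__interval` is not defined, so the queries above need a concrete window before they can be sent to Loki directly. The sketch below substitutes one and calls Loki's instant-query HTTP endpoint. `logql_cache_status` and `loki_instant_query` are hypothetical helpers, and the base URL assumes the Loki port shown in the architecture diagram.

```bash
# logql_cache_status [WINDOW]: print the cache status query with a fixed
# range window (default 5m) in place of Grafana's $__interval.
logql_cache_status() {
  local window="${1:-5m}"
  printf 'sum by (cache_status) (count_over_time({job="mcc"} | json [%s]))\n' "$window"
}

# loki_instant_query QUERY [BASE_URL]: run QUERY against /loki/api/v1/query.
loki_instant_query() {
  local query="$1" base="${2:-http://localhost:3100}"
  curl -fsS -G "$base/loki/api/v1/query" --data-urlencode "query=$query"
}
```

Usage: `loki_instant_query "$(logql_cache_status 1h)"` returns the per-status counts for the last hour as JSON, which is a quick way to confirm that Loki holds MCC data even when a panel is empty.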

## 🔐 Security Recommendations

1. **Change the default Grafana password** immediately after first login
2. **Use a strong InfluxDB token** in production instead of the default
3. **Enable HTTPS** for Grafana (for example, behind a reverse proxy such as nginx)
4. **Restrict network access** to the monitoring ports using firewall rules
5. **Back up** Grafana dashboards and InfluxDB data regularly
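
For the first two recommendations, random values can be generated on the host. This is a sketch: `gen_token` is a hypothetical helper, and it assumes `setup.sh` reads `GRAFANA_ADMIN_PASSWORD` and `INFLUXDB_TOKEN` from the environment, as the Default Configuration table suggests.

```bash
# gen_token [BYTES]: print BYTES random bytes (default 32) as lowercase hex,
# read from the kernel's /dev/urandom.
gen_token() {
  local bytes="${1:-32}"
  head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d ' \n'
}
```

Usage: `sudo INFLUXDB_TOKEN="$(gen_token)" GRAFANA_ADMIN_PASSWORD="$(gen_token 12)" ./setup.sh`. Store the generated values somewhere safe, since they are not printed again.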

## 📈 Performance Tuning

### For High-Traffic MCC Servers

1. **Increase InfluxDB resources in docker-compose.yml** (with docker-compose 1.x, `deploy` limits only take effect when run with `--compatibility`):
   ```yaml
   influxdb:
     deploy:
       resources:
         limits:
           memory: 4G
   ```

2. **Increase Loki retention in loki/config.yml** (the period is only enforced when the compactor runs with `retention_enabled: true`):
   ```yaml
   limits_config:
     retention_period: 720h  # 30 days
   ```

3. **Adjust Telegraf collection intervals in telegraf/telegraf.conf:**
   ```toml
   [agent]
     interval = "30s"  # Increase if needed
   ```

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📝 License

This project is open source and available under the MIT License.

## 💬 Support

For issues and questions:
1. Check the Troubleshooting section
2. Run `sudo ./diagnose.sh`
3. Review docker-compose logs
4. Open an issue on GitHub

## 📚 Additional Resources

- [Grafana Documentation](https://grafana.com/docs/)
- [Loki Documentation](https://grafana.com/docs/loki/)
- [InfluxDB Documentation](https://docs.influxdata.com/)
- [Telegraf Documentation](https://docs.influxdata.com/telegraf/)
- [Microsoft Connected Cache Documentation](https://docs.microsoft.com/en-us/windows/deployment/do/mcc-isp)

---

**Made with ❤️ for MCC Monitoring**
