Local AI Stack for VPS Deployment
A comprehensive self-hosted AI stack designed for VPS deployment, featuring n8n, Ollama, Qdrant, Prometheus, Grafana, Whisper, and more.
Note: This project is based on work from coleam00/local-ai-packaged and Digitl-Alchemyst/Automation-Stack with customizations and improvements.
Features
- n8n - Low-code automation platform with 400+ integrations
- Ollama - Local LLM platform
- Qdrant - High-performance vector store
- Prometheus - Monitoring and alerting toolkit
- Grafana - Metrics visualization and analytics
- Whisper - Speech-to-text processing
- Caddy - Automatic HTTPS/TLS
- Supabase - Database and authentication
- Flowise - AI agent builder
- Open WebUI - ChatGPT-like interface
- SearXNG - Privacy-focused search engine
Prerequisites
- Ubuntu VPS (tested on Ubuntu 22.04 LTS)
- Domain name with DNS access
- Minimum 16GB RAM recommended
- 100GB+ storage recommended
- Docker installed (version 20.10.0 or later recommended)
- Docker Compose installed, as either:
  - the Docker Compose plugin (docker compose), or
  - the standalone Docker Compose binary (docker-compose)

Note: The setup script will automatically detect whether to use docker compose or docker-compose based on what's available on your system.
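For reference, a minimal Python sketch of this kind of detection (an illustration of the approach, not necessarily the exact logic in start_services.py):

import shutil
import subprocess

def compose_command():
    # Prefer the Docker Compose plugin if "docker compose" works
    try:
        if subprocess.run(["docker", "compose", "version"],
                          capture_output=True).returncode == 0:
            return ["docker", "compose"]
    except FileNotFoundError:
        pass  # docker itself is not installed
    # Fall back to the standalone binary if it is on PATH
    if shutil.which("docker-compose"):
        return ["docker-compose"]
    raise RuntimeError("Neither 'docker compose' nor 'docker-compose' was found")

print(compose_command())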
Installation
- Connect to your VPS via SSH:
ssh root@your-vps-ip
- Install required packages:
sudo apt update && sudo apt install -y nano git docker.io python3 python3-pip docker-compose
- Configure firewall:
sudo ufw enable
sudo ufw allow 5678 # n8n (using port 5678 to avoid conflict with Supabase)
sudo ufw allow 3001 # Flowise
sudo ufw allow 8080 # Open WebUI
sudo ufw allow 3000 # Grafana
sudo ufw allow 80 # HTTP
sudo ufw allow 443 # HTTPS
sudo ufw allow 8000 # Supabase API (Kong)
sudo ufw allow 11434 # Ollama
sudo ufw allow 6333 # Qdrant
sudo ufw allow 9090 # Prometheus
sudo ufw allow 54321 # Supabase Studio
sudo ufw reload
- Clone the repository:
git clone https://github.com/ThijsdeZeeuw/avg-kwintes.git
cd avg-kwintes
- Run the configuration script to prepare the environment:
# Make the script executable
chmod +x fix_config.sh
# Run the configuration script
sudo ./fix_config.sh
The configuration script will:
- Check and install all necessary system dependencies
- Configure the correct firewall rules for all services
- Detect and resolve any port conflicts
- Generate utility scripts for maintenance
- Create a basic .env file if one doesn’t exist
- Run the interactive setup to complete configuration:
python3 start_services.py --interactive
- Start the services:
python3 start_services.py --profile cpu
This sequence ensures that everything is properly configured before starting the services, avoiding port conflicts and other setup issues.
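For illustration, the port-conflict detection mentioned above can be approximated with a short Python check (a sketch of the idea; fix_config.sh may implement it differently):

import socket

PORTS = [5678, 3001, 8080, 3000, 8000, 11434, 6333, 9090, 54321]

def port_in_use(port, host="127.0.0.1"):
    # connect_ex returns 0 when something is already listening on the port
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)
        return s.connect_ex((host, port)) == 0

for port in PORTS:
    if port_in_use(port):
        print(f"Port {port} is already in use - resolve the conflict before starting")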
Utility Scripts
The configuration process creates several helpful utility scripts:
Update Script
To update your Local AI Stack to the latest version:
sudo ./update_stack.sh
This script will pull the latest Docker images, apply necessary configuration fixes, and restart all services.
Backup Script
To create a complete backup of your Local AI Stack data:
sudo ./backup_stack.sh
This script will back up all Docker volumes, configuration files, and secrets to a timestamped archive.
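For context, the standard way to archive a named Docker volume is to mount it into a throwaway container and tar its contents; a minimal Python sketch of that pattern (the volume name is illustrative, and backup_stack.sh may differ in detail):

import datetime
import os
import subprocess

volume = "localai_qdrant_storage"  # illustrative volume name
stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
# Mount the volume read-only into a temporary Alpine container and archive it
subprocess.run([
    "docker", "run", "--rm",
    "-v", f"{volume}:/data:ro",
    "-v", f"{os.getcwd()}:/backup",
    "alpine", "tar", "czf", f"/backup/{volume}-{stamp}.tar.gz", "-C", "/data", ".",
], check=True)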
Ollama Models
The following models are automatically installed and available in the system:
Large Language Models (LLMs)
Model | Source | Description |
---|---|---|
gemma3:12b | Google | A 12B parameter model from Google's Gemma family, optimized for general text understanding and generation |
granite3-guardian:8b | IBM | An 8B parameter model focused on safety and ethical considerations in AI interactions |
granite3.1-dense:latest | IBM | Latest version of IBM’s dense transformer model for general language tasks |
granite3.1-moe:3b | IBM | A 3B parameter mixture-of-experts model optimized for efficient inference |
granite3.2:latest | IBM | Latest version of IBM’s advanced language model with improved capabilities |
llama3.2-vision | Meta | A multimodal model capable of understanding both text and images |
minicpm-v:8b | OpenBMB | An 8B parameter vision-language model optimized for efficient multimodal deployment |
mistral-nemo:12b | Mistral AI | A 12B parameter model based on Mistral’s architecture with enhanced capabilities |
qwen2.5:7b-instruct-q4_K_M | Alibaba | A quantized 7B parameter instruction-tuned model optimized for efficiency |
reader-lm:latest | Jina AI | A small model specialized in converting HTML documents into clean Markdown for downstream processing |
Embedding Models
Model | Source | Description |
---|---|---|
granite-embedding:278m | IBM | A compact embedding model for efficient text vectorization |
jeffh/intfloat-multilingual-e5-large-instruct:f16 | Hugging Face | A multilingual embedding model optimized for instruction following |
nomic-embed-text:latest | Nomic AI | A general-purpose text embedding model for semantic search and similarity |
These models are automatically downloaded during the initial setup process. The system supports both CPU and GPU (NVIDIA/AMD) inference depending on your hardware configuration.
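Additional models can be pulled at any time through Ollama's HTTP API (equivalent to ollama pull on the command line); for example, assuming Ollama is listening on its default port 11434:

import json
import requests

# Stream the pull progress for one of the models listed above
resp = requests.post("http://localhost:11434/api/pull",
                     json={"name": "qwen2.5:7b-instruct-q4_K_M"},
                     stream=True)
for line in resp.iter_lines():
    if line:
        print(json.loads(line).get("status"))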
Accessing Services
After installation, you can access the following services:
- n8n: https://n8n.kwintes.cloud
- Web UI: https://openwebui.kwintes.cloud
- Flowise: https://flowise.kwintes.cloud
- Supabase: https://supabase.kwintes.cloud
- Supabase Studio: http://localhost:54321 or https://studio.supabase.kwintes.cloud
- Grafana: https://grafana.kwintes.cloud
- Prometheus: https://prometheus.kwintes.cloud
- Whisper API: https://whisper.kwintes.cloud
- Qdrant API: https://qdrant.kwintes.cloud
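After DNS and Caddy are set up, a quick status check against these endpoints can confirm everything responds (replace the kwintes.cloud hostnames with your own domain):

import requests

services = [
    "https://n8n.kwintes.cloud",
    "https://openwebui.kwintes.cloud",
    "https://flowise.kwintes.cloud",
    "https://grafana.kwintes.cloud",
    "https://qdrant.kwintes.cloud",
]

for url in services:
    try:
        r = requests.get(url, timeout=10)
        print(f"{url}: HTTP {r.status_code}")
    except requests.RequestException as exc:
        print(f"{url}: unreachable ({exc})")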
Monitoring
The stack includes comprehensive monitoring:
Access Grafana at https://grafana.kwintes.cloud
- Default credentials: admin / (password from secrets.txt)
- Add Prometheus as a data source (URL: http://prometheus:9090)
Access Prometheus at https://prometheus.kwintes.cloud
- View metrics and create alerts
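Prometheus also exposes an HTTP API that can be queried directly; for example, the built-in up metric reports which scrape targets are healthy (assuming port 9090 is reachable locally):

import requests

# Query the built-in "up" metric via the Prometheus HTTP API
resp = requests.get("http://localhost:9090/api/v1/query",
                    params={"query": "up"})
for result in resp.json()["data"]["result"]:
    target = result["metric"].get("instance", "unknown")
    print(f"{target}: {'up' if result['value'][1] == '1' else 'down'}")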
Security Notes
- All secrets are saved to secrets.txt - keep this file secure
- All services are configured to use HTTPS through Caddy
- Firewall rules are configured to allow only necessary ports
- Default credentials should be changed after first login
Maintenance
To update the stack manually:
cd avg-kwintes
git pull
python3 start_services.py --profile cpu
Alternatively, run the generated update script: sudo ./update_stack.sh
To restart services:
docker compose -p localai down
python3 start_services.py --profile cpu
Troubleshooting
Docker Compose Issues
If you encounter errors with Docker Compose commands like:
unknown shorthand flag: 'p' in -p
This indicates incompatibility between the command format and your Docker Compose version.
Solution: The script now automatically detects and uses the correct Docker Compose command format for your system. If you’re manually running commands, use:
- For Docker Compose plugin:
docker compose -p localai ...
- For standalone binary:
docker-compose -p localai ...
If neither works, install the standalone Docker Compose binary:
sudo curl -L "https://github.com/docker/compose/releases/download/v2.24.5/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
- Check service logs:
docker compose -p localai logs -f [service_name]
# or
docker-compose -p localai logs -f [service_name]
- Verify service status:
docker compose -p localai ps
# or
docker-compose -p localai ps
- Check monitoring:
- Visit Grafana dashboard
- Check Prometheus targets
- Review service health endpoints
- Service restart:
# Restart all services under the localai project name
docker compose -p localai down
docker compose -p localai up -d
Support
For issues and feature requests, please open an issue on the GitHub repository.
Created and maintained by Z4Y
Security Features
This setup prioritizes security through multiple layers:
Local Deployment
- All AI models run locally on your VPS
- No data is sent to external AI services
- Complete control over data privacy and security
Secure Infrastructure
- Automatic HTTPS/TLS encryption via Caddy
- Firewall rules limiting access to necessary ports
- Secure secret management with environment variables
- Regular security updates through Docker containers
Access Control
- Supabase authentication for user management
- Role-based access control
- Audit logging for all system activities
- Secure API endpoints with authentication
Data Protection
- Local vector database (Qdrant) for secure document storage
- Encrypted communication between services
- No external API dependencies for core functionality
- Regular backup capabilities
Local AI Capabilities
The system leverages powerful local models for various tasks:
Text Processing
- Document summarization and analysis
- Multi-language support (via multilingual models)
- Question answering and information extraction
- Text classification and sentiment analysis
Vision Capabilities
- Image analysis and description
- Document scanning and text extraction
- Visual understanding and reasoning
- Accessibility features for visual content
Example Use Cases
Document Analysis
# Example: analyzing a client report locally with Ollama
import requests

input_text = "Client report from session..."
model = "qwen2.5:7b-instruct-q4_K_M"
resp = requests.post("http://localhost:11434/api/generate",
                     json={"model": model, "prompt": f"Summarize:\n{input_text}", "stream": False})
print(resp.json()["response"])  # the report is processed entirely locally
Multi-language Support
# Example: embedding a document in another language locally
import requests

text = "Document in Dutch..."
model = "jeffh/intfloat-multilingual-e5-large-instruct:f16"
resp = requests.post("http://localhost:11434/api/embeddings",
                     json={"model": model, "prompt": text})
embedding = resp.json()["embedding"]  # vector for multilingual semantic search
Visual Document Processing
# Example: analyzing a scanned document with a vision model
import base64, requests

image_b64 = base64.b64encode(open("scanned_report.jpg", "rb").read()).decode()
resp = requests.post("http://localhost:11434/api/generate",
                     json={"model": "llama3.2-vision", "prompt": "Extract the text from this document.",
                           "images": [image_b64], "stream": False})
print(resp.json()["response"])
GGZ/FBW Client Support
This system is particularly valuable for GGZ (Mental Healthcare) and FBW (Forensic Protected Living) organizations:
Document Generation and Analysis
Client Report Generation
- Automatically generate structured reports from session notes
- Maintain consistent documentation standards
- Support multiple languages for diverse client populations
- Ensure privacy by processing all data locally
Treatment Plan Analysis
- Analyze treatment plans for completeness and consistency
- Identify potential gaps in documentation
- Suggest improvements based on best practices
- Track progress over time
Risk Assessment Support
- Process and analyze risk assessment documents
- Identify patterns and trends in risk factors
- Generate structured risk reports
- Support evidence-based decision making
Client Understanding and Support
Communication Analysis
- Process and analyze client communications
- Identify key themes and concerns
- Support multilingual communication
- Track changes in client status over time
Documentation Quality
- Ensure consistent documentation standards
- Identify missing or incomplete information
- Suggest improvements in documentation
- Support quality assurance processes
Knowledge Management
- Create searchable knowledge bases from client documents
- Support evidence-based practice
- Enable quick access to relevant information
- Maintain privacy and security of sensitive data
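As an illustration of this knowledge-base workflow, documents can be embedded locally with Ollama and indexed in Qdrant for semantic search; a minimal sketch using the qdrant-client package (collection name and texts are illustrative):

import requests
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

def embed(text):
    # nomic-embed-text produces 768-dimensional vectors
    r = requests.post("http://localhost:11434/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return r.json()["embedding"]

client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="client_docs",  # illustrative collection name
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)
client.upsert(collection_name="client_docs", points=[
    PointStruct(id=1, vector=embed("Session note..."), payload={"text": "Session note..."}),
])
hits = client.search(collection_name="client_docs",
                     query_vector=embed("medication changes"), limit=3)
print([h.payload["text"] for h in hits])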
Benefits for GGZ/FBW Organizations
Privacy and Compliance
- All processing happens locally
- No external data transmission
- Compliant with healthcare privacy regulations
- Full control over data security
Efficiency Improvements
- Automated document processing
- Reduced administrative burden
- Faster access to relevant information
- Support for evidence-based practice
Quality Enhancement
- Consistent documentation standards
- Improved risk assessment
- Better tracking of client progress
- Enhanced decision support
Port Configuration
To avoid port conflicts between services, the stack uses the consistent port mappings shown in the firewall configuration under Installation.
Key points about our port configuration:
- n8n uses port 5678 instead of 8000 to avoid conflicts with Supabase
- Each service uses consistent internal and external port mappings
- Port settings are handled automatically by the setup scripts
Project Details
- ThijsdeZeeuw/avg-kwintes
- Apache License 2.0
- Last Updated: 4/1/2025