Dingo: Revolutionizing Data Quality Evaluation for MCP Servers with UBOS

In the burgeoning landscape of AI and Machine Learning, data quality stands as the bedrock upon which successful models are built. Recognizing this critical need, UBOS proudly presents Dingo, a comprehensive data quality evaluation tool designed to automatically detect data quality issues across diverse datasets, especially within the context of MCP (Model Context Protocol) Servers. Dingo isn’t just another tool; it’s a paradigm shift in how data is assessed and refined, ensuring that your AI initiatives are fueled by the highest quality data possible.

Dingo offers a versatile suite of built-in rules, model evaluation methods, and support for custom evaluation approaches. This adaptability makes it ideal for a wide spectrum of datasets, including those used for pre-training, fine-tuning, and general evaluation. Whether you’re working with text, multimodal datasets, or integrating with platforms like OpenCompass, Dingo streamlines the data quality assurance process.

The UBOS Advantage: Seamless Integration for Enhanced AI Agent Development

UBOS, a full-stack AI Agent Development Platform, focuses on bringing AI Agents to every business department. Our platform helps you orchestrate AI Agents, connect them with your enterprise data, and build custom AI Agents and Multi-Agent Systems with your own LLM models. Dingo, integrated within the UBOS ecosystem, amplifies these capabilities by ensuring the data fed into your AI Agents is pristine and reliable.

Key Features that Set Dingo Apart:

  • Automated Data Quality Detection: Dingo automates the often-laborious process of identifying data quality issues. This automation saves valuable time and resources, allowing data scientists and engineers to focus on model development and innovation.
  • Versatile Evaluation Methods: From rule-based evaluations to sophisticated LLM-driven assessments, Dingo provides a multifaceted approach to data quality. This ensures that all potential issues are identified, regardless of their nature.
  • Customizable and Extensible: Dingo’s architecture supports custom rules and models, allowing you to tailor the evaluation process to your specific needs. This extensibility makes Dingo a future-proof solution that can adapt to evolving data landscapes.
  • Multi-Modal Support: Dingo supports both text and image data, ensuring comprehensive data quality across different modalities. This is crucial for modern AI applications that often rely on a combination of data types.
  • Seamless Integration with MCP Servers: Dingo includes an experimental Model Context Protocol (MCP) server, facilitating seamless interaction with external data sources and tools, which is crucial for AI model development. An accompanying video demonstration walks users through using the Dingo MCP server with Cursor.

Use Cases: Transforming Data Quality Across Industries

Dingo’s impact extends across numerous industries and applications, providing tangible benefits to organizations seeking to leverage the power of AI.

  • Enhanced LLM Training: By evaluating and refining datasets used for training Large Language Models (LLMs), Dingo improves the accuracy, reliability, and overall performance of these models.
  • Improved Data-Driven Decision-Making: High-quality data is essential for making informed business decisions. Dingo ensures that the data used for analysis is accurate, complete, and relevant.
  • Streamlined Data Migration: When migrating data between systems, Dingo helps identify and correct data quality issues that could lead to errors or inconsistencies.
  • Robust AI Agent Development: By providing high-quality data, Dingo enhances the capabilities of AI Agents, enabling them to perform tasks more effectively and efficiently.
  • Efficient Data Governance: Dingo helps organizations establish and maintain data governance policies by providing a tool for monitoring and enforcing data quality standards.

Diving Deep into Key Features:

  1. Multi-Source & Multi-Modal Support:

    • Data Sources: Dingo seamlessly integrates with various data sources, including local files, Hugging Face datasets, and S3 storage. This flexibility allows you to evaluate data regardless of its location.
    • Data Types: Whether you’re working with pre-training, fine-tuning, or evaluation datasets, Dingo provides tailored evaluation methods to suit your specific needs.
    • Data Modalities: Dingo supports both text and image data, ensuring comprehensive data quality across different modalities.
  2. Rule-based & Model-based Evaluation:

    • Built-in Rules: Dingo includes over 20 general heuristic evaluation rules, covering a wide range of data quality issues.
    • LLM Integration: Dingo integrates with popular LLMs like OpenAI, Kimi, and local models such as Llama3, enabling advanced data quality assessments.
    • Custom Rules: Easily extend Dingo with your own rules and models to address specific data quality challenges.
    • Security Evaluation: Dingo integrates with the Perspective API for security evaluations, identifying potentially harmful or inappropriate content.
  3. Flexible Usage:

    • Interfaces: Dingo offers both CLI and SDK options, providing flexibility for different usage scenarios.
    • Integration: Dingo can be easily integrated with other platforms, streamlining your data quality workflow.
    • Execution Engines: Dingo supports both local and Spark execution engines, allowing you to choose the best option for your infrastructure.
  4. Comprehensive Reporting:

    • Quality Metrics: Dingo provides 7-dimensional quality assessments, covering completeness, effectiveness, fluency, relevance, security, similarity, and understandability.
    • Traceability: Detailed reports provide traceability, allowing you to track down the root cause of data quality issues.
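To make the rule-based and custom-rule pattern described above more concrete, here is a minimal sketch of a pluggable rule registry in Python. This is an illustration of the general pattern only, not Dingo's actual API: the names `register_rule`, `evaluate`, and `Finding` are hypothetical.

```python
# Minimal sketch of a pluggable rule registry, illustrating the
# rule-based + custom-rule pattern. All names here (register_rule,
# evaluate, Finding) are hypothetical, not Dingo's API.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Finding:
    rule: str      # name of the rule that fired
    message: str   # human-readable description of the issue

RULES: dict = {}  # maps rule name -> rule function

def register_rule(name: str):
    """Decorator that adds a rule function to the registry."""
    def wrap(fn: Callable[[str], Optional[Finding]]):
        RULES[name] = fn
        return fn
    return wrap

@register_rule("abrupt_ending")
def abrupt_ending(text: str) -> Optional[Finding]:
    # Completeness-style check: text ending with ':' or an ellipsis
    # often indicates truncated content.
    if text.rstrip().endswith((":", "...", "…")):
        return Finding("abrupt_ending", "text ends abruptly")
    return None

@register_rule("empty_content")
def empty_content(text: str) -> Optional[Finding]:
    if not text.strip():
        return Finding("empty_content", "text is empty")
    return None

def evaluate(text: str) -> list:
    """Run every registered rule and collect the findings."""
    return [f for rule in RULES.values() if (f := rule(text)) is not None]
```

In this pattern, a custom rule is simply another decorated function, which mirrors how extensible evaluation frameworks typically let users plug in domain-specific checks without touching the core engine.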

Understanding Dingo’s Data Quality Metrics:

Dingo categorizes data quality issues into seven critical dimensions, each evaluated through rule-based methods and LLM-based prompts:

  • Completeness: Ensures data is not missing critical components, such as evaluating if text abruptly ends with a colon or ellipsis.
  • Effectiveness: Verifies if data is meaningful and properly formatted, detecting garbled text or content lacking proper punctuation.
  • Fluency: Checks grammatical correctness and natural readability, identifying excessively long words or chaotic reading order.
  • Relevance: Detects irrelevant content, like citation details or HTML tags, ensuring data focuses on pertinent information.
  • Security: Identifies sensitive information, such as personal details or content related to gambling, pornography, or political issues.
  • Similarity: Detects repetitive content, evaluating text for consecutive repetitions or multiple occurrences of special characters.
  • Understandability: Assesses how easily data can be interpreted, ensuring correct formatting for LaTeX formulas and Markdown.
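As a rough illustration of how such dimension checks can work in practice, here are two simplified heuristics in Python, one for completeness and one for similarity. These are illustrative sketches written for this article, not Dingo's built-in rules.

```python
def check_completeness(text: str) -> bool:
    """Heuristic completeness check: flag text that ends with a colon
    or an ellipsis, which often signals truncated content.
    Returns True if the text passes the check."""
    return not text.rstrip().endswith((":", "...", "…"))

def check_similarity(text: str, max_repeat: int = 3) -> bool:
    """Heuristic similarity/repetition check: flag any word repeated
    more than max_repeat times consecutively.
    Returns True if the text passes the check."""
    words = text.lower().split()
    run = 1
    for prev, cur in zip(words, words[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_repeat:
            return False
    return True
```

Real evaluators layer many such rules per dimension and complement them with LLM-based prompts for issues, like fluency, that simple heuristics cannot capture.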

Getting Started with Dingo:

  1. Installation:

```bash
pip install dingo-python
```

  2. Basic Usage:

    • Evaluate LLM chat data:

```python
from dingo.config.config import DynamicLLMConfig
from dingo.io.input.Data import Data
from dingo.model.llm.llm_text_quality_model_base import LLMTextQualityModelBase
from dingo.model.rule.rule_common import RuleEnterAndSpace

data = Data(
    data_id='123',
    prompt="hello, introduce the world",
    content="Hello! The world is a vast and diverse place, full of wonders, cultures, and incredible natural beauty."
)

def llm():
    LLMTextQualityModelBase.dynamic_config = DynamicLLMConfig(
        key='YOUR_API_KEY',
        api_url='https://api.openai.com/v1/chat/completions',
        model='gpt-4o',
    )
    res = LLMTextQualityModelBase.eval(data)
    print(res)

def rule():
    res = RuleEnterAndSpace().eval(data)
    print(res)
```

    • Evaluate a dataset:

```python
from dingo.io import InputArgs
from dingo.exec import Executor

# Evaluate a dataset from Hugging Face
input_data = {
    "eval_group": "sft",               # Rule set for SFT data
    "input_path": "tatsu-lab/alpaca",  # Dataset from Hugging Face
    "data_format": "plaintext",        # Format: plaintext
    "save_data": True                  # Save evaluation results
}

input_args = InputArgs(**input_data)
executor = Executor.exec_map["local"](input_args)
result = executor.execute()
print(result)
```

  3. GUI Visualization:

    After evaluation (with save_data=True), a frontend page will be automatically generated. To manually start the frontend:

```bash
python -m dingo.run.vsl --input output_directory
```

    Where output_directory contains the evaluation results with a summary.json file.

Dingo: A Commitment to Data Quality

Dingo represents UBOS’s unwavering commitment to data quality as a cornerstone of successful AI initiatives. By providing a comprehensive, automated, and customizable solution for data quality evaluation, Dingo empowers organizations to unlock the full potential of their data. Whether you’re training LLMs, developing AI Agents, or making critical business decisions, Dingo ensures that your data is always of the highest quality. Integrate Dingo with UBOS today and experience the transformative power of pristine data.
