Data Processing Layer

Raw data cannot be used directly for data analysis. Our Data Processing Layer processes the raw data to remove noise, verify its veracity (to prevent participants from submitting fake data), and restructure it via data schemas.

Data Veracity Verification

Given the incentives present in the network, bad behavior such as submitting fake data is easy to foresee. Since no single method is fail-proof, we will employ a hybrid approach to verifying the veracity of the data. This includes, but is not limited to, the following (an illustrative sketch follows the list):

  • AI / ML algorithms for anomaly detection (such as autoencoders, Isolation Forests, z-scores, etc.)

  • Assertion checks enforced by data schemas (rejecting data entries that do not meet certain criteria)

  • Data Provenance Checks

  • Data Consistency Checks
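
As an illustration only (our production pipeline is deliberately undisclosed, per the note below), here is a minimal sketch combining two techniques from the list above, a z-score rule and an Isolation Forest, using NumPy and scikit-learn. The step-count data, thresholds, and contamination rate are assumptions for the example, not network parameters:

```python
# Illustrative sketch only — the network's actual verification pipeline is undisclosed.
import numpy as np
from sklearn.ensemble import IsolationForest

def zscore_outliers(values: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Flag values whose z-score magnitude exceeds the threshold."""
    z = (values - values.mean()) / values.std()
    return np.abs(z) > threshold

def forest_outliers(values: np.ndarray) -> np.ndarray:
    """Flag values the Isolation Forest labels as anomalous (-1)."""
    model = IsolationForest(contamination=0.01, random_state=0)
    return model.fit_predict(values.reshape(-1, 1)) == -1

rng = np.random.default_rng(0)
genuine = rng.normal(8000, 1500, size=200)      # plausible daily step counts
fakes = np.array([250000.0, 180000.0])          # physically implausible entries
steps = np.concatenate([genuine, fakes])

# Hybrid check: an entry is suspicious if either detector flags it.
suspicious = zscore_outliers(steps) | forest_outliers(steps)
print(np.where(suspicious)[0])  # expect the fake entries (indices 200 and 201) to be flagged
```

Combining detectors with a logical OR is one simple hybrid strategy; in practice, flagged entries could also be routed to Data Validators for review rather than rejected outright.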

We will not disclose the details of our approach in order to prevent it from being reverse engineered and hence exploited.

Data Mapping & Schema Management

The volume of information is ever expanding, but not all information is valuable. Since the storage and processing capabilities of the network can never be unlimited, it is important to identify the data fields that are meaningful (and discard the rest) and to record their definitions (i.e., what each field means, since raw field names may be abbreviated, written in shorthand, etc.). This is the role of data mapping and data schemas. Within the network, we refer to the combined output of data mapping and data schemas as data templates.

Just as there are infinite ways of interpreting the same piece of data, there are infinite ways of defining data schemas. Hence, we have decided to decentralize this module: anyone can participate in defining data schemas and be rewarded if the schema they define is valid and meaningful.

Data Mapping

A data mapping entry consists of the following (a short sketch follows the list):

  1. Source File Name (Absolute or Conditional)

  2. Source Field Name (Raw Field Name in Source File)

  3. Target Field Name (Optional)
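
For illustration, a hypothetical mapping covering the three fields above might look like the sketch below. The list-of-dicts format, the file pattern, and the field names are assumptions for the example, not the network's actual file format:

```python
# A hypothetical data mapping: one entry per source field to keep and rename.
data_mapping = [
    {
        "source_file": "export_*.csv",   # conditional (pattern-matched) file name
        "source_field": "hr_bpm",        # raw field name in the source file
        "target_field": "heart_rate",    # optional standardized name
    },
    {
        "source_file": "export_*.csv",
        "source_field": "stp_cnt",
        "target_field": "step_count",
    },
]

def apply_mapping(raw_record: dict, mapping: list[dict]) -> dict:
    """Keep only mapped fields, renamed to their target names."""
    rename = {m["source_field"]: m["target_field"] or m["source_field"]
              for m in mapping}
    return {rename[k]: v for k, v in raw_record.items() if k in rename}

raw = {"hr_bpm": 72, "stp_cnt": 8120, "dbg_flag": 1}   # unmapped "dbg_flag" is discarded
print(apply_mapping(raw, data_mapping))                # {'heart_rate': 72, 'step_count': 8120}
```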

Data Schema

By default, a single file will correspond to a single table / graph. A data schema specifies the following (an illustrative sketch follows the list):

  1. Type of Database (Relational / Graph / Hybrid)

  2. Entity Information

  3. Provenance Information

  4. Table Names

  5. Fields

    1. Raw Name (Tied to the Source Field Name of the Data Mapping File)

    2. Canonical Name (The standardized or human-readable name of the data field, optional)

    3. Data Type

    4. Definition / Description

    5. Field Format (Optional)

    6. Assertions (Optional)

    7. Data Sensitivity / Privacy Level (Optional, used for removing Personally Identifiable Information)

    8. IsNullable

    9. IsKey
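
As a hedged sketch of how such a schema could drive validation, the hypothetical structure below models a subset of the attributes above (data type, assertions, sensitivity, IsNullable, IsKey). The attribute names and the example table are illustrative assumptions, not a network-defined format:

```python
# A hypothetical schema for one table, with per-field metadata as listed above.
schema = {
    "database_type": "relational",
    "table": "daily_vitals",
    "fields": {
        "hr_bpm": {
            "canonical_name": "heart_rate",
            "data_type": int,
            "definition": "Resting heart rate in beats per minute",
            "assertions": [lambda v: 20 <= v <= 250],  # reject impossible values
            "sensitivity": "low",
            "is_nullable": False,
            "is_key": False,
        },
        "user_email": {
            "canonical_name": "email",
            "data_type": str,
            "definition": "Contributor contact address",
            "assertions": [],
            "sensitivity": "pii",   # flagged for later desensitization
            "is_nullable": True,
            "is_key": True,
        },
    },
}

def validate(record: dict, schema: dict) -> bool:
    """Enforce nullability, data types, and schema assertions."""
    for name, spec in schema["fields"].items():
        value = record.get(name)
        if value is None:
            if not spec["is_nullable"]:
                return False
            continue
        if not isinstance(value, spec["data_type"]):
            return False
        if not all(check(value) for check in spec["assertions"]):
            return False
    return True

print(validate({"hr_bpm": 72, "user_email": "a@b.co"}, schema))  # True
print(validate({"hr_bpm": 9999, "user_email": None}, schema))    # False: assertion fails
```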

For more information, please see the Roles - Data Schema Developer page to learn how you can participate in this meaningful endeavor!

Data Desensitization and Preprocessing

After verifying the veracity of the data and the data fields to be ingested, the data will undergo desensitization and preprocessing so that the privacy of the data contributor is guaranteed and the dataset is suitable for data analytics. This includes, but is not limited to, the following (an illustrative sketch follows the list):

  • Desensitization (Removing Personally Identifiable Information (PII))

  • Handling Missing Data (FillNA, FillZero, Mean, etc.)

  • Data Standardization (For Structured Data Fields)

  • Structured Data Conversion

  • Type Conversion

  • One-Hot Encoding
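
The sketch below walks through several of these steps with pandas. The column names, sensitivity labels, and fill strategy are assumed for illustration:

```python
# Illustrative preprocessing pass covering desensitization, missing-data
# handling, type conversion, and one-hot encoding. Column names and the
# sensitivity map are assumptions, not the network's actual metadata.
import pandas as pd

sensitivity = {"email": "pii", "heart_rate": "low", "activity": "low"}

df = pd.DataFrame({
    "email": ["a@b.co", "c@d.io", "e@f.eu"],
    "heart_rate": [72.0, None, 65.0],    # missing value to be filled
    "activity": ["run", "walk", "run"],  # categorical, to be one-hot encoded
})

# Desensitization: drop columns whose sensitivity level marks them as PII.
pii_cols = [c for c, level in sensitivity.items() if level == "pii"]
df = df.drop(columns=pii_cols)

# Handling missing data: fill numeric gaps with the column mean.
df["heart_rate"] = df["heart_rate"].fillna(df["heart_rate"].mean())

# Type conversion: store heart rate as an integer field.
df["heart_rate"] = df["heart_rate"].astype(int)

# One-hot encoding of the categorical activity field.
df = pd.get_dummies(df, columns=["activity"])
print(df)
```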
