Motion & Pose Datasets for Machine Learning and Robotics
Structured exports from video: formats, quality artifacts, and reproducibility parameters for procurement and R&D teams.
- Publisher: Quality Vision (qvision.space)
- Version: 1.0
- Date: April 2026
Abstract
High-quality human motion data is a bottleneck for robotics, animation, sports analytics, and multimodal AI. This document summarizes what Quality Vision supplies to buyers: ML-ready line-delimited JSON, optional hand-centric analytics, packaged quality and configuration metadata, and public sample listings for evaluation. It does not claim specific accuracy figures for third-party models; it describes deliverables and file-level semantics so engineers can integrate datasets into training and validation pipelines.
1. Scope
Exports are built from real-world video through a processing pipeline that produces per-frame keypoints and optional derived features. Pipeline modes include full-body pose (33 landmarks, MediaPipe-compatible naming) and single-hand (21 landmarks) for dexterous and manipulation-style workloads. Locomotion-oriented products (walking, jogging, running) use the body pipeline; hand-focused products use the hand pipeline and may include a dexterous_hand block with finger-angle proxies, grip heuristics, and hand motion summaries where enabled.
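The two pipeline modes above differ in landmark count (33 for full-body pose, 21 for single-hand), which makes frames easy to classify at ingestion time. A minimal sketch, assuming a per-frame `keypoints` array (the field name is an assumption; consult the bundled SCHEMA.md for the authoritative layout):

```python
# Hypothetical sketch: distinguish body vs hand frames by keypoint count.
# The "keypoints" field name is an assumption, not part of the documented spec.

BODY_LANDMARKS = 33   # full-body pose, MediaPipe-compatible naming
HAND_LANDMARKS = 21   # single-hand pipeline

def classify_frame(frame: dict) -> str:
    """Return 'body', 'hand', or 'unknown' based on keypoint count."""
    n = len(frame.get("keypoints", []))
    if n == BODY_LANDMARKS:
        return "body"
    if n == HAND_LANDMARKS:
        return "hand"
    return "unknown"

# Example: a single-hand frame with 21 landmarks
hand_frame = {"keypoints": [{"x": 0.5, "y": 0.5, "z": 0.0}] * 21}
```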
2. Deliverables (typical export bundle)
Commercial bundles are designed so that ML and data engineers can audit and ingest data without reverse-engineering ad hoc formats. Typical artifacts include:
- data.jsonl — one JSON object per accepted frame; keypoints and optional fields (e.g. layer summaries, segmentation metadata, dexterous hand features).
- manifest.json — job metadata, keypoint model, counts, flags.
- global_stats.json / features.json — aggregates and per-sequence metrics where applicable.
- export_quality_report.json — thresholds, filter report, and formulas used for acceptance.
- runtime_config.json — resolved runtime snapshot including documented augmentation parameters (e.g. Gaussian noise sigma, random scale/translation ranges when enabled) for reproducibility.
- SCHEMA.md, README_dataset.md, optional ONEPAGER.md — human-readable documentation.
- Optional: data.csv, coco_keypoints.json (keypoint schema follows the pose or hand export, as applicable), viewer.html, integrity hashes.
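Because data.jsonl holds one JSON object per accepted frame and manifest.json carries job metadata, a bundle can be streamed with the standard library alone. A minimal ingestion sketch (only the file names come from the list above; any per-frame field names should be checked against the bundled SCHEMA.md):

```python
import json
from pathlib import Path

def load_bundle(bundle_dir: str):
    """Read manifest.json and stream frames from data.jsonl.

    File names follow the documented export bundle; per-frame
    contents are opaque here and should be validated against
    SCHEMA.md before training use.
    """
    root = Path(bundle_dir)
    manifest = json.loads((root / "manifest.json").read_text())
    frames = []
    with (root / "data.jsonl").open() as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines defensively
                frames.append(json.loads(line))
    return manifest, frames
```

For large exports, yielding frames lazily instead of accumulating a list would keep memory flat; the eager version is shown only for brevity.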
3. Technical characteristics
- Temporal smoothing: optional Gaussian smoothing on normalized x/y/z coordinates (visibility typically not smoothed).
- Body normalization: for full-body pose, hip-centered, torso-scaled normalized coordinates may be attached when landmarks support it.
- Augmentations: optional horizontal mirror, Gaussian keypoint noise, and random scale/translation in normalized space to simulate reframing and camera distance — with parameters echoed for ML teams.
- Dexterous hand (when enabled): additive per-frame fields such as finger-angle proxies, grip-type heuristic, tip velocities, and hand-centric motion summaries — in addition to base keypoints.
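The augmentation step above (Gaussian keypoint noise plus random scale/translation in normalized space) can be sketched as follows. Parameter names and default values here are illustrative assumptions; the actual values used for a given export are echoed in runtime_config.json:

```python
import random

def augment_keypoints(kps, sigma=0.005, scale_range=(0.9, 1.1),
                      trans_range=(-0.05, 0.05), rng=None):
    """Apply Gaussian noise, then a random uniform scale and
    translation, to normalized (x, y) keypoints.

    sigma, scale_range, and trans_range are hypothetical defaults;
    a reproducible pipeline would read them from runtime_config.json.
    """
    rng = rng or random.Random()
    s = rng.uniform(*scale_range)          # one scale per frame
    tx = rng.uniform(*trans_range)         # one translation per frame
    ty = rng.uniform(*trans_range)
    out = []
    for x, y in kps:
        x += rng.gauss(0.0, sigma)         # per-keypoint jitter
        y += rng.gauss(0.0, sigma)
        out.append((x * s + tx, y * s + ty))
    return out
```

Seeding the `rng` argument and recording the seed alongside the echoed parameters keeps the augmentation pass reproducible end to end.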
4. Target sectors (illustrative)
Motion data is relevant across autonomy, robotics, ML infrastructure, and interactive entertainment. References to well-known companies on marketing pages are sector examples only and do not imply partnership, endorsement, or past sales unless separately agreed in writing.
5. Public samples & commercial path
Evaluation copies and mirrors are published on open platforms to reduce friction for researchers.
Commercial datasets, bundles, and enterprise terms: qvision.space/dataset-pricing. Licensing: qvision.space/legal/dataset-licensing.