Multi-Finger Dexterous Hand Datasets: Annotating In-Hand Object Rotation, Pivoting, and Re-Grasping
Modern robotics is rapidly moving toward systems capable of performing manipulation tasks with the complexity and dexterity of the human hand. A key challenge in this field is developing and training multi-fingered robotic hands that can not only grasp objects but also perform precise in-hand manipulation actions, such as rotating, pivoting, and re-grasping them.
Despite significant progress in deep and reinforcement learning, the effectiveness of such systems depends heavily on the quality and diversity of the training data. This creates a need for specialized datasets that annotate in detail the movements of a multi-fingered hand during complex object manipulation. In particular, in-hand object rotation, pivoting, and re-grasping are critical capabilities for autonomous robots that must operate in real, unpredictable conditions.
Overview of existing datasets for multi-finger manipulation
The development of multi-finger robotic hands relies heavily on the availability of high-quality datasets that allow training and evaluation of complex manipulation control algorithms. Existing datasets in this area can be roughly divided into two large groups: datasets for grasping objects and datasets for complex in-hand manipulation.
The first group consists of datasets focused on stable object grasping tasks. They usually contain information about finger configurations at the moment of contact with the object, the poses and orientations of the object, and sometimes force parameters or contact points. Examples include datasets created from simulation environments or RGB-D observations, where the main emphasis is on the success of the grasp rather than on the subsequent manipulation of the object in the hand.
The second, more complex group consists of datasets for in-hand manipulation. They include scenarios in which the object is already in the robot's hand and undergoes further transformations. However, most of these datasets are limited in terms of action coverage and focus only on individual aspects, such as simple object rotation or grip stabilization. Full-fledged scenarios that include sequential phases of rotation, pivoting, and re-grasping are presented much less frequently.
In addition, a significant portion of existing datasets is based on simulation environments, which creates a problem with transferring knowledge to real-world conditions (the sim-to-real gap). Even when using real robotic systems, the lack of detailed annotation of finger-object contact limits the ability to train models that can understand the subtle dynamics of manipulation.
In many datasets, there is no clear separation of manipulation phases, which complicates training models that must predict not only the final state of the object but also the sequence of actions leading to it.
Proposed dataset description
Annotation methodology
- Phase segmentation - each manipulation trajectory is divided into main stages: grasp, in-hand manipulation, position correction, and re-grasp. The boundaries between phases are determined by kinematic characteristics (finger joint velocity and acceleration) and changes in the object's stability in the hand.
- Contact annotation - interactions between fingers and the object are recorded using tactile sensors or geometric analysis in simulation. For each finger, the contact state is determined, and, if possible, additional parameters such as contact duration and interaction force are added.
- Object motion marking - the position of the object is described in space through 6DoF parameters (three coordinates and three orientation angles). Additionally, typical movements, including rotation and rolling, are highlighted relative to the hand coordinate system.
- Automatic pre-marking - initial annotations are generated algorithmically from sensor data, using rules built on kinematic and contact thresholds. This reduces the amount of manual work.
- Manual refinement - complex or ambiguous fragments are checked and corrected by experts. Special attention is paid to the transition points between phases and non-standard manipulations, such as re-grasping.
- Temporal consistency check - a logical sequence of phases in time is maintained to avoid physically impossible transitions and to ensure continuity of contact events.
- Physical correctness check - compliance of the data with physical laws is controlled, in particular, the absence of penetration of the object through the fingers and the consistency between the contacts and the object's movement.
- Data quality control - annotations are cross-checked between different annotators and analyzed for consistency, after which only validated trajectories are included in the final dataset.
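The phase segmentation step above can be sketched as a simple threshold rule on joint velocities and object stability. This is a minimal illustration, not the dataset's actual pipeline; the phase names follow the list above, while the threshold value and function signature are assumptions:

```python
def segment_phases(joint_speeds, object_stable, speed_thresh=0.05):
    """Label each timestep with a coarse manipulation phase.

    joint_speeds: mean absolute finger-joint velocity per timestep (rad/s)
    object_stable: whether the object pose is steady in the hand
    Returns labels: 'grasp', 'in_hand_manipulation',
    'position_correction', or 'regrasp'.
    """
    phases = []
    grasped = False
    for speed, stable in zip(joint_speeds, object_stable):
        if not grasped:
            phases.append('grasp')
            if stable:                  # first stable hold ends the grasp phase
                grasped = True
        elif speed > speed_thresh and not stable:
            phases.append('regrasp')    # fast finger motion while the object is unsettled
        elif speed > speed_thresh:
            phases.append('in_hand_manipulation')
        else:
            phases.append('position_correction')
    return phases
```

In practice such automatic pre-labels would then go through the manual refinement and temporal consistency checks described above.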
Potential applications of the dataset
Such a dataset can be used to train and benchmark learning-based manipulation policies, to build models that predict phase transitions such as re-grasping, to develop reactive controllers that respond to fingertip slip, and to study sim-to-real transfer by comparing simulated and real annotated trajectories.
FAQ
What is a dexterous hand dataset, and why is it important?
A dexterous hand dataset is a structured collection of data capturing multi-finger robotic manipulation of objects. It is important because it enables learning-based systems to understand complex in-hand behaviors such as rotation, pivoting, and re-grasping, which are essential for human-like robotic manipulation.
What are the main challenges in building a dexterous hand dataset?
The main challenges include capturing high-dimensional finger motions, accurately recording unstable contacts, and ensuring precise temporal alignment between actions and object states. Additionally, collecting consistent real-world data is difficult due to sensor noise and hardware limitations.
How is in-hand manipulation labeling performed in such datasets?
In-hand manipulation labeling involves segmenting continuous motion into meaningful phases, such as grasping, manipulation, and re-grasping. This labeling relies on kinematic signals, contact information, and, when needed, manual expert correction to ensure semantic correctness.
What is the role of finger contact map annotation?
Finger contact map annotation encodes which parts of the robotic fingers are in contact with the object at each timestep. This is crucial for understanding how forces are distributed and how grip stability evolves during manipulation.
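A finger contact map of this kind can be represented as a per-timestep binary structure. The sketch below is an illustrative encoding, assuming a five-fingered hand with three links per finger (these names and counts are assumptions, not a standard):

```python
FINGERS = ["thumb", "index", "middle", "ring", "little"]
LINKS_PER_FINGER = 3  # proximal, middle, distal

def make_contact_map(contacts):
    """Build a binary contact map for one timestep.

    contacts: iterable of (finger, link_index) pairs currently
    touching the object. Returns {finger: [bool per link]}.
    """
    cmap = {f: [False] * LINKS_PER_FINGER for f in FINGERS}
    for finger, link in contacts:
        cmap[finger][link] = True
    return cmap

def contact_count(cmap):
    """Total number of links in contact - a crude grip-stability proxy."""
    return sum(sum(links) for links in cmap.values())
```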
How is object pose tracking used in dexterous manipulation datasets?
Object pose tracking provides the 6DoF position and orientation of the manipulated object over time. It allows researchers to link finger actions with resulting object motion, which is essential for learning control policies.
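With poses stored as three coordinates plus three orientation angles, as described earlier, per-timestep object motion can be summarized by a translation magnitude and wrapped angle differences. A minimal sketch, assuming radians and an (x, y, z, roll, pitch, yaw) layout:

```python
import math

def pose_delta(p0, p1):
    """Translation distance and per-axis rotation change between two
    6DoF poses given as (x, y, z, roll, pitch, yaw), angles in radians.
    """
    dx, dy, dz = (b - a for a, b in zip(p0[:3], p1[:3]))
    trans = math.sqrt(dx * dx + dy * dy + dz * dz)
    # wrap each angle difference into (-pi, pi]
    rots = [math.atan2(math.sin(b - a), math.cos(b - a))
            for a, b in zip(p0[3:], p1[3:])]
    return trans, rots
```

Euler angles are used here only because the annotation section mentions three orientation angles; a real pipeline might prefer quaternions to avoid gimbal-lock issues.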
What is regrasp event data, and why is it important?
Regrasp event data captures moments when the robot changes its grip configuration during manipulation. These events are critical because they represent transitions in strategy needed to maintain control over complex or unstable objects.
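One simple way to flag such grip-configuration changes is to count how many finger links toggle contact state between consecutive timesteps. A heuristic sketch; the threshold of two toggled links and the contact-map format are illustrative assumptions:

```python
def regrasp_events(contact_maps, min_changed=2):
    """Indices of timesteps where the grip changes: at least
    `min_changed` finger links toggle contact state versus the
    previous timestep.

    contact_maps: list of {finger: [bool per link]} per timestep
    """
    events = []
    for t in range(1, len(contact_maps)):
        changed = 0
        for finger, links in contact_maps[t].items():
            prev = contact_maps[t - 1][finger]
            changed += sum(a != b for a, b in zip(prev, links))
        if changed >= min_changed:
            events.append(t)
    return events
```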
How is contact information utilized in dexterous manipulation learning?
Contact information, often represented as a finger contact map annotation, helps models understand where and how forces are applied. This improves the ability to predict stable grasps and avoid object slippage.
What is the significance of fingertip slip detection?
Fingertip slip detection identifies when an object begins to move unintentionally relative to the fingers. This signal is important for reactive control systems that must adjust grip force or configuration to prevent object drop.
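A basic geometric cue for this is drift of the object relative to a fingertip that is supposed to remain in contact. The sketch below assumes position-only tracking and an illustrative 2 mm threshold; real systems typically fuse tactile signals as well:

```python
def detect_slip(fingertip_pos, object_pos, in_contact, thresh=0.002):
    """Flag timesteps where the object drifts relative to a fingertip
    while contact is maintained (threshold in metres, assumed value).

    fingertip_pos, object_pos: lists of (x, y, z) per timestep
    in_contact: list of bools per timestep
    """
    slips = [False]  # no previous frame to compare at t = 0
    for t in range(1, len(object_pos)):
        rel_prev = [o - f for o, f in zip(object_pos[t - 1], fingertip_pos[t - 1])]
        rel_now = [o - f for o, f in zip(object_pos[t], fingertip_pos[t])]
        drift = sum((a - b) ** 2 for a, b in zip(rel_now, rel_prev)) ** 0.5
        slips.append(in_contact[t] and in_contact[t - 1] and drift > thresh)
    return slips
```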
How does in-hand manipulation labeling improve learning algorithms?
Accurate in-hand manipulation labeling provides structured supervision for learning temporal dependencies in actions. It allows models to distinguish between stable manipulation phases and transitions, such as regrasping or adjustment.
How do these datasets support robotic learning in general?
Dexterous datasets combining object pose tracking, finger contact map annotation, and regrasp event data enable robust learning of manipulation policies. They bridge the gap between perception and control, allowing robots to perform more human-like dexterous tasks in real environments.