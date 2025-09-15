A study published on arXiv details how researchers at the University of Bonn have developed a reinforcement learning framework that enables robots to manipulate granular media such as sand into target shapes. The system trains a robotic arm with a cubic end-effector and a stereo camera to reshape loose material into forms including rectangles, L-shapes, polygons, and negatives of archaeological fresco fragments. Experiments showed millimeter-level accuracy, with the trained agent outperforming two baseline approaches and transferring successfully from simulation to a physical robot without additional training.

Granular materials pose difficulties for robotics because of their high-dimensional configuration space and unstable dynamics. Rule-based approaches often fail, while particle simulations are computationally expensive. The researchers addressed these challenges by designing compact observation spaces and reward functions that guide learning. Visual policies were trained using Truncated Quantile Critics (TQC), an off-policy reinforcement learning algorithm. Depth images from a ZED 2i stereo camera were converted into height maps, allowing the robot to compare the current and goal structures in a form suitable for efficient training.
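As a rough illustration of the height-map idea, the sketch below converts a depth image into a coarse grid of heights and subtracts it from a goal map. It is not the authors' pipeline: the function names, the straight-down camera assumption, and the block-averaging step are all assumptions made for this example.

```python
import numpy as np


def depth_to_height_map(depth_m, camera_height_m, grid_shape):
    """Convert a top-down depth image (metres) into a coarse height map.

    Assumption: the camera looks straight down from `camera_height_m`
    above the container floor, so height = camera height - depth.
    """
    depth = np.asarray(depth_m, dtype=float)
    height = camera_height_m - depth

    rows, cols = grid_shape
    # Crop so the image divides evenly into rows x cols blocks.
    h = (depth.shape[0] // rows) * rows
    w = (depth.shape[1] // cols) * cols
    blocks = height[:h, :w].reshape(rows, h // rows, cols, w // cols)

    # Average each block; nanmean skips pixels with no valid depth reading.
    return np.nanmean(blocks, axis=(1, 3))


def height_difference(current, goal):
    """Per-cell difference: positive where material is still missing."""
    return np.asarray(goal, dtype=float) - np.asarray(current, dtype=float)


# Hypothetical 4x4 depth image: one corner is 10 cm closer to the camera.
depth = np.full((4, 4), 0.5)
depth[:2, :2] = 0.4
height_map = depth_to_height_map(depth, camera_height_m=0.5, grid_shape=(2, 2))
diff = height_difference(height_map, np.zeros((2, 2)))
```

A small grid like this keeps the observation compact, which is the property the article credits for efficient training.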

The robot’s task is to manipulate the granular media with its cubic end-effector to shape it as close as possible to desired goal configurations. Image via University of Bonn.

The system was evaluated against a random policy and a Boustrophedon Coverage Path Planning baseline. Across 400 goal shapes, the learned agent consistently outperformed both. Using the delta reward (DELTA) formulation, the robot achieved a mean height difference of 3.4 millimeters, compared with 4.8 millimeters for the planning baseline and 7.2 millimeters for the random policy. Execution was shorter as well, averaging 23.5 steps versus 44 for the path planning baseline, where execution steps were defined as the number of actions until the end-effector left the granular medium for three consecutive steps. The agent also modified 97 percent of relevant cells in the goal area, compared with 54 percent for the random policy. Statistical testing confirmed that the DELTA policy significantly outperformed all alternatives.
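The three metrics reported above can be sketched as follows. This is one possible reading of the article's descriptions, not the paper's evaluation code: the function names, the 1 mm change tolerance, averaging over all cells, and counting the three out-of-medium actions as part of the step total are assumptions.

```python
import numpy as np


def mean_height_difference_mm(final, goal):
    """Mean absolute per-cell height error, converted from metres to mm."""
    return float(np.mean(np.abs(np.asarray(final) - np.asarray(goal))) * 1000.0)


def changed_goal_cell_fraction(initial, final, goal_mask, tol_m=1e-3):
    """Fraction of goal-area cells whose height changed by more than tol_m."""
    changed = np.abs(np.asarray(final) - np.asarray(initial)) > tol_m
    relevant = np.asarray(goal_mask).astype(bool)
    return float(changed[relevant].mean())


def execution_steps(in_medium, patience=3):
    """Count actions until the end-effector has been outside the medium
    for `patience` consecutive steps (those steps are included here)."""
    outside = 0
    for step, inside in enumerate(in_medium, start=1):
        outside = 0 if inside else outside + 1
        if outside == patience:
            return step
    return len(in_medium)  # episode ended before the stop condition


# Hypothetical 2x2 episode, heights in metres.
initial = np.zeros((2, 2))
final = np.array([[0.010, 0.0], [0.0, 0.0]])
goal = np.array([[0.012, 0.0], [0.0, 0.002]])
goal_mask = np.array([[1, 1], [0, 0]])

error_mm = mean_height_difference_mm(final, goal)
coverage = changed_goal_cell_fraction(initial, final, goal_mask)
steps = execution_steps([True, True, False, True, False, False, False, True])
```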

The project involved the Humanoid Robots Lab, the Autonomous Intelligent Systems Lab, and the Center for Robotics at the University of Bonn, working with the Lamarr Institute for Machine Learning and Artificial Intelligence. Funding came from the European Commission’s RePAIR program under Horizon 2020 and from Germany’s Federal Ministry of Education and Research through the Robotics Institute Germany initiative.

A training process is employed to enable agents to manipulate granular media using sensory inputs. Image via University of Bonn.

Further experiments examined design choices. When the goal-area movement reward was removed, agents avoided manipulation behaviors entirely, performing no better than the random baseline. Feature extractor ablations showed that the proposed gating-based encoder achieved the best performance, with an average error of 3.4 millimeters compared with 4.6 millimeters when relying directly on depth images. Algorithm comparisons confirmed that TQC achieved stable convergence, whereas Soft Actor-Critic (SAC) lagged and Twin Delayed Deep Deterministic Policy Gradient (TD3) failed to converge. A supplementary site linked in the paper provides additional details, videos, and code.
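The article does not give the reward formulas, so the following is only a hypothetical sketch of how a delta-style reward and a goal-area movement term could be written. The function names, the weighting, and the tolerance are assumptions for illustration and should not be read as the paper's definitions.

```python
import numpy as np


def delta_reward(prev_height, curr_height, goal_height):
    """Hypothetical delta reward: reduction in mean absolute height error
    between two consecutive steps (positive when the shape improves)."""
    prev_err = np.mean(np.abs(prev_height - goal_height))
    curr_err = np.mean(np.abs(curr_height - goal_height))
    return float(prev_err - curr_err)


def goal_area_movement_bonus(prev_height, curr_height, goal_mask,
                             weight=1.0, tol_m=1e-3):
    """Hypothetical bonus: share of goal-area cells the last action moved."""
    moved = np.abs(curr_height - prev_height) > tol_m
    return float(weight * moved[np.asarray(goal_mask).astype(bool)].mean())


# Hypothetical 2x2 example, heights in metres.
prev = np.zeros((2, 2))
curr = np.array([[0.004, 0.0], [0.0, 0.0]])
goal = np.array([[0.004, 0.0], [0.0, 0.0]])
mask = np.array([[1, 0], [0, 0]])

improving = delta_reward(prev, curr, goal) + goal_area_movement_bonus(prev, curr, mask)
idle = delta_reward(prev, prev, goal) + goal_area_movement_bonus(prev, prev, mask)
```

Under a sketch like this, an idle step scores exactly zero on the delta term while a clumsy push scores negative. That is one plausible reason, offered here only as a guess, why agents trained without a movement term might learn to avoid touching the material.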

Deployment on a UR5e robotic arm validated the approach outside simulation. Despite sensor noise and an uneven starting surface, the robot reproduced target shapes such as rectangles with results similar to those seen in simulation. The ability to transfer directly from synthetic training environments to real-world execution demonstrated the robustness of the framework.

From left to right, the reconstructed 3D scene in simulation. Image via University of Bonn.

Research into granular media manipulation spans excavation, grading, and extraterrestrial soil handling. Many approaches depend on computationally demanding finite or discrete element simulations or on imitation learning pipelines tailored to specific tasks. By combining efficient height map representations with carefully designed reward formulations, the Bonn team demonstrated that reinforcement learning can adaptively shape granular media without handcrafted rules.



The authors conclude that their method consistently outperforms traditional baselines and establishes a viable route for adaptive robotic manipulation of deformable materials.

