HandCraft

Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images

¹Seeing Machines, ²Australian National University
*Equal Contribution
Corresponding authors: dylan.campbell@anu.edu.au, kf.zy.qin@gmail.com
Formerly with Seeing Machines
WACV 2025

Images generated by Stable Diffusion often exhibit anatomically incorrect hands (a), for example a missing finger (top) or abnormal relative finger lengths (bottom). Our method, HandCraft, corrects the hands in a controllable manner, allowing for a variety of output gestures while following the style of the original image (b–d). The resulting images feature naturally posed hands, improving the quality of the AI-generated portraits and restoring the illusion of reality.

Abstract

Generative text-to-image models, such as Stable Diffusion, have demonstrated a remarkable ability to generate diverse, high-quality images. However, they are surprisingly inept when it comes to rendering human hands, which are often anatomically incorrect or reside in the "uncanny valley". This paper proposes a method—HandCraft—for restoring such malformed hands. This is achieved by automatically constructing masks and depth images for hands as conditioning signals using a parametric model, allowing a diffusion-based image editor to fix the hand’s anatomy and adjust its pose while seamlessly integrating the changes into the original image, preserving pose, color, and style. Our plug-and-play hand restoration solution is compatible with existing diffusion models, and the restoration process facilitates adoption by eschewing any fine-tuning or training requirements. We also contribute MalHand datasets that contain generated images with a wide variety of malformed hands in several styles for training and benchmarking, and demonstrate through qualitative and quantitative evaluation that HandCraft not only restores anatomical correctness but also maintains the integrity of the overall image.

Method

HandCraft flowchart. The framework corrects malformed hands in images in three stages. (1) Hand detection. A hand detector predicts the bounding box of the hand, and a body pose estimator predicts the hand landmarks, using the whole-body pose as a prior. (2) Control image generation. The extracted body pose and a parametric hand template are passed to a control image generator, which produces a control image I_c and a template mask M_t. The final control mask M is the union of the bounding box mask M_d and the template mask M_t. (3) Hand restoration. ControlNet generates the final output image with the corrected hand, conditioned on the input image, a text prompt, the control mask M, and the control image I_c.
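The mask union in stage (2) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `bbox_mask` and `control_mask` are hypothetical, and the toy arrays below stand in for the real outputs of the hand detector and the parametric hand template.

```python
import numpy as np


def bbox_mask(shape, box):
    """Build a boolean mask M_d that is True inside a bounding box.

    shape: (height, width) of the image.
    box:   (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive (assumed convention).
    """
    h, w = shape
    x0, y0, x1, y1 = box
    # Clip the box to the image so that slicing never wraps around.
    x0, x1 = max(0, x0), min(w, x1)
    y0, y1 = max(0, y0), min(h, y1)
    mask = np.zeros((h, w), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask


def control_mask(m_d, m_t):
    """Return M = M_d ∪ M_t as the pixel-wise logical OR of two boolean masks."""
    if m_d.shape != m_t.shape:
        raise ValueError("masks must have the same shape")
    return np.logical_or(m_d, m_t)


# Toy example: a 6x8 image with a 3x3 detector box and a 2x3 template region
# that overlap in exactly one pixel.
m_d = bbox_mask((6, 8), (2, 1, 5, 4))   # 9 pixels
m_t = np.zeros((6, 8), dtype=bool)      # stand-in for the template mask M_t
m_t[3:5, 4:7] = True                    # 6 pixels
m = control_mask(m_d, m_t)
print(int(m.sum()))                     # 14 = 9 + 6 - 1
```

Taking the union means the editable region covers both the detected (malformed) hand and the area the template hand will occupy, so pixels belonging to either are available for regeneration.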