Our method reconstructs strand-based 3D hairstyles from short monocular videos with natural head motion.

These casual captures provide additional viewpoints beyond a single image, implicitly encoding multi-view observations and motion cues that improve hair length estimation and resolve occlusions.

Abstract

We present Vid2Haircut, a novel approach for strand-based 3D hair reconstruction from monocular head-motion videos. While existing multi-view methods achieve high-fidelity results, they require controlled capture setups. In contrast, single-image approaches suffer from occlusion-driven ambiguities, particularly in unseen regions such as the back of the head. Recent monocular video methods improve reconstruction by leveraging learned priors, but may struggle under natural head motion.

To address this, our approach reconstructs accurate geometry from a short monocular video by leveraging viewpoint variations induced by natural head motion, which help resolve occlusions in poorly observed regions. Specifically, we extend a learned prior for general hair structure by jointly optimizing a shared, scalp-aligned hair map in a canonical space across multiple keyframes. To accommodate hair motion during capture, we incorporate a deformation MLP that predicts residual strand offsets, preventing frame-specific deformations from corrupting the canonical hairstyle.
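The residual deformation can be illustrated with a minimal sketch in plain Python. It is not the implementation behind Vid2Haircut: the network size, the per-keyframe latent `frame_code`, the zero-initialized output layer, and all function names are assumptions made for illustration. The sketch only shows how a per-frame strand is formed as the canonical strand plus a predicted offset, so that frame-specific motion is absorbed by the offsets rather than by the canonical geometry.

```python
import math
import random

def make_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Tiny two-layer MLP. The output layer starts at zero (an assumption),
    so the predicted offset is exactly zero before any training."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    b1 = [0.0] * hidden_dim
    w2 = [[0.0] * hidden_dim for _ in range(out_dim)]
    b2 = [0.0] * out_dim
    return {"w1": w1, "b1": b1, "w2": w2, "b2": b2}

def linear(w, b, x):
    """Affine layer: w @ x + b for list-based weights."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def predict_offset(mlp, point, frame_code):
    """Map a canonical 3D point and a frame code to a residual 3D offset."""
    features = list(point) + list(frame_code)
    hidden = [math.tanh(v) for v in linear(mlp["w1"], mlp["b1"], features)]
    return linear(mlp["w2"], mlp["b2"], hidden)

def deform_strand(mlp, strand, frame_code):
    """Per-frame strand = canonical points + predicted residual offsets."""
    deformed = []
    for point in strand:
        offset = predict_offset(mlp, point, frame_code)
        deformed.append([p + d for p, d in zip(point, offset)])
    return deformed

# Toy canonical strand with three points, and a hypothetical 2D frame code.
canonical_strand = [[0.0, 0.0, 0.0], [0.0, -0.01, 0.002], [0.0, -0.02, 0.005]]
frame_code = [0.3, -0.1]
mlp = make_mlp(in_dim=5, hidden_dim=8, out_dim=3)

# With the zero-initialized output layer, the strand is unchanged at the start.
deformed = deform_strand(mlp, canonical_strand, frame_code)
```

Starting from a zero offset is one common design choice for residual models: optimization begins at the canonical hairstyle, and the MLP only has to explain what the canonical strands cannot.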

We further stabilize the reconstruction of weakly observed regions using visibility-aware updates and neighboring-strand smoothness constraints. Experiments on synthetic and real data show that our pipeline improves backside consistency, scalp attachment, and overall 3D reconstruction quality compared to state-of-the-art baselines, while requiring only casual head-motion videos as input.
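As a rough illustration of these two stabilizers, the sketch below applies a gradient step only at hair-map texels with sufficient visibility and computes a smoothness penalty between adjacent texels. It is a simplification under stated assumptions: each texel holds a single scalar instead of a latent vector, and the threshold `min_visibility`, the learning rate, and the function names are hypothetical rather than taken from the method.

```python
def masked_step(hair_map, grad, visibility, lr=0.1, min_visibility=0.5):
    """Gradient step applied only where the current keyframe sees the texel
    reliably; weakly observed texels keep their previous value."""
    return [
        [z - lr * g if v >= min_visibility else z
         for z, g, v in zip(z_row, g_row, v_row)]
        for z_row, g_row, v_row in zip(hair_map, grad, visibility)
    ]

def smoothness_loss(hair_map):
    """Sum of squared differences between horizontally and vertically
    adjacent texels, i.e. between neighboring strands on the scalp."""
    rows, cols = len(hair_map), len(hair_map[0])
    loss = 0.0
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:
                loss += (hair_map[i][j] - hair_map[i][j + 1]) ** 2
            if i + 1 < rows:
                loss += (hair_map[i][j] - hair_map[i + 1][j]) ** 2
    return loss

# Toy 2x2 hair map: the right column is poorly visible in this keyframe.
hair_map = [[1.0, 1.0], [1.0, 3.0]]
grad = [[1.0, 1.0], [1.0, 1.0]]
visibility = [[1.0, 0.0], [0.9, 0.2]]

updated = masked_step(hair_map, grad, visibility)  # only the left column moves
penalty = smoothness_loss(hair_map)                # driven by the outlier texel
```

In this toy setup the masked texels receive no data gradient, so the smoothness term is what would pull them toward their better-observed neighbors.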

Main idea

Vid2Haircut deformation and refinement

Starting from a frontal reference frame, we initialize the canonical hairstyle using the Im2Haircut prior and select keyframes from the input monocular video for joint optimization. A shared PCA hair map Z is refined across frames, while a deformation MLP accounts for non-rigid motion. Visibility-driven updates ensure that gradients modify the canonical map only in regions with reliable observations, and the optimization is stabilized by smoothness regularization and reprojection losses.
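To make the role of the PCA hair map concrete, here is a minimal sketch of decoding one texel of Z into strand coordinates as a mean strand plus a weighted sum of basis components. The basis, the mean strand, the two-point strand length, and the name `decode_strand` are toy assumptions; the actual strand parameterization of the Im2Haircut prior is not specified here.

```python
def decode_strand(z, mean_strand, basis):
    """Decode PCA coefficients z into a flat list of strand coordinates:
    strand = mean_strand + sum_k z[k] * basis[k]."""
    strand = list(mean_strand)
    for coeff, component in zip(z, basis):
        for i, value in enumerate(component):
            strand[i] += coeff * value
    return strand

# Toy strand with two 3D points, flattened as [x0, y0, z0, x1, y1, z1].
mean_strand = [0.0, 0.0, 0.0, 0.0, -1.0, 0.0]
basis = [
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],   # component 0: moves the tip sideways
    [0.0, 0.0, 0.0, 0.0, -1.0, 0.0],  # component 1: lengthens the strand
]

z_texel = [0.5, 2.0]  # coefficients stored at one texel of the hair map
strand = decode_strand(z_texel, mean_strand, basis)
```

Because every keyframe decodes strands from the same coefficients, a change to Z driven by one view alters the hairstyle seen from all other views, which is what couples the keyframes during joint optimization.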

Comparison

We compare Vid2Haircut with state-of-the-art single-view hair reconstruction methods, including Hairstep, Difflocks, and Im2Haircut, as well as a commercial mobile-scanning solution for hair, denoted as MS-Hair.

Comparison with Hairstep, Difflocks, Im2Haircut, and MS-Hair

Comparison across test subjects. We compare Vid2Haircut against single-view and commercial baselines using fixed rendered views. Our method better preserves hairstyle-specific structure and improves consistency in side and partially occluded regions.

Comparison under a shared camera trajectory. While single-view methods often struggle with occluded regions and incomplete 3D structure, our method leverages multiple views from a short monocular video to recover more consistent strand geometry and better preserve hairstyle-specific details.

Additional Results

We provide additional qualitative comparisons to evaluate the geometric fidelity of reconstructed hairstyles, with a focus on strand-level structure, local flow, and overall hair volume.

Geometry-based comparison. We compare Vid2Haircut with Im2Haircut, Difflocks, Hairstep, and MV-Scanner on multiple subjects from the NeRSemble dataset. Geometry-only renderings and zoom-in crops highlight local strand structure and reconstruction artifacts.

Vid2Haircut produces more coherent strand-based geometry while preserving both volumetric hair structure and local strand flow. The improvement is especially visible in side views, where prior methods often oversmooth the hairstyle, lose volume, or produce noisy and misaligned strand patterns. These results show that using multiple views from a short monocular video enables more faithful and personalized hairstyle reconstruction than prior single-view and scanning-based baselines.

BibTeX