The project will build on early prototypes in the group to estimate, reconstruct or generate hand-object interactions from egocentric videos.
The project team will target joint interaction prediction (i.e. is either hand currently interacting with an object, and if so, which object), reconstruction (i.e. 3D modelling of the pose of the hand, the pose of the object, and the contact vertices between the two) and forecasting (i.e. future movement of the hand to interact with or release objects, including future contact vertices and poses).
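The three task outputs described above could share a single per-frame schema. A minimal sketch in Python, where all class and field names are hypothetical illustrations rather than the project's actual interface:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]  # (x, y, z) in metres

@dataclass
class InteractionPrediction:
    """Is the hand in interaction, and with which object?"""
    in_contact: bool
    object_id: Optional[int] = None  # None when not in contact

@dataclass
class Reconstruction:
    """3D pose of hand and object, plus contact vertices between them."""
    hand_pose: List[Point3D]
    object_pose: List[Point3D]
    contact_vertices: List[int] = field(default_factory=list)

@dataclass
class Forecast:
    """Future hand movement and anticipated contact."""
    future_hand_poses: List[List[Point3D]]  # one pose per future timestep
    future_contact_vertices: List[int] = field(default_factory=list)

@dataclass
class HandObjectOutput:
    """Joint multi-task output for one hand in one egocentric frame."""
    prediction: InteractionPrediction
    reconstruction: Reconstruction
    forecast: Forecast
```

A joint model predicting all three heads from shared features is what distinguishes this proposal from prior work that tackles each task separately.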
While these three tasks have been studied separately, this proposal aims for a joint multi-task foundation model trained on large-scale synthetic and real data: a mixture of publicly available datasets and, predominantly, datasets built and annotated locally at the University of Bristol.
The outcome is a publicly released foundation model for hand-object interactions, together with a research publication.
Dima Damen, University of Bristol, United Kingdom