I've made a few attempts at this. Here are the ones that worked best. I generated these on my own setup using the rapidaio version of the 14B WAN 2.2 T2V generator. Each clip took about 2 minutes to generate on a 12GB RTX 3080.
WAN 2.2 is quite extraordinary in how accurately it can generate lifelike feet. All the feet in the videos above were AI-generated; I did not provide any photo input.