# Awesome-3D-Human-Motion-Generation

3D Human Motion Generation aims to synthesize natural and plausible motions from conditioning signals such as text descriptions, action labels, and music.

This repository mainly tracks mainstream Text-to-Motion works, and also collects related papers and datasets.

Last updated: 2024/07/24 (Partial ECCV'24 added)

## Content Catalog

- Datasets
- Text-to-Motion

## Metrics

### Motion quality

- Frechet Inception Distance (FID) $\downarrow$
  - FID is the principal metric for motion quality: it measures the distance between the feature distributions of generated and real motions. The feature extractor employed is from [T2M]. A sketch of the computation follows this list.
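For reference, here is a minimal NumPy sketch of how this flavor of FID is typically computed. It assumes `real_feats` and `gen_feats` are `(num_samples, feat_dim)` arrays of motion features already extracted with the [T2M] evaluator; the function and array names are illustrative, not the official evaluation code.

```python
import numpy as np
from scipy import linalg

def compute_fid(real_feats: np.ndarray, gen_feats: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to real and generated motion features."""
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)

    # Matrix square root of the covariance product; numerical noise can make it complex
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```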

### Motion diversity

- MultiModality (MModality) $\uparrow$
  - MModality measures generation diversity conditioned on the same text. For each text prompt, multiple motions are generated and the average Euclidean distance over 10 sampled pairs of their features is computed; the result is averaged over prompts.
- Diversity $\rightarrow$ (closer to the real-motion value is better)
  - Diversity measures the variability and richness of the generated motion sequences. It is computed as the average Euclidean distance between the features of 300 randomly sampled pairs of generated motions. A sketch of both metrics follows this list.
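A minimal sketch of one common variant of both diversity metrics, assuming motion features have already been extracted; `feats`, `per_prompt_feats`, and the disjoint pair-sampling scheme are illustrative assumptions, with the pair counts following the numbers quoted above.

```python
import numpy as np

def diversity(feats: np.ndarray, num_pairs: int = 300, seed: int = 0) -> float:
    """Average Euclidean distance between randomly sampled pairs of motion features.

    feats: (num_motions, feat_dim); assumes num_motions >= 2 * num_pairs.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(feats), 2 * num_pairs, replace=False)
    first, second = feats[idx[:num_pairs]], feats[idx[num_pairs:]]
    return float(np.linalg.norm(first - second, axis=1).mean())


def multimodality(per_prompt_feats: list, num_pairs: int = 10, seed: int = 0) -> float:
    """Average pairwise distance among motions generated for the *same* prompt, averaged over prompts.

    per_prompt_feats: one (num_generations, feat_dim) array per text prompt;
    assumes num_generations >= 2 * num_pairs for every prompt.
    """
    rng = np.random.default_rng(seed)
    scores = []
    for feats in per_prompt_feats:
        idx = rng.choice(len(feats), 2 * num_pairs, replace=False)
        a, b = feats[idx[:num_pairs]], feats[idx[num_pairs:]]
        scores.append(np.linalg.norm(a - b, axis=1).mean())
    return float(np.mean(scores))
```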

### Condition matching

- R-Precision $\uparrow$
  - R-Precision measures how well a generated motion matches its text description: each generated motion is ranked against a pool containing its ground-truth description plus mismatched ones, and Top-k reports the probability that the ground-truth description appears among the k nearest. A sketch of both matching metrics follows this list.
- Multi-Modal Distance (MM Dist) $\downarrow$
  - MM Dist is the average Euclidean distance between the motion feature of each generated motion and the text feature of its corresponding description in the test set.
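A sketch of both matching metrics under the common evaluation setup, where motion and text features live in a shared embedding space and each motion is ranked against a pool of 32 descriptions (its ground truth plus 31 mismatched ones); the function and array names are illustrative.

```python
import numpy as np

def r_precision(motion_feats: np.ndarray, text_feats: np.ndarray,
                top_k: int = 3, pool_size: int = 32, seed: int = 0) -> float:
    """Fraction of motions whose ground-truth description ranks in the top-k of its pool.

    motion_feats, text_feats: aligned (N, feat_dim) arrays in a shared embedding
    space; row i of text_feats describes row i of motion_feats.
    """
    rng = np.random.default_rng(seed)
    n = len(motion_feats)
    hits = 0
    for i in range(n):
        # Pool index 0 is the ground-truth description; the rest are mismatched.
        mismatched = rng.choice(np.delete(np.arange(n), i), pool_size - 1, replace=False)
        pool = np.concatenate(([i], mismatched))
        dists = np.linalg.norm(text_feats[pool] - motion_feats[i], axis=1)
        if 0 in np.argsort(dists)[:top_k]:
            hits += 1
    return hits / n


def mm_dist(motion_feats: np.ndarray, text_feats: np.ndarray) -> float:
    """Average Euclidean distance between each motion feature and its paired text feature."""
    return float(np.linalg.norm(motion_feats - text_feats, axis=1).mean())
```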

## Performance Tables

Note: the prefixes 'o-' and 'u-' in the code links denote official and unofficial implementations, respectively.

### HumanML3D

| ID | Year | Venue | Model (or Authors) | R-Precision Top-1 ↑ | R-Precision Top-2 ↑ | R-Precision Top-3 ↑ | FID ↓ | MM Dist ↓ | MultiModality ↑ | Diversity → | Code |
|---|---|---|---|---|---|---|---|---|---|---|---|
| - | - | - | Real Motion | $0.511^{\pm.003}$ | $0.703^{\pm.003}$ | $0.797^{\pm.002}$ | $0.002^{\pm.000}$ | $2.974^{\pm.008}$ | - | $9.503^{\pm.065}$ | - |
| - | - | - | Real Motion † | $0.539^{\pm.004}$ | $0.721^{\pm.003}$ | $0.810^{\pm.003}$ | $0.001^{\pm.000}$ | $1.462^{\pm.006}$ | - | $5.298^{\pm.047}$ | - |
| 1 | 2018 | NeurIPS | Seq2Seq | $0.180^{\pm.002}$ | $0.300^{\pm.002}$ | $0.396^{\pm.002}$ | $11.75^{\pm.035}$ | $5.529^{\pm.007}$ | - | $6.223^{\pm.061}$ | [u-pytorch] |
| 2 | 2019 | 3DV | Language2Pose | $0.246^{\pm.002}$ | $0.387^{\pm.002}$ | $0.486^{\pm.004}$ | $11.02^{\pm.046}$ | $5.296^{\pm.008}$ | - | $7.676^{\pm.058}$ | [o-pytorch] |
| 3 | 2021 | IEEE VR | Text2Gesture | $0.165^{\pm.001}$ | $0.267^{\pm.002}$ | $0.345^{\pm.002}$ | $5.012^{\pm.030}$ | $6.030^{\pm.008}$ | - | $6.409^{\pm.071}$ | [o-pytorch] |
| 4 | 2021 | ICCV | Hier | $0.301^{\pm.002}$ | $0.425^{\pm.002}$ | $0.552^{\pm.004}$ | $6.523^{\pm.024}$ | $5.012^{\pm.018}$ | - | $8.332^{\pm.042}$ | [o-pytorch] |
| 5 | 2022 | ECCV | TEMOS | $0.424^{\pm.002}$ | $0.612^{\pm.002}$ | $0.722^{\pm.002}$ | $3.734^{\pm.028}$ | $3.703^{\pm.008}$ | $0.368^{\pm.018}$ | $8.973^{\pm.071}$ | [o-pytorch] |
| 6 | 2022 | ECCV | TM2T | $0.424^{\pm.003}$ | $0.618^{\pm.003}$ | $0.729^{\pm.002}$ | $1.501^{\pm.017}$ | $3.467^{\pm.011}$ | $2.424^{\pm.093}$ | $8.589^{\pm.076}$ | [o-pytorch] |
| 7 | 2022 | CVPR | T2M | $0.455^{\pm.003}$ | $0.636^{\pm.003}$ | $0.736^{\pm.002}$ | $1.087^{\pm.021}$ | $3.347^{\pm.008}$ | $2.219^{\pm.074}$ | $9.175^{\pm.083}$ | [o-pytorch] |
| 8 | 2023 | ICLR | MDM | $0.320^{\pm.005}$ | $0.498^{\pm.004}$ | $0.611^{\pm.007}$ | $0.544^{\pm.044}$ | $5.566^{\pm.027}$ | $2.799^{\pm.072}$ | $9.559^{\pm.086}$ | [o-pytorch] |
| 9 | 2022 (2024) | arXiv (TPAMI) | MotionDiffuse | $0.491^{\pm.001}$ | $0.681^{\pm.001}$ | $0.782^{\pm.001}$ | $0.630^{\pm.001}$ | $3.113^{\pm.001}$ | $1.553^{\pm.042}$ | $9.410^{\pm.049}$ | [o-pytorch] |
| 10 | 2023 | CVPR | MLD | $0.481^{\pm.003}$ | $0.673^{\pm.003}$ | $0.772^{\pm.002}$ | $0.473^{\pm.013}$ | $3.196^{\pm.010}$ | $2.413^{\pm.079}$ | $9.724^{\pm.082}$ | [o-pytorch] |
| 11 | 2023 | CVPR | T2M-GPT | $0.491^{\pm.003}$ | $0.680^{\pm.003}$ | $0.775^{\pm.002}$ | $0.116^{\pm.004}$ | $3.118^{\pm.011}$ | $1.856^{\pm.011}$ | $9.761^{\pm.081}$ | [o-pytorch] |
| 12 | 2023 | ICCV | Fg-T2M | $0.492^{\pm.002}$ | $0.683^{\pm.003}$ | $0.783^{\pm.002}$ | $0.243^{\pm.019}$ | $3.109^{\pm.007}$ | $1.614^{\pm.049}$ | $9.278^{\pm.072}$ | - |
| 13 | 2023 | ICCV | M2DM | $0.497^{\pm.003}$ | $0.682^{\pm.002}$ | $0.763^{\pm.003}$ | $0.352^{\pm.005}$ | $3.134^{\pm.010}$ | $3.587^{\pm.072}$ | $9.926^{\pm.073}$ | - |
| 14 | 2023 | ICCV | AttT2M | $0.499^{\pm.003}$ | $0.690^{\pm.002}$ | $0.786^{\pm.002}$ | $0.112^{\pm.006}$ | $3.038^{\pm.007}$ | $2.452^{\pm.051}$ | $9.700^{\pm.090}$ | [o-pytorch] |
| 15 | 2023 | NeurIPS | MotionGPT | $0.492^{\pm.003}$ | $0.681^{\pm.003}$ | $0.778^{\pm.002}$ | $0.232^{\pm.008}$ | $3.096^{\pm.008}$ | $2.008^{\pm.084}$ | $9.528^{\pm.071}$ | [o-pytorch] |
| 16 | 2023 | NeurIPS | ReMoDiffuse † | $0.510^{\pm.005}$ | $0.698^{\pm.006}$ | $0.795^{\pm.004}$ | $0.103^{\pm.004}$ | $2.974^{\pm.016}$ | $1.795^{\pm.043}$ | $9.018^{\pm.075}$ | [o-pytorch] |
| 17 | 2024 | CVPR | MMM | $0.504^{\pm.003}$ | $0.696^{\pm.003}$ | $0.794^{\pm.002}$ | $0.080^{\pm.003}$ | $2.998^{\pm.007}$ | $1.164^{\pm.041}$ | $9.411^{\pm.058}$ | [o-pytorch] |
| 18 | 2024 | CVPR | MoMask | $0.521^{\pm.002}$ | $0.713^{\pm.002}$ | $0.807^{\pm.002}$ | $0.045^{\pm.002}$ | $2.958^{\pm.008}$ | $1.241^{\pm.040}$ | - | [o-pytorch] |
| 19 | 2024 | ECCV | MotionLCM | $0.502^{\pm.003}$ | $0.698^{\pm.002}$ | $0.798^{\pm.002}$ | $0.304^{\pm.012}$ | $3.012^{\pm.007}$ | $2.259^{\pm.092}$ | $9.607^{\pm.066}$ | [o-pytorch] |
| 20 | 2024 | ECCV | Motion Mamba | $0.502^{\pm.003}$ | $0.693^{\pm.002}$ | $0.792^{\pm.002}$ | $0.281^{\pm.009}$ | $3.060^{\pm.058}$ | $2.294^{\pm.058}$ | $9.871^{\pm.084}$ | [o-pytorch] |
| 21 | 2024 | ECCV | BAMM | $0.525^{\pm.002}$ | $0.720^{\pm.003}$ | $0.814^{\pm.003}$ | $0.055^{\pm.002}$ | $2.919^{\pm.008}$ | $1.687^{\pm.051}$ | $9.717^{\pm.089}$ | - |

### KIT-ML

| ID | Year | Venue | Model (or Authors) | R-Precision Top-1 ↑ | R-Precision Top-2 ↑ | R-Precision Top-3 ↑ | FID ↓ | MM Dist ↓ | MultiModality ↑ | Diversity → | Code |
|---|---|---|---|---|---|---|---|---|---|---|---|
| - | - | - | Real Motion (GT) | $0.424^{\pm.005}$ | $0.649^{\pm.006}$ | $0.779^{\pm.006}$ | $0.031^{\pm.004}$ | $2.788^{\pm.012}$ | - | $11.08^{\pm.097}$ | - |
| - | - | - | Real Motion † | $0.475^{\pm.006}$ | $0.690^{\pm.004}$ | $0.791^{\pm.005}$ | $0.002^{\pm.000}$ | $1.337^{\pm.012}$ | - | $6.371^{\pm.058}$ | - |
| 1 | 2018 | NeurIPS | Seq2Seq | $0.103^{\pm.003}$ | $0.178^{\pm.005}$ | $0.241^{\pm.006}$ | $24.86^{\pm.348}$ | $7.960^{\pm.031}$ | - | $6.744^{\pm.106}$ | [u-pytorch] |
| 2 | 2019 | 3DV | Language2Pose | $0.221^{\pm.005}$ | $0.373^{\pm.004}$ | $0.483^{\pm.005}$ | $6.545^{\pm.072}$ | $5.147^{\pm.030}$ | - | $9.073^{\pm.100}$ | [o-pytorch] |
| 3 | 2021 | IEEE VR | Text2Gesture | $0.156^{\pm.004}$ | $0.255^{\pm.004}$ | $0.338^{\pm.005}$ | $12.12^{\pm.183}$ | $6.964^{\pm.029}$ | - | $9.334^{\pm.079}$ | [o-pytorch] |
| 4 | 2021 | ICCV | Hier | $0.255^{\pm.006}$ | $0.432^{\pm.007}$ | $0.531^{\pm.007}$ | $5.203^{\pm.107}$ | $4.986^{\pm.027}$ | - | $9.563^{\pm.072}$ | [o-pytorch] |
| 5 | 2022 | ECCV | TEMOS | $0.353^{\pm.006}$ | $0.561^{\pm.007}$ | $0.687^{\pm.005}$ | $3.717^{\pm.051}$ | $3.417^{\pm.017}$ | $0.532^{\pm.034}$ | $10.84^{\pm.100}$ | [o-pytorch] |
| 6 | 2022 | ECCV | TM2T | $0.280^{\pm.005}$ | $0.463^{\pm.006}$ | $0.587^{\pm.005}$ | $3.599^{\pm.153}$ | $4.591^{\pm.026}$ | $3.292^{\pm.081}$ | $9.473^{\pm.117}$ | [o-pytorch] |
| 7 | 2022 | CVPR | T2M | $0.361^{\pm.006}$ | $0.559^{\pm.007}$ | $0.681^{\pm.007}$ | $3.022^{\pm.107}$ | $3.488^{\pm.028}$ | $2.052^{\pm.107}$ | $10.72^{\pm.145}$ | [o-pytorch] |
| 8 | 2023 | ICLR | MDM | $0.164^{\pm.004}$ | $0.291^{\pm.004}$ | $0.396^{\pm.004}$ | $0.497^{\pm.021}$ | $9.191^{\pm.022}$ | $1.907^{\pm.214}$ | $10.85^{\pm.109}$ | [o-pytorch] |
| 9 | 2022 (2024) | arXiv (TPAMI) | MotionDiffuse | $0.417^{\pm.004}$ | $0.621^{\pm.004}$ | $0.739^{\pm.004}$ | $1.954^{\pm.064}$ | $2.958^{\pm.005}$ | $0.730^{\pm.013}$ | $11.10^{\pm.143}$ | [o-pytorch] |
| 10 | 2023 | CVPR | MLD | $0.390^{\pm.008}$ | $0.609^{\pm.008}$ | $0.734^{\pm.007}$ | $0.404^{\pm.027}$ | $3.204^{\pm.027}$ | $2.192^{\pm.071}$ | $10.80^{\pm.117}$ | [o-pytorch] |
| 11 | 2023 | CVPR | T2M-GPT | $0.402^{\pm.006}$ | $0.619^{\pm.005}$ | $0.737^{\pm.006}$ | $0.717^{\pm.041}$ | $3.053^{\pm.026}$ | $1.912^{\pm.036}$ | $10.86^{\pm.094}$ | [o-pytorch] |
| 12 | 2023 | ICCV | Fg-T2M | $0.418^{\pm.005}$ | $0.626^{\pm.004}$ | $0.745^{\pm.004}$ | $0.571^{\pm.047}$ | $3.114^{\pm.015}$ | $1.019^{\pm.029}$ | $10.93^{\pm.083}$ | - |
| 13 | 2023 | ICCV | M2DM | $0.416^{\pm.004}$ | $0.628^{\pm.004}$ | $0.743^{\pm.004}$ | $0.515^{\pm.029}$ | $3.015^{\pm.017}$ | $3.325^{\pm.370}$ | $11.417^{\pm.970}$ | - |
| 14 | 2023 | ICCV | AttT2M | $0.413^{\pm.006}$ | $0.632^{\pm.006}$ | $0.751^{\pm.006}$ | $0.870^{\pm.039}$ | $3.039^{\pm.021}$ | $2.281^{\pm.047}$ | $10.96^{\pm.123}$ | [o-pytorch] |
| 15 | 2023 | NeurIPS | MotionGPT | $0.366^{\pm.005}$ | $0.558^{\pm.004}$ | $0.680^{\pm.005}$ | $0.510^{\pm.016}$ | $3.527^{\pm.021}$ | $2.328^{\pm.117}$ | $10.35^{\pm.084}$ | [o-pytorch] |
| 16 | 2023 | NeurIPS | ReMoDiffuse † | $0.427^{\pm.014}$ | $0.641^{\pm.004}$ | $0.765^{\pm.055}$ | $0.155^{\pm.006}$ | $2.814^{\pm.012}$ | $1.239^{\pm.028}$ | $10.80^{\pm.105}$ | [o-pytorch] |
| 17 | 2024 | CVPR | MMM | $0.381^{\pm.005}$ | $0.590^{\pm.006}$ | $0.718^{\pm.005}$ | $0.429^{\pm.019}$ | $3.146^{\pm.019}$ | $1.105^{\pm.026}$ | $10.633^{\pm.097}$ | [o-pytorch] |
| 18 | 2024 | CVPR | MoMask | $0.433^{\pm.007}$ | $0.656^{\pm.005}$ | $0.781^{\pm.005}$ | $0.204^{\pm.011}$ | $2.779^{\pm.022}$ | $1.131^{\pm.043}$ | - | [o-pytorch] |
| 19 | 2024 | ECCV | Motion Mamba | $0.419^{\pm.006}$ | $0.645^{\pm.005}$ | $0.765^{\pm.006}$ | $0.307^{\pm.041}$ | $3.021^{\pm.025}$ | $1.678^{\pm.064}$ | $11.02^{\pm.098}$ | [o-pytorch] |
| 20 | 2024 | ECCV | BAMM | $0.438^{\pm.009}$ | $0.661^{\pm.009}$ | $0.788^{\pm.005}$ | $0.183^{\pm.013}$ | $2.723^{\pm.026}$ | $1.609^{\pm.065}$ | $11.008^{\pm.098}$ | - |

## Paper List

### Text-to-Motion

  1. [Seq2Seq] | NeurIPS'18 | Generating Animated Videos of Human Activities from Natural Language Descriptions | [pdf] | [u-pytorch] |
  2. [Language2Pose] | 3DV'19 | Language2Pose: Natural Language Grounded Pose Forecasting | [pdf] | [o-pytorch] |
  3. [Text2Gesture] | IEEE VR'21 | Text2Gestures: A Transformer-Based Network for Generating Emotive Body Gestures for Virtual Agents | [pdf] | [o-pytorch] |
  4. [Hier] | ICCV'21 | Synthesis of Compositional Animations from Textual Descriptions | [pdf] | [o-pytorch] |
  5. [TEMOS] | ECCV'22 | TEMOS: Generating diverse human motions from textual descriptions | [pdf] | [o-pytorch] |
  6. [TM2T] | ECCV'22 | TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts | [pdf] | [o-pytorch] |
  7. [T2M] | CVPR'22 | Generating Diverse and Natural 3D Human Motions from Text | [pdf] | [o-pytorch] |
  8. [MDM] | ICLR'23 | MDM: Human Motion Diffusion Model | [pdf] | [o-pytorch] |
  9. [MotionDiffuse] | arXiv'22 (TPAMI'24) | MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model | [pdf] | [o-pytorch] |
  10. [MLD] | CVPR'23 | Executing your Commands via Motion Diffusion in Latent Space | [pdf] | [o-pytorch] |
  11. [T2M-GPT] | CVPR'23 | T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations | [pdf] | [o-pytorch] |
  12. [Fg-T2M] | ICCV'23 | Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model | [pdf] | - |
  13. [M2DM] | ICCV'23 | Priority-Centric Human Motion Generation in Discrete Latent Space | [pdf] | - |
  14. [AttT2M] | ICCV'23 | AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism | [pdf] | [o-pytorch] |
  15. [MotionGPT] | NeurIPS'23 | MotionGPT: Human Motion as a Foreign Language | [pdf] | [o-pytorch] |
  16. [ReMoDiffuse †] | NeurIPS'23 | ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model | [pdf] | [o-pytorch] |
  17. [MMM] | CVPR'24 | MMM: Generative Masked Motion Model | [pdf] | [o-pytorch] |
  18. [MoMask] | CVPR'24 | MoMask: Generative Masked Modeling of 3D Human Motions | [pdf] | [o-pytorch] |
  19. [MotionLCM] | ECCV'24 | MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model | [pdf] | [o-pytorch] |
  20. [Motion Mamba] | ECCV'24 | Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM | [pdf] | [o-pytorch] |
  21. [BAMM] | ECCV'24 | BAMM: Bidirectional Autoregressive Motion Model | [pdf] | - |

### Motion Control (e.g., Spatial Constraints)

  1. [GMD] | ICCV'23 | Guided motion diffusion for controllable human motion synthesis | [pdf] | [o-pytorch] |
  2. [PhysDiff] | ICCV'23 | PhysDiff: Physics-Guided Human Motion Diffusion Model | [pdf] | - |
  3. [PriorMDM] | ICLR'24 | Human Motion Diffusion as a Generative Prior | [pdf] | [o-pytorch] |
  4. [OmniControl] | ICLR'24 | OmniControl: Control Any Joint at Any Time for Human Motion Generation | [pdf] | [o-pytorch] |
  5. [MotionLCM] | ECCV'24 | MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model | [pdf] | [o-pytorch] |

## Feedback

If you have any suggestions or find missing papers, please feel free to contact me.

## Thanks

The format of this awesome list follows this project; thanks for such a pretty template!
