Academic literature
- Google Scholar
- Solving Transformer by Hand: A Step-by-Step Math Example
- Diffusion Models: A Comprehensive Survey of Methods and Applications
- CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
- Transformer Network for video to text translation
- Make-A-Video: Text-to-Video Generation without Text-Video Data
- Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
- TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval (Book)