UM

Browse/Search Results: 1-9 of 9

SmartEdit: Exploring Complex Instruction-Based Image Editing with Multimodal Large Language Models Conference paper
Huang, Yuzhou, Xie, Liangbin, Wang, Xintao, Yuan, Ziyang, Cun, Xiaodong, Ge, Yixiao, Zhou, Jiantao, Dong, Chao, Huang, Rui, Zhang, Ruimao, Shan, Ying. SmartEdit: Exploring Complex Instruction-Based Image Editing with Multimodal Large Language Models[C]:IEEE Computer Society, 2024, 8362-8371.
Authors:  Huang, Yuzhou;  Xie, Liangbin;  Wang, Xintao;  Yuan, Ziyang;  Cun, Xiaodong; et al.
TC[Scopus]:1 | Submit date:2024/11/05
Training  Visualization  Computer Vision  Large Language Models  Diffusion Models  Cognition  Pattern Recognition  Instruction-based Image Editing  Multimodal Large Language Models  
COMMA: Co-articulated Multi-Modal Learning Conference paper
Hu, Lianyu, Gao, Liqing, Liu, Zekang, Pun, Chi Man, Feng, Wei. COMMA: Co-articulated Multi-Modal Learning[C]:Association for the Advancement of Artificial Intelligence, 2024, 2238-2246.
Authors:  Hu, Lianyu;  Gao, Liqing;  Liu, Zekang;  Pun, Chi Man;  Feng, Wei
TC[Scopus]:0 | Submit date:2024/05/16
CV: Language and Vision  CV: Large Vision Models  CV: Multi-modal Vision  CV: Video Understanding & Activity Analysis
LightVLP: A Lightweight Vision-Language Pre-training via Gated Interactive Masked AutoEncoders Conference paper
Sun, Xingwu, Yang, Zhen, Xie, Ruobing, Lian, Fengzong, Kang, Zhanhui, Xu, Chengzhong. LightVLP: A Lightweight Vision-Language Pre-training via Gated Interactive Masked AutoEncoders[C]:European Language Resources Association (ELRA), 2024, 10499-10510.
Authors:  Sun, Xingwu;  Yang, Zhen;  Xie, Ruobing;  Lian, Fengzong;  Kang, Zhanhui; et al.
TC[Scopus]:0 | Submit date:2024/07/04
Lightweight V&L Pre-training  Mask Autoencoder  Vision-language Pre-training
Relational Network via Cascade CRF for Video Language Grounding Journal article
Zhang, Tong, Lu, Xiankai, Zhang, Hao, Nie, Xiushan, Yin, Yilong, Shen, Jianbing. Relational Network via Cascade CRF for Video Language Grounding[J]. IEEE Transactions on Multimedia, 2024, 26, 8297-8311.
Authors:  Zhang, Tong;  Lu, Xiankai;  Zhang, Hao;  Nie, Xiushan;  Yin, Yilong; et al.
TC[WOS]:1 TC[Scopus]:1 IF:8.4/8.0 | Submit date:2024/02/23
Vision-language Grounding  Conditional Random Fields  Temporal Relation  Proposal Free  
The Neglected Tails in Vision-Language Models Conference paper
Parashar, Shubham, Lin, Zhiqiu, Liu, Tian, Dong, Xiangjue, Li, Yanan, Ramanan, Deva, Caverlee, James, Kong, Shu. The Neglected Tails in Vision-Language Models[C]:IEEE Computer Society, 2024, 12988-12997.
Authors:  Parashar, Shubham;  Lin, Zhiqiu;  Liu, Tian;  Dong, Xiangjue;  Li, Yanan; et al.
TC[Scopus]:2 | Submit date:2024/11/05
Long Tailed Recognition  Vision-language Models  Zero-shot Recognition  
Referring Multi-Object Tracking Conference paper
Wu, Dongming, Han, Wencheng, Wang, Tiancai, Dong, Xingping, Zhang, Xiangyu, Shen, Jianbing. Referring Multi-Object Tracking[C]:IEEE, 2023, 14633-14642.
Authors:  Wu, Dongming;  Han, Wencheng;  Wang, Tiancai;  Dong, Xingping;  Zhang, Xiangyu; et al.
TC[WOS]:16 TC[Scopus]:29 | Submit date:2024/02/23
Vision, Language, and Reasoning
Learning Disentanglement with Decoupled Labels for Vision-Language Navigation Conference paper
Cheng, Wenhao, Dong, Xingping, Khan, Salman, Shen, Jianbing. Learning Disentanglement with Decoupled Labels for Vision-Language Navigation[C]:Springer-Verlag Berlin, 2022, 309-329.
Authors:  Cheng, Wenhao;  Dong, Xingping;  Khan, Salman;  Shen, Jianbing
TC[WOS]:4 TC[Scopus]:5 | Submit date:2023/01/30
Disentanglement  Imitation/reinforcement Learning  LSTM and Transformer  Modular Network  Vision-and-language Navigation
Multi-Level Representation Learning with Semantic Alignment for Referring Video Object Segmentation Conference paper
Dongming Wu, Xingping Dong, Ling Shao, Jianbing Shen. Multi-Level Representation Learning with Semantic Alignment for Referring Video Object Segmentation[C]:IEEE Computer Society, 2022, 4986-4995.
Authors:  Dongming Wu;  Xingping Dong;  Ling Shao;  Jianbing Shen
TC[WOS]:18 TC[Scopus]:33 | Submit date:2023/01/30
Segmentation, Grouping and Shape Analysis  Vision + Language
Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation Conference paper
Hanqing Wang, Wei Liang, Jianbing Shen, Luc Van Gool, Wenguan Wang. Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation[C], 2022, 15450-15460.
Authors:  Hanqing Wang;  Wei Liang;  Jianbing Shen;  Luc Van Gool;  Wenguan Wang
TC[WOS]:17 TC[Scopus]:36 | Submit date:2023/01/30
Vision + Language