Diverse Image Captioning Based on Hybrid Global and Sequential Variational Transformer


Document title: Diverse Image Captioning Based on Hybrid Global and Sequential Variational Transformer
Abstract: Diverse image captioning has become a research hotspot in the field of image captioning. However, existing methods generally ignore the dependency between global and sequential latent vectors, which seriously limits captioning performance. To address this problem, this paper proposes a diverse image captioning framework based on a hybrid variational Transformer. First, a hybrid global and sequential conditional variational autoencoder is constructed to model the dependency between global and sequential latent vectors. Second, the evidence lower bound of the hybrid model is derived by maximizing the conditional likelihood, and it serves as the objective function for diverse image captioning. Finally, the Transformer is seamlessly combined with the hybrid conditional variational autoencoder, and the two are jointly optimized to improve the generalization performance of diverse image captioning. Experimental results on the MSCOCO dataset show that, compared with the state-of-the-art baseline methods, when randomly generating 20 and 100 captions, the diversity metric m-BLEU (mutual overlap-BiLingual Evaluation Understudy) improves by 4.2% and 4.7%, respectively, while the accuracy metric CIDEr (Consensus-based Image Description Evaluation) improves by 4.4% and 15.2%, respectively.
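
The abstract names an evidence lower bound for a model with both a global and a sequential latent vector but does not state it. As a rough illustration only, a bound of this kind could take the following generic form. All notation here is an assumption and is not taken from the paper: $I$ is the image, $x = (x_1, \dots, x_T)$ the caption, $z^g$ the global latent vector, $z^s_t$ the sequential latent vector at step $t$, $p_\theta$ the generative model, and $q_\phi$ the approximate posterior. The assumed factorization makes each $z^s_t$ depend on $z^g$, which is one way to express the dependency the abstract describes. The paper's actual derivation may differ.

```latex
% Hypothetical sketch: a generic ELBO for a conditional VAE with a
% global latent z^g and sequential latents z^s_{1:T} that depend on z^g.
% Assumed factorizations (not taken from the paper):
%   p_\theta(x, z^g, z^s_{1:T} \mid I) = p_\theta(z^g \mid I)
%       \prod_{t=1}^{T} p_\theta(z^s_t \mid z^s_{<t}, z^g, x_{<t}, I)\,
%                       p_\theta(x_t \mid x_{<t}, z^g, z^s_{\le t}, I)
%   q_\phi(z^g, z^s_{1:T} \mid x, I) = q_\phi(z^g \mid x, I)
%       \prod_{t=1}^{T} q_\phi(z^s_t \mid z^s_{<t}, z^g, x, I)
\begin{aligned}
\log p_\theta(x \mid I) \;\ge\;
  & \; \mathbb{E}_{q_\phi(z^g, z^s_{1:T} \mid x, I)}
      \Big[ \sum_{t=1}^{T} \log p_\theta\big(x_t \mid x_{<t}, z^g, z^s_{\le t}, I\big) \Big] \\
  & - \mathrm{KL}\big( q_\phi(z^g \mid x, I) \,\big\|\, p_\theta(z^g \mid I) \big) \\
  & - \sum_{t=1}^{T} \mathbb{E}_{q_\phi(z^g, z^s_{<t} \mid x, I)}
      \Big[ \mathrm{KL}\big( q_\phi(z^s_t \mid z^s_{<t}, z^g, x, I)
        \,\big\|\, p_\theta(z^s_t \mid z^s_{<t}, z^g, x_{<t}, I) \big) \Big]
\end{aligned}
```

Under these assumptions the first term is the caption reconstruction likelihood, the second regularizes the global latent vector, and the third regularizes each sequential latent vector conditioned on the global one.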

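Authors: LIU Bing, LI Sui, LIU Ming-ming, LIU Hao
Affiliations: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116; Engineering Research Center of Mine Digitization, Ministry of Education, Xuzhou, Jiangsu 221116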
Journal: Acta Electronica Sinica (电子学报); indexed in ISTIC, EI, PKU
Year, volume (issue): 2024, 52(4)
Classification number: TP391
Keywords: image understanding; image captioning; variational autoencoding; latent embedding; multi-modal learning; generative model
Machine-assigned classification numbers: TP391.41 TN957.52S
Online publication date: June 26, 2024

