<pre id="j9nnl"></pre>
        <ruby id="j9nnl"></ruby>
          <pre id="j9nnl"></pre><ruby id="j9nnl"><mark id="j9nnl"></mark></ruby>
          <p id="j9nnl"><mark id="j9nnl"><progress id="j9nnl"></progress></mark></p>

              <p id="j9nnl"><cite id="j9nnl"></cite></p>

                  <ruby id="j9nnl"></ruby>

                  <del id="j9nnl"><dfn id="j9nnl"><th id="j9nnl"></th></dfn></del>

                        <ruby id="j9nnl"><mark id="j9nnl"><thead id="j9nnl"></thead></mark></ruby>
                        <p id="j9nnl"></p>

                        <p id="j9nnl"><mark id="j9nnl"></mark></p>
                          <p id="j9nnl"><del id="j9nnl"><thead id="j9nnl"></thead></del></p>

                          科學研究

                          Research

                          首頁 >  論文  > 詳情

                          VideoChat : Chat-Centric Video Understanding

                          發表會議及期刊:arXiv

                          KunChang Li?1,4 , Yinan He?1 , Yi Wang??1 , Yizhuo Li1,3 , Wenhai Wang1

                          Ping Luo3 , Yali Wang4,1 , Limin Wang2,1 , Yu Qiao

                          1OpenGVLab, Shanghai AI Laboratory     2Nanjing University     3The University of Hong Kong 4Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences

                           https://github.com/OpenGVLab/Ask-Anything 


                          Abstract

                          In this study, we initiate an exploration into video understanding by introducing VideoChat, an end-to-end chat-centric video understanding system. It integrates video foundation models and large language models via a learnable neural interface, excelling in spatiotemporal reasoning, event localization, and causal relationship inference. To instructively tune this system, we propose a video-centric instruction dataset, composed of thousands of videos matched with detailed descriptions and conversations. This dataset emphasizes spatiotemporal reasoning and causal relationships, providing a valuable asset for training chat-centric video understanding systems. Preliminary qualitative experiments reveal our system’s potential across a broad spectrum of video applications and set the standard for future research. Access our code and data at https://github.com/OpenGVLab/Ask-Anything. 


                          comm@pjlab.org.cn

                          上海市徐匯區云錦路701號西岸國際人工智能中心37-38層

                          滬ICP備2021009351號-1

                          <pre id="j9nnl"></pre>
                                <ruby id="j9nnl"></ruby>
                                  <pre id="j9nnl"></pre><ruby id="j9nnl"><mark id="j9nnl"></mark></ruby>
                                  <p id="j9nnl"><mark id="j9nnl"><progress id="j9nnl"></progress></mark></p>

                                      <p id="j9nnl"><cite id="j9nnl"></cite></p>

                                          <ruby id="j9nnl"></ruby>

                                          <del id="j9nnl"><dfn id="j9nnl"><th id="j9nnl"></th></dfn></del>

                                                <ruby id="j9nnl"><mark id="j9nnl"><thead id="j9nnl"></thead></mark></ruby>
                                                <p id="j9nnl"></p>

                                                <p id="j9nnl"><mark id="j9nnl"></mark></p>
                                                  <p id="j9nnl"><del id="j9nnl"><thead id="j9nnl"></thead></del></p>
                                                  韩国伦理电影