In conjunction with the 25th International Conference on Pattern Recognition (ICPR 2020)
The workshop was planned to be hosted at the Milan Congress Center (Mi.Co.), located at Piazzale Carlo Magno 1, Milan. It has now moved online; more information is available on the main conference website.
Deep learning is now recognized as one of the key software engines driving the new industrial revolution. The majority of current deep learning research has been dedicated to single-modal data processing, most prominently deep learning based visual recognition and speech recognition. Although significant progress has been made, single-modal data is often insufficient to derive accurate and robust deep models in many applications. Our digital world is by nature multi-modal: it combines different modalities of data such as text, audio, images, animations, videos and interactive content. Multi-modal content is the most popular form of information representation and delivery. For example, posts about hot social events are typically composed of textual descriptions, images and videos. In medical diagnosis, the joint use of medical imaging and textual reports is also essential. Humans routinely rely on multi-modal data to make accurate perceptions and decisions. Multi-modal deep learning, which is capable of learning from information presented in multiple modalities and consequently making predictions based on multi-modal input, is therefore much in demand.
This workshop calls for scientific works that illustrate the most recent progress in multi-modal deep learning. In particular, it covers multi-modal data capture, integration, modelling, understanding and analysis, and how to leverage them to derive accurate and robust AI models in many applications. It is a timely topic following the rapid development of deep learning technologies and their remarkable applications in many fields. The workshop will serve as a forum that brings together active researchers and practitioners to share their recent advances in this exciting area. We solicit original and high-quality contributions that: (1) present state-of-the-art theories and novel application scenarios related to multi-modal deep learning; (2) survey the recent progress in this area; or (3) develop benchmark datasets and evaluations. We welcome researchers from various communities (e.g., visual computing, machine learning, multimedia analysis, distributed and cloud computing) to submit their novel results.
Authors of accepted papers will be encouraged to submit extended versions of their papers to a special issue of the Machine Vision and Applications journal on the same theme.
Paper ID | Paper Title |
---|---|
2 | Hierarchical Consistency and Refinement for Semi-supervised Medical Segmentation |
3 | BVTNet: Multi-label Multi-class Fusion of Visible and Thermal Camera for Free Space and Pedestrian Segmentation |
5 | Multimodal Emotion Recognition Based on Speech and Physiological Signals Using Deep Neural Networks |
6 | Cross-modal Deep Learning Applications: Audio-Visual Retrieval |
10 | Exploiting Word Embeddings for Recognition of Unseen Objects |
12 | Automated segmentation of lateral ventricle in MR images using multi-scale feature fusion convolutional neural network |
13 | Visual Word Embedding for Text Classification |
16 | CC-LSTM: Cross and Conditional Long-Short Time Memory for Video Captioning |
18 | An Overview of Image-to-Image Translation using Generative Adversarial Networks |
20 | Fusion Models for Improved Visual Captioning |
21 | From Bottom to Top: A Coordinated Feature Representation Method for Speech Recognition |
PROGRAM SCHEDULE OF MMDLCA 2020

Monday, January 11, 2021 (CET)

Time | Event | Chair |
---|---|---|
12:00 | Joining the online conference. Introduction to the technical information (for online participants) | |
12:00-14:40 | Plenary Session | Xirong Li, Renmin University of China, China |
12:00-12:40 | Keynote Talk 1: Multimodal Medical Data Analysis: Machine Learning in Histopathology. Henning Muller, Professor at the University of Geneva, Switzerland | |
12:40-13:00 | Hierarchical Consistency and Refinement for Semi-supervised Medical Segmentation. Zixiao Wang, Hai Xu, Youliang Tian and Hongtao Xie (University of Science and Technology of China, China; Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, China) | |
13:00-13:20 | BVTNet: Multi-label Multi-class Fusion of Visible and Thermal Camera for Free Space and Pedestrian Segmentation. Vijay John, Ali Boyali, Simon Thompson and Seiichi Mita (Toyota Technological Institute, Japan; Tier IV, Japan) | |
13:20-13:40 | Cross-modal Deep Learning Applications: Audio-Visual Retrieval. Cong Jin, Tian Zhang, Shouxun Liu, Yun Tie, Jianguang Li, Wencai Yan and Ming Yn (Communication University of China, China; Zhengzhou University, China) | |
13:40-14:00 | Automated segmentation of lateral ventricle in MR images using multi-scale feature fusion convolutional neural network. Fei Ye, Zhiqiang Wang, Kai Hu, Sheng Zhu and Xieping Gao (Xiangtan University, China; Xiangnan University, China) | |
14:00-14:20 | From Bottom to Top: A Coordinated Feature Representation Method for Speech Recognition. Lixia Zhou and Jun Zhang (Guangdong University of Technology, China) | |
14:20-16:40 | Plenary Session | Zhineng Chen, Institute of Automation, Chinese Academy of Sciences, China |
14:20-15:00 | Keynote Talk 2: Vision to Language: from Independency, Interaction, to Symbiosis. Ting Yao, Principal Researcher at JD AI Research, China | |
15:00-15:20 | An Overview of Image-to-Image Translation using Generative Adversarial Networks. Xin Chen and Caiyan Jia (Beijing Jiaotong University, China) | |
15:20-15:40 | Multimodal Emotion Recognition Based on Speech and Physiological Signals Using Deep Neural Networks. Ali Bakhshi and Stephan Chalup (The University of Newcastle, Australia) | |
15:40-16:00 | Exploiting Word Embeddings for Recognition of Unseen Objects. Karan Sharma, Hemanth Dandu, Arun Kumar, Vinay Boddula and Suchendra Bhandarkar (Keysight Technologies, United States; The University of Georgia, United States) | |
16:00-16:20 | Visual Word Embedding for Text Classification. Ignazio Gallo, Shah Nawaz, Nicola Landro and Riccardo La Grassa (University of Insubria, Italy) | |
16:20-16:40 | Fusion Models for Improved Visual Captioning. Marimuthu Kalimuthu, Aditya Mogadala, Marius Mosbach and Dietrich Klakow (Saarland Informatics Campus, Saarland University, Germany) | |
16:40 | Closing Ceremony | |
Submissions must be formatted in accordance with Springer's Computer Science Proceedings guidelines (https://www.springer.com/gp/computer-science/lncs/conference-proceedings-guidelines). The review process is single-blind. Two types of contribution will be considered:
Accepted manuscripts will be included in the ICPR 2020 Workshop Proceedings Springer volume. For each accepted paper, at least one author is expected to attend the event and present the paper orally (online).
We have set up a submission entry on EasyChair. It is OPEN now!