
Advanced Research Computing


TechSocial Series - March 2024

Document AI and Large Language Models, by Dongsheng Wang, AI Research Lead and Vice President at JPMorgan AI Research (NLP) in London

Speaker Bio

Dongsheng Wang received his PhD from the University of Copenhagen in 2020, with a Marie-Curie fellowship under the EU Horizon 2020 program.

Before joining JPMorgan, he worked at Action.ai as a research scientist in conversational AI, in the AI department at Tencent, and with the brain-inspired intelligence team at the Chinese Academy of Sciences.

At JPMorgan, he led the notable DocLLM project, which attracted over 200,000 social media views. His research interests include Transformers, Document AI, layout language models, large language models, and knowledge graph reasoning. He has recently been leading research on retrieval-augmented generation (RAG) for LLMs and on multi-modal LLM construction.


Abstract

The field of Document AI has been growing rapidly, aiming to better understand complex documents such as business forms, invoices, academic publications and financial reports.

Researchers have developed specialised models that are fine-tuned to encode the text, layout, and visual features of these documents. Notably, models such as LayoutLM and graph-based approaches have been effective for specific tasks because they capture the features of each modality in a nuanced manner. However, these models often cannot generalise to new tasks or new datasets without fine-tuning, limiting their immediate use out of the box.

Another key challenge posed by documents with irregular layouts is how to effectively pre-train a transformer model for autoregressive next-token prediction.

This talk introduces DocLLM, a pioneering effort to develop such a generative large language model tailored for Document AI. Its innovations include a novel pre-training objective that infills text within the context of visual documents, addressing the issues of irregular layouts and diverse content. Moreover, it integrates text with spatial layout information using a disentangled self-attention mechanism, which has proven highly effective. This approach has led to DocLLM outperforming state-of-the-art LLMs on 14 of 16 datasets and generalising well to 4 of 5 previously unseen datasets.
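The disentangled attention idea can be illustrated with a minimal sketch: instead of a single query/key projection over one fused embedding, the text tokens and their bounding-box (spatial) embeddings each get their own projections, and the attention score is a weighted sum of the cross-modal terms. This is a toy NumPy illustration only, not the DocLLM implementation; all function and parameter names here (`disentangled_attention`, the `lambdas` weights, the dimensions) are hypothetical choices for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def disentangled_attention(text_emb, box_emb, d_head=16, lambdas=(1.0, 1.0, 1.0), seed=0):
    """Toy disentangled attention over a sequence of tokens.

    text_emb: (seq, d_text) text-token embeddings
    box_emb:  (seq, d_box)  spatial embeddings (e.g. from bounding boxes)

    The attention score is a weighted sum of four terms:
    text-text, text-spatial, spatial-text, and spatial-spatial,
    each computed with its own (here, random) projection matrices.
    """
    rng = np.random.default_rng(seed)
    d_t, d_s = text_emb.shape[-1], box_emb.shape[-1]

    # separate query/key projections per modality (random stand-ins)
    Wq_t = rng.normal(size=(d_t, d_head)); Wk_t = rng.normal(size=(d_t, d_head))
    Wq_s = rng.normal(size=(d_s, d_head)); Wk_s = rng.normal(size=(d_s, d_head))
    Wv   = rng.normal(size=(d_t, d_head))  # values come from the text stream

    Qt, Kt = text_emb @ Wq_t, text_emb @ Wk_t
    Qs, Ks = box_emb @ Wq_s,  box_emb @ Wk_s

    l_ts, l_st, l_ss = lambdas  # scalar weights for the cross/spatial terms
    scores = (Qt @ Kt.T
              + l_ts * (Qt @ Ks.T)
              + l_st * (Qs @ Kt.T)
              + l_ss * (Qs @ Ks.T)) / np.sqrt(d_head)

    return softmax(scores) @ (text_emb @ Wv)
```

Keeping the spatial stream in its own projections (rather than adding box embeddings into the token embedding up front) is what lets the model weigh layout and text interactions independently.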


Archived Media

Apologies - there is no archived media available for this event