Document understanding

(2)

Clean your desk! Transformers for unsupervised clustering of document images

Abstract
The goal is clustering in support of document classification.
Multi-modal Transformer-based encoders: LayoutLM and LayoutLMv2 are used.
Tests are run on RVL-CDIP documents, SROIE receipts, and machine learning papers.
LayoutLMv2 always outperforms LayoutLM, even though LayoutLM has an advantage on text-heavy documents.
The [CLS] token is not always the best representation for clustering.

1. Introduction
Unsupervised document clustering was performed to support document classification. Unlike document classification, it requires no labels. Document understanding is inherently multimodal, and powerful..

End-to-end Document Recognition and Understanding with Dessurt

Abstract
Introduces Dessurt, a document understanding transformer.
It takes a document image and a task string as input and outputs text autoregressively.
As an end-to-end architecture, it performs text recognition in addition to document understanding.
It achieves effective performance on 9 different tasks.

Introduction
Document understanding is an active research area, with LayoutLM as a representative example.

LayoutLM Family
Adds spatial/layout information and visual features to BERT-like transformers.
Pre-trained on document images and, for each task, ..
