docx-corpus
The largest open corpus of .docx files for document processing research
GitHub Stars: 45
Launch Date: Jan 9, 2026
First Tracked: 6h ago
About
AI Summary
docx-corpus is the largest open dataset of .docx files, built to support research in document processing.
Tags
bun
common-crawl
corpus
dataset
document-processing
docx
machine-learning
nlp
typescript
word-documents
TypeScript