catalog.ldc.upenn.edu
Linguistic Data Consortium - Linguistic Data Consortium
https://catalog.ldc.upenn.edu/byproject
LDC Corpora ⇔ Projects. Many of the corpora in the Catalog were developed for, or used in, sponsored research projects. Some of those resources were training and test data for benchmark tests of language-based systems developed during the project. A corpus is associated with a given project either because it was developed for the project, it was used in the project or it was considered otherwise relevant to the work of the project. ACE-2 Version 1.0. Chinese Treebank 8.0. BOLT C...
catalog.ldc.upenn.edu
Web 1T 5-gram Version 1 - Linguistic Data Consortium
https://catalog.ldc.upenn.edu/LDC2006T13
Web 1T 5-gram Version 1. Thorsten Brants, Alex Franz. LDC Catalog No.:. September 19, 2006. Web 1T 5-gram Version 1 Agreement. Subscription and Standard Members, and Non-Members. Brants, Thorsten, and Alex Franz. Web 1T 5-gram Version 1 LDC2006T13. DVD. Philadelphia: Linguistic Data Consortium, 2006. The n-gram counts were generated from approximately 1 trillion word tokens of text from publicly accessible Web pages. File sizes: approx. 24 GB compres...
ldc.upenn.edu
Obtaining Data | Linguistic Data Consortium
https://www.ldc.upenn.edu/language-resources/data/obtaining
The easiest way to access data is to become an LDC member. Membership offers organizations unparalleled data rights and privileges (and discounts). Visit Membership Benefits. Most corpora distributed by LDC are available to nonmember organizations under research-only licenses and are subject to a nonmember license fee. Designated 'members-only' data sets are available to LDC members only. Refer to the Catalog for corpus-specific nonmember license and fee information. When data is ad...
catalog.ldc.upenn.edu
Linguistic Data Consortium - Linguistic Data Consortium
https://catalog.ldc.upenn.edu/memberships
2015 For Profit Membership. 2015 U.S. Government Membership. 2015 LDC Online Membership. 2016 For Profit Membership. 2016 U.S. Government Membership. 2016 LDC Online Membership. 2017 For Profit Membership. 2017 U.S. Government Membership.
catalog.ldc.upenn.edu
Linguistic Data Consortium - Linguistic Data Consortium
https://catalog.ldc.upenn.edu/byyear
LDC Catalog by Year. Arabic Treebank - Weblog. ARL Arabic Dependency Treebank. BOLT Chinese Discussion Forums. BOLT Chinese-English Word Alignment and Tagging - Discussion Forum Training. Chinese Treebank 9.0. Chinese-English Parallel Sentences Extracted from Patents. Digital Archive of Southern Speech - NLP Version. English Speed Networking Conversational Transcripts. GALE Phase 3 and 4 Arabic Web Parallel Text. GALE Phase 3 and 4 Chinese Broadcast Conversation Parallel Text.
catalog.ldc.upenn.edu
Linguistic Data Consortium - Linguistic Data Consortium
https://catalog.ldc.upenn.edu/topten
Top Ten LDC Corpora. TIMIT Acoustic-Phonetic Continuous Speech Corpus. Web 1T 5-gram Version 1. OntoNotes Release 5.0. The New York Times Annotated Corpus. BOLT Chinese-English Word Alignment and Tagging - Discussion Forum Training.