corpus.tools corpus.tools

Corpus tools

This is a joint portal of Masaryk University's NLP Centre, dedicated to a number of software tools for corpus processing. If you have any questions or suggestions, please subscribe to the Google group, where you can get involved in the discussion with the developers and other users. JusText is an HTML boilerplate removal tool. It can strip navigation links, headers, footers, etc. from HTML pages and leave just regular text containing full sentences. Last modified on Dec 7, 2016, 6:28:23 PM.

http://www.corpus.tools/

TRAFFIC RANK FOR CORPUS.TOOLS

Today's rating: >1,000,000
Best month: December
Highest traffic on: Tuesday

CUSTOMER REVIEWS

Average rating: 3.9 out of 5 with 20 reviews
5 star: 8
4 star: 6
3 star: 4
2 star: 0
1 star: 2


LOAD TIME

1.1 seconds


CONTENT

SCORE

6.2

PAGE TITLE
Corpus tools | corpus.tools Reviews
META KEYWORDS
1 preferences
2 wiki
3 browse source
4 wikistart
5 lexical computing
6 nosketch engine
7 paper
8 cite
9 licence
10 last modified
KEYWORDS ON PAGE
preferences, wiki, browse source, wikistart, lexical computing, nosketch engine, paper, cite, licence, last modified, 5 weeks ago, plain text, by edgewall software
SERVER
Apache/2.4.10 (Fedora) OpenSSL/1.0.1e-fips mod_nss/2.4.6 NSS/3.15.3 Basic ECC PHP/5.5.26 mod_wsgi/3.5 Python/2.7.5
CONTENT-TYPE
utf-8
INTERNAL PAGES

corpus.tools corpus.tools
1

Unitok – Corpus tools

http://www.corpus.tools/wiki/Unitok

Splits input text into tokens (one token per line). Recognizes URLs, e-mail addresses, DNS domains, and IP addresses. For specified languages, recognizes abbreviations and clitics (such as 've or n't in English). Replaces entities with Unicode equivalents. Adds glue (<g/>) tags between tokens not separated by a space. Unitok is licensed under the Mozilla Public License Version 2.0. Last modified on Aug 7, 2015, 3:11:55 PM.
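
The behaviour described above can be illustrated with a minimal Python sketch. This is not Unitok's code: the pattern `TOKEN_RE` and the function `tokenize` are hypothetical names, the token classes are heavily simplified (no DNS-domain or abbreviation handling), and the clitic rules are an assumption covering only a few English forms.

```python
import html
import re

# Hypothetical, simplified token pattern (an assumption, not Unitok's rules).
# Alternatives are tried left to right, so the more specific ones come first.
TOKEN_RE = re.compile(
    r"""
      https?://[^\s<>"]+               # URL
    | [\w.+-]+@[\w-]+(?:\.[\w-]+)+     # e-mail address
    | \d{1,3}(?:\.\d{1,3}){3}          # IPv4 address
    | \w+?(?=n't\b)                    # word stem in front of the clitic n't
    | n't\b                            # the clitic n't itself
    | '(?:ve|re|ll|s|d|m)\b            # other English clitics
    | \w+                              # ordinary word or number
    | \S                               # any other single character
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return one token per list item, with <g/> between unspaced tokens."""
    text = html.unescape(text)  # replace entities with Unicode characters
    lines = []
    prev_end = None
    for match in TOKEN_RE.finditer(text):
        # Two tokens that touch in the input get a glue tag between them.
        if prev_end is not None and match.start() == prev_end:
            lines.append("<g/>")
        lines.append(match.group())
        prev_end = match.end()
    return lines


if __name__ == "__main__":
    print("\n".join(tokenize("I don't know, see http://corpus.tools &amp; ask.")))
```

Printing the result one item per line gives a vertical format of the kind the description mentions; a real tokenizer would need per-language abbreviation lists and more careful URL handling.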

2

Onion – Corpus tools

http://www.corpus.tools/wiki/Onion

Onion (ONe Instance ONly) is a tool for removing duplicate parts from large collections of texts. Requirements: libjudy (>=1.0.5). Download the package: wget -O onion-1.2.tar.gz 'http://corpus.tools/raw-attachment/wiki/Downloads/onion-1.2.tar.gz'. Extract the downloaded file: tar xzvf onion-1.2.tar.gz. Configure the package by editing onion-1.2/Makefile.config: set PREFIX (or INSTALL_BIN and INSTALL_DATA) according to where you want the executables and data (docs) installed, and set JUDY_INC to where Judy.h is located.
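
To make "removing duplicate parts" concrete, here is a minimal Python sketch of one common approach: dropping paragraphs whose word n-grams have mostly been seen already. It is an illustration under stated assumptions, not Onion's implementation; the function names `ngrams` and `deduplicate`, the n-gram length, and the threshold are all hypothetical choices.

```python
def ngrams(tokens, n):
    """Return the set of word n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def deduplicate(paragraphs, n=5, threshold=0.5):
    """Keep paragraphs whose share of already-seen n-grams is <= threshold.

    Assumed simplifications: whitespace tokenization, everything held in
    memory, and paragraphs shorter than n words are always kept.
    """
    seen = set()
    kept = []
    for paragraph in paragraphs:
        grams = ngrams(paragraph.split(), n)
        # Fraction of this paragraph's n-grams that occurred earlier.
        dup_ratio = len(grams & seen) / len(grams) if grams else 0.0
        if dup_ratio <= threshold:
            kept.append(paragraph)
        # Remember the n-grams either way, so later copies are detected.
        seen |= grams
    return kept


if __name__ == "__main__":
    docs = [
        "the quick brown fox jumps over the lazy dog",
        "an entirely different sentence about corpus processing tools",
        "the quick brown fox jumps over the lazy dog",
    ]
    for paragraph in deduplicate(docs, n=3):
        print(paragraph)
```

A plain Python set does not scale to large collections; a compact set or array structure would be needed there.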

3

Chared/Cite – Corpus tools

http://www.corpus.tools/wiki/Chared/Cite

POMIKÁLEK, Jan and Vít SUCHOMEL. chared: Character Encoding Detection with a Known Language. In Aleš Horák, Pavel Rychlý. RASLAN 2011. 5. vyd. Brno, Czech Republic: Tribun EU, 2011. p. 125-129, 5 pp. ISBN 978-80-263-0077-9. Last modified on Jun 2, 2016, 3:11:01 PM.

4

Justext/Cite – Corpus tools

http://www.corpus.tools/wiki/Justext/Cite

Pomikálek, Jan. "Removing boilerplate and duplicate content from web corpora." PhD thesis, Masaryk University, Faculty of Informatics, Brno, Czech Republic (2011). @phdthesis{pomikalek2011removing, title={Removing boilerplate and duplicate content from web corpora}, author={Pomik{\'a}lek, Jan}, school={Masaryk university, Faculty of informatics, Brno, Czech Republic}, year={2011} }. Last modified on Jul 30, 2015, 12:21:37 PM.

5

About Trac – Corpus tools

http://www.corpus.tools/about

Trac is a web-based software project management and bug/issue tracking system emphasizing ease of use and low ceremony. It provides an integrated Wiki, an interface to version control systems, and a number of convenient ways to stay on top of events and changes within a project. Trac is distributed under the modified BSD License. The complete text of the license can be found online as well as in the file included in the distribution.

TOTAL PAGES IN THIS WEBSITE

16

OTHER SITES

corpus.sg corpus.sg

Corpus Singapore::Write Corpus Linguistics Applications

We believe in a philosophy that embraces change and acquires an insatiable appetite for challenges. Find out more about the various services Corpus offers in the fields of web and mobile application development, business process engineering and other IT consulting services. Web and Mobile Development. A website serves as the face of any company’s brand and image. Here at Corpus, we design modern and sleek websites suited to all your technical needs, including mobile friendliness. Word Engine - Developing ...

corpus.siemprelisto.net corpus.siemprelisto.net

siemprelisto.net - This website is for sale! - siemprelisto Resources and Information.

The domain siemprelisto.net may be for sale by its owner! This webpage was generated by the domain owner using Sedo Domain Parking. Disclaimer: Sedo maintains no relationship with third-party advertisers. Reference to any specific service or trademark is not controlled by Sedo, nor does it constitute or imply its association, endorsement or recommendation.

corpus.sjtu.edu.cn corpus.sjtu.edu.cn

Welcome to corpus.sjtu.edu.cn

FAIL (the browser should render some Flash content, not this). Last updated: 2.21.2010.

corpus.sk corpus.sk

corpus.sk

The domain wwww.corpus.sk is taken. If you have any questions, please send me an email.

corpus.ufsm.br corpus.ufsm.br

corpus - SITE CORPUS - UFSM

Portal of the Brazilian Government. Special Collection and Rare Works. Theses and Dissertations in Linguistic Studies. Theses and Dissertations in Literary Studies. Theses and Dissertations in Other Areas. Neusa Carson Documentary Collection. Aldema Menine McKinney Documentary Collection. Maria Luiza Ritzel Remédios Documentary Collection. School of Advanced Studies. Acquired by the Corpus Laboratory through the 2014 Pró-equipamentos/CAPES call, under the project "Memory and historical-linguistic-literary heritage in the South - Part IV".

corpus.ulaval.ca corpus.ulaval.ca

CorpusUL: Home

Université Laval's institutional repository aims to make your scholarly output freely accessible in order to increase its visibility and to foster lasting knowledge sharing. Intensity and breadth of participation in organized activities during the adolescent years : multiple associations with youth outcomes. Have questions? What to know before depositing. 418 656-2131 ext. 6131.

corpus.uo.edu.cu corpus.uo.edu.cu

CORPUS | Dirección de Informatización | Universidad de Oriente

Design, development and operation of corporate intranets. Project, audit and supervision services for network installations. Development of projects in networks and telematics. Development of broad-spectrum information processing and/or decision-making systems. Information about the structure, composition and staff of our Directorate. Information about all the services our Directorate provides. The mystery of the seven towns.

corpus.usx.edu.cn corpus.usx.edu.cn

Shaoxing University (绍兴文理学院) -- Great World of Chinese-English Parallel Corpora

corpus.wa.edu.au corpus.wa.edu.au

Corpus Christi College | Sequere Dominum

Values & Ethos. Learning & Teaching. Parents & Friends. The Middle School at Corpus Christi College has certainly accepted the. Students in Years 10, 11 and 12 are offered an extensive academic prog. Corpus Christi College provides a modern, exciting and forward thinkin. At Corpus Christi College we seek to meet the diverse learning needs o. The aim of the Centre is to equip students for a more independent life. A core value of the life and spirit of Corpus Christi College is nurtu. July 1, 2015.