Welcome to HULTIG-C


HULTIG-C is a multilingual corpus created to support research on information retrieval and related human language technologies.

HULTIG-C is a project of the Center of Human Language Technology and Bioinformatics (HULTIG), which was founded in 2003 by Gael Harry Dias, at the time a professor in the Department of Informatics of the University of Beira Interior (UBI), Covilhã, Portugal, and is currently directed by Dr. João Paulo Cordeiro, a faculty member of the Department of Informatics at UBI.


The Center of Human Language Technology and Bioinformatics (HULTIG) is a research group of the Department of Informatics of the University of Beira Interior. Over time, we have worked on a variety of topics related to the automatic processing of human language, with a particular focus on their applied side. Among the various subdomains, we have devoted special attention to the following:

▸ Statistical Inference;
▸ Statistical Parsing;
▸ Statistical Learning;
▸ Text Classification;
▸ Question Answering;
▸ Sentiment Analysis;
▸ Summarization;
▸ Conversational Agents;
▸ Narrative Science;
▸ Lexical Semantics;
▸ Word Sense Disambiguation;
▸ Speech Recognition;
▸ Text-to-Speech and Spoken Language Understanding;
▸ Computational Social Science and Social Media;
▸ Dialogue and Interactive Systems;
▸ Discourse and Pragmatics;
▸ Information Extraction;
▸ Information Retrieval and Text Mining;
▸ Linguistic Theories;
▸ Cognitive Modeling and Psycholinguistics;
▸ Machine Learning for NLP;
▸ Machine Translation;
▸ Deep Learning for NLP;
▸ NLP Applications in Big Data.

HULTIG-C is a corpus whose development began in January 2019. It consists of thousands of web pages in different languages, built from raw texts (of different kinds and of different linguistic and sophistication levels) obtained from web pages and indexed with the Hultig Crawler. HULTIG-C is developed and maintained at UBI by the Center of Human Language Technology and Bioinformatics (HULTIG) of the Department of Informatics.

The corpus is the result of ongoing work that aims to support the automatic processing of human language. It is being gradually extended and improved in all its dimensions, in order to provide a high-quality resource for research in computational linguistics and for the development of language applications and technologies.


Although our main concern is with applications and technology, we also consider the more theoretical and conceptual aspects of the study of human language, in particular computational linguistics.