HULTIG-C is a multilingual corpus, created to support research on information retrieval and
related human language technologies.
The Center for Human Language Technology and Bioinformatics
(HULTIG) is a research group of the Department of Informatics of
the University of Beira Interior. Over time, we have worked on a variety of topics related to
the automatic processing of human language,
with a particular focus on its applied side. Among the various subdomains, we have
devoted special attention to the following:
▸ Statistical Inference;
▸ Statistical Parsing;
▸ Statistical Learning;
▸ Text Classification;
▸ Question Answering;
▸ Sentiment Analysis;
▸ Summarization;
▸ Conversational Agents;
▸ Narrative Science;
▸ Lexical Semantics;
▸ Word Sense Disambiguation;
▸ Speech Recognition;
▸ Text-to-Speech and Spoken Language Understanding;
▸ Computational Social Science and Social Media;
▸ Dialogue and Interactive Systems;
▸ Discourse and Pragmatics;
▸ Information Extraction;
▸ Information Retrieval and Text Mining;
▸ Linguistic Theories;
▸ Cognitive Modeling and Psycholinguistics;
▸ Machine Learning for NLP;
▸ Machine Translation;
▸ Deep Learning for NLP;
▸ NLP Applications in Big Data.
HULTIG-C is a corpus whose development began in January 2019. It consists of
thousands of web pages in different languages,
built from raw texts (of differing nature, language, and level of sophistication)
obtained from web pages and indexed with Hultig Crawler. HULTIG-C is being
developed and maintained at UBI by the
Center for Human Language Technology and Bioinformatics (HULTIG)
of the Department of Informatics.
This corpus is the result of ongoing work that
aims to support the automatic processing of human language.
The corpus is being gradually extended and improved in all its dimensions, in order to provide a
high-quality resource for research in computational
linguistics and for the development of language applications and technologies.
Although our main concern is with applications and technology, we also consider
the more theoretical and conceptual aspects of the study of
human language, in particular computational linguistics.