Software

SocialNetCrawler

This software was supported by project C4 - Cloud Computing Competences Centre, financed by P2020. SocialNetCrawler is a crawler that extracts data from social networks. It was developed in Java, with an interface built in Java Swing. To get started, visit the developer pages of the social networks the crawler works with and follow the steps there to obtain an access token. Then enter that access token in the application, in the tab of each social network you want to crawl, and press Start.

Access here

SocialDictionary

This package aims to account for emojis in sentiment analysis, improving results when the text contains emojis. First, we collect the emojis from the https://getemoji.com/ website and create a CSV file with information about each one, such as its Unicode value and description. We compute a score for each emoji description and assign it a sentiment, which can be positive, neutral, or negative. We then use the EmoRoBERTa model, which uses GoEmotions to recognize emotions, to assign an emotion to each emoji in the list. Given an input text containing emojis, the package returns the text with each emoji replaced by its emotion, the sentiment of the text, and information about the emojis used in the input.
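The replacement step described above can be sketched as follows. This is a minimal illustration, not the package's actual code: the `EMOJI_TABLE` entries, the `replace_emojis` function, and the field names are hypothetical, and the sentiment scoring of the whole text and the EmoRoBERTa labelling are left out.

```python
# Hypothetical lookup table. In the package, this information comes from the
# CSV of emoji descriptions, their sentiment scores and their assigned emotions.
EMOJI_TABLE = {
    "\U0001F600": {"description": "grinning face", "sentiment": "positive", "emotion": "joy"},
    "\U0001F622": {"description": "crying face", "sentiment": "negative", "emotion": "sadness"},
}


def replace_emojis(text, table=EMOJI_TABLE):
    """Replace each known emoji by its emotion word.

    Returns the rewritten text and a list with one record per emoji found.
    Only single-code-point emojis are handled in this sketch.
    """
    pieces = []
    found = []
    for ch in text:
        info = table.get(ch)
        if info is None:
            pieces.append(ch)  # ordinary character, keep as-is
        else:
            pieces.append(info["emotion"])
            found.append({"emoji": ch, **info})
    return "".join(pieces), found


replaced, used = replace_emojis("Great day \U0001F600")
print(replaced)  # Great day joy
print(used[0]["sentiment"])  # positive
```

Emojis made of several code points (for example, skin-tone or flag sequences) would need a proper emoji tokenizer instead of the character loop used here.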

Access here

SENTAWEB

This software was supported by national funding from the FCT - Fundação para a Ciência e a Tecnologia. Senta Web aims to provide an online way of automatically extracting expressions formed by sequences of lexicographic units (e.g. characters, words, punctuation marks), contiguous or non-contiguous, that act as syntactic-semantic units with their own meaning.

Access here

Multilingual Text Parser

Multilingual Text Parser is a web page through which you can obtain both syntactic and morphological analysis of a given text. The results are produced with Python libraries such as spaCy and NLTK. This first version can only analyze texts in Portuguese, French, Spanish and English.
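As a rough idea of what such an analysis looks like with spaCy, the sketch below returns part-of-speech, morphological features and dependency relations per token. The `SPACY_MODELS` mapping and the `analyze` function are assumptions made for illustration; the page does not state which models or which NLTK components the service actually uses, and the named models must be installed separately.

```python
# Assumed mapping from language code to a small spaCy pipeline.
# Which models the web page really uses is not stated.
SPACY_MODELS = {
    "en": "en_core_web_sm",
    "pt": "pt_core_news_sm",
    "fr": "fr_core_news_sm",
    "es": "es_core_news_sm",
}


def analyze(text, lang):
    """Return one dict per token with morphological and syntactic attributes."""
    import spacy  # imported lazily; requires spaCy and the chosen model

    nlp = spacy.load(SPACY_MODELS[lang])
    return [
        {
            "text": token.text,
            "lemma": token.lemma_,
            "pos": token.pos_,          # coarse part-of-speech tag
            "morph": str(token.morph),  # morphological features
            "dep": token.dep_,          # dependency relation to the head
            "head": token.head.text,    # syntactic head of the token
        }
        for token in nlp(text)
    ]
```

Calling `analyze("O gato dorme.", "pt")` would return one record per token, provided the Portuguese model has been downloaded.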

Access here

HultigCrawler

This software was supported by project C4 - Cloud Computing Competences Centre, financed by P2020. HultigCrawler is a text crawler that recursively crawls all the text from a given website. The crawled data is stored as items, each consisting of a URL, title, tags and text. These items are then saved into a database using Scrapy pipelines.
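A Scrapy item pipeline is a plain Python class with a `process_item` method, so the storage step can be sketched as below. This is an assumption-laden example rather than HultigCrawler's own code: the `SQLitePipeline` class, the SQLite backend, the `pages` table and the lowercase item keys are all hypothetical.

```python
import sqlite3


class SQLitePipeline:
    """Item pipeline sketch: stores each crawled page in a SQLite table."""

    def __init__(self, db_path="crawl.db"):
        # db_path is a hypothetical default; Scrapy can build the pipeline
        # without arguments.
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the database, create the table.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pages "
            "(url TEXT PRIMARY KEY, title TEXT, tags TEXT, text TEXT)"
        )

    def process_item(self, item, spider):
        # Called for every item; assumed keys: url, title, tags (a list), text.
        self.conn.execute(
            "INSERT OR REPLACE INTO pages (url, title, tags, text) VALUES (?, ?, ?, ?)",
            (item["url"], item["title"], ",".join(item["tags"]), item["text"]),
        )
        self.conn.commit()
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.conn.close()
```

In a Scrapy project, a pipeline like this would be enabled through the `ITEM_PIPELINES` setting. Using the URL as primary key means that a page crawled twice overwrites its earlier row instead of being duplicated.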

Access here

HULTIG-C: Cloud Platform for Computational Linguistics Services

This software was supported by project C4 - Cloud Computing Competences Centre, financed by P2020. HULTIG-C is a multilingual corpus created to support research on information retrieval and related human language technologies. It covers various languages and includes unique annotations such as keyword sets, sentence sets, named entity recognition sets, and multiword sets.

Access here

ExtremeSentiLex

This software was supported by national funding from the FCT - Fundação para a Ciência e a Tecnologia. ExtremeSentiLex is a lexicon of extreme sentiments created based on SentiWordNet and SenticNet. We will soon provide an article with all the details of the research. For now, the resulting files are available for download: one file with the lexicon and the datasets we classified in order to validate it, one file with the lexicon only, and another with the classified datasets only.

Access here

Categorium

This package enables the creation and training of language models for text classification using BERT, with prescribed parameters for smaller dataset training. It also comprises six pre-trained models with 27 categories each for experimentation and a feature for testing out language models.

Access here

Sebastião Pais
