Sophia Search is an ontology-based search engine. It specialises in semantic analysis, both for interpreting information in texts and for classifying texts and documents.
Sophia Search supports the understanding of texts by identifying and classifying all relevant items (brands, products, people, places, actions, concepts, …). All Sophia Search results are available through standard interfaces so that third parties can query them.
Sophia Search includes a classification engine that supports the classification of texts and documents as well as the navigation of document archives and knowledge bases.
Sophia Search uses proprietary resources (lexical, morphological, semantic networks) and resources available on the market, especially ontologies and open vocabularies. This enables interoperability at the semantic level through data sets made available by other parties, such as public administrations (Open Data) or online semantic repositories such as DBpedia.
The analysis pipeline of Sophia Semantic Search consists of five steps that allow the system to recognise the meaning of a text (received as a sequence of characters) and to identify semantic elements such as Concepts, Entities and Relations:
- Language recognition
- Grammatical Analysis (recognition, normalization, and morphological analysis of single words)
- Logic Analysis (disambiguation of lexical categories, identification of syntagmas and their dependencies, Subject-Verb-Object relations)
- Deep semantic analysis (analysis of sentence structure and of prepositional dependencies)
- Ontological Analysis (recognition of Entities and their relations, enabling Entity linking and reasoning over concepts, world entities and standard resources)
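The five steps can be pictured as a chain of stages that each add annotations to the same document. The following is only a minimal sketch of that idea, not Sophia Search's implementation: the names `Analysis`, `run_pipeline`, `STOPWORDS` and `placeholder` are hypothetical, language recognition is reduced to a toy stopword count, and the last three stages are left as empty placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Analysis:
    """A text plus the annotations accumulated by the stages (hypothetical type)."""
    text: str
    annotations: Dict[str, object] = field(default_factory=dict)


Stage = Callable[[Analysis], Analysis]

# Toy stopword lists, an assumption used only to illustrate language recognition.
STOPWORDS = {
    "en": {"the", "is", "in", "and", "of"},
    "it": {"il", "la", "in", "e", "di"},
}


def recognise_language(a: Analysis) -> Analysis:
    # Pick the language whose stopwords overlap most with the text.
    words = set(a.text.lower().split())
    a.annotations["language"] = max(
        STOPWORDS, key=lambda lang: len(words & STOPWORDS[lang])
    )
    return a


def grammatical_analysis(a: Analysis) -> Analysis:
    # Normalise single words: strip punctuation and lowercase.
    tokens = [w.strip(".,;:!?").lower() for w in a.text.split()]
    a.annotations["tokens"] = [t for t in tokens if t]
    return a


def placeholder(name: str) -> Stage:
    # Stand-in for a stage whose real logic is outside this sketch.
    def stage(a: Analysis) -> Analysis:
        a.annotations[name] = None
        return a
    return stage


PIPELINE: List[Stage] = [
    recognise_language,
    grammatical_analysis,
    placeholder("logic"),
    placeholder("semantic"),
    placeholder("ontology"),
]


def run_pipeline(text: str) -> Analysis:
    analysis = Analysis(text)
    for stage in PIPELINE:
        analysis = stage(analysis)
    return analysis
```

Calling `run_pipeline("The cat is in the garden.")` returns an `Analysis` whose annotations hold the detected language and the normalised tokens, with one empty slot for each of the later stages.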
Alongside this pipeline, which works on individual documents, Sophia Search can perform a deep text analysis across the entire document collection:
- Statistical distribution of the terms in the documents, and of the parts of speech (POS) present (nouns, adjectives, verbs, etc.)
- Keyword extraction (terms that are significant with respect to the dataset under analysis)
- Co-occurrence of terms
- Clustering (grouping documents into similar categories)
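Three of the collection-level analyses listed above can be illustrated with the Python standard library alone. This is a hedged sketch, not a description of how Sophia Search computes them: the function names are hypothetical, tokenisation is a naive whitespace split, and TF-IDF is assumed here as one common way to score terms that are significant with respect to a dataset. POS statistics and clustering are left out because they need a tagger and a similarity model.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Naive tokeniser: split on whitespace, strip punctuation, lowercase.
    tokens = (w.strip(".,;:!?").lower() for w in text.split())
    return [t for t in tokens if t]


def term_distribution(docs: List[str]) -> Counter:
    # Count how often each term occurs across the whole collection.
    counts: Counter = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def keywords(docs: List[str], top_n: int = 3) -> List[List[str]]:
    # Rank the terms of each document by TF-IDF (ties broken alphabetically).
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    doc_freq: Counter = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    result = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        }
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result.append(ranked[:top_n])
    return result


def cooccurrence(docs: List[str]) -> Counter:
    # Count the documents in which each unordered pair of terms appears together.
    pairs: Counter = Counter()
    for doc in docs:
        for pair in combinations(sorted(set(tokenize(doc))), 2):
            pairs[pair] += 1
    return pairs
```

For example, with `docs = ["apple banana apple", "banana cherry", "banana apple"]`, the term "banana" appears in every document and so receives a TF-IDF score of zero, while "cherry" ranks first in the only document that contains it.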