The definition of Advanced Analytics has broadened significantly over the past several years.
The spread of the “open data” concept is undoubtedly crucial, but its importance lies not only in new data sources but also in their readability by the automated systems that process them. Popular interchange formats (e.g. RDF, the Resource Description Framework, which allows data to be decentralized and distributed) make it possible to encode complex ontological structures in a machine-readable form.
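The core abstraction behind RDF is the subject-predicate-object triple: because a graph is just a set of triples, independently published data can be merged without any schema migration. A minimal sketch in plain Python (the resource names and predicates below are invented for illustration, not taken from any real vocabulary):

```python
# Minimal illustration of RDF's data model: a graph is a set of
# subject-predicate-object triples, so decentralized sources can be
# merged by simple set union.

# Two hypothetical publishers describe the same resource independently.
source_a = {
    ("ex:turin", "rdf:type", "ex:City"),
    ("ex:turin", "ex:population", "848885"),
}
source_b = {
    ("ex:turin", "ex:region", "ex:piedmont"),
    ("ex:piedmont", "rdf:type", "ex:Region"),
}

# Distribution in action: merging two graphs is just a set union.
graph = source_a | source_b

def objects(graph, subject, predicate):
    """Query the graph: all objects matching (subject, predicate, ?)."""
    return {o for s, p, o in graph if s == subject and p == predicate}

print(objects(graph, "ex:turin", "ex:region"))  # {'ex:piedmont'}
```

Real toolchains (triple stores, SPARQL engines) add indexing and a query language on top, but the underlying model is exactly this set of triples.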
Progress in Natural Language Processing (NLP) has finally enabled us to treat unstructured data as information. This big data, which can be considered “hard data” (difficult to process with traditional methods), can be aligned with structured data to either validate or dismiss the information. Prime examples are polls on a specific topic (or with predefined questions) and any channel where communication is freely expressed, such as the social media platforms (Twitter, Facebook, Instagram).
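As a toy illustration of aligning free-text signals with structured poll data, the sketch below scores posts with an invented keyword lexicon and compares the aggregate with a hypothetical poll figure; the lexicon, posts, and poll number are all made up, and real systems would use a trained sentiment model rather than keyword counts:

```python
# Toy alignment of unstructured text with structured poll data:
# score posts with a tiny keyword lexicon, then compare the
# aggregate positive share against a (hypothetical) poll result.

POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"bad", "hate", "terrible"}

def score(post):
    """Very naive sentiment: +1 per positive word, -1 per negative."""
    words = post.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = [
    "Love the new service and great support",
    "Terrible experience and I hate the app",
    "Excellent update",
]

positive_share = sum(score(p) > 0 for p in posts) / len(posts)

poll_positive_share = 0.70  # hypothetical figure from a structured poll

# If the two sources roughly agree, each one validates the other.
agrees = abs(positive_share - poll_positive_share) < 0.10
print(round(positive_share, 2), agrees)  # 0.67 True
```

The point is the comparison step: the free-text stream and the poll measure the same phenomenon through different instruments, so convergence between them strengthens both.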
Developments in infrastructure (more powerful processors, file systems that can support the most demanding computations) are another reason it is now possible to extract information from large data banks. Finally, advanced methods can be applied to guarantee rapid responses.
Accepting that more than one factor is required to produce value from data has enriched the theory of “information quality”: the goals and tools are as important as the raw data. Our experience across various projects has led us to the following conclusions:
- Quickly identify the aim of the analysis, i.e. “What do I want from this data?”
- Define the strategy that will bring you to your goal: research analysis, model type, use of specific technologies
- Use in-house skills and resources (e.g. linguistic skills, statistics, people), and concentrate on the areas where new technology is actually needed
Despite the wide availability of Machine Learning tools and knowledge resources (open-source libraries, APIs), CELI’s expertise lies in the choice and combination of algorithms, including Deep Learning architectures, as well as in the ability to customise and interpret them. It is vital to put users in a position to understand the analysis process, which is potentially complex, so that they can successfully extract high-quality information from a data bank. And CELI keeps in mind that human validation is the final step in the analysis chain, especially in the NLP world, which frequently involves ambiguous text and highly subjective material.
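The role of human validation at the end of the chain can be sketched as a confidence-threshold gate: automatic labels are kept only when the model is confident, and ambiguous cases are routed to a reviewer. The classifier, labels, and threshold below are invented stand-ins, not CELI's actual pipeline:

```python
# Sketch of human-in-the-loop validation: keep automatic labels above
# a confidence threshold, route ambiguous cases to a review queue.

def classify(text):
    """Stand-in for a real model: returns (label, confidence)."""
    # Hypothetical rule: explicit keywords are confident, else ambiguous.
    if "refund" in text.lower():
        return "complaint", 0.95
    if "thanks" in text.lower():
        return "praise", 0.90
    return "unknown", 0.40

THRESHOLD = 0.80  # assumed cutoff for accepting automatic labels

def triage(texts):
    accepted, review_queue = [], []
    for t in texts:
        label, confidence = classify(t)
        if confidence >= THRESHOLD:
            accepted.append((t, label))
        else:
            review_queue.append(t)  # handed to a human annotator
    return accepted, review_queue

accepted, review_queue = triage([
    "I want a refund now",
    "Thanks for the quick fix",
    "Well, that was something...",  # ambiguous, needs a human
])
print(len(accepted), len(review_queue))  # 2 1
```

The design choice is where to set the threshold: lowering it reduces the human workload but lets more ambiguous, subjective text through unreviewed.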
When it can be used:
- Real-time flows of social data: monitoring and studying complex phenomena, e.g. what are the factors that drive communication?
- Search logs in vertical data banks: how to surface the best of what is there
- Large volumes of online searches: inferring the behaviour and habits of a population