Machine Learning
Automatic unsupervised and supervised solutions
What is Machine Learning?
Machine learning is a set of algorithms that enable computers to perform specific tasks in a way similar to humans. The discipline is closely related to Statistical Computation, known in industry as Predictive Analytics or Predictive Modelling. Machine learning is a core component of Artificial Intelligence. The most important distinction in Machine Learning is between supervised and unsupervised algorithms.
In Supervised Learning, computers are supplied with example inputs together with the desired outputs. The aim is to learn general rules that produce the intended output on new data. To some extent, it is as if a person acted as the computer's teacher. For CELI, the following supervised applications are central to Natural Language Processing (NLP):
- Automatic Categorisation: automatically assign predefined categories or tags to new documents, i.e. classify a document as a human would.
- Named Entity Recognition: identify and annotate the people, places and organisations mentioned in a text, even when those mentions are potentially ambiguous.
- Text-to-Speech conversion.
- Causal and Regression Models: represent the relationships between variables, e.g. comments on Twitter in response to hateful or viral content.
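As a rough illustration of the supervised setting (labelled examples in, a general rule out), here is a minimal multinomial Naive Bayes categoriser in plain Python. It is a sketch, not CELI's method: the whitespace tokenizer, the category names and the four training sentences are all hypothetical, and a production system would use richer features and far more data.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def train(examples):
    """examples: list of (text, label) pairs supplied by a human 'teacher'."""
    label_counts = Counter()            # documents per label
    word_counts = defaultdict(Counter)  # label -> word frequencies
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def predict(model, text):
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior plus log likelihoods, with Laplace (add-one) smoothing
        # so unseen words do not zero out the probability.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labelled training data.
training_data = [
    ("the match ended with a late goal", "sport"),
    ("the striker scored twice in the final", "sport"),
    ("the central bank raised interest rates", "finance"),
    ("stock markets fell after the rate decision", "finance"),
]
model = train(training_data)
print(predict(model, "a goal in the final minutes"))       # sport
print(predict(model, "interest rates and stock prices"))  # finance
```

The model never sees the two test sentences during training; it generalises from word statistics learned on the labelled examples, which is the essence of supervised categorisation.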
Unsupervised Learning is the machine learning task of inferring a function that describes hidden structure in unlabeled data. In this case, the data are processed automatically by computers without human intervention, i.e. the examples given to the learner are unlabeled. The algorithms extract information solely on the basis of general statistical criteria. The results of this unsupervised procedure can be used directly by the client, as in clustering, or serve as the starting point for a more in-depth analysis.
The best-known form of unsupervised learning is cluster analysis, i.e. extracting recurring patterns or repetitive forms of information from a large dataset. The aim of cluster analysis is to compute a classification of items when no information about the classes is known in advance. Cluster analysis is popular because it provides a first insight into a dataset. CELI has gained experience in the segmentation of structured data, such as:
- Transactional and Contractual Data
- Behavioural Data (such as polls, but also website visit logs)
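Cluster analysis can be sketched with k-means, one common clustering algorithm (not necessarily the one CELI uses). The example assumes hypothetical two-feature customer records, e.g. monthly spend and monthly website visits, and a naive initialisation that takes the first k points as starting centroids; k-means++ initialisation is usually preferred in practice. No labels are supplied: the grouping emerges only from distances between points.

```python
import math

def kmeans(points, k, max_iter=100):
    # Naive initialisation (an assumption for simplicity): first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids

# Hypothetical customer records: (monthly spend, monthly visits).
customers = [(1.0, 2.0), (8.0, 9.0), (1.5, 1.8), (8.5, 8.0), (1.2, 2.2), (9.0, 9.5)]
labels, centroids = kmeans(customers, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The cluster indices themselves carry no meaning; interpreting segment 0 as "occasional, low-spend customers" is left to the analyst, which is why clustering is often a starting point for deeper analysis.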
Combined with Natural Language Processing, this experience has enabled us to define custom procedures for topic detection in (unstructured) written documents. The new challenge is to cope with the much higher dimensionality of the problem. Because every word can have variants, human intelligence will always play a role in determining what a specific text means and whether its content is important. We therefore place significant emphasis on the validation phase that follows the automatic extraction of information, as it lets users decide which information is relevant to them and which is not.
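To illustrate why word variants raise the dimensionality of text problems, the sketch below counts the vocabulary (one dimension per distinct token in a bag-of-words representation) before and after a crude normalisation step. The suffix rules in the hypothetical `crude_stem` function are a toy, far weaker than a real stemmer or lemmatiser, and the three sentences are made up.

```python
import re

# Hypothetical documents containing several variants of the same words.
docs = [
    "Customers complained about the delivery.",
    "A customer complains about delayed deliveries",
    "Complaints about delivering late",
]

def crude_stem(word):
    # Toy rules (an assumption): strip one common English suffix if enough of the word remains.
    for suffix, replacement in (("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + replacement
    return word

# Raw vocabulary: whitespace tokens, so case and punctuation create extra variants.
raw_vocab = {tok for doc in docs for tok in doc.split()}

# Normalised vocabulary: lowercase, keep alphabetic runs, then apply the toy stemmer.
norm_vocab = {crude_stem(tok) for doc in docs for tok in re.findall(r"[a-z]+", doc.lower())}

print(len(raw_vocab), len(norm_vocab))  # 13 10
print(sorted(norm_vocab))
# ['a', 'about', 'complain', 'complaint', 'customer', 'delay', 'deliver', 'delivery', 'late', 'the']
```

Normalisation shrinks the vocabulary from 13 to 10 dimensions, yet pairs such as "complain"/"complaint" and "deliver"/"delivery" remain separate, a small reminder that automatic processing of word variants is imperfect and human review still matters.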