What we do

80% of the world's data is stored in text files. We harness the potential of Machine Learning to automatically analyse content and interpret text.

We develop tools that enable you to search for relevant texts, text sections, words or combinations of words in your particular area of interest, and to quickly detect the core information of a text.

Massive amounts of such "unstructured" data, as it is called, can serve as the basis. You get insights from it in the blink of an eye.

We start with a one-day workshop to introduce you to Machine Learning and Text Mining, to identify the areas in which your organization can benefit from them, and to discuss and plan your project. Based on this information, we develop the tools and support you and your team so that you realize the full potential of what machines can do for you.

Machine Learning – the background

The human brain is excellent at analysing, drawing conclusions and evaluating information. Scientists try to replicate these capabilities so that machines can take over tasks. This is no easy mission, as human decision processes are highly complex and rest on a multitude of experiences that we gather throughout our lives and learn from, one decision to the next. Moreover, large parts of human decision making happen subconsciously. This is why one person is often unable to fully explain to another why he or she decided something in a certain way. Even if such a knowledge transfer succeeded, the "receiving system" is itself highly complex, and the problem of subconsciousness arises again.

Translating the complete process into programming code would demand an immense amount of time and resources. This is where Machine Learning comes in: instead of programming each and every component of a decision process, it provides the framework in which the exact decision making is derived from the provided data. In short, Machine Learning means that a computer solves a problem although it was not explicitly programmed to solve this particular problem.
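To make the idea of deriving the decision from the provided data concrete, here is a minimal sketch in Python. It is an illustration only, not our production tooling: the four example texts, the labels "complaint" and "praise" and the function names train and predict are all invented for this example, and the technique shown (a small naive Bayes word-count classifier) is just one of many possible learning methods.

```python
from collections import Counter
import math


def train(examples):
    """Count how often each word occurs under each label (the 'learning' step)."""
    word_counts = {}          # label -> Counter of words seen with that label
    label_counts = Counter()  # label -> number of training texts
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def predict(model, text):
    """Pick the label whose word statistics best explain the text."""
    word_counts, label_counts = model
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_texts = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Log prior plus log likelihood of each word, with add-one smoothing
        score = math.log(label_counts[label] / total_texts)
        total_words = sum(counts.values())
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: no rule about "rude" or "excellent" is written by hand.
examples = [
    ("the delivery was late and the box was damaged", "complaint"),
    ("terrible service and a rude reply", "complaint"),
    ("great product and fast delivery", "praise"),
    ("friendly support and excellent quality", "praise"),
]
model = train(examples)
print(predict(model, "the reply was rude"))
print(predict(model, "excellent product"))
```

Nowhere does the code state which words signal a complaint. That knowledge comes entirely from the example texts, so changing the examples changes the decisions without touching the program.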

Machines offer a number of convincing capabilities that we make use of. First, machines do not tire of repetitive tasks. Second, they can handle very long columns of figures. Third, machines were never trained to get used to a three-dimensional world, as humans are from early childhood on, so handling high-dimensional data poses no particular problem for them.

Focus on Text Mining

Machine learning is the key to creating sets of rules for the content analysis of large amounts of text. In order to interpret language, semantic analysis aims to "translate" the rule sets of a language (grammar, spelling, common phrases and linguistic peculiarities such as sarcasm) into programming code, and trains machines to correctly interpret language based on such rule sets.

A second state-of-the-art approach to language interpretation is to derive the necessary context information from large amounts of data that represent a good cross section of all characteristics of a language.

Although semantic analysis delivers good results, it is difficult to create a complete set of rules for a language: a language evolves continuously over centuries, does not strictly follow rules and includes countless exceptions.

Extensive experience and an enormous amount of trial-and-error studies and empirical work delivered the solution to this problem. It turned out that "direct" learning, based on an adequate amount of data and well-selected, tested models, works very well without intensive programming of rules for grammar and the like.

The result of such a technology is known as "word vectors", or more precisely "high-dimensional, dense vectors", which represent the characteristics of words very well (polysemy, degree of similarity to other words, relationships with other words, etc.). Such word vectors are combined into sentences and sentence vectors. In a next step, the content covered by the sentences (text parts, articles and so on) is isolated. This is how similar content in different texts is detected. Moreover, it empowers us to ask machines questions like "Can you find text sections which include content similar to a segment which I show you here?". This gets us close to our habitual way of human communication.
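The path from word vectors to finding similar text sections can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the three-dimensional vectors are invented by hand (real word vectors are learned from data and typically have hundreds of dimensions), averaging is only one simple way to combine word vectors into a sentence vector, and the names WORD_VECTORS, sentence_vector, cosine_similarity and most_similar are hypothetical.

```python
import math

DIMENSIONS = 3

# Toy word vectors, invented by hand for illustration only.
WORD_VECTORS = {
    "invoice": [0.9, 0.1, 0.0],
    "payment": [0.8, 0.2, 0.1],
    "overdue": [0.7, 0.0, 0.3],
    "holiday": [0.0, 0.9, 0.2],
    "beach":   [0.1, 0.8, 0.3],
    "sunny":   [0.0, 0.7, 0.5],
}


def sentence_vector(sentence):
    """Combine word vectors into one sentence vector by averaging them."""
    vectors = [WORD_VECTORS[w] for w in sentence.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return [0.0] * DIMENSIONS  # no known words: fall back to the zero vector
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, sections):
    """Return the text section whose vector points closest to the query's."""
    query_vector = sentence_vector(query)
    return max(sections, key=lambda s: cosine_similarity(query_vector, sentence_vector(s)))


sections = ["sunny beach holiday", "overdue invoice"]
print(most_similar("payment", sections))
```

Note that the query "payment" shares no word with either section. The match is found only because the vectors of "payment", "invoice" and "overdue" point in similar directions, which is the property that makes questions about similar content possible.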