We want to make clear how our approach differs from approach X (substitute your favourite text analytics technology for X). In short, Ethersource is a vector space model, and it offers the processing convenience that comes with a vector space.
But a vector space is only as good as the process used to populate it with data. We populate our vector space matrix with distributional data: nearness in the vector space means similarity with respect to distribution, so two words lie close together if they occur in similar contexts. We also build the vector space easily, and it stays compact and tractable in size.
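To make "nearness means similarity with respect to distribution" concrete, here is a minimal sketch. It assumes random indexing, a published technique for building distributional vectors of fixed size incrementally; we use it only to illustrate a compact distributional space, not as a description of how Ethersource is implemented. The constants `DIM`, `NONZERO`, `WINDOW` and the functions `index_vector`, `build_space` and `cosine` are hypothetical names chosen for this example.

```python
import math
import random
from collections import defaultdict

# Hypothetical parameters, chosen for illustration only.
DIM = 512      # fixed dimensionality of every word vector
NONZERO = 8    # non-zero entries in each sparse random index vector
WINDOW = 2     # context window: words considered on each side


def index_vector(word, dim=DIM, nonzero=NONZERO):
    """Return a sparse ternary random vector {position: +1/-1} for a word.

    Seeding the generator with the word makes the vector reproducible,
    so the same word always gets the same index vector.
    """
    rng = random.Random(word)
    positions = rng.sample(range(dim), nonzero)
    # Half the non-zero entries are +1, half are -1.
    return {p: (1 if i % 2 == 0 else -1) for i, p in enumerate(positions)}


def build_space(sentences, window=WINDOW, dim=DIM):
    """Accumulate each word's context vector from its neighbours' index vectors."""
    context = defaultdict(lambda: [0.0] * dim)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                # Add the neighbour's sparse index vector to this word's vector.
                for pos, val in index_vector(tokens[j], dim).items():
                    context[word][pos] += val
    return dict(context)


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    corpus = [
        "i drink hot coffee every morning".split(),
        "i drink hot tea every morning".split(),
        "she drives a fast car to work".split(),
    ]
    space = build_space(corpus)
    print("coffee ~ tea:", cosine(space["coffee"], space["tea"]))
    print("coffee ~ car:", cosine(space["coffee"], space["car"]))
```

In this toy corpus "coffee" and "tea" share all their contexts and so end up with (near-)identical vectors, while "car" does not. Every vector has length `DIM` however large the vocabulary grows, and new text only adds to existing vectors, so the space can be updated without retraining; this is one way a distributional space can stay compact and tractable.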
Here is a brief comparison matrix.
Challenge | Statistical | Knowledge-based | Ethersource |
---|---|---|---|
Vast scale | Fine (if sampling is done correctly and samples are true to the data) | Fine (if the processing model can be optimised) | Fine (inherent in the memory model and in the processing model) |
Multilinguality | Fine (if a labelled training collection is available; involves a train-test-update cycle) | Problematic (involves expensive retooling of the knowledge base) | Fine (inherent in the memory model and in the processing model) |
Change | Problematic (new data are not guaranteed to conform to estimates based on previous data) | Problematic (involves expensive retooling of the knowledge base) | Fine (inherent in the memory model and in the processing model) |
Variety | Fine (if a labelled training collection is available; involves a train-test-update cycle) | Problematic (involves expensive retooling of the knowledge base) | Fine (inherent in the memory model and in the processing model) |
Coverage | High recall | High precision | High recall |
Abstraction | Strings | Concepts or logical forms | Concepts |
We view Ethersource as the base technology for any service or information process that relies on human language as input. Any process that today uses grammars, lexica, thesauri, occurrence frequencies, or estimates of collocation likelihoods will be well served by plugging in Ethersource as a base resource or as a replacement. Many new services whose engineering cost would be prohibitive today will also be painless to design on top of Ethersource.
We have some examples in our service palette today, but we have by no means exhausted the possibilities!