Experimenting with Reuters' Calais Automatic Tagging Tool
Reuters recently released Calais to developers. Calais is a set of software tools and rules that analyzes text and automatically assigns tags to it. The categories Calais outputs include:
- Company
- IndustryTerm
- Organization
- Person
Calais itself does not yet offer a user interface, but several developers have built front ends for it, and I used the one provided by Abhay Kumar to test the system.
First, I selected source text from my blog: a recent post titled “Cognitive Enhancement and Scientific Collaboration, Working Together.” It is short and contains references to a variety of things, including people, institutions, topics, and, for good measure, at least one alien race. The tags I had manually assigned to the post included Collaboration, Social Networking, Expertise Management, Social Media, sustainability, and Cognitive Enhancement.
Next, I copied the text and title into Kumar’s tool and pressed the “submit” button. This is what I got back:
- Organization: Oxford University, Humanity Institute (Comment: Oxford University is correct, but Humanity Institute is only partially correct; the institution named in the post is the Future of Humanity Institute.)
- IndustryTerm: pure technical solutions, collaborative technologies, expertise management systems, social networks, energy (Comment: The list is OK, though I would have liked to see “collaboration” and “cognitive enhancement” included.)
- Company: Google (Comment: This is correct; I did not mention any other companies. I wonder, though, whether Google would still have been extracted had I used it as a verb.)
- Person: Ostrow (Comment: OK, this was a trick. I mentioned three names in the post, two of which are fictional. Calais missed Nick Bostrom (real) and Thufir Hawat (a fictional character in Dune), but it did pick up Ostrow, a fictional character from the movie Forbidden Planet.)
Despite the issues, I’m impressed and look forward to tools like this making their way into more products and services. Adding features such as learning, training, and authority lists would significantly help both manual and automated use of such tools.
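To make the authority-list idea concrete, here is a minimal Python sketch of how a list of canonical names could clean up extracted tags after the fact. It is not part of Calais: the `AUTHORITY_LIST` entries, the `normalize_tags` function, and the dictionary format of the input are all hypothetical, with the input copied by hand from the results above rather than taken from anything Calais actually returns.

```python
# Hypothetical authority list: canonical names mapped to known variants.
# The entries are illustrative assumptions drawn from this post, not Calais data.
AUTHORITY_LIST = {
    "Organization": {
        "Future of Humanity Institute": ["Humanity Institute", "FHI"],
        "Oxford University": ["University of Oxford"],
    },
    "Person": {
        "Nick Bostrom": ["Bostrom"],
    },
}


def normalize_tags(tags):
    """Replace extracted tags with canonical names where the authority list knows a variant.

    `tags` is a hypothetical dict of category -> list of tag strings.
    Tags with no match are passed through unchanged.
    """
    normalized = {}
    for category, names in tags.items():
        # Reverse lookup: every variant (and the canonical form itself),
        # lowercased, points at the canonical name.
        lookup = {}
        for canonical, variants in AUTHORITY_LIST.get(category, {}).items():
            lookup[canonical.lower()] = canonical
            for variant in variants:
                lookup[variant.lower()] = canonical
        normalized[category] = [lookup.get(name.lower(), name) for name in names]
    return normalized


# Tags as returned in the test above, entered by hand.
extracted = {
    "Organization": ["Oxford University", "Humanity Institute"],
    "Company": ["Google"],
    "Person": ["Ostrow"],
}
print(normalize_tags(extracted))
```

A list like this can only correct names the tagger actually found: it turns “Humanity Institute” into “Future of Humanity Institute,” but it cannot recover Nick Bostrom, whom Calais missed entirely. That kind of gap is where learning and training features would have to help.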