Text mining

What is text mining?

Widely used in knowledge-driven organizations, text mining is the process of examining large collections of documents to discover new information or help answer specific research questions.

Text mining identifies facts, relationships and assertions that would otherwise remain buried in large volumes of text. Once extracted, this information is converted into a structured form that can be analyzed further or presented directly as clustered HTML tables, mind maps, charts and similar displays. Text mining draws on a variety of methods to process text, one of the most important being Natural Language Processing (NLP).
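As a rough illustration of turning free text into a structured form, the sketch below counts terms per document and returns them as table rows. It is a minimal example using only the Python standard library, not a full NLP pipeline; the function names `tokenize` and `term_table`, the sample documents and the simple letters-only tokenizer are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    # Assumption: a word is a run of the letters a-z; real NLP tokenizers do more.
    return re.findall(r"[a-z]+", text.lower())


def term_table(documents):
    """Return one (doc_id, term, count) row per distinct term in each document."""
    rows = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        # Sort terms alphabetically so the output order is predictable.
        for term, count in sorted(counts.items()):
            rows.append((doc_id, term, count))
    return rows


if __name__ == "__main__":
    # Hypothetical sample documents, invented for this sketch.
    sample = {
        "doc1": "Aspirin reduces fever. Aspirin reduces pain.",
        "doc2": "Fever is a common symptom.",
    }
    for row in term_table(sample):
        print(row)
```

Rows of this shape can be loaded into a spreadsheet or database table for further analysis.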

The structured data created by text mining can be integrated into databases, data warehouses or business intelligence dashboards and used for descriptive, prescriptive or predictive analytics.

Licensed information sources for text data mining

Publishers grant different rights for text and data mining of their publications. Please check each publisher's Terms & Conditions page to see which rights apply.

Literature on text data mining

  • Text and data mining example: a video tutorial explaining how text and data mining techniques cast new light on a large historical archive, with a closer look at n-gram visualizations, topic modeling and word embeddings. Key concepts of TDM are also presented.
  • DARIAH: The Digital Research Infrastructure for the Arts and Humanities (DARIAH) is a pan-European infrastructure for arts and humanities scholars working with computational methods. It supports digital research as well as the teaching of digital research methods.
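For readers unfamiliar with n-grams, one of the techniques named in the first item above, the following sketch counts word sequences of length n in a text. It is a simplified illustration and not the method used in the tutorial; the function names `ngrams` and `ngram_counts`, the letters-only tokenizer and the sample sentence are assumptions.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n=2):
    """Count how often each n-gram of words occurs in the text."""
    # Assumption: words are runs of the letters a-z after lowercasing.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(ngrams(tokens, n))


if __name__ == "__main__":
    # Hypothetical sample sentence, invented for this sketch.
    counts = ngram_counts("The old mill and the old bridge", n=2)
    for gram, count in counts.most_common(3):
        print(" ".join(gram), count)
```

Counts like these, computed per year or per source, are the kind of data that n-gram visualizations typically plot.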