Experience a world of topic modelling tools

An exploration of several popular tools provides a deeper appreciation of the complexity and power of topic modelling.
By Alta van der Merwe, deputy dean, teaching and learning within the EBIT Faculty at the University of Pretoria.
Johannesburg, 18 Nov 2024
Professor Alta van der Merwe, deputy dean, teaching and learning within the EBIT Faculty at the University of Pretoria.

I was genuinely excited to dive into the world of topic modelling tools. After realising I needed a way to generate topic models from a range of text attributes, I set out to explore several popular tools to see which one would best suit my needs.

Each tool offered unique functionalities and challenges, and the experience gave me a deeper appreciation of the complexity and power of topic modelling.

Here’s an overview of the tools I explored and my reflections on how they compare.

MALLET is a Java-based tool known for its ability to handle large datasets efficiently. It uses Latent Dirichlet Allocation (LDA) to uncover latent topics in text collections, making it a popular choice among researchers.

However, MALLET’s command-line-only workflow posed a challenge, as there is no graphical interface to ease the learning curve. Although it offers advanced customisation for experienced users, it was clear it would require more time and technical effort than I had initially expected.

Gensim, a Python library, felt more approachable. It integrates smoothly with Python’s ecosystem, making it easier to experiment with smaller datasets. It supports multiple algorithms, including LDA and Word2Vec, which added versatility to my experiments.

However, its performance slowed with larger datasets, and tuning the parameters for optimal results required significant effort.
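To give a sense of what that experimentation looked like, here is a minimal Gensim sketch: it trains a small LDA model on a toy, pre-tokenised corpus. The documents and parameter values are illustrative placeholders, not my actual dataset or settings.

```python
# Minimal Gensim LDA sketch on a toy corpus (illustrative only).
from gensim import corpora, models

documents = [
    ["topic", "modelling", "reveals", "themes", "in", "text"],
    ["gensim", "supports", "lda", "and", "word2vec"],
    ["larger", "corpora", "need", "careful", "parameter", "tuning"],
]

# Map each token to an integer ID, then convert documents to bag-of-words vectors.
dictionary = corpora.Dictionary(documents)
corpus = [dictionary.doc2bow(doc) for doc in documents]

# Train a small LDA model; num_topics and passes are the kind of
# parameters that needed the most tuning in my experiments.
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10, random_state=42)

for topic_id, words in lda.print_topics(num_words=5):
    print(topic_id, words)
```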

BigARTM was another fascinating tool, particularly because it allows multi-objective topic modelling through its additive regularisation technique. This feature gives users more control over their models by balancing various factors, such as coherence and sparsity.

However, it quickly became clear that BigARTM requires substantial programming knowledge and is best suited for advanced users who are comfortable with Python or command-line tools.
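For readers curious about what additive regularisation looks like in code, the sketch below is roughly how BigARTM’s Python API combines several regularisers in a single model; the input path, topic count and tau weights are placeholders chosen for illustration, not settings from my own runs.

```python
# Illustrative BigARTM sketch: additive regularisation combines several
# objectives (sparsity, decorrelation) in one model. Paths, topic count
# and tau values are placeholders.
import artm

# Convert a bag-of-words collection (UCI format) into BigARTM batches.
batch_vectorizer = artm.BatchVectorizer(
    data_path="docword.my_collection.txt",   # placeholder input file
    data_format="bow_uci",
    collection_name="my_collection",
    target_folder="my_collection_batches",
)

model = artm.ARTM(num_topics=20, dictionary=batch_vectorizer.dictionary)

# Each regulariser nudges the model towards a different objective,
# weighted by its tau coefficient.
model.regularizers.add(artm.SmoothSparsePhiRegularizer(name="sparse_phi", tau=-0.1))
model.regularizers.add(artm.DecorrelatorPhiRegularizer(name="decorrelate_phi", tau=1.5e5))

model.fit_offline(batch_vectorizer=batch_vectorizer, num_collection_passes=15)
```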

The Stanford Topic Modelling Toolbox provided a simpler setup, ideal for smaller datasets.

I appreciated its ability to load data from spreadsheet files and its support for different LDA variations. However, the tool’s reliance on command-line operations made it somewhat cumbersome, and it lacked the scalability needed for larger projects.

jsLDA was an interesting experiment, as it operates entirely within a web browser without the need for installations or coding. This made it perfect for quick prototyping and educational purposes.

However, its simplicity became a limitation: it couldn’t handle large datasets or more complex modelling tasks, which reduced its usefulness beyond initial demonstrations.

TopicWizard offered powerful visualisations with support for multiple models, such as LDA and Non-Negative Matrix Factorisation. Built using Dash and Plotly, it was well-suited for Python users comfortable with creating dashboards and web apps. However, its reliance on intermediate Python skills meant it wasn’t the most accessible tool for beginners.

Tools like pyLDAvis provided excellent visual exploration capabilities but required solid knowledge of Python and LDA to interpret the results effectively.
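As a rough illustration, and assuming the Gensim model, corpus and dictionary from the earlier sketch, pyLDAvis can render an interactive topic map in a few lines; gensim_models is the Gensim adapter in recent pyLDAvis releases.

```python
# Illustrative pyLDAvis sketch: visualise the Gensim LDA model from the
# earlier example as an interactive HTML page.
import pyLDAvis
import pyLDAvis.gensim_models as gensimvis

# 'lda', 'corpus' and 'dictionary' are assumed to come from the Gensim sketch above.
panel = gensimvis.prepare(lda, corpus, dictionary)
pyLDAvis.save_html(panel, "lda_topics.html")  # open in a browser to explore
```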

Meanwhile, IBM Watson NLP stood out for its cloud-based scalability, though its subscription-based model could become costly over time.

Finally, I explored Infranodus, and it quickly became my favourite. Infranodus transforms text into networks, creating clear and intuitive visualisations that reveal the relationships between words and ideas.

It stood out not only for its analytical depth but also for its ease of use: its intuitive interface requires minimal setup and makes it possible to generate meaningful insights almost immediately. The integration of GPT models for generating ideas or exploring new narratives was an added bonus that elevated my experience.

While each tool offered unique strengths, my experience with Infranodus was by far the most satisfying. It struck the perfect balance between functionality and accessibility, making it the best tool for my needs.

Infranodus allowed me to explore complex data without getting bogged down by technical challenges, and I would confidently recommend it to anyone looking to dive into topic modelling, regardless of their technical background.

