Software Downloads

On this page, you will find links to a large collection of simple point-and-click applications that I have developed over the years. Many of these programs are designed to work with plain old ".txt" files (for all of you old-school social scientists out there), but some can also work with spreadsheet files. Note that a number of these programs are no longer being developed, as I am working on several other applications.

The Big Ones

    Linguistic Inquiry and Word Count (LIWC)

    LIWC is the gold standard in psychological text analysis software. It codes language samples for ~80 psychological dimensions. I have been involved in the development of LIWC since around 2014 or so and have loved every minute of it.


    BUTTER

    After developing a lot of text analysis tools (i.e., pretty much everything below on this page), I decided that it was time to start integrating them all into a single, unified text analysis framework. BUTTER is the result, and it is the primary focus of my development work these days. Something of a "Swiss Army Knife" of text analysis tools for non-developers, the idea behind BUTTER is to make natural language analysis methods available to everyone, regardless of background or expertise.

Content Coding

    RIOTScan & RIOTLite

    Free, open-source content coding software. RIOTLite works pretty much the same way as LIWC, but it does not come with the LIWC dictionary. It has a bit more flexibility in terms of phrases and wildcards.
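
    For readers new to dictionary-based content coding, here is a minimal Python sketch of the general idea: count how often the words and phrases in each category appear, and report hits per 100 words. This is not RIOTLite's code. The function names, the toy dictionary, and the convention that a trailing "*" matches any word ending are all assumptions made for illustration.

```python
import re

def compile_entry(entry):
    """Turn a dictionary entry such as 'happ*' or 'not good' into a regex.
    A trailing '*' is treated as 'any word ending' (an assumed convention)."""
    if entry.endswith("*"):
        pattern = re.escape(entry[:-1]) + r"\w*"
    else:
        pattern = re.escape(entry)
    return re.compile(r"\b" + pattern + r"\b", re.IGNORECASE)

def code_text(text, dictionary):
    """Return, for each category, the number of matches per 100 words of text."""
    total_words = len(re.findall(r"\b\w+(?:'\w+)?\b", text))
    scores = {}
    for category, entries in dictionary.items():
        hits = sum(len(compile_entry(entry).findall(text)) for entry in entries)
        scores[category] = 100.0 * hits / total_words if total_words else 0.0
    return scores

# Hypothetical two-category dictionary, invented for this example.
toy_dictionary = {
    "positive": ["happ*", "love"],
    "negative": ["sad", "not good"],
}
scores = code_text("I love happy days", toy_dictionary)
```

    Note that overlapping entries within one category (say, "happ*" and "happy") would be counted twice in this sketch; a real tool has to decide how to handle that.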


    Content coding system that allows you to add weights to your words. Can be used to code words or specific characters for whatever word properties are of interest to you. Can also be used to score texts using pre-trained word vector models (e.g., GloVe, word2vec).


    Under development. More details coming soon...

Topic Modeling / Data-driven Text Analysis

    Meaning Extraction Helper

    An entire system for conducting bottom-up, data-driven text analyses. MEH takes your input texts and provides frequency lists for all of your words/phrases, extracts n-grams, and builds a document-by-word matrix dataset for topic modeling and other types of analyses.
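
    To give a rough sense of what a document-by-word matrix is, here is a small Python sketch that tokenizes texts, builds n-grams, and counts each term per document. It is an illustration only, not MEH's implementation: the tokenizer, the function names, and the `min_docs` threshold are assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def document_term_matrix(texts, n=1, min_docs=1):
    """Build a vocabulary and a matrix with one row per text, one column per term.
    Terms appearing in fewer than `min_docs` texts are dropped."""
    doc_counts = [Counter(ngrams(tokenize(text), n)) for text in texts]
    # Document frequency: in how many texts does each term appear at least once?
    doc_freq = Counter(term for counts in doc_counts for term in counts)
    vocab = sorted(term for term, df in doc_freq.items() if df >= min_docs)
    matrix = [[counts[term] for term in vocab] for counts in doc_counts]
    return vocab, matrix

texts = ["the cat sat", "the cat ran", "a dog ran"]
vocab, matrix = document_term_matrix(texts, min_docs=2)
```

    Each row of `matrix` lines up with one input text and each column with one entry of `vocab`, which is the shape that topic modeling and similar methods usually expect.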

Sentiment Analysis


    Sentiment analysis based on Hutto & Gilbert's (2014) VADER system. Best used for sentiment analysis of Twitter data.


    Sentiment analysis using Stanford's CoreNLP framework.

Part of Speech Tagging


    Part of Speech tagger built around Stanford's CoreNLP framework. Comes with several pre-trained models, including English, Spanish, French, German, Swedish, Chinese, and Arabic. Also comes with the GATE pre-trained model for English Twitter data.



    Preprocesses your Korean texts by tokenizing them.


    Preprocesses your Chinese texts by tokenizing them.

Text Manipulation / Extraction


    Extracts words and their immediate context. For example, if you want to see how people are using the word "pain", you can "contextualize" it by extracting all words that appear in close proximity to it.
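
    The underlying idea is often called "keyword in context". A minimal Python sketch, assuming a simple regex tokenizer and a fixed window of words on each side (the function name and the `window` parameter are made up for this example and are not taken from the software):

```python
import re

def extract_contexts(text, target, window=3):
    """Return (left_words, target, right_words) for every occurrence of target."""
    tokens = re.findall(r"\w+(?:'\w+)?", text.lower())
    target = target.lower()
    contexts = []
    for i, token in enumerate(tokens):
        if token == target:
            left = tokens[max(0, i - window):i]      # up to `window` words before
            right = tokens[i + 1:i + 1 + window]     # up to `window` words after
            contexts.append((left, token, right))
    return contexts

sample = "The pain in my back is bad. Chronic pain hurts."
contexts = extract_contexts(sample, "pain", window=2)
```

    Windows are simply cut short at the start and end of the text, and this sketch ignores sentence boundaries.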

    ConverSplitter Plus!

    Separates the contents of transcripts into separate files, by speaker.
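
    As a rough illustration of the task, the Python sketch below groups a transcript by speaker and writes one file per speaker. It assumes each turn starts with a "Speaker: utterance" label, which is only one of many transcript formats; the regex, function names, and output file naming are assumptions, not the program's actual behavior.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed turn format: a short speaker label, a colon, then the utterance.
SPEAKER_LINE = re.compile(r"^\s*([^:]{1,40}):\s*(.*)$")

def split_by_speaker(transcript):
    """Group utterances by speaker; unlabeled lines continue the previous turn."""
    turns = defaultdict(list)
    current = None
    for line in transcript.splitlines():
        if not line.strip():
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            current = match.group(1).strip()
            turns[current].append(match.group(2).strip())
        elif current is not None:
            turns[current][-1] += " " + line.strip()
    return dict(turns)

def write_speaker_files(turns, out_dir):
    """Write each speaker's utterances to <out_dir>/<speaker>.txt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for speaker, utterances in turns.items():
        safe_name = re.sub(r"[^\w-]+", "_", speaker)
        (out / f"{safe_name}.txt").write_text("\n".join(utterances), encoding="utf-8")

transcript = "Alice: Hi there.\nBob: Hello!\nhow are you?\nAlice: Fine."
turns = split_by_speaker(transcript)
```

    One known weakness of this sketch: a continuation line that happens to contain a colon near its start would be mistaken for a new speaker.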

Specialized Analyses


    Measures the repetition within a text in a "rolling word window" fashion.
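
    The exact metric is not spelled out here, so the following Python sketch shows just one plausible way to score repetition in a rolling window: slide a fixed-size window across the text one word at a time and record the share of words in each window that repeat an earlier word in that window. The function name, window size, and scoring formula are all assumptions.

```python
import re

def rolling_repetition(text, window=5):
    """Return one repetition score per window position.
    Score = 1 - (unique words in window / window size); 0.0 means no repeats."""
    tokens = re.findall(r"\w+(?:'\w+)?", text.lower())
    if len(tokens) < window:
        return []
    scores = []
    for start in range(len(tokens) - window + 1):
        chunk = tokens[start:start + window]
        scores.append(1 - len(set(chunk)) / window)
    return scores

scores = rolling_repetition("go go go go", window=2)
```

    Averaging the scores gives a single number per text, while plotting them shows where in the text the repetition is concentrated.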


    Count the number of different types of objects across a corpus of images. Based on the YOLO framework.

Word Vectors


    Using pre-trained word vectors, you can extract words with similar meanings. Very useful for creating new text analysis / content coding dictionaries. Can also be used in conjunction with the TAPA software (mentioned above) to perform cosine similarity calculations between a text and specific domains.
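
    For anyone unfamiliar with cosine similarity, here is a small, dependency-free Python sketch of how "similar words" can be ranked from word vectors. The three 2-dimensional vectors are invented toy values; real pre-trained models (e.g., GloVe, word2vec) use hundreds of dimensions and are loaded from files. The function names are illustrative only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(word, vectors, top_n=3):
    """Rank every other word in `vectors` by cosine similarity to `word`."""
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

# Toy vectors, made up for illustration only.
toy_vectors = {
    "happy": [1.0, 0.0],
    "glad": [0.9, 0.1],
    "table": [0.0, 1.0],
}
neighbors = most_similar("happy", toy_vectors, top_n=2)
```

    The same `cosine` function can compare a text with a domain if each is first reduced to a single vector, for example by averaging the vectors of its words.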

Data Preparation / Cleaning


    Takes your text from a spreadsheet file (e.g., CSV) and aggregates it into separate .txt files.


    ExamineTXT

    Provides basic information about your text and text-based files, such as their size, encodings, and so on.


    Convert text-based files from one encoding to another. Best used in conjunction with ExamineTXT.


    Regex-driven "find and replace" in text-based files. Useful for cleaning and replacing texts prior to processing with other software.

    Royal Sampler

    Takes a large CSV file and subsamples it into separate files. Can either create "chunked" versions of your dataset or build random, repeated subsamples (for bootstrapping, etc.).
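
    The two modes can be sketched in a few lines of Python. This is an illustration under simple assumptions (rows already loaded into a list, hypothetical function names), not the program's own code; a very large CSV would need to be streamed rather than held in memory.

```python
import csv
import random

def chunk_rows(rows, chunk_size):
    """Split rows into consecutive chunks of at most chunk_size rows each."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]

def random_subsamples(rows, sample_size, n_samples, seed=None, replace=True):
    """Draw n_samples random subsamples. With replace=True, rows may repeat
    within a subsample, which is the usual setup for bootstrapping."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        if replace:
            samples.append([rng.choice(rows) for _ in range(sample_size)])
        else:
            samples.append(rng.sample(rows, sample_size))
    return samples

def write_csv(path, header, rows):
    """Write one subsample or chunk to its own CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)

rows = [[i, f"text {i}"] for i in range(10)]
chunks = chunk_rows(rows, 4)
boot = random_subsamples(rows, sample_size=10, n_samples=3, seed=1)
```

    Passing a fixed `seed` makes the random subsamples reproducible from run to run.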


    Strips columns out of a CSV file to result in smaller/more manageable datasets.


    Process your texts through the Google Translate API.

Code Generation

    Plug N Chug

    Recursive code generator. Useful for when you need to generate large batches of code with systematic variations.
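
    To show what "systematic variations" can look like in practice, here is a short Python sketch that fills a code template with every combination of a set of values. It uses the standard library's `string.Template` and `itertools.product`, and is only a guess at the general technique; the function name, the `$name` placeholder syntax, and the example variable names are assumptions rather than Plug N Chug's actual design.

```python
from itertools import product
from string import Template

def generate_variants(template, variables):
    """Fill `template` once for every combination of the values in `variables`.
    `variables` maps a placeholder name to the list of values it should take."""
    names = list(variables)
    compiled = Template(template)
    return [
        compiled.substitute(dict(zip(names, combo)))
        for combo in product(*(variables[name] for name in names))
    ]

# Hypothetical example: one line of analysis code per outcome variable.
lines = generate_variants(
    "t.test($dv ~ $iv)",
    {"dv": ["anxiety", "depression"], "iv": ["group"]},
)
```

    With two placeholders of three values each, this would produce nine variants, so batches grow quickly as placeholders are added.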