SlavicDH USB Drives

This page explains what’s on the SlavicDH USB drive, why we’ve included each tool, and how we use it.

OpenRefine — OpenRefine, formerly known as Google Refine, is a data cleaning tool that runs locally and is used through your web browser, so it also works offline. The data that we collect, for example in spreadsheets, often needs to be “cleaned” or normalized before it can be used: duplicates and typos need to be weeded out, categories parsed, and so forth. OpenRefine provides solutions to this kind of data cleaning and includes, among other things, an extension for reconciling author names against the VIAF (Virtual International Authority File) database. A good OpenRefine tutorial for humanists can be found here.
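To make “cleaning” concrete, here is a minimal Python 3 sketch of key-collision clustering, the idea behind OpenRefine’s “fingerprint” clustering: each value is reduced to a normalized key, and values that share a key are flagged as probable variants of one another. This approximates OpenRefine’s method rather than reproducing its code; the function names `fingerprint` and `cluster` and the sample names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Approximate a fingerprint key: case, punctuation, accents, word order ignored."""
    value = value.strip().lower()
    # Decompose accented letters and drop the combining marks.
    # Caution: this also folds Cyrillic "й" to "и" and "ё" to "е".
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    # Remove punctuation (\w is Unicode-aware in Python 3, so Cyrillic survives).
    value = re.sub(r"[^\w\s]", "", value)
    # Sort the unique tokens so "Fyodor Dostoevsky" == "Dostoevsky, Fyodor".
    return " ".join(sorted(set(value.split())))

def cluster(values):
    """Group values by fingerprint; return only groups with more than one member."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]

authors = ["Dostoevsky, Fyodor", "Fyodor Dostoevsky", "dostoevsky fyodor ", "Tolstoy, Leo"]
print(cluster(authors))
```

Spelling variants such as “Dostoyevsky” produce a different key, which is why OpenRefine also offers nearest-neighbour (edit-distance) clustering methods.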

QGIS — QGIS (formerly Quantum GIS) is a powerful, free alternative to ArcGIS for creating data-rich maps. Users can build their own maps on top of base layers such as Google Maps, or digitize paper maps by georeferencing them. QGIS is open source, supports scripting in the Python programming language, and users have contributed more than a hundred add-on plug-ins to its official plug-in repository.
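Georeferencing works by matching points on a scanned map (pixel column and row) to known real-world coordinates, then fitting a transformation between the two. The sketch below, in plain Python 3, shows the simplest case: an exact affine (first-order) transformation from three ground control points. QGIS’s Georeferencer offers several transformation types and fits them over many points, so treat this only as an illustration of the idea; the function names and the sample coordinates are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, rhs):
    """Solve m @ x = rhs for a 3x3 system using Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear; pick three spread-out points")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution

def affine_from_gcps(gcps):
    """gcps: three ((col, row), (x, y)) pairs. Returns a pixel -> map function.

    Model: x = a*col + b*row + c and y = d*col + e*row + f.
    """
    m = [[float(col), float(row), 1.0] for (col, row), _ in gcps]
    a, b, c = solve3(m, [x for _, (x, y) in gcps])
    d, e, f = solve3(m, [y for _, (x, y) in gcps])
    return lambda col, row: (a * col + b * row + c, d * col + e * row + f)

# Hypothetical control points: scan corners matched to longitude/latitude.
to_map = affine_from_gcps([((0, 0), (30.0, 60.0)),
                           ((1000, 0), (31.0, 60.0)),
                           ((0, 1000), (30.0, 59.5))])
print(to_map(500, 500))  # roughly (30.5, 59.75)
```

With more than three points, real tools use least squares instead, which also yields residual errors that help spot a badly placed control point.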

Sublime Text — There are many programs for writing code, but when you just need to sit down and write, Sublime gets the job done.

Gephi — This open-source software (written in Java) is a good introductory network-visualization tool: it is easy to grasp and usable at a basic level, though it gets more complex and nuanced quickly as you go deeper into its functions and options. If a dataset can be converted into CSV files that identify the nodes and define the edges between them, Gephi’s layout algorithms will do the rest. Accessible, efficient tutorials guide users through the basic steps of creating a graph and customizing it for maximum legibility. N.B.: at the time of writing, the latest release (Gephi 0.9.1) is still a little buggy, so it is worth visiting the Gephi site every few weeks for bug fixes and updated plug-ins.
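As a rough illustration of the “nodes and edges” CSV files, here is a Python 3 sketch that turns a list of (sender, recipient) pairs into a node table and an edge table. The column names `Id`, `Label`, `Source`, `Target`, `Type` and `Weight` are, to our knowledge, the ones Gephi’s spreadsheet importer recognizes, but check them against your Gephi version; the function name and the correspondence data are hypothetical.

```python
import csv
import io
from collections import Counter

def write_gephi_csvs(pairs, nodes_file, edges_file):
    """Write a node table and an edge table for import into Gephi.

    pairs: iterable of (source, target) names, e.g. (sender, recipient).
    Repeated pairs are collapsed into one edge whose Weight is the count.
    """
    weights = Counter(pairs)
    # Each distinct name gets a numeric Id; Label keeps the readable name.
    names = sorted({name for pair in weights for name in pair})
    ids = {name: i for i, name in enumerate(names)}

    nodes = csv.writer(nodes_file)
    nodes.writerow(["Id", "Label"])
    for name in names:
        nodes.writerow([ids[name], name])

    edges = csv.writer(edges_file)
    edges.writerow(["Source", "Target", "Type", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        edges.writerow([ids[source], ids[target], "Directed", weight])

# Hypothetical letters; in-memory buffers stand in for real files here.
letters = [("Chekhov", "Suvorin"), ("Chekhov", "Suvorin"), ("Tolstoy", "Chekhov")]
nodes_buf, edges_buf = io.StringIO(), io.StringIO()
write_gephi_csvs(letters, nodes_buf, edges_buf)
print(nodes_buf.getvalue())
print(edges_buf.getvalue())
```

For real files, open them with `open("nodes.csv", "w", newline="", encoding="utf-8")` so that Cyrillic names survive the round trip.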

R Statistical Analysis — R is a free programming language and software environment for statistical computing and graphics.

Python 2.7 — Python is one of the most popular programming languages in the world. It is known for its comparatively readable syntax and for its extensive collection of libraries for a variety of programming tasks, including scraping (downloading) web data, statistical analysis, topic modelling, network analysis, and web programming. Python is available for free. Version 2.7 is included on the drive. The latest version at the time of writing, 3.5, has slightly fewer libraries but has ongoing support from the Python Software Foundation. Python 2.7 itself reached its official end of life in January 2020.
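Scraping usually involves two steps: downloading a page and then pulling out the pieces you need. The Python 3 sketch below shows the second step with only the standard library, collecting every link on a page so a script can follow them (for instance, to gather all items in an online archive). The class and function names and the sample HTML are hypothetical; the drive’s Python 2.7 would need small changes, since these modules are named `HTMLParser` and `urlparse` there.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links like "/letters/1.html" into full URLs.
                    self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links

page = '<p><a href="/letters/1.html">Letter 1</a> <a href="https://example.org/x">X</a></p>'
print(extract_links(page, "https://example.org/archive/"))
```

To download a real page first, `urllib.request.urlopen(url).read().decode("utf-8")` returns its HTML as a string (assuming the page is UTF-8 encoded); please check a site’s terms of use before scraping it.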

Zotero — Zotero is free, open-source bibliographic management software that scrapes reference metadata while you surf through online publications, blogs, booksellers, and library catalogs. It initially installed as a plug-in to Firefox only and is now available as a stand-alone app or as a “bookmarklet” for other browsers. It can also be used to create a library of PDFs and other digital objects, which integrates well with citations scraped from the web. Developed in the mid-2000s by the Roy Rosenzweig Center for History and New Media, it became a substitute for, and improved on, many proprietary reference tools such as EndNote and RefWorks.

Cytoscape — Originally developed as a data visualization tool for molecular biologists, Cytoscape is powerful software that can also be used for social network analysis. Several researchers have recently switched from Gephi to Cytoscape, which is easier to install and seems to run more reliably. A recent tutorial by Miriam Posner gives a good introduction to the software’s basic functions.
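One simple way to get a social network into Cytoscape is the Simple Interaction Format (SIF), where each line names a source node, a relationship type, and a target node. The Python 3 sketch below writes tab-separated SIF lines from (sender, recipient) pairs; tabs are used so that names containing spaces stay intact. The function name, the `corresponded_with` relationship label and the sample pairs are hypothetical, and Cytoscape can also import networks from ordinary CSV tables if you need edge weights or other attributes.

```python
def to_sif(pairs, relation="corresponded_with"):
    """Return SIF text with one 'source<TAB>relation<TAB>target' line per unique pair."""
    lines = []
    for source, target in sorted(set(pairs)):
        if "\t" in source or "\t" in target:
            raise ValueError("node names must not contain tab characters")
        lines.append("\t".join([source, relation, target]))
    return "\n".join(lines) + "\n"

letters = [("Chekhov", "Suvorin"), ("Chekhov", "Suvorin"), ("Tolstoy", "Chekhov")]
print(to_sif(letters), end="")
```

Because duplicate pairs are merged, SIF loses the information that two letters went from the same sender to the same recipient; a table import keeps such counts as an edge attribute.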

oXygen — oXygen is an XML editor that has proven very useful for TEI encoding. It is also a handy tool for parsing and cleaning data with regular expressions.
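As an example of the kind of regular-expression work that pairs well with TEI encoding, the Python 3 sketch below finds dates written as day.month.year and wraps them in TEI `<date>` elements with a normalized `when` attribute. Python’s `re` syntax is close to, but not identical with, the regex dialect in oXygen’s Find/Replace dialog, so a pattern may need adjusting when moved between them. The DD.MM.YYYY convention, the function name and the sample sentence are assumptions for illustration; the sketch does not check that a date actually exists.

```python
import re

# Assumed input convention: D.M.YYYY or DD.MM.YYYY, e.g. 3.11.1881.
DATE = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")

def tag_dates(text):
    """Wrap each matched date in <date when="YYYY-MM-DD">...</date>."""
    def to_tei(match):
        day, month, year = match.groups()
        iso = "%s-%02d-%02d" % (year, int(month), int(day))
        return '<date when="%s">%s</date>' % (iso, match.group(0))
    return DATE.sub(to_tei, text)

print(tag_dates("Letter of 3.11.1881 to Suvorin"))
```

Using a replacement function instead of a plain `\1`-style backreference lets the code zero-pad the month and day, which a simple backreference cannot do.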

AntConc — A freeware corpus analysis toolkit for concordancing and text analysis.
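The core of concordancing is the keyword-in-context (KWIC) view: every occurrence of a search term shown with a few words on either side. The Python 3 sketch below reproduces that idea in a few lines; it is not how AntConc is implemented, and the function name, the `width` parameter and the sample text are hypothetical. Its crude `\w+` tokenizer lowercases everything and ignores the finer tokenization settings a real concordancer offers.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit of keyword."""
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits

for left, word, right in kwic("The cat sat on the mat. The dog sat too.", "sat", width=2):
    # Right-align the left context so the keywords line up in a column.
    print("%20s  [%s]  %s" % (left, word, right))
```

Lining the hits up on the keyword is what makes concordance lines easy to scan for recurring collocations.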

Transkribus — Transkribus is a comprehensive platform for the automated recognition, transcription, and searching of historical documents. Its main objective is to support users engaged in transcribing printed or handwritten documents, such as humanities scholars, archives, volunteers, and computer scientists.

Slack — A cloud-based tool for group conversations and collaborative projects.