Linguistic analysis tools

I volunteered to spend several weeks with nerdy linguist-developers at SIL, an all-volunteer organization supporting language preservation and literacy projects among minority languages. SIL is best known today as the originator of the OFL (Open Font License), born out of its Non-Roman Script Initiative (NRSI). The license allows designers around the world to contribute their own modifications to font projects and release them for free use. As intended, this has increased the availability of fonts, especially for minority languages.

At SIL, I demonstrated the Personas and Pair Designing methodologies, and helped clarify discussions among the different groups that needed to reach consensus on the next year of development.

FieldWorks Language Explorer (FLEx) was intended to be a huge leap forward from the common text-based PC tools such as Shoebox. It allows linguists in the field to focus on collecting words and texts, analyzing the gathered data, and testing insights. The end result is a functional lexicon and grammar sketch that can be used directly to aid teams in translation and literacy projects, without constant re-inputting and without overlapping or contradictory data.

FLEx is used by almost 300 language projects around the world and by most academic institutions in linguistics today. It uses Graphite, another SIL creation, to enable the display of scripts outside of Unicode, especially obscure scripts or even new scripts still in development.

FLEx was designed to run in isolated, un-networked environments, but also to let team members pool their results over poor Internet connections or directly while they are in the same room.

At the time, the team had the goal of open-sourcing the project (when the only major open-source repository service was SourceForge), but had not yet done so. FLEx is now a public open-source project on GitHub, released under the LGPL, and under very active development.

A note on the classic Windows XP retro look: visual design was not in scope, and the Microsoft .NET Framework and the navigation idioms of MS Outlook 2003 were entirely adequate and appropriate for making the complexity of the various parts immediately understandable and accessible. FLEx runs on the latest versions of Windows as well as on Linux.

During my time there, we spent significant effort making the analysis of source texts (interlinear analysis) feel fluid and productive. To promote flow for the linguist mining insights from collected texts and transcripts of field recordings, all operations, especially morpheme breaks and tagging, could be done entirely with uncomplicated keyboard operations (space, tab, shift, arrows, enter, esc, and affixing symbols for morpheme break types).

As an example of keeping attention on the goals and mental model of the field linguist at the various phases of a language project, “gloss” and “analyze” were split into separate modes according to purpose, even though at first glance they are built upon the same basic screen layout.

FLEx is designed to let the linguist feel free to capture guesses and insights, and to build confidence over time without losing track of where and when those original insights came from. The result is a multi-dimensional understanding of the language that can be shared with colleagues on the same project.

Encouraging the development team to think not only of the people using their software but also of those impacted by the work produced with it inspired more tools. These empower non-linguists and teams of native “vernacular language experts” from minority language communities to start their own language projects via apps like WeSay and Bloom, which I flew to Thailand to help out with. Many of the people I worked with are still at SIL, and we keep in touch today.