Machine Learning for Natural Language Processing

This class was first developed for the 2018 YSU – ISTC Joint Summer School on Machine Learning. View or edit the source for this website at github.com/deeplanguageclass.

Recent progress in machine learning for natural language is significant; however, language poses some unique challenges.

In this class we will start with natural language processing fundamentals and the current results of state-of-the-art approaches across tasks. We will then focus on deep learning: the motivations behind word vectors and sequence output, and how to apply effective approaches to real tasks with industrial-strength libraries and datasets, which we will put into practice in the labs.

We will also keep in view how natural language tasks relate to tasks in other areas of machine learning.

Instructors: Erik Arakelyan (Teamable) and Adam Bittlingmayer (Signal N)

Prerequisites: solid coding skills, strong analytical ability, basic machine learning concepts, fluency in multiple human languages, a Unix system with Python

Announcements and questions: the private #nlp Slack channel and the public deeplanguageclass Telegram group

Fundamentals

Slides: Fundamentals 1, Fundamentals 2

Deep Learning

Slides: Deep Learning

Demos: projector.tensorflow.org, anvaka.github.io/pm/#/galaxy/word2vec-wiki, word2vec-gensim-gameofthrones.ipynb
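Word-vector spaces like the ones in these demos can also be explored programmatically. Below is a minimal sketch using gensim's downloader; the pretrained model name is our choice for illustration, not one taken from the slides.

```python
# A minimal sketch of exploring pretrained word vectors with gensim.
# The model name below is an assumption, picked for its small download size.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # returns KeyedVectors

# Nearest neighbours in the vector space
print(vectors.most_similar("king", topn=5))

# The classic analogy: king - man + woman ~ queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```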

Entity Recognition

universaldependencies.org
cloud.google.com/natural-language/
spacy.io/api/entityrecognizer
fastent.github.io
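As a concrete example of the tools linked above, here is a minimal sketch of named entity recognition with spaCy, assuming the small English model has been installed (pip install spacy, then python -m spacy download en_core_web_sm):

```python
# Minimal spaCy NER sketch; en_core_web_sm is spaCy's small English model.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Nikola Tesla was born in Smiljan and later moved to New York.")

for ent in doc.ents:
    # ent.label_ is the predicted entity type, e.g. PERSON or GPE
    print(ent.text, ent.label_)
```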

Dialogue Systems

SQuAD
leaderboard: rajpurkar.github.io/SQuAD-explorer/
example: Nikola Tesla
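The SQuAD data itself is published as JSON on the leaderboard site. Below is a hedged sketch of walking the dev set to find the Nikola Tesla article; the field names follow the published v1.1 format, but the local file name is an assumption:

```python
# Walk SQuAD v1.1 JSON and print question-answer pairs for one article.
import json

with open("dev-v1.1.json") as f:  # downloaded from the SQuAD site
    squad = json.load(f)

for article in squad["data"]:
    if article["title"] == "Nikola_Tesla":
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                print(qa["question"], "->", qa["answers"][0]["text"])
        break
```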

ParlAI
parl.ai / github.com/facebookresearch/ParlAI
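ParlAI wraps many dialogue datasets, including SQuAD, behind one interface. Below is a hedged sketch of printing a few examples from Python; the script entry point has moved between ParlAI versions, so treat the exact import and arguments as assumptions and check parl.ai for the current docs:

```python
# Hedged sketch: display a few SQuAD examples through ParlAI.
# The DisplayData script and its arguments are assumptions that may
# differ by ParlAI version; see parl.ai for the current interface.
from parlai.scripts.display_data import DisplayData

DisplayData.main(task="squad", num_examples=5)
```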

Mixed Medium

cs.stanford.edu/people/jcjohns/clevr/
twitter.com/picdescbot
seq2seq for DNA

Lab - word2vec

Lab: Classifying Amazon reviews with fastText
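For orientation before the lab, here is a minimal sketch of fastText's supervised classifier via its Python bindings. It assumes a training file of Amazon reviews in fastText's format, one example per line with a __label__ prefix; the file name is a placeholder, not the lab's actual file:

```python
# Minimal fastText text-classification sketch.
# reviews.train is a placeholder file with lines like:
#   __label__2 Great charger, works exactly as advertised.
import fasttext

model = fasttext.train_supervised(input="reviews.train", epoch=5, lr=0.5)

# predict returns the top labels and their probabilities
labels, probs = model.predict("Stopped working after a week.")
print(labels, probs)
```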

Lab - seq2seq

Lab: Transliteration with Fairseq
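Once a model has been trained in the lab, fairseq checkpoints can be loaded back from Python through its hub interface. A hedged sketch follows; the paths and the example input are placeholders, and the from_pretrained interface may differ by fairseq version:

```python
# Hedged sketch: load a trained fairseq transformer and transliterate
# one line. All paths below are placeholders, not the lab's actual files.
from fairseq.models.transformer import TransformerModel

model = TransformerModel.from_pretrained(
    "checkpoints/",                       # directory with the checkpoint
    checkpoint_file="checkpoint_best.pt",
    data_name_or_path="data-bin/",        # binarized data from preprocessing
)

print(model.translate("barev"))  # placeholder input, not from the lab data
```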

More resources

See NLP Guide