Elena Filatova, efilatova@citytech.cuny.edu
Office: GC 4410
Class is held in room: GC 6494
Programming Assignments
Date | Topic | Reading Assignment |
---|---|---|
Week 1 Mon. Jan. 29 |
Introduction, word statistics, text similarity measures | Before the Internet, Librarians Would Answer Everything and Still do
Alibaba’s and Microsoft’s results for the Stanford Reading Comprehension Test: |
Week 2 Mon. Feb. 5 |
Basic text processing concepts: Sentence Splitting, Word Tokenization, Types, Tokens, Vector space model: binary representation, classic IR system, Vector Space model (lecture notes) |
Turney and Pantel, 2010. From Frequency to Meaning: Vector Space Models of Semantics, Journal of AI Research 37 (141-188)
Chung and Pennebaker, 2007. The Psychological Functions of Function Words, Social Communication (343-359) Danescu-Niculescu-Mizil et al. 2012. Echoes of Power: Language Effects and Power Differences in Social Interaction Niederhoffer and Pennebaker, 2002. Linguistic Style Matching in Social Interaction. Journal of Language and Social Psychology 2002 21: 337. |
Mon. Feb. 12 and Mon. Feb. 19 |
No class: The Graduate Center is closed | |
Week 3 Tues. Feb. 20 |
Document Classification, feature representation, cosine similarity, inverse document frequency, tf*idf weighting, term-document weighting (lecture notes) |
The text classification problem Naive Bayes text classification
D. Blei and J. Lafferty, 2009. Topic Models Check topic modeling visualization in the corresponding Wikipedia page Mosteller and Wallace, 1964. Deciding Authorship |
Week 4 Mon. Feb. 26 |
Classification Evaluation, Neural Nets | R. Collobert, et al. 2011. Natural Language Processing (Almost) from Scratch
Learning the Meaning Behind Words. Google Research Blog. Mikolov et al. 2013. Efficient Estimation of Word Representations in Vector Space. M. Gales, 2016. Deep Learning notes |
Week 5 Mon. Mar. 5 |
Unsupervised learning on textual data, Invited talk
Word2Vec |
Zhang and LeCun, 2015. Text Understanding from Scratch. |
Week 6 Mon. Mar. 12 |
Word Embeddings for Information Extraction and Question Answering,
WordNet
|
E. Agichtein, L. Gravano. 2000. Snowball: Extracting Relations from Large Plain-Text Collections.
M. Mintz, et al. 2009. Distant supervision for relation extraction without labeled data. C. Sutton, A. McCallum, 2006. Tutorial: An Introduction to Conditional Random Fields for Relational Learning. NLTK book, CH. 7: Extracting Information From Text |
Week 7 Mon. Mar. 19 |
Proposal Presentation, Language modeling, topic modeling, LSI | |
Week 8 Mon. Mar. 26 |
Analyzing the Meaning of Sentences
Jurafsky: QA Jurafsky: semantics Palmer: PropBank |
Navigli. 2009. Word Sense Disambiguation: a Survey.
Kingsbury, Palmer, 2002. From TreeBank to PropBank. Reddy et al. 2014. Large-Scale Semantic Parsing without Question-Answer Pairs. NLTK book, Chapter 10
|
Mon. Apr. 2 |
No class: Spring break | |
Week 9 Mon. Apr. 9 |
Sentiment Analysis, Figurative language
|
Socher et al. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
Ghosh et al. 2017. The Role of Conversation Context for Sarcasm Detection in Online Interactions Thomas et al. 2006. Get out the vote: Determining support or opposition from Congressional floor-debate transcripts Laver et al. 2003. Extracting Policy Positions from Political Texts Using Words as Data
|
Week 10 Mon. Apr. 16 |
Opinions and Trust: Using social information for sentiment analysis, Helpfulness
LDA topic modeling |
Mohammad et al. 2016. Stance and Sentiment in Tweets
Misra and Walker, 2017. Topic Independent Identification of Agreement and Disagreement in Social Media Dialogue Brody and Elhadad. 2010. An Unsupervised Aspect-Sentiment Model for Online Reviews Yang et al. 2007. Semantic Analysis and Helpfulness Prediction of Text for Online Product Reviews Hall et al. 2008. Studying the History of Ideas Using Topic Models Kuang et al. 2017. An LDA Topic Model and Social Network Analysis of a School Blogging Platform |
Week 11 Mon. Apr. 23 |
Text mining and crowdsourcing
|
Callison-Burch and Dredze. 2010. Creating Speech and Language Data With Amazon’s Mechanical Turk
Sheng et al. 2012. Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers. (optional) Pavlick et al. 2014. The Language Demographics of Amazon Mechanical Turk Chilton et al. 2016. HumorTools: A Microtask Workflow for Writing News Satire. Difallah et al. 2018. Demographics and Dynamics of Mechanical Turk Workers. (optional)
|
Week 12 Mon. Apr. 30 |
Text mining and crowdsourcing
(Opinion, Trust, Helpfulness)
|
C. Danescu-Niculescu-Mizil et al. 2011. How Opinions are Received by Online Communities: A Case Study on Amazon.com Helpfulness Votes
M. Ott et al. 2011. Finding Deceptive Opinion Spam by Any Stretch of the Imagination V. Niculae et al. 2015. Linguistic Harbingers of Betrayal (data, summary) L. Fu et al. 2017. When Confidence and Competence Collide: Effects on Online Decision-Making Discussions
|
Week 13 Mon. May 7 |
Building Grammars? Managing linguistic data? Authorship detection? Neural Nets?
(Clustering, dimensionality reduction / feature selection) |
D. Lin, 1998. “Automatic retrieval and clustering of similar words.”
T. Liu et al. 2003. “An evaluation on feature selection for text clustering.” Y. Zhao et al. 2002. “Evaluation of hierarchical clustering algorithms for document datasets.” |
Week 14 Mon. May 14 |
Student presentations | |
Week 15 Mon. May 21 |
Wrap-up |