Text Mining, Spring 2018

Elena Filatova, [email protected]

Office: GC 4410

Class is held in room: GC 6494

Tools and Books

Azure Notebooks

Programming Assignments

Date	Topic	Reading Assignment
Week 1 Mon. Jan. 29	Introduction, word statistics, text similarity measures lecture notes origin-of-species	Before the Internet, Librarians Would Answer Everything and Still do Alibaba’s and Microsoft’s results for the Stanford Reading Comprehension Test: Data set Explanation of the data set Wired article
Week 2 Mon. Feb. 5	Basic text processing concepts: Sentence Splitting, Word Tokenization, Types, Tokens, Vector space model: binary representation, classic IR system, Vector Space model lecture notes	Turney and Pantel, 2010. From Frequency to Meaning: Vector Space Models of Semantics, Journal of AI Research 37 (141-188) Chung and Pennebaker, 2007. The Psychological Functions of Function Words, Social Communication (343-359) Danescu-Niculescu-Mizil et al. 2012. Echoes of Power: Language Effects and Power Differences in Social Interaction Niederhoffer and Pennebaker, 2002. Linguistic Style Matching in Social Interaction. Journal of Language and Social Psychology 2002 21: 337. Edit distance
Mon. Feb. 12 and Mon. Feb. 19	No class: The Graduate Center is closed
Week 3 Tues. Feb. 20	Document Classification, feature representation, cosine similarity, inverse document frequency, tf*idf weighting, term-document weighting lecture notes	The text classification problem Naive Bayes text classification Vector space classification D. Blei and J. Lafferty, 2009. Topic Models Check topic modeling visualization in the corresponding Wikipedia page Mosteller and Wallace, 1964. Deciding Authorship More references on Authorship and Style
Week 4 Mon. Feb. 26	Classification Evaluation, Neural Nets lecture notes	R. Collobert, et al. 2011. Natural Language Processing (Almost) from Scratch Learning the Meaning Behind Words. Google Research Blog. Mikolov et al. 2013. Efficient Estimation of Word Representations in Vector Space M. Gales, 2016. Deep Learning notes
Week 5 Mon. Mar. 5	Unsupervised learning on textual data, Invited talk Word2Vec lecture notes	Zhang and LeCun, 2015. Text Understanding from Scratch.
Week 6 Mon. Mar. 12	Word Embeddings for Information Extraction and Question Answering, WordNet lecture notes	E. Agichtein, L. Gravano. 2000. Snowball: Extracting Relations from Large Plain-Text Collections. M. Mintz, et al. 2009. Distant supervision for relation extraction without labeled data. C. Sutton, A. McCallum, 2006. Tutorial: An Introduction to Conditional Random Fields for Relational Learning. NLTK book, Chapter 7: Extracting Information From Text
Week 7 Mon. Mar. 19	Proposal Presentation, Language modeling, topic modeling, LSI
Week 8 Mon. Mar. 26	Analyzing the Meaning of Sentences Jurafsky: QA Jurafsky: semantics Palmer: PropBank	Navigli. 2009. Word Sense Disambiguation: a Survey. Kingsbury, Palmer, 2002. From TreeBank to PropBank. Reddy et al. 2014. Large-Scale Semantic Parsing without Question-Answer Pairs. NLTK book, Chapter 10
Mon. Apr. 2	No class: Spring break
Week 9 Mon. Apr. 9	Sentiment Analysis, Figurative language	Socher et al. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank Ghosh et al. 2017. The Role of Conversation Context for Sarcasm Detection in Online Interactions Thomas et al. 2006. Get out the vote: Determining support or opposition from Congressional floor-debate transcripts Laver et al. 2003. Extracting Policy Positions from Political Texts Using Words as Data
Week 10 Mon. Apr. 16	Opinions and Trust: Using social information for sentiment analysis, Helpfulness LDA topic modeling lecture notes	Mohammad et al. 2016. Stance and Sentiment in Tweets Misra and Walker, 2017. Topic Independent Identification of Agreement and Disagreement in Social Media Dialogue Brody and Elhadad. 2010. An Unsupervised Aspect-Sentiment Model for Online Reviews Yang et al. 2007. Semantic Analysis and Helpfulness Prediction of Text for Online Product Reviews Hall et al. 2008. Studying the History of Ideas Using Topic Models Kuang et al. 2017. An LDA Topic Model and Social Network Analysis of a School Blogging Platform
Week 11 Mon. Apr. 23	Text mining and crowdsourcing Tutorial	Callison-Burch and Dredze. 2010. Creating Speech and Language Data With Amazon’s Mechanical Turk Sheng et al. 2012. Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers. (optional) Pavlick et al. 2014. The Language Demographics of Amazon Mechanical Turk Chilton et al. 2016. HumorTools: A Microtask Workflow for Writing News Satire. MTurk Tutorial Difallah et al. 2018. Demographics and Dynamics of Mechanical Turk Workers. (optional)
Week 12 Mon. Apr. 30	Text mining and crowdsourcing (Opinion, Trust, Helpfulness) Regression notes Linear Regression Assumptions	C. Danescu-Niculescu-Mizil et al. 2011. How Opinions are Received by Online Communities: A Case Study on Amazon.com Helpfulness Votes NYT article on fake reviews M. Ott et al. 2011. Finding Deceptive Opinion Spam by Any Stretch of the Imagination V. Niculae et al. 2015. Linguistic Harbingers of Betrayal (data, summary) L. Fu et al. 2017. When Confidence and Competence Collide: Effects on Online Decision-Making Discussions
Week 13 Mon. May 7	Building Grammars? Managing linguistic data? Authorship detection? Neural Nets? (Clustering, dimensionality reduction / feature selection) Clustering COLING 2012 tutorial on dimensionality reduction CMU recitation	D. Lin, 1998. “Automatic retrieval and clustering of similar words.” T. Liu et al. 2003. “An evaluation on feature selection for text clustering.” Y. Zhao et al. 2002. “Evaluation of hierarchical clustering algorithms for document datasets.”
Week 14 Mon. May 14	Student presentations
Week 15 Mon. May 21	Wrap-up

This entry is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.