- Definition: Natural Language Processing (NLP) is the study of the computational treatment of natural (human) language, i.e., teaching computers how to understand and generate human language.
- Research Resource: NLP draws on research in
- Linguistics
- Theoretical Computer Science
- Mathematics
- Statistics
- Artificial Intelligence
- Psychology
- Database
- Language and Communication (Speaker):
- Intention- goals, shared knowledge and beliefs
- Generation- tactical
- Synthesis- text or speech
- Language and Communication (Listener):
- Perception
- Interpretation- syntactic, semantic, pragmatic
- Incorporation- internalization, understanding
- Basic NLP Pipeline: Language --(Understanding)--> Computer --(Generation)--> Language
- Challenges: coming from the text
- current events, background events, speculation, property, reference to previous sentence
- genres of text: blogs, emails, press releases, chats, debates, etc.
- special types of text and terminologies
- incomplete sentences
- fiction text, rare words
- extremely long and complex sentences
- multiple possible interpretations, ambiguous sentences
- lexical, structural, scope ambiguity
- Textbooks
- Speech and Language Processing (3rd edition) (Dan Jurafsky and James H. Martin) or 2nd edition
- Foundations of Statistical Natural Language Processing (Chris Manning and Hinrich Schütze)
- Natural Language Understanding (James Allen)
- Courses
- JHU (Jason Eisner)
- Cornell (Lillian Lee)
- Stanford (Chris Manning)
- U. Maryland (Hal Daume)
- Berkeley (Dan Klein)
- U. Texas (Ray Mooney)
- Coursera (Manning/Jurafsky, survey)
- Coursera (Michael Collins, advanced, 2013)
- Related Fields
- CL (Computational Linguistics)- more mathematical and formal treatment of linguistics, less focused on applications.
- IR (Information Retrieval)- study of finding information and documents (text, speech or video).
- SP (Speech Processing)- deals with the understanding and generation of spoken signals.
- HLT (Human Language Technology)- the applied component of NLP.
- NLE (Natural Language Engineering)- synonym of HLT.
- ML (Machine Learning)- computational, statistical study of learning.
- Research in NLP
- Conferences: ACL/NAACL, EMNLP, SIGIR, AAAI/IJCAI, Coling, HLT, EACL, AMTA/MT Summit, ICSLP/Eurospeech
- Journals: Computational Linguistics, TACL, Natural Language Engineering, Information Retrieval, Information Processing and Management, ACM Trans. on Information Systems, ACM TALIP, ACM TSLP
- ACL Anthology
- ACL Anthology Network (AAN)
- Difficulties of NLP
- Similar forms do not imply similar meanings (e.g., "Beverly Hills" vs. "Beverly Sills")
- Computers are not good at metaphors (e.g., "Time flies like an arrow.")
- Word order is important (e.g., "The box/pen is in the pen/box.")
- Not the same semantic relationship (e.g., "Mary and Sue are mothers/sisters.")
- Not the same order of magnitude due to semantic change (e.g., "Every American has a mother/president.")
- Draw on common sense (e.g., "We gave monkeys bananas because they were hungry/over-ripe.")
- Syntactic correctness does not guarantee semantic correctness (e.g., "Colorless green ideas sleep furiously.")
- Ambiguous words due to multiple meanings (e.g., "ball"), part of speech (e.g., "fly"), different pronunciations (e.g., "address"), noun-noun phrases (e.g., "Science fiction writer"), etc.
- Types of Ambiguity
- Morphological- "Joe is quite impossible/important."
- Phonetic- "Joe's finger got number."
- Part of Speech- "Joe won the first round."
- Syntactic- "Call Joe a taxi."
- PP Attachment- "Joe ate pizza with a fork/with meatballs/with Sam/with pleasure."
- Sense- "Joe took the bar exam."
- Modality- "Joe may win the lottery."
- Subjectivity- "Joe believes the stocks will rise."
- CC Attachment- "Joe likes ripe apples and pears."
- Negation- "Joe likes his pizza with no cheese and tomatoes."
- Referential- "Joe yelled at Mike. He had broken the bike." "Joe yelled at Mike. He was angry at him."
- Reflexive- "John brought him/himself a present."
- Ellipsis and Parallelism- "Joe gave Mike a beer and James a glass of wine."
- Metonymy- "Boston called and left a message for Joe."
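
To make the PP attachment case above concrete, here is a minimal sketch that counts how many parse trees a toy grammar assigns to a sentence, using a CKY-style chart. The grammar rules, the lexicon, and the function name `count_parses` are illustrative assumptions, not part of the original notes or of any particular library.

```python
# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase ("ate ... with a fork")
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase ("pizza with a fork")
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "Joe": ["NP"], "pizza": ["NP"], "ate": ["V"],
    "with": ["P"], "a": ["Det"], "fork": ["N"],
}


def count_parses(tokens, root="S"):
    """Count the distinct parse trees for tokens with a CKY-style chart."""
    n = len(tokens)
    chart = {}  # (start, end, label) -> number of trees covering tokens[start:end]
    for i, token in enumerate(tokens):
        for label in LEXICON.get(token, []):
            chart[(i, i + 1, label)] = chart.get((i, i + 1, label), 0) + 1
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    ways = chart.get((start, mid, left), 0) * chart.get((mid, end, right), 0)
                    if ways:
                        key = (start, end, parent)
                        chart[key] = chart.get(key, 0) + ways
    return chart.get((0, n, root), 0)


ambiguous = count_parses("Joe ate pizza with a fork".split())
unambiguous = count_parses("Joe ate pizza".split())
print(ambiguous, unambiguous)
```

Under this toy grammar the sentence with the prepositional phrase gets one tree per attachment site, while the shorter sentence gets a single tree. Choosing between the attachments (fork as instrument vs. meatballs as topping) needs lexical or world knowledge that the grammar alone does not encode.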
- Other Difficulties
- Non-standard- "+1-212-772-1220", "A360"
- Slang- "friend (verb)", "spam"
- Novel words and usages- "yolo", "selfie"
- Inconsistencies- "Junior college" vs. "college junior"
- Typos and grammatical errors
- Parsing problems- "cup holder", "Federal Reserve Board Chairman"
- Complex sentences
- Counterfactual sentences- "if you were to do..."
- Humor and sarcasm
- Implicature/inference/world knowledge- "I was late because my car broke down." implies I have a car, I use the car to get to places, etc.
- Semantics vs. pragmatics- "Do you know the time?"
- Language is hard even for humans (both L1 and L2 speakers).
- Synonyms and Paraphrases
- Synonyms- "climbed", "gained" and "rose"
- Paraphrases- "its best close", "for its best showing" and "its highest level"
- Linguistics Knowledge
- Constituents- "My cousin's neighbor's children eat pizza." and "Eat pizza"
- Collocations- "strong beer" but *"powerful beer"
- Get Linguistics Knowledge into the System
- Manual rules
- Automatically acquired from large text collections (corpora)
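
As a rough illustration of acquiring linguistic knowledge automatically from a corpus, the sketch below scores word pairs with pointwise mutual information (PMI), one common way to surface collocations such as "strong beer". The five-sentence corpus is invented for the example, and the helper name `pmi` is an assumption; a real system would use a large text collection and some smoothing.

```python
import math
from collections import Counter

# Tiny invented corpus (an assumption for illustration; real corpora are far larger).
CORPUS = [
    "strong beer is popular",
    "he likes strong beer",
    "a powerful engine is loud",
    "she built a powerful engine",
    "strong tea and strong beer",
]

unigrams = Counter()
bigrams = Counter()
for sentence in CORPUS:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))  # adjacent pairs within a sentence

total_unigrams = sum(unigrams.values())
total_bigrams = sum(bigrams.values())


def pmi(w1, w2):
    """PMI in bits: log2 of P(w1, w2) / (P(w1) * P(w2)); -inf if the pair is unseen."""
    if bigrams[(w1, w2)] == 0:
        return float("-inf")
    p_pair = bigrams[(w1, w2)] / total_bigrams
    p_w1 = unigrams[w1] / total_unigrams
    p_w2 = unigrams[w2] / total_unigrams
    return math.log2(p_pair / (p_w1 * p_w2))


print(pmi("strong", "beer"), pmi("powerful", "beer"))
```

In this toy data "strong beer" occurs more often than chance would predict and gets a positive score, while "powerful beer" never occurs and is scored as negative infinity. With unsmoothed counts an unseen pair is indistinguishable from an impossible one, which is why larger corpora or smoothing are normally used.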
Friday, July 8, 2016
Natural Language Processing (NLP) Notes
These are the working notes for the Natural Language Processing (NLP) part of the bioinformatics project.
