Natural Language Processing
Natural Language Processing (NLP) is the computer analysis of input provided in a human (natural) language, and the conversion of that input into a useful form of representation. The field of NLP is primarily concerned with getting computers to perform useful and interesting tasks with human languages. It is secondarily concerned with helping us come to a better understanding of human language.
- The input/output of an NLP system can be:
– written text
– speech
- We will mostly be concerned with written text (not speech).
- To process written text, we need:
– lexical, syntactic, semantic knowledge about the language
– discourse information, real world knowledge
- To process spoken language, we need everything required to process written text, and we must also handle speech recognition and speech synthesis.
There are two components of NLP.
- Natural Language Understanding
– Mapping the given natural-language input into a useful representation.
– Different levels of analysis are required:
morphological analysis,
syntactic analysis,
semantic analysis,
discourse analysis, …
- Natural Language Generation
– Producing natural-language output from some internal representation.
– Different levels of synthesis are required:
deep planning (what to say),
syntactic generation
- NL Understanding is much harder than NL Generation, but both of them are hard.
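The two components can be illustrated with a deliberately tiny round trip. The sketch below is not a real NLU or NLG system: the command domain, the regular expression, the frame fields, and the function names `understand` and `generate` are all assumptions made up for this illustration. It only shows the direction of each mapping, from text to a representation and from a representation back to text.

```python
import re

# Hypothetical toy domain: commands such as "turn on the kitchen light".
PATTERN = re.compile(r"^(?:please\s+)?turn\s+(on|off)\s+the\s+(\w+)\s+(\w+)$")

def understand(text):
    """NLU direction: map an input string to a small frame (a dict).

    Returns None when the input does not fit the toy pattern.
    """
    match = PATTERN.match(text.strip().lower())
    if match is None:
        return None
    state, location, device = match.groups()
    return {"action": "switch", "state": state,
            "location": location, "device": device}

def generate(frame):
    """NLG direction: produce a sentence from the internal frame."""
    return f"The {frame['location']} {frame['device']} is now {frame['state']}."

frame = understand("Please turn on the kitchen light")
print(frame)
print(generate(frame))
```

A single pattern like this skips every level of analysis listed above, which is why it fails on almost any rephrasing of the command. Real systems need morphological, syntactic, semantic, and discourse analysis to cope with such variation.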
The difficulty in NL understanding arises from the following facts:
- Natural language is extremely rich in form and structure, and very ambiguous.
– How to represent meaning,
– Which structures map to which meaning structures.
- One input can mean many different things. Ambiguity can be at different levels.
– Lexical (word level) ambiguity -- different meanings of words
– Syntactic ambiguity -- different ways to parse the sentence
– Interpreting partial information -- how to interpret pronouns
– Contextual information -- context of the sentence may affect the meaning of that sentence.
- Many inputs can mean the same thing.
- Interaction among components of the input is not clear.
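Syntactic ambiguity can be made concrete by counting how many parse trees a grammar assigns to a sentence. The sketch below uses a CKY-style chart over a small grammar in Chomsky normal form. The grammar, the lexicon, and the function name `count_parses` are assumptions chosen for this illustration, not part of any standard resource.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the PP modifies the verb phrase
    ("NP", "NP", "PP"),   # the PP modifies the noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]

# Hypothetical toy lexicon: word -> possible categories.
LEXICON = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"], "a": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}

def count_parses(sentence, start="S"):
    """Count the distinct parse trees for the sentence under the toy grammar."""
    words = sentence.lower().split()
    n = len(words)
    # chart[(i, j)][category] = number of ways to derive words[i:j] from category
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[(i, i + 1)][category] += 1
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY_RULES:
                    chart[(i, j)][parent] += chart[(i, k)][left] * chart[(k, j)][right]
    return chart[(0, n)][start]

print(count_parses("I saw the man"))                   # 1
print(count_parses("I saw the man with a telescope"))  # 2
```

The second sentence gets two parses because the phrase "with a telescope" can attach either to the verb phrase (the seeing was done with a telescope) or to the noun phrase (the man has a telescope). Choosing between them requires the contextual and world knowledge discussed in these notes.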
The following kinds of language-related knowledge are useful in NLP:
- Phonology – concerns how words are related to the sounds that realize them.
- Morphology – concerns how words are constructed from more basic meaning units called morphemes. A morpheme is the primitive unit of meaning in a language.
- Syntax – concerns how words can be put together to form correct sentences. It determines what structural role each word plays in the sentence and which phrases are subparts of other phrases.
- Semantics – concerns what words mean and how these meanings combine in sentences to form sentence meaning. It is the study of context-independent meaning.
- Pragmatics – concerns how sentences are used in different situations and how use affects the interpretation of the sentence.
- Discourse – concerns how the immediately preceding sentences affect the interpretation of the next sentence. For example, interpreting pronouns and interpreting the temporal aspects of the information.
- World Knowledge – includes general knowledge about the world, as well as what each language user must know about the other’s beliefs and goals.
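As a small illustration of the morphology level, the sketch below splits a word into morphemes by stripping at most one known prefix and one known suffix. The affix lists, the minimum stem length, and the function name `segment` are assumptions for this example. Real morphological analyzers rely on a lexicon and spelling rules rather than plain string matching.

```python
# Hypothetical affix inventories for a toy English segmenter.
PREFIXES = ("un", "re")
SUFFIXES = ("ness", "ing", "ed", "s")
MIN_STEM = 3  # assumed minimum stem length, to avoid splitting words like "red"

def segment(word):
    """Split a word into [prefix?, stem, suffix?] using naive affix stripping."""
    word = word.lower()
    prefix, suffix = [], []
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            prefix.append(p)
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            suffix.append(s)
            word = word[:-len(s)]
            break
    return prefix + [word] + suffix

print(segment("unkindness"))  # ['un', 'kind', 'ness']
print(segment("walked"))      # ['walk', 'ed']
print(segment("cats"))        # ['cat', 's']
```

String matching alone over-segments: `segment("under")` returns `['un', 'der']`, although "under" is a single morpheme. Avoiding such errors takes knowledge of which stems actually exist in the language.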