Introduction to NLTK

  • 1. Getting Started with NLTK: An Introduction to NLTK
      Sreejith S, srssreejith@gmail.com, @tweet2sree
      FOSSMeet 2011, NIC Calicut, 06 February 2011
      Sreejith S Getting Started with NLTK
  • 2. Just a word about me!
      • Working in Natural Language Processing (NLP), Machine Learning and Text Mining
      • Active member of ilugcbe, http://ilugcbe.techstud.org
      • Works for 365Media Pvt. Ltd., Coimbatore, India
      • @tweet2sree, srssreejith@gmail.com
  • 12. Introduction - NLP
      • Natural Language Processing (NLP) is an interdisciplinary subject, drawing on computer science, linguistics, statistics, etc.
      • NLP is a subfield of Artificial Intelligence.
      • NLP covers any kind of computer manipulation of natural language.
      • It is a rapidly developing field of study.
      • Everyday applications of NLP: handwriting recognition, machine translation, question-answering systems, spell checkers, grammar checkers, etc.
  • 21. Natural Language Toolkit (NLTK)
      • A collection of Python programs, modules, data sets and tutorials to support research and development in Natural Language Processing (NLP)
      • Written by Steven Bird, Edward Loper and Ewan Klein
      • NLTK is:
          • Free and open source
          • Easy to use
          • Modular
          • Well documented
          • Simple and extensible
      • http://www.nltk.org
  • 25. What You Will Learn
      • How simple programs can help you manipulate and analyze language data, and how to write these programs
      • How key concepts from NLP and linguistics are used to describe and analyze language
      • How data structures and algorithms are used in NLP
      • How language data is stored in standard formats, and how data can be used to evaluate the performance of NLP techniques
  • 35. Installation of NLTK
      • Make sure that Python 2.4, 2.5 or 2.6 is available on your system.
      • Install the Python Tkinter package.
      • Install NumPy, Matplotlib, Prover9, MaltParser and MegaM.
      • Download NLTK and install it.
      • If you are installing NLTK from source:
          • Download http://nltk.googlecode.com/files/nltk-2.0b9.zip
          • Unzip it; this creates the folder nltk-2.0b9.
          • Open a terminal, cd into this folder and, as superuser, run: python setup.py install
      • To install the data, start the Python interpreter:
          >>> import nltk
          >>> nltk.download()
      • Now you are ready to play with NLTK!
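A quick way to confirm that the packages named on this slide can be imported is sketched below. It is only an illustration and makes two assumptions that are not on the slides: a Python 3 interpreter (where Tkinter is imported as `tkinter`) and a hypothetical helper named `is_installed`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    # find_spec returns None when no top-level module of that name exists.
    return importlib.util.find_spec(module_name) is not None


# Module names are assumptions based on the packages listed on the slide.
for name in ("nltk", "numpy", "matplotlib", "tkinter"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

Any package reported as missing needs to be installed before the examples on the later slides will run.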
  • 48. NLTK Modules
      NLTK Modules                  Functionality
      nltk.corpus                   Corpus
      nltk.tokenize, nltk.stem      Tokenizers, stemmers
      nltk.collocations             t-test, chi-squared, mutual-info
      nltk.tag                      n-gram, backoff, Brill, HMM, TnT
      nltk.classify, nltk.cluster   Decision tree, Naive Bayes, K-means
      nltk.chunk                    Regex, n-gram, named entity
      nltk.parse                    Parsing
      nltk.sem, nltk.inference      Semantic interpretation
      nltk.metrics                  Evaluation metrics
      nltk.probability              Probability & Estimation
      nltk.app, nltk.chat           Applications
  • 56. Let us start the game
      • To access the data for working through the examples in the book, start the Python interpreter.
      • Some basic workouts from the book:
      • Concordance (every occurrence of a word, shown with its surrounding context)
          >>> from nltk.book import *
          >>> text1.concordance("monstrous")
      • Similar (words that appear in similar contexts)
          >>> text1.similar("monstrous")
      • Dispersion plot (positional information: where in the text each word occurs)
          >>> text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])
          >>> text4.dispersion_plot(["and", "to", "of", "with", "the"])
        What is it? Why?
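To make the ideas behind concordance and positional information concrete, here is a minimal plain-Python sketch. It is not NLTK's implementation: the helper names `concordance` and `offsets`, the case-insensitive matching, the bracket formatting and the toy sentence are all assumptions made for illustration.

```python
def concordance(tokens, word, width=2):
    """Return each occurrence of `word` with `width` tokens of context per side."""
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def offsets(tokens, word):
    """Return the token positions of `word`; a dispersion plot draws these."""
    return [i for i, tok in enumerate(tokens) if tok == word]


# Toy text, made up for this example.
tokens = "the monstrous whale met a most monstrous size of whale".split()

for line in concordance(tokens, "monstrous"):
    print(line)
print(offsets(tokens, "whale"))
```

A dispersion plot is essentially a picture of such offset lists, one row per word, with the position in the text on the horizontal axis.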
  • 65. Continued... Some basic workouts from the book
      • Generate
          >>> text3.generate()
      • Counting vocabulary (the length of a text in tokens)
          >>> len(text3)
      • List of distinct words, sorted in dictionary order
          >>> sorted(set(text3))
      • Count the occurrences of a particular word in a text
          >>> text3.count("and")
      • What percentage of the text is taken up by a specific word (under Python 2, run from __future__ import division first, otherwise the result is truncated to an integer)
          >>> 100 * text3.count("and") / len(text3)
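The counting operations on this slide work on any list of tokens, so they can be tried without the book data. The sketch below assumes Python 3 (where `/` is true division) and uses a short made-up token list in place of `text3`; the helper `percentage` is a hypothetical name.

```python
def percentage(count, total):
    """Share of `total` taken up by `count`, as a percentage."""
    return 100 * count / total


# Toy token list standing in for text3 (an assumption for illustration).
tokens = ("in the beginning god created the heaven and the earth "
          "and the earth was without form").split()

total = len(tokens)               # number of tokens, repeats included
vocabulary = sorted(set(tokens))  # distinct words in dictionary order
and_count = tokens.count("and")   # occurrences of one word

print(total, len(vocabulary), and_count, percentage(and_count, total))
```

The gap between the token count and the vocabulary size shows how often words are repeated in the text.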
  • 68. Collocation & Bigram
      • Collocation: a sequence of words that occur together unusually often
      • e.g. red wine, strong tea
      • But strong computer is not a collocation
          >>> text4.collocations()
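A bigram is simply a pair of adjacent words, and a first approximation to finding collocations is to count how often each bigram occurs. The sketch below does only that, in plain Python. Ranking by raw frequency is an assumption of this sketch and not necessarily how `collocations()` scores pairs; the module table on the earlier slide lists t-test, chi-squared and mutual information for nltk.collocations. The helper names and the toy sentence are made up.

```python
from collections import Counter


def bigrams(tokens):
    """Return the list of adjacent word pairs in `tokens`."""
    return list(zip(tokens, tokens[1:]))


def top_bigrams(tokens, n=2):
    """Return the `n` most frequent bigrams with their counts."""
    return Counter(bigrams(tokens)).most_common(n)


# Toy text, made up for this example.
tokens = ("red wine and strong tea or red wine with strong tea "
          "and red wine").split()

print(top_bigrams(tokens))
```

On real text, raw counts are dominated by pairs of very common words such as "of the", which is why collocation finders score pairs with association measures instead.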