3 Ways to Apply Latent Semantic Analysis on Large-Corpus Text on macOS Terminal, JupyterLab, and Colab

Latent semantic analysis (LSA) is a natural language processing technique that builds representations from large-scale text datasets in order to uncover insights. It can be applied at multiple levels, such as the document level, the phrase level, and the sentence level. Semantic analysis broadly divides into lexical semantics and the study of how individual words combine into sentences and paragraphs. Lexical semantics classifies and decomposes lexical items, and lexical semantic structures help identify the differences and similarities between words in different contexts. A hypernym is a generic term and a hyponym is a specific instance of it; hyponymy is the relationship between the two. Homonyms share the same spelling or form but have unrelated meanings. "Book" is an example of a homonym: it can mean something that a person reads or the act of making a reservation. The spelling, form, and syntax are the same, but the definitions differ. Polysemy is a related phenomenon in which a single word is associated with multiple related senses. The word polysemy comes from Greek and means "many signs."

The NLTK library for Python performs tokenization, which chops larger chunks of text into words, phrases, or other meaningful strings called tokens. Lemmatization then converts each word from its inflected form into its base form.

Figure 1. Code snippet for word lemmatization.


Figure 2. Different data sources for natural language processing with Python.

Latent semantic analysis

Latent semantic analysis represents the contextual meaning of words by applying mathematical and statistical computations to a large corpus of text. In reported evaluations, it has matched or exceeded human scores on subject-matter tests. Its accuracy benefits from reading machine-readable documents and texts at web scale. Latent semantic analysis applies singular value decomposition (SVD), a technique closely related to principal component analysis (PCA). A collection of documents can be represented as a Z x Y matrix A, whose rows represent the documents in the collection. For a typical large corpus, A can have hundreds of thousands of rows and columns. SVD belongs to a family of operations known as matrix decompositions. In a Python pipeline that uses NLTK for preprocessing, a low-rank approximation is then applied to the term-document matrix. This low-rank approximation supports indexing and retrieving documents, an approach known as latent semantic indexing, by grouping related words in the documents.
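To make the low-rank approximation concrete, here is a small sketch using NumPy. It follows the orientation used in this article (rows are documents, columns are terms); the four sample documents, the regex tokenizer, and the function names are assumptions made for illustration, not part of any library.

```python
import re

import numpy as np


def build_count_matrix(docs):
    """Return (vocab, A) where A[i, j] counts term j in document i."""
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    column = {tok: j for j, tok in enumerate(vocab)}
    A = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(tokenized):
        for tok in toks:
            A[i, column[tok]] += 1
    return vocab, A


def lsa(A, k):
    """Return the rank-k approximation of A and k-dimensional document vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    doc_vectors = U[:, :k] * s[:k]      # scale each latent axis by its singular value
    A_k = doc_vectors @ Vt[:k, :]       # keep only the k largest singular values
    return A_k, doc_vectors


def cosine_similarity(X):
    """Pairwise cosine similarity between rows (assumes no all-zero rows)."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return unit @ unit.T


docs = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]
vocab, A = build_count_matrix(docs)
A_k, doc_vectors = lsa(A, k=2)
similarity = cosine_similarity(doc_vectors)
print(np.round(similarity, 2))
```

In the two-dimensional latent space, the two documents about cats end up close to each other and far from the two about markets, which is the behavior latent semantic indexing relies on for retrieval. A real corpus would normally use TF-IDF weights and a sparse, truncated SVD rather than a dense decomposition.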

Brief overview of linear algebra

The Z x Y term-document matrix A has real-valued, non-negative entries. The rank of a matrix is the number of linearly independent rows (or, equivalently, columns) in it, so rank(A) ≤ min{Z, Y}. A square c x c matrix whose off-diagonal entries are all zero is a diagonal matrix. If all c diagonal entries are one, it is the identity matrix of dimension c, written Ic. For a square Z x Z matrix A and a vector k that is not all zeros, the values of λ that satisfy Ak = λk are the eigenvalues of A, and k is a corresponding eigenvector. A matrix decomposition factors a square matrix into a product of matrices derived from its eigenvectors. This makes it possible to reduce the word representations from many dimensions to two so that they can be viewed on a plot.

Dimensionality reduction with principal component analysis and singular value decomposition is therefore critically relevant to natural language processing. The Zipfian distribution of word frequencies in a document makes it difficult to determine the similarity of words directly from raw counts. Because the input matrix is highly asymmetrical, the eigendecomposition is obtained as a by-product of singular value decomposition. Latent semantic analysis is a semantic-space technique that parses the document and helps identify polysemous words, here with the NLTK library. Resources such as punkt and wordnet must first be downloaded through NLTK.
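The relationships above can be checked numerically. The sketch below, which assumes NumPy and uses a made-up 3 x 4 count matrix, verifies that the rank is bounded by min{Z, Y}, that the eigenvalues of the symmetric matrix A·Aᵀ equal the squared singular values of A, and that each eigenpair satisfies the eigenvalue equation.

```python
import numpy as np

# Hypothetical 3 x 4 count matrix (Z = 3, Y = 4), chosen only for illustration.
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 3.0, 1.0],
])
Z, Y = A.shape

# The rank can never exceed the smaller dimension.
rank = np.linalg.matrix_rank(A)
print("rank:", rank, "bound:", min(Z, Y))

# Singular values of the rectangular matrix, largest first.
singular_values = np.linalg.svd(A, compute_uv=False)

# A @ A.T is square and symmetric, so it has an eigendecomposition.
gram = A @ A.T
eigenvalues, eigenvectors = np.linalg.eigh(gram)   # eigenvalues in ascending order

# Its eigenvalues are the squared singular values of A.
print(np.allclose(eigenvalues[::-1], singular_values ** 2))

# Each column k of eigenvectors satisfies gram @ k = lambda * k.
print(np.allclose(gram @ eigenvectors, eigenvectors * eigenvalues))
```

This is why SVD can be applied to a rectangular term-document matrix even though eigendecomposition itself is defined only for square matrices.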

Deep Learning at scale with Google Colab notebooks

Figure 3. NVIDIA Deep Learning stack with GPUs.

Training machine learning or deep learning models on CPUs can take hours and can be expensive in terms of time and the energy consumed by computing resources. Google built the Colab notebook environment for research and development purposes. It runs entirely in the cloud and requires no additional hardware or software setup on each machine. It is essentially a Jupyter notebook environment that lets data scientists share Colab notebooks by storing them on Google Drive, just like Google Sheets or Docs, in a collaborative environment. Enabling GPU acceleration for the runtime incurs no additional cost. Getting data into Colab is more involved than in a Jupyter notebook, which can read data directly from a local directory on the machine. In Colab, files can be uploaded from the local file system, or a drive can be mounted through a Drive FUSE wrapper to load the data.

Figure 4. Installing a drive FUSE wrapper.

Once this step is complete, it shows the following log without errors:

Figure 5. Installation log on macOS.

The next step is to generate the authentication tokens that authenticate the Google credentials for Drive and Colab.

Figure 6. Authenticate the credentials.

If the output shows that the access token was retrieved successfully, Colab is all set.

Figure 7. Access token verification.

At this stage the drive is not yet mounted, so a check for the text file returns False.

Figure 8. Verifying access from the Colab notebook to the files uploaded to Google Drive.

Once the drive is mounted, Colab has access to the datasets stored on Google Drive.

Figure 9. Google Drive mounted in the Colab notebook.
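The mount-then-verify sequence can also be sketched with Colab's built-in `google.colab.drive.mount` helper, which is a different route from the Drive FUSE wrapper used in this walkthrough. The function names, the `MyDrive` folder name, and the `corpus.txt` file name below are assumptions for illustration; outside Colab the code simply falls back to a local path.

```python
import os


def mount_drive_if_colab(mount_point="/content/drive"):
    """Mount Google Drive inside Colab and return its root folder, else None."""
    try:
        from google.colab import drive  # importable only in a Colab runtime
    except ImportError:
        return None
    drive.mount(mount_point)            # asks for Google authorization when needed
    # "MyDrive" is assumed to be the folder Colab exposes for the user's Drive.
    return os.path.join(mount_point, "MyDrive")


def dataset_is_accessible(path):
    """True only if the file exists at the given path."""
    return os.path.isfile(path)


drive_root = mount_drive_if_colab()

# "corpus.txt" is a hypothetical file name; replace it with the real dataset.
corpus_path = os.path.join(drive_root, "corpus.txt") if drive_root else "corpus.txt"
print(corpus_path, dataset_is_accessible(corpus_path))
```

Running the existence check before and after mounting reproduces the False-then-True behavior described above.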

Once the files are accessible, the Python program can be executed just as in a Jupyter environment, and the Colab notebook displays the results in the same way a Jupyter notebook does.

Figure 10. Results from the program.

PyCharm IDE

The program can be run in the PyCharm IDE or executed from the macOS Terminal.

Figure 11. LSA analysis in Python natural language processing in PyCharm IDE.
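For running from the Terminal, a script needs a command-line entry point. The following is a rough sketch rather than the program shown in the figure: it assumes NumPy, treats each line of a text file as one document, and prints the strongest terms for each latent dimension. The sample lines and the function names are hypothetical.

```python
import argparse
import re

import numpy as np

# Tiny built-in sample used when no corpus file is given (illustrative only).
SAMPLE_LINES = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]


def top_terms_per_topic(lines, n_topics=2, n_terms=3):
    """Treat each non-empty line as a document; list top terms per latent topic."""
    docs = [re.findall(r"[a-z]+", line.lower()) for line in lines]
    docs = [doc for doc in docs if doc]
    if not docs:
        return []
    vocab = sorted({tok for doc in docs for tok in doc})
    column = {tok: j for j, tok in enumerate(vocab)}
    A = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for tok in doc:
            A[i, column[tok]] += 1
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    topics = []
    for loadings in Vt[:n_topics]:
        # Rank by absolute loading because the sign of a singular vector is arbitrary.
        strongest = np.argsort(-np.abs(loadings))[:n_terms]
        topics.append([vocab[j] for j in strongest])
    return topics


def main(argv=None):
    parser = argparse.ArgumentParser(description="Toy LSA, one document per line.")
    parser.add_argument("corpus", nargs="?", help="path to a UTF-8 text file")
    parser.add_argument("--topics", type=int, default=2)
    args = parser.parse_args(argv)
    if args.corpus is None:
        lines = SAMPLE_LINES
    else:
        with open(args.corpus, encoding="utf-8") as handle:
            lines = handle.readlines()
    topics = top_terms_per_topic(lines, n_topics=args.topics)
    for number, terms in enumerate(topics, start=1):
        print(f"topic {number}: {', '.join(terms)}")
    return topics


if __name__ == "__main__":
    main()
```

Saved under a name such as `lsa_cli.py` (a placeholder), it could be started with `python3 lsa_cli.py corpus.txt --topics 2`. The same `top_terms_per_topic` function can be called from a PyCharm run configuration or a notebook cell.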

Results from macOS Terminal

Figure 12. Results from macOS Terminal.

Jupyter Notebook on a standalone machine

Jupyter Notebook gives a similar output running the latent semantic analysis on the local machine:

Figure 13. Running the latent semantic analysis on Jupyter notebook.

Figure 14. Results.


