About Me

I’m an R&D Engineer currently working with RaRe Technologies, a company that works on projects involving NLP and Machine Learning in a wide variety of domains. We are also the maintainers of Gensim, a popular open source library for unsupervised learning in NLP.

I’ve worked on both commercial and open source projects for the company. We often create well-designed, scalable implementations of practically useful models from research papers and open source them as part of Gensim. The last such project I worked on was implementing Poincaré Embeddings, a model from a paper by Facebook AI Research that learns vector representations of nodes in a graph with hierarchical information. We published a high-level report about it here. A more technical post with all the gory details is coming soon.

I was previously a mentor in Google Summer of Code 2017 for a project to implement FastText in Gensim. FastText is a model that learns word representations from an unstructured text corpus while capturing morphological information using character n-grams. I’ve also written an analysis of how its embeddings compare with those of word2vec, the popular model that FastText is based on.

One of my primary interests is analyzing large-scale text on the web in order to understand society at large. I’m particularly interested in news articles: I’ve published an open source library for exploratory analysis of news corpora that finds topics and trends in them. Here, you can see a demo of the library on a dataset of Hacker News articles. I’ve also worked on a fun project to develop a novel (to the best of my knowledge) method for clustering classic literature from Project Gutenberg using word2vec. You can read about it here.

The problem of strong AI interests me too, and I hope to see it solved in my lifetime. Natural Language Understanding is of particular interest since it is an extremely hard problem, and language is a uniquely human trait. First, human language is deeper and more sophisticated than the languages or forms of communication of any other species. Second, it is very closely associated with the ability of our brains to form complex models of reality, which I believe is the root of our intelligence (this is all very hand-wavy and vague though). And of course, actual day-to-day work on NLP can be a little less lofty, since we are nowhere close to achieving human-level intelligence in our artificial models (yet).

I graduated from the Indian Institute of Technology, Roorkee in 2015 with a degree in mechanical engineering and knowledge of a lot of unrelated things. I spent a lot of time in college with a student group, SDSLabs (Software Development Section), a bunch of smart geeks who make cool web and mobile applications. I participated in quite a few hackathons with them, winning a couple. I was also very interested in game development in college: I took part in Google’s open source program, Google Summer of Code 2014, with CodeCombat, and interned at a game dev startup, Edlogiq, in 2013.

When I’m not doing any of that, I love to read, especially classic sci-fi and dystopian fiction. I also like to trek and travel a lot, and listen to music. Feel free to contact me at jayantjain1992@gmail.com.