

In the previous blog post, we gave an overview of text analytics and natural language processing (NLP) in the era of Big Data. We saw how NLP holds one key to unlocking the business value buried in unstructured text, a gold mine that many companies have yet to tap into. In this two-part blog, we will show how open source tools can run on the Pivotal technology stack for part-of-speech (POS) tagging at massive scale.

In part one, we will introduce part-of-speech tagging, explain its value, examine the challenges of using it, and show how Pivotal’s massively parallel processing (MPP) big data platform handles this type of workload using open source projects, SQL user-defined functions, and procedural languages like PL/Java, PL/Python, and PL/R. We have also open-sourced the code, which runs on PostgreSQL as well as on Pivotal’s MPP engines. These include Greenplum DB analytic data warehouses and Pivotal HD (part of the Pivotal Big Data Suite) with HAWQ, the highest performing SQL query engine for Apache Hadoop®.

Why POS Tagging? Understanding Feedback Inside Tweets

At Pivotal Data Labs, we’ve worked with a variety of semi-structured data sources, like Tweets, to solve data science problems for our customers. We have also built many demos to showcase the capabilities of Pivotal’s Big Data Suite. What we find is that companies today are monitoring social media to improve sales, marketing, and customer service, including the evaluation of brand perception. Are people happy with our products? What do they not like about our products? How can we improve our service and offerings? Was our recent product launch/marketing conference well received by our partners and customers? How do customers view our competitors? These are questions that sales, marketing, and service departments in many companies are interested in.

Social media sentiment analysis is also of interest to algorithmic trading systems. For example, some companies might be interested in building a predictive model that correlates the sentiment in Tweets with a stock symbol or a commodity future of interest.
