Apache Pig makes it easy to create efficient data pipelines
What do you like best?
Apache Pig and its query language (Pig Latin) allowed us to create data pipelines with ease. The language is designed to reflect the way data pipelines are designed, so it discards extraneous data, supports user-defined functions (UDFs), and offers a lot of control over the data flow.
What do you dislike?
Pig is a lazily evaluated language, so it will not evaluate data until it's actually needed. As a result, errors are not visible unless you actually try to dump/print the data. There is no "debug" functionality to run the code in a dry-run mode.
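To show what I mean by errors staying hidden, here is a rough analogy in Python (not Pig itself). It is only a sketch: the function names `load`, `parse_bytes`, and `dump` and the sample rows are made up for illustration, and Python generators merely stand in for the way a pipeline definition does no work until something asks for the output.

```python
def load(rows):
    """Lazily yield rows; nothing runs until a consumer iterates."""
    for row in rows:
        yield row


def parse_bytes(records):
    """Convert the second field to int; fails only when a bad row is consumed."""
    for user, raw in records:
        yield user, int(raw)


def dump(relation):
    """Force evaluation of the whole pipeline, like printing the result."""
    return list(relation)


# Hypothetical sample data: the second row cannot be parsed as an integer.
rows = [("alice", "120"), ("bob", "oops")]

# Defining the pipeline succeeds even though the data is bad.
pipeline = parse_bytes(load(rows))

# The error only surfaces once the output is requested.
try:
    dump(pipeline)
    error = None
except ValueError as exc:
    error = str(exc)

# A clean input goes through the same pipeline without problems.
clean = dump(parse_bytes(load([("alice", "120")])))
```

In this sketch the bad row goes unnoticed while the pipeline is being defined, and the failure only shows up at the `dump` call, which is the same kind of surprise I ran into with Pig scripts.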
Recommendations to others considering the product:
Unless your company already has Pig implementations that you are building on top of, you might be better off with other, newer technologies with more
What problems are you solving with the product? What benefits have you realized?
I have used Pig for data pipelining and aggregation. The flow of the language reflects the flow of the data, so it is intuitive to understand what a data transformation is doing. However, it hasn't kept up with the latest advances in technology. If you were choosing a language today, you would be better off with either Hive or Spark. Pig also has a steeper learning curve since it uses its own dedicated language (Pig Latin).