R Server for HDInsight review
What do you like best?
As R isn't the best-integrated language for big data development, it is extremely helpful that HDInsight has a dedicated R Server setup that takes the installation steps out of your hands. Ease of setup is extremely handy and much better than setting up a Docker container with R yourself. The other option for big data with R would be Databricks, but HDInsight lets you write non-notebook code, so you can stick to normal script-based workflows. Also, HDInsight is for the most part cheaper than Databricks, but obviously if you have massive pipelines, costs can rack up pretty easily.
What do you dislike?
Nowhere near as much documentation, or as many tutorials, as there is for the Python samples on HDInsight. Also, since HDInsight is an Azure-specific product, if you needed to migrate to AWS, for example, you would spend more time refactoring than if you had used Databricks.
Recommendations to others considering the product:
Do the tutorials beforehand and get your head around Spark and distributed/parallel computing concepts; otherwise you won't get as much benefit out of it.
What problems are you solving with the product? What benefits have you realized?
Data pipelines that clean and featurize raw data have been implemented via this service, nothing overly complex. Simply put, pipeline speed is the main benefit. It costs more than our current system but is within budget.