Data warehousing made more effective with Amazon Redshift
What do you like best?
The features of Redshift that stand out among cloud data warehouses are Automatic Workload Management (which lets you segregate workloads such as ETL and reporting and allocate a share of cluster resources to each, so that each can work in its 'own' space), Elastic Resizing (which lets you scale your cluster up and down within minutes), and Concurrency Scaling (which supports a virtually unlimited number of concurrent users and queries on your Redshift cluster, as additional capacity is added on demand according to the workload).
Other features worth mentioning are UDFs and Stored Procedures (you can now also call AWS Lambda functions from within a UDF, isn't that cool!), fully automated maintenance (including the tedious and time-consuming Vacuum and Delete operations), which is a really big deal for data warehouses, and Federated Queries (to query other databases such as RDS PostgreSQL from Redshift).
It is also amazing to see how AWS listens to customer feedback and implements new features at a great pace.
What do you dislike?
A few limitations of Redshift are listed below -
1. Single Commit Queue - Redshift is primarily used for OLAP (On-line Analytical Processing) workloads. It is not recommended for OLTP (Transactional) workloads where heavy write operations are performed. Because of the single commit queue, writes are slower than in traditional databases (like PostgreSQL).
2. Constraints: Redshift does not enforce constraints such as UNIQUE, PRIMARY KEY, and FOREIGN KEY. They can still be declared when creating tables, and the query planner uses them to determine the most effective query plan.
3. Although not a show stopper, Redshift could definitely benefit from the ability to query one Redshift cluster from another (a feature like dblink).
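Since the constraints in point 2 are not enforced, duplicate keys have to be caught by the load job itself. A minimal sketch of how such a check could be generated in Python; the function name, table, and column are hypothetical, and the resulting SQL would be run through whatever Redshift client you already use:

```python
def duplicate_check_sql(table, key_columns):
    """Build a query that lists key values appearing more than once.

    Assumption: `table` and `key_columns` are trusted identifiers from
    your own code; nothing here quotes or escapes them.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")
    cols = ", ".join(key_columns)
    return (
        f"SELECT {cols}, COUNT(*) AS copies "
        f"FROM {table} "
        f"GROUP BY {cols} "
        f"HAVING COUNT(*) > 1"
    )


# Hypothetical table and key, for illustration only.
print(duplicate_check_sql("sales.orders", ["order_id"]))
```

If the query returns any rows, the declared key is not actually unique, and plans that rely on it may give wrong results.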
Recommendations to others considering the product:
While Redshift can be considered for most use-cases pertaining to data warehousing, one must keep in mind that Redshift is not cheap. It is true that you pay for what you use, but if you do not use it effectively, it can become a huge chunk of your overall AWS spend.
A few important points to keep in mind -
1. Do not use Redshift if your requirement is just to store data. Redshift gives you the most benefit when you perform complex analytics on the data.
2. Dump infrequently accessed data to S3 and query using Spectrum (if required) to save storage costs on your Redshift cluster.
3. Pause your Redshift cluster when it is not in use and resume it when required. This is particularly beneficial for lower (dev, staging, and test) environments, which are not used 24x7.
4. Rightsize: Choosing the right node type for the Redshift cluster is really important; otherwise the excess capacity just sits idle and wastes money.
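The pause and resume advice in point 3 can be scripted. A rough sketch, assuming a client that behaves like boto3's Redshift client (which provides `pause_cluster` and `resume_cluster`); the helper name and the cluster identifier are hypothetical:

```python
def set_cluster_running(client, cluster_id, running):
    """Resume the cluster when `running` is True, otherwise pause it.

    `client` is expected to behave like boto3.client("redshift").
    The call may fail if the cluster is not in a state that allows
    the requested action; error handling is left out of this sketch.
    """
    if running:
        return client.resume_cluster(ClusterIdentifier=cluster_id)
    return client.pause_cluster(ClusterIdentifier=cluster_id)


# Usage (assumes boto3 is installed and AWS credentials are configured;
# "dev-warehouse" is a hypothetical cluster identifier):
#   import boto3
#   redshift = boto3.client("redshift")
#   set_cluster_running(redshift, "dev-warehouse", running=False)  # end of day
#   set_cluster_running(redshift, "dev-warehouse", running=True)   # next morning
```

Passing the client in as a parameter keeps the helper easy to try out with a stand-in object before pointing it at a real cluster.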
What problems are you solving with the product? What benefits have you realized?
I have been doing a lot of work on Redshift over the last 5 years. For my clients, we deal with terabytes of data (including PII). Redshift plays a key role in the data and analytics pipeline, where we can store huge amounts of data without compromising the performance of analytical queries. The elastic resizing feature also helps us scale storage up and down whenever required.