Hive is the best tool out there for answering ad-hoc queries in a parallel paradigm. It works very well with the Hadoop ecosystem (mainly, it integrates perfectly with HDFS).
- Easy to use, as it implements most SQL functions.
- Needs more optimization for complex queries (e.g., caching, auto-partitioning) to reduce query latency.
- Tuning the Hive parameters is really challenging for users. The default settings don't work well for large queries.
- Hive is perfect if 90-95% of the queries are read-only. It is not suitable for applications with heavy updates.
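Hive parameters are set per session with `SET key=value;` statements. One way to keep that tuning repeatable is to prepend the statements to every script. Below is a minimal Python sketch of the idea. The helper `with_hive_settings` and the example table `sales` are hypothetical. The values in `EXAMPLE_SETTINGS` are illustrative placeholders, not recommended defaults: the right values depend on the cluster, the data, and the Hive version.

```python
from typing import Mapping

# Illustrative session settings (real Hive property names, placeholder values).
# Treat these as knobs to tune for your own workload, not defaults to copy.
EXAMPLE_SETTINGS = {
    "hive.execution.engine": "tez",
    "hive.vectorized.execution.enabled": "true",
    "hive.cbo.enable": "true",
    "hive.exec.dynamic.partition": "true",
    "hive.exec.dynamic.partition.mode": "nonstrict",
}


def with_hive_settings(script: str, settings: Mapping[str, str]) -> str:
    """Prepend one `SET key=value;` line per setting to an HQL script (hypothetical helper)."""
    preamble = "".join(f"SET {key}={value};\n" for key, value in settings.items())
    return preamble + script


if __name__ == "__main__":
    # `sales` and `region` are hypothetical names used only for illustration.
    query = "SELECT region, COUNT(*) FROM sales GROUP BY region;\n"
    print(with_hive_settings(query, EXAMPLE_SETTINGS))
```

Keeping the settings in one mapping lets you version them alongside the queries instead of retyping them in every session.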
Get quick insights from big data when customers' data doesn't fit on one machine. It helps a lot with data preparation (e.g., creating temporary tables), whose output can be consumed by other machine learning solutions, like Spark, to build models that add more business value.
The progression of features, speed, etc. gives me the strategic confidence I need in the SQL-on-Hadoop space.
At this point, everything is on point, and in theory it is great in Hive 1.2.
Deriving value from masses of unstructured & structured data.
Hive syntax is almost like SQL, so for someone already familiar with SQL it takes almost no effort to pick up Hive. But other tools can do the same thing faster these days. Hive was initially really good to have, but more and more projects are now available for SQL-like operations on big data (like Drill).
Hive is comparatively slower than its competitors. It's easy to use, but that comes at the cost of processing speed; if you are using it just for batch processing, then Hive is fine. It also does not have as rich a scripting language.
In retail, business partners are more comfortable querying their own data instead of relying on engineers. Hive solves that problem. The main purpose of using Hive is to build reports and analyze data stored in the Hadoop file system (HDFS).
The syntax of Hive! It's almost SQL, so it's easy to use. External tables, partitions, buckets, and UDFs are all features I like to use in Hive. The ORC data format takes less space and retrieves data much faster. The learning curve looks easy since Hive is similar to SQL, but hold on! You must learn all the features of Hive before writing a big HQL script that joins multiple tables of hundreds of GBs each and fetches results. Otherwise, if you write it like regular SQL, it may take hours to process. So Hive is always at its best when you set the optimization parameters before you run your scripts. Also, its complex data types make Hive more useful than other RDBMSs.
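To illustrate why partitions and buckets matter for large joins, here is a toy in-memory model in Python. Rows are grouped into partitions by a partition column (Hive stores each partition as its own directory). Within a partition, rows are assigned to buckets by hashing a bucket column modulo the bucket count. A filter on the partition column then reads only the matching partition. This is a sketch under simplifying assumptions: the column names `sale_date` and `customer_id` are hypothetical, and `bucket_for` uses the integer key itself as the hash, a stand-in rather than a claim about Hive's exact hash function.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int, float]  # (sale_date, customer_id, amount): hypothetical schema
Table = Dict[str, Dict[int, List[Row]]]  # partition value -> bucket number -> rows


def bucket_for(customer_id: int, num_buckets: int) -> int:
    # Stand-in hash: mask to a non-negative value, then take it modulo the bucket count.
    # Hive's real bucketing hash may differ; this only shows the idea.
    return (customer_id & 0x7FFFFFFF) % num_buckets


def layout(rows: List[Row], num_buckets: int) -> Table:
    """Group rows by partition value (sale_date), then by bucket (customer_id)."""
    table: Table = defaultdict(lambda: defaultdict(list))
    for row in rows:
        sale_date, customer_id, _ = row
        table[sale_date][bucket_for(customer_id, num_buckets)].append(row)
    return table


def prune(table: Table, sale_date: str) -> List[Row]:
    """Partition pruning: return rows from the matching partition only, skipping all others."""
    # dict.get does not insert a missing key, even on a defaultdict.
    return [row for bucket in table.get(sale_date, {}).values() for row in bucket]


if __name__ == "__main__":
    rows = [
        ("2015-06-01", 101, 20.0),
        ("2015-06-01", 102, 35.5),
        ("2015-06-02", 101, 12.0),
        ("2015-06-02", 205, 8.25),
    ]
    table = layout(rows, num_buckets=4)
    # Only the 2015-06-02 partition is read; the 2015-06-01 partition is never touched.
    print(prune(table, "2015-06-02"))
    # A given customer_id maps to the same bucket number in every partition,
    # the property that bucket-by-bucket joins rely on.
    print(bucket_for(101, 4), bucket_for(102, 4))
```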
Generating datasets from huge files for reporting purposes.