Hive is an all-in-one project management tool developed to “help teams move faster” regardless of how they work. Features are built from users’ requests and updated weekly, which Hive says makes it the world’s first democratic software platform. It’s best known for its capabilities in project management, time management, team collaboration, automation, and an array of integrations with third-party software. Hive is free for individual users, with premium versions available to teams and enterprises.
Capabilities | |
---|---|
Segment | |
Deployment | Cloud / SaaS / Web-Based, Mobile Android, Mobile iPad, Mobile iPhone |
Support | 24/7 (Live rep), Chat, Email/Help Desk, FAQs/Forum, Knowledge Base, Phone Support |
Training | Documentation |
Languages | English |
It's very user-friendly, easy to install, and easy to use. I like the interface very much.
Nothing I can think of. I had a great experience working with Hive and was satisfied with all the features, as they met all my requirements.
I was working on a school project as part of a Big Data course, and executing queries with Hive made the whole project a lot simpler.
The syntax of Hive! It's almost SQL, so it's easy to use. External tables, partitions, buckets, UDFs: these are all features I like to use in Hive. The ORC data format takes up less space and retrieves data much faster. The learning curve looks easy because Hive is similar to SQL, but hold on! You must learn all of Hive's features before writing a big HQL script that joins tables of several hundred GB each and fetches results. Otherwise, if you write it like regular SQL, it may take hours to process. So Hive is always at its best when you set the optimization parameters before running your scripts. Its complex data types also make Hive more useful than a traditional RDBMS.
Hive is comparatively slower than its competitors. It's easy to use, but that comes at the cost of processing speed. If you are using it just for batch processing, then Hive is perfectly fine.
Generating datasets from huge files for reporting purposes.
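The buckets this reviewer mentions split a table into a fixed number of files by hashing a chosen column, so all rows with the same key end up in the same file. The following is a simplified Python model of that idea, not Hive's implementation: it uses `zlib.crc32` as a stand-in hash (an assumption; Hive uses its own hash function), and the helpers `bucket_for` and `bucketize` and the sample `user_id` column are hypothetical.

```python
import zlib
from collections import defaultdict


def bucket_for(key: str, num_buckets: int) -> int:
    """Return the bucket index for a key: hash(key) mod num_buckets.

    crc32 is a stand-in hash for illustration, not Hive's hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def bucketize(rows, column, num_buckets):
    """Group rows (dicts) into buckets by the hashed value of `column`."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(str(row[column]), num_buckets)].append(row)
    return dict(buckets)


if __name__ == "__main__":
    rows = [{"user_id": f"u{i}", "clicks": i} for i in range(10)]
    buckets = bucketize(rows, "user_id", 4)
    # Rows sharing a user_id always land in the same bucket, so work keyed
    # on user_id (e.g. joins or sampling) can target matching buckets only.
    for idx in sorted(buckets):
        print(idx, [r["user_id"] for r in buckets[idx]])
```

Because the bucket depends only on the key and the bucket count, two tables bucketed the same way on the same column line up bucket by bucket.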
Hive makes it easy for users to store bulk data in tabular form. It uses SQL-like queries, which makes it easy for anyone used to traditional database systems. Because of this, people don't need to learn a new language and can still adapt to the Big Data culture. It also has features like partitioning and bucketing that help segregate data. Data can be loaded directly into Hive from HDFS using CSV files of the same format, or from HBase by creating a table that points to the HBase table, providing a link within Hadoop.
Even for small amounts of data, it runs a MapReduce job, which takes time, so it is not efficient for such cases. Hive has no concept of a primary key, so we can end up with redundant entries. Also, older versions did not support UPDATE and DELETE at all, and even in the newer version, using the UPDATE and DELETE commands degrades the tool's performance.
We are using Hive to store logs of the data generated in our business. We will later use these logs for reconciliation, which helps us keep track of the data.
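The partitioning mentioned in this review works because Hive stores each partition of a table in its own subdirectory named `column=value`, so a query that filters on the partition column only has to scan the matching directories. The sketch below is a simplified Python model of that pruning step, assuming Hive-style directory names; the warehouse paths and the helpers `parse_partition` and `prune` are hypothetical.

```python
def parse_partition(path: str) -> dict:
    """Parse Hive-style 'key=value' path segments into a dict."""
    spec = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            spec[key] = value
    return spec


def prune(paths, **wanted):
    """Keep only partition directories whose values match every filter."""
    return [
        p for p in paths
        if all(parse_partition(p).get(k) == v for k, v in wanted.items())
    ]


paths = [
    "/warehouse/logs/dt=2020-01-01/country=US",
    "/warehouse/logs/dt=2020-01-01/country=IN",
    "/warehouse/logs/dt=2020-01-02/country=US",
]
# Filtering on dt skips the 2020-01-02 directory entirely.
print(prune(paths, dt="2020-01-01"))
```

Directories that fail the filter are never opened, which is where the speedup for partitioned queries comes from.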
Hive is intended to simplify your experience with Hadoop, and it lets developers and business analysts apply their SQL knowledge to query data, build reports, build ETL pipelines, etc.
As open-source software, it has the usual issues with support. Hive also doesn't support many features that traditional SQL has.
The main purpose of using Hive is to build reports and analyze data stored in the Hadoop file system. For now, it is the only framework that all the most popular BI tools can use to read data from HDFS.
Ease of use as well as the ability to scale. It has proven its reliability. They have continued to add more features while increasing its speed at the same time.
It is still slower than newer distributed warehouses. It also still uses MapReduce behind the scenes, which is very slow by today's standards.
Storing large amounts of data that could not fit into any relational database system, and deriving valuable insights from our data by running MapReduce jobs on data stored in Hive.
The best part about Hive is that it is easy to master because of its SQL-like interface. It is also a very handy tool for ETL and DW functions.
Setting up optimization parameters is not easy; you need to remember a lot of console settings. Indexes don't turn out to be very useful. CRUD operations have many prerequisites, such as the table having to be bucketed.
We are trying to create a standard ETL pipeline tool that supports standard BI/reporting utilities.
The ability to run MapReduce jobs that parse JSON and generate dynamic partitions in Parquet file format.
It is slow compared to Spark/Impala for most operations. It also throws out-of-memory errors when updating multiple partitions that contain many Parquet files.
Events are gathered in HDFS by Flume and need to be processed into Parquet files for fast querying. The input data contains variable attributes in the JSON payload, as each customer can define custom attributes. This is part of the ETL pipeline, where Hive jobs read JSON data and generate Parquet files that are then queried using Impala/Spark. Using views, each customer queries only the relevant data.
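The pipeline above relies on dynamic partitioning: instead of naming the target partition in advance, the partition value is taken from each record. A minimal Python sketch of that routing step follows, assuming one JSON object per line and a `customer` field as the partition key (both hypothetical); `dynamic_partition` is an illustrative helper, and the sketch stops short of writing Parquet files.

```python
import json
from collections import defaultdict


def dynamic_partition(json_lines, partition_key):
    """Route JSON records to partitions derived from the data itself.

    Each record is sent to a 'key=value' partition taken from its own
    partition_key field; the remaining (possibly customer-specific)
    attributes are kept unchanged.
    """
    partitions = defaultdict(list)
    for line in json_lines:
        record = json.loads(line)
        value = record.pop(partition_key)
        partitions[f"{partition_key}={value}"].append(record)
    return dict(partitions)


events = [
    '{"customer": "acme", "event": "click", "color": "red"}',
    '{"customer": "globex", "event": "view", "plan": "pro"}',
    '{"customer": "acme", "event": "view"}',
]
for path, records in dynamic_partition(events, "customer").items():
    print(path, records)
```

Records with different attribute sets coexist in the same partition, which matches the variable-attribute payloads described in the review; a per-customer view would then only read its own `customer=...` partition.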
-> Easy to configure/create a table for big data or streaming data
-> Fast and easy to query. Business/non-technical folks can use HUE for more interactive querying on Hive tables
-> HUE can save results and old queries, and export results to Excel/CSV
-> Joining and parsing multiple huge tables still remains a challenge.
-> Some SQL operations don't work in Hive, such as non-equality joins.
Dumping site-activity big data, streaming data, and data logs into Hive.
An open-source framework that lets you read, write, and manage data with SQL-like HQL, which makes it easy to use.
The latency in Apache Hive is very high.
It provides a SQL-like query language called HQL with schema-on-read, and it transparently converts queries to MapReduce jobs.
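Schema-on-read, as described in this review, means raw files are stored as-is and a table's declared column types are applied only when the data is queried. The following Python sketch models that behavior for comma-delimited text; the helper `read_with_schema` and the sample schema are hypothetical. Turning fields that cannot be cast (or are missing) into `None` is meant to mirror how Hive typically returns NULL for such fields, but it is a simplification.

```python
def read_with_schema(lines, schema, delimiter=","):
    """Apply a schema to raw delimited text at read time.

    Nothing is validated when the data is written; each field is cast as
    it is read, and a field that cannot be cast or is missing becomes None.
    """
    rows = []
    for line in lines:
        fields = line.split(delimiter)
        row = {}
        for i, (name, cast) in enumerate(schema):
            try:
                row[name] = cast(fields[i])
            except (IndexError, ValueError):
                row[name] = None
        rows.append(row)
    return rows


raw = ["1,alice,34", "2,bob,not_a_number", "3,carol"]
schema = [("id", int), ("name", str), ("age", int)]
print(read_with_schema(raw, schema))
```

Since the schema lives outside the data, the same raw lines could be read later with a different schema without rewriting any files.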
What I like most is that Apache Hive supports partitioning and bucketing for fast data retrieval. We can create custom UDFs as required to perform data cleansing and filtering. It supports HQL, which is similar to SQL and makes things easy for people coming from an SQL background.
It doesn't support OLTP, and it also doesn't support DELETE or UPDATE operations.
We have created a semantic layer in Hive that helps us process terabytes of data and generate reports faster. It has also given us fault tolerance and high availability of the data.
Flexible and easy to understand. Loved it.
Not compatible with multiple platforms, hence mostly platform-dependent.
Works best for ETL-related work or tasks.
It is easy to run queries in Hive, as Hive uses HQL, which is very similar to SQL. Hive has the Hive Metastore service to store metadata and HiveServer2 to serve client requests, and this separation helps distribute resources properly. Hive is also fault tolerant, which makes it ideal for long-running ETL batches.
Hive has a cold-start problem, and since it uses MapReduce at the backend, it is much slower than Spark. This made us move from Hive to Spark; after switching, our job completion time dropped by 70-80%.
- Informatica data ingestion
- Ab Initio data ingestion and modifications
- Data formatting (it provides options such as CSV, Parquet, etc.)
- Data transformation using Hive queries
- Data pipelines
The friendliness of the data warehouse tool for database developers.
It lacks ACID properties; it doesn't offer the ACID guarantees that databases do.
I usually use Hive for my big data (data migration) problems. I like the query speed and the option to choose among various execution engines.
If you know how to write SQL statements, you can write HiveQL; it doesn't require you to learn anything new. It's pretty straightforward.
Performance tuning is difficult and becomes harder for complex queries. It still has a few bugs, such as all the data going to a single reducer, which can slow down queries.
For developing reports for business analysts: many of them know SQL, so it's easy for them to write queries and pull information for analysis.
- Easy to learn
- Can query complex data, including nested structures
- Flexible (w.r.t. data schema)
- With the ORC SerDe, I/O can be reduced drastically by reading only what is required (columnar formats)
- Needs the schema to be defined in advance.
- Not ANSI SQL compliant.
- Not suitable for fast interactive queries, even on moderate-size datasets.
- Works only with Hadoop (not an independent query-processing tool).
- Not enterprise grade w.r.t. quality of documentation, error messages, and support.
Exploring ways to store and process semantic datasets
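The ORC point in this review rests on columnar storage: values are laid out column by column, so a query that needs one column can skip the rest. The Python sketch below contrasts a row layout with a column layout and counts how many values each has to touch to project the requested columns; that count is only a rough proxy for I/O, and the model ignores ORC's actual stripes, compression, and indexes. All helper names and sample data are hypothetical.

```python
def to_columns(rows):
    """Convert row-oriented dicts into a column-oriented layout."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def read_row_oriented(rows, wanted):
    """Row layout: every field of every row is touched to project columns."""
    touched = sum(len(row) for row in rows)
    return [{c: row[c] for c in wanted} for row in rows], touched


def read_columnar(columns, wanted):
    """Column layout: only the requested columns are read."""
    touched = sum(len(columns[c]) for c in wanted)
    n = len(columns[wanted[0]])
    return [{c: columns[c][i] for c in wanted} for i in range(n)], touched


rows = [{"id": i, "name": f"n{i}", "city": "x", "score": i * 2} for i in range(3)]
cols = to_columns(rows)
print(read_row_oriented(rows, ["id"]))
print(read_columnar(cols, ["id"]))
```

Both reads return the same projected rows, but the columnar read touches 3 values instead of 12, and the gap grows with the number of unused columns.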
Hive is a data warehousing infrastructure built on top of Hadoop to provide data grouping, querying, and analysis. Apache Hive supports the analysis of large datasets stored in Hadoop's HDFS and in compatible systems such as the Amazon file system. It offers a SQL-based query language called HiveQL, with schema-on-read, and transparently converts queries into MapReduce, Apache Tez, and Spark jobs. All three execution engines can run on YARN. To speed up queries, Hive provides indexes, including bitmap indexes.
Offers many tools and has great growth potential.
Possibility of storing metadata in an organized and easily accessible way.
Hive syntax is almost like SQL, so for someone already familiar with SQL, it takes almost no effort to pick up Hive. But other tools can do the same thing faster these days. Hive was really good to have initially, but more and more projects are now available for SQL-like operations on Big Data (like Drill).
Hive is comparatively slower than its competitors. It's easy to use, but that comes at the cost of processing speed. If you are using it just for batch processing, then Hive is perfectly fine. It also does not have as rich a scripting language.
In retail, business partners are more comfortable querying their own data instead of relying on engineers, and Hive solves that problem. The main purpose of using Hive is to build reports and analyze data stored in the Hadoop file system.
Nothing in particular. It helps us with big data and gives all users unrestricted bandwidth, but we have already run into issues with that, so one of the servers now has limits.
At my company, it was fairly troublesome getting access, since the underlying warehousing is in Hadoop and you then have to connect through Hive.
Data insights from big browser data through MapReduce.
Easy SQL-like syntax for very short and simple queries.
No aliases for relations. No flow control either.
I build machine learning models for an online advertising system. To me, Hive is more of an ad-hoc query engine than a platform on which I can develop complex algorithms.