Member since 12-09-2015, 106 posts, 40 kudos received, 20 solutions
My Accepted Solutions

Views | Posted
---|---
1954 | 12-26-2018 08:07 PM
1430 | 08-17-2018 06:12 PM
834 | 08-09-2018 08:35 PM
9028 | 01-03-2018 12:31 AM
557 | 11-07-2017 05:53 PM
01-02-2019
10:23 PM
1 Kudo
Are the statistics on the tables up to date? Hive should be able to apply semijoin reduction here. Updating the statistics may help: https://cwiki.apache.org/confluence/display/Hive/Column%2BStatistics%2Bin%2BHive
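If you refresh statistics from a script, a minimal sketch (Python, string building only) of the ANALYZE TABLE statements described on that wiki page could look like this. The helper name and the table/partition names are hypothetical, and submitting the statements (Beeline, JDBC, etc.) is left out.

```python
def analyze_statements(table, partition_spec=None):
    """Build HiveQL that refreshes table-level and column-level statistics.

    partition_spec is an optional dict such as {"ds": "2019-01-01"}.
    A value of None leaves that partition column unbound, which analyzes
    all partitions for that column.
    """
    part = ""
    if partition_spec:
        cols = []
        for col, val in partition_spec.items():
            cols.append(col if val is None else f"{col}='{val}'")
        part = f" PARTITION ({', '.join(cols)})"
    return [
        f"ANALYZE TABLE {table}{part} COMPUTE STATISTICS;",
        f"ANALYZE TABLE {table}{part} COMPUTE STATISTICS FOR COLUMNS;",
    ]


# Hypothetical table name, for illustration only.
for stmt in analyze_statements("sales", {"ds": None}):
    print(stmt)
```

Column statistics (the FOR COLUMNS form) are what the optimizer relies on for join planning decisions, so that is the statement to run after large loads.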
01-02-2019
07:24 PM
From the directory listing, your table must have the "transactional"="true" property, i.e. it is an ACID table. That means Insert Overwrite creates a base_x directory and writes the result of the insert (the new data) there. Any data that existed before remains in the table directory but is not visible to readers that start after the Insert Overwrite has finished. The old data is physically removed once compaction runs over this table/partition.
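A rough Python sketch of why the old files can sit next to the new base_x directory without being read. It assumes a simplified model: the base with the highest write id wins, and only deltas whose write ids are all above it are read on top. Hive's real reader logic also accounts for open/aborted transactions, delete deltas and overlapping deltas; the example directory names are hypothetical.

```python
import re

BASE_RE = re.compile(r"^base_(\d+)$")
# delta_<minWriteId>_<maxWriteId>, optionally followed by a statement id.
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)(?:_\d+)?$")


def visible_dirs(listing):
    """Return the directories a reader would use, in read order.

    Simplified model: delete_delta directories and anything else that does
    not match the two patterns are ignored by this sketch.
    """
    best_base = None
    deltas = []
    for name in listing:
        m = BASE_RE.match(name)
        if m:
            write_id = int(m.group(1))
            if best_base is None or write_id > best_base[0]:
                best_base = (write_id, name)
            continue
        m = DELTA_RE.match(name)
        if m:
            deltas.append((int(m.group(1)), int(m.group(2)), name))
    floor = best_base[0] if best_base else 0
    newer = [name for lo, _hi, name in sorted(deltas) if lo > floor]
    return ([best_base[1]] if best_base else []) + newer
```

With a listing such as `delta_0000001_0000001_0000`, `delta_0000002_0000002_0000`, `base_0000003`, `delta_0000004_0000004_0000`, the two older deltas are obsolete: they stay on disk until compaction cleans them up, but a reader only uses `base_0000003` and the newer delta.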
12-26-2018
08:07 PM
1 Kudo
You can use the EXPORT TABLE command: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport#LanguageManualImportExport-ExportSyntax
12-18-2018
01:41 AM
The contents of a managed table should be managed by Hive; you cannot just write files into its directory. If you need to do that, use an external table. To add data to a managed table, use INSERT, LOAD DATA, or some other Hive command rather than dropping files there.
12-14-2018
10:04 PM
Is this a managed or external table? If it is managed, do you have any files in the table that don't match the 0000_0, 0000_0_copy_1, or bucket_0 pattern?
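To check a directory listing for stray files, a small Python sketch like the following could help. The regular expressions are an assumption generalized from the patterns above (real names usually have more digits, e.g. 000000_0 or bucket_00000), and the sketch only looks at flat file names, not at files nested under delta/base directories.

```python
import re

# Assumed shapes of Hive-written file names, generalized from
# 0000_0, 0000_0_copy_1 and bucket_0; digit counts vary in practice.
EXPECTED = [
    re.compile(r"^\d+_\d+$"),
    re.compile(r"^\d+_\d+_copy_\d+$"),
    re.compile(r"^bucket_\d+$"),
]


def unexpected_files(names):
    """Return the file names that match none of the expected patterns."""
    return [n for n in names if not any(p.match(n) for p in EXPECTED)]
```

You could feed it the file names from an `hdfs dfs -ls` of the table or partition directory; anything it returns (for example a `data.csv` copied in by hand) is a candidate for the problem.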
10-16-2018
06:57 PM
Have you considered the SQL MERGE statement?
09-23-2018
09:49 PM
1 Kudo
The Hive metastore is a service that can run in embedded mode or in standalone mode. You can have several instances running in your cluster; all instances must share the same backend RDBMS. This should help: https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-RemoteMetastoreServer
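Clients find remote metastore instances through the hive.metastore.uris property, a comma-separated list of thrift URIs. A tiny Python sketch that builds that value, assuming the default metastore port 9083; the host names are hypothetical.

```python
def metastore_uris(hosts, port=9083):
    """Build a hive.metastore.uris value listing several remote metastores.

    All listed instances are expected to point at the same backend RDBMS.
    """
    return ",".join(f"thrift://{host}:{port}" for host in hosts)


print(metastore_uris(["ms1.example.com", "ms2.example.com"]))
```

The resulting string goes into hive-site.xml on the client side.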
09-23-2018
04:51 PM
Do you have the standalone metastore running? That is where compaction jobs are actually generated and submitted.
09-12-2018
09:23 PM
Use desc formatted <table> <column>; see https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-Examples
08-24-2018
07:27 PM
The MERGE statement is designed for this: https://cwiki.apache.org/confluence/display/hive/languagemanual+dml#LanguageManualDML-Merge
08-20-2018
04:15 PM
3 Kudos
Setting hive.merge.cardinality.check=false is a bad idea. The logic controlled by this property checks whether the ON clause of your MERGE statement is such that more than one row from the source side matches the same row on the target side (which is only relevant to WHEN MATCHED clauses). Logically, this means the query is asking the system to update one existing target row in two (or more) different ways. This check is actually part of the SQL standard's definition of how MERGE should work. You need to examine either your data or the ON clause; disabling this check when it throws a cardinality_violation error may lead to data corruption later.
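To find the offending rows before re-running the MERGE, here is a minimal Python sketch of the same idea, assuming a simple equality ON clause (source.key = target.key). The function and column names are hypothetical; this is not how Hive implements the check internally.

```python
from collections import Counter


def cardinality_violations(source_rows, target_keys, key):
    """Return target keys that more than one source row would match.

    Source rows whose key is not in the target are ignored here: they go
    to WHEN NOT MATCHED, which the cardinality check does not cover.
    """
    targets = set(target_keys)
    counts = Counter(row[key] for row in source_rows if row[key] in targets)
    return sorted(k for k, c in counts.items() if c > 1)


source = [
    {"id": 1, "v": "a"},
    {"id": 1, "v": "b"},  # second match for target row 1: conflicting update
    {"id": 2, "v": "c"},
]
print(cardinality_violations(source, target_keys=[1, 2], key="id"))
```

If this returns anything, the usual fix is to deduplicate the source (for example, keep only the latest change per key) or tighten the ON clause, rather than turning the check off.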
08-17-2018
06:12 PM
When you run SHOW COMPACTIONS, if the compaction MR job was submitted, the output shows its Hadoop job ID; if the problem is with the job itself, you can use that ID to get more information in the Resource Manager UI. If the compaction failed even before the job was submitted to the cluster, the errors will be in the log of the standalone Hive metastore running the compactor processes.
08-09-2018
08:35 PM
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy
05-02-2018
10:46 PM
1 Kudo
Have you considered Hive? It can certainly do joins. Since you are trying to do CDC and need to handle updates/deletes, you can create transactional tables in Hive, which support the SQL MERGE statement; it is designed for exactly this.
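To make the CDC semantics concrete, here is a Python sketch that applies change records the way a three-clause MERGE would. It is a model of the logic, not Hive code; the table, column and op-code names (`op`, 'I'/'U'/'D') are hypothetical, and it assumes at most one change per key (e.g. the CDC log was already collapsed to the latest change per key).

```python
def apply_merge(target, changes):
    """Apply CDC changes to a target held as {key: row}.

    Models a statement along the lines of (hypothetical names):
      MERGE INTO target t USING changes s ON t.id = s.id
      WHEN MATCHED AND s.op = 'D' THEN DELETE
      WHEN MATCHED THEN UPDATE SET v = s.v
      WHEN NOT MATCHED AND s.op <> 'D' THEN INSERT VALUES (s.id, s.v);
    Each change is (key, op, row) with op in {'I', 'U', 'D'}.
    """
    result = dict(target)
    seen = set()
    for key, op, row in changes:
        if key in seen:
            # Two changes for one key would update the same row twice.
            raise ValueError(f"more than one change for key {key!r}")
        seen.add(key)
        if key in target:          # WHEN MATCHED
            if op == "D":
                del result[key]
            else:
                result[key] = row
        elif op != "D":            # WHEN NOT MATCHED
            result[key] = row
    return result


print(apply_merge({1: "a", 2: "b"}, [(1, "U", "a2"), (2, "D", None), (3, "I", "c")]))
```

Doing all three cases in one statement is the point of MERGE: the target is scanned and rewritten once per batch of changes instead of once per operation type.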
04-11-2018
07:30 PM
1 Kudo
The hive.support.concurrency property enables locking. When a query shuts down normally, its locks should be released immediately. If it dies abruptly, it may leave locks behind. These are cleaned up by a background process running in a standalone Hive metastore process, which considers locks abandoned if they have not heartbeated for 5 minutes (by default). The metastore log file should have entries from AcidHouseKeeperService; that is the cleanup process.
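The abandonment rule itself is simple; a Python sketch of it, assuming locks carry a last-heartbeat timestamp in seconds. The `Lock` type and function name are hypothetical. In Hive the timeout is, as far as I know, controlled by hive.txn.timeout (default 300s); treat that as an assumption to verify for your version.

```python
from dataclasses import dataclass

DEFAULT_TIMEOUT_S = 300  # 5 minutes, the default mentioned above


@dataclass
class Lock:
    lock_id: int
    last_heartbeat: float  # seconds since the epoch


def abandoned_locks(locks, now, timeout_s=DEFAULT_TIMEOUT_S):
    """Return ids of locks that have not heartbeated within timeout_s."""
    return [lock.lock_id for lock in locks if now - lock.last_heartbeat > timeout_s]
```

This is why a lock left by a killed client does not disappear instantly: it has to age past the timeout before the housekeeping process removes it.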
02-12-2018
06:21 PM
Hive does not support this directly, but you can use the SQL MERGE statement to achieve it: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Merge
01-04-2018
05:57 PM
Not generally. The data layout for transactional tables requires special logic to decide which directories to read and how to combine them correctly. Some data files may represent updates of previously written rows, for example. Also, if you are reading while something is writing to this table, your read may fail (without the special logic) because it will try to read incomplete ORC files. Compaction may also (again, without the special logic) make it look like your data is duplicated.
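A Python sketch of the "how to combine them" part, assuming an update is stored as a delete of the old row id plus an insert of a new one, roughly as recent Hive versions do with delete_delta directories. Row ids are simplified to (write_id, bucket, row_number) tuples; the function name and sample rows are hypothetical.

```python
def read_acid(insert_events, delete_events):
    """Combine insert and delete events as an ACID-aware reader must.

    insert_events: list of (row_id, row); delete_events: list of row_id.
    A row is visible only if no delete event names its row_id.
    """
    deleted = set(delete_events)
    return [row for row_id, row in sorted(insert_events) if row_id not in deleted]


inserts = [((1, 0, 0), "alice"), ((1, 0, 1), "bob"), ((2, 0, 0), "bob v2")]
deletes = [(1, 0, 1)]  # write 2 updated bob: delete old row id, insert new row
print(read_acid(inserts, deletes))
```

A reader that simply concatenates every data file would return both "bob" and "bob v2", which is the kind of apparent duplication described above.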
01-03-2018
12:31 AM
Spark doesn't support reading Hive ACID tables directly (see https://issues.apache.org/jira/browse/SPARK-15348 and https://issues.apache.org/jira/browse/SPARK-16996). It can be done (work in progress) via LLAP; this is tracked in https://issues.apache.org/jira/browse/HIVE-12991
11-08-2017
10:01 PM
I would need more context. What exceptions are you getting? It seems unusual that you have to modify pom files in a whole-platform installation. Are you calling the Streaming API from your own app?
11-08-2017
08:52 PM
Check the log of the standalone metastore. It may have more information.
11-07-2017
05:51 PM
If your competing read and insert target a single partition, this should be safe, since Hive uses the 'rename' file system operation at the end of the insert to make new files visible, and rename is atomic on HDFS. If your insert is a dynamic partition insert, you are writing multiple partitions and the data for each partition is committed with its own 'rename' operation. This means a read could see a set of files that reflects only part of the insert. Insert overwrite actually deletes the existing files, so it can conflict with a concurrent read.
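The commit-by-rename pattern can be sketched on a local file system in Python. This is a local analogue of the HDFS rename, not Hive code: files are written into a hidden staging directory and published with a single `os.rename`, which is atomic on POSIX local file systems. The partition directory name is hypothetical, and `final_dir` is assumed not to exist yet.

```python
import os
import tempfile


def commit_by_rename(final_dir, files):
    """Write files to a staging dir next to final_dir, then publish atomically.

    A reader listing final_dir sees either nothing or every file, never a
    partial set. final_dir must not exist before the call.
    """
    parent = os.path.dirname(os.path.abspath(final_dir))
    staging = tempfile.mkdtemp(prefix=".staging_", dir=parent)
    for name, data in files.items():
        with open(os.path.join(staging, name), "w") as f:
            f.write(data)
    os.rename(staging, final_dir)  # the single atomic publish step
```

A dynamic partition insert amounts to calling this once per partition: each call is atomic, but a reader running between two calls sees some partitions of the insert and not others.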
10-20-2017
04:25 PM
Could you also post the plan with "hive.explain.user=true", please?
10-19-2017
06:19 PM
I think for partition pruning to work you need either something like WHERE license_name=X somewhere, for static pruning, or src.license_name=target.license_name, for dynamic partition pruning. Otherwise there is nothing to infer a smaller partition set from.
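The difference between the two kinds of pruning, sketched in Python. Partitions are modeled as dicts of partition-column values and the function names are hypothetical. Static pruning needs a literal known when the query is compiled; dynamic pruning needs the set of join-key values from the source side, which Hive collects at runtime. With neither, every partition is kept.

```python
def static_prune(partitions, column, literal):
    """WHERE column = literal: keep partitions whose value equals the literal."""
    return [p for p in partitions if p[column] == literal]


def dynamic_prune(partitions, column, source_rows, source_column):
    """ON src.col = target.col: keep partitions whose value occurs in src."""
    wanted = {row[source_column] for row in source_rows}
    return [p for p in partitions if p[column] in wanted]


parts = [{"license_name": "A"}, {"license_name": "B"}, {"license_name": "C"}]
print(static_prune(parts, "license_name", "B"))
print(dynamic_prune(parts, "license_name", [{"license_name": "C"}], "license_name"))
```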
10-03-2017
09:03 PM
You can stage your data somewhere and use "Insert into AcidTable Select * from ...". If the data originates in some streaming fashion, then Streaming Ingest may be appropriate; it has been integrated with Storm, Flume, and NiFi. The LOAD DATA statement is not supported for Acid tables in 2.x.
09-29-2017
04:07 PM
Setting hive.support.concurrency=true, hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager, and hive.lock.manager=org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager will install a lock manager (there are several; the ZooKeeper-based one is the default) without enabling full Acid. If you use hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager instead, then hive.lock.manager is ignored and you will be using the metastore-based lock manager that Acid uses; but if you don't create your tables with "transactional"="true", all your tables remain as they are. I believe external tables should be locked in this case.
09-26-2017
05:04 PM
This may be due to not having https://issues.apache.org/jira/browse/HIVE-10632 in your build. There is some data in the metastore's internal tables that was not cleaned up when tables were dropped, which causes the Initiator to try to schedule compactions.
09-14-2017
04:10 PM
There isn't. Perhaps @thejas has a recommendation.
09-13-2017
02:27 PM
Have you considered using the MERGE statement? For example, see https://community.hortonworks.com/articles/97113/hive-acid-merge-by-example.html