Member since: 08-16-2016
Posts: 35
Kudos Received: 8
Solutions: 6
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 10599 | 09-26-2016 06:08 AM |
| | 2645 | 09-26-2016 05:55 AM |
| | 10628 | 09-21-2016 01:44 PM |
| | 3516 | 09-06-2016 05:26 AM |
| | 25793 | 09-02-2016 10:51 AM |
07-17-2020
10:13 AM
1 Kudo
Executing a file with multiple queries in it should work, but each statement is executed individually and is non-atomic (the statements are not run within a single transaction). Please post the exceptions or errors if this has not worked for you.
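As an illustration (the file name, connection string, and statements below are my own, not from the thread), each statement in a file run through beeline commits or fails on its own:

```sql
-- queries.sql, run with something like:
--   beeline -u "jdbc:hive2://<host>:10000/default" -f queries.sql
-- Each statement is executed and committed independently; if the INSERT
-- fails, the preceding CREATE is not rolled back.
CREATE TABLE IF NOT EXISTS events (id INT, payload STRING);
INSERT INTO TABLE events VALUES (1, 'a');
SELECT COUNT(*) FROM events;
```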
12-10-2018
08:14 AM
1 Kudo
Yes, that configuration should work. But you should be aware that you now have two instances that can modify the Hive metadata without locking at the Hive level, meaning the backing database is the one in charge of concurrency. Say one HMS client is reading the full HMS snapshot via one instance while, at the same time, another client is modifying the metadata via the other instance: you may (or may not) see exceptions in the log about certain cached objects (in DataNucleus) no longer being found. The HMS handler has retry logic that will retry the transaction and succeed. It is safe to ignore these exceptions as long as the queries are succeeding. Hope this helps. Thanks
12-10-2018
08:06 AM
1 Kudo
The JsonSerDe class is in hive-hcatalog-xxx.jar, so you will have to run "ADD JAR" on this file beforehand. JsonSerDe was recently made a first-class citizen via https://jira.apache.org/jira/browse/HIVE-18785. With this fix you should be able to just use "STORED AS JSONFILE" instead of having to specify the ROW FORMAT SERDE. This fix is available in CDH6.1 (I believe, don't quote me on this). Hope this helps. Thanks
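A rough sketch of both approaches; the jar path and table definition are illustrative (check the exact hive-hcatalog jar location on your cluster):

```sql
-- Before HIVE-18785: register the SerDe jar and name the class explicitly.
ADD JAR /opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
CREATE TABLE events_json (id INT, payload STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;

-- With the HIVE-18785 fix: the shorthand syntax.
CREATE TABLE events_json2 (id INT, payload STRING)
STORED AS JSONFILE;
```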
07-09-2018
08:17 AM
There is not enough information for us to provide any guidance on what could be wrong. Do you believe that the Hive results are inaccurate, or the Impala results? Could you provide the following items for us to look at? 1) The full table definition in Hive. 2) Full or sample data to help reproduce the issue. 3) Query results from both Hive and Impala for the sample data above. Thanks
05-21-2018
08:01 AM
Could you please post the HMS log snippet showing the full exception stack? Could you also post the output of "select * from SEQUENCE_TABLE" from your HMS metastore DB? Thanks
02-01-2018
11:27 AM
The principal is the username that will be passed through to HS2 for authentication (in your case, the user to be authenticated against LDAP). This will be the same user that the jobs will run as in YARN/MR (if impersonation is turned on in Hive). It is not Kerberos-related. It is equivalent to the -n command-line option for beeline. Hope this helps.
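For comparison, a hedged example of the equivalent beeline call (the host and credentials are placeholders I made up):

```
beeline -u "jdbc:hive2://<HS2HostFQDN>:10000/default" -n <ldap_user> -p <ldap_password>
```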
02-01-2018
07:37 AM
It's not quite clear what the issue is. We will probably need the HS2 and Spark logs to understand it. However, I am curious whether the second step succeeded: "load data inpath '/tmp/new.txt' into table new_tmp;". This appears to be a local path, but there is no "LOCAL" keyword in the command, so Hive will treat it as an HDFS path. Have you verified that the data was actually inserted into the new_tmp table after this step? Also, what version of CDH is this? Thanks
02-01-2018
07:30 AM
Hello, I am a Hive engineer so I am not quite certain of the HUE-side settings, but like you said, the first step should be to narrow down whether it is a Hive-side or a HUE-side issue. Testing from beeline makes sense to ensure that the Hive-side settings are good. In the beeline JDBC URL, you can add the truststore location and credentials for it to use an SSL connection, something like this:

beeline -u "jdbc:hive2://<HS2HostFQN>:10000/default;ssl=true;sslTrustStore=/etc/cdep-ssl-conf/signer/truststore.jks;sslTrustPassword=<pwd>;principal=<user>"

Let me know what you find. Thanks
09-20-2017
09:24 AM
1 Kudo
I would say this is a bug. If the user isn't supposed to be performing a certain action (updating the NameNode URIs in this case), then the UI should either have prevented the user from performing the action or have done nothing during the action if the cluster was not HA-enabled for the NameNode. Manipulating the metadata is bad. I will file an internal JIRA. Thank you for reporting.
11-09-2016
12:31 PM
1 Kudo
The bold text is used to tell Hive how to read/interpret the data for the Hive table (located at '/user/hive/warehouse/original_access_logs' in this case).

1) ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.RegexSerDe" tells Hive to use this class to serialize and deserialize the rows to/from the file.
2) input.regex is a property used by this class (RegexSerDe) to deserialize the rows read from the table data. The regex pattern is applied to each row value read from the file to split it into the columns defined in the metadata for this Hive table.
3) output.format.string is a property used by this class (RegexSerDe) to serialize the rows being written out to the table data. This value is used as a format string to generate a row value (from its column values) that is written back to the output file for this Hive table.

Hope this helps. Thanks
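A minimal sketch of such a table definition, assuming made-up columns and a made-up regex (not the ones from the tutorial):

```sql
CREATE TABLE original_access_logs (ip STRING, request STRING, status STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- applied to each input row; one capture group per column
  "input.regex" = "(\\S+) (\\S+) (\\d+)",
  -- format used to write a row back out from its column values
  "output.format.string" = "%1$s %2$s %3$s"
)
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/original_access_logs';
```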
10-06-2016
08:10 AM
Awesome!! Thanks for the update.
10-05-2016
01:12 PM
Could you please post an update so we can determine if there is a regression or not? Thanks
09-27-2016
07:13 AM
Hey, first you want to ensure that variable substitution is not disabled in your Hive environment, so check the value of the "hive.variable.substitute" property in your configuration. A few examples are provided in the documentation wiki below:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution

There are multiple namespaces in Hive (system, hiveconf, env, etc.); env looks at the environment variables. I think you want to use something other than hiveconf, because that namespace is meant for properties in the hive-site.xml file. Please find examples below and let me know if you still need help. Thanks
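For instance, a small sketch of the hivevar and env namespaces (the table and variable names are made up):

```sql
-- define a variable in the hivevar namespace and substitute it in a query
SET hivevar:target_table=sample_07;
SELECT * FROM ${hivevar:target_table} LIMIT 10;

-- read an environment variable from the shell that launched the client
SELECT '${env:HOME}' AS home_dir FROM sample_07 LIMIT 1;
```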
09-26-2016
06:10 AM
I replied to the other thread regarding the questions you posted. Hope this helps. Let's move this conversation to the other thread, and please accept the solution to close this one. Thanks
09-26-2016
06:08 AM
1 Kudo
There are multiple ways to load data into your Hive table.

1) From a local file using "load data local inpath". Please be aware that if you are running this from beeline, the path actually refers to a file local to the HiveServer2 node, because HS2 is the service actually executing this command, not beeline.
2) From an HDFS path using "load data inpath". Notice that there is no "LOCAL" keyword in the command; this indicates that it is an HDFS path.
3) From another Hive table, for example: insert into table A select * from B where B.col1 > 100;
4) By adding a file to the HDFS directory backing a Hive table. For example, after create table A (b int) location '/tmp/tableA'; you can add files to the HDFS path '/tmp/tableA' and Hive will see that data in table A.

Please accept this solution if I have answered your questions on this topic.
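A compact sketch of options 1 and 2 (the file paths and table name are illustrative):

```sql
-- 1) file on the local filesystem of the HiveServer2 node
LOAD DATA LOCAL INPATH '/tmp/data.csv' INTO TABLE my_table;

-- 2) file already in HDFS (note: no LOCAL keyword)
LOAD DATA INPATH '/user/me/data.csv' INTO TABLE my_table;
```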
09-26-2016
05:55 AM
Is this issue specific to CDH5.9? What is the behavior on CDH5.8? AFAIK, this is the expected behavior: the HoS application on YARN keeps running even after the query has finished. This application on YARN is treated as a container to run future queries; starting a new container is an expensive operation, and having them warmed up speeds up the execution of future queries. You should observe that subsequent queries are noticeably faster. Please provide us additional info on the behavior in CDH5.8 so we can further assist you. Thanks
09-21-2016
01:47 PM
Hey, I just posted a reply to the other thread you created.
09-21-2016
01:44 PM
2 Kudos
I think the problem is that you haven't defined the ROW FORMAT for your Hive table. Hive needs to understand how to separate rows in the input file (I think the default is '\n') and how to separate columns within each row (I am not certain what the default is, but I believe it is Ctrl-A, '\001', rather than a comma). For example:

CREATE TABLE test(name STRING, value STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';

cat test.txt
name1,value1,
name2,value2,
name3,value3

Then load the file above:

LOAD DATA LOCAL INPATH '/tmp/test.txt' INTO TABLE test;
select * from test;

This should work. If you can provide the content of your file, I can give you a more specific answer. Hope this helps
09-06-2016
05:26 AM
Yes, HMS and HS2 have separate log4j.properties files. However, I find it odd that one cluster has the thread name in the log records while the other does not (both CDH5.4.7); the defaults should be exactly the same on both clusters. Perhaps you could confirm that you have not overridden the configuration in a safety valve. To add the thread name to the log records for the HMS, select Hive Service --> Instances --> Hive Metastore Server --> Configuration, then type in log4j and look for "Hive Metastore Server Logging Advanced Configuration Snippet (Safety Valve)". This is where I would set it to something like this:

log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t] : %c{2} (%F:%M(%L)) - %m%n

or just

%d{ISO8601} %p [%t] : %c: %m%n

The %t is what adds the thread name to the log records. Hope this helps. Thanks
09-02-2016
11:59 AM
Hey Alina, I have tried this with CDH5.3.10 (a bit newer than your version) and with the new CDH releases, and round seems to be functioning as expected.

Connected to: Apache Hive (version 0.13.1-cdh5.3.10)
Driver: Hive JDBC (version 0.13.1-cdh5.3.10)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.3.10 by Apache Hive

0: jdbc:hive2://localhost:10000/default> show databases;
| database_name |
| cloudera_manager_metastore_canary_test_db_hive_1_hivemetastore_03567ed21d1b892110ff1cd925ae25bd |
| default |
2 rows selected (1.916 seconds)

0: jdbc:hive2://localhost:10000/default> use default;
No rows affected (0.127 seconds)

0: jdbc:hive2://localhost:10000/default> show tables;
| tab_name |
| sample_07 |
| sample_08 |
2 rows selected (0.153 seconds)

0: jdbc:hive2://localhost:10000/default> select round((20456079/100000),5);
| _c0 |
| 204.56079 |
1 row selected (30.771 seconds)

0: jdbc:hive2://localhost:10000/default> select round((20456079/100000),1);
| _c0 |
| 204.6 |
1 row selected (23.981 seconds)

0: jdbc:hive2://localhost:10000/default> select round((20456079/100000),3);
| _c0 |
| 204.561 |
1 row selected (23.257 seconds)

0: jdbc:hive2://localhost:10000/default> select round(150,3);
| _c0 |
| 150 |
1 row selected (22.973 seconds)

Against CDH5.8, same results. The big difference between the two releases is the time of execution: it consistently takes 23-30 seconds on CDH5.3.10, whereas it takes about 0.1 seconds on CDH5.8.0. It is possible that my environment for CDH5.3.10 is not so kosher. I do not have a CDH5.3.4 environment anymore, but let me know if you would like me to try it with CDH5.3.4. Hope this helps.
09-02-2016
10:51 AM
If this is from HUE, HUE can execute multiple commands in a sequence until it reaches the first query that returns results (like a select query). For example, the following should be executed entirely in a single go:

drop table if exists foo;
create table if not exists foo (code string, description string, salary int);
insert into foo select code, description, salary from sample s where s.salary > 50000 and s.salary < 100000;
select * from foo where salary < 75000;

The following will stop after the select query, so the final drop table will not be executed:

drop table if exists foo;
create table if not exists foo (code string, description string, salary int);
insert into foo select code, description, salary from sample_07 s where s.salary > 50000 and s.salary < 100000;
select * from foo where salary < 75000;
drop table foo;

But if you use beeline to execute a file containing multiple select queries, it should work without pausing.
09-02-2016
07:25 AM
Can you please provide additional details on what the use case is? Are you using the Oozie hive1 action or the hive2 action? Are these jobs failing? Please provide us a brief reproducer if you can. Thank you
09-02-2016
07:19 AM
HPL/SQL is not currently supported in CDH. This is a new feature added in Apache Hive 2.0, and CDH is currently on Hive 1.1 (plus patches). We plan to rebase to the latest upstream version as part of a future major release, but HPL/SQL is not currently in scope for support in the current CDH releases. Hope this helps. Thanks