Member since: 09-26-2014
Posts: 44
Kudos received: 10
Solutions: 7
My Accepted Solutions
Views | Posted
---|---
5723 | 02-19-2015 03:41 AM
1601 | 01-07-2015 01:16 AM
11147 | 12-10-2014 04:59 AM
6906 | 12-08-2014 01:39 PM
6160 | 11-20-2014 08:16 AM
01-07-2015
01:12 AM
More interestingly, this difference disappeared after upgrading to CDH 5.3.1. T.
01-06-2015
06:21 AM
Does anybody have experience with processing XML data (imported from MSSQL) and with storing and analyzing it in Impala? Thanks, Tomas
Labels:
- Apache Impala
12-10-2014
04:59 AM
During my tests I came to one (possibly incorrect) conclusion: the table is big and partitioned, and maybe Impala limits the query to a subset of the table. If I change the query to

create table result as select * from tmp_ext_item where item_id in ( 3040607, 5645020, 69772482, 2030547, 1753459, 9972822, 1846553, 6098104, 1874789, 1834370, 1829598, 1779239, 7932306 )

then it runs correctly and returns all items with the specified item_id values.
12-08-2014
01:39 PM
I solved the issue with from_utc_timestamp(Create_Time, 'CEST'). Impala assumes that the timestamp value is stored in UTC, so converting it to Central European time with summer daylight saving (CEST) produces the correct result. As far as I know, there is no way to tell Impala that the current time zone is CEST, so this conversion has to be made in every query.
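To illustrate the UTC-to-local conversion described above outside Impala, here is a minimal Python sketch using only the standard library; it is not Impala's implementation of from_utc_timestamp. It treats a stored value as UTC, as the post describes, and returns the Central European wall-clock time. The zone name Europe/Prague is an assumption: any tz-database zone that observes CET/CEST behaves the same way, applying UTC+1 in winter and UTC+2 in summer, as the two sample values show.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: Europe/Prague stands in for any Central European zone
# (CET = UTC+1 in winter, CEST = UTC+2 in summer).
CENTRAL_EUROPE = ZoneInfo("Europe/Prague")


def utc_to_central_european(naive_utc: datetime) -> datetime:
    """Treat a naive timestamp as UTC and return the equivalent
    naive wall-clock time in Central Europe."""
    aware_utc = naive_utc.replace(tzinfo=timezone.utc)
    return aware_utc.astimezone(CENTRAL_EUROPE).replace(tzinfo=None)


# Winter (CET, UTC+1): the shift is one hour.
print(utc_to_central_european(datetime(2014, 11, 17, 23, 30)))  # 2014-11-18 00:30:00
# Summer (CEST, UTC+2): the shift is two hours.
print(utc_to_central_european(datetime(2014, 7, 1, 10, 0)))  # 2014-07-01 12:00:00
```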
12-08-2014
04:51 AM
Hi, I am running a simple query whose WHERE clause contains a column IN ( ) condition with a list of 13 elements (numbers). The column is of type int. Every time I run the query I get a different result: sometimes 5 rows, sometimes 2 rows, sometimes 10 rows. Of course, I checked ID by ID that all elements are in the table. Is this a known bug, or am I missing something?

select * from tmp_ext_item where item_id in ( 3040607, 5645020, 69772482, 2030547, 1753459, 9972822, 1846553, 6098104, 1874789, 1834370, 1829598, 1779239, 7932306 )

T.
Labels:
- Apache Impala
11-26-2014
02:17 PM
I have the same issue: the same query returns different timestamps. In Impala the time is one hour earlier than in Hive. The table was created in Hive and loaded with data via INSERT OVERWRITE TABLE in Hive (the table is partitioned). For example, the timestamp 2014-11-18 00:30:00 (18 November) was correctly written to partition 20141118. But when I query the table in Impala with the condition day_id (the partition column) = 20141118, I see the value 2014-11-17 23:30:00, so the difference is one hour.

If I query the minimum and maximum start_time in one partition in Impala (partition day_id = 20141118), I get this wrong result:
min(start_time) = 2014-11-17 23:00
max(start_time) = 2014-11-18 22:59

When I run the same query in Hive, the result is correct:
min(start_time) = 2014-11-18 00:00
max(start_time) = 2014-11-18 23:59

Any help?
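The offset in these numbers is exactly one hour, which matches CET being UTC+1 in November. One explanation consistent with the results (an assumption, not confirmed in this thread) is that the stored values are the UTC equivalents of Central European local times and Impala displays them without converting back. The minimal Python sketch below, assuming a CET/CEST zone such as Europe/Prague, applies that local-to-UTC conversion to Hive's min/max values; it is plain date arithmetic, not a Hive or Impala API.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: the cluster's local time zone is a CET/CEST zone such as Europe/Prague.
LOCAL_ZONE = ZoneInfo("Europe/Prague")


def local_to_utc(naive_local: datetime) -> datetime:
    """Interpret a naive wall-clock timestamp in LOCAL_ZONE and return the naive UTC value."""
    return naive_local.replace(tzinfo=LOCAL_ZONE).astimezone(timezone.utc).replace(tzinfo=None)


# Values Hive reports for partition day_id = 20141118 (taken from the post).
hive_min = datetime(2014, 11, 18, 0, 0)
hive_max = datetime(2014, 11, 18, 23, 59)

# If the stored values are their UTC equivalents and no conversion happens
# on read, they show up one hour earlier, as in the Impala results.
print(local_to_utc(hive_min))  # 2014-11-17 23:00:00
print(local_to_utc(hive_max))  # 2014-11-18 22:59:00
```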
11-20-2014
08:16 AM
Works great! Simply setting --class-name overrides the name of the JAR file. Thanks!
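For parallel imports launched from a script, the --class-name option can be set per command. Below is a minimal Python sketch; the helper sqoop_import_args and the class name Import_XXXZZZ are hypothetical, and every other flag is copied from the sqoop import command in the question. It only builds and prints the command line; passing the list to subprocess.run(args, check=True) would actually launch it.

```python
import shlex


def sqoop_import_args(connect: str, username: str, query: str,
                      split_by: str, target_dir: str, class_name: str) -> list[str]:
    """Hypothetical helper: build a sqoop import argument list whose
    generated class is named class_name instead of QueryResult."""
    return [
        "sqoop", "import",
        "--connect", connect,
        "--username", username,
        "--query", query,
        "-m", "6",
        "--split-by", split_by,
        "--as-textfile",
        # Backslash + t, exactly what the shell passes for '\t'.
        "--fields-terminated-by", r"\t",
        "--delete-target-dir",
        "-z",
        "--target-dir", target_dir,
        "--class-name", class_name,  # overrides the default QueryResult name
    ]


args = sqoop_import_args(
    connect="jdbc:sqlserver://OUR.SQL.SERVER.COM",
    username="XXX",
    query="select * from XXXXZZZ where Start_Time >= getdate()-7 and $CONDITIONS",
    split_by="Start_Time",
    target_dir="/user/sql2hadoop/ext_xxxxzzzz",
    class_name="Import_XXXZZZ",  # hypothetical name, one per parallel import
)
# Shell-quoted command line, e.g. for logging before launching it.
print(shlex.join(args))
```

Because the arguments are passed as a list, $CONDITIONS reaches Sqoop literally without any shell quoting.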
11-19-2014
12:06 PM
Have you changed anything in the directory or file permissions in /var/run? If yes, you should probably reconfigure YARN to use a NEW directory (for example, if YARN used /data/yarn/nm for the NodeManager, configure a new path such as /data/yarn/nm2). After we changed EVERY directory for YARN and restarted the cluster, YARN started, created the new directories, and set the permissions correctly, so now we don't have this kind of permission problem. If you didn't change any permissions in the local file system, then I don't know what the issue is. Try another user, for example run a Hive job as root, hdfs, yarn, or another user, to see whether the problem is user-related or whether it always fails. T.
11-19-2014
11:50 AM
Hi guys, has anybody tried to rename the output of the sqoop import command? It is always named QueryResult.jar. When we run multiple sqoop import commands in parallel, the YARN applications in Cloudera Manager cannot be told apart, because every command is named QueryResult.jar. The sqoop import commands look like this:

sqoop import --connect jdbc:sqlserver://OUR.SQL.SERVER.COM --username XXX --query 'select * from XXXXZZZ where Start_Time >= getdate()-7 and $CONDITIONS' -m 6 --split-by Start_Time --as-textfile --fields-terminated-by '\t' --delete-target-dir -z --target-dir /user/sql2hadoop/ext_xxxxzzzz

sqoop import --connect jdbc:sqlserver://OUR.SQL.SERVER.COM --username XXX --query 'select * from XXXXYYY where Start_Time >= getdate()-7 and $CONDITIONS' -m 6 --split-by Start_Time --as-textfile --fields-terminated-by '\t' --delete-target-dir -z --target-dir /user/sql2hadoop/ext_xxxxyyyyyy

I would like to see in YARN that, for example, two applications are running: Import_XXXZZZ.jar and Import_XXXXYYY.jar. Is there any parameter for setting the application name? Thanks
Labels:
- Apache Sqoop
- Apache YARN
- Cloudera Manager