About TomasTF

TomasTF · ‎01-07-2015

More interestingly, this difference disappeared after upgrading to CDH 5.3.1. T.

TomasTF · ‎01-06-2015

Does anybody have experience with processing XML data (imported from MSSQL), and with storing and analyzing it in Impala? Thanks, Tomas

TomasTF · ‎12-10-2014

During my tests I came to one (maybe incorrect) conclusion. The table is big and partitioned, and maybe Impala just limits the query to a subset of the table. If I change the query to

create table result as select * from tmp_ext_item where item_id in ( 3040607, 5645020, 69772482, 2030547, 1753459, 9972822, 1846553, 6098104, 1874789, 1834370, 1829598, 1779239, 7932306 )

then it runs correctly and returns all items with the specified item_id.

TomasTF · ‎12-08-2014

I solved the issue with from_utc_timestamp(Create_Time, 'CEST'). Impala assumes that the timestamp value is stored in UTC, so converting to Central European Time with summer daylight saving produces the correct result. As far as I know there is no way to tell Impala that the current timezone is CEST, so this conversion has to be made in every query.
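To illustrate the conversion, here is a minimal Python sketch of what "treat the stored value as UTC and shift it to local time" does to a single timestamp. It is not Impala code. The helper name from_utc and the fixed UTC+1 offset are assumptions made for the example (UTC+1 is the Central European offset in November); a real time zone definition would also handle daylight saving.

```python
from datetime import datetime, timedelta, timezone

# Assumption for this sketch: a fixed UTC+1 offset, i.e. Central European
# winter time. This is not a DST-aware zone definition.
CET = timezone(timedelta(hours=1), "CET")


def from_utc(naive_utc, tz):
    """Interpret a naive timestamp as UTC and return the naive wall-clock time in tz."""
    aware_utc = naive_utc.replace(tzinfo=timezone.utc)
    return aware_utc.astimezone(tz).replace(tzinfo=None)


# A value that shows up one hour early when read as plain UTC.
stored = datetime(2014, 11, 17, 23, 30, 0)
print(from_utc(stored, CET))  # 2014-11-18 00:30:00
```

With these assumptions, a value displayed as 2014-11-17 23:30:00 maps back to 2014-11-18 00:30:00, which matches the size of the one-hour shift reported in this thread.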

TomasTF · ‎12-08-2014

Hi, I am running a simple query whose WHERE clause has a column IN ( ) condition with a list of 13 elements (numbers). The column is of type int. Every time I run the query I get a different result: sometimes 5 rows, sometimes 2 rows, sometimes 10 rows. Of course I checked ID by ID that all elements are in the table. Is this a known bug, or am I missing something?

select * from tmp_ext_item where item_id in ( 3040607, 5645020, 69772482, 2030547, 1753459, 9972822, 1846553, 6098104, 1874789, 1834370, 1829598, 1779239, 7932306 )

T.
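The ID-by-ID check can be automated. The sketch below runs the IN query and reports which of the requested IDs did not come back. It uses an in-memory sqlite3 database purely as a stand-in so that the example is runnable; the helper name missing_ids is hypothetical, and against a real cluster the same comparison would be run over whatever DB-API connection is available.

```python
import sqlite3

ITEM_IDS = [
    3040607, 5645020, 69772482, 2030547, 1753459, 9972822, 1846553,
    6098104, 1874789, 1834370, 1829598, 1779239, 7932306,
]


def missing_ids(conn, ids):
    """Run the IN query and return the requested ids that were not returned."""
    placeholders = ", ".join("?" for _ in ids)
    sql = f"select item_id from tmp_ext_item where item_id in ({placeholders})"
    found = {row[0] for row in conn.execute(sql, ids).fetchall()}
    return sorted(set(ids) - found)


# Stand-in table (assumption): sqlite3 instead of Impala, one int column.
conn = sqlite3.connect(":memory:")
conn.execute("create table tmp_ext_item (item_id integer)")
conn.executemany("insert into tmp_ext_item values (?)", [(i,) for i in ITEM_IDS])

print(missing_ids(conn, ITEM_IDS))  # [] when every requested id is returned
```

Running the check several times in a row and comparing the lists would show whether the same IDs go missing on every run or a different subset each time.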

TomasTF · ‎11-26-2014

I have the same issue: the same query returns different dates. In Impala the date is one hour earlier than in Hive. The table was created in Hive and loaded with data via insert overwrite table in Hive (the table is partitioned). For example, the timestamp 2014-11-18 00:30:00 (18th of November) was correctly written to partition 20141118. But when I fetch the table in Impala with the condition day_id (partition column) = 20141118, I see the value 2014-11-17 23:30:00, so the difference is one hour.

If I query the minimum and maximum start_time from one partition of the table in Impala (partition day_id = 20141118), I get this wrong result:

min( start_time ) = 2014-11-17 23:00
max( start_time ) = 2014-11-18 22:59

When I run the same query in Hive the result is OK:

min( start_time ) = 2014-11-18 00:00
max( start_time ) = 2014-11-18 23:59

Any help?

TomasTF · ‎11-20-2014

Works great! Simply setting the --class-name overrides the name of the jar file. Thanks!
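As a sketch of how this could look when several imports are launched from a script, the Python snippet below assembles one sqoop import command per table and gives each a distinct --class-name. The connection string, table names and target directories are the placeholders used in this thread; the helper build_sqoop_import and the Import_* names are hypothetical. The snippet only prints the commands and does not run them.

```python
import shlex


def build_sqoop_import(table, target_dir, class_name):
    """Return the argument list for one sqoop import with its own --class-name."""
    query = f"select * from {table} where Start_Time >= getdate()-7 and $CONDITIONS"
    return [
        "sqoop", "import",
        "--connect", "jdbc:sqlserver://OUR.SQL.SERVER.COM",
        "--username", "XXX",
        "--query", query,
        "-m", "6",
        "--split-by", "Start_Time",
        "--as-textfile",
        "--fields-terminated-by", r"\t",  # literal backslash-t, as in the shell command
        "--delete-target-dir", "-z",
        "--target-dir", target_dir,
        "--class-name", class_name,  # replaces the default QueryResult name
    ]


# Placeholder tables and directories taken from the thread.
jobs = [
    ("XXXXZZZ", "/user/sql2hadoop/ext_xxxxzzzz", "Import_XXXXZZZ"),
    ("XXXXYYY", "/user/sql2hadoop/ext_xxxxyyyyyy", "Import_XXXXYYY"),
]

for table, target_dir, class_name in jobs:
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(build_sqoop_import(table, target_dir, class_name)))
```

Passing the arguments as a list keeps $CONDITIONS and the quoted query intact without relying on shell quoting.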

TomasTF · ‎11-19-2014

Have you changed something in the directory or file permissions in /var/run? If yes, you should probably reconfigure YARN to use a NEW directory (for example, if YARN used /data/yarn/nm for the NodeManager, configure a new path such as /data/yarn/nm2). After changing EVERY directory setting for YARN and restarting the cluster, YARN started, created the new directories and set the permissions correctly, so now we don't have this kind of problem with permissions. If you didn't change any permissions in the local file system, then I don't know what the issue is. Try another user, for example run a Hive job as root, hdfs, yarn or some other user, to see whether this is user-related or whether it always fails. T.

TomasTF · ‎11-19-2014

Hi guys, has anybody tried to rename the output of the sqoop import command? It is always named QueryResult.jar. When we run multiple sqoop import commands in parallel, the YARN applications in Cloudera Manager cannot be told apart, because every command is named QueryResult.jar. The sqoop import commands look like this:

sqoop import --connect jdbc:sqlserver://OUR.SQL.SERVER.COM --username XXX --query 'select * from XXXXZZZ where Start_Time >= getdate()-7 and $CONDITIONS' -m 6 --split-by Start_Time --as-textfile --fields-terminated-by '\t' --delete-target-dir -z --target-dir /user/sql2hadoop/ext_xxxxzzzz

sqoop import --connect jdbc:sqlserver://OUR.SQL.SERVER.COM --username XXX --query 'select * from XXXXYYY where Start_Time >= getdate()-7 and $CONDITIONS' -m 6 --split-by Start_Time --as-textfile --fields-terminated-by '\t' --delete-target-dir -z --target-dir /user/sql2hadoop/ext_xxxxyyyyyy

I would like to see in YARN that, for example, there are two applications running: Import_XXXZZZ.jar and Import_XXXXYYY.jar. Is there any parameter for setting the application name? Thanks

TomasTF · ‎11-10-2014

What is the solution? We have the same issue with starting YARN.

Last Visited	‎09-14-2018 04:36 AM

Member Since	‎09-26-2014 12:28 AM
Posts	44
Kudos received	10

Cloudera Community

Re: piggybank

Re: Reading external tables with Impala

Re: Impala where xxx in () list operator not worki...

Re: Hive Vs. Impala Queries

Re: Sqoop Queryresult.jar rename

Re: Hive Vs. Impala Queries

Will impala support xml data type?

Re: Impala where xxx in () list operator not worki...

Re: Hive Vs. Impala Queries

Impala where xxx in () list operator not working c...

Re: Hive Vs. Impala Queries

Re: Sqoop Queryresult.jar rename

Re: Yarn: One nodemanager refuse to start

Sqoop Queryresult.jar rename

Re: Yarn: One nodemanager refuse to start