Member since: 09-26-2014
Posts: 44
Kudos Received: 10
Solutions: 7
My Accepted Solutions
Views | Posted
---|---
4733 | 02-19-2015 03:41 AM
801 | 01-07-2015 01:16 AM
6945 | 12-10-2014 04:59 AM
3800 | 12-08-2014 01:39 PM
3875 | 11-20-2014 08:16 AM
09-14-2018
03:07 AM
Queries via JDBC client. Yes, I tried refresh, but after ~30 minutes all the queries appeared.
09-14-2018
01:15 AM
Hi, I have a brand new installation of CDH 5.15, where all services (MNG, Impala) are green. I executed ~10 queries on Impala and was checking the queries in Cloudera Manager -> Impala Queries. I noticed two issues:
- Some of the queries were in the list while they were in the "running" state.
- After the statements finished, no queries were reported in Impala Queries at all.

The obvious suspect is the time filter: I checked 30m, 1h, 2h, 1d, still NO results. Another obvious suspect is the search filter; the filter is empty, still NO results. I checked the Service Monitor of CM, it is green, so I suppose it collects data. I checked the Impala storage (firehose_impala_storage_bytes); it is 1 GB. The only "warning" is about the memory of the Service Monitor, which is much less than recommended, but this is a new cluster with no workload running, and CM reports that the heap usage is under 1 GB:

The recommended non-Java memory size is 12.0 GiB, 10.5 GiB more than is configured.

What could be the cause of the empty list? Why is Cloudera Manager not collecting the Impala queries? Or maybe it is, but then why is it not showing them to me? Thanks
09-12-2018
10:28 AM
What does the data look like? I think the JSON has to be on a single line (so it can't contain newlines), and you have to have one JSON object per line. At least I had a similar issue when I wanted to load data via an external table, where the JSON contained one big list with many dict elements.
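For illustration, a minimal Hive sketch of the one-object-per-line layout I mean; the table name, columns, and HDFS path are hypothetical, and depending on the Hive version the JsonSerDe may need the hive-hcatalog-core jar on the classpath:

```sql
-- Each line of the data file must be one complete JSON object, e.g.:
--   {"id": 1, "name": "a"}
--   {"id": 2, "name": "b"}
-- A pretty-printed document, or one big JSON list spanning many lines,
-- will not parse.
CREATE EXTERNAL TABLE events_json (
  id   INT,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/user/dwh/events_json';
```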
09-12-2018
10:24 AM
You should use ntp or chrony to synchronize the clocks. If one of them is already in use and the clocks are still out of sync, there may be an issue on the network. Regarding the HBase restart, I would do a Stop, then check on all nodes that no HBase process is running, and then Start.
04-08-2015
03:00 AM
Hi, we installed the 64-bit ODBC driver from DataDirect for Impala and tried to establish a connection between SQL Server 2014 (running on Windows Srv 2012 R2) and Cloudera Impala. After setting up the ODBC driver, the test connection was OK. But the linked server is not working: listing tables works, but a simple select statement returns this kind of error:

OLE DB provider "MSDASQL" for linked server "IMPALA" returned message "Unspecified error".
Msg 7311, Level 16, State 2, Line 1
Cannot obtain the schema rowset "DBSCHEMA_COLUMNS" for OLE DB provider "MSDASQL" for linked server "IMPALA". The provider supports the interface, but returns a failure code when it is used.

I also contacted the technical team from Progress Software, but no response yet. Any ideas?
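One workaround that might sidestep this (a sketch, not verified against the DataDirect driver): route the statement through OPENQUERY, so SQL Server sends the query text to Impala as-is instead of composing it from the provider's column metadata. "IMPALA" is the linked server name from the error above; the table and column names are hypothetical:

```sql
-- Pass-through query: the inner statement is executed by Impala itself and
-- SQL Server only consumes the returned rows, avoiding the four-part-name
-- path that requests the DBSCHEMA_COLUMNS rowset. Names are hypothetical.
SELECT *
FROM OPENQUERY(IMPALA, 'SELECT col1, col2 FROM some_table LIMIT 10');
```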
03-27-2015
12:48 AM
Created a case for this issue; hopefully the engineering team will come back with a solution. Tomas
03-23-2015
02:16 AM
Hi, we are trying to download a bulk of data from a CDH cluster via the Windows ODBC Driver for Impala version 2.5.22 to a Windows server. The ODBC driver works well, but the row-dispatching performance is really bad: roughly 3M rows/minute. We checked the possible bottlenecks for this kind of download, but neither the cluster nor the receiving Windows server was under load at all: the CPU is around 5%, the network cards run at 10 Gbit, there is plenty of RAM, and the target disk the data is written to is a RAID-0 SSD with 1 GB/s max throughput, so we don't know which component in the transfer slows down the records. We tried to run in multiple parallel threads, which helped a little (a 50% performance increase), but the overall throughput is still low. We also tried to tweak the transfer batch size in the ODBC driver; it looks like it doesn't affect the performance at all. The setup is CDH 5.3 and Microsoft SQL Server 2014; Impala is attached via a linked server in MS SQL. Any ideas how to increase the transfer speed? Thanks Tomas
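For context, a sketch of one way such a parallel pull can be structured; the bucketing scheme and all names here are illustrative, not our exact setup (fnv_hash is Impala's built-in hash function):

```sql
-- Split the extract into disjoint hash buckets and run each statement from
-- a separate session, so several ODBC result streams are fetched at once.
-- The bucket count (4) is arbitrary.
SELECT * INTO dbo.extract_part0
FROM OPENQUERY(IMPALA, 'SELECT * FROM big_table WHERE abs(fnv_hash(item_id)) % 4 = 0');

SELECT * INTO dbo.extract_part1
FROM OPENQUERY(IMPALA, 'SELECT * FROM big_table WHERE abs(fnv_hash(item_id)) % 4 = 1');

-- ...and likewise for buckets 2 and 3, each on its own connection.
```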
02-19-2015
03:44 AM
I have a simple Pig program with a simple LOAD and STORE/DUMP statement, but it refuses to load the test data file. The path in HDFS is /user/dwh/ and the file is called test.txt. I assume Pig is not aware of the HA setting of my cluster. Any ideas?

Input path does not exist: hdfs://nameservice1/user/dwh/test.txt
02-19-2015
03:41 AM
1 Kudo
I found piggybank.jar in /opt/cloudera/parcels/CDH/lib/pig/. The problem was in fact that when I called REGISTER piggybank.jar, the grunt shell gave me this exception:

grunt> REGISTER piggybank.jar
2015-02-19 12:38:49,841 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-02-19 12:38:49,849 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 101: file 'piggybank.jar' does not exist.

After changing the working directory to the lib path, the REGISTER worked well. Alternatively, use the absolute path:

REGISTER /opt/cloudera/parcels/CDH/lib/pig/piggybank.jar

Tomas
02-19-2015
03:20 AM
Hey guys, does Cloudera pack the Piggybank UDFs into CDH? I tried to find anything called piggybank in the distribution, but was not successful. Can somebody advise me how to add the Piggybank UDFs to an existing Pig installation in CDH? https://cwiki.apache.org/confluence/display/PIG/PiggyBank Thanks. Tomas
Tags: Pig
02-13-2015
03:37 AM
1 Kudo
Hi, I tried to open (LOAD) a Parquet file (a table created by Impala) in Pig. The table has several integer columns, string columns (changed to chararray), and a timestamp column. The problem is in reading the timestamp column; the error we get is this:

parsing: Error during parsing. can't convert optional int96 charging_start_time
Failed to parse: can't convert optional int96 charging_start_time
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:198)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1676)
at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1409)
at org.apache.pig.PigServer.parseAndBuild(PigServer.java:342)
at org.apache.pig.PigServer.executeBatch(PigServer.java:367)
at org.apache.pig.PigServer.executeBatch(PigServer.java:353)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:769)

I tried to change the definition in the LOAD statement to chararray, int, int96, and float, but nothing helped. Any ideas how to overcome this problem?
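One possible workaround, sketched below and not verified on this exact table: let Impala materialize a copy with the timestamp cast to STRING, and point the Pig LOAD at the copy, so Pig never sees the int96 column. The table name and the extra column are illustrative; charging_start_time is the column from the error:

```sql
-- Impala SQL: a STRING column written to Parquet is readable by Pig as
-- chararray, unlike Impala's INT96 timestamp encoding.
CREATE TABLE my_table_for_pig STORED AS PARQUET AS
SELECT
  item_id,  -- example integer column
  CAST(charging_start_time AS STRING) AS charging_start_time
FROM my_table;
```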
01-23-2015
12:55 PM
The only way I can recommend is to unzip the file, add the column, gzip it again, and upload it. Roughly, for one file:

gunzip pagecounts-20090430-230000.gz
awk '{ print $0",pagecounts-20090430-230000.gz" }' pagecounts-20090430-230000 > output
gzip output
hdfs dfs -put output.gz <target directory>

And of course, you have to script it to run for every file.
01-23-2015
12:47 PM
5 Kudos
This warning message keeps appearing in the log during query execution. Even after the INVALIDATE METADATA statement is done, the warning does not disappear. I tried INVALIDATE METADATA xxxx as well as INVALIDATE METADATA (without specifying the table name). It didn't help. Any thoughts?

Backend 1: Block locality metadata for table 'xxxx' may be stale. Consider running "INVALIDATE METADATA xxxx"
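(For completeness, the lighter-weight relative of INVALIDATE METADATA, in case it behaves differently here; a sketch with 'xxxx' standing for the table name, as in the warning. REFRESH reloads the file and block-location metadata for a single table, which is exactly the metadata the warning calls stale.)

```sql
-- Reload file/block metadata for one table; cheaper than a full
-- INVALIDATE METADATA, which discards all cached metadata for the table.
REFRESH xxxx;
```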
01-15-2015
07:36 AM
We have Impala 2.1 with ODBC connector 2.5.20. This connection is linked to MS SQL Server. The problem is that queries with larger result sets (such as thousands of rows) run extremely slowly. Queries such as select count(*) from table run fine, because the query is executed in Impala and then only the one-row result is returned and fetched by MS SQL Server.
Is there any setting or configuration property in the ODBC connector to improve the performance for queries returning larger datasets?
Thanks
Tomas
01-08-2015
02:17 AM
For locking down a Hadoop cluster, the usual recommendation is to set up Sentry and turn on firewalls (remember, during installation and setup the firewalls should be turned off; this is a recommendation from Cloudera). But when it comes to the actual lockdown, meaning the firewall configuration, it is really hard to figure out which communications (protocol/port) should and should not be allowed between the nodes. It would be nice to have a new feature in Cloudera Manager with a wizard for defining the iptables rules. I think we are not alone with this request/suggestion. Tomas
01-08-2015
12:44 AM
Hi Visahl, no idea what the problem would be; in my case the Hive connector simply did not read the metadata correctly. I am sorry, but I cannot help you. Good luck, T
01-07-2015
01:16 AM
This issue (reading large gzip-compressed tables in Impala) was, based on my experience, solved in the Impala 2.1 release (CDH 5.3.1). Cloudera did not confirm this as a bug: when I arranged a conference call with Cloudera support and they tried to investigate where the problem was, they were not able to define the root cause of it. I assume this change helped solve the problem (from the Impala 2.1.0 release notes):

The memory requirement for querying gzip-compressed text is reduced. Now Impala decompresses the data as it is read, rather than reading the entire gzipped file and decompressing it in memory.

But this is not confirmed; after the upgrade, Impala simply did not crash anymore. T
01-07-2015
01:12 AM
More interestingly, this difference disappeared after upgrading to CDH 5.3.1. T.
01-06-2015
06:21 AM
Does anybody have experience with how to process XML data (imported from MSSQL) and how to store and analyze it in Impala? Thanks Tomas
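One possible approach, as a hedged sketch (all table and column names below are hypothetical): land the XML as a plain string column, flatten it with Hive's built-in xpath UDFs into a Parquet table, and query the flat table from Impala, since Impala itself has no XML functions:

```sql
-- Hive SQL: extract fields from an XML string column with the built-in
-- xpath UDFs and write a flat Parquet table that Impala can query directly.
CREATE TABLE orders_flat STORED AS PARQUET AS
SELECT
  xpath_string(xml_doc, '/order/id')     AS order_id,
  xpath_string(xml_doc, '/order/amount') AS amount
FROM raw_xml_orders;
```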
12-10-2014
04:59 AM
During my tests I came to one (maybe incorrect) conclusion. The table is big and partitioned, and maybe Impala just limits the query to a subset of the table. Because if I change the query like this:

create table result as
select * from tmp_ext_item
where item_id in ( 3040607, 5645020, 69772482, 2030547, 1753459, 9972822, 1846553, 6098104, 1874789, 1834370, 1829598, 1779239, 7932306 )

then it runs correctly and returns all items with the specified item_id.
12-08-2014
01:39 PM
I solved the issue with from_utc_timestamp(Create_Time, 'CEST'). Impala assumes that the timestamp value is stored in UTC, so converting to Central European Time with summer daylight saving produces the correct result. As far as I know, there is no way to tell Impala that the current timezone is CEST, so this conversion has to be made in every query.
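Spelled out as a query (a sketch; the table name is hypothetical, the column and timezone are as above):

```sql
-- Impala stores TIMESTAMP values as UTC, so shift them to Central European
-- Summer Time at read time; this must be repeated in every query, since
-- there is no session-level timezone setting (as far as I know).
SELECT from_utc_timestamp(Create_Time, 'CEST') AS create_time_local
FROM my_table;
```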
12-08-2014
04:51 AM
Hi, I am running a simple query where the WHERE condition is a column IN ( ... ) condition and the list contains 13 elements (numbers). The column is of type int. Every time I run the query I get a different result: sometimes 5 rows, sometimes 2 rows, sometimes 10 rows. Of course I checked, ID by ID, that all the elements are in the table... is this a known bug, or am I missing something?

select * from tmp_ext_item where item_id in ( 3040607, 5645020, 69772482, 2030547, 1753459, 9972822, 1846553, 6098104, 1874789, 1834370, 1829598, 1779239, 7932306 )

T.
11-28-2014
02:47 AM
1 Kudo
Hi, some external tables created by Sqoop are not readable in Impala. Even though the current version (2.0) supports the gzip format, accessing (meaning selecting from) these external tables causes a crash of several, sometimes all, Impala daemons. In the ERROR log of the impalad there are only errors related to "Connection refused" and "Cancelled due to unreachable impalad(s)". The cluster automatically recovers from this state, and after a while the impalad instances are up and running again, but the query still does not work. The interesting thing is that this behaviour occurs only on external tables loaded by one specific user. I also tried to set the access permissions on that user's directory to +rwx, but it didn't help. Can anybody help with this, please?
11-26-2014
03:25 PM
1 Kudo
I just installed the latest version of the Hive ODBC driver (2.5.12) and created a linked server in Microsoft SQL Server 2014. I tried a simple select * from table, or even one integer column such as select ID from table, but the queries failed with this:

OLE DB provider "MSDASQL" for linked server "CLOUDERA-HIVE" returned message "[Cloudera][HiveODBC] (35) Error from Hive: error code: '0' error message: 'ExecuteStatement finished with operation state: ERROR_STATE'.".
Msg 7320, Level 16, State 2, Line 4
Cannot execute the query "SELECT "Tbl1002"."row_id" "Col1004" FROM "HIVE"."default"."test_table" "Tbl1002"" against OLE DB provider "MSDASQL" for linked server "CLOUDERA-HIVE".

Any ideas? Thanks, Tomas

PS. I tried to enable Fast SQL Prepare, but it didn't help. I tried to enable Use Async Exec; that didn't help either. Native queries don't work on MS SQL.
11-26-2014
02:17 PM
I have the same issue: the same query returns different dates. In Impala the date is one hour less than in Hive. The table was created in Hive and loaded with data via insert overwrite table in Hive (the table is partitioned). For example, the timestamp 2014-11-18 00:30:00 (the 18th of November) was correctly written to partition 20141118. But when I fetch the table in Impala with the condition day_id (the partition column) = 20141118, I see the value 2014-11-17 23:30:00, so the difference is one hour. If I query the minimum and maximum start_time from one partition of the table in Impala (partition day_id = 20141118), I get this wrong result:

min( start_time ) = 2014-11-17 23:00
max( start_time ) = 2014-11-18 22:59

When I run the same query in Hive, the result is OK:

min( start_time ) = 2014-11-18 00:00
max( start_time ) = 2014-11-18 23:59

Any help?
11-20-2014
08:16 AM
Works great! Simply setting the --class-name overrides the name of the jar file. Thanks!
11-19-2014
12:06 PM
Have you changed anything in the directory or file permissions in /var/run? If yes, you should probably reconfigure YARN to use a NEW directory (for example, if YARN used /data/yarn/nm for the NodeManager, configure a new path such as /data/yarn/nm2). After changing EVERY directory for YARN and restarting the cluster, YARN started, created the new directories, and set the permissions correctly, so now we don't have this kind of permission problem. If you didn't change any permissions in the local file system, then I don't know what the issue is. Try another user: run a Hive job under root/hdfs/yarn or another user, to see whether this is user-related or whether it always fails. T.
11-19-2014
11:50 AM
Hi guys, has anybody tried to rename the output of the sqoop import command? It is always named QueryResult.jar. When we run multiple sqoop import commands in parallel, the YARN applications view in Cloudera Manager does not distinguish between them; every command is named QueryResult.jar. The sqoop import commands look like this:

sqoop import --connect jdbc:sqlserver://OUR.SQL.SERVER.COM --username XXX --query 'select * from XXXXZZZ where Start_Time >= getdate()-7 and $CONDITIONS' -m 6 --split-by Start_Time --as-textfile --fields-terminated-by '\t' --delete-target-dir -z --target-dir /user/sql2hadoop/ext_xxxxzzzz

sqoop import --connect jdbc:sqlserver://OUR.SQL.SERVER.COM --username XXX --query 'select * from XXXXYYY where Start_Time >= getdate()-7 and $CONDITIONS' -m 6 --split-by Start_Time --as-textfile --fields-terminated-by '\t' --delete-target-dir -z --target-dir /user/sql2hadoop/ext_xxxxyyyyyy

I would like to see in YARN that, for example, there are two applications running: Import_XXXZZZ.jar and Import_XXXXYYY.jar. Is there any parameter for setting the application name? Thanks
11-12-2014
07:51 AM
The ownership was created during the launch of the application master; even after setting 777 on the parent directory, the problem did not disappear. We had to reinstall the whole cluster from scratch 😞