Member since: 01-07-2016
Posts: 26
Kudos Received: 8
Solutions: 0
02-22-2019
12:27 AM
Hi, I have not followed the development of Impala lately. If this is still a limitation, you might try the following approach: design the schema with an additional column that records which struct column holds information for a particular row, and then use this additional column in the WHERE clause. Something like:

name     complex1 complex2 complex3
complex1 content  NULL     NULL
complex3 NULL     NULL     content

and then:

SELECT complex1.*
FROM myTable
WHERE name = 'complex1'

A fuller sketch follows below. Br, Petter
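For illustration, a minimal sketch of such a schema and query (table, column, and struct field names are hypothetical):

-- One indicator column plus a struct column per payload type;
-- only the struct named by `name` is populated for a given row.
CREATE TABLE myTable (
  name STRING,
  complex1 STRUCT<a: STRING, b: INT>,
  complex2 STRUCT<c: STRING>,
  complex3 STRUCT<d: BIGINT>
)
STORED AS PARQUET;

-- The WHERE clause on the indicator column stands in for the
-- unsupported `complex1 IS NOT NULL` test.
SELECT complex1.*
FROM myTable
WHERE name = 'complex1';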
10-09-2018
02:58 AM
Hi all, we have our cluster deployed on AWS EC2 instances where some of the worker nodes are on spot instances. Usually there is no problem when spot instances disappear; we have time to decommission them from CM. Recently we have started to experience a ResourceManager crash in connection with losing spot instances, see the log below. After the ResourceManager crashes it does not restart automatically, and after a while all of our remaining NodeManager processes are shut down as well, leaving no YARN capacity at all even though we have plenty of healthy machines. We are using CDH 5.14.2.
1. Is the problem in the stack trace below known ("Timer already cancelled")?
2. Can we change the configuration to have the ResourceManager recover from this automatically?
I only see an automatic restart option for the JobHistory Server in CM, but perhaps this is the same process? Br, Petter

2018-10-08 16:14:45,617 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, FSPreemptionThread, that exited unexpectedly: java.lang.IllegalStateException: Timer already cancelled.
at java.util.Timer.sched(Timer.java:397)
at java.util.Timer.schedule(Timer.java:193)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.preemptContainers(FSPreemptionThread.java:212)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.run(FSPreemptionThread.java:77)
2018-10-08 16:14:45,623 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Shutting down the resource manager.
2018-10-08 16:14:45,624 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2018-10-08 16:14:45,629 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@ip-10-255-4-86.eu-west-1.compute.internal:8088
2018-10-08 16:14:45,731 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032
2018-10-08 16:14:45,732 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8032
2018-10-08 16:14:45,732 INFO org.apache.hadoop.ipc.Server: Stopping server on 8033
2018-10-08 16:14:45,732 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2018-10-08 16:14:45,732 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8033
2018-10-08 16:14:45,733 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2018-10-08 16:14:48,250 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at ip-10-255-4-86.eu-west-1.compute.internal/10.255.4.86:8033
2018-10-08 16:14:49,643 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-255-4-86.eu-west-1.compute.internal/10.255.4.86:8033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-10-08 16:14:50,644 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-255-4-86.eu-west-1.compute.internal/10.255.4.86:8033. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-10-08 16:14:51,647 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-255-4-
Labels:
- Cloudera Manager
- YARN
12-12-2017
07:10 AM
Hi, great! It solved my problem! For other users in the future: we upgraded a 5.10.1 cluster (without Kudu) to a 5.12.1 cluster (with Kudu). The missing part was the configuration option 'Kudu Service', which was set to none in the Impala Service-Wide configuration. Setting this to Kudu inserts the impalad startup option -kudu_master_hosts; after that I can create tables without the TBLPROPERTIES clause, and Sentry now works as expected. A sketch of the resulting DDL is below. Thank you very much, Hao!
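For future reference, with the 'Kudu Service' option set, the DDL now looks roughly like this (table and column names are illustrative):

-- No kudu.master_addresses entry needed: impalad is started with
-- -kudu_master_hosts by Cloudera Manager.
CREATE TABLE my_db.my_table
(
  key BIGINT,
  value STRING,
  PRIMARY KEY (key)
)
PARTITION BY RANGE (key)
(
  PARTITION 1 <= VALUES < 1000
)
STORED AS KUDU;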
12-11-2017
12:27 AM
Hi, >Would you mind sharing the query how you create a new table? Did you happen to set kudu master addresses in TBLPROPERTIES clause? I did use the TBLPROPERTIES clause. I read somewhere that it should not be needed when running in a CM environment, but in our case we have to specify it. I see now that CM has not added the --tserver_master_addrs flag to the gflagfile. See below for a simplified CREATE TABLE statement.

CREATE TABLE my_db.my_table
(
  key BIGINT,
  value STRING,
  PRIMARY KEY (key)
)
PARTITION BY RANGE (key)
(
  PARTITION 1 <= VALUES < 1000
)
STORED AS KUDU
TBLPROPERTIES ('kudu.master_addresses'='my-master-address');

Are you saying that it will work (with Sentry) if we add the --tserver_master_addrs to the tservers and remove the TBLPROPERTIES clause? Br, Petter
12-08-2017
05:50 AM
Hi, thank you for your reply! >Sorry, I missed that you are using external Kudu tables in the previous reply. They are in fact internal tables; I do not use the EXTERNAL keyword when creating the tables. The only way I can let one user group (ROLE in Sentry) create their own Kudu tables (via Impala) is to grant the ALL privilege at the server level. This has the side effect that this user group will enjoy access to all data on the cluster, which is not desired. Granting ALL at the (Impala) db level does not help. Have I missed something? Will finer-grained access control arrive in the future? (The grant we currently rely on is sketched below.) Br, Petter
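To be explicit, the only grant that works for us today is at the server level (role name is illustrative; server1 stands in for the Sentry server name configured in our setup):

-- Grants the role the right to create Kudu tables, but also
-- access to every database on the cluster.
GRANT ALL ON SERVER server1 TO ROLE my_role;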
12-06-2017
04:53 AM
Hi, we have a Sentry role that has action=ALL on db=my_db. When trying to issue a CREATE TABLE statement in Impala to create a Kudu table in my_db, we get the following error: I1205 12:32:21.124711 47537 jni-util.cc:176] org.apache.impala.catalog.AuthorizationException: User 'my_user' does not have privileges to access: name_of_sentry_server A work-around is to set action=ALL at the server level for the Sentry role, but we don't want to give such wide permissions to the role. Do we need to set action=ALL at the server level in order to delegate the right to create Kudu tables to our users, or how should we set up Sentry in this case? We use CDH 5.12.1 (Kudu 1.4.0). Our grants are sketched below. Br, Petter
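To be concrete, our current setup corresponds to something like this (role, group, and database names are illustrative):

-- Database-level grant: sufficient for regular Impala tables, but
-- CREATE TABLE ... STORED AS KUDU still fails with the
-- AuthorizationException above.
CREATE ROLE my_role;
GRANT ROLE my_role TO GROUP my_group;
GRANT ALL ON DATABASE my_db TO ROLE my_role;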
08-23-2017
05:28 AM
Hi, we are experiencing the same issue. We are on CDH 5.10.1. Our corresponding figures read:

Planner Timeline
Analysis finished: 63015903
Equivalence classes computed: 63148873
Single node plan created: 72171446
Runtime filters computed: 72242303
Distributed plan created: 72530789
Lineage info computed: 72627976
Planning finished: 74054390

Query Timeline
Query submitted: 0
Planning finished: 212302910792
Submit for admission: 212305910788
Completed admission: 212305910788
Ready to start 13 fragment instances: 212306910788
All 13 fragment instances started: 212314910786
Rows available: 216195909152
Cancelled: 223800905948
Unregister query: 223816905942

Br, Petter
06-07-2017
07:18 AM
1 Kudo
We are feeling the same pain here. In Cloudera Manager there is usually a "safety valve" for relevant configuration files, where you get the opportunity to tweak the configuration for each role. In the Spark2 section in Cloudera Manager there is no safety valve for hive-site.xml. BR, Petter
01-10-2017
11:34 AM
Hi Tim, thank you for taking the time to look at this issue! Br, Petter
01-10-2017
05:37 AM
1 Kudo
Hi all, I reported IMPALA-4725 last week but it seems like it has not been triaged yet. I wanted to bring some more attention to this issue (and possibly collect suggestions for workarounds) since it has a heavy impact on us. To summarize, it seems like Impala mixes up values in arrays of structs, which to me looks like a fundamental problem in the Parquet reader. Alternatively, the values get mixed up when presented as a result. Either way, I would very much appreciate an informed person's view on this issue. We are running the Impala version bundled with CDH 5.8.3; the general query shape is sketched below. Br, Petter
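For context, the queries that show the problem are of this general shape (table and field names are hypothetical, not the actual reproduction from the JIRA):

-- An array of structs is queried by joining the table with the
-- collection column; struct fields are then read via the alias.
SELECT t.id, a.name, a.version_code
FROM my_table t, t.added_items a;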
Tags:
- CDH 5.8.3
Labels:
- Impala
11-29-2016
07:25 AM
Hi all, Best Practices for Using Impala with S3 states "Set the safety valve fs.s3a.connection.maximum to 1500 for impalad." Can anyone clarify which safety valve field should be used, and with what syntax? I read somewhere that this setting belongs in core-site.xml, but the Impala configuration in Cloudera Manager does not seem to have a safety valve for core-site.xml. The instructions mention a safety valve for impalad, but that safety valve seems to be for command line arguments to impalad. The problem we are trying to address is

hdfsSeek(desiredPos=503890631): FSDataInputStream#seek error: com.cloudera.com.amazonaws.AmazonClientException: Unable to execute HTTP request: Timeout waiting for connection from pool

that we keep getting when using Impala to query data stored in S3. We are using CDH 5.8.3 Thanks, Petter
Labels:
- Cloudera Manager
- Impala
08-18-2016
11:51 PM
Hi, thank you very much for your reply! Just a follow-up question. Given a scenario where we target, say, 10 GB of data stored in gzipped Parquet in each partition. We have three nodes currently but that will increase soon. From an Impala performance perspective, which of the approaches below is best?
- Store the data in 40 Parquet files with file size = row group size = HDFS block size = 256 MB
- Store the data in 10 Parquet files with file size = row group size = HDFS block size = 1 GB
- Store the data in 10 Parquet files with file size 1 GB, row group size = HDFS block size = 256 MB
Thanks, Petter
08-12-2016
02:45 AM
I have described an issue with time-consuming Parquet file generation in the Hive forum. See this post for a description of the environment. The question is half Impala-related, so I would appreciate it if any Impala experts here could read that post as well. https://community.cloudera.com/t5/Batch-SQL-Apache-Hive/How-to-improve-performance-when-creating-Parquet-files-with-Hive/m-p/43804#U43804 I have some additional questions that are Impala-specific. The environment currently has three Impala nodes with 5-10 GB worth of data in each partition. The question is how I should generate the Parquet files to get the most performance out of Impala. Currently I target a Parquet file size of 1 GB each. The HDFS block size is set to 256 MB for these files and I have instructed Hive to create row groups of the same size. Surprisingly, I get many more row groups: I just picked a random file and it contained 91 row groups. Given our environment, what should we aim for in terms of file size, number of row groups per file, and HDFS block size for the files? Also, if it would be more beneficial to have fewer row groups in each file, how can we instruct Hive to generate fewer row groups, since Hive does not seem to respect the parquet.block.size option (sketched below)? We used the Impala version bundled with CDH 5.7.1 Thanks in advance, Petter
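For reference, the session settings I am referring to look roughly like this (table names are illustrative; the 256 MB value matches our target row group and block size):

-- Ask Parquet for 256 MB row groups and HDFS for matching blocks
-- before writing the files from Hive. As described above, our Hive
-- does not seem to fully respect parquet.block.size.
SET parquet.block.size=268435456;
SET dfs.blocksize=268435456;

INSERT OVERWRITE TABLE my_parquet_table
SELECT * FROM my_staging_table;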
02-24-2016
12:25 PM
Thank you for your prompt reply! I will hold off on my attempts to use this operation.
02-23-2016
05:16 AM
1 Kudo
It does not seem like the IS NULL / IS NOT NULL operator is supported for struct data types. We are using Impala 2.3.0/CDH 5.5.1. This seems like a basic and vital operator to have, especially when using wide tables. Is there anybody out there who has a patch or workaround, or who has actually succeeded in using this operator on structs? I have reported IMPALA-3060 on the topic; an example of the failing shape is below.
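To illustrate, a query of this shape is rejected by the analyzer (table and column names are hypothetical):

-- Fails in Impala 2.3.0: IS NOT NULL cannot be applied to a
-- STRUCT column.
SELECT name
FROM my_wide_table
WHERE my_struct_col IS NOT NULL;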
02-01-2016
03:22 AM
Thank you for your reply! I also noticed that if I check the option "Enable Kerberos Authentication for HTTP Web-Consoles" in the YARN configuration, I can make the kill button work. However, this enables Kerberos for web pages such as those of the History Server and Resource Manager, and we do not want Kerberos authentication on those pages. So, with the fix in CDH 5.5.3, the kill button will work without enabling the above option, I assume?
01-26-2016
01:31 AM
I'm using CDH 5.5.0 with Kerberos and Sentry enabled. Trying to kill a job from the Job Browser fails with the message "There was a problem communicating with the server: The default static user cannot carry out this operation. (error 403)". I can kill the same job using the yarn application -kill command. I guess this is a configuration issue. Could someone assist me in getting this right so that I can kill jobs from the Job Browser? Stack trace:

[26/Jan/2016 10:15:30 +0100] access WARNING 10.128.42.143 di23060584 - "POST /jobbrowser/jobs/application_1453476679853_0011/kill HTTP/1.1"
[26/Jan/2016 10:15:30 +0100] connectionpool INFO Resetting dropped connection: ip-10-255-2-7.eu-west-1.compute.internal
[26/Jan/2016 10:15:30 +0100] kerberos_ ERROR handle_mutual_auth(): Mutual authentication unavailable on 403 response
[26/Jan/2016 10:15:30 +0100] views ERROR Killing job
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hue/apps/jobbrowser/src/jobbrowser/views.py", line 246, in kill_job
job.kill()
File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hue/apps/jobbrowser/src/jobbrowser/yarn_models.py", line 185, in kill
return self.api.kill(self.id)
File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hue/desktop/libs/hadoop/src/hadoop/yarn/mapreduce_api.py", line 117, in kill
get_resource_manager(self._user).kill(app_id) # We need to call the RM
File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hue/desktop/libs/hadoop/src/hadoop/yarn/resource_manager_api.py", line 124, in kill
return self._execute(self._root.put, 'cluster/apps/%(app_id)s/state' % {'app_id': app_id}, params=params, data=json.dumps(data), contenttype=_JSON_CONTENT_TYPE)
File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hue/desktop/libs/hadoop/src/hadoop/yarn/resource_manager_api.py", line 141, in _execute
response = function(*args, **kwargs)
File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hue/desktop/core/src/desktop/lib/rest/resource.py", line 136, in put
return self.invoke("PUT", relpath, params, data, self._make_headers(contenttype))
File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hue/desktop/core/src/desktop/lib/rest/resource.py", line 78, in invoke
urlencode=self._urlencode)
File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hue/desktop/core/src/desktop/lib/rest/http_client.py", line 161, in execute
raise self._exc_class(ex)
RestException: The default static user cannot carry out this operation. (error 403)
[26/Jan/2016 10:15:30 +0100] middleware INFO Processing exception: The default static user cannot carry out this operation. (error 403): Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/handlers/base.py", line 112, in get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/db/transaction.py", line 371, in inner
return func(*args, **kwargs)
File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hue/apps/jobbrowser/src/jobbrowser/views.py", line 83, in decorate
return view_func(request, *args, **kwargs)
File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hue/apps/jobbrowser/src/jobbrowser/views.py", line 249, in kill_job
raise PopupException(e)
PopupException: The default static user cannot carry out this operation. (error 403)
01-07-2016
01:38 PM
Ah, more information for the team to work with then! Let's hope for a solution.
01-07-2016
10:34 AM
1 Kudo
Thank you Alex for your quick reply and confirmation! I've created IMPALA-2820 to track this issue.
01-07-2016
07:53 AM
I have been testing CDH 5.5.0 and have noted that Impala does not like reserved words as field names in complex types. This seems strange, as reserved words can be used as column names for ordinary columns, and Hive does not impose the same restriction; reserved words can be back-ticked where needed. Does anybody know if this is by design or if this is an issue in Impala 2.3.0? We are using Hive to create Parquet files with complex types. A sample to reproduce the issue, and the error message, are below. In this case the word 'replace' is reserved.

In Hive:

CREATE EXTERNAL TABLE MyTable (
device_id STRING,
added struct<name:string,version_name:string,version_code:int,`replace`:boolean>
)
STORED AS PARQUET
LOCATION '/tmp/impala/mytable';

In Hive:

INSERT OVERWRITE TABLE MyTable
SELECT
device_id,
payload AS added
FROM Added WHERE import_id = 106000;

In Impala:

SELECT * FROM MyTable LIMIT 10;

Output:

AnalysisException: Failed to load metadata for table: 'mytable' CAUSED BY: TableLoadingException: Unsupported type 'struct<name:string,version_name:string,version_code:int,replace:boolean>' in column 'added' of table 'mytable'
I0107 15:56:01.251721 21006 Frontend.java:818] analyze query SELECT * FROM MyTable limit 10
E0107 15:56:01.252320 21006 Analyzer.java:2212] Failed to load metadata for table: mytable
Unsupported type 'struct<name:string,version_name:string,version_code:int,replace:boolean>' in column 'added' of table 'mytable'
I0107 15:56:01.252908 21006 jni-util.cc:177] com.cloudera.impala.common.AnalysisException: Failed to load metadata for table: 'MyTable'
at com.cloudera.impala.analysis.TableRef.analyze(TableRef.java:180)
at com.cloudera.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:512)
at com.cloudera.impala.analysis.SelectStmt.analyze(SelectStmt.java:155)
at com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:342)
at com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:317)
at com.cloudera.impala.service.Frontend.analyzeStmt(Frontend.java:827)
at com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:856)
at com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:147)
Caused by: com.cloudera.impala.catalog.TableLoadingException: Unsupported type 'struct<name:string,version_name:string,version_code:int,replace:boolean>' in column 'added' of table 'mytable'
at com.cloudera.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:111)
at com.cloudera.impala.catalog.Table.fromThrift(Table.java:240)
at com.cloudera.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:357)
at com.cloudera.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:246)
at com.cloudera.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:132)
at com.cloudera.impala.service.Frontend.updateCatalogCache(Frontend.java:223)
at com.cloudera.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:164)
at ========.<Remote stack trace on catalogd>: com.cloudera.impala.catalog.TableLoadingException: Unsupported type 'struct<name:string,version_name:string,version_code:int,replace:boolean>' in column 'added' of table 'mytable'
at com.cloudera.impala.catalog.Table.parseColumnType(Table.java:331)
at com.cloudera.impala.catalog.HdfsTable.addColumnsFromFieldSchemas(HdfsTable.java:571)
at com.cloudera.impala.catalog.HdfsTable.load(HdfsTable.java:1073)
at com.cloudera.impala.catalog.TableLoader.load(TableLoader.java:84)
at com.cloudera.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:232)
at com.cloudera.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:229)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)