Member since: 09-04-2018
Posts: 9
Kudos Received: 1
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3955 | 03-11-2019 08:10 PM
 | 3583 | 12-09-2018 10:37 PM
03-11-2019
08:10 PM
1 Kudo
Sorry, I resolved it myself. By taking the backup with the --clean option, the restore succeeded.
backup
[root@localhost ~]$ pg_dump -w -h localhost -p 7432 -U scm --clean > /tmp/scm_server_db_backup.$(date +%Y%m%d)
restore
[root@localhost ~]$ psql -w -h localhost -p 7432 -U scm -f /tmp/scm_server_db_backup.$(date +%Y%m%d)
The above restore was successful.
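For reference, the --clean option makes pg_dump emit DROP statements before each object is recreated, which is why restoring over a non-empty database no longer hits the "already exists" errors. Roughly, the dump then contains lines like the following (illustrative only, not the actual dump contents):
-- each object is dropped and then recreated
DROP TABLE audits;
CREATE TABLE audits ( ... );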
03-11-2019
02:08 AM
Hi. I am using the embedded PostgreSQL for Cloudera Manager. I backed it up as described in the document below.
https://www.cloudera.com/documentation/enterprise/5-16-x/topics/cm_ag_backup_dbs.html#cmig_topic_5_6_3
pg_dump -h hostname -p 7432 -U scm > /tmp/scm_server_db_backup.$(date +%Y%m%d)
But trying to restore it fails.
[root@localhost ~]$ psql -w -h localhost -p 7432 -U scm -f /tmp/scm_server_db_backup.$(date +%Y%m%d)
↓
psql:scm_server_db_backup:54: ERROR: relation "audits" already exists
ALTER TABLE
psql:scm_server_db_backup:77: ERROR: relation "client_configs" already exists
ALTER TABLE
psql:scm_server_db_backup:89: ERROR: relation "client_configs_to_hosts" already exists
ALTER TABLE
.
.
.
psql:scm_server_db_backup:1036: ERROR: duplicate key value violates unique constraint "audits_pkey"
DETAIL: Key (audit_id)=(1) already exists.
CONTEXT: COPY audits, line 1
psql:scm_server_db_backup:1056: ERROR: duplicate key value violates unique constraint "client_configs_pkey"
DETAIL: Key (client_config_id)=(1) already exists.
CONTEXT: COPY client_configs, line 1
psql:scm_server_db_backup:1093: ERROR: duplicate key value violates unique constraint "cluster_activated_releases_pkey"
DETAIL: Key (cluster_id, release_id)=(1, 1) already exists.
.
.
.
psql:scm_server_db_backup:30181: ERROR: multiple primary keys for table "audits" are not allowed
psql:scm_server_db_backup:30189: ERROR: multiple primary keys for table "client_configs" are not allowed
psql:scm_server_db_backup:30197: ERROR: multiple primary keys for table "cluster_activated_releases_aud" are not allowed
.
.
.
For the time being, I tried to run the restore after dropping all of the databases.
drop_all.sql
drop database "postgres";
drop database "scm";
drop database "amon";
drop database "rman";
drop database "nav";
drop database "navms"; But it gets an error. [root@localhost ~]$ psql -U cloudera-scm -p 7432 -h localhost -d postgres -f drop_all.sql
Password for user cloudera-scm:
↓
psql:drop_all.sql:1: ERROR: cannot drop the currently open database
psql:drop_all.sql:2: ERROR: database "scm" is being accessed by other users
DETAIL: There are 11 other sessions using the database.
I tried killing the processes connected to the scm database, but they come right back after being killed. How can I restore the embedded PostgreSQL?
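For reference, one common way to clear such sessions from psql is pg_terminate_backend. This is only a sketch; it assumes the connecting services (the Cloudera Manager Server and the monitoring roles) are stopped first, otherwise they reconnect immediately, and it must be run as a user with superuser rights on the embedded instance:
-- terminate every other session connected to the scm database
-- (on older PostgreSQL versions the column is procpid instead of pid)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'scm'
  AND pid <> pg_backend_pid();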
Labels:
- Cloudera Manager
03-05-2019
06:24 PM
Thank you for answering. So "cursor.fetchall()" includes the HDFS scan time. On the other hand, the bottleneck is not the HDFS scan itself but the client or the network. I checked the issue below, but I read that problem as occurring only when a batch size smaller than the default is specified.
https://issues.apache.org/jira/browse/IMPALA-1618
It is questionable whether it can also occur when just using "cursor.fetchall()". I have found an issue that describes the same thing.
https://github.com/cloudera/impyla/issues/239
Wes McKinney says it is a problem in hs2client. So I understand that there is no solution for now.... Thanks!
03-05-2019
01:12 AM
Hi, I have a program that uses Impyla to retrieve data from the local Impala daemon.
cursor.execute("select * from table;")
rows = cursor.fetchall()
The table has 5 million rows and 9 columns; converted to CSV it is about 200 MB. There are four data nodes, each with 32 GB of memory. Even with only that much data, fetchall() takes over 200 seconds, while the query execution itself finishes in 0.2 seconds. Why is it so slow? Do you have any ideas for speeding it up? Thanks!
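In case it helps anyone hitting the same thing, a minimal sketch of pulling the result in chunks with fetchmany() instead of materializing everything with fetchall(); the host, port, and batch size are just placeholders, and this mainly bounds client-side memory rather than guaranteeing a speedup:
from impala.dbapi import connect

conn = connect(host='localhost', port=21050)  # placeholder connection settings
cur = conn.cursor()
cur.execute("select * from table")
while True:
    batch = cur.fetchmany(size=10000)  # fetch rows in chunks
    if not batch:
        break
    # process the batch here instead of holding every row in memory
conn.close()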
Labels:
- Apache Impala
12-09-2018
10:37 PM
The problem was solved. I had granted privileges with the "hdfs dfs -setfacl" command, but I needed to grant them with a "GRANT ON URI" statement from Impala.
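For reference, a sketch of that grant from impala-shell; the role name here is just a placeholder and must be a role already assigned to the user's group, and the URI must match the LOCATION used in the CREATE TABLE statement:
[hostname.example.com:21000] > GRANT ALL ON URI 'hdfs://hostname.example.com:8020/user/hive/warehouse/test_db.db/test_table2' TO ROLE test_role;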
12-02-2018
09:37 PM
Hi, I enabled Sentry for Impala along with synchronization to HDFS. It mostly works correctly, but executing "CREATE TABLE" with a LOCATION clause from Impala causes an unexpected privilege error, even though the user has permission on the specified location.
CDH settings:
hadoop.security.group.mapping: ShellBasedUnixGroupsMapping
hadoop.security.authentication: simple
hive.sentry.provider: HadoopGroupResourceAuthorizationProvider
Authentication is disabled for hdfs/hive/impala. Details are described below.
1. The first case is when no location is specified for the table directory.
[root@hostname ~]# su test_user1
[test_user1@hostname ~]$ impala-shell
[hostname.example.com:21000] > CREATE EXTERNAL TABLE `test_db`.`test_table1` (`a` int , `b` int , `c` int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TextFile TBLPROPERTIES("skip.header.line.count" = "1");
Fetched 0 row(s) in 0.50s
This worked. Check the permissions on the created Impala table directory.
[root@hostname ~]# hdfs dfs -getfacl /user/hive/warehouse/test_db.db/test_table1
# file: /user/hive/warehouse/test_db.db/test_table1
# owner: hive
# group: hive
user:hive:rwx
user:test_user1:rwx
group:hive:rwx
group:test_group1:rwx
mask::rwx
other::--x
Full permissions are given to "test_user1".
2. The next case is when a location is specified for the table directory.
[root@hostname ~]# hdfs dfs -getfacl /user/hive/warehouse/test_db.db/test_table2
# file: /user/hive/warehouse/test_db.db/test_table2
# owner: hive
# group: hive
user:hive:rwx
user:test_user2:rwx
group:hive:rwx
group:test_group2:rwx
mask::rwx
other::--x
Full permissions are given to "test_user2".
[root@hostname ~]# su test_user2
[test_user2@hostname ~]$ impala-shell
[hostname.example.com:21000] > CREATE EXTERNAL TABLE `test_db`.`test_table2` (`a` int , `b` int , `c` int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TextFile LOCATION '/user/hive/warehouse/test_db.db/test_table2' TBLPROPERTIES("skip.header.line.count" = "1");
ERROR: AuthorizationException: User 'test_user2' does not have privileges to access: hdfs://hostname.example.com:8020/user/hive/warehouse/test_db.db/test_table2
This did not work. Why? By the way, the same user can write with the hdfs command without problems.
[root@hostname ~]# su test_user2
[test_user2@hostname ~]$ hdfs dfs -put test.csv /user/hive/warehouse/test_db.db/test_table2/ => success
One clue is a difference in the Impala daemon log.
1. The first case is when no location is specified for the table directory.
I1130 19:00:53.146760 3080 impala-hs2-server.cc:418] ExecuteStatement(): request=TExecuteStatementReq {
01: sessionHandle (struct) = TSessionHandle {
01: sessionId (struct) = THandleIdentifier {
01: guid (string) = ">\xfa\xb2|/\xe3J\xde\x978>\xfb\xf9\xc9k\x13",
02: secret (string) = "p\"a\xee\xd4\xc4G\x1d\x9aOV\xbe6\x17\xa6\x8b",
},
},
02: statement (string) = "CREATE EXTERNAL TABLE `test_db`.`test_table1` (`a` int , `b` int , `c` int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TextFile TBLPROPERTIES(\"skip.header.line.count\" = \"1\")",
03: confOverlay (map) = map<string,string>[2] {
"QUERY_TIMEOUT_S" -> "600",
"impala.resultset.cache.size" -> "100000",
},
04: runAsync (bool) = true,
}
.
.
2. The next case is when a location is specified for the table directory.
I1130 19:08:29.901100 18617 impala-beeswax-server.cc:52] query(): query=CREATE EXTERNAL TABLE `test_db`.`test_table2` (`a` int , `b` int , `c` int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TextFile LOCATION '/user/hive/warehouse/test_db.db/test_table2' TBLPROPERTIES("skip.header.line.count" = "1")
I1130 19:08:29.901142 18617 impala-beeswax-server.cc:426] query: Query {
01: query (string) = "CREATE EXTERNAL [...](259)",
03: configuration (list) = list<string>[0] {
},
04: hadoop_user (string) = "test_user2",
}
.
.
When no location is specified, the query is executed through the ExecuteStatement() method of impala-hs2-server.cc, but when a location is specified it is executed through the query() method of impala-beeswax-server.cc. Do you know what is wrong? Is this a bug? Thank you in advance. uma66.
Labels:
- Apache Impala
- Apache Sentry
- HDFS
09-04-2018
09:46 PM
Hello, I am building a REST API server that relays queries to Impala. The REST API receives a keytab file from the client server, and I want to proxy Kerberos authentication on the API side. The sequence is as follows:
[Client Server] -- send keytab --> [REST API] --> ODBC or JDBC --> [Impala]
To realize this, I think it is necessary to dynamically authenticate the ODBC connection using the keytab received on the REST API side. Is such a thing possible? For example, the HDFS Java API can take an arbitrary keytab, like this:
UserGroupInformation.loginUserFromKeytab("hdfs@CLOUDERA", "/etc/hadoop/conf/hdfs.keytab");
However, the Impala ODBC/JDBC documentation shows a static file (UPNKeytabMappingFile) being prepared in advance that defines pairs of user principals and keytab files:
{
"cloudera": {
"principal" : "cloudera@CLOUDERA",
"keytab": "/tmp/cloudera.keytab"
},
Is there a way to authenticate with a keytab received from the client without predefining it? Thank you in advance.
Labels:
- Apache Impala