Member since: 09-04-2018
Posts: 9
Kudos Received: 1
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3955 | 03-11-2019 08:10 PM
 | 3583 | 12-09-2018 10:37 PM
03-11-2019
08:10 PM
1 Kudo
Sorry, I resolved it myself. By taking the backup with the --clean option, the restore succeeded.
backup
[root@localhost ~]$ pg_dump -w -h localhost -p 7432 -U scm --clean > /tmp/scm_server_db_backup.$(date +%Y%m%d)
restore
[root@localhost ~]$ psql -w -h localhost -p 7432 -U scm -f /tmp/scm_server_db_backup.$(date +%Y%m%d)
The above restore was successful.
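For reference, the --clean option makes pg_dump emit DROP statements before each object is recreated, which is why restoring over a non-empty database no longer hits the "already exists" errors. Roughly, the dump then contains lines like the following (illustrative only, not the actual dump contents):
-- each object is dropped and then recreated
DROP TABLE audits;
CREATE TABLE audits ( ... );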
03-11-2019
02:08 AM
Hi. I am using the embedded PostgreSQL for Cloudera Manager. I backed it up as described in the document below.
https://www.cloudera.com/documentation/enterprise/5-16-x/topics/cm_ag_backup_dbs.html#cmig_topic_5_6_3
pg_dump -h hostname -p 7432 -U scm > /tmp/scm_server_db_backup.$(date +%Y%m%d)
But trying to restore it fails.
[root@localhost ~]$ psql -w -h localhost -p 7432 -U scm -f /tmp/scm_server_db_backup.$(date +%Y%m%d)
↓
psql:scm_server_db_backup:54: ERROR: relation "audits" already exists
ALTER TABLE
psql:scm_server_db_backup:77: ERROR: relation "client_configs" already exists
ALTER TABLE
psql:scm_server_db_backup:89: ERROR: relation "client_configs_to_hosts" already exists
ALTER TABLE
.
.
.
psql:scm_server_db_backup:1036: ERROR: duplicate key value violates unique constraint "audits_pkey"
DETAIL: Key (audit_id)=(1) already exists.
CONTEXT: COPY audits, line 1
psql:scm_server_db_backup:1056: ERROR: duplicate key value violates unique constraint "client_configs_pkey"
DETAIL: Key (client_config_id)=(1) already exists.
CONTEXT: COPY client_configs, line 1
psql:scm_server_db_backup:1093: ERROR: duplicate key value violates unique constraint "cluster_activated_releases_pkey"
DETAIL: Key (cluster_id, release_id)=(1, 1) already exists.
.
.
.
psql:scm_server_db_backup:30181: ERROR: multiple primary keys for table "audits" are not allowed
psql:scm_server_db_backup:30189: ERROR: multiple primary keys for table "client_configs" are not allowed
psql:scm_server_db_backup:30197: ERROR: multiple primary keys for table "cluster_activated_releases_aud" are not allowed
.
.
.
For the time being, I tried to run the restore after dropping all of the databases.
drop_all.sql
drop database "postgres";
drop database "scm";
drop database "amon";
drop database "rman";
drop database "nav";
drop database "navms"; But it gets an error. [root@localhost ~]$ psql -U cloudera-scm -p 7432 -h localhost -d postgres -f drop_all.sql
Password for user cloudera-scm:
↓
psql:drop_all.sql:1: ERROR: cannot drop the currently open database
psql:drop_all.sql:2: ERROR: database "scm" is being accessed by other users
DETAIL: There are 11 other sessions using the database.
I tried killing the processes connected to the scm database, but they come right back after being killed. How can I restore the embedded PostgreSQL?
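For reference, one common way to clear such sessions from psql is pg_terminate_backend. This is only a sketch; it assumes the connecting services (the Cloudera Manager Server and the monitoring roles) are stopped first, otherwise they reconnect immediately, and it must be run as a user with superuser rights on the embedded instance:
-- terminate every other session connected to the scm database
-- (on older PostgreSQL versions the column is procpid instead of pid)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'scm'
  AND pid <> pg_backend_pid();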
Labels:
- Cloudera Manager
03-05-2019
06:24 PM
Thank you for answering. So "cursor.fetchall()" includes the HDFS scan time. On the other hand, the bottleneck is not the HDFS scan itself but the client or the network. I checked the issue below, but I read that problem as occurring only when a batch size smaller than the default is specified.
https://issues.apache.org/jira/browse/IMPALA-1618
It is questionable whether it can also occur when just using "cursor.fetchall()". I have found an issue that describes the same thing.
https://github.com/cloudera/impyla/issues/239
Wes McKinney says it is a problem in hs2client. So I understand that there is no solution for now.... Thanks!
03-05-2019
01:12 AM
Hi, I have a program that uses Impyla to retrieve data from the local Impala daemon.
cursor.execute("select * from table;")
rows = cursor.fetchall()
The table has 5 million rows and 9 columns; converted to CSV it is about 200 MB. There are four data nodes, each with 32 GB of memory. Even with only that much data, fetchall() takes over 200 seconds, while the query execution itself finishes in 0.2 seconds. Why is it so slow? Do you have any ideas for speeding it up? Thanks!
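In case it helps anyone hitting the same thing, a minimal sketch of pulling the result in chunks with fetchmany() instead of materializing everything with fetchall(); the host, port, and batch size are just placeholders, and this mainly bounds client-side memory rather than guaranteeing a speedup:
from impala.dbapi import connect

conn = connect(host='localhost', port=21050)  # placeholder connection settings
cur = conn.cursor()
cur.execute("select * from table")
while True:
    batch = cur.fetchmany(size=10000)  # fetch rows in chunks
    if not batch:
        break
    # process the batch here instead of holding every row in memory
conn.close()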
Labels:
- Apache Impala
12-09-2018
10:37 PM
The problem was solved. I had granted privileges with the "hdfs dfs -setfacl" command, but I needed to grant them with a "GRANT ON URI" statement from Impala.
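For reference, a sketch of that grant from impala-shell; the role name here is just a placeholder and must be a role already assigned to the user's group, and the URI must match the LOCATION used in the CREATE TABLE statement:
[hostname.example.com:21000] > GRANT ALL ON URI 'hdfs://hostname.example.com:8020/user/hive/warehouse/test_db.db/test_table2' TO ROLE test_role;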
12-02-2018
09:37 PM
Hi, I enabled Sentry for Impala along with synchronization to HDFS. It mostly works correctly, but executing "CREATE TABLE" with a LOCATION clause from Impala causes an unexpected privilege error, even though the user has permission on the specified location.
CDH settings:
hadoop.security.group.mapping: ShellBasedUnixGroupsMapping
hadoop.security.authentication: simple
hive.sentry.provider: HadoopGroupResourceAuthorizationProvider
Authentication is disabled for hdfs/hive/impala. Details are described below.
1. The first case is when no location is specified for the table directory.
[root@hostname ~]# su test_user1
[test_user1@hostname ~]$ impala-shell
[hostname.example.com:21000] > CREATE EXTERNAL TABLE `test_db`.`test_table1` (`a` int , `b` int , `c` int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TextFile TBLPROPERTIES("skip.header.line.count" = "1");
Fetched 0 row(s) in 0.50s
This worked. Check the permissions on the created Impala table directory.
[root@hostname ~]# hdfs dfs -getfacl /user/hive/warehouse/test_db.db/test_table1
# file: /user/hive/warehouse/test_db.db/test_table1
# owner: hive
# group: hive
user:hive:rwx
user:test_user1:rwx
group:hive:rwx
group:test_group1:rwx
mask::rwx
other::--x
Full permissions are given to "test_user1".
2. The next case is when a location is specified for the table directory.
[root@hostname ~]# hdfs dfs -getfacl /user/hive/warehouse/test_db.db/test_table2
# file: /user/hive/warehouse/test_db.db/test_table2
# owner: hive
# group: hive
user:hive:rwx
user:test_user2:rwx
group:hive:rwx
group:test_group2:rwx
mask::rwx
other::--x
Full permissions are given to "test_user2".
[root@hostname ~]# su test_user2
[test_user2@hostname ~]$ impala-shell
[hostname.example.com:21000] > CREATE EXTERNAL TABLE `test_db`.`test_table2` (`a` int , `b` int , `c` int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TextFile LOCATION '/user/hive/warehouse/test_db.db/test_table2' TBLPROPERTIES("skip.header.line.count" = "1");
ERROR: AuthorizationException: User 'test_user2' does not have privileges to access: hdfs://hostname.example.com:8020/user/hive/warehouse/test_db.db/test_table2
This did not work. Why? By the way, the same user can write with the hdfs command without problems.
[root@hostname ~]# su test_user2
[test_user2@hostname ~]$ hdfs dfs -put test.csv /user/hive/warehouse/test_db.db/test_table2/ => success
One clue is a difference in the Impala daemon log.
1. The first case is when no location is specified for the table directory.
I1130 19:00:53.146760 3080 impala-hs2-server.cc:418] ExecuteStatement(): request=TExecuteStatementReq {
01: sessionHandle (struct) = TSessionHandle {
01: sessionId (struct) = THandleIdentifier {
01: guid (string) = ">\xfa\xb2|/\xe3J\xde\x978>\xfb\xf9\xc9k\x13",
02: secret (string) = "p\"a\xee\xd4\xc4G\x1d\x9aOV\xbe6\x17\xa6\x8b",
},
},
02: statement (string) = "CREATE EXTERNAL TABLE `test_db`.`test_table1` (`a` int , `b` int , `c` int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TextFile TBLPROPERTIES(\"skip.header.line.count\" = \"1\")",
03: confOverlay (map) = map<string,string>[2] {
"QUERY_TIMEOUT_S" -> "600",
"impala.resultset.cache.size" -> "100000",
},
04: runAsync (bool) = true,
}
.
.
2. The next case is when a location is specified for the table directory.
I1130 19:08:29.901100 18617 impala-beeswax-server.cc:52] query(): query=CREATE EXTERNAL TABLE `test_db`.`test_table2` (`a` int , `b` int , `c` int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TextFile LOCATION '/user/hive/warehouse/test_db.db/test_table2' TBLPROPERTIES("skip.header.line.count" = "1")
I1130 19:08:29.901142 18617 impala-beeswax-server.cc:426] query: Query {
01: query (string) = "CREATE EXTERNAL [...](259)",
03: configuration (list) = list<string>[0] {
},
04: hadoop_user (string) = "test_user2",
}
.
.
When no location is specified, the query is executed through the ExecuteStatement() method of impala-hs2-server.cc, but when a location is specified it is executed through the query() method of impala-beeswax-server.cc. Do you know what is wrong? Is this a bug? Thank you in advance. uma66.
Labels:
- Apache Impala
- Apache Sentry
- HDFS
09-04-2018
09:46 PM
Hello, I am building a REST API server that relays queries to Impala. The REST API receives a keytab file from the client server, and I want to proxy Kerberos authentication on the API side. The sequence is as follows:
[Client Server] -- send keytab --> [REST API] --> ODBC or JDBC --> [Impala]
To realize this, I think it is necessary to dynamically authenticate the ODBC connection using the keytab received on the REST API side. Is such a thing possible? For example, the HDFS Java API can take an arbitrary keytab, like this:
UserGroupInformation.loginUserFromKeytab("hdfs@CLOUDERA", "/etc/hadoop/conf/hdfs.keytab");
However, the Impala ODBC/JDBC documentation shows a static file (UPNKeytabMappingFile) being prepared in advance that defines pairs of user principals and keytab files:
{
"cloudera": {
"principal" : "cloudera@CLOUDERA",
"keytab": "/tmp/cloudera.keytab"
},
Is there a way to authenticate with a keytab received from the client without predefining it? Thank you in advance.
Labels:
- Apache Impala