Member since 04-04-2016 | 166 Posts | 168 Kudos Received | 29 Solutions
07-18-2017 08:29 PM
@Aveek Choudhury Two things to check:
1. Is NiFi running on port 8080?
2. Do you have any firewall rules for your EC2 instance that are blocking outside access?
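Both checks can be done from a shell. The following is a minimal sketch, assuming bash and GNU coreutils `timeout` are available; `port_open` is a hypothetical helper name, not part of NiFi.

```shell
# Hypothetical helper: succeeds if something accepts TCP connections on host:port.
port_open() {
  host="$1"
  port="$2"
  # /dev/tcp is a bash feature, so bash is invoked explicitly;
  # timeout (GNU coreutils) keeps a filtered port from hanging the check.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Check 1, run on the EC2 instance itself: is anything listening on 8080?
if port_open 127.0.0.1 8080; then
  echo "Something is listening on port 8080 locally."
else
  echo "Nothing is listening on port 8080 locally; check nifi.web.http.port in nifi.properties."
fi
```

For check 2, run `port_open <public-hostname> 8080` from a machine outside AWS. If the local check succeeds but the remote one fails, a firewall rule (for example the security group's inbound rules) is the likely cause.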
07-18-2017 04:24 PM
1 Kudo
@Vijaya Narayana Reddy Bhoomi Reddy You have to export and import the key as well. Creating a key with the same name does not make it the same key, which is why you are seeing gibberish values. I wrote an article on automating this task that includes a link to the automation script. You only need to change the cluster names inside the script and the directory locations (if any) to make it work: https://community.hortonworks.com/content/kbentry/110144/hdfs-encrypted-zone-intra-cluster-transfer-automat-1.html
07-17-2017 04:58 PM
No problem @John Bowler. We are here to help. Happy Hadooping!
07-17-2017 03:18 PM
2 Kudos
@Kulasangar Gowrisangar You need to create a saved Sqoop job for this to happen automatically. You can follow this article: https://dwbi.org/etl/bigdata/195-sqoop-merge-incremental-extraction-from-oracle
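As a rough illustration of what a saved job looks like, here is a sketch that prints the two commands involved. It assumes an append-mode incremental import, which may differ from the merge approach in the linked article. Every name, path, and the JDBC URL below is a hypothetical placeholder. When an incremental import runs from a saved job, Sqoop retains the last imported value in the job, so the next run continues from there.

```shell
# All names and paths below are hypothetical placeholders; replace them.
JOB_NAME="orders_incremental"
JDBC_URL="jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL"
DB_USER="etl_user"
PASSWORD_FILE="/user/etl/.db.password"
TABLE="ORDERS"
CHECK_COLUMN="ORDER_ID"
TARGET_DIR="/data/raw/orders"

# Print the one-time command that defines the saved job.
# Note the space between "--" and "import".
create_job_cmd() {
  printf '%s\n' "sqoop job --create ${JOB_NAME} -- import --connect ${JDBC_URL} --username ${DB_USER} --password-file ${PASSWORD_FILE} --table ${TABLE} --target-dir ${TARGET_DIR} --incremental append --check-column ${CHECK_COLUMN} --last-value 0"
}

# Print the command to run for every load (for example from cron).
exec_job_cmd() {
  printf '%s\n' "sqoop job --exec ${JOB_NAME}"
}

create_job_cmd
exec_job_cmd
```

The sketch prints the commands instead of running them so they can be reviewed first. `sqoop job --list` and `sqoop job --show <name>` can be used to inspect what was saved.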
07-17-2017 02:54 PM
@Laurent lau You can adjust the HDFS rebalancing speed to suit your needs. Refer to this documentation: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_hdfs-administration/content/configuring_balancer.html
If you are worried about filling up disk space during the rack maintenance operation, you can configure the balancer to be very slow, so that over the 48-hour window virtually nothing will be replicated. Or, if the situation permits, you can put the NameNode in safe mode, which allows read operations but no writes.
This is correct: "if the number of replica factor is equal to the number of racks, there is no guarantee that there will be a replica spread in each rack." The policy only ensures that the replicas will not all be on the same rack.
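The two options above map to `hdfs dfsadmin` subcommands. This sketch only prints the commands, since entering safe mode on a live cluster should be a deliberate step. The 1 MB/s figure is an assumed example value, not a recommendation, and `-setBalancerBandwidth` limits balancer traffic specifically.

```shell
# Assumed example value: 1 MB/s per DataNode; choose your own.
BANDWIDTH_MB=1
BANDWIDTH_BYTES=$((BANDWIDTH_MB * 1024 * 1024))

# -setBalancerBandwidth takes bytes per second.
throttle_cmd() {
  echo "hdfs dfsadmin -setBalancerBandwidth ${BANDWIDTH_BYTES}"
}

# -safemode accepts enter, leave, or get.
safemode_cmd() {
  echo "hdfs dfsadmin -safemode $1"
}

throttle_cmd
safemode_cmd enter
safemode_cmd get
safemode_cmd leave
```

Run the printed commands as the HDFS superuser, and remember to leave safe mode once the maintenance window is over.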
07-14-2017 08:54 PM
@John Bowler You cannot use int as a Pig relation name because it is a keyword. Use something else, such as int1 or myint. That is exactly what your error is about. Change the relation name and you should be good. Thanks
07-14-2017 07:33 PM
Hi @Pooja Kamle, can you check and post the permissions on these two files:
/usr/hdp/current/ranger-usersync/conf/ugsync.jceks
/usr/hdp/current/ranger-usersync/conf/.ugsync.jceks.crc
Also check the permissions of these two files on the other Ranger instance that is working, and verify that they are the same.
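A small sketch for collecting that output on each host so the two can be compared side by side; `show_perms` is a hypothetical helper name.

```shell
# Hypothetical helper: list owner, group, and mode for each file,
# or report the file as missing.
show_perms() {
  for f in "$@"; do
    if [ -e "$f" ]; then
      ls -l "$f"
    else
      echo "missing: $f"
    fi
  done
}

show_perms \
  /usr/hdp/current/ranger-usersync/conf/ugsync.jceks \
  /usr/hdp/current/ranger-usersync/conf/.ugsync.jceks.crc
```

Run it on both the failing and the working Ranger host and compare the owner, group, and mode columns.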
07-14-2017 05:33 PM
2 Kudos
Steps to replicate:

hdfs dfs -ls /apps/hive/warehouse/testraj.db/testtable/filename=test.csv.gz
Found 1 items
-rw-rw-rw- 1 hive hive 38258 2017-06-27 21:04 /apps/hive/warehouse/testraj.db/testtable/filename=test.csv.gz/000000_0

Using hive -f script:

cat /tmp/test.txt
ALTER TABLE testraj.testtable PARTITION (filename="test.csv.gz") SET LOCATION "hdfs://ip-1-1-1-1.us-west-2.compute.internal:8020/apps/hive/warehouse/testraj.db/testtable/filename=test.csv.gz";

Error from hive -f scriptname:

[hive@ip-1-1-1-1 rbiswas]$ hive -f /tmp/test.txt
Logging initialized using configuration in file:/etc/hive/2.5.3.0-37/0/hive-log4j.properties
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter partition. alter is not possible
[hive@ip-1-1-1-1 rbiswas]$

Error from Beeline:

0: jdbc:hive2://ip-1-1-1-1.us-west-2.com> ALTER TABLE testraj.testtable PARTITION (filename="test.csv.gz") SET LOCATION 'hdfs://ip-1-1-1-1.us-west-2.compute.internal:8020/apps/hive/warehouse/testraj.db/testtable/filename=test.csv.gz';
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter partition. alter is not possible (state=08S01,code=1)
0: jdbc:hive2://ip-1-1-1-1.us-west-2.com>

It does work when logged directly into the Hive CLI:

hive> ALTER TABLE testraj.testtable PARTITION (filename="test.csv.gz") SET LOCATION "hdfs://ip-1-1-1-1.us-west-2.compute.internal:8020/apps/hive/warehouse/testraj.db/testtable/filename=test.csv.gz";
OK
Time taken: 0.605 seconds

Solution: In the script, rather than using schema_name.tablename, use two separate lines:

use dbname;
alter table tablename; --Note: no schema name prefix

The same solution applies to Beeline. So the script becomes:

cat /tmp/test.txt
use testraj;
ALTER TABLE testtable PARTITION (filename="test.csv.gz") SET LOCATION "hdfs://ip-1-1-1-1.us-west-2.compute.internal:8020/apps/hive/warehouse/testraj.db/testtable/filename=test.csv.gz";
07-14-2017 04:13 PM
@ravi nandyala As far as I understand, you cannot do that for ORC tables. You have to convert the empty strings to NULL during the insert. There are many ways to do it, for example with CASE or NULLIF. Also check this thread: https://stackoverflow.com/questions/38872500/serialization-null-format-for-hive-orc-table
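One way this could look, sketched with the CASE form because it does not depend on the Hive version. The tables `staging_text` and `target_orc` and their columns are hypothetical. The script writes the HiveQL to a file and prints the command to run it, so nothing touches the cluster until you have reviewed it.

```shell
HQL_FILE="${TMPDIR:-/tmp}/empty_to_null.hql"

# Write the HiveQL to a file; the quoted 'EOF' keeps the shell from
# expanding anything inside the statement.
cat > "$HQL_FILE" <<'EOF'
-- Hypothetical tables: staging_text (source) and target_orc (ORC table).
INSERT INTO TABLE target_orc
SELECT
  id,
  CASE WHEN name = '' THEN NULL ELSE name END AS name
FROM staging_text;
EOF

echo "Review ${HQL_FILE}, then run: hive -f ${HQL_FILE}"
```

Each string column that may hold empty values needs its own CASE expression in the SELECT list.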
07-14-2017 12:49 AM
@Laurent lau Let's take it one by one:
1. One big advantage of 4 replicas may be faster jobs when big jobs are fired simultaneously.
2. Rack or no rack, if data is lost and the replication factor falls below the specified level, HDFS will try to re-replicate and bring it back to the original replication factor.
3. All the replicas will never be on the same rack unless that is the only rack alive.
My suggestion: for best performance and availability, use a minimum of 3 racks per data center. I wrote an article some time ago that might also help clarify some of your doubts: https://community.hortonworks.com/content/kbentry/43057/rack-awareness-1.html Thanks