Member since: 04-04-2016
Posts: 166
Kudos Received: 168
Solutions: 29
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2897 | 01-04-2018 01:37 PM |
|  | 4902 | 08-01-2017 05:06 PM |
|  | 1568 | 07-26-2017 01:04 AM |
|  | 8917 | 07-21-2017 08:59 PM |
|  | 2607 | 07-20-2017 08:59 PM |
07-18-2017
08:29 PM
@Aveek Choudhury Two things to check: 1. Is NiFi running on port 8080? 2. Do you have any firewall or security-group rules on your EC2 instance that block outside access?
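A quick TCP connect test from outside the instance tells you whether the port is reachable at all (a Python sketch; the host and port below are placeholders, not your actual instance):

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (hypothetical EC2 public DNS):
# port_is_open("ec2-1-2-3-4.us-west-2.compute.amazonaws.com", 8080)
```

If this returns False from your laptop but True from the instance itself, the security group or firewall is the likely culprit.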
07-18-2017
04:24 PM
1 Kudo
@Vijaya Narayana Reddy Bhoomi Reddy You have to export and import the key as well. Just creating a key with the same name does not make it the same key; that is why you are seeing gibberish values. I wrote an article that automates this task with a script. Just change the cluster inside the script, and the directory locations (if any), to make it work: https://community.hortonworks.com/content/kbentry/110144/hdfs-encrypted-zone-intra-cluster-transfer-automat-1.html
07-17-2017
04:58 PM
No problem @John Bowler. We are here to help. Happy hadooping!!
07-17-2017
03:18 PM
2 Kudos
@Kulasangar Gowrisangar You need to save a Sqoop job for the incremental import to run automatically (a saved job stores the last imported value for you between runs). You can follow this article: https://dwbi.org/etl/bigdata/195-sqoop-merge-incremental-extraction-from-oracle
07-17-2017
02:54 PM
@Laurent lau You can adjust the HDFS balancer speed to suit your needs; refer to this documentation: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_hdfs-administration/content/configuring_balancer.html If you are worried about filling disk space during the rack maintenance operation, you can configure the balancer to be very slow, so that over your 48-hour window virtually nothing is replicated. Or, if the situation permits, you can put the NameNode in safe mode; this allows read operations but no writes. This statement is correct: "if the replication factor is equal to the number of racks, there is no guarantee that there will be a replica spread in each rack." The placement policy only guarantees that not all replicas will be on the same rack.
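The arithmetic behind picking a throttle for a maintenance window is simple: data moved per DataNode is bounded by the per-node balancer bandwidth times the window length. A small sketch (the 1 MB/s figure is only an illustrative setting, not a recommendation for your cluster):

```python
def bytes_moved(bandwidth_bytes_per_sec: int, window_hours: float) -> int:
    """Upper bound on the data one DataNode can move at a given balancer throttle."""
    return int(bandwidth_bytes_per_sec * window_hours * 3600)

# At a 1 MB/s throttle over a 48-hour window, each DataNode moves
# at most about 169 GiB -- easy to keep well below your free space.
limit = bytes_moved(1024 * 1024, 48)
```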
07-14-2017
08:54 PM
@John Bowler You cannot use int as a Pig relation name, since it is a keyword. Use anything else, such as int1 or myint. That is exactly what your error is about; change the relation name and you should be good. Thanks
07-14-2017
07:33 PM
Hi @Pooja Kamle Can you check and post the permissions on these two files:
- /usr/hdp/current/ranger-usersync/conf/ugsync.jceks
- /usr/hdp/current/ranger-usersync/conf/.ugsync.jceks.crc

Also check the permissions of the same two files on the other Ranger instance that is working, and verify that they match.
07-14-2017
05:33 PM
2 Kudos
Steps to replicate:

```
hdfs dfs -ls /apps/hive/warehouse/testraj.db/testtable/filename=test.csv.gz
Found 1 items
-rw-rw-rw-   1 hive hive      38258 2017-06-27 21:04 /apps/hive/warehouse/testraj.db/testtable/filename=test.csv.gz/000000_0
```

Using `hive -f` with this script:

```
cat /tmp/test.txt
ALTER TABLE testraj.testtable PARTITION (filename="test.csv.gz") SET LOCATION "hdfs://ip-1-1-1-1.us-west-2.compute.internal:8020/apps/hive/warehouse/testraj.db/testtable/filename=test.csv.gz";
```

Error from `hive -f scriptname`:

```
[hive@ip-1-1-1-1 rbiswas]$ hive -f /tmp/test.txt
Logging initialized using configuration in file:/etc/hive/2.5.3.0-37/0/hive-log4j.properties
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter partition. alter is not possible
```

Error from Beeline:

```
0: jdbc:hive2://ip-1-1-1-1.us-west-2.com> ALTER TABLE testraj.testtable PARTITION (filename="test.csv.gz") SET LOCATION 'hdfs://ip-1-1-1-1.us-west-2.compute.internal:8020/apps/hive/warehouse/testraj.db/testtable/filename=test.csv.gz';
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter partition. alter is not possible (state=08S01,code=1)
```

It does work when logged directly into the Hive CLI:

```
hive> ALTER TABLE testraj.testtable PARTITION (filename="test.csv.gz") SET LOCATION "hdfs://ip-1-1-1-1.us-west-2.compute.internal:8020/apps/hive/warehouse/testraj.db/testtable/filename=test.csv.gz";
OK
Time taken: 0.605 seconds
```

Solution: in the script, rather than using schema_name.tablename, use two separate statements:

```
use dbname;
alter table tablename ...;  -- note: no schema-name prefix
```

The same solution applies to Beeline. So the script becomes:

```
cat /tmp/test.txt
use testraj;
ALTER TABLE testtable PARTITION (filename="test.csv.gz") SET LOCATION "hdfs://ip-1-1-1-1.us-west-2.compute.internal:8020/apps/hive/warehouse/testraj.db/testtable/filename=test.csv.gz";
```
07-14-2017
04:13 PM
@ravi nandyala You cannot do that for ORC tables, to my understanding. You have to convert the empty strings to NULL during the insert; there are several ways to do it, for example CASE or NULLIF. Check this thread as well: https://stackoverflow.com/questions/38872500/serialization-null-format-for-hive-orc-table
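For reference, SQL's NULLIF(a, b) returns NULL when a equals b and a otherwise; the transformation in miniature (a Python sketch of the semantics, not Hive itself):

```python
def nullif(value, sentinel=""):
    """Mimic SQL NULLIF: None when value equals the sentinel, else the value."""
    return None if value == sentinel else value

rows = ["", "foo", "", "bar"]
cleaned = [nullif(v) for v in rows]  # empty strings become None
```

In the Hive insert this is what `NULLIF(col, '')` (or the equivalent CASE expression) does per column before the row is written to ORC.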
07-14-2017
12:49 AM
@Laurent lau Let's take it one by one:
1. One big advantage of 4 replicas can be faster jobs in situations where big jobs are fired simultaneously.
2. Rack or no rack, if data is lost and the replica count falls below the configured replication factor, HDFS will re-replicate to bring it back to the original factor.
3. All the replicas will never be on the same rack, unless that is the only rack alive.

My suggestion for best performance and availability: use a minimum of 3 racks per data center. I wrote an article some time ago that may also help clarify some of your doubts: https://community.hortonworks.com/content/kbentry/43057/rack-awareness-1.html Thanks
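The "never all on one rack" behavior falls out of the default block placement policy: first replica on the writer's node, second replica on a different rack, and the remaining replicas alongside the second. A deliberately simplified model of that policy (rack names here are just labels, and real HDFS also considers node load and randomness):

```python
def place_replicas(racks, writer_rack, replicas=3):
    """Simplified HDFS default placement: replica 1 on the writer's rack,
    replica 2 on a different rack, remaining replicas share replica 2's rack.
    Falls back to the writer's rack when no other rack is available."""
    placement = [writer_rack]
    others = [r for r in racks if r != writer_rack]
    remote = others[0] if others else writer_rack
    placement += [remote] * (replicas - 1)
    return placement

# With two or more racks alive, three replicas always span exactly two racks;
# with a single rack alive, everything collapses onto that rack.
```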