Member since: 04-16-2019
Posts: 373
Kudos Received: 7
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 16782 | 10-16-2018 11:27 AM |
| | 5222 | 09-29-2018 06:59 AM |
| | 681 | 07-17-2018 08:44 AM |
| | 4068 | 04-18-2018 08:59 AM |
06-06-2019
08:16 AM
In Ambari, the HBase average load is high: about 360 regions per RegionServer. Which HBase parameters should I tune, in general, so that the average load drops below 250 regions per RegionServer?
Labels:
- Apache Ambari
- Apache HBase
05-30-2019
06:29 PM
I have more than 350 regions per RegionServer, and as a result Ambari has raised an alert. One option is to increase the region size and thereby reduce the number of regions, but I am not sure how performance behaves after increasing the region size. What are the optimal ways to reduce the number of regions per RegionServer?
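A rough way to reason about the trade-off, as a back-of-the-envelope sketch rather than a sizing rule: a table's region count is roughly its on-disk size divided by hbase.hregion.max.filesize, so doubling the max file size roughly halves the number of regions. All numbers below are made-up placeholders.

    # Back-of-the-envelope estimate; every number below is a made-up placeholder.
    total_store_size_gb = 36000      # on-disk size of all tables across the cluster
    region_servers = 10
    region_max_filesize_gb = 10      # hbase.hregion.max.filesize (10 GB is a common default)

    regions = total_store_size_gb / region_max_filesize_gb
    print("regions per RegionServer now:", regions / region_servers)          # ~360 here

    # Doubling hbase.hregion.max.filesize roughly halves the region count, at the
    # cost of larger regions (bigger compactions, slower splits and region moves).
    regions_after = total_store_size_gb / (2 * region_max_filesize_gb)
    print("regions per RegionServer after doubling:", regions_after / region_servers)  # ~180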
Labels:
- Apache Ambari
- Apache HBase
05-16-2019
07:19 AM
I do not see the Spark service when I click on Actions and Add Service. Does this mean that installing Spark from Ambari is not supported on HDP 2.2 with Ambari 1.7, or is there another way to do it? Regards Anurag
04-16-2019
11:09 PM
Kafka leader not available: I have added one more broker (the cluster previously had 3 brokers), and now the producer is throwing an error:
WARN clients.NetworkClient: Error while fetching metadata with correlation id 5493 : {amoll=LEADER_NOT_AVAILABLE}
I have checked advertised.host.name, and I see a Kafka broker default group and a kafka broker1 group there.
Kind Regards
Amol
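A quick way to see which partitions of the topic are actually missing a leader, sketched below with the shipped kafka-topics.sh tool wrapped in Python; the ZooKeeper address is a placeholder and the topic name amoll is taken from the error above.

    import subprocess

    # Describe the topic to see, for each partition, which broker is the leader and
    # which replicas are in sync; a partition reported with no leader explains
    # LEADER_NOT_AVAILABLE on the producer side. The ZooKeeper host is a placeholder.
    subprocess.run([
        "/usr/hdp/current/kafka-broker/bin/kafka-topics.sh",
        "--describe",
        "--topic", "amoll",
        "--zookeeper", "<ZOOKEEPER_HOST>:2181",
    ], check=True)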
Labels:
- Apache Kafka
04-16-2019
07:08 AM
I want to set up Kerberos with AD/LDAP and have a few questions. I have already enabled Kerberos with an MIT KDC; enabling Kerberos with an MIT KDC requires installing the KDC server, utilities, libraries, etc. If I want to enable Kerberos with AD, what is the procedure? My understanding so far:
1. I need the LDAP URL.
2. I need the KDC host details (where the Kerberos server is installed). Is it important to keep the AD server and the Kerberos server separate?
If I want the Kerberos server on one host and AD on a different server, what is the procedure to integrate Kerberos with AD/LDAP? When enabling Kerberos from Ambari, I have always kept the KDC host and AD on the same server. Kind Regards Anurag
Labels:
- Apache Ambari
04-12-2019
11:55 AM
How does Sentry differ from Ranger? What can we not achieve with Sentry that is achievable with Ranger, and vice versa?
Labels:
- Apache Ranger
- Apache Sentry
04-11-2019
06:10 PM
@Josh Elser Thanks for your reply. Can you please explain the memory consumption in a little more detail: are you referring to something in terms of znodes? Kind Regards
04-11-2019
05:41 AM
I am getting an error about exceeding 1500 connections; I see the ZooKeeper maxClientCnxns value is set to 1500. I want to understand the downsides of setting this value to zero, which allows an unlimited number of connections to ZooKeeper. Kind Regards Anurag
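Before deciding between raising the limit and setting it to 0, it can help to see which clients are holding the connections. A minimal sketch using ZooKeeper's four-letter admin commands over a raw socket; the host name is a placeholder.

    import socket

    def zk_four_letter(host, port, cmd):
        # Send a ZooKeeper "four letter word" command (e.g. "cons", "stat") and return the reply.
        with socket.create_connection((host, port), timeout=5) as sock:
            sock.sendall(cmd.encode())
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks).decode()

    # "cons" lists every open client connection; counting entries per client IP shows
    # which hosts are eating into the limit (often a single misbehaving client that
    # never closes its sessions rather than genuinely high legitimate load).
    print(zk_four_letter("<ZOOKEEPER_HOST>", 2181, "cons"))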
Labels:
- Apache HBase
04-02-2019
06:05 AM
I want to load data into a dynamically partitioned Hive table using PySpark. The table is already created in Hive; only the data load has to be done with PySpark. I am using the code below for this requirement but need more suggestions:
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .enableHiveSupport() \
    .getOrCreate()
# spark is an existing SparkSession
spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
1. The INPATH should be dynamic: how do I pass it to spark.sql()? Kind Regards Anurag
Labels:
- Apache Hive
- Apache Spark
02-20-2019
06:25 AM
I am trying to fetch data stored in a Hive ORC table. A query with a WHERE clause returns results quickly, but the same table with a LIMIT clause takes far too long:
select * from tablename where year="2019" and month="jan-mar"; -- runs perfectly within 30-40 seconds
select * from tablename limit 10; -- gets stuck
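One way to narrow this down is to compare the plans of the two statements with EXPLAIN, which shows whether the LIMIT query is planned as a simple fetch task or as a job that must enumerate every partition and file of the ORC table. A sketch wrapping beeline; the JDBC URL and table name are placeholders.

    import subprocess

    # Run EXPLAIN for both statements and compare the plans; the JDBC URL below is a placeholder.
    jdbc_url = "jdbc:hive2://<hiveserver2-host>:10000/default"
    statements = [
        "explain select * from tablename where year='2019' and month='jan-mar'",
        "explain select * from tablename limit 10",
    ]
    for stmt in statements:
        subprocess.run(["beeline", "-u", jdbc_url, "-e", stmt + ";"], check=True)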
Labels:
- Apache Hadoop
- Apache Hive
11-19-2018
05:05 PM
What are the limitations of Hive in its latest version, given that Hive has also added ACID support? In which cases is Hive still not suitable?
Labels:
- Apache Hive
11-15-2018
03:46 PM
I am getting the error below when I run the query:
at sqlline.SqlLine.main(SqlLine.java:292)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.OutOfMemoryError: unable to create new native thread
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:327)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:302)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:167)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:162)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:794)
at org.apache.phoenix.schema.stats.StatisticsUtil.readStatistics(StatisticsUtil.java:160)
at org.apache.phoenix.query.TableStatsCache$StatsLoader.load(TableStatsCache.java:92)
at org.apache.phoenix.query.TableStatsCache$StatsLoader.load(TableStatsCache.java:83)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3589)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2374)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2337)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2252)
... 21 more
Caused by: java.lang.RuntimeException: java.lang.OutOfMemoryError: unable to create new native thread
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208)
at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:211)
at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:185)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1259)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1165)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:156)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
... 33 more
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:717)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
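This particular OutOfMemoryError usually means the operating system refused to start another thread, not that the Java heap is exhausted. A small Linux-only diagnostic sketch that checks one common culprit, the per-user process/thread limit (ulimit -u), for the account running the client:

    import resource

    # "unable to create new native thread" is typically an OS-level limit
    # (ulimit -u / max user processes) or exhausted native memory, not Java heap.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    print("max user processes/threads: soft=%s hard=%s" % (soft, hard))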
Labels:
- Apache HBase
- Apache Phoenix
10-30-2018
02:36 PM
I have HDP 2.6.5 installed in the cluster, but when I try to run a Teradata import Sqoop job it fails with the error below:
18/10/30 15:16:15 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
18/10/30 15:16:15 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
18/10/30 15:16:15 ERROR sqoop.ConnFactory: Sqoop could not found specified connection manager class o.apache.sqoop.teradata.TeradataConnManager. Please check that you've specified the class correctly.
18/10/30 15:16:15 ERROR tool.BaseSqoopTool: Got error creating database manager: java.io.IOException: java.lang.ClassNotFoundException: o.apache.sqoop.teradata.TeradataConnManager
at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:166)
at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:266)
at org.apache.sqoop.tool.CreateHiveTableTool.run(CreateHiveTableTool.java:51)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
Caused by: java.lang.ClassNotFoundException: o.apache.sqoop.teradata.TeradataConnManager
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.jar
However, the Teradata jars are available under the Sqoop lib directory:
tdgssconfig.jar
teradata-connector-1.5.4-hadoop2.jar
terajdbc4.jar
hortonworks-teradata-connector-1.5.4.2.6.5.0-292.jar
jar tvf hortonworks-teradata-connector-1.5.4.2.6.5.0-292.jar | grep org.apache.sqoop.teradata.TeradataConnManager
org/apache/sqoop/teradata/TeradataConnManager.class
Labels:
- Hortonworks Data Platform (HDP)
10-23-2018
02:00 PM
I want to run a DistCp job copying a huge amount of data from a source cluster to a destination cluster. How can I increase the performance or speed of the DistCp job?
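A sketch of the two DistCp options that most often govern throughput, the number of map tasks and the per-map bandwidth cap; it is wrapped in Python only for illustration, and the paths and numbers are placeholders whose right values depend on the cluster and the network.

    import subprocess

    # Paths and numbers below are placeholders.
    cmd = [
        "hadoop", "distcp",
        "-m", "100",             # more map tasks -> more parallel copy streams
        "-bandwidth", "50",      # MB/s allowed per map; raise it if the network can take it
        "-strategy", "dynamic",  # lets faster maps steal work from slower ones
        "hdfs://source-nn:8020/data/src",
        "hdfs://dest-nn:8020/data/dst",
    ]
    subprocess.run(cmd, check=True)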
Labels:
- Apache Hadoop
10-16-2018
02:20 PM
I am trying to import from Postgres into Hive using Sqoop, but the table is not getting created in Hive; the data is, however, loaded into HDFS under the user directory:
sqoop import --connect jdbc:postgresql://<postgresserver>/demodb --username user1 -P --table company --hive-import --create-hive-table --hive-table temp_psql.company --delete-target-dir
The data gets loaded under /user/<userid>/tablename, but the table is not getting loaded into Hive. I have run the command with --verbose and can see that it tries to load as well as create the table. However, when I manually create and load the table, it works.
Labels:
- Apache Hive
- Apache Sqoop
10-16-2018
11:27 AM
I am now able to run import spark.implicits._ . Earlier I was using spark1; launching spark2 solved the problem.
10-11-2018
07:15 AM
I also get the error below:
scala> val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
<console>:30: error: not found: value SparkSession
val spark = SparkSession.builder().enableHiveSupport().getOrCreate(
10-11-2018
05:00 AM
I am using spark-shell and my OS is CentOS 6. When I try to import spark.implicits._, I get the error below:
<console>:30: error: not found: value spark
import spark.implicits._
^
Labels:
- Apache Spark
10-09-2018
01:18 PM
All the mentioned properties are set in the cluster. Ranger is also installed, but the plugin is not enabled for Kafka. I have granted permissions using the commands below:
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --add --group * --allow-principal User:* --operation All --authorizer-properties "zookeeper.connect=<ZOOKEEPER_HOST>:2181"
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --add --topic ATLAS_ENTITIES --allow-principal User:* --operation All --authorizer-properties "zookeeper.connect=<ZOOKEEPER_HOST>:2181"
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --add --topic ATLAS_HOOK --allow-principal User:* --operation All --authorizer-properties "zookeeper.connect=<ZOOKEEPER_HOST>:2181"
10-09-2018
10:41 AM
@Aditya Sirna yes Aditya!!! I had checked along the same lines; Atlas and Kafka are both up and running. I have also tried to run the Sqoop job with the Atlas hook disabled, but it still ends with the error "Failed to update metadata after 60000 ms".
10-08-2018
02:10 PM
I am trying to import data and create a table in Hive but am getting the error below:
sqoop import --connect jdbc:postgresql://<host>/iso --username <username> -P --table poc --hive-import --create-hive-table --hive-table hdp.poc --delete-target-dir -- --schema live;
18/10/08 15:31:51 INFO authenticator.AbstractLogin: Successfully logged in.
18/10/08 15:31:51 INFO kerberos.KerberosLogin: [Principal=null]: TGT refresh thread started.
18/10/08 15:31:51 INFO kerberos.KerberosLogin: [Principal=null]: TGT valid starting at: Mon Oct 08 15:30:53 CEST 2018
18/10/08 15:31:51 INFO kerberos.KerberosLogin: [Principal=null]: TGT expires: Tue Oct 09 01:30:53 CEST 2018
18/10/08 15:31:51 INFO kerberos.KerberosLogin: [Principal=null]: TGT refresh sleeping until: Mon Oct 08 23:40:14 CEST 2018
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'key.deserializer' was supplied but isn't a known config.
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'value.deserializer' was supplied but isn't a known config.
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'hook.group.id' was supplied but isn't a known config.
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'zookeeper.connection.timeout.ms' was supplied but isn't a known config.
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'zookeeper.session.timeout.ms' was supplied but isn't a known config.
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'enable.auto.commit' was supplied but isn't a known config.
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'zookeeper.connect' was supplied but isn't a known config.
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'zookeeper.sync.time.ms' was supplied but isn't a known config.
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'session.timeout.ms' was supplied but isn't a known config.
18/10/08 15:31:51 WARN producer.ProducerConfig: The configuration 'auto.offset.reset' was supplied but isn't a known config.
18/10/08 15:31:51 INFO utils.AppInfoParser: Kafka version : 1.0.0.2.6.5.0-292
18/10/08 15:31:51 INFO utils.AppInfoParser: Kafka commitId : 2ff1ddae17fb8503
18/10/08 15:31:51 INFO kafka.KafkaNotification: <== KafkaNotification.createProducer()
18/10/08 15:32:51 ERROR hook.AtlasHook: Failed to send notification - attempt #1; error=java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
18/10/08 15:33:52 ERROR hook.AtlasHook: Failed to send notification - attempt #2; error=java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
18/10/08 15:34:53 ERROR hook.FailedMessagesLogger:
However, the MapReduce job gets executed successfully and the data is loaded, but the table is not getting created in Hive:
18/10/08 15:31:21 INFO mapreduce.Job: Running job: job_1538735110847_0010
18/10/08 15:31:35 INFO mapreduce.Job: Job job_1538735110847_0010 running in uber mode : false
18/10/08 15:31:35 INFO mapreduce.Job: map 0% reduce 0%
18/10/08 15:31:45 INFO mapreduce.Job: map 25% reduce 0%
18/10/08 15:31:48 INFO mapreduce.Job: map 75% reduce 0%
18/10/08 15:31:49 INFO mapreduce.Job: map 100% reduce 0%
18/10/08 15:31:50 INFO mapreduce.Job: Job job_1538735110847_0010 completed successfully
18/10/08 15:31:50 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=736372
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=470
HDFS: Number of bytes written=546499
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=40826
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=40826
Total vcore-milliseconds taken by all map tasks=40826
Total megabyte-milliseconds taken by all map tasks=167223296
Map-Reduce Framework
Map input records=4269
Map output records=4269
Input split bytes=470
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=327
CPU time spent (ms)=7480
Physical memory (bytes) snapshot=1524961280
Virtual memory (bytes) snapshot=22446456832
Total committed heap usage (bytes)=2266497024
File Input Format Counters
Bytes Read=0
File Output Format Counters
Labels:
- Apache Hive
- Apache Sqoop
10-03-2018
06:09 AM
I am running a Spark job that fails with a "no space left on device" error, even though there is enough space available on the device. I have checked with the df -h and df -i commands and see no issue with space:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 656 in stage 11.0 failed 4 times, most recent failure: Lost task 656.3 in stage 11.0 (TID 680, I<workernode>): java.io.IOException: No space left on device
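It may be worth checking which scratch directories the executors actually spill shuffle data to, since those are often not the filesystems inspected with df on the gateway host. A minimal sketch; the /tmp default shown is only what Spark falls back to when nothing is configured, and on YARN the NodeManager's yarn.nodemanager.local-dirs usually takes precedence.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Shuffle spill and temporary blocks land in the executors' local directories, so
    # "No space left on device" usually refers to these paths on the worker nodes
    # rather than to HDFS or to the node where df was run.
    conf = spark.sparkContext.getConf()
    print("spark.local.dir =", conf.get("spark.local.dir", "/tmp"))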
Labels:
- Apache Spark
- Apache YARN
10-01-2018
02:26 PM
Geoffrey Shelton Okot, thanks for the reply!!! But I was interested in knowing what password Ambari uses for the service accounts like hdfs, hbase, etc. Providing the admin password allows Ambari to generate keytabs for the service users, but internally it must be using some password at the service level.
10-01-2018
12:09 PM
When we Kerberize a cluster from Ambari, keytabs are generated automatically for the users; we do not provide any password, but Ambari does. I want to know how Ambari does this. For example, if I have a user for whom I want to generate a keytab, I do the following:
kadmin.local: addprinc user1@TEST.COM
WARNING: no policy specified for user1@TEST.COM; defaulting to no policy
Enter password for principal "user1@TEST.COM":    // here we are providing the password
Re-enter password for principal "user1@TEST.COM":
Principal "user1@TEST.COM" created.
When Ambari does the same for a service user like hdfs, what password does it set, and how does it do it? Is there some script on the server that enables this?
Labels:
- Apache Ambari
09-29-2018
07:01 AM
Please follow the link below: http://hbase.apache.org/0.94/book/ops_mgt.html#copytable
09-29-2018
06:59 AM
1 Kudo
Dhiraj, there are many methods to achieve this, such as CopyTable, the import/export utility, and snapshots. I would prefer the snapshot method, but it only works if HBase is the same version in both clusters. If the two clusters run different HBase versions, you can use the CopyTable method.
Snapshot method:
Step 1: Go to the HBase shell and take a snapshot of the table
>hbase shell
>snapshot "SOURCE_TABLE_NAME","SNAPSHOT_TABLE_NAME"
Step 2: Export that snapshot to the other cluster
>bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot SNAPSHOT_TABLE_NAME -copy-to hdfs://DESTINATION_CLUSTER_ACTIVE_NAMENODE_ADDRESS:8020/hbase -mappers 16
Step 3: Restore the table on the destination cluster:
>hbase shell
>disable "DEST_TABLENAME"
>restore_snapshot "SNAPSHOT_TABLE_NAME"
09-29-2018
06:40 AM
I want to create a user who can access the HDP services in a Kerberized cluster without creating a keytab for this user. Is there any way to do so? Is there a way to bypass Kerberos security?
09-24-2018
06:50 AM
I am trying to run the distcp command on the secure cluster. My purpose is to move HDFS files from an insecure cluster to a secure cluster, but I am getting errors:
hadoop distcp -Dipc.client.fallback-to-simple-auth-allowed=true hdfs://<in-secure-hdfsnamenode>:8020/distsecure/f1.txt hdfs://<securenamenode>:8020/distdest/
java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1537527981132_0012 to YARN : Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: <insecurenamenode>:8020, Ident: (HDFS_DELEGATION_TOKEN token 0 for hdfs)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:317)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:193)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:155)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:128)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:462)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1537527981132_0012 to YARN : Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: <insecurehdfs>:8020, Ident: (HDFS_DELEGATION_TOKEN token 0 for hdfs)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:272)
at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:291)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:302)
I do not understand why the error log shows a failure to renew a token for the insecure cluster. I have also added ipc.client.fallback-to-simple-auth-allowed=true to the custom hdfs-site on the secure cluster.
Labels:
- Apache Hadoop
09-17-2018
11:45 AM
In the other cluster I do not have any such issue, nor have I granted permissions this way. However, in this cluster I am getting the issue, and it is a Kerberized cluster. Are these properties relevant when the cluster is Kerberized?
09-17-2018
06:51 AM
@Chiran Ravani thanks for your reply, but can you please explain why these Atlas and Kafka related errors appear when I am running Sqoop?