Member since: 07-13-2017
Posts: 19
Kudos Received: 0
Solutions: 0
01-17-2019
05:51 PM
Hello Everyone - We have a Hadoop cluster where Spark runs on YARN. I don't have much knowledge of what to look for, or where to look, in the Spark History Server when a Spark application (query) is taking a long time. Could you share some details on how to analyze a Spark job, and any keywords or phrases to look for?
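One concrete place to start is the History Server's REST API (a real Spark API; "history-server" below is a placeholder host, 18080 is the default port, and <app-id> stays a placeholder) - a minimal sketch:

# List completed applications known to the History Server; the JSON response
# includes the app id, name and attempt durations:
curl -s "http://history-server:18080/api/v1/applications"
# For a slow application, pull per-stage metrics and look for stages with
# unusually large executorRunTime or heavy shuffle read/write:
curl -s "http://history-server:18080/api/v1/applications/<app-id>/stages"

In the UI itself, the usual signals are the same: long or skewed task durations within a stage, high GC time, and large shuffle spill on the Stages page.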
Labels:
- Apache Spark
01-17-2019
05:47 PM
Can someone please assist here?
11-29-2018
11:04 PM
Hello Everyone - We have a couple of queries (simple CREATE TABLE or SELECT statements) that are taking a long time. We set the parameters below:

set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;

Still there is no significant change. First of all, what should I look for, and where, in the Tez UI to figure out what is taking the time? When a user comes to me saying a query is taking longer than expected, what should I check? Also, I don't understand the explain plan - please help me understand which fields or parameters to look at.
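Not a full answer on explain plans, but a minimal sketch of how to generate one, assuming beeline on the default port and a hypothetical table my_table:

# Prefix the slow query with EXPLAIN to get the plan without running it:
beeline -u "jdbc:hive2://hive-server:10000" \
  -e "EXPLAIN SELECT count(*) FROM my_table WHERE dt = '2018-11-01';"

In the output, check the Statistics line on each table scan ("Basic stats: COMPLETE" versus NONE/PARTIAL shows whether the stats the settings above rely on actually exist - if not, run ANALYZE TABLE ... COMPUTE STATISTICS), whether small-table joins became Map Joins, and whether operators report "Execution mode: vectorized".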
Labels:
- Apache Hive
- Apache Tez
10-12-2018
03:52 PM
Hello - We want to compare options for a POC. I see multiple blogs out there saying Azure has partnered with Hortonworks. How does HDInsight on Azure compare with Hortonworks Cloudbreak? Please help me understand this; it would also be a great help if the comparison covered cost as well as technology.
Labels:
- Hortonworks Cloudbreak
09-12-2018
02:36 AM
Kafka Connect Setup:
1. Download the Confluent Kafka tar from Confluent: https://www.confluent.io/download/
2. Untar the package and copy the '/share' folder under the '/usr/hdp/<hdp_version>/kafka/' folder.
3. Update the CLASSPATH with the location of the jar files; in my case it is '/usr/hdp/2.6.4.0-91/kafka/share/java'.
4. Make the appropriate changes to the 'connect-distributed' & 'connect-standalone' property files under /etc/kafka/<hdp_version>/0/.
5. I added 'quickstart-hdfs.properties' under '/etc/kafka/<hdp_version>/0/', which includes the topic names, topics dir, flush size, etc.
6. Run a test job with these changes; this worked for me.
Attaching a template of the quickstart HDFS properties file: quickstart-hdfs.txt
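The attached template is not reproduced here, but a minimal sketch of what such a file typically contains, assuming the Confluent HDFS sink connector (the topic name and NameNode URL below are placeholders):

# Write a starter quickstart-hdfs.properties; adjust every placeholder value.
cat > quickstart-hdfs.properties <<'EOF'
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
# Placeholder topic - replace with your own.
topics=test_hdfs
# Placeholder NameNode address.
hdfs.url=hdfs://namenode:8020
topics.dir=/topics
logs.dir=/logs
# Number of records to accumulate before writing a file to HDFS.
flush.size=3
EOF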
08-13-2018
07:30 PM
@ARUN Were you able to set up Kafka Connect with HDP 2.6? If so, can you please briefly describe the steps?
05-22-2018
09:54 PM
I have a 4-node cluster (2 master & 2 data nodes) - a fresh installation. One of the DataNodes is not coming up:

2018-05-22 14:37:56,024 ERROR datanode.DataNode (BPServiceActor.java:run(780)) - Initialization failed for Block pool <registering> (DatanodeUuid unassigned) service to Host1.infosolco.net/10.215.78.41:8020. Exiting.
java.io.IOException: All specified directories are failed to load.

When I look at the VERSION files:

root@Datanode02:/spark/hdfs/data/current # cat VERSION
#Tue May 22 14:00:02 PDT 2018
storageID=DS-0009b75a-e67a-4623-b7a2-12bf395c1d61
clusterID=CID-eb6df30f-7f16-4f94-826c-c7640e1e45a2
cTime=0
datanodeUuid=f005656a-673e-4c97-b25a-e19f04e1ec94
storageType=DATA_NODE
layoutVersion=-56

root@Datanode01:/spark/hdfs/data/current # cat VERSION
#Tue May 22 14:00:02 PDT 2018
storageID=DS-0009b75a-e67a-4623-b7a2-12bf395c1d61
clusterID=CID-eb6df30f-7f16-4f94-826c-c7640e1e45a2
cTime=0
datanodeUuid=f005656a-673e-4c97-b25a-e19f04e1ec94
storageType=DATA_NODE
layoutVersion=-56

I see that both DataNodes have the same datanodeUuid, and the second DataNode is not coming up. Please suggest!
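Duplicate datanodeUuid values usually mean the data directory was cloned between hosts. A minimal sketch of one common fix, assuming this is still a fresh install with no data on the failing node (paths match the VERSION output above; adjust for your layout):

# Stop the failing DataNode from Ambari first, then on that host move the
# cloned storage directory aside so a fresh one is created on restart:
mv /spark/hdfs/data/current /spark/hdfs/data/current.bak
# Start the DataNode from Ambari again; it re-registers with a fresh storage
# ID and UUID. Remove the .bak copy once the node reports healthy.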
Labels:
- Apache Hadoop
01-23-2018
11:27 PM
We have a 2-node test cluster where we installed Kafka & NiFi; Kafka brokers and NiFi are running on both nodes. I am very new to this setup. What ports need to be open? How can a user access the cluster from their local machine, and, if possible, can someone share a test from scratch? I know my ask is a lot; it's just that I want to get a good insight into how this whole thing works and integrates. I have gone through the article below, but I am not sure where to start or how I can access Kafka/NiFi from my local machine: https://community.hortonworks.com/content/kbentry/57262/integrating-apache-nifi-and-apache-kafka.html
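As a starting point on connectivity, a minimal sketch for checking reachability from a local machine, assuming default ports - Kafka brokers listen on 9092 ('listeners' in server.properties) and the NiFi web UI on 8080 ('nifi.web.http.port' in nifi.properties); substitute your own host names and whatever ports your configs actually use:

# Probe both nodes for the Kafka broker and NiFi UI ports:
for host in node1 node2; do
  nc -zv "$host" 9092   # Kafka broker
  nc -zv "$host" 8080   # NiFi web UI
done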
Labels:
- Apache Kafka
- Apache NiFi
11-20-2017
06:00 PM
I want to load data (~300 GB) from the local filesystem to HDFS, and I will be doing a similar activity once every month. What would be the most feasible way to get this done? I am looking at the Flume and HDFS put options. These are XML files, not log data; I don't need any conversion - it's a straight copy to HDFS.
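For a straight one-shot copy, a minimal sketch using the standard HDFS CLI; the source and target paths below are placeholders:

# Create the target directory and copy the XML files up:
hdfs dfs -mkdir -p /data/xml/2017-11
hdfs dfs -put /local/path/to/xml/*.xml /data/xml/2017-11/
# Sanity-check the transfer (file count and total size):
hdfs dfs -count -h /data/xml/2017-11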
Labels:
- Apache Flume
- Apache Hadoop
10-11-2017
03:57 PM
I see edit logs created at two-minute intervals in the "/data/hadoop/hdfs/namenode/current" directory. Is there a best practice for maintaining the edit logs, in terms of purging? Also, we have a problem where the NameNode fails to start, and we use the following post to recover it: http://blog.cloudera.com/blog/2012/05/namenode-recovery-tools-for-the-hadoop-distributed-file-system/. Does this have something to do with the edits here?
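For reference, edits are purged automatically as checkpoints complete rather than by hand; two real hdfs-site.xml properties control how much history the NameNode keeps. A minimal sketch for inspecting the current values:

# Show the retention settings in effect on the NameNode host:
hdfs getconf -confKey dfs.namenode.num.checkpoints.retained
hdfs getconf -confKey dfs.namenode.num.extra.edits.retained

If old edits keep piling up, the usual cause is that the Secondary/Standby NameNode is not checkpointing, which would also explain painful NameNode restarts, since startup must replay every un-checkpointed edit.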
Labels:
- Apache Hadoop
10-04-2017
10:56 PM
@Slim I started the Druid installation process, but ran into some errors:

ExecutionFailed: Execution of '/usr/bin/yum -d 0 -e 0 -y install superset_2_6_0_3_8' returned 1.
Error: Package: superset_2_6_0_3_8-0.15.0.2.6.0.3-8.x86_64 (HDP-2.6)
       Requires: libffi-devel

I tried uninstalling "libffi-devel", but it has dependencies and I am not able to uninstall it:

root@Host:~ # rpm -e libffi-3.0.5-3.2.el6.x86_64
error: Failed dependencies:
        libffi.so.5()(64bit) is needed by (installed) python-libs-2.6.6-66.el6_8.x86_64
        libffi.so.5()(64bit) is needed by (installed) python34-libs-3.4.5-1.el6.x86_64

root@host:~ # /usr/bin/yum -d 0 -e 0 -y install superset_2_6_0_3_8 --skip-broken
Packages skipped because of dependency problems:
        openblas-0.2.19-2.el6.x86_64 from CDP
        openblas-devel-0.2.19-2.el6.x86_64 from CDP
        openblas-openmp-0.2.19-2.el6.x86_64 from CDP
        openblas-threads-0.2.19-2.el6.x86_64 from CDP
        superset_2_6_0_3_8-0.15.0.2.6.0.3-8.x86_64 from HDP-2.6
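Note that the error reads "Requires: libffi-devel", i.e. the devel package is missing, not conflicting, so installing it (rather than removing libffi) is the likely fix. A minimal sketch, assuming the package is available in your enabled repos:

# Install the missing build dependency, then retry the Superset install:
yum install -y libffi-devel
yum -d 0 -e 0 -y install superset_2_6_0_3_8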
10-04-2017
10:14 PM
@Slim For the Historicals and MiddleManagers, do they need to be installed on the data nodes, or on the edge nodes from which users generally access the cluster?
10-04-2017
03:52 PM
Hello Everyone - We have a 16-node cluster [12 data nodes, 2 master nodes, 2 edge nodes]. All the servers have 32 cores & 252 GB RAM. For installing Druid, how should I place the following: Coordinator, Superset, Broker, Overlord, Router? Can I have all of these sitting on one instance? And for the slaves & client (Druid Historical & Druid MiddleManager), how do I pick which servers they should be installed on? Once the installation is done, are there other steps involved to integrate this with Hive, or is that done as part of the installation? As part of the Druid installation, are there any other services that get impacted or need to be restarted? I have gone through this article: https://community.hortonworks.com/questions/108316/how-to-choose-servers-for-druid.html Please let me know if I have missed anything, or if there are any other steps involved prior to the installation. Thanks in advance.
Tags:
- druid
Labels:
- Apache Hive
10-02-2017
08:22 PM
@Jay SenSharma
1. # ps -ef | grep hiveserver2
I see the process running. The Ambari UI also shows HiveServer2 as running (screen-shot-2017-10-02-at-21921-pm.png).
2. # netstat -tnlpa | grep `cat /var/run/hive/hive-server.pid`
I see a list, but I don't see any process on port 10000. Instead, I see a couple of them on 10001.
3. less /var/log/hive/hiveserver2.log

2017-10-02 00:00:36,108 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doPost(145)) - Could not validate cookie sent, will try to generate a new cookie
2017-10-02 00:00:36,108 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doPost(204)) - Cookie added for clientUserName anonymous
2017-10-02 00:00:36,108 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: thrift.ThriftCLIService (ThriftCLIService.java:OpenSession(316)) - Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V8
2017-10-02 00:00:36,109 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: metastore.ObjectStore (ObjectStore.java:initializeHelper(377)) - ObjectStore, initialize called
2017-10-02 00:00:36,111 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(139)) - Using direct SQL, underlying DB is OTHER
2017-10-02 00:00:36,111 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: metastore.ObjectStore (ObjectStore.java:setConf(291)) - Initialized ObjectStore
2017-10-02 00:00:36,111 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: metastore.HiveMetaStore (HiveMetaStore.java:init(533)) - Begin calculating metadata count metrics.
2017-10-02 00:00:36,113 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: metastore.HiveMetaStore (HiveMetaStore.java:init(535)) - Finished metadata count metrics: 18 databases, 1009 tables, 33 partitions.
2017-10-02 00:00:36,113 WARN [HiveServer2-HttpHandler-Pool: Thread-82257]: metrics2.CodahaleMetrics (CodahaleMetrics.java:addGauge(299)) - A Gauge with name [init_total_count_dbs] already exists. The old gauge will be overwritten, but this is not recommended
2017-10-02 00:00:36,113 WARN [HiveServer2-HttpHandler-Pool: Thread-82257]: metrics2.CodahaleMetrics (CodahaleMetrics.java:addGauge(299)) - A Gauge with name [init_total_count_tables] already exists. The old gauge will be overwritten, but this is not recommended
2017-10-02 00:00:36,113 WARN [HiveServer2-HttpHandler-Pool: Thread-82257]: metrics2.CodahaleMetrics (CodahaleMetrics.java:addGauge(299)) - A Gauge with name [init_total_count_partitions] already exists. The old gauge will be overwritten, but this is not recommended
2017-10-02 00:00:36,125 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: session.SessionState (SessionState.java:createPath(677)) - Created local directory: /tmp/de11c83f-e087-406e-b715-dd6ba7148cfe_resources
2017-10-02 00:00:36,128 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: session.SessionState (SessionState.java:createPath(677)) - Created HDFS directory: /tmp/hive/anonymous/de11c83f-e087-406e-b715-dd6ba7148cfe
2017-10-02 00:00:36,129 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: session.SessionState (SessionState.java:createPath(677)) - Created local directory: /tmp/hive/de11c83f-e087-406e-b715-dd6ba7148cfe
2017-10-02 00:00:36,130 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: session.SessionState (SessionState.java:createPath(677)) - Created HDFS directory: /tmp/hive/anonymous/de11c83f-e087-406e-b715-dd6ba7148cfe/_tmp_space.db
2017-10-02 00:00:36,130 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: session.HiveSessionImpl (HiveSessionImpl.java:setOperationLogSessionDir(265)) - Operation log session directory is created: /tmp/hive/operation_logs/de11c83f-e087-406e-b715-dd6ba7148cfe
2017-10-02 00:00:36,189 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: session.HiveSessionImpl (HiveSessionImpl.java:acquireAfterOpLock(333)) - We are setting the hadoop caller context to de11c83f-e087-406e-b715-dd6ba7148cfe for thread HiveServer2-HttpHandler-Pool: Thread-82257
2017-10-02 00:00:36,190 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: session.HiveSessionImpl (HiveSessionImpl.java:releaseBeforeOpLock(357)) - We are resetting the hadoop caller context for thread HiveServer2-HttpHandler-Pool: Thread-82257
2017-10-02 00:00:36,196 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: session.HiveSessionImpl (HiveSessionImpl.java:acquireAfterOpLock(333)) - We are setting the hadoop caller context to de11c83f-e087-406e-b715-dd6ba7148cfe for thread HiveServer2-HttpHandler-Pool: Thread-82257
2017-10-02 00:00:36,196 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: session.HiveSessionImpl (HiveSessionImpl.java:releaseBeforeOpLock(357)) - We are resetting the hadoop caller context for thread HiveServer2-HttpHandler-Pool: Thread-82257
2017-10-02 00:00:36,210 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: session.HiveSessionImpl (HiveSessionImpl.java:acquireAfterOpLock(333)) - We are setting the hadoop caller context to de11c83f-e087-406e-b715-dd6ba7148cfe for thread HiveServer2-HttpHandler-Pool: Thread-82257
2017-10-02 00:00:36,211 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: session.HiveSessionImpl (HiveSessionImpl.java:releaseBeforeOpLock(357)) - We are resetting the hadoop caller context for thread HiveServer2-HttpHandler-Pool: Thread-82257
2017-10-02 00:00:36,211 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: session.HiveSessionImpl (HiveSessionImpl.java:acquireAfterOpLock(333)) - We are setting the hadoop caller context to de11c83f-e087-406e-b715-dd6ba7148cfe for thread HiveServer2-HttpHandler-Pool: Thread-82257
2017-10-02 00:00:36,212 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: session.HiveSessionImpl (HiveSessionImpl.java:releaseBeforeOpLock(357)) - We are resetting the hadoop caller context for thread HiveServer2-HttpHandler-Pool: Thread-82257
2017-10-02 00:03:36,081 INFO [HiveServer2-HttpHandler-Pool: Thread-82257]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doPost(145)) - Could not validate cookie sent, wi
/var/log/hadoop/hive/hiveserver2.log

What might I be missing here?
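The listeners on 10001, together with all the ThriftHttpServlet entries, suggest HiveServer2 is running in HTTP transport mode rather than binary. A minimal sketch of a matching connection, assuming the default httpPath of cliservice (verify hive.server2.transport.mode, hive.server2.thrift.http.port and hive.server2.thrift.http.path in hive-site.xml):

# Connect over HTTP transport on 10001 instead of binary on 10000:
beeline -u "jdbc:hive2://Hive-server2:10001/;transportMode=http;httpPath=cliservice"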
10-02-2017
05:58 PM
@Jay SenSharma I don't see anything under /var/log/hive - it's empty. I ran the jps command on the node where the Hive server is running:

root@Hive-server:~ # jps
55267 QuorumPeerMain
186337 RunJar
38279 JournalNode
43241 Jps
183016 RunJar
184015 NameNode
9038 HistoryServer
36721 ActivityAnalyzerFacade
192339 Bootstrap
6547 HistoryServer
189617 RunJar
9460 SparkSubmit
39412 DFSZKFailoverController
8089 LivyServer
13246 HMaster
7006 SparkSubmit

Also, I checked the "/etc/hosts" file on both the Hive server node and the node from which the beeline command is executed... it's exactly the same. From the Ambari UI, I see the port # is 10000 (screen-shot-2017-10-02-at-115713-am.png). I am not sure what I am missing here. Please let me know how to open a port, if that is what is causing the issue.
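If the port does turn out to be blocked, a minimal sketch for opening it, assuming an iptables-based RHEL/CentOS 6 host (the el6 packages elsewhere in this thread suggest that; on firewalld systems use firewall-cmd instead):

# Check whether anything filters port 10000:
iptables -L -n | grep 10000
# Allow inbound TCP on 10000 and persist the rule (el6 syntax):
iptables -I INPUT -p tcp --dport 10000 -j ACCEPT
service iptables save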
10-02-2017
05:10 PM
@Jay SenSharma Below is the response:

[hadoop-admin@edge-node hive]$ telnet hive-server2 10000
telnet: hive-server2: Name or service not known
[hadoop-admin@edge-node hive]$ nc -v hive-server2 10000
nc: getaddrinfo: Name or service not known

I ran the following command on the HiveServer2 host to see if the port is open:

root@Name-node:~ # netstat -tnlpa | grep 10000

It didn't return anything. I also ran the Hive debug command, and found this in the output:

FATAL thrift.ThriftCLIService: Error starting HiveServer2: could not start ThriftHttpCLIService
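Note that the "Name or service not known" errors above are hostname-resolution failures, which is a separate problem from the port itself. A minimal sketch to confirm, assuming "hive-server2" is the name being dialed:

# Empty output means the edge node cannot resolve the hostname at all;
# fix /etc/hosts (or DNS) there before re-testing the port:
getent hosts hive-server2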
10-02-2017
04:45 PM
beeline> !connect jdbc:hive2://Hive-server2:10000
Connecting to jdbc:hive2://Hive-server2:10000
Enter username for jdbc:hive2://Hive-server2:10000: username
Enter password for jdbc:hive2://Hive-server2:10000: *********
17/09/29 15:46:32 [main]: WARN jdbc.HiveConnection: Failed to connect to Hive-server2:10000
Error: Could not open client transport with JDBC Uri: jdbc:hive2://Hive-server2:10000: java.net.ConnectException: Connection refused (state=08S01,code=0)
0: jdbc:hive2://Hive-server2:10000 (closed)>

I checked a few other posts on the community forums, but they didn't really help. Please let me know what I am missing here. Thanks in advance!
Labels:
- Apache Hive