Member since
05-07-2018
331 Posts
45 Kudos Received
35 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7059 | 09-12-2018 10:09 PM
 | 2747 | 09-10-2018 02:07 PM
 | 9361 | 09-08-2018 05:47 AM
 | 3091 | 09-08-2018 12:05 AM
 | 4117 | 08-15-2018 10:44 PM
06-13-2018
04:43 PM
Hey @Karthik Chandrashekhar! Hm, but do these blocks belong to your dfs.datanode.data.dir path? If so, they should count as DFS, not non-DFS. AFAIK, any data outside of HDFS that is written to the same mounted disk as the dfs.datanode.data.dir path is counted as non-DFS. If these blocks don't belong to your DFS (i.e. they're non-DFS) yet they sit in the same path as your dfs.datanode.data.dir value, then we might have an issue there 😞 BTW, could you check your mount points as well? Hope this helps!
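PS: a quick way to see the DFS vs non-DFS usage per DataNode, and what is actually sitting on that mount (the data dir below is only an example path, swap in your dfs.datanode.data.dir value):
[hdfs@node2 ~]$ hdfs dfsadmin -report | grep -i 'dfs used'
[root@node2 ~]# df -h /hadoop/hdfs/data
[root@node2 ~]# du -sh /hadoop/hdfs/data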
06-13-2018
03:36 PM
Hi @Marc Vázquez! Could you check which command the Confluent stack runs for ZK?
[root@node2 ~]# ps -ef | grep -i zookeeper
1001 3802 1 0 Jun12 ? 00:00:56 /usr/jdk64/jdk1.8.0_112/bin/java -Dzookeeper.log.dir=/var/log/zookeeper -Dzookeeper.log.file=zookeeper-zookeeper-server-node2.log -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp #Thousands of libs... -Xmx1024m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /usr/hdf/current/zookeeper-server/conf/zoo.cfg
PS: did you set this cluster up using confluent.sh? If so, I did some digging in their code and there should be a directory for the ZK logs 😞 https://github.com/confluentinc/confluent-cli/blob/master/src/oss/confluent.sh#L414
Or, if you prefer, you can try to set it manually; here's my example log4j.properties for ZK:
[root@node2 ~]# cat /etc/zookeeper/conf/log4j.properties
#
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
#
#
#
# ZooKeeper Logging Configuration
#
# DEFAULT: console appender only
log4j.rootLogger=INFO, CONSOLE, ROLLINGFILE
# Example with rolling log file
#log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE
# Example with rolling log file and tracing
#log4j.rootLogger=TRACE, CONSOLE, ROLLINGFILE, TRACEFILE
#
# Log INFO level and above messages to the console
#
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=INFO
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
#
# Add ROLLINGFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=DEBUG
log4j.appender.ROLLINGFILE.File=/var/log/zookeeper/zookeeper.log
# Max log file size of 10MB
log4j.appender.ROLLINGFILE.MaxFileSize=10MB
# uncomment the next line to limit number of backup files
#log4j.appender.ROLLINGFILE.MaxBackupIndex=10
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
#
# Add TRACEFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.TRACEFILE=org.apache.log4j.FileAppender
log4j.appender.TRACEFILE.Threshold=TRACE
log4j.appender.TRACEFILE.File=zookeeper_trace.log
log4j.appender.TRACEFILE.layout=org.apache.log4j.PatternLayout
### Notice we are including log4j's NDC here (%x)
log4j.appender.TRACEFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L][%x] - %m%n
And don't forget to fill in zookeeper-env.sh:
[root@node2 ~]# cat /etc/zookeeper/conf/zookeeper-env.sh
export JAVA_HOME=/usr/jdk64/jdk1.8.0_112
export ZOOKEEPER_HOME=/usr/hdf/current/zookeeper-server
export ZOO_LOG_DIR=/var/log/zookeeper
export ZOOPIDFILE=/var/run/zookeeper/zookeeper_server.pid
export SERVER_JVMFLAGS=-Xmx1024m
export JAVA=$JAVA_HOME/bin/java
export CLASSPATH=$CLASSPATH:/usr/share/zookeeper/*
Hope this helps!
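PS: after restarting ZK, you can confirm it's actually writing to the new log dir (paths as in the example above):
[root@node2 ~]# ls -ltrah /var/log/zookeeper/
[root@node2 ~]# tail -n 20 /var/log/zookeeper/zookeeper.log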
06-12-2018
05:21 PM
Hey @Karthik Chandrashekhar! I'm not sure I got you right, but my advice would be not to delete these files. They belong to the HDFS DataNode; the subdirs carry the blk_* blocks plus their .meta files, i.e. the data stored in HDFS. If you want to know which file a given block belongs to, you can use the following commands:
[hdfs@node2 ~]$ cd /hadoop/hdfs/data/current/BP-686380642-172.25.33.129-1527546468579/current/finalized/subdir0/subdir0/
[hdfs@node2 subdir0]$ ls | head -2
blk_1073741825
blk_1073741825_1001.meta
[hdfs@node2 ~]$ hdfs fsck / -files -locations -blocks -blockId blk_1073741825
Connecting to namenode via http://node3:50070/fsck?ugi=hdfs&files=1&locations=1&blocks=1&blockId=blk_1073741825+&path=%2F
FSCK started by hdfs (auth:SIMPLE) from /MYIP at Tue Jun 12 14:54:08 UTC 2018
Block Id: blk_1073741825
Block belongs to: /hdp/apps/2.6.4.0-91/mapreduce/mapreduce.tar.gz
No. of Expected Replica: 3
No. of live Replica: 3
No. of excess Replica: 0
No. of stale Replica: 0
No. of decommissioned Replica: 0
No. of decommissioning Replica: 0
No. of corrupted Replica: 0
Block replica on datanode/rack: node2/default-rack is HEALTHY
Block replica on datanode/rack: node3/default-rack is HEALTHY
Block replica on datanode/rack: node4/default-rack is HEALTHY
Hope this helps! 🙂
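PS: you can also go the other way around (file → blocks); the path here is the one from the fsck output above:
[hdfs@node2 ~]$ hdfs fsck /hdp/apps/2.6.4.0-91/mapreduce/mapreduce.tar.gz -files -blocks -locations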
06-12-2018
03:14 PM
Hey @pradeep arumalla! I'm not a specialist in coding or Spark, but did you try changing your groupByKey to a reduceByKey (on the last line)? And about the executors (--num-executors), how are you launching your job, is it via spark-submit? Could you share it with us? BTW, here are some links about shuffling 🙂 https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-shuffle.html https://0x0fff.com/spark-architecture-shuffle/ Hope this helps!
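PS: just to illustrate the executor flags, a spark-submit call would look something like this (the class/jar names are only placeholders, and the numbers need tuning to your cluster):
# hypothetical app and sizing; adjust to your job and cluster capacity
spark-submit \
  --master yarn \
  --deploy-mode client \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 4g \
  --class com.example.MyJob \
  my-job.jar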
06-12-2018
02:36 PM
Hey @Marc Vázquez! Usually you will find a file named log4j.properties, and in it there should be a parameter called log4j.appender.ROLLINGFILE.File=<path>/zookeeper.log.
[root@node1 conf]# cat /etc/zookeeper/conf/log4j.properties | grep -i zookeeper.log
# ZooKeeper Logging Configuration
log4j.appender.ROLLINGFILE.File=/var/log/zookeeper/zookeeper.log
In the Confluent stack I'm not sure about the path, but it should be something like confluent_dss_version/etc/zookeeper/conf/log4j.properties. Or you can try searching for it, like:
find / -name "zookeeper" -type d -exec ls -ltrah {} \;
Hope this helps! 🙂
06-12-2018
05:50 AM
Hey @Rahul Kumar. Just asking, but after you stopped Kafka/ZooKeeper, did you try to produce and consume messages again? For example, let's say you just ran a kafka-console-consumer after 7 days; you probably won't be able to see those messages on that topic anymore, because Kafka has a parameter that retains messages only for a determined period of time: log.retention.hours, which is 168 hours (7 days) by default (you can change it). But if you did the whole process again (create a topic, kafka-console-producer and kafka-console-consumer) after the Kafka cluster was down, then we may need to take a look at the errors in the Kafka/ZK logs and check the consumer groups/offsets. Hope this helps!
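PS: a quick way to check the retention; the broker config path below is from my env and the topic name is a placeholder, so adjust both to yours:
[root@node1 ~]# grep -i log.retention.hours /etc/kafka/conf/server.properties
#Or look for a per-topic override:
[root@node1 ~]# kafka-configs.sh --zookeeper node1:2181 --entity-type topics --entity-name mytopic --describe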
06-12-2018
05:19 AM
Hey @JAy PaTel! I see, could you share the output from the following command? Note the capital -P for the port; a lowercase -p in scp only preserves timestamps.
C:\InputFileWindows>scp -v -P 2222 datafile.txt root@localhost:/
Thanks!
06-11-2018
05:45 PM
Hey @Rahul Kumar! What is log.retention.hours set to? And could you check whether your kafka-console-consumer is creating a consumer group?
[root@node1 ~]# kafka-consumer-groups.sh --zookeeper $ZKENSEMBLE --list
#Or
[root@node1 ~]# kafka-consumer-groups.sh --bootstrap-server node1:6667 --list
#If it shows something, try to describe it with --group <consumer-group-mynumber> and --describe
#And we can also check the offsets of your topic.
[root@node1 ~]# kafka-run-class.sh kafka.tools.ExportZkOffsets --zkconnect $ZKENSEMBLE --output-file zk_offset_kafka
[root@node1 ~]# kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list node1:6667 --topic vin-hcc-nifi --time -1 #latest
vin-hcc-nifi:2:1
vin-hcc-nifi:1:1
vin-hcc-nifi:0:1
[root@node1 ~]# kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list node1:6667 --topic vin-hcc-nifi --time -2 #earliest
vin-hcc-nifi:2:0
vin-hcc-nifi:1:0
vin-hcc-nifi:0:0
Hope this helps!
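PS: the describe call would be along these lines; the group name is a placeholder for whatever --list returns:
[root@node1 ~]# kafka-consumer-groups.sh --bootstrap-server node1:6667 --group console-consumer-12345 --describe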
06-11-2018
05:17 PM
Hey @JAy PaTel! Did you try connecting using port 2222? And if that isn't it, could you add the -v (verbose) parameter to your scp command? Hope this helps!
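PS: a minimal example, assuming the sandbox forwards SSH on port 2222 (note the capital -P; lowercase -p only preserves timestamps):
scp -v -P 2222 datafile.txt root@localhost:/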
06-11-2018
03:21 PM
Hey @Simran kaur! Could you check if you're able to connect to all ZK ports? And what about your dataDir property (inside /etc/zookeeper/conf/zoo.cfg), could you check your permissions there?
clientPort=2181
initLimit=10
autopurge.purgeInterval=24
syncLimit=5
tickTime=3000
dataDir=/hadoop/zookeeper
autopurge.snapRetainCount=30
server.1=mynode1:2888:3888
server.2=mynode2:2888:3888
server.3=mynode3:2888:3888
Hope this helps!
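PS: a quick sketch to test the ports and the dataDir permissions (hostnames taken from the zoo.cfg above):
echo ruok | nc mynode1 2181    #ZK should answer "imok"
nc -vz mynode1 2888    #quorum port
nc -vz mynode1 3888    #leader-election port
ls -ld /hadoop/zookeeper    #must be owned/writable by the zookeeper user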