Member since
05-07-2018
331
Posts
45
Kudos Received
35
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7059 | 09-12-2018 10:09 PM
 | 2747 | 09-10-2018 02:07 PM
 | 9361 | 09-08-2018 05:47 AM
 | 3091 | 09-08-2018 12:05 AM
 | 4117 | 08-15-2018 10:44 PM
06-11-2018
02:21 PM
Hey @priyal patel! Do you know what values are set for spark.driver.memoryOverhead and spark.executor.memoryOverhead? Also, would you mind sharing your OOM error?
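If they aren't set yet, here's a minimal sketch of how they could be passed on spark-submit, assuming Spark 2.x on YARN; the class/jar names and the values are just placeholders, and on Spark versions before 2.3 the equivalent properties are spark.yarn.driver.memoryOverhead and spark.yarn.executor.memoryOverhead:
# values in MiB; tune them against your YARN container sizes
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.driver.memoryOverhead=1024 \
  --conf spark.executor.memoryOverhead=2048 \
  --class com.example.MyApp \
  my-app.jar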
06-09-2018
06:13 PM
2 Kudos
Hi @Rahul Kumar! Oh, I see. In this case you can use Telegraf to take the data from InfluxDB -> Kafka:
https://github.com/influxdata/telegraf
https://www.influxdata.com/blog/using-telegraf-to-send-metrics-to-influxdb-and-kafka/
And from Kafka -> HDFS, you can use Kafka Connect:
https://kafka.apache.org/documentation/#connect
https://docs.confluent.io/current/connect/connect-hdfs/docs/hdfs_connector.htm
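Just to illustrate the Kafka -> HDFS leg, here's a minimal sketch of a standalone HDFS sink, assuming the Confluent HDFS connector from the link above is installed; the topic name, HDFS URL and flush size are placeholders:
# hypothetical hdfs-sink.properties; property names follow the Confluent HDFS connector docs
cat > hdfs-sink.properties <<'EOF'
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=influx-metrics
hdfs.url=hdfs://namenode:8020
flush.size=1000
EOF
# run it with the standalone Connect worker shipped with Kafka
connect-standalone.sh config/connect-standalone.properties hdfs-sink.properties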
06-09-2018
06:58 AM
Hey @Rahul Kumar! It's up to you, actually 🙂 If you prefer to program a producer/consumer, you can use the mentioned link as an example. But in my humble opinion, it will be much faster/easier to take these csv/xls files, push them to Kafka and put them into HDFS using NiFi, because you don't need to worry about code; you just drag and drop processors and fill in some parameters to build a NiFi flow. E.g.: https://hortonworks.com/tutorial/nifi-in-trucking-iot-on-hdf/section/3/ One question: where are these files coming from? I'm asking because, if your files are coming from a DB, you have Kafka Connect, which can pull/push data from/to Kafka over JDBC (a rough config sketch at the end of this post). Or you can use Flume/Spark/Storm if you are looking for fast delivery. Hope this helps!
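For the DB case, this is roughly what a Kafka Connect JDBC source config could look like, assuming the Confluent JDBC connector is installed; the connection details, column name and topic prefix below are placeholders:
# hypothetical jdbc-source.properties for the Confluent JDBC source connector
name=jdbc-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:mysql://dbhost:3306/sales?user=etl&password=secret
mode=incrementing
incrementing.column.name=id
topic.prefix=db-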
06-08-2018
11:25 PM
Hey @Pedro Rodgers! AFAIK, you don't have the same graceful way to query metadata from Hive/Impala as in SQL Server, since they use a DBMS behind the scenes to keep their metadata. You'll probably need to access that DBMS directly and run some queries there. So for instance, let's say you have Hive/Impala with MySQL as the Metastore/Catalog; in this case you'd need to access MySQL to gather info about Hive/Impala (quick sketch at the end of this post). But there are some tools on the Hive side that may be useful for you 🙂
https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool
https://cwiki.apache.org/confluence/display/Hive/Hive+MetaTool
https://cwiki.apache.org/confluence/display/Hive/HCatalog+CLI
Hope this helps! 🙂
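Here's the sketch of the direct-metastore route, assuming a MySQL-backed metastore whose database is called hive; the TBLS/DBS table and column names follow the standard Hive metastore schema, but double-check them against your version:
# list every table with its database and type, straight from the metastore DB
mysql -u hive -p -D hive -e "
  SELECT d.NAME AS db_name, t.TBL_NAME, t.TBL_TYPE
  FROM TBLS t
  JOIN DBS d ON t.DB_ID = d.DB_ID
  ORDER BY d.NAME, t.TBL_NAME;"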
06-08-2018
07:04 PM
Hey @priyal patel! Could you share your spark-submit parameters?
06-08-2018
06:08 PM
Hey @Jason Sphar! Did you try to use zookeeper-cli instead of zk-cli.sh?
06-08-2018
04:59 PM
Hey @Rahul Kumar! First you will need to create a Kafka topic, and then you have a few options to insert data into it with a Kafka producer:
- Using other tools to put data directly into Kafka, e.g. NiFi, Kafka Connect, Spark, Storm, Flume and so on.
- Programming a Kafka producer: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_kafka-component-guide/content/ch_kafka-development.html
After Kafka receives your data, you can consume it with a Kafka consumer and put it into HDFS. To create the consumer, you have the same options as above:
- Using other tools to read data directly from Kafka, e.g. NiFi, Kafka Connect, Spark, Storm, Flume and so on.
- Programming a Kafka consumer: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_kafka-component-guide/content/ch_kafka-development.html
Here's an example doing this manually:
[root@node1 ~]# kafka-topics.sh --zookeeper $ZKENSEMBLE --create --topic vin-hcc-nifi --partitions 3 --replication-factor 3
Created topic "vin-hcc-nifi".
[root@node1 ~]# kafka-console-producer.sh --broker-list node1:6667 --topic vin-hcc-nifi
>Im brazilian and im testing this topic
>please work!
>HCC
[root@node2 ~]# kafka-console-consumer.sh --bootstrap-server node2:6667 --topic vin-hcc-nifi --from-beginning
Im brazilian and im testing this topic
please work!
Hope this helps!
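By the way, to close the HDFS leg manually too, here's a rough sketch of landing that topic into HDFS straight from the console consumer; it assumes an HDFS client on the same node, the target path is just an example, and --timeout-ms makes the consumer exit once the topic stops producing records:
# drain the topic and write everything to a single HDFS file
kafka-console-consumer.sh --bootstrap-server node2:6667 --topic vin-hcc-nifi --from-beginning --timeout-ms 10000 \
  | hdfs dfs -put - /tmp/vin-hcc-nifi.txt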
06-07-2018
06:20 PM
1 Kudo
Hey @Melody S! I'm not a specialist in GEO data, but this link may be useful to you: https://community.hortonworks.com/articles/5129/geospatial-data-analysis-in-hadoop.html Hope this helps! 🙂
06-07-2018
03:46 PM
Hey @Thierry Vernhet! Hmm, I'm not sure you can do this with RegexSerDe alone. You will probably need to:
- filter those NULL values in a query, or
- build an ETL that skips the NULL values and loads the ARMING values into another table, or
- create a view (see the sketch at the end of this post), or
- clean the data before putting it into HDFS.
I did some digging in the code, and here are the parts that explain the behaviour:
https://github.com/apache/hive/blob/master/contrib/src/java/org/apache/hadoop/hive/contrib/serde2/RegexSerDe.java
@Override
public Object deserialize(Writable blob) throws SerDeException {
  if (inputPattern == null) {
    throw new SerDeException(
        "This table does not have serde property \"input.regex\"!");
  }
  Text rowText = (Text) blob;
  Matcher m = inputPattern.matcher(rowText.toString());
  // If do not match, ignore the line, return a row with all nulls.
  if (!m.matches()) {
    unmatchedRows++;
    if (unmatchedRows >= nextUnmatchedRows) {
      nextUnmatchedRows = getNextNumberToDisplay(nextUnmatchedRows);
      // Report the row
      LOG.warn("" + unmatchedRows + " unmatched rows are found: " + rowText);
    }
    return null;
  }
  // Otherwise, return the row.
  for (int c = 0; c < numColumns; c++) {
    try {
      row.set(c, m.group(c + 1));
    } catch (RuntimeException e) {
      partialMatchedRows++;
      if (partialMatchedRows >= nextPartialMatchedRows) {
        nextPartialMatchedRows = getNextNumberToDisplay(nextPartialMatchedRows);
        // Report the row
        LOG.warn("" + partialMatchedRows + " partially unmatched rows are found, "
            + " cannot find group " + c + ": " + rowText);
      }
      row.set(c, null);
    }
  }
  return row;
}
Hope this helps! 🙂
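To illustrate the view option, here's a minimal sketch, assuming the raw RegexSerDe table is called access_log_regex and one of its captured columns is called action (both names are placeholders). Since unmatched lines come back with every column NULL, filtering on any column that is always populated in matched rows hides them:
# hide the all-NULL (unmatched) rows behind a view and query the view instead
beeline -u jdbc:hive2://localhost:10000 -e "
  CREATE VIEW arming_events AS
  SELECT * FROM access_log_regex WHERE action IS NOT NULL;"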
06-07-2018
03:29 PM
Hey @Jason Sphar. Hmm, at least it gives us a clue. AFAIK, old versions of Kafka used to accept ZooKeeper as the way to access the Kafka cluster, and now you have this bootstrap way (passing any broker to reach the partition leader + ZK). For example, I did a manual test in Kafka by creating a topic, feeding it with kafka-console-producer and checking the consumer group with the --bootstrap-server parameter. I'm kinda confused now, because in my humble opinion it seems that the NiFi processor has created the consumer group on Kafka, but Kafka is unable to use it..
[root@node1 ~]# kafka-topics.sh --zookeeper $ZKENSEMBLE --create --topic vin-hcc-nifi --partitions 3 --replication-factor 3
Created topic "vin-hcc-nifi".
[root@node1 ~]# kafka-console-producer.sh --broker-list node1:6667 --topic vin-hcc-nifi
>Im brazilian and im testing this topic
>please work!
>HCC
[root@node2 ~]# kafka-console-consumer.sh --bootstrap-server node2:6667 --topic vin-hcc-nifi --from-beginning
Im brazilian and im testing this topic
please work!
HCC
[root@node2 ~]# kafka-consumer-groups.sh --bootstrap-server node2:6667 --list
Note: This will not show information about old Zookeeper-based consumers.
console-consumer-14185
console-consumer-81664
[root@node2 ~]# kafka-consumer-groups.sh --bootstrap-server node2:6667 --describe --group console-consumer-81664
Note: This will not show information about old Zookeeper-based consumers.
Consumer group 'console-consumer-81664' has no active members.
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
vin-hcc-nifi 0 1 1 0 - - -
vin-hcc-nifi 2 1 1 0 - - -
vin-hcc-nifi 1 1 1 0 - - -
Well, anyway, I saw some interesting points about the consumer group in the Kafka docs:
========================================================
Checking consumer position
Sometimes it's useful to see the position of your consumers. We have a tool that will show the position of all consumers in a consumer group as well as how far behind the end of the log they are. To run this tool on a consumer group named my-group consuming a topic named my-topic would look like this:
> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group
Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers).
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
my-topic 0 2 4 2 consumer-1-029af89c-873c-4751-a720-cefd41a669d6 /127.0.0.1 consumer-1
my-topic 1 2 3 1 consumer-1-029af89c-873c-4751-a720-cefd41a669d6 /127.0.0.1 consumer-1
my-topic 2 2 3 1 consumer-2-42c1abd4-e3b2-425d-a8bb-e1ea49b29bb2 /127.0.0.1 consumer-2
This tool also works with ZooKeeper-based consumers:
> bin/kafka-consumer-groups.sh --zookeeper localhost:2181 --describe --group my-group
Note: This will only show information about consumers that use ZooKeeper (not those using the Java consumer API).
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID
my-topic 0 2 4 2 my-group_consumer-1
my-topic 1 2 3 1 my-group_consumer-1
my-topic 2 2 3 1 my-group_consumer-2
========================================================
Link: https://kafka.apache.org/10/documentation.html
So after that, I did some research and found this KB article made by @mrodriguez (link below); we can try this to solve your problem 🙂
https://community.hortonworks.com/content/supportkb/175137/error-the-group-coordinator-is-not-available-when.html
Hope this helps! And a special thanks to @mrodriguez.
Cheers