Member since: 05-07-2018
Posts: 331
Kudos Received: 45
Solutions: 35
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 7340 | 09-12-2018 10:09 PM |
 | 2907 | 09-10-2018 02:07 PM |
 | 9708 | 09-08-2018 05:47 AM |
 | 3225 | 09-08-2018 12:05 AM |
 | 4229 | 08-15-2018 10:44 PM |
06-11-2018
02:21 PM
Hey @priyal patel! Do you know what values are set for spark.driver.memoryOverhead and spark.executor.memoryOverhead? Also, would you mind sharing your OOM error?
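If those two properties are not set at all, Spark falls back to its documented default of 10% of the heap size with a 384 MiB floor. Here is a minimal sketch of that arithmetic, just to show how much memory each container ends up requesting. The function names are made up for illustration and are not part of any Spark API.

```python
# Sketch only: models Spark's documented default for
# spark.executor.memoryOverhead / spark.driver.memoryOverhead
# (10% of the heap, never less than 384 MiB). Names are hypothetical.
from typing import Optional

MIN_OVERHEAD_MIB = 384
OVERHEAD_FACTOR = 0.10


def default_memory_overhead_mib(heap_mib: int) -> int:
    """Overhead Spark assumes when memoryOverhead is not set explicitly."""
    return max(MIN_OVERHEAD_MIB, int(heap_mib * OVERHEAD_FACTOR))


def container_request_mib(heap_mib: int, overhead_mib: Optional[int] = None) -> int:
    """Heap plus overhead: the size the resource manager has to grant."""
    if overhead_mib is None:
        overhead_mib = default_memory_overhead_mib(heap_mib)
    return heap_mib + overhead_mib


if __name__ == "__main__":
    for heap in (2048, 8192):
        print(heap, "MiB heap ->", container_request_mib(heap), "MiB container")
```

If the OOM comes from off-heap usage, raising the overhead explicitly (the second argument here) is usually the knob to try first.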
06-09-2018
06:13 PM
2 Kudos
Hi @Rahul Kumar! Oh, I see. In this case you can use Telegraf to take the data from InfluxDB -> Kafka:
https://github.com/influxdata/telegraf
https://www.influxdata.com/blog/using-telegraf-to-send-metrics-to-influxdb-and-kafka/
And from Kafka -> HDFS, you can use Kafka Connect:
https://kafka.apache.org/documentation/#connect
https://docs.confluent.io/current/connect/connect-hdfs/docs/hdfs_connector.htm
06-09-2018
06:58 AM
Hey @Rahul Kumar! It's up to you, actually 🙂 If you prefer to program a producer/consumer, you can use the link I mentioned as an example. But in my humble opinion, it will be much faster and easier to take these csv/xsl files, send them to Kafka and put them into HDFS using NiFi, because you don't need to worry about code: you just drag and drop processors and fill in some parameters to build a NiFi flow. E.g.: https://hortonworks.com/tutorial/nifi-in-trucking-iot-on-hdf/section/3/
One question: where are these files coming from? I'm asking because if your files come from a DB, Kafka Connect can move data from/to Kafka using JDBC. Or you can use Flume/Spark/Storm if you are looking for fast delivery. Hope this helps!
06-08-2018
11:25 PM
Hey @Pedro Rodgers! AFAIK, Hive/Impala don't offer the same graceful way to query metadata as SQL Server, since they use a DBMS behind the scenes to keep their metadata. You'll probably need to access that DBMS directly and run some queries there. For instance, if your Hive/Impala uses MySQL as its Metastore/Catalog, you'll need to query MySQL to gather info about Hive/Impala. But there are some tools on the Hive side that may be useful for you 🙂
https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool
https://cwiki.apache.org/confluence/display/Hive/Hive+MetaTool
https://cwiki.apache.org/confluence/display/Hive/HCatalog+CLI
Hope this helps! 🙂
06-08-2018
07:04 PM
Hey @priyal patel! Could you share your spark-submit parameters?
06-08-2018
06:08 PM
Hey @Jason Sphar! Did you try using zookeeper-cli instead of zk-cli.sh?
06-08-2018
04:59 PM
Hey @Rahul Kumar! First you will need to create a Kafka topic, and then you have a few options to insert data into it using a Kafka producer:
- Using other tools to put data directly into Kafka, e.g. NiFi, Kafka Connect, Spark, Storm, Flume and so on.
- Programming a Kafka producer: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_kafka-component-guide/content/ch_kafka-development.html
After Kafka receives your data, you can consume it with a Kafka consumer and put it into HDFS. To create a Kafka consumer, you have the same options as above:
- Using other tools to read the data from Kafka, e.g. NiFi, Kafka Connect, Spark, Storm, Flume and so on.
- Programming a Kafka consumer: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_kafka-component-guide/content/ch_kafka-development.html
Here's an example doing this manually:
[root@node1 ~]# kafka-topics.sh --zookeeper $ZKENSEMBLE --create --topic vin-hcc-nifi --partitions 3 --replication-factor 3
Created topic "vin-hcc-nifi".
[root@node1 ~]# kafka-console-producer.sh --broker-list node1:6667 --topic vin-hcc-nifi
>Im brazilian and im testing this topic
>please work!
>HCC
[root@node2 ~]# kafka-console-consumer.sh --bootstrap-server node2:6667 --topic vin-hcc-nifi --from-beginning
Im brazilian and im testing this topic
please work!
Hope this helps!
06-07-2018
06:20 PM
1 Kudo
Hey @Melody S! I'm not a specialist in GEO data, but this link may be useful to you: https://community.hortonworks.com/articles/5129/geospatial-data-analysis-in-hadoop.html Hope this helps! 🙂
06-07-2018
03:46 PM
Hey @Thierry Vernhet! Hmm, I'm not sure you can do this using only RegexSerDe. You will probably need to do one of the following:
- Filter out those NULL values in a query, or
- Build an ETL that skips the NULL values and loads the ARMING values into another table, or
- Create a view, or
- Clean the data before putting it into HDFS.
I looked through the code, and here is the part that explains it:
https://github.com/apache/hive/blob/master/contrib/src/java/org/apache/hadoop/hive/contrib/serde2/RegexSerDe.java
  @Override
  public Object deserialize(Writable blob) throws SerDeException {
    if (inputPattern == null) {
      throw new SerDeException(
          "This table does not have serde property \"input.regex\"!");
    }
    Text rowText = (Text) blob;
    Matcher m = inputPattern.matcher(rowText.toString());
    // If do not match, ignore the line, return a row with all nulls.
    if (!m.matches()) {
      unmatchedRows++;
      if (unmatchedRows >= nextUnmatchedRows) {
        nextUnmatchedRows = getNextNumberToDisplay(nextUnmatchedRows);
        // Report the row
        LOG.warn("" + unmatchedRows + " unmatched rows are found: " + rowText);
      }
      return null;
    }
    // Otherwise, return the row.
    for (int c = 0; c < numColumns; c++) {
      try {
        row.set(c, m.group(c + 1));
      } catch (RuntimeException e) {
        partialMatchedRows++;
        if (partialMatchedRows >= nextPartialMatchedRows) {
          nextPartialMatchedRows = getNextNumberToDisplay(nextPartialMatchedRows);
          // Report the row
          LOG.warn("" + partialMatchedRows + " partially unmatched rows are found, "
              + " cannot find group " + c + ": " + rowText);
        }
        row.set(c, null);
      }
    }
    return row;
  }
Hope this helps! 🙂
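To make the behaviour concrete, here is a rough Python model of what that deserialize method does: the whole line has to match the regex, otherwise the row comes back as null (which Hive then shows as NULLs), and a missing capture group becomes a null column. This is only a sketch and not Hive code; the sample pattern and log lines are invented for illustration.

```python
import re
from typing import List, Optional, Pattern

Row = List[Optional[str]]


def deserialize(pattern: Pattern[str], line: str, num_columns: int) -> Optional[Row]:
    """Rough model of RegexSerDe.deserialize (not the real Hive API)."""
    # Java's Matcher.matches() requires the whole line to match.
    m = pattern.fullmatch(line)
    if m is None:
        return None  # unmatched line: surfaces in Hive as a row of NULLs
    row: Row = []
    for c in range(num_columns):
        try:
            row.append(m.group(c + 1))
        except IndexError:
            # the regex has fewer groups than the table has columns
            row.append(None)
    return row


def matched_rows(pattern: Pattern[str], lines: List[str], num_columns: int) -> List[Row]:
    """The 'filter the NULLs' option: keep only lines the regex matched."""
    rows = (deserialize(pattern, line, num_columns) for line in lines)
    return [r for r in rows if r is not None]


if __name__ == "__main__":
    # Hypothetical input.regex and data, for illustration only.
    regex = re.compile(r"(\S+) (ARMING) (.*)")
    sample = ["2018-06-07 ARMING zone1", "2018-06-07 DISARM zone1"]
    print(matched_rows(regex, sample, 3))
```

On the Hive side, the equivalent of matched_rows would be a WHERE ... IS NOT NULL filter or a view on top of the RegexSerDe table.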
06-07-2018
03:29 PM
Hey @Jason Sphar. Hmm, at least it gives us a clue. AFAIK the old versions of Kafka used ZooKeeper as the way to access the Kafka cluster, and now you have this bootstrap way (by passing any broker to reach the leader of the partition + zk). For example, I did a manual test in Kafka: I created a topic, fed it with kafka-console-producer, and listed the consumer groups with kafka-consumer-groups using the bootstrap-server parameter. I'm kind of confused now, because in my humble opinion it seems that the NiFi processor has created the consumer group on Kafka, but Kafka is unable to use it.
[root@node1 ~]# kafka-topics.sh --zookeeper $ZKENSEMBLE --create --topic vin-hcc-nifi --partitions 3 --replication-factor 3
Created topic "vin-hcc-nifi".
[root@node1 ~]# kafka-console-producer.sh --broker-list node1:6667 --topic vin-hcc-nifi
>Im brazilian and im testing this topic
>please work!
>HCC
[root@node2 ~]# kafka-console-consumer.sh --bootstrap-server node2:6667 --topic vin-hcc-nifi --from-beginning
Im brazilian and im testing this topic
please work!
HCC
[root@node2 ~]# kafka-consumer-groups.sh --bootstrap-server node2:6667 --list
Note: This will not show information about old Zookeeper-based consumers.
console-consumer-14185
console-consumer-81664
[root@node2 ~]# kafka-consumer-groups.sh --bootstrap-server node2:6667 --describe --group console-consumer-81664
Note: This will not show information about old Zookeeper-based consumers.
Consumer group 'console-consumer-81664' has no active members.
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
vin-hcc-nifi 0 1 1 0 - - -
vin-hcc-nifi 2 1 1 0 - - -
vin-hcc-nifi 1 1 1 0 - - -
Well, anyway, I saw some interesting points about consumer groups in the Kafka documentation.
========================================================
Checking consumer position
Sometimes it's useful to see the position of your consumers. We have a tool that will show the position of all consumers in a consumer group as well as how far behind the end of the log they are. To run this tool on a consumer group named my-group consuming a topic named my-topic would look like this:
> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group
Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers).
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
my-topic 0 2 4 2 consumer-1-029af89c-873c-4751-a720-cefd41a669d6 /127.0.0.1 consumer-1
my-topic 1 2 3 1 consumer-1-029af89c-873c-4751-a720-cefd41a669d6 /127.0.0.1 consumer-1
my-topic 2 2 3 1 consumer-2-42c1abd4-e3b2-425d-a8bb-e1ea49b29bb2 /127.0.0.1 consumer-2
This tool also works with ZooKeeper-based consumers:
> bin/kafka-consumer-groups.sh --zookeeper localhost:2181 --describe --group my-group
Note: This will only show information about consumers that use ZooKeeper (not those using the Java consumer API).
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID
my-topic 0 2 4 2 my-group_consumer-1
my-topic 1 2 3 1 my-group_consumer-1
my-topic 2 2 3 1 my-group_consumer-2
========================================================
Link: https://kafka.apache.org/10/documentation.html
After that, I did some research and found this KB article written by @mrodriguez (link below); we can try it to solve your problem 🙂
https://community.hortonworks.com/content/supportkb/175137/error-the-group-coordinator-is-not-available-when.html
Hope this helps! And special thanks to @mrodriguez.
Cheers
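In case it helps when comparing outputs: the LAG column is simply LOG-END-OFFSET minus CURRENT-OFFSET for each partition. Below is a small sketch that recomputes it from pasted --describe text. The parser is my own helper and not part of Kafka, and it assumes the whitespace-separated column order shown in the documentation excerpt.

```python
from typing import Dict, Tuple

# Sample text taken from the Kafka documentation excerpt quoted in this post.
SAMPLE = """\
Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers).
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
my-topic 0 2 4 2 consumer-1-029af89c-873c-4751-a720-cefd41a669d6 /127.0.0.1 consumer-1
my-topic 1 2 3 1 consumer-1-029af89c-873c-4751-a720-cefd41a669d6 /127.0.0.1 consumer-1
my-topic 2 2 3 1 consumer-2-42c1abd4-e3b2-425d-a8bb-e1ea49b29bb2 /127.0.0.1 consumer-2
"""


def parse_lag(describe_output: str) -> Dict[Tuple[str, int], int]:
    """Recompute lag per (topic, partition) from --describe text.

    Hypothetical helper: assumes the columns are
    TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET ... separated by whitespace.
    """
    lag: Dict[Tuple[str, int], int] = {}
    for line in describe_output.splitlines():
        fields = line.split()
        # Skip notes, headers, and rows whose offsets are unknown ("-").
        if len(fields) < 4 or not all(f.isdigit() for f in fields[1:4]):
            continue
        topic = fields[0]
        partition, current, log_end = (int(f) for f in fields[1:4])
        lag[(topic, partition)] = log_end - current
    return lag


if __name__ == "__main__":
    per_partition = parse_lag(SAMPLE)
    print(per_partition)
    print("total lag:", sum(per_partition.values()))
```

A lag of 0 on every partition with no active members, as in my test group above, just means the console consumer read everything and then exited.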