Member since: 10-17-2016
Posts: 45
Kudos Received: 10
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 741 | 04-21-2017 06:14 PM
 | 2608 | 04-19-2017 05:59 PM
 | 674 | 04-11-2017 08:24 PM
04-11-2018
03:34 PM
@vishal dutt - Validate your Hive section in Ambari: Advanced hive-atlas-application.properties.
atlas.hook.hive.synchronous - boolean; true runs the hook synchronously. The default is false, and it is recommended to leave it false to avoid delays in Hive query completion.
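For reference, a minimal sketch of how that section might look (the property name and recommended value are from the post above; the comment is illustrative):

# Advanced hive-atlas-application.properties
# run the Atlas Hive hook asynchronously so Hive query completion is not delayed
atlas.hook.hive.synchronous=false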
06-23-2017
07:10 PM
@Matt Clarke, @Timothy Spann, @Shishir Saxena, @Bryan Bende, @milind pandit, @Dan Chaffelson, @Pierre Villard, @Andrew Grande Apache NiFi is installed in a non-Hadoop environment and needs to ingest processed files into HDFS on a Kerberized cluster. Is this a workable solution? I face multiple errors even after performing the activities below. Please advise if there is anything additional I have to perform (a sketch of the relevant settings follows below).
* Firewall restriction between NiFi and the management server is lifted, and ports 22, 88, 749, and 389 are open.
* Firewall restriction between NiFi and the edge node server is lifted, and ports 22, 2181, and 9083 are open.
* The krb5.conf file from the Hadoop cluster, along with the keytab for the application user, is copied to the NiFi server. Running kinit with the application user and keytab succeeds, and the token is listed under klist.
* SSH works, and SFTP into the Hadoop server also works fine.
* hdfs-site.xml and core-site.xml are configured in NiFi.
* The PutHDFS processor fails to ingest data, throwing an authentication error.
Is there something I am missing here? Hadoop environment: 4 management nodes and 1 edge node on the public network within the cluster, and 4 worker nodes on a private network. As an alternative, I installed NiFi on the edge node and everything works fine there, but I need Apache NiFi to work in the non-Hadoop environment and ingest data into HDFS and Hive.
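For reference, a minimal sketch of the NiFi settings usually involved in a Kerberized PutHDFS setup; the paths and principal below are hypothetical placeholders, not values from this environment:

# nifi.properties - NiFi must be pointed at the cluster's krb5.conf
nifi.kerberos.krb5.file=/etc/krb5.conf

PutHDFS processor properties (illustrative values):
Hadoop Configuration Resources: /opt/nifi/conf/core-site.xml,/opt/nifi/conf/hdfs-site.xml
Kerberos Principal: appuser@EXAMPLE.COM
Kerberos Keytab: /opt/nifi/conf/appuser.keytab
Directory: /data/landing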
Labels:
- Apache Hadoop
- Apache NiFi
06-05-2017
07:34 PM
Hi. I have a configuration to fetch tweets via GetTwitter, filtered by language and a set of keywords (OR conditions, including two-word combinations - e.g., nifi, spark, scala, spark dataframe, spark rdd, spark dataset). I have a similar combination of 210 words, and the results sometimes include non-matching tweets and also duplicates. Please advise. @Scott Shaw @Artem Ervits @Timothy Spann @Pierre Villard
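For what it is worth, a minimal sketch of how the underlying Twitter statuses/filter track parameter interprets such a term list - commas act as OR, a space inside a phrase acts as AND, and matching also considers expanded URLs and screen names, which can make some results look non-matching (values are illustrative):

Twitter Endpoint: Filter Endpoint
Terms to Filter On: nifi,spark,scala,spark dataframe
# matches: nifi OR spark OR scala OR (spark AND dataframe)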
Labels:
- Apache NiFi
04-25-2017
08:29 PM
Support from you guys is much appreciated - @Andrew Grande, @Timothy Spann, @Constantin Stanca, @Matt Burgess
04-24-2017
07:47 PM
1 Kudo
Please advise on resolving this connectivity issue. NiFi is installed in a non-Hadoop environment as a standalone application and is accessing Hive on a clustered Hadoop environment. What should my Hive Metastore URI be? I tried multiple values from hive-site.xml and also replaced the host with the edge node IP address, but I am facing the error below. I suspect it is either an IP restriction blocking the non-Hadoop server from reaching the Hadoop edge node, or a wrong Hive Metastore URI relative to hive-site.xml. Please advise.

ERROR [Timer-Driven Process Thread-9] o.a.n.processors.hive.PutHiveStreaming PutHiveStreaming[id=87b35ed3-015b-1000-8c8d-a99b97d59afa] failed to process session due to com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: Unable to instantiate org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out (Connection timed out)

Support from you guys is much appreciated - @Andrew Grande, @Timothy Spann, @Constantin Stanca, @Matt Burgess
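For reference, a minimal sketch of the hive-site.xml property the Hive Metastore URI should normally match (the host below is a hypothetical metastore/edge node; 9083 is the default Thrift port):

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host.example.com:9083</value>
</property>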
Labels:
- Apache Hive
- Apache NiFi
04-21-2017
06:15 PM
Deployed the latest versions and am now able to access the required Hive processors.
04-21-2017
06:14 PM
Closed; the issue was due to a data error.
04-21-2017
06:01 PM
1 Kudo
I am still facing this issue - any support for a resolution? How do I revert and ensure my local ports are used for my sandbox? Describing the issue again: I was implementing the Azure cloud deployment of the HDP sandbox and faced the above error; later, when I set up the sandbox locally on VirtualBox, I was unable to access 127.0.0.1 and the respective Ambari or Zeppelin ports. Please advise, as I need my local sandbox to work and want to ignore the Azure setup.
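A minimal sketch, assuming a Linux or macOS shell, of checking whether the Azure tunnel definitions are still grabbing the local ports the VirtualBox sandbox needs (8080, the Ambari port, is used as the example):

# see which process currently holds the port
lsof -i :8080
# then comment out or delete the azureSandbox Host block and its LocalForward lines
vi ~/.ssh/config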
04-19-2017
05:59 PM
1 Kudo
Thanks @Timothy Spann. I looked at your comments just now, after I had made some modifications to the existing NAR file. I edited the current ScanAttribute file to iterate over the dictionary words and check for their presence in the Twitter message. Customizing is good, and I look forward to integrating more solutions into NiFi.
04-19-2017
03:47 PM
1 Kudo
@Timothy Spann: Greetings. I need your assistance in customizing the ScanAttribute processor. The flow file data, treated as the tweet_msg attribute, needs to be checked for whether it contains words from a dictionary file.
04-17-2017
05:45 PM
1 Kudo
The ScanAttribute processor works fine for matching the exact attribute value, but not a substring of it. I looked into the code - {{if (dictionary.contains(entry.getValue()))}} - which matches only when the dictionary contains the entire attribute value. In my case the attribute is a tweet, a collection of words, so every tweet fails as unmatched. I need logic that checks whether the tweet (a collection of words) contains any dictionary word, either by customizing the processor or by otherwise extracting text from the tweet attribute; see the sketch below.
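A minimal sketch of the inverted check, assuming a Set<String> dictionary like the one the processor already loads; the class, method, and variable names are illustrative, not the processor's actual code:

import java.util.Set;

class DictionaryMatch {
    // Match if ANY word of the attribute value (the tweet) appears in the dictionary,
    // instead of requiring the whole attribute value to equal a dictionary entry.
    static boolean containsDictionaryWord(final String attributeValue, final Set<String> dictionary) {
        for (final String word : attributeValue.toLowerCase().split("\\s+")) {
            if (dictionary.contains(word)) {
                return true;
            }
        }
        return false;
    }
}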
04-11-2017
08:24 PM
Resolved. Solution steps:
* Firewall restriction removal
* Twitter credentials regeneration
* Restart of NiFi after deleting the provenance repository
* Ensured the NTP server is in sync
Worked fine.
04-11-2017
04:04 PM
The NiFi GetTwitter processor, with correct Twitter credentials (Consumer Key, Consumer Secret, Access Token, Access Token Secret), is not working. It throws the error: Received error HTTP_ERROR: HTTP/1.1 401 Authorization Required. Will attempt to reconnect. It used to work fine for me in other environments. I checked with the network team on port blocks earlier and resolved a "Failed to establish connection" error. I have tried regenerating the keys and updating the processor multiple times, but still get the same error. Please advise.
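One thing worth ruling out, consistent with the NTP sync in the resolution steps above: OAuth request signatures are timestamp-sensitive, so a skewed system clock can cause HTTP 401 even with valid keys. A minimal sketch, assuming a Linux host with the ntp tools installed:

# check whether the system clock is synchronized
ntpstat
# force a one-off sync (server name is illustrative)
sudo ntpdate pool.ntp.org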
Labels:
- Apache NiFi
04-05-2017
07:20 PM
Hi. While deploying the Hortonworks sandbox on Azure, I faced issues in the SSH tunneling section. vi ~/.ssh/config had permission issues, which I resolved using sudo. Following that, I got stuck on ssh azureSandbox with "Bad owner or permissions on /home/freedom/.ssh/config", which I resolved with chmod 600 ~/.ssh/config. Now I am stuck with the error below - please advise how to move forward.

[freedom@sandbox testing]$ ssh azureSandbox
Password:
bind: Address already in use
channel_setup_fwd_listener: cannot listen to port: 8080
bind: Address already in use
channel_setup_fwd_listener: cannot listen to port: 8888
bind: Address already in use
channel_setup_fwd_listener: cannot listen to port: 9995
bind: Address already in use
channel_setup_fwd_listener: cannot listen to port: 9996
bind: Address already in use
channel_setup_fwd_listener: cannot listen to port: 8886
bind: Address already in use
channel_setup_fwd_listener: cannot listen to port: 10500
bind: Address already in use
channel_setup_fwd_listener: cannot listen to port: 4200
bind: Address already in use
channel_setup_fwd_listener: cannot listen to port: 2222
Could not request local forwarding.
Last login: Wed Apr 5 18:48:00 2017 from 74.115.192.4
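A minimal sketch, assuming a Linux shell, of tracking down what already holds the forwarded ports - "bind: Address already in use" usually means a previous tunnel or another local service is still bound to them:

# find the process bound to one of the ports, e.g. 8080
sudo lsof -i :8080
# look for a stale ssh tunnel from an earlier attempt and stop it
pgrep -af "ssh azureSandbox"
kill <pid-from-above>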
Labels:
- Hortonworks Data Platform (HDP)
01-19-2017
04:38 PM
1 Kudo
Hi. Consider a Spark SQL DataFrame or Dataset with 400 columns and 1 million rows. Not all rows have all 400 columns populated, and essentially the columns cannot be declared NOT NULL either. I need to understand whether a null value consumes space in memory and, if so, how much. Is there a fact sheet or article listing the sizes of all data types in bytes or bits?
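A minimal sketch, assuming Spark 1.6's org.apache.spark.util.SizeEstimator utility, of empirically comparing a fully populated row against an all-null one; note that SizeEstimator measures JVM object size rather than the compact Tungsten UnsafeRow encoding, so treat the numbers as rough upper bounds:

import org.apache.spark.sql.Row
import org.apache.spark.util.SizeEstimator

// 400 populated double columns vs. 400 nulls
val dense  = Row.fromSeq(Seq.fill(400)(1.0))
val sparse = Row.fromSeq(Seq.fill(400)(null))

println(s"dense row bytes:  ${SizeEstimator.estimate(dense)}")
println(s"sparse row bytes: ${SizeEstimator.estimate(sparse)}")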
Labels:
- Apache Spark
12-20-2016
11:00 PM
@Timothy Spann :: What is missing in the Zeppelin version of the Hortonworks sandbox 2.4 on Azure that is causing this error?
12-20-2016
10:31 PM
I guess it is something to do with the Zeppelin version; I didn't face the issue while running it in the spark-shell. Thanks for the support, as always when needed.
12-20-2016
07:47 PM
@Timothy Spann I am working on the Hortonworks sandbox 2.4 in the Azure environment, currently running the program in Zeppelin. Your code also threw the same error listed above. Please advise.
12-20-2016
07:15 PM
@Timothy Spann - I use version 1.6:
sc.version
res377: String = 1.6.0
I still face that error - not sure why.
12-20-2016
05:58 PM
1 Kudo
I am facing an error during transform (tokenizer.transform) - please advise.

import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}

val sentenceData = sqlContext.createDataFrame(Seq(
  (0, "Hi I heard about Spark"),
  (0, "I wish Java could use case classes"),
  (1, "Logistic regression models are neat")
)).toDF("label", "sentence")

val tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words")
val wordsData = tokenizer.transform(sentenceData)

Error message for reference:

import org.apache.spark.ml.feature
sentenceData: org.apache.spark.sql.DataFrame = [label: int, sentence: string]
tokenizer: org.apache.spark.ml.feature.Tokenizer = tok_6ac8a05b403d
<console>:61: error: type mismatch;
 found   : org.apache.spark.sql.DataFrame
 required: org.apache.spark.sql.DataFrame
val wordsData = tokenizer.transform(sentenceData)
Labels:
- Apache Spark
- Apache Zeppelin
12-20-2016
05:54 PM
While running through the tutorial https://community.hortonworks.com/articles/53903/spark-machine-learning-pipeline-by-example.html, I face an issue at the lines below:

val header = flight2007.first
val trainingData = flight2007
  .filter(x => x != header)

unhandled exception while transforming <console>
error: uncaught exception during compilation: java.lang.NullPointerException

Error message in detail, after the header val displayed successfully:

while compiling: <console>
during phase: specialize
library version: version 2.10.5
compiler version: version 2.10.5
reconstructed args: -classpath /usr/hdp/2.4.0.0-169/zeppelin/lib/interpreter/spark/zeppelin-spark-0.6.0.2.4.0.0-169.jar:/etc/spark/2.4.0.0-169/0:/usr/hdp/2.4.0.0-169/spark/lib/spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar:/usr/hdp/2.4.0.0-169/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/hdp/2.4.0.0-169/spark/lib/datanucleus-core-3.2.10.jar:/usr/hdp/2.4.0.0-169/spark/lib/datanucleus-rdbms-3.2.9.jar:/etc/hadoop/2.4.0.0-169/0:/usr/hdp/current/zeppelin-server/lib/interpreter/spark/zeppelin-spark-0.6.0.2.4.0.0-169.jar:/usr/hdp/current/spark-historyserver/conf:/usr/hdp/2.4.0.0-169/spark/lib/spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar:/usr/hdp/2.4.0.0-169/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/hdp/2.4.0.0-169/spark/lib/datanucleus-core-3.2.10.jar:/usr/hdp/2.4.0.0-169/spark/lib/datanucleus-rdbms-3.2.9.jar:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/zeppelin-server/lib/interpreter/spark/zeppelin-spark-0.6.0.2.4.0.0-169.jar
last tree to typer: TypeTree(anonymous class $anonfun)
symbol: anonymous class $anonfun (flags: final <synthetic>)
symbol definition: final class $anonfun extends AbstractFunction1[Array[String],Flight] with Serializable
tpe: scala.runtime.AbstractFunction1[Array[String],Flight] with Serializable
symbol owners: anonymous class $anonfun -> value trainingData -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $read -> package $line568
context owners: anonymous class $anonfun -> value trainingData -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $iwC -> class $read -> package $line568
== Enclosing template or block ==
ClassDef( // final class $anonfun extends AbstractFunction1[String,Boolean] with Serializable
final <synthetic> @{ SerialVersionUID(0) }
"$anonfun"
[]
Template( // val <local $anonfun>: <notype>, tree.tpe=scala.runtime.AbstractFunction1[String,Boolean] with Serializable
"scala.runtime.AbstractFunction1", "scala.Serializable" // parents
ValDef(
private
"_"
<tpt>
<empty>
)
// 2 statements
DefDef( // def <init>(): scala.runtime.AbstractFunction1[String,Boolean] with Serializable
<method> <triedcooking>
"<init>"
[]
List(Nil)
<tpt> // tree.tpe=scala.runtime.AbstractFunction1[String,Boolean] with Serializable
Block( // tree.tpe=Unit
Apply( // def <init>(): scala.runtime.AbstractFunction1[T1,R] in class AbstractFunction1, tree.tpe=scala.runtime.AbstractFunction1[String,Boolean]
$anonfun.super."<init>" // def <init>(): scala.runtime.AbstractFunction1[T1,R] in class AbstractFunction1, tree.tpe=()scala.runtime.AbstractFunction1[String,Boolean]
Nil
)
()
)
)
DefDef( // final def apply(x: String): Boolean
<method> final
"apply"
[]
// 1 parameter list
ValDef( // x: String
<param> <triedcooking>
"x"
<tpt> // tree.tpe=String
<empty>
)
<tpt> // tree.tpe=Boolean
Apply( // final def !=(x$1: Object): Boolean in class Object, tree.tpe=Boolean
"x"."$bang$eq" // final def !=(x$1: Object): Boolean in class Object, tree.tpe=(x$1: Object)Boolean
Apply( // val header(): String, tree.tpe=String
$iwC.this.$VAL1317().$iw().$iw().$iw().$iw().$iw().$iw().$iw().$iw().$iw().$iw().$iw().$iw().$iw().$iw().$iw().$iw().$iw()."header" // val header(): String, tree.tpe=()String
Nil
)
)
)
)
)
== Expanded type of tree ==
TypeRef(
TypeSymbol(
final class $anonfun extends AbstractFunction1[Array[String],Flight] with Serializable
)
)
unhandled exception while transforming <console>
error: uncaught exception during compilation: java.lang.NullPointerException
----------------------------------------------
Labels:
- Apache Spark
- Apache Zeppelin
11-28-2016
10:06 PM
You are right - it is 0.3, and I see that in the properties file:
# Core Properties #
nifi.version=0.3.0-SNAPSHOT
Please advise on the steps to migrate to the latest version.
11-28-2016
09:47 PM
nifi-ambari-stackversions.png @Matt - Are you sure the NiFi version is 0.3? I see the Ambari stacks and versions page, where it is listed as 1.1. Please double-check.
11-28-2016
09:40 PM
Hi Matt. Thanks, and I appreciate your response. I installed the sandbox from Microsoft Azure and used Ambari to install the NiFi service; it seems to hold an outdated version. Could you please share the steps to upgrade to the latest NiFi version on this current HDP, or any alternatives?
11-28-2016
07:25 PM
hdp-version.png nifi-version.png Can I know why Hive-related processors are not listed under my Apache NiFi processors? I am looking for PutHiveStreaming, PutHiveQL, and SelectHiveQL. Please advise.
Labels:
- Apache Hive
- Apache NiFi
11-23-2016
02:10 AM
Hi. I appreciate your answer, and sorry for the delay in responding - I was away for a while. The ScanAttribute processor sounds like the answer to my question, and I did try a sample, but it is not picking up the value and filtering the incoming tweets accordingly. Can you provide an example with snapshots? Incoming tweets via GetTwitter processor >> ScanAttribute processor (Dictionary File: <location of text file with keywords>, Attribute Pattern: $.text, Match Criteria: at least one value must match).
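In case it helps frame an example: a minimal sketch of the flow, under the assumption that the tweet text must first be promoted from flow file content to an attribute, since $.text is a JsonPath over content rather than an attribute name (the attribute name twitter.msg is illustrative):

GetTwitter >> EvaluateJsonPath (Destination: flowfile-attribute, twitter.msg: $.text) >> ScanAttribute (Dictionary File: /path/to/keywords.txt, Attribute Pattern: twitter.msg, Match Criteria: At Least 1 Must Match)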
11-21-2016
08:40 PM
Hi. I am integrating Apache NiFi with Twitter to fetch tweets. As part of sample programming, I am using the GetTwitter processor and fetching tweets filtered by keyword via the filter endpoint. I need to understand the volume of tweets supported, and the options and pricing in terms of licensing when I need to access major content. As I understand it, the limit is 3,200 tweets for an individual user and 5,000 tweets by keyword. Can you help me understand the pricing, and the best firehose or other options for larger-scale tweet volumes?
Labels:
- Apache NiFi
11-15-2016
06:27 PM
2 Kudos
I want a processor to fetch attribute values at run time. For example, if I am filtering Twitter feeds by specific keywords, I want to maintain the list of keywords in a separate repository such as a file or table, not confined to a text-box value. In that case, how can a NiFi processor fetch those values from an external file or table into an attribute value?
Labels:
- Apache NiFi
11-08-2016
07:46 PM
Hi. Any suggestions on this issue? I am waiting to move forward on this use case.