Member since: 08-15-2017
Posts: 31
Kudos Received: 28
Solutions: 3
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 896 | 01-30-2018 07:47 PM |
 | 288 | 08-31-2017 02:05 AM |
 | 223 | 08-25-2017 05:35 PM |
11-19-2018
07:08 PM
I have a streaming job that I run with sbt. Whenever I do "sbt run", I see the error below. It appears the workers are not able to get the required Kafka dependency.

build.sbt:

name := "MyAPP"
version := "0.5"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.3.1",
"org.apache.spark" %% "spark-sql" % "2.3.1",
"org.apache.spark" %% "spark-streaming" % "2.3.1",
"org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.3.1",
"org.apache.spark" %% "spark-sql-kafka-0-10" % "2.3.1",
"com.typesafe" % "config" % "1.3.2",
"org.apache.logging.log4j" % "log4j-api" % "2.11.0",
"org.apache.logging.log4j" % "log4j-core" % "2.11.0",
"org.apache.logging.log4j" %% "log4j-api-scala" % "11.0",
"org.scalatest" %% "scalatest" % "3.0.5" % "test",
"org.apache.kafka" % "kafka_2.11" % "0.10.2.2",
"org.apache.kafka" % "kafka-clients" % "0.10.2.2",
"ml.combust.mleap" %% "mleap-runtime" % "0.11.0",
"com.typesafe.play" % "play-json_2.11" % "2.6.10",
"com.fasterxml.jackson.module" % "jackson-module-scala_2.11" % "2.8.11",
"net.liftweb" %% "lift-json" % "3.3.0"
)
lazy val excludeJpountz = ExclusionRule(organization = "net.jpountz.lz4", name = "lz4")
lazy val kafkaClients = "org.apache.kafka" % "kafka-clients" % "0.10.2.2" excludeAll(excludeJpountz)
logBuffered in Test := false
fork in Test := true
// Don't run tests before assembling
test in assembly := {}
retrieveManaged := true
assemblyMergeStrategy in assembly := {
case "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister" => MergeStrategy.concat
case PathList("META-INF", xs@_*) => MergeStrategy.discard
case "log4j.properties" => MergeStrategy.discard
case x => MergeStrategy.first
}
unmanagedBase := baseDirectory.value / "lib"

Is there a way of passing the dependency jars along with the "sbt run" command?

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 13.0 failed 4 times, most recent failure: Lost task 1.3 in stage 13.0 (TID 227, 10.148.9.12, executor 1): java.lang.ClassNotFoundException: org.apache.spark.sql.kafka010.KafkaContinuousDataReaderFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
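One common approach (a sketch, not the poster's confirmed setup) is to stop distributing dependencies through "sbt run": mark the Spark runtime artifacts as "provided", keep spark-sql-kafka-0-10 in the assembly jar, build it with "sbt assembly", and launch it with spark-submit so the executors receive the org.apache.spark.sql.kafka010 classes. In build.sbt that would look roughly like:

// A hedged build.sbt sketch (not the poster's exact build): Spark runtime
// artifacts are marked "provided" so spark-submit supplies them, while
// spark-sql-kafka-0-10 stays in the assembly jar so executors can load the
// org.apache.spark.sql.kafka010 classes.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"           % "2.3.1" % "provided",
  "org.apache.spark" %% "spark-sql"            % "2.3.1" % "provided",
  "org.apache.spark" %% "spark-streaming"      % "2.3.1" % "provided",
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.3.1"   // bundled in the fat jar
)

Alternatively, spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1 pulls the connector onto the driver and executors without bundling it.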
04-04-2018
03:03 PM
The Spark consumer has to read topics with the same name from different bootstrap servers, so I need to create two JavaDStreams, perform a union, process the stream, and commit the offsets.

JavaInputDStream<ConsumerRecord<String, GenericRecord>> dStream = KafkaUtils.createDirectStream(...);

The problem is that JavaInputDStream doesn't support dStream.union(stream2). If I use

JavaDStream<ConsumerRecord<String, GenericRecord>> dStream = KafkaUtils.createDirectStream(...);

then JavaDStream doesn't support ...
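For reference, a minimal Scala sketch of the two-cluster pattern (the question uses the Java API); the broker addresses, topic name, and group id below are illustrative assumptions:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object DualClusterConsumer {
  // Kafka settings, parameterized by the bootstrap server of each cluster.
  def kafkaParams(bootstrap: String): Map[String, Object] = Map(
    "bootstrap.servers"  -> bootstrap,
    "key.deserializer"   -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id"           -> "dual-cluster-group",        // assumed group id
    "auto.offset.reset"  -> "latest",
    "enable.auto.commit" -> (false: java.lang.Boolean)
  )

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("dual-cluster-consumer").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(10))

    val topics = Seq("my_topic")   // same topic name on both clusters
    val streamA = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams("brokerA:9092")))
    val streamB = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams("brokerB:9092")))

    // union() comes from the underlying DStream type; offsets still have to be
    // committed per source stream through its own CanCommitOffsets handle.
    val combined = streamA.union(streamB)
    combined.map(record => record.value).print()

    ssc.start()
    ssc.awaitTermination()
  }
}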
03-29-2018
03:50 PM
I'm trying to read and store messages from a Kafka topic using Spark Structured Streaming. The records read are in df. The code below shows zero records. If I replace the format with format("console"), I'm able to see the records being printed on the console.

StreamingQuery initDF = df.writeStream()
.outputMode("append")
.format("memory")
.queryName("initDF")
.trigger(Trigger.ProcessingTime(1000))
.start();
sparkSession.sql("select * from initDF").show();
initDF.awaitTermination();
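For comparison, a self-contained Scala sketch of the same memory-sink pattern (the question's code is Java); the rate source here just stands in for the Kafka stream. The point it illustrates is that the in-memory table is only populated once a micro-batch has completed, so it is queried after processAllAvailable() rather than immediately after start():

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

// Assumed local session; `df` stands in for the streaming DataFrame read from Kafka.
val spark = SparkSession.builder().appName("memory-sink-sketch").master("local[2]").getOrCreate()
val df = spark.readStream.format("rate").load()

val query = df.writeStream
  .outputMode("append")
  .format("memory")
  .queryName("initDF")                          // name of the in-memory table
  .trigger(Trigger.ProcessingTime("1 second"))
  .start()

// Query the table only after at least one micro-batch has completed; calling
// show() immediately after start() typically returns zero rows.
query.processAllAvailable()
spark.sql("select * from initDF").show()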
02-21-2018
08:01 PM
I'm reading data from Kafka topics using Spark Streaming. The JavaDStream is converted to RDDs and a custom method is called on each RDD, but the method myProcessor.executeValidation() is never called/executed.
inputData is a JavaDStream object and is not null.
if (inputData != null) {
inputData.foreachRDD(rdd -> myProcessor.executeValidation(rdd));
}
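For reference, a self-contained Scala sketch of the foreachRDD pattern (the question uses the Java API); the socket source and the executeValidation body are placeholders. Note that nothing inside foreachRDD runs until the StreamingContext has been started:

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ValidationJob {
  // Stand-in for the poster's custom validation method.
  def executeValidation(rdd: RDD[String]): Unit =
    rdd.foreach(record => println(s"validating: $record"))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("validation-stream").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // socketTextStream stands in for the Kafka stream from the question.
    val inputData = ssc.socketTextStream("localhost", 9999)

    inputData.foreachRDD(rdd => executeValidation(rdd))

    ssc.start()             // foreachRDD closures only execute after start()
    ssc.awaitTermination()
  }
}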
01-30-2018
07:47 PM
@Viswa Add this configuration to your pom.xml under the build tag, rather than adding the jar in spark-submit.

<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
12-01-2017
06:39 PM
@Sai Sandeep Can you please share the HS2 logs from /var/log/hive/hiveserver2.log?
11-16-2017
02:59 PM
2 Kudos
@Abhay Singh The tutorial uses HDF-2.1.0, which is an older version. You can download it from the archive section on this page: https://hortonworks.com/downloads/#sandbox
11-13-2017
05:53 PM
@Dinesh Chitlangia That helped. Thank you..!!
11-13-2017
05:40 PM
1 Kudo
I am using Spark 1.6. I get the following result when I try to read a JSON file as per the Spark 1.6 documentation:

scala> val colors = sqlContext.read.json("C:/Downloads/colors.json");
colors: org.apache.spark.sql.DataFrame = [_corrupt_record: string]

1. Spark 1.6 itself works fine and I have been able to read other text files.
2. I have attached the JSON file as text (since uploading .json is blocked), and I have validated the JSON with jsoneditoronline to make sure it is well formed: colors.txt
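I cannot see the attached file, but one possible explanation (an assumption, not a confirmed diagnosis) is that Spark 1.6's JSON reader expects one complete JSON object per line, so a pretty-printed multi-line file comes back as a single _corrupt_record column. A sketch with a hypothetical line-delimited file and made-up field names:

// Hypothetical JSON Lines file, one record per line:
// {"color":"red","value":"#f00"}
// {"color":"green","value":"#0f0"}
val colors = sqlContext.read.json("C:/Downloads/colors_lines.json")
colors.printSchema()   // shows color and value instead of _corrupt_record
colors.show()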
11-07-2017
05:27 PM
In the HDPCD practice exam instance available on AWS, specifying --target-dir is supposed to be optional when using --hive-import in Sqoop, but this instance doesn't run the Sqoop query unless --target-dir is specified.

sqoop import --connect jdbc:mysql://namenode/flightinfo --query "SELECT * from weather WHERE precipitation!=0 AND \$CONDITIONS" --hive-import -m 1 --username root -P
Warning: /usr/hdp/2.2.0.0-2041/sqoop/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
17/11/05 00:39:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5.2.2.0.0-2041
Enter password:
Must specify destination with --target-dir.
Try --help for usage instructions.
11-03-2017
07:42 PM
1 Kudo
@manikandan ayyasamy
The same scenario is solved here:
https://community.hortonworks.com/questions/134841/hive-convert-all-values-for-a-column-to-a-comma-se.html?childToView=134850#comment-134850
Hope this helps.
10-27-2017
03:52 AM
2 Kudos
I am following this link: https://cwiki.apache.org/confluence/display/AMBARI/Enabling+HDFS+per-user+Metrics However, I am still not able to view the metrics in Grafana. Is there a different set of settings we need to apply for an HA environment?
10-16-2017
10:45 PM
Thank you @Dinesh Chitlangia. This solved the issue.
10-16-2017
09:43 PM
@Dinesh Chitlangia I have verified all of the known issues listed in the above URL. The problem still exists.
10-16-2017
09:37 PM
2 Kudos
I just upgraded from HDP-2.6.1 to HDP-2.6.2. Both are kerberized clusters with doAs=True in Hive. In 2.6.1, the JDBC Hive interpreter was working fine. After upgrading, even simple queries like 'Show Databases' result in an "Error in doAs". My configurations are the same in both versions.

org.apache.zeppelin.interpreter.InterpreterException: Error in doAs
 at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnection(JDBCInterpreter.java:415)
 at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:633)
 at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:733)
 at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:101)
 at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:502)
 at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
 at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.UndeclaredThrowableException
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1884)
 at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnection(JDBCInterpreter.java:407)
 ... 13 more
Caused by: java.sql.SQLException: Could not open client transport for any of the Server URI's in ZooKeeper: java.net.ConnectException: Connection refused
 at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:218)
 at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:156)
 at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
 at java.sql.DriverManager.getConnection(DriverManager.java:664)
 at java.sql.DriverManager.getConnection(DriverManager.java:208)
 at org.apache.commons.dbcp2.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:79)
 at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:205)
 at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
 at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
 at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
 at org.apache.commons.dbcp2.PoolingDriver.connect(PoolingDriver.java:129)
 at java.sql.DriverManager.getConnection(DriverManager.java:664)
 at java.sql.DriverManager.getConnection(DriverManager.java:270)
 at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnectionFromPool(JDBCInterpreter.java:362)
 at org.apache.zeppelin.jdbc.JDBCInterpreter.access$000(JDBCInterpreter.java:89)
 at org.apache.zeppelin.jdbc.JDBCInterpreter$1.run(JDBCInterpreter.java:410)
 at org.apache.zeppelin.jdbc.JDBCInterpreter$1.run(JDBCInterpreter.java:407)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
 ... 14 more
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
 at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
 at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:248)
 at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
 at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
 at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssuming
10-16-2017
09:30 PM
@Shu Thank you for the explanation. I wanted to know about Hive queries (Hive SQL) where there is no reducer phase at all, only a mapper phase. Is there such an example?
10-14-2017
05:06 PM
I'm looking for Hive query scenarios that use only mappers or only reducers.
09-15-2017
09:50 PM
Thank you @Dinesh Chitlangia !! That helped.
09-15-2017
08:17 PM
3 Kudos
In the query below, I want to force a column value to NULL by casting it to the required datatype. This works in standard SQL, but Hive throws the exception shown.

Select * from (select driverid, name, ssn from drivers where driverid<15
UNION ALL
Select driverid,name, cast(null as bigint) from drivers where driverid BETWEEN 18 AND 21) T
;
SemanticException 3:48 Schema of both sides of union should match. T-subquery2 does not have the field ssn. Error encountered near token 'drivers'.
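One common way to satisfy Hive's requirement that both sides of the union expose the same column names is to alias the cast so the second branch also has an ssn field. A hedged sketch, wrapped in spark.sql only to keep the examples on this page in one language (the statement itself is plain HiveQL):

import org.apache.spark.sql.SparkSession

// Assumed Hive-enabled session so the query runs against the Hive metastore.
val spark = SparkSession.builder().appName("union-alias-sketch").enableHiveSupport().getOrCreate()

// Aliasing CAST(NULL AS BIGINT) as ssn gives both branches the schema (driverid, name, ssn).
val unioned = spark.sql(
  """SELECT * FROM (
    |  SELECT driverid, name, ssn FROM drivers WHERE driverid < 15
    |  UNION ALL
    |  SELECT driverid, name, CAST(NULL AS BIGINT) AS ssn FROM drivers WHERE driverid BETWEEN 18 AND 21
    |) t""".stripMargin)
unioned.show()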
08-31-2017
02:05 AM
3 Kudos
Just verified: HDP 2.6.1 has Hadoop/HDFS 2.7.3 (https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_release-notes/content/comp_versions.html). The issue you have reported is a bug that has been fixed in HDFS 2.8.0, per this Apache JIRA: https://issues.apache.org/jira/browse/HDFS-8805
08-28-2017
07:49 PM
Thank you @Nandish B Naidu..!! The solution worked.
08-27-2017
08:05 PM
Thank You @Sonu Sahi for the solution.
08-27-2017
08:04 PM
Thank you @Sindhu for the elaborate explanation and the solution..!!
08-25-2017
05:35 PM
3 Kudos
The way Ambari has been designed, during the upgrade it will back up all paths listed under dfs.namenode.name.dir. Thus, there is currently no feature that lets you select a particular file to back up instead of all of them. This could be a new feature!
08-24-2017
10:12 PM
Thank you..!! It worked.
08-24-2017
09:42 PM
1 Kudo
My dataset is as below:

700,Angus (1995),Comedy
702,Faces (1968),Drama
703,Boys (1996),Drama
704,"Quest, The (1996)",Action|Adventure
705,Cosi (1996),Comedy
747,"Stupids, The (1996)",Comedy

Create table movies(
movieid int,
title String,
genre string)
row format delimited
fields terminated by ','
;
Select title , genre from movies;
Since the rows contain comma-separated values, a record like 704,"Quest, The (1996)",Action|Adventure returns Quest as the title instead of Quest, The (1996), and the genre value comes back as The (1996) instead of Action|Adventure. How can I load such data correctly by escaping the delimiter inside the value?
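One approach worth considering (a hedged sketch, not the only option) is Hive's OpenCSVSerde, which honors double quotes around fields that contain the delimiter. It is shown here through spark.sql only to keep the examples on this page in one language; the DDL itself is plain HiveQL. Note that OpenCSVSerde exposes every column as STRING, so numeric columns need a cast when queried, and the table name movies_csv is illustrative:

import org.apache.spark.sql.SparkSession

// Assumed Hive-enabled session so the DDL is executed against the Hive metastore.
val spark = SparkSession.builder().appName("csv-serde-sketch").enableHiveSupport().getOrCreate()

spark.sql(
  """CREATE TABLE movies_csv (
    |  movieid STRING,
    |  title   STRING,
    |  genre   STRING)
    |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    |WITH SERDEPROPERTIES ('separatorChar' = ',', 'quoteChar' = '"')
    |STORED AS TEXTFILE""".stripMargin)

// "Quest, The (1996)" now stays in the title column and Action|Adventure in genre.
spark.sql("SELECT title, genre FROM movies_csv").show(false)

The file can then be loaded with the usual LOAD DATA INPATH statement.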
08-24-2017
08:28 PM
4 Kudos
Select name, city from people;

The above query returns:

jon Atlanta
jon Newyork
snow LA
snow DC

But I want the result as a single row per name, as follows:

jon Atlanta,Newyork
snow LA,DC
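A hedged sketch of one way to do this with Hive's collect_list and concat_ws (collect_set would additionally remove duplicate cities), wrapped in spark.sql only to keep the examples on this page in one language:

import org.apache.spark.sql.SparkSession

// Assumed Hive-enabled session so the query runs against the existing people table.
val spark = SparkSession.builder().appName("group-concat-sketch").enableHiveSupport().getOrCreate()

// Fold all city values for each name into a single comma-separated string.
val grouped = spark.sql(
  """SELECT name, concat_ws(',', collect_list(city)) AS cities
    |FROM people
    |GROUP BY name""".stripMargin)
grouped.show(false)
// Expected shape: jon -> Atlanta,Newyork ; snow -> LA,DC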
08-16-2017
05:12 PM
@Andres Urrego Thank you..!! Yes. But when we use 'import' instead of 'import-all-tables', we can specify where to store the table with --target-dir instead of using the default HDFS path. With the command below, the data is imported into the specified target directory.

sqoop import --connect jdbc:mysql://localhost/my_db --table EMP --username root --target-dir 'sqoopdata/emp' -m 1
08-15-2017
11:23 PM
4 Kudos
While trying to use import-all-tables, if I specify --target-dir I get the error below.

$ sqoop import-all-tables --connect jdbc:mysql://localhost/db --username root --target-dir 'alltables/data' -m 1
17/08/14 14:08:07 ERROR tool.BaseSqoopTool: Error parsing arguments for import-all-tables:
17/08/14 14:08:07 ERROR tool.BaseSqoopTool: Unrecognized argument: --target-dir
17/08/14 14:08:07 ERROR tool.BaseSqoopTool: Unrecognized argument: alltables/data
17/08/14 14:08:07 ERROR tool.BaseSqoopTool: Unrecognized argument: -m
17/08/14 14:08:07 ERROR tool.BaseSqoopTool: Unrecognized argument: 1

Appreciate any help!