Member since: 12-29-2017
Posts: 33
Kudos Received: 2
Solutions: 1

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2053 | 11-14-2018 01:40 AM
04-25-2019
08:27 PM
Thanks, Shu. It is working now.
04-18-2019
10:28 PM
I am trying to create a Hive table in Parquet format with Snappy compression. Instead of sqlContext I am using HiveContext to save my DataFrame results directly into a table using saveAsTable("<table name>"). I set the codec with hc.setConf("spark.sql.parquet.compression.codec", "snappy"), but the Hive table is always created as Parquet with gz compression instead of Snappy. Is there any solution for this?
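
For reference, a minimal sketch of the approach described above (Spark 1.x HiveContext API); the DataFrame name df and the table name my_parquet_table are placeholders, not names from the original post:

import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)
hc.setConf("spark.sql.parquet.compression.codec", "snappy")

// df is assumed to be an existing DataFrame holding the results to save
df.write.format("parquet").saveAsTable("my_parquet_table")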
Labels:
- Apache Hive
- Apache Spark
01-23-2019
04:42 PM
1 Kudo
I am trying to start the Kafka server, but I am getting the error below. Any idea how to fix it?

[2019-01-23 16:36:17,954] ERROR [KafkaServer id=0] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.kafka.common.KafkaException: Socket server failed to bind to 0.0.0.0:9092: Address already in use.
at kafka.network.Acceptor.openServerSocket(SocketServer.scala:442)
at kafka.network.Acceptor.<init>(SocketServer.scala:332)
at kafka.network.SocketServer$$anonfun$createAcceptorAndProcessors$1.apply(SocketServer.scala:149)
at kafka.network.SocketServer$$anonfun$createAcceptorAndProcessors$1.apply(SocketServer.scala:145)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at kafka.network.SocketServer.createAcceptorAndProcessors(SocketServer.scala:145)
at kafka.network.SocketServer.startup(SocketServer.scala:94)
at kafka.server.KafkaServer.startup(KafkaServer.scala:250)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
at kafka.Kafka$.main(Kafka.scala:75)
at kafka.Kafka.main(Kafka.scala)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
at kafka.network.Acceptor.openServerSocket(SocketServer.scala:438)
Labels:
- Apache Kafka
12-04-2018
08:13 PM
This issue got fixed by using a single-quoted character in split, as shown below.

val productsRDD = products.map(rec => {
  var r = rec.split('|')
  (r(0).toInt, r(1).toInt, r(2), r(3), r(4).toFloat, r(5))
})
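
For completeness, a small illustration (using a hypothetical record) of why the double-quoted version failed: String.split(String) treats its argument as a regular expression, and the regex "|" matches the empty string, so the line is split into single characters and r(1) ends up holding "|". split(Char) splits on the literal pipe.

val rec = "1|2|Sample Product"   // hypothetical record, not real data
rec.split("|")                   // Array(1, |, 2, |, S, a, ...)  -> r(1).toInt throws NumberFormatException
rec.split('|')                   // Array(1, 2, Sample Product)   -> splits on the literal pipe character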
11-27-2018
08:29 PM
I am trying to process the data after a Sqoop import, but I am getting the error below while loading the data into an RDD:

at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

Below is my Sqoop import:

sqoop import \
--connect "jdbc:mysql://01-mysql-test232855.envnxs.net:3306/retail_db" \
--username autoenv_root \
--password UkdWMmIzQnpVbTlqYTNvPQ= \
--table lnld_products_23425 \
--as-textfile \
--target-dir /user/mpalanisamy/lnld_products_23425 \
--fields-terminated-by '|';

Below are the Spark commands I use to process it:

case class products_rec(
  product_id: Int,
  product_category_id: Int,
  product_name: String,
  product_desc: String,
  product_price: Double,
  product_image: String,
  ...
)

val products = sc.textFile("/user/mpalanisamy/problem2/lnld_products_23425")
val productsRDD = products.map(rec => {
  var r = rec.split("|")
  (r(0).toInt, r(1).toInt, r(2), r(3), r(4).toFloat, r(5))
})

After running the following, I get the error above:

productsRDD.take(10).foreach(println)

It looks like I am not doing anything wrong, but I am still getting the error below:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 17, 01.hadoop-datanode.test232855.nym2): java.lang.NumberFormatException: For input string: "|"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$anonfun$1.apply(<console>:28)
at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$anonfun$1.apply(<console>:26)
at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$anonfun$take$1$anonfun$28.apply(RDD.scala:1302)
at org.apache.spark.rdd.RDD$anonfun$take$1$anonfun$28.apply(RDD.scala:1302)
at org.apache.spark.SparkContext$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.SparkContext$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$failJobAndIndependentStages(DAGScheduler.scala:1283)
at org.apache.spark.scheduler.DAGScheduler$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
at org.apache.spark.scheduler.DAGScheduler$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
at org.apache.spark.scheduler.DAGScheduler$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at org.apache.spark.scheduler.DAGScheduler$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
at org.apache.spark.util.EventLoop$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
at org.apache.spark.rdd.RDD$anonfun$take$1.apply(RDD.scala:1302)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
at org.apache.spark.rdd.RDD.take(RDD.scala:1276)
at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:29)
at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:34)
at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:36)
at $iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:38)
at $iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:40)
at $iwC$iwC$iwC$iwC$iwC.<init>(<console>:42)
at $iwC$iwC$iwC$iwC.<init>(<console>:44)
at $iwC$iwC$iwC.<init>(<console>:46)
at $iwC$iwC.<init>(<console>:48)
at $iwC.<init>(<console>:50)
at <init>(<console>:52)
at .<init>(<console>:56)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:683)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:214)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NumberFormatException: For input string: "|"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$anonfun$1.apply(<console>:28)
at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$anonfun$1.apply(<console>:26)
at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$anonfun$take$1$anonfun$28.apply(RDD.scala:1302)
at org.apache.spark.rdd.RDD$anonfun$take$1$anonfun$28.apply(RDD.scala:1302)
at org.apache.spark.SparkContext$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.SparkContext$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Labels:
- Apache Spark
11-18-2018
10:58 PM
After trying multiple options, I finally fixed this issue by passing the per-node and per-rack minimum split sizes. I passed the values as parameters to Sqoop as shown below.

sqoop export \
-Dmapreduce.input.fileinputformat.split.minsize.per.rack=749983 \
-Dmapreduce.input.fileinputformat.split.minsize.per.node=749983 \
--connect jdbc:mysql://01-mysql-test232855.envnxs.net:3306/retail_export \
--username autoenv_root \
--export-dir /user/hive/warehouse/retail_db.db/orders \
--table orders \
-P
11-18-2018
10:56 PM
After trying multiple options, I fixed the issue by giving the options below in the Sqoop export: I set the per-node and per-rack minimum split sizes, and then the job ran successfully.

sqoop export \
-Dmapreduce.input.fileinputformat.split.minsize.per.rack=749983 \
-Dmapreduce.input.fileinputformat.split.minsize.per.node=749983 \
--connect jdbc:mysql://01-mysql-test232855.envnxs.net:3306/retail_export \
--username autoenv_root \
--export-dir /user/hive/warehouse/retail_db.db/orders \
--table orders \
-P
11-15-2018
08:45 PM
Please find the screenshot of the Hive QL: screen-shot-2018-11-15-at-34106-pm.png
11-15-2018
08:44 PM
It is working perfectly for Hive 1.1.0-cdh5.4.0. Below is the Hive QL I used to create a dummy table. I have also attached a screenshot for reference.
hive (retail_warehouse)> create table vendor(
> vendor_id int,
> vendor_name string)
> row format delimited fields terminated by "\;"
> stored as textfile;
OK
Time taken: 0.857 seconds
11-15-2018
08:26 PM
I am using Hive 1.1.0-cdh5.4.0. My Hive runs on Cloudera 5.4.0.
mpalanisamy@01:~$ hive --version
Hive 1.1.0-cdh5.4.0
Subversion file:///data/jenkins/workspace/generic-package-ubuntu64-14-04/CDH5.4.0-Packaging-Hive-2015-04-21_12-09-14/hive-1.1.0+cdh5.4.0+103-1.cdh5.4.0.p0.56~trusty -r Unknown
Compiled by jenkins on Tue Apr 21 12:12:12 PDT 2015
From source with checksum 2bf708133bf40715eaa74b142056808a