Member since: 12-09-2015
43 Posts
18 Kudos Received
1 Solution

My Accepted Solutions
Title | Views | Posted |
---|---|---|
13183 | 12-17-2015 07:27 AM |
01-09-2018
06:42 AM
Then how do I solve that issue and process the file? I also tried json_file = sqlContext.read.json('/user/admin/emp/empData.json'), but it does not work; the same issue occurs.
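If the cluster is on Spark 2.2 or later, one thing worth trying is the JSON reader's multiLine option, which parses a document that spans several lines without going through sc.wholeTextFiles(). This is only a sketch, assuming Spark 2.2+ and the path from above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-multiline-json").getOrCreate()

# multiLine lets the JSON source parse one document spread over many lines,
# so whole files are not buffered as single records in the Python writer thread.
emp_df = spark.read.option("multiLine", "true").json("/user/admin/emp/empData.json")
emp_df.printSchema()
emp_df.show(5, truncate=False)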
01-08-2018
10:14 AM
$ pyspark
>>> json_file = sqlContext.read.json(sc.wholeTextFiles('/user/admin/emp/*').values())
18/01/08 15:34:36 ERROR Utils: Uncaught exception in thread stdout writer for python2.7
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at org.spark_project.guava.io.ByteStreams.copy(ByteStreams.java:211)
at org.spark_project.guava.io.ByteStreams.toByteArray(ByteStreams.java:252)
at org.apache.spark.input.WholeTextFileRecordReader.nextKeyValue(WholeTextFileRecordReader.scala:79)
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.nextKeyValue(CombineFileRecordReader.java:65)
at org.apache.spark.rdd.NewHadoopRDD$anon$1.hasNext(NewHadoopRDD.scala:182)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
at org.apache.spark.api.python.PythonRunner$WriterThread$anonfun$run$3.apply(PythonRDD.scala:328)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1877)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
Exception in thread "stdout writer for python2.7" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at org.spark_project.guava.io.ByteStreams.copy(ByteStreams.java:211)
at org.spark_project.guava.io.ByteStreams.toByteArray(ByteStreams.java:252)
at org.apache.spark.input.WholeTextFileRecordReader.nextKeyValue(WholeTextFileRecordReader.scala:79)
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.nextKeyValue(CombineFileRecordReader.java:65)
at org.apache.spark.rdd.NewHadoopRDD$anon$1.hasNext(NewHadoopRDD.scala:182)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
at org.apache.spark.api.python.PythonRunner$WriterThread$anonfun$run$3.apply(PythonRDD.scala:328)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1877)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
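For context, sc.wholeTextFiles() materializes every file as a single (path, content) record, so one large JSON file has to fit in the heap of the stdout writer thread, which is what the stack trace above shows. A hedged workaround sketch, assuming each file holds one JSON document that can be rewritten once as JSON Lines (one object per line) and then read back with the ordinary distributed JSON reader:

import json

# Sketch only: convert a multi-line JSON document to JSON Lines locally.
# Paths are placeholders; move files between HDFS and local disk with hdfs dfs -get / -put.
with open("empData.json") as src, open("empData.jsonl", "w") as dst:
    doc = json.load(src)                      # parse the whole document once
    records = doc if isinstance(doc, list) else [doc]
    for rec in records:
        dst.write(json.dumps(rec) + "\n")     # one compact JSON object per line

# After uploading empData.jsonl back to HDFS:
# json_file = sqlContext.read.json('/user/admin/emp/empData.jsonl')

Increasing driver and executor memory only hides the problem while the files remain small enough to fit in one heap.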
Labels:
- Apache Spark
11-09-2016
06:11 AM
I have already added the imports import org.apache.spark.sql.hive.orc._ and import org.apache.spark.sql._, but I still have the same issue. I am using HDP 2.3.
11-08-2016
07:29 AM
Hi @Matthieu Lamairesse Error:
scala> df.write.format("orc").saveAsTable("default.sample_07_new_schema")
<console>:33: error: value write is not a member of org.apache.spark.sql.DataFrame
       df.write.format("orc").saveAsTable("default.sample_07_new_schema")
          ^
11-04-2016
02:12 PM
Can Oozie be installed and run without Hadoop? The Oozie material I have referred to all assumes Hadoop. Let's say I have 2 plain Java applications. I want to chain these 2 Java applications in an Oozie workflow and produce the final JSON output from the 2nd one. I do not want to rewrite these 2 applications as MapReduce programs; they should stay plain Java code. Please suggest: how can I run Oozie without Hadoop? Is it possible?
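For illustration only, on a normal Hadoop-backed Oozie installation chaining two plain Java programs looks roughly like the workflow below; the class names and paths are placeholders, not a tested definition, and note that the standard java action is itself launched through a Hadoop job, which is exactly why the "without Hadoop" question arises.

<workflow-app name="two-java-apps" xmlns="uri:oozie:workflow:0.5">
  <start to="first-app"/>
  <action name="first-app">
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <main-class>com.example.FirstApp</main-class>
    </java>
    <ok to="second-app"/>
    <error to="fail"/>
  </action>
  <action name="second-app">
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <main-class>com.example.SecondApp</main-class>
      <arg>--output</arg>
      <arg>${nameNode}/user/admin/final.json</arg>
    </java>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Java action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>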
Labels:
- Apache Hadoop
- Apache Kafka
- Apache Oozie
11-02-2016
01:29 PM
Hive table (original):
Database name: Student
Table name: Student_detail

id | name | dept
---|---|---
1 | siva | cse

Needed output:
Database name: CSE
Table name: New_tudent_detail

s_id | s_name | s_dept
---|---|---
1 | siva | cse

I want to migrate the Student_detail Hive table into New_tudent_detail without data loss, using Spark: different column names, a different database, and a different table.
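A minimal PySpark sketch of the migration I have in mind, assuming a Spark version where DataFrame.write is available (1.4+) with HiveContext, and that the CSE database already exists:

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="migrate-student-detail")
hive = HiveContext(sc)

# Read the source table from the Student database.
src = hive.table("Student.Student_detail")

# Rename the columns without changing the data.
dst = (src.withColumnRenamed("id", "s_id")
          .withColumnRenamed("name", "s_name")
          .withColumnRenamed("dept", "s_dept"))

# Write the result into the other database as a new ORC table.
dst.write.format("orc").mode("overwrite").saveAsTable("CSE.New_tudent_detail")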
Labels:
- Apache Hive
- Apache Spark
09-22-2016
03:19 PM
Hi @Mats Johansson
I have a cluster with 1 name node and 3 data nodes. One data node failed, so I removed it from the cluster and added a new data node.
After adding the new node I got: WARNING : There are 776885 missing blocks. Please check the logs or run fsck in order to identify the missing blocks.
So I removed the corrupted files from the cluster, and after running hdfs fsck / the status changed to "The filesystem under path '/' is HEALTHY". However, it now reports Under-replicated blocks: 1572982 (95.59069 %). The problem is that Hadoop re-replicates the blocks from one data node to another automatically, but only at about 6 blocks per second. When I run hadoop dfs -setrep -R -w 3 / it shows the replication will take 24 days, and I cannot wait that long; I want to speed up replication and balance it across the data nodes. I have dfs.namenode.replication.work.multiplier.per.iteration set to 2, but I do not have the properties dfs.namenode.replication.max-streams and dfs.namenode.replication.max-streams-hard-limit. I am using Hadoop 1.x. What is the best way to balance my cluster?
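For reference, the properties in question would sit in hdfs-site.xml roughly like this; a sketch only, since the two max-streams properties were added after Hadoop 1.x and the values here are illustrative, not recommendations (higher values put more load on the NameNode and DataNodes, and a NameNode restart is needed):

<property>
  <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
  <value>10</value>
</property>
<property>
  <name>dfs.namenode.replication.max-streams</name>
  <value>50</value>
</property>
<property>
  <name>dfs.namenode.replication.max-streams-hard-limit</name>
  <value>100</value>
</property>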
09-22-2016
05:11 AM
I executed the command hadoop dfs -setrep -R -w 3 / and it works fine, but I have 5,114,551 under-replicated blocks and it will take 24 days. How can I solve that problem faster?
Labels:
- Apache Hadoop
03-22-2016
10:56 AM
I have already done that step:
hive> ADD JAR somepath/mongo-hadoop-hive.jar;
hive> ADD JAR somepath/mongo-hadoop-core.jar;
03-22-2016
07:28 AM
2 Kudos
Jars: mongo-hadoop-core-1.4.0, mongo-hadoop-hive-1.4.0, mongo-java-driver-2.10.1
hive> CREATE EXTERNAL TABLE minute_bars
> (
>
> id STRING,
> Symbol STRING,
> `Timestamp` STRING,
> Day INT,
> Open DOUBLE,
> High DOUBLE,
> Low DOUBLE,
> Close DOUBLE,
> Volume INT
> )
> STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
> WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id",
> "Symbol":"Symbol", "Timestamp":"Timestamp", "Day":"Day", "Open":"Open", "High":"High", "Low":"Low", "Close":"Close", "Volume":"Volume"}')
> TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/marketdata.minbars');
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com/mongodb/hadoop/io/BSONWritable
hive>
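The missing class com.mongodb.hadoop.io.BSONWritable lives in mongo-hadoop-core, so all three jars have to be on Hive's classpath before the DDL runs. A sketch of the usual way to do that in the hive shell (paths are placeholders):

hive> ADD JAR /path/to/mongo-hadoop-core-1.4.0.jar;
hive> ADD JAR /path/to/mongo-hadoop-hive-1.4.0.jar;
hive> ADD JAR /path/to/mongo-java-driver-2.10.1.jar;

Alternatively the jars can be listed in hive.aux.jars.path so every session picks them up.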
Labels:
- Apache Hive