Member since: 12-09-2015
43 Posts
18 Kudos Received
1 Solution
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 14375 | 12-17-2015 07:27 AM |
01-09-2018 06:42 AM

Then how do I solve that issue and process the file? I also tried json_file = sqlContext.read.json('/user/admin/emp/empData.json'), but it does not work; the same issue occurs.
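sqlContext.read.json expects one JSON object per line, so a single JSON document that spans multiple lines will not parse this way. A minimal PySpark sketch of one common workaround, assuming Spark 2.2 or later (where the multiLine option exists); only the HDFS path comes from the post above:

```python
# Sketch only: assumes Spark 2.2+ (where the multiLine option was added) and
# that empData.json holds JSON documents spanning multiple lines.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-multiline-json").getOrCreate()

# The default reader expects line-delimited JSON (one object per line);
# multiLine lets Spark parse documents that span several lines instead.
df = spark.read.option("multiLine", "true").json("/user/admin/emp/empData.json")
df.printSchema()
df.show(5)
```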
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
01-08-2018 10:14 AM
$ pyspark
>>> json_file = sqlContext.read.json(sc.wholeTextFiles('/user/admin/emp/*').values())
18/01/08 15:34:36 ERROR Utils: Uncaught exception in thread stdout writer for python2.7
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at org.spark_project.guava.io.ByteStreams.copy(ByteStreams.java:211)
at org.spark_project.guava.io.ByteStreams.toByteArray(ByteStreams.java:252)
at org.apache.spark.input.WholeTextFileRecordReader.nextKeyValue(WholeTextFileRecordReader.scala:79)
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.nextKeyValue(CombineFileRecordReader.java:65)
at org.apache.spark.rdd.NewHadoopRDD$anon$1.hasNext(NewHadoopRDD.scala:182)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
at org.apache.spark.api.python.PythonRunner$WriterThread$anonfun$run$3.apply(PythonRDD.scala:328)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1877)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
Exception in thread "stdout writer for python2.7" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at org.spark_project.guava.io.ByteStreams.copy(ByteStreams.java:211)
at org.spark_project.guava.io.ByteStreams.toByteArray(ByteStreams.java:252)
at org.apache.spark.input.WholeTextFileRecordReader.nextKeyValue(WholeTextFileRecordReader.scala:79)
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.nextKeyValue(CombineFileRecordReader.java:65)
at org.apache.spark.rdd.NewHadoopRDD$anon$1.hasNext(NewHadoopRDD.scala:182)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
at org.apache.spark.api.python.PythonRunner$WriterThread$anonfun$run$3.apply(PythonRDD.scala:328)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1877)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269) 
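The OutOfMemoryError comes from WholeTextFileRecordReader buffering each entire input file as a single in-memory record. A minimal sketch of the usual alternative, assuming the files under /user/admin/emp/ are line-delimited JSON (one object per line) and a Spark 2.x SparkSession; both assumptions go beyond what the post states:

```python
# Sketch only: assumes the files under /user/admin/emp/ are line-delimited
# JSON (one JSON object per line) and a Spark 2.x SparkSession.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-json-directory").getOrCreate()

# sc.wholeTextFiles() materialises each entire file as a single record on the
# JVM side, which is what triggers the Java heap OutOfMemoryError above.
# Pointing the JSON reader at the path lets Spark split the input into
# ordinary line-sized records instead.
df = spark.read.json("/user/admin/emp/*")
df.printSchema()
df.show(5)
```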
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels:
- Apache Spark
    
	
		
		
11-09-2016 06:11 AM

I already ran import org.apache.spark.sql.hive.orc._ and import org.apache.spark.sql._, but I still have the same issue. I am using HDP 2.3.
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
11-08-2016 07:29 AM

Hi @Matthieu Lamairesse

Error:

scala> df.write.format("orc").saveAsTable("default.sample_07_new_schema")
<console>:33: error: value write is not a member of org.apache.spark.sql.DataFrame
       df.write.format("orc").saveAsTable("default.sample_07_new_schema")
          ^
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
11-04-2016 02:12 PM

Can Oozie be installed and run without Hadoop? The Oozie material I have read says Hadoop is required. Let's say I have 2 plain Java applications. I want to chain these 2 applications in an Oozie workflow and produce the final JSON output from the 2nd Java application. I don't want to rewrite these 2 applications as MapReduce programs; they should stay plain Java code. Please suggest: how can I run Oozie without Hadoop? Is it possible?
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels:
- Apache Hadoop
- Apache Kafka
- Apache Oozie
    
	
		
		
11-02-2016 01:29 PM

Hive table (original):
Database name: Student
Table name: Student_detail

| id | name | dept |
|---|---|---|
| 1 | siva | cse |

Needed output:
Database name: CSE
Table name: New_tudent_detail

| s_id | s_name | s_dept |
|---|---|---|
| 1 | siva | cse |

I want to migrate the Student_detail Hive table into New_tudent_detail without data loss, using Spark:
- Different column names
- Different database
- Different table
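A minimal PySpark sketch of one way such a migration can be done, assuming a Spark 2.x session with Hive support and that the target database CSE already exists; the database, table, and column names come from the post, and the rename-and-save approach is only one option:

```python
# Sketch only: assumes Hive support is enabled and that the target database
# CSE already exists (e.g. created beforehand with CREATE DATABASE CSE).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("migrate-student-detail")
         .enableHiveSupport()
         .getOrCreate())

# Read the original table and rename the columns to the new schema.
src = spark.table("Student.Student_detail")
renamed = (src.withColumnRenamed("id", "s_id")
              .withColumnRenamed("name", "s_name")
              .withColumnRenamed("dept", "s_dept"))

# Write every row into the new table in the other database.
renamed.write.mode("overwrite").saveAsTable("CSE.New_tudent_detail")
```

On a Spark 1.x cluster the same flow would go through a HiveContext rather than a SparkSession.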
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels:
- Apache Hive
- Apache Spark
    
	
		
		
09-22-2016 03:19 PM

Hi @Mats Johansson

I have a cluster with 1 name node and 3 data nodes. One data node failed, so I removed it from the cluster and added a new data node. After adding the new node I got:

WARNING : There are 776885 missing blocks. Please check the logs or run fsck in order to identify the missing blocks

So I removed the corrupted files in the cluster. After running hdfs fsck / it now reports "The filesystem under path '/' is HEALTHY", which is good, but it also shows: Under-replicated blocks: 1572982 (95.59069 %). The problem now is that Hadoop re-replicates files from one data node to another at only about 6 per second. I executed hadoop dfs -setrep -R -w 3 /, and it estimates replication will take 24 days; I cannot wait 24 days. I want to speed up replication and balance it across the data nodes. I have dfs.namenode.replication.work.multiplier.per.iteration set to 2, and I do not have the properties dfs.namenode.replication.max-streams and dfs.namenode.replication.max-streams-hard-limit. I am using Hadoop 1.x. What is the best way to balance my cluster?
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
09-22-2016 05:11 AM

I executed the command hadoop dfs -setrep -R -w 3 / and it works fine, but I have 5,114,551 under-replicated blocks and it will take 24 days. How can I solve that problem faster?
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels:
- Apache Hadoop
    
	
		
		
03-22-2016 10:56 AM

I already did that step:

hive> add "somepath/mongo-hadoop-hive.jar"
hive> add "somepath/mongo-hadoop-core.jar"
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
03-22-2016 07:28 AM
2 Kudos

Jars: mongo-hadoop-core-1.4.0, mongo-hadoop-hive-1.4.0, mongo-java-driver-2.10.1

hive> CREATE EXTERNAL TABLE minute_bars
    > (
    >     id STRING,
    >     Symbol STRING,
    >     `Timestamp` STRING,
    >     Day INT,
    >     Open DOUBLE,
    >     High DOUBLE,
    >     Low DOUBLE,
    >     Close DOUBLE,
    >     Volume INT
    > )
    > STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
    > WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id",
    >  "Symbol":"Symbol", "Timestamp":"Timestamp", "Day":"Day", "Open":"Open", "High":"High", "Low":"Low", "Close":"Close", "Volume":"Volume"}')
    > TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/marketdata.minbars');
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com/mongodb/hadoop/io/BSONWritable
hive> 
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels:
- Apache Hive
         
					
				













