Member since: 12-09-2015
Posts: 43
Kudos Received: 18
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 9250 | 12-17-2015 07:27 AM |
01-09-2018
06:42 AM
Then how do I solve that issue and process the file? I also tried json_file = sqlContext.read.json('/user/admin/emp/empData.json'), but it fails with the same error.
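For what it's worth, a rough, unverified sketch of what usually gets past this: sc.wholeTextFiles() buffers each file as a single byte array, which is what overflows the heap in the trace in the post below, so either give the JVM more memory at launch or read the files directly with read.json(). The memory values below are only illustrative, and the direct read assumes each JSON record sits on its own line.

```python
# Rough sketch, not verified on this cluster.
# Assumption: pyspark was started with more memory, e.g.
#   pyspark --driver-memory 4g --executor-memory 4g   (values are illustrative)
# Assumption: each JSON record is on its own line, which is the format
#   sqlContext.read.json expects, so wholeTextFiles() is not needed.
# sc and sqlContext come from the pyspark shell; the path is the one from the question.

json_df = sqlContext.read.json('/user/admin/emp/*')
json_df.printSchema()
json_df.show(5)
```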
01-08-2018
10:14 AM
$ pyspark
>>> json_file = sqlContext.read.json(sc.wholeTextFiles('/user/admin/emp/*').values())
18/01/08 15:34:36 ERROR Utils: Uncaught exception in thread stdout writer for python2.7
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at org.spark_project.guava.io.ByteStreams.copy(ByteStreams.java:211)
at org.spark_project.guava.io.ByteStreams.toByteArray(ByteStreams.java:252)
at org.apache.spark.input.WholeTextFileRecordReader.nextKeyValue(WholeTextFileRecordReader.scala:79)
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.nextKeyValue(CombineFileRecordReader.java:65)
at org.apache.spark.rdd.NewHadoopRDD$anon$1.hasNext(NewHadoopRDD.scala:182)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
at org.apache.spark.api.python.PythonRunner$WriterThread$anonfun$run$3.apply(PythonRDD.scala:328)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1877)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
Exception in thread "stdout writer for python2.7" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at org.spark_project.guava.io.ByteStreams.copy(ByteStreams.java:211)
at org.spark_project.guava.io.ByteStreams.toByteArray(ByteStreams.java:252)
at org.apache.spark.input.WholeTextFileRecordReader.nextKeyValue(WholeTextFileRecordReader.scala:79)
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.nextKeyValue(CombineFileRecordReader.java:65)
at org.apache.spark.rdd.NewHadoopRDD$anon$1.hasNext(NewHadoopRDD.scala:182)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
at org.apache.spark.api.python.PythonRunner$WriterThread$anonfun$run$3.apply(PythonRDD.scala:328)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1877)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
Labels:
- Apache Spark
07-13-2017
02:50 PM
Please try this:
[root@sandbox ~]# ls /usr/share/java/mysql-connector-java.jar
/usr/share/java/mysql-connector-java.jar
[root@sandbox ~]# spark-shell --jars /usr/share/java/mysql-connector-java.jar
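For reference, once the connector jar is on the classpath the table can also be read over JDBC from Spark. A minimal sketch in pyspark follows; the URL, table name, and credentials are only placeholders (borrowed from the Sqoop example further down this page), and the read.format("jdbc") API assumes Spark 1.4 or later.

```python
# Launch with the same jar, e.g.:
#   pyspark --jars /usr/share/java/mysql-connector-java.jar
# All connection details below are placeholders; replace them with your own.
df = (sqlContext.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/test")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", "st1")
      .option("user", "it1")
      .option("password", "hadoop")
      .load())
df.show(5)
```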
11-09-2016
06:11 AM
I already imported import org.apache.spark.sql.hive.orc._ and import org.apache.spark.sql._, but I still have the same issue. I am using HDP 2.3.
11-08-2016
07:29 AM
Hi @Matthieu Lamairesse
Error:
scala> df.write.format("orc").saveAsTable("default.sample_07_new_schema")
<console>:33: error: value write is not a member of org.apache.spark.sql.DataFrame
       df.write.format("orc").saveAsTable("default.sample_07_new_schema")
          ^
11-04-2016
02:12 PM
Can Oozie be installed and run without Hadoop? The Oozie material I have read all requires Hadoop. Let's say I have 2 plain Java applications. I want to chain these 2 Java applications in an Oozie workflow and produce the final JSON output from the 2nd Java program. I do not want to rewrite these 2 applications as MapReduce programs; they should stay plain Java code. Please suggest: how do I run Oozie without Hadoop? Is it possible?
Labels:
- Apache Hadoop
- Apache Kafka
- Apache Oozie
11-02-2016
01:29 PM
Hive table (original):
Database name: Student
Table name: Student_detail

id | name | dept
1 | siva | cse

Needed output:
Database name: CSE
Table name: New_student_detail

s_id | s_name | s_dept
1 | siva | cse

I want to migrate the Student_detail Hive table into New_student_detail, without data loss, using Spark: different column names, a different database, and a different table (see the sketch below).
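A minimal pyspark sketch of one way to do this. It is only a sketch: it assumes Spark 1.4+ with Hive support, that the target database cse has already been created in Hive, and that the source columns are named id, name, and dept as shown above.

```python
# Read the source Hive table, rename the columns, and save it as the new table.
# Database/table/column names are taken from the question; the database cse
# must already exist (CREATE DATABASE cse).
from pyspark.sql import HiveContext

hc = HiveContext(sc)  # sc comes from the pyspark shell

src = hc.table("student.student_detail")

renamed = (src
           .withColumnRenamed("id", "s_id")
           .withColumnRenamed("name", "s_name")
           .withColumnRenamed("dept", "s_dept"))

# mode("overwrite") replaces the target table if it already exists.
renamed.write.mode("overwrite").saveAsTable("cse.new_student_detail")
```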
Labels:
- Apache Hive
- Apache Spark
09-22-2016
03:19 PM
Hi @Mats Johansson
I have a cluster with 1 NameNode and 3 DataNodes. One DataNode failed, so I removed it from the cluster and added a new DataNode.
After I added the new node I got: WARNING: There are 776885 missing blocks. Please check the logs or run fsck in order to identify the missing blocks.
So I removed the corrupt files from the cluster, and after that hdfs fsck / reports: The filesystem under path '/' is HEALTHY. That is good, but it also shows Under-replicated blocks: 1572982 (95.59069 %). The problem now is that Hadoop re-replicates blocks from one DataNode to another automatically, but only at about 6 blocks per second. I executed hadoop dfs -setrep -R -w 3 / and it shows it will take 24 days to replicate the files; I cannot wait 24 days. I want to speed this up and balance replication across the DataNodes. dfs.namenode.replication.work.multiplier.per.iteration is 2, and I do not have the properties dfs.namenode.replication.max-streams and dfs.namenode.replication.max-streams-hard-limit. I am using a Hadoop 1.x service. What is the best way to balance my cluster?
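For reference, these are the hdfs-site.xml properties that are usually raised to speed up re-replication. This is only a sketch: the values are illustrative, the properties are documented for HDFS 2.x (whether a given Hadoop 1.x build honours them needs to be checked), and the NameNode must be restarted after changing them.

```xml
<!-- Illustrative values only; tune to what the cluster's network and disks can handle. -->
<property>
  <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
  <value>10</value>
</property>
<property>
  <name>dfs.namenode.replication.max-streams</name>
  <value>20</value>
</property>
<property>
  <name>dfs.namenode.replication.max-streams-hard-limit</name>
  <value>40</value>
</property>
```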
09-22-2016
05:11 AM
I executed the command hadoop dfs -setrep -R -w 3 / and it works fine, but I have 5,114,551 under-replicated blocks and it will take 24 days. How can I solve that problem faster?
Labels:
- Apache Hadoop
03-22-2016
10:56 AM
I already did that step:
hive> ADD JAR somepath/mongo-hadoop-hive.jar;
hive> ADD JAR somepath/mongo-hadoop-core.jar;
03-22-2016
07:28 AM
2 Kudos
Jars: mongo-hadoop-core-1.4.0, mongo-hadoop-hive-1.4.0, mongo-java-driver-2.10.1
hive> CREATE EXTERNAL TABLE minute_bars
> (
>
> id STRING,
> Symbol STRING,
> `Timestamp` STRING,
> Day INT,
> Open DOUBLE,
> High DOUBLE,
> Low DOUBLE,
> Close DOUBLE,
> Volume INT
> )
> STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
> WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id",
> "Symbol":"Symbol", "Timestamp":"Timestamp", "Day":"Day", "Open":"Open", "High":"High", "Low":"Low", "Close":"Close", "Volume":"Volume"}')
> TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/marketdata.minbars');
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com/mongodb/hadoop/io/BSONWritable
hive>
Labels:
- Apache Hive
03-18-2016
02:40 PM
HDP 2.3.0, Hadoop 2.7.1, HBase 1.1.1. I already include the dependency jars from /usr/hdp/2.3.0.0-2557/hbase/lib/.
03-18-2016
02:33 PM
1 Kudo
hadoop jar /root/hbase.jar MyfirstHBaseTable
WARNING: Use "yarn jar" to launch YARN applications.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 4 more

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MyfirstHBaseTable {
    public static void main(String[] args) throws IOException {
        // Load the HBase configuration (hbase-site.xml must be on the classpath).
        HBaseConfiguration hconfig = new HBaseConfiguration(new Configuration());

        // Describe a table "User" with two column families.
        HTableDescriptor htable = new HTableDescriptor("User");
        htable.addFamily(new HColumnDescriptor("Id"));
        htable.addFamily(new HColumnDescriptor("Name"));

        System.out.println("Connecting...");
        HBaseAdmin hbase_admin = new HBaseAdmin(hconfig);

        System.out.println("Creating Table...");
        hbase_admin.createTable(htable);
        System.out.println("Done!");
    }
}
Labels:
- Apache HBase
02-29-2016
06:39 AM
1 Kudo
I got this error message:
[root@sandbox ~]# bin/nutch fetch 1456727546-2019589981
Exception in thread "main" java.lang.RuntimeException: job failed: name=apache-nutch-2.3.1.jar, jobid=job_local522155708_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
    at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:205)
    at org.apache.nutch.fetcher.FetcherJob.fetch(FetcherJob.java:251)
    at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:314)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.fetcher.FetcherJob.main(FetcherJob.java:322)
02-29-2016
04:36 AM
2 Kudos
I want to crawl web URL information using Nutch and store the data in an HBase database. Can anyone suggest how to do this, with an example? I am new to Nutch.
Labels:
- Apache HBase
01-25-2016
07:51 AM
I just want to know how to upgrade and downgrade.
01-25-2016
07:49 AM
@Paul Boal I just want d3 for ad hoc data visualization.
01-25-2016
04:41 AM
I got this warning message: Append mode for hive imports is not yet supported. Please remove the parameter --append-mode
01-25-2016
04:39 AM
Thanks @Predrag Minovic. I also found a solution now; change the incremental mode to append:
sqoop job --create incjob -- import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --incremental append --check-column ts --target-dir sqin -m 1 --merge-key id --last-value 0
01-23-2016
07:14 AM
1 Kudo
With a Sqoop incremental import I got the error below. How can I solve it? I am using Hortonworks 2.3.
Append mode for hive imports is not yet supported. Please remove the parameter --append-mode
Labels:
- Apache Hive
- Apache Sqoop
01-22-2016
05:22 AM
Now I am using lastmodified. In this case the full data moves into HBase, but I want to move only the new records. How can I do that?
01-20-2016
12:34 PM
I don't have a timestamp in my table. I want to know how to do this without a DATE-format column.
01-20-2016
06:24 AM
Thanks, but I want to know how to bring updated column values into the Hive table with Sqoop incremental import.
01-20-2016
05:07 AM
1 Kudo
MySQL table:

no | student name | dept
1 | siva | IT
2 | raj | cse

Now I create a Sqoop incremental job and the data moves into the Hive table (sqoop job --exec student_info).

Hive table:

no | student name | dept
1 | siva | IT
2 | raj | cse

Working fine. Now I update a column value in the MySQL table: dept IT -> EEE for id 1.

MySQL table:

no | student name | dept
1 | siva | EEE

Now I run the Sqoop incremental import job again (sqoop job --exec student_info). It shows this message:
16/01/20 04:41:42 INFO tool.ImportTool: Incremental import based on column `id`
16/01/20 04:41:42 INFO tool.ImportTool: No new rows detected since last import.
[root@sandbox ~]
The data does not move into the Hive table. I want to know how to move the updated value into the Hive table, or, if that is not possible, how to move it into a NoSQL (HBase) table.
Labels:
- Apache HBase
- Apache Hive
- Apache Sqoop
01-07-2016
05:37 AM
1 Kudo
I want to visualize Hive table data in d3.js, but I don't know how to connect d3.js to a Hive table. Can anyone help me? Also, if you know of any open-source visualization tools, please let me know.
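One common pattern, sketched below, is to pre-aggregate the Hive data and dump it to a JSON file that the d3.js page then loads with d3.json(). The table name, query, and output path in the sketch are hypothetical, and it assumes a pyspark shell with Hive support.

```python
# Rough sketch: aggregate in Hive/Spark, write a small JSON file for d3.js to load.
# sc comes from the pyspark shell; table name, query, and output path are hypothetical.
import json

from pyspark.sql import HiveContext

hc = HiveContext(sc)
rows = hc.sql("SELECT dept, COUNT(*) AS cnt FROM default.student_detail GROUP BY dept")

# The aggregated result is small, so it is safe to collect it to the driver.
with open("/tmp/dept_counts.json", "w") as f:
    json.dump([r.asDict() for r in rows.collect()], f)
```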
Labels:
- Apache Hive
12-29-2015
07:13 PM
1 Kudo
Through Ambari, or if you know a curl command, please send it to me. I want to know how to upgrade the Hive version in Hortonworks, and also whether it will cause any issues in the future.
12-29-2015
04:26 AM
I already have that jar file in my Hortonworks installation. I want to know how to compile and execute the file on that path.
12-28-2015
03:33 PM
I want to execute the program below in Hortonworks; can someone help me?
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;
public class HiveJdbcClient {
private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";
public static void main(String[] args) throws SQLException {
try {
Class.forName(driverName);
} catch(ClassNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
System.exit(1);
}
Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
Statement stmt = con.createStatement();
String tableName = "testHiveDriverTable";
stmt.executeQuery("drop table " + tableName);
ResultSet res = stmt.executeQuery("create table "+ tableName + " (key int, value string)");
// show tables
String sql = "show tables '"+ tableName + "'";
System.out.println("Running: " + sql);
res = stmt.executeQuery(sql);
if(res.next()) {
System.out.println(res.getString(1));
}
// describe table
sql = "describe " + tableName;
System.out.println("Running: " + sql);
res = stmt.executeQuery(sql);
while (res.next()) {
System.out.println(res.getString(1) + "\t" + res.getString(2));
}
// load data into table
// NOTE: filepath has to be local to the hive server
// NOTE: /tmp/a.txt is a ctrl-A separated file with two fields per line
String filepath = "/tmp/a.txt";
sql = "load data local inpath '" + filepath + "' into table " + tableName;
System.out.println("Running: "+ sql);
res = stmt.executeQuery(sql);
// select * query
sql = "select * from " + tableName;
System.out.println("Running: " + sql);
res = stmt.executeQuery(sql);
while (res.next()) {
System.out.println(String.valueOf(res.getInt(1)) + "\t" + res.getString(2));
}
// regular hive query
sql = "select count(1) from " + tableName;
System.out.println("Running: " + sql);
res = stmt.executeQuery(sql);
while (res.next()) {
System.out.println(res.getString(1));
}
}
}
Labels:
- Apache Hadoop
- Apache Hive
12-17-2015
07:27 AM
2 Kudos
Use the reflect UDF to generate UUIDs:
reflect("java.util.UUID", "randomUUID")
12-09-2015
02:26 PM
2 Kudos
Thanks a lot, but one minor correction to your comment: after -put there should be a space, then -, then a space, and then the target file path.
hadoop fs -text /hdfs_path/compressed_file.gz | hadoop fs -put - /hdfs_path/uncompressed-file.txt