Member since 12-09-2015
Posts: 43
Kudos Received: 18
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 13262 | 12-17-2015 07:27 AM |
01-09-2018
06:42 AM
Then how do I solve that issue and process the file? I also tried json_file = sqlContext.read.json('/user/admin/emp/empData.json'), but it does not work either; the same issue occurs.
11-04-2016
03:12 PM
5 Kudos
@Sivasaravanakumar K As mentioned by @Geoffrey Shelton Okot, you can install a minimal version of Hadoop to get Oozie working! As for the Java part, it is not compulsory to write MapReduce code. You can write your code as per your requirements, keep the commands that run the Java code in a simple shell script, and execute that shell script via Oozie using a shell action (see the sketch below). Hope this information helps!
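For reference, a minimal sketch of that approach; the names myapp.jar, com.example.Main, run_app.sh and oozie-host are placeholders, not from this thread:
#!/bin/bash
# run_app.sh - wrapper executed by the Oozie shell action; it just runs the plain Java program
java -cp myapp.jar com.example.Main "$@"
The workflow containing the shell action is then submitted with the standard Oozie CLI, assuming its coordinates are in job.properties:
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run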
11-09-2016
06:19 PM
Hi @Sivasaravanakumar K, the write function was implemented in 1.4.1. Try simply: df.saveAsTable("default.sample_07_new_schema")
It will be saved as Parquet (the default format for Spark).
09-22-2016
03:19 PM
Hi @Mats Johansson
I have 1 NameNode and 3 DataNodes in my cluster. One DataNode failed, so I removed it from the cluster and added a new DataNode.
After adding the new node I got: WARNING: There are 776885 missing blocks. Please check the logs or run fsck in order to identify the missing blocks.
So I removed the corrupt files from the cluster. After that, hdfs fsck / reports The filesystem under path '/' is HEALTHY, which is good, but it also shows Under-replicated blocks: 1572982 (95.59069 %). Hadoop is now automatically re-replicating blocks from one DataNode to another at about 6 blocks per second. I ran hadoop dfs -setrep -R -w 3 / and it shows the replication will take 24 days; I cannot wait 24 days. I want to speed up replication and balance the data across the DataNodes. I have dfs.namenode.replication.work.multiplier.per.iteration set to 2, but I do not have the properties dfs.namenode.replication.max-streams and dfs.namenode.replication.max-streams-hard-limit. I am using Hadoop 1.x. What is the best way to balance my cluster?
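A minimal sketch of the commands that are commonly suggested for this situation, assuming Hadoop 1.x command syntax (the threshold value is only an illustration); raising dfs.namenode.replication.work.multiplier.per.iteration in hdfs-site.xml and restarting the NameNode is the usual knob for letting the NameNode schedule more replication work per heartbeat:
# Set the replication factor without -w so the command returns immediately
# and the NameNode re-replicates in the background
hadoop dfs -setrep -R 3 /
# Re-check the number of under-replicated blocks as the work progresses
hadoop fsck / | grep -i "under-replicated"
# Once replication has caught up, spread blocks more evenly across DataNodes
hadoop balancer -threshold 10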
03-30-2016
11:40 PM
I got it to work with the following, using the repo I linked earlier:
hdfs dfs -put drivers/* /tmp/udfs
beeline
!connect jdbc:hive2://localhost:10000 "" ""
add jar hdfs://sandbox.hortonworks.com:8020/tmp/udfs/mongo-hadoop-hive-1.5.0-SNAPSHOT.jar;
add jar hdfs://sandbox.hortonworks.com:8020/tmp/udfs/mongo-hadoop-core-1.5.0-SNAPSHOT.jar;
add jar hdfs://sandbox.hortonworks.com:8020/tmp/udfs/mongodb-driver-3.0.4.jar;
DROP TABLE IF EXISTS bars;
CREATE EXTERNAL TABLE bars
(
objectid STRING,
Symbol STRING,
TS STRING,
Day INT,
Open DOUBLE,
High DOUBLE,
Low DOUBLE,
Close DOUBLE,
Volume INT
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"objectid":"_id",
"Symbol":"Symbol", "TS":"Timestamp", "Day":"Day", "Open":"Open", "High":"High", "Low":"Low", "Close":"Close", "Volume":"Volume"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/marketdata.minibars');
07-08-2016
02:06 PM
1 Kudo
I have got a similar problem.
hadoop jar HBaseBulkLoader.jar HBaseBulkLoadDriver ../flume/data/MY_SCHEMA.TAB_BL_10C op1
WARNING: Use "yarn jar" to launch YARN applications.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at HBaseBulkLoadDriver.main(HBaseBulkLoadDriver.java:31)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
It says HBaseConfiguration.class was not found. I have also tried setting the following environment variables:
# echo $LIBJARS
/usr/hdp/2.4.0.0-169/hadoop/lib/*.jar,/usr/hdp/2.4.0.0-169/hbase/lib/*.jar
# echo $HADOOP_TASKTRACKER_OPTS
/usr/hdp/2.4.0.0-169/hadoop/lib:/usr/hdp/2.4.0.0-169/hbase/lib
# echo $HADOOP_CLASSPATH
/usr/hdp/2.4.0.0-169/hadoop/lib:/usr/hdp/2.4.0.0-169/hbase/lib
# echo $CLASSPATH
/usr/hdp/2.4.0.0-169/flume/lib:/usr/hdp/2.4.0.0-169/hbase/lib
I have also tried the following:
hadoop jar HBaseBulkLoader.jar HBaseBulkLoadDriver -D mapred.chold.env="/usr/hdp/2.4.0.0-169/hbase/lib/" ../flume/data/MY_SCHEMA.TAB_BL_10C op1
hadoop jar HBaseBulkLoader.jar HBaseBulkLoadDriver -libjars /usr/hdp/2.4.0.0-169/hbase/lib/hbase-common.jar ../flume/data/MY_SCHEMA.TAB_BL_10C op1.txt
It is all the same error. Can anybody help me figure out what is wrong with my approach?
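A commonly suggested workaround for this particular NoClassDefFoundError (a sketch only, assuming the HBase client scripts are installed on the node that submits the job) is to put the full HBase client classpath on HADOOP_CLASSPATH before launching the driver:
# `hbase classpath` prints the complete HBase client classpath, including dependencies,
# which is more reliable than pointing HADOOP_CLASSPATH at bare lib directories
export HADOOP_CLASSPATH=$(hbase classpath)
hadoop jar HBaseBulkLoader.jar HBaseBulkLoadDriver ../flume/data/MY_SCHEMA.TAB_BL_10C op1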
02-29-2016
07:49 AM
1 Kudo
@sivasaravanakumar k Take a look at Apache Nutch: http://nutch.apache.org/ (recommended with Apache Hadoop 2.5.2). I also highly recommend this discussion of alternative web crawlers: http://stackoverflow.com/questions/4269632/an-alternative-web-crawler-to-nutch and this Nutch-on-Hadoop cluster tutorial: http://cs.boisestate.edu/~amit/research/nutch/Nutch-Hadoop-Cluster-Howto.html
02-18-2016
01:31 PM
@sivasaravanakumar k Another way of doing incremental imports is to load each increment as a separate partition of a Hive table. First create an external partitioned table:
CREATE EXTERNAL TABLE h2 (id INT, name STRING, ts TIMESTAMP) PARTITIONED BY (pt STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/user/it1/sqin5';
Then sqoop the data into an external partition by specifying the HDFS location as the target directory:
sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --where "id < 5000" --target-dir /user/it1/sqin5/pt=0 -m 1
Add the partition to the Hive table:
ALTER TABLE h2 ADD PARTITION (pt=0) LOCATION '/user/it1/sqin5/pt=0';
Verify the table count; it should return 5000 rows. Now run an incremental sqoop import by specifying an appropriate where clause:
sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --where "id >= 5000 and id < 10000" --target-dir /user/it1/sqin5/pt=1 -m 1
Add the second partition to the Hive table:
ALTER TABLE h2 ADD PARTITION (pt=1) LOCATION '/user/it1/sqin5/pt=1';
Verify the table count again; it should now return 10,000 rows.
01-24-2018
08:23 AM
We will import the updated row, but we already imported that row in an earlier import, so now we will have both rows. How can we avoid this?
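One commonly used approach (a sketch only; the connection string, table st1 and columns id/ts are reused from the earlier example as stand-ins for your own schema, and the --last-value timestamp is a placeholder) is to run the incremental import in lastmodified mode with a merge key, so Sqoop merges updated rows into the existing output instead of appending duplicates:
# --check-column/--last-value pick up rows updated since the previous run;
# --merge-key tells Sqoop which column identifies the same logical row
sqoop import --connect jdbc:mysql://localhost:3306/test --username it1 --password hadoop \
  --table st1 --incremental lastmodified --check-column ts --last-value "2018-01-01 00:00:00" \
  --merge-key id --target-dir /user/it1/st1 -m 1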
01-25-2016
07:49 AM
@Paul Boal I just want d3 for ad hoc data visualization.