Member since 12-09-2015
Posts: 43
Kudos Received: 18
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 13262 | 12-17-2015 07:27 AM |
01-09-2018
06:42 AM
Then how do I solve that issue and process the file? I also tried json_file = sqlContext.read.json('/user/admin/emp/empData.json'), but it does not work either; the same issue occurs.
11-04-2016
03:12 PM
5 Kudos
@Sivasaravanakumar K As mentioned by @Geoffrey Shelton Okot, you can install a minimal version of Hadoop to get Oozie working! As for the Java part, it is not compulsory to write MapReduce code. You can write your code as per your requirements, keep the commands that run the Java code in a simple shell script, and execute that shell script via Oozie using a shell action (see the sketch below). Hope this information helps!
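For reference, a minimal sketch of that approach; the names myapp.jar, com.example.Main, run_app.sh and oozie-host are placeholders, not from this thread:
#!/bin/bash
# run_app.sh - wrapper executed by the Oozie shell action; it just runs the plain Java program
java -cp myapp.jar com.example.Main "$@"
The workflow containing the shell action is then submitted with the standard Oozie CLI, assuming its coordinates are in job.properties:
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run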
11-09-2016
06:19 PM
Hi @Sivasaravanakumar K, the write function was implemented in 1.4.1. Try simply: df.saveAsTable("default.sample_07_new_schema")
It will be saved as Parquet (the default format for Spark).
09-22-2016
03:19 PM
Hi @Mats Johansson
I have 1 NameNode and 3 DataNodes in my cluster. One DataNode failed, so I removed it from the cluster and added a new DataNode.
After adding the new node I got: WARNING: There are 776885 missing blocks. Please check the logs or run fsck in order to identify the missing blocks.
So I removed the corrupt files from the cluster. After that, hdfs fsck / reports The filesystem under path '/' is HEALTHY, which is good, but it also shows Under-replicated blocks: 1572982 (95.59069 %). Hadoop is now automatically re-replicating blocks from one DataNode to another at about 6 blocks per second. I ran hadoop dfs -setrep -R -w 3 / and it shows the replication will take 24 days; I cannot wait 24 days. I want to speed up replication and balance the data across the DataNodes. I have dfs.namenode.replication.work.multiplier.per.iteration set to 2, but I do not have the properties dfs.namenode.replication.max-streams and dfs.namenode.replication.max-streams-hard-limit. I am using Hadoop 1.x. What is the best way to balance my cluster?
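A minimal sketch of the commands that are commonly suggested for this situation, assuming Hadoop 1.x command syntax (the threshold value is only an illustration); raising dfs.namenode.replication.work.multiplier.per.iteration in hdfs-site.xml and restarting the NameNode is the usual knob for letting the NameNode schedule more replication work per heartbeat:
# Set the replication factor without -w so the command returns immediately
# and the NameNode re-replicates in the background
hadoop dfs -setrep -R 3 /
# Re-check the number of under-replicated blocks as the work progresses
hadoop fsck / | grep -i "under-replicated"
# Once replication has caught up, spread blocks more evenly across DataNodes
hadoop balancer -threshold 10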
03-30-2016
11:40 PM
I got it to work with the following, using the repo I linked earlier:
hdfs dfs -put drivers/* /tmp/udfs
beeline
!connect jdbc:hive2://localhost:10000 "" ""
add jar hdfs://sandbox.hortonworks.com:8020/tmp/udfs/mongo-hadoop-hive-1.5.0-SNAPSHOT.jar;
add jar hdfs://sandbox.hortonworks.com:8020/tmp/udfs/mongo-hadoop-core-1.5.0-SNAPSHOT.jar;
add jar hdfs://sandbox.hortonworks.com:8020/tmp/udfs/mongodb-driver-3.0.4.jar;
DROP TABLE IF EXISTS bars;
CREATE EXTERNAL TABLE bars
(
objectid STRING,
Symbol STRING,
TS STRING,
Day INT,
Open DOUBLE,
High DOUBLE,
Low DOUBLE,
Close DOUBLE,
Volume INT
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"objectid":"_id",
"Symbol":"Symbol", "TS":"Timestamp", "Day":"Day", "Open":"Open", "High":"High", "Low":"Low", "Close":"Close", "Volume":"Volume"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/marketdata.minibars');
07-08-2016
02:06 PM
1 Kudo
I have got a similar problem.
hadoop jar HBaseBulkLoader.jar HBaseBulkLoadDriver ../flume/data/MY_SCHEMA.TAB_BL_10C op1
WARNING: Use "yarn jar" to launch YARN applications.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at HBaseBulkLoadDriver.main(HBaseBulkLoadDriver.java:31)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
It says HBaseConfiguration.class was not found. I have also tried setting the following environment variables:
# echo $LIBJARS
/usr/hdp/2.4.0.0-169/hadoop/lib/*.jar,/usr/hdp/2.4.0.0-169/hbase/lib/*.jar
# echo $HADOOP_TASKTRACKER_OPTS
/usr/hdp/2.4.0.0-169/hadoop/lib:/usr/hdp/2.4.0.0-169/hbase/lib
# echo $HADOOP_CLASSPATH
/usr/hdp/2.4.0.0-169/hadoop/lib:/usr/hdp/2.4.0.0-169/hbase/lib
# echo $CLASSPATH
/usr/hdp/2.4.0.0-169/flume/lib:/usr/hdp/2.4.0.0-169/hbase/lib
I have also tried the following:
hadoop jar HBaseBulkLoader.jar HBaseBulkLoadDriver -D mapred.chold.env="/usr/hdp/2.4.0.0-169/hbase/lib/" ../flume/data/MY_SCHEMA.TAB_BL_10C op1
hadoop jar HBaseBulkLoader.jar HBaseBulkLoadDriver -libjars /usr/hdp/2.4.0.0-169/hbase/lib/hbase-common.jar ../flume/data/MY_SCHEMA.TAB_BL_10C op1.txt
It is all the same error. Can anybody help me figure out what is wrong with my approach?
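A commonly suggested workaround for this particular NoClassDefFoundError (a sketch only, assuming the HBase client scripts are installed on the node that submits the job) is to put the full HBase client classpath on HADOOP_CLASSPATH before launching the driver:
# `hbase classpath` prints the complete HBase client classpath, including dependencies,
# which is more reliable than pointing HADOOP_CLASSPATH at bare lib directories
export HADOOP_CLASSPATH=$(hbase classpath)
hadoop jar HBaseBulkLoader.jar HBaseBulkLoadDriver ../flume/data/MY_SCHEMA.TAB_BL_10C op1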
02-29-2016
07:49 AM
1 Kudo
@sivasaravanakumar k Take a look at Apache Nutch: http://nutch.apache.org/ (recommended with Apache Hadoop 2.5.2). I also highly recommend this discussion of alternative web crawlers: http://stackoverflow.com/questions/4269632/an-alternative-web-crawler-to-nutch and this Nutch-on-Hadoop cluster tutorial: http://cs.boisestate.edu/~amit/research/nutch/Nutch-Hadoop-Cluster-Howto.html
02-18-2016
01:31 PM
@sivasaravanakumar k Another way of doing incremental imports is to load each increment as a separate partition of a Hive table. First create an external partitioned table:
CREATE EXTERNAL TABLE h2 (id INT, name STRING, ts TIMESTAMP) PARTITIONED BY (pt STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/user/it1/sqin5';
Then sqoop the data into an external partition by specifying the HDFS location as the target directory:
sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --where "id < 5000" --target-dir /user/it1/sqin5/pt=0 -m 1
Add the partition to the Hive table:
ALTER TABLE h2 ADD PARTITION (pt=0) LOCATION '/user/it1/sqin5/pt=0';
Verify the table count; it should return 5000 rows. Now run an incremental sqoop import by specifying an appropriate where clause:
sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --where "id >= 5000 and id < 10000" --target-dir /user/it1/sqin5/pt=1 -m 1
Add the second partition to the Hive table:
ALTER TABLE h2 ADD PARTITION (pt=1) LOCATION '/user/it1/sqin5/pt=1';
Verify the table count again; it should now return 10,000 rows.
01-24-2018
08:23 AM
We will import the updated row, but we already imported that row in an earlier import, so now we will have both rows. How can we avoid this?
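One commonly used approach (a sketch only; the connection string, table st1 and columns id/ts are reused from the earlier example as stand-ins for your own schema, and the --last-value timestamp is a placeholder) is to run the incremental import in lastmodified mode with a merge key, so Sqoop merges updated rows into the existing output instead of appending duplicates:
# --check-column/--last-value pick up rows updated since the previous run;
# --merge-key tells Sqoop which column identifies the same logical row
sqoop import --connect jdbc:mysql://localhost:3306/test --username it1 --password hadoop \
  --table st1 --incremental lastmodified --check-column ts --last-value "2018-01-01 00:00:00" \
  --merge-key id --target-dir /user/it1/st1 -m 1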
01-25-2016
07:49 AM
@Paul Boal I just want d3 for ad hoc data visualization.