Member since: 05-22-2018
Posts: 69
Kudos Received: 1
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1321 | 06-07-2018 05:33 AM
 | 169 | 05-30-2018 06:30 AM
03-27-2020
04:10 AM
Hi All,
I am using the Hortonworks Sandbox on my local system, and I want to connect my HBase datasets to Power BI. Has anyone tried connecting Power BI to Apache HBase?
I have downloaded the CData HBase ODBC Driver and the Simba HBase ODBC Driver with SQL Connector.
But I am looking for a Hortonworks ODBC driver for HBase, similar to the Hortonworks Hive ODBC Driver (if one exists). I searched for it but did not find any specific links.
Thanks,
Jay
- Tags:
- HBase
03-05-2020
04:31 AM
Hi All,
I want to use a trigger on a Hive table.
There is one table in which I have created ModifiedDate and CreatedDate columns, and I want the trigger on ModifiedDate.
That is, whenever a record is updated, ModifiedDate should automatically be set to the system date, the way a trigger would do it in MS SQL Server.
I have created a transactional (ACID) table for this.
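Since I could not find any trigger support in Hive, the closest thing I can think of is stamping ModifiedDate inside the UPDATE itself on the ACID table. A minimal sketch of what I mean (database, table, column, and values are placeholders, not my real schema):
# Untested sketch: emulate the "trigger" by setting ModifiedDate in the UPDATE itself.
# mydb, mytable, some_col and the WHERE clause are examples only.
hive -e "
USE mydb;
UPDATE mytable
SET some_col = 'new value',
    ModifiedDate = current_timestamp()
WHERE id = 1;
"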
Thanks,
Jay
- Tags:
- Hive
- Hive table
- mssql
03-03-2020
01:39 AM
@HadoopHelp I have not used shell scripts much yet; could you please provide a sample shell script for that?
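In the meantime, this is roughly the shape I have in mind (untested sketch; the input directory, the processed-files list, and script.pig are placeholders I made up, and script.pig is assumed to read ${INPUT}):
#!/bin/bash
# Untested sketch: run Pig only on files that have not been processed yet.
INPUT_DIR=/data/incoming
PROCESSED_LIST=/tmp/processed_files.txt
touch "$PROCESSED_LIST"

# List the files currently in HDFS and keep only the ones not seen before.
hdfs dfs -ls "$INPUT_DIR" | awk '{print $NF}' | grep "^$INPUT_DIR" > /tmp/current_files.txt
new_files=$(grep -v -x -f "$PROCESSED_LIST" /tmp/current_files.txt)

for f in $new_files; do
  pig -param INPUT="$f" script.pig && echo "$f" >> "$PROCESSED_LIST"
done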
03-02-2020
10:23 PM
Hi all,
I want to load only new files with the Apache Pig LOAD statement. I have 25 files in an HDFS directory, and on other days I upload a few more files into the same directory. I only want to load the files that have not been loaded yet (the new files) in Pig.
How could I achieve this?
Thanks,
Jay.
02-27-2020
02:22 AM
Hi All,
I am running an Oozie job on the local sandbox.
I am stuck at the point where the Hive job executes. Below is my flow:
start --> shell-action --> Pig-action --> Hive-action --> shell-action --> Hive-action --> end.
The problem is that the whole workflow runs fine until the last Hive job. I looked into the YARN logs and found that the Tez session has not started yet, and the job is still in RUNNING state after almost 45 minutes.
Why is this Tez session taking so long to start?
I have also tried the same Hive query in the Hive shell, outside the Oozie workflow.
Logs:
INFO : Executing command(queryId=hive_xxxxxx_xxxxx_xxxx_xxx): <query_line1> ... <query_line_n>
INFO : Query ID = hive_xxxxxx_xxxxx_xxxx_xxx
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_xxxxxx_xxxxx_xxxx_xxx
INFO : Tez session hasn't been created yet. Opening session
End of LogType:stderr. This log file belongs to a running container (container_e03_xxxxxx_xxxx_xxx) and so may not be complete.
Please kindly go through it and help me resolve this issue.
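In case it is relevant, these are the checks I am running to see whether YARN still has capacity for the Tez application master while the Oozie launcher holds its own container (the queue name default is an assumption):
# What is currently holding or waiting for YARN containers while the Hive action sits in RUNNING?
yarn application -list -appStates RUNNING,ACCEPTED

# How full is the queue the Hive/Tez job goes to? (assuming the default queue)
yarn queue -status default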
Thanks,
Jay
12-09-2019
06:46 AM
Hi,
I am running the HDP sandbox. I submitted an Oozie job in which I load data from one Hive table to another. The job has three actions: a Pig action, a shell action, and a Hive action.
Whenever I submit the Oozie job it executes up to the shell action, but while loading data from one Hive table to another, the Hive action gets stuck in the 'Running' state.
Looking at the YARN logs, I noticed that the Hive action is stuck on the Tez engine:
Tez session hasn't been created yet. Opening session - and it stays like that for a very long time.
Nothing happens even after one hour.
I have followed the links below:
https://community.cloudera.com/t5/Support-Questions/Tez-session-hasn-t-been-created-yet-Opening-session/td-p/112954
https://community.cloudera.com/t5/Support-Questions/Cannot-Disable-Tez-with-Hive-on-HDP3-0/td-p/186044
From those, I understand that Hive on HDP 3.0 runs only on Tez (it is the default engine), so no change should be required to execute on Tez instead of MapReduce.
Thanks,
Jay.
12-05-2019
06:25 AM
Hi,
I am using Hortonworks Cloudbreak on Azure. I want to run a Pig job from Oozie, but when the job enters the RUNNING state it throws the error message below and stays stuck in RUNNING:
Can not find the logs for the application: application_xxx_1113 with the appOwner: hdfs
I run the Oozie job as the hdfs user, and the log directory hdfs:///app-logs/hdfs/logs/ has full privileges. When I run the same Pig script with 'pig -x tez script.pig' it runs successfully, but when I run it through the Oozie workflow it throws the above error.
I went through https://community.cloudera.com/t5/Support-Questions/File-app-logs-centos-logs-ifile-application-1525529485402/td-p/184726 but the error was not resolved.
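For reference, these are the checks I have been running (the application id below is just the placeholder from the error above):
# Does YARN know about the application, and who actually owns it?
yarn application -status application_xxx_1113

# Are the aggregated logs really under the hdfs user's directory?
hdfs dfs -ls /app-logs/hdfs/logs/ | grep application_xxx_1113

# Fetch the logs with the owner spelled out explicitly.
yarn logs -applicationId application_xxx_1113 -appOwner hdfs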
Regards,
Jay.
11-27-2019
06:50 AM
Hi all,
I want to find the average of the 'rate' column using Scala code in Spark. For that, I created a DataFrame and a temporary view, then used Spark SQL for the queries. When I run a plain SELECT against the view it gives the proper output, but when I run avg with GROUP BY against the view it returns no records.
data.txt is a tab-separated file.
abandon	-2
abandoned	-2
abandons	-2
val AFINN = sc.textFile("hdfs://sandbox-hdp.hortonworks.com:8020/Input/data.txt").map(x => x.split("\t")).map(x => (x(0).toString, x(1).toInt))
//AFINN: org.apache.spark.rdd.RDD[(String, Int)]
val AFINNDF = AFINN.toDF("word", "rate")
//AFINNDF: org.apache.spark.sql.DataFrame = [word: string, rate: int]
AFINNDF.createOrReplaceTempView("temp")
val DF = spark.sql("select word,rate from temp")
//DF: org.apache.spark.sql.DataFrame = [word: string, rate: int]
DF.show()
Output:
+---------+----+
|     word|rate|
+---------+----+
|  abandon|  -2|
|abandoned|  -2|
| abandons|  -2|
+---------+----+
val DF = spark.sql("select word,avg(rate) as rating from temp group by word")
//DF: org.apache.spark.sql.DataFrame = [word: string, rating: double]
Output:
+----+------+
| word | rating |
+----+------+
+----+------+
How can I find the average using Spark SQL queries in Scala?
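For what it is worth, the overall average of the column should not need a GROUP BY at all. This is the sketch I am trying next, driven through spark-shell (untested; the path is the same one as in my code above):
# Untested sketch: overall average of rate, no GROUP BY needed.
/usr/hdp/current/spark2-client/bin/spark-shell --master local[2] <<'SCALA'
val afinn = sc.textFile("hdfs://sandbox-hdp.hortonworks.com:8020/Input/data.txt")
  .map(_.split("\t")).map(x => (x(0), x(1).toInt))
val df = afinn.toDF("word", "rate")
df.createOrReplaceTempView("temp")
spark.sql("select avg(rate) as rating from temp").show()
SCALA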
Thanks,
Jay.
11-19-2019
02:05 AM
Hi all,
I am trying to insert data into a partitioned Hive table. I already loaded data into the Hive table once using a Pig script, and Hive created the dynamic partitions under the HDFS directory. Now I have another file with different data (its partition-key values match partitions that already exist), and I want to save that data into the same partitioned table. But Pig gives me an error that says:
Partition already present with given partition key values : Data already exists in hdfs://sandbox-hdp.hortonworks.com:8020/<hive-table-location>/<hive-partition>, duplicate publish not possible.
How can I insert into the existing partitioned table? Is it possible? Please kindly help me with this.
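One direction I am considering, in case it helps frame the question: land the new file in a plain staging table first and then append from Hive with INSERT INTO and dynamic partitions. An untested sketch (table and column names are placeholders):
# Untested sketch: append into existing partitions from a staging table.
# target_table, staging_table and the column names are placeholders.
hive -e "
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE target_table PARTITION (part_col)
SELECT col1, col2, part_col FROM staging_table;
"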
Regards,
Jay.
10-11-2019
04:23 AM
Hi All, I am using HDP 3.0.1. I want to execute just a Hive action from Oozie. Below are my workflow and the other configuration files, but I am facing an error from "HiveCompatibilityMain" that I am unable to solve. Please help me out with this.
workflow.xml
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app name="hive-wf" xmlns="uri:oozie:workflow:0.2">
    <start to="hive-node"/>
    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>${appPath}/hive.hql</script>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
job.properties
nameNode=hdfs://sandbox-hdp.hortonworks.com:8020
jobTracker=sandbox-hdp.hortonworks.com:8050
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/root/demoanalysis/oozie/hive
appPath=${oozie.wf.application.path}
hive.hql
CREATE DATABASE IF NOT EXISTS dbtemp;
DROP TABLE IF EXISTS dbtemp.demo;
CREATE TABLE IF NOT EXISTS dbtemp.demo(id int, name string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
Error:
2019-10-11 11:05:43,840 INFO CallbackServlet:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-191011075519153-oozie-oozi-W] ACTION[0000001-191011075519153-oozie-oozi-W@hive-node] callback for action [0000001-191011075519153-oozie-oozi-W@hive-node] 2019-10-11 11:05:50,050 INFO CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] CoordMaterializeTriggerService - Curr Date= 2019-10-11T11:10Z, Num jobs to materialize = 0 2019-10-11 11:05:50,052 INFO CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for [org.apache.oozie.service.CoordMaterializeTriggerService] 2019-10-11 11:06:03,627 INFO HiveCompatibilityActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-wf] JOB[0000001-191011075519153-oozie-oozi-W] ACTION[0000001-191011075519153-oozie-oozi-W@hive-node] Trying to get job [job_1570780157818_0003], attempt [1] 2019-10-11 11:06:05,045 INFO HiveCompatibilityActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-wf] JOB[0000001-191011075519153-oozie-oozi-W] ACTION[0000001-191011075519153-oozie-oozi-W@hive-node] action completed, external ID [job_1570780157818_0003] 2019-10-11 11:06:05,185 WARN HiveCompatibilityActionExecutor:523 - SERVER[sandbox-hdp.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-wf] JOB[0000001-191011075519153-oozie-oozi-W] ACTION[0000001-191011075519153-oozie-oozi-W@hive-node] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.HiveCompatibilityMain], exit code [2] 2019-10-11 11:06:05,281 INFO HiveCompatibilityActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-wf] JOB[0000001-191011075519153-oozie-oozi-W] ACTION[0000001-191011075519153-oozie-oozi-W@hive-node] Action ended with external status [FAILED/KILLED] 2019-10-11 11:06:05,306 INFO ActionEndXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-wf] JOB[0000001-191011075519153-oozie-oozi-W] ACTION[0000001-191011075519153-oozie-oozi-W@hive-node] ERROR is considered as FAILED for SLA 2019-10-11 11:06:05,381 INFO JPAService:520 - SERVER[sandbox-hdp.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-wf] JOB[0000001-191011075519153-oozie-oozi-W] ACTION[0000001-191011075519153-oozie-oozi-W@hive-node] No results found 2019-10-11 11:06:05,528 INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-wf] JOB[0000001-191011075519153-oozie-oozi-W] ACTION[0000001-191011075519153-oozie-oozi-W@fail] Start action [0000001-191011075519153-oozie-oozi-W@fail] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10] 2019-10-11 11:06:05,541 INFO KillActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-wf] JOB[0000001-191011075519153-oozie-oozi-W] ACTION[0000001-191011075519153-oozie-oozi-W@fail] Starting action 2019-10-11 11:06:05,553 INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-wf] JOB[0000001-191011075519153-oozie-oozi-W] ACTION[0000001-191011075519153-oozie-oozi-W@fail] [***0000001-191011075519153-oozie-oozi-W@fail***]Action status=DONE 2019-10-11 11:06:05,561 INFO ActionStartXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[root] 
GROUP[-] TOKEN[] APP[hive-wf] JOB[0000001-191011075519153-oozie-oozi-W] ACTION[0000001-191011075519153-oozie-oozi-W@fail] [***0000001-191011075519153-oozie-oozi-W@fail***]Action updated in DB! 2019-10-11 11:06:05,648 INFO KillActionExecutor:520 - SERVER[sandbox-hdp.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[hive-wf] JOB[0000001-191011075519153-oozie-oozi-W] ACTION[0000001-191011075519153-oozie-oozi-W@fail] Action ended with external status [OK] 2019-10-11 11:06:05,914 INFO WorkflowNotificationXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-191011075519153-oozie-oozi-W] ACTION[0000001-191011075519153-oozie-oozi-W@fail] No Notification URL is defined. Therefore nothing to notify for job 0000001-191011075519153-oozie-oozi-W@fail 2019-10-11 11:06:05,914 INFO WorkflowNotificationXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-191011075519153-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000001-191011075519153-oozie-oozi-W 2019-10-11 11:06:05,915 INFO WorkflowNotificationXCommand:520 - SERVER[sandbox-hdp.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000001-191011075519153-oozie-oozi-W] ACTION[0000001-191011075519153-oozie-oozi-W@hive-node] No Notification URL is defined. Therefore nothing to notify for job 0000001-191011075519153-oozie-oozi-W@hive-node
Regards,
Jay Patel.
- Tags:
- hive-action
- Oozie
04-15-2019
11:41 AM
Hi all,
I want to pull data from an SFTP server and load it into HDFS.
Scenario 1: I configured one agent with the source as `SFTP server`, the sink as `HDFS`, and the channel as `File-Channel`. In this scenario the HDFS sink creates too many small files. I searched for a fix but did not find a specific solution, so I tried a second scenario.
Scenario 2: Here I configured two Flume agents:
Agent1: Source >>> SFTP Server, Sink >>> file_roll, Channel >>> file-channel
Agent2: Source >>> SpoolDir, Sink >>> HDFS, Channel >>> file-channel
First, Agent1 loads data from the SFTP server into the local filesystem of the Hadoop node. Then Agent2 loads the data from that local directory and stores it in HDFS. But whenever I start the second agent, the main file gets renamed with a .COMPLETE extension. I am running the two agents simultaneously; Agent1 writes all data into one file (rollcountInterval = 0 here), but Agent2 cannot load data from that particular file.
Can anybody help me out with this blocker?
Thanks.
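For reference, this is roughly what I have in mind for the Agent2 side, together with the HDFS sink roll settings that, as far as I understand, are meant to avoid the small-files problem from Scenario 1. A sketch only; the agent name, paths, and values are placeholders:
# Sketch of an agent2.conf (names and paths are placeholders, values untested).
# Note: the spooling directory source expects files to be complete (immutable)
# once they appear in spoolDir, so files still being written cannot be picked up.
cat > agent2.conf <<'EOF'
agent2.sources  = spool-src
agent2.channels = file-ch
agent2.sinks    = hdfs-sink

agent2.sources.spool-src.type     = spooldir
agent2.sources.spool-src.spoolDir = /data/flume/staging
agent2.sources.spool-src.channels = file-ch

agent2.channels.file-ch.type          = file
agent2.channels.file-ch.checkpointDir = /data/flume/checkpoint
agent2.channels.file-ch.dataDirs      = /data/flume/data

agent2.sinks.hdfs-sink.type          = hdfs
agent2.sinks.hdfs-sink.channel       = file-ch
agent2.sinks.hdfs-sink.hdfs.path     = hdfs:///user/flume/sftp-data
agent2.sinks.hdfs-sink.hdfs.fileType = DataStream
# Roll by size only, so the sink does not create lots of tiny files.
agent2.sinks.hdfs-sink.hdfs.rollInterval = 0
agent2.sinks.hdfs-sink.hdfs.rollCount    = 0
agent2.sinks.hdfs-sink.hdfs.rollSize     = 134217728
EOF

flume-ng agent -n agent2 -f agent2.conf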
11-15-2018
02:05 PM
Hi All,
I want to install Scala on Linux, in the sandbox CentOS. I have already downloaded scala-2.10.5 and created a soft link for it in the /usr/hdp/current folder. After that, I executed the commands below to set SCALA_HOME and the PATH:
echo "export SCALA_HOME = /usr/hdp/current/scala"
or
echo "export SCALA_HOME = /usr/hdp/2.5.0.0-1245/scala"
echo "export PATH=$PATH:$SCALA_HOME/bin"
After that, I went to the home directory and ran the commands below to source the .bashrc and .bash_profile files:
source ~/.bashrc
source ~/.bash_profile
I thought that was all, but I am not able to get into the Scala shell from anywhere. Can anyone help me with this?
Regards,
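For context, this is what I now believe the correct form should be: no spaces around the = sign, and the export lines appended to ~/.bashrc instead of just echoed to the terminal. Untested sketch:
# Append the exports to ~/.bashrc instead of only printing them, and drop the
# spaces around '=' (with spaces, the assignment never actually happens).
echo 'export SCALA_HOME=/usr/hdp/current/scala' >> ~/.bashrc
echo 'export PATH=$PATH:$SCALA_HOME/bin' >> ~/.bashrc
source ~/.bashrc
scala -version   # should now work from any directory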
10-25-2018
07:35 AM
@Aditya Sirna Thanks for sharing. But I don't want a Java application for the implementation; I just want to configure a .conf file for practice. Thank you.
10-24-2018
07:19 AM
Hi, I want to try out Flume. Most blogs I have seen implement Flume only on Twitter data, and Twitter is not accessible in my area. Is there any other real-time or streaming source that can be used with Flume? Thanks, Jay.
08-03-2018
06:41 AM
Hi all,
I am new to Druid. I have too many running tasks in the Coordinator, so I am now unable to submit more tasks to Druid. I tried to shut down a task by executing the command below:
curl -X POST http://xxx-xx-xxx-xxx-x.compute-1.amazonaws.com:8081/druid/indexer/v1/task/<taskID>/shutdown
But there are too many tasks running and I don't know how many remain to be stopped. So I want to know: is there any way to delete/shut down all tasks at once?
Regards,
Jay.
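The direction I am thinking of trying: loop over the overlord's running-task list and call the shutdown endpoint for each task id. Untested sketch (it assumes the overlord exposes /druid/indexer/v1/runningTasks and returns compact JSON; the host is a placeholder):
#!/bin/bash
# Untested sketch: shut down every running indexing task, one by one.
OVERLORD="http://xxx-xx-xxx-xxx-x.compute-1.amazonaws.com:8090"   # placeholder host/port

for id in $(curl -s "$OVERLORD/druid/indexer/v1/runningTasks" \
              | grep -o '"id":"[^"]*"' | cut -d'"' -f4); do
  echo "Shutting down $id"
  curl -s -X POST "$OVERLORD/druid/indexer/v1/task/$id/shutdown"
done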
07-31-2018
06:23 AM
Hi All,
I want to perform an incremental import using Sqoop. I have the following data in a MySQL table:
mysql> select * from data;
+------+------+------------+
| id | name | join_date |
+------+------+------------+
| 1 | aaa | 2017-11-26 |
| 2 | bbb | 2018-03-06 |
| 3 | ccc | 2018-04-25 |
+------+------+------------+
3 rows in set (0.02 sec)
I have imported this MySQL data into a Hive table:
hive> select * from datahive;
OK
1 aaa 2017-11-26
2 bbb 2017-11-26
3 ccc 2018-04-25
Time taken: 3.876 seconds, Fetched: 3 row(s)
hive>
Now I update one record and insert a new record in the MySQL table for the incremental Sqoop import:
mysql> update data set name="new_update" where id = 1;
mysql> insert into data values (5,"new_insert","2018-06-01");
Sqoop command:
sqoop import \
--connect "jdbc:mysql://sandbox.hortonworks.com:3306/sqltempdb" \
--username root \
--password hadoop \
--table data \
--hive-import \
--hive-table datahive \
--fields-terminated-by "," \
-m 1 \
--incremental lastmodified \
--check-column join_date \
--last-value "2017-01-01"
By executing the above Sqoop command I get the following records:
hive> select * from datahive;
OK
1 aaa 2017-11-26
2 bbb 2017-11-26
3 ccc 2018-04-25
1 new_update 2017-11-26
2 bbb 2017-11-26
3 ccc 2018-04-25
5 new_insert 2018-06-01
Time taken: 30.438 seconds, Fetched: 7 row(s)
hive>
But I only want the updated record and the newly inserted record in the Hive table; I don't want to append all data to the existing data.
Required data in Hive:
1 aaa 2017-11-26
2 bbb 2018-03-06
3 ccc 2018-04-25
1 new_update 2017-11-26
5 new_insert 2018-06-01
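The direction I am currently exploring (untested sketch): run the lastmodified import with --merge-key so that updated rows replace the earlier ones instead of being appended. As far as I can tell this writes to a plain HDFS target directory rather than going through --hive-import, so the Hive table would have to sit on top of that directory; the --target-dir path is a placeholder.
sqoop import \
  --connect "jdbc:mysql://sandbox.hortonworks.com:3306/sqltempdb" \
  --username root \
  --password hadoop \
  --table data \
  --target-dir /user/root/sqoop/data \
  --incremental lastmodified \
  --check-column join_date \
  --last-value "2017-01-01" \
  --merge-key id \
  --fields-terminated-by "," \
  -m 1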
Kindly help me reach a solution.
Regards,
Jay.
07-21-2018
01:44 PM
Hi All,
I want to load data into Druid from a Kafka topic. I have created the specification file in JSON format, and I have already produced the data in CSV format to the Kafka broker.
Sample data:
timestamp,open,high,low,close,volume
2018-07-20 05:08:00,1990.8000,1991.5500,1990.8000,1991.0000,1321
2018-07-20 05:07:00,1991.1000,1991.1500,1990.6000,1991.0500,2387
2018-07-20 05:06:00,1991.0000,1991.3000,1991.0000,1991.1000,1776
2018-07-20 05:05:00,1991.7500,1991.8000,1990.5000,1991.0500,5988
2018-07-20 05:04:00,1991.9500,1992.0000,1991.7500,1991.7500,1646
2018-07-20 05:03:00,1992.0000,1992.0500,1991.8500,1991.9500,3272
Now I want to push this data into the Druid datasource named "stockexchange".
supervisor.json
[
  {
    "dataSource" : [
      {
        "spec" : {
          "dataSchema" : {
            "granularitySpec" : {
              "queryGranularity" : "none",
              "type" : "uniform",
              "segmentGranularity" : "hour"
            },
            "dataSource" : "stockexchange",
            "parser" : {
              "type" : "string",
              "parseSpec" : {
                "format" : "csv",
                "timestampSpec" : { "format" : "auto", "column" : "timestamp" },
                "columns" : ["timestamp","open","high","low","close","volume"],
                "dimensionsSpec" : { "dimensions" : ["open","high","low","close","volume"] }
              }
            }
          },
          "ioConfig" : { "type" : "realtime" },
          "tuningConfig" : {
            "type" : "realtime",
            "intermediatePersistPeriod" : "PT10M",
            "windowPeriod" : "PT10M",
            "maxRowsInMemory" : 75000
          }
        },
        "properties" : {
          "task.partitions" : "2",
          "task.replicants" : "2",
          "topicPattern" : "stockexchange.*",
          "topicPattern.priority" : "1"
        }
      }
    ],
    "properties" : {
      "zookeeper.connect" : "ip-xxx-xx-xxxx-xx.ec2.internal:2181",
      "zookeeper.timeout" : "PT20S",
      "druid.selectors.indexing.serviceName" : "druid/overlord",
      "druid.discovery.curator.path" : "/druid/discovery",
      "kafka.zookeeper.connect" : "ip-xxx-xx-xxxx-xx.ec2.internal:2181",
      "kafka.group.id" : "xxxx-xxxxx-xxxx",
      "consumer.numThreads" : "2",
      "commit.periodMillis" : "15000",
      "reportDropsAsExceptions" : "false"
    }
  }
]
But when I post this spec to Druid using the following curl command, it throws the error below.
Command:
curl -X POST -H 'Content-Type: application/json' -d @supervisor.json http://ec2-xxx-xx-xxxx-xxxx.compute-1.amazonaws.com:8090/druid/indexer/v1/supervisor
Error:
{"error":"Unexpected token (START_OBJECT), expected VALUE_STRING: need JSON String that contains type id (for subtype of io.druid.indexing.overlord.supervisor.SupervisorSpec)\n at [Source: HttpInputOverHTTP@1f0c750b; line: 1, column: 2]"}
I have searched for this error but did not find any particular solution. Please kindly help me solve it.
Regards,
Jay.
07-16-2018
12:06 PM
Hi all,
I am testing Druid for processing JSON files. I have loaded the JSON files into HDFS, and now I want to ingest them into Druid for further analytics. I tried to load the data into a Druid datastore but it is not working; my observations are attached below. I am following this video and referring to batch data ingestion for Druid.
1supervisor-spec-fixed.json
2stock-example-data.json
I am executing the following command to start the task:
curl -X 'POST' -H 'Content-Type:application/json' -d @supervisor.json http://<host>.amazonaws.com:<port>/druid/indexer/v1/task
But when I execute the above command, I get the error below:
{"error":"Can not deserialize instance of java.util.ArrayList out of VALUE_STRING token\n at [Source: HttpInputOverHTTP@ba248c9; line: 1, column: 1495]"}
Please kindly help me.
Regards,
Jay.
- Tags:
- aws
- druid
- hortonwork
07-12-2018
06:22 AM
@Vinicius Higa Murakami Thanks for the reply, but as far as I know that won't work, because I have too many input files in the same folder; how would Sqoop identify which file the user wants to export? Regards, Jay.
07-11-2018
10:59 AM
Hi All,
I want to export CSV data into MS SQL Server using Sqoop. I have created a table in MS SQL which has one auto-increment column named 'ID', and I have one CSV file in an HDFS directory. I executed the Sqoop export command below.
Sqoop Command:
sqoop export --connect 'jdbc:sqlserver://xxx.xxx.xx.xx:xxxx;databasename=<mssql_database_name>' --username xxxx --password xxxx --export-dir /user/root/input/data.csv --table <mssql_table_name>
I am facing the following error.
Error:
18/07/11 10:30:48 INFO mapreduce.Job: map 0% reduce 0%
18/07/11 10:31:12 INFO mapreduce.Job: map 75% reduce 0%
18/07/11 10:31:13 INFO mapreduce.Job: map 100% reduce 0%
18/07/11 10:31:17 INFO mapreduce.Job: Job job_1531283775339_0005 failed with state FAILED due to: Task failed task_1531283775339_0005_m_000003
Job failed as tasks failed. failedMaps:1 failedReduces:0
18/07/11 10:31:18 INFO mapreduce.Job: Counters: 31
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=163061
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=261
HDFS: Number of bytes written=0
HDFS: Number of read operations=7
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Failed map tasks=3
Launched map tasks=4
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=87423
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=87423
Total vcore-milliseconds taken by all map tasks=87423
Total megabyte-milliseconds taken by all map tasks=21855750
Map-Reduce Framework
Map input records=0
Map output records=0
Input split bytes=240
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=346
CPU time spent (ms)=470
Physical memory (bytes) snapshot=106921984
Virtual memory (bytes) snapshot=1924980736
Total committed heap usage (bytes)=39321600
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
18/07/11 10:31:18 INFO mapreduce.ExportJobBase: Transferred 261 bytes in 71.3537 seconds (3.6578 bytes/sec)
18/07/11 10:31:18 INFO mapreduce.ExportJobBase: Exported 0 records.
18/07/11 10:31:18 ERROR mapreduce.ExportJobBase: Export job failed!
18/07/11 10:31:18 ERROR tool.ExportTool: Error during export: Export job failed!
Sample Data:
abc,1223
abck,1332
abckp,2113
Regards,
Jay.
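Adding the variant I plan to try next: since the MS SQL table has an auto-increment ID column that is not present in the CSV, I believe the export needs to list only the CSV columns explicitly via --columns. Untested sketch; the column names "name" and "value" are made up, since the real table definition is not shown here.
sqoop export \
  --connect 'jdbc:sqlserver://xxx.xxx.xx.xx:xxxx;databasename=<mssql_database_name>' \
  --username xxxx --password xxxx \
  --table <mssql_table_name> \
  --columns "name,value" \
  --export-dir /user/root/input/data.csv \
  --input-fields-terminated-by ','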
07-09-2018
06:55 AM
Hi,
I am new to HBase and I am using the Hortonworks sandbox on VirtualBox. I just opened the HBase shell and ran a first simple command to show the status, but it gives me the following error:
hbase(main):010:0> status 'simple'
ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2402)
at org.apache.hadoop.hbase.master.MasterRpcServices.getClusterStatus(MasterRpcServices.java:778)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:57174)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2127)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:745)
Here is some help for this command:
Show cluster status. Can be 'summary', 'simple', 'detailed', or 'replication'. The
default is 'summary'. Examples:
hbase> status
hbase> status 'simple'
hbase> status 'summary'
hbase> status 'detailed'
hbase> status 'replication'
hbase> status 'replication', 'source'
hbase> status 'replication', 'sink'
hbase(main):011:0>
I have checked in Ambari that the HBase Master is running fine, and I have restarted the sandbox as well, but I still face the same error.
Regards,
Jay.
07-06-2018
07:59 AM
@Shu Thanks a ton!!! I would also suggest using a while loop. Regards, Jay.
06-22-2018
01:13 PM
Hi @Geoffrey Shelton Okot,
Please have a look at my Sqoop command:
sqoop import --connect jdbc:sqlserver://<HOST>:<PORT>;databasename=<mssql_database_nameMS> --username xxxx --password xxxx --hive-database <hive_database_name> --table <mssql_table1>,<mssql_table2> --hive-import -m 1
Thank you,
Jay.
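For reference, the workaround I am leaning towards (since --table only accepts a single table name) is simply one import per table. Untested sketch; replace the placeholders before running:
# One sqoop import per table; <...> values are the same placeholders as above.
for t in <mssql_table1> <mssql_table2>; do
  sqoop import \
    --connect "jdbc:sqlserver://<HOST>:<PORT>;databasename=<mssql_database_name>" \
    --username xxxx --password xxxx \
    --hive-import \
    --hive-database <hive_database_name> \
    --table "$t" \
    -m 1
done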
06-22-2018
11:56 AM
Hi @Geoffrey Shelton Okot,
I tried the above solution, but it also throws the error below:
18/06/22 11:53:59 ERROR manager.SqlManager: Error executing statement: com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name '<ms_SQL_tbalename>,<ms_SQl_Tablename>'.
Regards,
Jay.
06-18-2018
10:21 AM
@Felix Albani Hi, I have actually tried both, i.e. using the full path to spark-submit and also navigating to the directory and executing it there, but I faced the same error. Anyway, I have updated my question; please have a look at that. Regards, Jay.
06-16-2018
07:10 AM
Hi All,
I have created an HDP cluster on AWS. Now I want to execute a spark-submit command using an Oozie shell action. The spark-submit command is simple: it takes input from HDFS, stores output in HDFS, and the .jar file is taken from the local filesystem. The spark-submit command runs fine on the command line; it reads the data and stores the output in a specific HDFS directory. I can also put it in a script and run that from the command line, and it works well too. The problem is when I execute the Oozie workflow for it.
script.sh
#!/bin/bash
/usr/hdp/current/spark2-client/bin/spark-submit --class org.apache.<main> --master local[2] <jar_file_path> <HDFS_input_path> <HDFS_output_path>
job.properties
nameNode=hdfs://<HOST>:8020
jobTracker=<HOST>:8050
queueName=default
oozie.wf.application.path=${nameNode}/user/oozie/shelloozie
workflow.xml
<workflow-app name="ShellAction" xmlns="uri:oozie:workflow:0.3">
<start to='shell-node' />
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>script.sh</exec>
<file>${nameNode}/user/oozie/shelloozie/script.sh#script.sh</file>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
</workflow-app>
Anyway, I have checked my YARN log; it gives me the following, and I don't understand what it is telling me.
LogType:stderr
Log Upload Time:Sat Jun 16 07:00:47 +0000 2018
LogLength:1721
Log Contents:
Jun 16, 2018 7:00:24 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Jun 16, 2018 7:00:24 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Jun 16, 2018 7:00:24 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
Jun 16, 2018 7:00:24 AM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Jun 16, 2018 7:00:24 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Jun 16, 2018 7:00:25 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Jun 16, 2018 7:00:26 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
End of LogType:stderr
Kindly help me solve this.
Thank You,
Jay.
06-16-2018
06:28 AM
@Aditya Sirna Thank you, Aditya. Your suggestion worked for me. NOTE:
echo "`date` hi" > /tmp/output ; hdfs dfs -appendToFile <local_directory_path> <hdfs_directory_path>
Regards,
Jay.
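Spelled out in full, the pattern that worked for me is to write the output to a local temp file first and then append that file into HDFS. A small sketch (the HDFS file name output.txt is just an example):
#!/bin/bash
# Write locally first, then append the local file into HDFS.
echo "`date` hi" > /tmp/output
hdfs dfs -appendToFile /tmp/output /user/oozie/output/shell/output.txt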
06-15-2018
11:07 AM
Hi All,
I have a shell script named script.sh, both on HDFS and locally, which contains an echo. I can execute script.sh locally and store the output locally, of course. But I want to execute script.sh (from local or from HDFS, either way) and store the output on HDFS. I have done the following.
script.sh
#!/bin/bash
echo "`date` hi" > /tmp/output
bash script.sh
The above command ran successfully, but if I change the output path it gives me this error:
script.sh: line 2: hdfs://<host>:<port>/user/oozie/output/shell: No such file or directory
#!/bin/bash
echo "`date` hi" > hdfs://<HOST>:<PORT>/user/oozie/output/shell
Kindly help me with this.
Thank you,
Jay.
06-14-2018
07:56 AM
@Felix Albani I tried using PuTTY as well, but it also doesn't work for me. Regards, Jay.