Member since: 05-19-2016
Posts: 93
Kudos Received: 17
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1370 | 01-30-2017 07:34 AM
 | 831 | 09-14-2016 10:31 AM
01-27-2017
04:58 AM
@Pranay Vyas This is a kerberized cluster, and I am using the hive principal and keytab for authentication. All CREATE TABLE and data INSERT queries are working from the Java API, but for any SELECT query I am getting this authentication error (Authentication failed, status: 403, message: Forbidden). Please help.
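For context, this is roughly how the Java side is typically wired up: a minimal sketch of a keytab login followed by a Kerberized HiveServer2 JDBC connection, assuming the Hive JDBC driver is on the classpath. The principal, keytab path, host and realm below are placeholders, not values from this thread.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberizedHiveSelect {
    public static void main(String[] args) throws Exception {
        // Log in from the keytab before opening the JDBC connection (placeholder values).
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(
                "hive/hs2-host.example.com@EXAMPLE.COM",
                "/etc/security/keytabs/hive.service.keytab");

        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // The URL carries the HiveServer2 Kerberos principal (placeholder host/realm).
        String url = "jdbc:hive2://hs2-host.example.com:10000/default;"
                + "principal=hive/hs2-host.example.com@EXAMPLE.COM";

        try (Connection con = DriverManager.getConnection(url);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM teradata_demo_table LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

Note that in the stack trace quoted in this thread, the 403 is raised from KMSClientProvider.addDelegationTokens, i.e. while HiveServer2 fetches a KMS delegation token during job submission, rather than on the JDBC connection itself.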
01-24-2017
06:53 AM
@Pranay Vyas Thanks for your quick response. The Hive table was created successfully from the Hive CLI. Below is the table structure:
hive> desc teradata_demo_table;
OK
id int
col_date string
col_time string
col_timestamp string
col_timestamp_tz string
col_time_tz string
col_interval_yr_month string
col_interval_yr string
col_interval_month string
col_interval_day string
col_interval_day_hr string
col_period_date string
col_period_timestamp string
col_byteint tinyint
col_smallint smallint
col_integer int
col_bigint bigint
col_decimal decimal(8,2)
col_float string
col_char char(7)
col_varchar varchar(16384)
Time taken: 0.379 seconds, Fetched: 21 row(s)
Full Log:
2017-01-24 16:01:38,840 INFO [HiveServer2-Background-Pool: Thread-10098]: exec.Utilities (Utilities.java:getBaseWork(400)) - PLAN PATH = hdfs://ctsc00691239901.cts.com:8020/tmp/hive/hive/ae98860b-bdbe-45a1-9212-172a3bae152b/hive_2017-01-24_16-01-38_673_8353564872225299216-27/-mr-10003/0cfdbbd9-00ba-4f02-a100-5287bdaedfb8/reduce.xml
2017-01-24 16:01:38,841 INFO [HiveServer2-Background-Pool: Thread-10098]: exec.Utilities (Utilities.java:getBaseWork(466)) - File not found: File does not exist: /tmp/hive/hive/ae98860b-bdbe-45a1-9212-172a3bae152b/hive_2017-01-24_16-01-38_673_8353564872225299216-27/-mr-10003/0cfdbbd9-00ba-4f02-a100-5287bdaedfb8/reduce.xml
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
...
2017-01-24 16:01:38,841 INFO [HiveServer2-Background-Pool: Thread-10098]: exec.Utilities (Utilities.java:getBaseWork(467)) - No plan file found: hdfs://ctsc00691239901.cts.com:8020/tmp/hive/hive/ae98860b-bdbe-45a1-9212-172a3bae152b/hive_2017-01-24_16-01-38_673_8353564872225299216-27/-mr-10003/0cfdbbd9-00ba-4f02-a100-5287bdaedfb8/reduce.xml
2017-01-24 16:01:38,848 INFO [HiveServer2-Background-Pool: Thread-10098]: hdfs.DFSClient (DFSClient.java:getDelegationToken(1047)) - Created HDFS_DELEGATION_TOKEN token 3924 for hive on 10.223.72.129:8020
2017-01-24 16:01:38,857 INFO [HiveServer2-Background-Pool: Thread-10098]: mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(249)) - Cleaning up the staging area /user/hive/.staging/job_1484412561270_0059
2017-01-24 16:01:38,858 ERROR [HiveServer2-Background-Pool: Thread-10098]: exec.Task (SessionState.java:printError(962)) - Job Submission failed with exception 'java.io.IOException(java.lang.reflect.UndeclaredThrowableException)'
java.io.IOException: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:892)
at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86)
at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2291)
...
Caused by: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1727)
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:874)
... 40 more
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, status: 403, message: Forbidden
at org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:274)
at org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:77)
at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:214)
...
2017-01-24 16:01:38,860 INFO [ATS Logger 0]: hooks.ATSHook (ATSHook.java:createPostHookEvent(193)) - Received post-hook notification for :hive_20170124160138_1cd543d0-afe0-46ee-95c6-b9fc1d4440f6
2017-01-24 16:01:38,860 ERROR [HiveServer2-Background-Pool: Thread-10098]: ql.Driver (SessionState.java:printError(962)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
...
2017-01-24 16:01:38,861 ERROR [HiveServer2-Background-Pool: Thread-10098]: operation.Operation (SQLOperation.java:run(209)) - Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:156)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
All other commands are running fine. This is a simple Hive query where I want to store the query result in a local directory. Please help.
01-20-2017
09:31 AM
I am trying to execute the below query using the Java API:
INSERT OVERWRITE LOCAL DIRECTORY '/home/bigframe/aps/temp' select * from teradata_demo_table
And I am getting the below error:
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)
at com.bigframe.perl.api.HiveReconTableCreate.doHiveTableCreation(HiveReconTableCreate.java:82)
at com.bigframe.perl.api.HiveReconTableCreate.main(HiveReconTableCreate.java:142)
The same query works from the Hive CLI, and "select * from teradata_demo_table" also works from the Java API. Please suggest.
Labels:
01-10-2017
05:12 AM
@Nicola Poffo Can you please suggest how to start? Did you read any book?
12-30-2016
07:10 AM
@Rajeshbabu Chintaguntla Thanks for your quick response. I am not using ImportTsv to import data; instead I am using a Java MapReduce program for the import. Is there any other way to do this?
12-30-2016
06:39 AM
I have a MapReduce job with which I can successfully import data from HDFS into HBase. Now I want to perform data validation in HBase: I want to find, or distinguish, good data and bad (corrupt) data with respect to the rowkey in HBase. Can anyone suggest a way to do this?
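One possible starting point, purely as an illustrative sketch rather than a method from this thread: scan the HBase table and flag rows whose rowkey does not match the format you expect. The table name and the expected key pattern below are assumptions.

```java
import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RowkeyValidator {
    // Expected rowkey format is an assumption, e.g. a purely numeric employee id.
    private static final Pattern EXPECTED_KEY = Pattern.compile("\\d+");

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("emp_table"));   // placeholder table name
             ResultScanner scanner = table.getScanner(new Scan())) {
            long good = 0, bad = 0;
            for (Result r : scanner) {
                String key = Bytes.toString(r.getRow());
                if (EXPECTED_KEY.matcher(key).matches()) {
                    good++;
                } else {
                    bad++;
                    System.out.println("Suspect rowkey: " + key);
                }
            }
            System.out.println("good=" + good + ", bad=" + bad);
        }
    }
}
```

For a large table, the same check is better run as a MapReduce job over the table (a TableMapper) rather than a single client-side scan.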
Labels:
11-29-2016
09:52 AM
Can anyone walk through, step by step, how to code a MapReduce job using Python, with a use case?
Labels:
11-23-2016
01:32 PM
We can follow the steps mentioned in the below link: https://community.hortonworks.com/articles/1283/hive-script-to-validate-tables-compare-one-with-an.html Does anyone have an idea other than this?
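In the same spirit as the linked article (comparing one table against another), below is a rough Java sketch of a post-import row-count check. The JDBC URLs, credentials, table names and available drivers are placeholders/assumptions, and matching counts are a necessary but not sufficient condition for a full validation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PostImportCountCheck {

    // Run SELECT COUNT(*) against one JDBC endpoint; URL and credentials are placeholders,
    // and the matching JDBC drivers are assumed to be on the classpath.
    private static long countRows(String url, String user, String pass, String table) throws Exception {
        try (Connection con = DriverManager.getConnection(url, user, pass);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM " + table)) {
            rs.next();
            return rs.getLong(1);
        }
    }

    public static void main(String[] args) throws Exception {
        long sourceCount = countRows("jdbc:teradata://source-host/DATABASE=demo", "user", "pass", "emp2");
        long hiveCount = countRows("jdbc:hive2://hs2-host.example.com:10000/default", "hive", "", "emp2");
        if (sourceCount == hiveCount) {
            System.out.println("Counts match: " + sourceCount);
        } else {
            // A mismatch is the trigger to investigate and re-import, as described in this thread.
            System.out.println("Mismatch: source=" + sourceCount + ", hive=" + hiveCount);
        }
    }
}
```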
11-23-2016
12:41 PM
I am using sqoop to import data from an RDBMS to Hive, and I have to validate the data after the import is done. Sqoop has a --validate option, but as mentioned in the sqoop documentation it does not work for Hive. Can you please suggest a way to validate the data after the import, so that I can re-import the corrupt data as required?
Labels:
11-22-2016
05:49 AM
@Greg Keys Thanks for your response. As per the sqoop documentation, the --validate option does not work with Hive and HBase. What strategy should I take for Hive and HBase? Please suggest.
11-21-2016
01:46 PM
We can take the approach below for incremental import into Hive or HDFS, which is described in the Hortonworks documentation. In this approach we assume that each source table has a unique single- or multi-key identifier and that a "modified_date" field is maintained for each record, either defined as part of the original source table or added as part of the ingest process. The approach is based on four main phases, as described below:
1. Ingest: Complete data movement from the operational database (base_table), followed by change or update of changed records only (incremental_table). (Step 1 to Step 4)
2. Reconcile: Create a single view of the base table and change records (reconcile_view) to reflect the latest record set. (Step 5)
3. Compact: Create a reporting table (reporting_table) from the reconciled view. (Step 6)
4. Purge: Replace the base table with the reporting table contents and delete any previously processed change records before the next data ingestion cycle. (Step 7 and Step 8)
Below are the sqoop and Hive commands for the implementation:
Step 1:
sqoop import --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://**.***.***.***/DATABASE=***** --username **** --password **** -table emp2 --target-dir /user/aps/poc/base_table -m 1
Step 2:
CREATE TABLE base_table (emp_id string, emp_name string, mob_no string, create_time string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION "/user/aps/poc/base_table";
Step 3:
sqoop import --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://**.***.***.***/DATABASE=***** --username **** --password **** --target-dir /user/aps/poc/incremental_table --query "select * from emp2 where emp_id > 12 AND \$CONDITIONS" -m 1
Step 4:
CREATE EXTERNAL TABLE incremental_table (emp_id string, emp_name string, mob_no string, create_time string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION "/user/aps/poc/incremental_table";
Step 5:
CREATE VIEW reconcile_view AS
SELECT t1.* FROM
(SELECT * FROM base_table
UNION ALL
SELECT * FROM incremental_table) t1
JOIN
(SELECT emp_id, max(create_time) max_modified FROM
(SELECT * FROM base_table
UNION ALL
SELECT * FROM incremental_table) t2
GROUP BY emp_id) s
ON t1.emp_id = s.emp_id AND t1.create_time = s.max_modified;
Step 6:
DROP TABLE reporting_table;
CREATE TABLE reporting_table AS SELECT * FROM reconcile_view;
Step 7:
DROP TABLE base_table;
CREATE TABLE base_table AS SELECT * FROM reporting_table;
Step 8:
hadoop fs -rm -r /user/aps/poc/incremental_table/*
The tables and views that are part of the incremental update workflow are:
base_table: A Hive local table that initially holds all records from the source system. After the initial processing cycle, it maintains a copy of the most up-to-date synchronized record set from the source. At the end of each processing cycle, it is overwritten by the reporting_table.
incremental_table: A Hive external table that holds the incremental change records (INSERTs and UPDATEs) from the source system. At the end of each processing cycle, it is cleared of content.
reconcile_view: A Hive view that combines and reduces the base_table and incremental_table content to show only the most up-to-date records. It is used to populate the reporting_table.
reporting_table: A Hive local table that holds the most up-to-date records for reporting purposes. It is also used to overwrite the base_table at the end of each processing run.
We have to wait for each step to complete before moving to the next one; this can be managed with a Unix shell script or Oozie (a rough Java sketch of the Hive portion follows below). I have checked one case with the above example and it works fine. This approach is a bit more complex compared to the sqoop incremental option.
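Since the steps must run strictly in order, the Hive portion (Steps 5-7) can also be driven from a small Java JDBC program instead of a shell script or Oozie workflow. This is only a minimal sketch under assumptions: the HiveServer2 URL is a placeholder, the statements simply mirror Steps 5-7 above, and security options and error handling are omitted.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ReconcileCompactPurge {
    public static void main(String[] args) throws Exception {
        // Placeholder HiveServer2 URL; add principal/SSL options as your cluster requires.
        String url = "jdbc:hive2://hs2-host.example.com:10000/default";
        try (Connection con = DriverManager.getConnection(url, "hive", "");
             Statement stmt = con.createStatement()) {
            // Step 5: (re)build the reconcile view over base_table + incremental_table.
            stmt.execute("DROP VIEW IF EXISTS reconcile_view");
            stmt.execute("CREATE VIEW reconcile_view AS "
                    + "SELECT t1.* FROM "
                    + "(SELECT * FROM base_table UNION ALL SELECT * FROM incremental_table) t1 "
                    + "JOIN (SELECT emp_id, max(create_time) max_modified FROM "
                    + "  (SELECT * FROM base_table UNION ALL SELECT * FROM incremental_table) t2 "
                    + "  GROUP BY emp_id) s "
                    + "ON t1.emp_id = s.emp_id AND t1.create_time = s.max_modified");
            // Step 6: compact the view into the reporting table.
            stmt.execute("DROP TABLE IF EXISTS reporting_table");
            stmt.execute("CREATE TABLE reporting_table AS SELECT * FROM reconcile_view");
            // Step 7: purge - replace base_table with the reporting table contents.
            stmt.execute("DROP TABLE IF EXISTS base_table");
            stmt.execute("CREATE TABLE base_table AS SELECT * FROM reporting_table");
        }
        // Step 8 (clearing the incremental directory) can then follow via
        // "hadoop fs -rm -r" as above, or via the HDFS FileSystem API.
    }
}
```

Because each execute() call blocks until HiveServer2 finishes the statement, the "wait for the previous step" requirement is handled naturally.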
11-17-2016
10:02 AM
1 Kudo
I am using sqoop for data transfer from an RDBMS to HDFS, Hive and HBase. How can I validate the data in HDFS, Hive and HBase after the sqoop command executes? What would be the best strategy for data validation? Please suggest.
11-17-2016
06:09 AM
I have the below questions:
1. Does sqoop create SQL queries internally? If yes, how are they created and executed for multiple mappers?
2. Does sqoop use any staging node to load the data, or does it load data directly onto the data nodes? How does this behave with different mappers?
3. How does sqoop run in parallel across multiple mappers?
Please explain with a simple architecture.
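On questions 1 and 3: my understanding is that for a plain table import, Sqoop first runs a bounding query on the split-by column (e.g. SELECT MIN(emp_id), MAX(emp_id) FROM emp2) and then hands each mapper its own range-restricted SELECT. The sketch below only illustrates that idea with made-up numbers; it is not Sqoop's actual splitting code.

```java
// Rough illustration of how a range on a split column is divided among mappers.
// It mimics the idea behind Sqoop's integer split logic; it is not Sqoop source code.
public class SplitIllustration {
    public static void main(String[] args) {
        long min = 1, max = 1000;   // e.g. result of: SELECT MIN(emp_id), MAX(emp_id) FROM emp2
        int mappers = 4;
        long step = (max - min + 1) / mappers;
        for (int i = 0; i < mappers; i++) {
            long lo = min + i * step;
            long hi = (i == mappers - 1) ? max + 1 : lo + step;
            System.out.println("Mapper " + i + ": SELECT * FROM emp2 WHERE emp_id >= "
                    + lo + " AND emp_id < " + hi);
        }
    }
}
```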
Labels:
11-17-2016
04:19 AM
Can anyone help?
11-15-2016
04:49 AM
@Vladimir Zlatkin Thanks for your response. Can we apply this four-step strategy to an HDFS-only import? I do not need Hive at this moment. Please suggest.
11-14-2016
11:38 AM
I am trying to run the sqoop codegen command, but it seems it is not working.
Sqoop command:
sqoop codegen --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://**.***.***.***/DATABASE=***** --username ***** --password ***** -table emp2 --outdir /home/bigframe/aps/
Log:
16/11/14 17:03:44 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258
16/11/14 17:03:44 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/11/14 17:03:44 INFO manager.SqlManager: Using default fetchSize of 1000
16/11/14 17:03:44 INFO tool.CodeGenTool: The connection manager declares that it self manages mapping between records & fields and rows & columns. No class will will be generated.
Please help.
Labels:
11-14-2016
08:52 AM
@jss Thank you for your quick response. If I change to a different directory, how can I merge the two sets of data in different directories?
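One option, purely as an illustration and not something suggested in this thread: keep each incremental run in its own directory, then move the new files into the base directory with the HDFS FileSystem API, renaming them so part files from different runs cannot collide. The paths below are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MergeIncrementalDirs {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path incrDir = new Path("/user/aps/incr_run2");   // placeholder: latest incremental import
        Path baseDir = new Path("/user/aps/incr");        // placeholder: directory holding earlier data
        long suffix = System.currentTimeMillis();
        for (FileStatus stat : fs.listStatus(incrDir)) {
            if (stat.isFile()) {
                // Rename on move so part-m-00000 files from different runs cannot collide.
                Path target = new Path(baseDir, stat.getPath().getName() + "." + suffix);
                fs.rename(stat.getPath(), target);
            }
        }
        fs.delete(incrDir, true);   // clean up the now-empty incremental directory
        fs.close();
    }
}
```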
11-14-2016
07:55 AM
Does an incremental import using a sqoop --query import support loading data into the same directory? I am getting the below error:
Caused by: com.teradata.connector.common.exception.ConnectorException: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /user/aps/incr already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at com.teradata.connector.common.ConnectorOutputFormat.checkOutputSpecs(ConnectorOutputFormat.java:47)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
The sqoop command I am running:
sqoop import --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://**.***.***.***/DATABASE=**** --username **** --password **** --target-dir /user/aps/incr --query "select * from EMP2 where CREATE_TIME > '2016-07-14 07:06:00' AND \$CONDITIONS" -m 1
The above command runs fine the first time; I get the error when I run it a second time after changing the where-clause parameter. The sqoop incremental options (append or lastmodified) can store data in the same directory, but the incremental option is not supported with the Teradata connection manager. Is there any strategy we can take to achieve this goal? Please help.
Labels:
11-11-2016
09:23 AM
1 Kudo
How can we use incremental import in sqoop for Teradata with --connection-manager org.apache.sqoop.teradata.TeradataConnManager? It seems the sqoop --incremental option does not work with TeradataConnManager. Below is the statement from the Teradata connector user guide from Hortonworks. Please help.
Labels:
10-26-2016
01:50 PM
@Artem Ervits Thanks for your quick reply. StreamSets got certified to work with Cloudera; it would be good to have it certified with HDP as soon as possible. What is the future roadmap for StreamSets at Hortonworks?
10-24-2016
07:35 AM
1 Kudo
From Google I found that StreamSets is an open source data flow tool for ingesting data from the outside world into HDFS and many other systems. Cloudera and MapR have already decided to provide support and integration for StreamSets in their distributions for data ingestion. I am planning to use the StreamSets tool for my next project, and we are using HDP 2.4. So my question is: is Hortonworks going to support StreamSets in HDP/HDF, like NiFi, in a future release?
10-19-2016
06:05 AM
@Lester Martin Many many thanks for your valuable inputs.
10-14-2016
01:39 PM
@Andrew Grande Can you please explain a bit more, with use cases?
10-14-2016
10:48 AM
1 Kudo
What is the difference between Apache NiFi and StreamSets? Can anyone explain the use cases where each can be used?
- Tags:
- Data Ingestion & Streaming
- data-ingestion
- hadoop-ecosystem
- NiFi
- nifi-streaming
- stream-processing
- streaming
Labels:
10-05-2016
07:26 AM
@Arun Thanks for your reply. I am not sure; I am also looking for a solution.
10-04-2016
12:03 PM
@Kaliyug Antagonist Please check Chapter 11 in the below book https://www.amazon.com/Apache-Hadoop-YARN-Processing-Addison-Wesley/dp/0321934504
10-04-2016
09:28 AM
@Kaliyug Antagonist You can try the YARN Distributed Shell.