Member since 03-24-2017
03-19-2018
03:17 PM
This is the script, sqoop_sql.sh:

```
query=$(cat ${SQL_SCRIPT})
where_clause=" where dateadded >= '2016-05-01' and dateadded < '2016-06-01' and \$CONDITIONS"
sqoop import -D mapreduce.job.queuename=s_sourcedata \
  --connect 'jdbc:sqlserver://connection' \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --username name \
  --password pas \
  --query "${query}${where_clause}" \
  --as-parquetfile \
  --split-by dateadded \
  --delete-target-dir \
  --target-dir prioritypass_history \
  -m 1
```
It doesn't work this way, but if I change the first line to

```
query="select * FROM smth.[dbo].[tablename]"
```

it works. My action looks like this:

```
<action name="history" cred="hv_cred">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${JOB_TRACKER}</job-tracker>
        <name-node>${NAME_NODE}</name-node>
        <exec>sqoop_sql.sh</exec>
        <env-var>SQL_SCRIPT=${SQL_SCRIPT_HISTORY}</env-var>
        ...
        <file>${WORKFLOW_APPLICATION_PATH}/bash/sqoop_sql.sh#sqoop_sql.sh</file>
        <file>${WORKFLOW_APPLICATION_PATH}/oracle/${SQL_SCRIPT_HISTORY}#${SQL_SCRIPT_HISTORY}</file>
    </shell>
    <ok to="end"/>
    <error to="kill"/>
</action>
```
The thing is, I used this same code to import data from Oracle, changing only the connection details. My only guess is that Oozie doesn't like that the script is in the oracle folder, but I'm not sure, and if that's the case I don't know what to change it to. PS: I don't use the Sqoop action because some libraries are missing on the cluster and it doesn't work.
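One thing worth checking in a setup like this: if the SQL file ends with a newline or a semicolon, the concatenated `--query` string passed to Sqoop can end up malformed, which would explain why an inline `query="select ..."` works while `query=$(cat ${SQL_SCRIPT})` does not. A minimal sketch of normalizing the file's contents before appending the where clause (the file name and sample SQL here are placeholders taken from the post, not a verified fix):

```shell
#!/usr/bin/env bash
# Hedged sketch: normalize a query read from a file before appending
# a where clause containing \$CONDITIONS for a Sqoop free-form import.
set -euo pipefail

SQL_SCRIPT="query.sql"   # assumed file name for this demonstration
# Create a sample file just so the sketch is self-contained.
printf 'select * FROM smth.[dbo].[tablename];\n' > "${SQL_SCRIPT}"

# Collapse newlines to spaces and strip any trailing semicolon,
# so the where clause concatenates onto a single clean line.
query=$(tr '\n' ' ' < "${SQL_SCRIPT}" | sed 's/;[[:space:]]*$//')
where_clause=" where dateadded >= '2016-05-01' and dateadded < '2016-06-01' and \$CONDITIONS"

full_query="${query}${where_clause}"
echo "${full_query}"
```

If the query file was written on Windows, carriage returns (`\r`) are another likely culprit and would need stripping the same way.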
Labels:
- Apache Oozie
- Apache Sqoop
09-05-2017
01:24 PM
Okay, I did it. There were a few problems, but this is how the final version looks. My Dockerfile: krb5.conf and the keytab are in the same folder as the Dockerfile. When I build the project they are added to the container, and in the entrypoint I use -Djava.security.krb5.conf to provide the krb5 location. There are also a few options for debugging, plus I connect Mongo.

```
FROM java:8
ADD report.jar report.jar
ADD krb5.conf /etc/krb5.conf
ADD evkuzmin.keytab /etc/evkuzmin.keytab
RUN sh -c 'touch report.jar'
ENTRYPOINT ["java","-Dspring.data.mongodb.uri=mongodb://audpro_mongo/report","-Djavax.net.debug=all","-Dsun.security.spnego.debug=true","-Dsun.security.krb5.debug=true","-Djava.security.krb5.conf=/etc/krb5.conf","-jar","/report.jar"]
```

Then I use KerberosRestTemplate to connect to WebHDFS:

```
public String getReportJSON() throws URISyntaxException {
    KerberosRestTemplate restTemplate = new KerberosRestTemplate("/etc/evkuzmin.keytab", "EvKuzmin@DOMAIN");
    URI uri = new URI("http" + "://" + host + ":" + port + "/webhdfs/v1" + path + "?op=OPEN");
    String json = restTemplate.getForObject(uri, String.class);
    return json;
}
```

If you want to run the app without Docker, just build it and put the keytab in the same directory as the jar. Then change /etc/evkuzmin.keytab so it points to the new location.
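For reference, the build-and-run sequence for a setup like this might look as follows. This is a hedged sketch, not something from the original post: the image tag is an assumption, and the container/network names are taken from the docker run command shown earlier in the thread.

```shell
# Build the image with krb5.conf and the keytab sitting next to the
# Dockerfile, then run it on the same user-defined network as Mongo.
docker build -t beeline/report .
docker run -d --name audpro --network=aud_pro_net \
  -p 8080:8080 --link audpro_mongo:audpro_mongo beeline/report
```

(No automated check here, since these commands need a running Docker daemon.)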
09-04-2017
02:53 PM
@Geoffrey Shelton Okot
Why do I need krb5.conf if I already have a keytab? I already pass the principal in the Java code. What else is there?
09-04-2017
12:45 PM
@Geoffrey Shelton Okot I followed the instructions but had to change a few things. The net=host didn't work; I changed it to network=aud_pro_net. When I tried mounting the krb5.conf file directly, like /etc/krb5.conf:/etc/krb5.conf, I got an error that it's not a directory, so I changed that too. The rest was left as is. In the end I initialized the keytab without problems, but the error persisted: Unable to obtain password. I added everything to the post.
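On the "not a directory" error: `-v /kerb:/etc/` bind-mounts a whole directory over `/etc`, and Docker refuses a file-to-directory (or directory-to-file) mismatch. To share only krb5.conf, both sides of the `-v` flag need to be file paths. A hedged sketch, reusing the image and network names from this thread (the host path `/kerb/krb5.conf` is an assumption):

```shell
# Mount just the krb5.conf file into the container, read-only,
# instead of shadowing the whole /etc directory.
docker run -d --network=aud_pro_net \
  -v /kerb/krb5.conf:/etc/krb5.conf:ro \
  -v /dev/urandom:/dev/random \
  --name kerberos -e BOOTSTRAP=0 sequenceiq/kerberos
```

(No automated check here, since the command needs a running Docker daemon.)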
09-04-2017
10:00 AM
I use Spring to connect to a cluster secured with Kerberos. My code:

```
private KerberosRestTemplate restTemplate = new KerberosRestTemplate("evkuzmin.keytab", "EvKuzmin@REALM");

URI uri = new URI("http" + "://" + host + ":" + port + "/webhdfs/v1" + path + "?op=OPEN");
String json = restTemplate.getForObject(uri, String.class);
return json;
```

Here I read the file and return a string. I generated the keytab file and checked it in the CLI; it works. I checked the app itself, and it also works. In fact, when I simply run the app, I don't need the keytab, because I have a ticket that is automatically used for authorization. The problems start when I try to run the app in Docker. If I don't use the keytab, it doesn't see the ticket and I get

```
AuthenticationException: Unauthorized
```

But when I use it, it can't obtain the password. What am I doing wrong?

Edit

How I start Spring:

```
docker run -d --name audpro --network=aud_pro_net -p 8080:8080 --link audpro_mongo:audpro_mongo beeline/report
```

How I tried to start Kerberos:

```
docker run -d --network=aud_pro_net -v /kerb:/etc/ -v /dev/urandom:/dev/random --name kerberos -e BOOTSTRAP=0 sequenceiq/kerberos
```
Labels:
- Apache Hadoop
- Docker
07-07-2017
03:11 PM
I have some Hive scripts running on the cluster, started by Oozie. I want to know if they can stop working when the cluster is updated. Is there a place where I can see which words became reserved words (like date), or which settings have to be written differently (for example, SET hive.execution.engine=mr changing to hive.exec.engine), and other such things? I tried looking at the official site, and there are some descriptions, but they are not what I need.
Labels:
- Apache Hadoop
- Apache Hive
03-31-2017
02:05 PM
Sometimes I get this [screenshot not preserved], and sometimes I get this [screenshot not preserved]. There is also a third case where I get a mix of both. What is happening? My settings:

```
set hive.execution.engine=tez;
set tez.queue.name=adhoc;
set hive.tez.container.size=4096;
set hive.auto.convert.join=true;
set hive.exec.parallel=true;
set hive.tez.auto.reducer.parallelism=true;
```
Tags:
- HDFS
- tez
- Upgrade to HDP 2.5.3 : ConcurrentModificationException When Executing Insert Overwrite : Hive
Labels:
- Apache Hadoop
- Apache Tez
03-31-2017
11:09 AM
I ran a script with Tez and it worked, but then I tried to run it without Tez and there was this line in the logs:

```
tez session was closed, reopening
```

The settings I had with Tez:

```
set hive.execution.engine=tez;
set tez.queue.name=adhoc;
set hive.tez.container.size=4096;
set hive.auto.convert.join=true;
set hive.exec.parallel=true;
set hive.tez.auto.reducer.parallelism=true;
```

Without Tez:

```
SET mapreduce.job.queuename = adhoc;
SET mapreduce.job.reduces = 100;
SET hive.exec.parallel.thread.number = 8;
SET hive.cli.print.header=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
SET hive.exec.parallel=true;
SET hive.cli.print.current.db=true;
set hive.auto.convert.join=false;
set hive.resultset.use.unique.column.names=false;
```

The code was identical in both cases. I run scripts from Hue. Why did this happen?
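One plausible reading (an assumption, not confirmed by the thread): the "without Tez" settings never change the execution engine, so if the Hue session previously had hive.execution.engine=tez set, Hive would still try to use a Tez session and reopen it. Pinning the engine explicitly at the top of the MapReduce variant would rule that out:

```
-- Hedged sketch: force the MapReduce engine instead of inheriting
-- whatever the session was left with.
SET hive.execution.engine=mr;
SET mapreduce.job.queuename=adhoc;
```

This is a configuration fragment only; the remaining settings from the post would follow unchanged.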
Tags:
- hadoop
- Hadoop Core
- Hive
- tez
- Upgrade to HDP 2.5.3 : ConcurrentModificationException When Executing Insert Overwrite : Hive
Labels:
- Apache Hadoop
- Apache Hive
- Apache Tez