Member since 10-01-2015 | 3933 Posts | 1150 Kudos Received | 374 Solutions
01-13-2016
06:18 AM
1 Kudo
Thanks, all of you, for your answers. Below are the answers I got from an Apache Atlas developer:

Apache Atlas supports integration with Hive; limited integration with Storm, Kafka, Sqoop, and Falcon is available in 0.6. Atlas metadata is stored in a Titan graph. Atlas doesn't currently support metadata exchange with Informatica; it's on the roadmap.
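For reference, the Hive integration works through a Hive execution hook. A minimal sketch of enabling it for a hive shell session (assuming the Atlas Hive hook jars are on Hive's classpath and Atlas itself is configured; normally this property is set once in hive-site.xml rather than per session):

-- register the Atlas hook so Hive operations are reported to Atlas
SET hive.exec.post.hooks=org.apache.atlas.hive.hook.HiveHook;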
02-05-2016
11:11 PM
Thanks, @rich. Your solution worked out. I accept this answer.
01-05-2016
08:25 PM
3 Kudos
Groovy UDF example. It can be compiled at run time. Currently this only works in the "hive" shell; it does not work in beeline.

su guest
hive

Paste the following code into the hive shell. It uses the Groovy String replace function to replace all instances of lowercase 'e' with 'E':

compile `import org.apache.hadoop.hive.ql.exec.UDF \;
import org.apache.hadoop.io.Text \;
public class Replace extends UDF {
public Text evaluate(Text s){
if (s == null) return null \;
return new Text(s.toString().replace('e', 'E')) \;
}
} ` AS GROOVY NAMED Replace.groovy;
Now create a temporary function to leverage the Groovy UDF:

CREATE TEMPORARY FUNCTION Replace as 'Replace';
Now you can use the function in your SQL:

SELECT Replace(description) FROM sample_08 limit 5;
Full example:

hive> compile `import org.apache.hadoop.hive.ql.exec.UDF \;
> import org.apache.hadoop.io.Text \;
> public class Replace extends UDF {
> public Text evaluate(Text s){
> if (s == null) return null \;
> return new Text(s.toString().replace('e', 'E')) \;
> }
> } ` AS GROOVY NAMED Replace.groovy;
Added [/tmp/0_1452022176763.jar] to class path
Added resources: [/tmp/0_1452022176763.jar]
hive> CREATE TEMPORARY FUNCTION Replace as 'Replace';
OK
Time taken: 1.201 seconds
hive> SELECT Replace(description) FROM sample_08 limit 5;
OK
All Occupations
ManagEmEnt occupations
ChiEf ExEcutivEs
GEnEral and opErations managErs
LEgislators
Time taken: 6.373 seconds, Fetched: 5 row(s)
hive>
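Note that a temporary function only lives for the current session. One possible way to keep it around, sketched here and untested (the HDFS location and function name are illustrative), is to copy the jar that compile generated (the /tmp/... path printed above) to HDFS and register a permanent function with Hive's CREATE FUNCTION ... USING JAR syntax (Hive 0.13+):

-- hypothetical location; substitute the jar your compile step printed
-- hdfs dfs -put /tmp/0_1452022176763.jar /apps/hive/udfs/
CREATE FUNCTION replace_e AS 'Replace'
USING JAR 'hdfs:///apps/hive/udfs/0_1452022176763.jar';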
Another example: this one will duplicate any String passed to the function.

compile `import org.apache.hadoop.hive.ql.exec.UDF \;
import org.apache.hadoop.io.Text \;
public class Duplicate extends UDF {
public Text evaluate(Text s){
if (s == null) return null \;
return new Text(s.toString() * 2) \;
}
} ` AS GROOVY NAMED Duplicate.groovy;
CREATE TEMPORARY FUNCTION Duplicate as 'Duplicate';
SELECT Duplicate(description) FROM sample_08 limit 5;
All OccupationsAll Occupations
Management occupationsManagement occupations
Chief executivesChief executives
General and operations managersGeneral and operations managers
LegislatorsLegislators
JSON parsing UDF:

compile `import org.apache.hadoop.hive.ql.exec.UDF \;
import groovy.json.JsonSlurper \;
import org.apache.hadoop.io.Text \;
public class JsonExtract extends UDF {
public int evaluate(Text a){
def jsonSlurper = new JsonSlurper() \;
def obj = jsonSlurper.parseText(a.toString())\;
return obj.val1\;
}
} ` AS GROOVY NAMED json_extract.groovy;
CREATE TEMPORARY FUNCTION json_extract as 'JsonExtract';
SELECT json_extract('{"val1": 2}') from date_dim limit 1;
2
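For simple extractions like this one, Hive's built-in get_json_object function gives the same result without any compilation; the Groovy route pays off when you need custom parsing logic:

SELECT get_json_object('{"val1": 2}', '$.val1') from date_dim limit 1;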
02-02-2016
08:50 PM
Thanks, @sindhu seenivasan, for the final follow-up.
01-06-2016
12:40 AM
Glad it's not an abandoned feature. Are there more examples and/or docs available? I created a few of my own, but I think we need better examples. Thank you, @gopal.
03-08-2016
02:30 AM
Hey guys. The tutorial mentioned above has been updated and is also compatible with the latest Sandbox, HDP 2.4. It addresses the issue of permissions. Here is the link: http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/ When you get a chance, can you go through the tutorial on our new Sandbox?
01-21-2016
08:44 PM
3 Kudos
Today I spoke with Robert Molina from Hortonworks and we possibly found what was creating all those alerts! The sandbox is intended to be run on a desktop with a NAT network interface. I set it up on a dedicated headless server with a bridged adapter. It looks like the sandbox has a problem with that, which caused some of the service configs to not function properly! As a result, some services worked but reported network connection alerts! After some config changes, the related alerts were gone. So always use a VM as it was intended to be used. Thanks to the Hortonworks team and Robert, who wanted to get to the bottom of this. Conclusion: if you want, like me, to test drive Hortonworks on a headless server, start from scratch and build it! That's what every sysadmin should do anyway... and that's what I'll do this weekend. P
01-01-2016
02:09 AM
Good call, Vladimir. Here is the error:

mkdir: Permission denied: user=yarn, access=WRITE, inode="/user/ambari-qa/falcon/demo/primary/input/enron/2015-12-30-01":ambari-qa:hdfs:drwxr-xr-x

I executed the job from Falcon as ambari-qa. Is there any configuration I can change so it uses the user ambari-qa during execution?
01-04-2016
02:07 PM
Dear Grace,

We can start with this template and improve it:

#!/bin/bash

# authenticate with Kerberos
kinit ......

# clear the previous import directory
hdfs dfs -rm -r hdfs://....

# import the driver table that lists the tables to load
sqoop import --connect "jdbc:sqlserver://....:1433;username=.....;password=….;database=....DB" --table ..... \
  -m 1 --where "...... > 0"
CR=$?
if [ $CR -ne 0 ]; then
  echo 'Sqoop job failed'
  exit 1
fi

# pull the imported list down to a local control file
hdfs dfs -cat hdfs://...../* > export_fs_table.txt
CR=$?
if [ $CR -ne 0 ]; then
  echo 'hdfs cat failed'
  exit 1
fi

# import each table named in the control file
while IFS=',' read -r id tablename nbr flag; do
  sqoop import --connect "jdbc:sqlserver://......:1433;username=......;password=......;database=.......DB" --table $tablename
  CR=$?
  if [ $CR -ne 0 ]; then
    echo 'sqoop import failed for '$tablename
    exit 1
  fi
done < export_fs_table.txt

Kind regards
12-30-2015
02:17 AM
4 Kudos
I’m going to show you a neat way to work with CSV files and Apache Hive. Usually you’d have to do some preparatory work on CSV data before you could consume it with Hive, but I’d like to show you a built-in SerDe (Serializer/Deserializer) for Hive that makes it a lot more convenient to work with CSV. This work was merged in Hive 0.14, and there are no additional steps necessary to work with CSV from Hive. Suppose you have a CSV file with the following entries:
id first_name last_name email gender ip_address
1 James Coleman jcoleman0@cam.ac.uk Male 136.90.241.52
2 Lillian Lawrence llawrence1@statcounter.com Female 101.177.15.130
3 Theresa Hall thall2@sohu.com Female 114.123.153.64
4 Samuel Tucker stucker3@sun.com Male 89.60.227.31
5 Emily Dixon edixon4@surveymonkey.com Female 119.92.21.19

To consume it from within Hive, you’ll need to upload it to HDFS:

hdfs dfs -put sample.csv /tmp/serdes/
Now all it takes is to create a table schema on top of the file:

drop table if exists sample;
create external table sample(id int,first_name string,last_name string,email string,gender string,ip_address string)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
stored as textfile
location '/tmp/serdes/';
Now you can query the table as is:

select * from sample limit 10;
But what if your CSV file is tab-delimited rather than comma-delimited? Well, the SerDe has you covered there too:

drop table if exists sample;
create external table sample(id int,first_name string,last_name string,email string,gender string,ip_address string)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
"separatorChar" = "\t"
)
stored as textfile
location '/tmp/serdes/';
Notice the separatorChar argument. In all, the SerDe accepts two more arguments: a custom quote character and a custom escape character; see the sketch below.
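For example, here is a sketch of a table definition that sets all three serdeproperties explicitly (the values shown are the defaults and purely illustrative):

drop table if exists sample;
create external table sample(id int,first_name string,last_name string,email string,gender string,ip_address string)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
"separatorChar" = ",",
"quoteChar" = "\"",
"escapeChar" = "\\"
)
stored as textfile
location '/tmp/serdes/';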
Take a look at the wiki for more info https://cwiki.apache.org/confluence/display/Hive/CSV+Serde.