Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3644 | 05-03-2017 05:13 PM |
| | 2999 | 05-02-2017 08:38 AM |
| | 3253 | 05-02-2017 08:13 AM |
| | 3212 | 04-10-2017 10:51 PM |
| | 1678 | 03-28-2017 02:27 AM |
04-21-2016
03:42 PM
Considering you have the following data (one JSON document per line):

{ "user": { "userlocation": "California, Santa Clara", "id": 222222, "name": "Hortonworks", "screenname": "hortonworks", "geoenabled": true }, "tweetmessage": "Learn more about #Spark in #HDP 2.3 with @Hortonworks founder @acmurthy in this video overview http://bit.ly/1gOyr9w #hadoop", "createddate": "2015-07-24T16:30:33", "geolocation": null}

your schema with JsonSerDe would look like this:

CREATE EXTERNAL TABLE tweets (
    createddate string,
    geolocation string,
    tweetmessage string,
    `user` struct<geoenabled:boolean, id:int, name:string, screenname:string, userlocation:string>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/user/root/';

You can then query it like so:

SELECT DISTINCT tweetmessage, `user`.name, createddate
FROM tweets
WHERE `user`.name = 'Hortonworks'
ORDER BY createddate;
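One thing to note: this SerDe class ships in hive-hcatalog-core.jar, which is not on Hive's classpath by default, so add it first. The path below is the usual HDP layout; adjust it to your install:

ADD JAR /usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar;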
04-21-2016
07:43 AM
I don't have it in writing, but I do have verbal confirmation from the docs team, and Oozie's unit tests also go against 8032.
04-21-2016
07:35 AM
The 8032 vs. 8050 discrepancy is an oversight; the documentation team is addressing it, at least in Oozie's case.
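Concretely, that means pointing Oozie jobs at the ResourceManager's 8032 port in job.properties. A minimal illustration, with a placeholder hostname:

# job.properties -- hostname below is a placeholder for your RM host
nameNode=hdfs://sandbox.hortonworks.com:8020
jobTracker=sandbox.hortonworks.com:8032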
04-21-2016
01:26 AM
@cstella I just confirmed that it still fails with DataFu 1.3 on the latest Sandbox 2.4 v3.
04-21-2016
12:04 AM
Check out my UDF examples using streaming at https://github.com/dbist/pig/tree/master/udfs, specifically the formathtml.pig script and its associated UDF written in Python.
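The general wiring for streaming a relation through a Python script looks like this. A minimal sketch, with the script, file, and relation names made up for illustration:

-- ship the script to the task nodes and stream each record through it
DEFINE format_cmd `myscript.py` SHIP('myscript.py');
raw = LOAD 'input.txt' AS (line:chararray);
formatted = STREAM raw THROUGH format_cmd;
DUMP formatted;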
04-20-2016
06:23 PM
You have to SSH into the machine using a terminal. There is no hdfs user in Ambari that can create a directory as the superuser.
04-20-2016
06:22 PM
sudo -u hdfs hdfs dfs -mkdir /user/root
sudo -u hdfs hdfs dfs -chown -R root:hdfs /user/root
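Optionally, you can confirm the directory and its ownership afterwards:

sudo -u hdfs hdfs dfs -ls /user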
04-20-2016
10:18 AM
1 Kudo
You did not include the Python interpreter line in your Python script, so Pig has trouble recognizing it as Python. For what you're trying to achieve, though, you can skip streaming and just use Pig's built-in filter function; it will perform better than streaming. See http://pig.apache.org/docs/r0.15.0/

/* load the full list of SSNs and the SSN-to-name mapping */
SSN = load 'ssn.txt' using PigStorage() as (ssn:long);
SSN_NAME = load 'students.txt' using PigStorage() as (ssn:long, name:chararray);
/* do a left outer join of SSN with SSN_NAME */
X = JOIN SSN by ssn LEFT OUTER, SSN_NAME by ssn;
/* only keep those ssn's for which there is no name */
Y = filter X by IsEmpty(SSN_NAME);
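For completeness, if you do stay with streaming, the fix for the original error is the interpreter line: the very first line of the script must be a shebang so the task nodes can execute it. A minimal sketch, where the script name and filter logic are hypothetical:

#!/usr/bin/env python
# hypothetical filter script for Pig streaming: read tab-separated
# records on stdin, pass through only rows missing a name field
import sys

for line in sys.stdin:
    fields = line.rstrip('\n').split('\t')
    if len(fields) < 2 or fields[1] == '':
        sys.stdout.write(line)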
04-19-2016
09:51 PM
1 Kudo
I believe you can use the Postgres driver and Sqoop from HAWQ using a standard Postgres connection string.
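Something along these lines should work; the host, port, database, user, and table below are all placeholders:

sqoop import \
  --connect jdbc:postgresql://hawq-master.example.com:5432/gpadmin \
  --driver org.postgresql.Driver \
  --username gpadmin -P \
  --table my_table \
  --target-dir /user/root/my_table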
04-19-2016
05:10 PM
What do you mean you assigned broker IDs? Kafka will assign its own IDs; I wouldn't mess with that configuration. I would look in the broker logs for the error message you're getting; most likely it's a conflict of IDs.
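If you did set them by hand, check each broker's server.properties: every broker needs a unique broker.id, and a duplicate is exactly the kind of conflict that shows up in the broker logs. For illustration:

# server.properties on one broker; the value must differ on every broker
broker.id=1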