Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4047 | 08-20-2018 08:26 PM |
| | 1947 | 08-15-2018 01:59 PM |
| | 2374 | 08-13-2018 02:20 PM |
| | 4106 | 07-23-2018 04:37 PM |
| | 5013 | 07-19-2018 12:52 PM |
07-23-2016
11:59 PM
Josh, that link you shared is priceless.
07-23-2016
11:55 PM
2 Kudos
Apologies for grammar and typos... writing this from my phone. If the date format within a given data set is inconsistent, then I would write a UDF to handle it. Inside the UDF you would have to detect the type of date you are working with, using regex for example. This is done very nicely with NiFi if you want to hit the easy button. If the format is consistent within a dataset yet differs among datasets, then simply write a Hive or Pig script for each dataset and parse out the date with the format you expect for that specific data set.
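For reference, a minimal sketch of the kind of Hive UDF described above, in Java: it probes a few candidate regex patterns and normalizes matches to yyyy-MM-dd. The class name and the pattern list are illustrative assumptions, not code from the post.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class NormalizeDateUDF extends UDF {
    // Candidate regex -> matching SimpleDateFormat pattern, probed in order.
    // This list is an assumption; extend it with the formats in your data.
    private static final String[][] CANDIDATES = {
        {"\\d{4}-\\d{2}-\\d{2}", "yyyy-MM-dd"},
        {"\\d{2}/\\d{2}/\\d{4}", "MM/dd/yyyy"},
        {"\\d{2}-\\d{2}-\\d{4}", "MM-dd-yyyy"}
    };

    public Text evaluate(Text input) {
        if (input == null) return null;
        String raw = input.toString().trim();
        for (String[] candidate : CANDIDATES) {
            if (raw.matches(candidate[0])) {
                try {
                    SimpleDateFormat in = new SimpleDateFormat(candidate[1]);
                    in.setLenient(false);
                    SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd");
                    return new Text(out.format(in.parse(raw)));
                } catch (ParseException ignored) {
                    // Regex matched but the value was not a valid date; keep probing.
                }
            }
        }
        return null; // Unrecognized format.
    }
}
```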
07-23-2016
11:49 PM
So you're looking for windowing in Storm, i.e. doing something based on a specified time period. Until recently you had to build your own windowing logic in Storm by keeping track of time and using a disk cache to hold events until the window time had completed. Now the functionality comes out of the box. Take a look at an excellent article on how the new functionality works in Storm here: https://community.hortonworks.com/articles/14171/windowing-and-state-checkpointing-in-apache-storm.html
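A minimal sketch of that out-of-the-box windowing (Storm 1.0+), extending BaseWindowedBolt; the bolt name, output field, and window sizes below are illustrative assumptions.

```java
import java.util.Map;
import java.util.concurrent.TimeUnit;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseWindowedBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.windowing.TupleWindow;

public class WindowCountBolt extends BaseWindowedBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(TupleWindow window) {
        // window.get() returns every tuple that fell into the current window.
        collector.emit(new Values(window.get().size()));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("count"));
    }
}

// Wiring in the topology: a 10-second window that slides every 5 seconds.
// builder.setBolt("counter", new WindowCountBolt()
//         .withWindow(new BaseWindowedBolt.Duration(10, TimeUnit.SECONDS),
//                     new BaseWindowedBolt.Duration(5, TimeUnit.SECONDS)))
//         .shuffleGrouping("spout");
```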
07-23-2016
04:14 PM
1 Kudo
Files on HDFS are immutable. The HDFS bolt lets you buffer up events until a specified interval; for example: "After every 1,000 tuples it will sync filesystem, making that data visible to other HDFS clients. It will rotate files when they reach 5 megabytes in size." Take a look at my Storm code on GitHub and you will see how that is performed: https://github.com/sunileman/storm-twitter-sentiment
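A sketch of the storm-hdfs bolt configuration that quote describes, i.e. sync every 1,000 tuples and rotate files at 5 MB; the filesystem URL, output path, and field delimiter here are placeholder assumptions.

```java
import org.apache.storm.hdfs.bolt.HdfsBolt;
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.format.FileNameFormat;
import org.apache.storm.hdfs.bolt.format.RecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units;
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
import org.apache.storm.hdfs.bolt.sync.SyncPolicy;

public class HdfsBoltConfig {
    public static HdfsBolt build() {
        // Pipe-delimited records; delimiter is a placeholder choice.
        RecordFormat format = new DelimitedRecordFormat().withFieldDelimiter("|");
        // Sync the filesystem after every 1,000 tuples.
        SyncPolicy syncPolicy = new CountSyncPolicy(1000);
        // Rotate files once they reach 5 MB.
        FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB);
        // Output path is a placeholder.
        FileNameFormat fileNameFormat = new DefaultFileNameFormat().withPath("/tmp/storm/");
        return new HdfsBolt()
                .withFsUrl("hdfs://namenode:8020") // placeholder NameNode URL
                .withFileNameFormat(fileNameFormat)
                .withRecordFormat(format)
                .withRotationPolicy(rotationPolicy)
                .withSyncPolicy(syncPolicy);
    }
}
```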
07-22-2016
09:46 PM
I am trying to connect from SQuirreL SQL to Phoenix and it errors out with:
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:202)
at net.sourceforge.squirrel_sql.fw.sql.SQLDriverManager.getConnection(SQLDriverManager.java:133)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand.executeConnect(OpenConnectionCommand.java:167)
... 7 more
Caused by: java.io.IOException: Login failure for smanjee@CLOUD.HORTONWORKS.COM from keytab /Users/smanjee/keytabs/keytab
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:921)
at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:242)
at org.apache.hadoop.hbase.security.User$SecureHadoopUser.login(User.java:386)
at org.apache.hadoop.hbase.security.User.login(User.java:253)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:380)
... 17 more
Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user
at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:897)
I verified my keytab file looks good by issuing a curl WebHDFS request against the cluster with success. What am I missing here?
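For context, a sketch of how a kerberized Phoenix JDBC connection is commonly formed, with the principal and keytab embedded in the URL (jdbc:phoenix:&lt;quorum&gt;:&lt;port&gt;:&lt;znode&gt;:&lt;principal&gt;:&lt;keytab&gt;). The ZooKeeper host and znode below are placeholder assumptions; the principal and keytab path are taken from the stack trace above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class PhoenixKerberosConnect {
    public static void main(String[] args) throws SQLException {
        // zk-host and /hbase-secure are placeholders for the actual quorum/znode.
        String url = "jdbc:phoenix:zk-host:2181:/hbase-secure:"
                + "smanjee@CLOUD.HORTONWORKS.COM:"
                + "/Users/smanjee/keytabs/keytab";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}
```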
Labels:
- Apache Phoenix
07-22-2016
02:32 PM
1 Kudo
@Timothy Spann "best" is a subjective term when it comes to public cloud providers. I would take your typical on-prem profile and back-port it to AWS EC2 instance profiles. From a pricing perspective you can further spread the master/datanode services across smaller boxes based on cost differences. For example, ZK uses little RAM, so you can find a small 2-4 GB box for your ZK quorum. Take the typical on-prem requirements and back-port them onto one or more VMs at your cloud provider. For example:
- Master nodes - multiples of m4.4xlarge or r3.4xlarge
- Data nodes - i2.4xlarge or d2.4xlarge
- Storm nodes - c4.4xlarge
- Spark - x1.32xlarge
- GPU processing - g2.2xlarge
07-21-2016
09:03 PM
Thanks @jwitt. I was searching for this info.
07-21-2016
08:32 PM
@Kumar Veerappan Similar to this issue. You are getting a "permission denied" error because you are trying to access a folder that is owned by the hdfs user and the permissions do not allow write access from others.
A) You could run your application/script as the hdfs user: su hdfs or export HADOOP_USER_NAME=hdfs
B) You could change the owner of the folder to your user (note: to change the owner you have to be a superuser or the current owner, i.e. hdfs): hdfs dfs -chown -R <username_of_new_owner> /user
07-21-2016
07:54 PM
OK, I found the documentation. The streamtable behavior is applied by default in the join: in every map/reduce stage of the join, the last table in the sequence is streamed through the reducers whereas the others are buffered. However, the streamed table must be the last table in the sequence, so if it is not, then your suggestion would be helpful. In this situation, however, the smallest table is already the last table in the sequence.
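For illustration, a hedged sketch of overriding that default with Hive's STREAMTABLE hint, issued here from Java over JDBC; the table names, columns, and connection URL are assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class StreamTableHint {
    public static void main(String[] args) throws Exception {
        // Assumes the Hive JDBC driver is on the classpath and HiveServer2
        // is reachable at this placeholder URL.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000/default");
             Statement stmt = conn.createStatement()) {
            // Force table `a` to be streamed instead of the default last table.
            stmt.execute(
                "SELECT /*+ STREAMTABLE(a) */ a.val, b.val " +
                "FROM a JOIN b ON a.key = b.key");
        }
    }
}
```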
07-21-2016
07:48 PM
@Constantin Stanca My understanding is that this happens by default when using map-side joins. Is that not the case?