Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4047 | 08-20-2018 08:26 PM |
| | 1947 | 08-15-2018 01:59 PM |
| | 2374 | 08-13-2018 02:20 PM |
| | 4106 | 07-23-2018 04:37 PM |
| | 5013 | 07-19-2018 12:52 PM |
07-23-2016
11:59 PM
Josh, that link you shared is priceless.
07-23-2016
11:55 PM
2 Kudos
Apologies for grammar and typos... writing this from my phone. If the date format within a given data set is inconsistent, then I would write a UDF to handle it. Inside the UDF you would have to detect the type of date you are working with, using regex for example. This is done very nicely with NiFi if you want to hit the easy button. If the format is consistent within a dataset yet differs among datasets, then simply write a Hive or Pig script for each dataset and parse out the date with the format you expect for that specific data set.
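For reference, a minimal sketch of the kind of Hive UDF described above, in Java: it probes a few candidate regex patterns and normalizes matches to yyyy-MM-dd. The class name and the pattern list are illustrative assumptions, not code from the post.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class NormalizeDateUDF extends UDF {
    // Candidate regex -> matching SimpleDateFormat pattern, probed in order.
    // This list is an assumption; extend it with the formats in your data.
    private static final String[][] CANDIDATES = {
        {"\\d{4}-\\d{2}-\\d{2}", "yyyy-MM-dd"},
        {"\\d{2}/\\d{2}/\\d{4}", "MM/dd/yyyy"},
        {"\\d{2}-\\d{2}-\\d{4}", "MM-dd-yyyy"}
    };

    public Text evaluate(Text input) {
        if (input == null) return null;
        String raw = input.toString().trim();
        for (String[] candidate : CANDIDATES) {
            if (raw.matches(candidate[0])) {
                try {
                    SimpleDateFormat in = new SimpleDateFormat(candidate[1]);
                    in.setLenient(false);
                    SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd");
                    return new Text(out.format(in.parse(raw)));
                } catch (ParseException ignored) {
                    // Regex matched but the value was not a valid date; keep probing.
                }
            }
        }
        return null; // Unrecognized format.
    }
}
```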
07-23-2016
11:49 PM
So you're looking for windowing in Storm, i.e. doing something based on a specified time period. Until recently you had to build your own windowing logic in Storm by keeping track of time and using a disk cache to hold events until the window time had completed. Now the functionality comes out of the box. Take a look at an excellent article on how the new functionality works in Storm here: https://community.hortonworks.com/articles/14171/windowing-and-state-checkpointing-in-apache-storm.html
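A minimal sketch of that out-of-the-box windowing (Storm 1.0+), extending BaseWindowedBolt; the bolt name, output field, and window sizes below are illustrative assumptions.

```java
import java.util.Map;
import java.util.concurrent.TimeUnit;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseWindowedBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.windowing.TupleWindow;

public class WindowCountBolt extends BaseWindowedBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(TupleWindow window) {
        // window.get() returns every tuple that fell into the current window.
        collector.emit(new Values(window.get().size()));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("count"));
    }
}

// Wiring in the topology: a 10-second window that slides every 5 seconds.
// builder.setBolt("counter", new WindowCountBolt()
//         .withWindow(new BaseWindowedBolt.Duration(10, TimeUnit.SECONDS),
//                     new BaseWindowedBolt.Duration(5, TimeUnit.SECONDS)))
//         .shuffleGrouping("spout");
```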
07-23-2016
04:14 PM
1 Kudo
Files on HDFS are immutable. The HDFS bolt lets you buffer up events until a specified interval; for example: "After every 1,000 tuples it will sync filesystem, making that data visible to other HDFS clients. It will rotate files when they reach 5 megabytes in size." Take a look at my Storm code on GitHub and you will see how that is performed: https://github.com/sunileman/storm-twitter-sentiment
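A sketch of the storm-hdfs bolt configuration that quote describes, i.e. sync every 1,000 tuples and rotate files at 5 MB; the filesystem URL, output path, and field delimiter here are placeholder assumptions.

```java
import org.apache.storm.hdfs.bolt.HdfsBolt;
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.format.FileNameFormat;
import org.apache.storm.hdfs.bolt.format.RecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units;
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
import org.apache.storm.hdfs.bolt.sync.SyncPolicy;

public class HdfsBoltConfig {
    public static HdfsBolt build() {
        // Pipe-delimited records; delimiter is a placeholder choice.
        RecordFormat format = new DelimitedRecordFormat().withFieldDelimiter("|");
        // Sync the filesystem after every 1,000 tuples.
        SyncPolicy syncPolicy = new CountSyncPolicy(1000);
        // Rotate files once they reach 5 MB.
        FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB);
        // Output path is a placeholder.
        FileNameFormat fileNameFormat = new DefaultFileNameFormat().withPath("/tmp/storm/");
        return new HdfsBolt()
                .withFsUrl("hdfs://namenode:8020") // placeholder NameNode URL
                .withFileNameFormat(fileNameFormat)
                .withRecordFormat(format)
                .withRotationPolicy(rotationPolicy)
                .withSyncPolicy(syncPolicy);
    }
}
```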
07-22-2016
09:46 PM
I am trying to connect from SQuirreL SQL to Phoenix and it errors out with:
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:202)
at net.sourceforge.squirrel_sql.fw.sql.SQLDriverManager.getConnection(SQLDriverManager.java:133)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand.executeConnect(OpenConnectionCommand.java:167)
... 7 more
Caused by: java.io.IOException: Login failure for smanjee@CLOUD.HORTONWORKS.COM from keytab /Users/smanjee/keytabs/keytab
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:921)
at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:242)
at org.apache.hadoop.hbase.security.User$SecureHadoopUser.login(User.java:386)
at org.apache.hadoop.hbase.security.User.login(User.java:253)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:380)
... 17 more
Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user
at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:897)
I verified my keytab file looks good by issuing a curl WebHDFS request against the cluster with success. What am I missing here?
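For context, a sketch of how a kerberized Phoenix JDBC connection is commonly formed, with the principal and keytab embedded in the URL (jdbc:phoenix:&lt;quorum&gt;:&lt;port&gt;:&lt;znode&gt;:&lt;principal&gt;:&lt;keytab&gt;). The ZooKeeper host and znode below are placeholder assumptions; the principal and keytab path are taken from the stack trace above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class PhoenixKerberosConnect {
    public static void main(String[] args) throws SQLException {
        // zk-host and /hbase-secure are placeholders for the actual quorum/znode.
        String url = "jdbc:phoenix:zk-host:2181:/hbase-secure:"
                + "smanjee@CLOUD.HORTONWORKS.COM:"
                + "/Users/smanjee/keytabs/keytab";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}
```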
Labels:
- Apache Phoenix
07-22-2016
02:32 PM
1 Kudo
@Timothy Spann "best" is a subjective term when it comes to public cloud providers. I would take your typical on-prem profile and back-port it to AWS EC2 instance profiles. From a pricing perspective you can further spread the master/datanode services across smaller boxes based on cost differences. For example, ZK uses little RAM, so you can find a small 2-4 GB box for your ZK quorum. Take the typical on-prem requirements and back-port them onto one or more VMs at your cloud provider. For example:
- Master nodes - multiples of m4.4xlarge or r3.4xlarge
- Data nodes - i2.4xlarge or d2.4xlarge
- Storm nodes - c4.4xlarge
- Spark - x1.32xlarge
- GPU processing - g2.2xlarge
07-21-2016
09:03 PM
Thanks @jwitt. I was searching for this info.
07-21-2016
08:32 PM
@Kumar Veerappan Similar to this issue. You are getting a "permission denied" error because you are trying to access a folder that is owned by the hdfs user and the permissions do not allow write access from others.
A) You could run your application/script as the hdfs user: su hdfs or export HADOOP_USER_NAME=hdfs
B) You could change the owner of the folder to your user (note: to change the owner you have to be a superuser or the current owner, i.e. hdfs): hdfs dfs -chown -R <username_of_new_owner> /user
07-21-2016
07:54 PM
OK, I found the documentation. The streamtable behavior is applied by default in the join: in every map/reduce stage of the join, the last table in the sequence is streamed through the reducers whereas the others are buffered. However, the streamed table must be the last table in the sequence, so if it is not, then your suggestion would be helpful. In this situation, however, the smallest table is already the last table in the sequence.
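For illustration, a hedged sketch of overriding that default with Hive's STREAMTABLE hint, issued here from Java over JDBC; the table names, columns, and connection URL are assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class StreamTableHint {
    public static void main(String[] args) throws Exception {
        // Assumes the Hive JDBC driver is on the classpath and HiveServer2
        // is reachable at this placeholder URL.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000/default");
             Statement stmt = conn.createStatement()) {
            // Force table `a` to be streamed instead of the default last table.
            stmt.execute(
                "SELECT /*+ STREAMTABLE(a) */ a.val, b.val " +
                "FROM a JOIN b ON a.key = b.key");
        }
    }
}
```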
07-21-2016
07:48 PM
@Constantin Stanca My understanding is that this happens by default when using map-side joins. Is that not the case?