About sunile_manjee

sunile_manjee · ‎10-05-2018

I am getting the following error on Zeppelin:

%pyspark
df_single_recipes = df[df.Lotdf_nrecipes[df_nrecipes == 1].index

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-3154198105554295939.py", line 364, in <module>
    code = compile('\n'.join(stmts), '<stdin>', 'exec', ast.PyCF_ONLY_AST, 1)
  File "<stdin>", line 2
    __zeppelin__._displayhook()
    ^
SyntaxError: invalid syntax

Any ideas how to fix this?
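One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the statement as posted never closes its opening `df[` bracket, so Python keeps parsing into the `__zeppelin__._displayhook()` line that appears in the traceback and fails there. The sketch below reproduces that behaviour with plain `compile`, without Zeppelin or Spark. The names `rows`, `df`, and `n` are placeholders and are never evaluated.

```python
# Minimal reproduction, assuming the root cause is an unclosed bracket.
# Nothing here runs Spark code; compile() only parses the text.

APPENDED_LINE = "__zeppelin__._displayhook()"  # second line shown in the traceback


def parses_ok(user_statement: str) -> bool:
    """Return True if the statement plus the appended line parses cleanly."""
    source = "\n".join([user_statement, APPENDED_LINE])
    try:
        compile(source, "<stdin>", "exec")
        return True
    except SyntaxError:
        return False


# The bracket opened after `df` is never closed, so parsing runs on into the next line.
unbalanced = "rows = df[n[n == 1].index"
# Same statement with the closing bracket restored.
balanced = "rows = df[n[n == 1].index]"

print(parses_ok(unbalanced))  # False
print(parses_ok(balanced))    # True
```

If this is the cause, balancing the brackets in the notebook paragraph should be enough to clear the error.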

sunile_manjee · ‎09-17-2018

@Saikiran Parepally is the number of regions evenly distributed? Or are you referring to the size of data per Region Server, which is not evenly distributed?

sunile_manjee · ‎09-07-2018

Log Forwarding/Ingestion Patterns

Log forwarding and ingestion is a key starting point for many logging initiatives such as log analytics, cyber security, and anomaly and bot detection. This article focuses on a few (not comprehensive) patterns for log forwarding/ingestion using NiFi.

Commonly, rsyslog is used to capture and ship log messages. “Rsyslog is an open-source software utility used on UNIX and Unix-like computer systems for forwarding log messages in an IP network. It implements the basic syslog protocol, extends it with content-based filtering, rich filtering capabilities, flexible configuration options and adds features such as using TCP for transport.” More on how to configure rsyslog: here

NiFi is able to ingest messages from rsyslog over TCP or UDP via the ListenSyslog processor. This allows for little to no coding.

Patterns

Pattern A

A minimalist design. Rsyslog is configured to simply forward log messages to a NiFi cluster. The rsyslog /etc/rsyslog.conf file needs to be configured to forward messages to the NiFi port identified in the ListenSyslog processor.

Pattern B

A MiNiFi listen socket design. MiNiFi is installed on the server(s), leveraging the ListenSyslog processor. This pattern offers end-to-end data lineage along with richer operational capabilities than Pattern A. MiNiFi, via ListenSyslog, captures rsyslog messages and ships them to NiFi via S2S (site-to-site). Rsyslog is configured to simply forward log messages to a locally installed MiNiFi instance (localhost:port). The rsyslog /etc/rsyslog.conf file needs to be configured to forward messages to the local MiNiFi port identified in the ListenSyslog processor. This design provides an at-least-once message delivery guarantee.

Pattern C

A MiNiFi tail file design. MiNiFi is installed on the server(s), leveraging the TailFile processor, unlike Pattern B, which uses ListenSyslog. Both Patterns B and C offer end-to-end data lineage and rich operational capabilities. MiNiFi captures log messages by tailing a file or a directory of files and ships them to NiFi via S2S (site-to-site). Identify a log file to tail (e.g. /var/log/messages) or a directory of files, start MiNiFi, and the log messages will start flowing from the server(s) to NiFi. This design provides an at-least-once message delivery guarantee.

These are a few common patterns I have developed and implemented in the field with success. Happy log capturing!
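To exercise a ListenSyslog port in Pattern A or B before touching rsyslog, a small test sender can push a syslog-style line at it over TCP. The sketch below is a minimal illustration under stated assumptions: a BSD-style (RFC 3164) message layout, newline-delimited TCP framing, and hypothetical values for the host name, tag, and port (none of which come from the article).

```python
import socket
from datetime import datetime


def format_syslog_line(facility: int, severity: int, when: datetime,
                       host: str, tag: str, message: str) -> str:
    """Build one BSD-style syslog line: <PRI>TIMESTAMP HOST TAG: MESSAGE."""
    pri = facility * 8 + severity  # priority value encodes facility and severity
    # The day of month is space-padded to two characters in this timestamp style.
    stamp = f"{when.strftime('%b')} {when.day:2d} {when.strftime('%H:%M:%S')}"
    return f"<{pri}>{stamp} {host} {tag}: {message}\n"


def send_line(host: str, port: int, line: str) -> None:
    """Open a TCP connection, send one line, and close the connection."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical values: facility 1 (user), severity 6 (informational).
    line = format_syslog_line(1, 6, datetime(2018, 9, 7, 12, 0, 0),
                              "web01", "app", "hello from a test sender")
    print(line, end="")
    # Uncomment and point at the listener's port (5140 is a made-up example):
    # send_line("localhost", 5140, line)
```

This only checks that the listener receives and parses a message; in the patterns above, rsyslog remains the component that actually forwards the logs.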

sunile_manjee · ‎09-06-2018

If the following provisioning error is thrown during launch of HDP or HDF on Azure via Cloudbreak (check the Cloudbreak logs):

log:55 INFO c.m.a.m.r.Deployments checkExistence - [owner:xxxxx] [type:STACK] [id:2] [name:sparky] [flow:xxx] [tracking:] <-- 404 Not Found https://management.azure.com/subscriptions/xxxxxx/resourcegroups/spark. (104 ms, 92-byte body)
/cbreak_cloudbreak_1 | 2018-09-05 14:25:22,882 [reactorDispatcher-24] launch:136 ERROR c.s.c.c.a.AzureResourceConnector - [owner:xxxxxx] [type:STACK] [id:2] [name:sparky] [flow:xxxxxx] [tracking:] Provisioning error:

This means the selected instance type is not available within the region. Either change to a region where the instance type is available, or choose an instance type that is available within the region.

sunile_manjee · ‎08-20-2018

I found a solution:

import scala.sys.process._
val lsResult = Seq("hadoop","fs","-ls","adl://mylake.azuredatalakestore.net/").!!
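The same shell-out approach can be extended to iterate over the listed files. The sketch below is a Python analogue, not the original Scala. It assumes the `hadoop` CLI is on the PATH and that `hadoop fs -ls` prints its usual eight whitespace-separated columns with the path last; the function names `parse_ls_paths` and `list_paths` are made up for illustration.

```python
import subprocess
from typing import List


def parse_ls_paths(ls_output: str) -> List[str]:
    """Extract the path column from `hadoop fs -ls` output."""
    paths = []
    for raw in ls_output.splitlines():
        line = raw.strip()
        # Skip blank lines and the "Found N items" summary line.
        if not line or line.startswith("Found "):
            continue
        # Columns: permissions, replication, owner, group, size, date, time, path.
        fields = line.split(None, 7)
        if len(fields) == 8:
            paths.append(fields[7])
    return paths


def list_paths(uri: str) -> List[str]:
    """Run `hadoop fs -ls <uri>` and return the listed paths."""
    result = subprocess.run(["hadoop", "fs", "-ls", uri],
                            check=True, capture_output=True, text=True)
    return parse_ls_paths(result.stdout)
```

In a PySpark session, each path returned by `list_paths("adl://mylake.azuredatalakestore.net/")` could then be read with `spark.read.option("delimiter", "|").csv(path)`, mirroring the loop in the original question.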

sunile_manjee · ‎08-20-2018

There are many ways to iterate over HDFS files using Spark. Is there any way to iterate over files in ADLS? Here is my code:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

val path = "adl://mylake.azuredatalakestore.net/"
val conf = new Configuration()
val fs = FileSystem.get(conf)
val p = new Path(path)
val ls = fs.listStatus(p)
ls.foreach( x => {
  val f = x.getPath.toString
  println(f)
  val content = spark.read.option("delimiter","|").csv(f)
  content.show(1)
} )

and I get the following error:

java.lang.IllegalArgumentException: Wrong FS: adl://mylake.azuredatalakestore.net/, expected: hdfs://sparky-m1.klqj4twfp4tehiuq3c3entk04g.jx.internal.cloudapp.net:8020

It expects hdfs, but the prefix for ADLS is adl. Any ideas?

sunile_manjee · ‎08-16-2018

I launched an HDP instance on Azure via Cloudbreak and added my ADLS information prior to creation. I am reading this tutorial: https://community.hortonworks.com/articles/105994/how-to-configure-authentication-with-adls.html which says to assign the app the owner role on ADLS. My app has the contributor role; the owner role is not allowed because the enterprise owns the ADLS account and will not grant me that access. Is there any way for my app, with the contributor role, to use ADLS? Here is the error I get:

[cloudbreak@sparky-m1 bin]$ hadoop fs -ls adl://xxxxx.azuredatalakestore.net
ls: GETFILESTATUS failed with error 0x83090aa2 (Forbidden. ACL verification failed. Either the resource does not exist or the user is not authorized to perform the requested operation.). [e300ca0f-5b03-48d8-a63a-e66175efe18a][2018-08-16T14:23:24.5402535-07:00] [ServerRequestId:e300ca0f-5b03-48d8-a63a-e66175efe18a]

sunile_manjee · ‎08-15-2018

@Matt Clarke That is good to know. I have CA-signed certs and the NiFi CA service is enabled on my cluster. I don't see a way to remove the NiFi CA service, but I do see an option to "invalidate CA Server". Should I take that approach?

sunile_manjee · ‎08-15-2018

This can be done from the Ambari host page: add the NiFi CA service.

sunile_manjee · ‎08-15-2018

@pdarvasi is correct. Just to close the loop, here is what I did on Azure. I assume the same would work for AWS/GCP/OpenStack/etc.

Update the following file:

/var/lib/cloudbreak-deployment/Profile

Edit the following lines:

export UAA_DEFAULT_USER_EMAIL=NewAdmin@HeyNow.com
export UAA_DEFAULT_USER_PW='HeyNow'

Then, from the cloudbreak-deployment directory, I ran:

CBD_DEFAULT_PROFILE=tmp cbd util add-default-user

and the new admin user was created. Simple.

Last Visited	‎05-25-2022 10:07 AM

Member Since	‎05-30-2018 10:40 PM
Posts	1,322
Kudos received	713

Cloudera Community

Re: Iterate over ADLS files using spark?

Re: Install NiFi CA service post nifi cluster inst...

Re: Which storage format is optimum for training m...

Re: Ambari custom alert failing

Re: df.cache() is not working on jdbc table

Zeppelin displayhook error

Re: Hbase Regions are not getting balanced even th...

Log Forwarding & Ingestion Patterns using MiNiFi a...

CloudBreak Azure provisioning error

Re: Iterate over ADLS files using spark?

Iterate over ADLS files using spark?

Cloudbreak HDP instances require owner access to a...

Re: Is NiFi CA service required for signed Certs?

Re: Install NiFi CA service post nifi cluster inst...

Re: How to change cloudbreak admin password?