Member since
05-30-2018
1322
Posts
715
Kudos Received
148
Solutions
My Accepted Solutions
Views | Posted
---|---
3359 | 08-20-2018 08:26 PM
1482 | 08-15-2018 01:59 PM
1882 | 08-13-2018 02:20 PM
3361 | 07-23-2018 04:37 PM
4072 | 07-19-2018 12:52 PM
10-05-2018
07:28 PM
I am getting the following error on Zeppelin:
%pyspark
df_single_recipes = df[df.Lotdf_nrecipes[df_nrecipes == 1].index
Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-3154198105554295939.py", line 364, in <module>
code = compile('\n'.join(stmts), '<stdin>', 'exec', ast.PyCF_ONLY_AST, 1)
File "<stdin>", line 2
__zeppelin__._displayhook()
^
SyntaxError: invalid syntax
Any ideas how to fix this?
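One thing worth checking: the statement as posted has two opening square brackets and only one closing bracket. When a bracket is left open, the Python parser keeps reading into the following line, which here appears to be the `__zeppelin__._displayhook()` call that Zeppelin appends, so the error can surface on that line instead of the line with the typo (newer Python versions may instead report the unclosed bracket). Below is a minimal sketch using the built-in `compile()` to check a paragraph's syntax before running it; the helper name `check_syntax` is made up for illustration.

```python
def check_syntax(source):
    """Return None if the source parses, else a short description of the error."""
    try:
        # Parse only; nothing is executed.
        compile(source, "<stdin>", "exec")
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"
    return None


# An unclosed bracket on line 1 followed by an ordinary call on line 2.
broken = "x = [1, 2\nprint(x)"
fixed = "x = [1, 2]\nprint(x)"

print(check_syntax(broken))  # some SyntaxError description; wording varies by version
print(check_syntax(fixed))   # None
```

If the check fails, counting the brackets on the line just above the reported line is usually the quickest way to find the problem.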
Labels:
- Apache Spark
- Apache Zeppelin
09-17-2018
07:54 PM
@Saikiran Parepally is the number of regions evenly distributed? Or are you referring to the size of data per Region Server, which is not evenly distributed?
09-07-2018
09:44 PM
6 Kudos
Log Forwarding/Ingestion Patterns

Log forwarding and ingestion is a key starting point for many logging initiatives such as log analytics, cyber security, anomaly and bot detection, etc. This article focuses on a few (not comprehensive) patterns for log forwarding/ingestion using NiFi.

Rsyslog is commonly used to capture and ship log messages. “Rsyslog is an open-source software utility used on UNIX and Unix-like computer systems for forwarding log messages in an IP network. It implements the basic syslog protocol, extends it with content-based filtering, rich filtering capabilities, flexible configuration options and adds features such as using TCP for transport.” See the rsyslog documentation for more on how to configure it.

NiFi can ingest messages from rsyslog over TCP or UDP via the ListenSyslog processor, which requires little to no coding.

Patterns

Pattern A
A minimalist design. Rsyslog is configured to simply forward log messages to a NiFi cluster. The rsyslog /etc/rsyslog.conf file needs to be configured to forward messages to the NiFi port specified in the ListenSyslog processor.

Pattern B
A MiNiFi listen socket design. MiNiFi is installed on the server(s) and uses the ListenSyslog processor. This pattern offers end-to-end data lineage along with richer operational capabilities compared to Pattern A. MiNiFi, via ListenSyslog, captures rsyslog messages and ships them to NiFi via S2S (site-to-site). Rsyslog is configured to simply forward log messages to a locally installed MiNiFi instance (localhost:port): the /etc/rsyslog.conf file needs to forward messages to the local MiNiFi port specified in the ListenSyslog processor. This design provides an at-least-once message delivery guarantee.

Pattern C
A MiNiFi tail file design. MiNiFi is installed on the server(s) and uses the TailFile processor, unlike Pattern B, which uses ListenSyslog. Both Pattern B and Pattern C offer end-to-end data lineage and rich operational capabilities. MiNiFi captures log messages by tailing a file or a directory of files and ships them to NiFi via S2S (site-to-site). Identify a log file to tail (e.g. /var/log/messages) or a directory of files, start MiNiFi, and the log messages will start flowing from the server(s) to NiFi. This design provides an at-least-once message delivery guarantee.

These are a few common patterns I have developed and implemented in the field with success. Happy log capturing!
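For Patterns A and B it can help to confirm that the ListenSyslog port is reachable before touching /etc/rsyslog.conf. The following is a rough sketch, not part of the original flows: it builds a BSD-style (RFC 3164) syslog line and sends it over TCP. The function names, the default facility/severity, and the host and port in the usage comment are all assumptions for illustration, and the newline terminator assumes the listener splits TCP messages on newlines.

```python
import socket
from datetime import datetime

_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_syslog_message(message, hostname, tag="test",
                          facility=1, severity=6, when=None):
    """Build an RFC 3164 style line: <PRI>Mmm dd hh:mm:ss host tag: message."""
    when = when or datetime.now()
    pri = facility * 8 + severity  # priority = facility * 8 + severity
    # RFC 3164 pads single-digit days with a space, not a zero.
    timestamp = f"{_MONTHS[when.month - 1]} {when.day:2d} {when:%H:%M:%S}"
    return f"<{pri}>{timestamp} {hostname} {tag}: {message}"


def send_over_tcp(line, host, port, timeout=5.0):
    """Send one newline-terminated message to a TCP syslog listener."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(line.encode("utf-8") + b"\n")


# Hypothetical usage; replace host and port with the ListenSyslog settings:
# send_over_tcp(format_syslog_message("hello from test", "web01"),
#               "nifi.example.com", 5140)
```

If the test message shows up in the flow, the remaining work is the rsyslog forwarding rule itself.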
09-06-2018
01:56 PM
During launch of HDP or HDF on Azure via Cloudbreak, the following provisioning error may be thrown (check the Cloudbreak logs):
log:55 INFO c.m.a.m.r.Deployments checkExistence - [owner:xxxxx] [type:STACK] [id:2] [name:sparky] [flow:xxx] [tracking:] <-- 404 Not Found https://management.azure.com/subscriptions/xxxxxx/resourcegroups/spark. (104 ms, 92-byte body)
/cbreak_cloudbreak_1 | 2018-09-05 14:25:22,882 [reactorDispatcher-24] launch:136 ERROR c.s.c.c.a.AzureResourceConnector - [owner:xxxxxx] [type:STACK] [id:2] [name:sparky] [flow:xxxxxx] [tracking:] Provisioning error:
This means the selected instance type is not available within the region. Either change to a region where the instance type is available, or change to an instance type that is available within the region.
08-20-2018
08:26 PM
I found a solution:
import scala.sys.process._
val lsResult = Seq("hadoop","fs","-ls","adl://mylake.azuredatalakestore.net/").!!
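The `!!` call returns the whole listing as one string, so the paths still have to be pulled out of it before each file can be read. Here is a rough sketch of that parsing step, written in Python so it can be tried outside a Spark shell. It assumes the usual eight-column `hadoop fs -ls` layout (permissions, replication, owner, group, size, date, time, path); the function names and the sample listing are made up for illustration.

```python
import subprocess


def parse_hadoop_ls(output):
    """Return the paths found in the text output of `hadoop fs -ls`."""
    paths = []
    for line in output.splitlines():
        # Split into at most 8 fields so a path containing spaces stays whole.
        fields = line.split(None, 7)
        # Entry lines start with a permission string such as -rw-r--r-- or drwxr-xr-x;
        # the "Found N items" header has fewer fields and is skipped.
        if len(fields) == 8 and fields[0][0] in "-dl":
            paths.append(fields[7])
    return paths


def list_paths(url):
    """Run `hadoop fs -ls <url>` and return the listed paths (needs hadoop on PATH)."""
    result = subprocess.run(["hadoop", "fs", "-ls", url],
                            capture_output=True, text=True, check=True)
    return parse_hadoop_ls(result.stdout)


# Illustrative listing only, not real output from the cluster in this thread.
sample = """Found 2 items
drwxr-xr-x   - user group          0 2018-08-20 18:30 adl://mylake.azuredatalakestore.net/logs
-rw-r--r--   1 user group       1024 2018-08-20 18:31 adl://mylake.azuredatalakestore.net/data.csv
"""
print(parse_hadoop_ls(sample))
```

Each returned path can then be passed to the reader in a loop, the same way the `foreach` in the original question does.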
08-20-2018
06:44 PM
There are many ways to iterate over HDFS files using Spark. Is there any way to iterate over files in ADLS? Here is my code:
import org.apache.hadoop.fs.Path
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
val path = "adl://mylake.azuredatalakestore.net/"
val conf = new Configuration()
val fs = FileSystem.get(conf)
val p = new Path(path)
val ls = fs.listStatus(p)
ls.foreach( x => {
  val f = x.getPath.toString
  println(f)
  val content = spark.read.option("delimiter","|").csv(f)
  content.show(1)
} )
and I get the following error:
java.lang.IllegalArgumentException: Wrong FS: adl://mylake.azuredatalakestore.net/, expected: hdfs://sparky-m1.klqj4twfp4tehiuq3c3entk04g.jx.internal.cloudapp.net:8020
It expects hdfs, but the prefix for ADLS is adl. Any ideas?
Labels:
- Apache Spark
08-16-2018
09:37 PM
I launched an HDP instance on Azure via Cloudbreak and added my ADLS information prior to creation. I am reading this tutorial: https://community.hortonworks.com/articles/105994/how-to-configure-authentication-with-adls.html which says to assign the app the Owner role on ADLS. My app has the Contributor role; the Owner role is not an option because the enterprise owns the ADLS account and will not grant me that access. Is there any way for my app with the Contributor role to use ADLS? Here is the error I get:
[cloudbreak@sparky-m1 bin]$ hadoop fs -ls adl://xxxxx.azuredatalakestore.net
ls: GETFILESTATUS failed with error 0x83090aa2 (Forbidden. ACL verification failed. Either the resource does not exist or the user is not authorized to perform the requested operation.). [e300ca0f-5b03-48d8-a63a-e66175efe18a][2018-08-16T14:23:24.5402535-07:00] [ServerRequestId:e300ca0f-5b03-48d8-a63a-e66175efe18a]
Labels:
- Hortonworks Cloudbreak
08-15-2018
02:31 PM
@Matt Clarke That is good to know. I have CA-signed certs and the NiFi CA service is enabled on my cluster. I don't see a way to remove the NiFi CA service, but I do see an option to "invalidate CA Server". Should I take that approach?
08-15-2018
01:59 PM
1 Kudo
This can be done from the Ambari host page: add the NiFi CA service.
08-15-2018
01:49 PM
@pdarvasi is correct. Just to close the loop, here is what I did on Azure. I assume the same would work for AWS/GCP/OpenStack/etc.
Update the following file: /var/lib/cloudbreak-deployment/Profile
Edit the following lines:
export UAA_DEFAULT_USER_EMAIL=NewAdmin@HeyNow.com
export UAA_DEFAULT_USER_PW='HeyNow'
Then, from the cloudbreak-deployment directory, I ran:
CBD_DEFAULT_PROFILE=tmp cbd util add-default-user
and a new admin user was created. Simple.