Member since 05-30-2018

1322 Posts
715 Kudos Received
148 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4006 | 08-20-2018 08:26 PM |
| | 1882 | 08-15-2018 01:59 PM |
| | 2339 | 08-13-2018 02:20 PM |
| | 4060 | 07-23-2018 04:37 PM |
| | 4955 | 07-19-2018 12:52 PM |
			
    
	
		
		
10-05-2018 07:28 PM
I am getting the following error on Zeppelin:

%pyspark
df_single_recipes = df[df.Lotdf_nrecipes[df_nrecipes == 1].index

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-3154198105554295939.py", line 364, in <module>
    code = compile('\n'.join(stmts), '<stdin>', 'exec', ast.PyCF_ONLY_AST, 1)
  File "<stdin>", line 2
    __zeppelin__._displayhook()
               ^
SyntaxError: invalid syntax

Any ideas how to fix this?
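A hedged reading of the snippet: the posted line has an unclosed [ (two opening brackets against one closing), so when Zeppelin compiles the paragraph together with the __zeppelin__._displayhook() statement it appends, the SyntaxError is reported on that injected line rather than on the user code. Assuming pandas-style filtering was intended, a balanced version might be df_single_recipes = df[df.Lot.isin(df_nrecipes[df_nrecipes == 1].index)], where the .isin(...) call is a guess at the elided code, not from the original post.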
						
					
Labels: Apache Spark, Apache Zeppelin
    
	
		
		
09-17-2018 07:54 PM
@Saikiran Parepally Is the number of regions evenly distributed? Or are you referring to the size of data per RegionServer not being evenly distributed?
						
					
    
	
		
		
09-07-2018 09:44 PM
		6 Kudos
		
	
				
		
	
		
					
Log Forwarding/Ingestion Patterns

Log forwarding and ingestion is a key starting point for many logging initiatives such as log analytics, cyber security, and anomaly and bot detection. This article covers a few (not comprehensive) patterns for log forwarding/ingestion using NiFi.

Rsyslog is commonly used to capture and ship log messages: "Rsyslog is an open-source software utility used on UNIX and Unix-like computer systems for forwarding log messages in an IP network. It implements the basic syslog protocol, extends it with content-based filtering, rich filtering capabilities, flexible configuration options and adds features such as using TCP for transport." More on how to configure rsyslog: here

NiFi is able to ingest messages from rsyslog over TCP or UDP via the ListenSyslog processor. This allows for little to no coding.

Patterns

Pattern A

A minimalist design. Rsyslog is configured to simply forward log messages to a NiFi cluster. The /etc/rsyslog.conf file needs to be configured to forward messages to the NiFi port identified in the ListenSyslog processor (a minimal rsyslog.conf sketch appears at the end of this article).

Pattern B

A MiNiFi listen-socket design. MiNiFi is installed on the server(s), leveraging the ListenSyslog processor. This pattern offers end-to-end data lineage along with richer operational capabilities compared to Pattern A. MiNiFi, via ListenSyslog, captures rsyslog messages and ships them to NiFi via S2S (site-to-site). Rsyslog is configured to simply forward log messages to a locally installed MiNiFi instance (localhost:port); /etc/rsyslog.conf needs to be configured to forward messages to the local MiNiFi port identified in the ListenSyslog processor. This design provides an at-least-once message delivery guarantee.

Pattern C

A MiNiFi tail-file design. MiNiFi is installed on the server(s), leveraging the TailFile processor rather than ListenSyslog as in Pattern B. Both Patterns B and C offer end-to-end data lineage and rich operational capabilities. MiNiFi captures log messages by tailing a file or a directory of files and ships them to NiFi via S2S (site-to-site). Identify a log file to tail (e.g., /var/log/messages) or a directory of files, start MiNiFi, and the log messages will start flowing from the server(s) to NiFi. This design provides an at-least-once message delivery guarantee.

These are a few common patterns I have developed and implemented in the field with success. Happy log capturing!
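As a sketch of the rsyslog side of these patterns (host name and port are placeholders, not taken from the article), a minimal /etc/rsyslog.conf forwarding rule for Pattern A looks like:

# Forward all facilities and priorities over TCP to the port the
# ListenSyslog processor is bound to (@@ = TCP, a single @ = UDP).
*.* @@nifi-host.example.com:5140

For Pattern B the target is instead the local MiNiFi listener, e.g. *.* @@127.0.0.1:5140.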
						
					
    
	
		
		
09-06-2018 01:56 PM
During launch of HDP or HDF on Azure via Cloudbreak, the following provisioning error may be thrown (check the Cloudbreak logs):

log:55 INFO  c.m.a.m.r.Deployments checkExistence - [owner:xxxxx] [type:STACK] [id:2] [name:sparky] [flow:xxx] [tracking:] <-- 404 Not Found https://management.azure.com/subscriptions/xxxxxx/resourcegroups/spark. (104 ms, 92-byte body)/cbreak_cloudbreak_1 | 2018-09-05 14:25:22,882 [reactorDispatcher-24] launch:136 ERROR c.s.c.c.a.AzureResourceConnector - [owner:xxxxxx] [type:STACK] [id:2] [name:sparky] [flow:xxxxxx] [tracking:] Provisioning error:

This means the selected instance type is not available within the region. Either change to a region where the instance type is available, or change to an instance type that is available within the region.
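As a quick availability check before retrying (a hedged aside, not from the original post; the region name is a placeholder), the Azure CLI can list which VM sizes a region offers:

# List instance types available in a given region
az vm list-sizes --location westus2 --output table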
						
					
    
	
		
		
08-20-2018 08:26 PM
I found a solution:

import scala.sys.process._
val lsResult = Seq("hadoop","fs","-ls","adl://mylake.azuredatalakestore.net/").!!
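For context on the snippet above: !! from scala.sys.process runs the external command and returns its stdout as a single String (throwing an exception on a nonzero exit code), so lsResult can be split on newlines and parsed from there.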
						
					
    
	
		
		
08-20-2018 06:44 PM
There are many ways to iterate HDFS files using Spark. Is there any way to iterate over files in ADLS? Here is my code:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

val path = "adl://mylake.azuredatalakestore.net/"
val conf = new Configuration()
val fs = FileSystem.get(conf)
val p = new Path(path)
val ls = fs.listStatus(p)

ls.foreach( x => {
  val f = x.getPath.toString
  println(f)
  val content = spark.read.option("delimiter","|").csv(f)
  content.show(1)
} )

and I get the following error:

java.lang.IllegalArgumentException: Wrong FS: adl://mylake.azuredatalakestore.net/, expected: hdfs://sparky-m1.klqj4twfp4tehiuq3c3entk04g.jx.internal.cloudapp.net:8020

It expects hdfs, but the prefix for ADLS is adl. Any ideas?
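One common fix for this "Wrong FS" error (a sketch, not from the original thread) is to bind the FileSystem to the ADLS URI explicitly, so the path is not resolved against the cluster's default fs.defaultFS:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val path = "adl://mylake.azuredatalakestore.net/"
val conf = new Configuration()
// FileSystem.get(URI, Configuration) picks the filesystem matching the
// adl:// scheme instead of the cluster default (hdfs://...).
val fs = FileSystem.get(new URI(path), conf)
fs.listStatus(new Path(path)).foreach(x => println(x.getPath.toString))

The rest of the original loop (reading each file with spark.read) should then work unchanged.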
						
					
Labels: Apache Spark
    
	
		
		
08-16-2018 09:37 PM
I launched an HDP instance on Azure via Cloudbreak and added my ADLS information prior to creation. I am reading this tutorial: https://community.hortonworks.com/articles/105994/how-to-configure-authentication-with-adls.html which mentions assigning the app the Owner role on ADLS. My app has the Contributor role, and the Owner role is not allowed since the enterprise owns the ADLS account and will not grant me such access. Is there any way for my app with the Contributor role to use ADLS? Here is the error I get:

[cloudbreak@sparky-m1 bin]$ hadoop fs -ls adl://xxxxx.azuredatalakestore.net
ls: GETFILESTATUS failed with error 0x83090aa2 (Forbidden. ACL verification failed. Either the resource does not exist or the user is not authorized to perform the requested operation.). [e300ca0f-5b03-48d8-a63a-e66175efe18a][2018-08-16T14:23:24.5402535-07:00] [ServerRequestId:e300ca0f-5b03-48d8-a63a-e66175efe18a]
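A hedged note on the error above: ADLS Gen1 enforces POSIX-style ACLs at the filesystem layer, separate from Azure RBAC roles such as Contributor, so the usual route without Owner access is to have the data lake owner grant the app's service principal ACL entries (read/execute, plus write where needed) on the relevant paths. A best-effort Azure CLI sketch (account name, object ID, and path are placeholders; verify the exact invocation against the CLI docs):

# Grant the app's service principal rwx on the root path (ADLS Gen1)
az dls fs access set-entry --account mylake --path / --acl-spec user:<object-id>:rwx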
						
					
Labels: Hortonworks Cloudbreak
    
	
		
		
08-15-2018 02:31 PM
@Matt Clarke That is good to know. I have CA-signed certs and the NiFi CA service is enabled on my cluster. I don't see a way to remove the NiFi CA service, but I do see an option to "invalidate CA Server". Should I take that approach?
						
					
    
	
		
		
08-15-2018 01:59 PM
		1 Kudo
		
	
				
		
	
		
					
This can be performed on the Ambari Hosts page: add the NiFi CA service.
						
					
    
	
		
		
08-15-2018 01:49 PM
@pdarvasi is correct. Just to close the loop, here is what I did on Azure; I assume the same would work for AWS/GCP/OpenStack/etc. Update the following file: /var/lib/cloudbreak-deployment/Profile. Edit the following lines:

export UAA_DEFAULT_USER_EMAIL=NewAdmin@HeyNow.com
export UAA_DEFAULT_USER_PW='HeyNow'

Then, from the cloudbreak-deployment directory, I ran:

CBD_DEFAULT_PROFILE=tmp cbd util add-default-user

and the new admin user was created. Simple.
						
					