Member since 10-06-2015

- 273 Posts
- 202 Kudos Received
- 81 Solutions
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 4119 | 10-11-2017 09:33 PM |
|  | 3649 | 10-11-2017 07:46 PM |
|  | 2615 | 08-04-2017 01:37 PM |
|  | 2245 | 08-03-2017 03:36 PM |
|  | 2284 | 08-03-2017 12:52 PM |
**08-14-2017 02:46 PM**

Thanks Matt, interesting approach, and it makes a lot of sense to do things that way. I'll give it a try. Thanks for your help.
**08-09-2017 02:04 PM**

I have two files that get dropped into a folder. The first is a CSV file containing the data to be processed and landed in Hive. The second is an XML file that contains metadata about the CSV file, such as the compression to be used (Snappy, etc.), the HDFS storage format (Avro, ORC, etc.), the table the data needs to be saved to, and the columns/schema of the CSV. My question is: what is the best strategy in NiFi for using this metadata file to process the CSV file and land the data in Hive? I've looked at using Schema Registry, but I believe that will only cover the column-mapping portion rather than the other info, such as table name, storage format, and compression.

Labels:
- Apache NiFi
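To make the scenario concrete, here is a rough sketch of parsing such a metadata file. The XML element names below are hypothetical (my invention, not a real schema), and in a NiFi flow this kind of extraction might live in an ExecuteScript processor or a pre-processing step that sets flowfile attributes:

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata file accompanying the CSV drop
METADATA = """
<dataset>
  <table>sales.orders</table>
  <format>ORC</format>
  <compression>SNAPPY</compression>
  <columns>
    <column name="order_id" type="bigint"/>
    <column name="amount" type="decimal(10,2)"/>
  </columns>
</dataset>
"""

def parse_metadata(xml_text):
    """Extract the directives that would drive the flow: target table,
    storage format, compression codec, and column schema."""
    root = ET.fromstring(xml_text)
    return {
        "table": root.findtext("table"),
        "format": root.findtext("format"),
        "compression": root.findtext("compression"),
        "columns": [(c.get("name"), c.get("type"))
                    for c in root.findall("./columns/column")],
    }

meta = parse_metadata(METADATA)
```

Once pulled into attributes like this, the values can feed routing and processor properties downstream (which table to insert into, which writer/compression to use).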
 
			
    
	
		
		
**08-04-2017 01:37 PM**

@Saurabh Currently this functionality does not exist in HDP 2.6.1. However, we are working on building it, so expect to see it in a future release. The caveat is that some clients do not want to propagate tags automatically, so it will likely be an optionally enabled feature.
**08-03-2017 03:36 PM** (4 Kudos)

@Marc Parmentier The date format is as follows:

{yyyy}-{mm}-{dd}T{hh}:{mm}:{ss}.{SSS}Z

i.e. {year}-{month}-{day}T{hours}:{minutes}:{seconds}.{milliseconds}, with the trailing Z denoting UTC.

e.g. 2017-04-18T18:49:44.000Z
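A minimal Python sketch producing a timestamp in this format (the helper name is mine):

```python
from datetime import datetime, timezone

def to_iso_millis(dt):
    """Format a datetime as yyyy-MM-ddTHH:mm:ss.SSSZ in UTC."""
    dt_utc = dt.astimezone(timezone.utc)
    # %f yields microseconds; keep only the first three digits (milliseconds)
    return dt_utc.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"

# Reproduces the example from the post
ts = to_iso_millis(datetime(2017, 4, 18, 18, 49, 44, tzinfo=timezone.utc))
```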
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
**08-03-2017 12:52 PM** (1 Kudo)

@chris herssens Streaming Analytics Manager is architected to be agnostic to the underlying streaming engine, and aims to support multiple streaming substrates such as Storm, Spark Streaming, Flink, etc. In the first release of SAM, Apache Storm is fully supported; support for other streaming engines, including Spark Streaming, will be added in future releases.

https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.0/bk_overview/content/ch_stream-analytics-overview.html
https://www.slideshare.net/harshach/streaming-analytics-manager
**08-02-2017 07:28 PM** (1 Kudo)

No, Atlas only works with Titan, not JanusGraph. So you cannot use DynamoDB as a datastore for Atlas.
**08-02-2017 12:49 PM** (1 Kudo)

When you follow the links to the GitHub repo, you'll see that AWS has built custom adapters to allow integration with JanusGraph, not Titan. JanusGraph is a fork of the Titan project and has some differences.
**08-01-2017 05:05 PM**

How are you trying to use the metadata? In most of our implementations we use the Atlas REST API (http://atlas.apache.org/api/v2/index.html) for metadata/lineage import and export. Have you considered using that? Please note that I have linked to the new API above; the legacy API (http://atlas.apache.org/api/rest.html) was deprecated with HDP 2.6 (Atlas 0.8) and will be removed in a future version.
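As an illustration, fetching an entity by GUID through the Atlas v2 API is a plain authenticated HTTP GET. The host, port, GUID, and credentials below are hypothetical placeholders:

```python
import base64
import urllib.request

def atlas_entity_request(base_url, guid, user, password):
    """Build an authenticated GET request for the Atlas v2
    entity-by-GUID endpoint."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{base_url}/api/atlas/v2/entity/guid/{guid}",
        headers={
            "Authorization": f"Basic {token}",
            "Accept": "application/json",
        },
    )

# Hypothetical Atlas server and entity GUID; execute the call with
# urllib.request.urlopen(req) and json-decode the response body.
req = atlas_entity_request("http://atlas-host:21000", "a1b2c3-d4", "admin", "admin")
```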
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
**08-01-2017 05:00 PM**

Please share a link to the source. There might be some confusion.
**07-31-2017 06:55 PM** (1 Kudo)

@Al John Mangahas DistCp spins off MapReduce jobs on the cluster it is running on, so you can use the YARN UI on that cluster to monitor job progress and resource utilization. Having said that, if you are copying from a Prod cluster to a DR cluster and are worried about resource usage, you can run the DistCp job on the DR cluster and have it "pull" the data from Prod. That way, the impact on Prod resources is minimal.
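A sketch of the "pull" pattern described above (hostnames, ports, and paths are hypothetical): the command is run on the DR cluster with Prod as the source, so the MapReduce tasks consuming resources belong to the DR cluster:

```python
def build_distcp_pull_cmd(prod_nn, dr_nn, src, dst):
    """Build a DistCp command intended to be run ON the DR cluster:
    the MapReduce job launched there reads ("pulls") from the Prod
    NameNode, which is only the source of the copy."""
    return [
        "hadoop", "distcp",
        f"hdfs://{prod_nn}:8020{src}",  # source: Prod cluster
        f"hdfs://{dr_nn}:8020{dst}",    # destination: local (DR) cluster
    ]

cmd = build_distcp_pull_cmd("prod-nn.example.com", "dr-nn.example.com",
                            "/data/events", "/data/events")
```

Running this via a shell (or `subprocess.run(cmd)`) on a DR-cluster edge node keeps the YARN containers, and hence the load, off Prod.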