Member since: 09-29-2015

142 Posts | 45 Kudos Received | 15 Solutions

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 2163 | 06-08-2017 05:28 PM |
|  | 7232 | 05-30-2017 02:07 PM |
|  | 2181 | 05-26-2017 07:48 PM |
|  | 4904 | 04-28-2017 02:48 PM |
|  | 3204 | 04-28-2017 02:41 PM |
			
    
	
		
		
06-15-2016 01:22 AM

	Atlas Quickstart creates a number of Tags. You may also have created some tags with the REST API. You may want to list the definition of a single Tag or Trait, or you may want a list of all Tags/Traits in Atlas.  
The following command lists all Traits/Tags:

curl -iv -H "Content-Type: application/json" -X GET http://sandbox.hortonworks.com:21000/api/atlas/types?type=TRAIT

The following response shows that I have seven Traits/Tags defined:

{"results":["Dimension","ETL","Fact","JdbcAccess","Metric","PII","EXPIRES_ON"],"count":7,"requestId":"qtp1770708318-84 - 6efad306-cb19-4d12-8fd4-31f664e771eb"}

The following command returns the definition of a Tag/Trait named EXPIRES_ON:

curl -iv -H "Content-Type: application/json" -X GET http://sandbox.hortonworks.com:21000/api/atlas/types/EXPIRES_ON

Following is the response:

{"typeName":"EXPIRES_ON","definition":"{\n  \"enumTypes\":[\n    \n  ],\n  \"structTypes\":[\n    \n  ],\n  \"traitTypes\":[\n    {\n      \"superTypes\":[\n        \n      ],\n      \"hierarchicalMetaTypeName\":\"org.apache.atlas.typesystem.types.TraitType\",\n      \"typeName\":\"EXPIRES_ON\",\n      \"attributeDefinitions\":[\n        {\n          \"name\":\"expiry_date\",\n          \"dataTypeName\":\"string\",\n          \"multiplicity\":\"required\",\n          \"isComposite\":false,\n          \"isUnique\":false,\n          \"isIndexable\":true,\n          \"reverseAttributeName\":null\n        }\n      ]\n    }\n  ],\n  \"classTypes\":[\n    \n  ]\n}","requestId":"qtp1770708318-97 - cffcd8b0-5ebe-4673-87b2-79fac9583557"}

Notice all of the new lines (\n) that are part of the response. This is a known issue, and you can follow the progress in this JIRA: https://issues.apache.org/jira/browse/ATLAS-208
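Until that is fixed, one workaround is to let a JSON tool unescape the embedded definition for you. A rough sketch (it assumes jq is installed on the box; the host and type name are the same ones used above):

```bash
# Extract the "definition" field as a raw string (which turns the \n escapes into
# real newlines), then pretty-print that JSON in a second pass.
curl -s -H "Content-Type: application/json" \
  -X GET http://sandbox.hortonworks.com:21000/api/atlas/types/EXPIRES_ON \
  | jq -r '.definition' | jq .
```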
						
					
	
			
    
	
		
		
06-14-2016 07:09 PM

I figured this out: I had left dataTypeName out of the attributeDefinitions in my payload, which is why Atlas could not deserialize the JSON.
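For anyone hitting the same error, a payload that does include dataTypeName looks roughly like the sketch below (illustrative only, not my actual payload; the trait name, attribute, and sandbox host simply mirror the EXPIRES_ON example shown earlier on this page):

```bash
# Illustrative sketch: a minimal trait definition with dataTypeName present
# in attributeDefinitions, posted to the Atlas types API.
cat > trait_payload.json <<'EOF'
{
  "enumTypes": [],
  "structTypes": [],
  "traitTypes": [
    {
      "superTypes": [],
      "hierarchicalMetaTypeName": "org.apache.atlas.typesystem.types.TraitType",
      "typeName": "EXPIRES_ON",
      "attributeDefinitions": [
        {
          "name": "expiry_date",
          "dataTypeName": "string",
          "multiplicity": "required",
          "isComposite": false,
          "isUnique": false,
          "isIndexable": true,
          "reverseAttributeName": null
        }
      ]
    }
  ],
  "classTypes": []
}
EOF

curl -iv -d @./trait_payload.json -H "Content-Type: application/json" \
  -X POST http://sandbox.hortonworks.com:21000/api/atlas/types
```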
						
					
			
    
	
		
		
06-14-2016 05:31 PM

Hello,

Can you tell me which version of HDP and Atlas you tested this with? I tried today with HDP 2.4, which comes with Atlas 0.5.0.2.4, and I'm getting an "Unable to deserialize json" error.

I'm using the following curl command to test:

curl -iv -d @./atlas_payload.json -H "Content-Type: application/json" -X POST http://sandbox.hortonworks.com:21000/api/atlas/types

Thanks!
						
					
			
    
	
		
		
06-02-2016 04:26 PM

Your AD / LDAP server will have a limit set somewhere, and when you're using the ldapsearch command you can add a limit as well. I'm curious to know the size of the result set you get with the given search base.
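As a rough sketch (the host, bind DN, base DN, and filter below are placeholders for your environment), you can cap the client-side limit with -z and count what comes back:

```bash
# Run the search base with an explicit size limit and count the returned entries.
ldapsearch -H ldap://ad.example.com -D "cn=binduser,dc=example,dc=com" -W \
  -b "dc=example,dc=com" -z 2000 "(objectClass=user)" dn | grep -c "^dn:"
```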
						
					
			
    
	
		
		
05-17-2016 10:59 PM

How to Index a PDF File with Flume and MorphlineSolrSink

The flow is as follows:

Spooling Directory Source > File Channel > MorphlineSolrSink

The reason I wanted to complete this exercise was to provide a less complex solution; that is, fewer moving parts, less configuration, and no coding compared to Kafka/Storm or Spark. Also, the example is easy to set up and demonstrate quickly.

Flume compared to Kafka/Storm is limited by its declarative nature, but that is what makes it easy to use. However, the morphline does provide a java command (with some potential performance side effects), so you can get pretty explicit. I've read that Flume can handle 50,000 events per second on a single server, so while the pipe may not be as fat as a Kafka/Storm pipe, it may be well suited for many use cases.

Step-by-step guide

1. Take care of dependencies. I am running the HDP 2.2.4 Sandbox and the Solr that came with it. To get started, you will need to add a lot of dependencies to your /usr/hdp/current/flume-server/lib/. You can get all of the dependencies from the /opt/solr/solr/contrib/ and /opt/solr/solr/dist directory structure:

commons-fileupload-1.2.1.jar
config-1.0.2.jar
fontbox-1.8.4.jar
httpmime-4.3.1.jar
kite-morphlines-avro-0.12.1.jar
kite-morphlines-core-0.12.1.jar
kite-morphlines-json-0.12.1.jar
kite-morphlines-tika-core-0.12.1.jar
kite-morphlines-tika-decompress-0.12.1.jar
kite-morphlines-twitter-0.12.1.jar
lucene-analyzers-common-4.10.4.jar
lucene-analyzers-kuromoji-4.10.4.jar
lucene-analyzers-phonetic-4.10.4.jar
lucene-core-4.10.4.jar
lucene-queries-4.10.4.jar
lucene-spatial-4.10.4.jar
metrics-core-3.0.1.jar
metrics-healthchecks-3.0.1.jar
noggit-0.5.jar
org.restlet-2.1.1.jar
org.restlet.ext.servlet-2.1.1.jar
pdfbox-1.8.4.jar
solr-analysis-extras-4.10.4.jar
solr-cell-4.10.4.jar
solr-clustering-4.10.4.jar
solr-core-4.10.4.jar
solr-dataimporthandler-4.10.4.jar
solr-dataimporthandler-extras-4.10.4.jar
solr-langid-4.10.4.jar
solr-map-reduce-4.10.4.jar
solr-morphlines-cell-4.10.4.jar
solr-morphlines-core-4.10.4.jar
solr-solrj-4.10.4.jar
solr-test-framework-4.10.4.jar
solr-uima-4.10.4.jar
solr-velocity-4.10.4.jar
spatial4j-0.4.1.jar
tika-core-1.5.jar
tika-parsers-1.5.jar
tika-xmp-1.5.jar

2. Configure Solr. Next there are some important Solr configurations:

- solr.xml – The solr.xml included with collection1 was unmodified.
- schema.xml – The schema.xml that is included with collection1 is all you need. It includes the fields that SolrCell will return when processing the PDF file. You need to make sure that you capture the fields you want with the solrCell command in the morphline.conf file.
- solrconfig.xml – The solrconfig.xml that is included with collection1 is all you need. It includes the ExtractingRequestHandler that you need to process the PDF file.

3. Flume configuration (flumeSolrSink.conf):

#agent config
agent1.sources = spooling_dir_src
agent1.sinks = solr_sink
agent1.channels = fileChannel

# Use a file channel
agent1.channels.fileChannel.type = file
#agent1.channels.fileChannel.capacity = 10000
#agent1.channels.fileChannel.transactionCapacity = 10000

# Configure source
agent1.sources.spooling_dir_src.channels = fileChannel
agent1.sources.spooling_dir_src.type = spooldir
agent1.sources.spooling_dir_src.spoolDir = /home/flume/dropzone
agent1.sources.spooling_dir_src.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder

#Configure Solr Sink
agent1.sinks.solr_sink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent1.sinks.solr_sink.morphlineFile = /home/flume/morphline.conf
agent1.sinks.solr_sink.batchsize = 1000
agent1.sinks.solr_sink.batchDurationMillis = 2500
agent1.sinks.solr_sink.channel = fileChannel

4. Morphline configuration file (morphline.conf):

solrLocator : {
  collection : collection1
  #zkHost : "127.0.0.1:9983"
  zkHost : "127.0.0.1:2181"
}

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      { detectMimeType { includeDefaultMimeTypes : true } }
      {
        solrCell {
          solrLocator : ${solrLocator}
          captureAttr : true
          lowernames : true
          capture : [title, author, content, content_type]
          parsers : [ { parser : org.apache.tika.parser.pdf.PDFParser } ]
        }
      }
      { generateUUID { field : id } }
      { sanitizeUnknownSolrFields { solrLocator : ${solrLocator} } }
      { loadSolr : { solrLocator : ${solrLocator} } }
    ]
  }
]

5. Start Solr. I used the following command so I could watch the logging. Note I am using the embedded ZooKeeper that starts with this command:

./solr start -f

6. Start Flume. I used the following command:

/usr/hdp/current/flume-server/bin/flume-ng agent --name agent1 --conf /etc/flume/conf/agent1 --conf-file /home/flume/flumeSolrSink.conf -Dflume.root.logger=DEBUG,console

7. Drop a PDF file into /home/flume/dropzone. If you're watching the log, you'll see when the process is completed.

8. In the Solr Admin UI, queries to run (see the query sketch after this list):

- text:* (or any text in the file)
- title:* (or the title)
- content_type:* (or pdf)
- author:* (or the author)
- Use the content field for highlighting, not for searching.
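The same queries can be issued from the command line. A rough sketch (the Solr host/port are assumptions for the sandbox, and the search term is just an example):

```bash
# Search the extracted text and ask Solr to return highlighted snippets
# from the content field of matching documents.
curl "http://sandbox.hortonworks.com:8983/solr/collection1/select?q=text:hadoop&hl=true&hl.fl=content&wt=json&indent=true"
```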
						
					
    
	
		
		
04-18-2016 07:07 PM (4 Kudos)

Ravi, you can use Sqoop to import tables and store them directly as ORC. The key option is --hcatalog-storage-stanza.

Check out the Sqoop documentation:

http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_importing_data_into_hive

And review 22.3, Automatic Table Creation.

Example:

$ sqoop import --connect jdbc:mysql://localhost/employees --username hive --password hive --table departments --hcatalog-database default --hcatalog-table my_table_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"
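If you want to double-check the result afterwards, here is a quick sketch (assuming the database and table names from the example above):

```bash
# DESCRIBE FORMATTED reports the table's storage details; an ORC-backed table
# shows org.apache.hadoop.hive.ql.io.orc.OrcInputFormat as its InputFormat.
hive -e "DESCRIBE FORMATTED default.my_table_orc" | grep -i "InputFormat"
```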
						
					
			
    
	
		
		
03-28-2016 07:43 PM (1 Kudo)

							 I see you are logged in as root. If you run the ls -la command in your home directory, you should see at least a .bash_profile. You can add the exports in that file. Or you can create a .profile in your home directory. 
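For example, something along these lines (the variable name and path are only placeholders for whatever exports you need):

```bash
# Append the exports to root's .bash_profile and reload it for the current shell.
echo 'export JAVA_HOME=/usr/jdk64/jdk1.8.0_40' >> /root/.bash_profile
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> /root/.bash_profile
source /root/.bash_profile
```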
						
					
			
    
	
		
		
03-28-2016 06:07 PM

Can you verify that you have added the hive-site.xml to HDFS and included a reference to that file in your workflow? I don't see it referenced in the sqoop action.
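As a rough sketch of what I mean (the HDFS paths below are only examples for your workflow directory):

```bash
# Copy hive-site.xml into HDFS alongside the workflow, then reference it from the
# sqoop action in workflow.xml with a <file> element, e.g.
#   <file>/user/oozie/apps/sqoop-wf/hive-site.xml#hive-site.xml</file>
hdfs dfs -put /etc/hive/conf/hive-site.xml /user/oozie/apps/sqoop-wf/hive-site.xml
```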
						
					
			
    
	
		
		
11-16-2015 06:44 PM (1 Kudo)

A little late to this thread, but late last month I used the binary build that Kylin says should work with the version of HBase in HDP 2.3 (apache-kylin-1.2-HBase1.1-incubating-SNAPSHOT). It installed OK, but when creating a cube, I got the following error:

Error: java.lang.NullPointerException
    at org.apache.kylin.job.hadoop.cube.FactDistinctColumnsMapper.setup(FactDistinctColumnsMapper.java:73)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
						
					
			
    
	
		
		
10-13-2015 08:00 PM (1 Kudo)

							 Deepash, can you provide us with a link to the open issue? 
						
					