Member since: 09-29-2015
Posts: 142
Kudos Received: 45
Solutions: 15
My Accepted Solutions
Views | Posted
---|---
1690 | 06-08-2017 05:28 PM
6172 | 05-30-2017 02:07 PM
1540 | 05-26-2017 07:48 PM
3842 | 04-28-2017 02:48 PM
2350 | 04-28-2017 02:41 PM
06-15-2016
01:22 AM
Atlas Quickstart creates a number of Tags. You may also have created some tags with the REST API. You may want to list the definition of a single Tag or Trait, or you may want a list of all Tags/Traits in Atlas.
The following command will list all Traits/Tags:

```
curl -iv -H "Content-Type: application/json" -X GET http://sandbox.hortonworks.com:21000/api/atlas/types?type=TRAIT
```

The following response shows that I have seven Traits/Tags defined:

```
{"results":["Dimension","ETL","Fact","JdbcAccess","Metric","PII","EXPIRES_ON"],"count":7,"requestId":"qtp1770708318-84 - 6efad306-cb19-4d12-8fd4-31f664e771eb"}
```

The following command returns the definition of a Tag/Trait named EXPIRES_ON:

```
curl -iv -H "Content-Type: application/json" -X GET http://sandbox.hortonworks.com:21000/api/atlas/types/EXPIRES_ON
```

Following is the response:

```
{"typeName":"EXPIRES_ON","definition":"{\n \"enumTypes\":[\n \n ],\n \"structTypes\":[\n \n ],\n \"traitTypes\":[\n {\n \"superTypes\":[\n \n ],\n \"hierarchicalMetaTypeName\":\"org.apache.atlas.typesystem.types.TraitType\",\n \"typeName\":\"EXPIRES_ON\",\n \"attributeDefinitions\":[\n {\n \"name\":\"expiry_date\",\n \"dataTypeName\":\"string\",\n \"multiplicity\":\"required\",\n \"isComposite\":false,\n \"isUnique\":false,\n \"isIndexable\":true,\n \"reverseAttributeName\":null\n }\n ]\n }\n ],\n \"classTypes\":[\n \n ]\n}","requestId":"qtp1770708318-97 - cffcd8b0-5ebe-4673-87b2-79fac9583557"}
```

Notice all of the new lines (\n) that are part of the response. This is a known issue, and you can follow the progress in this JIRA: https://issues.apache.org/jira/browse/ATLAS-208
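In the meantime, if the escaped newlines make the definition hard to read, you can decode the definition field on the client. A minimal sketch, assuming Python is available on the sandbox:

```bash
# Minimal sketch (my own workaround, not part of Atlas): fetch the type
# definition and print the embedded "definition" JSON with real newlines.
curl -s -H "Content-Type: application/json" \
  -X GET http://sandbox.hortonworks.com:21000/api/atlas/types/EXPIRES_ON \
  | python -c 'import sys, json; print(json.load(sys.stdin)["definition"])'
```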
06-14-2016
07:09 PM
I figured this out. I had left out dataTypeName as part of the attributeDefinitions.
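For anyone else hitting the same "Unable to deserialize json" error, here is a sketch of a payload in the same shape as the EXPIRES_ON definition shown in my earlier post; the trait name and attribute name below are made up for illustration, and the important part is the dataTypeName on each attribute:

```bash
# Hypothetical trait definition (names are illustrative); mirrors the structure
# of the EXPIRES_ON response above. Each attribute needs a dataTypeName.
curl -iv -H "Content-Type: application/json" -X POST \
  http://sandbox.hortonworks.com:21000/api/atlas/types -d '{
  "enumTypes": [],
  "structTypes": [],
  "traitTypes": [
    {
      "superTypes": [],
      "hierarchicalMetaTypeName": "org.apache.atlas.typesystem.types.TraitType",
      "typeName": "REVIEW_BY",
      "attributeDefinitions": [
        {
          "name": "review_date",
          "dataTypeName": "string",
          "multiplicity": "required",
          "isComposite": false,
          "isUnique": false,
          "isIndexable": true,
          "reverseAttributeName": null
        }
      ]
    }
  ],
  "classTypes": []
}'
```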
06-14-2016
05:31 PM
Hello, can you tell me which version of HDP and Atlas you tested this with? I tried today with HDP 2.4, which comes with Atlas 0.5.0.2.4, and I'm getting an error regarding "Unable to deserialize json". I'm using the following curl command to test:

```
curl -iv -d @./atlas_payload.json -H "Content-Type: application/json" -X POST http://sandbox.hortonworks.com:21000/api/atlas/types
```

Thanks!
06-02-2016
04:26 PM
Your AD/LDAP server will have a size limit configured somewhere, and when you're using the ldapsearch command you can set a limit on the client side as well. I'm curious how large the result set is for the given search base.
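For example, something like the following (host, bind options, and base DN are placeholders for your environment) counts the entries under a base and caps the result set on the client side with -z:

```bash
# Placeholders: replace the host and base DN with your own; against AD you will
# usually also need bind credentials (-D and -w). -z sets a client-side size
# limit; the server may still enforce a lower one.
ldapsearch -x -H ldap://ad.example.com \
  -b "ou=users,dc=example,dc=com" \
  -z 2000 "(objectClass=person)" dn | grep -c '^dn:'
```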
05-17-2016
10:59 PM
How to Index PDF File with Flume and MorphlineSolrSink

The flow is as follows: Spooling Directory Source > File Channel > MorphlineSolrSink

The reason I wanted to complete this exercise was to provide a less complex solution; that is, fewer moving parts, less configuration, and no coding compared to Kafka/Storm or Spark. Also, the example is easy to set up and demonstrate quickly. Flume, compared to Kafka/Storm, is limited by its declarative nature, but that is what makes it easy to use. However, the morphline does provide a java command (with some potential performance side effects), so you can get pretty explicit. I've read that Flume can handle 50,000 events per second on a single server, so while the pipe may not be as fat as a Kafka/Storm pipe, it may be well suited for many use cases.

Step-by-step guide

1. Take care of dependencies. I am running the HDP 2.2.4 Sandbox and the Solr that came with it. To get started, you will need to add a lot of dependencies to your /usr/hdp/current/flume-server/lib/. You can get all of them from the /opt/solr/solr/contrib/ and /opt/solr/solr/dist directory structure:

```
commons-fileupload-1.2.1.jar  config-1.0.2.jar  fontbox-1.8.4.jar  httpmime-4.3.1.jar
kite-morphlines-avro-0.12.1.jar  kite-morphlines-core-0.12.1.jar  kite-morphlines-json-0.12.1.jar
kite-morphlines-tika-core-0.12.1.jar  kite-morphlines-tika-decompress-0.12.1.jar  kite-morphlines-twitter-0.12.1.jar
lucene-analyzers-common-4.10.4.jar  lucene-analyzers-kuromoji-4.10.4.jar  lucene-analyzers-phonetic-4.10.4.jar
lucene-core-4.10.4.jar  lucene-queries-4.10.4.jar  lucene-spatial-4.10.4.jar
metrics-core-3.0.1.jar  metrics-healthchecks-3.0.1.jar  noggit-0.5.jar
org.restlet-2.1.1.jar  org.restlet.ext.servlet-2.1.1.jar  pdfbox-1.8.4.jar
solr-analysis-extras-4.10.4.jar  solr-cell-4.10.4.jar  solr-clustering-4.10.4.jar
solr-core-4.10.4.jar  solr-dataimporthandler-4.10.4.jar  solr-dataimporthandler-extras-4.10.4.jar
solr-langid-4.10.4.jar  solr-map-reduce-4.10.4.jar  solr-morphlines-cell-4.10.4.jar
solr-morphlines-core-4.10.4.jar  solr-solrj-4.10.4.jar  solr-test-framework-4.10.4.jar
solr-uima-4.10.4.jar  solr-velocity-4.10.4.jar  spatial4j-0.4.1.jar
tika-core-1.5.jar  tika-parsers-1.5.jar  tika-xmp-1.5.jar
```

2. Configure SOLR. Next there are some important SOLR configurations:

- solr.xml: The solr.xml included with collection1 was unmodified.
- schema.xml: The schema.xml that is included with collection1 is all you need. It includes the fields that SolrCell will return when processing the PDF file. You need to make sure that you capture the fields you want with the SolrCell command in the morphline.conf file.
- solrconfig.xml: The solrconfig.xml that is included with collection1 is all you need. It includes the ExtractingRequestHandler that you need to process the PDF file.

3. Flume Configuration

```
#agent config
agent1.sources = spooling_dir_src
agent1.sinks = solr_sink
agent1.channels = fileChannel

# Use a file channel
agent1.channels.fileChannel.type = file
#agent1.channels.fileChannel.capacity = 10000
#agent1.channels.fileChannel.transactionCapacity = 10000

# Configure source
agent1.sources.spooling_dir_src.channels = fileChannel
agent1.sources.spooling_dir_src.type = spooldir
agent1.sources.spooling_dir_src.spoolDir = /home/flume/dropzone
agent1.sources.spooling_dir_src.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder

#Configure Solr Sink
agent1.sinks.solr_sink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent1.sinks.solr_sink.morphlineFile = /home/flume/morphline.conf
agent1.sinks.solr_sink.batchsize = 1000
agent1.sinks.solr_sink.batchDurationMillis = 2500
agent1.sinks.solr_sink.channel = fileChannel
```

4. Morphline Configuration File

```
solrLocator: {
  collection : collection1
  #zkHost : "127.0.0.1:9983"
  zkHost : "127.0.0.1:2181"
}

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      { detectMimeType { includeDefaultMimeTypes : true } }
      {
        solrCell {
          solrLocator : ${solrLocator}
          captureAttr : true
          lowernames : true
          capture : [title, author, content, content_type]
          parsers : [ { parser : org.apache.tika.parser.pdf.PDFParser } ]
        }
      }
      { generateUUID { field : id } }
      { sanitizeUnknownSolrFields { solrLocator : ${solrLocator} } }
      { loadSolr: { solrLocator : ${solrLocator} } }
    ]
  }
]
```

5. Start SOLR. I used the following command so I could watch the logging. Note I am using the embedded Zookeeper that starts with this command:

```
./solr start -f
```

6. Start Flume. I used the following command:

```
/usr/hdp/current/flume-server/bin/flume-ng agent --name agent1 --conf /etc/flume/conf/agent1 --conf-file /home/flume/flumeSolrSink.conf -Dflume.root.logger=DEBUG,console
```

7. Drop a PDF file into /home/flume/dropzone. If you're watching the log, you'll see when the process is completed.

8. In SOLR Admin, queries to run:

- text:* (or any text in the file)
- title:* (or the title)
- content_type:* (or pdf)
- author:* (or the author)
- Use the content field for highlighting, not for searching.
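If you would rather check from the command line than the Solr Admin UI, here is a minimal sketch; the host, port (the Solr default 8983 on the sandbox), and the query term "hadoop" are assumptions to adjust for your environment:

```bash
# Sketch only: host, port, and the query term are placeholders.
# Searches the extracted text and asks Solr to highlight matches from the content field.
curl -s "http://sandbox.hortonworks.com:8983/solr/collection1/select?q=text:hadoop&hl=true&hl.fl=content&wt=json&indent=true"
```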
04-18-2016
07:07 PM
4 Kudos
Ravi, you can use Sqoop to import tables and store them directly as ORC. The key option is --hcatalog-storage-stanza. Check out the Sqoop documentation at http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_importing_data_into_hive and review section 22.3, Automatic Table Creation. Example:

```
$ sqoop import --connect jdbc:mysql://localhost/employees --username hive --password hive \
    --table departments --hcatalog-database default --hcatalog-table my_table_orc \
    --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"
```
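To confirm the table really came out as ORC after the import, one quick check (a sketch, assuming the Hive CLI is on your path; the table name matches the example above):

```bash
# Sketch: the ORC serde/input format should show up in the generated DDL.
hive -e "SHOW CREATE TABLE default.my_table_orc;" | grep -i orc
```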
03-28-2016
07:43 PM
1 Kudo
I see you are logged in as root. If you run the ls -la command in your home directory, you should see at least a .bash_profile. You can add the exports in that file. Or you can create a .profile in your home directory.
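For example (the variable and path below are only placeholders):

```bash
# Placeholders only: append an export to root's .bash_profile and reload it.
echo 'export JAVA_HOME=/usr/lib/jvm/java' >> ~/.bash_profile
source ~/.bash_profile
```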
03-28-2016
06:07 PM
Can you verify if you have added the hive-site.xml to HDFS and included a reference to that file in your workflow? I don't see it referenced in the sqoop action.
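For example (paths below are placeholders): copy hive-site.xml into the workflow's lib/ directory on HDFS, or upload it somewhere and reference it from the Sqoop action with a <file> element:

```bash
# Placeholders: adjust the HDFS path to your workflow application directory.
# Once uploaded, reference it from the action, e.g. <file>hive-site.xml</file>.
hdfs dfs -put -f /etc/hive/conf/hive-site.xml /user/oozie/workflows/sqoop-app/lib/
```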
11-16-2015
06:44 PM
1 Kudo
A little late to this thread, but late last month I used the binary build that Kylin says should work with the version of HBase in HDP 2.3 (apache-kylin-1.2-HBase1.1-incubating-SNAPSHOT). It installed OK, but when creating a cube, I got the following error:

```
Error: java.lang.NullPointerException
    at org.apache.kylin.job.hadoop.cube.FactDistinctColumnsMapper.setup(FactDistinctColumnsMapper.java:73)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
```
10-13-2015
08:00 PM
1 Kudo
Deepash, can you provide us with a link to the open issue?