Member since: 09-29-2015
Posts: 142
Kudos Received: 45
Solutions: 15

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1729 | 06-08-2017 05:28 PM |
 | 6264 | 05-30-2017 02:07 PM |
 | 1595 | 05-26-2017 07:48 PM |
 | 3921 | 04-28-2017 02:48 PM |
 | 2412 | 04-28-2017 02:41 PM |
10-12-2016 11:45 AM
Glad that you tried resetting the property to check the actual issue. I think the actual issue was the way you tried to access the Hive entities from the UI. You could have tried a DSL search like the one below.
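A sketch of such a DSL search (the endpoint and query here are assumptions for the sandbox, not the exact example from the original post):

# query Atlas for hive_table entities via the DSL search endpoint
curl -iv -H "Content-Type: application/json" -X GET \
  "http://sandbox.hortonworks.com:21000/api/atlas/discovery/search/dsl?query=hive_table"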
02-01-2017 03:25 AM
@boyer If that answers your question, please accept it as the best answer.
10-11-2016 02:06 PM
@Vasilis Vagias When I log in to Ambari as holger_gov, I have access to Hive data through the Hive view. I also have the Ranger tag sync service and the HBase region server running. Still it does not work...
09-15-2016 05:43 PM
Currently Spark does not support deployment to YARN from a SparkContext; use spark-submit instead. For unit testing it is recommended to use the [local] runner. The problem is that you cannot set the Hadoop conf from outside the SparkContext; it is read from the *-site.xml config under HADOOP_HOME during spark-submit. So you cannot point to your remote cluster from Eclipse unless you set up the correct *-site.xml files on your laptop and use spark-submit. SparkSubmit is available as a Java class, but I doubt you will achieve what you are looking for with it. You would, however, be able to launch a Spark job from Eclipse to a remote cluster if that is sufficient for you; have a look at the Oozie Spark launcher as an example. SparkContext is changing dramatically in Spark 2, I think in favor of a SparkClient, to support multiple SparkContexts. I am not sure what the current situation is with that.
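A minimal sketch of the spark-submit route (the driver class and jar path below are placeholders, not from the original question):

# submit the packaged job to YARN instead of building a YARN SparkContext inside the IDE;
# for unit tests, the same code can run with a local master, e.g. setMaster("local[*]")
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyJob \
  /path/to/my-job-assembly.jar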
07-29-2016 05:21 PM
1 Kudo
I was reviewing some posts related to Pig and found the following question interesting: https://community.hortonworks.com/questions/47720/apache-pig-guarantee-that-all-the-value-in-a-colum.html#answer-47767 I wanted to share an alternative solution using Pentaho Data Integration (PDI), an open source ETL tool that provides visual MapReduce capabilities. PDI is YARN ready, so when you configure PDI to use your HDP cluster (or sandbox) and run the attached job, it runs as a YARN application.

The first image is your Mapper; above, you see the main transformation. It reads input, which you configure in the Pentaho MapReduce job (seen below). The transformation follows a pattern: immediately split the delimited file into individual fields. Next, I use a Java Expression step to determine whether a field is numeric; if it is not, we set the value of the field to the string "null". Then, to prepare for MapReduce output, we concatenate the fields back together into a single value and pass the key/value pair to the MapReduce Output step.

Once you have the main MapReduce transformation created, you wrap it in a PDI MapReduce job. If you're familiar with MapReduce, you will recognize the configuration options below, which you would otherwise set in code. Next, configure your Mapper. The job succeeds, and the file is in HDFS.
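For readers without PDI, the mapper's field-by-field check can be illustrated with a plain awk one-liner (this is only an illustration, not part of the PDI job; the comma delimiter and file name are assumptions):

# replace any non-numeric field in a comma-delimited file with the string "null"
awk -F',' 'BEGIN{OFS=","} { for (i = 1; i <= NF; i++) if ($i !~ /^-?[0-9]+(\.[0-9]+)?$/) $i = "null"; print }' input.csv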
05-18-2017 10:47 AM
Hi, I am using Microleaves and I want to set up Hortonworks behind proxies. How does that work? Please give me complete information with steps.
06-15-2016 01:22 AM
Atlas Quickstart creates a number of Tags. You may also have created some tags with the REST API. You may want to list the definition of a single Tag or Trait, or you may want a list of all Tags/Traits in Atlas.

The following command will list all Traits/Tags:

curl -iv -H "Content-Type: application/json" -X GET "http://sandbox.hortonworks.com:21000/api/atlas/types?type=TRAIT"

The following response shows that I have seven Traits/Tags defined:

{"results":["Dimension","ETL","Fact","JdbcAccess","Metric","PII","EXPIRES_ON"],"count":7,"requestId":"qtp1770708318-84 - 6efad306-cb19-4d12-8fd4-31f664e771eb"}

The following command returns the definition of a Tag/Trait named EXPIRES_ON:

curl -iv -H "Content-Type: application/json" -X GET "http://sandbox.hortonworks.com:21000/api/atlas/types/EXPIRES_ON"

Following is the response:

{"typeName":"EXPIRES_ON","definition":"{\n \"enumTypes\":[\n \n ],\n \"structTypes\":[\n \n ],\n \"traitTypes\":[\n {\n \"superTypes\":[\n \n ],\n \"hierarchicalMetaTypeName\":\"org.apache.atlas.typesystem.types.TraitType\",\n \"typeName\":\"EXPIRES_ON\",\n \"attributeDefinitions\":[\n {\n \"name\":\"expiry_date\",\n \"dataTypeName\":\"string\",\n \"multiplicity\":\"required\",\n \"isComposite\":false,\n \"isUnique\":false,\n \"isIndexable\":true,\n \"reverseAttributeName\":null\n }\n ]\n }\n ],\n \"classTypes\":[\n \n ]\n}","requestId":"qtp1770708318-97 - cffcd8b0-5ebe-4673-87b2-79fac9583557"}

Notice all of the new lines (\n) that are part of the response. This is a known issue, and you can follow the progress in this JIRA: https://issues.apache.org/jira/browse/ATLAS-208
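Going the other direction (creating a trait over REST, as mentioned above), a hedged sketch that simply mirrors the structure Atlas returned for EXPIRES_ON; EXAMPLE_TAG is a placeholder name, not something created in the original post:

# define a new trait/tag with no attributes by POSTing a type definition
curl -iv -H "Content-Type: application/json" -X POST \
  -d '{"enumTypes":[],"structTypes":[],"traitTypes":[{"superTypes":[],"hierarchicalMetaTypeName":"org.apache.atlas.typesystem.types.TraitType","typeName":"EXAMPLE_TAG","attributeDefinitions":[]}],"classTypes":[]}' \
  "http://sandbox.hortonworks.com:21000/api/atlas/types"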
01-27-2017 04:19 PM
Could you please provide examples for point 2, "You can use ambari blueprint to start and stop the services"? Thanks.
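As a hedged sketch only (the host, credentials, cluster, and service names below are placeholders): the REST call typically used for starting and stopping services is the Ambari services API rather than a blueprint itself. Setting the state to INSTALLED stops a service; STARTED starts it.

# stop the HDFS service on cluster "MyCluster"
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Stop HDFS"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  "http://ambari-host:8080/api/v1/clusters/MyCluster/services/HDFS"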
05-17-2016 10:59 PM
How to Index a PDF File with Flume and MorphlineSolrSink

The flow is as follows: Spooling Directory Source > File Channel > MorphlineSolrSink. The reason I wanted to complete this exercise was to provide a less complex solution; that is, fewer moving parts, less configuration, and no coding compared to Kafka/Storm or Spark. Also, the example is easy to set up and demonstrate quickly. Flume, compared to Kafka/Storm, is limited by its declarative nature, but that is what makes it easy to use. However, the morphline does provide a java command (with some potential performance side effects), so you can get pretty explicit. I've read that Flume can handle around 50,000 events per second on a single server, so while the pipe may not be as fat as a Kafka/Storm pipe, it may be well suited for many use cases.

Step-by-step guide

1. Take care of dependencies. I am running the HDP 2.2.4 Sandbox and the Solr that came with it. To get started, you will need to add a lot of dependencies to /usr/hdp/current/flume-server/lib/. You can get all of the dependencies from the /opt/solr/solr/contrib/ and /opt/solr/solr/dist directory structures (a copy-command sketch follows the jar list):

commons-fileupload-1.2.1.jar
config-1.0.2.jar
fontbox-1.8.4.jar
httpmime-4.3.1.jar
kite-morphlines-avro-0.12.1.jar
kite-morphlines-core-0.12.1.jar
kite-morphlines-json-0.12.1.jar
kite-morphlines-tika-core-0.12.1.jar
kite-morphlines-tika-decompress-0.12.1.jar
kite-morphlines-twitter-0.12.1.jar
lucene-analyzers-common-4.10.4.jar
lucene-analyzers-kuromoji-4.10.4.jar
lucene-analyzers-phonetic-4.10.4.jar
lucene-core-4.10.4.jar
lucene-queries-4.10.4.jar
lucene-spatial-4.10.4.jar
metrics-core-3.0.1.jar
metrics-healthchecks-3.0.1.jar
noggit-0.5.jar
org.restlet-2.1.1.jar
org.restlet.ext.servlet-2.1.1.jar
pdfbox-1.8.4.jar
solr-analysis-extras-4.10.4.jar
solr-cell-4.10.4.jar
solr-clustering-4.10.4.jar
solr-core-4.10.4.jar
solr-dataimporthandler-4.10.4.jar
solr-dataimporthandler-extras-4.10.4.jar
solr-langid-4.10.4.jar
solr-map-reduce-4.10.4.jar
solr-morphlines-cell-4.10.4.jar
solr-morphlines-core-4.10.4.jar
solr-solrj-4.10.4.jar
solr-test-framework-4.10.4.jar
solr-uima-4.10.4.jar
solr-velocity-4.10.4.jar
spatial4j-0.4.1.jar
tika-core-1.5.jar
tika-parsers-1.5.jar
tika-xmp-1.5.jar
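A hedged copy-command sketch for pulling these in (shown for one jar; repeat or loop for the rest of the list, and adjust versions if your Solr differs):

# locate the jar under the Solr dist/ or contrib/ trees and copy it into Flume's lib
find /opt/solr/solr/dist /opt/solr/solr/contrib -name 'solr-cell-4.10.4.jar' \
  -exec cp {} /usr/hdp/current/flume-server/lib/ \;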
2. Configure SOLR. Next there are some important SOLR configurations:

solr.xml – The solr.xml included with collection1 was unmodified.
schema.xml – The schema.xml included with collection1 is all you need. It includes the fields that SolrCell will return when processing the PDF file. You need to make sure that you capture the fields you want with the solrCell command in the morphline.conf file.
solrconfig.xml – The solrconfig.xml included with collection1 is all you need. It includes the ExtractingRequestHandler that you need to process the PDF file.

3. Flume Configuration

#agent config
agent1.sources = spooling_dir_src
agent1.sinks = solr_sink
agent1.channels = fileChannel

# Use a file channel
agent1.channels.fileChannel.type = file
#agent1.channels.fileChannel.capacity = 10000
#agent1.channels.fileChannel.transactionCapacity = 10000

# Configure source
agent1.sources.spooling_dir_src.channels = fileChannel
agent1.sources.spooling_dir_src.type = spooldir
agent1.sources.spooling_dir_src.spoolDir = /home/flume/dropzone
agent1.sources.spooling_dir_src.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder

# Configure Solr Sink
agent1.sinks.solr_sink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent1.sinks.solr_sink.morphlineFile = /home/flume/morphline.conf
agent1.sinks.solr_sink.batchSize = 1000
agent1.sinks.solr_sink.batchDurationMillis = 2500
agent1.sinks.solr_sink.channel = fileChannel

4. Morphline Configuration File

solrLocator: {
  collection : collection1
  #zkHost : "127.0.0.1:9983"
  zkHost : "127.0.0.1:2181"
}

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      {
        detectMimeType {
          includeDefaultMimeTypes : true
        }
      }
      {
        solrCell {
          solrLocator : ${solrLocator}
          captureAttr : true
          lowernames : true
          capture : [title, author, content, content_type]
          parsers : [
            { parser : org.apache.tika.parser.pdf.PDFParser }
          ]
        }
      }
      { generateUUID { field : id } }
      {
        sanitizeUnknownSolrFields {
          solrLocator : ${solrLocator}
        }
      }
      {
        loadSolr: {
          solrLocator : ${solrLocator}
        }
      }
    ]
  }
]

5. Start SOLR. I used the following command so I could watch the logging. Note I am using the embedded ZooKeeper that starts with this command:

./solr start -f

6. Start Flume. I used the following command:

/usr/hdp/current/flume-server/bin/flume-ng agent --name agent1 --conf /etc/flume/conf/agent1 --conf-file /home/flume/flumeSolrSink.conf -Dflume.root.logger=DEBUG,console

7. Drop a PDF file into /home/flume/dropzone. If you're watching the log, you'll see when the process is completed.

8. In SOLR Admin, queries to run (a curl version is sketched after the list):

text:* (or any text in the file)
title:* (or the title)
content_type:* (or pdf)
author:* (or the author)

Use the content field for highlighting, not for searching.
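The same queries can be run outside the Admin UI; a minimal curl sketch (the host and default port 8983 are assumptions for this sandbox setup):

# search the collection for any document with an author field
curl "http://sandbox.hortonworks.com:8983/solr/collection1/select?q=author:*&wt=json&indent=true"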
03-28-2016 07:43 PM
1 Kudo
I see you are logged in as root. If you run the ls -la command in your home directory, you should see at least a .bash_profile. You can add the exports in that file, or you can create a .profile in your home directory.
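A minimal sketch of that edit (JAVA_HOME and its path are only an example, not taken from the thread):

# append an export to root's .bash_profile and load it into the current shell
echo 'export JAVA_HOME=/usr/jdk64/jdk1.8.0_60' >> ~/.bash_profile
source ~/.bash_profile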