Member since: 09-10-2015
Posts: 32
Kudos Received: 29
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1592 | 10-04-2015 10:36 PM
 | 481 | 09-30-2015 04:59 PM
 | 4447 | 09-26-2015 05:24 PM
03-29-2017
09:10 PM
https://hortonworks.com/hadoop-tutorial/manage-security-policy-hive-hbase-knox-ranger/
... View more
Labels:
- Hortonworks Data Platform (HDP)
01-23-2017
10:00 PM
I did, but it looks like the jar is not automatically copied to the other nodes when Hive is restarted.
... View more
01-23-2017
09:09 PM
I am setting the Hive metastore DB driver using:
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/hdp/current/hive-server2/lib/mysql-connector-java-5.1.38.jar
The output is:
Using python /usr/bin/python
Setup ambari-server
.Copying /usr/hdp/current/hive-server2/lib/mysql-connector-java-5.1.38.jar to /var/lib/ambari-server/resources
If you are updating existing jdbc driver jar for mysql with mysql-connector-java-5.1.38.jar. Please remove the old driver jar, from all hosts. Restarting services that need the driver, will automatically copy the new jar to the hosts.
JDBC driver was successfully initialized.
Ambari Server 'setup' completed successfully.
I have checked that the jar is at that path, but when I try to restart HiveServer2, I get the following error:
2017-01-23 20:59:37,414 - Error! Sorry, but we can't find jdbc driver with default name mysql-connector-java.jar in hive lib dir. So, db connection check can fail. Please run 'ambari-server setup --jdbc-db={db_name} --jdbc-driver={path_to_jdbc} on server host.'
... View more
Tags:
- Ambari
- Data Processing
- hive-jdbc
- hiveserver2
Upgrade to HDP 2.5.3 : ConcurrentModificationException When Executing Insert Overwrite : Hive
Labels:
- Apache Ambari
- Apache Hive
10-08-2016
04:17 PM
1 Kudo
Great step-by-step instructions. You can skip Step 2 and replace Step 3 with:
docker load < HDP_2.5_docker.tar.gz
... View more
08-19-2016
04:17 PM
I was hitting a fatal issue while using this with the HDP 2.4 Sandbox, so I fixed it in this repo https://github.com/saptak/ambari-vnc-service and sent a pull request to hortonworks-gallery.
... View more
12-28-2015
02:16 AM
I am trying to upgrade HDP from 2.3.2 to 2.3.4 using Ambari 2.2. I am facing an issue in the Ambari UI where clicking on Install Packages does nothing, as illustrated in the video below: https://www.dropbox.com/s/f9vbhp1vmy7gmg4/hdp-upgrade.mp4?dl=0
... View more
10-15-2015
02:49 PM
1 Kudo
I tried installing the ODBC driver on Mac OS X El Capitan and got "Installation failed": https://www.dropbox.com/s/9hpim6rjl5qr21m/Screenshot%202015-10-15%2007.47.06.png?dl=0 Any idea what I am doing wrong?
... View more
10-14-2015
02:47 PM
2 Kudos
Has anyone tried manually upgrading to Spark 1.5.1 on Hortonworks Sandbox and faced any issues?
... View more
Labels:
- Apache Spark
10-05-2015
03:04 PM
Hi, I am new to HDP and Hadoop. I managed to install the HDP 2.3 Sandbox on VirtualBox. I tried a few sample programs and they are working fine from the sandbox. I have installed Eclipse with Scala on my Windows machine. At present, I use SBT to package my application and deploy the jar to the HDP Sandbox for execution. I would like to execute programs from Eclipse against the HDP Sandbox directly instead of packaging them each and every time. A sample of the code I am trying to modify: val conf = new SparkConf().setAppName("Simple Application").setMaster("local[2]").set("spark.executor.memory", "1g") I guess I have to change the local[2] to the master node / YARN cluster URL. How do I get the URL from the sandbox? Are there any other configurations that have to be done on the VirtualBox VM or in my code?
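For illustration, a minimal sketch of the change being attempted (shown in Java; the yarn-client master string and the classpath assumptions are mine, not taken from the Sandbox docs):

```java
// Hypothetical sketch: pointing the application at the Sandbox's YARN cluster instead of local[2].
// Assumes the Sandbox's core-site.xml and yarn-site.xml are on the driver's classpath
// (for example by setting HADOOP_CONF_DIR to a copy of /etc/hadoop/conf from the VM).
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SandboxSmokeTest {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("Simple Application")
                .setMaster("yarn-client")               // instead of local[2]
                .set("spark.executor.memory", "1g");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Trivial job to confirm the executors actually run on the Sandbox.
        long count = sc.parallelize(Arrays.asList(1, 2, 3, 4)).count();
        System.out.println("count = " + count);
        sc.stop();
    }
}
```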
... View more
10-04-2015
11:04 PM
While upgrading HDP 2.0 to HDP 2.1 and the Metastore schema from 0.12 to 0.13, I got the error: Duplicate column name 'OWNER_NAME' (state=42S21,code=1060). The Metastore version in the VERSION table is 0.12; however, the OWNER_NAME column already exists in the DBS table. Here is the detailed error:
+---------------------------------------------+
| < HIVE-6386: Add owner filed to database >  |
+---------------------------------------------+
1 row selected (0.001 seconds)
0: jdbc:mysql://hadoop.domain> ALTER TABLE DBS ADD OWNER_NAME varchar(128)
Error: Duplicate column name 'OWNER_NAME' (state=42S21,code=1060)
Closing: 0: jdbc:mysql://hadoop.domain/hive?createDatabaseIfNotExist=true
org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !!
org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !!
at org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:242)
at org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:211)
at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:489)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.io.IOException: Schema script failed, errorcode 2
at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:377)
at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:350)
at org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:237)
… 7 more
*** schemaTool failed ***
Has anyone run into the same issue? Any idea what the source of this problem is?
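As a hypothetical pre-check (the JDBC URL and credentials are placeholders), the metastore database can be queried directly to confirm whether the DBS.OWNER_NAME column is already present before re-running schematool:

```java
// Hypothetical check against MySQL's information_schema; host, database name,
// and credentials are placeholders for the actual metastore connection details.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CheckOwnerNameColumn {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://hadoop.domain:3306/hive";
        try (Connection conn = DriverManager.getConnection(url, "hiveuser", "hivepassword");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT COUNT(*) FROM information_schema.COLUMNS "
                 + "WHERE TABLE_SCHEMA = 'hive' AND TABLE_NAME = 'DBS' AND COLUMN_NAME = 'OWNER_NAME'")) {
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                System.out.println(rs.getInt(1) > 0
                        ? "OWNER_NAME already exists - schema is partially upgraded"
                        : "OWNER_NAME missing - upgrade script has not been applied");
            }
        }
    }
}
```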
... View more
Labels:
- Apache Hive
10-04-2015
10:57 PM
I'm not a Kerberos wizard, so I'm on a bit of a learning curve. I've followed all of the Kerberos instructions in the HDP 2.1 documentation and run into an issue where my datanodes won't start (3 node cluster). If I roll back all of the xml files to non-kerberos versions, I can start everything from the command line. When I shut down the cluster and roll in the kerberos versions of the xml files, I'm able to start the namenode, but all of the datanodes refuse to start, and the only clue I have is as follows:
2014-07-24 11:04:22,181 INFO datanode.DataNode (SignalLogger.java:register(91)) - registered UNIX signal handlers for [TERM, HUP, INT]
2014-07-24 11:04:22,399 WARN common.Util (Util.java:stringAsURI(56)) - Path /opt/hadoop/hdfs/dn should be specified as a URI in configuration files. Please update hdfs configuration.
2014-07-24 11:04:23,055 INFO security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab(894)) - Login successful for user dn/abc0123.xy.local@XYZ.COM using keytab file /etc/security/keytabs/dn.service.keytab
2014-07-24 11:04:23,210 INFO impl.MetricsConfig (MetricsConfig.java:loadFirst(111)) - loaded properties from hadoop-metrics2.properties
2014-07-24 11:04:23,274 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:startTimer(344)) - Scheduled snapshot period at 60 second(s).
2014-07-24 11:04:23,274 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:start(183)) - DataNode metrics system started
2014-07-24 11:04:23,279 INFO datanode.DataNode (DataNode.java:<init>(269)) - File descriptor passing is enabled.
2014-07-24 11:04:23,283 INFO datanode.DataNode (DataNode.java:<init>(280)) - Configured hostname is cvm0932.dg.local
2014-07-24 11:04:23,284 FATAL datanode.DataNode (DataNode.java:secureMain(2002)) - Exception in secureMain
java.lang.RuntimeException: Cannot start secure cluster without privileged resources.
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:700)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:281)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1885)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1772)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1819)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1995)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2019)
2014-07-24 11:04:23,287 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2014-07-24 11:04:23,289 INFO datanode.DataNode (StringUtils.java:run(640)) - SHUTDOWN_MSG:
... View more
10-04-2015
10:43 PM
When will HDP 2.x start supporting CentOS 7? I don't understand the python26 dependency fully, but I presume that is a major roadblock in moving to the new OS.
... View more
10-04-2015
10:41 PM
Is there a source that builds RPMs from the "HDP tar.gz", like Bigtop? I am using Ambari and HDP.
I want to change some parts of HDP and build an RPM.
Do I have to customize Bigtop or use rpmbuild to make the RPM?
... View more
10-04-2015
10:36 PM
Before Ranger was integrated with the Sandbox, the dfs.permissions setting in the Sandbox was set to false. The reason was to allow Hue and some other use cases to create databases and tables.
After Ranger was integrated, we emulated the same behavior by creating a global policy that allows everyone. If you go through the Sandbox security tutorials, the first step is to disable the global policy (for each component). If you disable the global HDFS policy in Ranger that allows everyone, then you should see what you expect from HDFS security permissions.
... View more
10-04-2015
10:35 PM
Hello,
I want to test the file permissions of HDFS. During these tests I see strange behavior from Hadoop.
I created a new directory as the user "root". The command used was "hadoop fs -mkdir /user/test".
After this I changed the permissions of this directory to r, w, x only for the owner ("hadoop fs -chmod 700 /user/test").
Then I copied a new file into this directory ("hadoop fs -put test.txt /user/test") and changed the permissions of this file ("hadoop fs -chmod 600 /user/test/test.txt"), too. I created a new user and a new user group and added the new user to this group.
With this new user I accessed the folder ("hadoop fs -ls /user/test") and deleted the file ("hadoop fs -rm /user/test/test.txt").
With the correct permissions I should not have been able to do this. I ran the same test with the same file in the UNIX filesystem, and there the deletion failed, which is the behavior I expected from HDFS. I used the HDP 2.3 Sandbox with the default configuration. Has anyone seen the same behavior, or did I make a mistake?
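For reference, a hypothetical programmatic version of the same test (the hostname and the "testuser" name are placeholders); it attempts the delete as a different, non-privileged user through the HDFS client API:

```java
// Hypothetical sketch of the same permissions test using the HDFS client API.
// The namenode hostname and "testuser" are placeholders.
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class HdfsPermissionTest {
    public static void main(String[] args) throws Exception {
        final Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://sandbox.hortonworks.com:8020");

        // Act as a user that is neither the owner (root) nor in its group.
        UserGroupInformation testUser = UserGroupInformation.createRemoteUser("testuser");
        boolean deleted = testUser.doAs(new PrivilegedExceptionAction<Boolean>() {
            public Boolean run() throws Exception {
                FileSystem fs = FileSystem.get(conf);
                // With permission checking enabled this should throw AccessControlException.
                return fs.delete(new Path("/user/test/test.txt"), false);
            }
        });
        System.out.println("deleted = " + deleted);
    }
}
```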
... View more
Labels:
- Apache Hadoop
- Apache Ranger
10-04-2015
07:00 PM
I am trying to install Hue 3.8.1 on an HDP 2.3 cluster installed on SLES 11 by following this blog post: http://gethue.com/hadoop-hue-3-on-hdp-installation-tutorial/ But I am struggling to get all the prerequisite packages. Below are the packages I am not able to install with zypper (YaST): krb5-devel, mysql-devel, openssl-devel, cyrus-sasl-devel, cyrus-sasl-gssapi, sqlite-devel, libtidy, libxml2-devel, libxslt-devel, openldap-devel, python-devel, python-setuptools. Are there any online repos available from which I can get these? I was successful in installing Hue on a CentOS HDP cluster using the same steps.
I would really appreciate any help or pointers on this.
... View more
Labels:
- Cloudera Hue
10-04-2015
06:57 PM
Hi,
I took the HDPCD practice test on AWS but I am facing a few problems with Sqoop. For Task 10, I used:
sqoop export --connect jdbc:mysql://namenode:3306/flightinfo --table weather --export-dir /user/horton/weather --input-fields-terminated-by ',' --username root --password hadoop
and got the following:
Warning: /usr/hdp/2.2.0.0-2041/sqoop/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
15/10/03 18:46:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5.2.2.0.0-2041
15/10/03 18:46:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/10/03 18:46:56 INFO manager.SqlManager: Using default fetchSize of 1000
15/10/03 18:46:56 INFO tool.CodeGenTool: Beginning code generation
15/10/03 18:46:57 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Access denied for user 'root'@'%' to database 'flightinfo'
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Access denied for user 'root'@'%' to database 'flightinfo'
Also, I am not able to copy the Pig scripts into the solutions folder as mentioned:
cp -f flightdelays_clean.pig /home/horton/solutions/
cp: cannot create regular file '/home/horton/solutions/flightdelays_clean.pig': Permission denied
Am I missing something? Please help.
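As a hypothetical way to separate the MySQL grant question from Sqoop itself (host, database, and credentials are copied from the command above), a plain JDBC connection attempt exercises the same access check:

```java
// Hypothetical check: if this fails with the same "Access denied" error, the
// problem is the MySQL grants for root@'%', not the sqoop export command.
import java.sql.Connection;
import java.sql.DriverManager;

public class FlightInfoConnectionCheck {
    public static void main(String[] args) {
        String url = "jdbc:mysql://namenode:3306/flightinfo";
        try (Connection conn = DriverManager.getConnection(url, "root", "hadoop")) {
            System.out.println("Connected to " + conn.getCatalog());
        } catch (Exception e) {
            System.out.println("Connection failed: " + e.getMessage());
        }
    }
}
```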
... View more
Labels:
- Apache Pig
- Apache Sqoop
09-30-2015
04:59 PM
1 Kudo
Check out this tutorial which walks you through ORC support in Spark: http://hortonworks.com/hadoop-tutorial/using-hive-with-orc-from-apache-spark/
... View more
09-28-2015
02:06 PM
I have a main thread that opens a JDBC connection to HiveServer2, and this connection object is shared by multiple threads. Each thread has a prepared statement, executes a select query (not CRUD), and does some processing with the result set. I am trying this with Hive because I have some legacy code from the product I work on that I don't want to change, and I know that this works with Oracle. Below is the stack trace of the exception:
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_FetchResults(TCLIService.java:501)
at org.apache.hive.service.cli.thrift.TCLIService$Client.FetchResults(TCLIService.java:488)
at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:360)
at hivetrial.RSIterator2.run(ConcurrentRSIteration2.java:60)
at java.lang.Thread.run(Unknown Source)
Trying to understand if this is a limitation.
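For comparison, a minimal sketch (JDBC URL, credentials, and query are placeholders) in which each worker thread opens its own connection instead of sharing one, so that no two threads read from the same Thrift transport concurrently:

```java
// Hypothetical sketch: one HiveServer2 connection per thread rather than a shared one.
// The JDBC URL, credentials, and query are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PerThreadHiveQuery implements Runnable {
    private static final String URL = "jdbc:hive2://hiveserver2-host:10000/default";

    public void run() {
        try (Connection conn = DriverManager.getConnection(URL, "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM some_table LIMIT 10")) {
            while (rs.next()) {
                // process the row
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        for (int i = 0; i < 4; i++) {
            new Thread(new PerThreadHiveQuery()).start();
        }
    }
}
```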
... View more
Labels:
- Apache Hive
09-26-2015
05:24 PM
3 Kudos
After a bit of research, I found the hadoopsdk on CodePlex is a good place to start. As far as very basic connection examples go, try this blog for an example, but note that the connection for HDInsight is slightly different now that it's all using the Templeton interface, so this will get you going:
var db = new HiveConnection(
    webHCatUri: new Uri("http://localhost:50111"),
    userName: (string)"hadoop",
    password: (string)null);
var result = db.ExecuteHiveQuery("select * from w3c");
If you are looking to do full-on MapReduce on HDInsight, then you probably want to take a look at the C# MapReduce examples with the SDK on CodePlex. Note that the default HDInsight install also comes with some good samples, which include a bit of data to play with and some PowerShell scripts and .NET code to get you started. If there are other recommendations I am all ears.
... View more
09-26-2015
05:18 PM
1 Kudo
I am working on a solution where I will have a Hadoop cluster with Hive running, and I want to send jobs and Hive queries from a .NET application to be processed and get notified when they are done. What is the recommended API or library here?
... View more
Labels:
- Apache Hadoop
- Apache Hive
09-25-2015
08:10 PM
I have an HDP 2.2 cluster with FreeIPA configured, but we are having trouble accessing Hive JDBC via Knox. Following is the JDBC URI that we are using:
jdbc:hive2://xxxxxxxxxxx:8443/;ssl=true;sslTrustStore=/var/lib/knox/data/security/keystores/gateway.jks;trustStorePassword=xxxxxxxxxxxx?hive.server2.transport.mode=http;hive.server2.thrift.http.path=gateway/default/hive
Below is the error I am getting:
Keystore was tampered with, or password was incorrect (state=08S01,code=0)
It seems that the trustStore password does not match the one mentioned in the JDBC URI. I tried changing the Knox master password, but Ambari does not allow changing it. Is there any way I can change the trustStore password and create a new Knox master secret? Will it affect the other services if the master secret is changed? In addition, if I use the same URI for creating a Hive repository in Ranger, we get a "Connection failed" error. Is the same JDBC URI to be used in Ranger to create the repository for Hive? Note: if I set the Hive transport mode to "binary" instead of "http", then we are able to create the repository in Ranger, but in that case Hive over Knox will not work, as it requires "http" mode.
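For reference, a hypothetical snippet (host, passwords, and query are placeholders) showing the same URI used from JDBC code, which makes it easy to test the trustStore password on its own, outside of Ranger:

```java
// Hypothetical sketch: connecting to Hive through Knox over HTTPS.
// Host, truststore password, credentials, and query are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveOverKnox {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://knox-host:8443/;ssl=true;"
                + "sslTrustStore=/var/lib/knox/data/security/keystores/gateway.jks;"
                + "trustStorePassword=changeit?"
                + "hive.server2.transport.mode=http;"
                + "hive.server2.thrift.http.path=gateway/default/hive";
        try (Connection conn = DriverManager.getConnection(url, "guest", "guest-password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("show databases")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```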
... View more
Labels:
- Apache Hive
- Apache Knox
- Apache Ranger
09-25-2015
06:49 PM
1 Kudo
I am using the Hortonworks Sandbox as the Kafka server and trying to connect to Kafka from Eclipse with Java code. I use this configuration for the producer that sends the message:
metadata.broker.list=sandbox.hortonworks.com:45000
serializer.class=kafka.serializer.DefaultEncoder
zk.connect=sandbox.hortonworks.com:2181
request.required.acks=0
producer.type=sync
where sandbox.hortonworks.com is the sandbox name to which I connect. In the Kafka server.properties I changed this configuration:
host.name=sandbox.hortonworks.com
advertised.host.name=<system IP on which my Eclipse is running>
advertised.port=45000
I did the port forwarding also. I am able to connect to the Kafka server from Eclipse, but while sending the message I get the exception "Failed to send messages after 3 tries."
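For reference, a minimal sketch of the 0.8-style Java producer using the same properties (the topic name and payload are placeholders):

```java
// Hypothetical sketch of a Kafka 0.8 producer with the configuration above.
// Topic name and message payload are placeholders.
import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class SandboxProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "sandbox.hortonworks.com:45000");
        props.put("serializer.class", "kafka.serializer.DefaultEncoder");
        props.put("request.required.acks", "0");
        props.put("producer.type", "sync");

        Producer<byte[], byte[]> producer = new Producer<byte[], byte[]>(new ProducerConfig(props));
        try {
            producer.send(new KeyedMessage<byte[], byte[]>("test", "hello from eclipse".getBytes()));
        } finally {
            producer.close();
        }
    }
}
```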
... View more
Labels:
- Apache Kafka
09-25-2015
02:54 PM
3 Kudos
Apache Spark 1.3.1 is part and parcel of the HDP 2.3 release. The easiest way to get started with Spark on HDP is to download the latest Hortonworks Sandbox and then peruse the variety of tutorials on Spark we published here.
... View more
09-25-2015
02:40 PM
4 Kudos
Raspberry Pi is a very inexpensive system often hooked up to a variety of sensors. It would be useful to use the Raspberry Pi as the host for Apache NiFi to ingest and coordinate the data from sensors before transporting the stream to an alerting system or persistent storage.
... View more
Labels:
- Apache NiFi
09-25-2015
02:26 PM
That's a great idea. Will create a new tutorial with the Ambari Capacity Scheduler View.
... View more
09-24-2015
07:47 PM
5 Kudos
We are excited to announce the general availability of Hortonworks Sandbox with HDP 2.3 in the Microsoft Azure Gallery. Hortonworks Sandbox is already a very popular environment for developers, data scientists and administrators to learn and experiment with the latest innovations in Hortonworks Data Platform. The hundreds of innovations span Apache Hadoop, Kafka, Storm, Spark, Hive, Pig, YARN, Ambari, Falcon, Ranger and the other components that make up the HDP platform. We also provide tutorials to help you get a jumpstart on how to use HDP to implement an Open Enterprise Hadoop in your organization. Every component is updated, including some of the key technologies we added in HDP 2.3. This guide walks you through using the Azure Gallery to quickly deploy Hortonworks Sandbox on Microsoft Azure.
Prerequisite:
- A Microsoft Azure account (you can sign up for an evaluation account if you do not already have one).
Guide:
- Start by logging into the Azure Portal with your Azure account: https://portal.azure.com/
- Navigate to the Marketplace and search for Hortonworks. Click on the Hortonworks Sandbox icon. To go directly to the Hortonworks Sandbox on Azure page, navigate to http://azure.microsoft.com/en-us/marketplace/partners/hortonworks/hortonworks-sandbox/
- This will launch the wizard to configure Hortonworks Sandbox for deployment. Note down the hostname and the username / password that you enter in the next steps; you will need them to access the Hortonworks Sandbox once deployed. Also ensure you select an Azure instance of size A4 or larger for an optimal experience.
- Click Buy if you agree with everything on this page. At this point it should take you back to the Azure portal home page, where you can see the deployment in progress. You can see the progress in more detail by clicking on Audit.
- Once the deployment completes you will see a page with the configuration and status of your VM. Again, it is important to note down the DNS name of your VM, which you will use in the next steps. If you scroll down you can see the estimated spend and other metrics for your VM.
- Navigate to the home page of your Sandbox by pointing your browser to http://<hostname>.cloudapp.net:8888, where <hostname> is the hostname you entered during configuration. By navigating to port 8080 of your Hortonworks Sandbox on Azure you can access the Ambari interface for your Sandbox.
If you want a full list of tutorials that you can use with your newly minted Hortonworks Sandbox on Azure, go to http://hortonworks.com/tutorials. HDP 2.3 leverages the Ambari Views Framework to deliver new user views and a breakthrough user experience for both cluster operators and data explorers. Happy Hadooping with Hortonworks Sandbox!
... View more
Tags:
- Sandbox & Learning
09-21-2015
05:32 PM
6 Kudos
One of the search use cases that I've been introduced to would require the ability to index text such as scanned text in png files. I set out to figure out how to do this with SOLR. I came across a couple of pretty good blog posts, but as usual, you have to put together what you learn from multiple sources before you can get things to work correctly (or at least that's what usually happens for me). So I thought I would put together the steps I took to get it to work. I used HDP Sandbox 2.3.
Step-by-step guide
1. Install dependencies - this provides support for processing pngs, jpegs, and tiffs:
yum install autoconf automake libtool
yum install libpng-devel
yum install libjpeg-devel
yum install libtiff-devel
yum install zlib-devel
2. Download Leptonica, an image processing library:
wget http://www.leptonica.org/source/leptonica-1.69.tar.gz
3. Download Tesseract, an Optical Character Recognition engine:
wget http://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.02.tar.gz
4. Ensure proper variables and pathing are set. This is necessary so that when building Leptonica, the build can find the dependencies that you installed earlier. If this pathing is not correct, you will get "Unsupported image type" errors when running the tesseract command-line client. Also, when installing Tesseract, you will place language data in the TESSDATA_PREFIX dir.
[root@sandbox tesseract-ocr]# cat ~/.profile
export TESSDATA_PREFIX='/usr/local/share/'
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib:/usr/lib64
5. Build Leptonica:
tar xvf leptonica-1.69.tar.gz
cd leptonica-1.69
./configure
make
sudo make install
6. Build Tesseract:
tar xvf tesseract-ocr-3.02.02.tar.gz
cd tesseract-ocr
./autogen.sh
./configure
make
sudo make install
sudo ldconfig
7. Download Tesseract language(s) and place them in the TESSDATA_PREFIX dir defined above:
wget http://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.eng.tar.gz
tar xzf tesseract-ocr-3.02.eng.tar.gz
cp tesseract-ocr/tessdata/* /usr/local/share/tessdata
8. Test Tesseract - use the image in this blog post: http://blog.thedigitalgroup.com/vijaym/2015/07/17/using-solr-and-tikaocr-to-search-text-inside-an-image/ You'll notice that this is where I started. The 'hard' part of this was getting the builds correct for Leptonica, and the problem there was ensuring that I had the correct dependencies installed and that they were available on the path defined above. If this doesn't work, there's no sense moving on to SOLR.
[root@sandbox tesseract-ocr]# /usr/local/bin/tesseract ~/OM_1.jpg ~/OM_out
Tesseract Open Source OCR Engine v3.02.02 with Leptonica
[root@sandbox tesseract-ocr]# cat ~/OM_out.txt
‘ '"I“ " "' ./ lrast. Shortly before the classes started I was visiting a. certain public school, a school set in a typically English countryside, which on the June clay of my visit was wonder- fully beauliful. The Head Master—-no less typical than his school and the country-side—pointed out the charms of both, and his pride came out in the ?nal remark which he made beforehe left me. He explained that he had a class to take in'I'heocritus. Then (with a. buoyant gesture); “ Can you , conceive anything more delightful than a class in Theocritus, on such a day and in such a place?"
If you have text in your out file, then you've done it correctly!
9. Start the Solr sample - this sample contains the proper Extracting Request Handler for processing with Tika (https://wiki.apache.org/solr/ExtractingRequestHandler):
cd /opt/lucidworks-hdpsearch/solr/bin/
./solr -e dih
10. Use the SOLR Admin UI to upload the image. Go back to the blog post or to the RequestHandler page for the proper update/extract command syntax. From SOLR admin, select the tika core. Click Documents. In the Request-Handler (qt) field, enter /update/extract. In the Document Type drop-down, select File Upload. Choose the png file. In the Extracting Req. Handler Params box, type the following: literal.id=d1&uprefix=attr_&fmap.content=attr_content&commit=true Understanding all the parameters is another process, but literal.id is the unique id for the document. For more information on this command, start by reviewing https://wiki.apache.org/solr/ExtractingRequestHandler and then the SOLR documentation.
11. Run a query. From SOLR admin, select the tika core. Click Query. In the q field, type attr_content:explained and execute the query. http://sandbox.hortonworks.com:8983/solr/tika/select?q=attr_content%3Aexplained&wt=json&indent=true
12. Try it again. Use another png or supported file type. Be sure to use the same Request Handler Params, except provide a new unique literal.id.
Note that attr_content is a dynamic field, and it cannot be highlighted. If you figure out how to add an indexed and stored field to hold the image text, let me know 🙂
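For anyone who prefers to drive the upload from code instead of the Admin UI, here is a hypothetical SolrJ sketch of the same /update/extract call (core URL, file path, and document id are placeholders):

```java
// Hypothetical SolrJ version of the /update/extract upload done through the Admin UI above.
// Core URL, file path, and document id are placeholders.
import java.io.File;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class OcrImageIndexer {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient("http://sandbox.hortonworks.com:8983/solr/tika");

        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("/root/OM_1.jpg"), "image/jpeg");
        req.setParam("literal.id", "d2");           // unique id per document
        req.setParam("uprefix", "attr_");
        req.setParam("fmap.content", "attr_content");
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

        solr.request(req);
        solr.close();
    }
}
```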
... View more
Tags:
- Data Processing