Member since: 02-09-2015
Posts: 95
Kudos Received: 8
Solutions: 9
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5497 | 08-23-2021 04:07 PM |
| | 1472 | 06-30-2021 07:34 AM |
| | 1769 | 06-30-2021 07:26 AM |
| | 14138 | 05-17-2019 10:27 PM |
| | 3108 | 04-08-2019 01:00 PM |
04-08-2019
09:13 AM
Hi guys, I am using spark-shell (`spark-shell --master yarn --jars /usr/hdp/current/hive-warehouse-connector/hive-warehouse-connector_2.11-1.0.0.3.1.2.0-4.jar --conf spark.security.credentials.hiveserver2.enabled=false`) to read Hive tables with Spark. I am able to execute commands like create table, create database, show tables, and show databases, but I am not able to read data from tables. My code is as below:

```
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.createDatabase("spark_llap01", false)
hive.setDatabase("spark_llap01")
hive.createTable("hwx_table").column("value", "string").create()
hive.executeUpdate("insert into hwx_table values('1')")
hive.executeQuery("select * from hwx_table").show
```

Whenever I try to fetch data, I get this error:

```
java.lang.AbstractMethodError: Method com/hortonworks/spark/sql/hive/llap/HiveWarehouseDataSourceReader.createBatchDataReaderFactories()Ljava/util/List; is abstract
```

I used beeline to check whether the data had been written: the database exists along with the table, and when I queried the table the data was there.
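For completeness, here is a minimal compiled-application sketch of the same calls (the shell snippet above is Scala, but the HWC API is the same from Java); the database and table names come from the post, while the class and app names are placeholders. It also shows `hive.execute()`, which runs the statement over the HiveServer2 JDBC connection instead of the LLAP DataSource reader named in the AbstractMethodError, and which is typically capped to a small default number of result rows:

```java
import com.hortonworks.hwc.HiveWarehouseSession;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HwcReadSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hwc-read-sketch") // placeholder app name
                .getOrCreate();

        HiveWarehouseSession hive = HiveWarehouseSession.session(spark).build();
        hive.setDatabase("spark_llap01");

        // executeQuery() goes through the LLAP DataSource reader -- the code
        // path named in the AbstractMethodError above.
        Dataset<Row> viaLlap = hive.executeQuery("select * from hwx_table");
        viaLlap.show();

        // execute() runs the same statement over the HiveServer2 JDBC
        // connection (limited to a small default number of result rows),
        // which can help confirm whether only the LLAP reader path fails.
        Dataset<Row> viaJdbc = hive.execute("select * from hwx_table");
        viaJdbc.show();

        spark.stop();
    }
}
```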
Labels:
- Apache Hive
- Apache Spark
10-26-2018
06:50 AM
Hi, I have a Hortonworks cluster of multiple nodes and want to migrate the data (mainly HDFS and configuration files for services like ZooKeeper, etc.) to Cloudera, so that I can manage the cluster with Cloudera Manager and use Cloudera's products. Any suggestions for migration tools or steps? Best Regards,
04-05-2017
10:45 AM
Yes, you have to.
01-10-2016
05:35 AM
Don't forget to reload the collection to see the effect.
01-10-2016
05:34 AM
I found the solution. Go to the Solr bin directory and update the instance directory with the new configuration path:

```
cd /opt/cloudera/parcels/CDH/lib/solr/bin/
solrctl instancedir --update collection_name new_configuration_path/
```
01-05-2016
05:45 AM
1 Kudo
I see no need to contact ZooKeeper at all; all you need is to contact Solr directly. I have the code below working fine, try it:

```java
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class update_records {

    public static void main(String[] args) throws SolrServerException, IOException {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/xml_unstructured");

        // Atomic (partial) update: wrapping the new value in a map keyed by
        // "set" tells Solr to replace just that field on the existing document.
        Map<String, String> partialUpdate = new HashMap<String, String>();
        partialUpdate.put("set", "New title");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "12331131");       // unique key of the document to update
        doc.addField("link", "new link");
        doc.addField("title", partialUpdate); // partial-update map applied to the "title" field

        server.add(doc);
        server.commit();
    }
}
```
12-31-2015
05:55 AM
If your configuration works with smaller files, then the problem is almost certainly the configuration you are using, so I suggest checking this post; it might be helpful for large files: https://community.cloudera.com/t5/Data-Ingestion-Integration/Flume-HDFS-sink-Can-t-write-large-files/td-p/23456 (Joey's answer). Hope it helps, and good luck.
12-31-2015
04:48 AM
I found this URL very helpful, so if anyone is facing this problem it will help a lot: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

But for SolrCloud, there is another good way:

1. Configure the data import handler in solrconfig.xml by adding this part after any request handler in the file:

```
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">DIHconfigfile.xml</str>
  </lst>
</requestHandler>
```

2. Create the data import handler file "DIHconfigfile.xml" and place it next to solrconfig.xml. You can find more about DIH here (check the file data source part): https://wiki.apache.org/solr/DataImportHandler

3. Reload the core.

4. From the Solr web UI you can start indexing the file/files you specified in the DIH.

Happy indexing!
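As a companion to the Solr Cell/Tika link above, here is a minimal SolrJ sketch that posts a single file to the /update/extract handler; the core URL, file path, and document id are placeholders, and it assumes the extraction handler is enabled in solrconfig.xml. It uses the same SolrJ 4.x HttpSolrServer API as the earlier update example in this thread:

```java
import java.io.File;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class SolrCellIndexSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder core URL -- point this at your own collection.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/xml_unstructured");

        // Send one file to the extraction (Solr Cell / Tika) request handler.
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("/data/docs/sample.pdf"), "application/pdf");

        // "literal.*" parameters add fixed field values alongside the extracted text.
        req.setParam("literal.id", "sample-doc-1");

        // Commit in the same request so the document becomes searchable right away.
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

        server.request(req);
        server.shutdown();
    }
}
```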
12-31-2015
03:57 AM
I guess the configuration spoolDir.sinks.sink_to_hdfs1.hdfs.batchSize = X makes Flume move the file's lines in batches of X events at a time rather than as one unit, so you need to decide what the trigger for sending data to the channel should be: a certain number of lines, or a whole file. I prefer the whole file to be the trigger, but when the file is large the channel size becomes the bottleneck.
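For reference, here is a sketch of a spooling-directory agent configuration, assuming a setup like the one being discussed; the component names and numbers are illustrative (sink_to_hdfs1 is taken from the post), not recommendations. If I read the Flume documentation right, the source's batchSize is the number of events committed to the channel per transaction, the HDFS sink's hdfs.batchSize is the number of events written before a flush, and the channel's capacity and transactionCapacity are what limit how much of a large file can be in flight at once:

```
# Sketch of a spooling-directory -> file channel -> HDFS sink agent.
# Component names (agent1, spool_src, ch1, sink_to_hdfs1), paths, and numbers
# are placeholders, not recommendations.
agent1.sources = spool_src
agent1.channels = ch1
agent1.sinks = sink_to_hdfs1

# Spooling-directory source: batchSize is how many events go to the channel
# per transaction, not a cap on how much of the file is read.
agent1.sources.spool_src.type = spooldir
agent1.sources.spool_src.spoolDir = /var/flume/spool
agent1.sources.spool_src.batchSize = 1000
agent1.sources.spool_src.channels = ch1

# File channel: capacity needs to hold a large file's worth of events, and
# transactionCapacity must be at least as large as the batch sizes used here.
agent1.channels.ch1.type = file
agent1.channels.ch1.capacity = 1000000
agent1.channels.ch1.transactionCapacity = 10000

# HDFS sink: hdfs.batchSize is the number of events written before a flush;
# rolling is done on time only here (every 300 seconds).
agent1.sinks.sink_to_hdfs1.type = hdfs
agent1.sinks.sink_to_hdfs1.channel = ch1
agent1.sinks.sink_to_hdfs1.hdfs.path = /flume/events
agent1.sinks.sink_to_hdfs1.hdfs.fileType = DataStream
agent1.sinks.sink_to_hdfs1.hdfs.batchSize = 10000
agent1.sinks.sink_to_hdfs1.hdfs.rollInterval = 300
agent1.sinks.sink_to_hdfs1.hdfs.rollSize = 0
agent1.sinks.sink_to_hdfs1.hdfs.rollCount = 0
```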
12-31-2015
01:12 AM
I found out that it's practically impossible to do this with Solr's default query parser. We can use a custom query parser that tokenizes the query and applies fuzziness to each word individually, since the default query parser handles a query of the form "testing big data"~SLOP_VALUE, where the slop is the allowed distance between the words, not a fuzzy value for each word.
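A workaround that avoids writing a custom parser is to build the per-word fuzzy query on the client side before sending it. A minimal SolrJ sketch, reusing the core URL from the earlier posts in this thread and ignoring escaping of special query characters:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PerWordFuzzyQuery {
    public static void main(String[] args) throws Exception {
        String userInput = "testing big data";
        int maxEdits = 1; // fuzzy edit distance applied to every term

        // Build "testing~1 big~1 data~1" instead of "testing big data"~N,
        // which only sets phrase slop, not per-word fuzziness.
        StringBuilder q = new StringBuilder();
        for (String term : userInput.trim().split("\\s+")) {
            if (q.length() > 0) {
                q.append(' ');
            }
            q.append(term).append('~').append(maxEdits);
        }

        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/xml_unstructured");
        SolrQuery query = new SolrQuery(q.toString()); // terms hit the default search field
        QueryResponse response = server.query(query);
        System.out.println(response.getResults().getNumFound() + " matching documents");
        server.shutdown();
    }
}
```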