Member since: 02-09-2015
Posts: 95
Kudos Received: 8
Solutions: 9
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5497 | 08-23-2021 04:07 PM |
| | 1472 | 06-30-2021 07:34 AM |
| | 1769 | 06-30-2021 07:26 AM |
| | 14138 | 05-17-2019 10:27 PM |
| | 3108 | 04-08-2019 01:00 PM |
04-08-2019
09:13 AM
Hi guys, I am using spark-shell (`spark-shell --master yarn --jars /usr/hdp/current/hive-warehouse-connector/hive-warehouse-connector_2.11-1.0.0.3.1.2.0-4.jar --conf spark.security.credentials.hiveserver2.enabled=false`) to read Hive tables with Spark. I am able to execute commands like create table, create database, show tables, and show databases, but I am not able to read data from tables. My code is as below:

```
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.createDatabase("spark_llap01", false)
hive.setDatabase("spark_llap01")
hive.createTable("hwx_table").column("value", "string").create()
hive.executeUpdate("insert into hwx_table values('1')")
hive.executeQuery("select * from hwx_table").show
```

Whenever I try to fetch data, I get this error:

```
java.lang.AbstractMethodError: Method com/hortonworks/spark/sql/hive/llap/HiveWarehouseDataSourceReader.createBatchDataReaderFactories()Ljava/util/List; is abstract
```

I used beeline to check whether the data had been written: the database exists along with the table, and when I queried the table the data was there.
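For completeness, here is a minimal compiled-application sketch of the same calls (the shell snippet above is Scala, but the HWC API is the same from Java); the database and table names come from the post, while the class and app names are placeholders. It also shows `hive.execute()`, which runs the statement over the HiveServer2 JDBC connection instead of the LLAP DataSource reader named in the AbstractMethodError, and which is typically capped to a small default number of result rows:

```java
import com.hortonworks.hwc.HiveWarehouseSession;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HwcReadSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hwc-read-sketch") // placeholder app name
                .getOrCreate();

        HiveWarehouseSession hive = HiveWarehouseSession.session(spark).build();
        hive.setDatabase("spark_llap01");

        // executeQuery() goes through the LLAP DataSource reader -- the code
        // path named in the AbstractMethodError above.
        Dataset<Row> viaLlap = hive.executeQuery("select * from hwx_table");
        viaLlap.show();

        // execute() runs the same statement over the HiveServer2 JDBC
        // connection (limited to a small default number of result rows),
        // which can help confirm whether only the LLAP reader path fails.
        Dataset<Row> viaJdbc = hive.execute("select * from hwx_table");
        viaJdbc.show();

        spark.stop();
    }
}
```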
Labels:
- Apache Hive
- Apache Spark
10-26-2018
06:50 AM
Hi, I have a Hortonworks cluster of multiple nodes and want to migrate the data (mainly HDFS and configuration files for services like ZooKeeper, etc.) to Cloudera, so that I can manage the cluster with Cloudera Manager and use Cloudera's products. Any suggestions for migration tools or steps? Best Regards,
04-05-2017
10:45 AM
Yes, you have to.
01-10-2016
05:35 AM
Don't forget to reload the collection to see the effect.
01-10-2016
05:34 AM
I found the solution. Go to the Solr bin directory and update the instance directory with the new configuration path:

```
cd /opt/cloudera/parcels/CDH/lib/solr/bin/
solrctl instancedir --update collection_name new_configuration_path/
```
01-05-2016
05:45 AM
1 Kudo
I see no need to contact ZooKeeper at all; all you need is to contact Solr directly. I have the code below working fine, try it:

```java
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class update_records {

    public static void main(String[] args) throws SolrServerException, IOException {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/xml_unstructured");

        // Atomic (partial) update: wrapping the new value in a map keyed by
        // "set" tells Solr to replace just that field on the existing document.
        Map<String, String> partialUpdate = new HashMap<String, String>();
        partialUpdate.put("set", "New title");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "12331131");       // unique key of the document to update
        doc.addField("link", "new link");
        doc.addField("title", partialUpdate); // partial-update map applied to the "title" field

        server.add(doc);
        server.commit();
    }
}
```
12-31-2015
05:55 AM
If your configuration works with smaller files, then the problem is almost certainly the configuration you are using, so I suggest checking this post; it might be helpful for large files: https://community.cloudera.com/t5/Data-Ingestion-Integration/Flume-HDFS-sink-Can-t-write-large-files/td-p/23456 (Joey's answer). Hope it helps, and good luck.
12-31-2015
04:48 AM
I found this URL very helpful, so if anyone is facing this problem it will help a lot: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

But for SolrCloud, there is another good way:

1. Configure the data import handler in solrconfig.xml by adding this part after any request handler in the file:

```
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">DIHconfigfile.xml</str>
  </lst>
</requestHandler>
```

2. Create the data import handler file "DIHconfigfile.xml" and place it next to solrconfig.xml. You can find more about DIH here (check the file data source part): https://wiki.apache.org/solr/DataImportHandler

3. Reload the core.

4. From the Solr web UI you can start indexing the file/files you specified in the DIH.

Happy indexing!
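As a companion to the Solr Cell/Tika link above, here is a minimal SolrJ sketch that posts a single file to the /update/extract handler; the core URL, file path, and document id are placeholders, and it assumes the extraction handler is enabled in solrconfig.xml. It uses the same SolrJ 4.x HttpSolrServer API as the earlier update example in this thread:

```java
import java.io.File;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class SolrCellIndexSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder core URL -- point this at your own collection.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/xml_unstructured");

        // Send one file to the extraction (Solr Cell / Tika) request handler.
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("/data/docs/sample.pdf"), "application/pdf");

        // "literal.*" parameters add fixed field values alongside the extracted text.
        req.setParam("literal.id", "sample-doc-1");

        // Commit in the same request so the document becomes searchable right away.
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

        server.request(req);
        server.shutdown();
    }
}
```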
12-31-2015
03:57 AM
I guess the configuration spoolDir.sinks.sink_to_hdfs1.hdfs.batchSize = X makes Flume move the file's lines in batches of X events at a time rather than as one unit, so you need to decide what the trigger for sending data to the channel should be: a certain number of lines, or a whole file. I prefer the whole file to be the trigger, but when the file is large the channel size becomes the bottleneck.
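For reference, here is a sketch of a spooling-directory agent configuration, assuming a setup like the one being discussed; the component names and numbers are illustrative (sink_to_hdfs1 is taken from the post), not recommendations. If I read the Flume documentation right, the source's batchSize is the number of events committed to the channel per transaction, the HDFS sink's hdfs.batchSize is the number of events written before a flush, and the channel's capacity and transactionCapacity are what limit how much of a large file can be in flight at once:

```
# Sketch of a spooling-directory -> file channel -> HDFS sink agent.
# Component names (agent1, spool_src, ch1, sink_to_hdfs1), paths, and numbers
# are placeholders, not recommendations.
agent1.sources = spool_src
agent1.channels = ch1
agent1.sinks = sink_to_hdfs1

# Spooling-directory source: batchSize is how many events go to the channel
# per transaction, not a cap on how much of the file is read.
agent1.sources.spool_src.type = spooldir
agent1.sources.spool_src.spoolDir = /var/flume/spool
agent1.sources.spool_src.batchSize = 1000
agent1.sources.spool_src.channels = ch1

# File channel: capacity needs to hold a large file's worth of events, and
# transactionCapacity must be at least as large as the batch sizes used here.
agent1.channels.ch1.type = file
agent1.channels.ch1.capacity = 1000000
agent1.channels.ch1.transactionCapacity = 10000

# HDFS sink: hdfs.batchSize is the number of events written before a flush;
# rolling is done on time only here (every 300 seconds).
agent1.sinks.sink_to_hdfs1.type = hdfs
agent1.sinks.sink_to_hdfs1.channel = ch1
agent1.sinks.sink_to_hdfs1.hdfs.path = /flume/events
agent1.sinks.sink_to_hdfs1.hdfs.fileType = DataStream
agent1.sinks.sink_to_hdfs1.hdfs.batchSize = 10000
agent1.sinks.sink_to_hdfs1.hdfs.rollInterval = 300
agent1.sinks.sink_to_hdfs1.hdfs.rollSize = 0
agent1.sinks.sink_to_hdfs1.hdfs.rollCount = 0
```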
12-31-2015
01:12 AM
I found out that it's practically impossible to do this with Solr's default query parser. We can use a custom query parser that tokenizes the query and applies fuzziness to each word individually, since the default query parser handles a query of the form "testing big data"~SLOP_VALUE, where the slop is the allowed distance between the words, not a fuzzy value for each word.
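A workaround that avoids writing a custom parser is to build the per-word fuzzy query on the client side before sending it. A minimal SolrJ sketch, reusing the core URL from the earlier posts in this thread and ignoring escaping of special query characters:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PerWordFuzzyQuery {
    public static void main(String[] args) throws Exception {
        String userInput = "testing big data";
        int maxEdits = 1; // fuzzy edit distance applied to every term

        // Build "testing~1 big~1 data~1" instead of "testing big data"~N,
        // which only sets phrase slop, not per-word fuzziness.
        StringBuilder q = new StringBuilder();
        for (String term : userInput.trim().split("\\s+")) {
            if (q.length() > 0) {
                q.append(' ');
            }
            q.append(term).append('~').append(maxEdits);
        }

        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/xml_unstructured");
        SolrQuery query = new SolrQuery(q.toString()); // terms hit the default search field
        QueryResponse response = server.query(query);
        System.out.println(response.getResults().getNumFound() + " matching documents");
        server.shutdown();
    }
}
```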