Member since: 04-25-2016
Posts: 579
Kudos Received: 609
Solutions: 111

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1120 | 02-12-2020 03:17 PM |
| | 883 | 08-10-2017 09:42 AM |
| | 7153 | 07-28-2017 03:57 AM |
| | 1322 | 07-19-2017 02:43 AM |
| | 1023 | 07-13-2017 11:42 AM |
06-07-2016
06:29 AM
5 Kudos
@nyadav No, Hive cannot query another data source unless a storage handler is defined for it. Hive has the concept of native and non-native tables: it knows how to manage native tables on its own, but it cannot work with a non-native table unless a storage handler is provided. To learn more about storage handlers, refer to this doc: https://cwiki.apache.org/confluence/display/Hive/StorageHandlers
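For illustration, a minimal sketch of a non-native table that uses the HBase storage handler (the table name, column family, and column mapping below are made up, and it assumes the Hive-HBase handler jars are available on Hive's classpath):
  -- Non-native table: Hive delegates storage to HBase via the storage handler
  CREATE TABLE hbase_backed_example (key INT, value STRING)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
  TBLPROPERTIES ("hbase.table.name" = "example_table");
Hive then routes reads and writes for this table through the handler instead of managing the storage itself.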
06-07-2016
05:38 AM
@Roberto Sancho Yes, please change the table DDL and use INSERT OVERWRITE to reload the data.
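As a rough sketch of the reload step (table names are placeholders; the exact DDL change depends on your case):
  -- Rewrite the table's contents from the existing/staging copy of the data
  INSERT OVERWRITE TABLE target_table
  SELECT * FROM source_table;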
06-07-2016
05:35 AM
@Roberto Sancho The configuration looks good. If you want to cache Hive metadata for a longer period of time, you can increase the hive.metastore.cache-ttl-seconds value.
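As a sketch, these cache settings go under configProps in the Hive storage plugin; the values below are only examples and the exact placement may depend on your Drill version:
  {
    "type": "hive",
    "enabled": true,
    "configProps": {
      "hive.metastore.uris": "thrift://hostname:9083",
      "hive.metastore.cache-ttl-seconds": "300",
      "hive.metastore.cache-expire-after": "access"
    }
  }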
06-06-2016
02:40 PM
1 Kudo
@Hemalatha Panneerselvam Is this what you are looking for?
SELECT count(col_name)
FROM table_name
WHERE col_name LIKE '%http%'
HAVING count(col_name) >= 3
06-06-2016
10:44 AM
3 Kudos
If your external table points to a location in HDFS and you put more CSV files into that table location with the same schema as the defined table, Hive will pick up the new data automatically.
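For illustration, a minimal sketch (the location, columns, and file name are placeholders):
  -- External table pointing at an HDFS directory; any matching file dropped
  -- into /data/events becomes visible to queries immediately
  CREATE EXTERNAL TABLE events (id INT, msg STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '/data/events';
After that, simply adding a file with hdfs dfs -put new_events.csv /data/events/ is enough; no ALTER TABLE or reload is needed.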
06-06-2016
07:24 AM
2 Kudos
I would suggest trying NiFi's PutHDFS processor; you can find more on this here: https://community.hortonworks.com/articles/7999/apache-nifi-part-1-introduction.html
06-05-2016
09:29 AM
@Roberto Sancho I could not follow your question here; could you please elaborate on what you are asking?
06-05-2016
06:26 AM
2 Kudos
It looks like the Atlas hook is enabled in your sandbox. If you don't want to use the Atlas hook, please check your hive-site.xml and change hive.exec.post.hooks from:
<property>
  <name>hive.exec.post.hooks</name>
  <value>org.apache.hadoop.hive.ql.hooks.ATSHook, org.apache.atlas.hive.hook.HiveHook</value>
</property>
to:
<property>
  <name>hive.exec.post.hooks</name>
  <value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
</property>
06-04-2016
04:19 PM
Drill uses both Java direct memory and Java heap memory for computation. If you have a Hive ORC table, Drill will do the computation in Drill's Java heap memory, not in Drill's direct memory. Depending on the Hive storage plugin configuration (shown below), during the query planning phase Drill queries your metastore service, identified by the property 'hive.metastore.uris', to learn the schema and other required information and prepare the query plan. For better performance, Drill also supports caching Hive metadata in the Drill cache, which is controlled by "hive.metastore.cache-ttl-seconds" and "hive.metastore.cache-expire-after". The cache-ttl-seconds value can be any non-negative value, including 0, which turns caching off. The cache-expire-after value can be "access" or "write": access indicates expiry after a read or write operation, and write indicates expiry after a write operation only.
{
  "type": "hive",
  "enabled": false,
  "configProps": {
    "hive.metastore.uris": "thrift://hostname:9083",
    "hive.metastore.sasl.enabled": "false",
    "fs.default.name": "hdfs://nmhostname/"
  }
}
06-04-2016
03:32 PM
4 Kudos
@Roberto Sancho The metastore comes into the picture when Drill queries data stored in Hive tables, and it is merely used to learn the schema of the Hive table; for many other datastores Drill can evaluate the schema on the fly. For most datastores Drill uses direct memory for all computation, but for Hive tables stored as ORC or Parquet it leverages the Hive ORC or Parquet reader to query the data, which reads the data into the Java heap. Does Drill keep the metastore in memory? No, Drill does not keep the complete metastore in memory; during the query parsing and planning phases it queries the Hive metastore service for the schema so that it can validate the query and plan accordingly.
06-03-2016
04:34 PM
3 Kudos
This is due to the memory required by the ORC writer while writing ORC files. You can limit the memory use by tweaking the value of orc.compress.size, which is 256KB by default. I am not sure about your heap size; start testing with an 8KB buffer using
  alter table table_name set tblproperties("orc.compress.size"="8192");
and see if it helps.
06-03-2016
03:54 PM
4 Kudos
No extra configuration is required; just run the Spark Thrift Server as the spark user using the following command:
  ./sbin/start-thriftserver.sh --master yarn-client --executor-memory 512m --hiveconf hive.server2.thrift.port=10015
06-03-2016
01:17 PM
1 Kudo
@Kumar Sanyam
Could you please check whether the ANTLR runtime is available on the Pig classpath, using:
  pig -printCmdDebug | grep --color antlr
06-03-2016
09:47 AM
2 Kudos
You cannot turn on ACID for an existing table; you need to specify TBLPROPERTIES ('transactional'='true') at table creation time (in the DDL). For more on this, you can follow the Apache documentation: https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions
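As a sketch of such a DDL (table name, columns, and bucket count are placeholders), note that an ACID table also needs to be bucketed and stored as ORC:
  CREATE TABLE acid_example (id INT, value STRING)
  CLUSTERED BY (id) INTO 4 BUCKETS
  STORED AS ORC
  TBLPROPERTIES ('transactional'='true');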
06-02-2016
03:48 PM
3 Kudos
From the provided logs it seems there is some problem with Nimbus or ZooKeeper; please check the Nimbus and ZooKeeper logs to identify the problem.
06-02-2016
12:38 PM
1 Kudo
It looks like you do not have enough resources available with the ResourceManager, and you are also unable to check the available resources because access to the RM UI is denied. Try accessing the UI by replacing http://sandbox.hortonworks.com:8042 with http://<IP_OF_SANDBOX>:8042 and ensure that the IP is reachable from your host. As this is a single-node setup, you can also run yarn node -list to get the node id and then yarn node -status <node-id>.
06-02-2016
12:29 PM
Good to see that worked. Due to a limitation with transactional tables, you cannot use SORTED BY; it may be supported in a future version.
06-02-2016
11:41 AM
1 Kudo
Do you have multiple versions of Python installed on your machine, or are you working in a Python test/virtual environment? What is your PYTHONPATH?
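A quick way to check, as a sketch (run these in the same shell/environment you launch the job from):
  # list every python on PATH and see which one runs by default
  which -a python
  python -c "import sys; print(sys.executable); print(sys.path)"
  # any explicit override of the module search path?
  echo "$PYTHONPATH"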
06-02-2016
11:09 AM
1 Kudo
numpy is missing here; install it using pip install numpy.
06-01-2016
04:29 PM
Is there any ZooKeeper client running on your local machine that is trying to connect to the ZooKeeper server running on the VM?
06-01-2016
04:19 PM
3 Kudos
All of these values are picked up from your environment; see http://grepcode.com/file/repo1.maven.org/maven2/org.apache.zookeeper/zookeeper/3.3.1/org/apache/zookeeper/Environment.java#Environment. Please check your environment variables.
06-01-2016
12:17 PM
3 Kudos
If you commit the last consumed offset, you can resume consuming from Kafka at the next batch cycle, like this (kafka-python; topic, partition, and end are placeholders for your values):
  from kafka import KafkaConsumer, TopicPartition

  # commit the last consumed offset
  consumer = KafkaConsumer(bootstrap_servers='localhost:9092')
  tp = TopicPartition(topic, partition)
  consumer.assign([tp])      # seek/commit need an assigned partition
  consumer.seek(tp, end)     # end = offset you finished consuming at
  consumer.commit()

  # now start consuming from Kafka when the job restarts at the next batch cycle
  consumer.assign([tp])
  start = consumer.committed(tp)
  consumer.seek(tp, start)
06-01-2016
11:06 AM
with "transactional"="true" you are not able to compile this DDL statement, transactional table wont allow sorted column, are you able to successfully execute this statement?
06-01-2016
10:50 AM
5 Kudos
@Sanjeev Verma You can use the following approaches to get external configuration inside the topology:
1: Pass the arguments on the command line, like this:
storm jar storm-jar topology-name -c sKey=sValue -c key1=value1 -c key2=value2 >/tmp/storm.txt
2: Create a simple Java resource file (a properties file) and pass its path as an argument to your topology's main class; in the main method, read the properties from the file and build the Storm configuration object using conf.put() (see the sketch below).
3: Create a separate YAML file and read it through the Utils methods provided by the Storm API, e.g. Utils.findAndReadConfigFile(); for more documentation see https://nathanmarz.github.io/storm/doc/backtype/storm/utils/Utils.html
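For option 2, a minimal sketch in Java, assuming the older backtype.storm package (it is org.apache.storm in Storm 1.x+); the file path, property keys, and topology wiring are placeholders:
  import java.io.FileInputStream;
  import java.util.Properties;
  import backtype.storm.Config;

  public class TopologyMain {
      public static void main(String[] args) throws Exception {
          // args[0] is the path to the properties file passed on the command line
          Properties props = new Properties();
          try (FileInputStream in = new FileInputStream(args[0])) {
              props.load(in);
          }

          // copy the external properties into the Storm configuration object
          Config conf = new Config();
          for (String name : props.stringPropertyNames()) {
              conf.put(name, props.getProperty(name));
          }
          // ... build the TopologyBuilder and submit the topology with conf ...
      }
  }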
06-01-2016
10:09 AM
4 Kudos
As you are using a transactional table, you cannot take advantage of SORTED BY on the fechaoprcnf column. Apart from partitioning, try creating a storage index on the table using
  tblproperties ("orc.create.index"="true", "orc.compress"="ZLIB", "orc.stripe.size"="268435456", "orc.row.index.stride"="10000")
The ORC stripe and index stride values here are the defaults; try tuning these values and compare the performance results.
06-01-2016
09:44 AM
2 Kudos
Considering you are using an ORC table: if you are not using an ACID table, it would be good to modify the table DDL to add clustered by (codnrbeenf) sorted by (fechaoprcnf). Further to this, you can create a storage-based index on the ORC table by specifying orc.create.index=true.
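A minimal sketch of such a DDL, assuming a non-ACID ORC table (column types, bucket count, and the remaining columns are assumptions):
  CREATE TABLE my_table (
    codnrbeenf STRING,
    fechaoprcnf STRING
    -- ... other columns ...
  )
  CLUSTERED BY (codnrbeenf) SORTED BY (fechaoprcnf) INTO 8 BUCKETS
  STORED AS ORC
  TBLPROPERTIES ("orc.create.index"="true");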
06-01-2016
05:16 AM
4 Kudos
@akeezhadath It seems you are not calling an action, so the job is never triggered. Spark transformations are lazily evaluated; can you run a terminal operation such as count or collect on filterwords and see whether you then observe the incremented accumulator value?
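For illustration, a minimal PySpark sketch (the data is made up and the name filterwords is reused from the discussion) showing that the accumulator only moves once an action runs:
  from pyspark import SparkContext

  sc = SparkContext("local[2]", "accumulator-demo")
  acc = sc.accumulator(0)
  words = sc.parallelize(["spark", "hive", "spark", "storm"])

  def keep_spark(w):
      acc.add(1)            # counts every record the filter sees
      return w == "spark"

  filterwords = words.filter(keep_spark)   # transformation only, nothing runs yet
  print(acc.value)                         # 0 - no job has been triggered
  print(filterwords.count())               # action: triggers the job, prints 2
  print(acc.value)                         # 4 - incremented while the job ran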
05-31-2016
05:01 PM
1 Kudo
Looking at this exception:
java.lang.NoSuchMethodError: org.apache.hadoop.hive.shims.HadoopShims.setHadoopSessionContext(Ljava/lang/String;)V
it seems the wrong version of the HadoopShims jar is on your classpath, one that either does not implement setHadoopSessionContext or has a different method signature. To troubleshoot this problem:
1: List the jars loaded by the HiveServer2 process:
lsof -p <HS2 process id> | grep -i jar | awk '{ print $9 }' > class-jar.txt
2: Find which of those jars contain the class (there could be multiple shim jars):
for jar in `cat class-jar.txt` ; do echo "$jar" ; jar -tvf "$jar" | grep --color 'org.apache.hadoop.hive.shims.HadoopShims' ; done
3: For each jar that contains HadoopShims, extract the class and inspect it:
jar xvf <jar> org/apache/hadoop/hive/shims/HadoopShims.class
javap org.apache.hadoop.hive.shims.HadoopShims
to verify that setHadoopSessionContext is available and has the expected signature.
05-31-2016
01:35 PM
2 Kudos
@yong yang Apart from adding the jars at each session level as suggested by @Jitendra Yadav, you can add them permanently using hive.aux.jars.path:
<property>
  <name>hive.aux.jars.path</name>
  <value>/var/lib/hive</value>
</property>
05-31-2016
07:11 AM
4 Kudos
Can you check the NiFi bootstrap logs to see if there is any port conflict?