Member since: 09-23-2015
Posts: 88
Kudos Received: 109
Solutions: 1

My Accepted Solutions

Title | Views | Posted |
---|---|---|
 | 2209 | 08-24-2016 09:13 PM |
12-19-2016
04:11 PM
I recommend checking with ElasticSearch to see whether HDFS is supported as a storage medium for your ES indexes. It is not apparent from the documentation whether this is possible with ElasticSearch (Lucene): https://www.elastic.co/guide/en/elasticsearch/hadoop/5.1/es-yarn.html Alternatively, Solr (HDP Search) does support storing its indexes on HDFS: https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
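If you go the Solr route, a minimal start command per the "Running Solr on HDFS" page looks roughly like this (the NameNode host/port and HDFS path below are placeholders):

```bash
# SolrCloud mode with index data stored on HDFS; adjust host, port and path for your cluster
bin/solr start -c \
  -Dsolr.directoryFactory=HdfsDirectoryFactory \
  -Dsolr.lock.type=hdfs \
  -Dsolr.hdfs.home=hdfs://namenode:8020/user/solr
```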
12-19-2016
03:58 PM
Do you have NodeManager running on all machines, including the "Master" server running Ambari? This would explain why you are unable to use Ambari correctly while running YARN jobs (Sqoop). Go to the Hosts tab in Ambari and filter by the NodeManager component to find out which hosts are running it.
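As a complementary check from the command line (in addition to the Ambari Hosts tab), something like the following should show which hosts have an active NodeManager; the hostname below is a placeholder:

```bash
# List the NodeManagers currently registered with the ResourceManager
yarn node -list -all

# Spot-check whether a NodeManager process is running on a specific host
ssh master-host "ps -ef | grep -i nodemanager | grep -v grep"
```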
12-16-2016
03:19 AM
The Docker container executor will not be supported until some future version of HDP and Hadoop. The current Apache documentation for the LinuxContainerExecutor is available if you would like to try it on your own, but it is not yet supported in HDP 2.5 (the latest release).
12-16-2016
03:16 AM
Yes, Hive does have the ability to create indexes; however, I do not recommend using them. Most of the Hive engineering effort has gone into utilizing ORC's file-native indexes. Please try your performance test using Parquet, then ORC. I'd love to hear about the difference. Also, this test bench will create sample datasets for you to easily test both scenarios: https://github.com/hortonworks/hive-testbench
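As a rough sketch of such a comparison (the `sales` table, the `amount` column, and the query below are hypothetical placeholders, not from your schema):

```bash
# Create the same data set in Parquet and in ORC via CTAS
hive -e "CREATE TABLE sales_parquet STORED AS PARQUET AS SELECT * FROM sales;"
hive -e "CREATE TABLE sales_orc STORED AS ORC AS SELECT * FROM sales;"

# Run an identical query against each and compare the elapsed times
hive -e "SELECT COUNT(*) FROM sales_parquet WHERE amount > 100;"
hive -e "SELECT COUNT(*) FROM sales_orc WHERE amount > 100;"
```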
12-16-2016
01:53 AM
HDP 2.5.3 includes Sqoop 1.4.6 out of the box. There is no configuration needed for Sqoop, as it consists only of client libraries that you invoke when you run your Sqoop job. Here is the documentation for running Sqoop in HDP: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_dataintegration/content/ch_using-sqoop.html
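For reference, a typical Sqoop import invocation looks roughly like this (the JDBC URL, credentials, table, and target directory are placeholders):

```bash
# Import a MySQL table into HDFS; -P prompts for the password interactively
sqoop import \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username myuser -P \
  --table customers \
  --target-dir /user/hive/warehouse/customers \
  --num-mappers 4
```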
12-16-2016
01:50 AM
Please see these articles which describe how to use Hive and S3. Also - the HDC (Hortonworks Data Cloud) offering with Amazon includes more advanced S3 connectivity capability "HDC and S3": http://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/s3-hive/index.html "Using AWS S3 as the Hive warehouse": http://blog.sequenceiq.com/blog/2014/11/17/datalake-cloudbreak-2/
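A minimal sketch of pointing a Hive table at S3 (the bucket, path, and schema below are placeholders; s3a credentials are assumed to be configured in core-site.xml):

```bash
# External table whose data lives in S3 via the s3a connector
hive -e "
CREATE EXTERNAL TABLE clicks (
  user_id STRING,
  ts BIGINT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3a://my-bucket/warehouse/clicks/';
"
```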
12-16-2016
01:44 AM
The creation of files under /yarn/local/usercache is normal operation for any YARN job (including the MapReduce job launched by Sqoop). It is not recommended that you attempt to change this. Could you describe why you are concerned about this part of Sqoop operation?
12-16-2016
01:40 AM
Could you describe what you mean by Quick Install vs Full Install? Essentially there are three ways to install HDP:
1) Manual command-line install - http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_command-line-installation/content/ch_getting_ready_chapter.html
2) Ambari GUI install (my recommendation; setup sketch below) - http://docs.hortonworks.com/HDPDocuments/Ambari/Ambari-2.4.2.0/index.html
3) Ambari Blueprint install - https://cwiki.apache.org/confluence/display/AMBARI/Blueprints
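For option 2, the Ambari server setup on the master node looks roughly like this on RHEL/CentOS 7 (the repo URL and version are illustrative; use the repo from the Ambari install guide that matches your OS):

```bash
# Install and start the Ambari server, then continue in the web wizard
wget -nv http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.4.2.0/ambari.repo \
  -O /etc/yum.repos.d/ambari.repo
yum install -y ambari-server
ambari-server setup -s   # silent setup with defaults (embedded database)
ambari-server start      # then open http://<ambari-host>:8080 to run the cluster install wizard
```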
12-16-2016
01:38 AM
Can you describe "why" you are trying to mount the NFS gateway for ElasticSearch? There might be a better solution we can recommend. I have not seen this as a common integration with ElasticSearch and it may not be supported. While ElasticSearch may be able to write to a normal NFS-mounted drive, the HDFS NFS gateway does not implement all NFS capabilities; specifically, file append is supported but random write is not. https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html
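For reference, mounting the gateway on a client follows the HdfsNfsGateway doc linked above; the gateway hostname and mount point below are placeholders:

```bash
# Mount HDFS through the NFS gateway (NFSv3 over TCP); random writes will still fail
mount -t nfs -o vers=3,proto=tcp,nolock,noacl,sync nfs-gateway-host:/ /mnt/hdfs
ls /mnt/hdfs
```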
12-16-2016
01:30 AM
1) YARN memory at 100% is a good thing. It implies you are fully utilizing your cluster.
2) Can you show a screenshot of your Ambari UI where you cannot access the YARN -> Config tab?
3) "Incorrect key file for table '/tmp/" appears to be an issue with your MySQL table, not anything to do with your Hadoop config or Sqoop: http://stackoverflow.com/questions/2090073/mysql-incorrect-key-file-for-tmp-table-when-making-multiple-joins - possibly due to lack of /tmp space (quick checks below). Please correct the MySQL issue and retest.
4) You can typically see a copy of the SQL which Sqoop is attempting to run as part of its normal output when you run the command. Try running that SQL directly in MySQL to confirm whether or not this is a MySQL issue.
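A couple of quick checks for item 3, assuming MySQL is using /tmp for its temporary tables (adjust if your tmpdir differs):

```bash
# Confirm free space on the filesystem MySQL uses for temporary tables
df -h /tmp

# Confirm which directory MySQL is actually using for temporary tables
mysql -u root -p -e "SHOW VARIABLES LIKE 'tmpdir';"
```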
12-07-2016
03:38 PM
Issues:
1) In your table definition "create table ..." you do not specify the LOCATION attribute of your table, so Hive will default to looking for the files under the default warehouse directory path, while the location in your screenshot is under /user/admin/. You can run "show create table ..." to see where Hive thinks the table's files are located. By default Hive creates managed tables, where files, metadata and statistics are managed by internal Hive processes; a managed table is stored under the hive.metastore.warehouse.dir path, by default in a folder similar to /apps/hive/warehouse/databasename.db/tablename/. The default location can be overridden with the LOCATION clause during table creation.
2) You are specifying the format using hive.default.fileformat. I would avoid using this property. Instead, simply use "STORED AS TEXTFILE" or "STORED AS ORC" in your table definition (see the sketch below).
Please change the above, retest, and let us know how that works.
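A minimal sketch of both suggestions (the table name, columns, and the subfolder under /user/admin/ are placeholders based on your screenshot):

```bash
# See where Hive currently expects the table's files to live
hive -e "SHOW CREATE TABLE mytable;"

# Recreate the table as an external TEXTFILE table pointing at your data folder
hive -e "
CREATE EXTERNAL TABLE mytable (col1 STRING, col2 INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/admin/mydata/';
"
```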
12-07-2016
03:30 PM
Could you describe why you want to add this Ruby function to HBase shell? Also - why are you looking to count the number of columns? Do you mean column families in HBase or Column Qualifiers? If you can give a recap of your overall project, that will help. There may be a much easier solution with a different approach.
11-28-2016
09:51 PM
@Mark Herring should we convert this to an article instead of a question?
11-28-2016
09:41 PM
Could you explain what you mean by "Veritas" cluster agents? I'm curious because we most often work with Kafka message brokers and Storm.
11-28-2016
09:39 PM
Igor - sorry that we haven't been able to get to your question sooner. Have you been able to retest this in the latest version of HDP 2.5.0 with Storm 1.0.1? http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_release-notes/content/ch_relnotes_v250.html
11-28-2016
09:31 PM
Unfortunately, Kylin is not a product supported by Hortonworks at this time. Is your problem related to a component in the HDP stack, such as Hive? Can you share the full error text?
11-28-2016
09:28 PM
Can you please retry this using the latest HDP 2.5 sandbox? I'd rather make good use of your time working through issues on our latest Sandbox instance if possible - http://hortonworks.com/products/sandbox/
11-28-2016
09:27 PM
Unfortunately, Nutch is not a component that Hortonworks currently supports, nor one with which I have direct experience. Could you describe your use case in more detail? Perhaps we can find a more modern tool that can help you.
11-28-2016
09:25 PM
The problem appears to be related to your LLAP daemons not all being started correctly:
LLAP app 'llap0' in 'RUNNING_PARTIAL' state. LiveInstances: '2'. DesiredInstances: '4' after 98.7810809612 secs.
Can you share the Hive config details for how you are configuring your LLAP daemons (memory, number of daemons)? Instructions are available here. Are there any log file errors you can share from Ambari when the LLAP daemons/service start?
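In the meantime, the YARN side often shows why only 2 of the 4 daemons came up. A quick way to pull those logs (the application id below is a placeholder you would copy from the first command's output):

```bash
# Find the LLAP application (llap0) running on YARN
yarn application -list | grep -i llap

# Pull its logs and scan for errors
yarn logs -applicationId application_1234567890123_0001 | grep -iE "error|exception" | head -50
```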
11-28-2016
08:41 PM
Can you share the portions of your source code from all three applications which read/write to the Hive table? Can you clarify how you are seeing different results in your data query: are you seeing different row counts, or is it the number of partitions cached in memory that is the problem?
11-28-2016
08:35 PM
1) Movement between remote locations - I recommend using Kafka MirrorMaker or Apache NiFi (both part of Hortonworks Data Flow) to move data between datacenters (see the sketch below).
2) Websockets - can you elaborate on what your design flow would look like? Apache Kafka does not yet have a RESTful or WebSockets-based API. List of Kafka APIs here.
3) Offset management should only be handled within each local datacenter. E.g. you would not want to keep Kafka offsets or Zookeeper in sync across datacenters.
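A rough MirrorMaker invocation for item 1 (the path is typical for HDP installs; the two .properties files and the topic whitelist are placeholders you would create for your source and target clusters):

```bash
# Mirror matching topics from the remote datacenter's Kafka cluster into the local one
/usr/hdp/current/kafka-broker/bin/kafka-mirror-maker.sh \
  --consumer.config source-cluster-consumer.properties \
  --producer.config target-cluster-producer.properties \
  --whitelist "events.*"
```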
11-28-2016
08:32 PM
Are you trying to ensure even distribution (load balancing) to the Knox instances or HS2 instances? The Knox request to HS2 may have preference (stickiness) for the first HS2 instance to respond successfully. If you require true (even) load balancing - I recommend putting Apache HTTP mod_proxy in front of the Knox instances to guarantee your request goes to different HS2 instances. Details on Knox HA and load balancing here - http://knox.apache.org/books/knox-0-6-0/user-guide.html#High+Availability
11-28-2016
08:19 PM
Most likely you would prefer to pipe the results of your HBase shell command into your Ruby function. Could you show some of the source code or pseudo logic of your Ruby function so that we can verify?
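For example, something along these lines (the table name and script name are placeholders):

```bash
# Pipe a command into the HBase shell non-interactively and capture the output for post-processing
echo "count 'mytable'" | hbase shell > count_output.txt

# Or hand a Ruby script directly to the HBase shell's JRuby interpreter
hbase shell my_script.rb
```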
11-28-2016
08:17 PM
Because you also want the graph to be "acyclic" - meaning you do not want to have loops in your Oozie logic (graph). Cyclic vs Acyclic - https://www.quora.com/What-is-the-difference-between-a-cyclic-and-an-acyclic-graph-in-graph-theory
11-28-2016
08:04 PM
This could be due to either: 1) Better compression of your specific data when in RDD format 2) Project Tungsten and caching of memory off heap (which would make the on-heap memory usage appear smaller)
11-28-2016
07:57 PM
1 Kudo
My guess is that your local "Cluster A" config values are superseding your use of the "-D" option to overwrite the defaultFS parameter. E.g. your local Cluster A values may have higher priority. I would have expected that your second command with "hadoop fs -ls" should work to display the remote clusters file directory. Perhaps there was a typo or some other reason why this is not being picked up? Could you alternatively use WebHDFS command via REST API (bash or Python) to list directories? https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#LISTSTATUS
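For example (the remote NameNode host/port and path are placeholders; 50070 assumes the default non-HA, non-SSL HTTP port):

```bash
# Fully qualify the remote NameNode so the local fs.defaultFS is not used
hadoop fs -ls hdfs://remote-namenode:8020/user/myuser/

# Or list the directory over WebHDFS, bypassing the local Hadoop client config entirely
curl -i "http://remote-namenode:50070/webhdfs/v1/user/myuser/?op=LISTSTATUS"
```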
11-28-2016
07:47 PM
What do you mean by "then ORC count was double"? Can you share all the commands you used for step #1, as well as the table schema?
11-28-2016
07:46 PM
How did you write the ORC file to this location (Pig, Spark, NiFi, or other)? Can you show the schema of the table, the contents of the folder where the ORC files are written, and also any details/code on how the file was ingested?
11-23-2016
06:55 PM
Well done Ned!
08-25-2016
08:23 AM
Appreciate the correction 😉