Member since: 11-19-2015
Posts: 158
Kudos Received: 25
Solutions: 21
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11730 | 09-01-2018 01:27 AM |
| | 1096 | 09-01-2018 01:18 AM |
| | 3668 | 08-20-2018 09:39 PM |
| | 484 | 07-20-2018 04:51 PM |
| | 1461 | 07-16-2018 09:41 PM |
01-15-2018
04:54 PM
@Tu Nguyen - I'm afraid I don't understand your question. Spark does not use JDBC to communicate with Hive, but it can load Hive with any type of data that can be represented as a Spark Dataset. You may want to try "MSCK REPAIR TABLE <tablename>;" in Hive, though.
01-13-2018
02:58 AM
See https://community.hortonworks.com/questions/158942/ms-access-odbc-connection-to-hive.html
01-13-2018
02:57 AM
You can coalesce with Spark, or use MergeContent in NiFi, to compact files without resorting to -getmerge. You should ideally avoid ZIP files on HDFS. They are not a common format there because they are not splittable, so a large ZIP file can only be processed by a single mapper. Querying multiple part files of uncompressed CSV will be faster. If you need these files compressed in HDFS for archival while still being able to query them via Hive and other engines, use a compressed binary format such as ORC with Snappy. If you just want a CSV, use Beeline's output format argument, write the results to a file, and then ZIP that.
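If the Beeline route fits, a rough sketch (the JDBC URL, query, and output paths are placeholders to adjust for your environment):
# Placeholders throughout -- adjust the JDBC URL, query, and paths
beeline -u "jdbc:hive2://hiveserver2-host:10000/default" --outputformat=csv2 --silent=true -e "SELECT * FROM my_table" > /tmp/my_table.csv
zip /tmp/my_table.zip /tmp/my_table.csv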
01-09-2018
08:37 PM
According to the Compatibility Matrix, 1.0 should work with older clients. For broker 1.0.0, basic client compatibility is:
Java: any version
KIP-35 enabled clients: any version
Kafka Streams: any version
Kafka Connect: any version
Doesn't this depend on what you want to do with the data once you consume it, though? HDP itself doesn't provide Kafka 1.0, and NiFi doesn't have a 1.0 processor pair... Kafka Streams is a standalone Java process and therefore is not tied to HDP.
01-02-2018
10:26 PM
Ambari itself doesn't know those disks are mounted until you edit the host configurations for HDFS/YARN and update the data directory configurations. The Ambari Alert check will run periodically to see if those configured disks are mounted, then the agent will update the dashboard.
12-21-2017
06:45 PM
@Geoffrey Shelton Okot I'm not deploying Hue on a server with HDP libraries, so there are no Hadoop client libraries that would be related. I was able to build Hue from source, build an RPM, and then install and template the configurations on both CentOS 6 and CentOS 7 with Puppet. Both Hue installs point at the same HDP 2.5.3 cluster, which has been running the same endpoints (NameNode, ResourceManagers, HiveServer, etc.) since HDP 2.2, so the first link I provided in the other answer is all that needs to be referenced. The Hue config file itself doesn't change significantly between releases.
12-19-2017
10:11 PM
Not sure what you mean by "hive base". Can you add your hue.ini to the post? Are you using a kerberized cluster? Are you using the correct ports and FQDN server names? I have followed these, and it works fine for at least the HDFS, YARN, Oozie, Hive, and HBase pages of Hue. http://gethue.com/how-to-configure-hue-in-your-hadoop-cluster/ http://gethue.com/hadoop-hue-3-on-hdp-installation-tutorial/
12-18-2017
06:24 PM
The exception says hdfs://secondnamenode:9000/sample already exists. List the contents of the other cluster and delete the directory if that folder is indeed there. Otherwise, add the overwrite option to your Spark code.
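For example (the cluster URI comes from the exception; the commands are otherwise illustrative):
# Check whether the output path already exists on the other cluster, then remove it if so
hdfs dfs -ls hdfs://secondnamenode:9000/
hdfs dfs -rm -r hdfs://secondnamenode:9000/sample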
12-18-2017
05:34 PM
@Evan Tattam I'm sure it's supported, but the link is just down.
12-15-2017
09:17 PM
@Evan Tattam Is there a specific reason you want an old version? The 1.4.0 link is working for me. http://public-repo-1.hortonworks.com/HCP/centos7/1.x/updates/1.4.0.0/tars/metron/hcp-ambari-mpack-1.4.0.0-38.tar.gz
12-14-2017
04:32 PM
@Rakesh AN
I have not used Flume in a distributed fashion, but whatever agent you choose, it tails the logs on the server it runs on and ships them to the configured sink destinations. Running one agent per server is what lets you collect from different servers. Flume is near real-time, since it ships in configured batch sizes. It's not clear what doubt you have... Can you please explain how you've configured your Flume agents and the issues you are experiencing? The Flume documentation is fairly straightforward.
12-14-2017
04:25 PM
Your open source options include Oozie (included with HDP), Luigi, Airflow, and Azkaban. https://www.bizety.com/2017/06/05/open-source-data-pipeline-luigi-vs-azkaban-vs-oozie-vs-airflow/ NiFi is also capable of scheduling tasks. An enterprise option is Stonebranch UAC. "Best" is relative to your needs and who will support it.
12-14-2017
04:22 PM
@Markus Wilhelm, That API call tells the Ambari server that it has a service, but yes, now you'll need to add the NiFi component. Some combination of these requests should be able to show/add that: https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/components.md https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/create-component.md https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/create-hostcomponent.md I think the NiFi component name is NIFI_MASTER.
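A rough sketch of those calls with curl, assuming the component name really is NIFI_MASTER and using placeholder credentials, cluster, and host names:
# List the components currently registered on the service
curl -u admin:admin -H "X-Requested-By: ambari" http://ambari-server:8080/api/v1/clusters/MyCluster/services/NIFI/components
# Register the component on the service, then map it to a host
curl -u admin:admin -H "X-Requested-By: ambari" -X POST http://ambari-server:8080/api/v1/clusters/MyCluster/services/NIFI/components/NIFI_MASTER
curl -u admin:admin -H "X-Requested-By: ambari" -X POST http://ambari-server:8080/api/v1/clusters/MyCluster/hosts/nifi-host.example.com/host_components/NIFI_MASTER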
12-12-2017
07:26 PM
You should read the warning in the ExecSource docs against using tail -f: https://flume.apache.org/FlumeUserGuide.html#exec-source It even lists the other sources to consider instead, those being "Spooling Directory Source, Taildir Source or direct integration with Flume via the SDK." Personally, I like tools such as Filebeat or Fluentd for real-time collection of logs, sending them to either Elasticsearch or Solr, since those provide better tooling around log inspection.
12-12-2017
07:03 PM
@Markus Wilhelm, Since NiFi is installed on the machine itself, you might be able to post the service to the Ambari database: https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/create-service.md Something like this: POST http://ambari-server:8080/api/v1/clusters/:clusterName/services/NIFI
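A hedged sketch of that call with curl (placeholder credentials and cluster name; the X-Requested-By header is required by Ambari):
curl -u admin:admin -H "X-Requested-By: ambari" -X POST http://ambari-server:8080/api/v1/clusters/MyCluster/services/NIFI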
12-11-2017
11:30 PM
@Markus Wilhelm Well, the error does not say the installation of NiFi failed. Maybe the service won't show up in Ambari, but are you able to start NiFi on the server you installed it to?
12-11-2017
11:27 PM
The command syntax is
hadoop fs -copyFromLocal <localsrc> URI
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#copyFromLocal
If "URI" is not given, it will copy to /home/$(whoami), where in your case `$(whoami) == "hduser"`
In other words, running this command as the hduser linux account
hadoop fs -copyFromLocal afile in
Will copy "afile" to hdfs:///home/hduser/in If you want to copy to a different location on HDFS, give the full path to the destination
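For example, to copy into a specific HDFS directory instead (the destination directory here is just illustrative):
hadoop fs -copyFromLocal afile /data/incoming/afile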
12-08-2017
09:43 PM
It looks like it created the directory structure for the service.
2017-12-07 16:43:16,148 - Installing package nifi_3_0_* ('/usr/bin/yum -d 0 -e 0 -y install 'nifi_3_0_*'')
2017-12-07 16:44:58,917 - Directory['/usr/hdf/current/nifi'] {'owner': 'nifi', 'group': 'nifi', 'create_parents': True, 'recursive_ownership': True}
2017-12-07 16:44:58,918 - Creating directory Directory['/usr/hdf/current/nifi'] since it doesn't exist.
2017-12-07 16:44:58,918 - Following the link /usr/hdf/current/nifi to /usr/hdf/3.0.0.0-453/nifi to create the directory
2017-12-07 16:44:58,919 - Changing owner for /usr/hdf/3.0.0.0-453/nifi from 0 to nifi
2017-12-07 16:44:58,919 - Changing group for /usr/hdf/3.0.0.0-453/nifi from 0 to nifi
2017-12-07 16:44:58,921 - Directory['/var/run/nifi'] {'owner': 'nifi', 'create_parents': True, 'group': 'nifi', 'recursive_ownership': True}
2017-12-07 16:44:58,921 - Directory['/var/log/nifi'] {'owner': 'nifi', 'create_parents': True, 'group': 'nifi', 'recursive_ownership': True}
2017-12-07 16:44:58,922 - Directory['/var/lib/nifi'] {'owner': 'nifi', 'create_parents': True, 'group': 'nifi', 'recursive_ownership': True}
2017-12-07 16:44:58,922 - Directory['/var/lib/nifi/database_repository'] {'owner': 'nifi', 'create_parents': True, 'group': 'nifi', 'recursive_ownership': True}
2017-12-07 16:44:58,923 - Directory['/var/lib/nifi/flowfile_repository'] {'owner': 'nifi', 'create_parents': True, 'group': 'nifi', 'recursive_ownership': True}
2017-12-07 16:44:58,923 - Directory['/var/lib/nifi/provenance_repository'] {'owner': 'nifi', 'create_parents': True, 'group': 'nifi', 'recursive_ownership': True}
2017-12-07 16:44:58,924 - Directory['/usr/hdf/current/nifi/conf'] {'owner': 'nifi', 'create_parents': True, 'group': 'nifi', 'recursive_ownership': True}
2017-12-07 16:44:58,924 - Creating directory Directory['/usr/hdf/current/nifi/conf'] since it doesn't exist.
2017-12-07 16:44:58,924 - Changing owner for /usr/hdf/current/nifi/conf from 0 to nifi
2017-12-07 16:44:58,924 - Changing group for /usr/hdf/current/nifi/conf from 0 to nifi
It looks like you are using a custom local repository. Have you tried using the Hortonworks ones?
12-06-2017
08:25 PM
1 Kudo
@Ravikiran Dasari, You can of course install NiFi as an extra service, just like anything else. You are not locked in to only the packages HDP provides; you just lose the advantage of using Ambari to monitor and configure it. Feel free to read over the NiFi installation documentation if you want to use it, or you can install HDF services (such as NiFi) onto your existing HDP cluster. If you want to use Flume, it seems there is an external FTP source; however, I personally don't know how to install or configure it. Also see https://community.hortonworks.com/questions/150882/ftp-files-to-hdfs.html
12-05-2017
11:31 PM
You should use Hive partitioned tables.
Make your folders like this
/user/hadoop/tableName/day=2017-12-04/
/user/hadoop/tableName/day=2017-12-05/
/user/hadoop/tableName/day=2017-12-06/
And your corresponding Hive query would be:
CREATE EXTERNAL TABLE test_table (id INT, name STRING)
PARTITIONED BY (day STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hadoop/tableName';
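Note that Hive won't see partition directories written straight to HDFS until they are registered; a quick sketch of doing that through Beeline (the JDBC URL is a placeholder):
# Register all partition directories found under the table location
beeline -u "jdbc:hive2://hiveserver2-host:10000/default" -e "MSCK REPAIR TABLE test_table;"
# Or add one partition at a time
beeline -u "jdbc:hive2://hiveserver2-host:10000/default" -e "ALTER TABLE test_table ADD PARTITION (day='2017-12-04');"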
12-05-2017
11:26 PM
NiFi is not your only option. You could install a Flume agent on the SFTP server to read this folder as a spooling directory, you could use Spark to read from the FTP directory and write to HDFS since it's just a filesystem, or you could add an FTP Java client to your own code and read from the folder. Whatever route you choose, you need one of:
1. Additional software installed on the SFTP server itself.
2. A process "upstream" of the SFTP server that also sends files to HDFS. That could be via WebHDFS, HttpFS, or an NFS Gateway.
3. Some software between that server and HDFS that HDP does not provide out of the box. This includes NiFi, but StreamSets is another option.
The official documentation for those tools will tell you more than I can here. If you want to use HDF, see whether this documentation suits your needs: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.1.1/bk_installing-hdf-and-hdp/content/ch_install-ambari.html
12-04-2017
10:09 PM
2 Kudos
NiFi has GetFTP and PutHDFS processors. Are you using an HDF cluster?
12-01-2017
11:22 PM
Is topic deletion enabled at the broker level (delete.topic.enable=true) across the entire cluster, and did you restart the brokers after enabling it? Since the disk is full, Kafka and related services may be refusing to start; have you verified those processes are actually running on the machine? The nuclear, manual option would be to delete the topic data from the broker disks, but you must also purge the ZooKeeper records for the topic.
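If you do go the manual route, a rough outline under HDP-style paths (the topic name, ZooKeeper quorum, and data directory are placeholders; the data directory depends on your log.dirs setting):
# Normal deletion path, once delete.topic.enable=true and the brokers have been restarted
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper zk1:2181 --delete --topic my_topic
# Manual cleanup only if the brokers are stopped: remove the on-disk segments...
rm -rf /kafka-logs/my_topic-*
# ...and the topic's ZooKeeper metadata
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server zk1:2181 rmr /brokers/topics/my_topic
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server zk1:2181 rmr /admin/delete_topics/my_topic
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server zk1:2181 rmr /config/topics/my_topic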
12-01-2017
04:14 PM
I don't think the message "Unable to lookup the cluster by ID" has anything to do with local repositories; that looks like a problem with how you named your cluster. See "clusterName=clusterID=-1". But yes, you had to first create "ambari.repo" and "hdp.repo" files in /etc/yum.repos.d/ that point to your local repo before you can even install ambari-server and the other related HDP packages on that machine. Try starting over. https://community.hortonworks.com/questions/1110/how-to-completely-remove-uninstall-ambari-and-hdp.html
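A minimal sketch of one such repo file, assuming an illustrative local mirror URL that you would replace with your own:
# Assumption: the baseurl below is a placeholder -- point it at your local mirror
cat > /etc/yum.repos.d/ambari.repo <<'EOF'
[ambari]
name=Ambari (local mirror)
baseurl=http://your-local-repo.example.com/ambari/centos7/2.x/updates/2.6.0.0
gpgcheck=0
enabled=1
EOF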
11-30-2017
09:32 PM
The repos are explicitly defined by an Ambari stack. For the HDP stack, these point at public-repo-1.hortonworks.com. You can reconfigure them yourself by logging into Ambari as an admin, then going to the username dropdown > Manage Ambari > Versions > version name. In my screenshot I have set the repos to point at Artifactory.
11-30-2017
09:06 PM
@Michael Bronson Topics are never automatically deleted. The logs are retained up to a configured number of bytes (log.retention.bytes) or for a period of time (log.retention.{hours, minutes, ms}); after that, log segments are purged or compacted according to another Kafka setting (log.cleanup.policy). All the configurations you seek are defined in the Kafka documentation, and you should really take these tunables into consideration when installing a production Kafka cluster.
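For example, to override retention on a single topic (the ZooKeeper quorum and topic name are placeholders; broker-wide defaults go in server.properties instead):
# Keep roughly three days of data on this topic (259200000 ms)
/usr/hdp/current/kafka-broker/bin/kafka-configs.sh --zookeeper zk1:2181 --alter --entity-type topics --entity-name my_topic --add-config retention.ms=259200000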
11-29-2017
06:57 PM
Ambari should manage this already, depending on the clients/services installed, and if there is no restart/refresh icon the configs are up to date. Feel free to spot-check the configurations for your services in /usr/hdp/current/ or /etc/hadoop/conf.
11-27-2017
07:58 PM
What documentation did you follow to install Hue? From what I have found, it does not ship with a systemd service. You should be able to extract the tarball from gethue.com and start the server.
11-27-2017
05:28 PM
1 Kudo
Connect to which part of Hadoop? HDFS, Hive, HBase? If HDFS, you can use WebHDFS from any programming language with an HTTP client, or you can include the hadoop-common library in your code via Maven, for example. If HBase, there are Java clients you can find. If Hive, you can use JDBC.
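For instance, a hedged sketch of the WebHDFS route with curl (the NameNode host, port, and paths are placeholders; older HDP releases use port 50070, Hadoop 3 uses 9870):
# List a directory over WebHDFS -- callable from any language with an HTTP client
curl -s "http://namenode-host:50070/webhdfs/v1/user/someuser?op=LISTSTATUS"
# Read a file, following the redirect to a DataNode
curl -s -L "http://namenode-host:50070/webhdfs/v1/user/someuser/afile.txt?op=OPEN"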
11-21-2017
09:48 PM
You can't clear HDFS on a single host, because HDFS is a filesystem abstraction over the entire cluster. You can clear the DataNode directories of a particular host (or format its disks), but the HDFS balancer will fill them back up over time, driven by the cluster's other data ingestion processes and the need to maintain 3 replicas of each file.