Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4067 | 08-20-2018 08:26 PM |
|  | 1963 | 08-15-2018 01:59 PM |
|  | 2390 | 08-13-2018 02:20 PM |
|  | 4140 | 07-23-2018 04:37 PM |
|  | 5046 | 07-19-2018 12:52 PM |
02-13-2017
05:11 AM
3 Kudos
The easiest option is Apache NiFi: use the UI to move data from any RDBMS into a Hive table. You can also use the Hive streaming option with Apache NiFi. Full details on how to do this are here: https://community.hortonworks.com/articles/45706/using-the-new-hiveql-processors-in-apache-nifi-070.html
Alternatively, you can use Sqoop. Full details on how to sqoop data from an RDBMS to Hive/Hadoop and back are here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/using_sqoop_to_move_data_into_hive.html
Just as an example, to move an entire table from MySQL into a Hive table named EMPLOYEES:
sqoop import --connect jdbc:mysql://db.foo.com/bar --table EMPLOYEES
Or only the latest data:
sqoop import --connect jdbc:mysql://db.foo.com/bar --table EMPLOYEES --where "start_date > '2010-01-01'"
Or using a query:
sqoop import --query 'SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE $CONDITIONS' --split-by a.id --target-dir /user/foo/joinresults
By default I would use NiFi, since it is the easiest way to get data into Hive/Hadoop.
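The examples above land the data in HDFS. To have Sqoop create and load the Hive table in the same run, the standard --hive-import and --hive-table options can be added; a minimal sketch, reusing the placeholder connection string and table from above (verify the options against your Sqoop version):
sqoop import --connect jdbc:mysql://db.foo.com/bar --table EMPLOYEES --hive-import --hive-table EMPLOYEES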
02-13-2017
02:33 AM
I am not sure about the Cloudera VM; my CDH VM has faulted on me several times, so I can't speak to that point. However, what you can do is go into the services you are using and set log4j properties to limit how much log history is retained. For example, for Kafka:
log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
log4j.appender.kafkaAppender.MaxFileSize=100MB
log4j.appender.kafkaAppender.MaxBackupIndex=9
This will only allow 9 backup logs. Here is a good article on how to control log sizes and retention for each HDP service: https://community.hortonworks.com/content/kbentry/8882/how-to-control-size-of-log-files-for-various-hdp-c.html
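As an illustration of the same pattern for another service, here is a sketch for the Hadoop RFA appender; the property names follow the standard Hadoop log4j template, and the values are illustrative:
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=10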
02-12-2017
02:52 PM
Yes, you can. This requires state to keep track of the number. Use a distributed map cache (DMC) to fetch and put your sequence; your DMC put would be the existing number plus 1.
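A minimal sketch of one way to wire this up, assuming the standard FetchDistributedMapCache and PutDistributedMapCache processors backed by a DistributedMapCacheClientService (key and attribute names are illustrative, and concurrent updates are not handled here):
FetchDistributedMapCache (Cache Entry Identifier = seq.counter, Put Cache Value In Attribute = current.seq)
--> UpdateAttribute (next.seq = ${current.seq:plus(1)})
--> ReplaceText (replacement value = ${next.seq}, since PutDistributedMapCache caches the FlowFile content)
--> PutDistributedMapCache (Cache Entry Identifier = seq.counter)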
02-11-2017
06:40 PM
1 Kudo
The sandbox plays the roles of Ambari, edge, master, and data node all on one machine. It is set up to get you up and running quickly to learn the Hadoop stack. In a production environment, you would separate the Ambari, edge, and master services (one or more on each node) and run some number of data nodes (minimum 3). You would scale your data nodes based on the compute and storage required for your workload.
02-11-2017
06:38 PM
1 Kudo
I can't say for sure why, but I would recommend suspending your VM when not in use; it is not designed to run as a long-running instance. To save your work, don't turn off the box, but instead suspend it. This makes things much easier when you resume activities.
02-11-2017
03:34 AM
On your sandbox, please confirm the Atlas service is running. If it is not, please enable it and rerun the Sqoop command above.
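One quick way to check is the Ambari REST API; this sketch assumes the sandbox defaults of port 8080, admin/admin credentials, and a cluster named Sandbox (adjust for your environment):
curl -u admin:admin http://localhost:8080/api/v1/clusters/Sandbox/services/ATLAS
The service state in the response should show STARTED.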
02-11-2017
03:31 AM
1 Kudo
The dbgen.jar was not created during the build. Please verify you have gcc installed.
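To check whether gcc is present, and to install it assuming a yum-based sandbox image:
gcc --version
yum install -y gcc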
02-11-2017
02:49 AM
1 Kudo
OK, I found what I was doing wrong. The PostHTTP processor has a Compression Level property. If you set this value to > 0, it will compress the content as gzip, so there is no reason to compress prior to PostHTTP if you are using gzip.
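For reference, a sketch of the simplified flow this implies, using the processors from this thread (values are illustrative):
Data --> PostHTTP (Compression Level = 2) --> ListenHTTP endpoint --> PutFile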
02-10-2017
10:53 PM
Here is my workflow:
Data --> CompressContent (gzip, level 2) --> PostHTTP, which posts to the ListenHTTP endpoint
ListenHTTP --> CompressContent (gzip, level 2, for decompression) --> PutFile
For decompression I set the compression format to gzip and level 2. However, when I look at the file post-decompression, it is still compressed. Any ideas?
By the way, I tested the CompressContent processor with Data --> CompressContent (gzip, level 2) --> CompressContent (gzip, level 2, for decompression), and it compresses and decompresses the data successfully. So this seems to be an issue around ListenHTTP feeding into CompressContent for decompression.
Labels:
- Apache NiFi
02-10-2017
06:54 PM
I used UpdateAttribute to create an attribute called compressvalue and set it to 2. Then I referenced this attribute in the PostHTTP processor's Compression Level property as $(compressvalue:toNumber()). The processor fails to validate, complaining my attribute is not an integer. Any ideas?
Labels:
- Apache NiFi