Member since: 05-18-2016
Posts: 71
Kudos Received: 39
Solutions: 6
My Accepted Solutions
Views | Posted
---|---
1517 | 12-16-2016 06:12 PM
571 | 11-02-2016 05:35 PM
3237 | 10-06-2016 04:32 PM
918 | 10-06-2016 04:21 PM
992 | 09-12-2016 05:16 PM
03-07-2018
10:51 PM
Try adding this parameter to your sqoop job: -D sqoop.export.records.per.statement=100. That way you can micro-batch your transactions.
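As a rough sketch of where the option goes (the JDBC URL, table name, and export directory below are placeholders, not from the original question), note that the generic -D option has to come before the tool-specific arguments:

sqoop export -D sqoop.export.records.per.statement=100 \
  --connect jdbc:mysql://db-host/mydb \
  --table my_table \
  --export-dir /user/hive/warehouse/my_table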
12-22-2017
06:27 PM
If you are using version 2.6 or later, you can turn on ACID and execute DELETE commands. If you need an audit trail, you can record the deletes from your application; otherwise DELETE, MERGE, and UPDATE can run directly on Hive with ACID.
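As a minimal sketch (the database, table, and column names are made up, and it assumes the table was created as a transactional ORC table per the ACID requirements), a delete could be run from the shell like this:

hive -e "SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
DELETE FROM mydb.customers WHERE customer_id = 42;"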
12-20-2017
04:35 PM
You can use EvaluateJsonPath to extract the value of attribute1, and RouteOnAttribute to make filtering decisions in your flow. Here is an example of EvaluateJsonPath: https://community.hortonworks.com/articles/64069/converting-a-large-json-file-into-csv.html Here is an example of RouteOnAttribute: https://community.hortonworks.com/articles/83610/nifi-rest-api-flowfile-count-monitoring.html
12-20-2017
04:20 PM
Is your NiFi instance installed on the same node as your Hadoop nodes? If not, have you copied core-site.xml to your NiFi node? As Aditya mentioned, the problem is that you are not able to connect to Hadoop with that configuration, so you may have to check it. Please also try using the PutHDFS processor to debug your flow.
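If the configuration files are missing on the NiFi node, a quick way to copy them over (the host name and paths below are placeholders) is something like:

scp hadoop-node.example.com:/etc/hadoop/conf/core-site.xml /opt/nifi/conf/
scp hadoop-node.example.com:/etc/hadoop/conf/hdfs-site.xml /opt/nifi/conf/

Then point the processor's Hadoop Configuration Resources property at those files.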
10-27-2017
03:37 PM
1 Kudo
Matt, thanks for helping me with this. The real problem was with the InferAvroSchema processor, as it uses Kite to determine the data type of a record. If you have nulls or zeros as record values, InferAvroSchema is not consistent; during a merge, if a bin contains some records with double or float values and some with zeros, ConvertJSONToAvro fails because the inferred schema is incorrect. It would be wise to configure the schema manually in ConvertJSONToAvro instead of using InferAvroSchema, if that makes sense.
10-19-2017
12:05 AM
Hi Matt, this happens randomly. With my sample data set I ran it multiple times, and every time it fails at a different point. The interesting thing is that when I change the number of bins to 1 it does not do any merge at all, and when I remove the MergeContent processor it works absolutely fine. screen-shot-2017-10-18-at-52930-pm.png screen-shot-2017-10-18-at-53030-pm.png
10-18-2017
07:06 PM
I am collecting HTTP/JSON data and eventually converting it into an ORC file so I can read it through a Hive table. I am able to do this successfully, but I generate a lot of ORC files and Hive queries are slower, so I decided to use the MergeContent processor. Once I add this processor before ConvertCSVToAvro, the CSV-to-Avro conversion sporadically fails on some records, throws a warning message, and that data is lost. This is not consistent: sometimes all the data is processed correctly and sometimes a few records are not. I tested it with the same data set and every time it is a different record.
Labels:
- Apache NiFi
05-16-2017
02:36 PM
By inserting into the table with the INSERT INTO ... SELECT ... FROM construct, you are essentially batching/micro-batching the data. You would have to create scripts or code to handle exceptions as you are loading. There are multiple ways of doing this, but there is no automated way unless you use ETL/ELT tools.
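As a rough illustration (the database, table, and column names are placeholders), one micro-batch driven from a script could look like:

hive -e "INSERT INTO TABLE mydb.target_table
SELECT * FROM mydb.staging_table
WHERE load_date = '2017-05-16';"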
05-15-2017
06:35 PM
How are you loading the table? Are you loading from CSV files, or using sqoop? With either approach you could enable logging at the job level to track changes and restart the batch. If you have written your own scripts, then on error you should write those files to error directories so you can go back, look for the errors, fix them, and reload.
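For example (the paths are made up), a load script could park a failed input file so it can be inspected, fixed, and replayed later:

hadoop fs -mkdir -p /data/errors/$(date +%Y%m%d)
hadoop fs -mv /data/incoming/orders_0042.csv /data/errors/$(date +%Y%m%d)/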
04-17-2017
01:24 AM
This works perfectly with Field Cloud. If you want to run some queries on Phoenix, following this along with the Phoenix and HBase tutorials makes for awesome demoable material.
04-04-2017
03:35 PM
Can you write triggers that copy the data into a newer table? Maybe you can also introduce a timestamp and mark whether each transaction was an insert, update, or delete. I know it may not be possible to influence the source databases.
04-03-2017
06:27 PM
Also, how big is the data in your table? Are you using some sort of LIMIT or WHERE clause when you run the query?
04-03-2017
06:23 PM
1 Kudo
Does this table have a timestamp column, or a column that can be used as a highest-value check column? If it does, then a sqoop job would be the most efficient and consistent way to move data from source to destination without losing data. Please read this tutorial on incrementally extracting data from your source in a consistent way: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_dataintegration/content/incrementally-updating-hive-table-with-sqoop-and-ext-table.html
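As a sketch (the connection string, table, and column names are placeholders), an incremental sqoop job keyed on a timestamp column could look like:

sqoop job --create orders_incremental -- import \
  --connect jdbc:mysql://db-host/sourcedb \
  --table orders \
  --incremental lastmodified \
  --check-column last_updated \
  --merge-key order_id \
  --target-dir /user/hive/incoming/orders

sqoop job --exec orders_incremental

Each run then picks up only the rows changed since the stored last-value.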
04-03-2017
06:18 PM
Are these tables external tables? In the case of external tables you would have to manually clean the folders by removing the files and directories that are referenced by the table (using the hadoop fs -rm command).
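For example (the path below is only an illustration; use the directory from the table's actual LOCATION):

hadoop fs -rm -r -skipTrash /data/external/my_table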
02-15-2017
02:45 PM
Can you give us more details? How much data are we talking about, and is it in Hive, flat files, etc.? If it is a small set of data, you can always copy from HDFS to local and copy it back to the new sandbox with hadoop fs commands, or use the Ambari HDFS and Hive views. If you have terabytes of data (which I don't think you would on a sandbox), then distcp can be used, but I have not tried it with two sandboxes running at the same time.
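A rough sketch of the copy-out/copy-in approach (the paths are placeholders):

# on the old sandbox
hadoop fs -get /user/hive/warehouse/mydb.db/my_table /tmp/export/my_table
# transfer /tmp/export/my_table to the new sandbox (e.g. with scp), then on the new sandbox
hadoop fs -put /tmp/export/my_table /user/hive/warehouse/mydb.db/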
02-03-2017
07:19 PM
1 Kudo
@Aditya update yarn-site.xml. Find the parameters in the documentation, update them to increase your resources, and restart YARN and its affected components.
02-03-2017
07:17 PM
1 Kudo
Is this an external table? If it is, write a Perl/shell script that runs daily and, based on a pattern, removes files older than a certain date. If it is a managed table in ORC/Parquet format, then it depends on how you load the table: while loading you could populate a date column, partition by that date, and, as @Avijeet Dash mentioned, drop the partitions that are more than 7 days old.
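As a sketch of the partition-drop approach (the table and partition column names are made up, and it assumes the table is partitioned by a string date column):

hive -e "ALTER TABLE mydb.events DROP IF EXISTS PARTITION (event_date < '$(date -d '7 days ago' +%Y-%m-%d)');"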
02-02-2017
11:16 PM
1 Kudo
@Aditya, are you using the Hortonworks Hadoop distribution with Ambari? If you are using HDP, go to Ambari, open the YARN configuration, switch to the Configs tab, and change the memory, container, and CPU settings to your liking. Restart the YARN and MapReduce components if they are impacted.
01-31-2017
04:22 PM
The database-agnostic, high-level way to go over the metadata is the Hive metatool:

HIVE_CONF_DIR=/etc/hive/conf/conf.server/ hive --service metatool -executeJDOQL "select name from org.apache.hadoop.hive.metastore.model.MDatabase"

HIVE_CONF_DIR=/etc/hive/conf/conf.server/ hive --service metatool -executeJDOQL "select database.name + '.' + tableName from org.apache.hadoop.hive.metastore.model.MTable"

You can find the ORM data layouts here: https://github.com/apache/hive/blob/master/metastore/src/model/package.jdo
12-16-2016
06:12 PM
1 Kudo
Ambari always runs on port 8080, so connecting to http://localhost:8080 should take you directly to the Ambari login. Zeppelin runs on port 9995.
12-15-2016
09:07 PM
Run sudo su - hdfs, then execute your commands; the xdl3 user does not have write access to the /xdl/tmp directory. Also, I hope you don't have any ACLs set up.
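If you would rather give xdl3 write access than switch users, something along these lines would do it as the hdfs superuser (the group name and mode here are just one option, not taken from your setup):

sudo su - hdfs
hadoop fs -chown xdl3:hdfs /xdl/tmp
# or, more permissively:
hadoop fs -chmod 775 /xdl/tmp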
12-15-2016
09:02 PM
Your host name is set to the cluster name, which is incorrect. Instead of hdfs://clustername/folder/file, use hdfs://hostname/folder/file and update it with your actual host name.
11-17-2016
03:30 PM
This is a great article. Can we do Atlas tagging of fields in HBase by tagging the external table? And can you apply Ranger policies to that?
11-02-2016
05:35 PM
As long as you are able to get the task accomplished either manually or via Ambari you will be OK.
10-06-2016
04:32 PM
Hi "Kaliyug Antagonist!!" Try setting the sqoop import as a sqoop job. The incremental data import is supported via sqoop job.. and not directly via sqoop import. check out the link for more examples https://community.hortonworks.com/questions/10710/sqoop-incremental-import-working-fine-now-i-want-k.html Hopefully this helps out.
10-06-2016
04:21 PM
Please contact Hortonworks support, who administer the test; you should be able to reschedule the test once the issues are resolved.
10-03-2016
10:36 PM
1 Kudo
This is awesome; now we can demo clustered NiFi servers to clients instead of a standalone instance.
09-19-2016
05:08 PM
Does Confluent Inc. support cost extra if someone wants to install this in production? Is there a Hortonworks out-of-the-box solution for ingesting data from RESTful APIs into HDFS?
09-16-2016
12:57 PM
I had questions about the need for the triggers. The main reasons for creating triggers in MySQL are:
1) The trigger sets a date/time stamp whenever a row is inserted or updated, and the NiFi processor polls on that column to pull the latest data from the RDBMS into NiFi and generate a flow file, so the date/time field is critical.
2) It also helps to figure out whether the record was inserted or updated in MySQL as well as in Hive, so we know the state of the record in the source system. This field is just being used for demo purposes; it is not really required.
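For reference, a minimal sketch of such a trigger (the database, table, and column names are made up for illustration):

mysql -u root -p sourcedb <<'SQL'
-- hypothetical columns: stamp every insert so a NiFi poll on last_updated can
-- pick up new rows, and record the operation type for the demo
CREATE TRIGGER orders_before_insert
BEFORE INSERT ON orders
FOR EACH ROW
SET NEW.last_updated = NOW(), NEW.op_type = 'INSERT';
SQL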
09-14-2016
02:06 PM
Thanks @Joshua Adeleke, but this solution is not acceptable for my client, as they want the metastore DB installed in Oracle. Were any support tickets opened for this issue?