Member since: 05-18-2016
Posts: 71
Kudos Received: 39
Solutions: 6
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 668 | 12-16-2016 06:12 PM
 | 206 | 11-02-2016 05:35 PM
 | 2096 | 10-06-2016 04:32 PM
 | 353 | 10-06-2016 04:21 PM
 | 564 | 09-12-2016 05:16 PM
03-07-2018
10:51 PM
Try adding this parameter to your Sqoop job: -D sqoop.export.records.per.statement=100. It lets you micro-batch your transactions.
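For reference, a minimal sketch of where the property goes in a Sqoop export command; the connection string, table name, and export directory below are placeholders, not from the original question:

# Hypothetical Sqoop export with micro-batched insert statements;
# connection details, table, and directory are placeholders.
sqoop export \
  -D sqoop.export.records.per.statement=100 \
  --connect jdbc:mysql://db-host/sales \
  --username etl_user -P \
  --table orders \
  --export-dir /warehouse/orders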
12-22-2017
06:27 PM
If you are using version 2.6 or later, you can turn on ACID and execute DELETE statements. You can audit your deletes from your application if that is required; otherwise, DELETE, MERGE and UPDATE can run directly on Hive with ACID.
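As a minimal sketch, assuming ACID is enabled on the cluster; the JDBC URL, table definition, and predicate below are placeholders:

# Hypothetical transactional table plus a delete, issued through beeline.
beeline -u jdbc:hive2://hive-host:10000/default -e "
CREATE TABLE customers_txn (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
DELETE FROM customers_txn WHERE id = 42;
"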
12-20-2017
04:35 PM
You can use EvaluateJsonPath to extract the value of attribute1, and RouteOnAttribute to make filtering decisions in your flow. Here is an example of EvaluateJsonPath: https://community.hortonworks.com/articles/64069/converting-a-large-json-file-into-csv.html Here is an example of RouteOnAttribute: https://community.hortonworks.com/articles/83610/nifi-rest-api-flowfile-count-monitoring.html
12-20-2017
04:20 PM
Is your NiFi installed on the same node as your Hadoop nodes? If not, have you copied core-site.xml to your NiFi node? As Aditya mentioned, the problem is that you are not able to log in to Hadoop with the current configuration. You might have to check your configuration. Please also try using the PutHDFS processor to debug your flow.
12-20-2017
04:16 PM
Are you using Ranger, and have you set any policies? If you have, you might have to check the policies to make sure they are not conflicting with the Oozie user. If you have not set up Ranger policies, then updating ACLs and file-level permissions could solve your problem.
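As a minimal sketch of the ACL/permission route; the path below is a placeholder for whatever directory the Oozie user needs to reach:

# Hypothetical HDFS ACL and permission fix for the oozie user.
hadoop fs -setfacl -m user:oozie:rwx /user/oozie/share
hadoop fs -chmod -R 755 /user/oozie/share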
10-27-2017
03:37 PM
1 Kudo
Matt, thanks for helping me with this. The real problem was with the InferAvroSchema processor, as it uses Kite to determine the data type of each record. If you have nulls or zeros as a record value, InferAvroSchema is not consistent, and during a merge, if a bin contains some values of double or float type and some zeros, ConvertJSONToAvro fails because the inferred schema is incorrect. It would be wise to configure the schema manually in ConvertJSONToAvro instead of using InferAvroSchema, if that makes sense.
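For illustration, a minimal hand-written Avro schema that could be supplied to the converter instead of relying on InferAvroSchema; the record and field names are placeholders. Declaring the numeric field as a nullable double is what sidesteps the zero/null inference problem:

# Hypothetical schema file; names and fields are placeholders.
cat > sensor_reading.avsc <<'EOF'
{
  "type": "record",
  "name": "SensorReading",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "value", "type": ["null", "double"], "default": null}
  ]
}
EOF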
10-19-2017
12:05 AM
Hi Matt, this happens randomly. Within my sample data set I ran it multiple times, and every time it fails at a different point. The interesting thing is, when I change the number of bins to 1 it does not do any merging at all, and when I remove the MergeContent processor it is absolutely fine. (Screenshots attached: screen-shot-2017-10-18-at-52930-pm.png, screen-shot-2017-10-18-at-53030-pm.png)
10-18-2017
07:06 PM
I am collecting HTTP/JSON data and eventually converting it into an ORC file so I can use a Hive table to read it. I am able to do this successfully, but I generate a lot of ORC files and Hive queries are slower, so I decided to use the MergeContent processor. Once I start using this processor before the ConvertCSVToAvro processor, the CSV-to-Avro conversion sporadically cannot convert some records, throws a warning message, and that data is lost. This is not consistent: sometimes all the data is processed correctly and sometimes a few records are not. I tested it with the same data set and every time it is a different record.
05-16-2017
02:36 PM
By inserting into the table using the INSERT INTO ... SELECT FROM ... construct you are essentially batching/micro-batching the data. You would have to create scripts/code to handle exceptions as you are loading. There are multiple ways of doing this, but there is no automated way unless you use ETL/ELT tools.
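As a minimal sketch of one micro-batch; the database, table names, and batch predicate below are placeholders:

# Hypothetical micro-batch load through beeline.
beeline -u jdbc:hive2://hive-host:10000/default -e "
INSERT INTO TABLE sales.orders_final
SELECT * FROM sales.orders_staging
WHERE load_date = '2017-05-15';
"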
05-15-2017
06:35 PM
How are you loading the table? Are you loading from CSV files, or using Sqoop? In either case you could enable logging at the job level to track changes and restart the batch. If you have written your own scripts, then on error you should try to write those files to error directories so you can go back, look for the errors, fix them and reload.
04-17-2017
01:24 AM
This works perfectly with Field Cloud. If you want to run some queries on Phoenix, following this along with the Phoenix and HBase tutorials makes for awesome demoable material.
04-04-2017
03:35 PM
Can you write triggers that copy the data into a newer table? Maybe you can introduce a timestamp and mark each transaction as an update, insert or delete. I know it may not be possible to influence the source databases.
04-03-2017
06:27 PM
Also, how big is the data in your table? Are you using some sort of LIMIT or WHERE clause when you run the query?
04-03-2017
06:23 PM
1 Kudo
Does this table have a timestamp column, or a column that can be used as a highest-value check column? If it does, then a Sqoop job would be the most efficient and consistent way to move data from the source to your destination without losing data. Please read this tutorial on extracting data from your source in a consistent way: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_dataintegration/content/incrementally-updating-hive-table-with-sqoop-and-ext-table.html
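As a minimal sketch of an incremental Sqoop job keyed on a timestamp column; the connection details, table, target directory, and column names below are placeholders:

# Hypothetical saved incremental job; all names are placeholders.
sqoop job --create orders_incremental -- import \
  --connect jdbc:mysql://db-host/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /staging/orders \
  --incremental lastmodified \
  --check-column last_update_ts \
  --merge-key order_id

sqoop job --exec orders_incremental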
04-03-2017
06:18 PM
Are these tables external tables? In the case of external tables you would have to manually clean the folders by removing the files and folders referenced by the table (using the hadoop fs -rm command).
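As a minimal sketch; the path below is a placeholder for the table's LOCATION:

# Hypothetical cleanup of an external table's storage directory.
hadoop fs -rm -r -skipTrash /data/external/my_table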
02-15-2017
02:45 PM
Can you give us more details? How much data are we talking about, and is it in Hive, files, etc.? If it is a small set of data, you can always copy from HDFS to local and copy back to the new sandbox using hadoop fs commands, or by using the Ambari HDFS and Hive views. If you have terabytes of data (which I don't think you do on a sandbox), then distcp can be used, but I have not tried it with two sandboxes running at the same time.
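As a minimal sketch for the small-data case; all paths below are placeholders, and the transfer between the two sandboxes is up to you:

# Hypothetical copy via the local filesystem.
hadoop fs -copyToLocal /user/hive/warehouse/mydb.db/mytable /tmp/mytable_backup
# transfer /tmp/mytable_backup to the new sandbox (e.g. with scp), then on that sandbox:
hadoop fs -copyFromLocal /tmp/mytable_backup /user/hive/warehouse/mydb.db/mytable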
02-03-2017
07:19 PM
1 Kudo
@Aditya, update yarn-site.xml. Find the parameters in the documentation, update them to increase your resources, and restart YARN and its affected components.
02-03-2017
07:17 PM
1 Kudo
Is this an external table? If it is, write a Perl/shell script that runs daily and, based on a pattern, removes files older than a certain date. If it is a managed table in ORC/Parquet format, then it depends on how you load the table. While loading, you could set up logic to populate a date column, partition the table by that date, and, as @Avijeet Dash mentioned, drop the partition that is 7 days old.
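As a minimal sketch for the managed, date-partitioned case; the database, table, and partition column names below are placeholders:

# Hypothetical daily cleanup dropping partitions older than 7 days.
CUTOFF=$(date -d '7 days ago' +%Y-%m-%d)
beeline -u jdbc:hive2://hive-host:10000/default -e "
ALTER TABLE logs_db.events DROP IF EXISTS PARTITION (load_date < '${CUTOFF}');
"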
02-02-2017
11:16 PM
1 Kudo
@Aditya, are you using the Hortonworks Hadoop distribution with Ambari? If you are using HDP, then go to Ambari, open the YARN configuration, switch to the Configs tab, and change the memory, container, and CPU settings to your liking. Restart the YARN and MapReduce components if they are impacted.
01-31-2017
04:22 PM
The database-agnostic, high-level way to go over the metastore metadata is the Hive metatool:

HIVE_CONF_DIR=/etc/hive/conf/conf.server/ hive --service metatool -executeJDOQL "select name from org.apache.hadoop.hive.metastore.model.MDatabase"
HIVE_CONF_DIR=/etc/hive/conf/conf.server/ hive --service metatool -executeJDOQL "select database.name + '.' + tableName from org.apache.hadoop.hive.metastore.model.MTable"

You can find the ORM data layouts here: https://github.com/apache/hive/blob/master/metastore/src/model/package.jdo
01-16-2017
09:46 PM
This is a great question. As @cduby pointed out, it's not available until Sqoop 2. But if you have to load data into tables under some sort of authorization, you could load the data directly into HDFS, then use Beeline to move the data into tables with authorization; you can then apply Ranger policies per user along with other features. It's a bit of a long way around, but it will get you closer to your audit and authorization requirements.
01-06-2017
06:23 PM
1 Kudo
This happened because your MySQL database got corrupted. I had the same thing happen and got this error: "Could not open or create the system tablespace. If you tried to add new data files to the system tablespace, and it failed here, you should now edit innodb_data_file_path in my.cnf back to what it was, and remove the new ibdata files InnoDB created in this failed attempt. InnoDB only wrote those files full of zeros, but did not yet use them in any way. But be careful: do not remove old data files which contain your precious data!" Once I removed and re-installed the sandbox from scratch, there were no errors and Hive started normally. Alternatively, you could uninstall MySQL and re-install it. If you already had data in Hive, back up a copy of your metastore, and when you re-install MySQL, restore the backup into the new version of MySQL.
01-06-2017
04:48 PM
Were you able to resolve this on the Oracle VirtualBox?
01-04-2017
10:02 PM
1 Kudo
The easiest way to copy files from Windows to HDFS is to use the Ambari Files view: use the upload button to load files from your OS into HDFS. You can then use hadoop fs commands to move the files from HDFS to the Unix filesystem inside your VM.
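As a minimal sketch of the second step; the HDFS path and local destination below are placeholders:

# Hypothetical copy of an uploaded file from HDFS to the VM's local filesystem.
hadoop fs -copyToLocal /user/maria_dev/input.csv /home/maria_dev/input.csv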
12-16-2016
06:12 PM
1 Kudo
Ambari always runs on port 8080, so connecting to http://localhost:8080 should take you directly to the Ambari login. Zeppelin runs on port 9995.
12-15-2016
09:07 PM
Run sudo su - hdfs, then execute your commands; the xdl3 user does not have write access to the /xdl/tmp directory. Also, I hope you don't have any ACLs set up.
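As a minimal sketch, assuming you would rather grant xdl3 write access than run everything as hdfs; the ownership and mode below are illustrative:

# Hypothetical permission fix so the xdl3 user can write to /xdl/tmp.
sudo su - hdfs
hadoop fs -chown xdl3:hdfs /xdl/tmp
hadoop fs -chmod 775 /xdl/tmp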
12-15-2016
09:04 PM
2 Kudos
Hortonworks Data Platform ships with Zeppelin and a variety of interpreters. Spark SQL and DataFrames can be used to visualize the data in this notebook.
12-15-2016
09:02 PM
Your host name is set to the cluster name, which is incorrect. Instead of hdfs://clustername/folder/file, use hdfs://hostname/folder/file; update it with your host name.
11-17-2016
03:30 PM
This is a great article. Can we do Atlas tagging on fields in HBase by tagging the external table? Can you apply Ranger policies to that?
11-03-2016
06:58 PM
Did it work? Were you able to process the data after increasing the YARN container size?