About Mark_Heydenrych

Mark_Heydenrych · ‎07-07-2017

Hi Matt, I switched "Remove Trailing Newlines" to false and got 66443 fragments, as you suggested. This is a little confusing to me, because when I check the original file it has 66430 lines. However, your point is 100% correct. Thank you for opening the Jira request. While I wait for this, do you know of a workaround I can use in the meantime to get the number of fragments actually emitted? It would be slower, but would it be possible, after the split, to merge the fragments (which would now include no newlines) and split them again? Thanks, Mark

Mark_Heydenrych · ‎07-06-2017

I have a flow in NiFi which splits a file into individual lines, inserts those lines into a database, and updates a control table once they have all been inserted. The control table is only updated after every line has been inserted. To achieve this, fragment.index is compared to fragment.count: if these are equal, then I know that every line has been processed and we can move on to updating the control table. However, recently some of our files failed to update the control table. I have written the attributes of the flow files to disk, and they show something that confuses me: the number of flow files that comes out of the SplitText processor is 66430, which matches the number of lines in the file. However, the fragment.count attribute is 66443. Does anybody know why fragment.count would be incorrect, and how I can fix this?
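One way to check whether empty lines could account for a gap like 66430 versus 66443 is to count them in the source file directly. The sketch below is only a diagnostic aid, not part of NiFi: the function name `count_split_candidates` is hypothetical, and it assumes that an "empty" line is one containing nothing but carriage returns before the newline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the total number of lines in a file with the
# number of lines that contain something other than carriage returns.
count_split_candidates() {
  local file=$1
  local total nonempty

  # grep -c '' counts every line, including a final line without a newline.
  total=$(grep -c '' "$file")

  # Exclude lines that are empty or hold only CR characters (assumed definition
  # of an "empty" line for this sketch).
  nonempty=$(grep -c -v -e $'^\r*$' "$file")

  printf 'total=%s nonempty=%s empty=%s\n' \
    "$total" "$nonempty" "$((total - nonempty))"
}

# Usage (path is a placeholder):
# count_split_candidates /path/to/input.txt
```

If the `empty` figure matches the difference between fragment.count and the number of flow files received, empty lines are a likely explanation; if it does not, the cause lies elsewhere.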

Mark_Heydenrych · ‎12-06-2016

Rather than using the Cloudera JDBC driver, you can use the drivers Hortonworks provides at http://hortonworks.com/downloads/#data-platform

Mark_Heydenrych · ‎04-28-2016

On HDP 2.4, some services may have corrupted jar and tar.gz files on HDFS. The specific files I have seen broken are as follows:

    hive.tar.gz
    mapreduce.tar.gz
    hadoop-streaming.jar
    pig.tar.gz
    spark-hdp-assembly.jar
    sqoop.tar.gz
    tez.tar.gz

All of these are found in the /hdp/apps/<hdp-version> directory. On my install, they all had zero size (reported as 0.1 kB on HDFS File View). This led to errors in a variety of services, including the following:

    gzip: /foo/bar/yarn/local/filecache/11_tmp/tmp_mapreduce.tar.gz: unexpected end of file
    tar: This does not look like a tar archive
    Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

There may be other errors, but those are the ones I personally experienced.

This is fairly easy to fix. Each corrupt file has a healthy version on the local file system. The healthy version must be copied from the local system to HDFS, replacing the corrupt version. For example, to update Tez, perform the following:

    $ hdfs dfs -rm /hdp/apps/<hdp-version>/tez/*
    $ hdfs dfs -put /usr/hdp/current/tez-client/lib/tez.tar.gz /hdp/apps/<hdp-version>/tez/
    $ hdfs dfs -chmod 444 /hdp/apps/<hdp-version>/tez/tez.tar.gz

Problems caused by a corrupt tar on Tez should now be fixed.
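The same three steps can be wrapped in a small function so they can be repeated for each affected file. This is a minimal sketch rather than a tested procedure: the function name `replace_hdfs_archive` and the `HDFS_CMD` override are assumptions added here so the commands can be dry-run. It removes only the one target file instead of everything in the directory, and it refuses to upload a local copy that is missing or zero-sized. Only the Tez paths come from the post; local paths for the other files would need to be looked up on your own install.

```shell
#!/usr/bin/env bash
# HDFS_CMD is an assumption of this sketch: set it to "echo hdfs" to print the
# commands instead of running them.
HDFS_CMD=${HDFS_CMD:-hdfs}

# Hypothetical helper: replace one file on HDFS with a healthy local copy.
#   $1 = path to the healthy file on the local file system
#   $2 = target directory on HDFS
replace_hdfs_archive() {
  local local_file=$1 hdfs_dir=$2
  local name
  name=$(basename "$local_file")

  # Do not replace a corrupt file with another missing or empty one.
  if [ ! -s "$local_file" ]; then
    echo "skip: $local_file is missing or empty" >&2
    return 1
  fi

  # Remove the old copy, upload the healthy one, then make it read-only.
  $HDFS_CMD dfs -rm -f "$hdfs_dir/$name" &&
    $HDFS_CMD dfs -put "$local_file" "$hdfs_dir/" &&
    $HDFS_CMD dfs -chmod 444 "$hdfs_dir/$name"
}

# Usage, with the Tez paths from the example above (fill in your HDP version):
# replace_hdfs_archive /usr/hdp/current/tez-client/lib/tez.tar.gz "/hdp/apps/<hdp-version>/tez"
```

Running it once with `HDFS_CMD="echo hdfs"` shows exactly which commands would be issued before anything on HDFS is touched.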

Last Visited	‎05-25-2018 05:20 AM

Member Since	‎04-22-2016 05:48 AM
Posts	67
Kudos received	6
