Member since 04-22-2016, 67 posts, 6 kudos received, 2 solutions
07-07-2017
06:00 AM
Hi Matt, I switched "Remove Trailing Newlines" to false and the number of fragments went to 66443, as you suggested. This is a little confusing to me, because when I check the original file it has 66430 lines. However, your point is 100% correct. Thank you for opening the Jira request. While I wait for it, do you know of any workaround I can use in the meantime to get the number of fragments actually emitted? It would be slower, but would it be possible, after the split, to merge the fragments (which would now include no newlines) and split them again? Thanks, Mark
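One way to dig into the 66430 vs 66443 puzzle is to count the lines in the original file a few different ways. The sketch below is a hypothetical shell helper (`line_report` is not a NiFi or coreutils tool). It assumes the fragments SplitText skips when "Remove Trailing Newlines" is true are lines that are empty apart from their line ending; the 66443 figure with the setting turned off points that way, but it is an assumption, not something confirmed from the NiFi source.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NiFi): count lines in a file several ways.
# Assumption: the fragments SplitText skips when "Remove Trailing Newlines" is
# true are lines that are empty apart from their line ending (\n or \r\n).
line_report() {
  local file="$1"
  local newlines records blank
  # wc -l counts newline characters, so a last line without a trailing
  # newline is not included in this number.
  newlines=$(wc -l < "$file" | tr -d ' ')
  # awk counts every record, including an unterminated last line, and
  # counts records that are empty once a trailing \r is stripped.
  read -r records blank <<< "$(awk '{ sub(/\r$/, ""); if ($0 == "") b++ } END { print NR, b + 0 }' "$file")"
  echo "newlines=$newlines lines=$records blank=$blank nonblank=$((records - blank))"
}
```

For example, `line_report input.txt` on the source file: if `nonblank` matches the number of flow files emitted and `lines` matches fragment.count, that supports the blank-line explanation.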
07-06-2017
09:52 AM
I have a NiFi flow that splits a file into individual lines, inserts those lines into a database and, once they have all been inserted, updates a control table. The control table is updated only after every line has been inserted. To achieve this, the fragment.index attribute is compared to fragment.count: if they are equal, I know that every line has been processed and we can move on to updating the control table.
However, some of our files have recently failed to update the control table. I wrote the attributes of the flow files to disk, and they show something that confuses me: the number of flow files that come out of the SplitText processor is 66430, which matches the number of lines in the file, but the fragment.count attribute is 66443.
Does anybody know why fragment.count would be incorrect, and how I can fix this?
Labels: Apache NiFi
12-06-2016
11:52 AM
Rather than using the Cloudera JDBC driver, you can use the drivers Hortonworks provides at http://hortonworks.com/downloads/#data-platform
04-28-2016
10:35 AM
On HDP 2.4, some services may have corrupted jar and tar.gz files on HDFS. The specific files I have seen broken are as follows:
- hive.tar.gz
- mapreduce.tar.gz
- hadoop-streaming.jar
- pig.tar.gz
- spark-hdp-assembly.jar
- sqoop.tar.gz
- tez.tar.gz

All of these are found in the /hdp/apps/<hdp-version> directory. On my install, they all had zero size (reported as 0.1 kB in the HDFS File View). This led to errors in a variety of services, including the following:
gzip: /foo/bar/yarn/local/filecache/11_tmp/tmp_mapreduce.tar.gz: unexpected end of file
tar: This does not look like a tar archive
Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher
There may be other errors, but those are the ones I personally experienced.

This is fairly easy to fix. Each corrupt file has a healthy version on the local file system, which must be copied to HDFS to replace the corrupt version. For example, to fix Tez, run the following:

$ hdfs dfs -rm /hdp/apps/<hdp-version>/tez/*
$ hdfs dfs -put /usr/hdp/current/tez-client/lib/tez.tar.gz /hdp/apps/<hdp-version>/tez/
$ hdfs dfs -chmod 444 /hdp/apps/<hdp-version>/tez/tez.tar.gz

Problems caused by the corrupt Tez tarball should now be fixed.
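Since the same rm/put/chmod steps apply to each file in the list, they can be wrapped in a small script. The sketch below is hypothetical: `small_files` and `replace_hdfs_file` are illustrative names, not part of HDP or Hadoop. It assumes the usual 8-column `hdfs dfs -ls -R` layout, paths without spaces, and a 1024-byte cut-off for "suspiciously small", which is only a guess based on the 0.1 kB sizes seen in the File View. Unlike the manual steps, it deletes only the file being replaced rather than everything in the directory, and it checks gzip archives with `gzip -t` before uploading.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the manual fix (not part of HDP or Hadoop).

# Read `hdfs dfs -ls -R` output on stdin and print regular files smaller than
# a threshold in bytes (default 1024, an assumed cut-off). Assumes the usual
# 8-column layout: permissions replication owner group size date time path,
# and paths without spaces. Directories (permissions starting with "d") are skipped.
small_files() {
  local threshold="${1:-1024}"
  awk -v max="$threshold" '$1 ~ /^-/ && NF >= 8 && $5 + 0 < max + 0 { print $8 }'
}

# Replace one corrupt file in an HDFS directory with a healthy local copy,
# mirroring the manual rm / put / chmod 444 steps.
replace_hdfs_file() {
  local local_file="$1" hdfs_dir="$2"
  local name
  name=$(basename "$local_file")

  # Refuse to upload a missing or empty local copy.
  if [ ! -s "$local_file" ]; then
    echo "local copy $local_file is missing or empty" >&2
    return 1
  fi
  # For gzip archives, verify integrity first, so a truncated copy is caught
  # here instead of later as "unexpected end of file" on a YARN node.
  case "$name" in
    *.tar.gz | *.tgz)
      if ! gzip -t "$local_file" 2>/dev/null; then
        echo "local copy $local_file fails gzip -t" >&2
        return 1
      fi
      ;;
  esac

  hdfs dfs -rm -f "$hdfs_dir/$name" &&
    hdfs dfs -put "$local_file" "$hdfs_dir/" &&
    hdfs dfs -chmod 444 "$hdfs_dir/$name"
}
```

For example, `hdfs dfs -ls -R /hdp/apps/<hdp-version> | small_files` lists candidates, and `replace_hdfs_file /usr/hdp/current/tez-client/lib/tez.tar.gz /hdp/apps/<hdp-version>/tez` repeats the Tez fix. Only the Tez local path is given above; the healthy local copies of the other files need to be located on your own install.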