Member since: 08-03-2019
Posts: 186
Kudos Received: 34
Solutions: 26
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1913 | 04-25-2018 08:37 PM |
 | 5838 | 04-01-2018 09:37 PM |
 | 1550 | 03-29-2018 05:15 PM |
 | 6668 | 03-27-2018 07:22 PM |
 | 1957 | 03-27-2018 06:14 PM |
03-23-2018
06:18 AM
So the issue is with the "PK" column used to distribute the data across multiple mappers. It has always been recommended to use an integral column as the "split-by" column, but your import is using the column "CustID", which is a String. Have a look at how your splits are calculated during the import:

8020 [main] WARN org.apache.sqoop.mapreduce.db.TextSplitter - You are strongly encouraged to choose an integral split column.
8025 [main] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - Creating input split with lower bound '`CustID` >= '1'' and upper bound '`CustID` < '3?????''
8025 [main] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - Creating input split with lower bound '`CustID` >= '3?????'' and upper bound '`CustID` < '5?????''
8025 [main] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - Creating input split with lower bound '`CustID` >= '5?????'' and upper bound '`CustID` < '7*?????''
8025 [main] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - Creating input split with lower bound '`CustID` >= '7*?????'' and upper bound '`CustID` <= '999999''
8068 [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:4

The "?" indicates some foreign characters that were probably not parsed properly, which is what caused your tasks to fail. However, when you have only a single mapper, no such split calculation is needed on the CustID column; the data is simply "copied and pasted" to HDFS and your job finishes OK.
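To make this concrete, here is a rough sketch of the two usual fixes. The connection details, table, column, and directory names below are placeholders, not your actual values, so adapt them to your setup:

```bash
# Fix 1: split on an integral column (e.g. a numeric surrogate key) instead of the String CustID
sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username sqoop_user -P \
  --table customers \
  --split-by cust_num \
  --num-mappers 4 \
  --target-dir /data/customers

# Fix 2: keep CustID but run a single mapper, so no split calculation happens at all
sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username sqoop_user -P \
  --table customers \
  --num-mappers 1 \
  --target-dir /data/customers
```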
03-23-2018
06:08 AM
@Shantanu kumar Your question can mean two things:

1. Based on the value, redirect the flow files accordingly.
2. Split the flow file based on a custom value used as a separator.

I am answering both of them.

Solution 1

Here's a sample flow. What am I doing in this flow?

1. GenerateFlowFile: in this processor, I am generating a sample flow file with the content:

This is success.
This is failure.

2. SplitText: in this processor, I am splitting every individual row into its own flow file.

3. RouteOnContent: this is the processor that sends the flow files to their respective relationships based on the content. A snapshot of the processor config follows. In this processor I am checking the content of each flow file:

- if it contains "failure", redirect it to a similarly named relationship;
- if it contains "success", redirect it accordingly;
- otherwise the flow file goes to the "unmatched" relationship.

This way you can have your file split based on its content. PS: if you have structured/semi-structured data, e.g. CSV or JSON, you can change the logic to check the value of that specific column and then redirect the flow files accordingly.

Solution 2

Split the content based on a custom value. In this flow, I am using the SplitContent processor. It can take either of the following two options as the splitting value:

- a hexadecimal byte stream, or
- text.

My input flow file from the GenerateFlowFile processor has the following content:

This is success. # This is failure

The SplitContent processor is using # as the split value; a snapshot of the processor config follows. With this, I get 2 flow files based on the splits made by my custom value. A rough sketch of both processor configurations follows below. Hope this helps!
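Since the config screenshots may not come through, here is a minimal sketch of how the two processors could be configured. The exact property values are my assumptions based on the flow described above, not a copy of the actual config:

```
RouteOnContent (Solution 1)
  Match Requirement : content must contain match
  failure (dynamic property) = failure    # flow files containing "failure" go to the "failure" relationship
  success (dynamic property) = success    # flow files containing "success" go to the "success" relationship
  # everything else goes to the built-in "unmatched" relationship

SplitContent (Solution 2)
  Byte Sequence Format = Text
  Byte Sequence        = #
  Keep Byte Sequence   = false
```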
03-23-2018
05:24 AM
@ANKIT PATEL Here's a sample flow. For simplicity, instead of reading from an FTP location, I am reading from a local path. I have 3 processors in my flow:

- ListFile: lists the files in the directory I configured in the Configuration tab. Similar to ListFTP, just local.
- FetchFile: fetches the files I tell it to. Again, similar to FetchFTP, but local.
- PutFile: writes the data out.

If you pay attention, the ListFile processor gives me the list of files in the folder, and since the downstream processor is stopped, the flow files queue up. So I went ahead and did a "List Queue" to see the queued flow files, and saw something like this. These are called "flow files" in NiFi. If you click the "i" button on the leftmost side, you will see the Attributes tab as shown below. You can see many attributes, but the two we need in this example are:

- absolute.path: the location the file came from
- filename: the name of the file

These are the properties of the data we are about to fetch. Now FetchFile can read these files from the given directory, and I can tell my FetchFile processor to read them by using these attributes, as shown in the sketch below. Since I have the directory information available as an attribute, I can use it while storing the data as well, thereby mimicking the exact directory structure from the source. Hope that helps!
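For reference, a minimal sketch of the properties involved; the directory paths are placeholders, and the FetchFile value shown is its usual default built from the attributes ListFile writes:

```
ListFile
  Input Directory = /data/incoming                  # placeholder local path

FetchFile
  File to Fetch   = ${absolute.path}/${filename}    # built from the attributes written by ListFile

PutFile
  Directory       = /data/output/${path}            # reuse the relative "path" attribute to mirror the source layout
```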
03-23-2018
04:34 AM
Also, can you please share the "actual" MR job logs that you can see when you are running your job with multiple mappers?
03-23-2018
04:22 AM
Are you using a "split-by" column without setting the number of mappers to 1?
03-23-2018
03:45 AM
@Mark Lin Something like this will help you. In this case, I am trying to write the data to S3 and, if that fails, I redirect the flow file through an UpdateAttribute processor back to the parent processor again (sketched below). For your scenario, you can fit in the e-mail logic and of course stop the processor for some later action 🙂 Hope that helps!
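A rough sketch of such a retry loop, using an assumed retry-counter attribute; the attribute name, limit, and the alerting branch are illustrative, not part of the original flow:

```
PutS3Object --(failure)--> UpdateAttribute --> back into PutS3Object's input queue

UpdateAttribute
  retry.count = ${retry.count:replaceNull(0):plus(1)}   # dynamic property: count the attempts

RouteOnAttribute (optional, placed inside the loop)
  give_up = ${retry.count:gt(3)}    # after a few attempts, route to your PutEmail / alerting branch instead of retrying
```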
03-23-2018
03:36 AM
@Christian Lunesa Can you please share your sqoop command? Are you using --direct by any chance?
03-23-2018
02:06 AM
@vishal dutt NiFi Registry is not supported on Windows at this point in time. Please have a look at the NiFi Registry Admin Guide for more details.
03-22-2018
08:00 PM
@heta desai Have a look at this link. You can use the Hive table structure(s) given there, adapted to your log file format, and process them as needed. The key is using a regex to parse the records into individual columns (a sketch follows below). The tutorial talks about using HBase, but you can skip that part if you don't want to use it at this point in time. Let me know if you need any help.
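As an illustration only (not the exact table from the linked tutorial), here is a Hive DDL that parses a plain Apache-style access log with RegexSerDe; adapt the table name, columns, regex, and location to your own log format:

```sql
-- Each capture group in input.regex becomes one column, in order; RegexSerDe expects STRING columns.
CREATE EXTERNAL TABLE access_log (
  client_ip  STRING,
  request_ts STRING,
  method     STRING,
  uri        STRING,
  status     STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"(\\S+) (\\S+) [^\"]*\" (\\d{3}).*$"
)
STORED AS TEXTFILE
LOCATION '/data/logs/access';
```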
03-22-2018
07:41 PM
@Vivek Singh did the suggestion work for you?