Yep, "Map CSV Column to Table Columns" w/o any description is a pretty unclear question. I'll just assume you are trying to create a Hive table aligned with a CSV file. Apologies if I've oversimplified your (lack of?) question. 😉
NOTE: I'm lifting this VERY SIMPLE example from an old blog posting of mine; https://martin.atlassian.net/wiki/x/WIATAg.
Let's just create a simple CSV file and load it into HDFS.
[lester@n1 ~]$ cat data.txt
a,1
b,2
c,3
[lester@n1 ~]$ hdfs dfs -put data.txt
[lester@n1 ~]$ hdfs dfs -cat /user/lester/data.txt
a,1
b,2
c,3
[lester@n1 ~]$
The following script creates a table that can sit on top of this data. You could save this file as createTable.hql.
CREATE TABLE alpha_num (
  alpha string,
  num int
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE;
Then just run it with something like hive -f createTable.hql and follow that up with the CLI to interactively describe, load, and query the new table.
[lester@n1 ~]$ hive
hive> DESCRIBE alpha_num;
OK
alpha                   string                  from deserializer
num                     string                  from deserializer
Time taken: 0.462 seconds, Fetched: 2 row(s)
hive> LOAD DATA INPATH '/user/lester/data.txt' INTO TABLE alpha_num;
Loading data to table default.alpha_num
Table default.alpha_num stats: [numFiles=1, totalSize=12]
OK
Time taken: 0.498 seconds
hive> SELECT * FROM alpha_num;
OK
a	1
b	2
c	3
Time taken: 0.125 seconds, Fetched: 3 row(s)
hive> exit;
[lester@n1 ~]$
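One gotcha visible in the DESCRIBE output above: OpenCSVSerde exposes every column as a string ("from deserializer"), regardless of the types you declared in the DDL. If you need num treated as a real integer, cast at query time; a minimal sketch:

```sql
-- OpenCSVSerde hands back strings, so cast explicitly wherever a numeric type matters
SELECT alpha, CAST(num AS INT) AS num
FROM alpha_num
WHERE CAST(num AS INT) > 1;
```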
As always, there are multiple ways to do this, but if this is aligned with your question, I hope it helps.
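Speaking of multiple ways: if your CSV never contains quoted or escaped fields, the classic delimited-text form also works and, unlike OpenCSVSerde, preserves the declared column types. A sketch of the same table in that style (you would drop or rename the OpenCSVSerde version first, since the name collides):

```sql
CREATE TABLE alpha_num (
  alpha string,
  num int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```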
If you are talking about further "mapping" in the classical sense, feel free to check out https://github.com/lestermartin/oss-transform-processing-comparison/tree/master/mapping for some more ramblings on this topic.
Gotcha. Well, that is really not much of a Hadoop question, so I'll let you run it down outside of this forum site. Once you need help extracting data from MySQL and ingesting it into Hadoop, raise another question; there are many here who can help with tools like Sqoop and NiFi.
I would suggest first passing your CSV file through a SplitText processor so you have one FlowFile per line of the original CSV. Then use an ExtractText processor to generate new FlowFile attributes, named after your SQL columns, each with a regex that pulls the correct value out of the content. A ReplaceText processor can then build new FlowFile content from those extracted attributes. Finally, route the FlowFile to a PutSQL processor.
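To make that concrete, here is an illustrative (untested) set of processor properties, assuming a two-column CSV like the earlier example and a hypothetical target table alpha_num(alpha, num); the attribute names and regexes are my own invention:

```
SplitText
  Line Split Count: 1

ExtractText (dynamic properties; each regex's first capture group lands in <name>.1)
  csv.alpha: ^([^,]+),
  csv.num:   ,([^,]+)$

ReplaceText
  Replacement Strategy: Always Replace
  Replacement Value:    INSERT INTO alpha_num (alpha, num) VALUES ('${csv.alpha.1}', ${csv.num.1})

PutSQL
  (executes the FlowFile content against your configured JDBC Connection Pool)
```

Note that building the INSERT by string substitution like this is vulnerable to quoting problems and SQL injection on messy data; PutSQL also supports parameterized statements ('?' placeholders driven by sql.args.N.* attributes), which is the safer route for anything beyond a quick prototype.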