<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to load and store nvarchar in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131512#M27209</link>
    <description>&lt;P&gt;&lt;STRONG&gt;&lt;A rel="user" href="https://community.cloudera.com/users/5134/kaliyugantagonist.html" nodeid="5134"&gt;@Kaliyug Antagonist&lt;/A&gt; &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Unicode file:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;[root@test test]# pwd&lt;/P&gt;&lt;P&gt;/root/test&lt;/P&gt;&lt;P&gt;[root@test test]# cat xyz&lt;/P&gt;&lt;P&gt;Les caractères accentués (Français)&lt;/P&gt;&lt;P&gt;En données nous avons confiance&lt;/P&gt;&lt;P&gt;Données, données, partout&lt;/P&gt;&lt;P&gt;et tous les noeuds étaient déconnecté&lt;/P&gt;&lt;P&gt;Données, données, partout&lt;/P&gt;&lt;P&gt;[root@test test]#&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;External table DDL:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;create external table demand_db.unicode &lt;/P&gt;&lt;P&gt;(data string)&lt;/P&gt;&lt;P&gt;COMMENT 'External table for data cleansing'&lt;/P&gt;&lt;P&gt;LOCATION '/tmp/test/';&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;External table location:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;[root@test ~]# hdfs dfs -mkdir -p /tmp/test&lt;/P&gt;&lt;P&gt;[root@test ~]# hdfs dfs -chmod -R 777 /tmp/test&lt;/P&gt;&lt;P&gt;[root@test ~]# hdfs dfs -ls /tmp&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Output:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;hive&amp;gt; create external table unicode&lt;/P&gt;&lt;P&gt;  &amp;gt; (data string)&lt;/P&gt;&lt;P&gt;  &amp;gt; COMMENT
'External table for data cleansing'&lt;/P&gt;&lt;P&gt;  &amp;gt; LOCATION
'/tmp/test/';&lt;/P&gt;&lt;P&gt;OK&lt;/P&gt;&lt;P&gt;Time taken: 0.502 seconds&lt;/P&gt;&lt;P&gt;hive&amp;gt; select * from unicode;&lt;/P&gt;&lt;P&gt;OK&lt;/P&gt;&lt;P&gt;Les caractères accentués (Français)&lt;/P&gt;&lt;P&gt;En données nous avons confiance&lt;/P&gt;&lt;P&gt;Données, données, partout&lt;/P&gt;&lt;P&gt;et tous les noeuds étaient déconnecté&lt;/P&gt;&lt;P&gt;Données, données, partout&lt;/P&gt;&lt;P&gt;Time taken: 0.897 seconds, Fetched: 8 row(s)&lt;/P&gt;&lt;P&gt;hive&amp;gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Conclusion: &lt;/STRONG&gt;You do not need to convert the Unicode character set; String works perfectly in this case.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Fri, 06 May 2016 23:34:44 GMT</pubDate>
    <dc:creator>rbiswas1</dc:creator>
    <dc:date>2016-05-06T23:34:44Z</dc:date>
    <item>
      <title>How to load and store nvarchar</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131508#M27205</link>
      <description>&lt;P&gt;Stack : Installed HDP-2.3.2.0-2950 using Ambari 2.1&lt;/P&gt;&lt;P&gt;The steps that I am following :&lt;/P&gt;&lt;OL&gt;
&lt;LI&gt;Load SQL server tables onto HDFS using Sqoop&lt;/LI&gt;&lt;LI&gt;Create EXTERNAL tables in Hive&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;I didn't use anything pertaining to charset/unicode/utf-8 while executing the sqoop import commands.&lt;/P&gt;&lt;P&gt;While creating the Hive external table, I was wondering what data type shall I select for the nvarchar columns in the original sql server table, now I am worried that even in Sqoop while importing that needs to be addressed.&lt;/P&gt;&lt;OL&gt;
&lt;LI&gt;Couldn't find relevant charset/nvarchar etc. options in Sqoop import&lt;/LI&gt;&lt;LI&gt;In Hive, can varchar/string blindly be used in place of nvarchar&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;****************************edit : Added further code&lt;/P&gt;&lt;P&gt;I now have a problem, Sqoop is auto. converting nvarchar to String&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;When I mention --as-avrodatafile, .avro files and a schema file is created which is as follows :&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;[sqoop@l1038lab root]$ hadoop fs -cat /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc
{
  "type" : "record",
  "name" : "DimSampleDesc",
  "doc" : "Sqoop import of DimSampleDesc",
  "fields" : [ {
    "name" : "SmapiName_ver",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "SmapiName_ver",
    "sqlType" : "12"
  }, {
    "name" : "SmapiColName",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "SmapiColName",
    "sqlType" : "12"
  }, {
    "name" : "ChartType",
    "type" : [ "null", "int" ],
    "default" : null,
    "columnName" : "ChartType",
    "sqlType" : "4"
  }, {
    "name" : "X_Indexet",
    "type" : [ "null", "int" ],
    "default" : null,
    "columnName" : "X_Indexet",
    "sqlType" : "4"
  }, {
    "name" : "Y_Indexet",
    "type" : [ "null", "int" ],
    "default" : null,
    "columnName" : "Y_Indexet",
    "sqlType" : "4"
  }, {
    "name" : "X_Tick",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "X_Tick",
    "sqlType" : "-9"
  }, {
    "name" : "Y_Tick",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "Y_Tick",
    "sqlType" : "-9"
  }, {
    "name" : "X_TickRange",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "X_TickRange",
    "sqlType" : "-9"
  }, {
    "name" : "X_TickRangeFrom",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "X_TickRangeFrom",
    "sqlType" : "-9"
  }, {
    "name" : "X_TickRangeTom",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "X_TickRangeTom",
    "sqlType" : "-9"
  }, {
    "name" : "Y_TickRange",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "Y_TickRange",
    "sqlType" : "-9"
  }, {
    "name" : "Y_TickRangeFrom",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "Y_TickRangeFrom",
    "sqlType" : "-9"
  }, {
    "name" : "Y_TickRangeTom",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "Y_TickRangeTom",
    "sqlType" : "-9"
  }, {
    "name" : "IndexCount",
    "type" : [ "null", "int" ],
    "default" : null,
    "columnName" : "IndexCount",
    "sqlType" : "4"
  }, {
    "name" : "X_IndexCount",
    "type" : [ "null", "int" ],
    "default" : null,
    "columnName" : "X_IndexCount",
    "sqlType" : "4"
  }, {
    "name" : "Y_IndexCount",
    "type" : [ "null", "int" ],
    "default" : null,
    "columnName" : "Y_IndexCount",
    "sqlType" : "4"
  }, {
    "name" : "X_Symbol",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "X_Symbol",
    "sqlType" : "-9"
  }, {
    "name" : "X_SymbolName",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "X_SymbolName",
    "sqlType" : "-9"
  }, {
    "name" : "X_SymbolDescr",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "X_SymbolDescr",
    "sqlType" : "-9"
  }, {
    "name" : "Y_Symbol",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "Y_Symbol",
    "sqlType" : "-9"
  }, {
    "name" : "Y_SymbolName",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "Y_SymbolName",
    "sqlType" : "-9"
  }, {
    "name" : "Y_SymbolDescr",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "Y_SymbolDescr",
    "sqlType" : "-9"
  }, {
    "name" : "SmapiName",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "SmapiName",
    "sqlType" : "12"
  }, {
    "name" : "Incorrect_Ver_FL",
    "type" : [ "null", "boolean" ],
    "default" : null,
    "columnName" : "Incorrect_Ver_FL",
    "sqlType" : "-7"
  } ],
  "tableName" : "DimSampleDesc"
}[sqoop@l1038lab root]$&lt;/PRE&gt;&lt;UL&gt;
&lt;LI&gt;I then simply create a Hive external table:&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;hive&amp;gt;
    &amp;gt; CREATE EXTERNAL TABLE DimSampleDesc  ROW FORMAT SERDE  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'  STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'  OUTPUTFORMAT  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'  TBLPROPERTIES (    'avro.schema.url'='hdfs://l1031lab.sss.se.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc');
OK
Time taken: 0.37 seconds
hive&amp;gt;&lt;/PRE&gt;&lt;UL&gt;
&lt;LI&gt;The table is described as follows:&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;hive (odp_dw_may2016)&amp;gt;
                     &amp;gt;
                     &amp;gt; describe formatted dimsampledesc;
OK
col_name        data_type       comment
# col_name              data_type               comment
smapiname_ver           string
smapicolname            string
charttype               int
x_indexet               int
y_indexet               int
x_tick                  string
y_tick                  string
x_tickrange             string
x_tickrangefrom         string
x_tickrangetom          string
y_tickrange             string
y_tickrangefrom         string
y_tickrangetom          string
indexcount              int
x_indexcount            int
y_indexcount            int
x_symbol                string
x_symbolname            string
x_symboldescr           string
y_symbol                string
y_symbolname            string
y_symboldescr           string
smapiname               string
incorrect_ver_fl        boolean
# Detailed Table Information
Database:               odp_dw_may2016
Owner:                  hive
CreateTime:             Mon May 09 16:37:08 CEST 2016
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://l1031lab.sss.se.com:8020/apps/hive/warehouse/odp_dw_may2016.db/dimsampledesc
Table Type:             EXTERNAL_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   false
        EXTERNAL                TRUE
        avro.schema.url         hdfs://l1031lab.sss.se.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc
        numFiles                6
        numRows                 -1
        rawDataSize             -1
        totalSize               8566342
        transient_lastDdlTime   1462804628
# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.avro.AvroSerDe
InputFormat:            org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
Time taken: 0.441 seconds, Fetched: 56 row(s)
hive (odp_dw_may2016)&amp;gt;&lt;/PRE&gt;&lt;P&gt;What's the risk here, and how do I proceed?&lt;/P&gt;</description>
      <pubDate>Wed, 04 May 2016 23:29:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131508#M27205</guid>
      <dc:creator>kaliyugantagoni</dc:creator>
      <dc:date>2016-05-04T23:29:06Z</dc:date>
    </item>
    <item>
      <title>Re: How to load and store nvarchar</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131509#M27206</link>
      <description>&lt;P&gt;By default Hive is UTF-8, I would recommend doing a sqoop import to Hive table directly. Define a Table DDL and Import into table. Yes you can use string for nvarchar.&lt;/P&gt;</description>
      <pubDate>Thu, 05 May 2016 05:16:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131509#M27206</guid>
      <dc:creator>rnalakurthi</dc:creator>
      <dc:date>2016-05-05T05:16:45Z</dc:date>
    </item>
    <item>
      <title>Re: How to load and store nvarchar</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131510#M27207</link>
      <description>&lt;P&gt;Hive has both varchar and string data types, which should be used in case of nvarchar.&lt;/P&gt;&lt;P&gt;Even I am in a dilemma about whether to import directly into Hive or first-HDFS-then-Hive, maybe I will try both.&lt;/P&gt;</description>
      <pubDate>Fri, 06 May 2016 13:36:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131510#M27207</guid>
      <dc:creator>kaliyugantagoni</dc:creator>
      <dc:date>2016-05-06T13:36:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to load and store nvarchar</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131511#M27208</link>
      <description>&lt;P&gt;By default, sqoop will directly import nvarchar to string (if you let sqoop create hive tables). &lt;/P&gt;&lt;P&gt;If you create a HCatalog table stored as ORC in advance, you can directly right to Hive as ORC and gain performance and compression. So, I suggest you try put directly into Hive. &lt;/P&gt;</description>
      <pubDate>Fri, 06 May 2016 13:42:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131511#M27208</guid>
      <dc:creator>ravi1</dc:creator>
      <dc:date>2016-05-06T13:42:16Z</dc:date>
    </item>
    <item>
      <title>Re: How to load and store nvarchar</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131512#M27209</link>
      <description>&lt;P&gt;&lt;STRONG&gt;&lt;A rel="user" href="https://community.cloudera.com/users/5134/kaliyugantagonist.html" nodeid="5134"&gt;@Kaliyug Antagonist&lt;/A&gt; &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Unicode file:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;[root@test test]# pwd&lt;/P&gt;&lt;P&gt;/root/test&lt;/P&gt;&lt;P&gt;[root@test test]# cat xyz&lt;/P&gt;&lt;P&gt;Les caractères accentués (Français)&lt;/P&gt;&lt;P&gt;En données nous avons confiance&lt;/P&gt;&lt;P&gt;Données, données, partout&lt;/P&gt;&lt;P&gt;et tous les noeuds étaient déconnecté&lt;/P&gt;&lt;P&gt;Données, données, partout&lt;/P&gt;&lt;P&gt;[root@test test]#&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;External table DDL:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;create external table demand_db.unicode &lt;/P&gt;&lt;P&gt;(data string)&lt;/P&gt;&lt;P&gt;COMMENT 'External table for data cleansing'&lt;/P&gt;&lt;P&gt;LOCATION '/tmp/test/';&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;External table location:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;[root@test ~]# hdfs dfs -mkdir -p /tmp/test&lt;/P&gt;&lt;P&gt;[root@test ~]# hdfs dfs -chmod -R 777 /tmp/test&lt;/P&gt;&lt;P&gt;[root@test ~]# hdfs dfs -ls /tmp&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Output:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;hive&amp;gt; create external table unicode&lt;/P&gt;&lt;P&gt;  &amp;gt; (data string)&lt;/P&gt;&lt;P&gt;  &amp;gt; COMMENT
'External table for data cleansing'&lt;/P&gt;&lt;P&gt;  &amp;gt; LOCATION
'/tmp/test/';&lt;/P&gt;&lt;P&gt;OK&lt;/P&gt;&lt;P&gt;Time taken: 0.502 seconds&lt;/P&gt;&lt;P&gt;hive&amp;gt; select * from unicode;&lt;/P&gt;&lt;P&gt;OK&lt;/P&gt;&lt;P&gt;Les caractères accentués (Français)&lt;/P&gt;&lt;P&gt;En données nous avons confiance&lt;/P&gt;&lt;P&gt;Données, données, partout&lt;/P&gt;&lt;P&gt;et tous les noeuds étaient déconnecté&lt;/P&gt;&lt;P&gt;Données, données, partout&lt;/P&gt;&lt;P&gt;Time taken: 0.897 seconds, Fetched: 8 row(s)&lt;/P&gt;&lt;P&gt;hive&amp;gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Conclusion: &lt;/STRONG&gt;You do not need to convert the Unicode character set; String works perfectly in this case.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 06 May 2016 23:34:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131512#M27209</guid>
      <dc:creator>rbiswas1</dc:creator>
      <dc:date>2016-05-06T23:34:44Z</dc:date>
    </item>
    <item>
      <title>Re: How to load and store nvarchar</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131513#M27210</link>
      <description>&lt;P&gt;I proceeded with varchar, what is your opinion ? Shall I use String or VARCHAR ?&lt;/P&gt;</description>
      <pubDate>Mon, 09 May 2016 18:39:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131513#M27210</guid>
      <dc:creator>kaliyugantagoni</dc:creator>
      <dc:date>2016-05-09T18:39:22Z</dc:date>
    </item>
    <item>
      <title>Re: How to load and store nvarchar</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131514#M27211</link>
      <description>&lt;P&gt;@Kaliyug Antagonist&lt;/P&gt;&lt;P&gt;You should use varchar as much as possible.&lt;/P&gt;&lt;P&gt;Because&lt;/P&gt;&lt;P&gt;1. Silent truncate in string.&lt;/P&gt;&lt;P&gt;2. String by default maps to varchar (32762). For smaller columns this might lead to performance issue.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Tue, 10 May 2016 12:26:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131514#M27211</guid>
      <dc:creator>rbiswas1</dc:creator>
      <dc:date>2016-05-10T12:26:49Z</dc:date>
    </item>
    <item>
      <title>Re: How to load and store nvarchar</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131515#M27212</link>
      <description>&lt;P&gt;Edited my original post with Sqoop and Hive outputs - the nvarchar is auto. converted to string - can you help ?&lt;/P&gt;</description>
      <pubDate>Tue, 10 May 2016 15:38:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131515#M27212</guid>
      <dc:creator>kaliyugantagoni</dc:creator>
      <dc:date>2016-05-10T15:38:26Z</dc:date>
    </item>
    <item>
      <title>Re: How to load and store nvarchar</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131516#M27213</link>
      <description>&lt;P&gt;@Kaliyug Antagonist&lt;/P&gt;&lt;P&gt;Good question.&lt;/P&gt;&lt;P&gt;Try this is your sqoop argument:&lt;/P&gt;&lt;P&gt;--map-column-java id=NVARCHAR,value=VARCHAR&lt;/P&gt;&lt;P&gt;There is another option while creating hive tables from sqoop:&lt;/P&gt;&lt;P&gt;--map-column-hive id=NVARCHAR,value=VARCHAR(n)&lt;/P&gt;&lt;P&gt;**Choose your n. &lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Wed, 11 May 2016 06:20:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-load-and-store-nvarchar/m-p/131516#M27213</guid>
      <dc:creator>rbiswas1</dc:creator>
      <dc:date>2016-05-11T06:20:11Z</dc:date>
    </item>
  </channel>
</rss>

