Created 08-16-2016 05:25 PM
val tweets = sqlContext.read.json("hdfs://sandbox.hortonworks.com:8020/social/twitter")
This is a directory of JSON files with a much smaller and flatter Twitter schema than the full Twitter schema listed below. The schema below may be the one from the first time I ran this a few days ago.
Do I need to restart Spark History? YARN? The server? This is on the HDP 2.4 sandbox.
This was run on the sandbox as:
spark-submit --class com.dataflowdeveloper.sentiment.TwitterSentimentAnalysis --master yarn-client sentiment.jar --verbose
Error:
16/08/16 16:37:13 INFO FileInputFormat: Total input paths to process : 14635
root
 |-- contributors: string (nullable = true)
 |-- coordinates: string (nullable = true)
 |-- created_at: string (nullable = true)
 |-- entities: struct (nullable = true)
 |    |-- hashtags: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- indices: array (nullable = true)
 |    |    |    |    |-- element: long (containsNull = true)
 |    |    |    |-- text: string (nullable = true)
 |    |-- media: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- display_url: string (nullable = true)
 |    |    |    |-- expanded_url: string (nullable = true)
 |    |    |    |-- id: long (nullable = true)
 |    |    |    |-- id_str: string (nullable = true)
 |    |    |    |-- indices: array (nullable = true)
 |    |    |    |    |-- element: long (containsNull = true)
 |    |    |    |-- media_url: string (nullable = true)
 |    |    |    |-- media_url_https: string (nullable = true)
 |    |    |    |-- sizes: struct (nullable = true)
 |    |    |    |    |-- large: struct (nullable = true)
 |    |    |    |    |    |-- h: long (nullable = true)
 |    |    |    |    |    |-- resize: string (nullable = true)
 |    |    |    |    |    |-- w: long (nullable = true)
 |    |    |    |    |-- medium: struct (nullable = true)
 |    |    |    |    |    |-- h: long (nullable = true)
 |    |    |    |    |    |-- resize: string (nullable = true)
 |    |    |    |    |    |-- w: long (nullable = true)
 |    |    |    |    |-- small: struct (nullable = true)
 |    |    |    |    |    |-- h: long (nullable = true)
 |    |    |    |    |    |-- resize: string (nullable = true)
 |    |    |    |    |    |-- w: long (nullable = true)
 |    |    |    |    |-- thumb: struct (nullable = true)
 |    |    |    |    |    |-- h: long (nullable = true)
 |    |    |    |    |    |-- resize: string (nullable = true)
 |    |    |    |    |    |-- w: long (nullable = true)
 |    |    |    |-- type: string (nullable = true)
 |    |    |    |-- url: string (nullable = true)
 |    |-- symbols: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- urls: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- user_mentions: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |-- extended_entities: struct (nullable = true)
 |    |-- media: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- display_url: string (nullable = true)
 |    |    |    |-- expanded_url: string (nullable = true)
 |    |    |    |-- id: long (nullable = true)
 |    |    |    |-- id_str: string (nullable = true)
 |    |    |    |-- indices: array (nullable = true)
 |    |    |    |    |-- element: long (containsNull = true)
 |    |    |    |-- media_url: string (nullable = true)
 |    |    |    |-- media_url_https: string (nullable = true)
 |    |    |    |-- sizes: struct (nullable = true)
 |    |    |    |    |-- large: struct (nullable = true)
 |    |    |    |    |    |-- h: long (nullable = true)
 |    |    |    |    |    |-- resize: string (nullable = true)
 |    |    |    |    |    |-- w: long (nullable = true)
 |    |    |    |    |-- medium: struct (nullable = true)
 |    |    |    |    |    |-- h: long (nullable = true)
 |    |    |    |    |    |-- resize: string (nullable = true)
 |    |    |    |    |    |-- w: long (nullable = true)
 |    |    |    |    |-- small: struct (nullable = true)
 |    |    |    |    |    |-- h: long (nullable = true)
 |    |    |    |    |    |-- resize: string (nullable = true)
 |    |    |    |    |    |-- w: long (nullable = true)
 |    |    |    |    |-- thumb: struct (nullable = true)
 |    |    |    |    |    |-- h: long (nullable = true)
 |    |    |    |    |    |-- resize: string (nullable = true)
 |    |    |    |    |    |-- w: long (nullable = true)
 |    |    |    |-- type: string (nullable = true)
 |    |    |    |-- url: string (nullable = true)
 |-- favorite_count: long (nullable = true)
 |-- favorited: boolean (nullable = true)
 |-- filter_level: string (nullable = true)
 |-- followers_count: string (nullable = true)
 |-- friends_count: string (nullable = true)
 |-- geo: string (nullable = true)
 |-- handle: string (nullable = true)
 |-- hashtags: string (nullable = true)
 |-- id: long (nullable = true)
 |-- id_str: string (nullable = true)
 |-- in_reply_to_screen_name: string (nullable = true)
 |-- in_reply_to_status_id: string (nullable = true)
 |-- in_reply_to_status_id_str: string (nullable = true)
 |-- in_reply_to_user_id: string (nullable = true)
 |-- in_reply_to_user_id_str: string (nullable = true)
 |-- is_quote_status: boolean (nullable = true)
 |-- lang: string (nullable = true)
 |-- language: string (nullable = true)
 |-- location: string (nullable = true)
 |-- msg: string (nullable = true)
 |-- place: string (nullable = true)
 |-- possibly_sensitive: boolean (nullable = true)
 |-- profile_image_url: string (nullable = true)
 |-- retweet_count: string (nullable = true)
 |-- retweeted: boolean (nullable = true)
 |-- sentiment: string (nullable = true)
 |-- source: string (nullable = true)
 |-- tag: string (nullable = true)
 |-- text: string (nullable = true)
 |-- time: string (nullable = true)
 |-- time_zone: string (nullable = true)
 |-- timestamp_ms: string (nullable = true)
 |-- truncated: boolean (nullable = true)
 |-- tweet_id: string (nullable = true)
 |-- unixtime: string (nullable = true)
 |-- user: struct (nullable = true)
 |    |-- contributors_enabled: boolean (nullable = true)
 |    |-- created_at: string (nullable = true)
 |    |-- default_profile: boolean (nullable = true)
 |    |-- default_profile_image: boolean (nullable = true)
 |    |-- description: string (nullable = true)
 |    |-- favourites_count: long (nullable = true)
 |    |-- follow_request_sent: string (nullable = true)
 |    |-- followers_count: long (nullable = true)
 |    |-- following: string (nullable = true)
 |    |-- friends_count: long (nullable = true)
 |    |-- geo_enabled: boolean (nullable = true)
 |    |-- id: long (nullable = true)
 |    |-- id_str: string (nullable = true)
 |    |-- is_translator: boolean (nullable = true)
 |    |-- lang: string (nullable = true)
 |    |-- listed_count: long (nullable = true)
 |    |-- location: string (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- notifications: string (nullable = true)
 |    |-- profile_background_color: string (nullable = true)
 |    |-- profile_background_image_url: string (nullable = true)
 |    |-- profile_background_image_url_https: string (nullable = true)
 |    |-- profile_background_tile: boolean (nullable = true)
 |    |-- profile_banner_url: string (nullable = true)
 |    |-- profile_image_url: string (nullable = true)
 |    |-- profile_image_url_https: string (nullable = true)
 |    |-- profile_link_color: string (nullable = true)
 |    |-- profile_sidebar_border_color: string (nullable = true)
 |    |-- profile_sidebar_fill_color: string (nullable = true)
 |    |-- profile_text_color: string (nullable = true)
 |    |-- profile_use_background_image: boolean (nullable = true)
 |    |-- protected: boolean (nullable = true)
 |    |-- screen_name: string (nullable = true)
 |    |-- statuses_count: long (nullable = true)
 |    |-- time_zone: string (nullable = true)
 |    |-- url: string (nullable = true)
 |    |-- utc_offset: long (nullable = true)
 |    |-- verified: boolean (nullable = true)
 |-- user_name: string (nullable = true)
Created 08-16-2016 07:19 PM
Out of 14,000+ files, 3 still had the wrong, old schema, and Spark merged their fields into the schema it inferred for the whole directory. I did a -skipTrash delete on those 3 files and restarted the job. Now it works.
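For anyone hitting the same thing, here is a rough sketch of one way the stray files could be tracked down. It is not the code from the original job: `filesBySchema` is a hypothetical helper name, and it assumes a Spark 1.6 style `sqlContext` (as on the HDP 2.4 sandbox) and a flat directory with no subfolders. It infers the schema of each file separately and groups the files by their set of top-level field names, so the odd ones out show up as small groups.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SQLContext

// Hypothetical helper (not part of the original job): groups the files
// directly under `dir` by the top-level field names Spark infers for each.
def filesBySchema(sqlContext: SQLContext, dir: String): Map[Set[String], Seq[String]] = {
  val root = new Path(dir)
  val fs = root.getFileSystem(sqlContext.sparkContext.hadoopConfiguration)

  // Only plain files in this directory; subdirectories are ignored here.
  val files = fs.listStatus(root).filter(_.isFile).map(_.getPath.toString).toSeq

  // Schema inference runs once per file, so this is slow on thousands of files.
  files.groupBy(file => sqlContext.read.json(file).schema.fieldNames.toSet)
}

val groups = filesBySchema(sqlContext, "hdfs://sandbox.hortonworks.com:8020/social/twitter")

// Smallest groups first: these are the candidates for the stray schema.
groups.toSeq.sortBy(_._2.size).foreach { case (fields, paths) =>
  println(s"${paths.size} file(s) with ${fields.size} top-level fields, e.g. ${paths.head}")
}
```

With 14,000+ files this starts one small Spark job per file, so it is only meant as a one-off diagnostic. Passing an explicit schema through `sqlContext.read.schema(...)` before `.json(...)` would likely avoid the problem altogether, since Spark then skips inference, at the cost of maintaining the schema by hand.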
Created 08-16-2016 05:37 PM
16/08/16 17:42:58 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/08/16 17:42:59 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/08/16 17:42:59 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/08/16 17:43:00 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/08/16 17:43:00 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/08/16 17:43:00 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
16/08/16 17:43:00 INFO ObjectStore: Initialized ObjectStore
16/08/16 17:43:00 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/08/16 17:43:00 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/08/16 17:43:00 INFO HiveMetaStore: Added admin role in metastore
16/08/16 17:43:00 INFO HiveMetaStore: Added public role in metastore
16/08/16 17:43:00 INFO HiveMetaStore: No user is added in admin role, since config is empty
16/08/16 17:43:00 INFO HiveMetaStore: 0: get_all_databases
16/08/16 17:43:00 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_all_databases
16/08/16 17:43:00 INFO HiveMetaStore: 0: get_functions: db=default pat=*
16/08/16 17:43:00 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_functions: db=default pat=*
16/08/16 17:43:00 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
Created 08-16-2016 05:40 PM
Restarted YARN services and that did not help.