Tutorial Exercise 3 is wrong!

Hi,

I spent a lot of time trying to understand that exercise. Thanks for the explanation, but I think I have found something wrong with the result.

I implemented the same approach in Python (Jupyter) using the same data, and I got these results:

 

 

["Nike Men's Dri-FIT Victory Golf Polo,Perfect Fitness Perfect Rip Deck", 71862], 
["O'Brien Men's Neoprene Life Vest,Perfect Fitness Perfect Rip Deck", 66421],
["Nike Men's Dri-FIT Victory Golf Polo,O'Brien Men's Neoprene Life Vest", 57123],
["Nike Men's Free 5.0+ Running Shoe,Perfect Fitness Perfect Rip Deck", 41741],
["Perfect Fitness Perfect Rip Deck,Under Armour Girls' Toddler Spine Surge Runni", 36747],
["Nike Men's Dri-FIT Victory Golf Polo,Nike Men's Free 5.0+ Running Shoe", 35136],
["Nike Men's Free 5.0+ Running Shoe,O'Brien Men's Neoprene Life Vest", 34811],
["Nike Men's Dri-FIT Victory Golf Polo,Under Armour Girls' Toddler Spine Surge Runni", 30421],
["O'Brien Men's Neoprene Life Vest,Under Armour Girls' Toddler Spine Surge Runni", 28161],
["Nike Men's CJ Elite 2 TD Football Cleat,Perfect Fitness Perfect Rip Deck", 25234],

As you can see, this is very different from the Spark output.

To understand why, I filtered order_items down to order numbers <= 10 so that I could check the results manually.
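
A minimal sketch of this kind of manual check in plain Python. It is neither the notebook code nor the tutorial's Spark code: the `order_items` rows, the product names and the `count_pairs` helper are hypothetical, and it assumes that each pair of distinct products is counted once per order.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample rows: (order_id, product_name). Not the tutorial data.
order_items = [
    (1, "Product A"),
    (1, "Product B"),
    (2, "Product A"),
    (2, "Product B"),
    (2, "Product C"),
    (11, "Product A"),
    (11, "Product C"),
]


def count_pairs(rows, max_order_id=None):
    """Count how often two distinct products appear in the same order."""
    products_by_order = {}
    for order_id, name in rows:
        # Optional filter, e.g. keep only order numbers <= 10.
        if max_order_id is not None and order_id > max_order_id:
            continue
        # A set keeps each product at most once per order.
        products_by_order.setdefault(order_id, set()).add(name)

    counts = Counter()
    for names in products_by_order.values():
        # Sorting makes (A, B) and (B, A) the same key.
        for pair in combinations(sorted(names), 2):
            counts[",".join(pair)] += 1
    return counts


pair_counts = count_pairs(order_items, max_order_id=10)
for pair, n in pair_counts.most_common(10):
    print(pair, n)
```

Because the products of each order are stored in a set and `combinations` never pairs an element with itself, this version cannot produce a pair made of the same product twice.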

 

I was surprised to see that, in that case, the Spark output looks like this:

 

(62,(O'Brien Men's Neoprene Life Vest,Perfect Fitness Perfect Rip Deck))
(23,(Nike Men's Dri-FIT Victory Golf Polo,Perfect Fitness Perfect Rip Deck))
(16,(Nike Men's Dri-FIT Victory Golf Polo,O'Brien Men's Neoprene Life Vest))
(15,(Perfect Fitness Perfect Rip Deck,Perfect Fitness Perfect Rip Deck))
(10,(Perfect Fitness Perfect Rip Deck,Team Golf New England Patriots Putter Grip))
(8,(O'Brien Men's Neoprene Life Vest,Team Golf New England Patriots Putter Grip))
(6,(Nike Men's Dri-FIT Victory Golf Polo,Team Golf New England Patriots Putter Grip))
(5,(Glove It Imperial Golf Towel,Pelican Sunstream 100 Kayak))
(5,(Diamondback Women's Serene Classic Comfort Bi,Perfect Fitness Perfect Rip Deck))
(5,(Diamondback Women's Serene Classic Comfort Bi,Glove It Imperial Golf Towel))

The Jupyter output for the same subset is:

 

"O'Brien Men's Neoprene Life Vest,Perfect Fitness Perfect Rip Deck": 62
"Nike Men's Dri-FIT Victory Golf Polo,Perfect Fitness Perfect Rip Deck": 23,
"Nike Men's Dri-FIT Victory Golf Polo,O'Brien Men's Neoprene Life Vest": 16,
"Diamondback Women's Serene Classic Comfort Bi,Perfect Fitness Perfect Rip Deck": 10,
'Perfect Fitness Perfect Rip Deck,Team Golf New England Patriots Putter Grip': 10,
"O'Brien Men's Neoprene Life Vest,Team Golf New England Patriots Putter Grip": 8,
"Nike Men's Dri-FIT Victory Golf Polo,Team Golf New England Patriots Putter Grip": 6,
"Diamondback Women's Serene Classic Comfort Bi,Glove It Imperial Golf Towel": 5,
"Nike Men's CJ Elite 2 TD Football Cleat,Nike Men's Dri-FIT Victory Golf Polo": 5,
"Nike Men's Dri-FIT Victory Golf Polo,Pelican Sunstream 100 Kayak": 5,

 

The line in the Spark output that says

(15,(Perfect Fitness Perfect Rip Deck,Perfect Fitness Perfect Rip Deck))

should not exist: it pairs a product with itself, so it shows wrong information!

Is this a problem with Spark? With the code? With something else?
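
One possible cause, offered only as an assumption and not confirmed against the tutorial code: if an order lists the same product on two order lines and the pairs are built without removing duplicates, a product ends up paired with itself. The sketch below uses hypothetical product names and a hypothetical `pair_counts` helper to show the effect.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order in which the same product appears on two order lines.
order_lines = ["Product A", "Product A", "Product B"]


def pair_counts(names, dedupe):
    """Count product pairs within one order, with or without duplicates."""
    items = sorted(set(names)) if dedupe else sorted(names)
    return Counter(combinations(items, 2))


with_duplicates = pair_counts(order_lines, dedupe=False)
without_duplicates = pair_counts(order_lines, dedupe=True)

print(with_duplicates)
print(without_duplicates)
```

With duplicates kept, the pair (Product A, Product A) appears once and (Product A, Product B) is counted twice. With duplicates removed, only (Product A, Product B) remains, counted once.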

 

PS: data scientists should put more comments in their work... really.

 

Regards

Mhamed Ben Jmaa

 
