java - Query two related tables (Joins) -

- July 15, 2010

this first table in hive- contains information item purchasing.

create external table if not exists table1   (this main table through comparisons need made) ( item_id bigint, created_time string, buyer_id bigint )

and data in above first table

**item_id**   **created_time**    **buyer_id** 220003038067     2012-06-21        1015826235 300003861266     2012-06-21        1015826235 140002997245     2012-06-14        1015826235 200002448035     2012-06-08        1015826235 260003553381     2012-06-07        1015826235

this second table in hive- contains information items purchasing.

create external table if not exists table2 ( user_id bigint, purchased_item array<struct<product_id: bigint,timestamps:string>> )

and data in above table-

**user_id**    **purchased_item** 1015826235     [{"product_id":220003038067,"timestamps":"1340321132000"},    {"product_id":300003861266,"timestamps":"1340271857000"},    {"product_id":140002997245,"timestamps":"1339694926000"},    {"product_id":200002448035,"timestamps":"1339172659000"},    {"product_id":260003553381,"timestamps":"1339072514000"}]

i have reduced data 1 buyer_id(user_id) make problem simple understand.

problem statement-

i need compare table2 table1, means need see if user_id table2 , buyer_id table1 (as both same thing) gets matched, purchased_item in table2 array of product_id(same item_id) , timestamps(same created_time) should same item_id , created_time in table1 particular user_id(buyer_id) , possible (means purchased_item , item_id, created_time) not same or product_id , timestamps missing table2 after comparing table1.

by mean count of product_id , timestamps in table2 should same count of item_id , created_time in table1 particular buyer_id(user_id) , content should same. if not same or entry missing table2, need print result, particular item_id , created_time missing table2 or product_id , timestamps not same after comparing table1.

so example in table1 buyer_id 1015826235 have 5 item_id , 5 created_time, in table2 should have 5 product_id , 5 timestamps same table1 same user_id(buyer_id) in 1 row. if not same or entry missing need print result showing missing or data wrong.

so make more clear-

purchased_item array of struct in table2 , contains 2 things product_id , timestamps.

if user_id , buyer_id gets matched product_id in table2 should matched item_id in table1 , timestamps in table2 should matched created_time in table1.

updated

hiveql sql query question:-

q 1) find user_id table2 product_id or timestamp not same item_id or created_time after comparing table1 on buyer_id.

query wrote first question. query right?

a 1) select table2.user_id table2 table1.user_id = table2.buyer_id , (table1.item_id <> table2.product_id or unix_timestamp(table1.created_time) <> table2.timestamps)   q 2) find `buyer_id(user_id)` , `item_id` , `created_time` missing `table2` after comparing `table1` on `buyer_id`.  2) not sure.

i suggest not use "string" data type created_time , timestamp because makes comparisons harder. instead of use date or timestamp.

and question: think big problem here using strings alone!

i'm oracle user there should in hive:

to_date({string},{format})

as used

unix_timestamp({string})

another thing: when have strucs, shouls address fields this: table2.purchased_item[{address}].product_id , not table2.product_id unknown.

and 1 more suggestion:

trunc({date},{format ex: 'ss' sseconds})

when created_time , time_stamp not in same time ticks(may 0.001 seconds difference because of difference insert time if insert or sysdate each of them) better truncate date seconds or milli-seconds or whatever think better.

one more thing: use nvl() or convert null values here well, becuase if have such problems possible have null values in table causes problems in queries, nvl() function convert null like.

hope helps.

Search This Blog

Convert PH

java - Query two related tables (Joins) -

Comments

Post a Comment

Popular posts from this blog

c# - SVN Error : "svnadmin: E205000: Too many arguments" -

c# - Copy ObservableCollection to another ObservableCollection -

All overlapping substrings matching a java regex -