In machine learning, association rules are an important concept, widely applied to problems such as market basket analysis. Consider a supermarket where related items (groceries, dairy products, cosmetics, stationery, and so on) are kept together in the same aisle. This helps customers find what they need quickly. It also reminds them of items they may have forgotten, or prompts them to buy items they had not planned on. Association rules thus make it possible to find correlations among products in a large set of available items. Analysing which items customers buy together also helps retailers identify items they can offer at a discount. For example, a retailer may sell baby lotion and baby shampoo individually at MRP (maximum retail price) but offer a discount on the combination. A customer who wished to buy only the shampoo or only the lotion may now consider buying both. Other factors can also contribute to the purchase of a combination of products. Another strategy is to place related products at opposite ends of a shelf, prompting customers to scan the entire shelf in the hope that they will add a few more items to their cart.
It is important to note that association rules do not extract customers' preferences about the
items; they find relations among items that are generally bought together. The rules
only identify frequent associations between items. Each rule has an antecedent (if)
and a consequent (then), both of which are sets of items. For example, if a person buys pizza,
then they may buy a cold drink too. This is because there is a strong association between pizza and
cold drinks. Association rules help to find the dependency of one item on another by considering
the history of customers' transactions.
Basic Concepts
There are a few terms that one should understand before studying the algorithm.
a. k-Itemset: A set of k items. For example, a 2-itemset can be {pencil, eraser} or {bread, butter},
and a 3-itemset can be {bread, butter, milk}.
b. Support: The fraction of all considered transactions in which an item appears is called
the support of that item. Mathematically, the support of an item x is defined as:
$$\text{support}(x) = \frac{\text{number of transactions containing } x}{\text{total number of transactions considered}}$$
c. Confidence: Confidence is the likelihood that item y is bought along with item x.
Mathematically, it is the ratio of the number of transactions containing both x and y to
the number of transactions containing x.
$$\text{confidence}(x \Rightarrow y) = \frac{\text{number of transactions containing } x \text{ and } y}{\text{number of transactions containing } x}$$
Confidence can also be defined as the conditional probability that y occurs, given that
x occurs.
$$\text{confidence}(x \Rightarrow y) = P(y \mid x)$$
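where x is the antecedent and y is the consequent. In terms of support, confidence can be written
as:
$$\text{confidence}(x \Rightarrow y) = \frac{\text{support}(x \cup y)}{\text{support}(x)}$$
Here support(x ∪ y) is the support of the itemset that contains both x and y.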
d. Frequent Itemset: An itemset whose support is at least the minimum support threshold is known
as a frequent itemset. For example, if the minimum support threshold is 10, then an itemset with
a support score of 11 is a frequent itemset, but an itemset with a support score of 9 is not.
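To make these definitions concrete, here is a minimal Python sketch that computes support, confidence, and frequent k-itemsets by brute force. The transactions are a made-up toy dataset, and the function names (count, support, confidence, frequent_itemsets) are chosen here purely for illustration; they do not come from any particular library. Support is treated as a fraction of transactions, as in the definition of support above.

```python
from itertools import combinations

# Hypothetical toy transactions, invented only for this illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"pencil", "eraser"},
    {"bread", "butter", "eraser"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """confidence(x => y) = count(x and y together) / count(x)."""
    antecedent_count = count(antecedent, transactions)
    if antecedent_count == 0:
        raise ValueError("antecedent never appears in the transactions")
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / antecedent_count


def frequent_itemsets(transactions, k, min_support):
    """All k-itemsets with support >= min_support (exhaustive search)."""
    items = sorted(set().union(*transactions))
    return [
        frozenset(candidate)
        for candidate in combinations(items, k)
        if support(candidate, transactions) >= min_support
    ]


print(support({"bread"}, transactions))
print(confidence({"bread"}, {"butter"}, transactions))
print(frequent_itemsets(transactions, 2, 0.4))
```

Checking every possible k-itemset like this is only practical for tiny examples, since the number of candidates grows very quickly with the number of items.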