商業(yè)智能研究(十六)materialized view+dimension提高mondrian性能(二)
商業(yè)智能研究(十六) 用materialized view + dimension 來提高mondrian 的性能(二)接著上一篇的定義我們定義如下兩個(gè)dimension :
CREATE DIMENSION PRODUCT_DIM
LEVEL "product_id" IS "product"."product_id"
LEVEL "brand_name" IS "product"."brand_name"
LEVEL "product_class_id" IS "product_class"."product_class_id"
LEVEL "product_category" IS "product_class"."product_category"
LEVEL "product_department" IS "product_class"."product_department"
LEVEL "product_family" IS "product_class"."product_family"
HIERARCHY PRODUCT_ROLLUP (
"product_id" CHILD OF
"brand_name" CHILD OF
"product_class_id" CHILD OF
"product_category" CHILD OF
"product_department" CHILD OF
"product_family"
JOIN KEY ("product"."product_class_id") REFERENCES "product_class_id"
)
ATTRIBUTE "product_id" DETERMINES ("product_name")
ATTRIBUTE "product_class_id" DETERMINES ("product_subcategory");
CREATE DIMENSION TIME_DIM
LEVEL time IS "time_by_day"."time_id"
LEVEL month IS "time_by_day"."month_of_year"
LEVEL quarter IS "time_by_day"."quarter"
LEVEL year IS "time_by_day"."the_year"
HIERARCHY TIME_ROLLUP (
time CHILD OF
month CHILD OF
quarter CHILD OF
year
)
ATTRIBUTE time DETERMINES ("time_by_day"."the_date");
然后我們建立materialized view , 注意QUERY_REWRITE_INTEGRITY 和 QUERY_REWRITE_ENABLED 應(yīng)該已經(jīng)正確的設(shè)置了.
CREATE MATERIALIZED VIEW PRODUCT_SUM
BUILD IMMEDIATE
REFRESH ON DEMAND
ENABLE QUERY REWRITE
AS
SELECT "time_by_day"."time_id" ,
"product"."product_id",
"product_class"."product_class_id" ,
SUM("sales_fact_1997"."store_sales") ,
SUM("sales_fact_1997"."store_cost")
FROM "time_by_day" "time_by_day",
"product" "product" ,
"product_class" "product_class" ,
"sales_fact_1997" "sales_fact_1997"
WHERE "sales_fact_1997"."time_id" = "time_by_day"."time_id"
AND "sales_fact_1997"."product_id" = "product"."product_id"
AND "product"."product_class_id" = "product_class"."product_class_id"
GROUP BY "time_by_day"."time_id",
"product"."product_id" ,
"product_class"."product_class_id";
現(xiàn)在我們 set autotrace on .
執(zhí)行
SELECT "time_by_day"."the_date" ,
"product_class"."product_family" ,
SUM("sales_fact_1997"."store_sales") ,
SUM("sales_fact_1997"."store_cost")
FROM "time_by_day" "time_by_day",
"product" "product" ,
"product_class" "product_class" ,
"sales_fact_1997" "sales_fact_1997"
WHERE "sales_fact_1997"."time_id" = "time_by_day"."time_id"
AND "sales_fact_1997"."product_id" = "product"."product_id"
AND "product"."product_class_id" = "product_class"."product_class_id"
GROUP BY "time_by_day"."the_date",
"product_class"."product_family";
從圖一中我們看到,當(dāng)我們把product 聚合到了最高的level product_family,oracle 的 執(zhí)行計(jì)劃是從product_sum 中來做聚合的.這就是因?yàn)槲覀兘⒌膁imension 告訴了oracle product有這種層次的關(guān)系.product的dimension 即告訴了product_id 能夠決定product_name,也告訴了product_id能夠聚合product_family , 同樣的我們把Time 聚合到最高的level
SELECT "time_by_day"."the_year" ,
"product"."product_name" ,
SUM("sales_fact_1997"."store_sales") ,
SUM("sales_fact_1997"."store_cost")
FROM "time_by_day" "time_by_day",
"product" "product" ,
"product_class" "product_class" ,
"sales_fact_1997" "sales_fact_1997"
WHERE "sales_fact_1997"."time_id" = "time_by_day"."time_id"
AND "sales_fact_1997"."product_id" = "product"."product_id"
AND "product"."product_class_id" = "product_class"."product_class_id"
GROUP BY "time_by_day"."the_year",
"product"."product_name";
從圖二中的執(zhí)行計(jì)劃同樣可以看出我們只選取Time 來做聚合的時(shí)候,oracle 仍然是從product_sum 表中來做聚合, 用time_id 來決定the_date ,time_id同樣可以聚合year.
最后一個(gè)是同時(shí)聚合product 和 time
SELECT "time_by_day"."the_year" ,
"product_class"."product_family" ,
SUM("sales_fact_1997"."store_sales") ,
SUM("sales_fact_1997"."store_cost")
FROM "time_by_day" "time_by_day",
"product" "product" ,
"product_class" "product_class" ,
"sales_fact_1997" "sales_fact_1997"
WHERE "sales_fact_1997"."time_id" = "time_by_day"."time_id"
AND "sales_fact_1997"."product_id" = "product"."product_id"
AND "product"."product_class_id" = "product_class"."product_class_id"
GROUP BY "time_by_day"."the_year",
"product_class"."product_family";
同樣的,oracle 還是從product_sum 中取數(shù)據(jù) .
因?yàn)閙aterialized view 的使用,我們可以把我們要分析的Cube 作成一個(gè)或幾個(gè)非常大的materialized view , 建立正確的dimension 之后,當(dāng)你查詢的時(shí)候,由于數(shù)據(jù)已經(jīng)提前計(jì)算過了,所以查詢的速度比較快,在加上dimension可以告訴oracle 數(shù)據(jù)之間的層級關(guān)系,減少了我們建立不必要的materialized view , 所以使數(shù)據(jù)能夠得到更加充分的利用.
關(guān)于如何建立dimension 倒是比較簡單,只要你弄懂?dāng)?shù)據(jù)之間的層級關(guān)系就可以了.
對于如何建立materialized view 倒是比較麻煩,
我舉個(gè)簡單的例子吧:
在mondrian 的 foodmart 的例子中,我們可以任意的選取 指標(biāo) , product , customers , education leve , gender ,marital sttus ,promotin media ,promotions , store , store size in SQFT , store type , time , yearly income 這十三個(gè)要分析的數(shù)據(jù)來建立cube ,用戶有可能使用product 來做分析的維度,也有可能把product 來做Measure 或者不選,所以我們不可能建立所有情況考慮到的Cube .
ps : 如果你非要搞一個(gè)出來的話,我可以給你點(diǎn)提示
總的方案有2 的 13 次方:8096 種方案.也就是你要建立8096 個(gè)materialized view 就可以解決所有情況.
C 13 3 : 數(shù)學(xué)里面的概率問題, 十三個(gè)里面選3個(gè)出來,不論順序的. C 13 3 = 13*12*12/(1*2*3)
代表的意思是從13個(gè)里面選3個(gè)出來做fact table ,其余十個(gè)做dimension .不論你選不選這些dimension 都一樣,
總的方案 = c 13 1 + c 13 2 + c 13 3 + ...... + c 13 13 = 2 的十三次方 = 8096.
如何建立materialized view 還是主要是看你如何建立你的分析的維度.如果你的fact table 本身很多.而維度也很多的情況下,不可能每個(gè)fact table 都建立一個(gè)關(guān)于所有dimension 的materialized view ,對于我們的product dimension數(shù)據(jù)還算是比較少的,如果達(dá)到像大型超市那么多的產(chǎn)品,可能還需要在brand_name 或者 subcategory 來建立materialized view ,所以還是建議根據(jù)用戶的查詢sql 來分析用戶到底經(jīng)常查詢那些數(shù)據(jù).
下一篇繼續(xù)介紹 Mondrian 如何使用materialized view 來提高性能.
圖一 : product 集合到最高層 的 執(zhí)行計(jì)劃.

圖二 : Time 聚合到最高層 的執(zhí)行計(jì)劃

圖三 : 同時(shí)將Time 和 product 聚合到最高層 的執(zhí)行計(jì)劃

圖四 : drill down product 的樣子

圖五 : 十三種數(shù)據(jù),到底怎樣建materialized view 呢 ?

有哪位朋友可以推薦一下武漢的公司,最好是小一點(diǎn)的公司(大公司估計(jì)自己水品有限),如果有需要J2EE開發(fā)方向的工作職位的話,推薦一下, jj12tt@yahoo.com.cn ,先謝謝了.