Mondrian and OLAP
???Mondrian 是使用java編寫的OLAP引擎,它執行MDX語言描述的查詢,可以從關系數據庫中獲取數據。
Online Analytical Processing (OLAP)
???OLAP是指實時地分析大數據量,與Online Transaction Processing (OLTP)不同。
Year |
2000 |
2001 |
Growth |
|||
Product |
Dollar sales |
Unit sales |
Dollar sales |
Unit sales |
Dollar sales |
Unit sales |
Total |
$7,073 |
2,693 |
$7,636 |
3,008 |
8% |
12% |
Books |
$2,753 |
824 |
$3,331 |
966 |
21% |
17% |
Fiction |
$1,341 |
424 |
$1,202 |
380 |
-10% |
-10% |
Non-fiction |
$1,412 |
400 |
$2,129 |
586 |
51% |
47% |
Magazines |
$2,753 |
824 |
$2,426 |
766 |
-12% |
-7% |
— Greetings cards |
$1,567 |
1,045 |
$1,879 |
1,276 |
20% |
22% |
從上表可以看到year和Product為dimensions (維度),measures 為'Unit sales' and 'Dollar sales'。
Layers of a Mondrian system
???Mondrian OLAP System 包含4個層:
???1、the presentation layer
?????????用于描述最終用戶所看到的,其表現形式有很多,如:pivot表、pie、line和bar圖、maps和動態圖。由用戶提問,OLAP服務器返回答案。
???2、the dimensional layer
?????????這一層解析、驗證和執行MDX?查詢,首先計算軸,再計算軸上所有單元的值。metadata用于描述空間模型,及空間模型怎樣映射到關系模型上。
???3、the star layer
?????????它負責維護一個聚合的cache,aggregation 聚合是一些內存中的測量值(cells),及一些維度值。如果請求的cells不在cache中,則aggregation manager?向the storage layer發出請求。
???4、and the storage layer
?????????the storage layer為RDBMS。
What is MDX?
???'multi-dimensional expressions'為Mondrian執行的查詢。
???下面為一個基本的查詢:
SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
{[Product].members} ON ROWS
FROM [Sales]
WHERE [Time].[1997].[Q2]
What is a schema??
?schema定義了一個多維數據庫。它包含一個邏輯模型,由多個cube、hierarchies、members及一個到物理模型的映射組成。
?邏輯模型包含:cubes, dimensions, hierarchies, levels, and members.
例如:
<Schema>
? <Cube name="Sales">
??? <Table name="sales_fact_1997"/>
??? <Dimension name="Gender" foreignKey="customer_id">
????? <Hierarchy hasAll="true" allMemberName="All Genders" primaryKey="customer_id">
??????? <Table name="customer"/>
??????? <Level name="Gender" column="gender" uniqueMembers="true"/>
????? </Hierarchy>
??? </Dimension>
??? <Dimension name="Time" foreignKey="time_id">
????? <Hierarchy hasAll="false" primaryKey="time_id">
??????? <Table name="time_by_day"/>
??????? <Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/>
??????? <Level name="Quarter" column="quarter" uniqueMembers="false"/>
??????? <Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false"/>
????? </Hierarchy>
??? </Dimension>
??? <Measure name="Unit Sales" column="unit_sales" aggregator="sum" formatString="#,###"/>
??? <Measure name="Store Sales" column="store_sales" aggregator="sum" formatString="#,###.##"/>
??? <CalculatedMember name="Profit" dimension="Measures" formula="[Measures].[Store Sales]-[Measures].[Store Cost]">
????? <CalculatedMemberProperty name="FORMAT_STRING" value="$#,##0.00"/>
??? </CalculatedMember>
? </Cube>
</Schema>
?
MDX查詢為:
SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
? {[Time].[1997].[Q1].descendants} ON ROWS
FROM [Sales]
WHERE [Gender].[F]
[Time] | [Measures].[Unit Sales] | [Measures].[Store Sales] |
[1997].[Q1] | 0 | 0 |
[1997].[Q1].[Jan] | 0 | 0 |
[1997].[Q1].[Feb] | 0 | 0 |
[1997].[Q1].[Mar] | 0 | 0 |
mondrian.properties
???mondrian有一個配置文件可以定義它是如何運行的。
a simple star schema
???
[Sales]
has two measures [Unit sales]
and [Dollar sales]
[Product]
has levels [All Products]
, [Manufacturer]
, [Brand]
, [Prodid]
[Time]
has levels [All Time]
, [Year]
, [Quarter]
, [Month]
, [Day]
[Customer]
has levels [All Customers]
, [State]
, [City]
, [Custid]
[Payment Method]
has levels [All Payment Methods]
, [Payment Method]