問題:
???按時間順序來存放的資料量很大,此時通常需要根據(jù)一個時間間隔來抽樣數(shù)據(jù),例如說有一張表存放的是設備的運行狀態(tài),現(xiàn)在需要根據(jù)輸入的時間間隔(比如5分鍾)來進行數(shù)據(jù)抽樣。


Create ? Table ?tbRunStatus?(
????rID? int ? identity ? primary ? key ,
????snapshotTime? datetime ,
????thermometry? int ,???? -- 溫度
????hygrometry? int ???? -- 濕度
);
insert ? into ?tbrunstatus? values ?( ' 2007-05-04?12:05:12.077 ' , 30 , 85 );
insert ? into ?tbrunstatus? values ?( ' 2007-05-04?12:06:38.340 ' , 30 , 85 );
insert ? into ?tbrunstatus? values ?( ' 2007-05-04?12:10:39.750 ' , 30 , 85 );
insert ? into ?tbrunstatus? values ?( ' 2007-05-05?01:05:38.340 ' , 31 , 87 );
???當然了,這類問題為了實現(xiàn)抽樣,就一定會有幾個關鍵數(shù)據(jù)需要提供:
???1. 開始抽樣的時間,因為所謂間隔必須具有一個開始點;
???2. 抽樣的時間段;
???3. 抽樣的間隔;
解決:
???先來說一下思路,解決這個問題的關鍵在於:
???1. 先得到一個"時間上的間隔";
???這個間隔是相對於上面提到的第1個關鍵點即開始時間(比如說開始時間為'2007-05-04 12:00:00' ),因為是時間上的計算,可以利用MS-SQL的 Datediff 時間函數(shù)來實現(xiàn)。
???2. 在第1點所得到的"間隔"上進行篩選。
???現(xiàn)在就利用嵌套SQL來分步實現(xiàn)(按分鍾間隔):
???1. 得到"間隔"計算列,並應用抽樣的時間範圍(提高效率)
Select
???????? * ,? datediff (mi, ' 2007-05-04?12:00:00 ' ,snapshotTime)? As ?interval
???? From
????????tbRunStatus
???? Where
????????snapshotTime? Between ? ' 2007-05-04?00:00:00 ' ? And ? ' 2007-05-05?23:59:59 '
???????? * ,? datediff (mi, ' 2007-05-04?12:00:00 ' ,snapshotTime)? As ?interval
???? From
????????tbRunStatus
???? Where
????????snapshotTime? Between ? ' 2007-05-04?00:00:00 ' ? And ? ' 2007-05-05?23:59:59 '
???2.0 在第一步的結果集上做篩選
Select
???????? * ?
???? From
????????( Select
???????????????? * ,? datediff (mi, ' 2007-05-04?12:00:00 ' ,snapshotTime)? As ?interval
???????????? From
????????????????tbRunStatus
???????????? Where
????????????????snapshotTime? Between ? ' 2007-05-04?00:00:00 ' ? And ? ' 2007-05-05?23:59:59 '
????????)?t
???? Where
????????interval? % ? 5 ? = ? 0
???????? * ?
???? From
????????( Select
???????????????? * ,? datediff (mi, ' 2007-05-04?12:00:00 ' ,snapshotTime)? As ?interval
???????????? From
????????????????tbRunStatus
???????????? Where
????????????????snapshotTime? Between ? ' 2007-05-04?00:00:00 ' ? And ? ' 2007-05-05?23:59:59 '
????????)?t
???? Where
????????interval? % ? 5 ? = ? 0
??????需要注意的一點就是條件中所應用的 interval % 5 = 0 ,意圖是每隔5分鍾抽一條出來,但是如果是在一分鍾內會有多條而我們只需要最近的那一條數(shù)據(jù)該怎麼辦呢(如果上面的數(shù)據(jù)按天來做間隔的話,那麼前3條的間隔都會是0)?
??????2.1 這樣一來就需要用 Group By 分組來解決(按天做為間隔)
Select
?
*
?
From
?tbRunStatus?
Join
?(
Select
???????? Min (rID)? As ?rID???? -- 由於應用了分組,這裏只要取用關鍵列就好
???? From
????????( Select
???????????????? * ,? datediff ( day , ' 2007-05-04?12:00:00 ' ,snapshotTime)? As ?interval
???????????? From
????????????????tbRunStatus
???????????? Where
????????????????snapshotTime? Between ? ' 2007-05-04?00:00:00 ' ? And ? ' 2007-05-05?23:59:59 '
????????)?t
???? Where
????????interval? % ? 1 ? = ? 0
???? Group ? By
????????interval
)?tbRunStatus1? On ?tbRunStatus.rID? = ?tbRunStatus1.rID
??????可以看到,實際上的處理也並不複雜,只是將結果集與表做了一次連接就成了,最後的抽樣結果會是兩條。Select
???????? Min (rID)? As ?rID???? -- 由於應用了分組,這裏只要取用關鍵列就好
???? From
????????( Select
???????????????? * ,? datediff ( day , ' 2007-05-04?12:00:00 ' ,snapshotTime)? As ?interval
???????????? From
????????????????tbRunStatus
???????????? Where
????????????????snapshotTime? Between ? ' 2007-05-04?00:00:00 ' ? And ? ' 2007-05-05?23:59:59 '
????????)?t
???? Where
????????interval? % ? 1 ? = ? 0
???? Group ? By
????????interval
)?tbRunStatus1? On ?tbRunStatus.rID? = ?tbRunStatus1.rID
??????更深入的思考一下,如果間隔比較特殊造成並不是卡的每個時刻都會有數(shù)據(jù),例如間隔卡在4分鍾,那麼就會存在著近似的存在,看看下面SQL以及結果就明白了:
Select
???????? * ,?interval? / ? 4.0 ? As ?d
???? From
????????( Select
???????????????? * ,? datediff (mi, ' 2007-05-04?12:00:00 ' ,snapshotTime)? As ?interval
???????????? From
????????????????tbRunStatus
???????????? Where
????????????????snapshotTime? Between ? ' 2007-05-04?00:00:00 ' ? And ? ' 2007-05-05?23:59:59 '
????????)?t
??????結果集會是這樣:???????? * ,?interval? / ? 4.0 ? As ?d
???? From
????????( Select
???????????????? * ,? datediff (mi, ' 2007-05-04?12:00:00 ' ,snapshotTime)? As ?interval
???????????? From
????????????????tbRunStatus
???????????? Where
????????????????snapshotTime? Between ? ' 2007-05-04?00:00:00 ' ? And ? ' 2007-05-05?23:59:59 '
????????)?t
rID | snapshotTime | thermometry | hygrometry | interval | d |
---|---|---|---|---|---|
10 | 2007-05-04 12:05:12.077 | 30 | 85 | 5 | 1.250000 |
11 | 2007-05-04 12:06:38.340 | 30 | 85 | 6 | 1.500000 |
12 | 2007-05-04 12:10:39.750 | 30 | 85 | 10 | 2.500000 |
13 | 2007-05-05 01:05:38.340 | 31 | 87 | 785 | 196.250000 |
??????2.2 對於這種近似匹配區(qū)間的情況,我們只需要改改就好
Select
?
*
?
From
?tbRunStatus?
Join
?(
Select
???????? Min (rID)? As ?rID
???? From
????????( Select
???????????????? * ,? datediff (mi, ' 2007-05-04?12:00:00 ' ,snapshotTime)? / 4 ?? As ?interval
???????????? From
????????????????tbRunStatus
???????????? Where
????????????????snapshotTime? Between ? ' 2007-05-04?00:00:00 ' ? And ? ' 2007-05-05?23:59:59 '
????????)?t
???? Group ? By
????????interval
)?tbRunStatus1? On ?tbRunStatus.rID? = ?tbRunStatus1.rID
??????呵呵,改改那個計算列的計算邏輯就能達到目的。Select
???????? Min (rID)? As ?rID
???? From
????????( Select
???????????????? * ,? datediff (mi, ' 2007-05-04?12:00:00 ' ,snapshotTime)? / 4 ?? As ?interval
???????????? From
????????????????tbRunStatus
???????????? Where
????????????????snapshotTime? Between ? ' 2007-05-04?00:00:00 ' ? And ? ' 2007-05-05?23:59:59 '
????????)?t
???? Group ? By
????????interval
)?tbRunStatus1? On ?tbRunStatus.rID? = ?tbRunStatus1.rID