Preface

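In the previous section we described the two core Splay operations, rotate and splay.
In this section I will show how to build a range of powerful features out of just a few simple functions.
To make the explanation easier to follow, we will use this problem as a running example and analyze it step by step.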

Background:

 

   
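I previously wrote a short introduction to columnstore indexes (http://www.cnblogs.com/wenBlog/p/4970493.html). It was rough, but it covered the basic benefits of columnstore indexes. To illustrate them better, we will now compare columnstore indexes with traditional rowstore indexes and look at the improvements that columnstore indexes bring in SQL Server 2014. Since columnstore itself has already been introduced, this post concentrates only on the performance improvements.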

Implementing various features with splay

First, we need to define a few things

Test scenario

   
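I created 5 tests and tried to keep the test environment free of heavy external load that could skew the results. The results are based on two separate tables:

  • FactTransaction_ColumnStore
    This table has only a clustered columnstore index. Because of the restrictions on columnstore indexes, the table has no other indexes.
  • FactTransaction_RowStore
    This table has a clustered index, a nonclustered columnstore index and a nonclustered rowstore index.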

   
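First I used script files to create the tables and indexes, then loaded 30m rows into the three tables. Because I specify a max degree of parallelism (MAXDOP) hint in every test, the number of cores each query uses is set explicitly.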

Various pointers

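    struct node
    {
        int v;//key (weight)
        int fa;//parent node
        int ch[2];//ch[0] is the left child, ch[1] is the right child
        int rec;//number of times this key occurs
        int sum;//size of the subtree rooted here (duplicates included)
    };
    int pointnum,tot;//pointnum counts nodes with duplicates, tot counts nodes without duplicates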
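
As a small illustration of how these fields fit together, here is a minimal sketch. It assumes that index 0 is an unused "null" node whose sum stays 0, so a missing child contributes nothing to a size. The names `pool`, `pull` and `demo_subtree_size` are made up for this example.

```cpp
#include <cassert>

// Same fields as the node struct above.
struct Node {
    int v;      // key
    int fa;     // parent index
    int ch[2];  // ch[0] = left child, ch[1] = right child, 0 = no child
    int rec;    // multiplicity of this key
    int sum;    // size of the subtree rooted here, duplicates included
};

// Hypothetical small pool; index 0 is the "null" sentinel with sum == 0.
static Node pool[8];

// Recompute sum from the two children and the node's own multiplicity.
void pull(int x) {
    pool[x].sum = pool[pool[x].ch[0]].sum + pool[pool[x].ch[1]].sum + pool[x].rec;
}

// Build the tree 20 (x1) with left child 10 (x2) and right child 30 (x1)
// and return the size stored at the root.
int demo_subtree_size() {
    for (Node &n : pool) n = Node{};   // reset, so the sentinel has sum == 0
    pool[1] = Node{20, 0, {2, 3}, 1, 0};
    pool[2] = Node{10, 1, {0, 0}, 2, 0};
    pool[3] = Node{30, 1, {0, 0}, 1, 0};
    pull(2);
    pull(3);
    pull(1);   // children first, then the parent
    return pool[1].sum;
}
```

Calling `demo_subtree_size()` counts the two copies of 10 separately, which is why sum is not simply the number of distinct nodes.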

Test 1 - Populating the tables

  
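To make the comparison cleaner, one table is built on a columnstore index and the other uses only rowstore indexes. The data is loaded from another table, 'FactTransaction'.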

IO and time statistics

 

Table 'FactTransaction_ColumnStore'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'FactTransaction'. Scan count 1, logical reads 73462, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

  (30000000 row(s) affected)

SQL Server Execution Times:  CPU time = 98204 ms,  elapsed time = 109927 ms.

Table 'FactTransaction_RowStore'. Scan count 0, logical reads 98566047, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'FactTransaction'. Scan count 1, logical reads 73462, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 (30000000 row(s) affected)

SQL Server Execution Times:  CPU time = 111375 ms,  elapsed time = 129609 ms.

 

rotate

Observations for Test 1

Table name                    Load time (min:s)   Logical reads
FactTransaction_ColumnStore   1:49                0
FactTransaction_RowStore      2:09                98566047
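
The load times in this table are the elapsed times from the statistics output, written as minutes and seconds rather than as decimal minutes. A minimal sketch of that conversion follows; the function name `to_min_sec` is hypothetical, and it truncates to whole seconds.

```cpp
#include <cassert>
#include <string>

// Convert an elapsed time in milliseconds to "m:ss", truncating to whole seconds.
std::string to_min_sec(long long ms) {
    long long total_seconds = ms / 1000;
    long long minutes = total_seconds / 60;
    long long seconds = total_seconds % 60;
    std::string s = std::to_string(minutes) + ":";
    if (seconds < 10) s += "0";   // zero-pad the seconds
    s += std::to_string(seconds);
    return s;
}
```

With the two elapsed times reported above, `to_min_sec(109927)` gives the columnstore load time and `to_min_sec(129609)` the rowstore load time.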

 

splay

I will not go over these two functions again; they were covered in detail earlier.

Test 2 - Comparing seeks

   Note that on the rowstore table I specify a table hint that forces the query to use an index seek.

-- Comparing Seek.... 
SET Statistics IO,TIME ON

Select CustomerFK
From [dbo].FactTransaction_RowStore WITH(FORCESEEK)
Where transactionSK = 4000000
OPTION (MAXDOP 1)

Select CustomerFK
From [dbo].FactTransaction_ColumnStore  
Where transactionSK = 4000000
OPTION (MAXDOP 1)

SET Statistics IO,TIME OFF

 

IO and time statistics

Table 'FactTransaction_RowStore'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:    CPU time = 0 ms,  elapsed time = 0 ms.

Table 'FactTransaction_ColumnStore'. Scan count 1, logical reads 714, physical reads 0, read-ahead reads 2510, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:    CPU time = 0 ms,  elapsed time = 83 ms.

 

Execution plan

Figure 1

Insertion

As described earlier, after inserting a number we need to rotate it to the root.
So the insert function can be written like this:

inline void insert(int v)
{
    int p=build(v);//p is the index of the node that now holds v
    splay(p,root);
}

So how do we write the build function?
When the tree has no nodes yet, we simply create a new node:

inline int newpoint(int v,int fa)//v: key; fa: index of its parent
{
    tree[++tot].fa=fa;
    tree[tot].v=v;
    tree[tot].sum=tree[tot].rec=1;
    return tot;
}

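When the tree already has nodes, we follow the binary search tree property and keep moving down until we reach a position where the key can be inserted. Note that the sum of every node on the way down has to be updated as we go.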

int build(int v)
{
    pointnum++;
    //Test the root rather than tot: tot can stay nonzero after the tree has been emptied
    if(!root){root=newpoint(v,0);return root;}
    int now=root;
    while(1)
    {
        tree[now].sum++;
        if(tree[now].v==v){tree[now].rec++;return now;}//key already present
        int nxt=v<tree[now].v?0:1;
        if(!tree[now].ch[nxt])
        {
            newpoint(v,now);
            tree[now].ch[nxt]=tot;
            return tot;
        }
        now=tree[now].ch[nxt];
    }
}
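
Observations for Test 2

As the figure above shows, the index seek on the rowstore table is far faster than the query on the columnstore table. The main reason is that SQL Server 2014 does not support index seeks on a clustered columnstore index. In the execution plan comparison, one of the two queries is an index scan, which causes more logical reads and therefore lower performance.

Table name                    Index type   Logical reads   Elapsed time
FactTransaction_ColumnStore   Column       714             83 ms
FactTransaction_RowStore      Row          3               0 ms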

 

Deletion

The job of deletion is to remove the node whose key is v.
The obvious approach is to find its position first and then remove that node.

int find(int v)
{
    int now=root;
    while(1)
    {
        if(tree[now].v==v)   {splay(now,root);return now;}
        int nxt=v<tree[now].v?0:1;
        if(!tree[now].ch[nxt])return 0;
        now=tree[now].ch[nxt];
    }
}

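This function finds the position of the node whose key is v. It is fairly easy to follow; just do not forget to splay the node you found to the root.
We also need a function that wipes a node completely: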

inline void dele(int x)
{
    tree[x].sum=tree[x].v=tree[x].rec=tree[x].fa=tree[x].ch[0]=tree[x].ch[1]=0;
    if(x==tot)  tot--;
}

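The next task is to make sure the whole tree still satisfies the binary search tree property after the node is removed.
Note: when we searched for the node we already rotated it to the root, so its left subtree holds exactly the nodes smaller than it and there are no smaller nodes anywhere else (otherwise we would also have to handle the case where its parent is smaller than it).

At this point there are several cases:

  • The key v occurs more than once
    Simply decrement the node's rec and sum by 1.
  • The node has no left child
    Make its right child the new root.
  • The node has a left child (with or without a right child)
    Find the largest node in its left subtree and rotate it to the top of that subtree, then make the deleted node's right child the right child of this new root (that is, of the largest node in the left subtree).

Finally, wipe the removed node.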

void pop(int v)
{
    int deal=find(v);
    if(!deal)   return ;
    pointnum--;
    if(tree[deal].rec>1){tree[deal].rec--;tree[deal].sum--;return ;}
    if(!tree[deal].ch[0])    root=tree[deal].ch[1],tree[root].fa=0;
    else
    {
        int le=tree[deal].ch[0];
        while(tree[le].ch[1])    le=tree[le].ch[1];
        splay(le,tree[deal].ch[0]);
        int ri=tree[deal].ch[1];
        connect(ri,le,1);connect(le,0,1);
        update(le);
    }
    dele(deal);
}
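
The last case is essentially a join of two subtrees in which every key on the left is smaller than every key on the right. The sketch below is a simplified variant, not the splay-based code above: it does no rotations, tracks only sum (no rec), and simply hangs the right subtree under the maximum of the left one. The names `t`, `join` and `demo_join` are hypothetical.

```cpp
#include <cassert>

struct Node { int v, ch[2], sum; };
static Node t[5];  // index 0 is the null sentinel

// Join two subtrees where every key under l is smaller than every key under r.
// Returns the index of the root of the combined tree.
int join(int l, int r) {
    if (!l) return r;
    if (!r) return l;
    int now = l;
    while (true) {
        t[now].sum += t[r].sum;      // r will hang below every node on this right spine
        if (!t[now].ch[1]) break;    // reached the maximum of l
        now = t[now].ch[1];
    }
    t[now].ch[1] = r;                // the maximum has no right child, so r fits here
    return l;
}

// Left tree: 10 with right child 15.  Right tree: 30 with left child 20.
int demo_join() {
    t[0] = Node{0, {0, 0}, 0};
    t[1] = Node{10, {0, 2}, 2};
    t[2] = Node{15, {0, 0}, 1};
    t[3] = Node{30, {4, 0}, 2};
    t[4] = Node{20, {0, 0}, 1};
    return join(1, 3);
}
```

The splay version does the same job but first rotates the maximum to the top of the left subtree, so the attachment happens at the root and only one update is needed.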

Test 3 – Comparing SCAN

  
Note that this time both hints force an index scan. On the columnstore index the optimizer chooses an index scan by default anyway.

-- Comparing Scan.... 
SET Statistics IO,TIME ON

Select CustomerFK
From [dbo].FactTransaction_RowStore WITH(FORCESCAN)
Where transactionSK = 4000000
OPTION (MAXDOP 1)

Select CustomerFK
From [dbo].FactTransaction_ColumnStore WITH(FORCESCAN)
Where transactionSK = 4000000
OPTION (MAXDOP 1)

SET Statistics IO,TIME OFF

 

IO and time statistics

Table 'FactTransaction_RowStore'. Scan count 1, logical reads 12704, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
  CPU time = 32 ms,  elapsed time = 22 ms.

Table 'FactTransaction_ColumnStore'. Scan count 1, logical reads 714, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
  CPU time = 0 ms,  elapsed time = 2 ms. 

 

Execution plan

Figure 2

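Querying the rank of x

This is simple: if we find the node whose key is x, the answer is the size of its left subtree + 1.
Otherwise we keep moving down according to the binary search tree property. Note that whenever we move to the right, the answer must grow by the size of the current node's left subtree plus the current node's rec.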

int rank(int v)// rank of the value v, or 0 if v is not in the tree
{
    int ans=0,now=root;
    while(now)// stop at the null node before comparing keys
    {
        if(tree[now].v==v)
        {
            ans+=tree[tree[now].ch[0]].sum+1;
            splay(now,root);// splay the node we found, as find does
            return ans;
        }
        if(v<tree[now].v)    now=tree[now].ch[0];
        else                 ans+=tree[tree[now].ch[0]].sum+tree[now].rec,now=tree[now].ch[1];
    }
    return 0;
}
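
To make the accumulation rule concrete, here is a small worked sketch on a hand-built tree with no rotations. It assumes index 0 is a null node with sum 0; the names `t`, `kRoot` and `rank_of` are made up for this illustration.

```cpp
#include <cassert>

struct Node { int v, ch[2], rec, sum; };

// Hand-built tree: 20 (x1) is the root, 10 (x2) its left child, 30 (x1) its right child.
// Index 0 is the null sentinel with sum == 0.
static const Node t[4] = {
    {0,  {0, 0}, 0, 0},
    {20, {2, 3}, 1, 4},
    {10, {0, 0}, 2, 2},
    {30, {0, 0}, 1, 1},
};
static const int kRoot = 1;

// Rank of v = number of stored values smaller than v, plus one; 0 if v is absent.
int rank_of(int v) {
    int ans = 0, now = kRoot;
    while (now) {
        if (t[now].v == v) return ans + t[t[now].ch[0]].sum + 1;
        if (v < t[now].v) {
            now = t[now].ch[0];
        } else {
            // Going right skips the whole left subtree and this node's copies.
            ans += t[t[now].ch[0]].sum + t[now].rec;
            now = t[now].ch[1];
        }
    }
    return 0;
}
```

For 30, the walk goes right once at the root and adds 2 (the two copies of 10) plus 1 (the root itself) before the final + 1.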
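
Observations for Test 3

As mentioned earlier, an index scan on columnstore is faster than on rowstore. Both the logical reads and the elapsed time show that a columnstore index is the better way to scan a large table, which makes it a better fit for data warehouse tables.

Table name                    Index type   Logical reads   Elapsed time
FactTransaction_ColumnStore   Column       714             2 ms
FactTransaction_RowStore      Row          12704           22 ms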

 

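Querying the number with rank x

This operation is the inverse of the previous one.
The variable used records how many nodes there are in the current node plus its left subtree.
If x is greater than the size of the left subtree and x <= used, the current node's key is the answer.
Otherwise we keep moving down according to the binary search tree property.
As before, remember to update x whenever we move to the right.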

int arank(int x)// which number has rank x
{
    int now=root;
    while(1)
    {
        int used=tree[now].sum-tree[tree[now].ch[1]].sum;
        if(x>tree[tree[now].ch[0]].sum&&x<=used)    break;
        if(x<used)    now=tree[now].ch[0];
        else    x=x-used,now=tree[now].ch[1];
    }
    splay(now,root);
    return tree[now].v;
}
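
The same descent can be traced on a tiny hand-built tree. This is only a sketch under the assumption that index 0 is a null node with sum 0; the names `t`, `kRoot` and `kth` are hypothetical, and returning -1 for an out-of-range rank is a choice made for this example, not something the function above does.

```cpp
#include <cassert>

struct Node { int v, ch[2], rec, sum; };

// Hand-built tree: 20 (x1) is the root, 10 (x2) its left child, 30 (x1) its right child.
static const Node t[4] = {
    {0,  {0, 0}, 0, 0},
    {20, {2, 3}, 1, 4},
    {10, {0, 0}, 2, 2},
    {30, {0, 0}, 1, 1},
};
static const int kRoot = 1;

// Value at 1-based position k in sorted order (duplicates counted).
// Returns -1 when k is out of range (an assumption of this sketch).
int kth(int k) {
    if (k < 1 || k > t[kRoot].sum) return -1;
    int now = kRoot;
    while (true) {
        int left = t[t[now].ch[0]].sum;
        int used = left + t[now].rec;        // left subtree plus this node's copies
        if (k <= left) {
            now = t[now].ch[0];
        } else if (k <= used) {
            return t[now].v;
        } else {
            k -= used;                       // skip everything up to and including this node
            now = t[now].ch[1];
        }
    }
}
```

Ranks 1 and 2 both land on 10 because it is stored twice; rank 4 moves right with k reduced from 4 to 1.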

Test 4 - Aggregate query

    Test on the rowstore table using the clustered index.

SET Statistics IO,TIME ON

Select CustomerFK,BrandFK, Count(*)
From [dbo].[FactTransaction_RowStore] WITH(INDEX=RowStore_FactTransaction)
Group by CustomerFK,BrandFK
OPTION (MAXDOP 4)

 

   Test on the rowstore table using the rowstore index on CustomerFK and BrandFK (covering index).

Select CustomerFK,BrandFK, Count(*)
From [dbo].[FactTransaction_RowStore] WITH(INDEX=RowStore_CustomerFK_BrandFK)
Group by CustomerFK,BrandFK
OPTION (MAXDOP 4)

 

    Test on the rowstore table using the columnstore index on CustomerFK and BrandFK (covering index).

Select CustomerFK,BrandFK, Count(*)
From [dbo].[FactTransaction_RowStore] WITH(INDEX=ColumnStore_CustomerFK_BrandFK)
Group by CustomerFK,BrandFK
OPTION (MAXDOP 4)

Test on the columnstore table using the Clustered Index.

Select CustomerFK,BrandFK, Count(*)
From [dbo].[FactTransaction_ColumnStore]
Group by CustomerFK,BrandFK
OPTION (MAXDOP 4)

SET Statistics IO,TIME OFF

 

Finding the predecessor of x

This one is even easier: we keep an ans variable, walk down the tree, and update ans along the way.

int lower(int v)// largest value smaller than v
{
    int now=root;
    int ans=-maxn;
    while(now)
    {
        if(tree[now].v<v&&tree[now].v>ans)    ans=tree[now].v;
        if(v>tree[now].v)    now=tree[now].ch[1];
        else    now=tree[now].ch[0];
    }
    return ans;
}

IO and time statistics

    Rowstore table queried using the clustered index.

Table 'FactTransaction_RowStore'. Scan count 5, logical reads 45977, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:  CPU time = 9516 ms,  elapsed time = 2645 ms.

 

   Rowstore table queried using the nonclustered rowstore index (covering index).

Table 'FactTransaction_RowStore'. Scan count 5, logical reads 71204, physical reads 0, read-ahead reads 2160, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:  CPU time = 5343 ms,  elapsed time = 1833 ms.

 

 

   Rowstore table queried using the nonclustered columnstore index (covering index).

Table 'FactTransaction_RowStore'. Scan count 4, logical reads 785, physical reads 7, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:  CPU time = 141 ms,  elapsed time = 63 ms.

 

    Columnstore table queried using the clustered index.

Table 'FactTransaction_ColumnStore'. Scan count 4, logical reads 723, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:  CPU time = 203 ms,  elapsed time = 118 ms.

 

Finding the successor of x

This works the same way as the previous one, so I will not go into detail.

int upper(int v)
{
    int now=root;
    int ans=maxn;
    while(now)
    {
        if(tree[now].v>v&&tree[now].v<ans)    ans=tree[now].v;
        if(v<tree[now].v)    now=tree[now].ch[0];
        else    now=tree[now].ch[1];
    }
    return ans;
}
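
One way to sanity-check predecessor and successor queries is to compare them against the standard library. The sketch below uses `std::set::lower_bound` and `std::set::upper_bound`; the function names `pred_ref` and `succ_ref` and the constant `kNone` are hypothetical, with `kNone` chosen to match the value of maxn in the complete code.

```cpp
#include <cassert>
#include <iterator>
#include <set>

const int kNone = 0x7fffff;  // "no answer" marker, same magnitude as maxn

// Largest element strictly smaller than v, or -kNone if there is none.
int pred_ref(const std::set<int>& s, int v) {
    auto it = s.lower_bound(v);           // first element >= v
    if (it == s.begin()) return -kNone;   // nothing smaller than v
    return *std::prev(it);
}

// Smallest element strictly greater than v, or kNone if there is none.
int succ_ref(const std::set<int>& s, int v) {
    auto it = s.upper_bound(v);           // first element > v
    if (it == s.end()) return kNone;
    return *it;
}
```

Note that v itself does not have to be in the set: `pred_ref` and `succ_ref` answer for any query value, just as lower and upper do.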

Complete code:

#include<iostream>
#include<cstdio>
using namespace std;
const int MAXN=1e6+10;
const int maxn=0x7fffff;
inline char nc()
{
    static char buf[MAXN],*p1=buf,*p2=buf;
    return p1==p2&&(p2=(p1=buf)+fread(buf,1,MAXN,stdin),p1==p2)?EOF:*p1++;
}
inline int read()
{
    char c=nc();int x=0,f=1;
    while(c<'0'||c>'9'){if(c=='-')f=-1;c=nc();}
    while(c>='0'&&c<='9'){x=x*10+c-'0';c=nc();}
    return x*f;
}
struct SPLAY
{
    #define root tree[0].ch[1]
    struct node
    {
        int v,fa,ch[2],rec,sum;
    };
    node tree[MAXN];
    int pointnum,tot;
    SPLAY(){pointnum=tot=0;}
    int iden(int x){return tree[tree[x].fa].ch[0]==x?0:1;}
    inline void connect(int x,int fa,int how){tree[x].fa=fa;tree[fa].ch[how]=x;}
    inline void update(int x){tree[x].sum=tree[tree[x].ch[0]].sum+tree[tree[x].ch[1]].sum+tree[x].rec;}
    inline void rotate(int x)
    {
        int y=tree[x].fa;
        int R=tree[y].fa;
        int Rson=iden(y);
        int yson=iden(x);
        int b=tree[x].ch[yson^1];
        connect(b,y,yson);
        connect(y,x,yson^1);
        connect(x,R,Rson);
        update(y);update(x);
    }
    void splay(int pos,int to)// rotate node pos up until it takes the place of node to
    {
        to=tree[to].fa;
        while(tree[pos].fa!=to)
        {
            if(tree[tree[pos].fa].fa==to)    rotate(pos);
            else if(iden(tree[pos].fa)==iden(pos))    rotate(tree[pos].fa),rotate(pos);
            else    rotate(pos),rotate(pos);
        }
    }
    inline int newpoint(int v,int fa)//
    {
        tree[++tot].fa=fa;
        tree[tot].v=v;
        tree[tot].sum=tree[tot].rec=1;
        return tot;
    }
    inline void dele(int x)
    {
        tree[x].ch[0]=tree[x].ch[1]=0;
        if(x==tot)  tot--;
    }
    int find(int v)
    {
        int now=root;
        while(1)
        {
            if(tree[now].v==v)   {splay(now,root);return now;}
            int nxt=v<tree[now].v?0:1;
            if(!tree[now].ch[nxt])return 0;
            now=tree[now].ch[nxt];
        }
    }
    int build(int v)
    {
        pointnum++;
        //Test the root rather than tot: tot can stay nonzero after the tree has been emptied
        if(!root){root=newpoint(v,0);return root;}
        int now=root;
        while(1)
        {
            tree[now].sum++;
            if(tree[now].v==v){tree[now].rec++;return now;}//key already present
            int nxt=v<tree[now].v?0:1;
            if(!tree[now].ch[nxt])
            {
                newpoint(v,now);
                tree[now].ch[nxt]=tot;
                return tot;
            }
            now=tree[now].ch[nxt];
        }
    }
    inline void insert(int v)
    {
        int p=build(v);//p is the index of the node that now holds v
        splay(p,root);
    }
    void pop(int v)
    {
        int deal=find(v);
        if(!deal)   return ;
        pointnum--;
        if(tree[deal].rec>1){tree[deal].rec--;tree[deal].sum--;return ;}
        if(!tree[deal].ch[0])    root=tree[deal].ch[1],tree[root].fa=0;
        else
        {
            int le=tree[deal].ch[0];
            while(tree[le].ch[1])    le=tree[le].ch[1];
            splay(le,tree[deal].ch[0]);
            int ri=tree[deal].ch[1];
            connect(ri,le,1);connect(le,0,1);
            update(le);
        }
        dele(deal);
    }
    int rank(int v)// rank of the value v, or 0 if v is not in the tree
    {
        int ans=0,now=root;
        while(now)// stop at the null node before comparing keys
        {
            if(tree[now].v==v)
            {
                ans+=tree[tree[now].ch[0]].sum+1;
                splay(now,root);// splay the node we found, as find does
                return ans;
            }
            if(v<tree[now].v)    now=tree[now].ch[0];
            else                 ans+=tree[tree[now].ch[0]].sum+tree[now].rec,now=tree[now].ch[1];
        }
        return 0;
    }
    int arank(int x)// which number has rank x
    {
        int now=root;
        while(1)
        {
            int used=tree[now].sum-tree[tree[now].ch[1]].sum;
            if(x>tree[tree[now].ch[0]].sum&&x<=used)    break;
            if(x<used)    now=tree[now].ch[0];
            else    x=x-used,now=tree[now].ch[1];
        }
        splay(now,root);
        return tree[now].v;
    }
    int lower(int v)// largest value smaller than v
    {
        int now=root;
        int ans=-maxn;
        while(now)
        {
            if(tree[now].v<v&&tree[now].v>ans)    ans=tree[now].v;
            if(v>tree[now].v)    now=tree[now].ch[1];
            else    now=tree[now].ch[0];
        }
        return ans;
    }
    int upper(int v)
    {
        int now=root;
        int ans=maxn;
        while(now)
        {
            if(tree[now].v>v&&tree[now].v<ans)    ans=tree[now].v;
            if(v<tree[now].v)    now=tree[now].ch[0];
            else    now=tree[now].ch[1];
        }
        return ans;
    }
}s;
int main()
{
    #ifdef WIN32
    freopen("a.in","r",stdin);
    #else
    #endif
    int n=read();
    while(n--)
    {
        int opt=read(),x=read();
        if(opt==1)    s.insert(x);
        else if(opt==2)    s.pop(x);
        else if(opt==3)    printf("%d\n",s.rank(x));
        else if(opt==4)    printf("%d\n",s.arank(x));
        else if(opt==5)    printf("%d\n",s.lower(x));
        else if(opt==6)    printf("%d\n",s.upper(x));
    }
} 
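
When debugging a program like this, it can help to compare its answers against a brute-force reference on small inputs. The following is a minimal sketch of such a reference for insert, pop, rank and arank, backed by `std::multiset`. The names `Reference` and `sample_reference` are hypothetical, and the "return 0 when the value is absent" rule for rank is copied from the code above as an assumption about the intended behavior.

```cpp
#include <cassert>
#include <iterator>
#include <set>

// Brute-force reference backed by std::multiset (not part of the splay code).
struct Reference {
    std::multiset<int> s;

    void insert(int v) { s.insert(v); }

    void pop(int v) {                      // remove one copy of v if present
        auto it = s.find(v);
        if (it != s.end()) s.erase(it);
    }

    int rank(int v) const {                // 1 + count of values < v; 0 if v is absent
        if (s.find(v) == s.end()) return 0;
        return static_cast<int>(std::distance(s.begin(), s.lower_bound(v))) + 1;
    }

    int arank(int k) const {               // k-th smallest, 1-based; k must be in range
        return *std::next(s.begin(), k - 1);
    }
};

// Small helper for experiments: a Reference preloaded with 10, 10, 20, 30.
Reference sample_reference() {
    Reference r;
    for (int v : {10, 10, 20, 30}) r.insert(v);
    return r;
}
```

Every operation here is linear in the worst case, so it is only meant for checking small random tests, not as a replacement for the splay tree.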

With that, the most commonly used splay functions are all in place.
Next, let's look at a few straightforward practice problems.

Execution plan

Figure 3

Example problems

I am not sure why, but my splay runs remarkably fast; maybe I just got lucky.

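Observations for Test 4

This is where columnstore indexes really start to shine. The queries that use the two columnstore indexes perform far better than the traditional row indexes in both logical reads and elapsed time.

Table name                    Index used                       Index type   Logical reads   Elapsed time (ms)
FactTransaction_ColumnStore   ClusteredColumnStore             Column       717             118
FactTransaction_RowStore      RowStore_FactTransaction         Row          45957           2645
FactTransaction_RowStore      RowStore_CustomerFK_BrandFK      Row          71220           1833
FactTransaction_RowStore      ColumnStore_CustomerFK_BrandFK   Column       782             63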

 

Luogu P2234 [HNOI2002] 营业额统计 (Turnover Statistics)

http://www.cnblogs.com/zwfymqz/p/7896128.html

Test 5 - Comparing updates (a subset of the data)

   In this test I update just under 1m rows, about one thirtieth of the total data.

SET Statistics IO,TIME ON

Update [dbo].[FactTransaction_ColumnStore]
Set    TransactionAmount = 100
Where  CustomerFK = 112
OPTION (MAXDOP 1)

Update [dbo].[FactTransaction_RowStore]
Set    TransactionAmount = 100
Where  CustomerFK = 112

OPTION (MAXDOP 1)

SET Statistics IO,TIME OFF

 

Luogu P2286 [HNOI2004] 宠物收养场 (Pet Adoption Center)

http://www.cnblogs.com/zwfymqz/p/7895794.html

IO and time statistics
 
Table 'FactTransaction_ColumnStore'. Scan count 2, logical reads 2020, physical reads 0, read-ahead reads 2598, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(913712 row(s) affected)

SQL Server Execution Times:  CPU time = 27688 ms,  elapsed time = 37638 ms.

Table 'FactTransaction_RowStore'. Scan count 1, logical reads 2800296, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(913712 row(s) affected)

SQL Server Execution Times:  CPU time = 6812 ms,  elapsed time = 6819 ms.

 

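Execution plan

Figure 4

Observations for Test 5

  In this case, the table with the columnstore index is much slower to update than the rowstore table.

Table name                    Index type   Logical reads   Elapsed time
FactTransaction_ColumnStore   Column       2020            37638 ms
FactTransaction_RowStore      Row          2800296         6819 ms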

 

   
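Note that the rowstore table still needs far more logical reads than the columnstore table. This is because a columnstore index has a higher compression ratio and therefore takes up less memory.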

Summary

   
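Columnstore indexes (both clustered and nonclustered) offer substantial advantages, but using them in a data warehouse still takes some preparation. One case to plan for is that a nonclustered columnstore index cannot be updated and disables updates to the underlying table. If the table is huge and not partitioned, this can be a problem: the index over the whole table has to be rebuilt every time, which makes a columnstore index impractical on a very large table. A good partitioning strategy is therefore needed to support this kind of index.

   There are several places where columnstore indexes fit well: aggregations over fact tables, Fast Track Data Warehouse Servers, SSAS cubes in a suitable environment…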