Measures of Query cost

Posted By on October 12, 2014


Download PDF
Overview Of Query Processing & Query Optimization
Introduction to Database Managment System

Estimation of Query-Processing Cost

  1. To choose a strategy based on reliable information, the database system may store statistics for each relation r:
    • tex2html_wrap_inline940 – the number of tuples in r.
    • tex2html_wrap_inline944 – the size in bytes of a tuple of r (for fixed-length records).
    • V(A, r) – the number of distinct values that appear in relation r for attribute A.
  2. The first two quantities allow us to estimate accurately the size of a Cartesian product.
    • The Cartesian product tex2html_wrap_inline954 contains tex2html_wrap_inline956 tuples.
    • Each tuple of tex2html_wrap_inline954 occupies tex2html_wrap_inline960 bytes.
    • The third statistic is used to estimate how many tuples satisfy a selection predicate of the form
       <attribute-name> = <value>
      
    • We need to know how often each value appears in a column.
    • If we assume each value appears with equal probability, then
        tex2html_wrap_inline970
      

      is estimated to have

      displaymath928

      tuples.

    • This may not be the case, but it is a good approximation of reality in many relations.
    • We assume such a uniform distribution for the rest of this chapter.
    • Estimation of the size of a natural join is more difficult.
    • Let tex2html_wrap_inline972 and tex2html_wrap_inline974 be relations on schemes tex2html_wrap_inline976 and tex2html_wrap_inline978 .
    • If tex2html_wrap_inline980 (no common attributes), then tex2html_wrap_inline982 is the same as tex2html_wrap_inline984 and we can estimate the size of this accurately.
    • If tex2html_wrap_inline986 is a key for tex2html_wrap_inline976 , then we know that a tuple of tex2html_wrap_inline990 will join with exactly one tuple of tex2html_wrap_inline992 .
    • Thus the number of tuples in tex2html_wrap_inline982 will be no greater than tex2html_wrap_inline996 .
    • If tex2html_wrap_inline986 is not a key for tex2html_wrap_inline976 or tex2html_wrap_inline978 , things are more difficult.
    • We use the third statistic and the assumption of uniform distribution.
    • Assume tex2html_wrap_inline1004 .
    • We assume there are

      displaymath929

      tuples in tex2html_wrap_inline990 with an A value of t[A] for tuple t in tex2html_wrap_inline992 .

    • So tuple t of tex2html_wrap_inline992 produces

      displaymath929

      tuples in tex2html_wrap_inline982

  3. Considering all the tuples in tex2html_wrap_inline992 , we estimate that there are

    displaymath931

    tuples in total in tex2html_wrap_inline982

  4. If we reverse the roles of tex2html_wrap_inline992 and tex2html_wrap_inline990 in this equation, we get a different estimate

    displaymath932

    if tex2html_wrap_inline1028 .

    • If this occurs, there are likely to be some dangling tuples that do not participate in the join.
    • Thus the lower estimate is probably the better one.
    • This estimate may still be high if the tex2html_wrap_inline1030 values in tex2html_wrap_inline992 have few values in common with the tex2html_wrap_inline1034 values in tex2html_wrap_inline990 .
    • However, it is unlikely that the estimate is far off, as dangling tuples are likely to be a small fraction of the tuples in a real world relation.
  5. To maintain accurate statistics, it is necessary to update the statistics whenever a relation is modified.

    This can be substantial, so most systems do this updating during periods of light load on the system.

Overview Of Query Processing & Query Optimization
Introduction to Database Managment System

Download PDF

Posted by Akash Kurup

Founder and C.E.O, World4Engineers Educationist and Entrepreneur by passion. Orator and blogger by hobby

Website: http://world4engineers.com