Tuesday
12:00 - HighVolume text indexing with Oracle Text - Roger Ford
create index <index_name> on <table_name> (column_name) index type is ctxsys.context;
select from table_name
where
contains (product_name, 'monitor NEAR full hd', 99) > 0
order by score(99) desc;
index types:
- context
(primary)
- ctxcat
- ctxrule
- Xquery Full
Text
context index is inverted
consists list of words each with an associated posting list
posting lists can be very long (requires IO)
are stored in $I table
SGA works as is on this object always pin this table to SAG or flash cache if possible
Common Pitfalls
Memory
- when indexing,
postings lists are built in memory then flushed to disk when memory
is exhausted
- frequent
flushes = small fragments
in memory controlled by MEMORY parameter
create index index_name
on table_name(col)
index type ctxsys.context parameters ('memory 1G');
exec ctxsys.ctx_adm.set_parameter('MAX_INDEX_MEMORY','10G');
Substring index / UTF8
context indexes are token (word) based
using leading wildcards are slow
need to turn on SUBSTRING_INDEX wordlist option
increases index build time
- : 5x long
- : %20 long
AL32UTF8 has significant indexing overhead (consider single byte character set)
Frequent syncs
- context indexes
are asynchronous
- sync(on commit)
will cause index fragmentation
- can be cured by
ctx_ddl.optimize_index
transactional will require unindexed documents to be scanned by all queries
use transactional only with sync(on commit)
if syncing frequently, optimize frequently
use 12c feature (near real time indexes)
Big
IO
- previously $I index were stored inline BLOBs
- new storage attribute BIG_IO uses secure file lobs
Seperate offsets
posting lists consists
of document Ids and word position in the document
some queries does not
need word position
new separate_offset
storage attribute will hugely reduce IO for these queries
will make proximity
(NEAR), phase and zone section searcher will be a bit slower
Near
Real Time Indexes
new storage attribute
STAGE_ITAB
- creates two level index
- small enough to keep in memory
- large main index stored in the disk
- updates are moved from staging index to main index and optimized as they move
- requires BIG_IO storage parameter in order to use
Query
Filter Cache
some expressions are
common to many queries
security restriction:
.. “(sales or marketing) WITHIN dept”
postings entries for
these queries may be very large
postings will likely be
cached in sga but still need to be decompressed and merges
query filter cache,
caches the results of subquery in memory
query operator
ctxFilterCache(subquery)
Pattern
stop class
specify regular
expression exclusion list for index
large indexes can be
%75 junk
- reference codes
- sequential/numeric data
- wrongly indexed binary data
example:
- exclude all words longer than 20 chars
- exclude all words which are mixed alpha and numeric
Partitioning
- partitioned index can be created on partitioned base tables
- use local keyword for indexes
- currently only range partitioning is supported
- partitioned indexes looks like separate indexes on partitions
- better query performance on subset
- less interference between indexing and querying
- caching of more common target positions
Parallelization
parallel index creation
is supported by using parallel n clause
can mix local and
parallel but max 1 process per partition (db restriction)
to
do intra-partition parallelization
- create index using unusable keyword
- run dbms_pclxutil.build_part_index and specify degrees on parallelization
index is unavailable
until complete
no option for
incremental builds
manual partition
builds
create index using
nopopulate parameter
build individual
partitions using
alter
index index_name rebuild partition partn parallel n;
allows complete control
over parallelization
index is usable as soon
as one operation is built
unlike unusable option;
can build incremental changes by loading new partition and rebuilding
no sync required for
non changing data
faster use partition
swapping
distributed
index builds
- indexes can be built on farm of machines
- partitioned base table on master with local indexes
- non partitioned table on remote instance
- build index on remote table, then transfer to master using transportable tablespace
swap in index using
exchange partitioning
alter table
table_name
exchange partition
main_table_p10
with table
remote_table
including indexes
without
validation;
-
almost instant ($K IOT
must be copied)
-
new data immediately
searchable
No comments:
Post a Comment