Monday, September 12, 2011

Random Data Generation and DBMS_RANDOM

Generating random data itself is already a subject of its own. But here i can only write about the needs of random data in a database environment and how a database developer or administrator can generate the random data to fulfill his/her requirements. I personally used random data generation in three main purposes up to now.

DATAMASKING

Almost in every company i have worked for, datamasking is a must-have procedure for test data generation, especially after restoring production data to a prelive or day-1 database. In common sense just truncating the sensitive data or updating the columns to NULL maybe enough for security purposes. But in some individual cases application logic needs to continue which depends on the data availability. In such cases there are two main roadmaps, hashing the real data and un-identify it or some random data may be generated for representing the real sensitive data. The generated data may also ensure the formatting of the known structures such as credit card information, postal code or telephone number of the customers.

PASSWORD GENERATION

After applying the new password policy to our database, forgotten passwords by the end users are not recovered that easily because of the password complexity functions (by applying password verify function) and the password_reuse_max parameter of the profile. This means that, the portal used by the level-1 support is not enough as the procedure lies behind the button is simply changes the password to a default value which is not changed from the beginning of the procedure at all, which causes ORA-28007: the password cannot be reused exception.

TEST DATA GENERATION

In order to test any functionality or any new features by myself, generating data for newly created tables is a necessity. Formatted varchar2 and number columns are very handy to test some functionalities and how some functionalities behave in large amount of structured table data.

Dictionary views can be used for this purpose but If it is not enough for the individual test conditions then generation a random data within the desired format is crucial.

DBMS_RANDOM PACKAGE

For all the mentioned purposes, oracle database has a built-in package which is DBMS_RANDOM to generate random data. There are several functions in this package which can be used to obtain same results with some manipulation of the output values. Main functions to produce random number would be;

DBMS_RANDOM.VALUE() whichs output value is NUMBER datatype, this means you can produce up to 32 digit numbers.


select dbms_random.value() as result from dual;

RESULT
------
0.33363563290178533954768590716355821427


DBMS_RANDOM.VALUE(arg1 number, arg2 number) returns a random value between the supplied limits. The following sql should return a random number between 1 and 20.


select dbms_random.value(1,20) as result from dual;

RESULT
------
14.25291834803147861791906299688905822678


DBMS_RANDOM.STRING() function can also be used in order to generate a random character or a string. Function has two parameters one for the type of characters to be used and the second parameter for the length of the string. The following function definition is copied here from and Oracle 11gR2 databases DBMS_RANDOM package spec. It explains itself very well and nothing needs to be added.


FUNCTION string (opt char, len NUMBER)

/* 
"opt" specifies that the returned string may contain:

'u','U': upper case alpha characters only
'l','L': lower case alpha characters only
'a','A': alpha characters only (mixed case)
'x','X': any alpha-numeric characters 'p','P': any printable characters
*/
RETURN VARCHAR2;  
--string of  characters


So, if you need to produce some random string with 8 characters long and all characters are alpha numeric and lower case then it is easy by using the DBMS_RANDOM.STRING() function:


select dbms_random.string('l',8) as result from dual;

RESULT
------
yhxctomv


DBMS_RANDOM BY EXAMPLES

All the explanations of the examples can be found in the document that i shared from here. You can also find the link at the bottom of this post to the same document.


--------------------------
--masking the card numbers
--Ex.1
--------------------------
select 
lpad(round(dbms_random.value*power(10,4)),4,0) || '-' ||
lpad(round(dbms_random.value*power(10,4)),4,0) || '-' ||
lpad(round(dbms_random.value*power(10,4)),4,0) || '-' ||
lpad(round(dbms_random.value*power(10,4)),4,0)  as card_number
from dual 
connect by level <=5; 
/*
CARD_NUMBER
-----------
2877-6639-0728-5456
6026-6002-2218-9038
7679-8441-0899-2826
8294-6783-6110-7988
1836-0407-9206-3333
*/

--------------------------
--masking the card numbers
--Ex.2
--------------------------
select 
ltrim(to_char(dbms_random.value(1,9999),'0000')) || '-' ||
ltrim(to_char(dbms_random.value(1,9999),'0000')) || '-' ||
ltrim(to_char(dbms_random.value(1,9999),'0000')) || '-' ||
ltrim(to_char(dbms_random.value(1,9999),'0000'))  as card_number
from dual 
connect by level <=5; 
/*
CARD_NUMBER
-----------
1558-9846-7194-5325
5109-3233-0641-9209
3081-5946-9840-6615
4400-9638-6333-9113
2928-9883-1771-0465
*/

--------------------------
--masking the card numbers
--Ex.3
--------------------------
select 
ltrim(replace(to_char(round(dbms_random.value*power(10,16)),'0000,0000,0000,0000'),',','-')) as card_number
from dual
connect by level <=5;
/*
CARD_NUMBER
-----------
0157-8125-6418-6025
3829-9039-1357-9048
2876-1086-5371-8152
2775-1748-2591-2523
2058-2404-1101-5320
*/

--------------------------
--masking the card numbers
--Ex.4
--------------------------
select 
substr(abs(dbms_random.random),1,4) || '-' ||
substr(abs(dbms_random.random),1,4) || '-' ||
substr(abs(dbms_random.random),1,4) || '-' ||
substr(abs(dbms_random.random),1,4)  as card_number
from dual 
connect by level <=5; 
/*
CARD_NUMBER
-----------
8639-7576-1359-3965
1317-1525-2526-1796
1043-5881-1000-7113
2106-3239-8662-3769
1461-7473-5870-6829
*/

--------------------------
--masking the phone number
--Ex.5
--------------------------
select 
  '+' || 
  round(DBMS_RANDOM.VALUE(1,99)) || '-' ||
  round(DBMS_RANDOM.VALUE(10,99)) || '-' ||
  round(DBMS_RANDOM.VALUE(1000000,9999999)) as phone_number
from dual
connect by level <= 5;
/*
PHONE_NUMBER
------------
+8-44-9146987
*/

----------------------
--masking the postcode
--Ex.6
----------------------
select 
  round(dbms_random.value(1000,9999)) || '-' || 
  dbms_random.string('U',2) as postcode 
from dual;
/*
POSTCODE
--------
4997-QP
*/



What if, you have a password verify function which commits the passwords will be at least 8 characters long and must contain alphanumeric characters and this password complexity merged with a profile which has a password_lifetime of two month and password_reuse_max is four. If there is a predefined automatic case which explained detailly in the paper mentioned before which needs random password generation then there is the example which can be used;

-----------------
--random password
--Ex.7
-----------------
select  
  DBMS_RANDOM.STRING('A',1) || 
  round(DBMS_RANDOM.VALUE()*10) || 
  DBMS_RANDOM.STRING('X',6) as password 
from dual;
/*
PASSWORD
--------
w7N3C1YG
*/


TEST DATA GENERATION

I generally use two different methods while generating test data to fill the test tables. One of them is by using the DBMS_RANDOM package and the other is filling the bulk data in the columns with the same output as you can find in the following examples.

In this first part of the following example, the code tries to simulate a username bu using lower case string and with random lengths between five and fifteen. The firstname starts with uppercase by using INICAP() function and the lastname is fully in uppercase by using the UPPER() inline function. The second part is not that clever and it just creates the same data over and over which can substitude a customer name or a username.

----------------------
--test data generation
--Ex.8
----------------------
select 
  initcap(dbms_random.string('L',round(dbms_random.value(5,15)))) || ' ' || 
  upper(dbms_random.string('L',round(dbms_random.value(5,15)))) as name
from dual
connect by level <= 5;
/*
NAME
----
Abzvsidgbcfa AGUGIR
Wvuogptkxwhdwa IPOOXTVBLLCNPV
Yiwcgh SGPFKJYCDISO
Radshiyidcrst ZNKNSEYUZXVWY
Daxeqzugq LKJILZJEYULVI
*/

select
initcap(lpad('x',round(dbms_random.value(5,15)),'x')) || ' ' || 
upper(lpad('y',round(dbms_random.value(5,15)),'y')) as name
from dual
connect by level <= 5;
/*
NAME
----
Xxxxxxxxxxxx YYYYYYYYYYY
Xxxxxxxxxxxxxxx YYYYYYYYYYYYY
Xxxxxxxxxxxxxx YYYYYYYYY
Xxxxxxxx YYYYYYYYYYY
Xxxxxx YYYYYYYY
*/


select 
  initcap(lpad('x',9,'x')) || ' ' ||
  upper(lpad('y',9,'y')) as name
from dual
connect by level <= 5;
/*
NAME
----
Xxxxxxxxx YYYYYYYYY
Xxxxxxxxx YYYYYYYYY
Xxxxxxxxx YYYYYYYYY
Xxxxxxxxx YYYYYYYYY
Xxxxxxxxx YYYYYYYYY
*/


GENERATING MEANINGFUL DATA

Upto now, the generated data was completely dummy. For example while masking the credit card number, it wasn’t important if this was a valid card number or not, or if the names are real names or not. All the data generated were dummy random data which does not make any sense, they are just string or numbers formed by number characters or digits.

But by using the following example random data can be generated which makes sense (or a little sensible than the previous methods).

In this small example, there is a lookup table (which can be extended as far as the individual case needs) and the phone number generator picks up a random country code from the lookup table to generate the phone number. Every random phone number generated will have a valid country code by using the following example. Of course this example can be extended for the area codes as well.

--------------------------
--generating sensible data
--Ex.9
--------------------------
create table t_country_codes 
  (key number(2), 
   code number(2), 
   country varchar2(20));
   
insert into t_country_codes values (1,1,'United States');
insert into t_country_codes values (2,31,'Netherlands');
insert into t_country_codes values (3,44,'United Kingdom');
insert into t_country_codes values (4,49,'Germany');
insert into t_country_codes values (5,90,'Turkey');
commit;

select 
  '+' || code || '-' || 
  round(dbms_random.value(10,99)) || '-' || 
  round(DBMS_RANDOM.VALUE(100000,999999)) as phone_number
from 
  t_country_codes 
where 
  key=(select round(DBMS_RANDOM.VALUE(1,5)) from dual);
/*  
PHONE_NUMBER
------------
+31-54-732777
*/  


This post is taken from the paper i wrote for an internal use and enchanced for my blog post. You can download the paper from the following link if you have google docs access. Generating Random Data in Oracle 11g Database