One of the common tasks requested by our support customers is to optimize slow queries. We normally ask for the table structure(s), the problematic query and sample data to be able to reproduce the problem and resolve it by modifying the query, table structure, or global/session variables. Sometimes, we are given access to the server to test the queries on their live or test environment. But, more often than not, customers will not be able to provide us access to their servers or sample data due to security and data privacy reasons. Hence, we need to generate the test data ourselves.
A convenient way of generating test data is visiting http://generatedata.com which provides a web form where you can provide the columns and its corresponding data types, and turn them into test data. The website is capable of generating data in various formats such as Excel, HTML and JSON but for MySQL, you can choose either the CSV or SQL format.
The website limits you to generate up to a maximum of 100 rows. Fortunately, the web application is available for download so you can install it in your test server to generate data up to 100,000 rows instead.
Below are instructions on installing generatedata on CentOS 6:
1. Install Apache, PHP, MySQL and wget
# yum -y install mysql-server mysql httpd php php-mysql wget
2. Ensure Apache and MySQL runs on startup
# chkconfig httpd on # chkconfig mysqld on
3. Start Apache and MySQL
# service httpd start # service mysqld start
4. Create a MySQL user and database for the application
# mysql mysql> GRANT ALL PRIVILEGES ON generatedata.* TO generatedata@localhost IDENTIFIED BY ‘my-weak-password-please-do-not-use-this-password-in-a-production-environment’; mysql> FLUSH PRIVILEGES; mysql> CREATE DATABASE generatedata; mysql> quit;
5. Download generatedata source and place it in the web directory
# wget https://github.com/benkeen/generatedata/archive/3.0.8.tar.gz -O 3.0.8.tar.gz # tar xzvf 3.0.8.tar.gz # mv generatedata-3.0.8 /var/www/html/generatedata
6. Make the application readable and writable by Apache
# chown -R apache:apache /var/www/html/generatedata
7. Open your browser to access the web application. In my case, it’s http://192.168.1.91/generatedata. It will prompt you to supply mysql credentials. Supply the MySQL credentials and click the “Continue” button to proceed.
8. The next screen will prompt you to create a settings.php file which will be used to store the custom settings for the application which include access credentials to MySQL. Click the “Create File” button to proceed.
9. You can setup an anonymous account, single account with login or multiple accounts. For testing purposes, select the default option and click the “Continue” button.
10. The next step is to install the plugins used for generating data and exporting them in a particular format. Click the “Install Plugins” button to continue.
11. Once the plugins are generated, click the “Continue” button.
12. Finally, generatedata is installed. Click “Go to script” button to continue.
Below are samples of what generatedata can produce:
- Names: “Karleigh K. Valencia”, “Claire W. Woodard”, “Yvonne Wyatt”
- Date/Time: “2013-11-29 09:19:38″, “2013-11-29″, “11:05:53″
- Phone/Fax: “(191) 919-9508″, “(847) 807-6360″, “(366) 902-0912″
- Street Address: “P.O. Box 345, 8566 Mi St.”, “587-8731 Ultrices. Avenue”, “4612 Eu St.”
- Latitude/Longitude: “37.5863, -20.25932″, “-48.29183, -69.31125″, “-78.67594″
- Text: “Fusce aliquet magna a neque.”, “Nullam ut nisi a odio”, “mollis nec, cursus a, enim.”
- Currency: “$3364.88″, “$7849.22″, “1217.18″
- Alphanumeric: “RU384″, “GL941″, “HI144″
- AutoIncrement: 1, 2, 3, 4, 5, 6, 7
- Number Range: 1, 3, 8, 2, 14, 7
- Custom List: “Dr.”, “Mr.”, “Mrs.”, “Ms.”
For our example, let’s use the employees table structure from the Employees sample database.
1. Create the employees table under the test database
# mysql test mysql> CREATE TABLE employees ( emp_no INT NOT NULL, birth_date DATE NOT NULL, first_name VARCHAR(14) NOT NULL, last_name VARCHAR(16) NOT NULL, gender ENUM ('M','F') NOT NULL, hire_date DATE NOT NULL, PRIMARY KEY (emp_no) );
2. Enter the columns and corresponding datatypes in the web form. In this example, using the appropriate data type for each column is straightforward:
- emp_no: AutoIncrement, Start at 10000, Increment 1
- birth_date: Date, From 01/01/1960, 12/31/1990, Format Y-m-d
- first_name: Names, Alex(any gender)
- last_name: Names, Smith(surname)
- gender: Custom List, Exactly 1, Values M|F
- hire_date: Date, From 01/01/2000, To 12/11/2014, Format code Y-m-d
3. Export the data in MySQL Insert format. Under the EXPORTS TYPES section, select SQL tab and enter the following information:
- Database table: employees
- Database Type: MySQL
- Remove the tick on: Include CREATE TABLE query and Include DROP TABLE query
4. Click the “Generate” button to generate the desired data. You can now copy the generated output and paste this in the MySQL console. On the other hand, you can also choose to download the generated rows as a file.
CSV output
You can also opt to generate CSV output and use LOAD DATA INFILE to import the CSV data to your table. Under the EXPORT TYPES section, select the CSV tab. Inside the Generate frame, select “Prompt to download” and click the “Generate” button.
Upload the CSV file under the /tmp directory of your MySQL server and import it via the MySQL console:
# mysql test LOAD DATA INFILE '/tmp/dataDec-11-2013.csv' INTO TABLE employees FIELDS TERMINATED BY '|' IGNORE 1 LINES;
Generating more than 100,000 rows at a time
Suppose you want to generate more than 100,000 rows, you can modify settings.php under /var/www/html/generatedata/settings.php and append $maxGeneratedRows = <maxGeneratedRows>; to the configuration file. The example below increases the maximum number of generated rows to 150,000.
<?php $dbHostname = 'localhost'; $dbName = 'generatedata'; $dbUsername = 'generatedata'; $dbPassword = 'my-weak-password-please-do-not-use-this-password-in-a-production-environment'; $dbTablePrefix = 'gd_'; $encryptionSalt = 'ZPd'; $maxGeneratedRows = 150000;
Final Notes
There is just one major caveat to generating test data. Nothing compares to the actual data for reproducing query performance problems because its data and index cardinality will be entirely different from the generated test data. So, even if you’re able to fix the problem in your test environment, the solution may or may not apply to your production environment.
The post Generating test data for MySQL tables appeared first on MySQL Performance Blog.