Large MySQL Inserts with PHP
When you need to insert a massive amount of data into MySQL—say, one million records—the naive approach of executing individual INSERT queries will bring your system to its knees. Your disk will thrash, queries will time out, and you’ll wonder why databases are so slow.
They’re not. You’re just doing it wrong.
The Problem
Imagine inserting 3D coordinates for a million points:
for ($x = 0; $x < 100; $x++) {
for ($y = 0; $y < 100; $y++) {
for ($z = 0; $z < 100; $z++) {
$db->query("INSERT INTO location (locx, locy, locz) VALUES ($x, $y, $z)");
}
}
}
This executes one million separate queries. Each one requires a round trip to the database, and each write hits the disk. It’s painfully slow.
Solution 1: Batch Your INSERTs
MySQL allows multiple value sets in a single INSERT statement:
INSERT INTO location (locx, locy, locz) VALUES (0,0,0), (1,0,0), (2,0,0), ...
Group your inserts into batches of 100 or so values. This dramatically reduces the number of queries from one million to ten thousand.
Solution 2: Use Transactions
The real performance gain comes from wrapping your batched queries in transactions:
$db->beginTransaction();
$values = [];
$count = 0;
for ($x = 0; $x < 100; $x++) {
for ($y = 0; $y < 100; $y++) {
for ($z = 0; $z < 100; $z++) {
$values[] = "($x, $y, $z)";
$count++;
if ($count >= 100) {
$db->query("INSERT INTO location (locx, locy, locz) VALUES " . implode(',', $values));
$values = [];
$count = 0;
}
}
}
}
// Don't forget remaining values
if (!empty($values)) {
$db->query("INSERT INTO location (locx, locy, locz) VALUES " . implode(',', $values));
}
$db->commit();
With transactions, the database buffers your writes in RAM and commits them in one large operation rather than a million small incremental disk writes.
Results
Using this approach with Zend Framework’s database adapter, I was able to insert one million rows in under 30 seconds on modest hardware.
For my needs, this did the job. The combination of batch INSERT statements with transaction boundaries avoids excessive disk I/O while keeping query sizes reasonable.