|

Scaling Cron Jobs with PostgreSQL Transaction Locks

Ever scaled your app to multiple instances and suddenly your cron jobs run multiple times? Yeah, that’s a fun bug to debug in production. Let me show you how I fixed this with PostgreSQL transaction locks.

The problem

You’ve got a cron job that runs every hour. Works great on one server. Then you scale to 3 instances for high availability. Now that job runs 3 times every hour. Oops.

Maybe it’s sending emails (users get 3 copies), processing payments (charge them 3 times), or generating reports (waste of resources). Not good.

The naive solution (that doesn’t work)

First thing you might try: “I’ll just check if the job is already running!”

const isRunning = await db.query(
  'SELECT * FROM job_status WHERE job_name = $1 AND status = $2',
  ['send-emails', 'running']
)

if (isRunning.rows.length > 0) {
  console.log('Job already running, skipping')
  return
}

await db.query(
  'INSERT INTO job_status (job_name, status) VALUES ($1, $2)',
  ['send-emails', 'running']
)

// Do the actual work
await sendEmails()

Looks good, right? Wrong. Race condition. Both instances check at the same time, both see no running job, both start. You just made the problem worse because now you think it’s fixed.

Enter PostgreSQL advisory locks

PostgreSQL has this thing called advisory locks. They’re application-level locks that let you say “only one process can do this thing at a time.”

Here’s the magic:

import { Pool } from 'pg'

const pool = new Pool({
  connectionString: process.env.DATABASE_URL
})

async function runJobWithLock(jobName: string, jobFn: () => Promise<void>) {
  const client = await pool.connect()
  
  try {
    // Try to get an advisory lock
    // pg_try_advisory_lock returns true if we got the lock, false if someone else has it
    const lockId = hashJobName(jobName) // Convert job name to a number
    const result = await client.query(
      'SELECT pg_try_advisory_lock($1) as locked',
      [lockId]
    )
    
    if (!result.rows[0].locked) {
      console.log(`Job ${jobName} already running on another instance, skipping`)
      return
    }
    
    console.log(`Got lock for ${jobName}, running job`)
    await jobFn()
    
  } finally {
    // Always release the lock
    await client.query('SELECT pg_advisory_unlock($1)', [lockId])
    client.release()
  }
}

function hashJobName(name: string): number {
  // Simple hash function to convert string to number
  let hash = 0
  for (let i = 0; i < name.length; i++) {
    hash = ((hash << 5) - hash) + name.charCodeAt(i)
    hash = hash & hash // Convert to 32bit integer
  }
  return Math.abs(hash)
}

Now use it:

// In your cron job
cron.schedule('0 * * * *', async () => {
  await runJobWithLock('send-emails', async () => {
    await sendEmails()
  })
})

How it works

  1. Instance A tries to get the lock - succeeds, starts the job
  2. Instance B tries to get the lock - fails, skips the job
  3. Instance C tries to get the lock - fails, skips the job
  4. Instance A finishes, releases the lock
  5. Next hour, any instance can get the lock

No race conditions. No duplicate jobs. Clean.

Transaction-level locks for more control

Sometimes you need finer control. Maybe you want to lock specific rows while processing them.

async function processOrders() {
  const client = await pool.connect()
  
  try {
    await client.query('BEGIN')
    
    // Lock rows for update - other transactions will wait
    const result = await client.query(`
      SELECT * FROM orders 
      WHERE status = 'pending' 
      AND processed_at IS NULL
      FOR UPDATE SKIP LOCKED
      LIMIT 100
    `)
    
    for (const order of result.rows) {
      await processOrder(order)
      
      await client.query(
        'UPDATE orders SET status = $1, processed_at = NOW() WHERE id = $2',
        ['processed', order.id]
      )
    }
    
    await client.query('COMMIT')
    
  } catch (error) {
    await client.query('ROLLBACK')
    throw error
  } finally {
    client.release()
  }
}

The FOR UPDATE SKIP LOCKED part is key:

  • FOR UPDATE: Locks the rows
  • SKIP LOCKED: If another transaction already locked them, skip them instead of waiting

This means multiple instances can process different orders at the same time. No conflicts, no duplicates.

Row-level locking patterns

Here’s a pattern I use for processing queues:

async function processQueue() {
  const client = await pool.connect()
  
  try {
    await client.query('BEGIN')
    
    // Grab the next item that's not locked
    const result = await client.query(`
      UPDATE queue_items
      SET 
        status = 'processing',
        locked_at = NOW(),
        locked_by = $1
      WHERE id = (
        SELECT id FROM queue_items
        WHERE status = 'pending'
        ORDER BY created_at ASC
        FOR UPDATE SKIP LOCKED
        LIMIT 1
      )
      RETURNING *
    `)
    
    if (result.rows.length === 0) {
      await client.query('COMMIT')
      return // Nothing to process
    }
    
    const item = result.rows[0]
    
    // Do the work
    await processItem(item)
    
    // Mark as done
    await client.query(
      'UPDATE queue_items SET status = $1, completed_at = NOW() WHERE id = $2',
      ['completed', item.id]
    )
    
    await client.query('COMMIT')
    
  } catch (error) {
    await client.query('ROLLBACK')
    throw error
  } finally {
    client.release()
  }
}

Handling stuck locks

What if a process crashes while holding a lock? Advisory locks are automatically released when the connection closes. But row locks in transactions? You need a timeout.

// Set a statement timeout
await client.query('SET statement_timeout = 30000') // 30 seconds

// Or set a lock timeout
await client.query('SET lock_timeout = 5000') // 5 seconds

For stuck queue items, add a cleanup job:

// Run every 5 minutes
cron.schedule('*/5 * * * *', async () => {
  await db.query(`
    UPDATE queue_items
    SET status = 'pending', locked_at = NULL, locked_by = NULL
    WHERE status = 'processing'
    AND locked_at < NOW() - INTERVAL '10 minutes'
  `)
})

What I learned

Use advisory locks for job-level locking: One job runs at a time across all instances.

Use row-level locks for queue processing: Multiple instances can process different items simultaneously.

Always use SKIP LOCKED: Don’t make instances wait for each other.

Set timeouts: Prevent stuck locks from blocking everything.

Clean up stuck items: Have a separate job that resets items that got stuck.

Test with multiple instances: Spin up 3 instances locally and make sure jobs don’t duplicate.

Monitor lock contention: If you see a lot of skipped locks, you might need to rethink your approach.

Wrapping up

Scaling cron jobs is tricky. PostgreSQL gives you the tools to do it right - advisory locks for job-level coordination, row-level locks for queue processing.

No more duplicate jobs. No more race conditions. Just clean, scalable cron jobs.

Keep pushing forward and savor every step of your coding journey.