Tuesday, 11 October 2016

// // Leave a Comment

Debug Slow Performance & High Resource Utilisation Issue in PostgreSQL

PostgreSQL DATABASE SERVER : LIST OF METRICS To MONITORDebug 

PostgreSQL Performance Issues

High CPU utilisation or poor performance of Postgresql database? Here is the guide to debug the issues.
Once I had the issue with AWS RDS instance on which the CPU utilisation was 100% even after changing it to m3.xlarge from t2.medium. The issue figured out to be some queries which were getting stuck and keeps on running for hours keeping the CPU busy. The same query when fired through console gives the output in 4-5 seconds which was also too much. Although trying the same query from console gets executed successfully but sometimes it was getting stuck and kept on running for hours.
Following are the debugging methods I tried to reach to the root cause of the problem and the fourth point helped me to get to the issue:

Watch System Factors

A comprehensive set of system performance metrics mainly for PostgreSQL are: Disk Space : You must have 10% disk space available for the Postgres Data Partition as disk space may fluctuate during Postgres vaccum-ing when high write loads. CPU Usage : High CPU usage will slowdown system performance as it also shows badly optimised queries which takes huge amount of CPU time. Bound CPU is the best situation for Postgres. I/O Usage : If your Postgres runs slow,first measure the CPU percentage of IO wait that indicates the amount of time machine is waiting for the disk

Watch Postgres Factors

1. Total Number of Connections 

max_connections determines the maximum number of concurrent transactions to a database server and give a list of clients which are leaking the database connections.
                            
SELECT count(*) FROM pg_stat_activity;
2. Number of Connections by state
 
Four possible states of connections are :
(a) active - Connection currently executing the queries of transaction.  
(b) idle - Connection not executing a transaction.
(c) idle in transaction - Connection in a long-running transaction i.i.e not executing query.
(d) idle in transaction(aborted) - Connection in a situation where transaction hasn't been rolled back due to an error.

SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
3. Connections waiting for a lock

Blocked Connections waiting for a lock indicates slow execution of transaction with an exclusive lock.

SELECT count(distinct pid) FROM pg_locks WHERE granted = false
4. Maximum Transaction Age

Transactions should be as short as it will executed in less than a minute. Long-running transactions prevent Postgres from vaccum-ing old data, it may shutdown the database due to transaction ID(xid) wraparound. If this outputs more than one hour than this is a point of worry as the query is being running since that duration keeping the resources busy. Change the max age parameter (in application code) of the connection to lowest possible values like 2-3 seconds  as per the average response time of  your database query.

SELECT max(now() -xact_start) FROM pg_stat_activity WHERE state IN ('idle in transaction','active'); 
5.Checkpoint Interval

Frequent Checkpoints leads down performance.Postgres will display about those checkpoints in its log.
Also, you can check the frequency in the pg_stat_bgwriter table.

6. Query Execution Time

You have to measure it at the application level.Or, by setting and analysing the log queries periodically log_min_duration_statement=0 or by monitoring the pg_stat_statements module.     
     
As for me, the fourth point was the issue, that is some queries was running from long time and making CPU busy. The one possible solution for this issue is to find those queries and kill them This process is covered here. 

0 comments:

Post a Comment