Use SQL to quickly get information out of Confluence

All too often, speed and results take precedence over learning other (often better) ways of doing things. Such is the case with getting information out of Confluence. Ideally, I would be a whiz at the Reporting plugin and we would have installed all of the Confluence plugins that could be useful to Administrators, regardless of how resource-hungry or insecure they are. But, that ideal world has yet to arrive.

Luckily accessing Confluence’s data model through SQL is pretty easy to do.

The following examples work under SQL Server 2005’s Transact-SQL and should be adaptable for another DBMS.

Some lines may be commented (as indicated by ‘–‘ at the beginning of the line). I’ve left these in to demonstrate how you could further qualify the query.

Be sure to replace ‘your.wiki.com’ with your Wiki’s Base URL as shown in the Admin Console under General Configuration.

Attachments

Identify pages having a large total size of its attachments

SELECT S.spacekey, C.title, SUM(A.filesize) AS TotalSizeOfAttachments FROM attachments A 
INNER JOIN content C ON A.pageid = C.contentid 
INNER JOIN spaces S ON C.spaceid = S.spaceid 
GROUP BY S.spacekey, C.title 
ORDER BY SUM(A.filesize) DESC, S.spacekey, C.title

Identify pages having a large number of attachments

SELECT S.spacekey,
  C.title,
  C.contentid,
  COUNT(*) AS NumberOfAttachmentsOnPage,
  'http://your.wiki.com/pages/viewpageattachments.action?pageId=' +
     CONVERT(VARCHAR(19),C.contentid) + '&sortBy=size' AS ViewURL
FROM attachments A
INNER JOIN content C ON A.pageid = C.contentid
INNER JOIN spaces S ON C.spaceid = S.spaceid
GROUP BY S.spacekey, C.title, C.contentid
ORDER BY COUNT(*) DESC, S.spacekey, C.title, C.contentid

Content

Find content having a particular title

SELECT S.spacekey,
  C.title,
  'http://your.wiki.com/confluence/pages/viewpage.action?pageId=' +
    CONVERT(VARCHAR(10),C.contentid) AS ViewPageURL
FROM content C
LEFT OUTER JOIN spaces S ON C.spaceid = S.spaceid
WHERE C.title LIKE '%All Project Vitals%'
ORDER BY S.spacekey, C.title

 

Identify pages using particular string in content body

This output of this query includes 2 extra columns: one with the URL to view the page and another with the URL to edit the page. (This technique is described here.)

DECLARE @myString1 VARCHAR(20)
DECLARE @myString1Like VARCHAR(30)
SET @myString1 = '{cache'
SET @myString1Like = '%{cache%'

SELECT
  S.spacekey,
  C.title,
  'http://your.wiki.com/pages/editpage.action?pageId=' +
    CONVERT(CHAR(10),C.contentid) AS EditPageURL,
  'http://your.wiki.com/display/' +
    S.spacekey + '/' + REPLACE(C.title,' ','+') AS ViewPageURL,
  --C.creator AS InitialAuthor, C.lastmodifier AS LastChangedBy,
  SUBSTRING(X.body, X.BodyStart1, 60) AS ExcerptFromPageContent
FROM (SELECT
    contentid,
    Body,
    BodyStart1 =
      CASE
        WHEN CHARINDEX(@myString1, body, 1) > 0
        WHEN CHARINDEX(@myString1, body, 1) - 20
      ELSE 0
      END
    FROM bodycontent WHERE body LIKE @myString1Like) AS X
    INNER JOIN content C ON X.contentid = C.contentid
    INNER JOIN spaces S ON C.spaceid = S.spaceid
    WHERE C.content_status = 'current'
    AND S.spacekey IN ('refCPU','refHADCWP','refPRMD')
--AND S.spacekey IN ('lethbridgecc')
ORDER BY 1, 2

 

Identify pages that were updated on a particular day

SELECT C.contentid,
  C.lastmoddate,
  C.lastmodifier,
  C.title,
  S.spacekey
FROM content C
INNER JOIN spaces S ON C.spaceid = S.spaceid
WHERE C.contenttype = 'PAGE'
  AND content_status = 'current'
  AND S.spacekey = 'sandbox'
  AND Year(C.lastmoddate) = 2015
  AND month(C.lastmoddate) = 05
  AND Day(C.lastmoddate) >= 24
ORDER BY 1, 2

 

Notifications

Determine notifications in effect for a particular user

If PAGEID and SPACEID are both null, it’s a daily email, I believe. If only the PAGEID is non-null, it’s a page watch. If only the SPACEID is non-null, it’s a space watch.

SELECT S.spacekey, S.spacename, N.*
FROM notifications N
LEFT OUTER JOIN spaces S ON N.spaceid = S.spaceid
WHERE username = 'yk68'

Spaces

Determine the last date on which content was modified in each space

SELECT S.spacename, MAX(C.lastmoddate)
FROM content C
INNER JOIN spaces S ON C.spaceid = S.spaceid
GROUP BY S.spacename
ORDER BY S.spacename

Get count of pages in each space

SELECT S.spacekey, COUNT(*)
FROM content C
INNER JOIN spaces S ON C.spaceid = S.spaceid
WHERE C.contenttype = 'PAGE' AND content_status = 'current'
GROUP BY S.spacekey
ORDER BY S.spacekey

Get a list of Space Administrators

This example assumes a naming convention whereby Space Administrators will always be in a group name-admins.

SELECT G.groupname, EE.name
FROM external_entities EE
LEFT OUTER JOIN external_members EM ON EE.id = EM.extentityid
LEFT OUTER JOIN groups G ON EM.groupid = G.id
WHERE G.groupname LIKE '%-admins'
ORDER BY G.groupname, EE.name

Manually purge a space’s trash

Unfortunately, Confluence currently dies when trying to purge the Trash in a space that has a lot of pages in the Trash. This query does the equivalent.

This query will update your database!

-- Find spaces with largest amount of pages in Trash
SELECT COUNT(*) AS NbrOfTrashPages, spacekey, spacename
FROM content
INNER JOIN spaces on content.spaceid = spaces.spaceid
WHERE content_status = 'deleted'
GROUP BY spacekey, spacename
ORDER BY COUNT(*) desc, spacekey, spacename

--Run this, substituting the spacekey(s) found
--in the first SQL statement that had
--the largest number of pages in the track
SELECT * FROM spaces WHERE spacekey IN ('refcpu','refhadcwp')

--Run the following statements, a few at a time,
--after setting @spaceid to one of the
--SPACEID values found in the previous query.
DECLARE @spaceid VARCHAR(60)

SET @spaceid = 70385685

-- Have to delete rows pertaining to deleted pages
-- in other tables which have a foreign key constraint to
-- CONTENT.contentid
DELETE FROM bodycontent
WHERE contentid IN
  (SELECT contentid FROM content
    WHERE content_status = 'deleted' AND spaceid = @spaceid)
DELETE FROM content_label
WHERE contentid IN
  (SELECT contentid FROM content
  WHERE content_status = 'deleted' AND spaceid = @spaceid)
DELETE FROM links
WHERE contentid IN
  (SELECT contentid FROM content
  WHERE content_status = 'deleted' AND spaceid = @spaceid)
DELETE FROM attachmentdata
WHERE attachmentid IN
  (SELECT attachmentid FROM attachments
  WHERE pageid IN
    (SELECT contentid FROM content
    WHERE content_status = 'deleted' AND spaceid = @spaceid))
DELETE FROM attachments
WHERE pageid IN
  (SELECT contentid FROM content
    WHERE content_status = 'deleted' AND spaceid = @spaceid)
DELETE FROM extrnlnks
WHERE contentid IN
  (SELECT contentid FROM content
  WHERE content_status = 'deleted' AND spaceid = @spaceid)

-- Work on Comments associated with deleted pages
DELETE FROM bodycontent
WHERE contentid IN
 (SELECT contentid FROM content
 WHERE pageid IN
   (SELECT contentid FROM content
   WHERE content_status = 'deleted' AND spaceid = @spaceid))
DELETE FROM links
WHERE contentid IN
  (SELECT contentid FROM content
    WHERE pageid IN
    (SELECT contentid FROM content
    WHERE content_status = 'deleted' AND spaceid = @spaceid))

-- If a page, in any status, points back to a deleted page via its
-- PREVVER column, set the PREVVER to Null to break that link
UPDATE content
SET prevver = NULL
WHERE prevver IN
  (SELECT contentid FROM content
  WHERE content_status = 'deleted' AND spaceid = @spaceid)
DELETE FROM content
WHERE pageid IN
  (SELECT contentid FROM content
  WHERE content_status = 'deleted' AND spaceid = @spaceid)

DELETE FROM notifications
WHERE pageid IN
  (SELECT contentid FROM content
  WHERE content_status = 'deleted' AND spaceid = @spaceid)

-- Re: CONFANCESTORS: It looks like a page is removed
-- as a descendent of another page when it goes into
-- a deleted status. It remains in that table with its
-- ancestors so, if restored, Confluence knows where to
-- put it.
DELETE FROM confancestors
WHERE descendentid IN
  (SELECT contentid FROM content
   WHERE content_status = 'deleted' AND spaceid = @spaceid)
DELETE FROM confancestors
WHERE ancestorid IN
  (SELECT contentid FROM content
  WHERE content_status = 'deleted' AND spaceid = @spaceid)

-- Delete restrictions that may be in effect for deleted pages
DELETE from content_perm
WHERE cps_id IN
  (SELECT id FROM content_perm_set
  WHERE content_id IN
    ((SELECT contentid FROM content
    WHERE content_status = 'deleted' AND spaceid = @spaceid)))
DELETE FROM content_perm_set
WHERE content_id IN
  (SELECT contentid FROM content
  WHERE content_status = 'deleted' AND spaceid = @spaceid)

SELECT * FROM content
WHERE content_status = 'deleted' AND spaceid = @spaceid
DELETE FROM content
WHERE content_status = 'deleted'
  AND spaceid = @spaceid
  AND title <> 'Home Page'

-- See what's left!
SELECT content_status, count(*)
FROM content
WHERE spaceid = @spaceid
GROUP BY content_status
ORDER BY content_status

Remove all Page watches and Space watches

This is useful if you’ve copied your Production data to a non-Production environment. Change ‘SELECT’ to ‘DELETE’ once you’ve verified the output is as you want it.

This query will update your database!

SELECT *
FROM notifications
WHERE pageid IS NOT NULL OR spaceid IS NOT NULL

Miscellaneous

Find pages and contents of tables built with DynamicTable macro.

These could have been broken by an upgrade of the Java JVM.

SELECT S.spacename,
  S.spacekey,
  C.title AS PageTitle,
  C.version,
  PE.entity_key,
  PE.text_val
FROM os_propertyentry PE
INNER JOIN content C ON PE.entity_id = C.contentid
INNER JOIN spaces S ON C.spaceid = S.spaceid
WHERE PE.entity_key LIKE 'revere.dynamictable%'
ORDER BY S.spacename, S.spacekey, C.title, PE.entity_key

, , ,

8 Responses to Use SQL to quickly get information out of Confluence

  1. Billy March 10, 2011 at 9:54 am #

    Great Post! One of the few places I have been able to find which actually presents SQL queries to get data out of Confluence rather than just recommending plugin after plugin to do so. It might be a year old, but this article is still useful! Thanks!

  2. Bruce February 1, 2012 at 1:42 pm #

    Here’s a revised notifications query to show Page Titles:

    SELECT S.SPACEKEY, S.SPACENAME, C.TITLE, N.PAGEID, N.SPACEID, N.USERNAME
    FROM NOTIFICATIONS N
    LEFT OUTER JOIN SPACES S ON N.SPACEID = S.SPACEID
    LEFT OUTER JOIN CONTENT C ON N.PAGEID = C.CONTENTID
    WHERE (N.PAGEID IS NOT NULL OR N.SPACEID IS NOT NULL)
    AND N.USERNAME = ‘bruce.michelsen’
    ORDER BY S.SPACENAME, C.TITLE

    • Betsy February 1, 2012 at 2:21 pm #

      Thanks, Bruce!

  3. Gary Lineker April 26, 2012 at 9:12 am #

    Hi All,

    I would like to know which Confluence space does a group have access to? In Jira it can be check which project does a group access to. Is there an SQL select for Confluence too?

    I need to know this because we manage some groups only for page restriction purpose so we would like to know which group gives space permission.

    Can you make a suggestion?

    Thanks in advance!
    Gary

    • Betsy August 3, 2012 at 6:38 am #

      I think the below query would do what you want. Just replace ‘confluence-users’ with the group in which you’re interested. (You also look for ‘permgroupname IS NULL’ if you wanted to see which spaces allow for Anonymous User access.)

      Use the “permtype = ‘VIEWSPACE’ AND permusername IS NULL” part of the WHERE clause if you’re interested in finding out if the group has only a certain type of permission.

      Use the “permgroupname = ‘confluence-users’ OR permgroupname IS NULL” part of the WHERE clause if you’re interested in finding out about a certain group’s access (confluence-users, in this case) or Anonymous User access (permgroupname IS NULL).

      For the latter, SP.permtype can be one of these values: COMMENT, CREATEATTACHMENT, CREATESPACE, EDITBLOG, EDITSPACE, EXPORTPAGE, EXPORTSPACE, REMOVEATTACHMENT, REMOVEBLOG, REMOVECOMMENT, REMOVEMAIL, REMOVEPAGE, SETPAGEPERMISSIONS, SETSPACEPERMISSIONS, SYSTEMADMINISTRATOR, USECONFLUENCE, or VIEWSPACE.

      (Note: EDITSPACE corresponds to the Create Page permission.)

      SELECT S.spacekey, COUNT(SP.permtype)
      FROM spaces S
      INNER JOIN spacepermissions SP ON S.spaceid = SP.spaceid
      WHERE permtype = ‘VIEWSPACE’ AND permusername IS NULL
      AND
      (permgroupname = ‘confluence-users’ OR permgroupname IS NULL)
      GROUP BY S.spacekey
      ORDER BY S.spacekey

  4. Graham Hannington July 26, 2012 at 1:26 am #

    Hi Betsy, thank you very much for these SQL queries. You might be interested in a comment that I’ve just added to an Atlassian Confluence web page, with a link to this page.

  5. Graham Hannington August 3, 2012 at 1:19 am #

    Hi again Betsy,

    Regarding your SQL query “Identify pages using particular string in content body”: you might well feel that “user beware” applies here (fair enough), and there are ways to restrict access to the SQL macros and their data sources, but you might wish to consider mentioning that such queries bypass Confluence permissions.

    For example, your query can return excerpts from pages that Confluence would not otherwise permit the user to view. Sure, the user won’t be able to view or edit the entire page “natively”, in Confluence, but – without security measures – the user could tweak the query to get the entire content (storage format XML) of such pages. Imagine a disgruntled employee entering “password” as a search string.

    • Betsy August 3, 2012 at 6:23 am #

      Excellent point, Graham. All the above SQL queries were intended for use by a Confluence Administrator who understands the risks (and rewards) of directly looking at the Confluence database… and who knows to test a query thoroughly for performance and accuracy before attempting it in a Production environment!

Leave a Reply

Simple Share ButtonsShare:
Simple Share Buttons