File storage limits on Confluence - feasibility

francis
Atlassian Partner
February 20, 2016

We have 350,000 files, distributed over 54,000 different folders, for a total storage footprint of 750 GB.

  • Would Confluence be able to handle this number of files (i.e. with one page per folder)?
  • What is the impact on the index (size, ...)?
  • Does anyone have experience migrating this volume to Confluence?

 

 

 

1 answer

0 votes
nriley
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
February 25, 2016

A very interesting problem indeed!

A simple question to start with: would each of these 54,000 pages benefit from the collaboration that Confluence provides? In other words, do you imagine each of these pages needing further conversation beyond a comment here or there?

The next question is about storage: do you imagine Confluence storing these files? Would a page with a link to each of the 54,000 folders be adequate for jumping to the relevant content? Or do you see this content embedded in some way, so that the contents are visible?

The answers to the above would determine which platform(s) you might use, and the answers on index, performance, and migration would then become a bit clearer, I think.

Peter Hertogen
I'm New Here
Those new to the Atlassian Community have posted less than three times. Give them a warm welcome!
March 1, 2016

Thanks @Neal Riley

  1. usage: no, the majority of these 54,000 pages will be read-only (or rather download-only) pages. Let's say that about 10% of them will be collaboration pages where people use comments to cooperate.
  2. storage: again, I think that for most of the files a hyperlink to jump to the relevant content would be sufficient. So let's assume that 70% of the content could live in another repo and 30% is stored in Confluence. Which other file storage do you have in mind? Git, using LFS? Or more classic WebDAV linking?
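
To put rough numbers on the shares above, here is a minimal sketch of the arithmetic. It assumes the 10% and 30% figures apply evenly across pages, files, and bytes, which the thread does not confirm; the variable names are purely illustrative.

```python
# Figures quoted in the original question (one product family).
TOTAL_PAGES = 54_000      # one page per folder
TOTAL_FILES = 350_000
TOTAL_GB = 750

# Shares estimated in the reply above, as whole percentages.
COLLAB_PERCENT = 10          # pages that need real collaboration
IN_CONFLUENCE_PERCENT = 30   # content stored in Confluence itself

# Assumption: the shares apply uniformly to pages, files, and storage size.
collab_pages = TOTAL_PAGES * COLLAB_PERCENT // 100
files_in_confluence = TOTAL_FILES * IN_CONFLUENCE_PERCENT // 100
gb_in_confluence = TOTAL_GB * IN_CONFLUENCE_PERCENT // 100

print(f"Collaboration pages:       {collab_pages:,}")
print(f"Files stored in Confluence: {files_in_confluence:,}")
print(f"Storage in Confluence:      {gb_in_confluence:,} GB")
```

Under those assumptions, roughly 5,400 pages would need collaboration features and about 225 GB would sit in Confluence, with the rest reachable by link.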

 

 

nriley
Atlassian Team
March 1, 2016

Ok let's work backwards

  • Storage: The next question is where and how these files exist today, and who uses them. This would dictate how one would store such a file repository. I would caution against using Confluence as a CDN out of the box. Something like Bitbucket with the recent LFS support might be a good fit, but again it depends on how these files are (re)used outside of the Atlassian ecosystem.
  • Usage: Assuming the answers to the above, the next thing to ask is: would you need to automatically create and update 5,400 pages as files change, move, etc.? Or would it be a better UX to make it easy to create a page in Confluence (automatically linked to the file's current location) so that collaboration can happen when the user needs it? My gut says the second option would be ideal, but this will depend on the customer's requirements.
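
As an illustration of the second option (creating a page on demand, linked to the file's current location), here is a minimal sketch that only builds the JSON body one would POST to the Confluence REST content endpoint (`/rest/api/content`). The function name, space key, page title, parent ID, and file URL are all hypothetical, and the exact fields should be checked against the REST documentation for your Confluence version.

```python
import json
from html import escape


def build_page_payload(space_key, title, file_url, parent_id=None):
    """Build a page-creation request body that links to an external folder.

    All argument values are supplied by the caller; nothing here talks to
    a server, so this is safe to run anywhere.
    """
    # Storage format is XHTML, so the URL must be escaped inside the markup.
    safe_url = escape(file_url, quote=True)
    body_html = f'<p>Files for this folder: <a href="{safe_url}">{safe_url}</a></p>'

    payload = {
        "type": "page",
        "title": title,
        "space": {"key": space_key},
        "body": {"storage": {"value": body_html, "representation": "storage"}},
    }
    if parent_id is not None:
        # Placing the page under a parent keeps the folder hierarchy.
        payload["ancestors"] = [{"id": parent_id}]
    return payload


# Hypothetical example values, for illustration only.
example = build_page_payload(
    space_key="DOCS",
    title="Product A - Manuals",
    file_url="https://files.example.com/product-a/manuals?view=list&sort=name",
    parent_id="12345",
)
print(json.dumps(example, indent=2))
```

Sending the payload (authentication, base URL, error handling) is left out on purpose, since that depends entirely on the installation.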
Peter Hertogen
I'm New Here
March 4, 2016

Neal,

storage: today these files live in a custom CMS. Now they want one big repo for their files and want to have collaboration. So basically, given the number of files, they want to use Confluence as a DMS.

If we were to use the Git plugin or WebDAV integration, would the files then still be indexed by Confluence? I mean, could you still search within these files?

usage: it will be a deeply nested hierarchy of pages: starting from a company homepage, then different product families, then a page per product, etc.

The numbers given in the initial question were for only one product family (but it's the biggest one). They guess that the total number of files and folders for all product families is about a factor of 2 to 3 more.
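
For a sense of scale, here is a minimal sketch projecting the totals. It assumes "a factor of 2 to 3" means the overall total is two to three times the single-family figures and that storage grows in the same proportion; both are assumptions, and the names are illustrative.

```python
# Figures for the largest product family, from the initial question.
BASE = {"files": 350_000, "folders": 54_000, "storage_gb": 750}


def project(factor):
    """Scale every base figure by the same factor (assumed linear growth)."""
    return {name: value * factor for name, value in BASE.items()}


# Averages for the known product family (1 GB taken as 1000 MB).
files_per_folder = BASE["files"] / BASE["folders"]
avg_file_mb = BASE["storage_gb"] * 1000 / BASE["files"]

print(f"Average files per folder: {files_per_folder:.1f}")
print(f"Average file size: {avg_file_mb:.1f} MB")

for factor in (2, 3):
    total = project(factor)
    print(
        f"x{factor}: {total['files']:,} files, "
        f"{total['folders']:,} folders, {total['storage_gb']:,} GB"
    )
```

On those assumptions the full set lands somewhere between 700,000 and 1,050,000 files, 108,000 and 162,000 folders, and 1.5 and 2.25 TB.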

Do you have numbers on the biggest Confluence install out there?

thanks

nriley
Atlassian Team
March 8, 2016

As for the Git plugin, I would check with the vendor (http://addons.avisi.com/git-for-confluence/documentation/) to see whether they have integrated Git all the way through to the Lucene index for searching.

Confluence inherently indexes page content and the like and, depending on which indexing modules are enabled or disabled, will scan certain file types. If the information you need is not extracted automatically, one could write a custom extractor as a plugin using the following information: https://developer.atlassian.com/confdev/confluence-plugin-guide/confluence-plugin-module-types/extractor-module

54,000 pages, representing the largest product as you say, is quite large, but I still question whether the entirety of such a system would necessarily need to be migrated. In fact, your example highlights a perfect reason why the "biggest Confluence install out there" is a slightly misleading metric: performance is affected by size, complexity, overall use, the underlying infrastructure, and so on. I would strongly suggest that an install like this would require Confluence Data Center, but this would need to be determined accurately further along, in the testing phase.

 

Thomas Bithell
Contributor
November 15, 2018

@nriley how about simply stating what the default storage limit is for a Confluence Space? Answering questions with questions just muddled this. We are looking to understand the storage limitations.
