
Friday, May 8, 2020

How to configure Solr Index Sharding in Alfresco


Objectives 

Solr is used as the search engine in Alfresco. An overview of the concept is described here: solr-overview.html. There are two cores:


  • alfresco - used for searching all live content
  • archive - used for searching content that has been marked as deleted

So, when too many documents are stored in Alfresco, e.g. over 100 million documents, a single Solr instance can become slow. There are a few ways to change the Alfresco architecture to increase performance:
  • Enterprise
  • Enterprise - scaled
  • Replicated Index
  • Sharded Index

All of these architecture types are described in the document:

I'm going to use my previous post "Alfresco new-project" as the base for this post and build the Sharded Index architecture.

To keep things simple, I've adjusted the selected architecture: I focus on Solr, while the web layer and the DB layer in my example are designed without replication or failover. I have only one Alfresco Content Repository and only one Alfresco Share.





The most important question is: how does it work? Usually one request is executed in one core, and a big Lucene query can take a very long time to execute. If we use index shards, the query is executed in a defined number of separate processes across the nodes.
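
The idea of splitting one query across shards can be sketched in a few lines of Python. This is only a conceptual illustration, not how Solr or Alfresco implement it: the in-memory SHARDS data, the scores, and the function names search_shard and search_all are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

# Hypothetical in-memory "shards": each maps a document id to a relevance score.
SHARDS = [
    {"doc-1": 0.91, "doc-5": 0.40},
    {"doc-2": 0.75, "doc-6": 0.88},
    {"doc-3": 0.10, "doc-7": 0.66},
    {"doc-4": 0.95, "doc-8": 0.20},
]


def search_shard(shard, min_score):
    """Run the 'query' on a single shard and return its (score, doc_id) hits."""
    return [(score, doc_id) for doc_id, score in shard.items() if score >= min_score]


def search_all(shards, min_score, top_k):
    """Query every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial = pool.map(lambda s: search_shard(s, min_score), shards)
        hits = [hit for part in partial for hit in part]
    # Keep only the best top_k hits across all shards, highest score first.
    return [doc_id for _, doc_id in heapq.nlargest(top_k, hits)]


print(search_all(SHARDS, min_score=0.5, top_k=3))
```

Each shard only scans its own part of the data, so the slowest shard, not the total index size, determines how long the whole query takes.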

Finally, you can read more about "Creating Solr shards" here: https://docs.alfresco.com/search-community/tasks/solr-hash-shard.html


The definition of Index Shards

In the end I would like to have 4 nodes, 8 index shards, and 3 replicas of each index shard. So the definition of the index shards should look similar to the table below:


Node 1    Node 2    Node 3    Node 4
0         1         0         0
1         2         2         1
2         3         3         3
4         5         4         4
5         6         6         5
6         7         7         7
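
The table can be double-checked with a few lines of Python. This is just a sketch: the LAYOUT dictionary is read row by row from the table above (one column per node), and the function name check_layout is hypothetical.

```python
from collections import Counter

# Shard ids hosted on each node, read column by column from the table.
LAYOUT = {
    "Node 1": [0, 1, 2, 4, 5, 6],
    "Node 2": [1, 2, 3, 5, 6, 7],
    "Node 3": [0, 2, 3, 4, 6, 7],
    "Node 4": [0, 1, 3, 4, 5, 7],
}


def check_layout(layout, num_shards, replicas):
    """Return a list of problems found in the layout (empty if it is consistent)."""
    problems = []
    counts = Counter()
    for node, shards in layout.items():
        if len(set(shards)) != len(shards):
            problems.append(f"{node} hosts the same shard twice")
        counts.update(shards)
    for shard in range(num_shards):
        if counts[shard] != replicas:
            problems.append(
                f"shard {shard} has {counts[shard]} copies, expected {replicas}"
            )
    return problems


print(check_layout(LAYOUT, num_shards=8, replicas=3))
```

An empty list means that every one of the 8 shards has exactly 3 copies and that no node holds two copies of the same shard.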
  


Let's start

At the beginning, download Alfresco Search Services - Solr 6 (https://download.alfresco.com/cloudfront/release/community/201806-GA-build-00113/alfresco-search-services-1.1.1.zip).

Then unzip the archive and copy its contents into four separate folders.

Go to each Solr folder and start its instance, each on its own port:
  • solr start -p 8091
  • solr start -p 8092
  • solr start -p 8093
  • solr start -p 8094



Next, apply the configuration using URL requests:
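
As a rough sketch, such a request could be assembled like this for each node. Everything here is an assumption that should be verified against the "Creating Solr shards" page linked above: the parameter names (action=newCore, storeRef, numShards, numNodes, nodeInstance, template, coreName, shardIds), the localhost host name, and the hypothetical helper new_core_url. Only the ports and the shard numbers come from this post.

```python
from urllib.parse import urlencode


def new_core_url(port, node_instance, shard_ids, num_shards=8, num_nodes=4):
    """Build a core-creation URL for one Solr node (parameter names are assumed)."""
    params = {
        "action": "newCore",
        "storeRef": "workspace://SpacesStore",
        "numShards": num_shards,
        "numNodes": num_nodes,
        "nodeInstance": node_instance,
        "template": "rerank",
        "coreName": "alfresco",
        "shardIds": ",".join(str(s) for s in shard_ids),
    }
    # Keep ':', '/' and ',' readable instead of percent-encoding them.
    query = urlencode(params, safe=":/,")
    return f"http://localhost:{port}/solr/admin/cores?{query}"


# Node 1 from the table: port 8091, shards 0, 1, 2, 4, 5, 6.
print(new_core_url(8091, 1, [0, 1, 2, 4, 5, 6]))
```

The same helper would be called once per node, with the port and shard list of that node.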






Each request should return a response similar to the output below:


The default index sharding method is DB_ID. You can read more about the available methods here: https://docs.alfresco.com/6.0/concepts/solr-shard-approaches.html
It is necessary to check solrcore.properties.
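
To get a feeling for how an id-based method such as DB_ID can spread documents over shards, here is a small sketch. It is an assumption-laden illustration only: zlib.crc32 is a stand-in hash chosen because it ships with Python, it is not the hash Alfresco uses, and shard_for is a hypothetical helper.

```python
import zlib
from collections import Counter

NUM_SHARDS = 8


def shard_for(db_id, num_shards=NUM_SHARDS):
    """Pick a shard for a database id using a stand-in hash (not Alfresco's)."""
    return zlib.crc32(str(db_id).encode("ascii")) % num_shards


# Count how many of 10,000 consecutive ids land on each shard.
distribution = Counter(shard_for(db_id) for db_id in range(1, 10001))
for shard in sorted(distribution):
    print(shard, distribution[shard])
```

The point of hashing the id is that the same document always goes to the same shard, while consecutive ids do not all pile up on one shard.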






The next step is to switch to the Alfresco application and configure alfresco-global.properties to use the previously created Solr nodes.

The results

It is necessary to test our new configuration. Let's add a few new documents to Alfresco using Share.

Let's examine our indexes in Solr.
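
Besides looking at the Solr admin page, the document count per core can be read from Solr's CoreAdmin STATUS report. The sketch below assumes the nodes run on localhost and that the admin endpoint is reachable without authentication; fetch_status and doc_counts are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def fetch_status(port, host="localhost"):
    """Fetch the CoreAdmin STATUS report of one Solr node as a dict."""
    url = f"http://{host}:{port}/solr/admin/cores?action=STATUS&wt=json"
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def doc_counts(status):
    """Map each core name to the number of documents in its index."""
    return {
        name: core.get("index", {}).get("numDocs", 0)
        for name, core in status.get("status", {}).items()
    }
```

Calling doc_counts(fetch_status(8091)) for each of the four ports should then show which shard cores exist on each node and how many documents each one holds.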

The cores and indexes have been created.

So, everything works as we wanted :)