How to configure Solr Index Sharding in Alfresco
Objectives
Solr is used as the search engine in Alfresco. An overview of the concept is described here: solr-overview.html. There are two cores:
- alfresco - used for searching all live content
- archive - used for searching content that has been marked as deleted
When too many documents are stored in Alfresco, e.g. over 100 million, a single Solr instance can become slow. There are a few ways to change the Alfresco architecture to increase performance:
- Enterprise
- Enterprise - scaled
- Replicated Index
- Sharded Index
All of these architecture types are described in the documentation.
I'm going to use my previous post "Alfresco new-project" as the base for this post and build a Sharded Index architecture.
To keep things simple, I have changed the approach to the selected architecture: I focus on Solr, while the web layer and the DB layer in my example are designed without replication and failover. I have only one Alfresco Content Repository and only one Alfresco Share.
The most important question is: how does it work? Usually one request is executed in one core, so a big Lucene query can take a very long time. If we use index shards, the query is executed in a defined number of separate processes across the nodes.
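To make the idea more concrete, here is a minimal conceptual sketch of a query being fanned out to several shards and the partial results merged. It is not how Solr is implemented: the `SHARDS` data, `search_shard` and `search_all` are hypothetical names made up for illustration, and threads stand in for the separate Solr processes.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "index": each shard holds a disjoint slice of the documents (id -> text).
SHARDS = [
    {1: "annual report", 5: "project plan"},
    {2: "meeting notes", 6: "annual budget"},
    {3: "annual review", 7: "holiday schedule"},
]


def search_shard(shard, term):
    """Return the ids of documents in one shard whose text contains the term."""
    return [doc_id for doc_id, text in shard.items() if term in text]


def search_all(shards, term):
    """Send the same query to every shard in parallel and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_results = pool.map(lambda shard: search_shard(shard, term), shards)
    return sorted(doc_id for hits in partial_results for doc_id in hits)


if __name__ == "__main__":
    print(search_all(SHARDS, "annual"))  # prints [1, 3, 6]
```

Each shard only scans its own slice, so the slowest shard, not the whole index, determines how long the query takes.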
Finally you can read more about "Creating Solr shards" here: https://docs.alfresco.com/search-community/tasks/solr-hash-shard.html
The definition of Index Shards
In the end I would like to have 4 nodes, 8 index shards, and 3 replicas of each index shard. So, the definition of the index shards should look similar to the table below (each column lists the shard ids hosted on that node):
Node 1 | Node 2 | Node 3 | Node 4
0 | 1 | 0 | 0
1 | 2 | 2 | 1
2 | 3 | 3 | 3
4 | 5 | 4 | 4
5 | 6 | 6 | 5
6 | 7 | 7 | 7
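A quick way to double-check a layout like this is to count how often each shard id appears. The sketch below copies the shard ids per node from the table and verifies that all 8 shards are present, that each one has exactly 3 replicas, and that no node hosts the same shard twice. The names `LAYOUT`, `replica_counts` and `check_layout` are my own, used only for illustration.

```python
from collections import Counter

# Shard ids hosted on each node, copied from the table (node number -> shard ids).
LAYOUT = {
    1: [0, 1, 2, 4, 5, 6],
    2: [1, 2, 3, 5, 6, 7],
    3: [0, 2, 3, 4, 6, 7],
    4: [0, 1, 3, 4, 5, 7],
}


def replica_counts(layout):
    """Count on how many nodes each shard id appears."""
    return Counter(shard for shards in layout.values() for shard in shards)


def check_layout(layout, num_shards, replicas):
    """True if every shard exists, has the wanted replica count, and no node repeats a shard."""
    counts = replica_counts(layout)
    all_present = set(counts) == set(range(num_shards))
    evenly_replicated = all(count == replicas for count in counts.values())
    no_duplicates = all(len(set(shards)) == len(shards) for shards in layout.values())
    return all_present and evenly_replicated and no_duplicates


if __name__ == "__main__":
    print(check_layout(LAYOUT, num_shards=8, replicas=3))
```

With 8 shards and 3 replicas there are 24 cores in total, so each of the 4 nodes ends up hosting 6 of them.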
Let's start
To begin, download Alfresco Search Services - Solr 6 (https://download.alfresco.com/cloudfront/release/community/201806-GA-build-00113/alfresco-search-services-1.1.1.zip)
Then unzip the archive and copy its contents into four separate folders, one per Solr node.
Go to each Solr folder and start the instance on its own port:
- solr start -p 8091
- solr start -p 8092
- solr start -p 8093
- solr start -p 8094
Next, call the configuration using URL requests:
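Once the instances are running, the shard configuration requests could look roughly like the URLs built below, one per node. Treat this as a sketch under assumptions: the parameter names (`action=newCore`, `storeRef`, `numShards`, `numNodes`, `nodeInstance`, `template`, `coreName`, `shardIds`) are how I remember the Alfresco Search Services admin call and should be verified against the "Creating Solr shards" page, while `localhost`, the node-to-port mapping and the helper `new_core_url` are my own choices for illustration.

```python
from urllib.parse import urlencode

# Shard ids per node, taken from the table in "The definition of Index Shards".
LAYOUT = {
    1: [0, 1, 2, 4, 5, 6],
    2: [1, 2, 3, 5, 6, 7],
    3: [0, 2, 3, 4, 6, 7],
    4: [0, 1, 3, 4, 5, 7],
}
# Assumption: node N is the instance started on port 809N.
PORTS = {1: 8091, 2: 8092, 3: 8093, 4: 8094}


def new_core_url(node, host="localhost"):
    """Build the (assumed) core-creation URL for one Solr node."""
    params = {
        "action": "newCore",                   # assumed admin action name
        "storeRef": "workspace://SpacesStore",  # store behind the "alfresco" core
        "numShards": 8,
        "numNodes": len(LAYOUT),
        "nodeInstance": node,
        "template": "rerank",                  # assumed core template name
        "coreName": "alfresco",
        "shardIds": ",".join(str(shard) for shard in LAYOUT[node]),
    }
    # Keep ":", "/" and "," readable instead of percent-encoding them.
    query = urlencode(params, safe=":/,")
    return f"http://{host}:{PORTS[node]}/solr/admin/cores?{query}"


if __name__ == "__main__":
    for node in sorted(LAYOUT):
        print(new_core_url(node))
```

Generating the URLs from the layout keeps the `shardIds` of every node in sync with the table, which is easy to get wrong when typing four long URLs by hand.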
Each request should return a response similar to the output below:
The default index sharding method is DB_ID. You can read more about the available methods here: https://docs.alfresco.com/6.0/concepts/solr-shard-approaches.html
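As I understand it, DB_ID sharding assigns each node to a shard based on a hash of its database id, which spreads documents across the shards without any manual rules. The sketch below only illustrates that idea: `shard_for` is a hypothetical helper, and `zlib.crc32` is a stand-in for whatever hash function Alfresco really uses.

```python
import zlib
from collections import Counter


def shard_for(db_id, num_shards=8):
    """Map a database id to a shard number (illustrative stand-in hash)."""
    digest = zlib.crc32(str(db_id).encode("ascii"))
    return digest % num_shards


if __name__ == "__main__":
    # Show how many of 8000 consecutive ids land in each of the 8 shards.
    distribution = Counter(shard_for(db_id) for db_id in range(1, 8001))
    for shard in sorted(distribution):
        print(shard, distribution[shard])
```

The same id always maps to the same shard, so both indexing and lookups know where a document lives.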
It is necessary to check solrcore.properties.
The next step is to switch to the Alfresco application and configure alfresco-global.properties to use the previously created Solr nodes.
The results
It is necessary to test our new configuration. Let's add a few new documents to Alfresco using Share.
Let's examine our indexes in Solr.
So, everything works as we wanted :)