How to Read High Record Volumes in Salesforce

Situation

Finding a way to read high record volumes (over 2.5 million records) from Salesforce through Adeptia Suite in chunks.

Solution

To read large volumes of data from Salesforce in chunks, use the Bulk API's PK Chunking feature. It is supported for the following Salesforce objects: Account, Campaign, CampaignMember, Case, CaseHistory, Contact, Event, EventRelation, Lead, LoginHistory, Opportunity, Task, User, and custom objects.

For example, a user may use AIS to pull information (such as Object IDs) from Salesforce and dump it into local database instances. The dumped data can then be used to carry out the large-volume read operation.

Let us say you enable PK chunking for the following query on an Account table with 10,000,000 records.

SELECT Name FROM Account

Assuming a chunk size of 250,000 and a starting record ID of 001300000000000, the query is split into the following 40 queries. Each query is submitted as a separate batch.

SELECT Name FROM Account WHERE Id >= 001300000000000 AND Id < 00130000000132G
SELECT Name FROM Account WHERE Id >= 00130000000132G AND Id < 00130000000264W
SELECT Name FROM Account WHERE Id >= 00130000000264W AND Id < 00130000000396m
...
SELECT Name FROM Account WHERE Id >= 00130000000euQ4 AND Id < 00130000000fxSK
Each query executes on a chunk of 250,000 records specified by the base-62 ID boundaries.
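The boundary arithmetic above can be reproduced in a short sketch. This is illustrative only; the Bulk API computes chunk boundaries server-side. It assumes the standard Salesforce base-62 ID alphabet (digits 0–9, then A–Z, then a–z), which matches the example boundaries shown above.

```python
import string

# Salesforce base-62 ID alphabet: 0-9, then A-Z, then a-z.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

def id_to_int(record_id: str) -> int:
    """Interpret a 15-character Salesforce ID as a base-62 integer."""
    value = 0
    for ch in record_id:
        value = value * 62 + ALPHABET.index(ch)
    return value

def int_to_id(value: int, width: int = 15) -> str:
    """Convert an integer back to a zero-padded base-62 ID string."""
    digits = []
    while value:
        value, rem = divmod(value, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)).rjust(width, "0")

def chunk_boundaries(start_id: str, total_records: int, chunk_size: int = 250_000):
    """Yield (lower, upper) ID boundary pairs, one per chunked batch."""
    start = id_to_int(start_id)
    for offset in range(0, total_records, chunk_size):
        yield int_to_id(start + offset), int_to_id(start + offset + chunk_size)

# Reproduce the first boundary pair from the example above:
lo, hi = next(chunk_boundaries("001300000000000", 10_000_000))
print(lo, hi)  # 001300000000000 00130000000132G
```

With these inputs the generator yields exactly 40 boundary pairs, one per batch, matching the 40 queries listed above.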

PK chunking is designed for extracting data from entire tables, but you can also use it for filtered queries. As records could be filtered from each query’s results, the number of returned results for each chunk can be less than the chunk size. Also, the IDs of soft-deleted records are counted when the query is split into chunks, but the records are omitted from the results. Therefore, if soft-deleted records fall within a given chunk’s ID boundaries, the number of returned results is less than the chunk size.

The default chunk size is 100,000 and the maximum size is 250,000. The default starting ID is the first record in the table. However, you can specify a different starting ID to restart a job that failed between chunked batches.
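PK chunking is enabled by adding the Sforce-Enable-PKChunking header when the Bulk API job is created; the chunkSize and startRow parameters correspond to the chunk size and starting ID described above. The sketch below only builds the request pieces without sending them, so it runs offline; the instance URL, session ID, and API version are hypothetical placeholders.

```python
# Sketch: assembling a Bulk API job-creation request with PK chunking enabled.
# INSTANCE, SESSION_ID, and API_VERSION are hypothetical placeholders.
INSTANCE = "https://yourInstance.salesforce.com"
SESSION_ID = "REPLACE_WITH_SESSION_ID"
API_VERSION = "52.0"

job_xml = """<?xml version="1.0" encoding="UTF-8"?>
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
  <operation>query</operation>
  <object>Account</object>
  <contentType>CSV</contentType>
</jobInfo>"""

headers = {
    "X-SFDC-Session": SESSION_ID,
    "Content-Type": "application/xml; charset=UTF-8",
    # chunkSize defaults to 100,000; 250,000 is the maximum.
    # startRow restarts a job from a specific record ID.
    "Sforce-Enable-PKChunking": "chunkSize=250000; startRow=001300000000000",
}

url = f"{INSTANCE}/services/async/{API_VERSION}/job"
print("POST", url)
print(headers["Sforce-Enable-PKChunking"])
# The request can then be sent with any HTTP client.
```

The header is read only when the job is created; once the job exists, the chunking parameters cannot be changed.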

When a query is successfully chunked, the original batch’s status shows as NOT_PROCESSED. If the chunking fails, the original batch’s status shows as FAILED, but any chunked batches that were successfully queued during the chunking attempt are processed as normal. When the original batch’s status is changed to NOT_PROCESSED, monitor the subsequent batches. You can retrieve the results from each subsequent batch after it’s completed. Then you can safely close the job.
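The batch bookkeeping described above can be sketched as a small helper: the original batch's NOT_PROCESSED status is expected and skipped, a FAILED original batch means chunking itself failed, and the job is safe to close once every chunked batch has finished. The state strings follow the Bulk API's batch states; the function and batch IDs are illustrative, not part of any SDK.

```python
# Illustrative helper: decide whether a PK-chunked job can be safely closed.
# batch_states maps batch ID -> state string as reported by the Bulk API.
def job_ready_to_close(original_batch_id: str, batch_states: dict) -> bool:
    for batch_id, state in batch_states.items():
        if batch_id == original_batch_id:
            # The original batch is expected to end up as NotProcessed;
            # a Failed original batch means the chunking attempt failed.
            if state == "Failed":
                raise RuntimeError("PK chunking failed for the original batch")
            continue
        if state not in ("Completed", "Failed"):
            return False  # a chunked batch is still queued or in progress
    return True

# Hypothetical batch IDs and states for demonstration:
states = {
    "batch-original": "NotProcessed",
    "batch-1": "Completed",
    "batch-2": "InProgress",
}
print(job_ready_to_close("batch-original", states))  # False: batch-2 still running
```

In practice the states would be polled from the Bulk API until this check passes, after which each completed batch's results are retrieved and the job is closed.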

It is recommended that you enable PK chunking when querying tables with more than 10 million records or when a bulk query consistently times out. However, the effectiveness of PK chunking depends on the specifics of the query and the queried data.

Note: Refer to the Salesforce developer documentation on Bulk API PK Chunking for a detailed description of this feature.
