How do I store about 5 billion data points in MeiliSearch?


I am trying to store yearly company financial data in MeiliSearch. A typical entry for a single company looks like this in JSON format:

[{
    "CompanyNumber": "DE123456",
    "FinancialModel": [{
            "DataPointName": "CashInHand",
            "Year": 2021,
            "Value": 50000
        },
        {
            "DataPointName": "Credit",
            "Year": 2021,
            "Value": 1000000
        }
        ...a few hundred more entries for this company
    ]
},
...more json array items for a few million more companies
]

I need to be able to run queries against any datapoint name, such as:

Find all companies that had over 1 million CashInHand in Year 2021.

Find all companies that paid over 50K in AccountantFee in Year 2022.

DataPointName can take any one of 3000 text values, but these change every year, so it would be difficult to create a filter on an actual datapoint name like CashInHand. It is not impossible, only difficult, so I am exploring all available options first; if there is no straightforward way to meet my requirements, I will need to look into that approach.

I do have full control over the data structure if any flattening or changes are needed.

If I store each JSON array item above as a single document, I worry that I might soon exceed the maximum number of documents per index, which is around 4.2 billion.
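To make the flattening options concrete, here is a minimal Python sketch of two possible layouts: one document per datapoint entry, and one document per company and year with the datapoint names turned into keys. The helper names and the `id` format are assumptions made for illustration only; the other field names come from the sample above.

```python
# Hypothetical helpers for illustration; nothing here is Meilisearch API.

def flatten_per_entry(company):
    """One document per FinancialModel entry (largest document count)."""
    docs = []
    for i, entry in enumerate(company["FinancialModel"]):
        docs.append({
            "id": f'{company["CompanyNumber"]}_{i}',  # assumed id format
            "CompanyNumber": company["CompanyNumber"],
            **entry,
        })
    return docs


def flatten_per_company_year(company):
    """One document per company and year; datapoint names become keys."""
    by_year = {}
    for entry in company["FinancialModel"]:
        doc = by_year.setdefault(entry["Year"], {
            "id": f'{company["CompanyNumber"]}_{entry["Year"]}',  # assumed id format
            "CompanyNumber": company["CompanyNumber"],
            "Year": entry["Year"],
        })
        doc[entry["DataPointName"]] = entry["Value"]
    return list(by_year.values())


company = {
    "CompanyNumber": "DE123456",
    "FinancialModel": [
        {"DataPointName": "CashInHand", "Year": 2021, "Value": 50000},
        {"DataPointName": "Credit", "Year": 2021, "Value": 1000000},
    ],
}

print(len(flatten_per_entry(company)))        # one document per entry
print(flatten_per_company_year(company))      # one document per (company, year)
```

The second layout cuts the document count from one per datapoint to one per company and year, but every datapoint name then becomes its own attribute, which is exactly the difficulty with yearly changing names described above.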

Currently, the data is stored as one document per company, with an array of objects holding every DataPointName, Value and Year. Because it is an array, I cannot run a query like:

DataPointName=CashInHand AND Value>400000 AND Year=2021
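The problem can be reproduced without a running server. As far as I understand, an array of objects is flattened into one list of values per nested field, so the link between a name and its value inside the same object is lost. The following is a pure-Python imitation of that behaviour under that assumption; the helper names are hypothetical and none of this is Meilisearch's own API.

```python
def flatten_array_of_objects(doc, field):
    """Imitate per-field flattening: one list of values per nested key."""
    flat = {}
    for obj in doc[field]:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def naive_match(doc):
    """Each condition is checked against the flattened lists independently."""
    flat = flatten_array_of_objects(doc, "FinancialModel")
    return (
        "CashInHand" in flat["FinancialModel.DataPointName"]
        and any(v > 400000 for v in flat["FinancialModel.Value"])
        and 2021 in flat["FinancialModel.Year"]
    )


def intended_match(doc):
    """All three conditions must hold inside the same array element."""
    return any(
        e["DataPointName"] == "CashInHand"
        and e["Value"] > 400000
        and e["Year"] == 2021
        for e in doc["FinancialModel"]
    )


company = {
    "CompanyNumber": "DE123456",
    "FinancialModel": [
        {"DataPointName": "CashInHand", "Year": 2021, "Value": 50000},
        {"DataPointName": "Credit", "Year": 2021, "Value": 1000000},
    ],
}

print(naive_match(company))     # True: Value>400000 is satisfied by Credit
print(intended_match(company))  # False: CashInHand is only 50000
```

The sample company matches the naive check even though its CashInHand is only 50000, which is the false positive I want to avoid.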

Thanks


