Large database handling

This is my first challenge with a large database. The database has some 4.5 million rows. So far, none of the machines I am using for testing is able to handle it; it crashes at every command: an M3 MacBook, an i5-13xxx with 48 GB RAM, and an HP MicroServer Gen 11 with an Intel Xeon and 32 GB of RAM. So, can someone here provide me with minimum tech specs for a machine able to handle it? Basic selection works, but I need to perform some operations over it, like merging 2 tables and adding new columns. This database is regenerated every 30-60 days, so I would have to perform these operations accordingly. So far I am using MySQL, but I'm open to suggestions.

We must assume that you are using Virtualmin?
And is this on a virtual machine, or is this a home setup?
What is the size of the DB?

Sorry, it is MySQL. That should not be a problem: although Virtualmin defaults to MariaDB (they are essentially the same), I assume the MySQL is up to date.

I guess that answers that one. I also have a large database, not quite 4.5M rows, but MySQL does struggle with joins in queries. I'm going to test out storing the data using SQLite to see if that makes a difference.
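That test is quick to sketch. Here is a minimal, hypothetical example (table and column names invented, not from the thread) of trying a join in SQLite via Python's built-in sqlite3 module:

```python
import sqlite3

# Hypothetical sketch: load two tables into SQLite and try the join there.
conn = sqlite3.connect(":memory:")  # use a file path for a real dataset
cur = conn.cursor()

cur.execute("CREATE TABLE companies (reg_no TEXT, name TEXT)")
cur.execute("CREATE TABLE filings (reg_no TEXT, year INTEGER, revenue REAL)")
cur.executemany("INSERT INTO companies VALUES (?, ?)",
                [("C1", "Acme"), ("C2", "Globex")])
cur.executemany("INSERT INTO filings VALUES (?, ?, ?)",
                [("C1", 2023, 10.5), ("C2", 2023, 7.2)])

# At millions of rows, an index on the join key usually decides whether
# the join finishes quickly or grinds -- in SQLite and MySQL alike.
cur.execute("CREATE INDEX idx_filings_reg ON filings(reg_no)")

rows = cur.execute("""
    SELECT c.name, f.year, f.revenue
    FROM companies AS c
    JOIN filings AS f ON f.reg_no = c.reg_no
    ORDER BY c.name
""").fetchall()
```

Whichever engine you test, check `EXPLAIN` output for the join first; an unindexed join key is the usual culprit at this scale.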

yes, it helps (must get my eyes tested) :old_man:

the size really should not be a problem - it would be more about what is being done.

I am not understanding why the operation has to be repeated every 30-60 days (surely once restructured it would be a one-off).

Database size is around 1 GB. So far I am just testing; one of the machines (the i5) is a replica of the production server with Virtualmin. What I am trying to do when a new database is published:

  1. the database comes as CSV, so I have to convert it to JSON before importing, as MySQL has trouble importing that CSV.

  2. create new columns (the data doesn't have a unique ID, and I have to add 2 custom fields)

  3. join 2 tables

  4. check against the "old" version to find the new entries and import them from the new one into the old

So far, I am not able to get past step 2, creating unique IDs for the tables. I can perform basic queries on it, but whenever I try something more complicated, MySQL crashes with different errors.
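For reference, in MySQL itself step 2 is a single statement, e.g. `ALTER TABLE t ADD COLUMN id INT NOT NULL AUTO_INCREMENT PRIMARY KEY;` (it rewrites the table, so it takes a while on millions of rows but should not crash). If testing in SQLite instead, `ALTER TABLE` cannot add a primary key, so the usual pattern is to rebuild the table. A hypothetical sketch (all names invented):

```python
import sqlite3

# Hypothetical sketch of step 2 in SQLite: give an imported table a
# surrogate integer ID plus two custom fields by rebuilding it.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE raw_import (name TEXT, city TEXT)")
cur.executemany("INSERT INTO raw_import VALUES (?, ?)",
                [("Acme", "Oslo"), ("Globex", "Bergen")])

# New table with an autoincrementing ID and the two extra columns.
cur.execute("""
    CREATE TABLE entries (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT, city TEXT,
        custom1 TEXT, custom2 TEXT
    )
""")
cur.execute("INSERT INTO entries (name, city) SELECT name, city FROM raw_import")
ids = [r[0] for r in cur.execute("SELECT id FROM entries ORDER BY id")]
```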

I missed one question. The operation has to be done every 30-60 days because each time they publish the data in bulk, not only the new records since the previous publishing date, so I have to process it beforehand. We don't have any control over that.
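The "find what's new in the bulk dump" step (step 4) is a classic anti-join. A small, hypothetical sketch in SQLite (key and table names invented; the same SQL shape works in MySQL):

```python
import sqlite3

# Hypothetical sketch of step 4: keep only rows from the new bulk dump
# whose key does not already exist in the old table, then append them.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE old_entries (reg_no TEXT, name TEXT)")
cur.execute("CREATE TABLE new_dump (reg_no TEXT, name TEXT)")
cur.executemany("INSERT INTO old_entries VALUES (?, ?)", [("C1", "Acme")])
cur.executemany("INSERT INTO new_dump VALUES (?, ?)",
                [("C1", "Acme"), ("C2", "Globex")])

# Anti-join: LEFT JOIN and keep only the rows with no match on the key.
new_rows = cur.execute("""
    SELECT n.reg_no, n.name
    FROM new_dump AS n
    LEFT JOIN old_entries AS o ON o.reg_no = n.reg_no
    WHERE o.reg_no IS NULL
""").fetchall()
cur.executemany("INSERT INTO old_entries VALUES (?, ?)", new_rows)
count = cur.execute("SELECT COUNT(*) FROM old_entries").fetchone()[0]
```

This is another reason the unique-ID step matters: the anti-join needs a stable key (and an index on it) to be fast.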


thank you for the extra info. so this is a production/live server you are cloning.

I do not think the size is the issue here.
Are you sure that the step 1 conversion is complete? And is it reversible?

Adding a unique field should be a straightforward step, but I would do that alone first.

I also do not understand the "no control" part, because adding a unique ID should not be a difficult step; it should only impact the queries/views/joins/foreign keys used.

By no control I meant over the CSV and its publishing schedule.
Step one completes (takes a while, but completes). But when going to step 2, it crashes after a while, most likely on a connection timeout, which I can't increase beyond the standard 3600. I have tried setting it to 0, like in PHP, but no change. So, like Murphy's law, I want to try with a larger "hammer", so I am looking for specs for a machine that can handle that amount of data within MySQL's limits. Would I need 2 processors, a processor with more cores, more RAM?
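A crash on a connection timeout usually points to one enormous statement or transaction, not hardware. One common workaround is to split the work into batches so no single unit runs for hours. A hypothetical sketch (file and column names invented; shown with SQLite, but the same batching idea applies to a MySQL client):

```python
import csv
import io
import sqlite3

# Hypothetical sketch: import a CSV in fixed-size batches, committing
# after each one so every transaction stays short.
CSV_DATA = "name,city\nAcme,Oslo\nGlobex,Bergen\nInitech,Stavanger\n"
BATCH = 2  # use something like 10_000 for real data

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE entries (name TEXT, city TEXT)")

reader = csv.reader(io.StringIO(CSV_DATA))  # open a real file here instead
next(reader)  # skip the header row
batch, total = [], 0
for row in reader:
    batch.append(row)
    if len(batch) >= BATCH:
        cur.executemany("INSERT INTO entries VALUES (?, ?)", batch)
        conn.commit()  # commit per batch; no long-running transaction
        total += len(batch)
        batch = []
if batch:  # flush the final partial batch
    cur.executemany("INSERT INTO entries VALUES (?, ?)", batch)
    conn.commit()
    total += len(batch)
```

With short batches, a dropped or timed-out connection costs you one batch, not the whole import.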

Hm, either MySQL has a problem, or MySQL Workbench does. I had a Windows machine and ran the database through SQL Server Express and everything went smoothly. It took a while to process and ended with an almost 5 GB SQL file of what I wanted, but it worked :slight_smile:

What the heck? That’s tiny! A 2GB system could hold the whole database in RAM. It’s literally impossible to have performance problems with a 1GB database on modern hardware.

Ah. There it is (probably). You’re trying to do this over a network connection from a local desktop app to a remote MySQL server. Push the data to your server, and import it there.

Nope, all things were done on the respective machines. After this experience, my conclusion is that MySQL does have some problems with CSV import; I couldn't import it even in the terminal, but on the Mac it accepted it as JSON, while the other machine had problems with the JSON. SQL Server also throws some errors during import, but it has better management and in the end managed to correct them (mostly inconsistencies in the CSV, but also with the column size/type). MySQL Workbench seems rather useless. Is MariaDB any better with CSVs?

It would be good to identify this error, as it may be the root cause.


Don't think so. There were 2 types of errors: incorrect date format, although MySQL imported everything as text, and some NULLs in columns I had set to not allow NULLs (BTW, SQL Server is more flexible on imports).

MySQL doesn’t do anything with CSV. If there’s a problem, it is with whatever tool is converting the CSV to SQL that can be used by MySQL.

The phpMyAdmin app can import a CSV file; maybe see if you have any luck with that.

As I asked above, whatever tool you are using to make that first step is your problem if you cannot reverse it without errors,
i.e. convert the CSV file back and end up with identical data.
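That round-trip check is easy to automate: parse the original CSV, read the rows back out of the database, and compare the two sets. A hypothetical sketch (column names invented; SQLite used as the stand-in engine):

```python
import csv
import io
import sqlite3

# Hypothetical round-trip check: import a CSV, export the rows back,
# and verify nothing was mangled in between.
CSV_DATA = "name,city\nAcme,Oslo\nGlobex,Bergen\n"
original = list(csv.reader(io.StringIO(CSV_DATA)))[1:]  # drop the header

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE entries (name TEXT, city TEXT)")
cur.executemany("INSERT INTO entries VALUES (?, ?)", original)

# Export and compare, ignoring row order.
exported = [list(r) for r in cur.execute("SELECT name, city FROM entries")]
round_trip_ok = sorted(exported) == sorted(original)
```

If the comparison fails, the diff between the two lists points straight at the rows the conversion damaged.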

I still do not really understand why you are doing the conversion.

I was doing the conversion because MySQL is not able to import that CSV as CSV, regardless of the method. The original CSV is about 600 MB; the JSON was 2 GB :slight_smile:
In MS SQL the import runs OK, no need to convert. So basically, from now on I will import the data using SQL Server Express, process it there and then import the processed data into MySQL.

As indicated above by @stefan1959, I would have used the phpMyAdmin app (which can also export a CSV file).
Goodness knows what MS does to the database during their conversion :frowning:

I don't think I ever used phpMyAdmin, only MySQL Workbench or the terminal. In the past I have worked extensively with MS SQL; I even hold an MCP on the 2005 version. It's a more mature product, which so far Microsoft hasn't been able to ruin :joy:.

So, I assume you managed to reverse the conversion without any errors to the DB :man_shrugging:

I have no MCP qualification, just over 35 years working with various databases for companies ranging from international corporations to small local consultancies, but I can get rusty and out of date :man_running: :old_man:
