Android Tutorial [B4X] Database modelling

This tutorial is about db modelling, with a small example. Of course this is just an overview and there is much more to it. I assume you already know how to create tables and the like.

Databases

A db is a collection of related data, organized in tables. Usually we speak of a database system (SQLite, MySQL, etc.) because it brings a lot of methods, functions and services with it, such as a query language to read data or to create new tables.

Tables

A table can be seen as an object or class. It contains all data related to this object. Objects are "things" as they exist in reality, like a customer, an item, a car, an order, etc. Once you know this, you are almost there when you create a new db: every object gets its own table. So when you think about an app handling customers, orders and items, you know you need 3 tables. All the data related to a customer (like name and address) will be stored in the customers table. A table has columns and rows, as you know them from Excel.

Relations

This is a very important point. The objects (tables) interact with each other because they are related to each other. A customer places an order, so there's a relation between the customer and his/her order. Easy but important. Each order has items, which is another relation. No order without items, and no order without a customer placing it. Like views in B4X, these relations can be seen as parent/child relations. In this case the customer is the parent and the orders are "children" of the customer object. Every "child" carries a tag (a foreign key) to identify its parent.
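
In SQLite you can make this parent/child tag explicit with a foreign key column. A minimal sketch, assuming SQL1 is an initialized SQL object (table and column names follow the example used later in this tutorial):

B4X:
' Parent: one row per customer
SQL1.ExecNonQuery("CREATE TABLE customers (CID INTEGER PRIMARY KEY, Name TEXT)")
' Child: every order carries the CID of its parent customer
SQL1.ExecNonQuery("CREATE TABLE orders (OID INTEGER PRIMARY KEY, CID INTEGER, FOREIGN KEY (CID) REFERENCES customers(CID))")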

Indexes

In the early days we had phone books on paper. To find someone's phone number there was a register/index based on the first letter of the surname. So we "stored" the person John Smith under "S". In every db system you can add an index to any column you want. The goal is very fast access to the data you need at that moment. If you wanted all birthdays of all persons, you would have to scan every person in your paper phone book, which takes a lot of time. A db solves this problem: just add a new index to your table. This can be done at any time you need it, depending on the needs of your app.
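
In SQLite, adding such an index is a single statement. A sketch, assuming a persons table with a Birthday column:

B4X:
' Without the index, "WHERE Birthday = ?" scans the whole table.
' With it, the db jumps straight to the matching rows.
SQL1.ExecNonQuery("CREATE INDEX IF NOT EXISTS idx_persons_birthday ON persons (Birthday)")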

Unique/Primary Keys/Autoincrement

If you handle customers in a table you need to identify the exact one you mean. The name isn't a good idea because there are probably thousands of John Smiths among your customers. Even combined with the birthday there can be several Johns with the same birthdate. To uniquely identify ONE person/customer there is always a unique key like a customer number, social security number, tax number, etc. You could create this unique ID on your own (if a specific format is needed), but all db systems help you here: add an integer column and define it as autoincrement. When you insert a new row (= a new customer here) the db system automatically inserts the next number. If the last one was 1111, the next will be 1112 and will be unique. I use this method for every table so I can identify each entry by its unique key (like customer 1112 has placed order 4455 containing the items 6665, 9993 and 4444).
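
In SQLite this looks like the sketch below: INTEGER PRIMARY KEY AUTOINCREMENT does the counting, and last_insert_rowid() returns the number that was just assigned (SQL1 is assumed to be an initialized SQL object):

B4X:
SQL1.ExecNonQuery("CREATE TABLE customers (CID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT)")
' Insert a new customer - no CID given, the db assigns the next free number
SQL1.ExecNonQuery2("INSERT INTO customers (Name) VALUES (?)", Array As Object("John Smith"))
' Read back the CID the db just created
Dim NewCID As Long = SQL1.ExecQuerySingleResult("SELECT last_insert_rowid()")
Log("New customer id: " & NewCID)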

Normalization/Denormalization

Wikipedia: "Normalization or normalisation refers to a process that makes something more normal or regular"

Think about the customer you want to handle. A customer has an address and some other properties. He/she can have more than one address and more than one payment method. The idea of normalization is to move the addresses and payment methods to separate tables. This is what db modelling is about. You can take this idea to the max and "disassemble" a customer down to the atoms (a customer is human, has children, is male, and so on), like a class with methods and entities. This is sometimes too academic and leads to speed problems. On the other hand it is very powerful when you have to create a CRM (customer relationship management) system.

The db model is one thing. To get a more practical design you can "denormalize" this model. It's like meeting your friends: sometimes it's ok to just have some drinks and food; you don't need to know everything about them at that moment, just the things you need right now. In db terms: you create only one table for the customer, with one address and all properties. Enough for your app, even if in the full model a customer is "made of" maybe 12 objects.
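
As a sketch of the difference (column names are made up for illustration):

B4X:
' Normalized: addresses live in their own child table, a customer can have any number of them
SQL1.ExecNonQuery("CREATE TABLE customers (CID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT)")
SQL1.ExecNonQuery("CREATE TABLE addresses (AID INTEGER PRIMARY KEY AUTOINCREMENT, CID INTEGER, Street TEXT, City TEXT)")
' Denormalized: one address stored directly with the customer - often enough for an app
SQL1.ExecNonQuery("CREATE TABLE customers_flat (CID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT, Street TEXT, City TEXT)")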

Size doesn't matter & most common mistakes

I sometimes read threads like "My app is slow because the db is slow" or "I want to load 100,000 rows into a ListView". My answer: that is either a design or a programming mistake, and people are surprised.

With a good design, size doesn't matter. If you have a db with 1 billion entries, the selection of data is about as fast as with 100 entries. A good query takes 0.02 secs or less. If it takes 2.4 secs (which is over 100 times slower) there IS an error in your db design, your query, or even a problem in your code flow.

Never retrieve more data than a user can handle at one time! Loading 100,000 rows into a ListView does NOT make sense, as one user can't handle that much data. Good practice is to load 50 or so and load another 50 if needed (see the sketch below). Of course there are exceptions, but then we are talking about a batch system which processes data, not a single user.
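
A sketch of the "load 50 at a time" idea with LIMIT/OFFSET (Offset is a hypothetical variable you increase by 50 whenever the user reaches the end of the list):

B4X:
Dim Offset As Int = 0   ' increase by 50 for every further page
Dim rs As ResultSet = SQL1.ExecQuery2("SELECT CID, Name FROM customers ORDER BY Name LIMIT 50 OFFSET ?", _
    Array As String(Offset))
Do While rs.NextRow
    Log(rs.GetInt("CID") & ": " & rs.GetString("Name"))
Loop
rs.Close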

So let's start with a simple example

I want to build an app where customers can place orders with items. So I have 4 tables:

customers
orders
orderitems
items
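
A sketch of the four tables as SQLite DDL (extra columns like quantity or dates are left out to keep it short):

B4X:
SQL1.ExecNonQuery("CREATE TABLE customers (CID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT)")
SQL1.ExecNonQuery("CREATE TABLE items (IID INTEGER PRIMARY KEY AUTOINCREMENT, Itemname TEXT, Price REAL, validfrom TEXT, voidfrom TEXT)")
SQL1.ExecNonQuery("CREATE TABLE orders (OID INTEGER PRIMARY KEY AUTOINCREMENT, CID INTEGER)")
SQL1.ExecNonQuery("CREATE TABLE orderitems (IOID INTEGER PRIMARY KEY AUTOINCREMENT, OID INTEGER, IID INTEGER, Price REAL)")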

Why orderitems AND items?

Items is the catalog of the items you CAN order. Orderitems are the items the customer actually ordered. Imagine you change a price later: this way completed orders are "safe", as they represent the finished order as it was at the time of ordering.

In customers we have 3 customers:

CID Name
1 Bill Miller
2 Sam Smith
3 Caroline James

As you see, every customer has his/her unique ID. If we address "Sam Smith" we will use ID 2 in our app. The name is not important anymore (except for display). Another benefit: we save a lot of space, as an integer usually needs less space than a full name.


In items:

IID Itemname Price
1 Cake 10
2 Teddybear 15
3 Milk 1.5

Same here: every item has a unique ID, too. As with customers, we only use the ID from here on.

And what if I need to change the price from a specific date on?

Two thoughts here:

- the price for previous orders must be kept
- the new price is valid in the future

Just add two columns like validfrom and voidfrom. Insert a new item (same name) and, by setting the dates in both rows, you can easily control when the old price becomes void and the new one becomes valid:

IID Itemname Price validfrom voidfrom
1 Cake 10 2018-01-01 2018-04-04
4 Cake 12 2018-04-04 9999-12-31
2 Teddybear 15 2018-01-01 9999-12-31
3 Milk 1.5 2018-01-01 9999-12-31

To display the item list use

"Select * FROM items WHERE validfrom <= 'today' and voidfrom > 'today'"

where 'today' stands for the current date. This displays only ID 4 for new orders from April 4th on (and ID 1 before that date). Benefit: you can also enter a price change for the future (like from January 1st, 2019).
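
In B4X, 'today' would be the current date formatted like the stored dates. A sketch:

B4X:
DateTime.SetDateFormat("yyyy-MM-dd")   ' match the format stored in the table
Dim Today As String = DateTime.Date(DateTime.Now)
Dim rs As ResultSet = SQL1.ExecQuery2("SELECT * FROM items WHERE validfrom <= ? AND voidfrom > ?", _
    Array As String(Today, Today))
Do While rs.NextRow
    Log(rs.GetString("Itemname") & ": " & rs.GetDouble("Price"))
Loop
rs.Close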


So if Sam orders milk:

orders

OID CID
2233 = New Order 2 = Sam

By inserting a new row into orders, the order ID (OID) is increased and set automatically. The customer is Sam with his ID 2 (see the customers table).
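
A sketch of that insert; last_insert_rowid() gives us the OID we need for the orderitems rows:

B4X:
SQL1.ExecNonQuery2("INSERT INTO orders (CID) VALUES (?)", Array As Object(2))   ' 2 = Sam
Dim NewOID As Long = SQL1.ExecQuerySingleResult("SELECT last_insert_rowid()")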

orderitems

IOID OID IID Price
1 2233 = which order 3 = Milk 1.5

In words: customer 2 has placed the order with ID 2233, and this order has one item with item ID 3 (= Milk) at the frozen price of 1.5. If you add a second item:

IOID OID IID Price
1 2233 = which order 3 = Milk 1.5
2 2233 = same order 4 = Cake 12

Note: All data (like prices) in orderitems is frozen, as it represents the order at the time it was placed. This may not seem important today, but you have to keep the data for years, and the prices here must not change because it is a historic view (a finished order).
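
A sketch of this "freezing": read the current catalog price from items and copy it into orderitems (NewOID comes from the order insert sketched above):

B4X:
' Look up the current catalog price of item 3 (Milk)
Dim CurrentPrice As Double = SQL1.ExecQuerySingleResult2("SELECT Price FROM items WHERE IID = ?", Array As String(3))
' Store the price with the order line - later catalog changes won't touch this row
SQL1.ExecNonQuery2("INSERT INTO orderitems (OID, IID, Price) VALUES (?, ?, ?)", _
    Array As Object(NewOID, 3, CurrentPrice))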


Accessing orders and their items

As you know the customer (here e.g. Sam Smith with ID 2), you can easily access his orders with

"Select * FROM orders WHERE CID=2"

As a customer can have more than one order, use a cursor here to process all of them. Assuming you display the orders in a view and a click on one shows the order details (orderitems), you know the ID of the order and can access its items with

"Select * FROM orderitems WHERE OID=2233"

Add more columns as you need them, like orderdate, orderstate, quantity, etc.
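
Putting both queries together, a sketch of the cursor loop (B4X ResultSet; in B4A a Cursor with RowCount/Position works the same way):

B4X:
' All orders of customer 2 (Sam)
Dim Orders As ResultSet = SQL1.ExecQuery2("SELECT OID FROM orders WHERE CID = ?", Array As String(2))
Do While Orders.NextRow
    Dim OID As Long = Orders.GetLong("OID")
    ' All items of this order
    Dim Items As ResultSet = SQL1.ExecQuery2("SELECT IID, Price FROM orderitems WHERE OID = ?", Array As String(OID))
    Do While Items.NextRow
        Log("Order " & OID & ": item " & Items.GetInt("IID") & " at " & Items.GetDouble("Price"))
    Loop
    Items.Close
Loop
Orders.Close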
 

keirS

Well-Known Member
Licensed User
Longtime User
Size doesn't matter & most common mistakes

With a good design, size doesn't matter. If you have a db with 1 billion entries, the selection of data is about as fast as with 100 entries. A good query takes 0.02 secs or less. If it takes 2.4 secs (which is over 100 times slower) there IS an error in your db design, your query, or even a problem in your code flow.

This isn't correct. Query speed depends on the indexes being able to be held in memory on the DB server. No matter how good your DB design is, you will encounter significant performance bottlenecks if the DB engine has to page the index to disk.

Plenty of good queries take longer than 0.02 seconds. They certainly do if you are performing calculations on 1 billion records.

And what if I need to change the price from a specific date on?

Just add two columns like validfrom and voidfrom. Insert a new item (same name) and, by setting the dates in both rows, you can easily control when the old price becomes void and the new one becomes valid:

IID Itemname Price validfrom voidfrom
1 Cake 10 2018-01-01 2018-04-04
4 Cake 12 2018-04-04 9999-12-31
2 Teddybear 15 2018-01-01 9999-12-31
3 Milk 1.5 2018-01-01 9999-12-31

Note: All data (like prices) in orderitems is frozen, as it represents the order at the time it was placed.

That is an absolutely awful design for what you are trying to achieve. You are using multiple unique IDs in the items table to represent the same item. That's a big no-no in DB schema design.

Instead you should have a separate itemprices table.
B4X:
PID  IID  Price  ValidFrom   VoidFrom
1    1    10     2018-01-01  2018-04-04
10   1    12     2018-04-04  9999-12-31
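
In SQLite terms that could look like the sketch below (items then keeps one row per item, and the join picks the currently valid price; Today is the current date string as in the tutorial above):

B4X:
SQL1.ExecNonQuery("CREATE TABLE itemprices (PID INTEGER PRIMARY KEY AUTOINCREMENT, IID INTEGER, Price REAL, ValidFrom TEXT, VoidFrom TEXT)")
' Current price per item: join the catalog with its price history
Dim rs As ResultSet = SQL1.ExecQuery2("SELECT i.Itemname, p.Price FROM items i JOIN itemprices p ON p.IID = i.IID " & _
    "WHERE p.ValidFrom <= ? AND p.VoidFrom > ?", Array As String(Today, Today))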
 

KMatle

Expert
Licensed User
Longtime User
That is an absolutely awful design

Denormalisation isn't awful. If it's fast and meets the needs, it's ok. Instead of being "rude", it would be more helpful if you expanded this thread with useful design ideas or an example of your own. My example is just an example.

Do not take the 0.02 secs too literally. It just shows that design is important. However...

This isn't correct. Query speed depends on the indexes being able to be held in memory on the DB server. No matter how good your DB design is, you will encounter significant performance bottlenecks if the DB engine has to page the index to disk.

Yes, it is true: size does not really matter if the design is good. At work we have half a billion rows and the speed is like I mentioned. If it's not, it's a design mistake. And hey, everyone is allowed to split data across several tables (= denormalisation) to get to 0.02 secs :)

I have a B4J app using SQLite with 3 million rows in ONE table and about 15 million in another (plus some additional tables). Guess how long a query takes... 12 GB of RAM on my PC is enough to speed it up. SQLite is incredibly fast :)

Plenty of good queries take longer than 0.02 seconds

Sure. If it's 0.2 secs, it's ok; if it's 1 sec, maybe; but longer means:

- the design isn't optimal (no, it isn't)
- the design is ok, but you are doing some kind of batch processing, which is ok, but... I'm talking about a FAST online system
- you are doing "special" queries which are "outside the main design" (which is ok, too)
 

keirS

Well-Known Member
Licensed User
Longtime User
Denormalisation isn't awful. If it's fast and meets the needs, it's ok. Instead of being "rude", it would be more helpful if you expanded this thread with useful design ideas or an example of your own. My example is just an example.

So my design was not useful? It's a far more normal approach to schema design than yours. Ask a DBA which one they would prefer and I think most of them would opt for my solution. If you are going to write a tutorial about database modelling, don't you think it is a good idea to present a normalized design rather than some wacky denormalised solution?
 

Ed Brown

Active Member
Licensed User
Longtime User
Denormalisation isn't awful. If it's fast and meets the needs, it's ok.
Completely agree. Normalisation of a database was the preferred method way back when disk space was expensive. Normalisation is still used today in a lot of databases.

It's not necessary to denormalise the entire database but denormalising the data where speed and performance are required makes a lot of sense. It should be noted that although denormalising has a lot of performance benefits, it comes with the cost of increased storage.
 

keirS

Well-Known Member
Licensed User
Longtime User
Completely agree. Normalisation of a database was the preferred method way back when disk space was expensive. Normalisation is still used today in a lot of databases.

It's not necessary to denormalise the entire database but denormalising the data where speed and performance are required makes a lot of sense. It should be noted that although denormalising has a lot of performance benefits, it comes with the cost of increased storage.

Normalization is used for easier maintenance of data consistency and simpler object-relational mapping. Highly normalized databases better represent an object-oriented architecture.
 

Ed Brown

Active Member
Licensed User
Longtime User
Normalization is used for easier maintenance of data consistency and simpler object-relational mapping. Highly normalized databases better represent an object-oriented architecture.
A normalised database has nothing to do with object-relational mapping or even object-oriented architecture. Normalised databases have been around since long before either of those concepts was even a thing.

Normalisation is the process of reducing data storage and improving data integrity, but it does not perform well with large datasets. Denormalising a database improves the efficiency of data retrieval for both large and small datasets and allows for a higher throughput of queries. If Google, Twitter, Facebook etc. used normalised databases for their searches and storage of user information, they would be terrible sites and services to use, as they would be too slow to respond to the query demands those services get now.

On the topic of integrity, denormalised databases offer just as much integrity provided the design is good; this applies to both normalised and denormalised databases. The cost of disk space today allows companies like Google and Facebook to have huge datasets, and I can guarantee that those won't be normalised databases.
Facebook, as an example, created their own database and later open-sourced it - it's called Cassandra and it's not a normalised database.
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
Like anything related to performance optimizations, you need to start with the design that makes the most sense for your requirements and is easy to maintain. In 99% of cases such a design will also have good enough performance.
In most cases normalized databases are easier to maintain and to keep consistent.

If the same data is stored in multiple tables then any update must carefully update all tables.
 

keirS

Well-Known Member
Licensed User
Longtime User
A normalised database has nothing to do with object-relational mapping or even object-oriented architecture. Normalised databases have been around since long before either of those concepts was even a thing.

A bit of a history lesson:

First fully OOP language: Smalltalk in 1972
SQL developed in 1974 by IBM
Biggest investors in the propagation of Smalltalk as a development environment: IBM

The concepts of OOP and the RDBMS were both formulated at roughly the same time, in the late 1960s and early 1970s.

Normalisation is the process of reducing data storage and improving data integrity, but it does not perform well with large datasets. Denormalising a database improves the efficiency of data retrieval for both large and small datasets and allows for a higher throughput of queries. If Google, Twitter, Facebook etc. used normalised databases for their searches and storage of user information, they would be terrible sites and services to use, as they would be too slow to respond to the query demands those services get now. On the topic of integrity, denormalised databases offer just as much integrity provided the design is good; this applies to both normalised and denormalised databases. The cost of disk space today allows companies like Google and Facebook to have huge datasets, and I can guarantee that those won't be normalised databases.

The cost of memory has also come down. That means I can take a 40 GB database, throw it on a dedicated MySQL/MariaDB server with 128 GB of RAM and SAS SSDs, and configure the InnoDB engine to use 80% of the RAM, which means the whole DB can be cached in memory for not a lot of money. I suspect most B4X developers will not be dealing with multi-terabyte databases, but I could be wrong.
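
For reference, that InnoDB setting is a single line in the server's my.cnf (a sketch; roughly 80% of 128 GB):

[mysqld]
innodb_buffer_pool_size = 100G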

Facebook, as an example, created their own database and later open-sourced it - it's called Cassandra and it's not a normalised database.

Cassandra is not an RDBMS, so I really can't see the relevance of mentioning it here, as the concepts of designing a schema in Cassandra are different from those of designing a schema in an RDBMS.
 