RFC-DB.txt

$Id: RFC-DB.txt,v 1.1 2008-01-26 16:18:40 clorenz Exp $
----------------------------------------------------------------------------

1. PREFACE
----------

This document should describe the database format of the next generation
of mmusic. Please don't confuse this with gmmusic, since gmmusic is only
the frontend for gmmusic.

We don't know, if this new database scheme will ever be implemented, but
we want to keep this as a kind of mind reference.

This document will cover the structure of the new database as well as
the transitions from the old database to the new one. 

The new database will be completely different from the existing one;
it should be a "clean" structure, without redundancy and without any
unconnected data (like today for example the "authors" field, which has
got no reference at all to a "person" or "artist" table). This approach
hugely blows up the database, but at the end, all data are perfectly
organized and are absolutely consistent, which allows you to efficiently
work with it.

At the end of this document, you can find some mind scratches, notes
and everything else, which is just here, since we don't want to lost it,
but haven't yet found the time to finalize the thoughts.


2. THE NEW DATA STRUCTURES
--------------------------

From various comments, I've learned, that the current data structure,
with the artist-song-recording-media tables is insufficient for anything
but modern pop music. 

Classical songs, which require for example a conductor can't be
handled properly. Or jazz songs, where one song can be performed by 
many perfomers, are not really "new" songs, but should be represented
as incarnation of a real "song".

That leads us now to the question of all questions: What exactly is a song?

(1) Something, that only consists of a name (the name of the song) and a few
    general data, like composer,writer,style,year? If yes, how can he handle
    re-issues of a song, where the style changes? Just think of "Yesterday" of
    the Beatles? I am sure, not all of the more than 1000 versions are of that
    slow style.
    This would mean, that in the end, the recording should get a "recording
    style" (mp3 category like) field, too, which usually equals the "song
    style", but in some cases it could be different.
    In the search methods, you'd have to ensure, that you all combinations
    (AND,OR,NOT) of those two styles can be searched for.

(2) Something, that consists only of the name, composer, year and writer?
    If yes, this would be a kind of "original song data", but all 
    non-original incarnations, like remixes, have to get the additional
    data (new composers/remixers, new year, etc.) somewhere else. But where?
    Do these fields really belong into the "recording" field? If yes, they
    would be redundant, if you have got two recordings of the same
    song.

    What about a new table "incarnation", which is the "middleware" between
    a song and a recording: A song can have multiple incarnations (such as
    remixes, reissues, styles, etc.), and each of those incarnations can have
    multiple true recordings.

    In this case, the "recordings" table would carry nothing, but the
    pure recording data, the quality- and a "comment" field. The "special"
    field should move into the "incarnation" table.

    Open: What about the "recording year"? Where does this field belong
    to? Especially for songs, played on TV? If a song, released in 1998
    is recorded from a TV show in 1999, without any differences (so the
    song *is* the original one), the "incarnation" table would carry the
    year 1998, while the recording year would carry the year 1999.

    This means, that we finally would use three (possibly different) year
    values for one song: One of the (original) song, one of its incarnation
    and one of its "true" recording. Overkill or brilliant? I'd tend to
    the latter ;)

(3) Something, that only consists of the song name? Certainly not.

    
My personal favourite is (2).


3. THE TABLES
-------------

I really have to draw an entity-relationship diagram...

3.1. PERSON_DATA
----------------

Need a better name for this... Maybe "people" ?
This table handles all native data of a true, existant person. However,
there's one thing, this table does *not* carry, and that is (or are) the
name (or names), the person acts in public.

Fields:
	- unique ID
	- date of birth
	- location of birth
	- date of death
	- location of death
	- cause of death
	- comment


3.2. KNOWN_AS
-------------

This table carries a 1:n connection between a true, existant person (in
table "person_data" and it's names, the person is known for. If a person
does not have any pseudonyms, it is a 1:n connection, but many persons
have a birth name and pseudonym.

If you ask, why I won't carry the birth name in the table "person data",
it's just because of redundancy. The table "person_data" will never be
directly referenced.

Maybe, this table could also be called "public".

Fields:
	- id (primary key)
	- name (varchar) 
	  (Problem: How to deal with two non-itentical true persons with
		    the same pseudonym? In this case, the frontend
		    application has to provide the user all neccessiary
		    data (of person_data), so that the *user* can select,
		    which one he'll take.)
	- person_data (foreign key -> person_data(id))


3.3. ARTIST_APPEARANCE
----------------------

This table should track, when and where an artist (e.g. in a band)
appears

Fields:
	- id (primary key)
	- known_as (foreign key to known_as(id))
	- artist (foreign key to artist(id))
	- start_date
	- start_location
	- end_date
	- end_location
	- end_cause
	- remarks

Just an example:

"Elton John" would have the following fields:
       id: 123
       known_as: 456        (dereferences to "Elton John")
       artist: 789	    (dereferences to "Elton John")
       .....

"Simon and Garfunkel" could have the following two rows:
       id: 124
       known as: 457	    (dereferences to "Simon & Garfunkel)
       artist:   365	    (dereferences to "Paul Simon")
       ......

       id: 125
       known_as: 457	    (dereferences to "Simon & Garfunkel)
       artist:	 366	    (dereferences to "Art Garfunkel")
       ....

       

3.4. ARTIST
-----------

This tables' name is a bit misleading.

This table is used to handle bands (consisting of multiple persons) and
individual persons, regardles, whether they're the singers of a band
or the composers or whatever else.

Fields:
	- id primary key (really??)
	- name (cannot be unique, since it is possible, that e.g. two
	        bands with the same name exist. For example, there do exist
		two bands, called "Atlantis")
	- printable_name

The former "consists of" relation is now handles in the ARTIST_APPEARANCE
table.


3.5. SONG
---------

Carries all native song data. We assume, that a "song" consists only of
a name. The rest is all carried by the incarnation of a song. So, the name
"Song" is perhaps misleading...

Fields:
	- id
	- name


3.6. AVAILABLE_SONG_ROLES
-------------------------

This table represents all roles, a "performer" can have in a song

Fields:
- id (need to be hardcoded!)
- role_type (need to be hardcoded!)

Possible values: "performer", "composer", "lyrics". Anything else?


3.7. SONG_ROLE
--------------

1:1:n Connection between a Song, a performer and its roles in that song.
(just think of "Albert Hammond" wrote "Free Electric Band", and 
"Albert Hammond" also sang "Free Electric Band).


Fields:
	- id
	- song (foreign key to incarnation)
	- artist (foreign key to artist) 
	- role (foreign key to available_song_roles)

3.8. DANCE STYLES
-----------------

Fields:
	- id
	- name
	- comment


3.9. GENRE
----------

This table handles all MP3 ID-Tag genre entries

Fields:
	- id
	- name
	- comment

3.10. RELEASE
-------------

This table handles all possible releases of a song, such as
S/A, S/B, LP, .....

Fields:
	- id
	- name
	- comment


3.11. INCARNATION
----------------

We clearly need a better name here :-)

Fields:
	- id
	- song (foreign key to song)
	- publishing_date
	- genre (foreign key to genre)
	- dance (foreign key to dance styles)
	- bpm
	- release (foreign key to release)
	- lyrics
	- keywords (like war, summer, love, rain, baseball etc.)
	- comment

In case, you wonder, there artist and authors are, they are represented
by the "Song_Role" tables. And if you wonder, where the "Title" is,
it is located in the "Song" table.

Disadvantage: Information, which was previously held in a single table is
	      now spread over three and more tables
Advantage:    Full referencial integrity. 



3.12. RECORDING
--------------

Fields:
	- id
	- song_id (foreign key to incarnation)
	- medium  (foreign key to medium)
	- medium_side 
	- medium_position (handles counter *and* track!)
	- length
	- year
	- longplay (bit)
	quality bits (I'd prefer true bit fields, but we need a function,
	  that sums them up to a compareable value)
	special bits (Live, Cover version, ...)
	- remarks
	- digital  

3.13. AVAILABLE_RECORDING_ROLES
------------------------------

Possible values: "engineer","conductor","technican", ....
(see song)

Fields:
	- id
	- role
	- remarks


3.14. RECORDING_ROLE
-------------------
(see song)

Fields:
	- id
	- recording (foreign key to recording)
	- artist (foreign key to artist)
	- role (foreign key to available_recording_roles)
	


3.15. MEDIUM_TYPE
-----------------
Must be as generic as possible. Possible fields:
     - id
     - name
     - description
     - number_of_sides ("2" for Singles and tapes, "1" for CDs)
     - quality value (unique! With that field, a qualitative ordering 
	       of recordings
	       of this medium should be made. A 5:1-recording on a DVD should
	       get the "best" value here, while a recording on an audio tape
	       should get the "worst" value. A select * from medium_type
	       order by quality_value will fill the toolbar of gmmusic
	       in the correct order, just as it's currently the case.)
	       I'd suggest not to use sequential numbers (which leave no
	       gaps for further injections), but steps of ten.


3.16. MEDIUM
------------

     - id
     - verbose_id (for me ;-), not unique, since a LP and a CD can have
       the same verbose_id)
     - artist
     - title
     - medium_type (foreign key for medium_type) 
     - length (for CD, Singles and LPs, this could be the sum length of
       all recordings; for all writeable media, such as ROM and tapes,
       this is the brutto recording length)
     - manufacturer (for tapes and CD-R)
     - label
     - year
     - order no
     - begin_date_a
     - begin_date_b
     - end_date
     - remarks
     - buy_date
     - buy_price
     - discid
     - category
     - track_offsets (CD, ROM only)
     - genre  (ID3 tag)
     - digital (defaults to AAA)
     - burning_date
     - production credits
     - cover art
     - purchase info


4. THE STORED PROCEDURES
------------------------

... will be developed in PL/pgSQL, see 
    http://www.ca.postgresql.org/users-lounge/docs/7.1/programmer/plpgsql.html
    (CL, 18.10.02)

5. TRANSITION FROM OLD DB TO NEW DB
-----------------------------------

5.1. AT FIRST....
-----------------

- The new DB must have a different name than mmusic2 or mmusic3, since
  these two names are already in use. Suggestion: mmusic-v2, but can be
  overridden at install time

- Installation can no longer happen, when you install the RPM, since the
  user is asked for the name of the new DB

- If gmmusic (new) finds no database mmusic-v2, it will create the
  database (with the points given above).

- If gmmusic finds an already existing mmusic (old) database, maybe
  while browsing .gmmusicrc, it will ask the user, whether he will
  do the transition and fill the new DB with the values of the old DB.
  (He will also be told, that the old DB will not be deleted)

- For every upgrade process, the new mmusic backend will (with interaction
  of the gmmusic frontend) do the upgrade. The RPM no longer does any
  upgrading.

5.2. THE OLD MEDIA TABLES
-------------------------

Table.Field old           | Table.Field new
--------------------------+---------------------------
*.id     no longer used	  | medium.id
*.medium_id		  | medium.verbose_id
*.interpret		  | medium.artist
*.titel			  | medium.title
*.remarks		  | medium.remarks
*.label			  | medium.label
*.order_no		  | medium.order_no
*.size			  | medium.length
*.firm			  | medium.manufacturer
*.begin_date		  | medium.begin_date_a
*.begin_b		  | medium.begin_date_b
*.end_date		  | medium.end_date
*.year			  | medium.year
*.buy_date		  | medium.buy_date
*.buy_price		  | medium.buy_price
*.cddb_id		  | medium.discid
*.category		  | medium.category
*.track_offsets		  | medium.track_offsets
*.id3_genre		  | medium.genre
*.digital		  | medium.digital
video_tape.system	  |        obsolete
rom.audio		  |	   obsolete
rom.rewitable		  |	   obsolete
rom.burning_date	  | medium.burning_date
files.magic		  | medium.discid
files.type		  | medium.medium_type
*			  | medium.medium_type 

All unused fields must carry the value "NULL"


5.3. THE OLD SONG TABLE
-----------------------


5.4. THE OLD TABLE STATISTICS
-----------------------------

I think, we don't need this table any more, the data should be retrived
on-the-fly every now and then and the frontend might take care itself, too.


6. UNSORTED THOUGHTS
--------------------

- Soundex !! (CL, 17.5.02)
- Implementaion in Java! 

  mmusic (the backend) should be a SOAP-Server
  jmmusic (one of the clients) should be a SOAP client

  Advantage: Client knows nothing about the database; Client can be
  "small and stupid"; Server can be a centralized big machine.  (CL,18.5.03)

----------------------------------------------------------------------------
vim: tw=78 expandtab
Das Metatechnische Kabinett

RFC-DB.txt