
Unless you are able to come up with a structure on your own you will never truly
be able to understand the concepts behind the application you are trying to
create. I recommend taking a day or two just to plan out your project.

Worry about design last. I would love to help but like others here, we have our
own projects to handle :P

Best of luck!!

Edit:
I'll take a second to explain how I do this. I code primarily in PHP/MySQL.

Step 1: Think of all the things you would like your web site to accomplish.

Step 2: Think of the information that may need to be stored. Plan out your
databases according to this (e.g. Job Listings Database, User Database, etc.).
List the fields, and then move on to the next step.

Step 3: Think of the pages you will need (e.g. Employee Login, Employer
Login, Submit Resume, etc.) and think about how the databases will send their
information to these pages. Refine your databases in this step; make sure you
have the correct fields to accomplish the desired tasks.

Step 4: Think of the file structure you will use. Many people follow a traditional
MVC format. I am working on my own right now; it makes things much easier
and far more organized. Learn more about MVC.

Step 5: Sign up for SO! The best thing you can have is a place to go and gather
ideas from people. Sometimes you may get a mental block!

Again, best of luck!

Chris
edited Aug 23 '09 at 22:38, answered Aug 23 '09 at 22:28, Chris Bier

You do make a great point. Your advice is appreciated. Frank Aug 23 '09 at 22:30

I agree completely. Simply coming up with the schema will do nothing for you. Once your site is up you'll need to look
into optimisation, potential join tables, (re)writing stored procedures, benchmarking and lots of other tedious jobs in
order to make your site bearable for the user. I'd recommend taking at least a week to get your head around everything
you need before you start and working from there. If you plan for your job search site to be functional you'll need far
more than a database structure. Mike B Aug 23 '09 at 22:38

Just started developing a job search/resume builder site and came up with this database
design. Need to refine it, but hope this might help you.

I would suggest starting with a good set of requirements and then begin identifying objects within the
application, properties of those objects, and how those objects relate to each other. These objects
could be users, companies, resumes, job postings, etc.. You can then take this information and begin
drafting an Entity Relationship (ER) diagram to depict these objects and relationships.

You could also write out some use cases to help you identify how the objects fit into various
workflows (i.e. employer registers, posts a job, applicant registers, submits resume, searches jobs,
applies for the job that the employer registered). You will probably uncover additional
objects/properties/relationships as well.

At this point, you can begin prototyping interfaces on paper (registration screens, search screens,
etc.).

It's doubtful you will get all of the requirements/design down on the first try (which is deemed the
waterfall approach). Many developers have found that a more iterative or "Agile" approach works
better where you attempt to deliver a minimal solution at first and then build on that in small
increments that are reviewed regularly by stakeholders.

In my personal experience, I like to get as much defined up front as possible without writing a book
about it. Then I like to start prototyping and building on those prototypes until we eventually have a
solution that meets the needs of the person/group requesting the solution.

answered Aug 23 '09 at 22:42, Mayo

Assuming that you'll be using something along
the lines of PHP to write your website I would
highly recommend the book Build your own
Database-driven website using PHP and MySQL.
The book walks you through creating a simplistic
joke website, from writing the PHP code to
coming up with a database schema to match your
requirements.
Eventually, requirements are going to be what
drives your design. It's easy to say that you're
going to create a "Job Search site" but what do
you really want it to do? What does the user want
from your website? What inputs and outputs will
each part of your site use? How will businesses
interact with your site? Who will moderate this?

Unlike the other comments I wouldn't
recommend getting too formal when you're
dealing with a tutorial site to help you learn. At
the bare minimum you need to understand exactly
why you're doing this and why everything is as it
is. This isn't an exercise in Project Management
or Software Design Methodologies, this is an
exercise in you learning basic Web Development
and Database Management.

If you want to get practising on your own
computer download a copy of XAMPP and start
writing a couple of PHP scripts, using
phpMyAdmin as a means to access your
database. Given a week of working through
examples on the Internet and through connecting,
reading and writing to a database through PHP
you'll learn to appreciate what a database truly
does for you. There's a reason Database
Administrators get paid so much for the work
they do!
If you're looking to write a commercially viable
job search website I would recommend that you
read up on some database theory as well (Google
database theory lecture notes and you'll find a
plethora of resources). A database is for life, not
just for Christmas, and you'll need to keep that
database running smoothly if you want your
website to run without any hitches.

Good luck with your job search website!


Start with brainstorming what data you will need to store for the project. This could
include:
users
jobs
job categories
companies
Of course any actual web app would end up with quite a few more tables than this, but
it's a start. I am just making this up. If you have specific functionality you want to
include, or extra business logic, then you should think about that now.
So starting with those four, we might create tables like this:
users
==============
user_id (pk)
first_name
last_name
email_address
password
company_id (fk)

jobs
==============
job_id (pk)
user_id (fk)
title
description

job_categories
==============
job_category_id (pk)
name

companies
==============
company_id (pk)
name
street_address
country
province
postal_code
phone_number
website
pk = Primary key. Must be unique. An example is the user_id: each user in the
system will have a unique identifier.
fk = Foreign key. An example is the user_id in the 'jobs' table. Say you have
user '42', and he adds a job; you use his user_id as a foreign key so you
can relate that user to that job posting.

Depending on what type of database you use you may need a job_to_categories table
to store the relationship between jobs and categories. You will also need to decide what
data types to use for each table field. For ids I recommend unsigned integers. A 'text'
type would work well for the job description. The rest could probably use the 'varchar'
type. Since I don't know what type of database you're going to use I won't go into
specifics.
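
To make the outline above a little more concrete, here is a minimal sketch of how those tables might look, using Python's built-in sqlite3 module only so that it runs without a server. The column types, the trimmed-down companies table, the sample rows and the helper names (build_demo_db, categories_for_job) are all assumptions for illustration; MySQL or another database would use its own types (unsigned integers, varchar and so on).

```python
import sqlite3

# Hypothetical SQLite rendering of the tables outlined above.
# Types and constraints are illustrative assumptions, not a final design.
SCHEMA = """
CREATE TABLE companies (
    company_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    website    TEXT                      -- address columns omitted for brevity
);
CREATE TABLE users (
    user_id       INTEGER PRIMARY KEY,
    first_name    TEXT NOT NULL,
    last_name     TEXT NOT NULL,
    email_address TEXT NOT NULL UNIQUE,
    password      TEXT NOT NULL,         -- store a hash, never the plain text
    company_id    INTEGER REFERENCES companies(company_id)
);
CREATE TABLE jobs (
    job_id      INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    title       TEXT NOT NULL,
    description TEXT
);
CREATE TABLE job_categories (
    job_category_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL UNIQUE
);
-- Join table: one row per (job, category) pair, so a job can sit in
-- several categories and a category can hold several jobs.
CREATE TABLE job_to_categories (
    job_id          INTEGER NOT NULL REFERENCES jobs(job_id),
    job_category_id INTEGER NOT NULL REFERENCES job_categories(job_category_id),
    PRIMARY KEY (job_id, job_category_id)
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce fks
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO users VALUES (42, 'Ada', 'Smith', 'ada@example.com', 'hash', NULL)"
    )
    conn.execute("INSERT INTO jobs VALUES (1, 42, 'PHP developer', 'Build a job board')")
    conn.executemany(
        "INSERT INTO job_categories VALUES (?, ?)", [(1, "Web"), (2, "Databases")]
    )
    conn.executemany("INSERT INTO job_to_categories VALUES (?, ?)", [(1, 1), (1, 2)])
    conn.commit()
    return conn


def categories_for_job(conn, job_id):
    """Return the category names attached to one job, via the join table."""
    rows = conn.execute(
        """
        SELECT c.name
        FROM job_categories AS c
        JOIN job_to_categories AS jc ON jc.job_category_id = c.job_category_id
        WHERE jc.job_id = ?
        ORDER BY c.name
        """,
        (job_id,),
    )
    return [name for (name,) in rows]
```

Calling categories_for_job(build_demo_db(), 1) walks the join table to list the categories for the job that user 42 posted.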

Database design is a big topic that really can't be glossed over in a stackoverflow
answer, or by reading a website or two.

As far as layout and design of the site, that is very subjective.


http://stackoverflow.com/questions/621884/database-development-mistakes-made-by-application-developers?rq=1

What should every developer know about databases? [closed]

Whether we like it or not, many if not most of us developers either regularly work with
databases or may have to work with one someday. And considering the amount of
misuse and abuse in the wild, and the volume of database-related questions that come
up every day, it's fair to say that there are certain concepts that developers should know
- even if they don't design or work with databases today. So:

What are the important concepts that developers and other software professionals ought
to know about databases?

Guidelines for Responses:

1. Keep your list short. One concept per answer is best.

2. Be specific. "Data modelling" may be an important skill, but what does that mean
precisely?

3. Explain your rationale. Why is your concept important? Don't just say "use indexes."
Don't fall into "best practices." Convince your audience to go learn more.

4. Upvote answers you agree with. Read other people's answers first. One high-ranked
answer is a more effective statement than two low-ranked ones. If you have more to add,
either add a comment or reference the original.

5. Don't downvote something just because it doesn't apply to you personally. We all work
in different domains. The objective here is to provide direction for database novices to
gain a well-founded, well-rounded understanding of database design and database-driven
development, not to compete for the title of most-important.

database language-agnostic database-design

edited Dec 31 '09 at 2:02, community wiki, 3 revs, Aaronaught

closed as not constructive by Oded Apr 16 '13 at 11:46

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by
facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion.
If you feel that this question can be improved and possibly reopened, visit the help center for guidance. If this
question can be reworded to fit the rules in the help center, please edit the question.

1 Why vote to close this?? It's a Community Wiki and therefore appropriate.
5 Stratton Dec 30 '09 at 18:09

5 I will vote to reopen if it gets closed... I would also like to see a list of those things
that DBAs should (but do not) know about OOP and application/Systems Software
design.. Charles Bretana Dec 30 '09 at 18:11

7 @gnovice: The word "subjective" in that context is referring to questions that are
entirely a matter of opinion. "What do you think of Joe Celko's book?" - that's a
subjective question. This question is soliciting objective information, it just so
happens that there is no single "right" answer. I think it's important to take a step
back and ask, "is this just idle banter, or is it useful for some developers?" My two
cents anyway - it's not like I'm earning rep points for this. :-) Aaronaught
18:32

6 Personally, I hate these questions. They almost always amount to piles of personal
opinions, light on usable information and heavy on subjective declarations. But I'm
not willing to close it for that reason alone; it could be half-way decent, Aaron, if you
set some guidelines for responses: single-topic answers (what should you know and
why should you know it), no duplicates, up-vote what you agree with... and most
importantly, move your own opinions into answers that demonstrate this. As it
stands, this reads like a blog post, or forum discussion, neither of which have any
business on SO. Shog9 Dec 30 '09 at 23:35

4 I find this rather interesting: "It's a Community Wiki and therefore appropriate." How
on earth can a CW make it appropriate? Either a question is appropriate or not, and
I think this question is way too subjective to be helpful if someone is looking for an
answer. It might be interesting, but that's not the only characteristic a question must
have. Georg Schölly Dec 30 '09 at 23:40



31 Answers

The very first thing developers should know about databases is this: what are databases
for? Not how do they work, nor how do you build one, nor even how do you write code
to retrieve or update the data in a database. But what are they for?

Unfortunately, the answer to this one is a moving target. In the heyday of databases,
the 1970s through the early 1990s, databases were for the sharing of data. If you
were using a database, and you weren't sharing data, you were either involved in an
academic project or you were wasting resources, including yourself. Setting up a
database and taming a DBMS were such monumental tasks that the payback, in terms of
data exploited multiple times, had to be huge to match the investment.

Over the last 15 years, databases have come to be used for storing the persistent
data associated with just one application. Building a database for MySQL, or Access,
or SQL Server has become so routine that databases have become almost a routine part
of an ordinary application. Sometimes, that initial limited mission gets pushed upward
by mission creep, as the real value of the data becomes apparent. Unfortunately,
databases that were designed with a single purpose in mind often fail dramatically when
they begin to be pushed into a role that's enterprise wide and mission critical.

The second thing developers need to learn about databases is the whole data centric
view of the world. The data centric world view is more different from the process centric
world view than anything most developers have ever learned. Compared to this gap, the
gap between structured programming and object oriented programming is relatively
small.

The third thing developers need to learn, at least in an overview, is data modeling,
including conceptual data modeling, logical data modeling, and physical data modeling.

Conceptual data modeling is really requirements analysis from a data centric point of
view.

Logical data modeling is generally the application of a specific data model to the
requirements discovered in conceptual data modeling. The relational model is used far
more than any other specific model, and developers need to learn the relational model
for sure. Designing a powerful and relevant relational model for a nontrivial requirement
is not a trivial task. You can't build good SQL tables if you misunderstand the relational
model.

Physical data modeling is generally DBMS specific, and doesn't need to be learned in
much detail, unless the developer is also the database builder or the DBA. What
developers do need to understand is the extent to which physical database design can be
separated from logical database design, and the extent to which producing a high speed
database can be accomplished just by tweaking the physical design.

The next thing developers need to learn is that while speed (performance) is
important, other measures of design goodness are even more important, such as the
ability to revise and extend the scope of the database down the road, or simplicity of
programming.

Finally, anybody who messes with databases needs to understand that the value of data
often outlasts the system that captured it.

Whew!

edited Oct 30 '10 at 19:28, community wiki, 5 revs, 3 users 75%, Walter Mitty

Very well written! And the historical perspective is great for people who weren't doing
database work at that time (i.e. me). Aaronaught Dec 30 '09 at 21:07

5 Nicely written. And I think your last point is ignored far too often by people trying to
'just get it done'. DaveE Dec 30 '09 at 21:30

1 There's a connection between what I wrote and topics such as Explain Plan, Indexing,
and Data Normalization. I'd love to discuss that connection in greater depth in some
sort of discussion forum. SO is not such a forum. Walter Mitty Jan 1 '10 at 14:15

1 If you found reading this monster daunting, imagine what it felt like to write it! I didn't
set out to write an essay. Once I got started, it just seemed to flow. Whoever added
the bolding really helped the readers, IMO. Walter Mitty Feb 18 '10 at 14:02

3 @Walter You provided explanations for all of your points except for this one: "The
second thing developers need to learn about databases is the whole data centric view
of the world. The data centric world view is more different from the process centric
world view than anything most developers have ever learned. Compared to this gap,
the gap between structured programming and object oriented programming is
relatively small." Could you elaborate upon this? You stated that the gap is big, but I
guess I'd like to really understand the data-centric view and how it's decoupled from
the process view. jedd.ahyoung Apr 5 '11 at 20:55


Good question. The following are some thoughts in no particular order:

1. Normalization, to at least the second normal form, is essential.

2. Referential integrity is also essential, with proper cascading delete and update
considerations.

3. Good and proper use of check constraints. Let the database do as much work as
possible.

4. Don't scatter business logic in both the database and middle tier code. Pick one
or the other, preferably in middle tier code.

5. Decide on a consistent approach for primary keys and clustered keys.

6. Don't over index. Choose your indexes wisely.

7. Consistent table and column naming. Pick a standard and stick to it.

8. Limit the number of columns in the database that will accept null values.

9. Don't get carried away with triggers. They have their use but can complicate
things in a hurry.

10. Be careful with UDFs. They are great but can cause performance problems when
you're not aware how often they might get called in a query.

11. Get Celko's book on database design. The man is arrogant but knows his stuff.
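
Points 2 and 3 are easier to appreciate with something runnable in front of you. The following is only a sketch, using Python's sqlite3 module with made-up customers and invoices tables; the try_insert helper is hypothetical, and other databases spell some of this differently. It shows a check constraint rejecting a month outside 1-12, a foreign key rejecting an orphan row, and a cascading delete cleaning up child rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE invoices (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
                    REFERENCES customers(customer_id) ON DELETE CASCADE,
        month       INTEGER NOT NULL CHECK (month BETWEEN 1 AND 12)
    );
    """
)


def try_insert(sql, params):
    """Run one INSERT; report False if the database rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


INSERT_INVOICE = "INSERT INTO invoices VALUES (?, ?, ?)"

try_insert("INSERT INTO customers VALUES (?, ?)", (1, "Acme"))
ok_valid = try_insert(INSERT_INVOICE, (1, 1, 6))       # accepted
ok_bad_month = try_insert(INSERT_INVOICE, (2, 1, 13))  # rejected by the CHECK
ok_orphan = try_insert(INSERT_INVOICE, (3, 99, 6))     # rejected: no customer 99

# Deleting the parent row removes its invoices because of ON DELETE CASCADE.
with conn:
    conn.execute("DELETE FROM customers WHERE customer_id = 1")
remaining_invoices = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
```

The application never had to validate anything here; the bad rows simply could not get in, whichever program tried to write them.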

edited Jul 1 '14 at 18:28, community wiki, 4 revs, 4 users 76%, Randy Minder

1 Care to elaborate on item 4? This is a topic that has always intrigued me.
'09 at 20:05

9 @David: I've always preferred to put it in both places. That way you're protected
against bugs as well as user error. There's no reason to make every column nullable,
or to allow values outside the range 1-12 to be inserted into a Month column. Complex
business rules are, of course, another story. Aaronaught Dec 30 '09 at 21:29

1 @Brad - Most of our applications at work were done well before solid programming
processes were put into place. Therefore, we've got business logic scattered
everywhere. Some of it's in the UI, some in the middle tier and some in the database.
It's a mess. IMO, business logic belongs in the middle tier. Randy Minder
23:45

2 @David - If it's an absolute certainty that database modifications will only occur in
applications then you might be right. However, this is probably pretty rare. Since
users will likely enter data directly into the database, it's good practice to put
validation in the database as well. Besides, some types of validation are simply more
efficiently done in the database. Randy Minder Dec 30 '09 at 23:48

1 Point #8 is indeed important. How to get the column types right in general, is a very
important thing to know. Chris Vest Feb 11 '10 at 13:45


First, developers need to understand that there is something to know about databases.
They're not just magic devices where you put in the SQL and get out result sets, but
rather very complicated pieces of software with their own logic and quirks.

Second, that there are different database setups for different purposes. You do not want a
developer making historical reports off an on-line transactional database if there's a data
warehouse available.

Third, developers need to understand basic SQL, including joins.

Past this, it depends on how closely the developers are involved. I've worked in jobs
where I was developer and de facto DBA, where the DBAs were just down the aisle, and
where the DBAs are off in their own area. (I dislike the third.) Assuming the developers
are involved in database design:

They need to understand basic normalization, at least the first three normal forms.
Anything beyond that, get a DBA. For those with any experience with US courtrooms
(and random television shows count here), there's the mnemonic "Depend on the key, the
whole key, and nothing but the key, so help you Codd."

They need to have a clue about indexes, by which I mean they should have some idea
what indexes they need and how they're likely to affect performance. This means not
having useless indices, but not being afraid to add them to assist queries. Anything
further (like the balance) should be left for the DBA.

They need to understand the need for data integrity, and be able to point to where they're
verifying the data and what they're doing if they find problems. This doesn't have to be
in the database (where it will be difficult to issue a meaningful error message for the
user), but has to be somewhere.

They should have the basic knowledge of how to get a plan, and how to read it in
general (at least enough to tell whether the algorithms are efficient or not).

They should know vaguely what a trigger is, what a view is, and that it's possible to
partition pieces of databases. They don't need any sort of details, but they need to know
to ask the DBA about these things.

They should of course know not to meddle with production data, or production code, or
anything like that, and they should know that all source code goes into a VCS.

I've doubtless forgotten something, but the average developer need not be a DBA,
provided there is a real DBA at hand.

answered Dec 30 '09 at 20:48, community wiki, David Thornley

Basic Indexing

I'm always shocked to see a table or an entire database with no indexes, or
arbitrary/useless indexes. Even if you're not designing the database and just have to write
some queries, it's still vital to understand, at a minimum:

- What's indexed in your database and what's not;
- The difference between types of scans, how they're chosen, and
  how the way you write a query can influence that choice;
- The concept of coverage (why you shouldn't just write SELECT *);
- The difference between a clustered and non-clustered index;
- Why more/bigger indexes are not necessarily better;
- Why you should try to avoid wrapping filter columns in
  functions.

Designers should also be aware of common index anti-patterns, for example:

- The Access anti-pattern (indexing every column, one by one);
- The Catch-All anti-pattern (one massive index on all or most
  columns, apparently created under the mistaken impression that it
  would speed up every conceivable query involving any of those
  columns).

The quality of a database's indexing - and whether or not you take advantage of it with
the queries you write - accounts for by far the most significant chunk of performance. 9
out of 10 questions posted on SO and other forums complaining about poor performance
invariably turn out to be due to poor indexing or a non-sargable expression.
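
As a rough illustration of the "wrapping filter columns in functions" point, here is a small sketch using Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN. The users table, the index name and the plan helper are assumptions for the demo, and the exact plan wording differs between database engines and versions, so only the first word of the plan is inspected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT NOT NULL, bio TEXT);
    CREATE INDEX idx_users_email ON users(email);
    """
)


def plan(sql, params=()):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the description


# Sargable: the bare column is compared, so the index can be used for a lookup.
sargable_plan = plan(
    "SELECT user_id FROM users WHERE email = ?", ("a@example.com",)
)

# Non-sargable: the column is wrapped in a function, so every entry must be
# visited and lower() evaluated on each one.
wrapped_plan = plan(
    "SELECT user_id FROM users WHERE lower(email) = ?", ("a@example.com",)
)

print(sargable_plan)  # begins with SEARCH (an index lookup)
print(wrapped_plan)   # begins with SCAN (a full pass)
```

If case-insensitive matching is really needed, the usual options are to store a normalized copy of the value or to use whatever expression/function index support your database offers, rather than wrapping the column in every query.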

answered Dec 31 '09 at 2:13, community wiki, Aaronaught

Can you elaborate on "coverage" ? I can see why SELECT * is not a good habit to get
into, but I don't know the meaning of "coverage" and wonder if it alludes to another
reason to avoid SELECT *. Edmund Jul 9 '10 at 12:11

@Edmund: An index covers a query if all of the output fields are part of the index
(either as indexed columns or INCLUDE columns in SQL Server). If the only available
index for a given query is non-covering, then all of the rows have to be retrieved,
one-by-one, which is a very slow operation, and much of the time the query optimizer will
decide that it isn't worth it and perform a full index/table scan instead. That's why you
don't write SELECT * - it virtually guarantees that no index will cover the
query. Aaronaught Jul 9 '10 at 13:00

thanks! Though as a PostgreSQL user I don't need to worry about such things (yet?):
indexes don't contain visibility information so table tuples always need to be scanned
too. In general, though, it looks like a pretty important factor. Edmund Jul 10 '10 at 0:42

@Edmund: PostgreSQL may not have INCLUDE columns (I can't say for sure), but that
doesn't mean you can't put columns you wish to cover in the actual index data. That's
what we had to do back in the SQL Server 2000 days. Coverage still matters no
matter which DBMS you're on. Aaronaught Jul 10 '10 at 3:27
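
For anyone who wants to see coverage for themselves, this is a minimal sketch in Python's sqlite3 module (SQLite rather than SQL Server or PostgreSQL, purely because it runs anywhere). The orders table, the index name and the detail helper are made up for the example; SQLite labels a plan "COVERING INDEX" when the index alone can answer the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        total       REAL NOT NULL,
        notes       TEXT
    );
    -- Both the filter column and the output column live in the index.
    CREATE INDEX idx_orders_customer_total ON orders(customer_id, total);
    """
)


def detail(sql, params=()):
    """Return the textual query plan SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Every requested column is in the index, so the table itself is never read.
covered_plan = detail("SELECT total FROM orders WHERE customer_id = ?", (1,))

# SELECT * also needs 'notes', which is not in the index, so each matching
# index entry has to be followed back to its table row.
select_star_plan = detail("SELECT * FROM orders WHERE customer_id = ?", (1,))

print(covered_plan)
print(select_star_plan)
```

On a table this small the difference is invisible; it only starts to matter when the extra row lookups number in the thousands or millions.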


Normalization

It always depresses me to see somebody struggling to write an excessively complicated
query that would have been completely straightforward with a normalized design
("Show me total sales per region.").

If you understand this at the outset and design accordingly, you'll save yourself a lot of
pain later. It's easy to denormalize for performance after you've normalized; it's not so
easy to normalize a database that wasn't designed that way from the start.

At the very least, you should know what 3NF is and how to get there. With most
transactional databases, this is a very good balance between making queries easy to write
and maintaining good performance.
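
Here is a small sketch of why "total sales per region" is straightforward on a normalized design. It uses Python's sqlite3 module; the regions, stores and sales tables and their sample rows are invented for the example, with amounts kept as whole numbers to sidestep rounding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Each fact is stored once: a region has a name, a store belongs to a
    -- region, a sale belongs to a store.
    CREATE TABLE regions (
        region_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL UNIQUE
    );
    CREATE TABLE stores (
        store_id  INTEGER PRIMARY KEY,
        region_id INTEGER NOT NULL REFERENCES regions(region_id)
    );
    CREATE TABLE sales (
        sale_id  INTEGER PRIMARY KEY,
        store_id INTEGER NOT NULL REFERENCES stores(store_id),
        amount   INTEGER NOT NULL
    );
    INSERT INTO regions VALUES (1, 'North'), (2, 'South');
    INSERT INTO stores  VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO sales   VALUES (1, 10, 100), (2, 11, 50), (3, 20, 70);
    """
)

# With the relationships explicit, the report is two joins and a GROUP BY.
totals_per_region = conn.execute(
    """
    SELECT r.name, SUM(s.amount) AS total
    FROM sales AS s
    JOIN stores AS st ON st.store_id = s.store_id
    JOIN regions AS r ON r.region_id = st.region_id
    GROUP BY r.name
    ORDER BY r.name
    """
).fetchall()

print(totals_per_region)
```

Compare that with a design where the region name is typed into every sales row as free text: the same report then has to cope with misspellings, renamed regions and stores that moved.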

answered Dec 31 '09 at 2:08, community wiki, Aaronaught

How Indexes Work

It's probably not the most important, but for sure the most underestimated topic.

The problem with indexing is that SQL tutorials usually don't mention them at all and
that all the toy examples work without any index.

Even more experienced developers can write fairly good (and complex) SQL without
knowing more about indexes than "An index makes the query fast".

That's because SQL databases do a very good job working as a black box:

Tell me what you need (gimme SQL), I'll take care of it.

And that works perfectly to retrieve the correct results. The author of the SQL doesn't
need to know what the system is doing behind the scenes--until everything becomes
sooo slooooow.....

That's when indexing becomes a topic. But that's usually very late and somebody (some
company?) is already suffering from a real problem.

That's why I believe indexing is the No. 1 topic not to forget when working with
databases. Unfortunately, it is very easy to forget it.

Disclaimer

The arguments are borrowed from the preface of my free eBook "Use The Index, Luke".
I am spending quite a lot of my time explaining how indexes work and how to use them
properly.

edited Dec 8 '14 at 10:16, community wiki, 2 revs, 2 users 98%, Markus Winand

I just want to point out an observation: it seems that the majority of
responses assume "database" is interchangeable with "relational database". There are also
object databases and flat file databases. It is important to assess the needs of the
software project at hand. From a programmer perspective the database decision can be
delayed until later. Data modeling on the other hand can be achieved early on and lead to
much success.

I think data modeling is a key component and is a relatively old concept yet it is one that
has been forgotten by many in the software industry. Data modeling, especially
conceptual modeling, can reveal the functional behavior of a system and can be relied on
as a road map for development.

On the other hand, the type of database required can be determined based on many
different factors to include environment, user volume, and available local hardware such
as hard drive space.

answered Dec 31 '09 at 5:24, community wiki, FernandoZ

Do you mean like in doing entity-relationship diagrams? crosenblum Jan 7 '10 at 14:51

Yes... did I forget to mention ERDs?:-) FernandoZ Jan 28 '10 at 7:02

+1... But you have to realize you are on SO: the home of plumbers spending their
days fixing the ORM impedance mismatch so all they know, eat and think is not just
relational but "SQL" :) SyntaxT3rr0r Apr 3 '10 at 3:00


Avoiding SQL injection and how to secure your database
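
Since this answer is only a headline, a short sketch may help show what it means in practice. The example uses Python's sqlite3 module with a made-up users table; find_user_unsafe and find_user_safe are hypothetical helper names. The same idea (bind values as parameters, never paste them into the SQL string) applies to PHP/MySQL and every other stack.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    INSERT INTO users VALUES (1, 'alice@example.com'), (2, 'bob@example.com');
    """
)


def find_user_unsafe(email):
    """DON'T: user input is pasted straight into the SQL text."""
    sql = "SELECT user_id FROM users WHERE email = '" + email + "'"
    return conn.execute(sql).fetchall()


def find_user_safe(email):
    """DO: the value travels separately from the SQL as a bound parameter."""
    return conn.execute(
        "SELECT user_id FROM users WHERE email = ?", (email,)
    ).fetchall()


# A classic injection string: it closes the quote and adds an always-true test.
attack = "x' OR '1'='1"

leaked_rows = find_user_unsafe(attack)  # the WHERE clause now matches every row
safe_rows = find_user_safe(attack)      # treated as an odd-looking email: no match
```

Parameter binding is only one part of securing a database (least-privilege accounts and not exposing the server directly matter too), but it is the part every developer touches daily.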

edited Jan 3 '10 at 14:18, community wiki, 2 revs, 2 users 75%, iChaib


Every developer should know that this is false: "Profiling a database operation is
completely different from profiling code."

There is a clear Big-O in the traditional sense. When you do an EXPLAIN PLAN (or the
equivalent) you're seeing the algorithm. Some algorithms involve nested loops and
are O( n ^ 2 ). Other algorithms involve B-tree lookups and are O( n log n ).

This is very, very serious. It's central to understanding why indexes matter. It's central to
understanding the speed-normalization-denormalization tradeoffs. It's central to
understanding why a data warehouse uses a star-schema which is not normalized for
transactional updates.

If you're unclear on the algorithm being used do the following. Stop. Explain the Query
Execution plan. Adjust indexes accordingly.
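
The "stop, explain, adjust" loop looks roughly like this in practice. This is a sketch with Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN standing in for whatever your DBMS calls it; the jobs table, the index name and the detail helper are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, company_id INTEGER, title TEXT)"
)

QUERY = "SELECT title FROM jobs WHERE company_id = ?"


def detail(sql, params=()):
    """Ask SQLite which algorithm it would use for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Step 1: explain the query as it stands. With no usable index the only
# option is to read every row of the table.
plan_before = detail(QUERY, (7,))

# Step 2: adjust indexes for the column being filtered on.
conn.execute("CREATE INDEX idx_jobs_company ON jobs(company_id)")

# Step 3: explain again and confirm the algorithm changed to an index lookup.
plan_after = detail(QUERY, (7,))

print(plan_before)  # a SCAN of the table
print(plan_after)   # a SEARCH using idx_jobs_company
```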

Also, the corollary: More Indexes are Not Better.

Sometimes an index focused on one operation will slow other operations down.
Depending on the ratio of the two operations, adding an index may have good effects, no
overall impact, or be detrimental to overall performance.

edited Dec 30 '09 at 19:33, community wiki, 3 revs, S.Lott

I had a feeling that would be taken the wrong way. What I meant by "traditional" was
that you don't really have any control over the algorithms, only the ability to influence
which ones are used. Anyway, I removed that language as I don't want anything
overly controversial in the main post. Aaronaught Dec 30 '09 at 18:16

@Aaron: You do have control over the algorithms. That's what indexes are
for. S.Lott Dec 30 '09 at 18:17

Hmm, so you can change which type of sorting algorithm is used by the DE? What
data structures are used for the index? I'd prefer not to argue over this point, that's
why I took it out, but I stand by the basic idea that you have a lot less control when
working with database as compared to code. Aaronaught Dec 30 '09 at 18:24

@Aaron: Less control does not remove the obligation to actually understand if the
query is O( n ^ 2 ) or O( n log n ) or just O( n ). Less control does not
remove the obligation to actually understand what's going on and to find out how to
control it. S.Lott Dec 30 '09 at 18:43

@S.Lott: I think we are on the same side here, as I was suggesting a greater
burden for databases - "You need to know ... [how to] read a query plan". But my edit
seems to have been rolled back, so... I guess it belongs to the community
now. Aaronaught Dec 30 '09 at 19:02


I think every developer should understand that databases require a different paradigm.

When writing a query to get at your data, a set-based approach is needed. Many people
with an iterative background struggle with this. And yet, when they embrace it, they can
achieve far better results, even though the solution may not be the one that first presented
itself in their iterative-focussed minds.
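
A small side-by-side may make the two mindsets clearer. This is a sketch in Python's sqlite3 module with an invented products table; the function names are hypothetical and prices are kept in whole cents so both versions round the same way.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory products table for the comparison."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE products (
            product_id  INTEGER PRIMARY KEY,
            category    TEXT NOT NULL,
            price_cents INTEGER NOT NULL
        );
        INSERT INTO products VALUES
            (1, 'books', 1000), (2, 'books', 2000), (3, 'toys', 500);
        """
    )
    return conn


def raise_prices_iteratively(conn, category, percent):
    """Row-by-row thinking: fetch each row, compute in the app, write it back."""
    rows = conn.execute(
        "SELECT product_id, price_cents FROM products WHERE category = ?",
        (category,),
    ).fetchall()
    for product_id, price in rows:
        conn.execute(
            "UPDATE products SET price_cents = ? WHERE product_id = ?",
            (price * (100 + percent) // 100, product_id),
        )
    conn.commit()


def raise_prices_set_based(conn, category, percent):
    """Set-based thinking: describe the whole set and the change in one statement."""
    conn.execute(
        "UPDATE products SET price_cents = price_cents * (100 + ?) / 100 "
        "WHERE category = ?",
        (percent, category),
    )
    conn.commit()


def all_prices(conn):
    return conn.execute(
        "SELECT product_id, price_cents FROM products ORDER BY product_id"
    ).fetchall()
```

Both functions end with the same data, but the set-based one sends a single statement regardless of how many rows match, and leaves the database free to choose how to carry it out.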

answered Jan 24 '10 at 6:48, community wiki, Rob Farley

Please clarify what is meant by "set-based" approach Daniel Allen Langdon
at 20:21

1 That you should look at data as being in sets, and considering your problems as
potentially solved by set arithmetic - involving ranking functions where required,
subqueries, aggregates, and so on. Many developers think about what needs to be
done to each row, which is iterative thinking. Rob Farley Sep 28 '10 at 16:30


Evolutionary Database Design. http://martinfowler.com/articles/evodb.html

These agile methodologies make the database change process manageable, predictable and
testable.

Developers should know what it takes to refactor a production database in terms of
version control, continuous integration and automated testing.

The Evolutionary Database Design process has administrative aspects; for example, a column
is to be dropped after some lifetime period in all databases of this codebase.

At the very least, know that the Database Refactoring concept and methodologies exist:
http://www.agiledata.org/essays/databaseRefactoringCatalog.html

Classification and process description makes it possible to implement tooling for these
refactorings too.
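
As a rough illustration of what such tooling can look like, here is a minimal sketch of versioned schema migrations. It is not taken from the linked articles: the `MIGRATIONS` list, the `customer` table and the `migrate` function are hypothetical, and it uses SQLite's `PRAGMA user_version` (via Python's sqlite3 module) to remember which changes have already been applied.

```python
import sqlite3

# Hypothetical ordered list of schema changes. The list lives in version control
# next to the application code; each entry is applied exactly once per database.
MIGRATIONS = [
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE customer ADD COLUMN email TEXT",
]

def migrate(conn):
    """Apply pending migrations in order and return the resulting schema version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(MIGRATIONS, start=1):
        if version > current:
            conn.execute(statement)
            # PRAGMA does not accept bound parameters; version is an int we control.
            conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]

conn = sqlite3.connect(":memory:")
print(migrate(conn))  # first run applies both changes
print(migrate(conn))  # second run finds nothing pending
```

Because the function is idempotent, the same script can run in automated tests and against every database of the codebase. Real refactorings that move data or need rollback are considerably harder than this sketch suggests.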

answered Dec 31 '09 at 7:49, community wiki, George Polevoy

I love the refactoring concept, but regarding DBs the real big issue with it is persistent
data. Refactoring a DB often involves data migration, which in reality is tough, especially
if you aren't allowed any downtime of the system. Also, rollback isn't trivial. In my view,
difficulties in proper/safe rollout + rollback strategies are often showstoppers to
refactoring a DB as lightly as application code. In itself it often makes sense to refactor
stuff, but you always have to weigh costs against benefits. manuel aldana Dec 31 '09 at 12:49

See also Ambler's 'Refactoring Databases'
(amazon.com/Refactoring-Databases-Evolutionary-Database-Design/). Jonathan Leffler Jan 3 '10 at 14:16

5 votes

From my experience with relational databases, every developer should know:

- The different data types:

Using the correct type for the correct job will make your DB design more robust, your
queries faster and your life easier.

- Learn about 1xM and MxM:

This is the bread and butter of relational databases. You need to understand one-to-many
and many-to-many relations and apply them when appropriate.

- "K.I.S.S." principle applies to the DB as well:

Simplicity always works best. Provided you have studied how databases work, you will avoid
unnecessary complexity which will lead to maintenance and speed problems.

- Indices:

It's not enough to know what they are. You need to understand when to use them and
when not to.

Also:

- Boolean algebra is your friend
- Images: Don't store them in the DB. Don't ask why.
- Test DELETE with SELECT
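
The last tip can be sketched as follows, assuming Python's sqlite3 module and a made-up `sessions` table: run a SELECT with exactly the predicate you intend to delete with, check the rows, and only then run the DELETE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, user TEXT, expired INTEGER)")
conn.executemany(
    "INSERT INTO sessions (user, expired) VALUES (?, ?)",
    [("ann", 1), ("bob", 0), ("cy", 1)],
)

# Fixed, trusted predicate shared by both statements (never build it from user input).
where = "expired = 1"

# Dry run: see exactly which rows the predicate matches.
doomed = conn.execute(
    f"SELECT id, user FROM sessions WHERE {where} ORDER BY id"
).fetchall()
print(doomed)

# Only after checking the rows, delete with the identical predicate.
deleted = conn.execute(f"DELETE FROM sessions WHERE {where}").rowcount
remaining = conn.execute("SELECT user FROM sessions").fetchall()
print(deleted, remaining)
```

If the number of deleted rows differs from what the SELECT showed, something changed in between and the transaction can still be rolled back.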
answered Jul 9 '10 at 12:37, community wiki, Anax

+1 for Images. I'd replace 'Images' with 'BLOBs' though. Agnel Kurian Jul 9 '10 at 12:53

I'm not really sure about the "simplicity" part. The simplest possible database is one
giant table with a bunch of varchar(max) columns. Relational databases should
be normalized, not simplified. Aaronaught Jul 9 '10 at 13:04

Your concerns are covered earlier, in the "data types" part of my post. I was referring
to the (unnecessary) use of stored procedures / triggers / cursors and so on.
'10 at 13:10

5 votes

I would like everyone, both DBAs and developers/designers/architects, to better understand
how to properly model a business domain, and how to map/translate that business domain
model into a normalized logical database model, an optimized physical model, and
an appropriate object-oriented class model, each of which is (or can be) different, for
various reasons, and to understand when, why, and how they are (or should be) different
from one another.

edited Oct 30 '10 at 23:46, community wiki, 2 revs, 2 users 50%, Charles Bretana

5 votes

I would say strong basic SQL skills. I've seen a lot of developers so far who know a little
about databases but are always asking for tips about how to formulate a quite simple
query. Queries are not always that easy and simple. You do have to use multiple joins
(inner, left, etc.) when querying a well-normalized database.
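
For anyone unsure about the difference between the join types mentioned, here is a small sketch with Python's sqlite3 module. The `authors` and `books` tables are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES authors(id),
        title TEXT
    );
    INSERT INTO authors (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO books (author_id, title) VALUES (1, 'Joins in Practice');
""")

# INNER JOIN keeps only authors that have at least one matching book.
inner = conn.execute("""
    SELECT a.name, b.title
    FROM authors a
    INNER JOIN books b ON b.author_id = a.id
    ORDER BY a.name
""").fetchall()

# LEFT JOIN keeps every author; missing books show up as NULL (None in Python).
left = conn.execute("""
    SELECT a.name, b.title
    FROM authors a
    LEFT JOIN books b ON b.author_id = a.id
    ORDER BY a.name
""").fetchall()

print(inner)
print(left)
```

Bob has no book, so he disappears from the inner join and appears with a NULL title in the left join.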

edited Oct 30 '10 at 23:51, community wiki, 2 revs, 2 users 50%, MaxiWheat

4 votes

About the following comment to Walter M.'s answer:

"Very well written! And the historical perspective is great for people who weren't doing
database work at that time (i.e. me)".

The historical perspective is in a certain sense absolutely crucial. "Those who forget
history are doomed to repeat it." Cf. XML repeating the hierarchical mistakes of the
past, graph databases repeating the network mistakes of the past, OO systems forcing the
hierarchical model upon users while everybody with even just a tenth of a brain should
know that the hierarchical model is not suitable for general-purpose representation of the
real world, etcetera, etcetera.

As for the question itself:

Every database developer should know that "Relational" is not equal to "SQL". Then they
would understand why they are being let down so abysmally by the DBMS vendors, and
why they should be telling those same vendors to come up with better stuff (e.g. DBMSs
that are truly relational) if they want to go on sucking hilarious amounts of money out of
their customers for such crappy software.

And every database developer should know everything about the relational algebra. Then
there would no longer be a single developer left who had to post these stupid "I don't
know how to do my job and want someone else to do it for me" questions on Stack
Overflow anymore.

edited Oct 31 '10 at 0:05, community wiki, 2 revs, 2 users 78%, Erwin Smout

I agree that a developer needs to know where SQL and the RDM diverge. Having said
that, judicious use of the RDM can be an invaluable aid to the database designer,
even if the implementation is SQL. Walter Mitty Dec 31 '09 at 13:48

In case you forgot, George Santayana, wrote that classic quote... crosenblum
at 14:52

4 votes

Excellent question. Let's see, first no one should consider querying a database who does
not thoroughly understand joins. That's like driving a car without knowing where the
steering wheel and brakes are. You also need to know datatypes and how to choose the
best one.

Another thing that developers should understand is that there are three things you should
have in mind when designing a database:

1. Data integrity - if the data can't be relied on you essentially have no data - this
means do not put required logic in the application as many other sources may touch
the database. Constraints, foreign keys and sometimes triggers are necessary for data
integrity. Don't fail to use them because you don't like them or don't want to be
bothered to understand them.

2. Performance - it is very hard to refactor a poorly performing database and
performance should be considered from the start. There are many ways to do the
same query and some are known to be faster almost always; it is short-sighted not to
learn and use these ways. Read some books on performance tuning before designing
queries or database structures.

3. Security - this data is the life-blood of your company, it also frequently contains
personal information that can be stolen. Learn to protect your data from SQL
injection attacks and fraud and identity theft.
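
Points 1 and 3 can be illustrated with a short sketch, assuming Python's sqlite3 module and hypothetical `customers` and `orders` tables. The constraints live in the database, so every client that touches it is held to them, and user input is always passed as a bound parameter rather than pasted into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL CHECK (amount > 0)
    );
    INSERT INTO customers (id, email) VALUES (1, 'ann@example.com');
""")

def try_insert_order(conn, customer_id, amount):
    """Return True if the database accepted the row, False if a constraint refused it."""
    try:
        conn.execute(
            "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
            (customer_id, amount),
        )
        return True
    except sqlite3.IntegrityError:
        return False

def find_customer(conn, email):
    """Bound parameters keep user input as data, so it can never alter the SQL text."""
    return conn.execute("SELECT id FROM customers WHERE email = ?", (email,)).fetchall()

print(try_insert_order(conn, 1, 10.0))    # valid row
print(try_insert_order(conn, 99, 10.0))   # no such customer: foreign key refuses it
print(try_insert_order(conn, 1, -5.0))    # negative amount: CHECK refuses it
print(find_customer(conn, "' OR '1'='1"))  # classic injection string matches nothing
```

Other engines spell some of this differently (most enforce foreign keys by default, for instance), so treat the SQLite details as assumptions of the sketch.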

When querying a database, it is easy to get the wrong answer. Make sure you understand
your data model thoroughly. Remember that actual decisions are often made based on the data
your query returns. When it is wrong, the wrong business decisions are made. You can kill
a company with bad queries or lose a big customer. Data has meaning; developers often
seem to forget that.

Data almost never goes away, think in terms of storing data over time instead of just how
to get it in today. That database that worked fine when it had a hundred thousand records,
may not be so nice in ten years. Applications rarely last as long as data. This is one reason
why designing for performance is critical.

Your database will probably need fields that the application doesn't need to see: things like
GUIDs for replication, date-inserted fields, etc. You also may need to store a history of
changes and who made them when, and be able to restore bad changes from this
storehouse. Think about how you intend to do this before you come ask a web site how to
fix the problem where you forgot to put a where clause on an update and updated the
whole table.
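
One cheap safety net against the forgotten where clause, sketched here with Python's sqlite3 module, is to run the update inside a transaction and roll back when it touches more rows than expected. The `guarded_update` helper and the `accounts` table are hypothetical, and this complements a change history and tested backups rather than replacing them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO accounts (status) VALUES (?)", [("active",)] * 5)
conn.commit()

def guarded_update(conn, sql, params, max_rows):
    """Run an UPDATE, but roll it back if it touched more rows than expected."""
    cur = conn.execute(sql, params)
    if cur.rowcount > max_rows:
        conn.rollback()  # undo everything since the last commit
        return 0
    conn.commit()
    return cur.rowcount

# Forgotten WHERE clause: would close all five accounts, so it is rolled back.
blocked = guarded_update(conn, "UPDATE accounts SET status = 'closed'", (), max_rows=1)

# Intended statement: touches exactly one row and is committed.
applied = guarded_update(
    conn, "UPDATE accounts SET status = 'closed' WHERE id = ?", (3,), max_rows=1
)

closed = conn.execute(
    "SELECT COUNT(*) FROM accounts WHERE status = 'closed'"
).fetchone()[0]
print(blocked, applied, closed)
```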

Never develop in a newer version of a database than the production version. Never, never,
never develop directly against a production database.

If you don't have a database administrator, make sure someone is making backups and
knows how to restore them and has tested restoring them.

Database code is code, there is no excuse for not keeping it in source control just like the
rest of your code.

edited Oct 31 '10 at 0:11, community wiki, 3 revs, 3 users 81%, HLGEM

4 votes

I think a lot of the technical details have been covered here and I don't want to add to
them. The one thing I want to say is more social than technical: don't fall for the "DBA
knows best" trap as an application developer.

If you are having performance issues with a query, take ownership of the problem too. Do
your own research and push for the DBAs to explain what's happening and how their
solutions are addressing the problem.

Come up with your own suggestions too after you have done the research. That is, I try to
find a cooperative solution to the problem rather than leaving database issues to the
DBAs.

edited Oct 31 '10 at 0:13, community wiki, 2 revs, 2 users 73%, HeretoLearn

good answer. We each have our own area we contribute to every problem or
solution. crosenblum Jan 7 '10 at 14:50

3 votes

Simple respect.

- It's not just a repository
- You probably don't know better than the vendor or the DBAs
- You won't support it at 3 a.m. with senior managers shouting at you

edited Oct 31 '10 at 0:28, community wiki, 2 revs, 2 users 73%, gbn

3 votes

Never insert data with the wrong text encoding.

Once your database becomes polluted with multiple encodings, the best you can do is
apply some kind of combination of heuristics and manual labor.

answered Oct 31 '10 at 0:42, community wiki, mikerobi

2 What is the "wrong text encoding" and how does it happen? Gennady Vanin
Oct 31 '10 at 8:02

1 @vgv8, it happens when your client allows users to submit text in any encoding they
want, and you blindly store it. Then, when you need to perform some sort of
transformation or analysis, your code breaks, because your application assumes UTF-8,
but some idiot added UTF-16 data, and your program errors or starts spitting out
gibberish. mikerobi Oct 31 '10 at 21:42
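
A minimal sketch of guarding against this at the boundary, in Python: decode incoming bytes strictly as UTF-8 and reject anything else before it reaches the database. The function name is made up, and the sketch assumes the application has standardised on UTF-8.

```python
def decode_utf8_or_reject(raw: bytes) -> str:
    """Return text only if the bytes are valid UTF-8; raise ValueError otherwise."""
    try:
        return raw.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as exc:
        raise ValueError(f"input is not valid UTF-8: {exc.reason}") from exc

good = "café".encode("utf-8")
bad = "café".encode("utf-16")  # starts with a byte-order mark that is invalid UTF-8

print(decode_utf8_or_reject(good))
try:
    decode_utf8_or_reject(bad)
except ValueError as err:
    print("rejected:", err)
```

Strict decoding only catches byte sequences that are invalid UTF-8. Text in a legacy single-byte encoding can happen to decode cleanly, so declaring and checking the encoding at the client is still needed.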

3 votes

For a middle-of-the-road professional developer who uses databases a lot
(writing/maintaining queries daily or almost daily), I think the expectation should be the
same as any other field: You wrote one in college.

Every C++ geek wrote a string class in college. Every graphics geek wrote a raytracer in
college. Every web geek wrote interactive websites (usually before we had "web
frameworks") in college. Every hardware nerd (and even software nerds) built a CPU in
college. Every physician dissected an entire cadaver in college, even if she's only going to
take my blood pressure and tell me my cholesterol is too high today. Why would
databases be any different?

Unfortunately, they do seem different, today, for some reason. People want .NET
programmers to know how strings work in C, but the internals of your RDBMS shouldn't
concern you too much.

It's virtually impossible to get the same level of understanding from just reading about
them, or even working your way down from the top. But if you start at the bottom and
understand each piece, then it's relatively easy to figure out the specifics for your
database. Even things that lots of database geeks can't seem to grok, like when to use a
non-relational database.

Maybe that's a bit strict, especially if you didn't study computer science in college. I'll
tone it down some: You could write one today, completely, from scratch. I don't care if
you know the specifics of how the PostgreSQL query optimizer works, but if you know
enough to write one yourself, it probably won't be too different from what they did. And
you know, it's really not that hard to write a basic one.

edited Oct 31 '10 at 12:04, community wiki, 2 revs, 2 users 86%, Ken

From the linked Joel article about C strings, doesn't the following snippet lead to
undefined behavior: char* str = "*Hello!"; str[0] = strlen(str) - 1; str points to a string
literal, which is generally in read-only memory. You cannot write to it? HeretoLearn Jan 1 '10 at
16:02

A professional database expert, fine, but every developer? Ben Aston Jan 11 '10 at
19:42

Ben: Every professional developer who uses databases frequently, yeah. They're
really not that hard, so if you don't know how, it means you've never taken even a
little time to learn how DBs work. Every computer science major I graduated with
designed a CPU and implemented an OS. A database is simpler than either of these,
so if you spend any time using one, I don't see an excuse for not knowing about how
they work. Ken Jan 11 '10 at 22:43

2 votes

The order of columns in a non-unique index is important.

The first column should be the column that has the most variability in its content (i.e.
cardinality).

This is to aid SQL Server's ability to create useful statistics on how to use the index at
runtime.

answered Feb 11 '10 at 13:58, community wiki, Mike D

-1 It is not a good idea to follow rules like 'The first column should be the column that has
the most variability in its content'. If one has some basic knowledge of how indexes
work, it is simple to see how the order matters and that the order of the columns should
depend on the way the table will be queried. miracle173 Mar 7 '14 at 12:46

Thanks, but if the index was created on 3 fields, on the basis that a specific SQL query
will use those 3 fields in its where clause, then the order can be significant, and the
field with the highest cardinality appearing first/earlier can lead to performance
improvements... or at least that's what I read in a Microsoft SQL Server performance
tuning book. I tried it out and it appeared to work out better (years ago).
'14 at 13:24
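
The dependence on how the table is queried can be checked rather than argued about. The sketch below uses SQLite through Python's sqlite3 module as a stand-in. It is not SQL Server, whose optimizer and statistics behave differently, and the `orders` table and index name are invented. It asks the engine for its plan for two queries against a two-column index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, note TEXT)"
)
conn.execute("CREATE INDEX idx_orders_customer_status ON orders (customer_id, status)")

def plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

# Filters on the leading index column: the index can be used to seek.
leading = plan(conn, "SELECT * FROM orders WHERE customer_id = ?", (1,))

# Filters only on the second column: the index order does not help here.
trailing = plan(conn, "SELECT * FROM orders WHERE status = ?", ("open",))

print(leading)
print(trailing)
```

On this empty table the first plan names the index and the second falls back to scanning the table. The exact wording of the plan text varies between SQLite versions.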

2 votes

Understand the tools that you use to program the database!!!

I wasted so much time trying to understand why my code was mysteriously failing.

If you're using .NET, for example, you need to know how to properly use the objects in
the System.Data.SqlClient namespace. You need to know how to manage
your SqlConnection objects to make sure they are opened, closed, and when necessary,
disposed properly.

You need to know that when you use a SqlDataReader, it is necessary to close it
separately from your SqlConnection. You need to understand how to keep connections
open when appropriate and how to minimize the number of hits to the database (because
they are relatively expensive in terms of computing time).
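
The same discipline applies outside .NET. As an analogy only, and not the System.Data.SqlClient API, here is how it looks with Python's sqlite3 module, where the cursor plays the role of the reader and is closed separately from the connection. The `people` table is made up.

```python
import sqlite3
from contextlib import closing

# closing() guarantees close() is called even if an exception is raised.
# Note that sqlite3's own "with conn:" only commits or rolls back; it does
# not close the connection, which is exactly the kind of detail worth knowing.
with closing(sqlite3.connect(":memory:")) as conn:
    with closing(conn.cursor()) as cur:  # the cursor is closed before the connection
        cur.execute("CREATE TABLE people (name TEXT)")
        cur.execute("INSERT INTO people (name) VALUES (?)", ("Ann",))
        cur.execute("SELECT name FROM people")
        names = [row[0] for row in cur.fetchall()]
    conn.commit()

print(names)
# From here on, conn is closed; using it raises sqlite3.ProgrammingError.
```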

edited Sep 13 '10 at 19:38, community wiki, 2 revs, 2 users 86%, Daniel Allen Langdon

2 votes

Consider Denormalization as a possible angel, not the devil, and also consider NoSQL
databases as an alternative to relational databases.

Also, I think the Entity-Relation model is a must-know for every developer even if you
don't design databases. It'll let you understand thoroughly what your database is all about.

edited Oct 31 '10 at 0:15, community wiki, 2 revs, 2 users 74%, iChaib

2 votes

Aside from syntax and the conceptual options they employ (such as joins, triggers, and stored
procedures), one thing that will be critical for every developer employing a database is
this:

Know how your engine is going to perform the query you are writing with specificity.

The reason I think this is so important is simply production stability. You should know
how your code performs so you're not stopping all execution in your thread while you
wait for a long function to complete, so why would you not want to know how your query
will affect the database, your program, and perhaps even the server?

This is actually something that has hit my R&D team more times than missing semicolons
or the like. The presumption is that the query will execute quickly because it does on their
development system with only a few thousand rows in the tables. Even if the production
database is the same size, it is more than likely going to be used a lot more, and thus
suffer from other constraints like multiple users accessing it at the same time, or
something going wrong with another query elsewhere, thus delaying the result of this
query.

Even simple things like how joins affect performance of a query are invaluable in
production. There are many features of many database engines that make things easier
conceptually, but may introduce gotchas in performance if not thought of clearly.

Know your database engine execution process and plan for it.
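
Most engines will tell you how they intend to run a query if you ask. As a small sketch, assuming SQLite via Python's sqlite3 module (other engines have their own EXPLAIN variants) and an invented `events` table, the plan for the same query can be compared before and after adding an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO events (user_id, payload) VALUES (?, ?)",
    [(i % 100, "x") for i in range(10_000)],
)

QUERY = "SELECT payload FROM events WHERE user_id = ?"

def explain(conn):
    """Return the plan steps SQLite reports for QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    return [row[3] for row in rows]

before = explain(conn)  # no usable index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
after = explain(conn)   # the plan now goes through idx_events_user

print(before)
print(after)
```

A plan is not a timing, and a quiet development box is not production, but reading the plan is the quickest way to catch a full table scan before the table grows.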

edited Oct 31 '10 at 11:53, community wiki, 2 revs, 2 users 80%, TodPunk

2 votes

- Basic SQL skills.
- Indexing.
- Deal with different incarnations of DATE/TIME/TIMESTAMP.
- JDBC driver documentation for the platform you are using.
- Deal with large object data types (CLOB, BLOB, etc.)

edited Oct 31 '10 at 11:58, community wiki, 2 revs, 2 users 50%, JuanZe

1 vote

For some projects, an Object-Oriented model is better.

For other projects, a Relational model is better.

answered Dec 30 '09 at 23:28, community wiki, Mark Lutton

1 vote

The impedance mismatch problem, and know the common deficiencies of ORMs.

answered Feb 10 '10 at 9:26, community wiki, Muhammad Adel

1 vote

RDBMS Compatibility

Check whether the application needs to run on more than one RDBMS. If yes, it might be
necessary to:

- avoid RDBMS SQL extensions
- eliminate triggers and stored procedures
- follow strict SQL standards
- convert field data types
- change transaction isolation levels

Otherwise, these questions should be treated separately and different versions (or
configurations) of the application would be developed.

answered Apr 3 '10 at 2:44, community wiki, Juliano

1 vote

Don't depend on the order of rows returned by an SQL query.

answered Jul 9 '10 at 12:52, community wiki, Agnel Kurian

2 ...unless there's an ORDER BY clause in it? Aaronaught Jul 9 '10 at 13:05

And don't use ORDER BY unnecessarily because it adds load to the SQL server
Allen Langdon Jul 9 '10 at 14:00
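
A short sketch of the point, using Python's sqlite3 module and a made-up `tasks` table: without ORDER BY the engine may return rows in any order it likes, and an ORDER BY that ends in a unique column makes the order fully deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, priority INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO tasks (priority, title) VALUES (?, ?)",
    [(2, "write tests"), (1, "fix bug"), (2, "update docs")],
)

# No ORDER BY: the order is whatever the engine happens to produce and may
# change with indexes, query plans or engine versions.
unordered = conn.execute("SELECT title FROM tasks").fetchall()

# ORDER BY with a unique tiebreaker (id): the order is fully specified.
ordered = conn.execute("SELECT title FROM tasks ORDER BY priority, id").fetchall()

print(ordered)
```

Sorting does cost something, so add it where the order matters to the caller and leave it out where it does not.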

1 vote

http://www.reddit.com/r/programming/comments/azdd7/programmers_sit_your_butt_down_i_need_to_have_a/

answered Jul 9 '10 at 13:40

http://developer.android.com/training/basics/data-storage/index.html.

https://pizaini.wordpress.com/2013/06/17/membuat-aplikasi-client-server-menggunakan-android-php-dan-mysql/

http://www.databasedev.co.uk/data_models.html

https://www.codeproject.com/articles/359654/important-database-designing-rules-which-i-fo#Rule2:-Breakyourdataintologicalpieces,makelifesimpler

http://database-programmer.blogspot.co.id/2008/01/table-design-patterns-cross-reference.html
