NoSQL Solutions - When to Use Them?

June 14, 2014 / category: Architecture / 0 comments

For one of the IT events I'm attending, I prepared a presentation that gives an overview of NoSQL solutions and considers whether they are worth our attention. Enjoy!

Transcription:


Agenda

  1. Introduction to NoSQL
  2. Taxonomy
  3. Representative solutions
  4. When to use them?

1. Introduction

Origin

  • the “NoSQL” term
    • late 90s
  • relational database
  • June 2009, SF
  • NoSQL? Not Only SQL?
  • polyglot persistence

What’s wrong with RDBMSs?

  • rigid schema
  • schema migration
  • moderate performance
  • unnatural data modelling
  • object-relational mapping
  • aggregates
  • clustering support

Does NoSQL shine?

  • easier schema migration
    • or no schema at all
  • performance
  • relaxed consistency
  • natural modelling
  • clustering

No free lunch

  • migration is hidden
  • CAP theorem
    • Consistency
    • Availability
    • Partition tolerance
    • Pick two!
  • ACID, BASE

No free lunch

  • Basically Available
  • Soft state
  • Eventually consistent
  • ?

2. Taxonomy

Families

  • key-value
  • document
  • columnar
  • graph

3. Representative solutions

Key-value

Redis

  • much more than a key-value store
  • sky/RAM is the limit
  • much more than a memcached replacement
  • really fast
  • tens of thousands of operations/s

Redis

  • lack of clustering support
  • master-slave replication, though
  • a plethora of client libraries

Data structures

  • hashes
  • lists
  • (sorted) sets

Additional capabilities

  • HyperLogLog
  • publish/subscribe
  • transactions
    • but…
  • persistence
    • from none to fsync at every query

Where to take keys from?

Obvious:

  • e-mail
    • uniqueness
  • date
    • an event takes place once a day

Need to generate them:

  • UUID
  • Snowflake
    • retired
  • …?
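
To make the two generated-key options above concrete, here is a minimal sketch. The UUID part uses Python's standard `uuid` module. The second part is only Snowflake-style: it packs a millisecond timestamp, a worker id, and a per-millisecond sequence into one integer using the commonly described 41/10/12-bit layout. The class name, the epoch value, and the clock handling are assumptions made for this example, not Twitter's implementation, and the class is not thread-safe.

```python
import time
import uuid


def random_key() -> str:
    """A random (version 4) UUID as a string key."""
    return str(uuid.uuid4())


class SnowflakeLikeGenerator:
    """Time-ordered 64-bit ids: timestamp | worker id | sequence."""

    EPOCH_MS = 1_400_000_000_000  # arbitrary custom epoch (assumption)

    def __init__(self, worker_id: int):
        if not 0 <= worker_id < 1024:          # must fit in 10 bits
            raise ValueError("worker_id must be in [0, 1023]")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0

    def next_id(self) -> int:
        now = int(time.time() * 1000)
        if now < self.last_ms:                 # clock moved backwards
            now = self.last_ms
        if now == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF   # 12-bit counter
            if self.sequence == 0:             # counter exhausted: wait
                while now <= self.last_ms:
                    now = int(time.time() * 1000)
        else:
            self.sequence = 0
        self.last_ms = now
        return ((now - self.EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence
```

Ids from one generator come out strictly increasing, so they sort by creation time, which random UUIDs do not.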

Resembles a scalable RDBMS

  • queried by key only
  • key-based access ⇒ easier caching
  • no relationships, no joins
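
Why key-only access makes caching easier can be shown with a small cache-aside sketch. Everything here is hypothetical: `CachedStore` is an invented name and a plain dict stands in for the real store. Because each value is addressed by exactly one key, invalidating on write takes a single line.

```python
class CachedStore:
    """Cache-aside wrapper around a key-value backend (a dict here)."""

    def __init__(self, backend: dict):
        self.backend = backend
        self.cache = {}
        self.backend_reads = 0   # counts how often the backend is hit

    def get(self, key):
        if key in self.cache:            # cache hit: no backend access
            return self.cache[key]
        self.backend_reads += 1          # cache miss: read and remember
        value = self.backend[key]
        self.cache[key] = value
        return value

    def put(self, key, value):
        self.backend[key] = value
        self.cache.pop(key, None)        # invalidate exactly one entry


store = CachedStore({"user:1": "Alice"})
store.get("user:1")
store.get("user:1")                      # second read is served from cache
```

With joins, one cached query result can depend on many rows, so deciding what to invalidate is much harder.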

Applications

  • wherever high performance is required
  • data structures
    • easier to describe than using SQL
  • features

Document

MongoDB

  • JSON
  • JavaScript querying
  • Map-reduce
  • clustering

Documents

  • no migrations required
  • nice querying
  • aggregates

Map-reduce
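
As a reminder of the idea, here is a minimal single-process sketch in Python. In MongoDB the map and reduce functions are written in JavaScript and run inside the database; the `map_reduce` helper, the sample `orders` documents, and the function names below are invented for illustration.

```python
from collections import defaultdict


def map_reduce(documents, mapper, reducer):
    """Run mapper over every document, group emitted values by key,
    then reduce each group to a single value."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


orders = [
    {"customer": "alice", "amount": 30},
    {"customer": "bob", "amount": 5},
    {"customer": "alice", "amount": 10},
]


def emit_amount(order):
    # map step: emit one (key, value) pair per document
    yield order["customer"], order["amount"]


def sum_amounts(customer, amounts):
    # reduce step: collapse all values emitted for one key
    return sum(amounts)


totals = map_reduce(orders, emit_amount, sum_amounts)
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread across many machines.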

Distribution

  • replica set
  • sharding
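
Sharding means that each document lives on exactly one shard, chosen from its shard key. One common way to pick the shard is to hash the key, sketched below. The `shard_for` function and the use of SHA-256 are assumptions for this example and say nothing about how MongoDB routes documents internally.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index in [0, shard_count) using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce it.
    return int.from_bytes(digest[:8], "big") % shard_count


placement = {key: shard_for(key, 4) for key in ("user:1", "user:2", "user:3")}
```

A stable hash is used instead of Python's built-in `hash()`, which is randomized per process for strings. Note that with plain modulo, changing the number of shards moves most keys to a different shard.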

Performance

  • "humongous", but…
  • not as fast as advertised
  • eats up all resources
  • indexes required
  • global write lock

Problems

  • a honeymoon and then…
  • too many promises made

Applications

  • uhmm…
  • small data sets?
  • no performance requirements?
  • Map-reduce analytics

Columnar

Cassandra

  • big data
  • data analytics
  • fully distributed

Distribution

  • peer-to-peer cluster
  • replication
  • scales well
  • no SPOF

Applications

  • analytics
  • time series
  • column scanning
  • when almost real-time is enough

Graph

Neo4j

  • natural modelling
  • simple querying
    • Cypher, Gremlin
  • built-in REST API
  • not that performant

Graphs

  • relations
  • friends of friends of people who like…
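
The "friends of friends" query is a two-hop traversal. As a plain-Python sketch (not Cypher or Gremlin, and with an invented `friends` adjacency map), it can be written as:

```python
def friends_of_friends(graph, person):
    """People exactly two hops away: friends of my friends,
    excluding me and the people I already know directly."""
    direct = graph.get(person, set())
    result = set()
    for friend in direct:
        result |= graph.get(friend, set())
    return result - direct - {person}


friends = {
    "ann": {"bob", "cid"},
    "bob": {"ann", "dan"},
    "cid": {"ann", "dan", "eve"},
    "dan": {"bob", "cid"},
    "eve": {"cid"},
}

suggestions = friends_of_friends(friends, "ann")
```

In a relational schema each extra hop is another self-join on the friendship table, while a graph database follows the stored relationships directly.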

Distribution

  • master + slaves
  • master is not a master
  • ZooKeeper
    • not anymore

Applications

  • |data| ≤ one instance
  • ACID required
  • fewer round-trips
    • procedure-like
  • no massive updates

Applications

  • recommendations
  • fraud detection

4. When to use them?

Common problems

  • lack of knowledge and experience
  • investment
  • not as good as advertised
  • possible failure
  • limited capabilities

Should you use them?

YES, but:

  • Only if you really can’t solve your problems with what you have right now.
  • Don’t try them just for the sake of trying.
  • Cost-effective?
  • Expect the unexpected.
    • A honeymoon, remember?
  • Preparations.
  • Admit the failure.

Thank you!
