Dimitri Fontaine
Outline: Introduction, Architecture, Configuration examples & Usage, Current status & TODO
Table of contents
Introduction
    pgloader, the what?
Architecture
    Main components
    Parallel Organisation
Configuration examples & Usage
Current status & TODO
ETL
Definition: an ETL processes data from a flat file in order to load it into the database.
pgloader's features
PGLoader will:
- Load CSV data
- Load pretend-to-be CSV data
- Continue loading when confronted with errors
- Apply user-defined transformations to data, on the fly
- Optionally have all your cores participate in the processing
Configuration
We first parse the configuration, which comes with a templating system.
Example:
    [simple]
    use_template = ...
    table        = ...
    filename     = ...
    columns      = ...
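One way to picture the templating step is the following minimal sketch. It is not pgloader's actual parser: only the option names use_template, table, filename and columns come from the slide, while the section csv_defaults, every value, and the helper resolve_section are made up for illustration.

```python
import configparser

# Hypothetical configuration text: all values and the csv_defaults section
# are assumptions, only the option names of [simple] come from the slide.
CONFIG_TEXT = """
[csv_defaults]
format = csv
field_sep = |

[simple]
use_template = csv_defaults
table = simple
filename = simple/simple.data
columns = a:1, b:3, c:2
"""


def resolve_section(parser, name):
    """Return the options of a section, merged over its template if any."""
    options = dict(parser.items(name))
    template = options.pop("use_template", None)
    if template is None:
        return options
    # Template values are loaded first so the section's own values win.
    merged = resolve_section(parser, template)
    merged.update(options)
    return merged


parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONFIG_TEXT)
simple = resolve_section(parser, "simple")
print(simple["format"], simple["table"])
```

The section inherits format and field_sep from its template and keeps its own table, filename and columns.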
Loading: file reading
PGLoader supports many input formats. Even if they all look like CSV, the hard part is parsing the data:
- Read files one line at a time
- Parse physical lines into logical lines
- Support for several readers: textreader, csvreader, fixedreader
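The step from physical to logical lines can be sketched as follows. This is an illustration under a stated assumption, not the code of any pgloader reader: a logical line is considered complete once it holds the expected number of fields, which is one simple way to cope with newlines embedded in text data. The function name logical_lines and the sample data are hypothetical.

```python
import io


def logical_lines(stream, field_sep, field_count):
    """Yield logical lines, joining physical lines until enough fields are seen."""
    pending = ""
    for physical in stream:
        pending += physical
        # A complete logical line contains field_count - 1 separators.
        if pending.count(field_sep) >= field_count - 1:
            yield pending.rstrip("\n").split(field_sep)
            pending = ""
    if pending.strip():
        # Incomplete trailing data: hand it over so the caller can reject it.
        yield pending.rstrip("\n").split(field_sep)


# The first record spans two physical lines, the second one fits on one line.
data = io.StringIO("1|first line\nsecond line|x\n2|plain|y\n")
rows = list(logical_lines(data, "|", 3))
print(rows)
```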
Processing lines
Parsing data is the CPU-intensive part of the job. You may even have to guess where lines begin and end. Then you add:
- column restrictions
- column reordering
- user-defined columns (constants)
- user-defined reformatting modules
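The four processing steps listed above can be sketched in a few lines. The function make_row, the reformatter mysql_date, and the column layout are all hypothetical names chosen for the example; they only illustrate the idea of restricting, reordering, adding constants and reformatting.

```python
def make_row(fields, columns, constants=None, reformat=None):
    """Build an output row from the parsed input fields.

    columns   -- list of (name, 1-based input position) in output order
    constants -- dict of column name -> constant value (no input position)
    reformat  -- dict of column name -> function applied to the raw value
    """
    constants = constants or {}
    reformat = reformat or {}
    row = []
    for name, position in columns:
        if name in constants:
            value = constants[name]
        else:
            value = fields[position - 1]
        if name in reformat:
            value = reformat[name](value)
        row.append(value)
    return row


def mysql_date(value):
    """Hypothetical reformatter: turn 20080521 into 2008-05-21."""
    return "%s-%s-%s" % (value[0:4], value[4:6], value[6:8])


fields = ["1", "ignored", "20080521", "some text"]
# Field 2 is not listed (restriction), fields 4 and 3 are swapped (reordering).
columns = [("id", 1), ("body", 4), ("day", 3), ("source", None)]
row = make_row(fields, columns,
               constants={"source": "import"},
               reformat={"day": mysql_date})
print(row)
```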
COPYing to PostgreSQL
This is how we do it:
- Python cStringIO buffers
- configurable size (copy_every)
- using copy_expert() when available (CVS)
- dichotomic error search
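The dichotomic error search can be illustrated with a small sketch. It makes several assumptions: io.StringIO stands in for cStringIO, fake_copy stands in for a real COPY call (with psycopg2 that would be something like cursor.copy_expert(), plus a rollback on error), and ValueError stands in for the driver's error class. The names copy_rows and fake_copy are hypothetical.

```python
import io


def copy_rows(rows, copy_func, rejects):
    """Send rows in one COPY; on failure, split the batch and retry each half."""
    if not rows:
        return 0
    buffer = io.StringIO("".join(rows))
    try:
        copy_func(buffer)
        return len(rows)
    except ValueError as error:
        if len(rows) == 1:
            # The search has narrowed down to the single offending row.
            rejects.append((rows[0], str(error)))
            return 0
        middle = len(rows) // 2
        return (copy_rows(rows[:middle], copy_func, rejects)
                + copy_rows(rows[middle:], copy_func, rejects))


def fake_copy(buffer):
    """Stand-in for COPY: fails when the first column is not an integer."""
    for line in buffer:
        int(line.split("\t")[0])


rows = ["1\ta\n", "2\tb\n", "oops\tc\n", "4\td\n"]
rejects = []
loaded = copy_rows(rows, fake_copy, rejects)
print(loaded, "rows loaded,", len(rejects), "rejected")
```

The good rows still go through in batches; only the halves that contain an error are split further.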
PGLoader will continue processing your input when it contains erroneous data.
- reject data file
- reject log file, containing error messages
- error count in summary
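As a rough sketch of the reject handling above, assuming rejected rows arrive as (line, error message) pairs: the function write_rejects is hypothetical, and in-memory buffers stand in for the reject data file and the reject log file.

```python
import io


def write_rejects(rejects, data_file, log_file):
    """Store rejected lines and their error messages; return the error count."""
    for line, message in rejects:
        data_file.write(line)            # raw input, ready to fix and reload
        log_file.write(message + "\n")   # why that line was refused
    return len(rejects)


rejects = [("oops\tc\n", "invalid input syntax for integer")]
reject_data, reject_log = io.StringIO(), io.StringIO()
errors = write_rejects(rejects, reject_data, reject_log)
print("%d row(s) rejected" % errors)
```

Keeping the raw lines apart from the messages means the reject data file can be corrected and fed back to the loader as is.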
Error logging
PGLoader will continue processing your input when it contains erroneous data, and will make sure that you know about the failures.
- log file
- console log level: client_min_messages
- logfile log level: log_min_messages
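Two independent log levels map naturally onto Python's logging module, with one handler per destination. The sketch below is an assumption about how this could be wired, not pgloader's own code; the function setup_logging, the logger name and the log file path are hypothetical.

```python
import logging
import os
import sys
import tempfile


def setup_logging(logfile, client_min_messages, log_min_messages):
    """Log to the console and to a file, each with its own minimum level."""
    logger = logging.getLogger("pgloader_sketch")
    logger.setLevel(logging.DEBUG)  # let the handlers do the filtering
    console = logging.StreamHandler(sys.stderr)
    console.setLevel(client_min_messages)
    to_file = logging.FileHandler(logfile)
    to_file.setLevel(log_min_messages)
    logger.addHandler(console)
    logger.addHandler(to_file)
    return logger


log_path = os.path.join(tempfile.mkdtemp(), "pgloader.log")
logger = setup_logging(log_path, logging.WARNING, logging.DEBUG)
logger.debug("copy buffer flushed")     # log file only
logger.warning("1 row rejected")        # console and log file
```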
Loading is IO bound, not CPU bound, right?
- for large disk arrays, not so much
- with complex parsing, not so much
- with heavy user rewriting, not so much
Ok... How?
- multi-threading is easy to start with in Python
- then you add in deques and semaphores (critical sections) and signals
- Global Interpreter Lock (GIL)
- a fork()-based reimplementation could be of interest
Example:
    class PGLoader(threading.Thread):
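A minimal sketch of the threading approach, assuming one thread per configuration section and a semaphore that bounds how many run at once. The class SectionLoader, the section names and the line counting are placeholders for the real loading work, not pgloader's code.

```python
import queue
import threading


class SectionLoader(threading.Thread):
    """One worker per configuration section; the loading itself is faked."""

    def __init__(self, name, lines, results, max_running):
        threading.Thread.__init__(self, name=name)
        self.lines = lines
        self.results = results
        self.max_running = max_running

    def run(self):
        # The semaphore limits how many sections are processed at once.
        with self.max_running:
            loaded = sum(1 for line in self.lines if line.strip())
            self.results.put((self.name, loaded))


results = queue.Queue()
max_running = threading.BoundedSemaphore(2)
sections = {"simple": ["a\n", "b\n"],
            "errors": ["c\n", "\n", "d\n"],
            "clob": ["e\n"]}
threads = [SectionLoader(name, lines, results, max_running)
           for name, lines in sections.items()]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
summary = dict(results.get() for _ in threads)
print(summary)
```

Because of the Global Interpreter Lock, such threads take turns executing Python bytecode, which is why a fork()-based variant is attractive for CPU-heavy parsing.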
Parallelism choices
Parallel loading has been asked for by some hackers, and their use cases dictated two different modes.
The idea is to have a parallel pg_restore testbed, which is interesting with large input files (100 GB to several TB). PGLoader can't compete with plain COPY, because of client-server round trips compared to local file reading, but with some more CPUs feeding the disk array it should show nice improvements.
Testing and feedback are more than welcome!
Split file reader
The file is split into N blocks, and there are as many pgloader threads doing the same job in parallel as there are blocks.
Example:
    [rrr]
    section_threads    = 3
    split_file_reading = True
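How a file might be cut into N blocks can be sketched as below. This is an assumption about the technique, not pgloader's implementation: each block boundary is moved forward to the next end of line so that no worker starts in the middle of a row. The helpers split_offsets and read_block and the sample file are hypothetical.

```python
import os
import tempfile


def split_offsets(path, blocks):
    """Return (start, end) byte ranges, each starting on a line boundary."""
    size = os.path.getsize(path)
    starts = [0]
    with open(path, "rb") as stream:
        for n in range(1, blocks):
            stream.seek(n * size // blocks)
            stream.readline()  # move forward to the end of the current line
            starts.append(min(stream.tell(), size))
    ends = starts[1:] + [size]
    return list(zip(starts, ends))


def read_block(path, start, end):
    """Read the lines of one block, as a single worker would."""
    with open(path, "rb") as stream:
        stream.seek(start)
        return stream.read(end - start).decode().splitlines()


# Build a small sample file, then split it into 3 blocks.
with tempfile.NamedTemporaryFile("wb", suffix=".data", delete=False) as sample:
    sample.write(b"".join(b"row %d\n" % i for i in range(10)))
ranges = split_offsets(sample.name, 3)
blocks = [read_block(sample.name, start, end) for start, end in ranges]
print(ranges)
```

Every row ends up in exactly one block, so the workers can load their ranges independently.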
Examples
The PGLoader distribution comes with diverse examples; don't forget to have a look at them.
simple
That simple:
Example:
    [simple]
    table        = ...
    filename     = ...
    format       = ...
    datestyle    = ...
    field_sep    = ...
    trailing_sep = ...
    columns      = ...
At http://pgloader.projects.postgresql.org/ or man pgloader
Example:
    > pgloader --help
    > pgloader --version
    > pgloader -DTsc pgloader.conf
TODO
http://pgloader.projects.postgresql.org/dev/TODO.html
- Constraint Exclusion support
- Reject Behaviour
- XML support with user-defined XSLT StyleSheet
- Facilities
Don't be shy and just ask for new features!
- pgfoundry, 1 developer, some users, no mailing list yet (no one asking for one), some mails sometimes, seldom bug reports (fixed)
- Support ongoing at #postgresql and #postgresqlfr
- packages for Debian, FreeBSD, OpenBSD, CentOS, RHEL and Fedora