Catherine Devlin
Wanted: RDBMS superpower summary for app developers
At last night's WWCode Cincinnati panel, I recommended that developers talk to their DBA about what advanced capabilities their RDBMS can offer, so that they don't end up reimplementing functionality in the app that is already available (better and more efficiently) in the database itself. Devs can waste a lot of effort by thinking of databases as dumb, inert data boxes.
I was asked an excellent question: "Where can a dev quickly familiarize herself with what those capabilities are?" My answer was, "Um."
Do not say they should read the docs. That is a "let them eat cake" answer. The PostgreSQL docs are over 2900 pages. That's not what they need.
Suggestions, folks? Python developers have built great summary sites, like the Hitchhiker's Guide to Python. What are the equivalents in the database world? Do they exist? Do we need to write them?
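To make the "dumb, inert data box" point concrete, here is a minimal sketch of one of the most basic cases: adding up values in application code versus asking the database to do it. The `orders` table and its rows are invented for illustration, and SQLite (via the standard library) stands in for whatever RDBMS you actually run.

```python
import sqlite3
from collections import defaultdict

# Hypothetical table and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10.0), ("bob", 5.0), ("ann", 2.5), ("bob", 1.0), ("cat", 7.0)],
)


def totals_in_app(conn):
    """Pull every row into Python and add things up by hand."""
    totals = defaultdict(float)
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] += amount
    return dict(totals)


def totals_in_db(conn):
    """Let the database aggregate; only the summary rows come back."""
    query = "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    return dict(conn.execute(query).fetchall())


app_totals = totals_in_app(conn)
db_totals = totals_in_db(conn)
```

Both functions give the same answer, but the second moves far less data and lets the database's own optimizer do the work. The same pattern applies to the fancier features a DBA can point you to.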
Code Studio rocks; diversity does, too
If you want to quickly get some kids introduced to computer programming concepts, you could do a lot worse than using Code Studio from code.org. That's what I did the last couple weeks - took two hours to lightly shepherd the Dayton YWCA day camp through a programming intro.
It's really well-organized and easy to understand - frankly, it pretty much drives itself. It's based on block-dragging for turtle graphics and/or simple 2D games, all easy and appealing stuff. (They even got their turtle graphics branded as the sisters from Frozen ice-skating!) I didn't need to do much more than stand there and demonstrate that programmers actually exist in the flesh, and occasionally nudge a student over a bump. Though, by pair programming, they did most of the nudging themselves.
Here's most of my awesome class. Sorry I'm as bad at photography as at CSS.
Hey - we got demographics, huh? Right - if you announce that you're teaching a coding class through your usual geeky circles, they spread the word among their circles and recruit you a class that looks pretty much like the industry already looks. And if you seek a venue through your geeky circles, the usual suspects will step up to host. In badly segregated Dayton, that means "as far from the colored parts of town as possible." That's less than inviting to the people who don't live there.
But if you partner with groups that already have connections in diverse communities - like the YWCA, which makes anti-racism one of its keystones - getting some fresh faces can be pretty easy! And there are venues available outside the bleached-white exurbs you're used to - you just need to think to look.
Another benefit of Code Studio is that it's entirely web-based, so you don't need to restrict your demographics to "kids whose parents can afford to get them laptops". The public library's computer classroom did the job with flying colors.
Seriously, this was about the easiest outreach I've ever done. I'm working on the follow-up, but I think I'll be able to find further lazy options. Quite likely it will leverage Codecademy. So, what's your excuse for not doing it in your city?
Now, in other news: You are running out of time to register for PyOhio, a fantastic, friendly, free, all-levels Python conference, and my pride and joy. The schedule is amazing this year, and for better or for worse, I'm keynoting. So please come and add to my terror.
rdbms-subsetter
I've never had a tool I really liked that would extract a chunk of a large production database for testing purposes while respecting the database's foreign keys. This past week I finally got to write one: rdbms-subsetter.
rdbms-subsetter postgresql://user:passwd@host/source_db postgresql://user:passwd@host/excerpted_db 0.001
Getting it to respect referential integrity "upward" - guaranteeing every needed parent record would be included for each child row - took less than a day. Trying to get it to also guarantee referential integrity "downward" - including all child records for each parent record - was a Quixotic idea that had me tilting at windmills for days. It's important, because parent records without child records are often useless or illogical. Yet trying to pull them all in led to an endlessly propagating process - percolation, in chemical engineering terms - that threatened to make every test database a complete (but extremely slow) clone of production. After all, if every row in parent table P1 demands rows in child tables C1, C2, and C3, and those child rows demand new rows in parent tables P2 and P3, which demand more rows in C1, C2, and C3, which demand more rows in their parent tables... I felt like I was trying to cut a little sweater out of a big sweater without snipping any yarns.
So I can't guarantee child records - instead, the final process prioritizes creating records that will fill out the empty child slots in existing parent records. But there will almost inevitably be some child slots left open when the program is done.
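The "upward" half of that is simple enough to sketch. What follows is not rdbms-subsetter's actual code, just a toy illustration under stated assumptions: tables are plain dicts, and each row lists its parents as hypothetical `(parent_table, parent_key)` pairs instead of real foreign keys.

```python
import random

# Hypothetical toy schema: each table maps primary key -> row, and each row
# names its parents as (parent_table, parent_key) pairs.
SOURCE = {
    "customers": {1: {"parents": []}, 2: {"parents": []}},
    "products": {10: {"parents": []}, 11: {"parents": []}},
    "orders": {
        100: {"parents": [("customers", 1), ("products", 10)]},
        101: {"parents": [("customers", 2), ("products", 10)]},
        102: {"parents": [("customers", 2), ("products", 11)]},
    },
}


def copy_with_parents(source, target, table, key):
    """Copy one row into target along with every ancestor it references."""
    if key in target.setdefault(table, {}):
        return  # already copied
    row = source[table][key]
    # Record the row before recursing so circular references terminate.
    target[table][key] = row
    for parent_table, parent_key in row["parents"]:
        copy_with_parents(source, target, parent_table, parent_key)


def subset(source, table, fraction, seed=0):
    """Sample a fraction of one table and pull in whatever those rows need."""
    rng = random.Random(seed)
    keys = sorted(source[table])
    sample_size = max(1, round(len(keys) * fraction))
    target = {}
    for key in rng.sample(keys, sample_size):
        copy_with_parents(source, target, table, key)
    return target


excerpt = subset(SOURCE, "orders", 0.34)
```

Every parent reference in `excerpt` resolves, which is the upward guarantee. Nothing here tries to pull in children of the copied parents, which is exactly the part that percolates.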
I've been using it against one multi-GB, highly interconnected production data warehouse, so it's had some testing, but your bug reports are welcome.
Like virtually everything else I do, this project depends utterly on SQLAlchemy.
I developed this for use at 18F, and my choice of a workplace where everything defaults to open was wonderfully validated when I asked about the procedure for releasing my 18F work to PyPI. The procedure is - and I quote -
Just go for it.
%sql: To Pandas and Back
A Pandas DataFrame has a nice to_sql(table_name, sqlalchemy_engine) method that saves itself to a database.
The only trouble is that coming up with the SQLAlchemy Engine object is a little bit of a pain, and if you're using the IPython %sql magic, your %sql session already has an SQLAlchemy engine anyway. So I created a bogus PERSIST pseudo-SQL command that simply calls to_sql with the open database connection:
%sql PERSIST mydataframe
The result is that your data can make a very convenient round-trip from your database, to Pandas and whatever transformations you want to apply there, and back to your database:
In [1]: %load_ext sql
In [2]: %sql postgresql://@localhost/
Out[2]: u'Connected: @'
In [3]: ohio = %sql select * from cities_of_ohio;
246 rows affected.
In [4]: df = ohio.DataFrame()
In [5]: montgomery = df[df['county']=='Montgomery County']
In [6]: %sql PERSIST montgomery
Out[6]: u'Persisted montgomery'
In [7]: %sql SELECT * FROM montgomery
11 rows affected.
Out[7]:
[(27L, u'Brookville', u'5,884', u'Montgomery County'),
(54L, u'Dayton', u'141,527', u'Montgomery County'),
(66L, u'Englewood', u'13,465', u'Montgomery County'),
(81L, u'Germantown', u'6,215', u'Montgomery County'),
(130L, u'Miamisburg', u'20,181', u'Montgomery County'),
(136L, u'Moraine', u'6,307', u'Montgomery County'),
(157L, u'Oakwood', u'9,202', u'Montgomery County'),
(180L, u'Riverside', u'25,201', u'Montgomery County'),
(210L, u'Trotwood', u'24,431', u'Montgomery County'),
(220L, u'Vandalia', u'15,246', u'Montgomery County'),
(230L, u'West Carrollton', u'13,143', u'Montgomery County')]
auto-generate SQLAlchemy models
PyOhio gave my lightning talk on ddlgenerator a warm reception, and Brandon Lorenz got me thinking, and PyOhio sprints filled me with py-drenaline, and now ddlgenerator can inspect your data and spit out SQLAlchemy model definitions for you:
$ cat merovingians.yaml
-
  name: Clovis I
  reign:
    from: 486
    to: 511
-
  name: Childebert I
  reign:
    from: 511
    to: 558
$ ddlgenerator --inserts sqlalchemy merovingians.yaml
from sqlalchemy import create_engine, Column, Integer, MetaData, Table, Unicode
engine = create_engine(r'sqlite:///:memory:')
metadata = MetaData(bind=engine)
merovingians = Table('merovingians', metadata,
    Column('name', Unicode(length=12), nullable=False),
    Column('reign_from', Integer(), nullable=False),
    Column('reign_to', Integer(), nullable=False),
    schema=None)
metadata.create_all()
conn = engine.connect()
inserter = merovingians.insert()
conn.execute(inserter, **{'name': 'Clovis I', 'reign_from': 486, 'reign_to': 511})
conn.execute(inserter, **{'name': 'Childebert I', 'reign_from': 511, 'reign_to': 558})
conn.connection.commit()
Brandon's working on a pull request to provide similar functionality for Django models!
18F
Yesterday was my first day at 18F!
What is 18F? We're a small, little-known government organization that works outside the usual channels to accomplish special projects. It involves black outfits and a lot of martial arts.
Kidding! Sort of. 18F is a new agency within the GSA that does citizen-focused work for other parts of the U.S. Government, working small, quick projects to make information more accessible. We're using all the tricks: small teams, agile development, rapid iteration, open-source software, test-first, continuous integration. We do our work in the open.
Sure, this is old hat to you, faithful blog readers. But bringing it into government IT work is what makes it exciting. We're hoping that the techniques we use will ripple out beyond the immediate projects we work on, popularizing them throughout government IT and helping efficiency and responsiveness throughout. This is a chance to put all the techniques I've learned from you to work for all of us. Who wouldn't love to get paid to work for the common good?
Obviously, this is still my personal blog, so nothing I say about 18F counts as official information. Just take it as my usual enthusiastic babbling.
ddlgenerator
I've had it on github for a while, but I finally released ddlgenerator to PyPI.
I've been frustrated for years that there was no good open-source way to set up RDBMS tables from flat data files. Sure, you could import the data - after setting up the DDL by hand. ddlgenerator handles that; in fact, you can go from zero, setting up and populating a table in a single line. Nothing up my sleeve:
$ psql -c "SELECT * FROM knights"
ERROR: relation "knights" does not exist
LINE 1: SELECT * FROM knights
^
$ ddlgenerator --inserts postgresql knights.yaml | psql
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
$ psql -c "SELECT * FROM knights"
name | dob | kg | brave
------------+---------------------+---------+-------
Lancelot | 0471-01-09 00:00:00 | 82.0000 | t
Gawain | | 69.2000 | t
Robin | 0471-01-09 00:00:00 | | f
Reepacheep | | 0.0691 | t
This is a fairly complex tool so I'm sure you'll be using the bug tracker. But I hope you'll enjoy it nonetheless!
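For a feel of what "setting up the DDL" from data involves, here is a rough sketch of type inference over a handful of rows. It is not how ddlgenerator itself is implemented; the function names are hypothetical, and the rows are my reconstruction from the query output above, since the contents of knights.yaml are not shown.

```python
from datetime import datetime


def infer_sql_type(values):
    """Pick a SQL type that fits every non-missing value in a column."""
    present = [v for v in values if v is not None]
    if not present:
        return "TEXT"  # nothing to go on
    if all(isinstance(v, bool) for v in present):
        return "BOOLEAN"
    # bool is a subclass of int in Python, so exclude it explicitly.
    numeric = [v for v in present if not isinstance(v, bool)]
    if len(numeric) == len(present):
        if all(isinstance(v, int) for v in numeric):
            return "INTEGER"
        if all(isinstance(v, (int, float)) for v in numeric):
            return "DECIMAL"
    if all(isinstance(v, datetime) for v in present):
        return "TIMESTAMP"
    return "VARCHAR(%d)" % max(len(str(v)) for v in present)


def create_table_sql(table_name, rows):
    """Build a CREATE TABLE statement from a list of dict rows."""
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    lines = []
    for name in columns:
        values = [row.get(name) for row in rows]
        not_null = "" if any(v is None for v in values) else " NOT NULL"
        lines.append("%s %s%s" % (name, infer_sql_type(values), not_null))
    return "CREATE TABLE %s (\n  %s\n);" % (table_name, ",\n  ".join(lines))


# Rows reconstructed from the psql output above (an assumption).
knights = [
    {"name": "Lancelot", "dob": datetime(471, 1, 9), "kg": 82, "brave": True},
    {"name": "Gawain", "kg": 69.2, "brave": True},
    {"name": "Robin", "dob": datetime(471, 1, 9), "brave": False},
    {"name": "Reepacheep", "kg": 0.0691, "brave": True},
]
ddl = create_table_sql("knights", knights)
```

A column missing from any row is left nullable, and a column mixing integers and floats widens to DECIMAL. A real tool also has to deal with dialect differences, nested data, and dates arriving as strings.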
data_dispenser
I went down a refactoring rabbit hole on ddl-generator and ended up pulling out the portion that pulls in data from various file formats. Perhaps it will be useful to others.
>>> from data_dispenser.sources import Source
>>> for row in Source('animals.csv'):
... print(row)
...
OrderedDict([('name', 'Alfred'), ('species', 'wart hog'), ('kg', '22'), ('notes', 'loves turnips')])
OrderedDict([('name', 'Gertrude'), ('species', 'polar bear'), ('kg', '312.7'), ('notes', 'deep thinker')])
OrderedDict([('name', 'Emily'), ('species', 'salamander'), ('kg', '0.3'), ('notes', '')])
Basically, I wanted a consistent way to consume rows of data - no matter where those rows come from. Right now, JSON, CSV, YAML, etc. all require separate libraries, each with its own API. This abstracts all that out, for reading purposes; now each data source is just a Source.
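The idea can be sketched with nothing but the standard library. This is not data_dispenser's code or API, only a minimal illustration of the pattern; the `rows` function and its `fmt` argument are hypothetical names.

```python
import csv
import io
import json


def rows(text, fmt):
    """Yield each record in `text` as a plain dict, whatever the format."""
    if fmt == "csv":
        for record in csv.DictReader(io.StringIO(text)):
            yield dict(record)
    elif fmt == "json":
        for record in json.loads(text):
            yield dict(record)
    else:
        raise ValueError("unsupported format: %r" % fmt)


# The same two animals, once as CSV and once as JSON.
CSV_TEXT = "name,species,kg\nAlfred,wart hog,22\nEmily,salamander,0.3\n"
JSON_TEXT = (
    '[{"name": "Alfred", "species": "wart hog", "kg": "22"},'
    ' {"name": "Emily", "species": "salamander", "kg": "0.3"}]'
)

csv_rows = list(rows(CSV_TEXT, "csv"))
json_rows = list(rows(JSON_TEXT, "json"))
```

The calling code never needs to know which parser ran. It just iterates over dicts, and adding another format means adding another branch in one place.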
I'd love bug reports, and sample files to test against. And feel free to contribute patches! For example, it wouldn't be hard to add MS Excel as a data source.
G+ Public Hangout Fail
Before the PyCon 2014 CFP came due, PyLadies hosted several G+ hangouts for talk proposal brainstorming. Potential speakers could talk over and flesh out their ideas with each other, producing better talk proposals. More importantly, it was a nice psychological stepping stone on the way to filling out that big, scary CFP form all alone. I thought they went great.
I wanted to emulate them for Postgres Open and PyOhio, which both have CFPs open now. The PyLadies hangouts had used EventBrite to preregister attendees, and I unfortunately did not consider this and the reasons why. Instead, I just scheduled hangouts, made them public, and sent out invitations with the hangout URLs, encouraging people to forward the invites onward. Why make participating any harder than it has to be?
The more worldly of you are already shaking your heads at my naiveté. It turns out that the world's exhibitionists have figured out how to automatically detect and join public hangouts. For several seconds I tried kicking out and banning them as they joined, but new ones kept arriving, faster than one per second. Then I hung up - which unfortunately did not terminate the hangout. It took me frantic minutes to find how to delete a hangout in progress. I dearly hope that no actual tech community members made it to the hangout during that time.
I had intended to create a place where new speakers, and women especially, would feel safe increasing their community participation. The absoluteness of my failure infuriates me.
Hey, Google: public G+ hangouts have been completely broken, not by technical failure, but by the degraded human condition. You need to remove them immediately. The option can only cause harm, as people accidentally expose themselves and others to sexual harassment.
In the future, a "public" hangout URL should actually take you to a page where you request entrance from the organizer by text message (which should get the same spam filtration that an email would). But fix that later. Take the public hangouts away now.
Everybody else, if you had heard about the hangouts and were planning to participate, THANK YOU - but I've cancelled the rest of them. You should present anyway, though! I'd love to be contacted directly to talk over your ideas for proposals.
TRUCEConf
Please consider participating in TRUCEConf (March 18-19 in Cincinnati)!
The goal is to help the tech community heal, through learning from others outside our industry and having an open dialogue on how we can be better humans to each other in the world of tech.
You may remember fierce controversy around TRUCEConf when virtually nothing was known about it but its name; without solid information, it was easy to read bad connotations into the name. I would have been uneasy myself if I hadn't known the founder, Elizabeth Naramore.
But now there's plenty of information, including the schedule, that should replace those concerns with enthusiasm. I think the format - a day of mind-opening speakers from all over, followed by an unconference day - should be very productive!
I'm really looking forward to it and hope that many of you can come. If you can't come in person, consider supporting the conference with a donation - they're going without corporate sponsors so your individual support means a ton. Thanks!
SacredPy seeking collaborators
I'm looking for collaborators who want to build web programming experience on an interesting project...
During my job search, I was contacted by Kai Schraml, a seminary graduate who wants to scratch an itch. Seminarians have a serious need to discuss, debate, and seek consensus on the translations of difficult texts, like sacred scriptures. But the software tools currently available for the purpose are closed-source and expensive. That just seems wrong - not just because seminary students are broke, but because of the nature of the texts themselves. After all, Jesus released his teachings under a very strong open-source license!*
So we're starting to work on an alternative, provisionally called "SacredPy". (It could be applied to any difficult texts, of course, so if Beowulf is sacred to you, have at it.) I'm quite employed now, but I'm dabbling at it a bit for the sheer interest and open-sourcey glory of it all. It's possible income could eventually come from this project - Kai could tell you more about the prospects - but certainly not soon, so this is no substitute for proper employment. But it might be a great resume builder for a new Python programmer. It looks like we'll most likely build something atop Askbot, a Django-based project, so if you'd like to move into the thriving "experienced Django developer" segment of the economy...
Let me know at moc.liamg@nilved.enirehtac and we'll talk!
* - Matthew 10:8 - δωρεὰν ἐλάβετε, δωρεὰν δότε ("Freely you have received, freely give")
Presentation links
I have just had a VERY. Busy. Week. (In a good way!) I've promised the world many talk materials, so:
- At Ohio LinuxFest, IPython for non-Pythonistas
- At APCUG Regional Conference, Python
- At Postgres Open, IPython: your new SQL client
As for Postgres Open, I absolutely loved it! So happy I finally got to go. I am proud to say that I was the very first to buy my admission for 2014! Hope to blog more about that later...
I'm not available
I'm happy to say that I'll shortly be starting a new position as a PostgreSQL DBA and Python developer for Zoro Tools!
We software types seem to have hardware envy sometimes. We have "builds" and "engines" and "forges" and "factory functions". But as it turns out, the "Tools" in "Zoro Tools" isn't a metaphor for cleverly arranged bytes. That's right - they're talking about the physical objects in your garage! Imagine! Lucky for me the interviewers didn't ask to review my junior high shop project.
So disregard my earlier post about being available. Thanks for all your well-wishes!
Depending on how you reckon it, my job search arguably only took forty minutes, though it took a while for gears to grind and finalize everything. Years of building relationships at PyCon made this the best job search ever; the only unpleasant part was having to choose from among the opportunities to work with my favorite technologies and people. I'm very glad I made the investment in PyCon over the years... and if you're thinking "that's easy for you to say, I can't afford it", don't forget PyCon's financial aid program.
And speaking of conferences, I'll be at Postgres Open next month (my first one!) - hope to see some of you there!
IPython at Ohio LinuxFest 2013
Are you signed up yet for Ohio LinuxFest on Sep. 13-15? I'll be there to present
IPython for non-Pythonistas
Break out of your (bash) shell! IPython and the IPython Notebook have swept over the Python programming community, but they're not just for Python programmers - they make for high-powered shell replacements even with little to no Python knowledge. They'll also let you document your work and collaborate with others like never before. Find out how these beautiful tools can improve your daily Linux work!
At PyOhio, I argued that all Python programmers need IPython. At OLF, I'll make the case that non-Pythonistas need IPython, too. Perhaps my next talk will be "Even Your Cat Needs IPython".
Also at OLF, look for PyOhio's booth for info on next year's PyOhio, other Python events around the region, and general Python love!
Python Workshop for Women Indy #2 and CMH #2 coming up!
The Midwest Python Workshop for women and their friends is back! We've got new workshops scheduled, ready to take new batches of students:
Indianapolis Python Workshop, Sep. 27-28, 2013; sponsored by Six Feet Up and hosted at Launch Fishers
Columbus Python Workshop, Oct. 18-19, 2013; sponsored by LeadingEdje and hosted at The Forge by Pillar
The Workshop is a free, friendly, hands-on introduction to computer programming using Python. Women of all ages and backgrounds are the primary target (but you can bring a male participant as your guest).
Please spread the word!
I'm available
I'm available for hire! If you need a database expert with lots of programming skill, or a Python programmer with deep database experience, please check out:
But: you must be telecommute-friendly, or in the Dayton area. I'm sorry, but I'm not available to relocate.
IPython %helloworld extension
At Monday's after-PyOhio sprint, I changed ipython-sql from an IPython Plugin to an Extension; this makes it compatible with IPython 1.0. Fortunately, this was really easy; mostly I just deleted Plugin code I didn't understand anyway.
But I do feel like the "Writing Extensions" docs are lacking a "Hello World" example. Here's mine.
from IPython.core.magic import Magics, magics_class, line_magic, cell_magic

@magics_class
class HelloWorldMagics(Magics):
    """A simple Hello, <name> magic.
    """

    @line_magic # or ``@line_magic("hi")`` to make ``%hi`` the name of the magic
    @cell_magic
    def helloworld(self, line='', cell=None):
        """Virtually empty magic for demonstration purposes.

        Example::

            In [1]: %load_ext helloworld
            In [2]: %helloworld Catherine
            Out[2]: u'Hello, Catherine'
        """
        return "Hello, %s\n%s" % (line, cell or "")

def load_ipython_extension(ip):
    ip.register_magics(HelloWorldMagics)
PyOhio Stone Soup
Loved PyOhio once again! Thanks so much to everybody who came, participated, and made it happen! I get such a rush of joy from seeing the Ohio Union fill up with happy Pythonistas.
PyOhio has been a classic case of the Stone Soup story. When we started planning the first one, we really didn't have the resources to pull off a conference; we were just a handful of PyCon 2008 attendees who wanted to bring something like PyCon home. But as we put it together, people appeared, pitched in, and we had a modest, amateurish - but fun! - little conference in the Columbus Public Library. PyOhio 2008 drew participants and volunteers who helped make PyOhio 2009 bigger and better; 2009 drew in more involvement for 2010; and so forth, year after year.
July 26-27, 2014. See you in Columbus!
The IPython Notebook Revolution
I'd like to focus on aspects of IPython outside the traditional number-crunching, plot-making realm, simply because those have been covered so well already, not least in videos by the actual IPython team. I'd like to fill up a talk with edgy, imaginative, experimental uses of IPython that aren't well-known yet, or that suggest new ways IPython (and especially the Notebook) may be used in the future. I have a bunch of ideas along those lines...
... but I'd like your input! I don't want to miss anything awesome just because I wasn't aware, and there's a lot being done in the IPython world - more than I've been able to keep track of. Erik Welch has already thoughtfully given me a bunch of links and suggestions from SciPy. Let's crowdsource my talk even further!
Some of the goodies I already plan to include:
- notebook-based presentations
- ipython_blocks: (probably my Holy Grail of imaginative uses)
- d3js in IPython: (OK, this still fits the data graphing theme, but it's also ultra-snazzy)
- ipython_sql: (everybody's got to toot her own horn sometimes)
- ipfl (web-style forms in a Notebook - very preliminary but an interesting idea)
- xkcd and hand-drawn mode
- Wakari
How would you shake up people's notions of "what IPython is for"?
Easy HTML output in IPython Notebook
import markdown
class MD(str):
    def _repr_html_(self):
        return markdown.markdown(self)
Four little lines, and you can do this!