Home » RDBMS Server » Server Utilities » loading data from twitter with sqlldr (sqlloader )
loading data from twitter with sqlldr [message #667129] |
Fri, 08 December 2017 12:18 |
|
rocioperezn
Messages: 2 Registered: December 2017
|
Junior Member |
|
|
Hello all,
I have some problems loading a twitter data from a csv, with some multiline records. I have 4 columns id_tweet,text,created_at, username .
When text is enclosed by " the register is multiline or has comas inside, I show you 3 examples, 1st enclosed by " with comas inside the text, 2nd enclosed by " and multiline, 3rd no enclosed by coma no multiline:
793418980342063104,"@realDonaldTrump I am also a $ donor, i can not give much in the way of $ but i do.(poor) My advice is more valuable if you have used it.",2016-11-01T11:46:40.000Z, ReynardGosso
793418981415583745,"@FranklinFoer
Thanks @realDonaldTrump you’re doing a great Job helping to elect @HillaryClinton !
",2016-11-01T11:46:40.000Z,759714473556840448,false,wabroussard1,,,0,-1,1,0
793418985458929664,This is the 5 year old Nazi German-American @realDonaldTrump that wants 2 be president. No Americans elect adults.,2016-11-01T11:46:41.000Z, AnthonyNnani
I´m using sqldr but when I use
load data
infile 'tweetv1.csv' "str '\r\n'"
append continueif last != ','
into table TWEETS
fields terminated by ','
OPTIONALLY ENCLOSED BY '"' AND '"'
trailing nullcols
( id_TWEET CHAR(4000),
text CHAR(4000),
created_at CHAR(4000),
userid CHAR(4000)
)
It doesn´t recognize when it's a new logical record.
Any advice on how to do it?
Thanks a lot,
Rocio
|
|
|
Re: loading data from twitter with sqlldr [message #667131 is a reply to message #667129] |
Fri, 08 December 2017 22:45 |
|
Barbara Boehmer
Messages: 9101 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
You need to have something in your data that can be used to either identify when a record begins or when a record ends. You have used
continueif last != ','
to tell it that the record is continued on the next line if the last character on the current line is not a comma, but there are no commas at the ends of your records. In the following example, I have instead used
continueif next preserve(1) != '7'
to tell it that the record is continued on the next line if the next character on the next line is not a 7 and, if it is a 7, then that begins the next record and to preserve that 7 as part of the first field. This, obviously will only work if every id_tweet begins with a 7. If not, then you will need to obtain your data in a different format, in order to ensure being able to load all records, finding some way to either add an end of record character or beginning of record character.
SCOTT@orcl_12.1.0.2.0> host type tweetv1.csv
793418980342063104,"@realDonaldTrump I am also a $ donor, i can not give much in the way of $ but i do.(poor) My advice is more valuable if you have used it.",2016-11-01T11:46:40.000Z, ReynardGosso
793418981415583745,"@FranklinFoer
Thanks @realDonaldTrump youΓÇÖre doing a great Job helping to elect @HillaryClinton !
",2016-11-01T11:46:40.000Z,759714473556840448,false,wabroussard1,,,0,-1,1,0
793418985458929664,This is the 5 year old Nazi German-American @realDonaldTrump that wants 2 be president. No Americans elect adults.,2016-11-01T11:46:41.000Z, AnthonyNnani
SCOTT@orcl_12.1.0.2.0> host type test.ctl
load data
infile 'tweetv1.csv' "str '\r\n'"
append continueif next preserve(1) != '7'
into table TWEETS
fields terminated by ','
OPTIONALLY ENCLOSED BY '"' AND '"'
trailing nullcols
( id_TWEET CHAR(4000)
, text CHAR(4000)
, created_at CHAR(4000)
, userid CHAR(4000) )
SCOTT@orcl_12.1.0.2.0> create table tweets
2 (id_tweet varchar2(4000),
3 text varchar2(4000),
4 created_at varchar2(4000),
5 userid varchar2(4000))
6 /
Table created.
SCOTT@orcl_12.1.0.2.0> host sqlldr scott/tiger control=test.ctl log=test.log
SQL*Loader: Release 12.1.0.2.0 - Production on Fri Dec 8 20:38:26 2017
Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.
Path used: Conventional
Commit point reached - logical record count 2
Commit point reached - logical record count 3
Table TWEETS:
3 Rows successfully loaded.
Check the log file:
test.log
for more information about the load.
SCOTT@orcl_12.1.0.2.0> column id_tweet format a18
SCOTT@orcl_12.1.0.2.0> column text format a60 word_wrapped
SCOTT@orcl_12.1.0.2.0> column created_at format a24
SCOTT@orcl_12.1.0.2.0> column userid format a13 word_wrapped
SCOTT@orcl_12.1.0.2.0> select * from tweets
2 /
ID_TWEET TEXT CREATED_AT USERID
------------------ ------------------------------------------------------------ ------------------------ -------------
793418980342063104 @realDonaldTrump I am also a $ donor, i can not give much in 2016-11-01T11:46:40.000Z ReynardGosso
the way of $ but i do.(poor) My advice is more valuable if
you have used it.
793418981415583745 @FranklinFoerThanks @realDonaldTrump youΓÇÖre doing a great 2016-11-01T11:46:40.000Z 7597144735568
Job helping to elect @HillaryClinton ! 40448
793418985458929664 This is the 5 year old Nazi German-American @realDonaldTrump 2016-11-01T11:46:41.000Z AnthonyNnani
that wants 2 be president. No Americans elect adults.
3 rows selected.
|
|
|
|
|
Goto Forum:
Current Time: Mon Dec 23 02:35:59 CST 2024
|