Tuesday, February 2, 2016

Use Python to upload files to swift storage.


NYPD Motor Vehicle Accidents
From the Bluemix example "NYPD Motor Vehicle Accidents", use the code up to the point where the Spark DataFrame is created.

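# add the pyspark-csv module to the SparkContext
sc.addPyFile("https://raw.githubusercontent.com/seahboonsiew/pyspark-csv/master/pyspark_csv.py")
import pyspark_csv as pycsv

# sc, sqlContext and credentials come from the earlier cells of the Bluemix example
collisions = sc.textFile("swift://hivecontainer." + credentials['name'] + "/NYPD_Motor_Vehicle_Collisions.csv")

# split the header row from the data rows (assumed here; the Bluemix example defines these variables before this step)
collisions_header = collisions.first()
collisions_header_list = collisions_header.split(",")
collisions_body = collisions.filter(lambda line: line != collisions_header)

# create Spark DataFrame using pyspark-csv
collisions_df = pycsv.csvToDataFrame(sqlContext, collisions_body, sep=",", columns=collisions_header_list)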

# convert to pandas (this collects the whole DataFrame into driver memory) and save it as a CSV file on the notebook's local disk
collisions_df.toPandas().to_csv('mycsv.csv')

# There are two ways to upload the resulting CSV file back to Swift object storage:

# 1. Install the swift CLI and run swift commands with the ! (shell) magic, as described at https://www.ng.bluemix.net/docs/services/ObjectStorage/index.html#using-swift-cli
# but that seems to error out, since python-dev and other modules are required to run "!pip install --user python-swiftclient",
# "pip install python-keystoneclient" and "pip install urllib3 certifi pyopenssl".

# 2. Use the ! magic with curl and the Swift object storage REST API (tested and working fine).

!curl -i -H "Content-Type: application/json" -d '{"auth": {"identity": {"methods": ["password"],"password": {"user": {"id": "cc8b1374d0de412fa1c7e201a4e90bce","password": "YOUR_PASSWORD"}}},"scope": {"project": {"id": "e4321c16ed084c06a9dc62ba810a61bf"}}}}'  https://identity.open.softlayer.com/v3/auth/tokens

HTTP/1.1 201 Created
Date: Wed, 03 Feb 2016 01:43:18 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_wsgi/3.4 Python/2.7.5
X-Subject-Token: gAAAAABWsVs3GT2stF………
Vary: X-Auth-Token
x-openstack-request-id: req-fb9adbc9-425d-4551-817b-9c19995f3107
Content-Length: 17448
Content-Type: application/json

Copy the X-Subject-Token value from the response headers and save it in a variable named token so it can be reused in the following commands.

token = "gAAAAABWsVs3GT2stFDdr…"
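Since the goal is to use Python, here is a minimal sketch of the same token request without curl, using only the standard library (json and urllib.request) and assuming a Python 3 kernel. build_auth_payload and get_token are hypothetical helper names, not part of any Bluemix library; the JSON body mirrors the curl command above, and the token is read from the X-Subject-Token response header instead of being copied by hand.

```python
import json
import urllib.request

# Keystone v3 identity endpoint used in the curl example
AUTH_URL = "https://identity.open.softlayer.com/v3/auth/tokens"


def build_auth_payload(user_id, password, project_id):
    """Build the Keystone v3 password-auth body sent by the curl command."""
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {"user": {"id": user_id, "password": password}},
            },
            "scope": {"project": {"id": project_id}},
        }
    }


def get_token(user_id, password, project_id, auth_url=AUTH_URL):
    """POST the credentials and return the X-Subject-Token response header."""
    body = json.dumps(build_auth_payload(user_id, password, project_id)).encode("utf-8")
    request = urllib.request.Request(
        auth_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Keystone answers 201 Created and puts the token in a header, not the body
        return response.headers["X-Subject-Token"]
```

Usage would be token = get_token(user_id, password, project_id) with the values from your Object Storage credentials.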

To get the size of the exported file, run the following:
!ls -l mycsv.csv
-rw-r--r-- 1 s027-20bcfe6e4297e8-2c631c8ff999 users 145549225 Feb  1 15:08 mycsv.csv
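The same number can be read directly in Python with os.path.getsize, which avoids parsing the ls output. file_size is only a hypothetical wrapper name for illustration.

```python
import os


def file_size(path):
    """Return the file size in bytes (the same figure that ls -l shows)."""
    return os.path.getsize(path)
```

For example, str(file_size("mycsv.csv")) gives the value to use for the Content-Length header.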


Now set the Content-Length header to the size of the exported file, and replace the other parameters to form the object storage URL, as explained here:
https://www.ng.bluemix.net/docs/services/ObjectStorage/index.html#using-swift-restapi
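To avoid typos when replacing the parameters, the object URL can also be assembled in Python. This sketch assumes the usual Swift layout endpoint/v1/AUTH_project-id/container/object, where v1 is the Swift API version (the v3 in the identity URL belongs to Keystone, not to Swift). object_url is a hypothetical helper name; container and object names are percent-encoded with urllib.parse.quote so that spaces and similar characters survive.

```python
from urllib.parse import quote


def object_url(endpoint, project_id, container, object_name, api_version="v1"):
    """Form <endpoint>/<api_version>/AUTH_<project_id>/<container>/<object>."""
    return "{}/{}/AUTH_{}/{}/{}".format(
        endpoint.rstrip("/"),
        api_version,
        project_id,
        quote(container, safe=""),
        quote(object_name),  # "/" stays unescaped so pseudo-folder names still work
    )
```

For this post that would be object_url("https://dal.objectstorage.open.softlayer.com", "e4321c16ed084c06a9dc62ba810a61bf", "hivecontainer", "mycsv.csv").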

!curl -X PUT -H "X-Auth-Token:$token" -H "Content-Length: 145549225" https://dal.objectstorage.open.softlayer.com/v1/AUTH_e4321c16ed084c06a9dc62ba810a61bf/hivecontainer/mycsv.csv -T mycsv.csv
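For a pure-Python upload, the curl PUT can be sketched with urllib.request (Python 3 assumed). Passing the open file as the request body lets urllib stream it in blocks instead of loading the whole CSV into memory, and setting Content-Length explicitly mirrors the curl command (it also keeps urllib from switching to chunked transfer encoding). upload_file is a hypothetical name; a successful Swift PUT returns 201 Created.

```python
import os
import urllib.request


def upload_file(url, token, path):
    """PUT a local file to a Swift object URL, like curl -X PUT -T."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        request = urllib.request.Request(
            url,
            data=f,  # file object: sent in blocks, not read fully into memory
            headers={"X-Auth-Token": token, "Content-Length": str(size)},
            method="PUT",
        )
        with urllib.request.urlopen(request) as response:
            return response.status  # 201 Created when the object was stored
```

Note that Swift limits a single PUT to about 5 GB by default, so much larger exports would need a segmented upload.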


You can now verify that mycsv.csv was uploaded by opening the hivecontainer container in the Object Storage (Swift) service.
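The check can also be scripted: a HEAD request on the object returns its metadata, including Content-Length, without downloading it. The sketch below (hypothetical helper names, Python 3 assumed) only compares the stored size with the local file; for a content check, Swift's ETag header (the MD5 of the object for non-segmented uploads) could be compared as well.

```python
import os
import urllib.request


def remote_object_size(url, token):
    """HEAD the Swift object and return its size from the Content-Length header."""
    request = urllib.request.Request(
        url, headers={"X-Auth-Token": token}, method="HEAD"
    )
    with urllib.request.urlopen(request) as response:
        return int(response.headers["Content-Length"])


def upload_matches(url, token, path):
    """Return True when the stored object has the same size as the local file."""
    return remote_object_size(url, token) == os.path.getsize(path)
```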


Thanks,
Charles.
