Jeff Moss
Using Lifecycle Management on Azure Blob Storage Account Containers
One of our clients wanted to set up a process to delete files from their blob storage containers after they aged beyond a certain time limit. Initially the conversation went down the route of creating some configuration information to drive the process, stored in a JSON file or a database, which a scheduled process would then use to apply the rules across all the storage accounts…then I thought, surely Microsoft have dealt with this already…and of course they have: Lifecycle Management on Storage Accounts, as documented here.
The Lifecycle Management Blade
Let’s take a rule we wanted to create and see how we can do that using Lifecycle Management. Imagine we want to remove files from a given folder in a given container on the Storage Account after thirty days.
On the Lifecycle Management blade click on “Add a rule” which brings up the first page of the wizard. Give the rule a sensible name (probably useful to define a naming standard that your organisation is comfortable with).
Set the rule scope which is either all blobs in the storage account or limit with filter – in this case we want to limit it to a particular container/folder and you’ll notice that when you select that radio button it adds a third page to the wizard (to set the limit settings).
Set the Blob type and Subtype – in this case I just want the defaults of Block Blobs and Base Blobs.
The page looks like this:
Lifecycle Management Wizard – Page 1
Click Next to move to page 2 of the wizard where we need to set the rules of what to do. The screen shows only the option of “Last Modified” but requires you to fill out the value for “More than (days ago)” – 30 in my case. Leave the Then part of the rule as “Delete the blob” so page 2 looks like this:
Lifecycle Management Wizard – Page 2
Click Next and we move on to page 3 where we need to set the Prefix Match. In my case I want to apply this rule to files found in the “output” container and the “myFolder” folder so I set the values and page 3 looks like this:
Lifecycle Management Wizard – Page 3
Now click on Add and the rule is created:
Lifecycle Management Rule Added
You’ll see there is a List View showing the new policy in place. There is also a Code View which shows the policy in JSON format like so:
Lifecycle Management Policy – Code View
That’s it.
Other things to know…
- The feature is free – well, the feature itself and deletes of block blobs are – but if you use it to move files to a different tier then that is a Set Blob Tier API call, which incurs a regular operation cost
- Maintenance of the policies is via the Portal, PowerShell, CLI or REST APIs – a PowerShell sketch follows this list
- Policies can also be rolled out via ARM Templates using the Microsoft.Storage/storageAccounts/managementPolicies type
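As an example of the PowerShell route, the 30-day delete rule from this post could be created with the Az.Storage management policy cmdlets along these lines – a minimal sketch, assuming the Az.Storage module is installed and substituting your own resource group and storage account names:

# Sketch: recreate the 30-day delete rule for the output/myFolder prefix.
# Resource group and storage account names below are placeholders.
$action = Add-AzStorageAccountManagementPolicyAction -BaseBlobAction Delete -DaysAfterModificationGreaterThan 30
$filter = New-AzStorageAccountManagementPolicyFilter -PrefixMatch "output/myFolder" -BlobType blockBlob
$rule   = New-AzStorageAccountManagementPolicyRule -Name "delete-output-myfolder-30d" -Action $action -Filter $filter
Set-AzStorageAccountManagementPolicy -ResourceGroupName "my-rg" -StorageAccountName "mystorageaccount" -Rule $rule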
Make sure you use base64 encoding on a Key Vault Secret for an sFTP Azure Data Factory Linked Service Connection
A quick post on setting up an sFTP Linked Service Connection in Azure Data Factory such that it uses a Key Vault for the SSH Key.
A friend of mine had tried setting this up but was getting an error when testing the new Linked Service.
The Linked Service was using a Key Vault to obtain the SSH Key to be used in the connection. The SSH Key had been uploaded as a Secret to the Key Vault using code similar to the following:
az keyvault secret set --name sshkey --vault-name akv-dev --file test.ssh --description "Test SSH Key"
After reading through the documentation on the az keyvault secret set call I noticed that the default encoding is not base64 but utf-8.
We modified the az call to something like this:
az keyvault secret set --name sshkey --vault-name akv-dev --file test.ssh --description "Test SSH Key" --encoding base64
i.e. with the addition of the --encoding base64 part and then it worked fine.
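If you prefer PowerShell to the Azure CLI, a rough equivalent is to base64-encode the key file yourself before storing it – a sketch assuming the Az.KeyVault module, reusing the vault, secret and file names from the example above:

# Read the SSH key file, base64-encode it and store it as a Key Vault secret.
$bytes  = [System.IO.File]::ReadAllBytes("test.ssh")
$base64 = [System.Convert]::ToBase64String($bytes)
$secure = ConvertTo-SecureString -String $base64 -AsPlainText -Force
Set-AzKeyVaultSecret -VaultName "akv-dev" -Name "sshkey" -SecretValue $secure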
Encrypt Acquired FTP file Dynamically Before Storing In Storage Account
I encountered a requirement to acquire some unencrypted files from an on-premises FTP server and place them on a target Azure Storage Account after first encrypting them. The goal is to not have these files in Azure unless they are encrypted.
This could be achieved in a number of ways, but this post is about using Logic Apps to do it.
Pre-Requisites
The following are assumed to be in place already:
- An accessible FTP Server where the source file is hosted
- An accessible Storage Account where the encrypted file is to be written
- An accessible Key Vault with an encryption Key
I’m going to use Azure Portal to do this.
First type Logic App in the search bar on the dashboard/home screen. Click on Logic App to bring up the screen.
You can see I have already attempted the action but we’ll create a new one for the purposes of writing this post. Hit Add and fill out the details for a new Logic App:
I’ve used an existing resource group but you could create one if needed. I called the Logic App “encrypt-ftp-to-storageaccount”. Hit Review + Create and it will validate before offering the create screen:
Hit Create and the deployment will execute until completion:
Now hit Go to resource and it will bring up the initial screen for the new Logic App…
Scroll down and click on “Blank Logic App”:
Notice that because I’ve already been playing in this area my Recent selections include some of the components we’re going to use here.
We need some kind of simple trigger to kick this off so let’s just use a Schedule with a once a day recurrence. Click on Schedule and then Recurrence and set the Interval to 1 and the Frequency to Day:
Now add a new step – we want to acquire the file from the FTP server.
Click on New Step.
Click on FTP if it is in your recent list or search for it. Then choose Get File Content (or Get File Content With Path if you need to specify a path). You then get the action on the editor with a number of fields to fill out:
Give the Connection a name (e.g. “ftp source”). Set the FTP server address, username, password and port number. I’ve not tried changing any of the other details but some of them might be sensible to address in your own environment. Click Create to create this action.
Now you can select a file – this could be programmed but for this exercise I’m just going to point it at a fixed file (robots.txt):
Now add another step by clicking “+ New Step”:
Choose Azure Key Vault from the Recent (if available) or by searching and then select “Encrypt data with key”. This creates the step:
Depending on whether you are already connected, you may see that it tries to use an existing connection; if not, it asks you to specify the connection to the required Key Vault. In my case above I have a connection but I’ll choose to change that to show what needs to be set. Hitting Change Connection brings up this:
Choose Add New and it shows:
Set the Vault name to the name of the Key Vault where the Key for encryption is held, in my case akv-dev and click Sign In:
Azure brings up the usual credentials access dialog to allow you to connect.
Once connected you get the dialog box to select the Key:
I choose my Key (akv-dev-testkey) and set the Raw Data to the Dynamic Content value of File Content.
Now click on “+ New Step” again to add the write out of the data to the Storage Account.
Choose Azure Blob Storage and Create Blob:
I set the Connection Name and choose the Storage Account (oramossadls2) and then hit Create.
This creates the Create Blob step, where we can specify the folder path on the target Storage Account where we want to create the file. We can specify the target file name (robots.txt) and then we should specify the Dynamic Content of the encrypted data as the Blob Content – but notice that the Dynamic Content doesn’t show it. It does, however, show the message “We can’t find any outputs to match this input format. Select See more to see all outputs from previous actions”:
Click on “See more” and it will show “encryptedData” as an option:
Choose “encryptedData” so that the Create blob dialog looks like:
Save the Logic App and Run it.
The Logic App runs and the output looks like this:
If we look on the Storage Account we see the file robots.txt:
And if I looked at the file in an editor it looks like:
Hope this helps.
Configure Linked Templates Use for Azure Data Factory with Azure DevOps Deployment
I’ve finally worked out how to do this so I thought I’d write a post on it since I can’t find any single resource that accurately covers it all – my apologies, in advance, if someone has already done this.
I’ve been using Azure Data Factory v2 for quite a while now and have it integrated with Azure DevOps for CI/CD between environments. I follow the standard approach, which is documented here, and I won’t repeat it.
I’ll assume that you have a git enabled source Data Factory and a non git enabled target Data Factory and that your main code branch is “master” and the publish branch is “adf_publish”.
As the documentation there says:
“If you’ve configured Git, the linked templates are generated and saved alongside the full Resource Manager templates in the adf_publish branch in a new folder called linkedTemplates”
…that happens when you publish the master branch from Azure Data Factory.
We can see the non-linked template files (ARMTemplateForFactory.json, ARMTemplateParametersForFactory.json) and linked template files (ArmTemplate_master.json, ArmTemplateParameters_master.json and ArmTemplate_0.json) in the picture below:
Note – this is a very small demonstration factory and there is only one linked template file (ArmTemplate_0.json) – as the factory grows in size additional, consecutively numbered, files will appear.
The question then is how do you get Azure DevOps to use those Linked Template files instead of the non linked ones sitting in the adf_publish branch root directory?
Supposedly you can just follow this link but unfortunately that document is a little out of date and no longer being updated. The document covers the deployment of a VNET with Network Security Group and makes no mention of Azure Data Factory but it still provides some useful pointers.
The document correctly points out that in order for the linked templates to be deployed they need to be accessible to Azure Resource Manager and the easiest way of doing that is by having the files in an Azure Storage Account – that article illustrates the use of “Storage (general purpose v1)” but I used a Gen 2 ADLS Storage Account instead and that worked fine.
My Gen 2 Storage Account “oramossadls2” looks like this:
I then created a Shared Access Signature for oramossadls2 Storage Account and copied the SAS token which I then put into a secret called StorageSASToken in an Azure Key Vault called akv-dev:
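For reference, something similar could be done from PowerShell rather than in the Portal – a sketch, where the permissions, expiry and resource group name are assumptions:

# Generate an account-level SAS token for the storage account and store it in
# Key Vault as the StorageSASToken secret (names match those used in this post).
$ctx = (Get-AzStorageAccount -ResourceGroupName "my-rg" -Name "oramossadls2").Context
$sas = New-AzStorageAccountSASToken -Service Blob -ResourceType Service,Container,Object -Permission "rl" -ExpiryTime (Get-Date).AddMonths(6) -Context $ctx
Set-AzKeyVaultSecret -VaultName "akv-dev" -Name "StorageSASToken" -SecretValue (ConvertTo-SecureString -String $sas -AsPlainText -Force)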
I created an Access Policy on this Key Vault to allow the Azure DevOps Service Principal to be able to read the Secret:
On my Gen 2 ADLS Storage Account I then created a Container called demo:
From the container properties the URL looks like:
https://oramossadls2.blob.core.windows.net/demo
In Azure Storage Explorer, I grant access to the container to the Service Principal of my Azure DevOps site in order that it can access the files:
In Azure DevOps I then created a Variable Group called akv-dev which brings in the StorageSASToken Secret from the akv-dev Azure Key Vault:
I created a second Variable Group called “Production-Static” in which I created some more variables for use later on:
Following this article from Kamil Nowinski I have a Build Pipeline “ADF-CI” in Azure DevOps which stores the ARM template files as artifacts ready for use on a Release.
Now for the bit that took me a while to work out…the Release Pipeline.
My release pipeline has two artifacts – the output of the latest Build Pipeline run (_ADF-CI) and the code from the “master” branch – and a single stage with five tasks:
The Production-Static and akv-dev Variable Groups are attached to the Pipeline:
The five tasks are:
The first step copies the files on the latest Build, attached as an artifact (_ADF-CI) to this Pipeline, to the ADLS Gen2 Storage Account that I’ve created:
The value of StorageAccountName from the “Production-Static” Variable Group is “oramossadls2” and blobContainerName is “demo”.
The YAML for step 1 is:
steps:
- task: AzureFileCopy@3
  displayName: 'AzureBlob File Copy'
  inputs:
    SourcePath: '$(System.DefaultWorkingDirectory)/_ADF-CI'
    azureSubscription: 'Pay-As-You-Go (xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)'
    Destination: AzureBlob
    storage: '$(StorageAccountName)'
    ContainerName: '$(blobContainerName)'
    BlobPrefix: adf
When this step eventually runs the Storage Account will look like this:
The second step stops the ADF Triggers – if you don’t stop active triggers the deployment can fail. I use a Powershell script to do this:
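The script itself isn’t reproduced in this post, but a minimal sketch of what SetADFTriggersState.ps1 might look like is below. The parameter names match the arguments passed in from the YAML; the logic is an assumption (a real implementation of “StartPriorEnabled” would also record which triggers were running before the stop):

# Sketch of a trigger stop/start script for ADF deployments (assumed outline).
param(
    [string]$DataFactoryName,
    [string]$DataFactoryResourceGroupName,
    [ValidateSet("Stop","StartPriorEnabled")][string]$State,
    [string]$ReleaseIdentifier
)

Write-Host "Release $ReleaseIdentifier : setting triggers to $State on $DataFactoryName"

# Get all triggers in the factory and stop or start them as requested.
$triggers = Get-AzDataFactoryV2Trigger -ResourceGroupName $DataFactoryResourceGroupName -DataFactoryName $DataFactoryName

foreach ($trigger in $triggers) {
    if ($State -eq "Stop" -and $trigger.RuntimeState -eq "Started") {
        Stop-AzDataFactoryV2Trigger -ResourceGroupName $DataFactoryResourceGroupName -DataFactoryName $DataFactoryName -Name $trigger.Name -Force
    }
    elseif ($State -eq "StartPriorEnabled") {
        Start-AzDataFactoryV2Trigger -ResourceGroupName $DataFactoryResourceGroupName -DataFactoryName $DataFactoryName -Name $trigger.Name -Force
    }
}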
The YAML for step 2 looks like this:
steps:
- task: AzurePowerShell@4
  displayName: 'Stop ADF Triggers'
  inputs:
    azureSubscription: 'Pay-As-You-Go (xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)'
    ScriptPath: '$(System.DefaultWorkingDirectory)/_ADF/powershell/SetADFTriggersState.ps1'
    ScriptArguments: '-DataFactoryName $(DataFactoryName) -DataFactoryResourceGroupName $(DataFactoryResourceGroupName) -State "Stop" -ReleaseIdentifier $(Release.Artifacts._ADF.BuildId)'
    azurePowerShellVersion: LatestVersion
The third step actually deploys the template to the target environment (in this case oramoss-prod Data Factory resource group “adf-prod-rg”):
It took a while to work this out. First I tried using Template Location of “Linked Artifact” which allows you just to select the template/parameter file from the attached artifacts but that doesn’t work because “nested templates ALWAYS have to be deployed from url” according to this. So, we have to set Template Location to “URL of the file”. We then have to specify the Template and Template Parameter file link using a URL which consists of the Primary Blob Service Endpoint, the container, Blob Prefix, folder hierarchy, filename and the SAS Storage Key, i.e.
Template File:
https://oramossadls2.blob.core.windows.net/demo/adf/drop/linkedTemplates/ArmTemplate_master.json$(StorageSASToken)
Parameters File:
https://oramossadls2.blob.core.windows.net/demo/adf/drop/linkedTemplates/ArmTemplateParameters_master.json$(StorageSASToken)
Note – we are getting the Storage SAS Token from the attached Variable group akv-dev.
We then have to override some parameters:
-factoryName "oramoss-prod" -containerUri $(AzureBlobStorageURL)/$(blobContainerName)/adf/drop/linkedTemplates -containerSasToken $(StorageSASToken)
The factoryName needs to be overridden because we are moving the code from one Data Factory to the next.
In this article it suggests that the parameter for the URI of the template files is called “templateBaseUrl”, but this appears to have been changed to “containerUri”, and we set it to the Primary Blob Service Endpoint, Container Name and directory where the template files are held.
The article also suggests the Storage SAS Token parameter is called “SASToken”, but it appears to now be “containerSasToken”.
The YAML for step 3 looks like:
steps:
- task: AzureResourceManagerTemplateDeployment@3
  displayName: 'Deploy ADF ARM Template'
  inputs:
    azureResourceManagerConnection: 'Pay-As-You-Go (xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)'
    subscriptionId: 'xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
    resourceGroupName: 'adf-prod-rg'
    location: 'North Europe'
    templateLocation: 'URL of the file'
    csmFileLink: 'https://oramossadls2.blob.core.windows.net/demo/adf/drop/linkedTemplates/ArmTemplate_master.json$(StorageSASToken)'
    csmParametersFileLink: 'https://oramossadls2.blob.core.windows.net/demo/adf/drop/linkedTemplates/ArmTemplateParameters_master.json$(StorageSASToken)'
    overrideParameters: '-factoryName "oramoss-prod" -containerUri $(AzureBlobStorageURL)/$(blobContainerName)/adf/drop/linkedTemplates -containerSasToken $(StorageSASToken)'
    deploymentName: 'oramoss-prod-deploy'
The fourth step removes orphaned resources. Because we use an incremental approach to pushing the ARM template to oramoss-prod it means that if we dropped an element from oramoss-dev Data Factory it would not automatically get removed from oramoss-prod Data Factory, i.e. the element would be orphaned in oramoss-prod. This step removes any such elements it finds using a Powershell script.
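The actual RemoveOrphanedADFResources.ps1 script isn’t reproduced in this post either, but the general shape is: read the ARM template, work out which resources it contains, and drop anything in the target factory that isn’t in that list. A much-simplified sketch for pipelines only – the real thing would also need to cover datasets, linked services, triggers and data flows, and the template parsing below is an assumption:

# Simplified sketch (assumed logic, not the actual RemoveOrphanedADFResources.ps1):
# remove pipelines that exist in the target factory but not in the ARM template.
param(
    [string]$DataFactoryName,
    [string]$DataFactoryResourceGroupName,
    [string]$armTemplate
)

$template = Get-Content -Raw -Path $armTemplate | ConvertFrom-Json

# ARM template pipeline resources are named "[concat(parameters('factoryName'), '/<pipeline name>')]"
$templatePipelineNames = ($template.resources | Where-Object { $_.type -eq "Microsoft.DataFactory/factories/pipelines" }).name

$deployedPipelines = Get-AzDataFactoryV2Pipeline -ResourceGroupName $DataFactoryResourceGroupName -DataFactoryName $DataFactoryName

foreach ($pipeline in $deployedPipelines) {
    $inTemplate = $templatePipelineNames | Where-Object { $_ -like "*'/$($pipeline.Name)'*" }
    if (-not $inTemplate) {
        Write-Host "Removing orphaned pipeline $($pipeline.Name)"
        Remove-AzDataFactoryV2Pipeline -ResourceGroupName $DataFactoryResourceGroupName -DataFactoryName $DataFactoryName -Name $pipeline.Name -Force
    }
}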
The YAML for step 4 looks like:
steps:
- task: AzurePowerShell@4
  displayName: 'Remove Orphans'
  inputs:
    azureSubscription: 'Pay-As-You-Go (xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)'
    ScriptPath: '$(System.DefaultWorkingDirectory)/_ADF/powershell/RemoveOrphanedADFResources.ps1'
    ScriptArguments: '-DataFactoryName $(DataFactoryName) -DataFactoryResourceGroupName $(DataFactoryResourceGroupName) -armTemplate $(System.DefaultWorkingDirectory)/_ADF-CI/drop/ARMTemplateForFactory.json'
    azurePowerShellVersion: LatestVersion
The last step runs the same script as the second step but with State set to “StartPriorEnabled” instead of “Stop”.
The YAML for step 5 looks like:
steps:
- task: AzurePowerShell@4
  displayName: 'Start ADF Triggers'
  inputs:
    azureSubscription: 'Pay-As-You-Go (xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)'
    ScriptPath: '$(System.DefaultWorkingDirectory)/_ADF/powershell/SetADFTriggersState.ps1'
    ScriptArguments: '-DataFactoryName $(DataFactoryName) -DataFactoryResourceGroupName $(DataFactoryResourceGroupName) -State "StartPriorEnabled" -ReleaseIdentifier $(Release.Artifacts._ADF.BuildId)'
    azurePowerShellVersion: LatestVersion
That’s it. Save the Release Pipeline and run it and the output should look similar to:
Helpful Links
Get Azure Networking Hierarchy Components With Powershell
I needed to see the VNETs, their subnets and the NIC/IP configurations attached to those subnets. That information is all available in the Azure Portal, but I wanted a nice hierarchical listing, so here is a PowerShell script for that…
$vnets = Get-AzVirtualNetwork
foreach ($vnet in $vnets)
{
    "VNET: " + $vnet.Name
    "Subnet Count: " + $vnet.Subnets.count
    foreach ($subnet in $vnet.Subnets)
    {
        "..Subnet: " + $subnet.Name
        if ($subnet.IpConfigurations.count -eq 0)
        {"Subnet has no IpConfigurations"}
        else
        {
            foreach ($IpConfiguration in $subnet.IpConfigurations)
            {
                "...." + $IpConfiguration.Id.substring($IpConfiguration.Id.indexof("resourceGroups")+15)
            }
        }
    }
}
The output looks like this (redacted):
VNET: myvnet1
Subnet Count: 1
..Subnet: mysubnet1
Subnet has no IpConfigurations
VNET: myvnet2
Subnet Count: 1
..Subnet: mysubnet2
....myresourcegroup2/providers/Microsoft.Network/networkInterfaces/myvm1-nic1/ipConfigurations/ipconfig1
....myresourcegroup2/providers/Microsoft.Network/networkInterfaces/myvm2-nic1/ipConfigurations/ipconfig1
VNET: myvnet3
Subnet Count: 0
Azure Advisor And Fixing Errors
Azure can be configured to send you advisor reports detailing things that are not quite right in your environment. The advisor is not necessarily always right but it’s sensible to review the outputs periodically, even if they relate to non production environments.
A few issues popped up on an advisor report on my recent travels, and although you can just use the entries on the report in the portal to target the offending resources, I thought it might be helpful to write some PowerShell to identify them as an alternative.
Secure transfer to storage accounts should be enabled
This error shows up on the report similar to this:
Fairly obvious what this means really – the storage account has a setting which is currently set to allow insecure transfers (via http rather than https) – an example looks like this under the Configuration blade of the Storage Account:
The advisor highlights this and the solution is to just set the toggle to Enabled for “Secure transfer required” and press save.
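You can also flip the setting from PowerShell rather than in the Portal – something like this sketch, substituting your own resource group and account names:

# Enforce HTTPS-only traffic on a storage account (names are placeholders).
Set-AzStorageAccount -ResourceGroupName "my-rg" -Name "mystorageaccount" -EnableHttpsTrafficOnly $true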
To identify all the storage accounts which have this issue use the following:
Get-AzStorageAccount | where {$_.EnableHttpsTrafficOnly -eq $False}
This gives output similar to the following (redacted):
PS Azure:> Get-AzStorageAccount | where {$_.EnableHttpsTrafficOnly -eq $False}
StorageAccountName ResourceGroupName Location    SkuName      Kind    AccessTier CreationTime       ProvisioningState EnableHttpsTrafficOnly
------------------ ----------------- --------    -------      ----    ---------- ------------       ----------------- ----------------------
XXXXXXXXXXXXXXXXXX AAAAAAAAAAAAAAA   northeurope Standard_LRS Storage            9/6/19 9:51:53 PM  Succeeded         False
YYYYYYYYYYYYYYYYYY AAAAAAAAAAAAAAA   northeurope Standard_LRS Storage            6/26/19 3:29:38 PM Succeeded         False

An Azure Active Directory administrator should be provisioned for SQL servers
This one appears like the following in the advisor output:
As a long term Oracle guy I’m no SQL Server expert so I can’t quite see why this is an issue if you have a SQL Server authenticated administrative user active – no doubt a friendly SQL DBA will chime in and explain.
To fix this, navigate to the Active Directory admin blade of the SQL Server in question, select “Set admin”, choose a user from the Active Directory and press Save.
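The same fix can be scripted – a sketch using the Az.Sql module (the resource group, server and user names below are placeholders):

# Set an Azure AD administrator on a SQL Server (placeholder names).
Set-AzSqlServerActiveDirectoryAdministrator -ResourceGroupName "my-rg" -ServerName "mysqlserver2" -DisplayName "clark.kent@dailyplanet.com"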
To find all SQL Servers affected by this I wrote the following Powershell:
$sqlservers = Get-AzResource -ResourceType Microsoft.Sql/servers
foreach ($sqlserver in $sqlservers)
{
    $sqlserver.Name
    $ADAdmin = Get-AzureRmSqlServerActiveDirectoryAdministrator -ServerName $sqlserver.Name -ResourceGroupName $sqlserver.ResourceGroupName
    "AD Administrator:" + $ADAdmin.DisplayName + "/" + $ADAdmin.ObjectId
}
This returns output similar to the following (redacted):
mysqlserver1
AD Administrator:clark.kent@dailyplanet.com/fe93c742-d83c-2b4c-bc38-83bc34def38c
mysqlserver2
AD Administrator:/
mysqlserver3
AD Administrator:clark.kent@dailyplanet.com/fe93c742-d83c-2b4c-bc38-83bc34def38c
mysqlserver4
AD Administrator:clark.kent@dailyplanet.com/fe93c742-d83c-2b4c-bc38-83bc34def38c
From the above you can see that mysqlserver2 has no AD Administrator and will be showing up on the advisor report.
Enable virtual machine backup to protect your data from corruption and accidental deletion
This one appears like the following in the advisor output:
To fix this, navigate to the Backup blade on the VM Resource in question and set the appropriate settings to enable the backup.
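This can also be scripted – a rough sketch with the Az.RecoveryServices module, where the vault, policy, resource group and VM names are all assumptions:

# Enable backup for a VM using an existing Recovery Services vault and policy
# (all names here are placeholders).
$vault  = Get-AzRecoveryServicesVault -ResourceGroupName "my-rg" -Name "myVault"
$policy = Get-AzRecoveryServicesBackupProtectionPolicy -Name "DefaultPolicy" -VaultId $vault.ID
Enable-AzRecoveryServicesBackupProtection -ResourceGroupName "myResourceGroup" -Name "myVM2" -Policy $policy -VaultId $vault.ID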
To identify VMs where this issue is evident use the following Powershell:
$VMs = Get-AzVM
foreach ($VM in $VMs)
{
    "VM: " + $VM.Name
    $RecoveryServicesVaults = Get-AzRecoveryServicesVault
    foreach ($RecoveryServicesVault in $RecoveryServicesVaults)
    {
        Get-AzRecoveryServicesBackupContainer -VaultID $RecoveryServicesVault.ID -ContainerType "AzureVM" -Status "Registered" -FriendlyName $VM.Name
    }
}
This gives results similar to the following, allowing you to see VMs where no backup is enabled:
VM: myVM1
FriendlyName ResourceGroupName Status     ContainerType
------------ ----------------- ------     -------------
myVM1        myResourceGroup   Registered AzureVM
myVM1        myResourceGroup   Registered AzureVM
myVM1        myResourceGroup   Registered AzureVM
VM: myVM2
VM: myVM3
myVM3        myResourceGroup   Registered AzureVM
myVM3        myResourceGroup   Registered AzureVM
myVM3        myResourceGroup   Registered AzureVM
What you can see from the above is that myVM1 and myVM3 both have registered backups, unlike myVM2 which has none – so myVM2 needs backup enabling.
Finding databases on each SQL Server using Powershell
A client had a requirement this week to list out the SQL Servers in their Azure environment and the databases installed on them. The reason was to find SQL Servers that no longer had any databases so they could be considered for removal.
I wrote the following PowerShell. Essentially, it gathers a list of SQL Server resources, loops through them, and counts and itemises their databases, excluding the master database since that’s not relevant to the requirement:
$sqlservers = Get-AzResource -ResourceType Microsoft.Sql/servers
foreach ($sqlserver in $sqlservers)
{
    $sqlserver.Name
    $databases = Get-AzResource -ResourceType Microsoft.Sql/servers/databases|Where-Object {$_.Name -notlike "master"}|Where-Object {$_.Name -like $sqlserver.Name + "/*"}
    "Database Count:" + $databases.Count
    "Databases…"
    ">>>" + $databases.Name
}
Which returns the following type of output (amended for privacy):
mytestsqlserver1
Database Count:0
Databases...
>>>
mytestsqlserver2
Database Count:1
Databases...
>>>mytestsqlserver2/mydatabase1
mytestsqlserver3
Database Count:1
Databases...
>>>mytestsqlserver3/mydatabase2
mytestsqlserver4
Database Count:3
Databases...
>>>mytestsqlserver4/mydatabase3 mytestsqlserver4/mydatabase4 mytestsqlserver4/mydatabase5
Azure Active Directory (AAD)
Get the object id of a user by their UPN:
PS Azure:> $azureaduser=$(az ad user list --filter "userPrincipalName eq 'Fred.Smith@acme.com'" --query [].objectId --output tsv)
PS Azure:> $azureaduser
a5cga394-f1da-4f2c-bc4b-931c10bad9e1
Show all users in AAD:
az ad user list --query [].userPrincipalName
Virtual Machines (VMs)
$Location="northeurope"
Get-AzureRmVMImagePublisher -Location $Location
Get-AzureRmVMImageOffer -Location $Location -PublisherName "MicrosoftSQLServer"
Get-AzureRmVMImageSku -Location $Location -PublisherName "MicrosoftSQLServer" -Offer "SQL2019-WS2016"
Get-AzureRmVMImage -Location $Location -PublisherName "MicrosoftSQLServer" -Offer "SQL2019-WS2016" -Skus "SQLDEV"
Tagging (Azure CLI)
Show tags for all resources
az group list --query [].tags
Create tag
az tag create --name "Review Date"
Add a value to a tag
az tag add-value --name Environment --value Development
Set a tag to a value for a resource group
az group update -n example-resource-group --set tags.Environment=prod tags.CostCenter=IT
Tagging (PowerShell)
Create a new tag
New-AzureRmTag -Name "Review Date"
Create a tag and set the Purpose
New-AzureRmTag -Name Purpose -Value "Azure DevOps Self Hosted Agent"
Get details of all tags
Get-AzureRmTag -Detailed
Get selected column details of all tags
Get-AzureRmTag -Detailed | select name,values
Remove a tag
Remove-AzureRmTag -Name "Review Date"
Translating Chinese to English in SQL with Microsoft Translator
In Oracle, I had a table of data which had some Chinese words that I needed to translate into English on the fly, in SQL…this is how I did that…
Microsoft have a translator facility here with the Translator Text API v3.0 to allow you to call it programmatically. I’m using Microsoft as I’m currently working on Azure – of course, there are other translation facilities available.
The API has a translate method which one needs to construct a call to. The format of the call is:
https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&from=xxxx&to=yyyy
…where xxxx is the from language, e.g. zh-Hans for Simplified Chinese (my case) and yyyy is the to language, e.g. en for English.
In the body of the request needs to be some JSON of the form:
[{"Text": "zzzz"}]
…where zzzz is the text that needs to be converted from Simplified Chinese to English.
Calling the API would result in a response which contains the translated text in JSON format.
So, what we need to do is create an Oracle Function which can be called from SQL passing in the text that needs translating from a selected column. The function will call the Microsoft Translator API via UTL_HTTP to translate the text and return the translated text which is then displayed in the SQL output.
Thanks to Tim Hall for this article and Lucas Jellema for this article which helped me with some of this – I just had to do a few tweaks to get things to work in my use case, namely:
- Set up the Oracle Wallet for using HTTPS
- Convert the publish_cinema_event procedure Lucas wrote to a function so I could call it in SQL
- Use LENGTHB instead of LENGTH to determine the length of the text to be translated due to the text being multi byte
- Use WRITE_RAW and UTL_RAW.CAST_TO_RAW rather than WRITE_TEXT otherwise the chinese characters get mangled
- Set the body text of the request to be UTF-8 by calling UTL_HTTP.SET_BODY_CHARSET
Firstly the calls to the Microsoft Translator are via HTTPS rather than HTTP so I needed to set up Oracle Wallet with keys to facilitate that. I tried to follow the instructions on Tim’s page about using Chrome to get the certificate but no matter which option I chose it wouldn’t include the keys/certificates in the output file. Instead, I chose to go onto our Linux server and do it this way (adjust to suit your paths):
mkdir -p /u01/app/oracle/admin/ORCL/wallet
openssl s_client -showcerts -connect api.cognitive.microsofttranslator.com:443 </dev/null 2>/dev/null|openssl x509 -outform DER >/u01/app/oracle/admin/ORCL/wallet/ms_translate_key.der
This seemed to work fine – at least everything after that worked and the end result was that we could call the API – so whatever the above did differently to Chrome, I don’t know, but it worked.
I then created a wallet on the Linux server:
orapki wallet create -wallet /u01/app/oracle/admin/ORCL/wallet -pwd MyPassword -auto_login
orapki wallet add -wallet /u01/app/oracle/admin/ORCL/wallet -trusted_cert -cert "/u01/app/oracle/admin/ORCL/wallet/ms_translate_key.der" -pwd MyPassword
Now once the wallet is created I created the following function:
SET DEFINE OFF
CREATE OR REPLACE FUNCTION translate_text(p_text_to_translate in varchar2
,p_language_from in varchar2
,p_language_to in varchar2
) RETURN VARCHAR2 IS
req utl_http.req;
res utl_http.resp;
url VARCHAR2(4000) := 'https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&from='||
p_language_from||'&to='||p_language_to;
buffer VARCHAR2(4000);
content VARCHAR2(4000) := '[{"Text": "'||p_text_to_translate||'"}]';
BEGIN
dbms_output.put_line('URL:'||url);
dbms_output.put_line('CONTENT:'||content);
dbms_output.put_line('CONTENT LENGTH:'||TO_CHAR(LENGTH(content)));
req := utl_http.begin_request(url, 'POST',' HTTP/1.1');
utl_http.set_header(req, 'user-agent', 'mozilla/4.0');
utl_http.set_header(req, 'content-type', 'application/json');
utl_http.set_header(req, 'Ocp-Apim-Subscription-Key', 'OCP_APIM_SUBSCRIPTION_KEY');
utl_http.set_header(req, 'Content-Length', LENGTHB(content));
utl_http.set_body_charset(req, 'UTF-8');
utl_http.write_raw(req,utl_raw.cast_to_raw(content));
res := utl_http.get_response(req);
utl_http.read_line(res, buffer);
utl_http.end_response(res);
RETURN buffer;
EXCEPTION
WHEN OTHERS
THEN utl_http.end_response(res);
RAISE;
END translate_text;
/
NOTE – The SET DEFINE OFF is important given the embedded ampersand characters. The OCP_APIM_SUBSCRIPTION_KEY value needs to have whatever is relevant for your subscription as well. You may need to set up ACLs for the user running this code – Tim and Lucas cover that in their articles.
Now to run the code, login to the database and run this to engage the wallet:
EXEC UTL_HTTP.set_wallet('file:/u01/app/oracle/admin/ORCL/wallet', NULL);
Create a test table with some Simplified Chinese in it:
create table test_chinese(chinese_text varchar2(200));
insert into test_chinese values('敏捷的棕色狐狸跳过了懒狗');
commit;
Now select the data out using the translate_text function and see what we get:
select chinese_text,translate_text(chinese_text,'zh-Hans','en') from test_chinese;
The returned translation is in JSON format but of course if you wanted you could extract the text element from it easily.
That’s it.
DBMS_COMPRESSION can be run in parallel, at least from 11.2.0.4
I was trying to use DBMS_COMPRESSION on an 11gR2 (11.2.0.4 on RHEL 6.3) database the other day and was helped by this article from Neil Johnson. I was having some trouble with permissions until I read that article which nicely summarises everything you need to know – thanks Neil!
One thing I did notice is that Neil stated that you can’t parallelise the calls to the advisor since it uses the same named objects each time and this would then cause conflicts (and problems). Neil illustrated the example calls that the advisor is making based on him tracing the sessions…
create table "ACME".DBMS_TABCOMP_TEMP_UNCMP tablespace "SCRATCH" nologging as select /*+ DYNAMIC_SAMPLING(0) FULL("ACME"."ACCS") */ * from "ACME"."ACCS" sample block( 99) mytab create table "ACME".DBMS_TABCOMP_TEMP_CMP organization heap tablespace "SCRATCH" compress for all operations nologging as select /*+ DYNAMIC_SAMPLING(0) */ * from "ACME".DBMS_TABCOMP_TEMP_UNCMP mytab
Because I kept having permissions issues I was repeatedly running the advisor and I ended up with a situation where one of the transient objects (above, or so I thought) had been left in place and when I tried the next rerun it complained that the object existed. I can’t reproduce this as I can’t remember all the steps that I took and I wasn’t recording my session at the time – it’s not really the point of this blog in any case, rather the knowledge it led to. Because the error was that the object existed, I figured I just needed to find the object and drop it and I’d be good to carry on – obviously I looked at the above code fragments and started to search for the two objects in question (DBMS_TABCOMP_TEMP_UNCMP and DBMS_TABCOMP_TEMP_CMP) but found nothing. I started looking for DBMS_TABCOMP% and again found nothing.
Somewhat confused, I then looked for the latest object created and found that the objects were actually called something completely different and of the form CMPx$yyyyyyyy. I think this must have changed since Neil wrote his article (it is from 2013 after all).
I can’t work out what “x” is – at first I thought it was the RAC instance but that was just a coincidence that I saw a 3 and I was on instance 3 of a RAC cluster. In fact on a single instance database (test below) I saw numbers higher than 1 so it’s not the RAC instance number and I can’t work out what it is. “yyyyyyyy” is definitely the OBJECT_ID of the data object, confirmed by cross referencing the data dictionary.
Given this naming standard is therefore object specific, it suggests that you could execute these advisor calls in parallel.
Just to be clear, I’m not sure what version of 11g Neil was using but I am using 11.2.0.4:
SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE    11.2.0.4.0      Production
TNS for Linux: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production
A little test using Neil’s code (and a bit more as I don’t have acme user on my database):
First create a script called dbms_comp.sql with the following content:
set serveroutput on
set feedback on
set verify off

declare
  blkcnt_cmp    BINARY_integer;
  blkcnt_uncmp  BINARY_integer;
  row_cmp       BINARY_integer;
  row_uncmp     BINARY_integer;
  cmp_ratio     number;
  comptype_str  varchar2(60);
begin
  dbms_compression.get_compression_ratio(
      scratchtbsname => upper('&3.')
    , ownname        => upper('&1.')
    , tabname        => upper('&2.')
    , partname       => null
    , comptype       => dbms_compression.comp_for_oltp
    , blkcnt_cmp     => blkcnt_cmp
    , blkcnt_uncmp   => blkcnt_uncmp
    , row_cmp        => row_cmp
    , row_uncmp      => row_uncmp
    , cmp_ratio      => cmp_ratio
    , comptype_str   => comptype_str
    , subset_numrows => &4.
  );
  DBMS_OUTPUT.PUT_LINE('Block count compressed = ' || blkcnt_cmp);
  DBMS_OUTPUT.PUT_LINE('Block count uncompressed = ' || blkcnt_uncmp);
  DBMS_OUTPUT.PUT_LINE('Row count per block compressed = ' || row_cmp);
  DBMS_OUTPUT.PUT_LINE('Row count per block uncompressed = ' || row_uncmp);
  --DBMS_OUTPUT.PUT_LINE('Compression type = ' ||comptype_str);
  DBMS_OUTPUT.PUT_LINE('Compression ratio = '||round(blkcnt_uncmp/blkcnt_cmp,1)||' to 1');
  DBMS_OUTPUT.PUT_LINE('Compression % benefit = '||round((blkcnt_uncmp-blkcnt_cmp)/blkcnt_uncmp*100,1));
  --DBMS_OUTPUT.PUT_LINE('Compression ratio org= '||cmp_ratio);
end;
/
set verify on
Then create another script called setup.sql with the following content – I’m using auditing (thanks Tim!) to see the statements rather than tracing in this instance:
conn sys as sysdba

drop user acme cascade;
drop user nj cascade;
drop tablespace acme including contents and datafiles;
drop tablespace scratch including contents and datafiles;
drop role nj_dba;

create user acme identified by acme;
grant create session,create table to acme;
create tablespace acme datafile '/u01/app/oracle/oradata/db11g/acme01.dbf' size 2G;
alter user acme quota unlimited on acme;
create tablespace scratch datafile '/u01/app/oracle/oradata/db11g/scratch01.dbf' size 2G;
create role nj_dba;
create user nj identified by nj;

REM Use auditing instead of tracing to identify the statements run:
audit all by nj by access;
audit create table by nj by access;

grant create session, create any table, drop any table, select any table to nj_dba;
grant execute on sys.dbms_monitor to nj_dba;
grant nj_dba to nj;
alter user acme quota unlimited on scratch;
alter user nj quota unlimited on scratch;
grant ANALYZE ANY to NJ_DBA;
Now login to sqlplus /nolog and run setup.sql which should show this:
SQL> @setup
Enter password:
Connected.
User dropped.
User dropped.
Tablespace dropped.
Tablespace dropped.
Role dropped.
User created.
Grant succeeded.
Tablespace created.
User altered.
Tablespace created.
Role created.
User created.
Audit succeeded.
Audit succeeded.
Grant succeeded.
Grant succeeded.
Grant succeeded.
User altered.
User altered.
Grant succeeded.
Now login to acme and create the subject table for the compression advisor:
conn acme/acme

create table test tablespace acme as
select lpad(TO_CHAR(ROWNUM),2000,'x') char_col
from dual
connect by level < 300000;

Table created.
Now check the compression using the advisor (ignore the actual compression results as we’re not interested in those at this time):
conn nj/nj
@dbms_comp acme test scratch 200000
Block count compressed = 2048
Block count uncompressed = 2048
Row count per block compressed = 3
Row count per block uncompressed = 3
Compression ratio = 1 to 1
Compression % benefit = 0

PL/SQL procedure successfully completed.
Now check the audit trail to find the statements run:
conn sys as sysdba

column username format a8
column obj_name format a30
column timestamp format a30
column action_name format a20
set linesize 200
set pagesize 1000

SELECT USERNAME, action_name,obj_name, to_char(extended_TIMESTAMP,'DD-MON-YYYY HH24:MI:SSxFF') timestamp
FROM DBA_AUDIT_TRAIL
WHERE USERNAME='NJ'
and action_name='CREATE TABLE';

USERNAME ACTION_NAME          OBJ_NAME                       TIMESTAMP
-------- -------------------- ------------------------------ ------------------------------
NJ       CREATE TABLE         CMP3$87401                     10-DEC-2017 18:01:15.119890
NJ       CREATE TABLE         CMP4$87401                     10-DEC-2017 18:01:15.177518

(Abridged to remove non relevant tests)
Now check the dictionary to see the OBJECT_ID:
select object_name from dba_objects where object_id=87401;

OBJECT_NAME
--------------------------------------------------------------------------------------------------------------------------------
TEST

1 row selected.
OK, how about the parallelism? Let’s create a second table called TEST2 in ACME:
conn acme/acme

create table test2 tablespace acme as
select lpad(TO_CHAR(ROWNUM),2000,'x') char_col
from dual
connect by level < 300000;

Table created.
Now run two parallel sessions – I did it by firing off the calls manually in separate SQL*Plus sessions rather than being clever:
In session 1:
conn nj/nj
@dbms_comp acme test scratch 200000
In session 2:
conn nj/nj
@dbms_comp acme test2 scratch 200000
First one gives:
Block count compressed = 1920
Block count uncompressed = 1920
Row count per block compressed = 3
Row count per block uncompressed = 3
Compression ratio = 1 to 1
Compression % benefit = 0

PL/SQL procedure successfully completed.
Second one gives:
Block count compressed = 2432
Block count uncompressed = 2432
Row count per block compressed = 3
Row count per block uncompressed = 3
Compression ratio = 1 to 1
Compression % benefit = 0

PL/SQL procedure successfully completed.
Both ran at the same time and neither failed.
Now check the audit trail:
conn sys as sysdba

column username format a8
column obj_name format a30
column timestamp format a30
column action_name format a20
set linesize 200
set pagesize 1000

SELECT USERNAME, action_name,obj_name, to_char(extended_TIMESTAMP,'DD-MON-YYYY HH24:MI:SSxFF') timestamp
FROM DBA_AUDIT_TRAIL
WHERE USERNAME='NJ'
and action_name='CREATE TABLE';

USERNAME ACTION_NAME          OBJ_NAME                       TIMESTAMP
-------- -------------------- ------------------------------ ------------------------------
NJ       CREATE TABLE         CMP3$87401                     10-DEC-2017 18:01:15.119890
NJ       CREATE TABLE         CMP4$87401                     10-DEC-2017 18:01:15.177518
NJ       CREATE TABLE         CMP3$87408                     10-DEC-2017 18:12:18.114321
NJ       CREATE TABLE         CMP3$87409                     10-DEC-2017 18:12:18.114353
NJ       CREATE TABLE         CMP4$87408                     10-DEC-2017 18:12:22.730715
NJ       CREATE TABLE         CMP4$87409                     10-DEC-2017 18:12:22.735908

(Abridged to remove non relevant tests)
And from the dictionary:
select object_name from dba_objects where object_id IN(87408,87409);

OBJECT_NAME
--------------------------------------------------------------------------------------------------------------------------------
TEST
TEST2

2 rows selected.
So, it appears to allow parallel running without issue.
If anyone works out what the “x” part of the object names is (3 and 4 in the above example), please shout out in the comments…
Installing Hortonworks Data Platform 2.5 on Microsoft Azure
I presented this topic to the Big Data Meetup in Nottingham on Thursday but sometimes people prefer a blog to a presentation, so I’ve fashioned this article from the slides…
This article assumes the following:
- You know what Azure is
- You know what Hortonworks Data Platform is
- You have an account to go on Microsoft Azure (Microsoft offer £150 free credits for a month trial)
- You have a web browser
- You’ve generated a public/private SSH key pair
Start by navigating to the Azure login page and enter your details. If you have never visited before your screen will look like this:
If you’ve logged in before the page will show your login and you can just click it:
After you login, you’ll arrive at the Dashboard:
Choose the “Marketplace” link at the bottom right, which leads to the following screen where you can type “HDP” and it will show you the options for Hortonworks Data Platform. There are currently two options 2.4 and 2.5 – I chose 2.5:
When you choose 2.5 it will bring up this screen which shows the details of the option you have chosen and offers you the “Create” button to go ahead and start the creation process – click on Create:
After clicking on Create, the process moves on to a five step wizard, the first step of which allows you to choose “Basic options” for the VM. I set the following options:
Name: oramosshdp25sandbox
VM Disk Type: SSD
User name: jeff
SSH Public key: my public SSH key
Subscription: Leave set to Free Trial (if that’s what you are using, as per screenshot, or your Corporate/Pay As You Go subscription if you have one)
Resource Group: Create New called hdp25sandbox_rg
Location: UK West
A screenshot of these options looks like this:
Click on OK and move on to the 2nd step in the wizard for choosing the size of the VM. I chose the DS3_V2 size which seemed to work OK – you might be able to get away with something smaller, perhaps.
Click on Select and move on to step 3 of the wizard which is about configuring optional features. For this step I set the following:
Use managed disks: Yes
Leaving all other options as defaults this looks like:
Click on OK and move on to step 4 which is just a summary of the configuration:
If you’re happy, click on OK and move on to step 5 where you accept the terms of use and “buy” the VM:
If you’re happy, click on Purchase and that’s the end of the wizard. Azure then goes off to deploy the VM, which can take a few minutes. You’ll be returned to the dashboard screen where you’ll see the VM at the top right with the word Deploying on it:
As I say, it takes a few minutes to complete, but when it does, you’ll see a popup notification in the top right of the screen and the VM tile will change to look as below:
So, you now have the Sandbox VM up and running.
The VM by default only has inbound SSH access enabled and can only be accessed by IP address so we’ll make some changes to these next. First we’ll give the VM a DNS name which allows you to access it on the internet via a name rather than an IP address. From the dashboard screen (above) click on the VM and it takes you to this screen:
You’ll notice the Public IP address which is a hyperlink…click on that link and it takes you to the following screen where you can specify the DNS Name which means the machine will have a Fully Qualified Domain Name that you can access via the internet. I set my DNS Name to oramosshdp25sandbox and given I’d previously chosen to use UK West as the location, the Fully Qualified Domain Name is thus oramosshdp25sandbox.ukwest.cloudapp.azure.com as per the screenshot below:
Now, navigate to the Inbound Security Rules page which is under the Network Security Group page (access from the Resource List on the dashboard). Notice that the only rule existing is one to allow inbound SSH communication:
In order to facilitate additional capabilities you should open up a few more ports, as follows (they can also be scripted – see the sketch after this list):
- 8888 – HDP
- 8080 – Ambari
- 4200 – Web SSH access
- 50070 – HDFS NameNode web UI
- 21000 – Atlas
- 9995 – Zeppelin
- 15000 – Falcon
- 6080 – Ranger
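If you would rather script those rules than click through the Portal, a rough sketch with the Az.Network module is below. The NSG name, resource group name and priority numbering are assumptions – substitute the values from your own deployment:

# Sketch: add the inbound allow rules from the list above to the sandbox NSG.
$ports = @{ 8888 = "HDP"; 8080 = "Ambari"; 4200 = "Web SSH"; 50070 = "NameNode"; 21000 = "Atlas"; 9995 = "Zeppelin"; 15000 = "Falcon"; 6080 = "Ranger" }

$nsg = Get-AzNetworkSecurityGroup -ResourceGroupName "hdp25sandbox_rg" -Name "oramosshdp25sandbox-nsg"

$priority = 1010
foreach ($port in $ports.Keys) {
    # One inbound allow rule per port, named in the same style as the default SSH rule.
    $nsg | Add-AzNetworkSecurityRuleConfig -Name "default-allow-$port" -Description $ports[$port] -Access Allow -Protocol Tcp -Direction Inbound -Priority $priority -SourceAddressPrefix "*" -SourcePortRange "*" -DestinationAddressPrefix "*" -DestinationPortRange $port | Out-Null
    $priority += 10
}
$nsg | Set-AzNetworkSecurityGroup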
Click on Inbound Security Rule which takes you to the page for maintaining these rules and enter the details for the 8888 port. I specified the name as default-allow-8888 and the port as 8888 as shown below:
Click on OK to create the rule. Carry out the same process for the other ports.
Now that we’ve undertaken these additional activities we can access the VM using an SSH terminal logging onto oramosshdp25sandbox.ukwest.cloudapp.azure.com as the user you have created (jeff in my case) and the private SSH key:
Whilst you are in the SSH terminal you can reset the Ambari password. This is not strictly necessary unless you want to login to Ambari as admin, but I’ll describe it anyway.
First become root with:
sudo su - root
Now SSH into the Docker Image as root:
ssh root@172.17.0.2
You will be prompted to change the password for root on this first login – the current password is hadoop.
After changing the password run the Ambari password reset process:
ambari-admin-password-reset
Follow the instructions to reset the password and after that it will start the Ambari server process.
Once all that is done, exit out of the sessions and the original SSH terminal.
Now go into HDP via the web interface by logging on to the following URL:
http://oramosshdp25sandbox.ukwest.cloudapp.azure.com:8888
The first time you access this URL you’ll be given a welcome (marketing) page which asks for your details:
Fill out the details and hit Submit which will take you to the main entry page for HDP:
Choose the Launch Dashboard option on the left, which brings up a pair of browser windows that use the entire desktop and show the Ambari login page on the left hand browser and the Tutorials website on the right hand browser like this:
You can use either the admin user that you just reset the password for or the predefined user raj_ops (password raj_ops) to access Ambari. Click on Sign In on the left hand browser once you entered the credentials and it takes you into the main Ambari homepage:
This is the main systems management environment for Hortonworks – more documentation here.
If we close this pair of browsers now and go back to the main HDP entry page and choose the Quick Links option on the right we get this page:
From here you can choose to use any of these specific components.
NOTE – I couldn’t get Atlas and Falcon to work – they need more configuration/setup to get them functional. Ranger, Zeppelin and the Web SSH client work fine though.
Just a basic introduction but I hope you find it useful.
12cR2 tightens up ORA-01841 for zero year ANSI dates, but not for Oracle SQL syntax
In moving some more code from an 11gR2 database to a 12cR2 database, I found another change where a piece of code that works in 11gR2 doesn’t compile in 12cR2.
In this instance a view was being created with a projected date column which used the ANSI DATE syntax. Here is a simplified test script:
CREATE OR REPLACE VIEW test1 AS
SELECT date '0000-01-01' date_col
FROM dual
/
DROP VIEW test
/
CREATE OR REPLACE VIEW test2 AS
SELECT TO_DATE('0000-01-01','YYYY-MM-DD') date_col
FROM dual
/
Running this on 11gR2 gives:
SQL>CREATE OR REPLACE VIEW test1 AS
  2  SELECT date '0000-01-01' date_col
  3  FROM dual
  4  /

View created.

SQL>CREATE OR REPLACE VIEW test2 AS
  2  SELECT TO_DATE('0000-01-01','YYYY-MM-DD') date_col
  3  FROM dual
  4  /

View created.
Now running this on 12cR2 gives:
SQL> CREATE OR REPLACE VIEW test1 AS
  2  SELECT date '0000-01-01' date_col
  3  FROM dual
  4  /
SELECT date '0000-01-01' date_col
*
ERROR at line 2:
ORA-01841: (full) year must be between -4713 and +9999, and not be 0

SQL> CREATE OR REPLACE VIEW test2 AS
  2  SELECT TO_DATE('0000-01-01','YYYY-MM-DD') date_col
  3  FROM dual
  4  /

View created.
The date has a zero year, so the error message is arguably correct – but 12cR2 only enforces this for the ANSI DATE literal syntax; the TO_DATE version still compiles, just as it did in 11gR2.
ORA-54002 when trying to create Virtual Column using REGEXP_REPLACE on Oracle 12cR2
I encountered an issue today trying to create a table in an Oracle 12cR2 database, the DDL for which I had extracted from an Oracle 11gR2 database. The error returned when trying to create the table was:
ORA-54002: only pure functions can be specified in a virtual column expression
The definition of the table included a Virtual Column which used a REGEXP_REPLACE call to derive a value from another column on the table.
Here is a simplified test case illustrating the scenario (Thanks Tim for the REGEXP_REPLACE example code):
select * from v$version
/
create table test_ora54002_12c(
 col1 VARCHAR2(20 CHAR) NOT NULL
,virtual_column1 VARCHAR2(4000 CHAR) GENERATED ALWAYS AS(REGEXP_REPLACE(col1, '([A-Z])', ' \1', 2)) VIRTUAL VISIBLE
)
/
drop table test_ora54002_12c purge
/
Running this on 11gR2 gives:
SQL> select * from v$version
  2  /

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE    11.2.0.4.0      Production
TNS for Linux: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production

5 rows selected.

Elapsed: 00:00:00.40
SQL> create table test_ora54002_12c(
  2   col1 VARCHAR2(20 CHAR) NOT NULL
  3  ,virtual_column1 VARCHAR2(4000 CHAR) GENERATED ALWAYS AS(REGEXP_REPLACE(col1, '([A-Z])', ' \1', 2)) VIRTUAL VISIBLE
  4  )
  5  /

Table created.

Elapsed: 00:00:00.24
SQL> drop table test_ora54002_12c purge
  2  /

Table dropped.
Running this on 12cR2 gives:
SQL> select * from v$version
  2  /

BANNER                                                                               CON_ID
--------------------------------------------------------------------------------  ----------
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production               0
PL/SQL Release 12.2.0.1.0 - Production                                                     0
CORE    12.2.0.1.0      Production                                                         0
TNS for Linux: Version 12.2.0.1.0 - Production                                             0
NLSRTL Version 12.2.0.1.0 - Production                                                     0

SQL> create table test_ora54002_12c(
  2   col1 VARCHAR2(20 CHAR) NOT NULL
  3  ,virtual_column1 VARCHAR2(4000 CHAR) GENERATED ALWAYS AS(REGEXP_REPLACE(col1, '([A-Z])', ' \1', 2)) VIRTUAL VISIBLE
  4  )
  5  /
,virtual_column1 VARCHAR2(4000 CHAR) GENERATED ALWAYS AS(REGEXP_REPLACE(col1, '([A-Z])', ' \1', 2)) VIRTUAL VISIBLE
*
ERROR at line 3:
ORA-54002: only pure functions can be specified in a virtual column expression

SQL> drop table test_ora54002_12c purge
  2  /
drop table test_ora54002_12c purge
*
ERROR at line 1:
ORA-00942: table or view does not exist
As you can see, 12cR2 gives the ORA-54002 error.
A look on MOS highlights this article, which suggests that you shouldn’t have been able to do this in 11gR2, i.e. it was a bug, and that 12cR2 has fixed this bug so you can no longer create such a virtual column (the article refers to functional index and check constraint use cases as well).
In my case, I was able to rewrite the virtual column to use simple string functions such as SUBSTR, TRANSLATE and INSTR to achieve what I wanted and the virtual column was allowed to be created with these – problem solved – a shame really as the REGEXP_REPLACE approach was far neater.
Create Jupyterhub Container on Centos 7 on Proxmox
These instructions show how to create a Centos 7 container on Proxmox running JupyterHub.
Note – the instructions are just a guide and for use on my environment – you may need/wish to adjust for your own environment as necessary.
Versions
root@billy:~# pveversion
pve-manager/4.4-12/e71b7a74 (running kernel: 4.4.40-1-pve)
The version of the notebook server is 4.4.1, running on Python 3.4.5 (default, Nov 9 2016, 16:24:59) [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)].

Create Container
pct create 153 u01:vztmpl/centos-7-default_20160205_amd64.tar.xz -rootfs 10 -hostname jupyterhub -memory 2048 -nameserver 192.168.1.25 -searchdomain oramoss.com -net0 name=eth0,bridge=vmbr0,gw=192.168.1.1,ip=192.168.1.153/24 -swap 2048 -cpulimit 2 -storage u01

Installation
yum update -y
yum install epel-release -y
yum install wget -y
cd ~
wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u111-b14/jdk-8u111-linux-x64.rpm"
yum localinstall jdk-8u111-linux-x64.rpm -y
rm -f ~/jdk-8u111-linux-x64.rpm

vi /etc/environment
export JAVA_HOME=/usr/java/jdk1.8.0_111/jre

vi ~/.bash_profile
export PATH=${JAVA_HOME}/bin:$PATH

. ~/.bash_profile
java -version

yum install pdsh -y
yum -y install yum-cron
chkconfig yum-cron on

vi /etc/yum/yum-cron.conf
update_messages = no
apply_updates = yes
email_to = oramoss.jeffmoss@gmail.com
email_host = smtp.gmail.com

service yum-cron start

cd /etc/yum.repos.d
wget http://yum.oracle.com/public-yum-ol7.repo
yum install python34 -y
curl -O https://bootstrap.pypa.io/get-pip.py
/usr/bin/python3.4 get-pip.py
yum install npm nodejs-legacy -y --nogpgcheck
yum install anaconda -y --nogpgcheck
python3 -m pip install jupyterhub
npm config set strict-ssl false
npm install -g configurable-http-proxy
python3 -m pip install notebook

wget http://public-yum.oracle.com/RPM-GPG-KEY-oracle-ol7 -O /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
wget ftp://ftp.icm.edu.pl/vol/rzm5/linux-slc/centos/7.0.1406/cernonly/x86_64/Packages/oracle-instantclient12.1-basic-12.1.0.2.0-1.x86_64.rpm
wget ftp://bo.mirror.garr.it/1/slc/centos/7.1.1503/cernonly/x86_64/Packages/oracle-instantclient12.1-sqlplus-12.1.0.2.0-1.x86_64.rpm
wget ftp://bo.mirror.garr.it/1/slc/centos/7.1.1503/cernonly/x86_64/Packages/oracle-instantclient12.1-devel-12.1.0.2.0-1.x86_64.rpm
yum install oracle-instantclient12.1-basic-12.1.0.2.0-1.x86_64.rpm -y
yum install oracle-instantclient12.1-devel-12.1.0.2.0-1.x86_64.rpm -y
yum install oracle-instantclient12.1-sqlplus-12.1.0.2.0-1.x86_64.rpm -y

vi ~/.bash_profile
export ORACLE_HOME=/usr/lib/oracle/12.1/client64
export PATH=$ORACLE_HOME/bin:$PATH
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH

vi /etc/environment
export ORACLE_HOME=/usr/lib/oracle/12.1/client64
export PATH=$ORACLE_HOME/bin:$PATH
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH

. ~/.bash_profile

yum install gcc -y
yum install python-devel -y
yum install python34-devel -y
pip install cx_Oracle
pip install ipython-sql

jupyterhub --generate-config

vi /root/jupyterhub_config.py
# ensure the following are set:
c.Spawner.env_keep = ['LD_LIBRARY_PATH']
c.Spawner.environment = dict(LD_LIBRARY_PATH='/usr/lib/oracle/12.1/client64/lib:$LD_LIBRARY_PATH')

systemctl stop firewalld
systemctl disable firewalld

vi /lib/systemd/system/jupyterhub.service
[Unit]
Description=Jupyterhub
After=network-online.target

[Service]
User=root
ExecStart=/usr/bin/jupyterhub --ip=192.168.1.10
WorkingDirectory=/root

[Install]
WantedBy=multi-user.target

systemctl enable jupyterhub
systemctl start jupyterhub
systemctl status jupyterhub
That should be it…navigate to http://192.168.1.153:8000 and login with a unix user on that node.
Installing Oracle 12c Release 2 Database on a Proxmox Container
Obviously nobody could beat Tim to getting the comprehensive installation instructions out first, but here are my notes for installing it on a proxmox container environment which is what I use as my research platform. Some of the calls used are from or based on Tim’s prior 12cR1 installation article – thanks Tim.
NOTE – this post is just a guide and is based on my environment – you will likely need to make changes to suit your own environment.
Environment
root@billy:~# pveversion
pve-manager/4.4-12/e71b7a74 (running kernel: 4.4.40-1-pve)

Host Preparation
Some of the activities required involve changing linux parameters but these can’t be applied inside a proxmox container – you’ll see errors like these if you try:
[root@db12cr2 ~]# sysctl -p
sysctl: setting key "fs.file-max": Read-only file system
Instead you have to do these at the host level – and only if you think they are relevant and that those settings wouldn’t upset all of your other environments running on that host. I haven’t tried but you could potentially just tell the GUI installer to ignore the warnings relating to these entries and not make these changes at all especially if you’re only using it for small scale research purposes.
As root on the proxmox host, run the following:
echo "fs.file-max = 6815744" >>/etc/sysctl.d/98-oracle.conf echo "kernel.panic_on_oops = 1" >>/etc/sysctl.d/98-oracle.conf echo "net.ipv4.conf.default.rp_filter = 2" >>/etc/sysctl.d/98-oracle.conf /sbin/sysctl -pCreate And Prepare The Container
I use Centos 7 as the template for most of my activities and these notes are based around that.
pct create 130 u01:vztmpl/centos-7-default_20160205_amd64.tar.xz -rootfs 60 -hostname db12cr2 -memory 10240 -nameserver 192.168.1.25 -searchdomain oramoss.com -net0 name=eth0,bridge=vmbr0,gw=192.168.1.1,ip=192.168.1.130/24 -swap 10240 -cpulimit 4 -storage local
You’ll have your own way of getting the installation files to be available to the container but I do it by adding a mount point so I can access the area where all my software is:
vi /etc/pve/nodes/${HOSTNAME}/lxc/130.conf
…and add this:
mp0: /mnt/backups/common_share,mp=/mnt/common_share
Start And Enter The Container
pct start 130
pct enter 130
Install Additional Packages
I’m going to use the Oracle Preinstall package but there are still a few things to add:
yum install gcc-c++ wget openssh-server -y
gcc-c++ is not necessary according to the 12cR2 installation manuals, but the GUI installer complains during the prerequisite checks if it’s not there.
wget is needed to download some files and it’s not on the Centos 7 template.
openssh-server allows me to log in remotely via SSH for the GUI install later.
Get OpenSSH To Autostart
systemctl enable sshd.service
systemctl start sshd.service
systemctl status sshd.service
Install Oracle Preinstall Package
#Get the Oracle Linux 7 repo - this works for Centos 7.
cd /etc/yum.repos.d/
wget http://public-yum.oracle.com/public-yum-ol7.repo
#The following stops GPG Key errors:
wget http://public-yum.oracle.com/RPM-GPG-KEY-oracle-ol7 -O /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
#Update everything
yum update -y
#Install the preinstall package
yum install oracle-database-server-12cR2-preinstall -y
Configure System Limits
echo "oracle soft nofile 1024" >>/etc/security/limits.d/oracle-rdbms-server-12cR2-preinstall.conf echo "oracle hard nofile 65536" >>/etc/security/limits.d/oracle-rdbms-server-12cR2-preinstall.conf echo "oracle soft nproc 16384" >>/etc/security/limits.d/oracle-rdbms-server-12cR2-preinstall.conf echo "oracle hard nproc 16384" >>/etc/security/limits.d/oracle-rdbms-server-12cR2-preinstall.conf echo "oracle soft stack 10240" >>/etc/security/limits.d/oracle-rdbms-server-12cR2-preinstall.conf echo "oracle hard stack 32768" >>/etc/security/limits.d/oracle-rdbms-server-12cR2-preinstall.conf echo "oracle hard memlock 134217728" >>/etc/security/limits.d/oracle-rdbms-server-12cR2-preinstall.conf echo "oracle soft memlock 134217728" >>/etc/security/limits.d/oracle-rdbms-server-12cR2-preinstall.confChange Password For “oracle” User
passwd oracle
<<set a password>>
Create Oracle Home Directory
mkdir -p /u01/app/oracle/product/12.2.0.1/db_1
chown -R oracle:oinstall /u01
chmod -R 775 /u01
Modify The Profile Of “oracle” User
echo "# Oracle Settings" >>/home/oracle/.bash_profile echo "export TMP=/tmp" >>/home/oracle/.bash_profile echo "export TMPDIR=\$TMP" >>/home/oracle/.bash_profile echo "export ORACLE_HOSTNAME=db12cr2.oramoss.com" >>/home/oracle/.bash_profile echo "export ORACLE_UNQNAME=cdb1" >>/home/oracle/.bash_profile echo "export ORACLE_BASE=/u01/app/oracle" >>/home/oracle/.bash_profile echo "export ORACLE_HOME=\$ORACLE_BASE/product/12.2.0.1/db_1" >>/home/oracle/.bash_profile echo "export ORACLE_SID=cdb1" >>/home/oracle/.bash_profile echo "export PATH=/usr/sbin:\$PATH" >>/home/oracle/.bash_profile echo "export PATH=\$ORACLE_HOME/bin:\$PATH" >>/home/oracle/.bash_profile echo "export LD_LIBRARY_PATH=\$ORACLE_HOME/lib:/lib:/usr/lib" >>/home/oracle/.bash_profile echo "export CLASSPATH=\$ORACLE_HOME/jlib:\$ORACLE_HOME/rdbms/jlib" >>/home/oracle/.bash_profileCreate Software Directory And Copy Files Over
mkdir -p /u01/software
cp /mnt/common_share/linuxx64_12201_database.zip /u01/software
cd /u01/software
unzip linuxx64_12201_database.zip
rm /u01/software/linuxx64_12201_database.zip
Run The Installer
Log in as the “oracle” user
cd /u01/software/database
./runInstaller
Install the software and a database by running through the GUI screens and following the instructions. The installer complains on the prerequisite checks screen about some of the kernel memory parameters (rmem%, wmem%), which you can ignore.
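As an aside, if you’d rather avoid the GUI altogether, the 12cR2 installer can also be driven silently against a response file – a rough sketch only, assuming you’ve edited the db_install.rsp shipped under database/response to reflect the paths above, and using -ignorePrereq because of the container-related prerequisite warnings mentioned:

# Sketch of a silent install - db_install.rsp must be edited for your environment first
cd /u01/software/database
./runInstaller -silent -ignorePrereq -waitforcompletion \
  -responseFile /u01/software/database/response/db_install.rsp

You’d still need to run the root scripts it prompts for at the end.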
Configure Auto Start
Follow these instructions from Tim to set up auto start using the runuser method – just make sure you change the ORACLE_HOME to 12.2.0.1 rather than the 12.1.0.2 mentioned in his article.
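For reference, the runuser method is essentially a small init script wrapping dbstart/dbshut – the following is a sketch of the shape of it rather than a copy of Tim’s script (the dbora name and runlevels are the usual conventions):

vi /etc/init.d/dbora
# ...with content along these lines:
#!/bin/sh
# chkconfig: 345 99 10
# description: Start/stop the Oracle database and listener as the oracle user.
ORA_HOME=/u01/app/oracle/product/12.2.0.1/db_1
ORA_OWNER=oracle

case "$1" in
'start')
    runuser -l $ORA_OWNER -c "$ORA_HOME/bin/dbstart $ORA_HOME"
    ;;
'stop')
    runuser -l $ORA_OWNER -c "$ORA_HOME/bin/dbshut $ORA_HOME"
    ;;
esac

chmod 750 /etc/init.d/dbora
chkconfig --add dbora

Remember that dbstart only starts instances whose /etc/oratab entry is flagged with Y, so set the cdb1 entry accordingly.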
Now reboot the container and it should return with the database automatically started.
Check Oracle Database Auto Starts
[oracle@db12cr2 ~]$ sqlplus /nolog

SQL*Plus: Release 12.2.0.1.0 Production on Thu Mar 2 14:16:53 2017

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

SQL> conn sys/Password01 as sysdba
Connected.
SQL> show sga

Total System Global Area 3221225472 bytes
Fixed Size                  8797928 bytes
Variable Size             687866136 bytes
Database Buffers         2516582400 bytes
Redo Buffers                7979008 bytes
SQL>
Conclusion
All pretty painless and relatively quick. I’ll take a dump of the container next in order to use it as a template for building future containers.
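For what it’s worth, the dump and reuse on the Proxmox host would be along these lines – the backup filename is illustrative, and vzdump will report the real one it writes:

# On the Proxmox host: dump container 130 with it stopped
vzdump 130 --mode stop --compress lzo --dumpdir /mnt/backups

# ...and later build a new container (e.g. 131) from that archive
pct restore 131 /mnt/backups/vzdump-lxc-130-<timestamp>.tar.lzo -storage local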
Fixing blank charts on the Ambari home page (Hortonworks Data Platform)
I created a 4 node, container based (Proxmox LXC) Hortonworks Data Platform 2.5 Hadoop cluster recently and all went well, except that all the charts on the Ambari home page were blank or showing “N/A”, like this:
An outline of the environment:
- 4 node cluster of LXC containers on Proxmox host
- Centos 7 Linux OS
- Nodes are called bishdp0[1-4], all created from same template and identical configuration
- All containers are on 192.168.1.0/24 network
- DNS Server also available on same network and all hosts can resolve each other via DNS
- Hortonworks Data Platform version 2.5
- Proxmox host sits on a corporate network and the host has iptables set to allow the containers on 192.168.1.0/24 to reach the internet via the corporate proxy server, e.g. for yum access
- Other than the blank charts everything appears to be working fine
After much reading around, it turned out that I hadn’t quite set up the proxying correctly; specifically, I hadn’t told Ambari to ignore certain hosts – the bishdp0[1-4] nodes on the 192.168.1.0/24 network – when proxying. I couldn’t find an HDP 2.5 version of the document for setting up proxying for Ambari, but the 2.2 instructions worked.
Steps I took to fix the problem:
First, stop the services on the cluster. Log on to the node running the Ambari Server, where I have a script called ambari-stop-all-services.sh which I created based on part of this article – thanks slm.
Run the script:
./ambari-stop-all-services.sh
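I won’t reproduce the script here, but a script like that can be as simple as a single Ambari REST call – a minimal sketch, where the Ambari host, credentials and cluster name are assumptions to replace with your own:

#!/bin/bash
# Ask Ambari to move every service in the cluster to the INSTALLED (i.e. stopped) state
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop All Services"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://bishdp01:8080/api/v1/clusters/mycluster/services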
Now stop the Ambari agent on all the servers:
pdsh -w bishdp0[1-4] service ambari-agent stop
Now stop the Ambari Server:
service ambari-server stop
Now edit the Ambari environment script:
vi /var/lib/ambari-server/ambari-env.sh
Look for the line that begins “export AMBARI_JVM_ARGS” and ensure it has entries for the following parameters:
- http.proxyHost
- http.proxyPort
- http.proxyUser
- http.proxyPassword
- http.nonProxyHosts
It’s the last one that was missing in my case, which meant that Ambari was trying to go to the proxy server even for these containers on the 192.168.1.0/24 network.
After editing, the line looked like this (I’ve redacted the specifics – just replace the entries with values suited to your environment):
export AMBARI_JVM_ARGS=$AMBARI_JVM_ARGS' -Xms512m -Xmx2048m -XX:MaxPermSize=128m -Dhttp.proxyHost=<proxy IP> -Dhttp.proxyPort=<proxy port> -Dhttp.proxyUser=<user> -Dhttp.proxyPassword=<password> -Dhttp.nonProxyHosts=<*.domain> -Djava.security.auth.login.config=$ROOT/etc/ambari-server/conf/krb5JAASLogin.conf -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false'
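To make the last entry concrete, the non-proxy list is pipe separated and accepts wildcards; for an environment like the one described above it might look something like this (the patterns are illustrative, not the redacted values):

-Dhttp.nonProxyHosts=192.168.1.*|*.oramoss.com|localhost|127.0.0.1

It sits inside the existing quoted AMBARI_JVM_ARGS string, so the pipes don’t need any extra escaping.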
Now restart everything, Ambari server first:
service ambari-server start
…then the agents on all nodes (pdsh is great – thanks Robin Moffatt for your notes!)
pdsh -w bishdp0[1-4] service ambari-agent start
And finally start the services on the cluster using the ambari-start-all-services.sh script.
./ambari-start-all-services.sh
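Again as a sketch under the same assumptions, the start script is just the mirror image of the stop one:

#!/bin/bash
# Ask Ambari to move every service in the cluster to the STARTED state
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Start All Services"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
  http://bishdp01:8080/api/v1/clusters/mycluster/services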
After I did this, the charts started showing details:
Transferring default domains for SQL Developer Data Modeller
I got a new laptop the other day and installed all the software, including SQL Developer Data Modeller, without any problems. I then opened a model which had a bunch of tables with columns based on Domains…the columns no longer had Domains against them; they had been replaced with Logical data types instead.
After some research, the fix in this case involved copying the file “defaultdomains.xml” from the following directory on my old laptop to the same place on the new laptop:
%SQL Developer Home%\sqldeveloper\extensions\oracle.datamodeler\types
After restarting and reopening the model all was back to normal.
What I probably should have done in the first place was create my own Domains file for the Design and save it in the Design folder; then, when I transferred the Design by copying the Design folder across, the domains would have come with it and I could simply have opened that Domains file on the new laptop. I guess it depends on whether I want these domains to be Design-specific or part of the defaults for all designs.